PostgreSQL begins within the list
Fast way how to find addresses matching loooong list of postal codes, and the actual address postal code may be in short version of the long version
eg values in DB may be:
eg values in DB may be:
- addresses.postal_code == "12345"
- addresses.postal_code == "12345-8767"
- addresses.postal_code == "N17 9GD"
SELECT * FROM addresses WHERE postal_code SIMILAR TO '(12758|60937|N17 9GD)%") // notice there is no % at start, just at the end ^^^^^^^^^^^^ imagine this list to be ten thousand postal codes
or if you want case insensitive:
SELECT * FROM addresses WHERE lower(postal_code) SIMILAR TO '(12758|60937|N17 9GD)%");
Don't use LIKE or ILIKE
ILIKE & LIKE will be slower (any of the following examoles will work but will be slower than proposed solution)
SELECT * FROM addresses WHERE postal_code ILIKE '80013%' OR addresses.postal_code ILIKE '80014%' OR addresses.postal_code ILIKE '80015%' // slow, long and ugly once you place here 10 000 postcodes
there is also this format of ILIKE:
SELECT * FROM addresses WHERE postal_code LIKE any (array['80013%', '80014%', '80015%']); SELECT * FROM addresses WHERE postal_code ILIKE any (array['80013%', '80014%', '80015%']);
... it will still be slower than proposed solution
Also be aware this will suck too:
SELECT * FROM addresses WHERE postal_code lower(value) ~~ ANY('{80013%,80014%,80015%}');
~~ is just another name for LIKE: "The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE, respectively. All of these operators are PostgreSQL-specific."
Sources
- lot of inspiration from this stackoverflow question
- SIMILAR TO regexp
- POSIX Regular Expressions
Match reqexp with ~
- text ~ text → matches regular expression, case sensitively. ( 'thomas' ~ 't.*ma' → t )
- text ~* text → matches regular expression, case insensitively ( 'thomas' ~* 'T.*ma' → t )
- text !~ text → does not match regular expression, case sensitively ( 'thomas' !~ 't.*max' → t )
- text !~* text → does not match regular expression, case insensitively ( 'thomas' !~* 'T.*ma' → f )
`~` will use all regex operators, so we could use `^` operator, which will not make a full match of the string.
note that ~ uses a seqscan even when it's substantially more expensive (artifically so due to enable_seqscan) because it has no alternative, while LIKE uses the index. However, a corrected ~ with a left anchor uses the index too
source
So theoretically this should be the fastest way to do it:
SELECT * FROM addresses WHERE postal_code !~ '^(22222|33333)' SELECT * FROM addresses WHERE postal_code ~ '^(22222|33333)'
However I've done some tests and when it comes to
SELECT * FROM addresses WHERE postal_code SIMILAR TO '(12758|60937|02218)%")
...and
SELECT * FROM addresses WHERE postal_code ~ '^(12758|60937|02218)
...are same speed ( tiny tiny difference ). Same was for negative versions `postal_code NOT SIMILAR TO` and `postal_code !~ '^(80013|...` .
Now here is the important part: Yes ~ was same speed thanks to ^ BUT if you ever decide you want to remove the ^ then "SIMILAR TO '(123|456)%'" is faster 🫨