Hi Felix
Something else: you store the IPv4 address as a string.
You really shouldn't do this.
Instead, invest a little effort in the data loading and your SQL code, and keep the 4 bytes separate (e.g. byte1, byte2, byte3, byte4).
That way you get much better compression, as each byte column can hold at most 256 different values in the worst case (which you won't hit, due to the way IP addresses are maintained).
By storing the string, you always store the combination of the bytes plus three dots, which really doesn't add anything to your query.
You end up with a much larger dictionary and a value vector that compresses much worse because it contains fewer repetitions.
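To make the dictionary argument a bit more tangible, here is a minimal Python sketch. It is only an illustration: the helper name split_ipv4 and the sample addresses are made up, and the distinct counts on a toy list say nothing about the compression you would see on real data. It splits each address into its four bytes at load time and compares the number of distinct values per column with the number of distinct strings.

```python
from ipaddress import IPv4Address
from typing import Tuple


def split_ipv4(address: str) -> Tuple[int, int, int, int]:
    """Split a dotted-quad string into its four byte values (hypothetical helper)."""
    # IPv4Address validates the input; .packed yields the 4 raw bytes.
    b1, b2, b3, b4 = IPv4Address(address).packed
    return b1, b2, b3, b4


# Made-up sample addresses, purely for illustration.
sample = ["10.0.0.1", "10.0.0.2", "10.0.1.1", "192.168.1.1", "192.168.1.2"]

# String storage: one dictionary entry per distinct address string.
string_dictionary = set(sample)

# Byte storage: one small dictionary per byte column.
byte_columns = list(zip(*(split_ipv4(a) for a in sample)))
byte_dictionaries = [set(column) for column in byte_columns]

print(len(string_dictionary))               # distinct strings
print([len(d) for d in byte_dictionaries])  # distinct values per byte column
```

In this toy list the string column needs 5 dictionary entries, while each of the four byte columns needs only 2.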
The smart thing to do here is to store the data in separate bytes. Just make sure to compare the combination of bytes correctly by using the tuple notation:
... WHERE (y.byte1, y.byte2, y.byte3, y.byte4) = (x.byte1, x.byte2, x.byte3, x.byte4)
This even opens up the opportunity for a higher degree of parallelism, as each byte column can be filtered independently of the others.
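If you want to try the tuple comparison from the WHERE clause above without touching your own system, the following sketch runs it against an in-memory SQLite database from Python. SQLite is only a stand-in here and an assumption on my side (it accepts this row-value syntax from version 3.15 on); the tables x and y, the label column and the rows are made up, and your database's dialect may differ in details.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real system (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (byte1 INTEGER, byte2 INTEGER, byte3 INTEGER, byte4 INTEGER);
    CREATE TABLE y (byte1 INTEGER, byte2 INTEGER, byte3 INTEGER, byte4 INTEGER,
                    label TEXT);
    -- Made-up rows: only 10.0.0.1 exists in both tables.
    INSERT INTO x VALUES (10, 0, 0, 1), (192, 168, 1, 2);
    INSERT INTO y VALUES (10, 0, 0, 1, 'match'), (10, 0, 1, 0, 'no match');
""")

# All four bytes are compared together as one tuple, position by position.
rows = conn.execute("""
    SELECT y.label
      FROM x, y
     WHERE (y.byte1, y.byte2, y.byte3, y.byte4)
         = (x.byte1, x.byte2, x.byte3, x.byte4)
""").fetchall()

print(rows)
conn.close()
```

Only the row whose four bytes all agree comes back. The second row in y contains the same byte values as 10.0.0.1 but in different positions, so it does not match.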
I haven't tested this on a realistic data volume, but I would be surprised if it didn't improve the performance of your solution.
- Lars