Channel: SCN: Message List - SAP HANA Developer Center

Re: Better compression due to column based storage grouped by native datatype ?

Whether it would be a good idea really depends on what you want to achieve.

If the goal is to have a single dictionary per technical data type then this would be the way to go.

You would of course then have a single shared resource for all tables using the specific data type and thereby a technical dependency between otherwise independent tables.


Also, operations like checking for the existence of a specific value or min/max evaluation get more complex and would now always require full column scans of the value vectors.
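To illustrate the min/max point, here is a minimal sketch (all names and values are made up for illustration): with a sorted per-column dictionary, min and max are simply the first and last dictionary entries, while with a dictionary shared across all columns of a data type, a column uses only some arbitrary subset of the entries, so finding its min/max requires scanning the whole value vector.

```python
# Per-column dictionary (sorted, column-local): min/max need no scan.
col_dict = ["apple", "kiwi", "pear"]           # sorted dictionary for this column
col_vector = [1, 0, 2, 1]                      # value IDs referencing col_dict
col_min, col_max = col_dict[0], col_dict[-1]   # O(1) lookup

# Shared dictionary for the whole data type: this column references only a
# subset of the entries, so its min/max can only be found by scanning every
# value ID in the column.
shared_dict = ["apple", "banana", "kiwi", "mango", "pear"]  # sorted, global
shared_vector = [2, 0, 4, 2]                   # value IDs referencing shared_dict
ids_seen = set(shared_vector)                  # full column scan required here
col_min2 = shared_dict[min(ids_seen)]
col_max2 = shared_dict[max(ids_seen)]
```

Both approaches return the same answers, but the shared-dictionary variant pays a full scan for what the per-column variant gets for free.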

Compression-wise you might even end up worse than storing the dictionaries separately, since the more distinct values you store, the more bits you need to encode each dictionary ID.

So you could easily end up with multi-byte ID values where you could otherwise have gotten by with, e.g., a single byte for each column's value range.
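The bit-width argument can be sketched with a quick back-of-the-envelope calculation (the column sizes here are invented for illustration): an ID for n distinct values needs ceil(log2(n)) bits, and with a shared dictionary every column pays for the combined distinct count instead of its own.

```python
import math

def bits_per_id(distinct_values: int) -> int:
    # Minimum number of bits needed to encode an ID over `distinct_values` entries.
    return max(1, math.ceil(math.log2(distinct_values)))

# Three hypothetical columns of the same technical data type.
col_distinct = [200, 50, 1000]

# Per-column dictionaries: each column pays only for its own value range.
per_column_bits = [bits_per_id(n) for n in col_distinct]

# Shared dictionary (worst case: no overlap between columns): every column
# now pays for the combined distinct count of all columns.
shared_bits = bits_per_id(sum(col_distinct))
```

Here the per-column IDs need 8, 6 and 10 bits respectively, while the shared dictionary forces 11 bits on every value of every column, including the small 50-value one.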


Even more worrying: delta merge and distributed systems. Multiple delta merges could block each other, as they would need to operate on the same dictionary. When this happens across multiple nodes, you end up with a lot of cross-node traffic.

And think about how you would clean up unused values in this setup: checking every single column for the occurrence of a to-be-removed dictionary value. That is a total transaction halt right there.
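A minimal sketch of why this cleanup is so expensive (the function and data are hypothetical, not any actual HANA mechanism): removing an entry from a shared dictionary is only safe once every value vector of every column using that data type has been scanned to prove the ID no longer occurs anywhere.

```python
def can_remove(value_id: int, all_value_vectors: list[list[int]]) -> bool:
    # O(total rows across ALL columns sharing the dictionary) — and every
    # one of those columns would need to be protected against concurrent
    # updates while the check runs.
    return all(value_id not in vector for vector in all_value_vectors)

# Made-up value vectors for three columns sharing one dictionary.
vectors = [[0, 2, 1], [3, 3, 0], [2, 2, 2]]
```

With per-column dictionaries, the same check only ever touches one column.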


All that for what benefit? Basically only that there is no need for translation tables during joins any more. That means much faster join performance in exchange for a complete loss of parallel data-update ability. Here we are back at loading data into the query accelerator... BWA, anyone?

