Channel: SCN: Message List - SAP HANA Developer Center

Re: How is data retrieved from Column table for changed records?


Hi there!

 

The way we do this is rather straightforward.

Column store tables consist of multiple data structures that are presented to the SQL interface as a table.

 

Obviously, the columns are an important part of column store tables.

Again, these consist of several data structures.

One is the 'main' store - basically a value dictionary and a huge vector of pointers to this value dictionary.

This 'main' store may be compressed and is highly optimized for read access.
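
To make the "value dictionary plus vector of pointers" idea concrete, here is a minimal Python sketch of dictionary encoding for one column. It is an illustration only, not how the column store is actually implemented: the function names are made up, the repeated "Wien" value is sample data, and keeping the dictionary sorted is an assumption of this sketch.

```python
from bisect import bisect_left

def encode_column(values):
    """Dictionary-encode a column: distinct values plus one index per row."""
    # Assumption of this sketch: the dictionary is kept sorted.
    dictionary = sorted(set(values))
    # Each row stores only the position of its value in the dictionary.
    index_vector = [bisect_left(dictionary, v) for v in values]
    return dictionary, index_vector

def decode_column(dictionary, index_vector):
    """Rebuild the original column by following each index into the dictionary."""
    return [dictionary[i] for i in index_vector]

# Hypothetical sample column with one repeated value.
cities = ["Wien", "Petersburg", "Dublin", "Wien"]
dictionary, index_vector = encode_column(cities)
```

The repeated value is stored once in the dictionary and referenced twice from the vector, which is where the room for compression comes from.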

 

Changed or new data is put into what is called the 'delta' store. If you followed the HPI lecture, this is the 'differential' store.

In this 'delta' store the data is not optimized or compressed to the same degree; it is simply appended whenever data changes.

 

Now, clearly, we need to distinguish between the data in the main store that is valid right now and data that has been changed in the meantime.

This requirement simply results from transactional consistency.

So what you have to figure out is a way to identify the data that is valid for any point in time.

Doing that is not that difficult, if you think about it.

 

If you need to know if data is valid at a given point in time, you need to stamp it with the point in time when it stopped being valid. Kind of a 'best-before' date.

Since we're in a transaction-timed world, we don't need the wall-clock time, but can simply take a transaction ID, commit ID or system change number - whatever does the trick for you.
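
The 'best-before' stamp boils down to a single comparison. The following Python sketch shows one way to express it; the function name and the choice to treat the stamp as inclusive (the row is still valid for the stamped transaction itself) are assumptions of this sketch, not something stated above.

```python
from typing import Optional

def is_valid(last_valid_txn: Optional[int], reader_txn: int) -> bool:
    """Return True if a row version is still valid for the reading transaction."""
    # No stamp means the row version was never invalidated.
    if last_valid_txn is None:
        return True
    # Assumption: the stamp is inclusive, so the row is valid up to and
    # including the stamped transaction and invalid for anything later.
    return reader_txn <= last_valid_txn
```

For example, a row stamped with 4 is still valid for a reader at transaction 3, but not for a reader at transaction 6.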

 

Now, with this added information, assume a table like this:

 

-- 'main store' data --

[last valid transaction] [name]    [city] ...

--                       "Lars"    "Wien"

--                       "Matt"    "Petersburg"

4                        "Ellie"   "Dublin"

 

-- 'delta store' data --

[last valid transaction] [name]    [city] ...

2                        "Joe"     "Tokyo"

--                       "Mary"    "Cologne"

--                       "Ellie"   "San Francisco"

 

Now assume we're at transaction nr. 6 in the meantime (there have been other transactions in the system that did not affect this table). If we run a SELECT on this table, we get this:

 

[name]    [city] ...

"Lars"    "Wien"

"Matt"    "Petersburg"

"Mary"    "Cologne"

"Ellie"   "San Francisco"

 

We see "Lars" and "Matt" in there because their data wasn't changed, so "last valid transaction" was never set.

 

We see "Ellie" in "San Francisco" because the entry for "Ellie" in the 'main store' has not been valid since transaction nr. 4. Looking in the 'delta store', we find a version that is still valid, and that's the one we see.

 

We also see "Mary", whose entry was put as new into the 'delta' store and is still valid.

 

We don't see "Joe", even though his entry is in the 'delta' store, because it has not been valid since transaction nr. 2.

 

As you've figured out by now, we read the 'main' store PLUS the 'delta' store to find all the valid records.
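
The read path described here can be sketched in a few lines of Python, using the rows from the two tables above. This is a simplified illustration, not the actual engine logic: the names `RowVersion`, `visible` and `select_all` are made up, "--" is represented as `None`, and the stamp is assumed to be inclusive. The sketch also ignores when a row version was created, so it only gives a meaningful answer for a reader at the current transaction, as in the example.

```python
from typing import NamedTuple, Optional

class RowVersion(NamedTuple):
    last_valid_txn: Optional[int]  # None corresponds to "--" in the tables
    name: str
    city: str

main_store = [
    RowVersion(None, "Lars", "Wien"),
    RowVersion(None, "Matt", "Petersburg"),
    RowVersion(4, "Ellie", "Dublin"),
]

delta_store = [
    RowVersion(2, "Joe", "Tokyo"),
    RowVersion(None, "Mary", "Cologne"),
    RowVersion(None, "Ellie", "San Francisco"),
]

def visible(row: RowVersion, reader_txn: int) -> bool:
    # Assumption: a row stays visible up to and including its stamp.
    return row.last_valid_txn is None or reader_txn <= row.last_valid_txn

def select_all(reader_txn: int):
    # Read the main store PLUS the delta store and keep only valid versions.
    return [(r.name, r.city)
            for r in main_store + delta_store
            if visible(r, reader_txn)]

result = select_all(6)
```

At transaction nr. 6, `result` contains the same four rows as the SELECT output above: "Lars", "Matt", "Mary" and the "San Francisco" version of "Ellie".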

All this has nothing to do with the history tables concept, but is what happens on standard column store tables.

For history tables, we would need to actually preserve the old data states, which is not the case here.

 

At MERGE time, all entries that are valid at that point in time are moved into the main store, optimized and compressed, and the records that are no longer valid are thrown out.
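
As a rough illustration of what the merge does to the example data, the following Python sketch builds a new main store from all still-valid rows and leaves an empty delta store behind. It is a simplified model only: the function name `merge` and the tuple layout are made up, the stamp is assumed to be inclusive, and the optimization and compression steps are left out entirely.

```python
def merge(main_store, delta_store, current_txn):
    """Return (new_main, new_delta) after dropping rows that are no longer valid."""
    def still_valid(row):
        last_valid_txn = row[0]  # None means the row was never invalidated
        return last_valid_txn is None or current_txn <= last_valid_txn

    # Keep valid rows from both stores; outdated versions are thrown out.
    new_main = [row for row in main_store + delta_store if still_valid(row)]
    # After the merge the delta store starts out empty again.
    return new_main, []

# Rows as (last valid transaction, name, city), taken from the example tables.
main_store = [(None, "Lars", "Wien"), (None, "Matt", "Petersburg"), (4, "Ellie", "Dublin")]
delta_store = [(2, "Joe", "Tokyo"), (None, "Mary", "Cologne"), (None, "Ellie", "San Francisco")]

new_main, new_delta = merge(main_store, delta_store, current_txn=6)
```

After this, `new_main` holds only the four valid rows, and the old "Ellie" and "Joe" versions are gone for good, which is why this mechanism is no substitute for history tables.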

 

 

- Lars

