Channel: SCN: Message List - SAP HANA Developer Center

Re: When to go for which type of data partitioning in hana


Hi,

 

1. Hash and Range require the primary key to be part of the partitioning columns and Round-robin does not. Can anyone tell exactly when to go for Hash, when to go for round-robin and when to go for range?


First of all, I think there may be some misunderstanding about "Hash and Range require primary key to be part of partitioning columns and Round-robin do not." You can find the complete description in the section "Single-Level Partitioning" of the SAP HANA Administration Guide (SAP Library).


For Hash and Range partitioning:

 

For each hash partitioning specification, columns must be specified as partitioning columns. The actual values of these columns are used when the hash value is determined. If the table has a primary key, these partitioning columns must be part of the key. The advantage of this restriction is that a uniqueness check of the key can be performed on the local server. You can use as many partitioning columns as required to achieve a good variety of values for an equal distribution.

 

 

So, here are two examples according to my understanding.

 

  • If the primary key of the table is just one column (column A), the partitioning column must be column A.

 

  • If the primary key of the table has two columns (column A and column B), the partitioning column(s) can be one of the following three:
    • column A
    • column B
    • column A and column B
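
The rule in the two examples above can be sketched as a small enumeration. This is only an illustration in Python, not HANA code, and the function name candidate_partition_columns is made up for this example.

```python
from itertools import combinations


def candidate_partition_columns(primary_key):
    """Return every non-empty subset of the primary key columns.

    Following the rule quoted from the guide, each subset is an allowed
    choice of hash partitioning columns for a table with that key.
    """
    return [
        combo
        for size in range(1, len(primary_key) + 1)
        for combo in combinations(primary_key, size)
    ]


print(candidate_partition_columns(("A",)))      # [('A',)]
print(candidate_partition_columns(("A", "B")))  # [('A',), ('B',), ('A', 'B')]
```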

 

As stated in the official guide, "The advantage of this restriction is that a uniqueness check of the key can be performed on the local server". Imagine you have 100 partitions. Let's still use the above two examples.

 

  • Primary key is column A. Since column A is the partitioning column, a tuple to be inserted maps via hash(A) or range(A) to exactly one partition (a hash function always returns the same output for the same input), e.g., partition 59. So the uniqueness check is only necessary on partition 59, and the other 99 partitions have nothing to do. However, if you used a non-primary-key column as the partitioning column, how could the uniqueness check be performed? Let all partitions check and wait for all 100 results? Absolutely not.

 

  • Primary key has column A (INTEGER) and column B (NVARCHAR).
    • If you use just column A as the partitioning column, a tuple to be inserted maps via hash(A) to exactly one partition. For example, if partition 59 holds (59, 'A') and (59, 'B') and you want to insert (59, 'C'), only partition 59 needs to check the uniqueness and report whether the insert is OK.
    • If you use just column B as the partitioning column, the same reasoning applies. For example, if partition 60 holds (59, 'A') and (60, 'A') and you want to insert (61, 'A'), only partition 60 needs to check the uniqueness.
    • If you use columns A and B together as the partitioning columns, the uniqueness check is also performed in only one partition. For example, if partition 59 holds (1, 'B') and partition 60 holds (2, 'A') and you insert (1, 'B'), the tuple maps to partition 59 again and partition 59 will say NO.
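
To make the "only one partition has to check" argument concrete, here is a minimal Python sketch of a hash-partitioned table. It is a toy model under stated assumptions: toy_hash is a deterministic stand-in and not HANA's real hash function, and the class HashPartitionedTable and its methods are hypothetical names invented for this illustration.

```python
def toy_hash(value):
    """Deterministic toy hash; a stand-in, not HANA's real hash function."""
    if isinstance(value, int):
        return value
    return sum(ord(ch) for ch in str(value))


class HashPartitionedTable:
    """Toy model of a hash-partitioned table with a primary key."""

    def __init__(self, key_columns, partition_columns, num_partitions=100):
        if not partition_columns or not set(partition_columns) <= set(key_columns):
            raise ValueError("partitioning columns must be part of the primary key")
        self.key_columns = tuple(key_columns)
        self.partition_columns = tuple(partition_columns)
        self.partitions = [set() for _ in range(num_partitions)]

    def partition_of(self, row):
        """Same partitioning-column values always map to the same partition."""
        total = sum(toy_hash(row[col]) for col in self.partition_columns)
        return total % len(self.partitions)

    def insert(self, row):
        """Insert a row (a dict); only one partition is consulted for uniqueness."""
        key = tuple(row[col] for col in self.key_columns)
        target = self.partitions[self.partition_of(row)]
        if key in target:
            return False  # primary key violation, detected locally
        target.add(key)
        return True


table = HashPartitionedTable(key_columns=("A", "B"), partition_columns=("A",))
print(table.partition_of({"A": 59, "B": "A"}))  # 59
print(table.insert({"A": 59, "B": "A"}))        # True
print(table.insert({"A": 59, "B": "B"}))        # True
print(table.insert({"A": 59, "B": "C"}))        # True
print(table.insert({"A": 59, "B": "C"}))        # False
```

Because the partitioning columns are a subset of the key columns, two rows with the same key always land in the same partition, which is why checking that one partition is enough.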

 

So, the partitioning column(s) must be part of the primary key, not the other way around; the statement "Hash and Range require primary key to be part of partitioning columns" is wrong. Just imagine the following scenario:

 

The primary key of the table has two columns, column A (INTEGER) and column B (NVARCHAR). There is another column C (INTEGER). If you use all three columns (A, B, C) as the partitioning columns, a problem arises. For example, suppose partition 59 holds (1, 'B', 1) and partition 60 holds (2, 'A', 2). When you insert (1, 'B', 2), the hash function may send it to partition 60, which cannot detect the duplicate key (1, 'B'), since only partition 59 holds that key!
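
The scenario above can be reproduced with a short Python sketch. The hash used here is a toy stand-in (an assumption, not HANA's actual hash function) and partition_of is a hypothetical helper, so the partition numbers differ from the 59 and 60 in the text; only the effect matters.

```python
def toy_hash(value):
    """Deterministic toy hash; a stand-in, not HANA's real hash function."""
    if isinstance(value, int):
        return value
    return sum(ord(ch) for ch in str(value))


def partition_of(row, partition_columns, num_partitions=100):
    """Map a row (a dict) to a partition using the given partitioning columns."""
    return sum(toy_hash(row[col]) for col in partition_columns) % num_partitions


existing = {"A": 1, "B": "B", "C": 1}
same_key = {"A": 1, "B": "B", "C": 2}  # same primary key (A, B), different C

# Partitioning on key columns only: both rows meet in one partition,
# so the duplicate key can be detected locally.
print(partition_of(existing, ("A", "B")), partition_of(same_key, ("A", "B")))
# 67 67

# Partitioning on (A, B, C): the rows land in different partitions,
# so a purely local uniqueness check would miss the duplicate key.
print(partition_of(existing, ("A", "B", "C")), partition_of(same_key, ("A", "B", "C")))
# 68 69
```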

 

 

For Round-Robin partitioning:

 

Round-robin partitioning is used to achieve an equal distribution of rows to partitions. However, unlike hash partitioning, you do not have to specify partitioning columns. With round-robin partitioning, new rows are assigned to partitions on a rotation basis. The table must not have primary keys.

 

So, if you use round-robin partitioning, the table must not have a primary key: rows are assigned purely in rotation (partition 1, then partition 2, then partition 3, then partition 1, ...), so there is no uniqueness check.
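
The rotation can be sketched in a few lines of Python. This is just an illustration with a hypothetical helper round_robin_assign; partitions are numbered from 0 here rather than from 1.

```python
from itertools import cycle


def round_robin_assign(rows, num_partitions):
    """Assign rows to partitions in rotation, ignoring the row values."""
    partitions = [[] for _ in range(num_partitions)]
    targets = cycle(range(num_partitions))
    for row in rows:
        partitions[next(targets)].append(row)
    return partitions


rows = [(1, "x"), (2, "y"), (1, "x"), (3, "z")]
for index, content in enumerate(round_robin_assign(rows, 3)):
    print(index, content)
# 0 [(1, 'x'), (3, 'z')]
# 1 [(2, 'y')]
# 2 [(1, 'x')]
```

The two identical rows end up in partitions 0 and 2, which shows why no single partition could verify uniqueness on its own.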

 

As for your question, "Can anyone tell exactly when to go for Hash, when to for round-robin and when to go for range.": my answer is that it depends on your scenario and requirements. For example, if your table has a date column and your data grows month by month, you can use range partitioning to split your data into one partition per month. That is just one example.
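
As a rough illustration of the monthly example, the following Python sketch maps a date to a range partition. The boundaries in LOWER_BOUNDS and the helper range_partition are assumptions made up for this example, and the last range is treated as open-ended for simplicity.

```python
from bisect import bisect_right
from datetime import date

# Lower bound of each range partition (hypothetical monthly ranges).
LOWER_BOUNDS = [date(2014, 1, 1), date(2014, 2, 1), date(2014, 3, 1)]


def range_partition(day, lower_bounds=LOWER_BOUNDS):
    """Return the index of the range containing `day`.

    Returns None if `day` lies before the first lower bound.
    """
    index = bisect_right(lower_bounds, day) - 1
    return index if index >= 0 else None


print(range_partition(date(2014, 1, 15)))   # 0
print(range_partition(date(2014, 2, 1)))    # 1
print(range_partition(date(2014, 3, 31)))   # 2
print(range_partition(date(2013, 12, 31)))  # None
```

A query that filters on one month then only needs to look at the partition for that range.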




2. In hash partitioning, H(x), where x is the integer in the primary key field, is used to determine the partition in which the integer goes. Can anyone please provide more information on this? How exactly is the hash function used for partitioning, and how can we query to know the partitions?

 

You may have a look at "Hash function - Wikipedia, the free encyclopedia"; the examples in question 1 above should also give you some idea.
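
The general idea of H(x) choosing a partition can be sketched as follows. This Python example uses SHA-256 purely as a convenient deterministic hash; it is an assumption for illustration and says nothing about which hash function HANA uses internally. The helper name partition_of is also made up.

```python
import hashlib
from collections import Counter


def partition_of(key, num_partitions):
    """Map a key to a partition index as H(key) modulo the partition count."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# The same input always gives the same partition.
print(partition_of(42, 4) == partition_of(42, 4))  # True

# Many distinct keys are spread over all partitions.
counts = Counter(partition_of(x, 4) for x in range(1000))
print(sorted(counts.items()))
```

The counts per partition come out roughly equal, which is the "equal distribution" the guide refers to.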

 

Best regards,

Wenjun

