Hi Micha,
A phrase index is, as the name suggests, an index for phrases. Here, a phrase means a combination of at least two text tokens in their actual order in the text. For indexing, only two-token phrases are considered. The phrase index is an optional part of a full text index. A small example:
We have the following text: “This is a text.”
After tokenization we have the tokens “this”, “is”, “a” and “text”. These are always indexed in the full text index.
This yields the following phrases: “this is”, “is a”, “a text”, “this is a”, “is a text” and “this is a text”. Only the two-token subset “this is”, “is a”, “a text” is relevant for phrase indexing.
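To make the example concrete, here is a minimal Python sketch that reproduces the tokens and phrases listed above. It is only an illustration: the regex-based `tokenize` function is a simplifying assumption and not the tokenizer SAP HANA actually uses, and the function names are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens.
    Simplified assumption, not SAP HANA's real tokenizer."""
    return re.findall(r"\w+", text.lower())

def all_phrases(tokens):
    """Every contiguous sequence of at least two tokens, shortest first."""
    return [
        " ".join(tokens[start:start + length])
        for length in range(2, len(tokens) + 1)
        for start in range(len(tokens) - length + 1)
    ]

def two_token_phrases(tokens):
    """Only the adjacent token pairs, i.e. the candidates for phrase indexing."""
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

tokens = tokenize("This is a text.")
print(tokens)                     # ['this', 'is', 'a', 'text']
print(all_phrases(tokens))        # six phrases, from 'this is' to 'this is a text'
print(two_token_phrases(tokens))  # ['this is', 'is a', 'a text']
```

A text with n tokens has n - 1 two-token phrases, so the number of candidates grows linearly with the text length.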
Depending on the phrase index ratio setting, a certain share of these phrases will be indexed. The phrase index ratio describes how much memory may be used for indexing phrases, relative to the size of the actual full text index. The default of 0.2 means that up to 20% of the full text index size may be used for phrases. If the full text index is 10 MB and you specify 0.2, the phrase index can consume up to 2 MB, so both together will consume up to 12 MB. Regarding your question of whether it "may" or "will" consume that much memory: the specified value is an upper bound for the phrase index size. If the phrase index can be built with less memory, it will consume less.
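The arithmetic behind the ratio can be sketched in a few lines of Python. This is only a restatement of the calculation described above, not how SAP HANA accounts for memory internally; the function name `phrase_index_budget` is hypothetical.

```python
def phrase_index_budget(fulltext_index_mb, ratio=0.2):
    """Return (max phrase index size, max combined size) in MB.

    The ratio is an upper bound: the real phrase index may be smaller.
    """
    max_phrase_mb = fulltext_index_mb * ratio
    return max_phrase_mb, fulltext_index_mb + max_phrase_mb

max_phrase_mb, max_total_mb = phrase_index_budget(10, 0.2)
print(max_phrase_mb, max_total_mb)   # 2.0 12.0

# A ratio of 0.0 leaves no room for phrases at all.
print(phrase_index_budget(10, 0.0))  # (0.0, 10.0)
```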
If you set the ratio to 0.0, there will be no phrase indexing at all. This has no functional impact, as the free text search capabilities are already provided by the full text index.
The only impact is a possible performance degradation for phrase searches. If you search our example above for “This is”, it will probably be faster with a phrase index ratio of 1.0 than with 0.0. Searching for This is without double quotes would see no performance impact, because you would then search not for the phrase “this is” but for the single tokens connected via AND.
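The difference between the two kinds of search can be illustrated with a small Python sketch. It models only the matching semantics described above (tokens in order versus tokens connected via AND), not how SAP HANA evaluates a query or uses the phrase index; the tokenizer and both function names are assumptions made for this example.

```python
import re

def tokenize(text):
    """Simplified tokenizer (assumption): lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def matches_phrase(document, phrase):
    """True if the phrase tokens occur adjacently and in order (quoted search)."""
    doc, query = tokenize(document), tokenize(phrase)
    if not query:
        return False
    return any(
        doc[i:i + len(query)] == query
        for i in range(len(doc) - len(query) + 1)
    )

def matches_all_tokens(document, query):
    """True if every query token occurs somewhere (unquoted search, AND)."""
    doc = set(tokenize(document))
    return all(token in doc for token in tokenize(query))

text = "This is a text."
print(matches_phrase(text, "This is"))      # True
print(matches_phrase(text, "is this"))      # False, wrong order
print(matches_all_tokens(text, "is this"))  # True, order does not matter
```

Only the quoted form depends on token order, which is why only that form can benefit from a phrase index.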
One more point regarding the default settings of SAP HANA objects: you can look at the object definition to see the complete CREATE statement, which includes the default values for anything not specified otherwise. For this case, you can use the following example:
CREATE SCHEMA FULLTEXT_TEST;
CREATE COLUMN TABLE FULLTEXT_TEST.TEST_TABLE (A NVARCHAR (5000), B SHORTTEXT(5000), C TEXT);
CREATE FULLTEXT INDEX FULLTEXT_TEST.FTI_A ON FULLTEXT_TEST.TEST_TABLE (A);
CALL GET_OBJECT_DEFINITION ('FULLTEXT_TEST','TEST_TABLE');
CALL GET_OBJECT_DEFINITION ('FULLTEXT_TEST','FTI_A');
--drop schema FULLTEXT_TEST cascade; -- for cleanup
Please be aware that you only see the actual CREATE statement. If an object was created with values other than the default, you will see them and not the default values.
Kind regards,
René Oschmann
SAP HANA Development Support