I am also having problems with this.
So far, what I have found is that if you create a VARCHAR field but try to insert a special character that is not in the 7-bit ASCII table (7-bit, NOT the 8-bit extended ASCII), then HANA will automatically store that character in a Unicode encoding that takes more space.
At least, that is what it looks like from the outside; we cannot be sure of the internals.
According to unicode.org
http://www.unicode.org/faq/utf_bom.html
UTF-8 can take between 1 and 4 bytes per character.
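You can see those byte counts for yourself outside of HANA; this is a small Python sketch (nothing HANA-specific, just `str.encode`):

```python
# UTF-8 byte length per character: 1 to 4 bytes depending on the code point.
for ch in ("a", "é", "章", "𝄞"):
    print(ch, "->", len(ch.encode("utf-8")), "byte(s)")
# "a" is 1 byte (7-bit ASCII), "é" is 2 bytes (Latin-1 range),
# "章" is 3 bytes (CJK), "𝄞" is 4 bytes (outside the BMP).
```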
What I think happens with HANA is that it stores the VARCHAR type as UTF-8 and NVARCHAR as UTF-8 or UTF-16.
When it stores a VARCHAR, it calculates the size of the field assuming it will contain only ASCII characters, so 1 character takes only 1 byte: it allocates 20 bytes for a 20-character length.
Since the VARCHAR back end is UTF-8, you can store Unicode characters in it; however, special characters take more than 1 byte per character and break HANA's estimate of the size of the field.
Example :
CREATE TABLE varchartest(varcol varchar(10), nvarcol nvarchar(10))
--This works, as these are plain ASCII characters.
insert into varchartest(varcol,nvarcol) values ('1234567890','1234567890');
--does not work, inserted value too large for column
insert into varchartest(varcol,nvarcol) values ('éééééééééé','éééééééééé');
--does not work, inserted value too large for column
insert into varchartest(varcol,nvarcol) values ('éééééé','éééééééééé');
--This one works: the é character must be 2 bytes wide, so 2 bytes * 5 characters = 10 bytes, exactly as defined by the varchar column.
insert into varchartest(varcol,nvarcol) values ('ééééé','éééééééééé');
--Still works: à is also 2 bytes, so 5 of them fit; one more character in the varchar would not work.
insert into varchartest(varcol,nvarcol) values ('ààààà','àààààààààà');
--Does not work: 4 Chinese characters (3 bytes each = 12 bytes) do not fit into 10 bytes on the varchar field.
insert into varchartest(varcol,nvarcol) values ('章章章章','章章章章章章章章章章')
--Works: we can fit 3 Chinese characters (9 bytes).
insert into varchartest(varcol,nvarcol) values ('章章章','章章章章章章章章章章')
With this, we know that a Chinese character takes as much room as 3 ASCII characters (3 bytes).
--This will also give the length of the string as it would be stored in the database.
select length(CAST('章' AS text)) from dummy -- returns 3
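All of the insert results above are consistent with a simple byte-count check. Assuming the VARCHAR(n) limit is really a limit of n UTF-8 bytes, you can predict them outside the database; a Python sketch (`fits_varchar` is just an illustrative name, not a HANA function):

```python
def fits_varchar(value: str, declared_length: int) -> bool:
    """Assume a VARCHAR(n) column accepts at most n UTF-8 bytes."""
    return len(value.encode("utf-8")) <= declared_length

print(fits_varchar("1234567890", 10))  # True  : 10 ASCII chars = 10 bytes
print(fits_varchar("éééééé", 10))      # False : 6 * 2 bytes = 12 bytes
print(fits_varchar("ééééé", 10))       # True  : 5 * 2 bytes = 10 bytes
print(fits_varchar("章章章章", 10))     # False : 4 * 3 bytes = 12 bytes
print(fits_varchar("章章章", 10))       # True  : 3 * 3 bytes = 9 bytes
```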
If you SELECT * from the table, you will see that your Unicode text in the VARCHAR field was correctly saved.
Anyway, this problem is a big one for my team:
1. We cannot use NVARCHAR because we are using a legacy application that cannot deal with Unicode strings.
2. We just want to use characters from the 8-bit extended ASCII range; however, they still trigger this problem.
If we could set the character set of the table or the schema to something that includes our set of characters, the declared lengths would always be correct. However, we cannot do that.
Bad solution:
3. I guess we will multiply the length of each VARCHAR field by 2 (or by 4) to prevent users from crashing the insert when the real character count is correct but the database thinks otherwise.
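For what it's worth, if the data really stays within the 8-bit extended ASCII (Latin-1) range, a factor of 2 should be enough: every code point from 0 to 255 encodes to at most 2 UTF-8 bytes, which this small Python sketch verifies:

```python
# Every Latin-1 (8-bit extended ASCII) character encodes to 1 or 2 UTF-8 bytes,
# so doubling the declared VARCHAR length covers the worst case for that range.
worst = max(len(chr(cp).encode("utf-8")) for cp in range(256))
print(worst)  # 2
```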