FWIW, I used the following sed command to replace the SOH characters with commas:
sed "s/\x01/,/g" 000000 > 000000.csv
I have two challenges with this approach:
- This introduces another process into my "Big Data" flow using AWS HIVE
- Not all of my records load! The file has 30,000 rows, but only 2,000 of them load. To make matters worse, it isn't even the first 2,000 rows that get loaded.
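A quick way to see why rows go missing is to count the fields per row after the substitution; any row whose data already contains a comma ends up with an extra field. This is only a sketch, using a tiny hypothetical stand-in for the real file and GNU sed's `\x01` escape:

```shell
#!/bin/sh
# Hypothetical 3-column SOH-delimited sample; the second row has a
# comma embedded inside a value, which breaks a comma-delimited load.
printf 'en\001Page_A\001100\nen\001Page_B,_extra\001200\n' > sample

# Replace SOH with commas (GNU sed understands \x01), then count the
# comma-separated fields on each row. Good rows show 3, bad rows show 4.
sed 's/\x01/,/g' sample | awk -F',' '{ print NF " fields: " $0 }'
```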
Needless to say, '\x01' didn't work directly in the IMPORT FROM statement.
The reason I didn't get the expected result is that some of my data values occasionally contain a comma in the name. The good news is that I could use the '|' pipe character instead, as there were none in the source file.
So, the updated sed command achieved the desired result:
sed "s/\x01/|/g" 000000 > 000000.csv
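To double-check the fix, here is a sketch (GNU sed again; the two-row file is a hypothetical stand-in for the real 000000) that confirms the pipe is absent from the source and that the substitution preserves every row:

```shell
#!/bin/sh
# Hypothetical SOH-delimited stand-in for the Hive output file;
# the first row deliberately contains an embedded comma.
printf 'en\001Page_A,_note\001100\nen\001Page_B\001200\n' > 000000

# Confirm '|' does not already occur in the source data.
grep -c '|' 000000 || echo "no pipes in source - safe to use as delimiter"

# Replace SOH with '|' and verify the row count is unchanged.
sed 's/\x01/|/g' 000000 > 000000.csv
echo "rows before: $(wc -l < 000000)  rows after: $(wc -l < 000000.csv)"
```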
I then changed my IMPORT FROM command to the following:
IMPORT FROM CSV FILE '/wiki-data/year=2013/month=04/000000.csv'
INTO "WIKIPEDIA"."pagecounts"
WITH RECORD DELIMITED BY '\n'
FIELD DELIMITED BY '|';
As a result, I got all 30,000 records. I'd still like to be able to process the file directly, so any help would be appreciated. At least I'm not blocked for now.
Regards,
Bill