Hey there, Could you explain what you mean by "losing its meaning"? It's possible you may need to set the character set: http://dev.mysql.com/doc/connector-j/en/connector-j-reference-charsets.html. -Abe answered Nov 21 2014 at 12:45 |
Hi Abe, By that statement I mean that the data residing in MySQL is different from what gets imported via Sqoop. Here is an example: *Data in mysql : *सुरेन्द्र कुमार पाण्डेय *Data in HDFS(Sqoop import) : * M-`M-$M-8M-`M-%M- These are the kinds of changes I am running into, and the data completely loses its meaning. Any help would be appreciated. Thanks again! answered Nov 22 2014 at 00:28 |
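The HDFS sample above resembles the "meta" notation that tools such as `cat -v` use to display bytes at or above 0x80 (`M-` followed by the byte minus 128). Whether the sample was actually produced that way is an assumption; the thread does not say how the file was viewed. Under that assumption, a minimal Python sketch can turn the notation back into bytes and check whether they are valid UTF-8. The helper name `decode_meta_notation` is hypothetical.

```python
def decode_meta_notation(text: str) -> bytes:
    """Convert `cat -v` style notation (M-x, ^X) back into raw bytes.

    Assumes ASCII input. A literal '^' or 'M-' in the original data is
    indistinguishable from the notation, so this is only a diagnostic aid.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        high = 0
        # "M-" marks a byte with the high bit set.
        if text.startswith("M-", i) and i + 2 < len(text):
            high = 0x80
            i += 2
        # "^X" marks a control character (and "^?" marks DEL, 0x7F).
        if text[i] == "^" and i + 1 < len(text):
            out.append(high | (ord(text[i + 1]) ^ 0x40))
            i += 2
        else:
            out.append(high | ord(text[i]))
            i += 1
    return bytes(out)


# First three units of the sample posted in the thread.
raw = decode_meta_notation("M-`M-$M-8")
print(raw.hex())            # e0a4b8
print(raw.decode("utf-8"))  # स
```

If the decoded bytes are valid UTF-8, as they are for the first three units here, the stored data may be intact and only the display tool or terminal may be at fault; that is worth ruling out before changing any import settings.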
The problem could be in one of two places: loading into HDFS, or extracting from MySQL. Sqoop should load everything as UTF-8 by default, which supports Hindi. What is your default character set in MySQL? Could you copy/paste your my.cnf? Also, what version of MySQL are you running? answered Nov 22 2014 at 11:10 |
Hi Abe, Thanks for your mail. The MySQL table is defined as utf-8, and the data displays correctly there, as shown here: *Data in mysql : *सुरेन्द्र कुमार पाण्डेय But when I move the same data through a Sqoop import it gets corrupted, as shown in my previous message in this thread. I also tried setting the parameters *useUnicode=true&characterEncoding=utf8* and *--direct*, with no luck. Additionally, the data contains control characters such as Ctrl-A (0x01) and Ctrl-M, which collide with the field delimiter I set for the Sqoop import, namely Ctrl-A. Is there a delimiter that can cope with any special or control character appearing in the data? Looking forward to a quick response. Thanks! answered Nov 24 2014 at 02:50 |
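One thing that may be worth checking with the `useUnicode=true&characterEncoding=utf8` parameters is shell quoting: an unquoted `&` in a JDBC URL is interpreted by most shells as "run in background", so everything after it never reaches Sqoop. The thread does not show the exact command, so this is only a possibility. The Python sketch below builds an argument list and prints a safely quoted command line; the host, database, table, and target directory are hypothetical placeholders, and credentials are omitted.

```python
import shlex


def build_sqoop_import_argv(host, database, table, target_dir):
    """Build an argument list for `sqoop import` (placeholder values only)."""
    # Connector/J properties mentioned in the thread; '&' separates them.
    jdbc_url = (
        f"jdbc:mysql://{host}/{database}"
        "?useUnicode=true&characterEncoding=utf8"
    )
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
    ]


def as_shell_command(argv):
    """Quote each argument so the shell passes it through unchanged."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Hypothetical connection details, for illustration only.
argv = build_sqoop_import_argv("db.example.com", "testdb", "people", "/tmp/people")
command = as_shell_command(argv)
print(command)
```

The printed command wraps the URL in single quotes, which keeps the `&characterEncoding=utf8` part attached to the `--connect` argument.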
It seems the issue is with the MySQL client configuration on the datanodes where Sqoop invokes the M/R job. I ran a test on my local machine, loading the same data into MySQL and doing a Sqoop import to HDFS, and the data arrived in HDFS intact. This indicates that the issue is in the MySQL client configuration, which I need to fix by setting the character set to utf-8 (I had assumed utf-8 was the default). The second part of the question still stands, though: how do I handle control characters in the data? I don't know in advance what the data may contain (I have already encountered control characters), so choosing a control character as the delimiter does not help if the data itself contains that character. Looking for the standard solution. Thanks! answered Nov 25 2014 at 21:45 |
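A common general answer to the delimiter question is escaping rather than hunting for a "safe" delimiter: any occurrence of the delimiter, the escape character, or a line break inside a field is written in escaped form, so no byte value is off limits in the data. Sqoop exposes output formatting options along these lines (`--escaped-by`, `--enclosed-by`, `--optionally-enclosed-by`); the Python sketch below is not Sqoop's implementation, only an illustration of the technique, with hypothetical helper names and a backslash chosen as the escape character by assumption.

```python
DELIM = "\x01"   # Ctrl-A, the field delimiter used in the thread
ESCAPE = "\\"    # assumed escape character for this sketch


def encode_record(fields):
    """Join fields with DELIM, escaping anything that would break parsing."""
    encoded = []
    for field in fields:
        field = field.replace(ESCAPE, ESCAPE + ESCAPE)  # escape char first
        field = field.replace(DELIM, ESCAPE + DELIM)
        field = field.replace("\n", ESCAPE + "n")       # keep one record per line
        field = field.replace("\r", ESCAPE + "r")       # Ctrl-M
        encoded.append(field)
    return DELIM.join(encoded)


def decode_record(line):
    """Split on unescaped DELIM and undo the escaping."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == ESCAPE:
            nxt = next(chars, "")
            current.append({"n": "\n", "r": "\r"}.get(nxt, nxt))
        elif ch == DELIM:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


row = ["सुरेन्द्र", "a\x01b", "line1\r\nline2", "back\\slash"]
line = encode_record(row)
print(decode_record(line) == row)   # True
print(len(line.split(DELIM)))       # 5: a naive split miscounts the 4 fields
```

The trade-off is that every reader of the files must apply the same unescaping; a reader that simply splits on Ctrl-A will still miscount fields, as the last line shows.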
Group Sqoop-user
asked Nov 21 2014 at 05:57
active Nov 25 2014 at 21:45
posts:6
users:2