[Sqoop-User] Is hash-based partition supported by Sqoop?

Hi, guys,
I have a question about how Sqoop imports data in parallel. As I understand
it, Sqoop first gets the min and max values of the SPLIT_BY column and then
does a range-based partition, so that each mapper consumes one range. Does
Sqoop support hash-based partitioning, where each mapper ingests the rows
satisfying a query like
"select * from table where hash(split_by) % n = i"?
Thanks,
Wei
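
To make the two strategies in the question concrete, here is a minimal sketch of the per-mapper predicates each one would generate. It is not Sqoop's implementation: the function names are made up for illustration, the split column is assumed to be an integer, and `hash` is a placeholder for whatever hash function the source database provides (the modulo syntax also varies between databases).

```python
def range_predicates(column, min_val, max_val, num_mappers):
    """Split the integer range [min_val, max_val] into contiguous
    sub-ranges, one WHERE predicate per mapper (hypothetical helper)."""
    total = max_val - min_val + 1
    base, extra = divmod(total, num_mappers)
    predicates = []
    lo = min_val
    for i in range(num_mappers):
        # The first `extra` mappers take one additional value each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than distinct values
        hi = lo + size - 1
        predicates.append(f"{column} >= {lo} AND {column} <= {hi}")
        lo = hi + 1
    return predicates


def hash_predicates(column, num_mappers, hash_fn="hash"):
    """One WHERE predicate per mapper of the form hash(col) % n = i.
    `hash_fn` is a placeholder name, not a real database function."""
    return [
        f"{hash_fn}({column}) % {num_mappers} = {i}"
        for i in range(num_mappers)
    ]


if __name__ == "__main__":
    for p in range_predicates("id", 1, 10, 3):
        print("select * from table where", p)
    for p in hash_predicates("id", 3):
        print("select * from table where", p)
```

Range predicates depend on the min and max of the column, so skewed values give unbalanced mappers. Hash predicates need no boundary query and tend to spread skewed keys more evenly, but the database typically cannot use a plain index on the column to evaluate them.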

asked Nov 5 2015 at 13:59

Wei Yan
