Hi
I would be grateful for any tips on how to "prepare" the data so it can
be exported to a PostgreSQL database using Sqoop.
As an example:
Suppose we are given some files of events (user events, product events,
productActivity events):
[file0001]
event:user properties:{name:"john" ...}
event:product properties:{ref:123,color:"blue",...
event:productActivity properties:{user:"john", product:"ref", action:"
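Sqoop export expects flat, delimiter-separated records, one row per line, so each event line has to be flattened into a tab-separated row for its target table before the export. A hedged sketch of that flattening step; the event line format and the exact fields are assumptions based on the sample lines above, and the naive comma split would need refining for values that themselves contain commas:

```java
public class EventToTsv {
    // Turns a {name:"john",city:"paris"} style properties blob into one
    // tab-separated row (values only, in the order they appear).
    static String toTsvRow(String props) {
        String body = props.substring(1, props.length() - 1); // strip { }
        StringBuilder row = new StringBuilder();
        for (String pair : body.split(",")) {                  // naive: breaks on commas inside values
            String value = pair.substring(pair.indexOf(':') + 1).replace("\"", "");
            if (row.length() > 0) row.append('\t');
            row.append(value);
        }
        return row.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTsvRow("{name:\"john\",city:\"paris\"}"));
    }
}
```

Writing one such file per event type into its own HDFS directory then gives `sqoop export` one directory per target table.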
I'm trying to export HDFS data to MySQL using the following command:
/opt/cloudera/parcels/CDH-5.0.0-1.cdh5.0.0.p0.47/bin/sqoop export --driver com.mysql.jdbc.Driver --connect jdbc:mysql://server:port/dbname --table table_name --username user --password pwd --export-dir /user/hdfs/hiveoutput/consolidatedconsumption/Hits/* --input-fields-terminated-by '\t' --input-lines-terminated-by '\n'
I'm trying to read data from ZooKeeper nodes that was written by different
Kafka components. As a specific example (just one from a bunch), I'm trying
to read the current offset for a specific group, topic, and partition. As far as I
understand, it is stored under the path
/consumers/data-processing-team/offsets/unloads/35
I'm using `com.101tec.zkclient` to get data. I'm able to walk through
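Under the old consumer layout, the path is `/consumers/<group>/offsets/<topic>/<partition>`, matching the example path above. Building the path is plain string work; the read itself is a single `ZkClient.readData` call, shown in a comment below since it needs a live ensemble (the host and port there are placeholders):

```java
public class ConsumerOffsetPath {
    // Old-consumer znode layout: /consumers/<group>/offsets/<topic>/<partition>
    static String offsetPath(String group, String topic, int partition) {
        return "/consumers/" + group + "/offsets/" + topic + "/" + partition;
    }

    public static void main(String[] args) {
        String path = offsetPath("data-processing-team", "unloads", 35);
        System.out.println(path); // /consumers/data-processing-team/offsets/unloads/35
        // With com.101tec's zkclient (requires a running ensemble):
        // ZkClient zk = new ZkClient("zkhost:2181");
        // String offset = zk.readData(path); // the committed offset, stored as a String
    }
}
```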
Hi,
I have installed Impala 0.6 and CDH 4.2, and set up my cluster with three data nodes and a namenode. First, I created a table stored as TEXTFILE format in Hive and loaded about 150 million rows into it. I can query the data in Hive and in impala-shell without any errors, but the query speed is too slow (described at https://groups.google.com/a/cloudera.org/forum/#!topic
I have a text file which is extracted from a non-SQL
database each night; a cron SQL script then runs to
insert the text data into the MySQL database tables.
My problem is that the date data in the text file is
formatted inconsistently (12/31/00 or 12-31-00), so
the fields that hold date data are currently CHAR
datatypes.
Since I need the dates to be dates for queries, I need
a solution.
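One option is to normalize the strings before the INSERT, as a one-off filter over the text file. A minimal sketch in Java; the MM/dd/yy interpretation and the two-digit-year pivot (SimpleDateFormat's default, so 00 becomes 2000) are assumptions:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class DateNormalizer {
    // Unify the separator, parse once, re-emit in MySQL's DATE format.
    static String toMysqlDate(String raw) {
        try {
            String unified = raw.replace('-', '/');          // 12-31-00 -> 12/31/00
            Date d = new SimpleDateFormat("MM/dd/yy").parse(unified);
            return new SimpleDateFormat("yyyy-MM-dd").format(d);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + raw, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toMysqlDate("12/31/00")); // 2000-12-31
        System.out.println(toMysqlDate("12-31-00")); // 2000-12-31
    }
}
```

Alternatively, MySQL's STR_TO_DATE('12/31/00', '%m/%d/%y') can do the same conversion inside the load script itself, once the separators are unified.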
I was doing a bulk insert into MongoDB using Node.js (native driver). I have a date field in the data. Is there any way to store the date field as a Date rather than a String?
I have dates in dd/mm/yyyy format. Currently I get the result by iterating through the bulk data, converting each date into mm/dd/yyyy format, then creating a new Date and saving it.
The problem is that the iteration takes too much time.
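The reformat-then-reparse round trip isn't needed: the date can be built directly from the three parts (in the Node driver this is one `new Date(year, month - 1, day)` per document before the bulk insert, so the driver stores a BSON Date rather than a String). The same split-and-construct idea, sketched in Java:

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

public class DdMmYyyy {
    // Build a Date directly from the dd/mm/yyyy parts, no string rewriting.
    static Date parse(String s) {                 // e.g. "14/12/2009"
        String[] p = s.split("/");
        int day = Integer.parseInt(p[0]);
        int month = Integer.parseInt(p[1]);
        int year = Integer.parseInt(p[2]);
        Calendar c = new GregorianCalendar(year, month - 1, day); // months are 0-based
        return c.getTime();
    }

    public static void main(String[] args) {
        System.out.println(parse("14/12/2009"));
    }
}
```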
Hello Everyone,
When I tried to import the below data from an Oracle table (columns delimited by ',') to HDFS using the Sqoop command below,
12345,1-1SKCE5P,null,2013-10-11 06:23:22.0,2014-12-02 14:22:32.0,Switched "INFONET CONFERENCING" GSP P3519,null,OS
sqoop import --connect jdbc:oracle:thin:@//xxxxx:xxxxx/xxxx_xxxx --username SUMAN --password-file /user/$USER/sqoop.password --
This is the CREATED_TIME: 2009-12-14 10:15:54
How can I get only the date part from the above created_time, just like
below?
2009-12-14
Any suggestions will be appreciated.
Raihan Jamal
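Hive has a built-in to_date() function that does exactly this; a minimal sketch, where the table and column names are placeholders:

```sql
-- to_date() drops the time portion of a timestamp string:
-- to_date('2009-12-14 10:15:54') -> '2009-12-14'
SELECT to_date(created_time) FROM my_table;
```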
Hi,
I am attempting to import some of our data into Solr. I did it the quickest way
I know, because I literally only have two days to import the data and run some
queries for a proof of concept.
So I have this data in XML format, and I wrote a short XSLT script to convert it
to the format in solr/example/exampledocs (except that I retained the element names,
so I had to modify schema.xml in the conf directory
I'm going through the tutorial at
https://cwiki.apache.org/Hive/tutorial.html . It's not clear to me what
the exact format of the log file would be for the sample queries described
there, e.g. at https://cwiki.apache.org/Hive/tutorial.html#Tutorial-LoadingData . I
can't find a link to download such a file, and while I'd be happy to
construct one myself, it's not clear to me what a viewTime of type INT would
Hi,
We are using the same underlying column family and extracting the data with both a Hive
query and a CQL query.
The column family metadata contains COMPARATOR = 'IntegerType'
and default_validation_class = FloatType:
CREATE COLUMN FAMILY cpu_avg_5min
WITH COMPARATOR = 'IntegerType'
AND key_validation_class = UTF8Type
AND default_validation_class = FloatType;
Querying through Hive using a Hive query returns readable
Hi,
I'm using Nutch 2.1 (Inside Eclipse) + Solr 4.0.0 with schema-solr4.xml. The run configuration
in eclipse is:
org.apache.nutch.crawl.Crawler
urls -solr http://localhost:8080/solr/#/collection2 -threads 1 -depth 1 -topN 3
-Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
Occasionally it works fine, but most of the time there's an exception in the console:
Adding 1 documents
Exception in thread "main" java.lang
Hi, I am currently integrating Nutch (release 1.2) with Solr
(trunk). When I index into the Solr index with Nutch, I get the exception:
java.lang.RuntimeException: Invalid version or the data in not in
'javabin' format
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
    at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser
We are looking for databases with Unicode implementation. In the TODO list
there is an entry "Add support for UNICODE".
When will this feature be available?
Sven Just
Display of Chinese Characters GmbH
Vahrenwalder Str. 7
30165 Hannover, Germany
Tel: +49 511 / 9357-810
Fax: +49 511 / 9357-819
WWW: http://www.dcc-asia.de
A NamedList contains key-value pairs. At the very basic level, if we want
to access the data contained in a NamedList:

NamedList<Object> foo = thisIsSolrQueryResponseObject.getValues();
Map.Entry<String, Object> bar = null;
// Create an iterator to walk through the response
Iterator<Map.Entry<String, Object>> it = foo.iterator();
while (it.hasNext()) {
    bar = it.next();
    // Only the "response" entry holds the documents, so check before casting
    if (bar.getValue() instanceof SolrDocumentList) {
        SolrDocumentList solDocLst = (SolrDocumentList) bar.getValue();
        for (int i = 0; i < solDocLst.size(); i++) {
            System.out.println(solDocLst.get(i));
        }
    }
}
Using $scope.gridOptions.filterOptions.filterText I am able to filter my data. For example, to filter on the basis of destination country, I set the filterText as:
$scope.gridOptions.filterOptions.filterText += 'DestinationCountry:' + $scope.filter.DestinationCountry + ';';
However, I am unable to figure out how to filter a date column on the basis of a date range, i.e. between from-date and
I am using Solr 4.3.1 on solrcloud with 10 nodes.
I added 3 million documents from a CSV file with this command:
curl 'http://localhost:8080/solr/trcollection2/update/csv?stream.file=/home/hduser/csvFile.csv&skipLines=1&fieldnames=,cache,segment,digest,tstamp,lang,url,,content,id,title,boost&stream.contentType=text/plain;charset=utf-8'
Then I query the data, fetching the first 100K documents.
Hi,
Thanks for your reply, but I need one clarification. When you say it will contain the data
requested, do you mean the data as requested in the fl parameter of the query?
Thanks.
Aman
Hello,
I have a CSV file that has columns which contain commas within a string
enclosed in double quotes, e.g. column name 'Issue', value: "Other (phone, health
club, etc)".
Question: What should the data type of 'Issue' be? Or how should I format
the table (ROW FORMAT DELIMITED ... TERMINATED BY) so that the comma in the
column (Issue) is handled correctly?
I had set it as below, but this
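One option is Hive's OpenCSV SerDe (available in Hive 0.14 and later), which understands quoted fields, so the embedded commas stop acting as delimiters. A minimal sketch; the table name and column list are placeholders, and note that this SerDe reads every column as STRING:

```sql
CREATE TABLE complaints (
  issue STRING
  -- ... remaining columns, all read as STRING by this SerDe
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE;
```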