Keep calm and use elephant-bird:
https://github.com/kevinweil/elephant-bird
Yesterday I posted an example here of how to load tweets stored as JSON.
Here it is again; I hope it helps.
register 'elephant-bird-core-3.0.0.jar';
register 'elephant-bird-pig-3.0.0.jar';
register 'google-collections-1.0.jar';
register 'json-simple-1.1.jar';

-- elephant-bird's JsonLoader parses one JSON object per line and
-- returns it as a single map field, referenced below as $0
json_lines = LOAD '/twitter_data/tweets/stream/v1/json/2012_10_10/08'
    USING com.twitter.elephantbird.pig.load.JsonLoader();

-- pull individual keys out of the map with the # operator and cast them
geo_tweets = FOREACH json_lines GENERATE
    (CHARARRAY) $0#'id' AS id,
    (CHARARRAY) $0#'geoLocation' AS geoLocation;

-- keep only the tweets that carry a geoLocation value
only_not_nulls = FILTER geo_tweets BY geoLocation is not null;
store only_not_nulls into '/twitter_data/results/geo_tweets';
Arian Rodrigo Pasquali
FEUP, SAPO Labs
http://www.arianpasquali.com
twitter @arianpasquali
2012/11/18 Dan Young
> Not sure if this helps, but in 0.11 I've been using this on EMR for some of
> our JSON data...
>
> raw = load 'hdfs:///cleaned_logs/clicks2/$year_id/$month_id/part-*' USING
> JsonLoader('a:chararray,at:chararray,c1:(url:chararray,useragent:chararray,referrer:chararray,window:(innerheight:chararray,innerwidth:chararray,outerheight:chararray,outerwidth:chararray),resolution:(height:chararray,width:chararray)),cst:chararray,d:(a:chararray,b:chararray),i:chararray,id:chararray,ip:chararray,k:chararray,l:(lat:chararray,lng:chararray),p:chararray,pv:chararray,sa:chararray,sid:chararray,sst:chararray,t:chararray,uuid:chararray,v:chararray');
>
>
> Regards,
>
> Dano
>
> On Sat, Nov 17, 2012 at 3:09 PM, Russell Jurney wrote:
>
> > I have some JSON data with a uniform schema. I want to load it in Pig.
> > JsonStorage doesn't work, because the data has no schema.
> >
> > How can I load JSON data in Pig?
> >
> > --
> > Russell Jurney twitter.com/rjurney
> > datasyndrome.com
> >
>
answered
Nov 18 2012 at 02:30
Arian Pasquali