Hi, all,
I installed Nutch 1.1 on 6 servers following "http://wiki.apache.org/nutch/NutchHadoopTutorial", and
I tried to run a crawl. But I found that the "crawl" job runs very slowly. The "crawl" job takes
more than 10 minutes, sometimes 20, while other jobs like "crawldb" and "generate" take only about
30 seconds. Any idea?
Dennis
We have Hue 3.5 connecting to HiveServer2 (Hive 0.12). All the queries run fine. However, just opening the metastore browser takes a lot of time. We have been profiling our Hue instance, and the "GET /metastore/tables/" call takes around 100 seconds.
However, a "show tables" query in beeswax takes around 2-4 seconds. The number of tables we have is close to 4500.
Is it due to the
I am using CDH3 update 1 (Hadoop 0.20.2). The cluster is composed of 9 nodes (1 NN + 8 DN/TT). I found that submitting a job to the JobTracker (using Hive or the hadoop jar command) is very slow when dozens of jobs are running on the cluster. When the cluster is idle, submitting a job is fast.
But when I used CDH3 Beta 2, there was no such problem. Is there some change in
Hue is very, very slow, even on the home and sign-in pages.
I have switched the database engine to MySQL on another host, so I think the DB is not the bottleneck.
How can I debug Hue to find out the reason?
Thanks!
Hi,
I wrote a custom InputFormat for parsing through the Enron Email corpus which is attached in the file named EmailInputFormat
I have attached the code in a text file with the sample input mail also attached as a text document
The EmailClass extends Writable and implements all the methods needed to be implemented and also contains an initiate function to initialize the values in that
---------- Forwarded message ----------
I just saw http://github.com/igal/ruby_datastores/raw/master/2009-08-04%20Non-relational%20data%20stores%20for%20Ruby%20r1.pdf
The comment for CouchDB is 'very very very slow'... I am not sure about the context.
Slide #13 has some performance data. The difference is significant.
Any comments?
rgds,
canal
Hi,
The generator time went from 8 minutes to 106 minutes a few days ago and has stayed there since
then. AFAIK, I haven't made any configuration changes recently (attached you can find some
of the configurations that I thought might be related).
A quick CPU sampling shows that most of the time is spent on java.util.regex.Matcher.find().
Since I'm using default regex configurations and my crawldb
Hello All,
We're performing a database import from standalone MySQL to Cluster. The
import procedure uses a dump file. The problem is that the inserts are very slow
and take ages to finish.
Our Cluster is on a Gigabit network with RAID 5 disks, running ndbmtd. How can I
investigate and find the solution for this situation?
Thanks,
reza
Everyone,
I have spent the last 3 or 4 hours looking into this
and have not found a good solution and am wondering if
anyone else has seen this.
I have a table with PhotoID (PK), PhotoName char(40), image
MediumBlob(), Caption char(100) and a few other
columns. The database is about 4 GB and runs great
expect. I know someone might say why put the images
in the database, but of any solution