Discussion:
Working of SGE_Hadoop Integration
adarsh
2010-12-06 08:44:49 UTC
Permalink
Dear all,

Thanks for your replies; with their help I was able to configure Hadoop integrated with SGE on a 10-node cluster.

I have overcome all the difficulties I faced during configuration.

Still, a few doubts remain.

1. I loaded files of different sizes into Hadoop (24 MB, 2 GB, and 20 GB). When I issue ./qhost -F | grep hdfs, it shows the data paths. But whenever I run an SGE job against these files, it executes on only one execution daemon.

That is acceptable for small files, but the 20 GB file is distributed across all 10 nodes, so the job should run TaskTrackers on all of them for the wordcount. Instead it shows only one execution daemon; I checked through the web UI and the logs, and only one execution daemon is running.

This forces all the data to be transferred to a single node, which takes far too long.

What is the benefit, then? Hadoop is made for distributed processing.

Is this a configuration problem on our side (I configured all.q on all execution daemons)?
Is it possible to run a job on several hosts concurrently (which is what Hadoop is for), whether through a single queue or different queues?


Thanks & Regards
Adarsh Sharma

templedf
2010-12-06 14:23:37 UTC
Permalink
Are you submitting your job as a parallel job?
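If it is, a quick way to confirm how many hosts SGE actually granted is the task view of qstat while the job is running (this is plain SGE, nothing specific to the Hadoop integration):

  qstat -g t    # lists the job's MASTER and SLAVE tasks with the queue@host each one was placed on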

Daniel
adarsh
2010-12-07 04:23:30 UTC
Permalink
I am using the hadoop parallel environment that is created by the ./setup.pl -i script.

Here is my command:

echo $HADOOP_HOME/bin/hadoop --config \$TMPDIR/conf jar $HADOOP_HOME/hadoop-*-examples.jar wordcount /user/hadoop/numb_complete.txt /user/hadoop/output6 | qsub -pe hadoop 3 -jsv /opt/hadoop_copy/jsv.sh -l hdfs_input=/user/hadoop/numb_complete.txt

I want to know whether this depends on the number of slots requested (qsub -pe hadoop 3), given that I have 8 execution hosts.

However, I also tried it with 1 and with 8 slots; no matter what, it runs on a single host.
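A Hadoop-independent sanity check is to print the PE hostfile from inside the job, so the hosts SGE actually granted end up in the job's output file; prepending this to the echoed command should be enough (the backslash matters, as with \$TMPDIR, so it expands inside the job rather than at submission time):

  cat \$PE_HOSTFILE;    # one line per granted host: hostname, slots, queue, processor range

If that file lists only one host, the allocation on the SGE side is the problem; if it lists several, the problem is in how the TaskTrackers get started.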


Thanks for your interest.

Adarsh




templedf
2010-12-07 13:08:23 UTC
Permalink
So, while the job is running, if you connect to the JobTracker page,
it says that only one TaskTracker is running?

It sounds like the master task is failing to kick off the slaves. Check
for the output files from the job. You should have <jobname>.o<jobid>,
<jobname>.e<jobid>, <jobname>.po<jobid>, and <jobname>.pe<jobid>. If
the slaves are failing to start, there should be a clue in one of those
files, probably the .po or .pe file.
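For a job submitted via echo ... | qsub the job name defaults to STDIN, so with job id 9 (the id here is just an example) the files to look at would be:

  ls -l STDIN.o9 STDIN.e9 STDIN.po9 STDIN.pe9
  cat STDIN.po9 STDIN.pe9    # output of the PE start/stop procedures; slave start-up failures usually leave a trace here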

Daniel
templedf
2010-12-08 04:58:51 UTC
Permalink
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/jobcache/job_201012061426_0001/job.xml
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:343)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:124)
at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:750)
at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:1664)
at org.apache.hadoop.mapred.TaskTracker.access$1200(TaskTracker.java:97)
at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:1629)
Looks like that's your problem. For some reason, the task trackers
can't find any local directory they can write the job data to. That's
not something I've encountered before. I'm on the road at the moment,
so there's not much I can do to debug it right now. I'll have a
look when I get back.
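In the meantime, the first thing worth checking is the local scratch space the TaskTrackers are told to use. In 0.20-era Hadoop that is the mapred.local.dir property, which falls back to ${hadoop.tmp.dir}/mapred/local when it is not set explicitly; the paths below are only examples, and whether the generated per-job configuration sets the property at all is an assumption:

  grep -r -A1 mapred.local.dir $TMPDIR/conf/    # from inside the job, where the generated conf lives
  df -h /tmp/hadoop-root/mapred/local           # on each execution host: is there free space?
  ls -ld /tmp/hadoop-root/mapred/local          # does the directory exist, and is it writable by the job's user?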

Daniel

adarsh
2010-12-08 04:39:48 UTC
Permalink
Hi,

I am listing the contents of the files that were created on the execution hosts running my Hadoop job.

***@ws37-user-lin:~# cat STDIN.pe9
***@ws37-user-lin:~# cat STDIN.o9
***@ws37-user-lin:~#

The files above are empty.

I attached the contents of STDIN.e9 and STDIN.o9.

I also checked my Hadoop TaskTracker and JobTracker logs, which are created under the hadoop-root user.

Please also check the errors in those logs; it seems that the TaskTracker was not able to connect to the JobTracker.

I don't know what the problem is.

Please be kind enough to help me find the root cause.
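If it helps to narrow this down, I can also check whether the execution hosts can even reach the JobTracker at the address in the generated configuration; I assume the relevant property is the standard 0.20-era mapred.job.tracker, and the host and port below are placeholders for whatever the grep reports:

  grep -r -A1 mapred.job.tracker $TMPDIR/conf/            # from inside the job: shows the jobtracker host:port the TaskTrackers use
  nc -z <jobtracker_host> <port> && echo reachable        # run from an execution host whose TaskTracker failed to connect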


Thanks & Regards

Adarsh Sharma
