I'll refer to the guides by number below. Doc 1 is the current #1 hit for 'ubuntu hadoop' on Google, so it seemed a good place to start.
Documents:
- http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29
- http://archive.cloudera.com/docs/_apt.html
- http://github.com/spazm/config/tree/master/hadoop/conf/
1) created a hadoop user and group, as per doc 1, plus an ssh key for the hadoop user (currently passwordless; I'll revisit that soon). Roughly, the commands below.
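This is a sketch from memory of what doc 1 walks you through; check it for the details. The start/stop scripts ssh to localhost, which is why the key goes into the hadoop user's own authorized_keys:
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
sudo -i -u hadoop
ssh-keygen -t rsa -P ""    # take the default file, ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost              # accept the host key once, then exit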
2) added the jaunty-testing repo from cloudera, see doc 2. They don't have a karmic package yet. Added /etc/apt/sources.list.d/cloudera.list:
#deb http://archive.cloudera.com/debian karmic-testing contrib
#deb-src http://archive.cloudera.com/debian karmic-testing contrib
#no packages for karmic yet, trying jaunty-testing, jaunty-stable, jaunty-cdh1 or jaunty-cdh2
deb http://archive.cloudera.com/debian jaunty-testing contrib
deb-src http://archive.cloudera.com/debian jaunty-testing contrib
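The repo is signed, so apt will warn about a missing key until cloudera's archive key is imported. Doc 2 covers this; the key URL below is my recollection of it, so double-check against doc 2:
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -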
3) install hadoop:
[andrew@mini]% sudo aptitude install hadoop
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
"hadoop" is a virtual package provided by:
  hadoop-0.20 hadoop-0.18
You must choose one to install.
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
3b) sudo aptitude update, sudo aptitude install hadoop-0.20
[andrew@mini]% sudo aptitude install hadoop-0.20
Reading package lists... Done
Building dependency tree
Reading state information... Done
Reading extended state information
Initializing package states... Done
The following NEW packages will be installed:
  hadoop-0.20 hadoop-0.20-native{a}
0 packages upgraded, 2 newly installed, 0 to remove and 25 not upgraded.
Need to get 20.1MB of archives. After unpacking 41.9MB will be used.
Do you want to continue? [Y/n/?] Y
Writing extended state information... Done
[... snip ...]
Initializing package states... Done
Writing extended state information... Done

4) this set up our config information in /etc/hadoop-0.20, also symlinked as /etc/hadoop/
hadoop-env.sh is loaded from /etc/hadoop/conf/hadoop-env.sh (aka /etc/hadoop-0.20/conf.empty/hadoop-env.sh)
Modify hadoop-env.sh to point at our JVM. Since I installed Sun Java 1.6 (aka Java6), I updated it to:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
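For reference, the JDK came from the sun-java6-jdk package (in multiverse on karmic, if memory serves), and it's worth confirming the path before editing:
sudo aptitude install sun-java6-jdk
ls -d /usr/lib/jvm/java-6-sun
/usr/lib/jvm/java-6-sun/bin/java -version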
5) update the rest of the configs.
Snapshotted conf.empty to ~/config/hadoop/conf and started making edits, as per doc 1, then symlinked it back into /etc/hadoop/conf.
The files are available at doc 3 (my github config project, hadoop/conf subdir); the gist of the edits is sketched below.
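For the impatient, here's roughly what doc 1 has you change, written as heredocs. The ports (54310/54311) are doc 1's choices and the tmp dir matches step 7 below; treat this as a sketch and see doc 3 for my real files:
cd ~/config/hadoop/conf

# hadoop 0.20 splits the old hadoop-site.xml into three files
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
EOF

cat > hdfs-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- single node, so keep one copy of each block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

cat > mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
  </property>
</configuration>
EOF

# one way to do the symlink
sudo ln -sfn ~/config/hadoop/conf /etc/hadoop/conf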
6) switch to hadoop user
sudo -i -u hadoop
7) initialize hdfs (as hadoop user). hadoop.tmp.dir points at ~hadoop/tmp, so create it first:
mkdir ~hadoop/tmp
chmod a+rwx ~hadoop/tmp
hadoop namenode -format
8) fire it up (as hadoop user):
/usr/lib/hadoop/bin/start-all.sh
hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop@mini:/usr/lib/hadoop/logs$ /usr/lib/hadoop/bin/start-all.sh
starting namenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-mini.out
localhost: starting datanode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-mini.out
localhost: starting secondarynamenode, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-mini.out
starting jobtracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-mini.out
localhost: starting tasktracker, logging to /usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-mini.out
8b) check that it's running via jps
hadoop@mini:/usr/lib/hadoop/logs$ jps
12001 NameNode
12166 DataNode
12684 Jps
12568 TaskTracker
12409 JobTracker
12332 SecondaryNameNode
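Beyond jps, the daemons' built-in web UIs give a second opinion; 50070 (namenode) and 50030 (jobtracker) are the stock 0.20 ports:
curl -sI http://localhost:50070/   # HTTP 200 means the namenode UI is up
curl -sI http://localhost:50030/   # ditto for the jobtracker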
(note to self: why don't we have hadoop ...)

9) Run example. See doc 1:
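First the input needs to be in HDFS. Doc 1 uses a few Project Gutenberg e-texts; any plain-text files will do:
# as the hadoop user, with some .txt files saved into ~/gutenberg
hadoop dfs -copyFromLocal ~/gutenberg gutenberg
hadoop dfs -ls gutenberg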
Doc 1 gives: hadoop jar hadoop-0.20.0-examples.jar wordcount gutenberg gutenberg-output
The Cloudera package ships a differently-versioned jar, so the actual run was:
hadoop@mini:~/install$ hadoop jar hadoop-0.20.1+152-examples.jar wordcount gutenberg gutenberg-output
09/12/25 23:24:19 INFO input.FileInputFormat: Total input paths to process : 3
09/12/25 23:24:20 INFO mapred.JobClient: Running job: job_200912252310_0001
09/12/25 23:24:21 INFO mapred.JobClient: map 0% reduce 0%
09/12/25 23:24:33 INFO mapred.JobClient: map 66% reduce 0%
09/12/25 23:24:39 INFO mapred.JobClient: map 100% reduce 0%
09/12/25 23:24:42 INFO mapred.JobClient: map 100% reduce 33%
09/12/25 23:24:48 INFO mapred.JobClient: map 100% reduce 100%
09/12/25 23:24:50 INFO mapred.JobClient: Job complete: job_200912252310_0001
...
hadoop@mini:~/install$ hadoop dfs -ls gutenberg-output
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2009-12-25 23:24 /user/hadoop/gutenberg-output/_logs
-rw-r--r-- 1 hadoop supergroup 21356 2009-12-25 23:24 /user/hadoop/gutenberg-output/part-r-00000
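And to eyeball the actual counts:
hadoop dfs -cat gutenberg-output/part-r-00000 | head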
It Lives!