In this post I shall explain how to install Cloudera’s Hadoop distribution CDH4 from scratch on a virtual machine running Ubuntu 12.04 LTS. The second part can also be used to install Hadoop natively on your Ubuntu 12.04 installation. Note, however, that this tutorial was written with a development or experimental environment in mind.
Requirements:
- A computer with at least 4GB of RAM (8GB+ recommended), as the virtualization of Hadoop consumes a lot of memory.
- A 64-bit processor with hardware virtualization activated in the BIOS. (If you don’t have a 64-bit processor, you won’t be able to use Ubuntu with CDH4 and will have to choose another OS.)
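If your host happens to run Linux, both requirements can be checked from a terminal. The following is only a minimal sketch: the helper names mem_total_gb and has_virt_flags are made up for this post, and the vmx/svm CPU flags only tell you that the processor supports hardware virtualization. Whether it is actually activated remains a BIOS setting, and on a Windows host you will need other tools.

```shell
#!/bin/sh
# Hypothetical helpers for a Linux host; both read the /proc files by default.

# Print total RAM in GB with one decimal (MemTotal is reported in kB).
mem_total_gb() {
  awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' "${1:-/proc/meminfo}"
}

# Succeed if the CPU advertises Intel VT-x (vmx) or AMD-V (svm).
has_virt_flags() {
  grep -Eqw 'vmx|svm' "${1:-/proc/cpuinfo}"
}

echo "RAM: $(mem_total_gb) GB (4 GB minimum, 8 GB+ recommended)"
if has_virt_flags; then
  echo "CPU advertises hardware virtualization support"
else
  echo "No vmx/svm flag found"
fi
```

Note that a machine sold with 4GB will usually report slightly less than 4.0 here, because part of the memory is reserved by the firmware and the kernel.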
Virtual machine installation:
- Download the .iso-file for Ubuntu 12.04 LTS 64 bit from www.ubuntu.com.
- Next, download VirtualBox for your host operating system, e.g. Windows 7. The host operating system is the system running VirtualBox, whereas the virtualized operating system is called the guest system.
You will find the binaries for your specific operating system with further installation instructions at https://www.virtualbox.org/wiki/Downloads.
- After installation, open VirtualBox and create a new virtual machine. Call it Ubuntu Hadoop or any other name and select Linux, Ubuntu 64-bit as the operating system.
- In the next step, you will have to create a new virtual hard disk. I’d recommend a minimum of 20GB in VDI format with dynamic allocation. Dynamic allocation means that the file on your host disk does not have a fixed size but grows as the guest system fills its hard disk, up to the maximum you set, here 20GB. A dynamically allocated disk is initially slower than a fixed-size disk, but it consumes less space on the host system.
- Once you have created the virtual machine, start it by double-clicking its name. You can now choose the downloaded Ubuntu .iso-file as the start disk.
- This will open the Ubuntu installer. Follow the on-screen instructions to install Ubuntu on the virtual machine. Visit the Ubuntu homepage for further help with the installation.
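If you prefer the command line to the VirtualBox GUI, the same steps can be sketched with VBoxManage, which ships with VirtualBox. Treat this as an illustration rather than a tested recipe: the VM name, the file paths, the ISO filename and the 2048MB of guest memory are all my assumptions, not values from the steps above, and newer VirtualBox releases name the createhd subcommand createmedium. By default the script only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: prints the VBoxManage commands unless DRY_RUN=0 is set.
# All values below are assumptions; adjust them to your own setup.
VM_NAME="ubuntu-hadoop"
VM_MEMORY_MB=2048                                # guest RAM, assumed value
DISK_FILE="$HOME/ubuntu-hadoop.vdi"              # hypothetical path
ISO_FILE="$HOME/ubuntu-12.04-desktop-amd64.iso"  # hypothetical filename

# Wrapper: echo the command in dry-run mode, otherwise execute it.
vbox() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "VBoxManage $*"
  else
    VBoxManage "$@"
  fi
}

# 20480 MB = 20GB; the Standard variant is a dynamically allocated disk.
create_hadoop_vm() {
  vbox createvm --name "$VM_NAME" --ostype Ubuntu_64 --register &&
  vbox modifyvm "$VM_NAME" --memory "$VM_MEMORY_MB" &&
  vbox createhd --filename "$DISK_FILE" --size 20480 --format VDI --variant Standard &&
  vbox storagectl "$VM_NAME" --name SATA --add sata &&
  vbox storageattach "$VM_NAME" --storagectl SATA --port 0 --device 0 --type hdd --medium "$DISK_FILE" &&
  vbox storageattach "$VM_NAME" --storagectl SATA --port 1 --device 0 --type dvddrive --medium "$ISO_FILE" &&
  vbox startvm "$VM_NAME"
}

create_hadoop_vm
```

Starting the VM this way boots from the attached .iso-file, so you end up in the same Ubuntu installer as with the GUI route.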
Prerequisites for installing CDH4
- Install the OpenSSH server package.
sudo apt-get install openssh-server
- Set a password for root, for example with sudo passwd root. WARNING: Only do this on this virtual machine. In production environments, enabling the root account could present a security risk.
- Edit /etc/hosts in an editor with root privileges. Add your hostname to the first line and comment out the second line as well as the IPv6 entries. It should look something like this.
127.0.0.1 antony-VirtualBox localhost
#127.0.1.1 antony-VirtualBox
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
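If you would rather not edit the file by hand, the change can be previewed with sed. This is a rough sketch under a few assumptions: fix_hosts is a helper name I made up, the file is expected to have the default Ubuntu layout, and the hostname must not contain characters that are special to sed. The function only prints the edited version and never writes to /etc/hosts itself.

```shell
#!/bin/sh
# Hypothetical helper: print an edited copy of a hosts file to stdout.
#   $1 = path to the hosts file, $2 = hostname of the machine
fix_hosts() {
  # 1) put the hostname on the 127.0.0.1 line
  # 2) comment out the 127.0.1.1 line
  # 3) comment out the IPv6 entries (::1, fe00::0, ff02::1, ...)
  sed -e "s/^127\.0\.0\.1[[:space:]].*/127.0.0.1 $2 localhost/" \
      -e 's/^127\.0\.1\.1/#&/' \
      -e 's/^[0-9a-f]*::/#&/' \
      "$1"
}

# Preview only; nothing is written back to /etc/hosts.
fix_hosts /etc/hosts "$(uname -n)"
```

Once the preview looks right, you can redirect the output to a temporary file and copy it over /etc/hosts with root privileges, keeping a backup of the original.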
Installing Cloudera Hadoop Distribution (CDH4)
- Go to www.cloudera.com and locate Products – CDH. Click ‘Download and Install CDH 4’. On the next page, click ‘Download and Install CDH 4 automatically’. On the following page, under Cloudera Manager 4.1.1, click ‘download’. Save the .bin-file to disk.
- Make the installer executable and run it as root.
chmod u+x ./cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
- Accept the licenses and follow the on-screen instructions. It is important to be patient! The installer may seem to have crashed at times, but it simply takes its time to install. At the end of the installation a browser should open.
- Log in with the credentials ‘admin’, ‘admin’.
- You can now add hosts to your cluster by clicking ‘Hosts’ and then adding hosts. Enter localhost or 127.0.0.1 as the IP address. During this process you can choose whether to install YARN or MRv1. Be sure to select the latter, MRv1.
- After the installation and configuration of your cluster, you can access your running services under ‘services’.
- You have successfully installed a one-node cluster on the virtual Ubuntu machine.