Why GUI? or Ambari? Hello Ambari 2.0
Neeraj Sabharwal

Big Data – Solutions Engineer


I heard questions like these a couple of times in my career as a DBA, and my reaction was: really? A GUI? Why?

I felt an inner resistance, even irritation, at the idea of using a GUI to manage databases and load data into them.

Around 3 years ago, I decided to build my first Hadoop cluster: 1 master and 6 data nodes. It took around 3 hours to build, with plenty of frustration along the way, including hand-copying XML files and then managing all the changes.

After a year or so, I heard about Ambari and my reaction was, “Are you serious? Please don’t mention GUI. I am fine with managing my cluster manually.”

After a few months, I heard about Ambari again, and this time I chose to accept the GUI approach and decided to try it. It took me some time to figure it out, but the end result was AWESOME!

I looked at the interface and started playing around with different tabs and was very impressed with the tool.

In case you don’t know what Ambari is: “The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters.” Source

This is how Ambari looks. Once you install the cluster or download a sandbox, go to http://<ambari server>:8080 (in the case of the sandbox, localhost:8080).

In my case it’s http://c6501.ambari.apache.org:8080/

Enter the username and password (the default is admin/admin).

You can see all the components installed in the Hadoop cluster and various metrics such as HDFS disk usage, CPU, etc. In my case, I am not using Falcon, so I decided to turn on maintenance mode for it. More information

Imagine that you have hundreds of nodes in your cluster; with Ambari you can see the details of each node by clicking the Hosts tab and then the particular node.

Click the Admin tab to see stack and version details.
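Much of what the UI shows is also exposed through Ambari's REST API. A small sketch (the hostname below is from my lab, and the login is the default admin/admin; adjust both for your environment):

```shell
# Build the clusters endpoint URL for Ambari's REST API.
# c6501.ambari.apache.org is my lab host; replace it with your Ambari server.
AMBARI_HOST=c6501.ambari.apache.org
CLUSTERS_URL="http://${AMBARI_HOST}:8080/api/v1/clusters"
echo "$CLUSTERS_URL"
# With the default admin/admin login, you could then query it:
#   curl -s -u admin:admin "$CLUSTERS_URL"
```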

Kerberos can be enabled using Ambari (some manual work is required).

NameNode HA (High Availability) – under Service Actions for HDFS

ResourceManager HA

Built-in version control tracks all the config changes made. If you want to add or modify any setting, it can be done using Ambari.

Ambari will make sure to replicate all the changes to all the nodes: no more manual XML editing!

Last but not least: Ambari Views.

Click admin –> manage Ambari

Click Views on the left-hand side and you can see the different views in the window (more views are a work in progress).

 

Learn more about the Capacity Scheduler; Ambari makes it easy to manage queues.

Rolling upgrades

More information on Ambari can be found on http://docs.hortonworks.com/

Happy Hadooping!

Source : Linkedin

1) Setup Azure account

2) Setup CloudBreak account

Very important step (applies to Azure only):

Create a test network in Azure before you start creating cloudbreak credentials. 

In your local machine, run the following and accept default values.
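The exact command was not captured in the original post; as a sketch, one common OpenSSL invocation that produces a matching certificate (.pem) and private key (.key) pair like the files listed below is:

```shell
# Hypothetical reconstruction: generate a self-signed certificate and key pair
# (the subject name is arbitrary; Cloudbreak only needs the cert/key files).
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=azuretest" \
    -keyout azuretest.key -out azuretest.pem
chmod 400 azuretest.key
```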

 

You will see 2 files as listed below.

-rw-r--r--   1 nsabharwal  staff         1346 May  7 17:00 azuretest.pem –> We need this file to create credentials in Cloudbreak.

-rw-r--r--   1 nsabharwal  staff         1679 May  7 17:00 azuretest.key –> We need this to log in to the host after cluster deployment.

chmod 400 azuretest.key –> otherwise, you will receive a “bad permissions” error

For example: ssh -i azuretest.key ubuntu@IP/FQDN

Log in to the Cloudbreak portal and create an Azure credential.

Once you fill in the information and hit Create Credentials, you will get a file from Cloudbreak that needs to be uploaded to the Azure portal.

I saved it as azuretest.cert

Log in to the Azure portal (switch to classic mode in case you are using the new portal).

Click Settings –> Manage Certificates, then upload the file at the bottom of the screen.

There are 2 more actions, both in the Cloudbreak window:

1) Create a template

You can change the instance type & volume type as per your setup.

2) Create a blueprint – you can grab sample blueprints here (you may have to fix the blueprint formatting in case there is any issue).

Once all this is done, you are all set to deploy the cluster: select the credential and hit Create Cluster.

Create cluster window

Handy commands to log in to Docker:

Log in to your host:

ssh -i azuretest.key ubuntu@fqdn

Once you are in the shell: sudo su -

docker ps

docker exec -it <container id> bash

[root@azuretest ~]# docker ps

CONTAINER ID        IMAGE                                               COMMAND               CREATED             STATUS              PORTS               NAMES

f493922cd629        sequenceiq/docker-consul-watch-plugn:1.7.0-consul   "/start.sh"            2 hours ago         Up 2 hours                              consul-watch        

100e7c0b6d3d        sequenceiq/ambari:2.0.0-consul                      "/start-agent"        2 hours ago         Up 2 hours                              ambari-agent        

d05b85859031        sequenceiq/consul:v0.4.1.ptr                        "/bin/start -adverti  2 hours ago         Up 2 hours                              consul              

[root@test~]# docker exec -it 100e7c0b6d3d bash

bash-4.1#

docker commands

Happy Hadooping!!!!

NestedThrowablesStackTrace:

java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://master1/hive?createDatabaseIfNotExist=true, username = hive. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------

java.sql.SQLException: Access denied for user 'hive'@'master1' (using password: YES)

at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)

Solution:

export HIVE_CONF_DIR=/etc/hive/conf.server

hive --service metatool -listFSRoot
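The “Access denied” error above usually means the hive user lacks privileges in the MySQL metastore database. A sketch of one common fix (the host master1 comes from the error message; the password shown is hypothetical, so substitute your own):

```shell
# Hypothetical fix sketch: write the grant statements to a file, then apply
# them with the mysql client on the metastore database host.
cat > /tmp/hive_grants.sql <<'SQL'
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'master1' IDENTIFIED BY 'hivepassword';
FLUSH PRIVILEGES;
SQL
# Run on the MySQL host (prompts for the root password):
#   mysql -u root -p < /tmp/hive_grants.sql
```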

ambari-server stop
ambari-agent stop
#################################
# Remove Packages
################################

yum -y remove ambari-\*
yum -y remove hcatalog\*
yum -y remove hive\*
yum -y remove hbase\*
yum -y remove zookeeper\*
yum -y remove oozie\*
yum -y remove pig\*
yum -y remove snappy\*
yum -y remove hadoop-lzo\*
yum -y remove knox\*
yum -y remove hadoop\*
yum -y remove bigtop-jsvc.x86_64
yum -y remove extjs-2.2-1 mysql-connector-java-5.0.8-1\*
yum -y remove lzo.x86_64
yum -y remove extjs.noarch
yum -y remove sqoop.noarch hadoop.x86_64
yum -y remove hcatalog.noarch
yum -y remove ganglia-gmond-modules-python.x86_64
yum -y remove hadoop-libhdfs.x86_64
yum -y remove hbase.noarch
yum -y remove ambari-log4j.noarch
yum -y remove oozie-client.noarch
yum -y remove pig.noarch hive.noarch
yum -y remove hadoop-lzo.x86_64
yum -y remove hadoop-lzo-native.x86_64
yum -y remove hadoop-sbin.x86_64
yum -y remove libconfuse.x86_64
yum -y remove lzo.x86_64
yum -y remove hadoop-native.x86_64
yum -y remove hadoop-pipes.x86_64
yum -y remove zookeeper.noarch
yum -y remove libganglia.x86_64
yum -y remove ganglia-gmond.x86_64
yum -y remove lzo-devel.x86_64
yum -y remove oozie.noarch
yum -y remove extjs.noarch
yum -y remove compat-readline5.x86_64
yum -y remove rrdtool.x86_64
yum -y remove ganglia-web.noarch
yum -y remove python-rrdtool.x86_64
yum -y remove nagios.x86_64
yum -y remove ganglia-devel.x86_64
yum -y remove perl-Digest-HMAC.noarch
yum -y remove perl-Crypt-DES.x86_64
yum -y remove ganglia-gmetad.x86_64
yum -y remove nagios-www.x86_64
yum -y remove perl-Net-SNMP.noarch
yum -y remove nagios-plugins.x86_64
yum -y remove nagios-devel.x86_64
yum -y remove perl-Digest-SHA1.x86_64
yum -y remove fping.x86_64
yum -y remove perl-rrdtool.x86_64
yum -y remove webhcat-tar-pig.noarch
yum -y remove webhcat-tar-hive.noarch
yum -y remove mysql mysql-server
yum -y remove bigtop-jsvc.x86_64
yum -y remove snappy.x86_64
yum -y remove snappy-devel.x86_64
yum -y remove bigtop-tomcat.noarch
yum -y remove ruby ruby-irb ruby-libs ruby-shadow ruby-rdoc ruby-augeas rubygems libselinux-ruby
yum -y remove ruby-devel libganglia libconfuse hdp_mon_ganglia_addons postgresql-server
yum -y remove postgresql postgresql-libs ganglia-gmond-python ganglia ganglia-gmetad ganglia-web
yum -y remove ganglia-devel httpd mysql mysql-server mysqld puppet
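The long package-removal list above can also be driven by a loop over the package name patterns. A dry-run sketch that only prints the commands, so you can review them before executing:

```shell
# Dry-run sketch: print each removal command instead of running it.
cmds=""
for pkg in ambari hcatalog hive hbase zookeeper oozie pig snappy knox hadoop sqoop nagios ganglia; do
    cmds="${cmds}yum -y remove ${pkg}*
"
done
printf '%s' "$cmds"
# Pipe the output to sh (or drop the echo-style indirection) to actually run it.
```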
######################
# Remove Directories
####################
rm -rf /etc/hadoop
rm -rf /etc/hbase
rm -rf /etc/hcatalog
rm -rf /etc/hive
rm -rf /etc/ganglia
rm -rf /etc/oozie
rm -rf /etc/sqoop
rm -rf /etc/zookeeper
rm -rf /var/run/hadoop
rm -rf /var/run/hbase
rm -rf /var/run/hive
rm -rf /var/run/ganglia
rm -rf /var/run/webhcat
rm -rf /var/log/hadoop
rm -rf /var/log/hbase
rm -rf /var/log/hive
rm -rf /var/log/zookeeper
rm -rf /usr/lib/hadoop
rm -rf /usr/lib/hadoop-yarn
rm -rf /usr/lib/hadoop-mapreduce
rm -rf /usr/lib/hbase
rm -rf /usr/lib/hcatalog
rm -rf /usr/lib/hive
rm -rf /usr/lib/oozie
rm -rf /usr/lib/sqoop
rm -rf /usr/lib/zookeeper
rm -rf /var/lib/hive
rm -rf /var/lib/zookeeper
rm -rf /var/lib/hadoop-hdfs
rm -rf /hadoop/hbase
rm -rf /hadoop/zookeeper
rm -rf /hadoop/mapred
rm -rf /hadoop/hdfs
rm -rf /tmp/sqoop-ambari-qa
rm -rf /var/run/oozie
rm -rf /var/log/oozie
rm -rf /var/lib/oozie
rm -rf /var/tmp/oozie
rm -rf /hadoop/oozie
rm -rf /etc/nagios
rm -rf /var/run/nagios
rm -rf /var/log/nagios
rm -rf /usr/lib/nagios
rm -rf /var/lib/ganglia
rm -rf /tmp/nagios
rm -rf /var/nagios
rm -rf /var/log/webhcat
rm -rf /tmp/hive
rm -rf /var/run/zookeeper
rm -rf /tmp/ambari-qa
rm -rf /etc/storm
rm -rf /etc/hive-hcatalog
rm -rf /etc/tez
rm -rf /etc/falcon
rm -rf /var/run/hadoop-yarn
rm -rf /var/run/hadoop-mapreduce
rm -rf /var/log/hadoop-yarn
rm -rf /var/log/hadoop-mapreduce
rm -rf /usr/lib/hive-hcatalog
rm -rf /usr/lib/falcon
rm -rf /tmp/hadoop
rm -rf /var/hadoop
rm -rf /etc/webhcat
rm -rf /var/log/hadoop-hdfs
rm -rf /var/log/hue
rm -rf /var/lib/alternatives
rm -rf /var/lib/alternatives/flume
rm -rf /var/lib/alternatives/sqoop2
rm -rf /var/lib/alternatives/impala
rm -rf /var/lib/alternatives/hdfs
rm -rf /var/lib/alternatives/webhcat
rm -rf /var/lib/alternatives/hive
rm -rf /var/lib/alternatives/zookeeper
rm -rf /etc/alternatives/hadoop
rm -rf /etc/alternatives/flume
rm -rf /etc/alternatives/sqoop2
rm -rf /etc/alternatives/impala
rm -rf /etc/alternatives/hdfs
rm -rf /etc/alternatives/webhcat
rm -rf /etc/alternatives/hive
rm -rf /etc/alternatives/zookeeper
################################
# user delete
################################

userdel -r nagios
userdel -r hive
userdel -r ambari-qa
userdel -r hbase
userdel -r oozie
userdel -r hcat
userdel -r hdfs
userdel -r mapred
userdel -r zookeeper
userdel -r sqoop
userdel -r rrdcached
userdel -r yarn
userdel -r flume
userdel -r hue
userdel -r sqoop2

yum list installed | grep -i ambari

rm -rf /usr/sbin/ambari-server
rm -rf /usr/lib/ambari-server
rm -rf /var/run/ambari-server
rm -rf /var/log/ambari-server
rm -rf /var/lib/ambari-server
rm -rf /etc/rc.d/init.d/ambari-server
rm -rf /etc/ambari-server
rm -rf /usr/sbin/ambari-agent
rm -rf /usr/lib/ambari-agent
rm -rf /var/run/ambari-agent
rm -rf /var/log/ambari-agent
rm -rf /var/lib/ambari-agent
rm -rf /etc/rc.d/init.d/ambari-agent
rm -rf /etc/ambari-agent
#python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py

yum list installed | grep -i ambari

python /usr/lib/python2.6/site-packages/ambari_agent/HostCleanup.py --silent --skip=users

HDP deployment in Azure and configuring WASB in Ambari

Lessons learned: all VMs need to be under a virtual private network, and hostnames need to be changed.

1) Create virtual network for your cluster

Note: the screenshot has the name hdptest01, but it’s test

2) Create VMs and choose network created in the above step

This is a very important step, and we need to pay attention to the hostname.


Added endpoint for 8080.

There is a step to add disks to the VM. You can follow this blog.

At this point all the hosts are created, and we will make changes to the hostnames.

For example, the following needs to be modified. The hostname command should show the internal hostname (the same output as hostname -f), so modify /etc/sysconfig/network and reboot all the nodes, or change the hostname using the hostname command.

[root@hdpmaster01 ~]# cat /etc/sysconfig/network

HOSTNAME=hdpmaster01.hdpmaster01.j3.internal.cloudapp.net

NETWORKING=yes

#hostname

hdpmaster01.hdpmaster01.j3.internal.cloudapp.net
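The hostname change above can be scripted with sed. A sketch that operates on a copy in /tmp so it is safe to try (point NETFILE at /etc/sysconfig/network on the real node, then reboot; the FQDN is the one from my lab shown above):

```shell
# Sketch: rewrite the HOSTNAME= line in the network config file.
NETFILE=/tmp/network                       # use /etc/sysconfig/network on the node
printf 'NETWORKING=yes\nHOSTNAME=localhost.localdomain\n' > "$NETFILE"
sed -i 's/^HOSTNAME=.*/HOSTNAME=hdpmaster01.hdpmaster01.j3.internal.cloudapp.net/' "$NETFILE"
grep HOSTNAME "$NETFILE"
```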

Follow the HDP docs to install the cluster using Ambari.

Use the hostname entries for the install (no public DNS).

Once the cluster is installed, add the WASB config using Ambari:

HDFS –> Configs –> under Custom hdfs-site

add property

fs.azure.account.key.hdptest01.blob.core.windows.net

WASB: get the storage account secret key from the Azure portal and use it as the value of this property.

Restart the services and then test whether you can use WASB:

[root@hdpmaster01 ~]# cat > test.txt

abc

[root@hdpmaster01 ~]# hdfs dfs -put test.txt wasb://hdpmastercontainer@hdptest01.blob.core.windows.net/

15/03/11 23:49:21 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

15/03/11 23:49:21 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).

15/03/11 23:49:21 INFO impl.MetricsSystemImpl: azure-file-system metrics system started

15/03/11 23:49:23 INFO impl.MetricsSystemImpl: Stopping azure-file-system metrics system…

15/03/11 23:49:23 INFO impl.MetricsSystemImpl: azure-file-system metrics system stopped.

15/03/11 23:49:23 INFO impl.MetricsSystemImpl: azure-file-system metrics system shutdown complete.

[root@hdpmaster01 ~]# hdfs dfs -ls -R wasb://hdpmastercontainer@hdptest01.blob.core.windows.net/

15/03/11 23:49:35 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

15/03/11 23:49:35 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).

15/03/11 23:49:35 INFO impl.MetricsSystemImpl: azure-file-system metrics system started

-rw-r--r-- 1 root supergroup 4 2015-03-11 23:49 wasb://hdpmastercontainer@hdptest01.blob.core.windows.net/test.txt

15/03/11 23:49:35 INFO impl.MetricsSystemImpl: Stopping azure-file-system metrics system…

15/03/11 23:49:35 INFO impl.MetricsSystemImpl: azure-file-system metrics system stopped.

15/03/11 23:49:35 INFO impl.MetricsSystemImpl: azure-file-system metrics system shutdown complete.

Useful links:

HDP docs

WASB Configs – Helpful link

Attaching disk

cat /etc/mke2fs.conf
hadoop = {
    features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
    inode_ratio = 131072
    blocksize = -1
    reserved_ratio = 0
    default_mntopts = acl,user_xattr
}
mkfs.ext4 -T hadoop /dev/sdc1
fstab
/dev/sdc1               /hadoop                 ext4  data=writeback,noatime,nodev,nobarrier  0 1
[root@test~]# cat /etc/rc.local
#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don’t
# want to do the full Sys V style init stuff.

touch /var/lock/subsys/local
for i in /sys/class/scsi_generic/*/device/timeout; do echo 900 > "$i"; done
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then
echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then
echo never > /sys/kernel/mm/transparent_hugepage/defrag
fi

OpenStack integration with Cloudbreak

The end result is:

http://192.168.60.10/

username: admin, password: openstack


The following is a demo.

Make sure you have Ansible, PyYAML, and Jinja installed.

Download Ansible link

This is to avoid: “The executable ‘ansible-playbook’ Vagrant is trying to run was not found in the PATH variable. This is an error. Please verify this software is installed and on the path.”

Download PyYaml – link

cd PyYAML-3.11

python setup.py install

Download Jinja – link

cd Jinja2-2.7.3

python setup.py install


Demo:

git clone https://github.com/sequenceiq/sequenceiq-samples

cd sequenceiq-samples/devstack/vagrant/devstack-neutron

export PATH=$PATH:<location of ansible>/ansible-1.8.4/bin

hw11326:Jinja2-2.7.3 root# vagrant up

Bringing machine 'default' up with 'virtualbox' provider…

==> default: Importing base box 'ubuntu/trusty64'…

==> default: Matching MAC address for NAT networking…

==> default: Checking if box 'ubuntu/trusty64' is up to date…

==> default: Setting the name of the VM: devstack-neutron_default_1425748068278_99768

==> default: Clearing any previously set forwarded ports…

==> default: Fixed port collision for 22 => 2222. Now on port 2200.

==> default: Clearing any previously set network interfaces…

==> default: Preparing network interfaces based on configuration…

default: Adapter 1: nat

default: Adapter 2: hostonly

==> default: Forwarding ports…

default: 22 => 2200 (adapter 1)

==> default: Running 'pre-boot' VM customizations…

==> default: Booting VM…

==> default: Waiting for machine to boot. This may take a few minutes…

default: SSH address: 127.0.0.1:2200

default: SSH username: vagrant

default: SSH auth method: private key

default: Warning: Connection timeout. Retrying…

==> default: Machine booted and ready!

==> default: Checking for guest additions in VM…

==> default: Configuring and enabling network interfaces…

==> default: Mounting shared folders…

default: /vagrant => /Users/nsabharwal/sequenceiq-samples/devstack/vagrant/devstack-neutron

==> default: Running provisioner: ansible…

ANSIBLE_FORCE_COLOR=true ANSIBLE_HOST_KEY_CHECKING=false PYTHONUNBUFFERED=1 ansible-playbook --private-key=/var/root/.vagrant.d/insecure_private_key --user=vagrant --limit='default' --inventory-file=/Users/nsabharwal/sequenceiq-samples/devstack/vagrant/devstack-neutron/.vagrant/provisioners/ansible/inventory -v ../../ansible/local-vagrant-vm.yml

PLAY [Provision common parts] *************************************************

GATHERING FACTS ***************************************************************

ok: [default]

TASK: [common | Run the equivalent of “apt-get update” as a separate step] ****

ok: [default] => {"changed": false}

TASK: [common | Update all packages to the latest version] ********************

changed: [default] => {“changed”: true, “msg”: “Reading package lists…\nBuilding dependency tree…\nReading state information…\nThe following packages will be upgraded:\n libicu52\n1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.\nNeed to get 6751 kB of archives.\nAfter this operation, 3072 B disk space will be freed.\nGet:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libicu52 amd64 52.1-3ubuntu0.2 [6751 kB]\nFetched 6751 kB in 9s (678 kB/s)\n(Reading database … 60969 files and directories currently installed.)\nPreparing to unpack …/libicu52_52.1-3ubuntu0.2_amd64.deb …\nUnpacking libicu52:amd64 (52.1-3ubuntu0.2) over (52.1-3) …\nSetting up libicu52:amd64 (52.1-3ubuntu0.2) …\nProcessing triggers for libc-bin (2.19-0ubuntu6.6) …\n”, “stderr”: “”, “stdout”: “Reading package lists…\nBuilding dependency tree…\nReading state information…\nThe following packages will be upgraded:\n libicu52\n1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.\nNeed to get 6751 kB of archives.\nAfter this operation, 3072 B disk space will be freed.\nGet:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libicu52 amd64 52.1-3ubuntu0.2 [6751 kB]\nFetched 6751 kB in 9s (678 kB/s)\n(Reading database … 60969 files and directories currently installed.)\nPreparing to unpack …/libicu52_52.1-3ubuntu0.2_amd64.deb …\nUnpacking libicu52:amd64 (52.1-3ubuntu0.2) over (52.1-3) …\nSetting up libicu52:amd64 (52.1-3ubuntu0.2) …\nProcessing triggers for libc-bin (2.19-0ubuntu6.6) …\n”}

TASK: [common | Generate SSH keys] ********************************************

changed: [default] => {“append”: false, “changed”: true, “comment”: “”, “group”: 1000, “home”: “/home/vagrant”, “move_home”: false, “name”: “vagrant”, “shell”: “/bin/bash”, “ssh_fingerprint”: “2048 dd:1e:c9:aa:5b:24:82:dc:50:31:2e:a5:15:29:26:18 ansible-generated (RSA)”, “ssh_key_file”: “/home/vagrant/.ssh/id_rsa”, “ssh_public_key”: “ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEZ6jxBYjh4i4ek6EhKPxlUEJt9LN5KmGaaP4lT87ySXqJsKQtI/8jWI+jmQhes+WHWccnvAnJQVLY0AZA3dSFCgjh0E/fy/wdAXSqlU86mpoCIfQAjqJ3HgM8w/9qreYt0vyGDzgaboWVgQXojhGAL/VpNCDpV9x7xjGtj6NesOTnvZvDrCL14z65AdGyvtyCGJsPIEwa/J5WremT7GXAY8XcS3DJz2MGYhl9rYkIakfxCw8JwOR7oTRaIrQwpJU6OiG2vbVfsB1/+kM9tPsoa8/F1EYe63b6O/wn0EBlgjV5vQNP2C+7hHBU7Awo3hnaeJ4WB2Mz6NyhONoFThAh ansible-generated”, “state”: “present”, “uid”: 1000}

TASK: [common | Install sysstat] **********************************************

changed: [default] => {“changed”: true, “stderr”: “\nCreating config file /etc/default/sysstat with new version\nupdate-alternatives: using /usr/bin/sar.sysstat to provide /usr/bin/sar (sar) in auto mode\n”, “stdout”: “Reading package lists…\nBuilding dependency tree…\nReading state information…\nThe following extra packages will be installed:\n libsensors4\nSuggested packages:\n lm-sensors isag\nThe following NEW packages will be installed:\n libsensors4 sysstat\n0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.\nNeed to get 310 kB of archives.\nAfter this operation, 1022 kB of additional disk space will be used.\nGet:1 http://archive.ubuntu.com/ubuntu/ trusty/main libsensors4 amd64 1:3.3.4-2ubuntu1 [27.2 kB]\nGet:2 http://archive.ubuntu.com/ubuntu/ trusty/main sysstat amd64 10.2.0-1 [283 kB]\nPreconfiguring packages …\nFetched 310 kB in 6s (49.4 kB/s)\nSelecting previously unselected package libsensors4:amd64.\n(Reading database … 60969 files and directories currently installed.)\nPreparing to unpack …/libsensors4_1%3a3.3.4-2ubuntu1_amd64.deb …\nUnpacking libsensors4:amd64 (1:3.3.4-2ubuntu1) …\nSelecting previously unselected package sysstat.\nPreparing to unpack …/sysstat_10.2.0-1_amd64.deb …\nUnpacking sysstat (10.2.0-1) …\nProcessing triggers for ureadahead (0.100.0-16) …\nProcessing triggers for man-db (2.6.7.1-1ubuntu1) …\nSetting up libsensors4:amd64 (1:3.3.4-2ubuntu1) …\nSetting up sysstat (10.2.0-1) …\nProcessing triggers for libc-bin (2.19-0ubuntu6.6) …\nProcessing triggers for ureadahead (0.100.0-16) …\n”}

TASK: [devstack-common | Install git] *****************************************

changed: [default] => {“changed”: true, “stderr”: “”, “stdout”: “Reading package lists…\nBuilding dependency tree…\nReading state information…\nThe following extra packages will be installed:\n git-man liberror-perl\nSuggested packages:\n git-daemon-run git-daemon-sysvinit git-doc git-el git-email git-gui gitk\n gitweb git-arch git-bzr git-cvs git-mediawiki git-svn\nThe following NEW packages will be installed:\n git git-man liberror-perl\n0 upgraded, 3 newly installed, 0 to remove and 0 not upgraded.\nNeed to get 3346 kB of archives.\nAfter this operation, 21.6 MB of additional disk space will be used.\nGet:1 http://archive.ubuntu.com/ubuntu/ trusty/main liberror-perl all 0.17-1.1 [21.1 kB]\nGet:2 http://archive.ubuntu.com/ubuntu/ trusty-updates/main git-man all 1:1.9.1-1ubuntu0.1 [698 kB]\nGet:3 http://archive.ubuntu.com/ubuntu/ trusty-updates/main git amd64 1:1.9.1-1ubuntu0.1 [2627 kB]\nFetched 3346 kB in 11s (291 kB/s)\nSelecting previously unselected package liberror-perl.\n(Reading database … 61025 files and directories currently installed.)\nPreparing to unpack …/liberror-perl_0.17-1.1_all.deb …\nUnpacking liberror-perl (0.17-1.1) …\nSelecting previously unselected package git-man.\nPreparing to unpack …/git-man_1%3a1.9.1-1ubuntu0.1_all.deb …\nUnpacking git-man (1:1.9.1-1ubuntu0.1) …\nSelecting previously unselected package git.\nPreparing to unpack …/git_1%3a1.9.1-1ubuntu0.1_amd64.deb …\nUnpacking git (1:1.9.1-1ubuntu0.1) …\nProcessing triggers for man-db (2.6.7.1-1ubuntu1) …\nSetting up liberror-perl (0.17-1.1) …\nSetting up git-man (1:1.9.1-1ubuntu0.1) …\nSetting up git (1:1.9.1-1ubuntu0.1) …\n”}

TASK: [devstack-common | get iptables rules] **********************************

changed: [default] => {"changed": true, "cmd": "iptables -t nat -L -n", "delta": "0:00:00.027939", "end": "2015-03-07 17:09:11.425859", "rc": 0, "start": "2015-03-07 17:09:11.397920", "stderr": "", "stdout": "Chain PREROUTING (policy ACCEPT)\ntarget prot opt source destination \n\nChain INPUT (policy ACCEPT)\ntarget prot opt source destination \n\nChain OUTPUT (policy ACCEPT)\ntarget prot opt source destination \n\nChain POSTROUTING (policy ACCEPT)\ntarget prot opt source destination ", "warnings": []}

TASK: [devstack-common | iptables rule to access external network by VMs] *****

changed: [default] => {"changed": true, "cmd": ["sudo", "iptables", "-t", "nat", "-A", "POSTROUTING", "-o", "eth0", "-j", "MASQUERADE", "-m", "comment", "--comment", "DevstackExt"], "delta": "0:00:00.015410", "end": "2015-03-07 17:09:11.543966", "rc": 0, "start": "2015-03-07 17:09:11.528556", "stderr": "", "stdout": "", "warnings": []}

TASK: [devstack-common | Creates directory for openstack] *********************

changed: [default] => {"changed": true, "gid": 1000, "group": "vagrant", "mode": "0755", "owner": "vagrant", "path": "/opt/stack", "size": 4096, "state": "directory", "uid": 1000}

TASK: [devstack-common | Creates directory] ***********************************

ok: [default] => {"changed": false, "gid": 1000, "group": "vagrant", "mode": "0755", "owner": "vagrant", "path": "/opt/stack", "size": 4096, "state": "directory", "uid": 1000}

TASK: [devstack-common | Cloning devstack repo] *******************************

changed: [default] => {"after": "57be53b51adf06140d82c623bc418f32af3167b5", "before": null, "changed": true}

TASK: [devstack-vm | Copy and filter localrc config file] *********************

changed: [default] => {"changed": true, "checksum": "639687da6ae5ecaa994d7e6339b01b5f1f8fa23e", "dest": "/opt/stack/devstack/localrc", "gid": 1000, "group": "vagrant", "md5sum": "86a232b8d999a45ad74ce41e36b3189d", "mode": "0664", "owner": "vagrant", "size": 1165, "src": "/home/vagrant/.ansible/tmp/ansible-tmp-1425748180.16-181938926310427/source", "state": "file", "uid": 1000}

TASK: [devstack-vm | Copy and filter local.conf config file] ******************

changed: [default] => {"changed": true, "checksum": "7af1b13e806da7e34543163e4841c227c1e8dfcc", "dest": "/opt/stack/devstack/local.conf", "gid": 1000, "group": "vagrant", "md5sum": "275017ef5c73b9ce148f6f2eb2f1e986", "mode": "0664", "owner": "vagrant", "size": 74, "src": "/home/vagrant/.ansible/tmp/ansible-tmp-1425748180.3-105128739995702/source", "state": "file", "uid": 1000}

TASK: [devstack-vm | Install devstack] ****************************************

changed: [default] => {"changed": true, "cmd": "./stack.sh", "delta": "0:24:11.468315", "end": "2015-03-07 17:33:51.990596", "rc": 0, "start": "2015-03-07 17:09:40.522281", "stderr": "", "stdout": "2015-03-07 17:09:40.801 | ++ trueorfalse False\n2015-03-07 17:09:40.801 | +

After 30 minutes or so:

PLAY RECAP ********************************************************************

default : ok=14   changed=11   unreachable=0   failed=0

Click http://192.168.60.10/ and start provisioning.

More details – Link

Happy Clouding and Hadooping!

Please see the following details on Apache Phoenix, a “SQL skin” for HBase.

Phoenix

The following details are based on a test done in one of my lab environments. You can see that we can run SQL queries, secondary indexes, explain plans, data loads, and bulk loads by using Phoenix.

Table definition

drop table if exists crime;

create table crime (
    caseid varchar,
    Date varchar,
    block varchar,
    description varchar,
    sdesc varchar,
    ldesc varchar,
    arrest char(2),
    domestic char(2),
    lat float,
    long float,
    constraint PK PRIMARY KEY (caseid)
);

The SQL prompt can be launched from your desktop by using:

sqlline.py <zookeeper server>:<zookeeper port>

0: jdbc:phoenix:lab> !describe CRIME

+------------+-------------+------------+-------------+------------+------------+-------------+---------------+----------------+----------------+------------+------------+---------------+
| TABLE_CAT  | TABLE_SCHEM | TABLE_NAME | COLUMN_NAME | DATA_TYPE  | TYPE_NAME  | COLUMN_SIZE | BUFFER_LENGTH | DECIMAL_DIGITS | NUM_PREC_RADIX |  NULLABLE  | COLUMN_DEF | SQL_DATA_TYPE |
+------------+-------------+------------+-------------+------------+------------+-------------+---------------+----------------+----------------+------------+------------+---------------+
| null       | null        | CRIME      | CASEID      | 12         | VARCHAR    | null        | null          | null           | null           | 0          | null       | null          |
| null       | null        | CRIME      | DATE        | 12         | VARCHAR    | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | BLOCK       | 12         | VARCHAR    | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | DESCRIPTION | 12         | VARCHAR    | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | SDESC       | 12         | VARCHAR    | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | LDESC       | 12         | VARCHAR    | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | ARREST      | 1          | CHAR       | 2           | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | DOMESTIC    | 1          | CHAR       | 2           | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | LAT         | 6          | FLOAT      | null        | null          | null           | null           | 1          | null       | null          |
| null       | null        | CRIME      | LONG        | 6          | FLOAT      | null        | null          | null           | null           | 1          | null       | null          |
+------------+-------------+------------+-------------+------------+------------+-------------+---------------+----------------+----------------+------------+------------+---------------+

0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST='Y';
+------------+
|  COUNT(1)  |
+------------+
| 1092172    |
+------------+
1 row selected (9.964 seconds)

Secondary Indexes http://phoenix.apache.org/secondary_indexing.html

0: jdbc:phoenix:lab> create index index_arrest on CRIME (arrest);
3,837,776 rows affected (95.428 seconds)

0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST='Y';
+------------+
|  COUNT(1)  |
+------------+
| 1092172    |
+------------+
1 row selected (2.169 seconds)

0: jdbc:phoenix:lab> select count(1) from CRIME where ARREST='N';
+------------+
|  COUNT(1)  |
+------------+
| 2745604    |
+------------+
1 row selected (4.055 seconds)

explain select count(1) from CRIME where ARREST='Y';
+------------------------------------------------------------+
|                            PLAN                            |
+------------------------------------------------------------+
| CLIENT PARALLEL 32-WAY RANGE SCAN OVER INDEX_ARREST ['Y']  |
|     SERVER AGGREGATE INTO SINGLE ROW                       |
+------------------------------------------------------------+
2 rows selected (0.03 seconds)

Statistics: http://phoenix.apache.org/update_statistics.html

Tuning : http://phoenix.apache.org/tuning.html

Information: http://phoenix.apache.org/language/index.html

You can see the table in HBase:

hbase(main):004:0> list

TABLE

CRIME

Data load

/usr/lib/phoenix/bin/psql.py -t CRIME lab:2181 crim4.csv
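psql.py expects one CSV line per row, with fields in the table's column order (caseid, Date, block, description, sdesc, ldesc, arrest, domestic, lat, long). A sample file with one hypothetical row, invented purely for illustration:

```shell
# Hypothetical sample row in CRIME column order:
# caseid,Date,block,description,sdesc,ldesc,arrest,domestic,lat,long
cat > /tmp/crime_sample.csv <<'EOF'
HX100001,01/01/2015 00:01,001XX N STATE ST,THEFT,RETAIL THEFT,STREET,Y,N,41.8837,-87.6289
EOF
# Quick sanity check: the row should have exactly 10 fields.
awk -F',' '{ print NF }' /tmp/crime_sample.csv
```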

Bulk load – be careful with the --ignore-errors parameter:

HADOOP_CLASSPATH=/usr/lib/hbase/lib/hbase-protocol.jar:/etc/hbase/conf hadoop jar /usr/lib/phoenix/phoenix-4.0.0.2.1.4.0-632-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table CRIME --input /tmp/crim4.csv -z lab:2181 --ignore-errors