Issue:

A data load was in progress when a Secondary Index drop was started in parallel, leaving regions stuck in transition.

2014-11-22 04:01:53,257 INFO [main] util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {ENCODED => 38b02ae162a7600c2f0d2d380bb6c56d, NAME => 'CRIME,OO336357,1416582094697.38b02ae162a7600c2f0d2d380bb6c56d.', STARTKEY => 'OO336357', ENDKEY => ''}

ERROR: Region { meta => CRIME,OO336357,1416582094697.38b02ae162a7600c2f0d2d380bb6c56d., hdfs => hdfs://seregiondev/apps/hbase/data/data/default/CRIME/38b02ae162a7600c2f0d2d380bb6c56d, deployed =>  } not deployed on any region server.

ERROR: Region { meta => CRIME,,1416581888939.9aec10d3fb1d63719bdd1743a7d7d1c9., hdfs => hdfs://seregiondev/apps/hbase/data/data/default/CRIME/9aec10d3fb1d63719bdd1743a7d7d1c9, deployed =>  } not deployed on any region server.

2014-11-22 04:30:53,557 INFO  [main] util.HBaseFsck: Region { meta => INDEX_ARREST,,1416585436754.9d1205ae1f6abeafefe1a15b1a5c40c3., hdfs => hdfs://seregiondev/apps/hbase/data/data/default/INDEX_ARREST/9d1205ae1f6abeafefe1a15b1a5c40c3, deployed =>  } is in META, and in a disabled tabled that is not deployed

2014-11-22 04:30:53,557 DEBUG [main] util.HBaseFsck: There are 18 region info entries

2014-11-22 04:30:53,660 INFO  [main] util.HBaseFsck: Handling overlap merges in parallel. set hbasefsck.overlap.merge.parallel to false to run serially.

Solution:

 

After a lot of research, the following steps recovered the cluster (a condensed version of the sequence is given after the verification below):

hbase zkcli

rmr /hbase

Restart the HBase Master and Region Servers.

Ran the following command; after the errors shown below, executed hbase hbck again and it fixed the issue:

hbase hbck -fix CRIME

Average load: NaN

Number of requests: 0

Number of regions: 0

Number of regions in transition: 0

ERROR: META region or some of its attributes are null.

ERROR: hbase:meta is not found on any region.

Trying to fix a problem with hbase:meta..

2014-11-22 05:24:29,773 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService

2014-11-22 05:24:29,774 INFO  [main] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x349d29d130c0038

2014-11-22 05:24:29,777 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down

2014-11-22 05:24:29,777 INFO  [main] zookeeper.ZooKeeper: Session: 0x349d29d130c0038 closed

Exception in thread "main" org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=35, exceptions:

Sat Nov 22 05:15:38 PST 2014, org.apache.hadoop.hbase.client.RpcRetryingCaller@34418cee, org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException): org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2512)

at org.apache.hadoop.hbase.master.HMaster.assignRegion(HMaster.java:2568)

at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:38207)

at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)

at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)

at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

 

0: jdbc:phoenix> SELECT count(1) from CRIME;

+------------+
|  COUNT(1)  |
+------------+
| 4111901    |
+------------+

1 row selected (17.119 seconds)
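For reference, the recovery steps above condensed into one sequence (a sketch of the same commands; removing /hbase wipes all HBase state held in ZooKeeper, so treat it as a last resort):

hbase zkcli                  # open the ZooKeeper CLI that ships with HBase
  rmr /hbase                 # destructive: removes the HBase znodes
# restart the HBase Master and all Region Servers, then repair and re-check
hbase hbck -fix CRIME
hbase hbck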

1) Copy book.txt from local filesystem to HDFS

The example book file is book.txt:

[root@sandbox ~]# cat > book.txt

Hello I am book

Hello I am book

Hi You are book

[root@sandbox ~]#

Launch grunt shell

[root@sandbox ~]# pig

2014-10-22 01:01:05,908 [main] INFO  org.apache.pig.Main - Apache Pig version 0.12.0.2.0.6.0-76 (rexported) compiled Oct 17 2013, 20:44:07

2014-10-22 01:01:05,908 [main] INFO  org.apache.pig.Main - Logging error messages to: /root/pig_1413964865906.log

2014-10-22 01:01:05,938 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /root/.pigbootup not found

2014-10-22 01:01:06,190 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://sandbox.hortonworks.com:8020

grunt>

The following step copies the file from the local file system to HDFS:

grunt> copyFromLocal book.txt demo/book.txt

grunt>  quit;

Create pig script

[root@sandbox ~]# cat > book.pig

book = LOAD '$INPUTFILE' USING PigStorage() AS (lines:chararray);
words = FOREACH book GENERATE FLATTEN(TOKENIZE(lines)) AS word;
wordsGrouped = GROUP words BY word;
wordsAggregated = FOREACH wordsGrouped GENERATE group AS word, COUNT(words);
wordsSorted = ORDER wordsAggregated BY $1 DESC;
STORE wordsSorted INTO 'book_out';

Run pig script 

[root@sandbox ~]#

pig -p INPUTFILE=hdfs://sandbox.hortonworks.com:8020/user/root/demo/book.txt book.pig

Check out output

grunt> cat hdfs://sandbox.hortonworks.com:8020/user/root/book_out/part-r-00000

book 3
Hello 2
I 2
am 2
Hi 1
You 1
are 1

grunt>
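For quick iteration, the same script can also be run in Pig local mode without touching HDFS (a sketch; assumes book.txt is in the current directory, with the output landing in ./book_out):

pig -x local -p INPUTFILE=book.txt book.pig
cat book_out/part-r-00000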

 All done :)

Hive JDBC

 

Hive JAR: hive-jdbc-uber-jar-master (build it to get the hive-jdbc-uber-1.0.jar referenced below)

 

Below is an example configuration using DbVisualizer (http://www.dbvis.com/):

  1. Under "Tools" > "Driver Manager..." hit the "Create a new driver" button.
  2. Fill in the driver details. For the "Driver File Paths" you are pointing to the hive-jdbc-uber-1.0.jar created above.

Next create a new connection.

 

Database URL: jdbc:hive2://<server IP>:10000/default
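To sanity-check the URL outside DbVisualizer, beeline (shipped with Hive) can be pointed at the same connection string. This is just a quick verification, not part of the DbVisualizer setup; adjust the IP and user name for your environment:

beeline -u "jdbc:hive2://<server IP>:10000/default" -n hive -e "show databases;"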

 

Error: Hive schematool fails with "Access denied" for the metastore MySQL user. Use -verbose to see the details:

[root@ns-lab02 conf]# /usr/lib/hive/bin/schematool -initSchema -dbType mysql -userName hive -passWord xxx -verbose

Metastore connection URL: jdbc:mysql://ns-lab02-snn/hive?createDatabaseIfNotExist=true

Metastore Connection Driver : com.mysql.jdbc.Driver

Metastore connection User: hive

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.

Caused by: java.sql.SQLException: Access denied for user 'hive'@'ns-lab02-snn' (using password: YES)

… 9 more

*** schemaTool failed ***

Solution:

[root@ns-lab02 conf]# mysql -u root -p

Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 8

Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> GRANT ALL ON *.* to 'hive'@'ns-lab02-snn' IDENTIFIED BY 'xxxx';

Query OK, 0 rows affected (0.00 sec)
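With the grant in place, rerun the same schematool command from above; it should now initialize the metastore schema (assuming the password passed to -passWord matches the one used in the GRANT):

/usr/lib/hive/bin/schematool -initSchema -dbType mysql -userName hive -passWord xxx -verbose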

Error:

Exception in thread "main" java.lang.ClassFormatError: org.apache.spark.deploy.SparkSubmit (unrecognized class file version)

[hdfs@ambari1 spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Exception in thread "main" java.lang.ClassFormatError: org.apache.spark.deploy.SparkSubmit (unrecognized class file version)
at java.lang.VMClassLoader.defineClass(libgcj.so.7rh)
at java.lang.ClassLoader.defineClass(libgcj.so.7rh)
at java.security.SecureClassLoader.defineClass(libgcj.so.7rh)
at java.net.URLClassLoader.findClass(libgcj.so.7rh)
at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
at java.lang.ClassLoader.loadClass(libgcj.so.7rh)
at gnu.java.lang.MainThread.run(libgcj.so.7rh)

Solution:

export JAVA_HOME=/usr/jdk64/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
export YARN_CONF_DIR=/etc/hadoop/conf
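The stack trace above shows classes being loaded from libgcj, i.e. GNU gcj was the default java on the PATH. After exporting the variables above, a quick check (not part of the original log) confirms the right JVM is now picked up:

java -version
# should report the JDK under /usr/jdk64/jdk1.7.0_45, not gcj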

[hdfs@ambari1 spark-1.0.1.2.1.3.0-563-bin-2.4.0.2.1.3.0-563]$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10

http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

On your Mac or Linux machine
Note: IP addresses are replaced with x.x.x.x for security reasons

1) Download the following drivers zip file (dse.zip) and extract it, for example under /tmp/dse/

$ unzip dse.zip
Archive: dse.zip
creating: dse/
inflating: dse/cassandra-driver-core-2.1.0-rc1.jar
inflating: dse/cassandra-driver-dse-2.1.0-rc1-tests.jar
inflating: dse/guava-16.0.1.jar
inflating: dse/metrics-core-3.0.2.jar
inflating: dse/netty-3.9.0.Final.jar
inflating: dse/slf4j-api-1.7.5.jar

2) Create SimpleClient.java using vi, emacs, or nano
$ cat SimpleClient.java
package com.example.cassandra;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

public class SimpleClient {
    private Cluster cluster;

    public void connect(String node) {
        cluster = Cluster.builder()
                .addContactPoint(node)
                .build();
        Metadata metadata = cluster.getMetadata();
        System.out.printf("Connected to cluster: %s\n",
                metadata.getClusterName());
        for (Host host : metadata.getAllHosts()) {
            System.out.printf("Datacenter: %s; Host: %s; Rack: %s\n",
                    host.getDatacenter(), host.getAddress(), host.getRack());
        }
    }

    public void close() {
        cluster.close();
    }

    public static void main(String[] args) {
        SimpleClient client = new SimpleClient();
        client.connect("Cassandra Server IP");
        client.close();
    }
}

3) Compile and run

mkdir ns

javac -classpath "/tmp/dse/*:." -d ns SimpleClient.java

cd ns

java -classpath "/tmp/dse/*:." com.example.cassandra.SimpleClient

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Connected to cluster: Test Cluster
Datacenter: Cassandra; Host: /127.0.0.1; Rack: rack1
Datacenter: Cassandra; Host: /x.x.x.x; Rack: rack1
Datacenter: Analytics; Host: /x.x.x.x; Rack: rack1

FYI: Java home setting (it may vary per your environment):
JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre

yum install java-1.6.0-openjdk*

Very important: install the EPEL repository before installing DSE and OpsCenter.

[root@master ~]# rpm -Uvh http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
Retrieving http://dl.fedoraproject.org/pub/epel/5/i386/epel-release-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.F1J7um: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing... ########################################### [100%]
1:epel-release ########################################### [100%]
[root@master ~]# yum install dse-full opscenter
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* epel: mirror.sfo12.us.leaseweb.net
epel | 3.7 kB 00:00
epel/primary_db | 3.9 MB 00:00
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package dse-full.noarch 0:4.5.1-1 set to be updated
--> Processing Dependency: dse-libsqoop = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libhive = 4.5.1 for package: dse-full
--> Processing Dependency: dse-demos = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libsolr = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libmahout = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libpig = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libhadoop = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libtomcat = 4.5.1 for package: dse-full
--> Processing Dependency: dse-liblog4j = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libspark = 4.5.1 for package: dse-full
--> Processing Dependency: dse-libcassandra = 4.5.1 for package: dse-full
--> Processing Dependency: datastax-agent for package: dse-full
---> Package opscenter.noarch 0:4.1.4-1 set to be updated
--> Processing Dependency: python(abi) >= 2.6 for package: opscenter
--> Processing Dependency: pyOpenSSL for package: opscenter
--> Running transaction check
---> Package datastax-agent.noarch 0:4.1.4-1 set to be updated
--> Processing Dependency: sysstat for package: datastax-agent
---> Package dse-demos.noarch 0:4.5.1-1 set to be updated
---> Package dse-libcassandra.noarch 0:4.5.1-1 set to be updated
--> Processing Dependency: jna >= 3.2.4 for package: dse-libcassandra
---> Package dse-libhadoop.noarch 0:4.5.1-1 set to be updated
--> Processing Dependency: dse-libhadoop-native = 4.5.1 for package: dse-libhadoop
---> Package dse-libhive.noarch 0:4.5.1-1 set to be updated
---> Package dse-liblog4j.noarch 0:4.5.1-1 set to be updated
---> Package dse-libmahout.noarch 0:4.5.1-1 set to be updated
---> Package dse-libpig.noarch 0:4.5.1-1 set to be updated
---> Package dse-libsolr.noarch 0:4.5.1-1 set to be updated
---> Package dse-libspark.noarch 0:4.5.1-1 set to be updated
---> Package dse-libsqoop.noarch 0:4.5.1-1 set to be updated
---> Package dse-libtomcat.noarch 0:4.5.1-1 set to be updated
---> Package pyOpenSSL.x86_64 0:0.6-2.el5 set to be updated
---> Package python26.x86_64 0:2.6.8-2.el5 set to be updated
--> Processing Dependency: libpython2.6.so.1.0()(64bit) for package: python26
--> Processing Dependency: libffi.so.5()(64bit) for package: python26
--> Running transaction check
---> Package dse-libhadoop-native.x86_64 0:4.5.1-1 set to be updated
---> Package jna.x86_64 0:3.4.0-4.el5 set to be updated
---> Package libffi.x86_64 0:3.0.5-1.el5 set to be updated
---> Package python26-libs.x86_64 0:2.6.8-2.el5 set to be updated
---> Package sysstat.x86_64 0:7.0.2-12.el5 set to be updated
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================================================================================================================
Package Arch Version Repository Size
=================================================================================================================================================================================
Installing:
dse-full noarch 4.5.1-1 datastax 6.2 M
opscenter noarch 4.1.4-1 datastax 66 M
Installing for dependencies:
datastax-agent noarch 4.1.4-1 datastax 19 M
dse-demos noarch 4.5.1-1 datastax 42 M
dse-libcassandra noarch 4.5.1-1 datastax 23 M
dse-libhadoop noarch 4.5.1-1 datastax 21 M
dse-libhadoop-native x86_64 4.5.1-1 datastax 407 k
dse-libhive noarch 4.5.1-1 datastax 33 M
dse-liblog4j noarch 4.5.1-1 datastax 14 k
dse-libmahout noarch 4.5.1-1 datastax 87 M
dse-libpig noarch 4.5.1-1 datastax 18 M
dse-libsolr noarch 4.5.1-1 datastax 50 M
dse-libspark noarch 4.5.1-1 datastax 147 M
dse-libsqoop noarch 4.5.1-1 datastax 2.4 M
dse-libtomcat noarch 4.5.1-1 datastax 4.8 M
jna x86_64 3.4.0-4.el5 epel 270 k
libffi x86_64 3.0.5-1.el5 epel 24 k
pyOpenSSL x86_64 0.6-2.el5 base 120 k
python26 x86_64 2.6.8-2.el5 epel 6.5 M
python26-libs x86_64 2.6.8-2.el5 epel 695 k
sysstat x86_64 7.0.2-12.el5 base 187 k

Transaction Summary
=================================================================================================================================================================================
Install 21 Package(s)
Upgrade 0 Package(s)

Total download size: 527 M
Is this ok [y/N]: y
Downloading Packages:
(1/21): dse-liblog4j-4.5.1-1.noarch.rpm | 14 kB 00:00
(2/21): libffi-3.0.5-1.el5.x86_64.rpm | 24 kB 00:00
(3/21): pyOpenSSL-0.6-2.el5.x86_64.rpm | 120 kB 00:00
(4/21): sysstat-7.0.2-12.el5.x86_64.rpm | 187 kB 00:00
(5/21): jna-3.4.0-4.el5.x86_64.rpm | 270 kB 00:00
(6/21): dse-libhadoop-native-4.5.1-1.x86_64.rpm | 407 kB 00:00
(7/21): python26-libs-2.6.8-2.el5.x86_64.rpm | 695 kB 00:00
(8/21): dse-libsqoop-4.5.1-1.noarch.rpm | 2.4 MB 00:01
(9/21): dse-libtomcat-4.5.1-1.noarch.rpm | 4.8 MB 00:02
(10/21): dse-full-4.5.1-1.noarch.rpm | 6.2 MB 00:01
(11/21): python26-2.6.8-2.el5.x86_64.rpm | 6.5 MB 00:00
(12/21): dse-libpig-4.5.1-1.noarch.rpm | 18 MB 00:06
(13/21): datastax-agent-4.1.4-1.noarch.rpm | 19 MB 00:07
(14/21): dse-libhadoop-4.5.1-1.noarch.rpm | 21 MB 00:03
(15/21): dse-libcassandra-4.5.1-1.noarch.rpm | 23 MB 00:08
(16/21): dse-libhive-4.5.1-1.noarch.rpm | 33 MB 00:08
(17/21): dse-demos-4.5.1-1.noarch.rpm | 42 MB 00:09
(18/21): dse-libsolr-4.5.1-1.noarch.rpm | 50 MB 00:07
(19/21): opscenter-4.1.4-1.noarch.rpm | 66 MB 00:08
(20/21): dse-libmahout-4.5.1-1.noarch.rpm (64%) 56% [================================== ] 5.6 MB/s | 49 MB 00:06 ETA

Big Data is crucial for SMBs (Small & Medium Businesses) that want to generate more revenue. Yet most SMBs hear about Big Data and decide to stay away from it for the following reasons:

1. Lack of knowledge and understanding of the technologies involved in Big Data analysis
2. Lack of engineering power, i.e. Data Engineers and Data Scientists
3. Lack of awareness of the data coming in from various sources, such as their applications' engagements with social platforms like Facebook, Pinterest, Twitter and many more

How can we address this?

1. Invest in people. Give them access to various data analysis tools.
2. Outsource the data engineering and focus on the data analysis. Let the hosting provider worry about providing secure, robust and compliant infrastructure, and let them handle data loading and the data analysis jobs that process the data.
3. Bridge the gap between the various segments of your organisation so everyone understands the data coming in, and then decide what data to keep and process.