Hadoop Services Starting and Stopping Sequence (mostly in HDP)

All installed services should be started in a specific order. The suggested order is:

1) Knox
2) ZooKeeper
3) HDFS
4) YARN
5) HBase
6) Hive Metastore
7) HiveServer2
8) WebHCat
9) Oozie
10) Storm
11) Kafka

Running services in the cluster should be stopped in a specific order. The suggested order is:

1) Knox
2) Oozie
3) WebHCat
4) HiveServer2
5) Hive Metastore
6) HBase
7) YARN
8) HDFS
9) ZooKeeper
10) Storm
11) Kafka
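
If you manage the cluster through Ambari, you can also let Ambari enforce these orderings: its Start All / Stop All operations respect service dependencies. As a sketch (host, port, cluster name, and credentials are placeholders for your own values), stopping all services through the REST API looks like this:

curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop All Services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' "http://<Ambari_Host>:8080/api/v1/clusters/<clustername>/services"

Replacing INSTALLED with STARTED issues the corresponding Start All request.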

Resetting Ambari Admin password

I was trying to set up Ambari to authenticate against Active Directory, and in the meantime I tried logging in to the Ambari portal. I tried the admin user ID and its password, but it did not work. Thinking I might have forgotten the credentials, I tried resetting them. There is no direct way to reset (that I know of), but you can follow the steps below to set the credentials back to the default admin/admin:

  1. Stop the Ambari server: 'ambari-server stop'
  2. Log on to the Ambari server host (the node on which the Ambari server is installed).
  3. Run 'psql -U ambari'
  4. Enter the password (this password is stored in /etc/ambari-server/conf/password.dat; the default is bigdata).
  5. At the psql prompt (ambari=>), run:
  6. update ambari.users set user_password='538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00' where user_name='admin';
  7. Quit psql.
  8. Restart the Ambari server: 'ambari-server restart'

PS: The password used in step 6 is the hashed form of 'admin'.
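
For convenience, the whole reset can also be run non-interactively; a minimal sketch, assuming the default embedded PostgreSQL setup and the default ambari database password:

ambari-server stop
PGPASSWORD=bigdata psql -U ambari -d ambari -c "update ambari.users set user_password='538916f8943ec225d97a9a86a2c6ec0818c1cd400e09e03b660fdaaec4af29ddbb6f2b1033b81b00' where user_name='admin';"
ambari-server restart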

Fixing: Ambari Agent Disk Usage Alert Critical

Hi,

I recently faced this issue. Even though each machine had 3 TB of disk attached, I kept seeing the Ambari Agent Disk Usage red alerts. It was a production cluster that was supposed to be handed over to the customer, and on that very day this error started appearing. Googling and searching the Ambari forums was merely a waste of time. It was a scary situation and I was under a lot of pressure. :)

Sharing the solution here so that others can benefit and fix it easily, without the pressure. 😉

Capacity Used: [59.05%, 18.0 GB], Capacity Total: [30.5 GB], path=/usr/hdp
Capacity Used: [56.50%, 17.3 GB], Capacity Total: [30.5 GB], path=/usr/hdp
Capacity Used: [60.61%, 18.5 GB], Capacity Total: [30.5 GB], path=/usr/hdp

Along with this, you may also see this kind of error:

1/1 local-dirs are bad: /hadoop/yarn/local; 1/1 log-dirs are bad: /hadoop/yarn/log

Generally this happens because YARN applications generate a lot of temporary data while executing jobs, and, as shown above, Ambari is watching only the /usr/hdp path by default, so that one partition fills up.

Fixing it is relatively easy:

  1. Create the log directories and the temporary directories for intermediate data on the data drives, and set the owner of these directories to yarn (repeat for each mount; see the loop sketch below).

mkdir -p /mnt/datadrive01/hadoop/yarn/local
mkdir -p /mnt/datadrive01/hadoop/yarn/log

(Repeat for each mount.)

chown yarn:hadoop /mnt/datadrive01/hadoop/yarn/log
chown yarn:hadoop /mnt/datadrive01/hadoop/yarn/local
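
A small loop makes this less error-prone. A sketch, assuming four mounts named /mnt/datadrive01 through /mnt/datadrive04 (adjust to your actual mount names):

for i in 01 02 03 04; do
  # create both the local (intermediate data) and log directories on this mount
  mkdir -p /mnt/datadrive${i}/hadoop/yarn/local /mnt/datadrive${i}/hadoop/yarn/log
  # hand ownership to the yarn user so the NodeManager can write there
  chown -R yarn:hadoop /mnt/datadrive${i}/hadoop/yarn
done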

  2. Change the following two properties under the NodeManager section of the YARN configuration:

yarn.nodemanager.local-dirs=/mnt/datadrive01/hadoop/yarn/local,/mnt/datadrive02/hadoop/yarn/local,/mnt/datadrive03/hadoop/yarn/local,/mnt/datadrive04/hadoop/yarn/local

yarn.nodemanager.log-dirs=/mnt/datadrive01/hadoop/yarn/log,/mnt/datadrive02/hadoop/yarn/log,/mnt/datadrive03/hadoop/yarn/log,/mnt/datadrive04/hadoop/yarn/log

Restart the affected components and you are done.
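
To verify the fix, you can check that the NodeManagers report healthy again; for example (assuming the yarn client is on the PATH):

yarn node -list -all

Unhealthy nodes carry the same "local-dirs are bad" message in their health report, so after the restart the list should be free of it.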

Good luck.

-A

"Full authentication is required to access this resource" while deleting a service/component from Ambari

Hi,

We had set up a component/service that was no longer required by the customer, and hence decided to get rid of it. I checked the Ambari documentation and tried a few trial-and-error approaches, but they did not help much.

I tried many permutations and combinations, referred to the Ambari documentation on deleting a service, and searched Stack Overflow, but kept getting stuck with:

HTTP ERROR: 403

Problem accessing /api/v1/hostname.

Reason: Full authentication is required to access this resource

Finally, after spending much time, I could fix it. Sharing the RESTful command with the community to save others' time. 🙂

curl -v -u admin:@dminPassword -H "X-Requested-By: ambari" -X DELETE "http://<Ambari_Host>:8080/api/v1/clusters/<clustername>/services/<ServiceName>"
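
Note that Ambari typically refuses to delete a service that is still running. If the DELETE fails for that reason, stop the service first; a sketch using the same placeholders:

curl -u admin:@dminPassword -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Stop Service"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' "http://<Ambari_Host>:8080/api/v1/clusters/<clustername>/services/<ServiceName>"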


I recently learnt that this deletes the service only from the Ambari GUI; it does not delete it from the database.

So you need to log in to the Ambari database (the default is PostgreSQL) and execute delete commands against the relevant tables. For example, for the Knox service I executed the following DML statements:

psql ambari ambari
Password for user ambari: # the default password is "bigdata"
delete from servicedesiredstate where service_name like '%KNOX%';
delete from clusterservices where service_name like 'KNOX';
delete from hostcomponentstate where component_name like '%KNOX%';
delete from hostcomponentdesiredstate where component_name like '%KNOX%';
delete from servicecomponentdesiredstate where component_name like '%KNOX%';
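
To confirm nothing is left behind, a quick check in the same psql session (both queries should return zero rows):

select * from clusterservices where service_name like '%KNOX%';
select * from hostcomponentstate where component_name like '%KNOX%';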

Note: if you are copy-pasting these commands from a blog, make sure the hyphens and quotes are plain ASCII characters; WordPress tends to replace them with typographic ones.

Status_code 403 while setting up the Spark component on an HDP 2.3 cluster

Hi,

Recently I was trying to set up the Spark component on a Hortonworks HDP 2.3 cluster. During installation I got something like this: "http://<serverName>:50070/webhdfs/v1/user/spark?op=MKDIRS&user.name=hdfs" returned status_code=403

The complete stack trace from the Ambari server looks like this:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.0.2.2/package/scripts/job_history_server.py", line 90, in <module>
JobHistoryServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.0.2.2/package/scripts/job_history_server.py", line 54, in start
self.configure(env)
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.0.2.2/package/scripts/job_history_server.py", line 48, in configure
setup_spark(env, 'server', action = 'config')
File "/var/lib/ambari-agent/cache/common-services/SPARK/1.2.0.2.2/package/scripts/setup_spark.py", line 44, in setup_spark
mode=0775
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 390, in action_create_on_execute
self.action_delayed("create")
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 387, in action_delayed
self.get_hdfs_resource_executor().action_delayed(action_name, self)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 246, in action_delayed
self._create_resource()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in _create_resource
self._create_directory(self.main_resource.resource.target)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 280, in _create_directory
self.util.run_command(target, 'MKDIRS', method='PUT')
File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 201, in run_command
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT "http://serverName:50070/webhdfs/v1/user/spark?op=MKDIRS&user.name=hdfs"' returned status_code=403.

Well, this issue is relatively simple.

Check your NameNode status; most of the time this issue occurs because the NameNode is in safe mode.

You can either wait for the NameNode to leave safe mode or force it out. Generally, when the NameNode is configured in High Availability mode, it takes time to come out of safe mode on a cluster restart because of the number of existing blocks, and because block reports need to be sent to both the active and standby NameNodes.
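
To check the current state and, if necessary, force the NameNode out of safe mode (run as the hdfs user):

sudo -u hdfs hdfs dfsadmin -safemode get
sudo -u hdfs hdfs dfsadmin -safemode leave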

Once the NameNode is out of safe mode, retry the install process for the Spark component; it should go through smoothly.

Cheers.

Configuring a single-node Storm cluster

The following steps are for Ubuntu 12.04 LTS or 14.04 LTS. Storm has a lot of moving parts as of now, and the easiest configuration happens on Ubuntu. I tried configuring it on CentOS and found it quite challenging; before trying a Storm configuration on CentOS, I suggest you first try it on Ubuntu.

——————————- Prerequisites —————————-

  1. Make sure your Ubuntu is updated. You can update it using: $ sudo apt-get update
  2. Install your favorite JDK. For example: $ sudo apt-get install openjdk-6-jdk

————————– Other required tools —————————–

$ sudo apt-get install git -y

$ sudo apt-get install libtool -y

$ sudo apt-get install automake -y

$ sudo apt-get install uuid-dev

$ sudo apt-get install g++ -y

$ sudo apt-get install gcc-multilib -y

————————- ZooKeeper ————————-

ZooKeeper provides a service for maintaining centralized information in a distributed environment using a small set of primitives and group services. Storm uses ZooKeeper primarily to coordinate state information such as task assignments, worker status, and topology metrics between the nimbus and the supervisors in a cluster.

1) Get ZooKeeper. Download the ZooKeeper setup (the latest at the time of writing is 3.4.6). You can download it from a browser or with wget:
$ wget http://www.eng.lsu.edu/mirrors/apache/zookeeper/stable/zookeeper-3.4.6.tar.gz

2) Extract the tarball:
$ tar -xvf zookeeper-3.4.6.tar.gz

3) Rename the extracted ZooKeeper directory:
$ mv zookeeper-3.4.6 zookeeper

4) Optionally:
a) Add ZOOKEEPER_HOME under .bashrc
b) Add ZOOKEEPER_HOME/bin to the PATH variable

5) Create a data directory in your favorite place:
$ mkdir zookeeper-data/

6) Create a configuration file under the ZOOKEEPER_HOME/conf/ directory, say zoo.cfg.

7) Add the tickTime, dataDir, and clientPort properties to the zoo.cfg file.

8) Verify that you are able to start the ZooKeeper server:
$ zkServer.sh start

———————- ZeroMQ ———————————-

Storm internally uses ZeroMQ; in the current version it has to be installed explicitly. In future releases the Storm team plans to include this dependency as part of the Storm distribution.

1) Get ZeroMQ:
$ wget http://download.zeromq.org/zeromq-2.1.7.tar.gz

2) Untar the tarball:
$ tar -xvf zeromq-2.1.7.tar.gz

————- Configuring ZeroMQ ——————-

1) $ cd zeromq-2.1.7
2) $ ./configure
3) $ make
4) $ sudo make install

————————– Java bindings for ZeroMQ —————————–

1) Get the Java binding for ZeroMQ:
$ git clone https://github.com/nathanmarz/jzmq.git
This will create a folder with the name jzmq.

Configuring jzmq:
1) $ cd jzmq
2) $ sed -i 's/classdist_noinst.stamp/classnoinst.stamp/g' src/Makefile.am
3) $ ./autogen.sh
4) $ ./configure
5) $ make
6) $ sudo make install

————————————– Configuring Storm ————————————

1) Download the Storm binaries:
$ wget http://mirror.tcpdiag.net/apache/incubator/storm/apache-storm-0.9.1-incubating/apache-storm-0.9.1-incubating.tar.gz

2) Untar the tarball:
$ tar -xvf apache-storm-0.9.1-incubating.tar.gz

3) Rename the extracted directory to storm:
$ mv apache-storm-0.9.1-incubating storm

4) Optionally: add STORM_HOME in the .bashrc file and add STORM_HOME/bin to the PATH.

5) Add a data directory for Storm to store its temporary data and topology jars. I am creating it under $STORM_HOME:
$ cd $STORM_HOME
$ mkdir data

6) Edit the $STORM_HOME/conf/storm.yaml file and set the values of the various parameters. This is very important, and it is the major part of the Storm configuration.

————- Start the daemons and verify that the installation is successful ———————

Note: If you have not added STORM_HOME/bin to the PATH, you will need to go to the STORM_HOME/bin directory and issue the commands on the terminal from there.
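
Before starting the daemons, for reference: the original screenshots of the two configuration files are gone, so here are minimal illustrative versions; the paths, hostnames, and port values are assumptions, adjust them to your setup.

A minimal zoo.cfg:

tickTime=2000
dataDir=/home/user/zookeeper-data
clientPort=2181

A minimal single-node storm.yaml:

storm.zookeeper.servers:
    - "127.0.0.1"
nimbus.host: "127.0.0.1"
storm.local.dir: "/home/user/storm/data"
ui.port: 8080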


1) Start Nimbus:
$ storm nimbus

2) Start the Supervisor (do not close the previous terminal; open another terminal window and type the following):
$ storm supervisor

3) Start the UI (open a new terminal, change the directory to storm, and start the UI; don't close the previous terminals):
$ storm ui

4) Check the UI (hit the URL formed from the machine's IP address and the UI port defined in the storm.yaml file, e.g. http://<ip-address>:8080).

Troubleshooting Notes:

The two most commonly faced issues are:

1) Exception in thread "main" java.lang.RuntimeException: org.apache.thrift7.transport.TransportException: java.net.ConnectException: Connection refused

2) Nimbus starts, but stops after a few seconds.

The problem could be one of the following:

  1. Nimbus is not started correctly, or Nimbus stopped some time after starting. Check the nimbus logs for errors (see below).
  2. Nimbus or ZooKeeper is not correctly configured. Please check the storm.yaml file.
  3. The dataDir defined in zoo.cfg does not exist or has permission issues.
  4. The storm.local.dir defined in storm.yaml does not exist or has permission issues.
  5. There could be connectivity issues between the machines. Please check your network settings.
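
For point 1, in this Storm version the logs live under the installation directory, so you can watch nimbus while it starts:

$ tail -f $STORM_HOME/logs/nimbus.log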

Happy STORMing!

Recently added data node is not coming up

  • The namespaceID of the master (NameNode) and all slaves (DataNodes) should match.
  • If any of the machines has a mismatch, you will get an "Incompatible namespaceID" error while starting the cluster.
  • Due to the mismatch, the DataNode process will not start.
  • Whenever a DataNode/NameNode does not start, check for this issue (generally it happens when you add an additional node to the cluster).
  • Every time the NameNode is formatted, a new namespace ID is allocated to each of the machines, so formatting is not an option!

Solution:

Open the VERSION file under <hadoop-temp-dir>/dfs/name/current/VERSION on the NameNode and note the namespaceID property. Then check whether it is the same as the one in <hadoop-temp-dir>/dfs/data/current/VERSION on the DataNode. If not, replace the one in the DataNode's VERSION file with the NameNode's value.
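
A quick way to compare the two IDs, using the same placeholder paths as above (substitute your actual dfs.name.dir / dfs.data.dir locations):

grep namespaceID <hadoop-temp-dir>/dfs/name/current/VERSION   # on the NameNode
grep namespaceID <hadoop-temp-dir>/dfs/data/current/VERSION   # on the DataNode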