Recently added data node is not coming up

  • The namespaceID of the master (name node) and all slaves (data nodes) should match.
  • If any of the machines has a mismatch, you will get an error while starting the cluster: Incompatible namespaceID.
  • Due to the mismatch, the data node process will not start.
  • Whenever a data node/name node does not start, check for this issue (it generally happens when you add an additional node to the cluster).
  • Every time the NameNode is formatted, a new namespace ID is allocated to each of the machines, so formatting is not an option!

Solution:

Open the VERSION file under <hadoop temp dir>/dfs/name/current/VERSION on the name node and note the value of the namespaceID property. Check whether it matches the namespaceID in <hadoop temp dir>/dfs/data/current/VERSION on the data node. If not, replace the one in the data node's VERSION file with the name node's value.
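
A minimal shell sketch of the check and fix, assuming the default hadoop.tmp.dir of /tmp/hadoop-$USER (adjust the paths to your configuration; <namenode-id> is a placeholder for the value you copy):

# on the name node: read the authoritative namespace ID
grep namespaceID /tmp/hadoop-$USER/dfs/name/current/VERSION

# on the problem data node: compare its ID
grep namespaceID /tmp/hadoop-$USER/dfs/data/current/VERSION

# if they differ, stop the data node and overwrite its ID with the
# name node's value before restarting it
sed -i 's/^namespaceID=.*/namespaceID=<namenode-id>/' /tmp/hadoop-$USER/dfs/data/current/VERSION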


Input path does not exist: file:……………………………./pigsample_1406502801_1378470046724

Hi guys,

One more issue, and this one is very specific to Cygwin + PIG.

You may see “Input path does not exist: <some path>/pigsample_<some number>” on Cygwin when using the ORDER BY clause. It took me some time to figure out that it was due to the ORDER BY clause.

Commonly you will see a stacktrace like this:

2013-09-06 17:50:52,110 [Thread-118] WARN org.apache.hadoop.mapred.LocalJobRunner – job_local_0008
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724

at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724

Solution:

Instead of a top-level ORDER BY, you can sort inside a nested FOREACH block:

A2 = FOREACH A1 {
    -- assuming A1 came from something like A1 = GROUP A0 BY someKey;
    -- here A0 refers to the inner bag of each group
    A3 = ORDER A0 BY fieldName;
    GENERATE $0, $1…….  -- GENERATE A3 wherever you need the sorted bag
}
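
As far as I can tell, this works because a top-level ORDER BY makes Pig run an extra sampling job whose output (the pigsample_* file handed to the WeightedRangePartitioner in the stacktrace above) trips over path handling on Cygwin, while a nested ORDER inside a FOREACH sorts each bag within a single task, so no pigsample file is created at all.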

Hive and PIG/Grunt shell hangs on Cygwin

Again, this is a typical issue related to Cygwin.

Scenario:

I am running Hadoop in local mode on my Windows 7 machine (32/64 bit).

I've installed Hive/PIG/Hadoop/Java 6 all on the C: drive.

I am using Cygwin version 2.819 (the current latest).
I've mounted C: on Cygwin.
I am able to run hadoop commands from the Cygwin terminal, for example fs -ls, etc.
I am also able to start the grunt and hive shells.

But the real problem is:

For any command I enter on the grunt shell (example: fs -ls or records = LOAD.....), I do not see any output; it just hangs. Similarly, on the hive prompt, if I give a command such as show tables; I see no output, just a cursor that keeps on blinking! It accepts keyboard input and gives NOTHING. The system appears to be doing NOTHING.

To me, everything looks fine. All the environment variables are set correctly. I am not sure what is going wrong here!

Wow!!! I spent hours fixing it!

The issue is with the Cygwin-created icon on the desktop, i.e. the shortcut.

If you right-click the icon -> Properties, you will see something like this in the Target field:

<cygwin_home>\bin\mintty.exe -i /Cygwin-Terminal.ico -

Just point it to

<cygwin_home>\Cygwin.bat -i /Cygwin-Terminal.ico -

Alternatively, you can go to <cygwin_home> and start Cygwin.bat from the command prompt.

Cheers!

NoSuchMethodError while using joda-time-2.2.jar in PIG

Hi Guys,

I spent many hours solving this before I found the solution.

In the UDF we are using some Joda-Time APIs. The issue is that the job fails while running, even though there are no compilation issues in Eclipse. You might face this issue in Eclipse as well, because pig-<version>.jar also bundles the joda package. Just put joda-time-<version>.jar first in the classpath (before pig.jar) and your issue will be fixed.

I was trying to run it on Cygwin on a Windows machine, but the same issue can be seen on a Linux box as well.

A common stacktrace you might see:

2013-08-11 13:01:06,911 [Thread-9] WARN org.apache.hadoop.mapred.LocalJobRunner – job_local_0001

java.lang.NoSuchMethodError: org.joda.time.DateTime.now(Lorg/joda/time/DateTimeZone;)Lorg/joda/time/DateTime;
at com.myproject.pig.udf.ExtractDataByDates.exec(ExtractDataByDates.java:178)
at com.myproject.pig.udf.ExtractDataByDates.exec(ExtractDataByDates.java:12)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:337)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:381)

I tried many options:

1) Registering the joda jar in the PIG script using a REGISTER call. (Did not work)

2) Using -Dpig.additional.jars=/path/to/joda-time/jar (Did not work)

3) Setting the jar in $HADOOP_CLASSPATH (Did not work)

4) Setting the jar in $CLASSPATH (Did not work)

5) Setting the jar in $PIG_CLASSPATH (It works!)

export PIG_CLASSPATH=$PIG_CLASSPATH:/path/to/joda-time-2.2.jar
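
For completeness, the run then looks roughly like this (the script name is a placeholder, not from the original job):

export PIG_CLASSPATH=$PIG_CLASSPATH:/path/to/joda-time-2.2.jar
pig -x local your_script.pig    # your_script.pig is hypothetical; use your own script

If the error still shows up, it may be worth prepending the jar instead (/path/to/joda-time-2.2.jar:$PIG_CLASSPATH) so that it wins over the joda classes bundled inside pig.jar, mirroring the Eclipse note above.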

Cannot locate pig.jar. do ‘ant jar’, and try again

Hi folks,

I was trying to set up PIG on my gateway machine which has Windows 7 installed on it.

This issue is very specific to Cygwin.

After breaking my head for a couple of hours, I found the solution.

The solution is very simple.

Just rename your jar from “pig-0.10.1-withouthadoop.jar” to “pig-withouthadoop.jar”.
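
On the shell, that is just (assuming the 0.10.1 file name above; I use cp so the original jar is kept):

cd /path/to/pig-0.10.1    # your Pig installation directory
cp pig-0.10.1-withouthadoop.jar pig-withouthadoop.jar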

Namenode doesn’t start after upgrading Hadoop version

I have copied all the files correctly and all my jars are in place. I've set all the important properties correctly (namenode address, etc.). I have formatted HDFS. Now I am trying to start the cluster.

If you are running a cluster on version 0.20 or earlier and you upgrade to 1.0.4 or above, you restart the cluster with start-all.sh. All the daemons should have started on their respective machines (masters and slaves). But my namenode is not starting…

This is due to file system layout changes in HDFS itself. If you check the log, you will see:

File system image contains an old layout version -18. An upgrade to version -32 is required.

Solution:

Very simple:

No need to stop the other daemons (you can stop them if you want, but it is not required).

Use start-dfs.sh -upgrade (here the -upgrade flag is mandatory).

Using jps, you can see that the namenode is now running too!
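
The whole sequence as a quick sketch (the finalize step is the standard Hadoop 1.x follow-up; run it only once you are sure you will not need to roll back):

start-dfs.sh -upgrade              # upgrades the on-disk layout, then starts HDFS
jps                                # NameNode should now appear in the list
hadoop dfsadmin -finalizeUpgrade   # optional: finalize once you trust the upgrade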

Common problem while copying data from source to HDFS using Flume

Flume java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType

Scenario: I wanted to copy the logs from the source to HDFS. The HDFS daemons are up and running on the cluster. I've pointed the sink to HDFS, but when I try to start the agent, it does not start. On checking the log files I see a stacktrace like this:

Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.SequenceFile$CompressionType
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)

It is very clear that Flume is not able to find the expected class on the classpath, hence the solution:
Copy your hadoop-core-xyz.jar to the $FLUME_HOME/lib directory.
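
A minimal sketch, assuming $HADOOP_HOME and $FLUME_HOME are set (the exact jar name depends on your Hadoop version):

cp $HADOOP_HOME/hadoop-core-*.jar $FLUME_HOME/lib/    # put the Hadoop core classes on Flume's classpath
# restart the Flume agent afterwards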

Note: If you are running your Hadoop cluster on a 0.20 version, copying this file will solve the ClassNotFoundException, but you will end up getting authentication errors. Try using the 1.0.x stable versions.