Input path does not exist: file:…/pigsample_1406502801_1378470046724

Hi guys,

Once again, here is an issue that is very specific to Cygwin + Pig.

On Cygwin you may see "Input path does not exist: <some path>/pigsample_<some number>" when a script uses an ORDER BY clause. It took me some time to figure out that the ORDER BY was the cause.

The stack trace usually looks like this:

2013-09-06 17:50:52,110 [Thread-118] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0008
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724

Solution:

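Pig runs a top-level ORDER BY as a sampling job, which writes the pigsample_* file, followed by the sort job, whose WeightedRangePartitioner reads that file back in setConf (the top frame in the trace above). On Cygwin that lookup fails. The workaround is to avoid the top-level ORDER BY and sort inside a nested FOREACH instead: a nested ORDER sorts the bag within the FOREACH and does not use the sampler, so no pigsample file is needed. For example: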

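-- A1 is assumed to be a grouping of A0 (e.g. A1 = GROUP A0 ALL;), so A0 is a bag field of A1
A2 = foreach A1 {
    A3 = ORDER A0 by fieldName;
    -- generate from A3 (e.g. FLATTEN(A3)) if you want the sorted records in the output
    GENERATE $0, $1 …;
}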
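For reference, here is a complete script applying the same workaround end to end. It is a minimal sketch, not the original job: the input file input.txt, its tab-separated (name, score) schema, the sort field score and the output directory output_sorted are all hypothetical placeholders.

```pig
-- Hypothetical input: tab-separated file with (name, score) columns
A0 = LOAD 'input.txt' USING PigStorage('\t') AS (name:chararray, score:int);

-- Put every record into a single bag so it can be sorted inside one FOREACH
A1 = GROUP A0 ALL;

-- Nested ORDER sorts the bag within the FOREACH; no top-level ORDER BY,
-- so no sampling job runs and no pigsample_* file has to be read back
A2 = FOREACH A1 {
    A3 = ORDER A0 BY score DESC;
    GENERATE FLATTEN(A3);
}

-- Hypothetical output directory
STORE A2 INTO 'output_sorted' USING PigStorage('\t');
```

You can run it in local mode with something like pig -x local sort_workaround.pig (the script file name is also a placeholder). Note the trade-off: GROUP ALL sends every record to a single reducer, so this replaces the parallel, range-partitioned sort of a top-level ORDER BY with one sort in one task. That is usually fine for the small data sets typically processed in local mode on Cygwin, but it can become a bottleneck on large inputs.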
