Hi guys,
Here is one more issue that is very specific to cygwin + Pig.
When you use an ORDER BY clause on cygwin, you may see the error "Input path does not exist: <some path>/pigsample_<some number>". It took me some time to figure out that the ORDER BY clause was the cause.
A typical stack trace looks like this:
2013-09-06 17:50:52,110 [Thread-118] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0008
java.lang.RuntimeException: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.setConf(WeightedRangePartitioner.java:157)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:677)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/E:/<directory from grunt started>/pigsample_1406502801_1378470046724
Solution:
You can do the sort with a nested ORDER inside a FOREACH block, like this:
A2 = FOREACH A1 {
    A3 = ORDER A0 BY fieldName;
    GENERATE $0, $1 ...;
};
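To show where the nested ORDER fits in a whole script, here is a minimal sketch. The input file name, the delimiter and the schema are hypothetical and only there for illustration. It assumes A1 comes from a GROUP ... ALL over A0, so that A0 is available as a bag inside the FOREACH block. My reading of the stack trace is that the top-level ORDER BY goes through WeightedRangePartitioner, which looks for the pigsample_ file, while a nested ORDER sorts the bag directly and should not need that file; treat that explanation as an assumption rather than a confirmed fact.

```pig
-- Hypothetical input: a comma-separated file with two columns.
A0 = LOAD 'input.txt' USING PigStorage(',') AS (id:int, fieldName:chararray);

-- Put every row into a single group so that A0 becomes a bag inside A1.
A1 = GROUP A0 ALL;

-- Sort the bag inside the nested block instead of using a top-level ORDER BY.
A2 = FOREACH A1 {
    A3 = ORDER A0 BY fieldName;
    -- FLATTEN turns the sorted bag back into one row per tuple.
    GENERATE FLATTEN(A3);
};

DUMP A2;
```

Note that GROUP ... ALL sends all rows to one reducer, so this is probably only reasonable for the small data sets you would run in local mode on cygwin.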