【Preface】Since Hadoop 2.x the Eclipse Hadoop plugin can no longer be used, so writing a Hadoop project, and in particular running a MapReduce job, differs slightly from Hadoop 1.x. This is a brief record of the process.
Note: the MapReduce job is packaged and run as a jar here.
Typical example: WordCount
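For illustration, suppose /mrDemo/input/words.txt contains the two lines

hello world
hello hadoop

Then the finished job writes the following totals to the output directory (the word and its count are separated by a tab):

hadoop	1
hello	2
world	1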
【1】Write WCMapper
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // accept one line of text (the key is the byte offset of the line)
        String line = value.toString();
        // split the line into words on spaces
        String[] words = line.split(" ");
        // loop over the words
        for (String w : words) {
            // send each word with a count of 1
            context.write(new Text(w), new LongWritable(1));
        }
    }
}
【2】Write WCReducer
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        // define a counter for this word
        long counter = 0;
        // loop over all counts emitted for the word and sum them
        for (LongWritable l : values) {
            counter += l.get();
        }
        // write the word and its total count
        context.write(key, new LongWritable(counter));
    }
}
【3】Write WordCount (the driver class)
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());

        // notice: lets Hadoop locate the jar that contains these classes
        job.setJarByClass(WordCount.class);

        // set the mapper's properties
        job.setMapperClass(WCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        FileInputFormat.setInputPaths(job, new Path("/mrDemo/input/words.txt"));

        // set the reducer's properties
        job.setReducerClass(WCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path("/mrDemo/output/wcCount"));

        // submit the job and wait for it to complete
        job.waitForCompletion(true);
    }
}
【4】Export the jar
Steps:
Right-click the project > Export > Java > JAR file > Next > enter the jar file name > Next > Next > select the class containing the main method (WordCount) as the entry point. A command-line alternative is sketched below.
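If you prefer the command line to the Eclipse wizard, the jar can also be built with the JDK tools; the bin/ output directory and the wc.jar name below are only assumptions about the local layout:

mkdir -p bin
# compile against the Hadoop client libraries reported by `hadoop classpath`
javac -cp $(hadoop classpath) -d bin WCMapper.java WCReducer.java WordCount.java
# package the classes and record WordCount as the manifest entry point
jar cvfe wc.jar WordCount -C bin .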
【5】Upload the jar and the input file to Linux
Note: the file to be analyzed must also be uploaded to HDFS, and its HDFS path must match the input path used in the code (/mrDemo/input/words.txt), for example as shown below.
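A minimal sketch of the upload, assuming words.txt sits in the current Linux directory:

hadoop fs -mkdir -p /mrDemo/input
hadoop fs -put words.txt /mrDemo/input/words.txt

Also make sure the output directory /mrDemo/output/wcCount does not exist yet; FileOutputFormat refuses to overwrite an existing directory and the job will fail at submission.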
【6】Run the jar with hadoop
hadoop jar /path/to/the/jar
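For example, assuming the jar was copied to /home/hadoop/wc.jar (a hypothetical location) and WordCount was selected as the entry point when exporting:

hadoop jar /home/hadoop/wc.jar

If no main class was recorded in the jar's manifest, pass it explicitly: hadoop jar /home/hadoop/wc.jar WordCount. Once the job finishes, the result can be inspected with

hadoop fs -cat /mrDemo/output/wcCount/part-r-00000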