Reposted

[Source] Learning Hadoop from Scratch (08): The First MapReduce

Contents

Preface
Data preparation
wordcount
Yarn
Creating a new MapReduce
Sample download
Series index

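Copyright of this article is shared by mephisto and cnblogs (博客园). Reposting is welcome, provided this notice is retained and a link to the original article is given. Thank you.

This article was written by mephisto. SourceLink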

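Preface

In the previous article we got the Eclipse plugin working, so now we can start our MapReduce journey.

Here we first run the official wordcount example and then build an example of our own by hand, which makes it easier to understand what a Job is.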

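Data preparation

1: Overview

The wordcount class counts how many times each distinct word occurs, so we need to prepare some input data. It does not need to be large; this is only an experiment.

2: Creating the data

Open a text editor and type in a variety of words, some repeated and some not. Save the file as words_01.txt.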

3: Uploading

Open Eclipse and, under DFS Locations, upload the prepared data file to /tmp/input.

Our data is now ready.

wordcount

1: The official example

wordcount is one of Hadoop's official examples and is packaged in hadoop-mapreduce-examples-<ver>.jar.

Documentation for version 2.7.1: http://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html

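2: Finding the example

Search the file system for the examples jar. The pattern is quoted so that the shell passes it to find unchanged. The results show the jar in two places, so we simply pick the nearer one.

find / -name '*hadoop-mapreduce-examples*'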

4: Entering the directory

We use the copy of the example under /usr/hdp/.

cd /usr/hdp/2.3.0.0-2557/hadoop-mapreduce

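5: Running

We first run the example with the hadoop jar command.

Usage: hadoop jar <jar file> <program name> <input file/directory> <output directory>

# Switch user
su hdfs
# Run the job
hadoop jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_01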

[Screenshot: command output]

[Screenshot: result in the Eclipse plugin]

[Screenshot: result on the job page]

With that, our first job has run to completion.

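Yarn

1: Introduction

Hadoop 2.x differs from Hadoop 1.x in two major, fundamental ways.

One is the fix for the NameNode single point of failure; the other is the introduction of Yarn. We will not go into detail here; later chapters will cover both.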

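2: The yarn command

If you look closely at the output of the hadoop jar command above, you will see a warning. It recommends launching Yarn applications with yarn jar instead, so we run the same job again that way, writing to a new output directory:

yarn jar hadoop-mapreduce-examples-2.7.1.2.3.0.0-2557.jar wordcount /tmp/input/words_01.txt /tmp/output/1007_02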

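Creating a new MapReduce

1: Creating the project with the plugin

We will not repeat the details here. In the previous article we created a project with the plugin, and we reuse that project, "com.first".

2: Creating the WordCountEx class

This is our own wordcount class. It is modeled on the official example, with a few small changes to make it easier to follow.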

[Screenshot: creating the class]

[Screenshot: after completion]

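3: Creating the Mapper

Inside the WordCountEx class, create a nested class named MyMapper.

As a small customization, it skips every word shorter than 5 characters, which makes the program's effect easy to see when comparing results.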

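// Imports needed at the top of WordCountEx.java for this class:
// import java.io.IOException;
// import java.util.StringTokenizer;
// import org.apache.hadoop.io.IntWritable;
// import org.apache.hadoop.io.Text;
// import org.apache.hadoop.mapreduce.Mapper;

static class MyMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into whitespace-separated tokens
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            // Skip words shorter than 5 characters
            String tmp = itr.nextToken();
            if (tmp.length() < 5) {
                continue;
            }
            word.set(tmp);
            context.write(word, one);
        }
    }
}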
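To see what the filter does without a cluster, the tokenizing and length check can be tried in plain Java. This is only a sketch: MapStepSketch and mapLine are hypothetical names, no Hadoop classes are involved, and the sample line is made up. It returns the words that MyMapper would emit, each of which would be paired with a count of 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Plain-Java stand-in for the map step (hypothetical helper, not part of the project).
class MapStepSketch {

    // Returns the tokens of one input line that pass the length filter,
    // in the order they would be emitted as (word, 1) pairs.
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            if (token.length() < 5) {
                continue; // same rule as MyMapper: drop words shorter than 5 characters
            }
            emitted.add(token);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Made-up sample line; "yarn" and "hi" are too short and are dropped.
        System.out.println(mapLine("hadoop yarn spark hello hi hadoop"));
    }
}
```

Note that a repeated word is emitted once per occurrence; the counting itself happens later, in the reducer.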

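4: Creating the Reducer

As with the mapper, this is a nested class with small customizations: it multiplies each count coming from the map step by 2 and adds a prefix to the output key.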

// Additional import needed at the top of WordCountEx.java:
// import org.apache.hadoop.mapreduce.Reducer;

static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();
    private Text keyEx = new Text();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            // Scale up the map result: each value counts double
            sum += val.get() * 2;
        }
        result.set(sum);
        // Custom output key: the literal prefix "输出:" means "output:"
        keyEx.set("输出:" + key.toString());
        context.write(keyEx, result);
    }
}
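The arithmetic of the reducer can be checked the same way in plain Java. This is a sketch under assumptions: ReduceStepSketch, reduceValues and outputKey are hypothetical names, and the prefix is written with Unicode escapes only so that the file compiles regardless of source encoding.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java stand-in for the reduce step (hypothetical helper, no Hadoop classes).
class ReduceStepSketch {

    // The same two-character prefix plus colon that MyReduce prepends to each key.
    static final String PREFIX = "\u8f93\u51fa:";

    // Mirrors MyReduce: every incoming value is doubled before it is summed.
    static int reduceValues(List<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val * 2;
        }
        return sum;
    }

    // Builds the output key by prepending the prefix.
    static String outputKey(String key) {
        return PREFIX + key;
    }

    public static void main(String[] args) {
        // A word seen three times arrives as the values [1, 1, 1].
        List<Integer> values = Arrays.asList(1, 1, 1);
        System.out.println(outputKey("hadoop") + "\t" + reduceValues(values));
    }
}
```

A word that occurs three times in the input therefore appears in the output with a count of 6, not 3.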

5: Creating main

In the main method we define a job and configure it.

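// Additional imports needed at the top of WordCountEx.java:
// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.fs.Path;
// import org.apache.hadoop.mapreduce.Job;
// import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
// import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws Exception {
    // Configuration
    Configuration conf = new Configuration();
    // Job name
    Job job = Job.getInstance(conf, "mywordcount");
    job.setJarByClass(WordCountEx.class);
    job.setMapperClass(MyMapper.class);
    // Combiner from the official example, left disabled here
    // job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(MyReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input and output paths: args[0] is the input, args[1] the output directory
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Wait for the job to finish; exit with 0 on success, 1 on failure
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}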
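The job configuration only names the mapper and the reducer; the step between them is done by the framework, which sorts the mapper output by key and hands each key to the reducer together with all of its values. A rough in-memory picture of that grouping, assuming the emitted words are already collected in a list (ShuffleSketch and groupByKey are hypothetical names, and a real job does this across machines rather than in one map):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of grouping mapper output by key (hypothetical helper).
class ShuffleSketch {

    // Each emitted word stands for a (word, 1) pair. The result maps every
    // distinct word to the list of 1s emitted for it, with keys in sorted order.
    static Map<String, List<Integer>> groupByKey(List<String> emittedWords) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String word : emittedWords) {
            grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Made-up mapper output for illustration.
        List<String> emitted = Arrays.asList("hadoop", "spark", "hadoop");
        System.out.println(groupByKey(emitted));
    }
}
```

Each entry of the resulting map corresponds to one call of the reduce method: the key is the word and the list is the Iterable of values.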

6: Exporting the jar

Export the finished code as a jar named com.first.jar.

7: Copying to Linux

Copy the exported jar to /var/tmp on host H31.

cd /var/tmp
ls

8: Running

Look closely at the command and its result and you will notice a difference.

yarn jar com.first.jar /tmp/input/words_01.txt /tmp/output/1007_03

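If you looked closely, you will have noticed that the wordcount argument is gone. Why? Because the main class was specified when the jar was exported. It is recorded in the jar's manifest, so the launcher already knows which class to run.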
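The entry point chosen at export time is stored as the Main-Class attribute of the jar's manifest. The sketch below reads that attribute with the JDK's java.util.jar.Manifest class. The manifest text is an assumption about what such an export roughly looks like, and ManifestSketch and mainClassOf are hypothetical names; only the class name com.first.WordCountEx comes from this article.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Reads the Main-Class attribute from manifest text (hypothetical helper).
class ManifestSketch {

    // Returns the Main-Class value, or null when the manifest does not define one.
    static String mainClassOf(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                    new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed manifest contents for a jar exported with and without an entry point.
        String withMain = "Manifest-Version: 1.0\nMain-Class: com.first.WordCountEx\n\n";
        String withoutMain = "Manifest-Version: 1.0\n\n";
        System.out.println("with entry point:    " + mainClassOf(withMain));
        System.out.println("without entry point: " + mainClassOf(withoutMain));
    }
}
```

When the attribute is missing, the class to run has to be supplied on the command line instead, as its fully qualified name.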

9: Exporting a jar without a main entry point

This time we export the jar without specifying the main entry point.

10: Running, second attempt

Now the command needs one more argument: the entry point, given as the fully qualified class name.

yarn jar com.first.jar com.first.WordCountEx /tmp/input/words_01.txt /tmp/output/1007_04

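11: Results

Looking at the output, words shorter than 5 characters have clearly been excluded, and every count has been multiplied by 2. Do not worry about the garbled prefix; viewing the output with a different encoding fixes it.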
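Hadoop's Text type stores strings as UTF-8, so one plausible cause of the garbled prefix is a viewer that decodes the output file with some other charset. The sketch below reproduces that effect in plain Java. It is an illustration only: EncodingSketch and its methods are hypothetical names, ISO-8859-1 is chosen merely as an example of a wrong charset, and the prefix is written with Unicode escapes so the source file itself is encoding-independent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Shows how UTF-8 bytes look when decoded with the wrong charset (hypothetical helper).
class EncodingSketch {

    // Encodes text the way it would be written to the output file.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes the way a viewer configured with the given charset would.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String prefix = "\u8f93\u51fa:"; // the key prefix used in MyReduce
        byte[] stored = utf8Bytes(prefix);

        String readAsUtf8 = decode(stored, StandardCharsets.UTF_8);
        String readAsLatin1 = decode(stored, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip intact:   " + readAsUtf8.equals(prefix));
        System.out.println("ISO-8859-1 decode intact:  " + readAsLatin1.equals(prefix));
        System.out.println("characters after bad decode: " + readAsLatin1.length());
    }
}
```

The bytes in the file are the same in both cases; only the charset used for reading decides whether the prefix is legible.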

--------------------------------------------------------------------

That concludes this chapter.

Sample download

GitHub: https://github.com/sinodzh/HadoopExample/tree/master/2015/com.first

Series index

[Source] Learning Hadoop from Scratch: series index
