Hadoop Installation

Prerequisites

JDK
Download Oracle JDK 1.8: jdk-8u251-linux-x64.tar.gz

Configure the JAVA_HOME, CLASSPATH, and PATH environment variables.
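A minimal sketch of that configuration, typically appended to /etc/profile or ~/.bashrc. The install location /opt/middleware/jdk1.8.0_251 is an assumption (the post does not say where the JDK was extracted), so adjust it to the real path.

```shell
# Assumed location: the JDK tarball was extracted under /opt/middleware.
export JAVA_HOME=/opt/middleware/jdk1.8.0_251

# Traditional JDK 8 classpath: current directory plus the bundled tool jars.
export CLASSPATH=".:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar"

# Put the JDK's java and javac ahead of any system-wide versions.
export PATH="$JAVA_HOME/bin:$PATH"
```

After reloading the profile, `java -version` should report a 1.8.0_251 runtime if the path is correct.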

Installing Hadoop

Download Hadoop

wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz

Extract the archive to /opt/middleware.

String search (grep) example
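The extraction step could look like the sketch below. It assumes the tarball sits in the current directory, and it also sets HADOOP_HOME, which the later commands rely on. Adding bin and sbin to PATH is an assumption made for convenience, not something the original steps spell out.

```shell
HADOOP_TARBALL="hadoop-3.2.1.tar.gz"   # file fetched by the wget command
INSTALL_DIR="/opt/middleware"          # target directory named in this post

# Extract only if the tarball is actually present.
if [ -f "$HADOOP_TARBALL" ]; then
  mkdir -p "$INSTALL_DIR"
  tar -xzf "$HADOOP_TARBALL" -C "$INSTALL_DIR"
fi

# The archive unpacks into a hadoop-3.2.1 directory.
export HADOOP_HOME="$INSTALL_DIR/hadoop-3.2.1"
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
```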

cd $HADOOP_HOME
mkdir input
cp etc/hadoop/*.xml input
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep input output 'dfs[a-z.]+'
cat output/* # view the results
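To get a feel for what the grep job computes, roughly the same result can be produced with standard tools: extract every match of the regular expression, count identical matches, and sort by count in descending order. The directory grep_demo_input and its sample file are hypothetical stand-ins for the copied configuration files.

```shell
# Hypothetical sample input standing in for the copied *.xml files.
mkdir -p grep_demo_input
printf '%s\n' \
  '<name>dfs.replication</name>' \
  '<name>dfs.namenode.name.dir</name>' \
  '<name>dfs.replication</name>' > grep_demo_input/sample.xml

# -o prints only the matched text, -h omits file names, -E enables '+'.
# Identical matches are counted, then ordered by count, highest first.
grep_result=$(grep -ohE 'dfs[a-z.]+' grep_demo_input/*.xml \
  | LC_ALL=C sort \
  | LC_ALL=C uniq -c \
  | sort -rn)

printf '%s\n' "$grep_result"
```

This is only a local approximation for small inputs. The Hadoop example does the same counting as MapReduce jobs, which matters once the input no longer fits on one machine.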

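output is the directory the results are written to. It must not exist before the job runs; if it does, the job fails with an error saying the output directory already exists.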
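When re-running the example, one way to avoid that error is to delete the stale directory first. The helper name clean_output_dir below is hypothetical, and note that it discards the previous results.

```shell
# Remove a stale job output directory so the next run can recreate it.
# Hypothetical helper; the argument is the path the job will write to.
clean_output_dir() {
  dir="$1"
  if [ -d "$dir" ]; then
    rm -r "$dir"
  fi
}

# Clear the directory used by the grep example before running it again.
clean_output_dir output
```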

[root@bogon hadoop-3.2.1]# ll output
total 4
-rw-r--r--. 1 root root 11 Jun 18 00:23 part-r-00000
-rw-r--r--. 1 root root  0 Jun 18 00:23 _SUCCESS

_SUCCESS is only a marker file; its presence indicates that the job completed successfully.

WordCount example
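A script that consumes the results can use the marker as a simple guard. The sketch below is an assumption about how one might do that; job_succeeded is a hypothetical helper name.

```shell
# Succeeds only if the given output directory contains the _SUCCESS marker.
job_succeeded() {
  [ -f "$1/_SUCCESS" ]
}

if job_succeeded output; then
  # Reducer output files are named part-r-NNNNN.
  cat output/part-r-*
else
  echo "no _SUCCESS marker in output; the job may have failed" >&2
fi
```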

[root@bogon hadoop-3.2.1]# mkdir -p study/wcinput
[root@bogon hadoop-3.2.1]# vim study/wcinput/wc.input

Enter the following content (any words will do):

The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

[root@bogon hadoop-3.2.1]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar wordcount study/wcinput/ study/wcoutput
2020-06-18 02:52:27,056 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-06-18 02:52:27,189 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-06-18 02:52:27,189 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2020-06-18 02:52:27,473 INFO input.FileInputFormat: Total input files to process : 1
2020-06-18 02:52:27,503 INFO mapreduce.JobSubmitter: number of splits:1
2020-06-18 02:52:27,621 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local329843332_0001
2020-06-18 02:52:27,621 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-06-18 02:52:27,720 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2020-06-18 02:52:27,721 INFO mapreduce.Job: Running job: job_local329843332_0001
2020-06-18 02:52:27,725 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2020-06-18 02:52:27,731 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-06-18 02:52:27,731 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-06-18 02:52:27,732 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2020-06-18 02:52:27,758 INFO mapred.LocalJobRunner: Waiting for map tasks
2020-06-18 02:52:27,759 INFO mapred.LocalJobRunner: Starting task: attempt_local329843332_0001_m_000000_0
2020-06-18 02:52:27,782 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-06-18 02:52:27,782 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-06-18 02:52:27,796 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-06-18 02:52:27,800 INFO mapred.MapTask: Processing split: file:/opt/middleware/hadoop-3.2.1/study/wcinput/wc.input:0+661
2020-06-18 02:52:27,826 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2020-06-18 02:52:27,826 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2020-06-18 02:52:27,826 INFO mapred.MapTask: soft limit at 83886080
2020-06-18 02:52:27,826 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2020-06-18 02:52:27,826 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2020-06-18 02:52:27,830 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2020-06-18 02:52:27,838 INFO mapred.LocalJobRunner: 
2020-06-18 02:52:27,838 INFO mapred.MapTask: Starting flush of map output
2020-06-18 02:52:27,838 INFO mapred.MapTask: Spilling map output
2020-06-18 02:52:27,838 INFO mapred.MapTask: bufstart = 0; bufend = 1057; bufvoid = 104857600
2020-06-18 02:52:27,838 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214004(104856016); length = 393/6553600
2020-06-18 02:52:27,853 INFO mapred.MapTask: Finished spill 0
2020-06-18 02:52:27,871 INFO mapred.Task: Task:attempt_local329843332_0001_m_000000_0 is done. And is in the process of committing
2020-06-18 02:52:27,874 INFO mapred.LocalJobRunner: map
2020-06-18 02:52:27,874 INFO mapred.Task: Task 'attempt_local329843332_0001_m_000000_0' done.
2020-06-18 02:52:27,880 INFO mapred.Task: Final Counters for attempt_local329843332_0001_m_000000_0: Counters: 18
    File System Counters
        FILE: Number of bytes read=317372
        FILE: Number of bytes written=837373
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=2
        Map output records=99
        Map output bytes=1057
        Map output materialized bytes=1014
        Input split bytes=121
        Combine input records=99
        Combine output records=75
        Spilled Records=75
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=7
        Total committed heap usage (bytes)=212860928
    File Input Format Counters 
        Bytes Read=661
2020-06-18 02:52:27,880 INFO mapred.LocalJobRunner: Finishing task: attempt_local329843332_0001_m_000000_0
2020-06-18 02:52:27,881 INFO mapred.LocalJobRunner: map task executor complete.
2020-06-18 02:52:27,886 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2020-06-18 02:52:27,887 INFO mapred.LocalJobRunner: Starting task: attempt_local329843332_0001_r_000000_0
2020-06-18 02:52:27,896 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2020-06-18 02:52:27,896 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2020-06-18 02:52:27,897 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2020-06-18 02:52:27,899 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2d8d2e2
2020-06-18 02:52:27,900 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2020-06-18 02:52:27,922 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=617296704, maxSingleShuffleLimit=154324176, mergeThreshold=407415840, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2020-06-18 02:52:27,926 INFO reduce.EventFetcher: attempt_local329843332_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2020-06-18 02:52:27,949 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local329843332_0001_m_000000_0 decomp: 1010 len: 1014 to MEMORY
2020-06-18 02:52:27,953 INFO reduce.InMemoryMapOutput: Read 1010 bytes from map-output for attempt_local329843332_0001_m_000000_0
2020-06-18 02:52:27,955 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 1010, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->1010
2020-06-18 02:52:27,957 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2020-06-18 02:52:27,959 INFO mapred.LocalJobRunner: 1 / 1 copied.
2020-06-18 02:52:27,959 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2020-06-18 02:52:27,965 INFO mapred.Merger: Merging 1 sorted segments
2020-06-18 02:52:27,965 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1001 bytes
2020-06-18 02:52:27,967 INFO reduce.MergeManagerImpl: Merged 1 segments, 1010 bytes to disk to satisfy reduce memory limit
2020-06-18 02:52:27,967 INFO reduce.MergeManagerImpl: Merging 1 files, 1014 bytes from disk
2020-06-18 02:52:27,967 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2020-06-18 02:52:27,968 INFO mapred.Merger: Merging 1 sorted segments
2020-06-18 02:52:27,977 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1001 bytes
2020-06-18 02:52:27,977 INFO mapred.LocalJobRunner: 1 / 1 copied.
2020-06-18 02:52:27,980 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2020-06-18 02:52:27,985 INFO mapred.Task: Task:attempt_local329843332_0001_r_000000_0 is done. And is in the process of committing
2020-06-18 02:52:27,985 INFO mapred.LocalJobRunner: 1 / 1 copied.
2020-06-18 02:52:27,986 INFO mapred.Task: Task attempt_local329843332_0001_r_000000_0 is allowed to commit now
2020-06-18 02:52:27,987 INFO output.FileOutputCommitter: Saved output of task 'attempt_local329843332_0001_r_000000_0' to file:/opt/middleware/hadoop-3.2.1/study/wcoutput
2020-06-18 02:52:27,988 INFO mapred.LocalJobRunner: reduce > reduce
2020-06-18 02:52:27,988 INFO mapred.Task: Task 'attempt_local329843332_0001_r_000000_0' done.
2020-06-18 02:52:27,989 INFO mapred.Task: Final Counters for attempt_local329843332_0001_r_000000_0: Counters: 24
    File System Counters
        FILE: Number of bytes read=319432
        FILE: Number of bytes written=839111
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Combine input records=0
        Combine output records=0
        Reduce input groups=75
        Reduce shuffle bytes=1014
        Reduce input records=75
        Reduce output records=75
        Spilled Records=75
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        Total committed heap usage (bytes)=212860928
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Output Format Counters 
        Bytes Written=724
2020-06-18 02:52:27,990 INFO mapred.LocalJobRunner: Finishing task: attempt_local329843332_0001_r_000000_0
2020-06-18 02:52:27,990 INFO mapred.LocalJobRunner: reduce task executor complete.
2020-06-18 02:52:28,725 INFO mapreduce.Job: Job job_local329843332_0001 running in uber mode : false
2020-06-18 02:52:28,728 INFO mapreduce.Job:  map 100% reduce 100%
2020-06-18 02:52:28,729 INFO mapreduce.Job: Job job_local329843332_0001 completed successfully
2020-06-18 02:52:28,735 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=636804
        FILE: Number of bytes written=1676484
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
    Map-Reduce Framework
        Map input records=2
        Map output records=99
        Map output bytes=1057
        Map output materialized bytes=1014
        Input split bytes=121
        Combine input records=99
        Combine output records=75
        Reduce input groups=75
        Reduce shuffle bytes=1014
        Reduce input records=75
        Reduce output records=75
        Spilled Records=150
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=7
        Total committed heap usage (bytes)=425721856
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=661
    File Output Format Counters 
        Bytes Written=724
[root@bogon hadoop-3.2.1]# cat study/wcoutput/*
Apache  1
Apache™ 1
Hadoop  1
Hadoop® 1
It  1
Rather  1
The 2
a   3
across  1
allows  1
and 2
application 1
at  1
be  1
cluster 1
clusters    1
computation 1
computers   1
computers,  1
computing.  1
data    1
deliver 1
delivering  1
designed    2
detect  1
develops    1
distributed 2
each    2
failures    1
failures.   1
for 2
framework   1
from    1
handle  1
hardware    1
high-availability,  1
highly-available    1
is  3
itself  1
large   1
layer,  1
library 2
local   1
machines,   1
may 1
models. 1
of  6
offering    1
on  2
open-source 1
processing  1
programming 1
project 1
prone   1
reliable,   1
rely    1
scalable,   1
scale   1
servers 1
service 1
sets    1
simple  1
single  1
so  1
software    2
storage.    1
than    1
that    1
the 3
thousands   1
to  5
top 1
up  1
using   1
which   1
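Judging from the output above, the example splits the input on whitespace only, so computers and computers, are counted as different words, and the keys are sorted with uppercase before lowercase. For a small file the same counts can be sketched with standard tools; wc_sample.txt and its contents are hypothetical.

```shell
# Hypothetical sample input; any text file works.
printf 'the cat and the hat\nThe end\n' > wc_sample.txt

# Put one whitespace-separated token per line, drop empty lines,
# then count identical tokens. LC_ALL=C sorts by byte value, which
# places uppercase before lowercase as in the job output.
wc_result=$(tr -s '[:space:]' '\n' < wc_sample.txt \
  | grep -v '^$' \
  | LC_ALL=C sort \
  | LC_ALL=C uniq -c \
  | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$wc_result"
```

Each output line is a word, a tab, and its count, mirroring the layout of the part-r-00000 file.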