Hadoop official website: http://hadoop.apache.org/
How the official site describes Hadoop
What Is Apache Hadoop?
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
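The idea of handling failures at the application layer can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the node names, `NodeFailure`, `run_on_node`, and `run_with_failover` are all hypothetical and are not Hadoop APIs. It shows a task being retried on the next machine when the first one is down.

```python
# Toy simulation only: none of these names are Hadoop APIs.

class NodeFailure(Exception):
    """Raised when a simulated node cannot run a task."""


def run_on_node(node, healthy_nodes, task, chunk):
    # A node outside the healthy set stands in for a crashed machine.
    if node not in healthy_nodes:
        raise NodeFailure(node)
    return task(chunk)


def run_with_failover(nodes, healthy_nodes, task, chunk):
    # Try each candidate node in order; move on to the next one on failure.
    for node in nodes:
        try:
            return node, run_on_node(node, healthy_nodes, task, chunk)
        except NodeFailure:
            continue
    raise RuntimeError("no healthy node could run the task")


nodes = ["node-1", "node-2", "node-3"]
healthy = {"node-2", "node-3"}  # node-1 is simulated as down
used_node, total = run_with_failover(nodes, healthy, sum, [1, 2, 3, 4])
print(used_node, total)
```

Here the software, not the hardware, notices that node-1 is unavailable and reruns the work elsewhere, which is the general principle the paragraph above describes.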
The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
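To make the MapReduce programming model more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop at all; the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are assumptions chosen for illustration, and a real job would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all values by key, the step that sits between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine the values collected for one key into a single result.
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can in principle be spread over many machines, which is what makes the model suit large data sets.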
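What can Hadoop do?
Build large-scale data warehouses and handle the storage, processing, analysis, and statistics of petabyte-scale data
Examples: search engines, log analysis, business intelligence, data mining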