Friday, December 17, 2010
Hadoop cluster at eBay
I am always curious to know how other companies install Hadoop clusters and how they use its ecosystem. Since Hadoop is still relatively new, there are no established best practices; every company implements what it thinks is the best infrastructure for its Hadoop cluster.
At the Hadoop NYC 2010 conference, eBay showcased their implementation of a production Hadoop cluster. Following are some tidbits on eBay's implementation of Hadoop.
- The JobTracker, NameNode, ZooKeeper, and HBase Master all run on enterprise nodes with Sun 64-bit architecture. They run Red Hat Linux with 72 GB of RAM and 4 TB of disk.
- There are 4000 datanodes, each running CentOS with 48 GB of RAM and 10 TB of disk space.
- Ganglia and Nagios are used for monitoring and alerting. eBay is also building a custom solution to augment them.
- ETL is done mostly with Java MapReduce programs.
- Pig is used to build data pipelines.
- Hive is used for ad hoc queries.
- Mahout is used for data mining.
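To put the datanode numbers in perspective, here is a rough back-of-the-envelope sketch of the cluster totals. The node count, disk, and RAM figures come from the list above; the replication factor of 3 is only the HDFS default and is my assumption, since eBay's actual setting was not mentioned. The function name `cluster_totals` is made up for illustration.

```python
# Rough capacity estimate for the datanode tier (decimal units: 1 PB = 1000 TB).
DATANODES = 4000          # from the talk
DISK_TB_PER_NODE = 10     # from the talk
RAM_GB_PER_NODE = 48      # from the talk
REPLICATION_FACTOR = 3    # ASSUMPTION: HDFS default, not a figure eBay stated


def cluster_totals(nodes, disk_tb, ram_gb, replication):
    """Return (raw disk in PB, usable disk in PB after replication, total RAM in TB)."""
    raw_pb = nodes * disk_tb / 1000
    usable_pb = raw_pb / replication
    total_ram_tb = nodes * ram_gb / 1000
    return raw_pb, usable_pb, total_ram_tb


raw_pb, usable_pb, ram_tb = cluster_totals(
    DATANODES, DISK_TB_PER_NODE, RAM_GB_PER_NODE, REPLICATION_FACTOR
)
print(f"Raw disk:    {raw_pb:.1f} PB")
print(f"Usable disk: {usable_pb:.1f} PB (at {REPLICATION_FACTOR}x replication)")
print(f"Total RAM:   {ram_tb:.1f} TB")
```

Under that assumption the usable figure is about a third of the raw disk, and it would shrink further once space for intermediate MapReduce output and the operating system is set aside.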
They are toying with the idea of using Oozie to manage workflows but haven't decided to use it yet.
It looks like they are doing all the right things.