ApacheCon NA 2011

Konstantin V. Shvachko

Konstantin is a veteran Hadoop developer. He is a principal Hadoop architect at eBay. Konstantin specializes in efficient data structures and algorithms for large-scale distributed storage systems. He is a member of the Apache Hadoop PMC.

Hot HA for Hadoop NameNode
November 11, 3:30 PM
The current HDFS design assumes that a single server, the NameNode, maintains the file system metadata and controls the work of the other cluster nodes, the DataNodes, which handle the actual file data blocks. The system is designed to survive the loss of multiple DataNodes and recover within minutes. A NameNode failure, however, makes the entire cluster unavailable, since there is no other place to obtain metadata immediately. Although this design simplifies the overall architecture of HDFS, it also makes the NameNode a single point of failure, which is considered a serious deficiency for production-grade systems.
The primary goal of the proposed architecture is to build a highly available NameNode that can fail over to a Standby node in seconds and that requires minimal changes to the existing code base.
The architecture introduces a StandbyNode, an evolutionary modification of the BackupNode that already exists in HDFS. This is the only major change required to the current Hadoop code base. The approach further relies on standard HA software such as LinuxHA and on the existing functionality of load-balancing hardware or software platforms. The system has been prototyped on eBay clusters.
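The hot-standby idea can be illustrated with a toy model. The sketch below is not HDFS code: MetadataNode, HotStandbyPair, and max_missed are hypothetical names chosen for illustration, and the heartbeat check stands in for what external HA software such as LinuxHA would do in the proposed design. It only shows why a standby that receives every namespace change can take over without a lengthy recovery step.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataNode:
    """Toy stand-in for a NameNode or StandbyNode (hypothetical, not an HDFS class)."""
    name: str
    namespace: dict = field(default_factory=dict)  # path -> list of block ids
    alive: bool = True

    def apply(self, path, blocks):
        # Record a namespace change on this node.
        self.namespace[path] = list(blocks)


class HotStandbyPair:
    """Assumed model: the active node forwards every change to the standby."""

    def __init__(self, active, standby, max_missed=3):
        self.active = active
        self.standby = standby
        self.max_missed = max_missed  # missed heartbeats tolerated before failover
        self.missed = 0

    def create_file(self, path, blocks):
        if not self.active.alive:
            raise RuntimeError("no active metadata node")
        self.active.apply(path, blocks)
        if self.standby is not None and self.standby.alive:
            # Keeping the standby's namespace current is what makes it "hot".
            self.standby.apply(path, blocks)

    def heartbeat_tick(self):
        """Called periodically by a monitor; returns True if a failover happened."""
        if self.active.alive:
            self.missed = 0
            return False
        self.missed += 1
        can_promote = self.standby is not None and self.standby.alive
        if self.missed >= self.max_missed and can_promote:
            # Promote the standby; it already holds the full namespace.
            self.active, self.standby = self.standby, None
            self.missed = 0
            return True
        return False

    def lookup(self, path):
        if not self.active.alive:
            raise RuntimeError("no active metadata node")
        return self.active.namespace[path]


if __name__ == "__main__":
    pair = HotStandbyPair(MetadataNode("nn1"), MetadataNode("standby1"), max_missed=2)
    pair.create_file("/logs/part-0", [101, 102])
    pair.active.alive = False          # simulate a NameNode crash
    while not pair.heartbeat_tick():   # the monitor notices the missed heartbeats
        pass
    print(pair.active.name, pair.lookup("/logs/part-0"))
```

In this sketch clients always talk to whatever `pair.active` currently is, which loosely mirrors the role the abstract assigns to load-balancing hardware or software; how the real system redirects clients is not specified here.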
