Hot HA for Hadoop NameNode
The current HDFS design assumes that a single server, the NameNode, is dedicated to maintaining the file system metadata and controls the work of the other cluster nodes, the DataNodes, which handle the actual file data blocks. The system is designed to survive the loss of multiple DataNodes and to recover within minutes. A NameNode failure, however, makes the entire cluster unavailable, since there is no other place from which the metadata can be obtained immediately. Although this design simplifies the overall architecture of HDFS, it also makes the NameNode a single point of failure, which is considered a serious deficiency for production-grade systems.
The primary goal of the proposed architecture is to build a highly available NameNode that can fail over to a Standby node in seconds and that requires minimal changes to the existing code base.
The architecture introduces a StandbyNode, an evolutionary modification of the BackupNode that already exists in HDFS. This is the only major change required to the current Hadoop code base. The approach further relies on standard HA software, such as LinuxHA, and on the existing functionality of load-balancing hardware or software platforms. The system has been prototyped on eBay clusters.
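
To make the failover idea concrete, the following is a minimal sketch of heartbeat-based promotion of a standby node. It is a toy model, not the proposed implementation and not the LinuxHA or Hadoop API: the `Node` and `FailoverMonitor` classes, the 5-second timeout, and the explicit `now` timestamps are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a NameNode or StandbyNode process."""
    name: str
    role: str  # "active", "standby", or "failed"
    last_heartbeat: float = 0.0


class FailoverMonitor:
    """Toy HA monitor: promotes the standby when the active node goes silent.

    Real HA software (for example LinuxHA) does far more, such as fencing
    and moving a virtual IP; none of that is modeled here.
    """

    def __init__(self, active: Node, standby: Optional[Node], timeout_s: float):
        self.active = active
        self.standby = standby
        self.timeout_s = timeout_s  # assumed value, not taken from the source

    def heartbeat(self, node: Node, now: float) -> None:
        # Record the time at which the node last reported in.
        node.last_heartbeat = now

    def check(self, now: float) -> str:
        """Return the name of the node currently serving metadata."""
        stale = now - self.active.last_heartbeat > self.timeout_s
        if stale and self.standby is not None:
            # The active node missed its deadline: promote the standby.
            self.active.role = "failed"
            self.standby.role = "active"
            self.active = self.standby
            self.standby = None  # no further standby in this toy model
        return self.active.name


# Usage: timestamps are passed in explicitly so the run is deterministic.
namenode = Node("namenode", "active")
standbynode = Node("standbynode", "standby")
monitor = FailoverMonitor(namenode, standbynode, timeout_s=5.0)

monitor.heartbeat(namenode, now=0.0)
monitor.heartbeat(standbynode, now=0.0)
before = monitor.check(now=3.0)   # heartbeat is 3 s old: still within timeout
monitor.heartbeat(standbynode, now=4.0)
after = monitor.check(now=6.0)    # heartbeat is 6 s old: standby is promoted
```

In this sketch clients would simply ask the monitor which node is active; in the architecture described above, that redirection is the job of the load-balancing layer.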