A repository which documents general guidelines for scaling the Apache Hadoop Namenode.

dineshchitlangia/NamenodeScalability

Scaling the Namenode

General guidelines for scaling the Namenode in HDFS, based on lessons learnt in the field.

This repo is collateral for my ApacheCon 2021 presentation, "Scaling the Namenode - Lessons Learnt".

The slide deck from the talk is available here.

Commonly reported problems

  • Performance
      • RPC processing time
      • GC pauses
      • Read/write performance
      • NN takes too long to start
  • Stability
      • Frequent failover
      • Frequent crashes

Various causes

  • Small files
  • Suboptimal heap settings
  • Missing RPC improvements
  • Bad Applications / Mistuned Components
  • Degraded AD
  • Too frequent/delayed checkpointing
  • Heavy Services co-located / Disk throughput
  • Too much logging
  • Degraded JN / communication between NN/JN/ZK
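
To illustrate why small files appear in the list above, here is a minimal sketch that counts the namespace objects (one inode per file plus one object per block) the Namenode would hold in memory for the same volume of data stored at different file sizes. The figure of roughly 150 bytes per object is a commonly quoted rule of thumb, and the 128 MiB block size is assumed here as the `dfs.blocksize` value; neither number comes from this repo, and the function names are hypothetical. Treat the result as an order-of-magnitude estimate, not a sizing formula.

```python
import math

# Assumptions (not from this repo): a rule-of-thumb cost per namespace
# object and a 128 MiB block size. Adjust both for a real cluster.
BYTES_PER_OBJECT = 150
BLOCK_SIZE = 128 * 1024**2


def namespace_objects(num_files, file_size_bytes, block_size=BLOCK_SIZE):
    """Count inode and block objects for num_files files of equal size."""
    blocks_per_file = math.ceil(file_size_bytes / block_size)
    return num_files * (1 + blocks_per_file)


def estimated_heap_bytes(num_files, file_size_bytes, block_size=BLOCK_SIZE):
    """Rough heap needed to hold those objects, using the assumed per-object cost."""
    return namespace_objects(num_files, file_size_bytes, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    total_bytes = 1024**4  # 1 TiB of data, stored three different ways
    for file_size in (1024**2, 128 * 1024**2, 1024**3):
        num_files = total_bytes // file_size
        objects = namespace_objects(num_files, file_size)
        heap_mib = estimated_heap_bytes(num_files, file_size) / 1024**2
        print(f"{file_size // 1024**2:>5} MiB files: "
              f"{num_files:>8} files, {objects:>8} objects, ~{heap_mib:.1f} MiB heap")
```

Under these assumptions, storing 1 TiB as 1 MiB files produces over two million namespace objects, while storing it as 1 GiB files produces fewer than ten thousand.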

Tips for Scaling
