ApacheCon NA 2011

Instrumenting Hadoop Jobs for Fun and Profit

1:30 - 2:20pm on Friday, November 11 in Salon B

Instrumentation is a general-purpose technique for automatically gathering detailed information about the execution of a process.

The distributed nature of a Hadoop job makes both the engineering of the instrumentation and the presentation of the output harder.

However, instrumentation can also take advantage of detailed knowledge of the code paths within Hadoop to gain much deeper insight into the behaviour of the user code.

We will present our approach to general-purpose instrumentation for Hadoop, which uses Hadoop-specific insights to profile, debug and diagnose faults in a job.

We will describe techniques using attempt success/failure, internal exception rates and differential analysis, amongst others, to help us localize badly performing code or malformed input data without user intervention.
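
As a rough illustration of how exception rates and differential analysis might localize malformed input, the sketch below compares each input split's exception rate against the median across all splits. It is not the speakers' implementation: the record fields (`split`, `succeeded`, `records`, `exceptions`), the function names and the `factor` threshold are all hypothetical, and the sample data is invented for the example.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-attempt records. The field names and values are
# illustrative only; they are not taken from Hadoop or from the talk.
ATTEMPTS = [
    {"split": "part-0000", "succeeded": True,  "records": 1000, "exceptions": 2},
    {"split": "part-0001", "succeeded": True,  "records": 1000, "exceptions": 1},
    {"split": "part-0002", "succeeded": False, "records": 500,  "exceptions": 150},
    {"split": "part-0002", "succeeded": False, "records": 500,  "exceptions": 150},
    {"split": "part-0003", "succeeded": True,  "records": 1000, "exceptions": 3},
]


def exception_rates(attempts):
    """Return exceptions per record for each split, summed over its attempts."""
    totals = defaultdict(lambda: [0, 0])  # split -> [exceptions, records]
    for attempt in attempts:
        totals[attempt["split"]][0] += attempt["exceptions"]
        totals[attempt["split"]][1] += attempt["records"]
    return {
        split: (exc / rec if rec else 0.0)
        for split, (exc, rec) in totals.items()
    }


def suspicious_splits(attempts, factor=10.0):
    """Return splits whose exception rate exceeds factor times the median rate."""
    rates = exception_rates(attempts)
    threshold = factor * median(rates.values())
    return sorted(split for split, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    print(suspicious_splits(ATTEMPTS))
```

Comparing against the median rather than a fixed limit is one way to make the check differential: a split stands out only relative to how the rest of the same job behaves, so no per-job tuning by the user is assumed.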
