Shevek is an expert Java programmer and a member of the creative team behind Karmasphere, a San Francisco-based big data analytics company. He has worked on cutting-edge research in compilers and language design, algorithmic optimization, systems and security. He received a Doctorate in Computing from the University of Bath, England. He also holds a Master's in Pure Mathematics and an epee.
The distributed nature of a Hadoop job makes the instrumentation harder to engineer and its output harder to present.
However, instrumentation can also take advantage of a detailed knowledge of the code paths within Hadoop to build a much deeper insight into the behaviour of the user code.
We will present our approach to general purpose instrumentation for Hadoop, which uses Hadoop-specific insights to profile, debug and diagnose faults in a job.
We will describe techniques using attempt success/failure, internal exception rates and differential analysis, amongst others, to help us localize badly performing code or malformed input data without user intervention.
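As a rough illustration of how attempt success/failure and internal exception rates might feed a differential analysis, the sketch below compares each task attempt against the median of its peers and flags the outliers. It is not the approach presented in the talk and uses no Hadoop API: the class AttemptDifferentialSketch, the AttemptStats fields, the attempt and split names, and the factor and floor parameters are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch only: not the speakers' implementation, not a Hadoop API. */
public class AttemptDifferentialSketch {

    /** Counters for one task attempt; every field name here is an assumption. */
    public static final class AttemptStats {
        public final String attemptId;
        public final String inputSplit;
        public final long records;
        public final long internalExceptions;
        public final boolean succeeded;

        public AttemptStats(String attemptId, String inputSplit, long records,
                            long internalExceptions, boolean succeeded) {
            this.attemptId = attemptId;
            this.inputSplit = inputSplit;
            this.records = records;
            this.internalExceptions = internalExceptions;
            this.succeeded = succeeded;
        }

        /** Exceptions caught per record processed; 0 if no records were read. */
        public double exceptionRate() {
            return records == 0 ? 0.0 : (double) internalExceptions / records;
        }
    }

    /** Median of a non-empty array; the input is not modified. */
    public static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    /**
     * Flags attempts that failed outright, or whose exception rate is above
     * max(floor, factor * median rate of all attempts). The threshold rule is
     * an illustrative choice, not a recommendation.
     */
    public static List<AttemptStats> suspects(List<AttemptStats> attempts,
                                              double factor, double floor) {
        List<AttemptStats> flagged = new ArrayList<>();
        if (attempts.isEmpty()) {
            return flagged;
        }
        double[] rates = attempts.stream()
                .mapToDouble(AttemptStats::exceptionRate)
                .toArray();
        double threshold = Math.max(floor, factor * median(rates));
        for (AttemptStats a : attempts) {
            if (!a.succeeded || a.exceptionRate() > threshold) {
                flagged.add(a);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up data: one attempt with a high exception rate, one failed attempt.
        List<AttemptStats> attempts = Arrays.asList(
                new AttemptStats("attempt_0", "split-0", 1000, 1, true),
                new AttemptStats("attempt_1", "split-1", 1000, 2, true),
                new AttemptStats("attempt_2", "split-2", 1000, 1, true),
                new AttemptStats("attempt_3", "split-3", 1000, 200, true),
                new AttemptStats("attempt_4", "split-4", 1000, 0, false));
        for (AttemptStats a : suspects(attempts, 5.0, 0.01)) {
            System.out.println(a.attemptId + " on " + a.inputSplit
                    + " rate=" + a.exceptionRate() + " succeeded=" + a.succeeded);
        }
    }
}
```

Because each flagged attempt carries its input split, an outlier points at a region of the input that may be malformed, while a uniformly high rate across all splits would point more toward the user code itself.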