ApacheCon NA 2011

One Day -- Mahout Boot Camp

10:00am - 5:30pm, Monday, November 7

Mahout Boot Camp is a one-day training course designed to get newcomers up and running with Mahout's classification, clustering and collaborative filtering tools. The class will also introduce some of Mahout's other features, such as frequent patternset mining, and cover the basics of machine learning.

The class combines lectures and hands-on labs, so students should be prepared to write code. No prior machine learning experience is required. Experience with Java is helpful but not required.

Course Outline:
1. Introduction
   a. What is Mahout?
   b. What is Machine Learning?
   c. What can it solve?
   d. What can’t it solve?
   e. Which version, and why?
2. Getting Started
   a. Installing Mahout
   b. Validating the Installation
3. The Three C’s of Mahout – Mahout Concepts
   a. Classification
   b. Clustering
   c. Collaborative Filtering (Recommendation)
4. Lab 1: The C’s in Action
   a. Run the Mahout examples
5. Classification In Depth
   a. Concepts in Classification
      i. Understanding your data
         1. Feature Selection
   b. Mahout’s Classification Algorithms
      i. Naïve Bayes and Complementary Naïve Bayes
      ii. Random Forests
      iii. Stochastic Gradient Descent (SGD)
   c. Lab: Classifying Wikipedia
   d. Classification in Production
6. Clustering In Depth
   a. Concepts in Clustering
      i. Document
      ii. Topic/Word
   b. Mahout’s Clustering Algorithms
      i. K-Means
      ii. Mean-Shift
      iii. Canopy
      iv. Latent Dirichlet Allocation (LDA)
   c. Lab: Clustering the News
   d. Clustering in Production
7. Collaborative Filtering (CF) In Depth
   a. Concepts in CF
      i. Modeling data
      ii. Measuring affinity
   b. Mahout’s CF Capabilities
      i. User-Item
      ii. Item-Item
      iii. Scoring
         1. Slope One
         2. Other Distance Measures
      iv. Online vs. Offline
   c. Lab: Recommending Movies
8. Mahout’s Other Features and Functionality
   a. Frequent Patternset Mining
   b. Primitive Collections
   c. Utilities
