Random Forest Clustering with Apache Spark - Erik Erlandson, Red Hat, Inc.

Analytics applications often boil down to grouping objects into two or more clusters having similar elements. Defining what “similar” means can be surprisingly difficult when data elements have many columns or dimensions. Having tools at hand to generate quality clusters from high-dimensional data greatly increases the variety of applications that can successfully leverage clustering.

In this presentation, Erik Erlandson will introduce the basic principles and advantages of Random Forest learning models and Random Forest clustering. He will explain how to build an implementation of Random Forest clustering in the Apache Spark analytics framework, based on the Spark MLlib Random Forest modeling API.

The presentation will include examples of Random Forest clustering applied to VM installed-package profiles and a discussion of practical issues encountered along the way.

Speakers

Erik Erlandson

Senior Principal Software Engineer, Red Hat

Erik Erlandson is a Software Engineer at Red Hat Emerging Technologies, where he leads a team dedicated to exploring tools, methodologies and use cases at the intersection of Data Science workloads and the Kubernetes ecosystem.

Slides: Random Forest Clustering with Apache Spark (PDF)

Tuesday May 10, 2016 9:00am - 9:50am PDT
Plaza C

Spark, Intermediate

Apache: Big Data 2016
