Apache: Big Data 2016 has ended
Tuesday, May 10 • 9:00am - 9:50am
Random Forest Clustering with Apache Spark - Erik Erlandson, Red Hat, Inc.

Analytics applications often boil down to grouping objects into two or more clusters having similar elements. Defining what “similar” means can be surprisingly difficult when data elements have many columns or dimensions. Having tools at hand to generate quality clusters from high-dimensional data greatly increases the variety of applications that can successfully leverage clustering.

In this presentation, Erik Erlandson will introduce the basic principles and advantages of Random Forest learning models and Random Forest clustering. He will explain how to build an implementation of Random Forest clustering in the Apache Spark analytics framework, based on the Spark MLlib Random Forest modeling API.

The presentation will include examples of Random Forest clustering applied to VM installed-package profiles and a discussion of practical issues encountered along the way.
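
For readers new to the technique, here is a minimal sketch of one common formulation of Random Forest clustering; it is an assumption for illustration and may differ from the approach presented in the talk. The idea is to build synthetic rows by sampling each column independently, train a Random Forest classifier to separate real rows from synthetic ones, and then treat two objects as similar when they land in the same leaves across many trees. The sketch uses plain Python rather than Spark MLlib. The function names (`synthesize`, `labeled_training_set`, `leaf_distance`) are hypothetical, and the forest training step itself is omitted.

```python
import random


def synthesize(real_rows, seed=0):
    """Build synthetic rows by sampling every column independently.

    Each column keeps its observed value distribution, but the
    relationships between columns are broken.
    """
    rng = random.Random(seed)
    columns = list(zip(*real_rows))  # transpose rows into columns
    return [tuple(rng.choice(col) for col in columns) for _ in real_rows]


def labeled_training_set(real_rows, seed=0):
    """Label real rows 1 and synthetic rows 0.

    A Random Forest classifier trained on this set (not shown here)
    has to learn the cross-column structure that real data has and
    synthetic data lacks.
    """
    synthetic = synthesize(real_rows, seed)
    return [(row, 1) for row in real_rows] + [(row, 0) for row in synthetic]


def leaf_distance(leaves_a, leaves_b):
    """Fraction of trees in which two objects reach different leaves.

    leaves_a and leaves_b hold one leaf identifier per tree, as a
    trained forest would report for each object.
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both objects need one leaf id per tree")
    differing = sum(1 for a, b in zip(leaves_a, leaves_b) if a != b)
    return differing / len(leaves_a)


# Hypothetical toy data: each row is a profile of binary features.
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0)]
training = labeled_training_set(rows)
```

The pairwise distances from `leaf_distance` can then be passed to any clustering method that accepts a distance matrix, such as k-medoids.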

Speakers
Erik Erlandson

Senior Principal Software Engineer, Red Hat
Erik Erlandson is a Software Engineer at Red Hat Emerging Technologies, where he leads a team dedicated to exploring tools, methodologies and use cases at the intersection of Data Science workloads and the Kubernetes ecosystem.



Tuesday May 10, 2016 9:00am - 9:50am PDT
Plaza C