Apache: Big Data Europe 2016
Tuesday, November 15 • 12:00 - 12:50
Create a Hadoop Cluster and Migrate 39PB Data Plus 150000 Jobs/Day - Stuart Pook, Criteo


Criteo had a Hadoop cluster with 39 PB of raw storage, 13,404 CPUs, 105 TB of RAM, 40 TB of data imported per day, and more than 100,000 jobs per day. This cluster was critical for both storage and compute, yet had no backups. This talk describes:
0/ the options considered when deciding how to protect our data and compute capacity
1/ the criteria established for the 800 new computers, and the comparison tests between suppliers' hardware
2/ the non-blocking network infrastructure with 10 Gb/s endpoints, scalable to 5,000 machines
3/ the installation and configuration, using Chef, of a cluster on the new hardware
4/ the problems encountered in moving our jobs and data from the old CDH4 cluster to the new CDH5 cluster 600 km away (see the sketch after this list)
5/ running the two clusters in parallel and feeding both with data
6/ failover plans
7/ operational issues
8/ the performance of the 16,800-core, 200 TB RAM, 60 PB disk CDH5 cluster.
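Point 4/ involves a bulk copy between clusters whose native RPC protocols are incompatible. The standard tool for HDFS-to-HDFS copies is Hadoop's distcp, and when source and destination versions differ (as between CDH4 and CDH5), the usual workaround is to read the source over WebHDFS. The sketch below only illustrates that pattern and is not Criteo's actual tooling: the hostnames, ports, paths, map count, and bandwidth cap are all hypothetical placeholders.

#!/usr/bin/env python3
"""Illustrative sketch: mirror a directory tree from an old CDH4 cluster to a
new CDH5 cluster with distcp. All hostnames, ports, paths, and tuning values
are hypothetical; run this on a host with the CDH5 Hadoop client installed."""
import subprocess

# Read over WebHDFS because the CDH4 and CDH5 native RPC protocols differ.
SRC = "webhdfs://cdh4-nn.example.com:50070/user/warehouse"
DST = "hdfs://cdh5-nn.example.com:8020/user/warehouse"

def mirror(src: str, dst: str, maps: int = 200, mb_per_sec_per_map: int = 50) -> None:
    """Run distcp in -update mode so repeated runs copy only new or changed
    files; cap the map count and per-map bandwidth so the long-distance link
    between the two sites is not saturated."""
    cmd = [
        "hadoop", "distcp",
        "-update",                              # skip files already copied and unchanged
        "-m", str(maps),                        # number of parallel copy tasks
        "-bandwidth", str(mb_per_sec_per_map),  # MB/s allowed per map
        src, dst,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    mirror(SRC, DST)

Because -update makes the copy idempotent, a migration like this can be re-run on a schedule while both clusters stay live (point 5/), with a final catch-up pass just before cut-over.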

Speakers

Stuart Pook

Senior DevOps Engineer, Criteo
Stuart loves storage (130 PB at Criteo) and is part of Criteo's Lake team, which runs some small clusters and two rather large Hadoop clusters. He also loves automation with Chef, because configuring more than 2200 Hadoop nodes by hand is just too slow. Before discovering Hadoop he developed...



Tuesday November 15, 2016 12:00 - 12:50 CET
Giralda VI/VII