GoDaddy uses an 800-node Apache Hadoop cluster to hold over 2.5 petabytes of customer-related activity and behavior data. This on-premises data lake is critical for guiding business operations and determining the company’s investment strategies. The system operates 24x7. It can generate peak loads of more than 100,000 file system events per second, and it sustains 12-hour periods that average more than 21,000 change operations per second.
While the on-premises data lake is business critical, it is aging and runs on an old version of Apache Hadoop (2.8). GoDaddy wanted to modernize the implementation by migrating the data to Amazon Web Services (AWS), both to take advantage of the modern tooling and analytics capabilities available on AWS and to mitigate the risks and costs of maintaining the on-premises Hadoop cluster and its underlying hardware.
The challenge for GoDaddy was how to migrate petabytes of actively changing, “live” data when the business depends on the continued operation of applications in the cluster and on access to its data. Any disruption to business operations would have been unacceptable and might have prevented a migration from even being attempted.
GoDaddy is a technically oriented company with deep software development skills, and it often builds its own solutions. Accordingly, it investigated building a custom migration solution from open source tools. However, GoDaddy concluded that performing the initial migration and ongoing synchronization manually would be a complex, error-prone task, and not the core competency on which it wanted its highly skilled engineers to spend their time. Instead, following a quick demonstration of a 2 TB migration and a subsequent 10 TB proof of concept, GoDaddy selected Cirata Data Migrator to automate the migration. Data Migrator combines a single scan of the source datasets with processing of the changes that occur while the migration is under way, achieving a complete and continuous data migration. It imposes no cluster downtime or disruption, and requires no changes to cluster operation or application behavior.
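The general idea of pairing one scan with a stream of change events can be illustrated with a toy model. The sketch below is not Cirata's implementation and uses no Hadoop or AWS APIs; `LiveSource`, `sync_path`, and `migrate` are hypothetical names, and plain Python dictionaries stand in for the source cluster and the target store. It assumes only that every change to the source is recorded as an event, and that re-copying a path is idempotent.

```python
from collections import deque


class LiveSource:
    """Toy stand-in for a file system that records every change as an event."""

    def __init__(self, files):
        self.files = dict(files)
        self.events = deque()  # paths that changed, in order of occurrence

    def write(self, path, data):
        self.files[path] = data
        self.events.append(path)

    def delete(self, path):
        self.files.pop(path, None)
        self.events.append(path)


def sync_path(source, target, path):
    """Make the target match the source's current state for one path."""
    if path in source.files:
        target[path] = source.files[path]
    else:
        target.pop(path, None)


def migrate(source, target, during_scan=None):
    # Single scan: take one listing and copy each path exactly once.
    for index, path in enumerate(sorted(source.files)):
        sync_path(source, target, path)
        if during_scan is not None:
            during_scan(index)  # hook that simulates live activity
    # Change processing: replay every path touched since events began.
    while source.events:
        sync_path(source, target, source.events.popleft())


# The source keeps changing while the scan is still in progress.
source = LiveSource({"/a": 1, "/b": 2, "/c": 3})
target = {}


def live_activity(index):
    if index == 0:
        source.write("/a", 10)   # modified after it was already copied
        source.delete("/c")      # removed before the scan reaches it
        source.write("/d", 4)    # created after the listing was taken


migrate(source, target, during_scan=live_activity)
assert target == source.files
```

Because each event is handled by re-reading the source's current state, replaying an event twice or out of order does no harm in this toy model, which is why no second scan is needed to catch changes made during the first.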
“At GoDaddy, deep technical knowledge is in our DNA, and we often build applications in-house to support growth. In the use case of a Hadoop to Amazon S3 data migration and replication, we found Cirata’s Data Migrator to be the optimal approach to deliver the best time to value, rather than running a more time-consuming and costly manual migration project internally.”
– Wayne Peacock, Chief Data and Analytics Officer, GoDaddy