Databricks enables companies to accelerate data-driven innovation with a unified approach to data analytics and AI. Leveraging Data Migrator to automate Hadoop data and Hive metadata migration directly to Databricks lets organizations focus resources on developing new AI innovations rather than on migration complexities. Data Migrator's native integration with Databricks Unity Catalog brings centralized data governance and access control, enabling faster data operations and accelerated time-to-business-value for enterprises.

Automate data and metadata migration to Databricks

Cirata Data Migrator is a safe and reliable cloud migration solution that automates the migration of Hadoop data and Hive metadata to the cloud. Data Migrator provides three key Databricks-specific functionalities:
  • Cirata Data Migrator supports Databricks Unity Catalog, enhancing data governance, access control, and collaboration while automating the large-scale transfer of data and metadata from existing data lakes to Databricks, even during source changes. This ensures seamless adoption of the Databricks Data Intelligence Platform with minimal disruption and risk, with faster implementation.
  • Cirata Data Migrator streamlines Apache Hive metadata access directly to Databricks, empowering your data initiatives with crucial information. It ensures Databricks remains updated with live data transfers from your existing data lake infrastructure, eliminating the need for complex and fragile data pipelines.
  • Cirata Data Migrator simplifies the transformation of data formats from Hadoop and Spark to Databricks' Delta Lake. This automation lets you leverage Databricks' unique features without manual transformations, and by using your scalable Databricks runtimes it handles both large and small data sets flexibly.
Talk to an expert to learn more

Learn more in the partnership webinar from Databricks and Cirata on accelerating Hadoop migration to Databricks.

“As a long-standing partner, Cirata has helped many customers in their legacy Hadoop to Databricks migrations. Now, the seamless integration of Cirata Data Migrator with Unity Catalog enables enterprises to capitalize on our Data and AI capabilities to drive productivity and accelerate their business value.” — Siva Abbaraju, Go-to-Market Leader, Migrations, Databricks.

Case studies: how Cirata solved challenges for global leaders.

Leading global airline
Challenge
  • Expedite migration of their on-premises Hadoop data in Cloudera to Azure Data Lake Storage (ADLS) Gen2 and Delta Lake on Databricks while the on-premises system remained active.
Results
  • Automated migration with no disruption to existing production environment.
  • Faster time-to-market for revenue-generating apps.
  • Reduced ongoing support costs and cost avoidance by decommissioning on-premises platform faster.
Leading global telecom
Challenge
  • Migrate tens of petabytes of data from their on-premises Hadoop environment to Microsoft Azure and Databricks without disrupting the business or taking their production environment offline.
Results
  • Migrated 13 PB of data from on-prem Hortonworks cluster to ADLS Gen2.
  • Ability to block 1 billion robocalls per month, over 7.2 times more per year than before.
  • 42% reduction in the original data integration timeline.
Leading global automotive manufacturer
Challenge
  • Multiple attempts to move from batch uploads to near-real-time replication, including with Microsoft's data mover, failed to deliver both accurate replication and ongoing synchronization between the data center and the cloud.
Results
  • Enabled the automotive manufacturer to gain near-real-time insights from its data.
  • Data scientists could begin developing AI & ML models immediately.
  • Initial replication performed without business impact.

Cirata Data Migrator for Hadoop automates the movement of data to the cloud

The following capabilities enable zero business disruption, reduced risk, and faster time-to-value.
Quick deployment and operation
Data Migrator is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impacting current operations, so users can begin moving data immediately.
Synchronization & replication
Existing datasets can be moved with a single pass through the source storage system, eliminating the CPU cycles and overhead associated with multiple scans, while also supporting continuous migration of any ongoing changes from source to target with zero disruption to current production systems.
Support for multiple sources and targets
Data Migrator supports HDFS distributions v2.6 and higher as source systems, as well as leading cloud service providers and select independent software vendors, such as Databricks and Snowflake, as target systems. See Data Migrator documentation for further details.
Transfer Hadoop data and Hive metadata
Data Migrator supports migration of HDFS data and Hive metadata to public cloud and on-premises environments.
Data transfer at any scale
Datasets of any size — from terabytes to multiple petabytes — can be moved without affecting production environments. Horizontal scaling capabilities allow users to scale their migration capacity by configuring transfer agents to maximize the productivity of available bandwidth.
Easy management
Cirata's browser-based user interface (UI) lets users manage the entire data and metadata migration from a single management console.
Programmatic interface
Migrations can also be managed through a comprehensive and intuitive command-line interface or by using the self-documenting representational state transfer API to integrate the solution with other programs as needed.
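To illustrate what API-driven management can look like, the Python sketch below creates a migration over a REST interface and polls its status. The host, port, endpoint paths, and payload fields are hypothetical placeholders, not Data Migrator's documented API; see the Data Migrator documentation for the actual resource names and parameters.

    # Hypothetical sketch of driving a migration over a REST API from Python.
    # The host, port, endpoint paths, and payload fields below are illustrative
    # assumptions, not Data Migrator's documented API surface.
    import requests

    DM_HOST = "http://datamigrator.example.internal:18080"  # hypothetical host and port

    migration = {
        "name": "hive-warehouse-to-adls",   # hypothetical migration name
        "source": "hdfs-prod",              # hypothetical source filesystem id
        "target": "adls2-databricks",       # hypothetical target filesystem id
        "path": "/apps/hive/warehouse",     # data to migrate
        "live": True,                       # keep replicating ongoing source changes
    }

    # Create the migration.
    resp = requests.post(f"{DM_HOST}/migrations", json=migration, timeout=30)
    resp.raise_for_status()
    print("Created migration:", resp.json())

    # Check progress (state and field names are assumptions).
    status = requests.get(f"{DM_HOST}/migrations/{migration['name']}", timeout=30).json()
    print("Current state:", status.get("state"))
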
Flexible configurations and precise control
Organizations can configure migration jobs to meet their specific needs, such as defining sources, targets, and which data to migrate. There are also advanced capabilities, such as migration prioritization, path mapping, and network bandwidth-management controls.
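As a concrete illustration of this flexibility, the hypothetical configuration below shows the kinds of controls described above: source and target selection, path mapping, exclusions, prioritization, and a bandwidth cap. The keys and values are assumptions for illustration, not Data Migrator's actual configuration schema.

    # Hypothetical migration configuration; keys and values are illustrative
    # assumptions, not Data Migrator's actual schema.
    migration_config = {
        "source": "hdfs-prod",
        "target": "adls2-databricks",
        "paths": ["/data/events", "/apps/hive/warehouse/sales.db"],
        "exclusions": ["**/_tmp/**", "**/*.staging"],    # skip transient files
        "pathMapping": {"/data/events": "/raw/events"},  # rewrite paths on the target
        "priority": "high",                              # run ahead of lower-priority migrations
        "bandwidthLimitMbps": 500,                       # cap network usage
    }
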
Transfer verification
Data Migrator contains a data transfer verification function that scans both source and target environments to ensure data fidelity and validate the success of all data transfers. Results and reports are delivered through the UI or by email.
Powerful metrics and real-time monitoring
Users stay informed throughout migration jobs: health and status metrics provide estimates of migration completion, while email notifications and real-time usage insights enable hands-off operation.

Modernize your data architecture with a unified analytics platform

Databricks provides a Unified Analytics Platform powered by Apache Spark for data science teams to collaborate with data engineering and lines of business to build data products. You can achieve faster time-to-value with Cirata Data Migrator by transforming your data during migration into Delta Lake format, accelerating the creation of analytic workflows in Databricks that go from ETL and interactive exploration to production.
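For context on the Delta Lake transformation, the PySpark sketch below shows the kind of conversion Data Migrator automates during migration: reading Parquet data produced on Hadoop and rewriting it as a Delta table on a Databricks runtime. The storage paths, catalog, schema, and table names are placeholders for your environment.

    # Illustrative PySpark conversion of the kind Data Migrator automates:
    # read Parquet produced on Hadoop, rewrite it as a Delta table.
    # Paths, catalog, schema, and table names are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-to-delta").getOrCreate()

    source_path = "abfss://landing@yourstorageaccount.dfs.core.windows.net/events/parquet"
    target_path = "abfss://lake@yourstorageaccount.dfs.core.windows.net/events/delta"

    df = spark.read.parquet(source_path)

    (df.write
       .format("delta")
       .mode("overwrite")
       .save(target_path))

    # Optionally register the Delta table in Unity Catalog for governed access
    # (assumes an external location covering target_path is already configured).
    spark.sql(
        f"CREATE TABLE IF NOT EXISTS main.analytics.events USING DELTA LOCATION '{target_path}'"
    )
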
Talk to an expert to learn more

The sweet spot: Cloudera/Hadoop data movement and data lake migration

Navigating the complexities of migrating Cloudera/Hadoop data to Databricks can be daunting. Cirata streamlines this process, addressing multiple challenges with precision and expertise, ensuring a seamless transition for your data workloads.
Data migrations can block or slow Databricks adoption with the following issues:
  • Large data volumes
  • High change rate
  • In-house migration solutions take too long
Automation simplifies data migration projects with the following benefits:
  • Migrate data straight into Delta Lake format
  • Reduced time to benefit
  • Begin consuming Databricks Units (DBUs) earlier in the migration project
  • Realize new revenue streams earlier
Talk to an expert to learn more
Ready to get started on your data migration to Databricks? Reach out today for more information and to speak to an expert.

Data Matters Podcast – Databricks + Cirata

Featured resources

  • Data sheet: Data migration for Databricks
  • Press release: Cirata expands Databricks partnership with Native Unity Catalog Integration
  • Webinar: Accelerating Hadoop migration to Databricks
