WANdisco is now Cirata
Databricks enables companies to accelerate data-driven innovation with a unified approach to data analytics and AI. Leveraging Data Migrator to automate Hadoop data and Hive metadata migration directly to Databricks enables organizations to focus resources on development of new AI innovations rather than migration complexities. With Data Migrator’s native integration with the Databricks Unity Catalog, centralized data governance and access control capabilities enable faster data operations and accelerated time-to-business-value for enterprises.

Automate data and metadata migration to Databricks

Cirata Data Migrator is a safe and reliable cloud migration solution that automates the migration of Hadoop data and Hive metadata to the cloud. Data Migrator provides three key Databricks-specific capabilities:
  • Automate large-scale transfer of data and metadata from existing data lakes to cloud storage and database targets, even while applications continue to change the data at the source, with support for Databricks Unity Catalog’s data operations, access control, accessibility, and search functionality.
  • Make Apache Hive metadata available directly in Databricks workspaces using live migration so that ongoing changes to source metadata are reflected immediately in the Databricks target.
  • Transform the on-premises data formats used in Hadoop and Hive to the Databricks-preferred Delta Lake format, so that users can take full advantage of the features that are unique to the combination of Databricks and Delta Lake.

Cirata Data Migrator for Hadoop automates the movement of data to the cloud

The following capabilities enable zero business disruption, reduced risk, and faster time-to-value.

Quick deployment and operation

Data Migrator is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impacting current operations, so users can begin moving data immediately.

Synchronization & replication

Existing datasets can be moved with a single pass through the source storage system, eliminating the CPU cycles and overhead associated with multiple scans, while also supporting continuous migration of any ongoing changes from source to target with zero disruption to current production systems.

Support for multiple sources and targets

Data Migrator supports HDFS distributions v2.6 and higher as source systems, as well as leading cloud service providers and select independent software vendors, such as Databricks and Snowflake, as target systems. See Data Migrator documentation for further details.

Transfer Hadoop data and Hive metadata

Data Migrator supports migration of HDFS data and Hive metadata to any public cloud or on-premises environment.

Data transfer at any scale

Datasets of any size, from terabytes to multiple petabytes, can be moved without affecting production environments. Horizontal scaling lets users increase migration capacity by configuring transfer agents to make full use of available bandwidth.
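
To illustrate the general idea behind horizontal scaling, the sketch below spreads a set of files across several workers so that each one carries a similar share of the total bytes. It is a generic greedy balancing example, not a description of how Data Migrator schedules work internally; the function name `assign_to_agents` and the file names are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_to_agents(file_sizes: Dict[str, int], agent_count: int) -> List[List[str]]:
    """Greedy balancing: hand the largest remaining file to the least-loaded agent.

    Hypothetical helper for illustration only; it does not reflect
    Data Migrator's actual scheduling logic.
    """
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")

    # Min-heap of (bytes assigned so far, agent index).
    heap = [(0, index) for index in range(agent_count)]
    assignments: List[List[str]] = [[] for _ in range(agent_count)]

    # Largest files first; ties broken by path for a deterministic result.
    ordered = sorted(file_sizes.items(), key=lambda item: (-item[1], item[0]))
    for path, size in ordered:
        load, index = heapq.heappop(heap)
        assignments[index].append(path)
        heapq.heappush(heap, (load + size, index))
    return assignments


if __name__ == "__main__":
    sizes = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
    for index, paths in enumerate(assign_to_agents(sizes, 2)):
        print(f"agent {index}: {paths}")
```

Adding an agent in this sketch simply adds another entry to the heap, so the same files are divided into smaller shares.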

Easy management

The Cirata browser-based user interface (UI) lets users manage the entire data and metadata migration from a single management console.

Programmatic interface

Migrations can also be managed through a comprehensive and intuitive command-line interface (CLI) or through the self-documenting REST API, which integrates the solution with other programs as needed.

Flexible configurations and precise control

Organizations can configure migration jobs to meet their specific needs, such as defining sources, targets, and which data to migrate. There are also advanced capabilities, such as migration prioritization, path mapping, and network bandwidth-management controls.
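
As a rough illustration of what path mapping means in practice, the sketch below rewrites a source path to a target path using the longest matching prefix rule. It shows the concept only and is not Data Migrator's configuration syntax or implementation; the function `map_path` and the example paths are hypothetical.

```python
from typing import Dict, Optional


def map_path(source_path: str, mappings: Dict[str, str]) -> str:
    """Rewrite source_path using the longest matching source prefix.

    Hypothetical helper for illustration only. Prefixes match on whole
    path components, so "/data" does not match "/database".
    """
    best_base: Optional[str] = None
    best_target = ""
    for prefix, target in mappings.items():
        base = prefix.rstrip("/")
        matches = source_path == base or source_path.startswith(base + "/")
        if matches and (best_base is None or len(base) > len(best_base)):
            best_base, best_target = base, target.rstrip("/")

    if best_base is None:
        return source_path  # no mapping applies; path is left unchanged
    return best_target + source_path[len(best_base):]


if __name__ == "__main__":
    rules = {
        "/data": "/mnt/lake/raw",
        "/data/warehouse": "/mnt/lake/warehouse",
    }
    print(map_path("/data/warehouse/sales/part-0", rules))
    print(map_path("/data/logs/app.log", rules))
```

Choosing the longest prefix lets a specific rule for one directory override a broader rule for its parent.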

Transfer verification

Data Migrator contains a data transfer verification function that scans both source and target environments to ensure data fidelity and validate the success of all data transfers. Results and reports are delivered through the UI or by email.
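
The kind of check described above can be pictured as comparing a listing of the source against a listing of the target. The sketch below is a minimal, generic version that assumes each listing maps a path to a (size, checksum) pair; the names `verify_transfer` and `VerificationReport` are hypothetical and do not describe Data Migrator's internal verification.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FileInfo = Tuple[int, str]  # (size in bytes, checksum); assumed listing format


@dataclass
class VerificationReport:
    missing_on_target: List[str] = field(default_factory=list)
    mismatched: List[str] = field(default_factory=list)
    extra_on_target: List[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # Extra target files are reported but do not fail the check here.
        return not (self.missing_on_target or self.mismatched)


def verify_transfer(source: Dict[str, FileInfo], target: Dict[str, FileInfo]) -> VerificationReport:
    """Compare two listings and report files that are missing or differ."""
    report = VerificationReport()
    for path, info in sorted(source.items()):
        if path not in target:
            report.missing_on_target.append(path)
        elif target[path] != info:
            report.mismatched.append(path)
    report.extra_on_target = sorted(set(target) - set(source))
    return report


if __name__ == "__main__":
    src = {"/t/a": (10, "aa"), "/t/b": (20, "bb"), "/t/c": (30, "cc")}
    dst = {"/t/a": (10, "aa"), "/t/b": (20, "xx"), "/t/d": (5, "dd")}
    print(verify_transfer(src, dst))
```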

Powerful metrics and real-time monitoring

Health and status metrics, including estimated completion times, keep users updated on migration jobs, while email notifications and real-time usage insights enable hands-off operation.

Modernize your data architecture with a unified analytics platform

Databricks provides a Unified Analytics Platform powered by Apache Spark for data science teams to collaborate with data engineering and lines of business to build data products. You can achieve faster time-to-value with Cirata Data Migrator by transforming your data during migration into Delta Lake format, accelerating the creation of analytic workflows in Databricks that go from ETL and interactive exploration to production.

Data Matters Podcast – Databricks + Cirata

“As a long-standing partner, Cirata has helped many customers in their legacy Hadoop to Databricks migrations. Now, the seamless integration of Cirata Data Migrator with Unity Catalog enables enterprises to capitalize on our Data and AI capabilities to drive productivity and accelerate their business value.”
— Siva Abbaraju, Go-to-Market Leader, Migrations, Databricks.

Featured resources

White paper
The distributed coordination engine
Data sheet
Data Migration for Databricks