Data Migrator for Azure is a native Azure service that enables users to migrate petabyte-scale Hadoop data and Hive metadata to the Azure cloud with zero application downtime and zero risk of data loss, even while the source data is under active change.
With Data Migrator for Azure, you can deploy and manage your data lake migrations using the same Azure management experience you enjoy today through the Azure portal and the Azure command-line interface (CLI).
Deep integration with Azure resources enables Data Migrator for Azure to be deployed at the same time as other native Azure services and with an equivalent user experience.
Data Migrator for Azure leverages Azure features, such as role-based access control, Active Directory, Azure Policy enforcement, and Activity Monitor log integration.
Customers are billed through Azure, eliminating the need for a new vendor contract or additional vendor approvals.
The Data Migrator for Azure resource can be created directly from the Azure portal. The Data Migrator for Azure service is installed on an edge node of your Hadoop cluster. Deployment can be performed in minutes without impact to current operations, so users can begin moving data immediately.
Data Migrator for Azure supports migration of HDFS data and Hive metadata to Azure Data Lake Storage Gen2. Hive metadata can optionally be further transformed to an Azure SQL Database metastore, Delta Lake format on Azure Databricks, or Snowflake. See the Data Migrator for Azure documentation for details.
Selected datasets can be moved with a single pass through the source storage system, eliminating overhead associated with repeated scans, while supporting continuous migration of any ongoing changes from source to target, with zero disruption to current production systems.
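The idea of combining one scan with ongoing changes can be illustrated with a small, simplified model. The sketch below is not the service's implementation: the function name build_work_queue, the event kinds "write" and "delete", and the rule that a later event supersedes earlier pending work for the same path are all assumptions made for illustration.

```python
from collections import OrderedDict


def build_work_queue(scan_paths, change_events):
    """Combine one scan of the source with change events observed during it.

    scan_paths: paths found by a single pass over the source (hypothetical input).
    change_events: (kind, path) tuples, where kind is "write" or "delete".
    Returns a list of (action, path) with at most one pending action per path,
    so the source never has to be rescanned to pick up changes.
    """
    pending = OrderedDict()
    for path in scan_paths:
        pending[path] = "copy"
    for kind, path in change_events:
        # A later event replaces any earlier pending work for the same path.
        pending.pop(path, None)
        pending[path] = "copy" if kind == "write" else "delete"
    return [(action, path) for path, action in pending.items()]


queue = build_work_queue(
    ["/a", "/b"],
    [("write", "/b"), ("delete", "/a"), ("write", "/c")],
)
```

In this model each path appears once in the queue regardless of how many times it changed, which is the property that removes the need for repeated scans.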
Datasets of any size, from terabytes to multiple petabytes, can be moved without affecting production environments. Horizontal scaling capabilities allow users to scale their migration capacity by configuring transfer agents to make full use of available bandwidth.
Users can manage the full data migration directly from the Azure portal. Additionally, Data Migrator for Azure can be configured and operated from the Azure CLI.
Organizations can configure migration jobs to meet their specific needs, such as defining sources, targets, and which data to migrate. There are also advanced capabilities, such as migration prioritization, path mapping, and network bandwidth-management controls.
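To make the configuration options above concrete, here is a minimal sketch of how a migration definition with path mapping, exclusions, priority, and a bandwidth cap might be modeled. The class MigrationSpec and all of its field names are hypothetical; they do not reflect the service's actual configuration schema, portal fields, or CLI parameters.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class MigrationSpec:
    """Hypothetical description of one migration (not the product's schema)."""
    source_root: str                                      # e.g. an HDFS directory
    target_root: str                                      # e.g. a path in the target storage
    exclusions: List[str] = field(default_factory=list)   # glob patterns to skip
    priority: int = 0                                     # higher values run first
    max_bandwidth_mbps: Optional[int] = None              # None means no cap

    def target_path(self, source_path: str) -> Optional[str]:
        """Map a source path to its target path, or None if it is out of scope."""
        root = self.source_root.rstrip("/")
        if source_path != root and not source_path.startswith(root + "/"):
            return None  # outside the migrated directory tree
        if any(fnmatchcase(source_path, p) for p in self.exclusions):
            return None  # matched an exclusion pattern
        return self.target_root.rstrip("/") + source_path[len(root):]


spec = MigrationSpec("/data/sales", "/migrated/sales", exclusions=["*/_tmp/*"])
mapped = spec.target_path("/data/sales/2024/part-0")
```

Here mapped is "/migrated/sales/2024/part-0", while a path under "/data/sales/_tmp/" is excluded and a path under "/data/salesforce" is treated as out of scope because it only shares a prefix with the source root.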
Data Migrator for Azure enables hands-off operations by reporting on the migration as it runs: health and status metrics, estimates of migration completion, files transferred over time, migration path exclusions, failed transfers, and real-time insights into data usage.
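As an illustration of how a completion estimate can be derived from transfer metrics, the sketch below uses the average throughput observed so far. This is an assumption for illustration only: the function name and inputs are hypothetical, and the service may compute its estimates differently.

```python
def estimate_remaining_seconds(total_bytes, transferred_bytes, elapsed_seconds):
    """Estimate time to completion from average throughput so far.

    Returns None when no throughput can be computed yet.
    """
    if elapsed_seconds <= 0 or transferred_bytes <= 0:
        return None
    rate = transferred_bytes / elapsed_seconds        # bytes per second
    remaining = max(total_bytes - transferred_bytes, 0)
    return remaining / rate


eta = estimate_remaining_seconds(
    total_bytes=1000, transferred_bytes=250, elapsed_seconds=50
)
```

With 250 of 1000 bytes moved in 50 seconds, the average rate is 5 bytes per second, so eta is 150.0 seconds. When the source is still changing, the total grows over time, so an estimate like this needs to be recomputed as new data arrives.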