<-- Back to full color view
Successful Data Migration: A Step-by-Step Process
Originally published mayo 11, 2009
More than 2,500 years ago, Greek philosophers, such as Heraclitus of Ephesus, identified that “Everything flows, nothing stands still.” More recently in the 1960s, the cofounder of Intel,Gordon Moore, identified the rapid advances in the computer chip, with the doubling of transistor numbers on a chip the same size every two years (Moore’s law). Even more recently, RaymondKurzweil, a scientist and futurist, pronounced: “An analysis of the history of technology shows that technological change is exponential, contrary to the common-sense 'intuitive linear' view.So we won't experience 100 years of progress in the 21st century – it will be more like 20,000 years of progress.”
What then are the consequences for organizations in this dynamic and ever-changing world? Well, to survive, they will have to redefine themselves and adapt at an ever increasing pace. To do this theywill be updating their technology on a regular basis. Underpinning these technological updates will be a data migration.
A data migration is a crucial operation within any enterprise and failure can be catastrophic. The organisation’s ability to move forward with new technology is stopped dead in its tracks andsignificant expenditures have been made with little or no value gained.
Data migration projects have historically had a tendency to fail. According to Bloor Research, as many as 60% do not succeed.1 What are the characteristics of the 40% who successfullycomplete a data migration project and what process have they followed?
First, keep in mind the seven “R’s” of data migration. A migration solution must have the following essential characteristics:
Robust and Resilient: manage all aspects of the data extraction, transformation, cleansing, validation and loading into the target – and manage high volumes of data. Also, be able to adjust and to issues in the working environment such as potential problems with source and target connections, disk space and memory problems.
Rapid: execute efficiently and take advantage of existing source or target facilities to enable rapid processing.
Reporting and Reconciliation: provide progress indicators during migration and reconcile the completed process.
Recoverable: recover from the point of failure when necessary.
Reusable: ability to reuse components of the migration in other projects, including transformation functions, error handling and data cleansing routines.
With those characteristics in mind, a tools-based approach to a data migration will provide a lot of the “Rs” as standard “out-of-the-box” functionality to support the datamigration. As well as the tool-based approach, there is a seven-step process to execute a successful data migration:
- Source system exploration
The first phase of a data migration project is to identify and explore the source systems. The most appropriate route for identification is to group data, customer names, addresses and product descriptions based on the target model.
Although the source systems may contain thousands of fields, some might be duplicates or not be applicable to the target system. In this stage, it is critical to identify which data is required and where it is, as well as what data is redundant and not required for the migration.
Conversely, if the initially identified sources do not contain all of the data required for the target model, a gap is identified. In this case, you may have to consolidate data from multiple sources to create a record with the correct set of data to fulfill the requirements of the target.
Using multiple data sources allows you to add another element of data validation and a level of confidence in your data.
At the end of this phase, you will have identified the source data that will populate the target model. You will also have identified any gaps in the data and, if possible, included extra sources to compensate. Optimally, you will have broken down the data into categories that enable you to work on manageable and possible parallel tasks.
- Data assessment
The next logical phase is to assess the quality of this source data. If the new system fails due to data inconsistencies, incorrect or duplicate data, there is very limited value in migrating data to the target system. To assess the data, we recommend profiling the data.
Data profiling is the process of systematically scanning and analyzing the contents of all the columns in tables of interest. Profiling identifies data defects at the table and column level. Data profiling is integral to the process of evaluating the conformity of the data and ensuring compliance to the requirements of the target system.
The profiling functions include examining the actual record value and its metadata information. Too many data migration initiatives begin without first examining the quality levels of the source data. By including data profiling early in the migration process, the risks of project overruns, delays and potentially complete failures are reduced.
Through the use of data profiling, you can:
- Immediately identify whether the data will fit the business purpose.
- Accurately plan the integration strategy by identifying data anomalies up front.
- Successfully integrate the source data using an automated data quality process.
The output of this phase of the project is a thorough understanding of the data quality in the source systems, identification of data issues and a list of defined rules to be built to correct them. You will have identified and defined your data quality rules and mappings from the sources to the target model. At this point you will also have a good idea, at a high level, of the design of the integration processes.
- Migration design
In the migration design phase, the main tasks will be to define the technical architecture and design of the migration processes. In addition, you will define the testing processes and how to transition them to the production system. You will also determine whether there will be a parallel run, a zero-downtime migration, or whether you will be expecting to complete the migration and simply “decommission” the old system.
In this phase, it’s important to put your plans for the next four steps down on paper. Include timelines, technical details and any other concerns or approval requirements so that the entire project will be documented.
- Migration build
Be careful about using a “just enough” development approach with your data migration, simply because the migration will be executed only once and there will be limited reuse of the code. These assumptions explain why data migrations are prone to failure.
A typical sequence to follow when developing a migration is to subset the data and test one category of data at a time (e.g., product or customer). This approach aligns with the first stage of the project which through data exploration categorises the data. In the case of larger projects, you can develop and test each category in parallel. Testing the migration solution is usually an iterative approach. Start by checking the components individually in small subsets to ensure the mappings, transformations and data cleansing routines are working. Then, increase the data volumes and eventually link all of the components together into a single migration job.
Output from this phase results in a fully tested data migration process that is scalable, reliable and can deliver the migration within the allocated time.
After comprehensive testing, the time comes to run the migration. In the majority of cases, the source systems are shut down while the migration executes. To minimize impact, this is likely to occur over a weekend or public holiday. In some cases, where the source applications are required to run 24/7, 365 days a year, a zero-downtime migration approach may be needed. This requires a tool to provide the initial load processes, with additional data synchronization technology to capture any changes that took place during the initial load; then, it requires synchronizing the source and target data after the initial load finishes.
At some point after the data has been migrated, decide when to move to the new system and, where appropriate, retire the old system. During the execution phase, audit trails and logs will be created to ensure that all data has been correctly migrated and, when appropriate, that the correct synchronization has been achieved. Finally, after reviewing the audit trails and logs, you will be prepared to make the decision to transition users to the new system. Note that although audit trails and logs are invaluable, it also may be worth profiling the current source and target systems to ensure synchronization is correct.
In the case of a zero-downtime migration, this becomes more complex because updates on the systems may be occurring while a synchronization check is occurring. As a result, there may always be some delay in the synchronization. In this case, tools can verify synchronization by taking into account the lag between source system changes and data replication into the target system.
As part of the design process, a system retirement policy will be created to address the old system. There will also be ongoing data quality enhancements. Remember that because not all source systems will be retired, data quality issues may be identified and will need rectifying in several of the source systems. You will also need to manage ongoing improvements and monitor the data quality of the new system.
Before you know it, there will be the inevitable next migration project. At this point, it is well worth identifying existing components that can be reused, as well as possible improvements thatcan be applied to the next migration solution. End Note:
- Philip Howard, “Data Migration: A White Paper by Bloor Research” (Northamptonshire, UK: Bloor Research, September 2007).
SOURCE: Successful Data Migration: A Step-by-Step Process