Case Study: MD Anderson Successfully Implements Tools to Migrate Clinical Research Data from MUMPS to Oracle

Roger A. Maduro

The University of Texas MD Anderson Cancer Center has successfully completed the initial stages of a major migration of its MUMPS-based clinical trials databases to Oracle, a critical stage in MD Anderson’s multiyear eResearch project to modernize the management of clinical trials. The MD Anderson Cancer Center chose CAV Systems' Evolve Suite to carry out the migration. The institutions have produced a case study outlining the background and results of this extraordinary project.

This is perhaps the largest migration from MUMPS to another database ever carried out in the healthcare industry. The migration covers more than 4,000 clinical studies that had been administered annually using a system developed in-house – the Protocol Data Management System (PDMS). The active clinical trials will be migrated to a modern clinical trials software package, eResearch. In addition, MD Anderson needs to ensure that historical data remains accessible indefinitely to meet regulatory and compliance requirements.

This migration has profound implications for the entire Health IT industry, as it conclusively demonstrates that users can retrieve both legacy and active data from legacy MUMPS-based systems, as well as from existing proprietary lock-in EHR systems. We have received permission from CAV Systems to reprint the entire case study. Roger A. Maduro, Publisher and Editor-in-Chief, Open Health News.

Case Study: MD Anderson MUMPS Migration

The Customer

Located in Houston, Texas, The University of Texas MD Anderson Cancer Center is a world leader in cancer research. In addition to treating patients using FDA-approved medications and procedures, the Center also conducts clinical trials to evaluate the effectiveness of new treatments. As one of the original three comprehensive cancer centers in the United States established by the National Cancer Act of 1971, MD Anderson has been ranked Number 1 in cancer care more than any other institution during the past decade.

The Historical Background

When the Center began to conduct clinical trials more than three decades ago, no off-the-shelf software products existed for managing clinical trials. The Center therefore developed its own system in-house – the Protocol Data Management System (PDMS) – using core technology that, at the time, was becoming well-established in the US healthcare IT sector for applications managing clinical patient data, namely MUMPS. PDMS proved itself over more than three decades of clinical trials, reaching the stage today where the Center processes more than 4,000 clinical studies annually.

With the major changes that occurred in the IT landscape in the intervening years and the availability of clinical research management solutions, MD Anderson took the strategic decision in 2007 to replace PDMS with a system specifically designed for today’s clinical trials environment as part of a major revamping of the organization’s clinical trials ecosystem.

The inherent nature of clinical trials is that they are conducted over long periods of time using a statistically significant number of patients who volunteer to participate in these trials with the recognition that their medical condition will be monitored and recorded regularly, possibly over many years, to determine both the effectiveness of the treatment and the possible side effects that may not show up until later in life. This results in very large data records of variable length.

The Strategic Choice

In May 2011, MD Anderson announced its choice of eResearch as the new generation solution for future clinical studies. With this key strategic decision made, two major tactical challenges had to be addressed.

The first challenge concerned active trials. While the data for new clinical trials would be entered directly into the eResearch system, the data for clinical trials that were initiated under PDMS and were still ongoing would require migration from the legacy system to the new system.

The second challenge concerned data belonging to clinical trials that had already been completed and were no longer considered active. In order to meet Federal and other regulatory and compliance requirements, the legacy data nevertheless had to be accessible for a period of time extending over several years.

Since a major objective of the transition to eResearch was to decommission the legacy system entirely, together with its related costs, the question arose of how to access the legacy PDMS data once the MUMPS platform was no longer available.

The Technical Challenges

The legacy platform was a VAX/VMS system with Digital Standard MUMPS (DSM) as the core application technology. With the strategic decision having already been taken to replace the MUMPS application with a solution that was not based on any MUMPS-related technology, the decision quickly followed that the legacy data would not remain in any MUMPS-related format.

Since MD Anderson was already using the Oracle database system throughout the organization for other applications and had Oracle skills in-house, the decision was taken to migrate the legacy data from DSM to Oracle. The technical challenges towards achieving this goal were not trivial. Three major steps had to be completed successfully:

  1. Data mapping
  2. Data migration
  3. Data validation.

Step #1: Data Mapping

The differences between the architecture and design of MUMPS database schemas and those of relational databases are significant even for databases of modest size and complexity. In the case of the MD Anderson system, the scope and size were measured in tens of thousands of data elements that would need to be defined and mapped. This is the key step in the data migration process without which steps #2 and #3 cannot be performed.
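To make the mapping problem concrete, the following is a minimal Python sketch, not the method used by the Evolve Suite. MUMPS stores data in sparse hierarchical arrays (globals) whose node values are often delimiter-separated strings, whereas a relational schema needs one table per hierarchy level, keyed by the subscripts above it. The global name `^STUDY`, its subscripts, the `^` delimiter, and every table and field name below are assumptions chosen purely for illustration.

```python
# Hypothetical MUMPS global, modelled as {subscript tuple: delimited string}.
# In MUMPS it might look like:  ^STUDY(101)="Protocol A^1998"
#                               ^STUDY(101,"PT",1)="Smith^F"
legacy_global = {
    (101,): "Protocol A^1998",
    (101, "PT", 1): "Smith^F",
    (101, "PT", 2): "Jones^M",
    (102,): "Protocol B^2003",
}

# Hypothetical mapping: one target table per hierarchy level.
# "literals" are fixed subscripts that identify the level (position -> value);
# the remaining subscripts become key columns.
MAPPING = {
    "STUDY": {
        "depth": 1, "literals": {},
        "keys": ["study_id"], "fields": ["title", "start_year"],
    },
    "STUDY_PATIENT": {
        "depth": 3, "literals": {1: "PT"},
        "keys": ["study_id", "patient_seq"], "fields": ["surname", "sex"],
    },
}


def flatten(global_data, mapping, delimiter="^"):
    """Turn hierarchical nodes into lists of row dicts, one list per table."""
    tables = {name: [] for name in mapping}
    for subscripts, value in global_data.items():
        for name, level in mapping.items():
            if len(subscripts) != level["depth"]:
                continue
            if any(subscripts[i] != lit for i, lit in level["literals"].items()):
                continue
            # Non-literal subscripts are the key columns of the row.
            key_values = [s for i, s in enumerate(subscripts)
                          if i not in level["literals"]]
            # Split the node value into fields, padding missing pieces.
            pieces = value.split(delimiter)
            pieces += [None] * (len(level["fields"]) - len(pieces))
            row = dict(zip(level["keys"], key_values))
            row.update(zip(level["fields"], pieces))
            tables[name].append(row)
            break
    return tables


tables = flatten(legacy_global, MAPPING)
for name, rows in tables.items():
    print(name, rows)
```

Even in this toy case, each hierarchy level needs its own mapping entry written by someone who knows what the subscripts and pieces mean. That definition effort, multiplied across tens of thousands of data elements, is what makes this step the critical one.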

Step #2: Data Migration

Provided Step #1 has been completed successfully, this step is relatively straightforward. However, it would likely be quite time-consuming due to the size of the legacy database.

Step #3: Data Validation

This step is by no means trivial. Even if Steps #1 and #2 execute flawlessly, the migrated data still needs to be validated against the original legacy data – record-by-record, field-by-field for the entire database.
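A record-by-record, field-by-field comparison can be sketched in a few lines of Python. This is an illustrative sketch only: the case study does not describe the actual validation utilities, and the row layout, field names, and `validate` function below are assumptions. It presumes both the legacy and the migrated data have already been exported as lists of dictionaries sharing the same key fields.

```python
def validate(legacy_rows, migrated_rows, key_fields):
    """Compare two row sets and return a list of mismatches.

    Each mismatch is (key, field, legacy_value, migrated_value).
    A field of None means the whole record is missing on one side.
    """
    def index(rows):
        return {tuple(r[k] for k in key_fields): r for r in rows}

    legacy, migrated = index(legacy_rows), index(migrated_rows)
    mismatches = []
    for key in sorted(legacy.keys() | migrated.keys()):
        old, new = legacy.get(key), migrated.get(key)
        if old is None or new is None:
            mismatches.append((key, None, old, new))
            continue
        for field in sorted(old.keys() | new.keys()):
            if old.get(field) != new.get(field):
                mismatches.append((key, field, old.get(field), new.get(field)))
    return mismatches


# Hypothetical sample data with one deliberately corrupted field.
legacy = [
    {"study_id": 101, "title": "Protocol A", "start_year": "1998"},
    {"study_id": 102, "title": "Protocol B", "start_year": "2003"},
]
migrated = [
    {"study_id": 101, "title": "Protocol A", "start_year": "1998"},
    {"study_id": 102, "title": "Protocol B", "start_year": "2030"},
]

for problem in validate(legacy, migrated, key_fields=["study_id"]):
    print(problem)
```

Comparing every field of every record, rather than row counts or checksums alone, pinpoints exactly which value differs. A real run over a database of this size would need to stream records rather than hold both sides in memory.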

The Options

The MD Anderson team assigned to the migration project considered several approaches to mapping and migrating the data from the legacy Digital Standard MUMPS (DSM) system on the VAX/VMS platform to Oracle on a Windows Server platform. The two major options considered were (a) a manual approach, and (b) a fully or partly automated approach using software tools.

The manual approach would entail engaging experienced database analysts to identify every data element in the PDMS DSM database and its relations to other data elements in order to define a relational schema that, to the extent possible, would be meaningful to an Oracle database administrator.

With over 60,000 individual data fields in the PDMS system, and with the inevitability that the data migration process – whether manual or automated – would require several iterations of the “mapping, migration, validation” cycle to get it right, it became clear early in the evaluation that the manual option would have to be viewed as “the last resort”, to be used only if no automated or semi-automated approach could be found.

The most natural place to look for automated data mapping and conversion tools is among the vendors of ETL packages – Extract, Transform and Load tools. While there is no shortage of such vendors, few – if any – can handle situations in which the source database is MUMPS. Following an extensive global search, the MD Anderson team identified CAV Systems Ltd as the only company with the tools and the expertise in both MUMPS and Oracle.

The Proof-of-Concept (PoC) Project

Recognizing that every legacy MUMPS system has its own idiosyncrasies and unique requirements, MD Anderson and CAV Systems jointly defined a Proof-of-Concept (PoC) pilot project to validate the claims and capabilities of the Evolve Suite [note: at that time, the names associated with the tool set were JUMPS (Java-from-MUMPS) and M2R_Replicator (MUMPS-to-Relational)].

Although MD Anderson’s full project required migrating only the MUMPS data – not the MUMPS programs – the PoC project encompassed migration of a major subset of the PDMS programs as well as test data. This requirement was included in order to evaluate the technical viability of migrating the entire PDMS system as a fallback position should a catastrophic failure of the legacy hardware platform occur before the new replacement system was up and running.

Following successful completion of the PoC project in June 2011, MD Anderson evaluated all options for achieving their corporate objectives and, in December 2011, selected CAV Systems Ltd as the vendor to provide the technology and expertise to migrate the entire legacy PDMS database to Oracle as an essential stage in the eResearch project.

The Full Migration Project

During the PoC project, the joint team of MD Anderson and CAV Systems professionals identified a number of issues and areas of improvement that would facilitate the implementation, usability and performance of the migration toolset in the context of the overall revamping of the MD Anderson clinical trials ecosystem.

From a technical standpoint, the most significant changes were PDMS-specific enhancements tailored to generate the optimal schema for the migrated Oracle database by analyzing and mapping all the PDMS/FileMan hierarchical data levels, migrating essential meta-data, including support for foreign keys, and comments for each table and column.

From a logistical standpoint, the most significant change related to the need for the legacy PDMS system and the eResearch system to coexist during a transition period whose duration was likely to be influenced by progress in other areas of the eResearch project.

With this in mind, the decision was taken to introduce an intermediate platform – the staging platform – between the legacy PDMS platform and the Oracle database platform.

The staging platform was configured using the InterSystems Caché implementation of MUMPS instead of DSM. This was done for a number of reasons related to platform reliability, availability and performance of the data migration process itself. In this configuration, the entire PDMS database is exported from the legacy VAX/VMS platform and imported to the Caché database on the staging platform.

The data mapping and data migration tasks are then performed in their entirety on the staging platform thereby creating two outputs: (1) an XML file containing the mapping definitions in a form suitable for Oracle database definition and creation; (2) a data file in a format suitable for importing into the newly defined Oracle database.
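The case study does not publish the format of the XML mapping file, so the Python sketch below invents a small one purely to illustrate how mapping definitions, including the foreign keys and table and column comments mentioned earlier, could drive Oracle schema creation. The element names, attribute names, and the `generate_ddl` helper are all hypothetical. The generated statements use standard Oracle `CREATE TABLE` and `COMMENT ON` syntax.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file; the real Evolve Suite format is not documented here.
MAPPING_XML = """
<mapping>
  <table name="STUDY" comment="One row per clinical study">
    <column name="STUDY_ID" type="NUMBER" key="true" comment="Legacy study number"/>
    <column name="TITLE" type="VARCHAR2(200)" comment="Protocol title"/>
  </table>
  <table name="STUDY_PATIENT" parent="STUDY" comment="Patients enrolled in a study">
    <column name="STUDY_ID" type="NUMBER" key="true" comment="Owning study"/>
    <column name="PATIENT_SEQ" type="NUMBER" key="true" comment="Sequence within study"/>
  </table>
</mapping>
"""


def quote(text):
    """Quote a string literal for SQL, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def generate_ddl(xml_text):
    """Return a list of Oracle DDL statements derived from the mapping XML."""
    root = ET.fromstring(xml_text)
    # Key columns per table, needed for primary and foreign key constraints.
    keys = {
        t.get("name"): [c.get("name") for c in t.findall("column")
                        if c.get("key") == "true"]
        for t in root.findall("table")
    }
    statements = []
    for table in root.findall("table"):
        name = table.get("name")
        lines = [f"  {c.get('name')} {c.get('type')}"
                 for c in table.findall("column")]
        lines.append(
            f"  CONSTRAINT {name}_PK PRIMARY KEY ({', '.join(keys[name])})")
        parent = table.get("parent")
        if parent:
            # Assumes the child repeats the parent's key columns by name.
            cols = ", ".join(keys[parent])
            lines.append(f"  CONSTRAINT {name}_FK FOREIGN KEY ({cols}) "
                         f"REFERENCES {parent} ({cols})")
        statements.append(f"CREATE TABLE {name} (\n" + ",\n".join(lines) + "\n)")
        if table.get("comment"):
            statements.append(
                f"COMMENT ON TABLE {name} IS {quote(table.get('comment'))}")
        for c in table.findall("column"):
            if c.get("comment"):
                statements.append(
                    f"COMMENT ON COLUMN {name}.{c.get('name')} "
                    f"IS {quote(c.get('comment'))}")
    return statements


for statement in generate_ddl(MAPPING_XML):
    print(statement + ";")
```

Keeping the mapping in a separate, declarative file means the schema can be regenerated on every run of the mapping and migration process without hand-editing DDL.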

In February 2012, the CAV and MD Anderson teams jointly installed the software tools on the staging platform and performed a complete run of the entire mapping and migration process with the full PDMS legacy database.

In addition to validating the toolset, this run served as the vehicle for training the MD Anderson team to perform as many runs as may be necessary during the transition from the old system to the new one. Following each run, the MD Anderson team validated the migrated Oracle data against the legacy MUMPS data using third-party utilities and custom utilities developed in-house.

The Benefits

The reputation of an organization engaged in the treatment of patients and clinical trials research is of paramount importance. It cannot be measured in purely monetary terms. It impacts the professional reputation not only of the institution itself but also of the professional staff – the professors, doctors, nurses, other clinical staff, and the IT staff. A key factor in maintaining reputation is the reliability of the IT systems and the integrity of the data – patient data that may extend over many years. In choosing CAV Systems Ltd and the Evolve Suite, MD Anderson secured the integrity of its clinical trials database. The Evolve Suite delivers an approach to the mapping and migration of very large legacy MUMPS databases to relational database replicas that is easy to comprehend, straightforward to implement, and simple to learn.