Database   Migration   Metrics     
       
          Project planning metrics for database migration  

 

 

 

Introduction

 

When manually migrating a source database to a target, the scope of the work required is clear, as every source object will have to be converted to the target. Based on a figure for the average amount of time needed to convert one object, project planning is relatively straightforward and relatively low risk. If we underestimate the average amount of time needed to convert an object by, say, 5%, then we underestimate our budget by approximately 5%.

 

As manually converting every object in a source database is very expensive, the key objective when migrating a database is to automate the process as far as possible by using a migration tool. We might use the somewhat limited conversion facilities that are available in most SQL editors, we might purchase a dedicated database migration tool that provides a much higher conversion rate, or we might create custom software to parse and convert some categories of source database objects.

 

When we use a migration tool we decrease the cost but increase the project risk. With a high level of automation we dramatically decrease the cost, but also dramatically increase the project risk – automated database migration is not dissimilar to trading in highly leveraged assets. Even a slight miscalculation of how effective the tool will be can have a dramatic impact on project budgets and timescales. For example, if we misestimate what the tool can convert by about 5%, say, 97% when in fact it can only convert 92%, then we underestimate our budget by about 267%.

 

In this article we’ll examine a collection of metrics that can be used to characterize the database migration process. If we calibrate these metrics using a particular tool and a representative sample of source database objects, then we can obtain a useful estimate of how much the tool will actually save us in terms of development time and cost, and we will also have a useful framework for quantifying how much work has been done, and how much more work remains to be done, at any stage during the project.

 

 

Migration Phases

 

Our source database will consist of objects that belong to various object types. For example, most databases contain tables, views, primary-key constraints, unique-key constraints, foreign-key constraints, check constraints, sequences, functions, procedures, and triggers. Figure 1 shows the various ways in which we can characterize these objects at various stages throughout the migration process:

            *
      Figure 1

 

We can divide the migration process into three phases:

 

*  Object Classification

*  Migration Tool Execution

*  Target Testing

 

Within each phase the objects are divided into categories. The symbol in brackets that follows each object name represents the number of objects within that category.

 

 

Object Classification

 

We start with a collection of Database Objects (total number “DN”), corresponding to either the entire database or to some representative sample of database objects. During the first phase we classify these objects according to whether or not they can be migrated by the tool.

 

Unsupported Object Types

 

Typically, a migration tool will only support the migration of certain types of objects; for example, it might support the migration of tables, but not the migration of user-defined operators.

 

Unsupported Objects

 

Let the number of Unsupported Objects, the total number of database objects that belong to all the unsupported object types, equal “CU”.

 

Unsupported Objects Ratio

 

The Unsupported Objects Ratio (CU/DN) provides an initial estimate for the fraction of all objects that will have to be migrated by manual means.

 

Supported Object Types

 

The Supported Object Types represent all those types of objects that can, in principle, be migrated by the migration tool.

 

Partially Supported Objects

 

It’s easy to assume that if a migration tool supports a particular object type, say tables, then it will support the migration of all objects that belong to this type, say all tables. However, it’s common for an object belonging to a Supported Object Type, say a particular table, to reference an object that belongs to an Unsupported Object Type, say, a particular user-defined operator (see the dotted line in the diagram). These objects are Partially Supported Objects (total number “CP”). They are partially supported in that the migration tool may perform a partial conversion by ignoring the reference altogether, or it may perform a partial conversion after we have commented out the reference in the source code.

 

Partially Supported Objects Ratio

 

The Partially Supported Objects Ratio (CP/DN) provides an initial estimate for the fraction of all objects that will require some manual intervention in order to proceed further with the migration process.

 

Supported Objects

 

Supported Objects (total number “CS”) represent those objects that the migration should in principle be able to migrate automatically.

 

Supported Objects Ratio

 

The Supported Objects Ratio (CS/DN) provides an initial estimate for the fraction of objects that will be migrated automatically.

 

Conclusion

 

The migration tool should be able to parse the source database data dictionary and derive values for these three key object classification ratios.

 

Note that since:

 

*  CS + CP + CU = DN

 

we have:

 

*  Supported-Objects-Ratio +
Partially-Supported-Objects-Ratio +
Unsupported-Objects-Ratio = 1

 

 

Migration Tool Execution

 

The second phase in data migration occurs when we run the migration tool and examine the reports that it produces. In this phase we are able to break down the number of Supported Objects into further categories and gain a better understanding of the effort that will be involved in migrating all the database objects.

 

Unparsable Objects

 

Migration tools are usually designed to handle a subset of the language constructs and syntax available for use in defining source database objects. Hence, the migration tool will usually be unable to parse and convert all Supported Objects. Unparsable Objects (total number “MU”) are those Supported Objects that the migration tool fails to convert due to parsing errors.

 

Source Parsing Error Rate

 

The Source Parsing Error Rate (MU/CS) provides an initial estimate for the fraction of all Supported Objects that cannot be parsed and which will require some manual intervention in order to proceed further with the migration process.

 

Unimportable Objects

 

After the migration tool has successfully converted the source code of an object into target code, it will run that code against the target. However, in some cases the migration tool will have made a conversion error and the target database will be unable to successfully parse the target object. Unimportable Objects (total number “MT”) are those Supported Objects that the migration tool can convert without recognizing that it has made an error but which the target database rejects as containing invalid syntax.

 

Target Import Error Rate

 

The Target Import Error Rate (MT/CS) provides an initial estimate for the fraction of all Supported Objects that the migration tool considers it has converted correctly but which cannot be imported into the target, and which will require some manual intervention in order to proceed further with the migration process.

 

Provisionally Valid Objects

 

After the migration tool has successfully converted the source code of an object into target code, it will run the code against the target, and in many cases the code will run without generating any errors. Provisionally Valid Objects (total number “MP”) are those Supported Objects that the migration tool can convert and which the target database accepts as containing valid syntax.

 

Speculative Objects

 

A migration tool may be unsure as to whether or not it has correctly converted an object, and may flag up that object as requiring investigation. Speculative Objects (total number “MS”) are those Supported Objects that the migration tool has converted and which it suspects may not correctly implement the functionality associated with the corresponding source objects.

 

Speculative Conversion Ratio

 

The Speculative Conversion Ratio (MS/CS) provides an initial estimate for the fraction of all Supported Objects that appear to be valid but which require further investigation.

 

Invalid Speculative Objects

 

Once we examine the Speculative Objects and run the converted code against the target, we can divide them into two categories. The first is Invalid Speculative Objects (total number “MI”), representing those Speculative Objects that have been found to contain errors. The second is Provisionally Valid Objects, and these objects can be added to the pool of other Provisionally Valid Objects that has already been created by the migration tool (these objects increase the value of “MP” above that initially calculated above).

 

Speculative Conversion Error Rate

 

The Speculative Conversion Error Rate (MI/CS) provides an initial estimate for the fraction of all questionable Supported Objects that are deemed invalid on further investigation.

 

Provisionally Valid Objects Ratio

 

The Provisionally Valid Objects Ratio (MP/CS) provides an initial estimate for the fraction of all Supported Objects that are valid (the value of “MP” includes the contribution that comes from the analysis of speculative conversions).

 

Type I Error Rate

 

The Type I Error Rate, (MU+MT+MI)/CS, provides an initial estimate for the fraction of all Supported Objects that contain errors and which will require some manual intervention in order to proceed further with the migration process.

 

Conclusion

 

The migration tool should be able to provide values for the Source Parsing Error Rate, the Target Import Error Rate, the Speculative Conversion Ratio, and a “first-cut” value for the Provisionally Valid Objects Ratio.

 

Note that since:

 

*  MP + MU + MT + MI = CS

 

we have:

 

*  Provisionally-Valid-Objects-Ratio +
Type-I-Error-Rate = 1

 

and we also have:

 

*  Source-Parsing-Error-Rate +
Target-Import-Error-Rate +
Speculative-Conversion-Error-Rate =
Type-I-Error-Rate

 

 

Target Testing

 

The third phase in database migration occurs when we run a suite of test cases against the converted objects that have been imported into the target. While these objects are syntactically valid, they may still not be semantically valid.

 

Semantically Invalid Objects

 

Semantically Invalid Objects (total number “TS”) represent those syntactically valid target objects that still contain semantic errors.

 

Type II Error Rate

 

The Type II Error Rate (TS/CS) provides an initial estimate for the fraction of all Supported Objects that have been converted and run against the target without error, but which are still not functionally equivalent to the corresponding source objects. They will require some manual intervention in order to proceed further with the migration process.

 

Migrated Objects

 

Migrated Objects (total number “TM”) represent target objects that are valid both syntactically and semantically and which correctly implement the corresponding source objects.

 

Supported Objects Migration Rate

 

The Supported Objects Migration Rate (TM/CS) provides an initial estimate for the fraction of all Supported Objects that are valid both syntactically and semantically and which correctly implement the corresponding source objects.

 

Migration Rate

 

The Migration Rate (TM/DN) provides an initial estimate for the fraction of all database objects that are valid both syntactically and semantically and which correctly implement the functionality present in the corresponding source objects.

 

Conclusion

 

Note that:

 

*  Supported-Objects-Ratio *
Supported-Objects-Migration-Rate =
Migration-Rate

 

and:

 

*  Supported-Objects-Migration-Rate +
Type-I-Error-Rate +
Type-II-Error-Rate = 1

 

and:

 

*  Migration-Rate =
Supported-Objects-Ratio *
( 1 - Type-I-Error-Rate - Type-II-Error-Rate )

 

So, if we know what fraction of all database objects are supported, and the error rates that occur during migration tool execution and target testing, then we can calculate the Migration Rate.

 

 

Object Costing Groups

 

As a result of our analysis we have ended up with seven object categories, one representing objects that have been successfully migrated, and six that represent migrations that have failed at various stages during the process for different reasons, or migrations that have never been attempted. For the purpose of developing a costing model for a project these categories can be grouped together as follows:

 

*  Migrated Objects (TM)

 

*  Partially Migrated Objects (PM):
   Partially Supported Objects (CP)
   Unparsable Objects (MU)
   Invalid Speculative Objects (MI)
   Unimportable Objects (MT)
   Semantically Invalid Objects (TS)

 

*  Unsupported Objects (CU)

 

In the case of the Migrated Objects (“green” in the diagram) all the migration work has been done automatically by the migration tool, so the cost of migration will be close to zero.

 

In the case of the Unsupported Objects (“red” in the diagram) the migration tool has provided no assistance, so these objects will have to be migrated manually.

 

The Partially Migrated Objects (“blue” in the diagram) represent an intermediate case. They all contain at least one construct that prevents them from being migrated by the tool, and they may well contain multiple such constructs. They have been listed above according to the likelihood that they will contain multiple constructs that impede an automated migration (Partially Supported Objects are likely to be the most difficult and Semantically Invalid Objects are likely to be the least). For this group of objects it is likely that the migration tool will do at least some portion of the migration for us, so that we will only have to manually migrate part of the code. So the cost of migration will be less, often much less, than that required for a full manual migration.

 

Partially Migrated Objects Ratio

 

The Partially Migrated Objects Ratio equals the ratio of the number of Partially Migrated Objects (PM) to the total number of Database Objects (DN).

 

 

Object Quotients

 

In order to develop a costing model, we need some quotients that measure how efficient the migration tool is at converting objects that belong to different object groups. The fewer the number of Unsupported Objects and Partially Migrated Objects relative to the number of Migrated Objects then the better the migration tool and the lower the migration cost.

 

Unsupported Object Quotient

 

The Unsupported Object Quotient (UOQ) equals the ratio of the number of Unsupported Objects (CU) to the number of Migrated Objects (TM).

 

Partially Migrated Object Quotient

 

The Partially Migrated Object Quotient (POQ) equals the ratio of the number of Partially Migrated Objects (PM) to the number of Migrated Objects (TM).

 

 

Cost Quotients

 

A common mistake when evaluating migration tools is to mentally translate the Migration Rate into a prospective cost saving – if the migration tool can migrate 90% of the objects automatically, then the cost saving will be 90%! This analysis would only be valid if the costs of manually migrating those objects that the migration tool migrates and those that it doesn’t migrate were the same. But this is far from being the case, as migration tools tend to convert the simplest objects, those would require the least effort to convert manually.

 

Another trap when comparing migration tools is the problem of comparing “apples with oranges” when it comes to what constitutes an object. Suppose, for example, that we have to migrate 50 tables and 50 functions and the migration tool migrates all the tables and none of the functions. Then if a table is classified as a single object the Migration Rate is a 50%. But if each table contains on average 10 constraints, and we classify constraints are separate objects, then we have to migrate 50 tables, 500 constraints, and 50 functions, and our Migration Rate has increased from a miserly 50% to an impressive 92%.

 

Unsupported Cost Quotient

 

The Unsupported Cost Quotient (UCQ) equals the ratio of the average cost of manually migrating an Unsupported Object to the average cost of manually migrating an object that can be automatically migrated.

 

Partially Migrated Cost Quotient

 

The Partially Migrated Cost Quotient (PCQ) equals the ratio of the average cost of manually migrating a Partially Migrated Object to the average cost of manually migrating an object that can be automatically migrated.

 

 

Quality of Partial Conversions

 

Yet another factor that is easily overlooked when evaluating a migration tool is the quality of partial conversions. For example, which migration tool would you prefer, one with a Migration Rate of 60% or one with a Migration Rate of 80%? Let’s suppose that for the second tool the 20% of objects that are not migrated are Unsupported Objects, while for the first tool the 40% of objects that are not migrated are Partially Migrated Objects, and that on average the migration tool converts 80% of the code contained in these objects. Then the Effective Migration Rate for the second tool is 80% while that for the first tool equals 92%. Which migration tool would you prefer now?

 

Partial Conversion Efficiency

 

The Partial Conversion Efficiency (PCE) is the fraction of the code contained in a Partially Migrated Object that is migrated automatically by the migration tool.

 

Hence, the cost of migrating a Partially Migrated Object is about (1-PCE) times the cost of migrating the same object manually.

 

Effective Migration Rate

 

The Effective Migration Rate equals the Migration Rate Plus the product of the Partially Migrated Object Ratio and the Partial Conversion Efficiency, or TM/DN + PCE*PM/DN.

 

 

Percentage Cost Savings

 

We are now in a position to estimate the cost savings offered by a migration tool taking all the relevant factors into account.

 

Cost with Migration Tool

 

The cost of migrating the objects when using the migration tool equals:

 

*  Number of Unsupported Objects *
Cost-of-Manually-Migrating-an-Unsupported Object
   +
Number-of-Partially-Migrated-Objects *
Cost-of-Manually-Migrating-a-Partially-Migrated-Object *
( 1 - Partial-Conversion-Efficiency )

 

When we multiply and divide by the product of the Number of Migrated Objects and the Cost of Manually Migrating a Migrated Object the cost (CT) becomes:

 

*  Number-of-Migrated-Objects *
Cost-of-Manually-Migrating-a-Migrated-Object *
( UOQ * UCQ + POQ * PCQ * ( 1 - PCE ) )

 

Cost without Migration Tool

 

The equivalent cost of a manual conversion if the migration tool was not used would be:

 

*   Number of Unsupported Objects *
Cost-of-Manually-Migrating-an-Unsupported-Object
   +
Number-of-Partially-Migrated-Objects *
Cost-of-Manually-Migrating-a-Partially-Migrated-Object
   +
Number-of-Migrated-Objects *
Cost-of-Manually-Migrating-a-Migrated-Object

 

When we multiply and divide by the product of the Number of Migrated Objects and the Cost of Manually Migrating a Migrated Object the cost (CM) becomes:

 

*  Number-of-Migrated-Objects *
Cost-of-Manually-Migrating-a-Migrated-Object *
( UOQ * UCQ + POQ * PCQ + 1 )

 

Percentage Saving

 

Hence, the percentage saving obtained by using the migration tool equals:

 

*  100 * ( CM – CT ) / CM

 

or:

 

*  100 * ( 1 – ( UOQ * UCQ + POQ * PCQ * ( 1 - PCE ) ) /
                 (UOQ * UCQ + POQ * PCQ + 1 ) )

 

We can estimate all these factors by taking a representative sample of database objects and by using the migration tool to migrate them.

 

For organizations that undertake many migrations, it can be useful to break these metrics down by object type, as different databases will have very different object type mixes and the costs of migration will vary very widely. For example, a business intelligence database that contains little by way of procedural code often has a Migration Rate very close to 100%, whereas a transactional database with many complex business rules implemented by way of triggers, functions, and procedures, will have a much lower Migration Rate, and the cost of migration will be correspondingly higher.

 

 

Summary

 

The migration and costing metrics that we’ve derived are summarized in the following tables.

 

The object classification metrics (x + y + z = 100%) associated with the object classification phase are listed in the following table:

 

Object Classification MetricsValue
Supported Objects Ratiox%
Partially Supported Objects Ratioy%
Unsupported Objects Ratioz%

 

The speculative conversion metric associated with the migration tool execution phase is listed in the following table:

 

Speculative Conversion MetricValue
Speculative Conversion Ratiox%

 

The object conversion metrics (a + b = 100% and b = x + y + z) associated with the migration tool execution phase are listed in the following table:

 

Object Conversion MetricsValue
Provisionally Valid Objects Ratioa%
Type I Error Rate:
   • Source Parsing Error Rate (x%)
   • Speculative Conversion Error Rate (y%)
   • Target Import Error Rate (z%)
b%

 

The target testing metrics (x + y = Provisionally Valid Objects Ratio, a%) associated with the target testing phase are listed in the following table:

 

Target Testing MetricsValue
Supported Objects Migration Ratex%
Type II Error Ratey%

 

The raw migration rate metrics (x + y + z = 100%) are listed in the following table:

 

Raw Migration Rate MetricsValue
Migration Ratex%
Partially Migrated Objects Ratioy%
Unsupported Objects Ratioz%

 

The effective migration rate metrics are listed in the following table:

 

Effective Migration Rate MetricsValue
Partial Conversion Efficiencyx
Effective Migration Ratey%

 

The object conversion costing metrics are listed in the following table:

 

Object Conversion Costing MetricsValue
Unsupported Object Quotienta
Partially Migrated Object Quotientb
Unsupported Cost Quotientc
Partially Migrated Cost Quotient d
Percentage Cost Savinge%