Validating Data Transformation


Software developers and other data professionals often need to transform data from one format to another. These transformations can be simple one-to-one conversions or may require more complex manipulation. For instance, relationships must be generated when importing flat CSV files into a database, or source data may need to be split for the target, as in full name vs. first, middle, last, and optional suffix. Validating data transformation is critical to prevent data loss or corruption.

In our recent post on Web service data integration combined a string value for GMT time with a numeric offset in seconds to generate the local time for weather forecasts. We created a user function that performed all the steps required to complete this operation. MapForce includes a powerful interactive data mapping debugger that can easily trace and validate this transformation. Let’s take a look at how it works.

Here is the view of the user function calc-local-time in our data mapping:

A user function to calculate local time in the Web service data mapping

The function accepts dt_txt, which is a string value for a date and time in the form 2020-02-22 18:00:00, and the numeric value of timezone, the offset from GMT for the forecast location provided in seconds.

The output of the function is mapped to column A of an Excel spreadsheet. Inside the spreadsheet the format of column A is defined as Date, and the format specified includes the time, as in 5/20/20 4:00 PM.

Double-clicking the header of the user function expands it to show the individual operations:

Operations included in the user function to calculate local time

The two input values are on the right and the final output labelled A is on the far left. Explained simply, the function adds the seconds in timezone to the dt_txt date and time to calculate the local time. MapForce includes a built-in datetime-add function that performs this operation, shown in the image above, just to the left of the output value A. But nothing is as simple as it looks! The input values received from the Web service enter in different formats and must be adjusted to the format and  syntax of the datetime-add function.

Validating Data Transformation Using the MapForce Debugger

We can use the MapForce debugger to trace the steps of the user function to make sure it gives the desired result. First, we’ll set a debugging breakpoint by right-clicking the timezone input:

Validating data transformation with a debugger breakpoint

Several debugging helper windows are opened and mapping execution proceeds until the breakpoint is reached. The processed value at the breakpoint is highlighted:

Data mapping execution pauses at the breakpoint

We have several stepping options to continue execution. The Minimal Step option provides the most granular view of execution.

Debugger stepping options in the MapForce toolbar

This command subdivides a step and will typically stop twice for each connection: once when its source produces a value and once when its target consumes it. Repeatedly clicking Minimal Step continues executing the mapping and allows us to see exactly what happens as the calc-local-time function processes:

Stepping through execution of the user function

As we continue stepping through execution, values at each connection are added to the window.

The built-in datetime-add function requires a datetime value for the first input and a string value specifying the period of time to add as the second value. In the image below we continued stepping all the way up to production of the output value. Highlighted are the inputs to the datetime-add function and the final produced result:

Continued stepping to view the function result

The original offset value for seconds is 28,800 seconds, which equals eight hours. The produced result is eight hours later than the dt_txt reference time, so the output is valid.

Considering Other Cases

The offset from GMT time will only be positive for locations that are east of GMT. Locations west of GMT should have a negative offset. In other words, New York is west of London and local time is earlier. Beijing is east of GMT and local time is later.

We need to validate the calc-local-time function for a case where the timezone offset is negative.

There are several ways we could do that. Our input list of cites has some locations east of GMT and some to the west. We could keep stepping through execution until we encounter a location with a negative timezone.

View of the city list input file for the Eb service data mapping

The Web service returns 40 forecasts for each city, so stepping though the entire first four cities would be tedious.

We could modify the list of cities to insert a new one from the western hemisphere at the top of the list or we could create a new  input file for a test case. In many situations that’s a reasonable testing strategy.

Or, we can set a breakpoint condition. If we can set the breakpoint to pause only when the value encountered is negative, we can continue with our original input file. That will be the fastest way to reach a final conclusion.

Returning to the data mapping, we right-click the breakpoint to modify its properties:

Modifying the breakpoint properties

This opens a dialog where we set a property to make the breakpoint conditional. In the image below shows the condition to halt execution if the timezone value is negative:

Breakpoint properties dialog

Now we can re-execute the mapping in debug mode. The first few cities on the input list that are all east of GMT are processed. When the first negative timezone value is encountered, the breakpoint is triggered and execution pauses.

The breakpoint condition is encountered and execution pauses

Using the same stepping technique described above, we can trace what happens to the negative value as the calc-local-time function processes.

Validating data transformation by tracing execution for a negative input value

The timezone value of -25,200 seconds is converted to a string. Then the built-in concat function adds characters at the beginning and end to build the duration string for the datetime-add function.

Adding a negative value is the same as subtraction, but at this point the syntax for duration isn’t quite correct. The minus sign (hyphen) needs to be the first character of the duration string, so the replace function searches the string for PT- and if found, replaces it with -PT.

The value of 25,200 seconds is seven hours. Subtracting seven hours from the input value of 2020-05-23T00:00:00 equals 2020-05-22T17:00:00, or 5/33/2020 at 5:00 PM. We have now validated the user function for cases with positive and negative offsets.

Whether you are validating data transformation for accuracy or tracking down an unexpected result, the  MapForce data mapping debugger provides deep insight into the exact inner workings of data integration and ETL projects in a way that was never possible before, not even in much higher-cost mapping products.

Watch a quick video demo of MapForce, or download a free trial including Tutorials, Help, and many more examples to get started with your own project for JSON and Web service data integration, or other data mapping, conversion, and transformation needs!

Tags: , , , ,
0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *