Data Mapping JSON Lines


The JSON data format continues to evolve as an open standard as it is creatively applied to new data interchange requirements. JSON Lines, defined at http://jsonlines.org/, is a convenient text format for storing structured data where each record is a single line and a valid JSON object. JSON Lines handles tabular data and clearly identifies data types without ambiguity. This allows records to be processed one at a time, which makes the format very useful for exporting and sending data.

Altova MapForce supports data mapping JSON Lines as either a data source or target. Let’s look at a mapping project to extract records from a database table and map to a JSON Lines file for output.

MapForce supports advanced data mapping, conversion, and transformation between all popular data formats and relational databases. For this example, let’s assume we’ve been tasked to create a JSON Lines file containing the information stored in a database table originally created for a mobile survey application. The Cars table is part of a database that contains model names and specifications for cars from a wide variety of manufacturers.

Data Mapping JSON Lines as the Output Target

Our strategy for completing this JSON Lines data mapping assignment is simple: examine the existing database table, create a small instance file to represent the data in JSON Lines format, map columns from the database table to corresponding items in the JSON Lines output in a MapForce mapping design, then simply click the MapForce Output button to automatically process the entire table to generate the output file.

We’ll use tools from the Altova MissionKit to complete each step along the way. First, we’ll open the database with DatabaseSpy and examine the Cars table in the table Designer view:

Source DB table for data mapping JSON Lines

The table contains 9 columns that correspond to JSON string and numeric data types, although several columns are permitted to be null. The window on the left shows the count of 6,266 rows of data. We can use the right-click context menu to retrieve the first 200 rows to examine the data:

Contents of the DB table for data mapping JSON Lines

Lines 6 and 7 reveal that some rows do indeed contain null data. Now that we understand the source data, we can create a small example file to represent the data structure in a single JSON Line. The JSON editing features of XMLSpy provide an excellent tool:

Example file for data mapping JSON Lines

Our JSON input is color coded in XMLSpy and we can verify the file is well-formed. Note we’ve saved the file with the extension .jsonl to indicate JSON Lines.

Next, we’ll open a new MapForce data mapping and drop in the database table and .jsonl instance file:

New design for data mapping JSON Lines

When we dropped in the .jsonl instance file, MapForce automatically created a JSON schema that describes the possible data types for each cell in the array that describes our data. The default schema only describes one item, but we can select the item and use the context menu to add duplicates, creating 8 additional children:

JSON schema with 9 items for data mapping JSON lines

Next we connect the database columns to the appropriate data type for each cell in the JSON array, and we also connect the Cars table to the first item at the top of the JSON schema. This instructs the mapping to create a new output line for each row in the Cars table.

Connecting the database source to output target data

We could click the Output button at the bottom of the mapping window to convert the entire table and preview the results, except for one complication: we haven’t considered any null contents. If we save the output file now any nulls in the database will be skipped entirely, leaving lines in the JSON Lines file with fewer than 9 fields. These lines would be impossible to interpret accurately in later processing.

Data Mapping JSON Lines with Null Data

MapForce includes a filter we can apply to identify and handle null fields:

Handling null fields when data mapping JSON lines

A single item in the JSON array cannot be both a number and a null value. The filter and connected components instruct that null data in the Cylinder column will be mapped to the null data type in item 4. If it is not null, it will be mapped to the number data type. We can use this identical filter structure for all remaining nullable columns.

MapForce includes a feature to create user functions that lets us save the filter structure and reuse it in this mapping and other JSON data mappings that include null data. We simply select the filter and supporting components and choose Create User Defined Function from the Function main menu:

Creating a user function to process null entries

The user function encapsulates the selected components, replaces them in the original mapping, and is added to the Function Library:

User function to detect null data in the source database

We can simply drag the user function from the library and connect it to the other nullable database columns.

Sometimes when data mapping JSON lines it may be preferable to map a null text value to an empty string. MapForce can handle that requirement with a substitute-missing function. In the image below, a null text value in the last database column is mapped to an empty string:

Mapping a null to an empty string

Here is a view of the completed data mapping with the null-detector user function:

A view of the completed data mapping for JSON Lines

Note the null-detector user function highlighted in the Libraries window on the left.

Now that all the possible null values are accounted for, we can click the Output button to generate the .jsonl output file:

A partial view of the .jsonl ouput file

Note that line 1675 and others contain both a null and an empty string, but lines 1690-1691 are fully populated with data.

For a one-time transformation, this output file can be immediately saved. If the database is continually updated and needs to be transformed again later, we can save the mapping as a MapForce Server execution file for automated processing with MapForce Server or deploy it directly to FlowForce Server.

Data Mapping JSON Lines as Source Files

The data mapping below illustrates mapping a similar data set from a .jsonl source file to a database table.

Data mapping JSON lines as the source for a new database table

To make this version more clear we edited the JSON schema in XMLSpy to add descriptions and to delete invalid data types for each cell in the array. Clicking the Output button for this mapping will generate a SQL script to load data into the database table.

Note that none of the null items require connection from the source to the target. Any necessary null values are inserted into the database table automatically as seen in line 18:

Generated SQL script for data mapping JSON lines to a database table

Download a free trial including Tutorials, Help, and many examples to get started with your own project for data mapping JSON Lines or other data mapping, conversion, and transformation needs!

Tags: , , , ,