Data Mapping Binary Objects


Binary objects are difficult to manage in databases. They are large, their content is not human readable, and they can contain bytes of data easily misinterpreted as control characters. Even the data type name for binary large objects – BLOB – reflects most database managers’ dislike of them. Before relational databases, the definition of a blob was “something undefined or amorphous.”

Altova MapForce, the award-winning, graphical data mapping tool for any-to-any conversion and integration, includes features for effortlessly data mapping binary objects to or from all popular relational databases. Data such as images, PDF files, video files, or any other binary data can be mapped. Let’s look at an example.

MapForce includes two built-in functions, read-binary-file and write-binary-file, for mapping between binary objects and databases. In this post we will expand an example of read-binary-file from the MapForce online Help to build a complete mapping to insert a variety of binary objects into a relational database.

Here is the Help example to illustrate inserting an image into a database:

Online Help example of binary file mapping

This mapping inserts the local file named image01.png into a database table. We can see the database is structured to include some metadata about the image in separate columns that could be queried to extract the same image. You don’t want to dump large binary data files into the database without a way to find them later!

For our use case, we’ll create a SQLite database to track various binary files that are assets supporting multiple products for a marketing department. The binary data might be logo image files, PDFs for product data sheets, screenshots of the product in action, photographs, video files, etc.

We’ll start with a simple CSV text file to describe the information we want to insert:

CSV model for database upload

By the time we complete this project, users might have dozens or even hundreds of assets in the database. When we need to extract an asset later, the names of the asset types will be critical metadata for queries. But if any individual entry in the type column is misspelled or inconsistent, a huge binary object will be orphaned and unrecoverable.

We can enforce a list of asset types by importing the CSV file into XMLSpy and generating an XML Schema where the Type element is selected from a defined list. Then an inexperienced marketing intern can create a list of assets in XMLSpy Grid view. The Type field is enforced:

Database upload model in XMLSpy grid view.

We could also add a list of Product names to the XML Schema, or at least require the Product element to be present. We can even set the Asset type to xs:anyURI, enforcing reference to a file. Here is the Details helper window in XMLSpy Schema view with the datatype setting for the Asset element:

XMLSchema type definition for the Asset element of the upload model

Data mapping binary objects in MapForce is essentially identical whether we map from a CSV or XML source.

We’ll start by opening a new MapForce data mapping and dropping in our list of assets and the database table.

Data mapping binary objects between a list of source objects and a database table

We used DatabaseSpy to create a new SQLite database table with some refinements to the example in the MapForce Help. The product column will hold the product name and the asset type goes in the description column. The recordDate column will keep track of the age of each binary object.

Now we’ll set the id column to auto number as described in the MapForce Help and draw the simple connections between the source and target:

Simple mapping connections between the source object list and the database table

Product and Type from the source map directly to columns in the database. Mapping the Row element instructs each row of input to create a new database record.

Now we’ll map binary data to BLOBs in the database:

Mapping the binary objects to BLOBs in the database

The built-in MapForce function read-binary-file uses the filename provided in the Asset element of the source to create the BLOB. The read-binary-file function always treats the source as base64binary data, regardless of any conventions based on the source file name.

To complete the mapping, we need connections to extract the filename from Asset and save the storage date:

Complete example of data mapping binary objects

The remove-folder function automatically extracts the filename from any path for local files, network files, or files on the Web.

The now function records the mapping execution date and time, but we only want the date, so we used substring-before to throw away the time.

The Output button at the bottom of the mapping pane executes the mapping and generates a SQL script:

SQL script generated for data mapping binary objects

The script is only a preview to check the results so far. A selection on the main Output menu runs the script:

Output menu option to execute the SQL script for data mapping binary objects

The script result is displayed:

SQL script result after successfully  data mapping binary objects

We’ll click the DB Query button to connect to the database and further validate the result right in MapForce:

MapForce database query window to validate data mapping binary objects

You can’t tell much by looking at the BLOBs in the productfile column, but the other columns create useful metadata about each asset.

In a followup post we’ll demonstrate data mapping binary objects to extract assets from the database and save them in their original forms. If you just can’t wait, download a free trial including Tutorials, Help, and many more examples to get started with your own data mapping, conversion, and transformation project!

Tags: ,