Data Mapping NoSQL Databases


NoSQL databases are non-tabular databases that store data differently than traditional databases made up of relational tables. Two of the most popular NoSQL databases, MongoDB and Apache CouchDB, store data as collections of BSON (binary JSON) and JSON documents. These databases leverage flexible JSON schemas and scale easily with large amounts of data and high user loads.

Altova MapForce has long supported data mapping all popular relational databases and now also includes native support for data mapping NoSQL databases. MapForce includes functionality for inserting, extracting, filtering, and ordering NoSQL data. Let’s look at an example.

Shown below is a complete data mapping from the NoSQL MongoDB sample_analytics database. This mapping extracts a subset of the binary JSON (BSON) data from multiple collections, combines, filters,  transforms, and sorts the data, then exports the result as a single JSON document.

Data mapping NoSQL databases to JSON example

The sample_analytics NoSQL database is an example hosted on the publicly accessible MongoDB Atlas cloud server, storing training data for a mock financial services application. The database includes collections of BSON documents for 500 customers, 1,746 accounts, and 1,746 buy or sell transactions.

Data mapping NoSQL databases in MapForce starts by selecting the NoSQL database from the MapForce Insert Database Connection Wizard and entering the connection parameters. Next, we can select the collections to  map:

Selecting collections for data mapping from the NoSQL database

Each collection requires a JSON schema for mapping its objects. The schema may be assigned in the database for validation, or it may be an external file.

Assigning a JSON schema to the collection for data mapping

Here is an expanded view of objects in  the customers and transactions collections ready for NoSQL data mapping in MapForce:

View of the NoSQL collections ready for data mapping

After we drop in a JSON Schema or example JSON file for the output target, we’re ready to begin connecting objects in the database to the output. The MongoDB NoSQL database stores binary JSON data. The MapForce Function Library includes built-in functions to manipulate BSON data:

MapForce includes built-in functions to manipulate BSON data

The database includes customers with gmail, hotmail, and yahoo email addresses in their profiles but our mapping project is tasked with retriving only results for customers with yahoo.com addresses. Obviously, there is no SQL Select / Where query in a NoSQL database to filter the email addresses! So instead we can combine a BSON to-regex function with a MapForce built-in Where / Order structure to achieve the same result:

Creating a filter for the NoSQL data

The MapForce Where / Order Sort option also accomplishes another project requirement — sort the customers by birthdate in the output.

Properties dialog for the MapForce Where / Order structure

BSON format supports additional data formats than JSON string or numeric. MapForce automatically casts the birthdate from BSON date format to string for processing by the format-date function for the final output.

MapForce automatically casts BSON data format to string

Combining MapForce Where/Order structures with other MapForce built-in functions filters the transactions collection to output only “buy” transactions, and joins the transactions in grouped and sorted format to profiles in the customers collection.

Additional join, filter, and sort options for data mapping NoSQL databases

The output file consists of all customers with yahoo.com email addresses and reports their buy orders grouped by account for that customer. Further, the output file lists customers in order by birthdate from youngest to oldest. Here is a partial view of the JSON output in XMLSpy, the popular tool for editing, modeling, transforming, and debugging JSON, and XML technologies:

Partial view of the NoSQL data mapping output file in XMLSpy grid view
Click image to view full size

The the users appear in descending order by birthdate, and buy transactions for each account are in alphabetical order of the JSON “name” objects in the output.

If you need to periodically repeat processing to collect newer transactions, MapForce Server automates execution of data mappings designed in MapForce

MapForce is a powerful data mapping tool for data mapping NoSQL databases to or from any of: XML, JSON, databases, EDI, XBRL, flat files, Excel and/or Web services. To try it out for your own NoSQL data mapping  application, download a fully functional free trial today!

Tags: , , ,