Altova DiffDog Server 2024

Navigation: Comparing CSV and Database Data

Setting up Data Sources

Before you can perform comparison of tabular data originating from a database, a data source which includes the database connection details and other configuration parameters must be configured. One data source must exist for each database table, view, or custom SQL query that you want to include in the comparison. If CSV files are involved in the comparison, then a data source must exist for each CSV file as well.

To configure a data source, open the client configuration file and add a new [datasouce:<name>] section, where <name> is the unique name by which you want to identify this data source. The parameters that may be present in a [datasource] section are listed below.

You can also add data sources to the server configuration file. This is useful if you need to connect to the data source from the machine where DiffDog Server (not the DiffDog Command Line Client) runs. However, note that server-side connections that require Windows authentication for databases like SQL Server may encounter issues in this case, because the DiffDog Server service runs as a separate user who may not have access to the database. The workaround in that case is to define the data source in the client configuration file—the DiffDog Command Line Client runs as the user running the data-diff command.

As stated previously, each data source must provide one table, view or some custom named recordset retrieved by a SQL query. To specify this, set the table or sql options below (they are mutually exclusive). Importantly, for each table, view, or recordset, there should be at least one column which guarantees the uniqueness of each row. This column (or multiple columns) are subsequently referred to as the "key" columns. If the table has a primary key, a key column already exists and the uniqueness of the row is guaranteed. Otherwise, it is recommended that you specify one or more columns that should be treated as "key" columns. You can do this by setting the key-columns-by-name and key-columns-by-ordinal options below.

type

Mandatory parameter. Specifies the type of the database connection. Valid values are ado, adonet, csv, jdbc, odbc, postgresql, and sqlite. Example:

type=sqlite

Connections of type csv, postgresql and sqlite are built-in and do not require any drivers. For other database kinds, choose a database connection method that is suitable for the machine that will be running the comparison job. Such connections require that the corresponding database drivers be installed on the machine. For more information, see Supported Databases.

path

This parameter specifies the path to the database file. It is applicable for CSV files and SQLite databases. Do not set this parameter if you set the connection parameter, because they are mutually exclusive.

path=C:\Articles.sqlite

SQLite

path=C:\data.csv

CSV

connection

Conditional parameter. Specifies the database connection string. Do not set this parameter if you set the path parameter, because they are mutually exclusive.

connection=Data Source=DBSQLSERV;User ID=altova_user;Password=dhjdf84h;Provider=SQLNCLI11.1;Initial Catalog=NANONULL;Persist Security Info=true

SQL Server (ADO)

connection=jdbc:db2://mydb2-105:50000/NANONULL

DB2 (JDBC)

connection=jdbc:oracle:thin:@dbora12c:1521:orcl12c

Oracle (JDBC)

connection=my.dbserver.com

PostgreSQL (Native)

Note:

For native PostgreSQL connections, the connection can also be the IP address of the database server, for example, 127.0.0.1. The database and port of a native PostgreSQL connection are specified in the database and port options.

Call the db-drivers command to view connections available on the local machine, displayed in a format that you can copy-paste. Also, if you have licensed DatabaseSpy, you can create a database connection from there, and then reuse the connection details as displayed in the Properties window.

extend-classpath

This parameter is applicable only for JDBC connections. It is optional and enables you to extend the Java class path on the machine where DiffDog Server runs. This could be useful, for example, when you do not have the means or privileges to change the operating system's CLASSPATH environment variable. Note that there is no need to set this value if all the class paths required by the JDBC connection are already set via the CLASSPATH environment variable.

If you set this value, then it will be taken in consideration in addition to the value that may already be set in the CLASSPATH environment variable. For example, the following value adds two .jar files to the class path in order to establish a connection to an IBM DB2 database:

extend-classpath=C:\jdbc\db2\db2jcc.jar;C:\jdbc\db2\db2jcc_license_cu.jar

The JDBC driver paths defined in the CLASSPATH variable, as well as any .jar file paths specified in the extend-classpath parameter are all supplied to the Java Virtual Machine (JVM). The JVM then decides which drivers to use in order to establish a connection. It is recommended to keep track of Java classes loaded into the JVM so as not to create potential JDBC driver conflicts and avoid unexpected results when connecting to the database.

database

Mandatory and applicable only for native PostgreSQL connections. Specifies the database of the PostgreSQL connection.

database = ZooDB

port

Specifies the port of the PostgreSQL database connection. Applicable only for native PostgreSQL database connections.

separator

Optional parameter. Applicable only for CSV data sources. It indicates the field separator used in the CSV file. If not specified, the default separator is assumed to be comma. Valid values are comma (default), tab, and semicolon.

separator=comma

table

Conditional parameter. Specifies the full name of the database table or view to be used in the left or right part of the comparison. If you configure this parameter, do not set the sql parameter, because they are mutually exclusive. Examples:

table=[HR].[dbo].[Employees]

SQL Server

table="ALTOVA_ADMIN"."customers"

IBM DB2

Tip:	If you have DatabaseSpy installed, you can easily obtain the full table name by right-clicking the table in Online Browser, and then selecting Show in new SQL Editor \| Path from the context menu.

key-columns-by-name

For each table, view, or recordset, there should be at least one column which guarantees the uniqueness of each row. This column (or multiple columns) are subsequently referred to as the "key" columns. If the table has a primary key, a key column already exists and the uniqueness of the row is guaranteed. Otherwise, it is recommended that you specify one or more columns that should be treated as "key" columns. These don't necessarily have to be columns that are part of an actual primary or unique key on the table. Examples:

key-columns-by-name="guid"

key-columns-by-name="first_name","last_name","email"

Defining "key" columns is optional but should be done as best practice. If you don't define the "key" columns, the application will determine them automatically, but this may not work consistently across all database drivers. Also, detection works for tables or views only, so it is recommended that you define the "key" columns explicitly if you are using SQL queries that return computed values or results returned by stored procedures.

If you specify key columns, then the number of key columns must be the same for the "left" and "right" data source. For example, if you specify two key columns for the left data source and only one key column for the right data source, the comparison will fail with an error.

In case of CSV files, you can refer to key columns by name as follows:

•If the first row of the CSV file contains the column names, you can use the column name, for example:

key-columns-by-name="first","last"

•If there is no header row in the CSV file, you can use the conventional names c1, c2, c3, and so on, where each number identifies that respective column, for example:

key-columns-by-name=c1,c2

key-columns-by-ordinal

You can specify a "key" column either by name (see above) or by its ordinal number in the table, starting with 1. This option lets you specify key columns by their ordinal number. Examples:

key-columns-by-ordinal=1

key-columns-by-ordinal=1,2,5

If you specify key columns, then the number of key columns must be the same for the "left" and "right" data source.

user

Conditional parameter. Specifies the username required to connect to the database. This parameter is not necessary if the database accepts anonymous connections, or if the connection string already includes the username (see the connection parameter).

pass

Conditional parameter. Specifies the password required to connect to the database. This parameter is not necessary if the database accepts anonymous connections, or if the connection string already includes the password (see the connection parameter).

assembly

This option is mandatory for ADO.NET connections. Examples:

assembly=System.Data.SqlClient.SqlClientFactory, System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089

.NET Framework Data Provider for SQL Server

assembly=IBM.Data.DB2.iSeries.iDB2Factory, IBM.Data.DB2.iSeries, Version=12.0.0.0, Culture=neutral, PublicKeyToken=9cdb2ebfb1f93a26

.NET Framework Data Provider for IBM i

Run the db-drivers command to view .NET assemblies available on the local machine, displayed in a format that you can copy-paste.

class

This option is mandatory for ADO.NET and JDBC connections. Examples:

class=com.ibm.db2.jcc.DB2Driver

IBM DB2 via JDBC

class=com.mysql.jdbc.Driver

MySQL via JDBC

class=System.Data.SqlClient

SQL Server via ADO.NET

Run the db-drivers command to view the JDBC drivers available on the local machine, displayed in a format that you can copy-paste.

sql

Conditional parameter. Specifies a SELECT query in the syntax of the current database. The recordset retrieved by the SELECT query may be used as the left or right part of the comparison. If you set this parameter, do not set the table parameter, because they are mutually exclusive. Examples:

sql="SELECT * FROM employees WHERE a > 5 and b < 6 ORDER BY c"

SELECT query

sql="EXEC dbo.uspGetEmployeeManagers @BusinessEntityID = 50"

Stored procedure

If you specify a stored procedure call, it is expected that the stored procedure returns a recordset, not just output parameters.

has-header-row

Optional parameter. Applicable only for CSV files. Valid values are true or false. If set to true, the first row of the CSV file is treated as a header column and does not take part in the comparison.

has-header-row=true