Executing Python Scripts through callback from RaptorXML

Creating Python Callback Scripts

RaptorXML invokes Python callback scripts after a job has finished. These are ordinary Python script files that define one or more of the RaptorXML Python API entry-point functions and are passed to RaptorXML (see Passing a Python Callback Script to RaptorXML). The overall structure of a Python callback script that accesses the Python interface is shown below. Note how the entry-point Python function is defined.

# 1 imports
import os
from altova import xml, xsd, xbrl

# 2 entry point
def on_xsi_finished(job, instance):
    filename = os.path.join(job.output_dir, 'script_out.txt')
    job.append_output_filename(filename)
    with open(filename, 'w') as f:
        # 3 do something with the instance object, write output to f
        pass

# 4 other entry points, helper classes or functions
# CodeBlock-1
#   ...
# CodeBlock-N

Description of the Python script structure shown above:

  1. Imports Python’s built-in os module, and then some of the RaptorXML specific modules from the altova package.

  2. The entry-point Python function on_xsi_finished (see Supported Python API Callbacks).

  3. Your application logic goes here: do something with the instance object and write output to f.

  4. Additional blocks of code, each containing function definitions or other code.

Note

Please keep in mind that the altova.xbrl.* modules are only available in RaptorXML+XBRL.

  • The line def on_xsi_finished(job,instance) declares the entry-point Python function.

  • This is the function that is invoked after RaptorXML+XBRL Server has executed the command valxml-withxsd (xsi).

  • The job and instance arguments are provided by RaptorXML+XBRL Server.

  • The filename variable is constructed by joining job.output_dir and the name of the file.

    • When the job is submitted via HTTP, job.output_dir is the temporary job output directory on the server.

    • When RaptorXML is invoked from the command line, job.output_dir is the working directory.

  • The job.append_output_filename function appends a filename to the job output.
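Because an entry point is an ordinary Python function, its file-handling logic can be exercised outside RaptorXML by calling it with a stand-in job object. The sketch below assumes only the two job members used above (output_dir and append_output_filename). The StubJob class is hypothetical and is not part of the RaptorXML API; passing None as the instance mimics what RaptorXML passes when validation fails.

```python
import os
import tempfile


def on_xsi_finished(job, instance):
    # Same structure as the callback above: build the output path,
    # register it with the job, then write to it.
    filename = os.path.join(job.output_dir, 'script_out.txt')
    job.append_output_filename(filename)
    with open(filename, 'w') as f:
        # instance is None when validation failed
        f.write('valid\n' if instance is not None else 'invalid\n')


class StubJob:
    """Hypothetical stand-in for the RaptorXML job object (local testing only)."""

    def __init__(self, output_dir):
        self.output_dir = output_dir
        self.output_filenames = []

    def append_output_filename(self, filename):
        # Record the registered file instead of adding it to a real job output.
        self.output_filenames.append(filename)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        job = StubJob(tmp)
        on_xsi_finished(job, None)
        with open(job.output_filenames[0]) as f:
            print(f.read(), end='')  # prints "invalid"
```

Inside RaptorXML the real job object is supplied by the server, so the stub is only useful for checking your own logic before deploying the script.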

Passing a Python Callback Script to RaptorXML

RaptorXML command line

Python scripts for callback invocation are passed to RaptorXML+XBRL by giving the script’s URL as the value of the --script option. The --script option to invoke Python callback scripts is supported for the following commands:

  • valxml-withxsd (xsi)

  • valdtd (dtd)

  • valxsd (xsd)

  • valxbrltaxonomy (dts)

  • valxbrl (xbrl)

  • valinlinexbrl (ixbrl)

These commands can be used on the command line interface or via the HTTP interface. Here are examples of usage with the different commands:

raptorxmlxbrl xsi --script=xml.py --script-api-version=2.8.1 --streaming=false test.xml
raptorxmlxbrl xsd --script=xsd.py --script-api-version=2.8.1 test.xsd
raptorxmlxbrl dts --script=dts.py --script-api-version=2.8.1 test.xsd
raptorxmlxbrl xbrl --script=xbrl.py --script-api-version=2.8.1 test.xbrl
raptorxmlxbrl ixbrl --script=inlinexbrl.py --script-api-version=2.8.1 test.htm

Note

When using the --script option with the valxml-withxsd command, make sure to specify --streaming=false. Otherwise the script is not executed and a warning is issued.

Note

The --script-api-version option is optional; if it is omitted, the latest RaptorXML Python API version is used. If your script depends on an exact API version (for example, because an upgrade of RaptorXML+XBRL might change the default), specify the version explicitly.
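When validations are started from a Python program rather than typed by hand, the rules above (the --streaming=false requirement for valxml-withxsd and the optional API version) can be kept in one place. The helper below is a hypothetical sketch, not part of RaptorXML. It assumes the raptorxmlxbrl executable from the examples is on the PATH and only emits options documented on this page, including --script-param (see Passing arguments to the Python Callback Scripts).

```python
def build_raptorxml_command(command, script, input_file, api_version=None,
                            script_params=None, executable='raptorxmlxbrl'):
    """Assemble the argument list for a RaptorXML+XBRL validation with a callback script."""
    args = [executable, command, '--script=' + script]
    if api_version is not None:
        # Pin the Python API version instead of relying on the default.
        args.append('--script-api-version=' + api_version)
    for key, value in (script_params or {}).items():
        # Each parameter becomes one --script-param=key:value option.
        args.append('--script-param={}:{}'.format(key, value))
    if command in ('xsi', 'valxml-withxsd'):
        # Callback scripts only run for valxml-withxsd when streaming is off.
        args.append('--streaming=false')
    args.append(input_file)
    return args


# Reproduces the first command-line example above.
print(build_raptorxml_command('xsi', 'xml.py', 'test.xml', api_version='2.8.1'))
```

The resulting list can be passed to subprocess.run. Because each option is a separate list element, no shell quoting is needed around the key:value pairs.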

RaptorXML+XBRL Server

A Python callback script is passed with the script option in the JSON job description of the following commands:

  • valxml-withxsd (xsi)

  • valdtd (dtd)

  • valxsd (xsd)

  • valxbrltaxonomy (dts)

  • valxbrl (xbrl)

  • valinlinexbrl (ixbrl)

{
        ...
        "script": "myscript.py"
        ...
}

Secure Python Script execution on RaptorXML+XBRL Server

When a Python callback script is specified in a command submitted via HTTP to RaptorXML+XBRL Server, the script is executed only if it is located in the trusted directory (see Server Setup) or one of its sub-directories. The trusted directory is specified in the server.script-root-dir setting of the server configuration file etc/server_config.xml, and it must be set if you wish to use Python callback scripts. Specifying a Python script from any other directory results in an error, so make sure that all Python scripts to be used are saved in this directory.

All output generated by the server for HTTP job requests is written to the job output directory (which is a sub-directory of the output-root-directory). This security restriction does not apply to Python scripts executed as callbacks on the command line, which can write to any location.
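The trusted-directory rule can be illustrated with a small path check, for example in a deployment script that verifies a callback script before a job is submitted. This is a sketch of the rule as described above, not RaptorXML's own implementation; is_inside_script_root is a hypothetical helper, and it resolves symbolic links and ".." components with os.path.realpath before comparing.

```python
import os


def is_inside_script_root(script_path, script_root):
    """Return True if script_path lies in script_root or one of its sub-directories."""
    # Resolve links and '..' so that paths escaping the root are detected.
    root = os.path.realpath(script_root)
    path = os.path.realpath(script_path)
    # The common prefix equals the root only if path is inside it.
    return os.path.commonpath([root, path]) == root
```

Comparing whole path components with os.path.commonpath, instead of a plain string prefix test, avoids treating a sibling directory such as scripts_old as part of scripts.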

RaptorXML Python API Entry-point functions

The commands that provide access to the Python interface are validation commands, and the Python script is executed regardless of the validation outcome. After validation has completed, RaptorXML+XBRL Server calls a specific function, depending on which command was executed. The called function (see table below) must therefore be defined in the Python script. Its first parameter is the job object; the second parameter varies according to the command that was executed (see table), and on_ixbrl_finished receives an additional third parameter. The second parameter is None if the validation failed.

Command: function called by RaptorXML+XBRL Server

  • valxml-withxsd (xsi): on_xsi_finished( job, xml-instance )

  • valdtd (dtd): on_dtd_finished( job, dtd ) (since v2.1)

  • valxsd (xsd): on_xsd_finished( job, schema )

  • valxbrltaxonomy (dts): on_dts_finished( job, dts )

  • valxbrl (xbrl): on_xbrl_finished( job, xbrl-instance )

  • valinlinexbrl (ixbrl): on_ixbrl_finished( job, document-set, target-documents )
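A single script may define several of these entry points and share helper code between them. The sketch below defines three of them and checks the second argument for None to detect a failed validation, as described above. The helper _write_report and the report file names are hypothetical; the sketch does not call any method on the schema, instance or document-set objects.

```python
import os


def _write_report(job, name, ok):
    # Shared helper: write a one-line status file into the job output directory.
    filename = os.path.join(job.output_dir, name)
    job.append_output_filename(filename)
    with open(filename, 'w') as f:
        f.write('valid\n' if ok else 'invalid\n')


def on_xsd_finished(job, schema):
    # schema is None if the schema validation failed
    _write_report(job, 'xsd_report.txt', schema is not None)


def on_xbrl_finished(job, instance):
    _write_report(job, 'xbrl_report.txt', instance is not None)


def on_ixbrl_finished(job, document_set, target_documents):
    # Inline XBRL: the entry point receives a third argument.
    _write_report(job, 'ixbrl_report.txt', document_set is not None)
```

RaptorXML only calls the function that matches the executed command, so unused entry points in the same script are harmless.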

Passing arguments to the Python Callback Scripts

After the command has been executed, RaptorXML calls the entry-point Python function related to that command with the arguments listed above.

You can pass one or more additional parameters to the callback script with the --script-param option:

raptorxmlxbrl xsd --script=xsd.py --script-param="key1:value1" --script-param="key2:value2" test.xsd

Within the entry-point function, these parameters are available in the job.script_params dictionary:

v = job.script_params['key1']  # v will receive value1 as string
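Because every value arrives as a string, a script that needs numbers or flags has to convert them, and it is convenient to supply defaults for parameters that were not passed. A minimal sketch, assuming hypothetical parameter names max_items and verbose (for example --script-param="max_items:5"); only the dictionary behaviour of job.script_params is relied on.

```python
def read_options(script_params):
    """Convert the string values in job.script_params into typed options.

    The parameter names 'max_items' and 'verbose' are hypothetical examples.
    """
    return {
        # Values always arrive as strings, so convert them explicitly.
        'max_items': int(script_params.get('max_items', '100')),
        'verbose': script_params.get('verbose', 'false').lower() in ('1', 'true', 'yes'),
    }


# Inside an entry point: options = read_options(job.script_params)
print(read_options({'max_items': '5', 'verbose': 'True'}))  # {'max_items': 5, 'verbose': True}
```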

For RaptorXML+XBRL Server, the additional parameters are passed in the script-param array of the JSON job description:

{
        ...
        "script-param": [{"key1": "value1"}, {"key2": "value2"}, ...]
        ...
}
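When job descriptions are generated by a Python client, the script and script-param entries can be added to an existing description with the standard json module. The helper add_callback_script below is hypothetical; it leaves all other (here elided) fields of the job description untouched and builds one single-entry object per parameter, matching the example above.

```python
import json


def add_callback_script(job_description, script, params=None):
    """Return a copy of a job description with a callback script and its parameters."""
    desc = dict(job_description)  # do not modify the caller's dictionary
    desc['script'] = script
    if params:
        # One {"key": "value"} object per parameter; values are sent as strings.
        desc['script-param'] = [{key: str(value)} for key, value in params.items()]
    return desc


print(json.dumps(add_callback_script({}, 'myscript.py', {'key1': 'value1'})))
```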

Executing Python Scripts through raptorxml-python

RaptorXML Server comes with a custom Python interpreter. It acts as a drop-in replacement for a standard python3 interpreter and includes complete support for all RaptorXML Python API modules. The custom Python interpreter has the same name as the command-line tool with -python appended (e.g. raptorxml-python).

To execute a Python script with raptorxml-python, simply pass its name as an argument:

raptorxml-python myscript.py

Importing RaptorXML Python API modules from raptorxml-python

When a Python API callback script is invoked, the API version is specified with the --script-api-version option. With raptorxml-python, in contrast, all parameters are processed directly by the Python interpreter and no API-version-specific initialization takes place. Instead, select the API version by importing the corresponding version of the RaptorXML Python API modules directly:

import altova_api.v2.xml as xml
import altova_api.v2.xbrl as xbrl
...

Note

For most user scripts, raptorxml-python behaves exactly like raptorxml script. These import statements also work in RaptorXML Python API callback scripts.
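A script that may be started either with raptorxml-python or with a standard python3 interpreter can check whether the versioned API package is importable before using it, and print a clear message otherwise. A minimal sketch using importlib.util.find_spec; the function name raptorxml_api_available is hypothetical, and the package name altova_api.v2 is the one used in the import statements above.

```python
import importlib.util


def raptorxml_api_available():
    """Return True if the RaptorXML Python API v2 modules can be imported."""
    try:
        return importlib.util.find_spec('altova_api.v2') is not None
    except ImportError:
        # find_spec imports the parent package first; a missing
        # 'altova_api' raises ModuleNotFoundError (an ImportError).
        return False


if __name__ == '__main__':
    if raptorxml_api_available():
        import altova_api.v2.xml as xml
        print('running with the RaptorXML Python API')
    else:
        print('altova_api not found: run this script with raptorxml-python')
```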

During an interactive session, it is sometimes convenient to import all RaptorXML Python API modules at once with a single import statement:

from altova_api.v2 import *

Caution

Please note that import * is generally not recommended for production code as it can cause unwanted side-effects, e.g. by hiding built-in and previously imported symbols.

Extending the custom RaptorXML Python interpreter with pip

RaptorXML Server can be extended with 3rd-party packages using pip:

raptorxml-python -m pip install pyodbc

Note

Depending on your platform and installation location you might need administrator privileges to install python extension packages into RaptorXML Server.

The installed modules are available to any Python script executed with RaptorXML, regardless of the invocation method.
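Because a 3rd-party package may be missing from a given RaptorXML installation, a callback script that uses one can guard the import and report the problem in its output file instead of failing at load time. A sketch assuming pyodbc was installed as shown above; the output file name db_export.txt is hypothetical, and getattr is used so the script still works if the package does not expose a version attribute.

```python
import os

try:
    import pyodbc  # 3rd-party: raptorxml-python -m pip install pyodbc
except ImportError:
    pyodbc = None


def on_xsd_finished(job, schema):
    filename = os.path.join(job.output_dir, 'db_export.txt')
    job.append_output_filename(filename)
    with open(filename, 'w') as f:
        if pyodbc is None:
            # Report the missing dependency in the job output instead of raising.
            f.write('pyodbc is not installed in this RaptorXML interpreter\n')
            return
        f.write('pyodbc {}\n'.format(getattr(pyodbc, 'version', 'unknown version')))
```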

Caution

Altova GmbH does not provide support for user installed 3rd-party modules.

Danger

RaptorXML Server already comes with some extension modules pre-installed. These modules are required for RaptorXML Server and must not be changed, upgraded or uninstalled.

This command lists all python packages that are installed into RaptorXML Server 2016:

raptorxml-python -m pip list

CherryPy (3.6.0)
Genshi (0.7)
pip (1.5.6)
protobuf (2.6.2-pre)
pytz (2014.9)
setuptools (3.3)
ws4py (0.3.4)

Python extension packages with native code

The raptorxml-python -m pip install command can build Python extension packages that contain native code. All required libraries and header files are included in the RaptorXML Server distribution.

Some 3rd-party extension packages might have additional build dependencies which you have to provide yourself. For example on Ubuntu Linux you need to install the unixodbc-dev platform package before you can install the pyodbc module into RaptorXML Server:

sudo apt-get install unixodbc-dev
sudo raptorxml-python -m pip install pyodbc

Native extension packages should be built with the same compiler that was used to build RaptorXML Server:

RaptorXML Server release v2016:

  • Windows: VS2013

  • Mac OS X: clang >= 5.0

  • Linux: gcc >= 4.9

Python extension packages which install new scripts or executables: white-space in install path

Some 3rd-party extension modules install scripts or executables (e.g. jupyter). These fail to execute if the RaptorXML Server installation path contains white-space (e.g. on Windows, c:\Program Files\Altova\...). Such modules must therefore be installed by invoking the interpreter through a path without white-space. On Windows platforms this can be achieved by using the short path form for every path component that contains white-space:

C:\PROGRA~1\Altova\RaptorXMLXBRLServer2016\bin\RaptorXMLXBRL-python.exe  -m pip install jupyter

Tip

On Windows the short name without white-space for a folder can be obtained using dir /X.
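As an alternative to dir /X, the short form of a complete path can be obtained from Python through the Win32 function GetShortPathNameW via ctypes. This is a sketch: short_path is a hypothetical helper, the path must exist, and if 8.3 name generation is disabled on the volume Windows may return the long path unchanged. On other platforms the function returns its argument as-is.

```python
import sys


def short_path(path):
    """Return the Windows 8.3 short form of an existing path (unchanged elsewhere)."""
    if sys.platform != 'win32':
        return path
    import ctypes
    get_short = ctypes.windll.kernel32.GetShortPathNameW
    # With a zero-sized buffer the call returns the required size, including the NUL.
    size = get_short(path, None, 0)
    if size == 0:
        raise ctypes.WinError()
    buf = ctypes.create_unicode_buffer(size)
    if get_short(path, buf, size) == 0:
        raise ctypes.WinError()
    return buf.value
```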

Example Scripts

Example scripts are hosted on GitHub.