This document contains a technical description of the SPARQLMotion language, an RDF-based scripting language with a graphical notation to describe data processing pipelines. It introduces the core classes and properties that are used to represent SPARQLMotion scripts, and defines how SPARQLMotion engines will interpret them. Note that this document alone may not be a good starting point to learn the actual use of SPARQLMotion. Instead, it acts as a reference for users who need to fully understand the details and internals of a SPARQLMotion engine.
Version 2.1.0 - Draft for 2.1.0 release together with TopBraid Suite 3.3
This document is part of the SPARQLMotion Specification.
The SPARQLMotion Core Vocabulary is part of the SPARQLMotion specification which is outlined in the SPARQLMotion Overview page. This vocabulary does not describe the various types of modules, which are included in the SPARQLMotion Standard Module Library.
The SPARQLMotion language itself is a fairly light-weight collection of classes and properties used to represent SPARQLMotion scripts in RDF.
The SPARQLMotion system vocabulary is found in the namespace
http://topbraid.org/sparqlmotion#, which is typically
abbreviated with the prefix sm.
This system vocabulary is associated with semantics, which instruct the execution engine how to process (or display) SPARQLMotion scripts. The remainder of this document provides details on the various parts of the system vocabulary and their semantics.
SPARQLMotion scripts consist of modules. Each module represents a single processing step. Modules can be linked together with various relationships, as shown in the figure below.
The figure above displays a visual rendering of five SPARQLMotion modules,
represented as rectangular nodes in the diagram and linked with the
relationships sm:next and sm:body.
The visual rendering above is, however, just one way of interpreting SPARQLMotion
scripts: the ultimate storage format of scripts is entirely as RDF models.
The following sub-sections provide details on those general concepts.
A SPARQLMotion module is an instance of a module type.
Module types are (RDFS) classes that have the metaclass sm:Module.
Module types define properties that the SPARQLMotion user needs to fill in
at the instance level to control the module's behavior.
The class sm:Modules (note the 's' at the end!) serves as an
"abstract" base class of the various module types.
It can be used as range or domain of properties, but has no other formal meaning.
A SPARQLMotion execution engine has a registry of known module types.
For example, a collection of Standard Module Types may be
coded into the engine in Java.
The executing engine will call the appropriate implementations at each step.
SPARQLMotion module libraries may also define sub-classes of existing
module types.
Unless a more specific implementation exists, the SPARQLMotion engine should
in this case execute the implementation for the superclass.
This makes it possible to specialize existing module types without having to
add a low-level implementation to the engine.
The subclasses may set default values of properties expected by the superclass,
e.g. using SPIN constructors.
Some modules, including sml:PostRequest, will even walk through the
properties defined by their subclasses to build a list of request arguments.
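The superclass fallback described above can be pictured as a lookup that walks up the class hierarchy until a registered implementation is found. The following Python sketch is only an illustration under stated assumptions, not the way any particular engine is implemented: the function name, the two dictionaries, the module type ex:PostToMyService and the implementation name "PostRequestImpl" are all hypothetical.

```python
from collections import deque

# Hypothetical class hierarchy: module type -> direct superclasses
# (a stand-in for rdfs:subClassOf triples in a module library).
SUBCLASS_OF = {
    "ex:PostToMyService": ["sml:PostRequest"],
}

# Hypothetical registry: module types for which the engine has an implementation.
REGISTRY = {
    "sml:PostRequest": "PostRequestImpl",
}


def find_implementation(module_type, registry=REGISTRY, subclass_of=SUBCLASS_OF):
    """Return the implementation registered for module_type, or for the
    nearest superclass that has one; None if nothing is registered."""
    queue = deque([module_type])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in registry:
            return registry[current]
        # No specific implementation: continue with the superclasses.
        queue.extend(subclass_of.get(current, []))
    return None
```

With these example dictionaries, find_implementation("ex:PostToMyService") resolves to the implementation registered for sml:PostRequest, because the subclass has no implementation of its own.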
In addition to SPARQLMotion module types (sm:Module), scripts can also
instantiate SPIN Functions
as modules.
SPIN functions are another kind of class, and function calls (in SPIN) are
instances of those classes.
The arguments of the functions (sp:arg1, sp:arg2, etc.)
can be passed into the function using the same mechanisms as with other module types.
SPARQLMotion modules can be chained together in various ways, to instruct the engine that the output of one module is the input to another module. The core vocabulary defines a collection of RDF properties that are used to link modules (instances) with each other.
Note: SPARQLMotion scripts form a directed acyclic graph, i.e. may not contain cycles.
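A tool that validates scripts could check the acyclicity requirement with a depth-first search. The Python sketch below is illustrative only and makes simplifying assumptions: it looks at sm:next links alone, represents each link as a (subject, object) pair of module names, and uses the hypothetical function name has_cycle.

```python
def has_cycle(next_edges):
    """Return True if the (subject, object) sm:next pairs contain a cycle."""
    successors = {}
    for subject, obj in next_edges:
        successors.setdefault(subject, []).append(obj)
        successors.setdefault(obj, [])

    # WHITE = not visited, GREY = on the current search path, BLACK = finished.
    WHITE, GREY, BLACK = 0, 1, 2
    state = {module: WHITE for module in successors}

    def visit(module):
        state[module] = GREY
        for nxt in successors[module]:
            if state[nxt] == GREY:
                return True  # reached a module on the current path again
            if state[nxt] == WHITE and visit(nxt):
                return True
        state[module] = BLACK
        return False

    return any(state[m] == WHITE and visit(m) for m in successors)
```

The recursive search is adequate for scripts of ordinary size; a very long chain of modules would call for an iterative variant.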
The most frequently used relationship property is sm:next,
which indicates that the first module (subject) is producing input for the
second module (object).
The presence of an sm:next triple does not necessarily
mean that the first module is executed before the second module.
The technical execution order is left to the engine and the module
implementations.
Some SPARQLMotion modules can spawn off sub-scripts.
For example, an iteration module such as sml:IterateOverSelect
will repeat a "body" script in each iteration, before it continues the
execution of its successors via sm:next.
In the case of iterations, the property sm:body should be used.
In the case of IF-THEN-ELSE branches, the properties sm:if and
sm:else should be used.
However, SPARQLMotion does not prescribe specific meaning to any of those properties,
apart from the fact that they are sub-properties of sm:child, which
helps the engine identify that they describe a parent-child relationship between scripts.
Any of the sm:child properties link a module with a child script
by pointing to any module of the child script. This means that it is possible
to point to the head or tail or anything in between - for display purposes it
is common practice to point to the "start" of the child script.
In either case, the child script must be self-contained (i.e. have no backward
references into the parent script), and must have a single target module, i.e.
exactly one module that does not have any sm:next value.
A SPARQLMotion script is a collection of modules that are connected using any of the module relationships mentioned above. Scripts are usually stored in a single RDF file or graph, but multiple scripts may be stored in the same graph.
Given the acyclic nature of SPARQLMotion scripts, any well-formed script will
have at least one module without any successors (sm:next).
Those modules are called target modules.
A script may have multiple target modules, and users can invoke any one
of those target modules separately.
In those cases, the execution engine only needs to traverse a sub-set of the modules
to create the results, as those branches leading to other target modules can be ignored.
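The two ideas above, finding target modules and restricting execution to the modules that feed a chosen target, can be sketched as follows. This is a minimal Python illustration under stated assumptions: only sm:next links are considered (child scripts are ignored), links are (subject, object) pairs, and all module and function names are hypothetical.

```python
# Hypothetical script with two branches, each ending in its own target module.
MODULES = {":Import", ":Filter", ":ExportA", ":Extra", ":ExportB"}
NEXT_EDGES = [
    (":Import", ":Filter"),
    (":Filter", ":ExportA"),
    (":Import", ":Extra"),
    (":Extra", ":ExportB"),
]


def target_modules(modules, next_edges):
    """Modules that have no outgoing sm:next link, in sorted order."""
    with_successor = {subject for subject, _ in next_edges}
    return sorted(m for m in modules if m not in with_successor)


def modules_needed_for(target, next_edges):
    """The target plus every module that feeds into it via sm:next."""
    predecessors = {}
    for subject, obj in next_edges:
        predecessors.setdefault(obj, []).append(subject)
    needed, stack = set(), [target]
    while stack:
        module = stack.pop()
        if module not in needed:
            needed.add(module)
            stack.extend(predecessors.get(module, []))
    return needed
```

For the example data, invoking :ExportA only requires :Import, :Filter and :ExportA; the branch through :Extra can be skipped.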
One of the features of SPARQLMotion is that scripts can be displayed and edited
visually. The suggested rendering of scripts is using directed graphs, so that
modules are represented as nodes, and relationships as edges, as shown in the
example above.
Each node should uniquely identify the module (e.g. with a label) and also
indicate the type of module (e.g. with an icon).
Furthermore, input and output variables should be displayed so that users
can recognize the data flow between modules.
The edges should be labeled to distinguish the various kinds of relationships.
In this kind of graphical visualization, the properties sm:nodeX
and sm:nodeY should be used to store the coordinates of nodes.
The property sm:icon should be used to link a module type (class)
with the URL of a display icon.
SPARQLMotion scripts define a processing pipeline in which data is being produced, processed and consumed. Individual modules are free to do whatever they like in each step: They can produce side effects (such as writing files), modify RDF triples in a graph, change variable bindings or invoke sub-scripts. Changes to the RDF graphs and variable bindings are relevant to the engine, and are covered by the following sub-sections.
Most SPARQLMotion modules operate on RDF graphs. They can query RDF graphs and may write to them. These RDF graphs might be derived from files, point to a database, or be entirely virtual, in the sense that they only exist during the execution of a script. In terms of an implementation, it only matters that those graphs implement the usual triple-level graph functions, e.g. as defined by the Jena Graph interface.
Each SPARQLMotion module also represents an RDF graph.
For example, the module sml:ImportRDFFromURL represents the
graph loaded from a given URL.
When invoked, the module may load the file from the web and then pass
this graph to the modules specified by sm:next.
These next modules may take the loaded graph as input and run SPARQL queries
over it. In those SPARQL queries, the input graph is the default
graph, i.e. it will be queried in the WHERE clause if no other graph has been
specified (e.g. using the FROM or GRAPH keywords).
Many SPARQLMotion modules do not manipulate their input graph, and simply pass it on to their successors unmodified. Other modules may completely replace the input graph and pass some other graph to downstream modules. Some modules may not even produce any graph and simply represent the empty graph.
If a module has multiple incoming sm:next triples, then the
input graphs will be merged (logically), forming a union graph.
Engines may optimize this step by merging multiple in-memory graphs into
a single graph, or pruning empty sub-graphs.
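The logical union, including the two optimizations mentioned above, can be illustrated with plain Python sets. This is a sketch under the assumption that each graph is modelled as a set of (subject, predicate, object) tuples; real engines work with graph interfaces rather than sets, and the function name is hypothetical.

```python
def merge_input_graphs(graphs):
    """Logical union of several input graphs, each a set of (s, p, o) triples."""
    non_empty = [g for g in graphs if g]  # prune empty sub-graphs
    if len(non_empty) == 1:
        return non_empty[0]  # a single input graph needs no merging
    merged = set()
    for graph in non_empty:
        merged |= graph  # duplicate triples collapse, as in an RDF graph
    return merged
```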
In addition to RDF graphs, which are implicitly passed from module to module, modules can also communicate by passing variable bindings. A variable binding is a name/value pair, in which the variable name follows the usual SPARQL variable naming rules. The values of those variable bindings can be anything, but the officially supported default types are:
Many other data types such as file names can often be represented by means of RDF literals or URI references. According to the contract, any SPARQLMotion engine must be able to convert variable values to RDF nodes, so that they can be part of SPARQL queries. For example, in the case of XML nodes, a suitable string rendering must be derived by serializing the nodes.
SPARQLMotion modules can create variable bindings and thus pass new values
to their successors.
In a typical case such as sml:BindWithConstant, a module only
creates a single new variable binding as the "result" of its execution.
This result variable is typically represented using the property
sm:outputVariable.
However, modules do not have to declare the variables that they bind.
For example, sml:BindBySelect may bind any number of variables,
as specified by the variables appearing in its SELECT clause.
When a SPARQLMotion module executes a SPARQL query, then the current variable
bindings from all its predecessors will be pre-bound as query variables.
In the example figure above, the module
Set initial text binds the variable text with some value,
and this value could be queried as ?text in the WHERE clause
of the query in Iterate over persons.
SPARQLMotion modules are instances of module classes.
These classes should formally define the properties that script designers
should use to configure the behavior of the module.
Most modules have at least one property, e.g. sml:ImportRDFFromURL
has a property sml:url containing the URL to load from.
Those property values can be specified as triples in the module instance,
as illustrated by the following example module (in Turtle notation):
:ImportKennedys
    a sml:ImportRDFFromURL ;
    rdfs:label "Import kennedys"^^xsd:string ;
    sm:next :IterateOverPersons ;
    sm:nodeX 5 ;
    sm:nodeY 2 ;
    sml:url "http://topbraid.org/examples/kennedys"^^xsd:string .
Some modules may not have any properties, and simply operate on the RDF input, with a pre-defined (fixed) behavior.
Since the access to the property values is done by the implementation,
modules can query any property of the module that they like.
However, it is strongly recommended that module types explicitly declare
the properties that they expect.
The property spin:constraint is used to link a module class
with a property.
The values of spin:constraint are typically the
SPIN templates
spl:Attribute or spl:Argument, both of which are
described in the following sub-sections.
Note that spl:Argument properties carry special semantics
that are hard-coded in the SPARQLMotion engine.
The SPIN template
spl:Attribute
is used to declare properties that are filled in by the script designer.
Attributes are typically used for SPARQL queries, child relationships and
any multi-valued property.
For example, sml:IterateOverSelect defines two attributes
as shown in the following Turtle snippet:
sml:IterateOverSelect
    a sm:Module ;
    rdfs:comment "..."^^xsd:string ;
    rdfs:label "Iterate over select"^^xsd:string ;
    rdfs:subClassOf sml:ControlFlowModules ;
    spin:constraint
        [ a spl:Attribute ;
          rdfs:comment "The body of the iteration loop."^^xsd:string ;
          spl:maxCount 1 ;
          spl:minCount 1 ;
          spl:predicate sm:body
        ] ;
    spin:constraint
        [ a spl:Attribute ;
          rdfs:comment "A SPARQL Select query that ...."^^xsd:string ;
          spl:maxCount 1 ;
          spl:minCount 1 ;
          spl:predicate sml:selectQuery
        ] ;
    ...
For readers familiar with OWL, this is comparable to OWL Restrictions,
using spin:constraint instead of rdfs:subClassOf,
spl:predicate instead of owl:onProperty
and spl:maxCount instead of owl:maxCardinality.
In contrast to OWL though, the spl:Attribute template carries
strict closed-world semantics, suitable for specifications.
The value type of those attributes can be either specified locally, using
spl:valueType, or using global rdfs:range statements
(used but not shown above).
Most module properties in SPARQLMotion are declared using the SPIN Template
spl:Argument.
Arguments can take at most one value, and the boolean field
spl:optional indicates whether that value may be omitted.
The following Turtle snippet shows the declaration of the module
sml:ImportRDFFromURL.
sml:ImportRDFFromURL
    a sm:Module ;
    rdfs:comment "Gets RDF data from a given URL. The URL..." ;
    rdfs:label "Import RDF from URL"^^xsd:string ;
    rdfs:subClassOf sml:ImportFromRemoteModules ;
    spin:constraint
        [ a spl:Argument ;
          rdfs:comment "The URL of the RDF source..."^^xsd:string ;
          spl:predicate sml:url
        ] .
As with attributes, the above is similar to OWL restrictions. Unlike with attributes, module instances do not need to specify the actual value of the property as an explicit triple at the instance. Instead, the value can be computed dynamically at execution time, as explained in the following sub-sections.
If a module instance has no declared value for a property, then
the execution engine will check if there is a bound variable with the
same name as the local name of the property in the current scope.
For example, if a module expects a value for sml:url
and a predecessor of the module has created a binding for
?url, then this binding will be inserted into the module
at run time.
Using this mechanism, it is possible to link modules easily and
conveniently.
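The name-matching rule above can be sketched in a few lines of Python. This is an illustration only, assuming that a module's explicit property values and the current variable bindings are available as dictionaries; the function names and the simplified local-name extraction are hypothetical and not part of the SPARQLMotion vocabulary.

```python
def local_name(uri_or_qname):
    """Part after the last '#', '/' or ':' (a simplification of RDF local names)."""
    for separator in ("#", "/", ":"):
        if separator in uri_or_qname:
            uri_or_qname = uri_or_qname.rsplit(separator, 1)[1]
    return uri_or_qname


def resolve_argument(module_values, predicate, bindings):
    """An explicit value on the module instance wins; otherwise fall back to a
    bound variable whose name equals the local name of the property."""
    if predicate in module_values:
        return module_values[predicate]
    return bindings.get(local_name(predicate))
```

For example, a module without an explicit sml:url value picks up the value bound to ?url by a predecessor, while a module that declares sml:url itself ignores that binding.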
Most modules follow some naming conventions on the declared properties
to make it likely that the output variable of one module matches the
expected input properties of another module. For example, the property
sml:text is typically used as a property on modules that
process text. Modules that produce text, such as sml:ImportText,
have ?text as their default value for sm:outputVariable.
Although convenient, the disadvantage of this approach is that the link between modules and their variable bindings is neither very transparent nor flexible. In particular, in many use cases the names of output and input variables do not match, so that the following alternatives are often a better choice.
Many modules operate on string arguments.
For example, modules of type sml:ImportRDFFromURL use the
property sml:url to retrieve the URL, stored as xsd:string.
In SPARQLMotion, those strings may be String Templates, with inline
variable names.
For example, the value of sml:url could be http://example.org/{?fileName}.rdf
where ?fileName is the name of a bound variable.
The SPARQLMotion engine will interpret those string arguments and apply
string substitutions based on the current bindings.
For example, if the variable ?fileName has the value
"test", then the URL above becomes http://example.org/test.rdf.
Names of unbound variables will be substituted by empty strings.
It is up to the SPARQLMotion module implementation to decide how to substitute string templates. In particular, many modules do not insert the variable bindings verbatim but instead escape URL characters etc.
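A verbatim substitution of string templates can be sketched with a regular expression. The Python example below is a minimal illustration, not the behavior of any specific module: it inserts values without escaping, accepts only a simplified subset of SPARQL variable names, and uses the hypothetical function name expand_template.

```python
import re

# Matches {?name}; the name pattern is a simplified subset of SPARQL variable names.
_TEMPLATE_VAR = re.compile(r"\{\?([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_template(template, bindings):
    """Replace each {?name} with its bound value, or with '' if name is unbound."""
    return _TEMPLATE_VAR.sub(
        lambda match: str(bindings.get(match.group(1), "")), template
    )
```

Calling expand_template("http://example.org/{?fileName}.rdf", {"fileName": "test"}) yields http://example.org/test.rdf, matching the example above. A module that escapes URL characters would apply its escaping to the value before inserting it.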
Instead of a direct property value as an RDF resource, literal or string template,
SPARQLMotion allows script designers to assign the property through SPARQL expressions.
Those SPARQL expressions must be stored as blank nodes using the
SPIN RDF Syntax.
For example, the following Turtle snippet shows the use of a SPARQL function
call to dynamically compute the value of sml:mimeType:
:ReturnTheText
    a sml:ReturnText ;
    rdfs:label "Return the text"^^xsd:string ;
    sml:mimeType
        [ a smf:if ;
          sp:arg1 [ sp:varName "xml"^^xsd:string
                  ] ;
          sp:arg2 "xml" ;
          sp:arg3 "text/html"
        ] .
In a more readable form, the module above can be rendered as a form:
At execution time, the engine will invoke the example SPARQL function
smf:if and use its result as value for sml:mimeType.
Any SPARQL expression, including variables, function calls and built-in mathematical operations can be used in those expressions. User interfaces should render those SPARQL expressions between { and } so that they can be readily distinguished from constant values.
Taking the idea of SPARQL Expression Arguments further, SPARQLMotion also allows users to insert arbitrary SPARQL SELECT queries into arguments. Again the SPIN RDF Syntax is used to represent those queries, although it is also possible to use SPIN Templates. If the value of an argument property is a SPARQL query, then the engine will evaluate this query when the module executes, and use the first binding of the first result variable in the SELECT clause as value of the argument.
The following example module shows an equivalent SPARQL query to the example from above (for brevity as a form):
In this example, the first binding for ?result will be used
as sml:mimeType when the module is executed.
Any other results or variables produced by the query will be ignored.
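The selection rule for SELECT query arguments can be sketched as follows. This Python illustration assumes the query has already been evaluated and that its result is available as an ordered list of result variable names plus a list of rows (dictionaries from variable name to value). The function name is hypothetical, and skipping rows in which the variable is unbound is one possible reading of "first binding", not something the text above prescribes.

```python
def argument_value_from_select(result_vars, rows):
    """Return the first binding of the first result variable, or None."""
    if not result_vars:
        return None
    first_var = result_vars[0]
    for row in rows:
        if first_var in row:
            # All later rows and all other variables are ignored.
            return row[first_var]
    return None
```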
SPARQLMotion scripts are typically driven by SPARQL queries.
This enables script designers to exploit the full range of SPARQL
features and extensions at execution time.
In particular it is possible to call user-defined functions, including
SPIN Functions.
Such SPIN functions are typically based on a spin:body - a nested
SPARQL query that is executed whenever the function is invoked.
However, SPARQLMotion also provides a mechanism for defining new SPARQL/SPIN
functions that are backed by a complete SPARQLMotion script instead.
User-Defined SPARQLMotion Functions are declared like other SPIN Functions.
The difference is that instead of a spin:body, the function
must point to the target module of a SPARQLMotion script, using
sm:returnModule.
When the function is invoked, the return module will be launched as a
SPARQLMotion script, and the result of the target module will be used as
the result value of the function call.
How this result value is being computed is not defined by the SPARQLMotion
Core specification and not all kinds of return modules are permitted.
A popular choice is to use sml:ReturnNode as an end point.
Since SPIN functions are instances of the function class, and SPIN functions can also be used as SPARQLMotion modules, it is also possible to directly insert user-defined SPARQLMotion functions into scripts.
The URL of the SPARQLMotion Core Schema is http://topbraid.org/sparqlmotion