Forward Chaining with the Jena Rules Language

Context

In some areas of programming, certain styles have settled into standard languages such as C, Pascal, or Common LISP, in which huge amounts of commercial and open source software are implemented. RDF is a standard for logical facts, but there is no dominant standard for logic and rules programming.

Two rules engines that somewhat compete with the Jena Rules Engine are Drools and Clara. I say these compete because they run on the JVM and both can be operated in a forward-chaining mode to solve a similar range of problems. Drools works on "Plain Old Java Objects" while Clara works on "Plain Old Clojure Objects". It would definitely be possible to import RDF data into those rules engines, but the Jena Rules Engine has the advantage of using RDF Nodes as its basic data type, which is important for us.

In this chapter we work through a few specific examples that help explain how the Jena Rules Engine is used in Real Semantics and how the Jena Rules Engine can evolve in the near term.

A typical "business rule"

Javagate is the component of Real Semantics that moves data between RDF and preexisting Java classes. If you are writing classes intended for use with Real Semantics, you can set up this mapping by putting Java annotations on your code. If you're using other classes, Real Semantics needs a map that connects RDF properties to Java members.

The good news is that Real Semantics can generate this map automatically in most cases, because most Java programs use design patterns such as the Java Beans convention:

By default, we use design patterns to locate properties by looking for methods of the form:

    public <PropertyType> get<PropertyName>();
    public void set<PropertyName>(<PropertyType> a);

If we discover a matching pair of “get<PropertyName>” and “set<PropertyName>” methods that take and return the same type, then we regard these methods as defining a read-write property whose name will be “<propertyName>”. We will use the “get<PropertyName>” method to get the property value and the “set<PropertyName>” method to set the property value. The pair of methods may be located either in the same class or one may be in a base class and the other may be in a derived class.

If we find only one of these methods, then we regard it as defining either a read-only or a write-only property called “<propertyName>”.


— Section 8.3.1 of JavaBeans(TM) Specification 1.01

This kind of specification is a good match for rules technology because it describes a number of arbitrary properties. It can be broken down into a set of rules that state what the English says in a rigorous way. It would be asking a lot to turn that English into rules automatically, but you can definitely display the rules side by side with the specification to confirm that the rules are correct.
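To make the quoted pattern concrete before turning it into rules, here is a minimal sketch of the same getter/setter pairing written with plain Java reflection. It is not how Javagate does it; the names BeanPropertyScan and Sample are hypothetical, and the decapitalization is simplified (it always lowercases the first letter).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: classify JavaBeans-style properties by reflection.
class BeanPropertyScan {

    // Example class to scan; not part of Real Semantics.
    static class Sample {
        private String name;
        private int age;
        public String getName() { return name; }
        public void setName(String a) { name = a; }
        public int getAge() { return age; }      // getter only
        public void setSecret(String a) { }      // setter only
    }

    // Returns property name -> "read-write", "read-only" or "write-only".
    static Map<String, String> scan(Class<?> type) {
        Map<String, Class<?>> getters = new TreeMap<>();
        Map<String, Class<?>> setters = new TreeMap<>();
        for (Method m : type.getMethods()) {
            if (m.getDeclaringClass() == Object.class) continue; // skip getClass()
            if (Modifier.isStatic(m.getModifiers())) continue;
            String n = m.getName();
            if (n.length() < 4 || !Character.isUpperCase(n.charAt(3))) continue;
            // Simplified decapitalization: lowercase the first letter only.
            String prop = Character.toLowerCase(n.charAt(3)) + n.substring(4);
            if (n.startsWith("get") && m.getParameterCount() == 0
                    && m.getReturnType() != void.class) {
                getters.put(prop, m.getReturnType());
            } else if (n.startsWith("set") && m.getParameterCount() == 1
                    && m.getReturnType() == void.class) {
                setters.put(prop, m.getParameterTypes()[0]);
            }
        }
        Map<String, String> result = new TreeMap<>();
        for (String p : getters.keySet()) {
            // A pair counts as read-write only when the types agree;
            // a mismatched pair is reported as read-only in this sketch.
            boolean paired = getters.get(p).equals(setters.get(p));
            result.put(p, paired ? "read-write" : "read-only");
        }
        for (String p : setters.keySet()) {
            result.putIfAbsent(p, "write-only");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {age=read-only, name=read-write, secret=write-only}
        System.out.println(scan(Sample.class));
    }
}
```

Every branch in this method corresponds to a sentence of the specification, but the correspondence is buried in control flow. That is the part that rules make visible.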

Anatomy of a Rule

For instance, while building the RDF to Java mapping, Real Semantics scans Java classes for metadata about fields, methods and annotations. This data is inserted into an RDF graph, and a set of rules that look like the following match the patterns described in the specification above. This specific rule finds getter methods and configures property accessors to fetch data from them:

@prefix : <http://rdf.ontology2.com/javagate/>
@prefix unq: <http://rdf.ontology2.com/unqualified/>
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

[
    (?A rdf:type :Method)
    (?A :memberName ?memberName)
    (?A :returnType ?javaType)
    (?A :hasModifier :Public)
    (?A :parameterTypes rdf:nil)
    regex(?memberName, '^get([A-Z].*)$', ?propName)
    lcFirst(?propName,?lcPropName)
    uriConcat(unq:,?lcPropName,?propertyIs)
    makeTemp(?B)
->
    (?B rdf:type :Accessor)
    (?B :propertyIs ?propertyIs )
    (?B :getter ?A)
    (?B :javaType ?javaType)
]
    

The structure of this rule matches the structure of the specification well, so it is easy to implement variant rules such as the special case of isBooleanVariable, indexed properties, etc. Thus we can get data in and out of Java Beans as well as classes that use other conventions to name entities and fields.

Let's talk about this rule, which has the structure

[ *body* -> *head* ]

and means more or less

    IF *body* THEN *head*

First note that the rule is made of two sorts of things: (i) triples, which look like (?A rdf:type :Method), and (ii) builtins, which look like makeTemp(?B). Triples in the body match patterns that are already known in the RDF graph, while triples in the head get inserted into the graph when the inference rule is triggered. Builtins do many other things, and are similar in nature to the "predicates" used in the Prolog programming language.

The first four triples in the body match four key properties of ?A, which represents a Java method. The fifth one refers to rdf:nil, which is also known as the empty list, and makes the precise statement that the parameter list of the method is empty. (This might not be familiar if you've seen legacy RDF systems that don't support ordered collections.)

Jena Rules Builtins

The snippet regex(?memberName, '^get([A-Z].*)$', ?propName) is possibly the first thing which is a little unusual. In most programming languages you would write this function like this:

    ?propName=regex(?memberName,'^get([A-Z].*)$')

and perhaps a future version of the Jena Rules Language could let you write it this way. Instead, the Jena Rules Language treats the return value as just another parameter in the parameter list, as opposed to something that is assigned on the left. Because of this, Jena Rules programs have a graph structure instead of the tree structure found in most programming languages. Other than that, the builtin is utterly conventional: it matches get followed by a capital letter, and deposits the captured name into the ?propName variable.
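For readers who think in Java, the following sketch shows what that builtin call amounts to using java.util.regex. The class and method names (GetterRegexDemo, propName) are hypothetical; an empty Optional stands in for "the builtin fails, so the rule does not fire", and a present value stands in for binding ?propName to the first capture group.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring regex(?memberName, '^get([A-Z].*)$', ?propName).
class GetterRegexDemo {
    private static final Pattern GETTER = Pattern.compile("^get([A-Z].*)$");

    // Empty result: no match, so nothing is bound.
    // Present result: the text captured by group 1.
    static Optional<String> propName(String memberName) {
        Matcher m = GETTER.matcher(memberName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(propName("getFirstName")); // Optional[FirstName]
        System.out.println(propName("getaway"));      // Optional.empty
        System.out.println(propName("setFirstName")); // Optional.empty
    }
}
```

Note that the capital letter requirement is what keeps a method such as getaway() from being mistaken for a getter.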

At this point (imagining that execution proceeds downward) we are done matching and now we are computing values that we will insert into the graph via the head. Actually Jena can match and execute the clauses

First, lcFirst(?propName,?lcPropName) is a user-defined function (UDF) that Real Semantics adds to the Jena Rules engine. It lowercases the first letter of ?propName and puts the resulting string into ?lcPropName. Real Semantics comes with an extended library of user-defined functions that solve practical problems like this. As much as it would be desirable to express ourselves in rules, often the exact logic we need can be found in a function written in Java, in which case using it is made as easy as possible.

The last two builtins, uriConcat(unq:,?lcPropName,?propertyIs) and makeTemp(?B), come from the standard library and finish the job. The first one creates a URI ?propertyIs by using unq: as a namespace and ?lcPropName as a local name. The second one creates a new blank node ?B, which is a unique name for the data record to be created in the head.
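The string-level effect of these three builtins can be sketched in a few lines of plain Java. This is only an illustration of what each step computes, not the code Real Semantics or Jena runs: the class BuiltinSketch and its methods are hypothetical stand-ins, and the "_:b" label format for blank nodes is an assumption made for the example.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical plain-Java stand-ins for lcFirst, uriConcat and makeTemp.
class BuiltinSketch {
    static final String UNQ = "http://rdf.ontology2.com/unqualified/";
    private static final AtomicLong COUNTER = new AtomicLong();

    // lcFirst: lowercase the first character and leave the rest alone.
    static String lcFirst(String s) {
        if (s.isEmpty()) return s;
        return Character.toLowerCase(s.charAt(0)) + s.substring(1);
    }

    // uriConcat: concatenate the arguments into a single URI string.
    static String uriConcat(String... parts) {
        return String.join("", parts);
    }

    // makeTemp: return a fresh identifier on every call.
    static String makeTemp() {
        return "_:b" + COUNTER.incrementAndGet();
    }

    public static void main(String[] args) {
        // prints http://rdf.ontology2.com/unqualified/firstName
        System.out.println(uriConcat(UNQ, lcFirst("FirstName")));
        // prints false, because each call yields a new identifier
        System.out.println(makeTemp().equals(makeTemp()));
    }
}
```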

If you look at the above and squint you might see some similarity between that rule and a SPARQL CONSTRUCT query, particularly in that the structure of the query is

CONSTRUCT *head* WHERE *body*

Something that the Jena rule shares with the CONSTRUCT query is that you can't do any computations in the *head*. For instance, you can't use a function like lcFirst there; you have to do all of your thinking in the *body* and pass the results through variables. One thing that is different about Jena Rules, however, is that you can use builtins in the head that have side effects, such as print() and hide(). Creating such a set of builtins is a natural way to expose a Java API that constructs Java objects to have effects on the world.

We are contributing a patch to the Jena Project, JENA-1204, which makes it straightforward to add locally scoped UDFs to Jena, and we are working on JENA-1201 to improve the function library.

Jena Rules vs SPIN

SPIN rules are appealing in many ways. In particular, using the CONSTRUCT query above you get a rule language that is similar in many ways to Jena Rules. For instance, the Jena rule that matches the getter pattern could be written as a SPIN rule not very different from the Jena Rule.

The weakness of SPIN, compared to Jena Rules, is that current SPIN engines use a "fixed-point" strategy to run the rules. Essentially it runs all of the rules over and over again until there are no new facts. Good rule sets will terminate this way, but it is by no means guaranteed. This strategy is simple to implement, but it is not necessarily efficient.
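The fixed-point strategy is easy to sketch, which is part of its appeal. The example below is a toy, not SPIN: facts are strings of the form "x<y", a rule is any function from the current fact set to the facts it can derive, and the names FixedPointDemo, closure and transitive are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

// Hypothetical sketch of naive fixed-point rule evaluation.
class FixedPointDemo {

    // Run every rule against the whole fact set until a full pass adds nothing.
    static Set<String> closure(Set<String> seed,
                               List<Function<Set<String>, Set<String>>> rules) {
        Set<String> facts = new TreeSet<>(seed);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<Set<String>, Set<String>> rule : rules) {
                // addAll returns true only if at least one new fact was inserted.
                if (facts.addAll(rule.apply(facts))) changed = true;
            }
        }
        return facts;
    }

    // Example rule: transitivity over facts of the form "x<y".
    static Set<String> transitive(Set<String> facts) {
        Set<String> out = new TreeSet<>();
        for (String f : facts) {
            for (String g : facts) {
                String[] a = f.split("<");
                String[] b = g.split("<");
                if (a[1].equals(b[0])) out.add(a[0] + "<" + b[1]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> seed = new TreeSet<>(Arrays.asList("a<b", "b<c", "c<d"));
        Function<Set<String>, Set<String>> rule = FixedPointDemo::transitive;
        // prints [a<b, a<c, a<d, b<c, b<d, c<d]
        System.out.println(closure(seed, Collections.singletonList(rule)));
    }
}
```

This rule set terminates because only finitely many "x<y" facts can be built from the seed. A rule that produced a brand new fact on every pass would keep the loop running forever. The sketch also shows the inefficiency: each pass reconsiders every pair of facts, including the ones that already produced their conclusions in an earlier pass.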

By contrast, the Jena rules engine contains two powerful reasoning engines:

  1. Forward-chaining: the Jena rules engine precompiles a RETE network that records which rules can be fired by another rule. This engine produces the same output as the fixed-point iteration under reasonable conditions, but is exceptionally effective at processing incremental changes to the facts by inferring only the new facts that follow from them.
  2. Backward-chaining: the Jena rules engine implements a Datalog engine with tabling. This solves problems by working backward from a goal, and has performance and expressiveness advantages for some problems.

Both of these engines are basically compatible (many rules can be run either way) and both are better than anything that researchers had in "the Golden Age of A.I." In practice we do use SPARQL to express rules and infer facts in Real Semantics when it is convenient, but the Jena Rules Language is the official rules language of Real Semantics and we expect to do more engineering on and around it.