Notes on RAML

Paul Houle, Ontology2

Introduction

The RAML specification is designed to describe the kind of "pragmatic" APIs that we see on the web, where somebody makes an HTTP request, possibly posting some data, and gets back a JSON or XML document which is somewhat structured. In some ways it is like WSDL, but instead of specifying a new protocol for APIs (e.g. SOAP), it documents vernacular APIs that already exist -- much like the vision of the semantic web where schema languages such as RDFS are almost as good at assigning meaning to predicates that somebody else wrote as they are at designing an ontology from scratch.

That said, RAML positions itself as "The simplest way to design APIs", as follows:

RAML is the only spec designed to encompass the full API lifecycle in a human readable format with code and design pattern reuse. With RAML you can truly design, build, test, document, and share your API all with one spec.

Compared to Swagger, RAML offers a great deal of amenity to API developers, particularly in the form of a number of mechanisms for code reuse and specialization.

Optics

I'll talk about some superficial characteristics that make RAML easy to swallow, particularly when compared to the OpenAPI specification. The RAML specification uses a consistent notation, based on YAML, whereas the OpenAPI specification switches back and forth between a JSON representation and a YAML representation, distracting the reader.

Relative to OpenAPI (aka "Swagger"), RAML values making life easier for the writer. Thus it looks concise, almost like the Markdown, Word or whiteboard outline you use already.

One RAML use case would be to generate HTML or PDF documentation of an API. Another would be to generate a set of client stubs to access the API from some language like Python, Java, Node, FORTRAN, etc. Of course you can generate server stubs too if you are the developer of the service.

RAML is all about defining the semantics of specific URLs and families of URLs, but it rejects the "namespace" mechanism from the XML and RDF world. For instance, you cannot use a well-known predicate such as dc:creator. Statements certainly look cleaner than they do in RDF.

Despite this heresy, the type system is modeled after XSD but uses a more hip syntax which may appeal to the young people in the audience. A RAML document could be expressed as an RDF graph by a mechanical translation and then SPARQL queried for fun and profit.

RAML is based on YAML, which is specified in great detail. YAML gives writers a lot of latitude about the (unquoted) form of keys in key-value pairs. This gives DSLs like RAML the ability to shape the apparent syntax to the domain. In this case, the use of paths (e.g. /this/and/that) and numbers as keys makes it natural to describe URIs and HTTP response codes -- almost as if YAML were designed for this from the beginning.
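
To make the point about keys concrete, here is a minimal Python sketch of what a consumer sees after parsing. It assumes a YAML parser has already turned the document into nested dicts and resolved the unquoted status codes to integers; the doc literal and the endpoints helper are hypothetical names made up for illustration.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head"}

# Hypothetical parsed form of a tiny RAML document: path-like strings and
# integer status codes appear directly as dictionary keys.
doc = {
    "title": "Example API",
    "/this/and/that": {
        "get": {"responses": {200: {"description": "ok"},
                              404: {"description": "not found"}}},
    },
}


def endpoints(tree):
    """Yield (path, method, sorted status codes) for top-level resources."""
    for path, resource in tree.items():
        if not (isinstance(path, str) and path.startswith("/")):
            continue  # skip title, traits and other non-resource keys
        for method, body in (resource or {}).items():
            if method in HTTP_METHODS:
                codes = sorted((body or {}).get("responses") or {})
                yield path, method, codes


print(list(endpoints(doc)))
```

No special syntax is needed to tell a resource from a metadata key: the leading slash and the numeric type of the key carry that information.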

Annotations

RAML is rich in extension methods much like the FHIR standard used for clinical notes. A primary method of extension is the annotation, which looks like

#%RAML 1.0
title: Testing annotations
mediaType: application/json
annotationTypes:
  testHarness:
    type: string
/users:
  (testHarness): usersTest

You could parse this in a non-validating manner, just treating a word inside parentheses as one more key the parser accepts, but a good parser would check the annotationTypes: declarations to apply correct data types, etc. As with FHIR, you can add annotations at will so long as you declare them in the annotationTypes section.
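
The difference between the two parsing styles can be sketched in Python. This is only an illustration under stated assumptions: the doc literal stands in for the parsed YAML of the example above, check_annotations is a hypothetical helper, and only a few scalar types are checked.

```python
import re

# A key of the form "(name)" is treated as an annotation use.
ANNOTATION_KEY = re.compile(r"^\((?P<name>[^()]+)\)$")

# Only a handful of scalar types are handled in this sketch.
PYTHON_TYPES = {"string": str, "integer": int, "boolean": bool}

# Hypothetical stand-in for the parsed example document.
doc = {
    "title": "Testing annotations",
    "mediaType": "application/json",
    "annotationTypes": {"testHarness": {"type": "string"}},
    "/users": {"(testHarness)": "usersTest"},
}


def check_annotations(node, declared, path=""):
    """Return a list of problems with annotation uses found under node."""
    problems = []
    if not isinstance(node, dict):
        return problems
    for key, value in node.items():
        here = f"{path} > {key}" if path else str(key)
        match = ANNOTATION_KEY.match(str(key))
        if match:
            name = match.group("name")
            decl = declared.get(name)
            if decl is None:
                problems.append(f"{here}: annotation '{name}' is not declared")
                continue
            # A declaration may be a bare type name or a mapping with "type".
            type_name = decl if isinstance(decl, str) else decl.get("type", "string")
            expected = PYTHON_TYPES.get(type_name)
            if expected is not None and not isinstance(value, expected):
                problems.append(f"{here}: expected a {type_name}")
        elif key != "annotationTypes":
            problems.extend(check_annotations(value, declared, here))
    return problems


print(check_annotations(doc, doc["annotationTypes"]))
```

A non-validating parser would simply skip the call to check_annotations and carry the parenthesized keys along as ordinary data.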

Reification here is hardcore: you can even apply it to a scalar value. Normally you might write a scalar value like this:

baseUri: http://www.example.com/api

but you can also expand it like this to create a "blank node"

baseUri:
  value: http://www.example.com/api

and then make a statement about the fact:

baseUri:
  value: http://www.example.com/api
  (redirectable): true

If you had tons of (annotations) and very long documentation strings, the file could get ugly and you would want tooling to help. On the other hand it is a conceptually clean way to extend the specification with any kind of vendor- or site-specific features.
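
Tooling has to cope with both spellings of a scalar. A minimal Python sketch, assuming the document has already been parsed into dicts and strings; expand_scalar is a hypothetical helper, not part of any RAML library.

```python
def expand_scalar(node):
    """Return (value, annotations) for a scalar written in either form."""
    if isinstance(node, dict) and "value" in node:
        # Expanded form: the scalar sits under "value", and any "(name)"
        # keys next to it are statements about that scalar.
        annotations = {
            key[1:-1]: val
            for key, val in node.items()
            if isinstance(key, str) and key.startswith("(") and key.endswith(")")
        }
        return node["value"], annotations
    # Compact form: the node is the value itself and carries no annotations.
    return node, {}


compact = "http://www.example.com/api"
expanded = {"value": "http://www.example.com/api", "(redirectable)": True}

print(expand_scalar(compact))
print(expand_scalar(expanded))
```

Code downstream can then work with the plain value and consult the annotations only if it understands them.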

Examples (Instance Data)

RAML describes a schema for objects, but you can also represent instances of that data in the RAML document. In this case, an example is defined for the type User as part of the type declaration:

types:
  User:
    type: object
    properties:
      name: string
      lastname: string
    example:
      name: Curtis
      lastname: Mayfield

Since examples are abstract RAML objects, agnostic to any programming language, it should be possible to translate these into code in an implementation language (say Python or Java) that calls the API with an instance of said object.
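
As a rough illustration of that translation, the Python sketch below builds a class from the User declaration and instantiates it from the embedded example. It assumes the RAML has already been parsed into dicts and that every property uses the shorthand "name: type" form with a built-in scalar type; class_for and example_instance are hypothetical helper names.

```python
from dataclasses import asdict, make_dataclass

# Mapping from a few RAML scalar type names to Python types.
RAML_TO_PYTHON = {"string": str, "integer": int, "number": float, "boolean": bool}

# Hypothetical parsed form of the types: section shown above.
types = {
    "User": {
        "type": "object",
        "properties": {"name": "string", "lastname": "string"},
        "example": {"name": "Curtis", "lastname": "Mayfield"},
    }
}


def class_for(type_name, declaration):
    """Create a dataclass with one field per declared property."""
    fields = [(prop, RAML_TO_PYTHON[kind])
              for prop, kind in declaration["properties"].items()]
    return make_dataclass(type_name, fields)


def example_instance(type_name, declaration):
    """Instantiate the generated class from the declaration's example."""
    return class_for(type_name, declaration)(**declaration["example"])


user = example_instance("User", types["User"])
print(user)
```

An object built this way could then be passed to a generated client stub as sample input when exercising the API.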

Type system

Probably the most unusual feature in the type system is the existence of a union type, which does not translate directly to some languages (e.g. Java). Instances of types contain data like capital-O Objects, but do not have methods. The API endpoints are functions, however, and with the

self.method() -> method(self)

trick you can model an OO system. The RAML specification describes types as "similar to Java classes", but they correspond more closely to the JSON approach of scalars plus collections of scalars and collections. Types can inherit from multiple other types; the processor should give an error if one tries to combine types with incompatible definitions. RAML-defined facets can specify restrictions on primitive types (as in XSD or OWL) such as "an integer between 2 and 12", and users can define facets of their own.
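
Facets and unions are easy to check dynamically even in languages that cannot express them statically. The Python sketch below is a simplified illustration, not the spec's validation algorithm: it handles only the integer and string built-ins, the minimum and maximum facets, and unions written with "|". The type names DiceTotal, Label and DiceOrLabel are made up for the example.

```python
# Hypothetical parsed type declarations.
TYPES = {
    "DiceTotal": {"type": "integer", "minimum": 2, "maximum": 12},
    "Label": {"type": "string"},
    "DiceOrLabel": {"type": "DiceTotal | Label"},
}


def conforms(value, type_expr, types=TYPES):
    """Return True if value satisfies the (simplified) type expression."""
    if "|" in type_expr:
        # Union: the value only has to match one of the member types.
        return any(conforms(value, member.strip(), types)
                   for member in type_expr.split("|"))
    if type_expr == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_expr == "string":
        return isinstance(value, str)
    declaration = types[type_expr]
    if not conforms(value, declaration["type"], types):
        return False
    if "minimum" in declaration and value < declaration["minimum"]:
        return False
    if "maximum" in declaration and value > declaration["maximum"]:
        return False
    return True


print(conforms(7, "DiceTotal"), conforms(13, "DiceTotal"))
print(conforms("seven", "DiceOrLabel"), conforms(13, "DiceOrLabel"))
```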

RAML also accepts type definitions written in JSON Schema and XSD. Types imported this way cannot be specialized or inherited from, however.

Traits

According to the RAML spec:

resource and method declarations are frequently repetitive. For example, an API that requires OAuth authentication might require an X-Access-Token header for all methods across all resources. For many reasons, it might be preferable to define such a pattern in a single place and apply it consistently everywhere.

--- https://github.com/raml-org/raml-spec/blob/master/versions/raml-10/raml-10.md/#resource-types-and-traits

Traits can be defined in a RAML file or can be included from another RAML file, as seen in this example:

#%RAML 1.0
title: Example API
version: v1
resourceTypes:
  collection:  !include resourceTypes/collection.raml
  member:      !include resourceTypes/member.raml
traits:
  secured:     !include traits/secured.raml
  paged:       !include traits/paged.raml
  rateLimited: !include traits/rate-limited.raml
/users:
  type: collection
  is: [ secured ]
  get:
    is: [ paged, rateLimited ] # this method is also secured
  post:                        # this method is also secured

The "is" statement above applies the secured trait to every method of '/users', so each of them inherits the set of properties shared by secured resources. The get method (an HTTP method!) additionally gets the paged and rateLimited properties. The actual traits can be parameterized with templates, for example

traits:
  secured:
    queryParameters:
      <<tokenName>>:
        description: A valid <<tokenName>> is required
  paged:
    queryParameters:
      numPages:
        description: The number of pages to return, not to exceed <<maxPages>>

The template mechanism even supports some linguistic transformations, such as:

    post:
      responses:
        200:
          body:
            type: <<resourcePathName | !singularize>>  # e.g. user

which would translate a resourcePathName with value "users" to "user". (Chaining a second function such as !uppercamelcase would then capitalize it to "User".) So far as the underlying YAML is concerned, there is no special meaning to the punctuation in a variable name like <<tokenName>> -- this is something that RAML can build on top of YAML thanks to permissive punctuation.
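
Because the placeholders are just text to YAML, a processor can expand them with ordinary string handling after parsing. The Python sketch below shows one way this might look; it is an assumption-laden illustration rather than the spec's algorithm. The singularize function is deliberately naive (it only drops a trailing "s"), only three transformation functions are wired up, and expand and expand_tree are hypothetical helper names.

```python
import re

# Matches <<name>> optionally followed by "| !function" steps.
PLACEHOLDER = re.compile(r"<<\s*(\w+)\s*((?:\|\s*!\w+\s*)*)>>")


def singularize(word):
    # Naive stand-in: a real processor would use proper inflection rules.
    return word[:-1] if word.endswith("s") else word


FUNCTIONS = {
    "singularize": singularize,
    "uppercase": str.upper,
    "lowercase": str.lower,
}


def expand(text, params):
    """Replace every <<name | !fn ...>> in text using params."""
    def substitute(match):
        value = str(params[match.group(1)])
        for fn_name in re.findall(r"!(\w+)", match.group(2)):
            value = FUNCTIONS[fn_name](value)
        return value
    return PLACEHOLDER.sub(substitute, text)


def expand_tree(node, params):
    """Expand placeholders in the keys and values of a parsed trait."""
    if isinstance(node, dict):
        return {expand_tree(k, params): expand_tree(v, params)
                for k, v in node.items()}
    if isinstance(node, list):
        return [expand_tree(item, params) for item in node]
    if isinstance(node, str):
        return expand(node, params)
    return node


secured = {"queryParameters": {"<<tokenName>>": {
    "description": "A valid <<tokenName>> is required"}}}

print(expand_tree(secured, {"tokenName": "access_token"}))
print(expand("<<resourcePathName | !singularize>>", {"resourcePathName": "users"}))
```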

Overlays and Extensions

Something exciting about RAML is that it supports modular specifications; one RAML file can reference other RAML files and can add to and amend the declarations in another RAML file. For instance, this could be used to offer different documentation for different service tiers that have access to a variable but overlapping set of API calls. Another scenario would be overlaying Spanish documentation over a specification written in English.

The first scenario would be an example of an extension, which is allowed to add and override behaviors of the API. The second would be one of an overlay, which can override documentation-related properties but not the actual behavior of the API. The RAML spec contains detailed descriptions of the merging algorithms that support overlays and extensions.
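
The distinction can be sketched in Python as a recursive merge with one extra rule for overlays. This is a heavily simplified illustration, not the merging algorithm from the spec: the set of documentation-related keys in DOC_KEYS is an assumption chosen for the example, and the merge function and sample documents are hypothetical.

```python
# Assumed set of documentation-related keys an overlay may touch.
DOC_KEYS = {"title", "displayName", "description", "documentation",
            "usage", "example", "examples"}


def is_annotation(key):
    return isinstance(key, str) and key.startswith("(") and key.endswith(")")


def merge(master, patch, overlay=False):
    """Return a new tree with patch applied on top of master."""
    merged = dict(master)
    for key, value in patch.items():
        if key in DOC_KEYS or is_annotation(key):
            merged[key] = value  # always allowed, replaces the old value
        elif isinstance(value, dict) and isinstance(master.get(key), dict):
            merged[key] = merge(master[key], value, overlay)
        elif overlay and master.get(key) != value:
            # An overlay may not add or change anything behavioral.
            raise ValueError(f"an overlay may not change {key!r}")
        else:
            merged[key] = value  # extensions may add and override freely
    return merged


english = {"title": "Example API",
           "/users": {"get": {"description": "List all users"}}}
spanish = {"/users": {"get": {"description": "Lista todos los usuarios"}}}
premium = {"/users": {"delete": {"description": "Remove a user"}}}

print(merge(english, spanish, overlay=True))
print(merge(english, premium))
```

Applying premium as an overlay instead of an extension raises ValueError, since it introduces a new method rather than new documentation.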

Anatomy of a RAML processor

One obvious way to implement a RAML processor is with a multiple-pass process, which would look something like:

  1. Parse the RAML document(s) as YAML 1.2 into either a tree (JSON-like) or graph (RDF) representation
  2. Apply transformations such as merging overlay and extension graphs, specializing traits and inheriting properties from parents.
  3. Use type declarations to determine the types of other scalar nodes, merge this with node values to restore full types in the internal representation
  4. Validate key names, annotation use, etc.

That order might not be quite right; there may be a bit of a chicken-and-egg sort of thing when it comes to understanding enough about what types are expected where to read the type declarations and then apply them to the data. Also, the above is based on a forward-chaining approach where facts are materialized ahead of time; in the case of type inheritance one might be able to do backward-chaining (infer inheritance at query time).
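
Two of the passes listed above can be sketched in Python over an already-parsed tree. This is a partial, forward-chaining illustration under several assumptions: step 1 is taken as done, only top-level resources are handled, trait parameters are ignored, and apply_traits, validate and deep_merge are hypothetical names.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "options", "head"}


def deep_merge(base, extra):
    """Merge extra into a copy of base; values already in base win."""
    merged = dict(extra)
    for key, value in base.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(value, merged[key])
        else:
            merged[key] = value
    return merged


def apply_traits(doc):
    """Step 2 (partial): materialize trait bodies into each method."""
    traits = doc.get("traits", {})
    result = dict(doc)
    for path, resource in doc.items():
        if not (isinstance(path, str) and path.startswith("/")):
            continue
        resource = resource or {}
        inherited = resource.get("is", [])  # applies to every method
        new_resource = dict(resource)
        for method in resource:
            if method not in HTTP_METHODS:
                continue
            body = dict(resource[method] or {})
            for name in body.pop("is", []) + inherited:
                body = deep_merge(body, traits[name])
            new_resource[method] = body
        result[path] = new_resource
    return result


def validate(doc):
    """Step 4 (partial): report response keys that are not status codes."""
    problems = []
    for path, resource in doc.items():
        if not (isinstance(path, str) and path.startswith("/")):
            continue
        for method, body in (resource or {}).items():
            if method not in HTTP_METHODS:
                continue
            for code in (body or {}).get("responses") or {}:
                if not (isinstance(code, int) and 100 <= code <= 599):
                    problems.append(f"{path} {method}: bad status code {code!r}")
    return problems


doc = {
    "traits": {
        "secured": {"headers": {"X-Access-Token": {"type": "string"}}},
        "paged": {"queryParameters": {"numPages": {"type": "integer"}}},
    },
    "/users": {"is": ["secured"], "get": {"is": ["paged"]}, "post": None},
}

processed = apply_traits(doc)
print(processed["/users"])
print(validate(processed))
```

A backward-chaining processor would skip apply_traits and instead look up the applicable traits each time a method is queried.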

(Note that YAML itself allows single-pass processing, having some details that limit the size of buffers needed to parse it. Some RAML documents pro)

After going through the above process, the RAML graph should be straightforward to use for most purposes such as documentation generation. Still, there will likely be canonicalizations needed for certain purposes; for instance, stripping annotation values for parts of the system which are not aware of any or all annotations.
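
Stripping annotations is one canonicalization that is simple to sketch. The Python function below assumes a tree of dicts and lists produced by a YAML parser; strip_annotations and its keep parameter are hypothetical names chosen for the example.

```python
def strip_annotations(node, keep=()):
    """Return a copy of node without "(name)" keys, except names in keep."""
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            is_annotation = (isinstance(key, str)
                             and key.startswith("(") and key.endswith(")"))
            if is_annotation and key[1:-1] not in keep:
                continue  # this consumer does not understand the annotation
            cleaned[key] = strip_annotations(value, keep)
        return cleaned
    if isinstance(node, list):
        return [strip_annotations(item, keep) for item in node]
    return node


tree = {"/users": {"(testHarness)": "usersTest", "get": {}}}

print(strip_annotations(tree))
print(strip_annotations(tree, keep=("testHarness",)))
```

The keep argument lets a component retain the annotations it is aware of while dropping the rest.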

To get a sense of the scope of existing RAML processors, consider that the project listing has 12 pages of them, doing everything from converting RAML to Swagger and providing GUI interfaces for producers and consumers to generating client-side and server-side stubs and even creating mock services and proxy servers. In other words, the works!

Use of Markdown for descriptions

RAML uses Markdown (specifically described as "GitHub-flavored Markdown", or GFM) for long descriptions. Markdown is commonly implemented and plays well with Jupyter, Sphinx and other documentation tools; however, it has the problem of being poorly specified -- and if an implementation allows HTML to be embedded in Markdown (as most do), rendering behavior is almost open-ended. The good news is that GitHub defines GFM in reference to the CommonMark standard, so rendering behavior should be fairly well defined.

Conclusion

RAML looks simple and approachable, but inside it has powerful mechanisms to reuse code and enable maintenance of large and complex APIs. RAML covers much of the ground that older systems such as CORBA and SOAP did, but it approaches it from the viewpoint of describing the kind of APIs people are already creating, as opposed to defining a new protocol for APIs. RAML is an attractive tool for API designers and consumers for 2018.