CMIS Query Language

Categories:
Posted on: Mon, 2009-11-23 17:00

The CMIS Query Language is the means by which any CMIS compliant repository provides a read only Relation View into the repository.  As we saw in the CMIS Domain Model, support for queries is optional.  Having said that, in my humble opinion, it is highly unlikely that any self respecting CMIS compliant repository would not support relational queries.  Keep in mind that some repositories may require you to issue full text and metadata queries separately.  CMIS SQL is a subset of SQL-92 with added support for type inheritance, multivalued properties,  full text search and folder location.

Below is a sample query that touches on much of the CMIS SQL functionality:

SELECT ObjectId, SCORE() AS Relevance, Destination, DepartureDates
/* The SELECT is required and specifies which Virtual Columns
(or Properties) will be included in the results */
FROM TravelBrochure
/* The FROM Clause is required and Specifies which Virtual
Tables (Or Object Types) the query will run against */
WHERE CONTAINS(‘CARIBBEAN CENTRAL AMERICA CRUISE TOUR’)
AND ‘2010-01-01’ < ANY DepartureDates
/* The WHERE clause is optional and specifies which constraints
the rows returned must satisfy */
ORDER BY Relevance DESC
/* The ORDER BY clause is optional and specifies the order in
which the rows must be returned

This Query finds all travel brochures with text related to Caribbean and Central American Cruises and Tours where at least one departure date is after January 1, 2010.

Mapping to the Relational View

A CMIS compliant repository will map all Queryable Object Types to a Virtual Table and all of the queryable properties within those types to Virtual Columns.  The names of the virtual tables are based upon the Query Name of the object type and the names of the virtual columns are based upon the query name of the queryable properties. For multi valued properties, the result set must contain all of the values rendered as a single list.  Content streams are not exposed as part of the relational view.

The CMIS Query capabilities understands the hierarchy of object types.  For instance if I create a subtype of the cmis:folder called "Collaboration Folder" and I issue a query against the cmis:folder virtual table, the Collaboration Folders will be returned if the Collaboration Folder Type was set up to be included in the result sets of any super types (done by setting the includedInSupertypeQuery property to true).  In this case, the result set will not include any properties that are defined specifically for the "Collaboration Folder". 

Syntax Elements

A CMIS Query must contain a SELECT clause and a FROM clause.  It can optionally contain a WHERE clause and an ORDER BY clause.

The SELECT clause must either contain a single "*" or a list of one or more virtual column names. The FROM clause must contain the name of one or more virtual tables. If there is more than one table, the additional tables must be included by using explicit JOINs.

The ORDER BY clause will contain a comma separated list of columns (representing single valued properties) used to sort the output.  All of the columns used to sort the output must be colums that were included in the SELECT statement.

The WHERE clause can use the comparisons that you would expect in the appropriate places (NOT, LIKE, IN, =, <>, <, >, >=, <=). Keep in mind that IDs are not strings and as a result you cannot use LIKE when comparing IDs. In CMIS you cannot define implicit joins in the WHERE clause, all joins must be explicitly made using the JOIN clause. For any joins that are made, they can only be done using equality predicates on single valued properties. Keep in mind that some CMIS compliant repositories that support CMIS Queries, may not support joins.

Support for Multi Valued Columns

When comparing columns that represent multi valued properties, you must use precede the column name with the ANY predicate.  Below are some examples

WHERE "RED" = ANY COLOR
...
WHERE ANY COLOR NOT IN ("RED","WHITE","BLUE")
...
WHERE ANY StartDate < '2010-01-01'

Support for Full Text Search

CMIS supports full text search via the CONTAINS and SCORE predicate functions. These two predicates are used together.

The SCORE predicate returns the relevance value of the search made in the CONTAINS predicate.  The value is between 0 and 1, with 0 meaning that the text was not found in the document and 1 indicating a perfect match.  You cannot use the SCORE function without the CONTAINS predicate.  The SCORE function can only be used in the SELECT clause and can be aliased.  If no alias is provided, then there is an implicit alias of SEARCH_SCORE that is defined. The ORDER BY clause must use the alias name of the SCORE result.

The CONTAINS predicate is used in the WHERE clause and returns a boolean depending upon whether or not the text was found in each document searched. The CONTAINS predicate can only occur once in the query and can only be anded with other predicates in the WHERE clause. If one or more tables are used in the query (in the case of a join) an qualifier must be included, indicating which table the search is being made on. The following shows the syntax for the qualifier

SELECT cmis:objectId, cmis:name, SCORE() as Relevance
FROM Documents
JOIN Attachments on (Documents.AttachmentId = Attachments.cmis:objectId)
WHERE CONTAINS(Attachments,'ECM Web2.0 CMIS REST')
ORDER BY Relevance DESC

Notice in the example above, because we joined the Attachments table, we needed to specify which table the full text search was being executed against.

Support for Folder Location

The IN_TREE() and  IN_FOLDER() predicate functions allow for the limiting of a search to objects files in a particular folder or a folder and its subfolders. These functions accept an optional Qualifier and a mandatory Folder ID (the object ID of a folder) as shown below:

IN_TREE([Qualifier],<folderId>)
...
IN_FOLDER([Qualifier],<folderId>)

The qualifier is required if more than one table is present in the query. Keep in mind that these functions take the folder Id, not the folder path.

Escape Sequences

CMIS compliant repositories must support escaping The backslash (\) is used for escaping and must only be used within quoted strings. Below are the only valid uses for the backslash character

  • \"represents the double-quote character (“)

  • \' represents the single-quote character (‘)
  • \\ represents the backslash character (\)
  • \% represents the percent character (%) - only valid within a LIKE predicate
  • \_ represents the underscore character (_) - only valid within a LIKE predicate

Summary

The CMIS Query Language should simplify the process of searching ECM repositories.  In the case of Alfresco, this is a huge plus as more technologists are familiar with SQL than with the LUCENE query language.