Qexo runs on the Java™ platform.
It is written in Java, and it compiles XQuery expressions and programs
to Java bytecodes (.class files).
Qexo is Free Software (or open-source, if you prefer), available from
the Qexo website.
Qexo is based on, and is a part of, the Kawa framework. Kawa (no relation to the now-defunct IDE of the same name) was originally written in 1996 at Cygnus Solutions (now part of Red Hat) to compile the Scheme functional programming language to Java bytecodes. Since then Kawa has been generalized to handle other programming languages, now including XQuery.
Kawa depends on a Java feature called a ClassLoader,
which can take the bytecode representation of a Java program
(the same format as a .class file, but stored in an
array in memory) and convert it into a runnable class in
an existing Java executable (or "virtual machine"). (The same
mechanism is used when a browser downloads and runs an "applet".)
Compiling to bytecodes and then using a ClassLoader
gives Qexo the best of both worlds: fast interactive responsiveness
and fast execution of repetitive code. You can also save
compiled code in a .class file so it is available for
future use, and it can even be compiled to machine code
using a Java-to-machine-code compiler such as GCJ.
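The ClassLoader mechanism described above can be tried out in plain Java. The following is only a rough sketch and not Kawa code: the names InMemoryLoaderSketch, Greeter, and ByteArrayLoader are invented for illustration, and instead of generating bytecode it simply reads an already compiled class back into a byte array and defines it a second time from memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

final class InMemoryLoaderSketch {
    /** Stand-in for freshly generated code; any compiled class would do. */
    public static class Greeter implements Supplier<String> {
        @Override
        public String get() {
            return "hello from an in-memory class";
        }
    }

    /** A ClassLoader that turns a byte array into a runnable class. */
    static final class ByteArrayLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytecode) {
            // defineClass is the standard hook for bytecode held in memory.
            return defineClass(name, bytecode, 0, bytecode.length);
        }
    }

    /** Reads Greeter's bytecode into an array and defines it again from memory. */
    static Class<?> loadFromBytes() {
        String name = Greeter.class.getName();
        String resource = "/" + name.replace('.', '/') + ".class";
        try (InputStream in = Greeter.class.getResourceAsStream(resource)) {
            if (in == null) {
                throw new IOException("bytecode not found: " + resource);
            }
            return new ByteArrayLoader().define(name, in.readAllBytes());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Instantiates the in-memory class and calls it through a shared interface. */
    static String runLoadedGreeter() {
        try {
            Object greeter = loadFromBytes().getDeclaredConstructor().newInstance();
            return String.valueOf(((Supplier<?>) greeter).get());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runLoadedGreeter());
    }
}
```

The class defined from the byte array is distinct from the Greeter that the normal class path loader produced, which is why the sketch talks to it through the shared Supplier interface rather than casting to Greeter.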
The Qexo web site gives instructions for how you can get Qexo.
The easiest way is to download the latest version of the Kawa
jar file, for example kawa-1.7.jar,
and put it in your class path.
In the following, we write 'qexo' to mean
the command you use to start up Qexo. There are a number of
ways you can actually run Qexo.
If you have downloaded Kawa as a jar file
(for example kawa-1.7.jar), you can start up
Qexo using either of these commands:
java -cp kawa-1.7.jar kawa.repl --xquery
or
java -jar kawa-1.7.jar --xquery
In the following we'll assume you've defined qexo
as an alias for one of the above.
On a Unix or GNU/Linux system, you can make an alias like this:
$ alias qexo='java -jar kawa-1.7.jar --xquery'
and then just do:
$ qexo
We use $ to stand for the prompt for your
command-line processor (shell or console), and
we use boldface for commands you type.
Alternatively, you can place the kawa-1.7.jar file in your
class path, and just type 'java kawa.repl --xquery'
instead of 'qexo':
$ java kawa.repl --xquery
If you start up Qexo without specifying any file parameters, it will enter an interactive loop. Here are some examples, with user input shown in bold.
$ qexo
(: 1 :) for $i in 1 to 3 return 10*$i
10 20 30
The command line prompt includes the current input line number,
and has the form of an XQuery comment, to make it easier to cut and paste.
Following the prompt you can type a complete XQuery expression
(in the example, for $i in 1 to 3 return 10*$i)
and hit Enter.
How does Qexo know when an expression is "complete"? When should it evaluate what it has, as opposed to prompting for more input? The rule is that if the input so far forms a complete valid expression, Qexo evaluates it. If it has seen a syntax error, it prints out a message and discards the input. Otherwise, it prints a prompt and waits for more input.
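The three-way rule above (evaluate, report an error, or prompt for more) can be illustrated with a toy classifier. This is a sketch under strong simplifying assumptions, not Qexo's parser: it looks only at the nesting of parentheses and braces, ignores comments, strings, and keywords such as if, and the names InputStateSketch and classify are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class InputStateSketch {
    enum State { COMPLETE, INCOMPLETE, ERROR }

    /** Classifies the input typed so far using bracket nesting only. */
    static State classify(String input) {
        Deque<Character> open = new ArrayDeque<>();
        for (char c : input.toCharArray()) {
            if (c == '(' || c == '{') {
                open.push(c);
            } else if (c == ')' || c == '}') {
                char expected = (c == ')') ? '(' : '{';
                // A closer without a matching opener can never become valid.
                if (open.isEmpty() || open.pop() != expected) {
                    return State.ERROR;
                }
            }
        }
        if (input.trim().isEmpty() || !open.isEmpty()) {
            return State.INCOMPLETE; // keep prompting for more input
        }
        return State.COMPLETE; // hand the text to the evaluator
    }

    public static void main(String[] args) {
        System.out.println(classify("(3"));
        System.out.println(classify("(3\n+10)"));
        System.out.println(classify("3)"));
    }
}
```

A real implementation would drive this decision from the parser itself, which knows whether it ran out of input or hit a token that cannot be valid.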
Let us continue, this time with some multi-line expressions:
(: 2 :) (3
(: 3(:) +10)
13
(: 4 :) if (3<2)
(: 5i:) then "it's true"
(: 6i:) else "it's false"
it's false
Notice how the prompt changes to '(' or an 'i'
to indicate that we're inside an incomplete parenthetical
or if expression, respectively.
Next, some examples of syntax errors.
(: 7 :) (for $x := 10 return $x
<stdin>:7:9: missing 'in' in 'for' clause
(: 8 :) %+1
<stdin>:8:1: invalid character '%'
(: 9 :) = 5
<stdin>:9:1: missing expression
Qexo prints out the "file name" of the error
(in this case the standard console input), followed by the
line and column numbers. For the last error, it couldn't be
more specific than missing expression.
Next is an example of an element constructor expression. Notice how the prompt changes to an XML comment.
(: 2 :) <a>
<!--3--><b>{for $i in 1 to 3 return 10*$i}</b>
<!--4--></a>
<a>
<b>10 20 30</b>
</a>
You can also define XQuery functions interactively:
(: 5 :) define function repeat ($count, $values) {
(: 6{:)   for $i in 1 to $count return $values
(: 7{:) }
(: 8 :) "[", repeat(4, (1,2)), "]"
[1 2 1 2 1 2 1 2]
(: 9 :)
The XQuery specification defines a program as a collection
of declarations followed by a top-level expression.
The "normal" way of running a program is to put it in a file,
and evaluate it. You can use the -f command-line flag
to specify the name of a file containing a program:
$ qexo -f pictures.xql
You can also specify a (short!) XQuery program on the command line
following a -e flag:
$ qexo -e '<img src="file.png"></img>'
<img src="file.png" />
The output is by default printed using the XHTML style, which is XML in a style that most HTML browsers can handle. You can override the output format using an
--output-format option. For example you can specify HTML format:
$ qexo --output-format html -e '<img src="file.png"></img>'
<img src="file.png">
You can even specify a format for Scheme programmers:
$ qexo --output-format scheme -e '<img src="file.png"></img>'
(img src: file.png )
If you have an application you'll be running repeatedly,
it makes sense to compile it and save the compiled form for future use.
If you run Qexo with the -C flag followed by one or more
filenames, then those files will be compiled,
producing one or more .class files.
The --main option specifies that Qexo should generate
a main method, creating an application that can be
run by the java command.
Assume pictures.xql is the name of a file
containing an XQuery program:
$ qexo --main -C pictures.xql
This creates a file pictures.class. (It may in rare cases
create some other classes as well.
These have the form pictures*.class.)
You can run this as follows:
$ java -cp .:kawa-1.7.jar pictures
This should be the same as, but faster than, running:
$ qexo -f pictures.xql
A servlet is a Java class that can be loaded into a Web server to process and answer HTTP requests. It is an efficient way to provide server-side computation, because the servlet can be loaded and allocated once, and then process thousands of requests. An XQuery program can be compiled by Qexo into a servlet. See here and chapter 12 for more information and examples of servlets using Qexo.
A Qexo extension allows you to call an arbitrary Java method in an XQuery expression, using XQuery function call notation.
The following example uses Drew Noakes' EXIF extraction library
for extracting EXIF metadata (time-stamps, focal length, etc.)
commonly produced by digital cameras.
The code assumes that exifExtractor.jar is in your class path.
The code first declares a number of namespaces as aliases for Java classes.
declare namespace exif-extractor = "class:com.drew.imaging.exif.ExifExtractor"
declare namespace exif-loader = "class:com.drew.imaging.exif.ExifLoader"
declare namespace ImageInfo = "class:com.drew.imaging.exif.ImageInfo"
declare namespace File = "class:java.io.File"
Remember that a namespace defines a prefix alias for a URL literal,
which can be any string, used as a unique name.
Qexo uses the convention that a URL string starting with class:
refers to a Java class. Specifically, it acts as if all Java methods
are pre-bound to a QName whose local name is the method name,
and whose namespace URI is class: followed by the
fully-qualified Java class name.
For example, if the Qexo processor sees a call to a function
exif-loader:getImageInfo, with the namespaces as
defined above, then it will translate that into a call to
a method named getImageInfo in the class
com.drew.imaging.exif.ExifLoader.
(That is assuming you haven't explicitly defined a function by that name!)
If the method is overloaded, Qexo uses the argument types to select a method.
The method name new is used specially for creating new objects,
being equivalent to a Java new expression.
define function get-image-info ($filename as xs:string)
{
  <pre>{
    let $info := exif-loader:getImageInfo(File:new($filename))
    for $i in iterator-items(ImageInfo:getTagIterator($info))
    return (" ", ImageInfo:getTagName($i), ": ",
            ImageInfo:getDescription($info, $i))
  }</pre>
}
The function takes a single parameter: $filename, which is the
name of a JPEG image file as a string.
It uses that to create a new File,
which is used to create an ImageInfo object.
The getTagIterator method creates a
java.util.Iterator instance, which you can use
to get all the EXIF tags in the image.
The Qexo function iterator-items takes
an Iterator and turns it into an XQuery sequence
consisting of the values returned by the Iterator.
The for "loops" over this sequence,
and we format each tag item into a readable output line.
For more information, see here.
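The class: naming convention described in this section can be pictured with ordinary Java reflection. The sketch below is an illustration only and makes no claim about how Qexo resolves calls internally: ClassNamespaceSketch and its call method are hypothetical, only static methods are handled, and overloads are chosen by a naive exact-type match.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class ClassNamespaceSketch {
    private static final String PREFIX = "class:";

    /** Resolves a (namespace URI, local name) pair to a static Java method and calls it. */
    static Object call(String namespaceUri, String localName, Object... args) {
        if (!namespaceUri.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a class: namespace: " + namespaceUri);
        }
        try {
            // The rest of the URI is the fully-qualified class name.
            Class<?> cls = Class.forName(namespaceUri.substring(PREFIX.length()));
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(localName)
                        && Modifier.isStatic(m.getModifiers())
                        && accepts(m.getParameterTypes(), args)) {
                    return m.invoke(null, args);
                }
            }
            throw new NoSuchMethodException(localName + " in " + cls.getName());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Overload selection by argument type: every argument must fit its parameter. */
    private static boolean accepts(Class<?>[] params, Object[] args) {
        if (params.length != args.length) {
            return false;
        }
        for (int i = 0; i < params.length; i++) {
            if (!box(params[i]).isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    /** Maps a few primitive types to their wrappers so boxed arguments can match. */
    private static Class<?> box(Class<?> type) {
        if (type == int.class) return Integer.class;
        if (type == long.class) return Long.class;
        if (type == double.class) return Double.class;
        if (type == boolean.class) return Boolean.class;
        return type;
    }

    public static void main(String[] args) {
        System.out.println(call("class:java.lang.Math", "max", 3, 7));
        System.out.println(call("class:java.lang.Integer", "parseInt", "42"));
    }
}
```

Instance methods and the special new name are left out to keep the sketch short; they would need the receiver object or a constructor lookup respectively.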
Often XQuery will be used as part of a larger Java application.
In this section we will see how you can use Qexo
to evaluate an XQuery expression in a Java program.
The following statement creates an XQuery evaluation
context, and assigns it to the variable named xq:
XQuery xq = new XQuery();
You can then use the eval method to evaluate
an XQuery expression, returning a Java Object:
Object result = xq.eval(expression);
The following application reads the strings on the command line, evaluates them as XQuery expressions, and prints the result.
import gnu.xquery.lang.XQuery;

public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    for (int i = 0; i < args.length; i++)
      {
        String exp = args[i];
        Object result = xq.eval(exp);
        System.out.print(exp);
        System.out.print(" => ");
        System.out.println(result);
      }
  }
}
You can use these commands to compile and run this application,
assuming that kawa-1.7.jar is in your class path:
$ javac -g RunXQuery.java
$ java RunXQuery '3+4' 'for $i in 1 to 5 return $i+10' '<a>{3+4}</a>'
3+4 => 7
for $i in 1 to 5 return $i+10 => 11, 12, 13, 14, 15
<a>{3+4}</a> => <a>7</a>
The println method calls the generic toString
method, which is fine for quick-and-dirty output (such as for debugging),
but isn't recommended for printing real data.
One reason is that it requires allocating a temporary string,
which then has to get copied into the PrintStream's
output buffer, which is wasteful for large data structures.
Another reason is that none of the output shows up
until it has all been converted, which can also hurt performance.
(If toString gets into a loop, which is quite possible
for cyclic data structures, you just sit there waiting with no idea
what is going on!)
Another reason to avoid toString is that it doesn't
provide any control over the output format, such as whether
you want characters like '<' escaped as '&lt;',
or whether you want HTML-style or XML-style output,
for example. Formatting to a specific line width is also difficult.
In Qexo you can instead send the output to a special Consumer,
which is something you can send data to. It's like a Writer
(or a SAX2 ContentHandler),
but it works with abstract data rather than characters.
The gnu.xml.XMLPrinter class implements Consumer
and extends PrintWriter, so you can use it as either of those two.
It writes out the received data in XML format, though there are options
to produce HTML and other styles.
Below is a revised version of RunXQuery
that uses an XMLPrinter:
import gnu.xquery.lang.XQuery;
import gnu.xml.XMLPrinter;

public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    XMLPrinter pp = new XMLPrinter(System.out);
    for (int i = 0; i < args.length; i++)
      {
        String exp = args[i];
        System.out.print(exp);
        System.out.print(" => ");
        Object x = xq.eval(exp);
        // Print the result through the XMLPrinter rather than toString.
        pp.writeObject(x);
        pp.println();
        pp.flush();
      }
  }
}
$ java RunXQuery 'for $i in 1 to 5 return $i+10'
for $i in 1 to 5 return $i+10 => 11 12 13 14 15
Note the flush call to make sure that the output from
the XMLPrinter is sent to System.out
before we write anything on the latter directly.
This produces mostly the same output as before, except that
sequence items are separated by a space instead of comma-space.
(Also, XML quoting is handled correctly.)
This still isn't the best way to evaluate-and-print.
It is more efficient to have the evaluator print directly to the output,
rather than create an intermediate data structure.
To do that we can pass the XMLPrinter directly
to the eval call.
import gnu.xquery.lang.XQuery;
import gnu.xml.XMLPrinter;

public class RunXQuery
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    XMLPrinter pp = new XMLPrinter(System.out);
    for (int i = 0; i < args.length; i++)
      {
        String exp = args[i];
        System.out.print(exp);
        System.out.print(" => ");
        // The evaluator writes its results straight to the XMLPrinter.
        xq.eval(exp, pp);
        pp.println();
        pp.flush();
      }
  }
}
This produces the same output as before. Whether it is more efficient
will depend on the expression you evaluate (and how clever Qexo is).
But for XQuery programs that generate large XML data sets it can
make a large difference, and in general it's a good idea to pass
the Consumer directly to the evaluator.
If the XQuery program is in a file, rather than a String,
you can use an eval method that takes a Reader.
xq.eval (new FileReader("file.xql"), new XMLPrinter(System.out));
You can also call Qexo functions that have been compiled
to .class files, directly using Java method invocation.
How to do so is a bit complicated and likely to change;
it will be documented later.
(This feature is only available in the CVS version of Qexo so far. It will be in the next release.)
When you evaluate an XQuery expression from Java,
you may want to set the context item, position, and size
(collectively known as the focus) of the expression.
The preceding eval methods evaluate the expression
without the focus defined, and if you evaluate an expression that
assumes a focus (such as a top-level path expression) then
Qexo will report a syntax error.
If you want to specify the focus for an expression, you can use
the evalWithFocus methods of gnu.xquery.lang.XQuery.
For example:
import gnu.xquery.lang.XQuery;

public class EvalWithFocus1
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    Object a = xq.eval("<a><b id='1'/><b id='2'/></a>");
    Object b = xq.evalWithFocus("<r size='{last()}'>{b}</r>", a, 1, 9);
    System.out.println(b);
  }
}
The <r> element constructor has an enclosed path
expression b. This is evaluated relative to the context item,
which is the second argument to evalWithFocus, in this
case the result of the previous eval in variable a.
So the b returns the two <b> children of the
<a> element. The remaining two parameters
to evalWithFocus are the context position and context size.
(In this case the 8 other items of the context sequence don't exist.)
So the above program prints out:
<r size="9"><b id="1" /><b id="2" /></r>
If there is more than one item in the context sequence, you will
usually want to evaluate the expression for each item in the sequence.
Instead of writing a loop in Java, use the two-operand
form of evalWithFocus and pass it the whole sequence:
import gnu.xquery.lang.XQuery;

public class EvalWithFocus2
{
  public static void main (String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    Object a = xq.eval("<a><b id='1'/></a>, <a><b id='2'/></a>");
    Object b = xq.evalWithFocus("<r pos='{position()}'>{b}</r>", a);
    System.out.println(b);
  }
}
This results in a 2-item sequence, one for each item in a.
(Note that a in this example is different than before.)
<r pos="1"><b id="1" /></r>, <r pos="2"><b id="2" /></r>
Note that if v1 is the result of evaluating e1,
then the result of evalWithFocus("e2", v1) is
equivalent to evaluating e1/e2.
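To make the notion of a focus concrete, here is a small model of the two-operand form in plain Java. It is a sketch only: FocusSketch and FocusExpression are hypothetical names, the "expression" is an ordinary Java lambda rather than XQuery, and nothing here is part of the Qexo API. It shows the body being evaluated once per item, with a 1-based position and the sequence length as the size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FocusSketch {
    /** Hypothetical stand-in for an expression that needs a focus to be evaluated. */
    interface FocusExpression<T, R> {
        R eval(T item, int position, int size);
    }

    /** Evaluates body once per item of the sequence, collecting the results in order. */
    static <T, R> List<R> evalWithFocus(FocusExpression<T, R> body, List<T> sequence) {
        List<R> results = new ArrayList<>();
        int size = sequence.size();
        for (int i = 0; i < size; i++) {
            // Context positions in XQuery start at 1, not 0.
            results.add(body.eval(sequence.get(i), i + 1, size));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("first", "second");
        System.out.println(evalWithFocus(
            (item, position, size) -> position + " of " + size + ": " + item, items));
    }
}
```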
There are variants of these methods where the output is written to a
Consumer, and the expression is read from a Reader.
There are also methods so you can pre-compile the expression
(using evalToFocusProc) and then repeatedly apply that to
different values (using applyWithFocus).
The Simple API for XML
(SAX) is a set of classes for "copying" XML data
(infosets) using method calls, not necessarily doing any physical
copying. It is a popular API because it is an efficient way
to process large datasets. The Consumer interface
is similar to the SAX2 ContentHandler interface.
If you have a class that implements ContentHandler
you can use a ContentConsumer filter to convert
it to a Consumer. The following code snippet shows how you
can pass the result of evaluating an XQuery expression to
a ContentHandler.
import org.xml.sax.ContentHandler;
...
ContentHandler ch = ...;
xq.eval(exp, new ContentConsumer(ch));
The Consumer interface (like the SAX2 ContentHandler)
is very useful and efficient for any kind
of processing of XML data that can be done in a single pass.
A Consumer is a passive output "sink". It doesn't
do anything on its own. Instead, it is used as the output
of a producer, which is the application that does the actual work,
and sends the results to the Consumer.
The separation between a producer (which generates results)
and a Consumer (which uses the results)
allows for great flexibility in plugging together modules.
Note that a Consumer can pass the data along to
another Consumer, acting as the latter's producer.
This allows you to chain together a pipeline of Consumer filters.
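The producer, filter, and sink roles can be sketched without any Kawa classes. The Sink interface below is a hypothetical, much-reduced stand-in for Consumer (it is not the gnu.lists API), and all class names are invented for this illustration. One filter stage rewrites element names and hands everything on to the next stage.

```java
import java.util.Locale;

final class PipelineSketch {
    /** Hypothetical, much-reduced stand-in for a consumer interface. */
    interface Sink {
        void beginGroup(String name);
        void endGroup(String name);
        void writeChars(String text);
    }

    /** Passes every event on unchanged; subclasses override what they care about. */
    static class FilterSink implements Sink {
        protected final Sink next;
        FilterSink(Sink next) { this.next = next; }
        public void beginGroup(String name) { next.beginGroup(name); }
        public void endGroup(String name) { next.endGroup(name); }
        public void writeChars(String text) { next.writeChars(text); }
    }

    /** A filter stage: upper-cases element names and leaves text alone. */
    static final class UpperCaseNames extends FilterSink {
        UpperCaseNames(Sink next) { super(next); }
        @Override public void beginGroup(String name) {
            super.beginGroup(name.toUpperCase(Locale.ROOT));
        }
        @Override public void endGroup(String name) {
            super.endGroup(name.toUpperCase(Locale.ROOT));
        }
    }

    /** End of the pipeline: renders the events it receives as markup text. */
    static final class MarkupSink implements Sink {
        final StringBuilder out = new StringBuilder();
        public void beginGroup(String name) { out.append('<').append(name).append('>'); }
        public void endGroup(String name) { out.append("</").append(name).append('>'); }
        public void writeChars(String text) { out.append(text); }
    }

    /** Plays the producer role: it does the work and pushes results into the sink. */
    static void produce(Sink sink) {
        sink.beginGroup("a");
        sink.writeChars("hi");
        sink.endGroup("a");
    }

    public static void main(String[] args) {
        MarkupSink end = new MarkupSink();
        produce(new UpperCaseNames(end));
        System.out.println(end.out);
    }
}
```

The producer never learns how many stages sit behind the Sink it was given, which is what makes it easy to add or remove filters.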
Here is a Java program that counts the number of different kinds of
elements produced by evaluating XQuery expressions.
It is a class that extends the basic gnu.lists.FilterConsumer,
which provides dummy implementations of the Consumer methods.
import gnu.xquery.lang.XQuery;
import java.util.*;
import gnu.lists.*;
import java.io.PrintStream;

public class CountElements extends FilterConsumer
{
  CountElements()
  {
    // Nothing downstream needs the data, so forward it to a VoidConsumer.
    super(VoidConsumer.getInstance());
  }

  List elementNames = new ArrayList();
  int numAttributes = 0;
  int numInts = 0;
  int numObjects = 0;

  public void beginGroup(String typeName, Object type)
  {
    elementNames.add(typeName);
    super.beginGroup(typeName, type);
  }

  public void beginAttribute(String attrName, Object attrType)
  {
    numAttributes++;
    super.beginAttribute(attrName, attrType);
  }

  public void writeInt(int v)
  {
    numInts++;
    super.writeInt(v);
  }

  public void writeObject(Object v)
  {
    numObjects++;
    super.writeObject(v);
  }

  void dump (PrintStream out)
  {
    // Sorting puts equal names next to each other so runs can be counted.
    Collections.sort(elementNames);
    int total = 0;
    ListIterator it = elementNames.listIterator();
    String previous = null;
    int count = 0;
    for (;;)
      {
        boolean done = ! it.hasNext();
        String cur = done ? "" : (String) it.next();
        if (previous != null && ! previous.equals(cur))
          {
            out.println("<" + previous + "> - " + count + " times");
            count = 0;
          }
        if (done)
          break;
        previous = cur;
        count++;
        total++;
      }
    out.println("TOTAL: " + total);
    if (numAttributes > 0)
      out.println("Attributes: " + numAttributes);
    if (numInts > 0)
      out.println("ints: " + numInts);
    if (numObjects > 0)
      out.println("Objects: " + numObjects);
  }

  public static void main(String[] args) throws Throwable
  {
    XQuery xq = new XQuery();
    CountElements counter = new CountElements();
    for (int i = 0; i < args.length; i++)
      {
        String exp = args[i];
        xq.eval(exp, counter);
      }
    counter.dump(System.out);
  }
}
The producer (in this case the XQuery.eval
method called by the main method)
calls the beginGroup method when it
wants to "write out" an XML element. The beginGroup
implementation in this class just adds the element's string name
(the typeName) to a List, elementNames.
It then calls super.beginGroup to do the default
processing of beginGroup, which calls
beginGroup in the next Consumer in the filter.
In this case, that is a VoidConsumer, which ignores
everything it receives, so the super.beginGroup call
isn't really needed, but we include it to illustrate the general idea.
We also count attributes using the beginAttribute
method, as well as calls to writeInt and writeObject.
These are used for non-XML typed values, which SAX doesn't handle.
At the end the dump method is called. It sorts the
list of elements and writes out the number of times each has been seen,
along with some other statistics. Here is a sample run.
$ javac -g CountElements.java
$ java CountElements '<a><b/>{10 to 20}<b/>{1+1}<b/></a>'
<a> - 1 times
<b> - 3 times
TOTAL: 4
ints: 11
Objects: 1
Note how the sequence 10 to 20 produces 11 calls to
writeInt, while the expression 1+1 produces a
single call to writeObject. Whether an XQuery integer
produces a call to writeInt or writeObject
is up to the Qexo implementation and how clever it is.
When Qexo needs to store a document in a data structure it uses
an instance of the class gnu.lists.TreeList.
The name of the class isn't Document because
it's actually a lot more general than what is needed for
plain XML documents. It can handle typed values, and
it is also used to represent sequences containing multiple items.
The TreeList class is used to implement a Document
Object Model (DOM), but it does not implement the standard
org.w3c.dom.Node or org.w3c.dom.Document
interfaces. The reason for that is that the W3C DOM APIs
use a separate Node object for each conceptual node
(element, attribute, etc) in a document. This is very inefficient,
as it wastes a lot of space and makes a lot of work for the garbage collector.
Instead, TreeList uses a much more compact array-based
representation, using one char array and
one Object array for the entire document.
A "node" is just an index into the former array, which
makes it efficient to traverse a document.
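The idea of a node being just an index into a character array can be shown with a toy encoding. The sketch below is not TreeList's actual layout: FlatTreeSketch, its marker characters, and its method names are all assumptions made up for this illustration. Element starts and ends are stored as marker characters in one growable char buffer, names live in a separate object list, and traversal is a single linear scan.

```java
import java.util.ArrayList;
import java.util.List;

final class FlatTreeSketch {
    // Markers taken from the Unicode private-use area (an assumption of this
    // sketch; text that itself contains these characters would be misread).
    private static final char BEGIN = '\uE000';
    private static final char END = '\uE001';

    private final StringBuilder data = new StringBuilder(); // plays the char array
    private final List<Object> objects = new ArrayList<>(); // plays the Object array

    /** Appends an element start and returns its "node": an index into data. */
    int beginElement(String name) {
        int node = data.length();
        objects.add(name);
        // Store the marker, then the index of the name in the object list.
        data.append(BEGIN).append((char) (objects.size() - 1));
        return node;
    }

    void endElement() {
        data.append(END);
    }

    void text(String chars) {
        data.append(chars);
    }

    /** Looks up the element name for a node index returned by beginElement. */
    String nameOf(int node) {
        return (String) objects.get(data.charAt(node + 1));
    }

    /** Walks the whole document with one linear scan; no per-node objects exist. */
    int countElements() {
        int count = 0;
        for (int i = 0; i < data.length(); i++) {
            if (data.charAt(i) == BEGIN) {
                count++;
                i++; // skip the name index stored right after the marker
            }
        }
        return count;
    }

    public static void main(String[] args) {
        FlatTreeSketch doc = new FlatTreeSketch();
        int a = doc.beginElement("a");
        int b = doc.beginElement("b");
        doc.endElement();
        doc.text("hi");
        doc.endElement();
        System.out.println(doc.nameOf(a) + " " + doc.nameOf(b) + " " + doc.countElements());
    }
}
```

Because the name index is stored in a single char, this toy version supports at most 65536 distinct entries in the object list; it is meant to show the shape of the idea, not to be a usable document store.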
The following example shows how you can modify the CountElements
application so that the command line arguments are the URLs of XML files
(instead of XQuery expressions).
Replace the main method by the following, leaving the
rest of the CountElements class as before.
Each URL is opened and parsed as an XML file, to create a
TreeList object. You can now do a lot of things
with this TreeList; in this example all we do
is invoke its consume method, which "writes out"
all of its data to a Consumer, which
in this case is a CountElements object.
public static void main(String[] args) throws Throwable
{
  CountElements counter = new CountElements();
  for (int i = 0; i < args.length; i++)
    {
      String url = args[i];
      TreeList doc = gnu.kawa.xml.Document.parse(url);
      doc.consume(counter);
    }
  counter.dump(System.out);
}