This repository has been archived by the owner on Jun 6, 2021. It is now read-only.

jycr/annotation-xpath-sax


This is Annotation XPath for SAX (AXS). It is made freely available under an MIT-type license, as described in the LICENSE file. All of the code generated by the attribute processor is freely licensed under the same terms as this package.

INTRODUCTION AND EXAMPLE

AXS (pronounced "axis") is an effort to make writing SAX DocumentHandlers easy. An AXS handler subclasses com.googlecode.axs.AbstractAnnotatedHandler. Instead of (or in addition to) the usual SAX callbacks such as startElement() and endElement(), it defines annotated handler methods, which are called when the current element in the document being parsed matches an XPath expression. For example, given the document

<person>
  <names>
    <name>John Smith</name>
    <name type="alias">Kyon</name>
    <name type="alias">Hey, you!</name>
  </names>
  <age span="subjective">18.32</age>
  <age span="years-since-birth">16.1</age>
  <locations>
    <location>
      <country>Japan</country>
      <era>mid-Haruhi</era>
    </location>
    <location>
      <country>alternate-Japan@3c603ff:110bb8e</country>
      <era>elided-Haruhi</era>
      <subsidary-universe/>
    </location>
  </locations>
</person>

the handler function

@XPath("names/name[@type != 'alias']")
public void realName(String name) { ... }

would be called exactly once with the string "John Smith". Similarly, a function

@XPath("locations/location/country")
public void whereIsHeNow(String country) { ... }

would be called twice, once with "Japan" and once with "alternate-Japan@3c603ff:110bb8e".
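For comparison, here is a rough sketch of what collecting the same <country> values takes with a hand-written SAX handler and no AXS. The class CountryCollector and its countries() helper are made up for this illustration. This is not the code that AXS generates, and the way it matches the path (by looking at the innermost open elements) is only an assumption about what a relative expression means.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical hand-written handler: it tracks the element path itself and
// collects the text of every locations/location/country element.
class CountryCollector extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();  // innermost element first
    private final StringBuilder text = new StringBuilder();
    final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (endsWithPath("country", "location", "locations")) {
            found.add(text.toString());
        }
        path.pop();
        text.setLength(0);
    }

    // True if the innermost open elements are the given names,
    // listed from the innermost element outwards.
    private boolean endsWithPath(String... innermostFirst) {
        Iterator<String> it = path.iterator();
        for (String name : innermostFirst) {
            if (!it.hasNext() || !it.next().equals(name)) {
                return false;
            }
        }
        return true;
    }

    // Parses the XML text and returns the collected country names.
    static List<String> countries(String xml) {
        try {
            CountryCollector handler = new CountryCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.found;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

All of the path tracking and text buffering above is the bookkeeping that the two-line @XPath handler leaves to AXS.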

USING AXS

AXS provides two JAR files, one of which ("axs-runtime") must be included in your application. The other JAR ("axs-compiler") must be added to your project as an attribute processor (an annotation processor, in javac's terminology). For the Oracle (Sun) javac, this is done by using the -processorpath command line argument; the -s argument names the directory where the generated source files are written. In an Ant <javac> task, both can be passed with <compilerarg> elements.

<javac...>
  <compilerarg value="-processorpath"/>
  <compilerarg value="${axs-compiler-jar}"/>
  <compilerarg value="-s"/>
  <compilerarg value="${generated-code-dir}"/>
</javac>

Then, the code generated by the attribute processor must also be compiled and included in your application.

THE AXS @ATTRIBUTES

AXS provides four attributes: one that applies to the handler class and three that apply to specific handler methods. The attributes are:

@XPathNamespaces(String[] namespacePairs)

This attribute is applied to the handler class, and defines the qualified name (QName) Prefix to Namespace URI mappings used for all the XPath expressions in this class. If this attribute is not present, a single mapping of the null Prefix ("") to the null Namespace URI ("") is used. The strings are of the form "prefix=URI", e.g. "html=http://www.w3.org/1999/xhtml" defines that the Prefix "html" refers to elements in the XHTML namespace. Any prefix, including the null prefix, can be mapped.
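To make the "prefix=URI" format concrete, here is a small sketch that turns such strings into a prefix-to-URI map. The class NamespacePairs is hypothetical and is not part of AXS. Splitting at the first '=' and writing the null prefix as an empty string before the '=' are both assumptions, since how AXS itself parses these strings is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not AXS code: shows how "prefix=URI" strings
// could be split into a prefix -> namespace URI map.
class NamespacePairs {
    static Map<String, String> parse(String... pairs) {
        Map<String, String> prefixToUri = new LinkedHashMap<>();
        for (String pair : pairs) {
            // Assumption: split at the first '=' so the URI may contain more of them.
            int eq = pair.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("expected prefix=URI, got: " + pair);
            }
            prefixToUri.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        if (prefixToUri.isEmpty()) {
            // The documented default: null prefix mapped to the null namespace URI.
            prefixToUri.put("", "");
        }
        return prefixToUri;
    }
}
```

With this sketch, NamespacePairs.parse("html=http://www.w3.org/1999/xhtml") yields a map in which "html" refers to the XHTML namespace URI.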

@XPath(String xpathExpression)

This attribute is applied to a handler method, and specifies that the method will be called with the text enclosed by the right-most Element of the XPath expression. If you don't care about the content of the element, only its existence or attributes, you probably want to use @XPathStart instead.

@XPathStart(String xpathExpression)

This attribute is applied to a handler method, and specifies that the method will be called with the SAX Attributes of the right-most Element of the XPath expression as soon as that Element is started.

To continue the example, if one wanted to know all the different ways that John Smith's age is tracked, the handler function

  @XPathStart("/person/age")
  public void foundAnAge(org.xml.sax.Attributes attrs) { ... }

would be called twice, once for each element.
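As a sketch of what the body of such a handler might do, the following reads the "span" attribute from the org.xml.sax.Attributes argument. To stay self-contained it uses a plain SAX DefaultHandler to find the <age> elements instead of AXS; the class AgeSpans and the helper spansOf() are invented for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical plain-SAX stand-in: dispatches to foundAnAge() by hand,
// which is the step an @XPathStart annotation would take care of.
class AgeSpans extends DefaultHandler {
    final List<String> spans = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if ("age".equals(qName)) {
            foundAnAge(attrs);
        }
    }

    // Same signature as the handler above; only the body is new.
    public void foundAnAge(Attributes attrs) {
        // getValue() returns null when the attribute is absent. The value is
        // copied out here because SAX only guarantees the Attributes object
        // for the duration of the start-element callback.
        spans.add(attrs.getValue("span"));
    }

    // Parses the XML text and returns every span value seen on <age>.
    static List<String> spansOf(String xml) {
        try {
            AgeSpans handler = new AgeSpans();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.spans;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

For the example document, the collected values would be "subjective" and "years-since-birth", in document order.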

@XPathEnd(String xpathExpression)

This attribute is applied to a handler method, and specifies that the method will be called when the right-most Element of the XPath expression is ended.

In the example, if one wanted to stop parsing as soon as two aliases were found, the handler function

  @XPathEnd("/person/names/name[@type='alias'][2]")
  public void gotTwoAliases() throws SAXException { throw new SAXException("got what we needed"); }

would do so by throwing a SAXException after the </name> tag of the "Hey, you!" entry. (In practice, you'd want to throw a subclass of SAXException so you could tell it apart from an actual error!)
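The subclass idea can be sketched as follows. StopParsingException and AliasCounter are hypothetical names, and the handler is plain SAX so that the example runs without AXS; it counts alias names globally, which is a simplification of what the [2] predicate does.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical marker exception: signals "stop, we have what we need".
class StopParsingException extends SAXException {
    StopParsingException(String message) {
        super(message);
    }
}

// Plain-SAX stand-in for the @XPathEnd handler: counts alias names and
// aborts the parse when the second one ends.
class AliasCounter extends DefaultHandler {
    int aliases = 0;
    private boolean inAlias = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        inAlias = "name".equals(qName) && "alias".equals(attrs.getValue("type"));
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inAlias && "name".equals(qName) && ++aliases == 2) {
            throw new StopParsingException("got what we needed");
        }
        inAlias = false;
    }

    // Returns true if parsing was cut short by StopParsingException.
    static boolean stoppedEarly(String xml) {
        AliasCounter handler = new AliasCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return false;  // reached the end of the document
        } catch (StopParsingException e) {
            return true;   // deliberate early exit, not an error
        } catch (Exception e) {
            throw new IllegalStateException("real parse failure", e);
        }
    }
}
```

Catching the subclass first keeps the deliberate early exit separate from genuine parse errors, which still surface through the second catch clause.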

Multiple XPath expressions may be combined in a single attribute by using the '|' character to separate alternatives.

SUBSET OF XPATH UNDERSTOOD BY AXS

AXS handles only a subset of the full XPath specification. Since SAX is a streaming parser, AXS only accepts forward path steps, and specifically only the child:: and descendant:: axes. Every path step must have an element (no attribute-only steps), and wildcard elements ('*') are not permitted. Use the descendant:: (i.e. //) axis instead.

Only a few predicates are accepted:

  • the numeric singleton predicate [N] which selects the element at Context Position N (e.g. names/name[2] selects the 2nd <name> that is a child of <names>)
  • string value comparisons $A CMP $B where CMP is either '=' or '!=' and $A and $B are either string literals ('value' or "value"; for either form, a doubled delimiter is the escape sequence to write that delimiter, e.g. 'a''b' is the literal "a'b" and likewise "a""b" is 'a"b') or attribute names @NAME
  • string match functions "contains(A, B)", "starts-with(A, B)", and "ends-with(A, B)" where A and B are either literals or attribute names
  • the regular expression string match function "match(A, L)" where A is either a literal or an attribute name, and L is a string literal: this tests whether A matches the regular expression specified in L. Regular expression syntax is that of java.util.regex.Pattern, not that of XPath. The optional third (flags) argument in the XPath standard is not supported. Use the (?idmsux-idmsux) syntax inside the pattern to set pattern flags, instead.
  • numeric comparisons to the position() function (e.g. [position() < 4] selects the first three matches)
  • the special function [captureattrs()] which ensures that the attributes of the Element to which it is applied will be available when the handler function is called (see the section ADDITIONAL INFORMATION AVAILABLE TO HANDLER FUNCTIONS). This predicate is always true.
  • parenthesized expressions, the "and" and "or" boolean operators, and the function "not(EXPR)"
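Because the regular expression predicate uses java.util.regex.Pattern syntax, flags go inside the pattern itself. The sketch below shows the effect of an inline (?i) flag using only the JDK; the class InlineFlagsDemo is made up for this illustration. It uses Matcher.matches(), which requires the whole input to match. Whether AXS anchors its match() function the same way is not stated here, so treat that as an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical demo of java.util.regex inline flags; not AXS code.
class InlineFlagsDemo {
    // Whole-string match, as Matcher.matches() performs it.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("kyon", "Kyon"));            // false: case differs
        System.out.println(fullMatch("(?i)kyon", "Kyon"));        // true: (?i) ignores case
        System.out.println(fullMatch("(?i)hey.*", "Hey, you!"));  // true
    }
}
```

In a predicate, the same pattern text would go in the string literal argument, for example [match(@type, '(?i)alias')].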

AXS supports both the full and abbreviated naming forms of XPath. The "child::" axis prefix can be freely omitted, the "descendant::" prefix can be abbreviated as "//", and "attribute::" can be abbreviated as "@".

ADDITIONAL INFORMATION AVAILABLE TO HANDLER FUNCTIONS

AbstractAnnotatedHandler provides several functions which may be called by a handler function to request more information about the context of the handler call:

int tagDepth()

Returns the depth of the current path, counted in elements from the root.

QName tagAtDepth(int depth)

Returns the tag at a given depth in the current path. Depth 0 is the root element.

int findTag(QName tag[, int start])

Returns the depth at which the tag can be found, or -1 if it was not found. Can optionally take the depth at which to start an incremental search.

Map<QName, String> attributesAtDepth(int depth)

Returns the attributes of the tag at a given depth, or null if they are not available. Note that the attributes of a given tag will only be available if they were used in a predicate, or if the special [captureattrs()] predicate was used. (Predicates which do not need the attributes to be evaluated don't capture them. If you want them available, write e.g. "...[position() > 2 and captureattrs()]".)
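The depth-based functions can be pictured as queries against a list of the currently open elements. The sketch below is a hypothetical model, not AXS's implementation: ElementPathSketch uses plain Strings where AXS uses QName, and it assumes that tagDepth() counts the open elements and that findTag() searches from the given depth away from the root. Neither assumption is confirmed by this README.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the open-element path; index 0 is the root element.
class ElementPathSketch {
    private final List<String> open = new ArrayList<>();

    // Called when an element starts or ends.
    void start(String tag) {
        open.add(tag);
    }

    void end() {
        open.remove(open.size() - 1);
    }

    // Assumption: the depth is the number of currently open elements.
    int tagDepth() {
        return open.size();
    }

    String tagAtDepth(int depth) {
        return open.get(depth);
    }

    // Assumption: the search runs from 'start' away from the root.
    int findTag(String tag, int start) {
        for (int depth = start; depth < open.size(); depth++) {
            if (open.get(depth).equals(tag)) {
                return depth;
            }
        }
        return -1;
    }

    int findTag(String tag) {
        return findTag(tag, 0);
    }
}
```

Under these assumptions, while a <country> element of the example document is open, tagAtDepth(0) is "person" and findTag("location") is 2.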

Handler subclasses are free to implement all the usual SAX DocumentHandler methods themselves if needed, but they must call the superclass implementation as well.