The (sxml simple) module presents a basic interface for parsing XML from
a port into the Scheme SXML format, and for serializing it back to text.

    (use-modules (sxml simple))

The xml->sxml procedure uses SSAX to parse an XML document into SXML. It
takes one optional argument, string-or-port, which defaults to the
current input port, and returns the resulting SXML document. If
string-or-port is a port, it will be left pointing at the next available
character in the port.

As is normal in SXML, XML elements parse as tagged lists. Attributes,
if any, are placed after the tag, within an @ element. The root of the
resulting XML will be contained in a special tag, *TOP*. This tag will
contain the root element of the XML, but also any prior processing
instructions.

    (xml->sxml "<foo/>")
    ⇒ (*TOP* (foo))

    (xml->sxml "<foo>text</foo>")
    ⇒ (*TOP* (foo "text"))

    (xml->sxml "<foo kind=\"bar\">text</foo>")
    ⇒ (*TOP* (foo (@ (kind "bar")) "text"))

    (xml->sxml "<?xml version=\"1.0\"?><foo/>")
    ⇒ (*TOP* (*PI* xml "version=\"1.0\"") (foo))

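
The examples above all pass a string. As a minimal sketch of the port case, the following reads the same kind of document from a string port; the name parsed-from-port is only illustrative, and call-with-input-string is the core Guile procedure used here to obtain a port.

```scheme
(use-modules (sxml simple))

;; Open an input port over a string and hand that port to xml->sxml.
(define parsed-from-port
  (call-with-input-string "<foo>text</foo>"
    (lambda (port)
      (xml->sxml port))))

;; parsed-from-port is (*TOP* (foo "text"))
```

A file port obtained some other way, for instance with call-with-input-file, should work the same way under the same assumptions.
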
All namespaces in the XML document must be declared, via xmlns
attributes. SXML elements built from non-default namespaces will have
their tags prefixed with their URI. Users can specify custom prefixes
for certain namespaces with the #:namespaces keyword argument to
xml->sxml.

    (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
    ⇒ (*TOP* (http://example.org/ns1:foo "text"))

    (xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
               #:namespaces '((ns1 . "http://example.org/ns1")))
    ⇒ (*TOP* (ns1:foo "text"))

    (xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
               #:namespaces '((ns2 . "http://example.org/ns2")))
    ⇒ (*TOP* (foo (ns2:baz)))

By default, namespaces passed to xml->sxml are treated as if they were
declared on the root element. Passing a false #:declare-namespaces?
argument will disable this behavior, requiring in-document declarations
of namespaces before use.

    (xml->sxml "<foo><ns2:baz/></foo>"
               #:namespaces '((ns2 . "http://example.org/ns2")))
    ⇒ (*TOP* (foo (ns2:baz)))

    (xml->sxml "<foo><ns2:baz/></foo>"
               #:namespaces '((ns2 . "http://example.org/ns2"))
               #:declare-namespaces? #f)
    ⇒ error: undeclared namespace: `ns2'

By default, all whitespace in XML is significant. Passing the
#:trim-whitespace? keyword argument to xml->sxml will trim whitespace
before, after, and between elements, treating it as insignificant.
Whitespace in text fragments is left alone.

    (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
    ⇒ (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))

    (xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
               #:trim-whitespace? #t)
    ⇒ (*TOP* (foo (bar " Alfie the parrot! ")))

Parsed entities may be declared with the #:entities keyword argument, or
handled with the #:default-entity-handler. By default, only the standard
&lt;, &gt;, &amp;, &apos; and &quot; entities are defined, as well as
the &#N; and &#xN; (decimal and hexadecimal) numeric character entities.

    (xml->sxml "<foo>&amp;</foo>")
    ⇒ (*TOP* (foo "&"))

    (xml->sxml "<foo>&nbsp;</foo>")
    ⇒ error: undefined entity: nbsp

    (xml->sxml "<foo>&#xA0;</foo>")
    ⇒ (*TOP* (foo "\xa0"))

    (xml->sxml "<foo>&nbsp;</foo>"
               #:entities '((nbsp . "\xa0")))
    ⇒ (*TOP* (foo "\xa0"))

    (xml->sxml "<foo>&nbsp; &foo;</foo>"
               #:default-entity-handler
               (lambda (port name)
                 (case name
                   ((nbsp) "\xa0")
                   (else
                    (format (current-warning-port)
                            "~a:~a:~a: undefined entity: ~a\n"
                            (or (port-filename port) "<unknown file>")
                            (port-line port) (port-column port)
                            name)
                    (symbol->string name)))))
    -| <unknown file>:0:17: undefined entity: foo
    ⇒ (*TOP* (foo "\xa0 foo"))

By default, xml->sxml skips over the <!DOCTYPE> declaration, if any.
This behavior can be overridden with the #:doctype-handler argument,
which should be a procedure of three arguments: the docname (a symbol),
systemid (a string), and the internal doctype subset (as a string or #f
if not present).

The handler should return keyword arguments as multiple values, as if it
were calling its continuation with keyword arguments. The continuation
accepts the #:entities and #:namespaces keyword arguments, in the same
format that xml->sxml itself takes. These entities and namespaces will
be prepended to those given to the xml->sxml invocation.

    (define (handle-foo docname systemid internal-subset)
      (case docname
        ((foo)
         (values #:entities '((greets . "<i>Hello, world!</i>"))))
        (else
         (values))))

    (xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
               #:doctype-handler handle-foo)
    ⇒ (*TOP* (p (i "Hello, world!")))

If the document has no doctype declaration, the doctype-handler is
invoked with #f for each of the three arguments.

In the future, the continuation may accept other keyword arguments, for
example to validate the parsed SXML against the doctype.

The sxml->xml procedure serializes the SXML tree tree as XML. The output will be written to the current output port, unless the optional argument port is present.
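
A minimal sketch of both ways of directing the serializer's output, assuming the procedure is sxml->xml from (sxml simple). The names serialized and serialized-via-port are illustrative only; with-output-to-string and call-with-output-string are core Guile procedures used here to capture the text.

```scheme
(use-modules (sxml simple))

;; No port argument: sxml->xml writes to the current output port,
;; which with-output-to-string redirects into a string.
(define serialized
  (with-output-to-string
    (lambda ()
      (sxml->xml '(foo (@ (kind "bar")) "text")))))

;; Explicit port argument: the output goes to the given port instead.
(define serialized-via-port
  (call-with-output-string
    (lambda (port)
      (sxml->xml '(foo (@ (kind "bar")) "text") port))))

;; Both strings hold: <foo kind="bar">text</foo>
(display serialized)
(newline)
```

Capturing the output in a string is just a convenience for inspection; in a program that emits a document, passing the destination port directly avoids the intermediate copy.
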

The sxml->string procedure detags the SXML tree sxml into a string. It does not perform any formatting.
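
As a small illustration of detagging, assuming the procedure is sxml->string from (sxml simple), the sketch below flattens a tree with a nested element into its text. The name plain-text is illustrative only.

```scheme
(use-modules (sxml simple))

;; The element tags p and i are dropped; the strings are concatenated
;; in document order.
(define plain-text
  (sxml->string '(p "Hello, " (i "world") "!")))

;; plain-text is "Hello, world!"
```
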