Adding new document types to be recognized by nndoc isn’t difficult. You just have to whip up a definition of what the document looks like, write a predicate function to recognize that document type, and then hook into nndoc.
First, here’s an example document type definition:
(mmdf
 (article-begin . "^\^A\^A\^A\^A\n")
 (body-end . "^\^A\^A\^A\^A\n"))
The definition is simply a unique name followed by a series of regexp pseudo-variable settings. Below are the possible variables—don’t be daunted by the number of variables; most document types can be defined with very few settings:
first-article
If present, nndoc will skip past all text until it finds something that matches this regexp. All text before this will be totally ignored.
article-begin
This setting has to be present in all document type definitions. It says what the beginning of each article looks like. To do more complicated things that cannot be handled by a simple regexp, you can use article-begin-function instead of this.
article-begin-function
If present, this should be a function that moves point to the beginning of each article. This setting overrides article-begin.
head-begin
If present, this should be a regexp that matches the head of the article. To do more complicated things that cannot be handled by a simple regexp, you can use head-begin-function instead of this.
head-begin-function
If present, this should be a function that moves point to the head of the article. This setting overrides head-begin.
head-end
This should match the end of the head of the article. It defaults to ‘^$’—the empty line.
body-begin
This should match the beginning of the body of the article. It defaults to ‘^\n’. To do more complicated things that cannot be handled by a simple regexp, you can use body-begin-function instead of this.
body-begin-function
If present, this function should move point to the beginning of the body of the article. This setting overrides body-begin.
body-end
If present, this should match the end of the body of the article. To do more complicated things that cannot be handled by a simple regexp, you can use body-end-function instead of this.
body-end-function
If present, this function should move point to the end of the body of the article. This setting overrides body-end.
file-begin
If present, this should match the beginning of the file. All text before this regexp will be totally ignored.
file-end
If present, this should match the end of the file. All text after this regexp will be totally ignored.
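To get a feel for how these regexps carve up a file, here is a minimal sketch in Emacs Lisp. It is not nndoc’s own dissection code: the function my-sketch-dissect, the memo type, and its ‘=== ENTRY ===’ and ‘=== END ===’ marker lines are all made up for illustration, and only article-begin, head-end and file-end are honored.

```elisp
;; A made-up type definition: articles start at "=== ENTRY ===" lines and
;; the document proper ends at a "=== END ===" line.
(defvar my-sketch-memo-definition
  '(memo
    (article-begin . "^=== ENTRY ===\n")
    (file-end . "^=== END ===\n")))

(defun my-sketch-dissect (text definition)
  "Split TEXT into a list of (HEAD . BODY) strings according to DEFINITION.
A simplified illustration, not nndoc's real algorithm."
  (let* ((settings (cdr definition))
         (article-begin (cdr (assq 'article-begin settings)))
         ;; head-end defaults to the empty line.
         (head-end (or (cdr (assq 'head-end settings)) "^$"))
         (file-end (cdr (assq 'file-end settings)))
         (articles nil))
    (with-temp-buffer
      (insert text)
      ;; Ignore the file-end marker and everything after it.
      (goto-char (point-min))
      (when (and file-end (re-search-forward file-end nil t))
        (delete-region (match-beginning 0) (point-max)))
      (goto-char (point-min))
      ;; Text before the first article-begin match is skipped implicitly.
      (while (re-search-forward article-begin nil t)
        (let* ((head-start (point))
               ;; This article extends to the next article-begin match.
               (next (save-excursion
                       (if (re-search-forward article-begin nil t)
                           (match-beginning 0)
                         (point-max))))
               (found (re-search-forward head-end next t))
               (head-stop (if found (match-beginning 0) next))
               ;; Step over the separator line's newline to reach the body.
               (body-start (if found
                               (min next (1+ (match-end 0)))
                             next)))
          (push (cons (buffer-substring head-start head-stop)
                      (buffer-substring body-start next))
                articles)
          (goto-char next))))
    (nreverse articles)))
```

Calling my-sketch-dissect on a string with a banner, two ‘=== ENTRY ===’ sections and a ‘=== END ===’ trailer yields one (HEAD . BODY) pair per entry, with the banner and the trailer dropped.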
Using these variables, nndoc is able to dissect a document file into a series of articles, each with a head and a body. However, not all document types are all that news-like, so a few more variables are needed to transform the head or the body into something that’s palatable for Gnus:
prepare-body-function
If present, this function will be called when requesting an article. It will be called with point at the start of the body, and is useful if the document has encoded some parts of its contents.
article-transform-function
If present, this function is called when requesting an article. It’s meant to be used for more wide-ranging transformation of both head and body of the article.
generate-head-function
If present, this function is called to generate a head that Gnus can understand. It is called with the article number as a parameter, and is expected to generate a nice head for the article in question. It is called when requesting the headers of all articles.
generate-article-function
If present, this function is called to generate an entire article that Gnus can understand. It is called with the article number as a parameter when requesting all articles.
dissection-function
If present, this function is called to dissect a document by itself, overriding first-article, article-begin, article-begin-function, head-begin, head-begin-function, head-end, body-begin, body-begin-function, body-end, body-end-function, file-begin, and file-end.
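As a minimal sketch of such a transformation hook, the function below could serve as a prepare-body-function. It assumes a made-up format in which body lines starting with ‘From ’ were escaped as ‘>From ’. The names my-sketch-unquote-from and escaped-memo are hypothetical, and the calling convention (no arguments, point at the start of the body) is taken from the description of prepare-body-function above.

```elisp
(defun my-sketch-unquote-from ()
  "Undo \">From \" escaping on every line from point to the end of the buffer."
  (while (re-search-forward "^>From " nil t)
    (replace-match "From " t t)))

;; A definition would then name the function like this (hypothetical type):
(defvar my-sketch-escaped-definition
  '(escaped-memo
    (article-begin . "^=== ENTRY ===\n")
    (prepare-body-function . my-sketch-unquote-from)))
```

Because the function only searches forward from point, the head of the article is left untouched.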
Let’s look at the most complicated example I can come up with—standard digests:
(standard-digest
 (first-article . ,(concat "^" (make-string 70 ?-) "\n\n+"))
 (article-begin . ,(concat "\n\n" (make-string 30 ?-) "\n\n+"))
 (prepare-body-function . nndoc-unquote-dashes)
 (body-end-function . nndoc-digest-body-end)
 (head-end . "^ ?$")
 (body-begin . "^ ?\n")
 (file-end . "^End of .*digest.*[0-9].*\n\\*\\*\\|^End of.*Digest *$")
 (subtype digest guess))
We see that all text before a 70-width line of dashes is ignored; all text after a line that starts with ‘End of’ is also ignored; each article begins with a 30-width line of dashes; the line separating the head from the body may contain a single space; and the body is run through nndoc-unquote-dashes before being delivered.
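The comma forms in the definition are evaluated when the backquoted list is built, so each one ends up as an ordinary regexp string. The sketch below builds the same three regexps by hand and tries them on tiny sample strings. The function name my-sketch-digest-matches, the example.org addresses and the sample text are invented for illustration.

```elisp
(defun my-sketch-digest-matches ()
  "Return where the standard-digest regexps match in three tiny samples."
  (let ((first-article (concat "^" (make-string 70 ?-) "\n\n+"))
        (article-begin (concat "\n\n" (make-string 30 ?-) "\n\n+"))
        (file-end "^End of .*digest.*[0-9].*\n\\*\\*\\|^End of.*Digest *$"))
    (list
     ;; A 70-dash line followed by a blank line opens the first article.
     (string-match-p first-article
                     (concat (make-string 70 ?-) "\n\nFrom: a@example.org\n"))
     ;; A blank line, 30 dashes and another blank line separate articles.
     (string-match-p article-begin
                     (concat "end of body\n\n" (make-string 30 ?-)
                             "\n\nFrom: b@example.org\n"))
     ;; A closing "End of ... Digest" line ends the digest.
     (string-match-p file-end "End of Example Digest\n"))))
```

Each element of the returned list is the index at which the corresponding regexp matched its sample, or nil if it did not match.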
To hook your own document definition into nndoc, use the nndoc-add-type function. It takes two parameters: the first is the definition itself, and the second (optional) parameter says where in the document type definition alist to put this definition.
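A registration might look like the following sketch. The memo type and its marker lines are hypothetical. The position value 'first and the variable name nndoc-type-alist reflect my reading of nndoc.el rather than anything stated here, so it is worth checking ‘C-h f nndoc-add-type’ in your Emacs before relying on them.

```elisp
(require 'nndoc)

;; `memo' and its marker lines are hypothetical.
(defvar my-sketch-memo-type
  '(memo
    (article-begin . "^=== ENTRY ===\n")
    (file-end . "^=== END ===\n")))

;; Assumption: passing `first' puts the definition at the front of the
;; alist, so that it is tried before the built-in types.
(nndoc-add-type my-sketch-memo-type 'first)
```

A matching predicate function, here nndoc-memo-type-p, still has to be defined before nndoc can recognize documents of this type.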
The alist is traversed sequentially, and nndoc-TYPE-type-p is called for each type TYPE. So nndoc-mmdf-type-p is called to see whether a document is of mmdf type, and so on. These type predicates should return nil if the document is not of the correct type; t if it is of the correct type; and a number if the document might be of the correct type. A high number means high probability; a low number means low probability, with ‘0’ being the lowest valid number.
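A predicate for a hypothetical memo type could be sketched as follows. It assumes the predicate is called with no arguments in the buffer holding the document, with point at the start, which is how I understand nndoc to call it. The ‘=== ENTRY ===’ marker, the 1000-character window and the score of 10 are arbitrary choices for illustration.

```elisp
(defun nndoc-memo-type-p ()
  "Guess whether the current buffer holds a (hypothetical) memo document."
  (cond
   ;; Marker on the very first line: certainly a memo document.
   ((looking-at "=== ENTRY ===$") t)
   ;; Marker somewhere in the first 1000 characters: plausible, so return
   ;; a modest (arbitrary) score instead of t.
   ((save-excursion
      (re-search-forward "^=== ENTRY ===$"
                         (min (point-max) (+ (point) 1000))
                         t))
    10)
   ;; Otherwise this is not our type.
   (t nil)))
```

Returning a number rather than t leaves room for another type with a stronger claim on the document to win.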