use Texinfo::Parser;

my $parser = Texinfo::Parser::parser();
my $tree = $parser->parse_texi_file("somefile.texi");
# a Texinfo::Report object in which the errors and warnings
# encountered while parsing are registered.
my $registrar = $parser->registered_errors();
my ($errors, $errors_count) = $registrar->errors();
foreach my $error_message (@$errors) {
  warn $error_message->{'error_line'};
}

my $indices_information = $parser->indices_information();
my $float_types_arrays = $parser->floats_information();
my $internal_references_array
  = $parser->internal_references_information();
# $labels_information is an hash reference on normalized node/float/anchor names.
my ($labels_information, $targets_list, $nodes_list)
  = $parser->labels_information();
# A hash reference, keys are @-command names, value is an
# array reference holding all the corresponding @-commands.
my $global_commands_information = $parser->global_commands_information();
# a hash reference on document information (encodings,
# input file name, dircategory and direntry list, for example).
my $global_information = $parser->global_information();
The main purpose of the Texinfo Perl module is to be used in texi2any to convert Texinfo to other formats. There is no promise of API stability.
Texinfo::Parser will parse Texinfo text into a Perl tree. In one pass it expands user-defined @-commands, conditionals (@ifset, @ifinfo...) and @value and constructs the tree. Some extra information is gathered while building the tree: for example, the @quotation associated to an @author command, the number of columns in a multitable, or the node associated with a section.
No method is exported in the default case. The module allows both an object-oriented syntax and a traditional function-call syntax, in which the parser is an opaque data structure given as an argument to every function.
The following method is used to construct a new Texinfo::Parser object:
This method creates a new parser. The options may be provided as a hash reference. Most of those options correspond to Texinfo customization options described in the Texinfo manual.
Handle cpp-like synchronization lines if set. Set in the default case.
An array reference of the output formats for which @ifFORMAT conditional blocks should be expanded. Default is empty.
Possible values are nomenu, menu and sectiontoc. Only report menu-related errors for menu.
An array reference of directories in which @include files should be searched for. Default contains the working directory (.).
If set, spaces after an @-command name that takes braces are ignored. Default on.
Maximal number of nested user-defined macro calls. Default is 100000.
A string corresponding to a document language set by @documentlanguage. It overrides the document @documentlanguage information, if present.
Texinfo::Report object reused by the parser to register errors.
A hash reference. Keys are names, values are the corresponding values. Same as values set by @set.
Different methods may be called to parse some Texinfo code: parse_texi_line for a line, parse_texi_piece for a fragment of Texinfo, parse_texi_text for a string corresponding to a full document and parse_texi_file for a file.
For all those functions, if the $parser argument is undef, a new parser object is generated to parse the line. Otherwise the parser given as an argument is used to parse into a tree.
When parse_texi_line is used, the resulting tree is rooted at a root_line type container. Otherwise, the resulting tree should be rooted at a document_root type container.
This function is used to parse a short fragment of Texinfo code.
$text is the string containing the Texinfo line. $first_line_number is the line number of the line; if undef, it is set to 1.
This function is used to parse Texinfo fragments.
$text is the string containing the Texinfo text. $first_line_number is the line number of the first text line; if undef, it is set to 1.
This function is used to parse a text as a whole document.
$text is the string containing the Texinfo text. $first_line_number is the line number of the first text line; if undef, it is set to 1.
The file with name $file_name is considered to be a Texinfo file and is parsed into a tree. $file_name should be a binary string.
undef is returned if the file couldn’t be read.
The errors collected during the tree parsing are registered in a Texinfo::Report object. This object is available with registered_errors. The errors registered in the Texinfo::Report object are available through the errors method. This method is described in Texinfo::Report::errors.
$registrar is a Texinfo::Report object in which the errors and warnings encountered while parsing are registered. If a registrar is passed to the parser initialization options, it is reused, otherwise a new one is created.
After parsing, some information about the Texinfo code that was processed is available from the parser.
Some global information is available through global_information:
The $info returned is a hash reference. The possible keys are
An array of successive @dircategory and @direntry as they appear in the document.
input_encoding_name string is the encoding name used for the Texinfo code. input_perl_encoding string is a corresponding Perl encoding name.
The name of the main Texinfo input file and the associated directory. Binary strings. In texi2any, they should come from the command line (and can be decoded with the encoding in the customization variable COMMAND_LINE_ENCODING).
Some command lists are available, such that it is possible to go through the corresponding tree elements without walking the tree. They are available through global_commands_information:
$commands is a hash reference. The keys are @-command names. The associated values are array references containing all the corresponding tree elements.
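As an illustration of that shape, the following minimal sketch iterates over a hand-built stand-in for $commands and counts the elements registered for each @-command. The mock data (which @-commands appear, and how many) is an assumption made for the example; with a real parser the hash would come from global_commands_information.

```perl
use strict;
use warnings;

# Mock of the documented structure (an assumption for this sketch):
# @-command name => array reference of tree elements.
my $commands = {
  'footnote' => [{'cmdname' => 'footnote'}, {'cmdname' => 'footnote'}],
  'contents' => [{'cmdname' => 'contents'}],
};

# Count the tree elements registered for each @-command name.
my %count;
foreach my $cmdname (sort keys %$commands) {
  $count{$cmdname} = scalar(@{$commands->{$cmdname}});
  print "\@$cmdname: $count{$cmdname}\n";
}
```

Because the values are plain array references, the elements can be visited directly, without any tree traversal.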
All the @-commands that have an associated label (so can be the target of cross references) -- @node, @anchor and @float with label -- have a normalized name associated, constructed as described in the HTML Xref node in the Texinfo documentation. Those normalized labels and their association with @-commands are available through labels_information:
$labels_information is a hash reference whose keys are normalized labels, and the associated value is the corresponding @-command. $targets_list is a list of the @-commands that have a label. Using $labels_information is preferred. $nodes_list is a list of all the nodes appearing in the document.
Information on @float is also available, grouped by type of floats, each type corresponding to potential @listoffloats. This information is available through the method floats_information.
$float_types is a hash reference whose keys are normalized float types (the first float argument, or the @listoffloats argument). The normalization is the same as for the first step of node names normalization. The value is the list of float tree elements appearing in the Texinfo document.
Internal references, that is, @-commands that refer to nodes, anchors or floats within the document, are also available:
The function returns a list of cross-reference commands referring to the same document.
Information about defined indices, merged indices and index entries is also available through the indices_information method.
$indices_information is a hash reference. The keys are
1 if the index entries should be formatted as code, 0 in the opposite case.
The index name.
An array reference of prefix associated to the index.
In case the index is merged to another index, this key holds the name of the index the index is merged into. It takes into account indirectly merged indices.
A hash reference holding names of indices that are merged into the index, including itself. It also contains indirectly merged indices. This key is removed if the index is itself later merged to another index.
An array reference containing index entry structures for index entries associated with the index. The index entry could be associated to @-commands like @cindex, or @item in @vtable, or definition command entries like @deffn.
The keys of the index entry structures are
The following shows the references corresponding to the default indexes cp and fn, the fn index having its entries formatted as code, and the indices corresponding to the following Texinfo:
@defindex some
@defcodeindex code

$index_names = {'cp' => {'name' => 'cp', 'in_code' => 0, },
                'fn' => {'name' => 'fn', 'in_code' => 1, },
                'some' => {'in_code' => 0},
                'code' => {'in_code' => 1}};
If name is not set, it is set to the index name.
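The defaulting rule for name can be sketched as follows, using the same index names as in the example above. The loop is illustrative only and is not the parser's own code; the final listing of code-formatted indices is an addition made for the example.

```perl
use strict;
use warnings;

my $index_names = {
  'cp'   => {'name' => 'cp', 'in_code' => 0},
  'fn'   => {'name' => 'fn', 'in_code' => 1},
  'some' => {'in_code' => 0},
  'code' => {'in_code' => 1},
};

# Fill in the 'name' key from the hash key when it is not set.
foreach my $index_name (keys %$index_names) {
  my $index = $index_names->{$index_name};
  $index->{'name'} = $index_name unless (defined($index->{'name'}));
}

# List, in alphabetical order, the indices whose entries are formatted as code.
my @code_indices = grep { $index_names->{$_}->{'in_code'} }
                     sort keys %$index_names;
print join(', ', @code_indices), "\n";
```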
A Texinfo tree element (called element because node is overloaded in the Texinfo world) is a hash reference. There are three main categories of tree element. Tree elements associated with an @-command have a cmdname key holding the @-command name. Tree elements corresponding to text fragments have a text key holding the corresponding text. Finally, the last category is other elements, which in most cases have a type key holding their name. Text fragments and @-command elements may also have an associated type when such information is needed.
The children of an @-command or of other container element are in the array referred to with the args key or with the contents key. The args key is for arguments of @-commands, either in braces or on the rest of the line after the command, depending on the type of command. The contents key array holds the contents of the Texinfo code appearing within a block @-command, within a container, or within a @node or sectioning @-command.
Another important key for the elements is the extra key which is associated to a hash reference and holds all kinds of information that is gathered during the parsing and may help with the conversion.
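To make the three element categories and the args and contents arrays more concrete, here is a minimal sketch that walks a hand-built tree and collects its text. The tree literal and the tree_text helper are assumptions made for the example; they are not produced by, or part of, Texinfo::Parser.

```perl
use strict;
use warnings;

# Hand-built stand-in for a parsed tree (an assumption for this sketch):
# a paragraph holding text and a @code command with a braced argument.
my $tree = {
  'type' => 'paragraph',
  'contents' => [
    {'text' => 'Call '},
    {'cmdname' => 'code',
     'args' => [{'type' => 'brace_command_arg',
                 'contents' => [{'text' => 'parse_texi_file'}]}]},
    {'text' => ' on a file.'},
  ],
};

# Hypothetical helper: concatenate the text fragments found in an element,
# visiting the args array first, then the contents array.
sub tree_text {
  my ($element) = @_;
  return $element->{'text'} if (defined($element->{'text'}));
  my $result = '';
  foreach my $key ('args', 'contents') {
    next unless ($element->{$key});
    foreach my $child (@{$element->{$key}}) {
      $result .= tree_text($child);
    }
  }
  return $result;
}

print tree_text($tree), "\n";
```

The helper only descends through args and contents, so it never follows the parent key back up the tree.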
You can see examples of the tree structure by running makeinfo like this:
makeinfo -c DUMP_TREE=1 -c TEXINFO_OUTPUT_FORMAT=parse document.texi
For a simpler, more regular representation of the tree structure, you can do:
makeinfo -c TEXINFO_OUTPUT_FORMAT=debugtree document.texi
The command name of @-command elements.
The text fragment of text elements.
The type of element considered, in general a container. Frequent types encountered are paragraph for a paragraph container, brace_command_arg for the container holding the brace @-commands contents, and line_arg and block_line_arg, which contain the arguments appearing on the line of @-commands. Text fragments may have a type to indicate the kind of text fragment, for example spaces_before_paragraph is associated to spaces appearing before a paragraph beginning. Most @-command elements do not have a type associated.
Arguments in braces or on @-command line. An array reference.
The Texinfo appearing in the element. For block commands, other containers, @node and sectioning commands. An array reference.
The parent element.
A hash reference corresponding to information on the location of the element in the Texinfo input manual. It should mainly be available for @-command elements, and only for @-commands that are considered to be complex enough that the location in the document is needed, for example to prepare an error message.
The keys of the line number hash references are
A hash reference holding any other information that cannot be obtained otherwise from the tree. See Information available in the info key.
A hash reference holding information that could also be obtained from the tree, but is directly associated to the element to simplify downstream code. See Information available in the extra key.
Some types can be associated with @-commands (in addition to cmdname), although usually there will be no type at all. The following are the possible values of type for tree elements for @-commands.
This is the type of a command given in argument of @itemize, @table, @vtable or @ftable. For example in
@itemize @bullet
@item item
@end itemize
the element corresponding with bullet has the following keys:
'cmdname' => 'bullet'
'type' => 'command_as_argument'
The parent @-command has an entry in extra for the command_as_argument element:
'cmdname' => 'itemize'
'extra' => {'command_as_argument' => $command_element_as_argument}
This type may be associated with a definition command with an x form, like @defunx, @defvrx. For the form without x, the associated def_line is the first contents element. It is described in more detail below.
This type is set for an @-command that is redefined by @definfoenclose. The beginning is in {'extra'}->{'begin'} and the end in {'extra'}->{'end'}.
This is the type of index entry commands like @cindex and, more importantly, of user-defined index entry commands. So for example if there is:
@defindex foo
...
@fooindex index entry
the @fooindex @-command element will have the index_entry_command type.
The text elements may have the following types (or may have no type at all):
Space after a node in the menu entry, when there is no description, and space appearing after the description line.
An empty line (possibly containing whitespace characters only).
Spaces appearing after an @-command without braces that does not take arguments on the line, but which is followed by ignorable spaces, such as @item in @itemize or @multitable, or @noindent.
Spaces appearing after a closing brace, for some rare commands for which this space should be ignorable (like @caption or @sortas).
Space appearing before a paragraph beginning.
Text in an environment where it should be kept as is (in @verbatim, @verb, @macro body).
Used for the arguments to some special line commands whose arguments aren’t subject to the usual macro expansion. For example @set, @clickstyle, @unmacro, @comment. The argument is associated to the text key.
Space within an index @-command before an @-command interrupting the index command.
Text appearing after @bye.
Text appearing before real content, including the \input texinfo.tex.
English text added by the parser that may need to be translated during conversion. This happens for @def* @-command aliases that lead to prepending text such as ’Function’.
Some types of element are containers of portions of the tree, either for the whole tree, or for contents appearing before @node and sectioning commands.
Content before nodes and sectioning commands at the beginning of document_root.
root_line is the type of the root tree when parsing Texinfo line fragments using parse_texi_line. document_root is the document root otherwise.
The first content of document_root should be before_node_section, followed by node and sectioning @-command elements, the @bye element and postamble_after_end.
This container holds everything appearing after @bye.
This container holds everything appearing before the first content, including the \input texinfo.tex line and following blank lines.
This container holds everything that appears before @setfilename.
This container holds everything appearing before the first formatted content, corresponding to the preamble in the Texinfo documentation.
The other types of element are containers with other elements appearing in their contents. The paragraph container holds normal text from the Texinfo manual outside of any @-commands, and within @-commands with blocks of text (@footnote, @itemize @item, @quotation for example). The preformatted container holds the content appearing in @-commands like @example and the rawpreformatted container holds the content appearing in format commands such as @html. The other containers are more specific.
The types of container element are the following:
Special type containing balanced braces content (braces included) in the context where they are valid, and where balanced braces need to be collected to know when a top-level brace command is closed. This occurs in @math, in raw output format brace commands and within brace @-commands in raw output format block commands.
A container for content before the first @item of block @-commands with items (@table, @multitable, @enumerate...).
Those containers occur within the args array of @-commands taking an argument. brace_command_arg is used for the arguments to commands taking arguments surrounded by braces (and in some cases separated by commas). brace_command_context is used for @-commands with braces that start a new context (@footnote, @caption, @math).
line_arg is used for commands that take the Texinfo code on the rest of the line as their argument, such as @settitle, @node, @section. block_line_arg is similar but is used for commands that start a new block (which is to be ended with @end).
following_arg is used for the accent @-commands argument that did not use braces but instead followed the @-command, possibly after a space, as
@~n
@ringaccent A
For example
@code{in code}
leads to
{'cmdname' => 'code', 'args' => [{'type' => 'brace_command_arg', 'contents' => [{'text' => 'in code'}]}]}
As an exception, @value flag argument is directly in the args array reference, not in a brace_command_arg container. Note that only @value commands that are not expanded because there is no corresponding value set are present as elements in the tree.
Bracketed argument. On definition command and on @multitable line.
Argument of a user defined linemacro call in bracket. It holds directly the argument text (which does not contain the braces) and does not contain other elements. It should not appear directly in the tree as the user defined linemacro call is replaced by the linemacro body.
Contains several elements that together are a single unit on a @def* line.
The def_line type is either associated with a container within a definition command, or is the type of a definition command with an x form, like @deffnx, or @defline. It holds the definition line arguments. The container with type def_item holds the definition text content. Content appearing before a definition command with an x form is in an inter_def_item container.
Container holding the arguments of a user defined macro, linemacro or rmacro. It should not appear directly in the tree as the user defined call is expanded. The name of the macro, rmacro or linemacro is the info command_name value.
Taken from @macro definition and put in the args key array of the macro, macro_name is the type of the text fragment corresponding to the macro name, macro_arg is the type of the text fragments corresponding to macro formal arguments.
The menu_comment container holds what is between menu entries in menus. For example, in:
@menu
Menu title

* entry::

Between entries

* other::
@end menu
Both "Menu title" and "Between entries" will be in a menu_comment.
A menu_entry holds a full menu entry, like
* node:: description.
The different elements of the menu entry are in the menu_entry contents array reference.
menu_entry_leading_text holds the star and following spaces. menu_entry_name is the menu entry name (if present), menu_entry_node corresponds to the node in the menu entry, menu_entry_separator holds the text after the node and before the description, in most cases "::". Lastly, menu_entry_description is for the description.
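As a rough illustration, the sketch below builds by hand a menu_entry for "* node:: description." and indexes its parts by type. The exact nesting inside each part (for example, whether the separator carries its text directly) is an assumption made for the example; a tree produced by the parser may hold further containers inside the node and description parts.

```perl
use strict;
use warnings;

# Hand-built menu_entry (an assumption for this sketch).
my $menu_entry = {
  'type' => 'menu_entry',
  'contents' => [
    {'type' => 'menu_entry_leading_text', 'text' => '* '},
    {'type' => 'menu_entry_node', 'contents' => [{'text' => 'node'}]},
    {'type' => 'menu_entry_separator', 'text' => ':: '},
    {'type' => 'menu_entry_description',
     'contents' => [{'text' => 'description.'}]},
  ],
};

# Index the parts of the entry by their type.
my %part;
foreach my $content (@{$menu_entry->{'contents'}}) {
  $part{$content->{'type'}} = $content;
}

# The node text is the concatenation of the text children of menu_entry_node.
my $node_text = join('', map { $_->{'text'} }
                           @{$part{'menu_entry_node'}->{'contents'}});
my $has_name = exists($part{'menu_entry_name'}) ? 'yes' : 'no';
print "node: $node_text\n";
print "has name: $has_name\n";
```

Since menu_entry_name is only present when the entry has a name, code reading a menu entry should test for it rather than rely on a fixed position in contents.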
In @multitable, a multitable_head container contains all the rows with @headitem, while multitable_body contains the rows associated with @item. A row container contains the @item and @tab forming a row.
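The nesting described above can be sketched with a hand-built @multitable element; the code counts the rows in each container and the largest number of cells in a row. The element literal is an assumption made for the example (cell contents are omitted), not parser output.

```perl
use strict;
use warnings;

# Hand-built @multitable with one head row and two body rows
# (an assumption for this sketch; cell contents are left out).
my $multitable = {
  'cmdname' => 'multitable',
  'contents' => [
    {'type' => 'multitable_head', 'contents' => [
      {'type' => 'row', 'contents' => [
        {'cmdname' => 'headitem'}, {'cmdname' => 'tab'}]}]},
    {'type' => 'multitable_body', 'contents' => [
      {'type' => 'row', 'contents' => [
        {'cmdname' => 'item'}, {'cmdname' => 'tab'}]},
      {'type' => 'row', 'contents' => [
        {'cmdname' => 'item'}, {'cmdname' => 'tab'}]}]},
  ],
};

# Count rows per container type and track the widest row.
my %rows;
my $max_cells = 0;
foreach my $part (@{$multitable->{'contents'}}) {
  next unless (defined($part->{'type'}));
  foreach my $row (@{$part->{'contents'}}) {
    $rows{$part->{'type'}}++;
    my $cells = scalar(@{$row->{'contents'}});
    $max_cells = $cells if ($cells > $max_cells);
  }
}
print "head rows: $rows{'multitable_head'}\n";
print "body rows: $rows{'multitable_body'}\n";
print "max cells: $max_cells\n";
```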
A paragraph. The contents of a paragraph (like other container elements for Texinfo content) are elements representing the contents of the paragraph in the order they occur, such as text elements without a cmdname or type, or @-command elements for commands appearing in the paragraph.
Texinfo code within a format that is not filled. Happens within some block commands like @example, but also in menu (in menu descriptions, menu comments...).
Texinfo code within raw output format block commands such as @tex or @html.
Those containers appear in @table, @ftable and @vtable. A table_entry container contains an entire row of the table. It contains a table_term container, which holds all the @item and @itemx lines. This is followed by a table_definition container, which holds the content that is to go into the second column of the table.
If there is any content before an @itemx (normally only comments, empty lines or maybe index entries are allowed), it will be in a container with type inter_item at the same level of @item and @itemx, in a table_term.
Information available in the info key
The string corresponds to the line after the @-command for @-commands that have special arguments on their line, and for @macro line.
The name of the user defined macro, rmacro or linemacro called associated with the element holding the arguments of the user defined command call.
@verb delimiter is in delimiter.
A reference to an element containing the spaces after @-command arguments before a comma, a closing brace or at end of line, for some @-commands and bracketed content type with opening brace, and line commands and block command lines taking Texinfo as argument and comma delimited arguments. Depending on the @-command, the spaces_after_argument is associated with the @-command element, or with each argument element.
For accent commands with spaces following the @-command, like:
@ringaccent A
@^ u
there is a spaces_after_cmd_before_arg key linking to an element containing the spaces appearing after the command in text.
Space between a brace @-command name and its opening brace also ends up in spaces_after_cmd_before_arg. It is not recommended to leave space between an @-command name and its opening brace.
A reference to an element containing the spaces following the opening brace of some @-commands with braces and bracketed content type, spaces following @-commands for line commands and block command taking Texinfo as argument, and spaces following comma delimited arguments. For context brace commands, line commands and block commands, spaces_before_argument is associated with the @-command element, for other brace commands and for spaces after comma, it is associated with each argument element.
Information available in the extra key
The node element in the parsed tree containing the element. Set for @-commands elements that have an associated index entry and for @nodedescription.
The region command (@copying, @titlepage) containing the element, if it is in such an environment. Set for @-commands elements that have an associated index entry and for @anchor.
The index entry information is associated to @-commands that have an associated index entry. The associated information should not be directly accessed; instead, Texinfo::Common::lookup_index_entry should be called on the extra index_entry value. The $indices_information is the information on the indices of a Texinfo manual obtained from Texinfo::Parser::indices_information.
The index entry information hash returned by Texinfo::Common::lookup_index_entry is described in index_entries.
Currently, the index_entry value is an array reference with an index name as first element and the index entry number in that index (1-based) as second element.
A string containing the characters flagged as ignored in key sorting in the document by setting flags such as txiindexbackslashignore. Set, if not empty, for @-commands elements that have an associated index entry.
An array holding strings, the arguments of @-commands taking simple textual arguments as arguments, like @everyheadingmarks, @frenchspacing, @alias, @synindex, @columnfractions.
Set for some @-commands with line arguments and a missing argument.
The string corresponds to the line after the @-command for @-commands that have an argument interpreted as simple text, like @setfilename, @end or @documentencoding.
@abbr
@acronym
The first argument normalized is in normalized.
@anchor
@float
@-commands that are targets for cross-references have a normalized key for the normalized label, built as specified in the Texinfo documentation in the HTML Xref node. There is also a node_content key for an array holding the corresponding content.
@author
If in a @titlepage, the titlepage is in titlepage, if in @quotation or @smallquotation, the corresponding tree element is in quotation.
The author tree element is in the authors array of the @titlepage or the @quotation or @smallquotation it is associated with.
@click
In clickstyle there is the current clickstyle command.
def_command holds the command name, without x if it is an x form of a definition command. original_def_cmdname is the original def command.
If it is an x form, it has not_after_command set if not appearing after the definition command without x.
def_line
For each element in a def_line, the key def_role holds a string describing the meaning of the element. It is one of category, name, class, type, arg, typearg, spaces or delimiter, depending on the definition.
The def_index_element is a Texinfo tree element corresponding to the index entry associated to the definition line, based on the name and class. If needed this element is based on translated strings. In that case, if @documentlanguage is defined where the def_line is located, documentlanguage holds the documentlanguage value. def_index_ref_element is similar, but not translated, and only set if there could have been a translation.
The omit_def_name_space key value is set and true if the Texinfo variable txidefnamenospace was set for the def_line, signaling that the space between function definition name and arguments should be omitted.
@definfoenclose defined commands
begin holds the string beginning the @definfoenclose, end holds the string ending the @definfoenclose.
@documentencoding
The normalized argument is in input_encoding_name.
@enumerate
The enumerate_specification extra key contains the enumerate argument.
@float
@listoffloats
If @float has a first argument, there is a float_type key with the normalized float type; the same key is set for the @listoffloats argument.
caption and shortcaption hold the corresponding tree elements associated to a @float. The @caption or @shortcaption have the float tree element stored in float.
@subentry
If an index entry @-command, such as @cindex, or a @subentry contains a @sortas command, sortas holds the @sortas command content formatted as plain text.
subentry links to the next level @subentry element.
Index entry @-command (but not @subentry) can also have seentry and seealso keys that link to the corresponding @-commands elements.
@inlinefmt
@inlineraw
@inlinefmtifelse
@inlineifclear
@inlineifset
The first argument is in format. If an argument has been determined as being expanded by the Parser, the index of this argument is in expand_index. Index numbering begins at 0, but the first argument is always the format or flag name, so, if set, it should be 1 or 2 for @inlinefmtifelse, and 1 for other commands.
@item in @enumerate or @itemize
The item_number extra key holds the number of this item.
@item and @tab in @multitable
The cell_number index key holds the index of the column of the cell.
@itemize
@table
@vtable
@ftable
The command_as_argument extra key points to the @-command given as argument on the @-command line.
If the command in argument for @table, @vtable or @ftable is @kbd and the context and @kbdinputstyle is such that @kbd should be formatted as code, the command_as_argument_kbd_code extra key is set to 1.
@kbd
code is set depending on the context and @kbdinputstyle.
@macro
invalid_syntax is set if there was an error on the @macro line. The arg_line key of the info hash holds the line after @macro.
menu_entry_node
The extra keys hold information about the node entry label; they are the same as those appearing in the extra hash of the @node line_arg explicit direction arguments.
@multitable
The key max_columns holds the maximal number of columns. If there is a @columnfractions as argument, then the columnfractions key is associated with the element for the @columnfractions command.
@node
Information on the labels of explicit directions is in the extra hash of the @node line_arg arguments that give node directions. This hash has a node_content key for an array holding the corresponding content, a manual_content key if there is an associated external manual name, and a normalized key for the normalized label, built as specified in the HTML Xref node of the Texinfo documentation.
An associated_section key holds the tree element of the sectioning command that follows the node. A node_preceding_part key holds the tree element of the @part that precedes the node, if there is no sectioning command between the @part and the node.
A node_description key holds the first @nodedescription associated to the node.
A node containing a menu has a menus key which refers to an array of references to menu elements occurring in the node.
The first node containing a @printindex @-command has the isindex key set.
paragraph
The indent or noindent key value is set if the corresponding @-commands are associated with that paragraph.
@part
The next sectioning command tree element is in part_associated_section.
The following node tree element is in part_following_node if there is no sectioning command between the @part and the node.
@ref
@xref
@pxref
@inforef
The brace_command_arg corresponding to the node argument holds information on the label, like the label information appearing in the extra hash of the @node line_arg explicit direction arguments.
row
The row_number index key holds the index of the row in the @multitable.
The node preceding the command is in associated_node.
The part preceding the command is in associated_part.
If the level of the document was modified by @raisesections or @lowersections, the differential level is in sections_level.
untranslated
documentlanguage holds the @documentlanguage value.
If there is a translation context, it should be in translation_context.
Copyright 2010- Free Software Foundation, Inc. See the source file for all copyright years.
This library is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.