For text files: FILE HANDLE handle_name /NAME=’file_name [/MODE=CHARACTER] [/ENDS={CR,CRLF}] /TABWIDTH=tab_width [ENCODING=’encoding’] For binary files in native encoding with fixed-length records: FILE HANDLE handle_name /NAME=’file_name’ /MODE=IMAGE [/LRECL=rec_len] [ENCODING=’encoding’] For binary files in native encoding with variable-length records: FILE HANDLE handle_name /NAME=’file_name’ /MODE=BINARY [/LRECL=rec_len] [ENCODING=’encoding’] For binary files encoded in EBCDIC: FILE HANDLE handle_name /NAME=’file_name’ /MODE=360 /RECFORM={FIXED,VARIABLE,SPANNED} [/LRECL=rec_len] [ENCODING=’encoding’]
Use FILE HANDLE
to associate a file handle name with a file and
its attributes, so that later commands can refer to the file by its
handle name. Names of text files can be specified directly on
commands that access files, so that FILE HANDLE
is only needed when a
file is not an ordinary file containing lines of text. However,
FILE HANDLE
may be used even for text files, and it may be
easier to specify a file’s name once and later refer to it by an
abstract handle.
Specify the file handle name as the identifier immediately following the
FILE HANDLE
command name. The identifier INLINE is reserved for
representing data embedded in the syntax file (see BEGIN DATA) The
file handle name must not already have been used in a previous
invocation of FILE HANDLE
, unless it has been closed by an
intervening command (see CLOSE FILE HANDLE).
The effect and syntax of FILE HANDLE
depends on the selected MODE:
In CHARACTER mode only, tabs are expanded to spaces by input programs,
except by DATA LIST FREE
with explicitly specified delimiters.
Each tab is 4 characters wide by default, but TABWIDTH (a PSPP
extension) may be used to specify an alternate width. Use a TABWIDTH
of 0 to suppress tab expansion.
A file written in CHARACTER mode by default uses the line ends of the system on which PSPP is running, that is, on Windows, the default is CR LF line ends, and on other systems the default is LF only. Specify ENDS as CR or CRLF to override the default. PSPP reads files using either convention on any kind of system, regardless of ENDS.
Alphanumeric data in mode 360 files are encoded in EBCDIC. PSPP translates EBCDIC to or from the host’s native format as necessary on input or output, using an ASCII/EBCDIC translation that is one-to-one, so that a “round trip” from ASCII to EBCDIC back to ASCII, or vice versa, always yields exactly the original data.
The RECFORM
subcommand is required in mode 360. The precise file
format depends on its setting:
This record format is equivalent to IMAGE mode, except for EBCDIC translation.
IBM documentation calls this *F
(fixed-length, deblocked)
format.
The file comprises a sequence of zero or more variable-length blocks. Each block begins with a 4-byte block descriptor word (BDW). The first two bytes of the BDW are an unsigned integer in big-endian byte order that specifies the length of the block, including the BDW itself. The other two bytes of the BDW are ignored on input and written as zeros on output.
Following the BDW, the remainder of each block is a sequence of one or more variable-length records, each of which in turn begins with a 4-byte record descriptor word (RDW) that has the same format as the BDW. Following the RDW, the remainder of each record is the record data.
The maximum length of a record in VARIABLE mode is 65,527 bytes: 65,535 bytes (the maximum value of a 16-bit unsigned integer), minus 4 bytes for the BDW, minus 4 bytes for the RDW.
In mode VARIABLE, LRECL specifies a maximum, not a fixed, record length, in bytes. The default is 8,192.
IBM documentation calls this *VB
(variable-length, blocked,
unspanned) format.
The file format is like that of VARIABLE mode, except that logical records may be split among multiple physical records (called segments) or blocks. In SPANNED mode, the third byte of each RDW is called the segment control character (SCC). Odd SCC values cause the segment to be appended to a record buffer maintained in memory; even values also append the segment and then flush its contents to the input procedure. Canonically, SCC value 0 designates a record not spanned among multiple segments, and values 1 through 3 designate the first segment, the last segment, or an intermediate segment, respectively, within a multi-segment record. The record buffer is also flushed at end of file regardless of the final record’s SCC.
The maximum length of a logical record in VARIABLE mode is limited only by memory available to PSPP. Segments are limited to 65,527 bytes, as in VARIABLE mode.
This format is similar to what IBM documentation call *VS
(variable-length, deblocked, spanned) format.
In mode 360, fields of type A that extend beyond the end of a record
read from disk are padded with spaces in the host’s native character
set, which are then translated from EBCDIC to the native character
set. Thus, when the host’s native character set is based on ASCII,
these fields are effectively padded with character X'80'
. This
wart is implemented for compatibility.
The NAME
subcommand specifies the name of the file associated with the
handle. It is required in all modes but SCRATCH mode, in which its
use is forbidden.
The ENCODING subcommand specifies the encoding of text in the file. For reading text files in CHARACTER mode, all of the forms described for ENCODING on the INSERT command are supported (see INSERT). For reading in other file-based modes, encoding autodetection is not supported; if the specified encoding requests autodetection then the default encoding is used. This is also true when a file handle is used for writing a file in any mode.