IIF, MARC and Z39.50

IIF is the "Information Interchange Format", a record serialization format specified in ISO standard 2709 and also published as ANSI Z39.2. IIF is mostly a plaintext format: almost all information is encoded using ASCII characters (numbers are written as decimal digits, not in binary), and the only control characters used are byte values 29 (record terminator, RT), 30 (field terminator, FT) and 31 (subfield delimiter).

MARC ("MAchine Readable Catalogue") is actually a family of largely incompatible standards (USMARC, UNIMARC, UKMARC, ...) that evolved from MARC I (1965). The main concern of the MARC standards is to specify actual data models, that is, to assign tags and subfield codes, which can be used perfectly well in Malete, CDS/ISIS or other databases. They also specify a variant of IIF as the suggested common format for data exchange, which we refer to here as "MARC". (This file syntax seems to be mostly the same for all MARC standards.)

Z39.50 is a network protocol for searching and retrieving records. It supports various query "languages", the most commonly used of which is the Type-1 query. Type-1 is similar to the queries supported by Malete and CDS/ISIS, but much more general and complex. Terms can be searched for in any indexed field, or restricted to one or more "attributes".
Attributes are basically the tags used in the index, which are almost always different from those used in records. While records commonly use any of the various MARCs or even completely different formats, the attributes used in bibliographic systems are typically those specified by the Bib-1 attribute set (which, for example, assigns 4 to title).

Z39.50 allows a client to select a record format from various conversions supported by a server. When a MARC format is selected, the data is actually transmitted serialized according to IIF.

IIF and MARC serialized records

IIF specifies a serialization for records. Like the Malete record data file, an IIF file is simply a stream of such records; there is no additional file header.
A record has
  • a 24 byte leader, containing 16 bytes of structural data and 8 bytes of application data (marked x below, imported as "MARC leader"). The format for MARC is LLLLLxxxxx22BBBBBxxx4500. The Ls are the total record length (including the leader and a terminating RT) and the Bs are the start of data (the field values, which follow the FT terminating the dictionary). The first '2' denotes that every field starts with two indicator bytes; the second is the subfield identifier length, including the delimiter character.
  • a "dictionary" array with one entry per field, containing a 3 byte tag followed by n and m bytes for length and offset. n and m are the digits at leader offsets 20 and 21 (counting from 0); MARC uses 4 and 5. In general IIF, leader byte 22 may specify a number of implementation-defined bytes per entry.
  • the actual field values, each terminated by the FT character.
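The layout described above can be sketched in a few lines of Python. This is only an illustration, not the code used by Malete: the function name parse_record and the hand-built sample record are made up for this example, and it assumes that the length stored in each dictionary entry includes the field's terminating FT.

```python
RT, FT, SF = b"\x1d", b"\x1e", b"\x1f"  # byte values 29, 30 and 31


def parse_record(data: bytes):
    """Split one serialized record into (leader, [(tag, value), ...])."""
    leader = data[:24]
    base = int(leader[12:17])    # BBBBB: start of the field values
    n = int(leader[20:21])       # digits used for a field length (MARC: 4)
    m = int(leader[21:22])       # digits used for a field offset (MARC: 5)
    extra = int(leader[22:23])   # implementation defined bytes per entry
    entry = 3 + n + m + extra
    directory = data[24:base - 1]  # byte base-1 is the FT ending the dictionary
    fields = []
    for i in range(0, len(directory), entry):
        e = directory[i:i + entry]
        tag = e[:3].decode("ascii")
        length = int(e[3:3 + n])
        offset = int(e[3 + n:3 + n + m])
        value = data[base + offset:base + offset + length]
        if value.endswith(FT):   # assumption: length counts the trailing FT
            value = value[:-1]
        fields.append((tag, value))
    return leader, fields


# Hypothetical record: control number 42 and one field 245 with subfield a.
sample = (
    b"00063nam  2200049   4500"           # leader: length 63, data starts at 49
    b"001000300000" b"245001000003" + FT  # two dictionary entries
    + b"42" + FT                          # field 001
    + b"  " + SF + b"aTitle" + FT         # field 245, two blank indicators
    + RT
)

leader, fields = parse_record(sample)
for tag, value in fields:
    print(tag, value)
```

Since the record length is stored in the first five leader bytes, a reader can step through a file record by record without scanning for RT.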

Contrary to folklore, MARC does NOT use a '$' as subfield delimiter, nor a '#' for unused indicators. Rather, the examples in the specs use a '$' to REPRESENT the subfield delimiter control character 31 (^_), and a '#' to REPRESENT a blank. The RT (29, ^]) is sometimes represented as '\' and the FT (30, ^^) as '^' or '@'.
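To make the difference between the stored bytes and their printed representation concrete, here is a small Python sketch that renders a raw field the way examples in the specs usually do. The helper name show_field is hypothetical, and decoding as Latin-1 is only an assumption made so that every byte maps to some character.

```python
RT, FT, SF = "\x1d", "\x1e", "\x1f"  # byte values 29, 30 and 31


def show_field(tag: str, value: bytes) -> str:
    """Return a printable form of a raw field; the stored data is not changed."""
    text = value.decode("latin-1")
    if not tag.startswith("00"):
        # fields other than 00* start with two indicators; show blanks as '#'
        text = text[:2].replace(" ", "#") + text[2:]
    return tag + " " + text.replace(SF, "$").replace(FT, "^").replace(RT, "\\")


print(show_field("245", b"1 \x1faTitle\x1e"))  # prints: 245 1#$aTitle^
print(show_field("001", b"42"))                # prints: 001 42
```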

Malete IIF import and export

The malete tool provides two rather simplistic commands, iifimp and iifexp.
The command-specific options are:
  • Ffile
    specify full filename for the IIF files. Default is the basename of the Malete database with extension .iif. On UNIX, a filename '-' selects stdin/out.
  • Nomarc (literally)
    do not assume the MARC structure 22/450 on import. Requires proper IIF data.
  • P[iic]
    on export, prepend indicators ii and, where needed, subfield c. A single -P uses two blanks as indicators and subfield '0'. Recommended in order to produce at least syntactically correct MARC.
  • Rid (literally)
    on import, use a numeric control number (the first field, if it has tag 1) as record id. Note that on export the record id is always used as control number unless the record already has one, since a control number is required not only by MARC but also by IIF.

Creating proper IIF from WinIsis

In Database-Export, set the subfield separator to \031 and output line length to 0.
If the fields do not contain valid MARC data, use a reformatting FST like
001 0 MFN
044 0 |00^a|,v44
024 0 |00^a|,v24
026 0 |00|,v26
070 0 (|00^a|,v70/)
Make sure that
  • the first output field is tag 1 containing some unique id
  • every field (other than those tagged 00*) starts with two indicator characters (these really should be blanks, but blanks would be stripped during export)
  • the indicators are followed by a delimiter and subfield identifier
Even so, the output is not 100% correct, since WinIsis sets the number of indicators and the identifier length to 0, where MARC specifies 2. However, many other MARC processors, including zebraidx, ignore these settings.
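For a processor that does check these settings, the two leader bytes can be patched after the export. The following Python sketch is not part of Malete or WinIsis; the names fix_leader and fix_file are hypothetical. It assumes that the exported file is a plain sequence of records with correct length fields and no extra bytes (such as line breaks) between them.

```python
def fix_leader(record: bytes) -> bytes:
    """Set indicator count and identifier length (leader bytes 10, 11) to 2."""
    if len(record) < 24:
        raise ValueError("record shorter than a 24 byte leader")
    return record[:10] + b"22" + record[12:]


def fix_file(data: bytes) -> bytes:
    """Patch every record in a stream, stepping by the leader's length field."""
    out, pos = [], 0
    while pos < len(data):
        total = int(data[pos:pos + 5])  # LLLLL: total record length
        if total < 24:
            raise ValueError("implausible record length at offset %d" % pos)
        out.append(fix_leader(data[pos:pos + total]))
        pos += total
    return b"".join(out)


# Hypothetical 30 byte record as WinIsis might write it ("00" at offsets 10, 11).
rec = b"00030nam  0000025   4500" + b"\x1e" + b"abcd" + b"\x1d"
print(fix_file(rec + rec)[:24])
```

Only two bytes per record change, so record lengths and dictionary offsets stay valid.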

Making MARC data available via Z39.50

MARC records can easily be made available using Index Data's zebra.
If records in your IIF file use tags and subfields conforming to, say, USmarc, simply check out the test/usmarc example in the zebra distribution. Put your data in the records subdir and run "zebraidx update records; zebrasrv".
If your data was exported from WinIsis, you may want to put a line "encoding Cp850" in the .abs file.

You must use recordType: grs.marc.something, meaning that it's general structured data in some marc file format. The sample usmarc.abs uses the "marc usmarc.mar" statement, and usmarc.mar (in the zebra/tab directory) contains "reference USmarc", stating that the marc input actually IS in USmarc. This need not be true; it just means that the records will be served as is if a client asks for USmarc. However, only the tags listed in "elm" statements in the .abs files will be indexed.

Note that zebra's indexing support is not as flexible as that of CDS/ISIS: you can only select fields or subfields to be indexed in one of a couple of modes (like word or phrase). To take full advantage of sophisticated CDS/ISIS FSTs, include them in your export reformatting FST. Use some otherwise unused field tags to hold the index terms and "elm" statements to map them to bib-1 attributes. Omit those fields from the display mapping.

To keep the data in its native format (say CDS), change the elm statements to map the fields to be indexed to the corresponding bib-1 attributes for searching, e.g. "elm 024 Conference-name !". Then, instead of using the "marc usmarc.mar" statement, create one or more maptabs to map the full record to USmarc and/or other presentation formats as applicable. Check out the example in the zebra/tab directory.

Consult the zebra documentation for details.


$Id: IIF.txt,v 1.6 2005/05/24 16:44:06 kripke Exp $