|
Protocol
|
The Malete server protocol.
| introduction |
The Malete server is based on passing of messages, which are represented
as records. The only interface to the server can be regarded as a single
function "send", which takes a record as parameter and returns a record.
The result record itself is a valid message.
This "send" can actually be invoked in one of two ways:
- by having the server in process
i.e. by actually calling the C function "send",
possibly via some wrapper to interface another programming language.
This is the way the Malete Tcl extension works.
- via some bytestream
This can be regarded as just one of the wrappers, interfacing a
bytestream by deserializing message records from the bytestream
and serializing result records to the bytestream.
The standard server process uses stdin and stdout and thus can
be invoked by executing it from pipes or by contacting it via TCP
when it is run under tcpserver.
As a special case, the record data file itself is such a bytestream,
however only containing simple write messages.
The server maintains a session state bound to a bytestream,
e.g. one TCP connection.
| messages and data |
In Malete, every record has a "header", which is the value of the first field.
The header specifies which message the record represents,
with the following fields ("body") containing parameter data for the message.
Recall that
- the first field's tag denotes the number of fields in the record
- a "data record" is a record that can be written to a database.
This requires a record id (MFN), which, however, can be 0
to denote an append with the next available id.
- for a data record read from or written to the database,
the header will/must be empty or start with a digit.
The general format is 'rid[@pos][*TAB*leader]'.
Rid is the record id (MFN), which on write may be 0 to append a new record.
Pos is the optional old position to guard an updating write
against concurrent changes.
Leader contains arbitrary data like e.g. a MARC leader,
a record key or a message header.
Proper message headers are not empty and do not start with a digit.
The first token of a message header (up to a *TAB* or end of value)
is the message name, optionally qualified by a message target,
i.e. an object to receive the message (usually a database).
However, messages and data are converted into each other canonically:
- If a data record header is encountered where a message is expected,
it is treated as a write message as if 'W*TAB*' were prepended
(which obviously will write just this record).
Even the empty message (a record with 0 fields) is a valid message
and will append an empty record when sent to a database.
- If a message is treated as data, its header is treated as leader
as if '0*TAB*' were prepended.
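The canonical conversion between data records and messages can be sketched as follows. This is only an illustration in Python, not the server's C code: the function names are hypothetical, headers are assumed to be plain strings with TAB separators, an empty header is assumed to mean record id 0 (append), and pos is kept as text because its format is not specified here.

```python
TAB = "\t"
DIGITS = "0123456789"

def is_data_header(header):
    """A data record header is empty or starts with a digit."""
    return header == "" or header[0] in DIGITS

def as_message(header):
    """Where a message is expected, a data header acts as if 'W<TAB>' were prepended."""
    return "W" + TAB + header if is_data_header(header) else header

def as_data(header):
    """Where data is expected, a message header acts as if '0<TAB>' were prepended."""
    return header if is_data_header(header) else "0" + TAB + header

def parse_data_header(header):
    """Split 'rid[@pos][<TAB>leader]' into (rid, pos, leader)."""
    ident, sep, leader = header.partition(TAB)
    rid_text, at, pos_text = ident.partition("@")
    rid = int(rid_text) if rid_text else 0   # assumption: empty header appends
    pos = pos_text if at else None           # kept as text, format not assumed
    return rid, pos, (leader if sep else None)
```

For example, as_data applied to the query header 'Q*TAB*foo' yields '0*TAB*Q*TAB*foo', i.e. an append whose leader is the original message header.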
| message targets are objects |
A server processes messages by first looking up a target object by
inspecting and stripping an initial addressing part of the message header
(or resorting to some default) and then passing the message to this object.
(Actually, even this dispatching is done by an object, the session).
In general, objects are free in how they process messages.
For example, an object might represent a (session on a) remote server,
and simply pass every message there. Objects using the same processing
function are said to be in the same "class". Commonly processing functions
handle only some known messages and pass anything else on to the function
of another class, which is called "inheriting from this class".
Objects to which messages can be sent are
- a structure
is a collection of other (child) objects like databases (tables).
It does basically nothing but pass messages to its children.
It may support a listing of the known children.
The structure interface may be implemented locally or as a remote server.
- a database (table)
supports reading and writing of record and query data.
A database is a structure; it may support children, e.g. to provide views.
- a session
is a structure representing the connection to a (local or remote) server.
It passes messages to the server's children (like databases) and maintains
some state, called the environment.
Any object should recognize
- the comment '#'
a special message used to pass additional info (echo/error)
A structure in addition recognizes
- child addressing '.'
if the message name starts with a letter and contains a dot '.',
everything up to the dot is taken as the name of a child.
After stripping the child qualifier, the message is sent to the child.
With no additional message, the child's existence is tested
and returned in a comment.
The qualification can contain several dots, which are processed from the left.
Therefore, 'a.b.c' means to send message 'b.c' to target 'a',
which could be for example a remote server, which in turn is expected
to somehow dispatch message 'c' to its local child 'b'.
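A minimal sketch of the child addressing step, assuming the header is a string and the message name is its first TAB-separated token. The function name split_child is hypothetical; a real structure would look the child up and pass the remaining header on to it.

```python
def split_child(header):
    """Return (child, remaining_header), or (None, header) if unqualified."""
    name, sep, rest = header.partition("\t")
    first = name[:1]
    # Only names starting with an ASCII letter and containing a dot are qualified.
    if first.isascii() and first.isalpha() and "." in name:
        child, _, message = name.partition(".")   # split at the leftmost dot
        return child, message + sep + rest
    return None, header

# 'a.b.c' is message 'b.c' for target 'a'; 'a' then sees child 'b', message 'c'.
child, remaining = split_child("a.b.c\t1")
```

An empty remaining message (as in 'db.') corresponds to the existence test described above.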
The standard messages a database should recognize are
- the write message W
writing one or more records to a database
- the read message R
reading records by record id
- the query message Q
to search the query data (btree index)
- the index message X
to write index data
- the terms message T
to list index entries
Standard message and object names always start with an ASCII letter.
As a convention, message names should start uppercase and
object names lowercase.
Every message returns an error comment message in case of error
or another message as specified (possibly the empty message).
The body of a message (i.e. the fields following the header)
may define a fixed or variable number of parameter fields
or one or more records, which are in turn, depending on the message,
used as message or data records (generally regardless of their contents):
- header only:
The message is not using any fields or records as parameter.
Such messages treat any body as embedded records (see below) specifying
one or more chained messages, which are then processed in turn.
A possible but currently unused generalization of this is
a fixed number of parameter fields.
- parameter list:
The contents of the following fields are interpreted by the message itself.
Many messages use only one type of parameter fields and ignore their tags.
- embedded records:
Each of the records begins with a proper header field,
with the tag being its negative length (including the header).
A tag of 0 is treated as using all available fields.
Should such a tag be positive or specify a length
exceeding the number of available fields, the result is undefined:
the server either reports an error or treats it as a record using
all available fields.
- immediate record:
Some messages also support a short form, where they do not themselves
take all of their header, but only chop off some initial part of it,
using the remaining message as record.
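The embedded record layout can be illustrated by a small splitter. This sketch assumes a body is a list of (tag, value) pairs and that the function name split_embedded is hypothetical. For a positive or oversized length tag it raises an error, which is one of the two behaviours the text allows.

```python
def split_embedded(fields):
    """Split a body (list of (tag, value) pairs) into embedded records."""
    records = []
    i = 0
    while i < len(fields):
        tag = fields[i][0]
        available = len(fields) - i
        if tag > 0 or -tag > available:
            # Undefined by the protocol; this sketch chooses to report an error.
            raise ValueError("bad embedded record length tag: %d" % tag)
        # The header's tag is the negative record length (header included);
        # a tag of 0 takes all remaining fields.
        length = -tag if tag < 0 else available
        records.append(fields[i:i + length])
        i += length
    return records
```

Each returned record starts with its header field, so the first value of every sublist can be handed to the message or data processing as appropriate.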
| write |
The write message takes one of two forms:
- short write (immediate record):
The header is of the form 'W*TAB*rid[@pos][*TAB*leader]',
and the following fields are the body of a record to write.
This message writes the record with header 'rid[@pos][*TAB*leader]'
and the body as given by the following fields.
It returns a short read message with the record id written.
- long write (embedded records):
The header is a single 'W'. The body contains any number of embedded records.
Long write returns a long read message with the record ids written.
With an empty body, long write can be used to test the existence
and writeability of a database.
Note that there is no special support for deleting records;
writing empty records has the same effect.
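As an illustration, the two write forms could be built like this. The sketch represents a message as a (header, body) pair with the body a list of (tag, value) fields; this representation and the helper names are assumptions made for the example, not part of the protocol.

```python
def data_header(rid=0, pos=None, leader=None):
    """Build 'rid[@pos][<TAB>leader]'."""
    header = str(rid)
    if pos is not None:
        header += "@" + str(pos)
    if leader is not None:
        header += "\t" + leader
    return header

def short_write(fields, rid=0, pos=None, leader=None):
    """Immediate record: 'W<TAB>' plus the data header, body is the record."""
    return ("W\t" + data_header(rid, pos, leader), list(fields))

def long_write(records):
    """Embedded records: header 'W', each record led by its negative length."""
    body = []
    for header, fields in records:
        fields = list(fields)
        body.append((-(len(fields) + 1), header))  # length includes the header
        body.extend(fields)
    return ("W", body)

# Deleting record 7 is just writing it with an empty body.
delete_7 = short_write([], rid=7)
```

Calling long_write with an empty list gives the bare 'W' message usable as an existence and writeability test.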
| read |
Like write, the read message takes one of two forms,
both returning a long write for the retrieved records:
- short read (header only):
The header is of the form 'R*TAB*rid[*TAB*count]'.
It reads count (default 1) records starting at record rid.
A count of 0 reads as many records as are available within the read limit.
Note that a read of record 0 retrieves the metadata.
- long read (parameter list):
The header is a single 'R'.
The following fields contain one record id each.
Note that
- the number of records read at once is limited
- read might retrieve older versions of records,
if the database has a snapshot position set
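The two read forms might be constructed as follows. As before, the (header, body) pair representation and the function names are assumptions for the sake of the example; the text does not say which tag the id fields of a long read carry, so tag 0 is used here.

```python
def short_read(rid, count=None):
    """Header only: 'R<TAB>rid[<TAB>count]'."""
    header = "R\t" + str(rid)
    if count is not None:
        header += "\t" + str(count)
    return (header, [])

def long_read(rids):
    """Parameter list: header 'R', one record id per following field."""
    return ("R", [(0, str(rid)) for rid in rids])  # tag 0 is an assumption

metadata_request = short_read(0)       # record 0 holds the metadata
as_many_as_allowed = short_read(1, 0)  # count 0: whatever the read limit permits
```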
| query |
The query message is of the form 'Q[*TAB*query]',
where query is an expression in the
Malete query language.
With parameters, the query message creates a new query as the current.
With or without parameters, the query message returns an echo
of the estimated remaining result set size, followed by a long write
containing the next 'r' records from the current query set
(subject to a snapshot like read).
The query can contain two parts, separated by a '?':
- an index based search defining a result set.
If it is empty, the search result set is the entire database.
- a filter to be applied on record retrieval.
If no filter is specified (i.e. no '?'), only record ids are returned.
An empty filter selects every record with all fields.
Other filters will select records and/or fields.
In future versions, one or both parts might be specified as embedded
records. For now, however, the query message is header only.
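The distinction between a missing and an empty filter can be made concrete with a small splitter. This is a rough sketch only: it assumes the target qualifier has already been stripped and that the first '?' in the query is the separator, which the query language itself may refine. The name split_query is hypothetical.

```python
def split_query(header):
    """Return None for a bare 'Q', else (search, filter_or_None)."""
    name, sep, query = header.partition("\t")
    if name != "Q":
        raise ValueError("not a query message")
    if not sep:
        return None                      # no parameters: continue current query
    search, mark, record_filter = query.partition("?")
    # No '?' at all: ids only. An empty string after '?': every record, all fields.
    return search, (record_filter if mark else None)
```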
Note that
- the size of a search result set is limited
This limit applies also to any intermediate result, thus the
actual set might be much smaller or even empty due to the limit.
Some search expressions might allow larger set sizes,
especially the empty one does (since no record ids need to be stored).
The returned echo contains several numbers:
- estimated number of remaining records, including the ones just read.
This number may be wrong for a number of reasons; in particular, it does
not account for filtering. However, if it equals the number of returned
records, it is safe to assume that there are no more records.
This number is the primary echo code; if it is negative,
the rest of the echo is some error message.
- number of the query, by which it can be referenced.
These numbers are per database.
- truncation record id. If not 0, this is a record id where the search
was truncated due to the result set size limit.
Future versions might support transparent continuation after truncation.
| terms |
The terms message has one of the forms
- 'T*TAB*from*TAB*to'
Selects terms greater than or equal to the first parameter and less than the second.
Where the second parameter is empty, no upper bound is used.
- 'T*TAB*prefix'
Selects terms with the parameter as prefix.
Using a prefix ABC is just a shorthand for from ABC to ABD.
- 'T*TAB*from*TAB*to*TAB*tag'
Like the first form, but restrict matches to the given tag (number).
Terms are returned as a list (record with 0-tagged fields),
where each field value is a count of hits of the term,
followed by a *TAB* and the term.
The list is limited to the result set size.
The full index can be looped by using the last returned term
as from parameter for the next invocation.
When not restricting to a tag, the hit count is just the number of all
index entries for the selected terms. This may be higher than the number
of matched records, where a term has multiple hits for the same records.
With a restriction to a tag, the count is the actual number of records
(even where a term has multiple entries for the same record and tag).
If the database uses the traditional fulltext index format (the default),
tag 0 selects any tag, else tag 0 selects actual tag 0 entries (unique keys).
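The prefix shorthand and the shape of the returned list can be sketched as follows. The helper names are hypothetical. The upper bound is computed by incrementing the last character's code point, which matches the ABC/ABD example but ignores any collation the index may use. Fields are assumed to be (tag, value) pairs.

```python
def prefix_range(prefix):
    """Turn a prefix into a (from, to) pair; an empty 'to' means no upper bound."""
    if not prefix:
        return "", ""
    # Assumption: plain code point order on the last character.
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)

def parse_terms(fields):
    """Each field value is 'count<TAB>term'; return a list of (count, term)."""
    terms = []
    for _tag, value in fields:
        count, _, term = value.partition("\t")
        terms.append((int(count), term))
    return terms

def next_from(terms):
    """Last returned term, to be used as 'from' when looping over the index."""
    return terms[-1][1] if terms else None
```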
| index |
The index message 'X' takes a parameter list of data and control fields.
Control fields have tag 0 and change the way the data fields are processed.
All other fields contain index data. During processing of the message,
a position counter is maintained which is incremented by one for every word
(in word or split mode), to the next multiple of the field step (default 65536)
for every field (1 in word mode), and reset to 0 on tag change.
Every control field contains one or more instructions
(as always, separated by TABs):
- f[pos]
sets default (full field) indexing mode where every data field contains
one index entry. The position is set to the given or 0 and then
incremented to the field step.
- w[pos]
Like field mode, but incrementing the position by one.
- s[pos]
Split mode, where each data field is split into words according
to collation info.
If the index has no collation info, all characters but the well-known
ASCII non-letters are assumed to be word characters.
- a[pos]
set add mode (default)
- d[pos]
set delete mode: following index entries are deleted.
- m[mode]
mode 'H' selects traditional conversion of angle brackets:
<a[=b]> is replaced by b (or nothing).
mode 'P' or none turns this off.
- p*pfx*
prepend prefix pfx to index entries
- r*id*
set record id (defaults to the session's last written record)
- [+|-]*tag*
where tag is a number, stops processing of the field and treats
everything after the next *TAB* as data field with *tag*.
With a leading + or -, set mode to add or del, resp.
Control instructions may also be part of the message header.
The index message echoes a count of the index entries made.
| comment |
The comment message '#' is used to augment other messages.
It is header only (executing any body) of the form '#*TAB*code[*TAB*message]',
where code is a number.
A nonnegative code indicates a success, typically some count.
A negative code indicates some sort of error (-1..-10) or notification.
Message is arbitrary.
This message copies itself to the result.
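A client might decode a comment header along these lines. The function names are hypothetical, and the classification reads "(-1..-10)" as the range of error codes with other negative codes being notifications, which is an interpretation of the text above rather than a confirmed rule.

```python
def parse_comment(header):
    """Split '#<TAB>code[<TAB>message]' into (code, message)."""
    name, _, rest = header.partition("\t")
    if name != "#":
        raise ValueError("not a comment message")
    code_text, _, message = rest.partition("\t")
    return int(code_text), message

def classify(code):
    """Assumed reading: >= 0 success, -1..-10 error, below that notification."""
    if code >= 0:
        return "success"
    return "error" if code >= -10 else "notification"
```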
| serialization |
Messages can be represented in byte streams according to the following rules:
- Field values (including the header) MUST NOT contain a newline character,
else the results are undefined. Where an application must be prepared
to handle newlines, it must take care of encoding them (see below).
- If the message header is empty, no header is printed
- else if the message is a regular message (not starting with a digit),
the header is printed followed by a newline.
- else 'W*TAB*' is printed followed by the header and a newline.
- All body fields are printed as the tag followed by a *TAB*,
the value and a newline.
- A single newline is printed to terminate the message.
On deserialization, if a message starts with a number (digit or -sign),
this is the tag of the first body field, and an empty header is to
be assumed (equivalent to a 'W*TAB*0' append message).
For all body fields, the deserialization must be done in the following steps:
- take an initial '-' sign and any digits as tag, defaulting to 0
- skip one following *TAB* character
- use anything up to a newline as value
Consequently, on serialization:
- a tag of 0 may and commonly will be omitted
- where a value does not start with a TAB or a digit,
the TAB may be omitted
- where a value does not start with a '-', digit or TAB,
both a 0 tag and the TAB may be omitted
- where values containing newlines are used unencoded,
they will in most cases result in following 0 tagged fields
However, omitting the TAB is considered bad style.
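The serialization and deserialization rules above can be followed step by step in a short sketch. It works on text strings for one message at a time, always prints tag and TAB (the recommended style), and uses hypothetical function names. It is meant to illustrate the rules, not to reproduce the server's parser.

```python
DIGITS = "0123456789"

def serialize(header, fields):
    """Render one message: optional header line, body lines, blank line."""
    if "\n" in header or any("\n" in value for _tag, value in fields):
        raise ValueError("newlines must be encoded before serialization")
    lines = []
    if header:
        # Data headers (starting with a digit) are printed as short writes.
        lines.append("W\t" + header if header[0] in DIGITS else header)
    lines.extend("%d\t%s" % (tag, value) for tag, value in fields)
    return "".join(line + "\n" for line in lines) + "\n"

def parse_field(line):
    """Take '-' and digits as tag (default 0), skip one TAB, rest is the value."""
    start = 1 if line.startswith("-") else 0
    end = start
    while end < len(line) and line[end] in DIGITS:
        end += 1
    tag = int(line[:end]) if end > start else 0
    if end < len(line) and line[end] == "\t":
        end += 1
    return tag, line[end:]

def deserialize(text):
    """Parse one serialized message into (header, fields)."""
    lines = []
    for line in text.split("\n"):
        if line == "":
            break                      # blank line terminates the message
        lines.append(line)
    header = ""
    if lines and lines[0][0] not in "-" + DIGITS:
        header = lines.pop(0)          # otherwise: empty header, i.e. an append
    return header, [parse_field(line) for line in lines]
```

Note that a data header such as '42' comes back as 'W*TAB*42', which is the canonical message form of the same record.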
The record data ("master") file is simply a stream of data record messages,
using headerless mode where possible (i.e. appends of leaderless records).
Some easy common encodings are suggested to deal with newline characters:
- in "field mode",
discard newlines by replacing them with spaces or tabs.
- in "text mode",
newlines are replaced with vertical tabs VT (ASCII 11, ^K).
This may be reversed to restore newline-separated lines if needed,
but e.g. on printing the VT will have the desired effect.
- in "binary mode",
newlines are replaced by VT followed by a byte value 1,
if the newline is followed by a byte value 0 or 1, else by a single VT.
A VT is replaced by a VT and a 0 byte.
- as an "ultra robust binary mode", use BASE64.
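The text and binary modes can be sketched on byte strings as shown here. The encoders follow the rules listed above; the binary decoder is inferred from them (VT followed by 0 is a VT, VT followed by 1 is a newline, any other VT is a newline) and is not spelled out in this document. All function names are hypothetical.

```python
NL, VT = 0x0A, 0x0B

def encode_text(data):
    """Text mode: every newline becomes a vertical tab."""
    return data.replace(b"\n", b"\x0b")

def decode_text(data):
    """Reverse of text mode; original VTs cannot be told apart from newlines."""
    return data.replace(b"\x0b", b"\n")

def encode_binary(data):
    """Binary mode: fully reversible, never emits a newline."""
    out = bytearray()
    for i, byte in enumerate(data):
        if byte == NL:
            out.append(VT)
            # Disambiguate only when the next original byte is 0 or 1.
            if i + 1 < len(data) and data[i + 1] in (0, 1):
                out.append(1)
        elif byte == VT:
            out += bytes((VT, 0))
        else:
            out.append(byte)
    return bytes(out)

def decode_binary(data):
    """Inferred inverse of encode_binary."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        i += 1
        if byte != VT:
            out.append(byte)
        elif i < len(data) and data[i] == 0:
            out.append(VT)
            i += 1
        elif i < len(data) and data[i] == 1:
            out.append(NL)
            i += 1
        else:
            out.append(NL)
    return bytes(out)
```

For input without the bytes 0, 1 and 11, encode_binary and encode_text produce identical output.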
The advantages of text mode over binary mode are
- it is slightly faster than the binary translation
- the serialized records do not need more space
(whereas the binary serialization might need twice the space)
The binary mode has the advantage of not losing vertical tab characters that
might have been contained in the original field values.
It is fully transparent and can be used to store any binary data like images
with an average overhead of 0.4% (as compared to +33% with BASE64 encoding).
Note that for a plain text not containing control characters 0, 1 or 11,
text and binary mode have the same results, thus it is reasonably safe
for client libraries to use binary mode by default on all communication.
However, BASE64 has the advantage of even surviving a character set recoding,
thus is more robust for databases which may be exchanged internationally.
Also the overhead of BASE64 is fixed to 33% (4 bytes for every 3),
while the binary mode has a worst case of +100% (on all VTs).
$Id: Protocol.txt,v 1.14 2005/05/24 16:44:06 kripke Exp $
|
|