GNI - coronita

coronita

coronita -- a tiny webserver

The current release 1.0.4 has been tested on Linux (2.4/x86), Solaris (10/x86 and T1) and Mac OS (10.4/PPC).

Coronita is

a standalone HTTP/1.0 webserver
heavily based on Unix fork(2).
Made for Linux and tested on Solaris; it should work on most other Unixes with some patching of the sendfile code.
pretty fast at serving static content
very fast at serving "NCGI" (similar to NPH CGI)
supporting SSL (by spawning sslio per connection)
not for the faint of heart.
For speed and flexibility there is next to no double-checking or error handling. Run buggy CGIs under some graceful wrapper. Contributions welcome: a Perl script to check your setup before running a server. See the issues discussed below.

overview

Coronita forks and monitors n children. The children loop accepting new connections.
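The pre-fork model can be sketched in C roughly as follows. This is a minimal illustration, not Coronita's source: `prefork`, `accept_loop` and the restart budget are hypothetical, and a real worker would run something like `accept_loop` on a listening socket inherited from the parent.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body of one worker: block in accept(2) on the listening socket
   inherited from the parent and serve connections one after another.
   serve() is a hypothetical per-connection handler. */
void accept_loop(int listen_fd, void (*serve)(int conn_fd))
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0)
            continue;              /* e.g. EINTR or ECONNABORTED: retry */
        serve(fd);
        close(fd);
    }
}

/* Start n workers and monitor them: a worker that does not exit cleanly
   is replaced, at most max_restarts times in total. Returns the number
   of workers reaped once none is left running. */
int prefork(int n, void (*worker)(void), int max_restarts)
{
    int running = 0, reaped = 0;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0) {            /* child: run the worker, never return */
            worker();
            _exit(EXIT_SUCCESS);
        }
        if (pid > 0)
            running++;
    }
    while (running > 0) {
        int status;
        if (wait(&status) < 0)     /* sketch: gives up on any wait error */
            break;
        running--;
        reaped++;
        int clean = WIFEXITED(status) && WEXITSTATUS(status) == 0;
        if (!clean && max_restarts-- > 0) {
            pid_t pid = fork();    /* replace the dead worker */
            if (pid == 0) {
                worker();
                _exit(EXIT_SUCCESS);
            }
            if (pid > 0)
                running++;
        }
    }
    return reaped;
}
```

Since every child blocks in accept(2) on the same socket, each new connection goes to one idle child; any accept serialization a real server might add is not shown.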

For each request, at most 8K bytes are read as headers.
Coronita serves HTTP/1.0, including HTTP/0.9 Simple-Requests and HTTP/1.0 persistent connections. Header unfolding and omitting the CR before LF are supported. All tabs are replaced by blanks; any other control character (0-31, 127) aborts the request.
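A sketch of the header normalization just described, assuming the (at most 8K) header block has already been read into a buffer with one spare byte for the terminating NUL. `normalize_header` is a hypothetical name; the exact order of checks in Coronita may differ.

```c
#include <string.h>

/* Normalize len bytes of header data in place (buf must hold len + 1
   bytes): CR LF becomes LF, a line starting with a blank or tab is joined
   to the previous one (unfolding), tabs become blanks, and any other
   control character (0-31, 127), including NUL, rejects the request.
   Returns the new length, or -1 to abort. */
long normalize_header(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        unsigned char next = i + 1 < len ? (unsigned char)buf[i + 1] : 0;

        if (c == '\r' && next == '\n')
            continue;                       /* drop the optional CR */
        if (c == '\n' && (next == ' ' || next == '\t'))
            c = ' ';                        /* folded line: join it */
        else if (c == '\t')
            c = ' ';                        /* blank all tabs */
        else if (c != '\n' && (c < 32 || c == 127))
            return -1;                      /* other control: abort */
        buf[out++] = (char)c;
    }
    buf[out] = '\0';
    return (long)out;
}
```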

Any Host header (up to a ':', defaulting to "default") must correspond to a subdirectory of "." whose name has at least 4 characters. The process chdir(2)s there and opens ".log" as fd 1.
An unclean Request-URI (or Host header) is refused (see below). After stripping the ?QUERY_STRING, the filename is stat(2)ed. If the specified file does not exist (and is not a dangling symlink), or if it is a directory, it's a 404.
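The Host-header rule can be illustrated with two small hypothetical helpers. Rejecting names that start with "." or contain "/" is an assumption about what "unclean" means for a Host header, as are lowercasing the directory name and opening .log without O_CREAT.

```c
#define _XOPEN_SOURCE 700
#include <ctype.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Derive the vhost directory from a Host header value (NULL if absent):
   cut at the first ':', default to "default", lowercase, and refuse names
   shorter than 4 characters or that could leave ".".
   Returns 0 on success, -1 if the request should be refused. */
int vhost_dir(const char *host, char *out, size_t outsz)
{
    size_t len;

    if (host == NULL || *host == '\0')
        host = "default";
    len = strcspn(host, ":");
    if (len < 4 || len >= outsz || host[0] == '.')
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)host[i];
        if (c == '/' || c < 32 || c == 127)
            return -1;
        out[i] = (char)tolower(c);
    }
    out[len] = '\0';
    return 0;
}

/* Enter the vhost directory and make its ".log" fd 1. */
int enter_vhost(const char *dir)
{
    int fd;

    if (chdir(dir) != 0)
        return -1;
    fd = open(".log", O_WRONLY | O_APPEND);   /* assumption: no O_CREAT */
    if (fd < 0)
        return -1;
    if (fd != 1) {
        if (dup2(fd, 1) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```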

Coronita uses the error file .404 (not found) to handle most special cases. This is usually an NCGI script.

If the file is a dangling symlink, a 301 redirect to the link contents is sent (so the link target should be a fully qualified URL, http://...).

If the file is a non-executable regular file:

a POST is handed over to .404
a file which is not world readable (S_IROTH) is handed over to .404
for a GET, If-Modified-Since is honoured with a 304, if appropriate
for a HEAD or (not simple) GET, an HTTP/1.0 header is written
for a GET, the file is copied or sendfile(2)d

Otherwise the file should be an executable regular file, which Coronita runs as an NCGI executable using fork, execve and wait.
Coronita forks a child (i.e. a grandchild of the main process) if the connection is blocking or is to be kept alive after the request. No keep-alive is attempted if the executable exits with a code other than 0.
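The dispatch rules for the stat(2) result, together with the keep-alive rule for executables, fit into a small decision function. The enum names are hypothetical, and treating any execute bit as "executable" and other non-regular files as 404 cases are assumptions.

```c
#define _XOPEN_SOURCE 700
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <time.h>

enum action {
    ACT_REDIRECT_301,   /* dangling symlink: Location = link contents */
    ACT_HANDLER_404,    /* hand over to .404 */
    ACT_NOT_MODIFIED,   /* 304 */
    ACT_HEADER_ONLY,    /* HEAD */
    ACT_SEND_FILE,      /* GET: header (unless simple request) + body */
    ACT_RUN_NCGI        /* executable: fork, execve, wait */
};

/* found: stat(2) succeeded; dangling: only lstat(2) did and it is a
   symlink; ims: If-Modified-Since time (0 if absent); lastmod: the time
   sent as Last-Modified. */
enum action classify(int found, int dangling, mode_t mode,
                     const char *method, time_t ims, time_t lastmod)
{
    if (dangling)
        return ACT_REDIRECT_301;
    if (!found || !S_ISREG(mode))             /* missing, directory, ... */
        return ACT_HANDLER_404;
    if (mode & (S_IXUSR | S_IXGRP | S_IXOTH)) /* assumption: any x bit */
        return ACT_RUN_NCGI;
    if (strcmp(method, "POST") == 0 || !(mode & S_IROTH))
        return ACT_HANDLER_404;
    if (strcmp(method, "GET") == 0 && ims != 0 && lastmod <= ims)
        return ACT_NOT_MODIFIED;
    return strcmp(method, "HEAD") == 0 ? ACT_HEADER_ONLY : ACT_SEND_FILE;
}

/* Keep the connection after an NCGI run only if the client asked for it
   and the executable exited with code 0 (status from waitpid(2)). */
int keep_alive_ok(int client_wants_keep_alive, int wait_status)
{
    return client_wants_keep_alive && WIFEXITED(wait_status)
           && WEXITSTATUS(wait_status) == 0;
}
```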

request line

The general syntax of a Request-URI ("abs_path") as of HTTP/1.0 (3.2.1) is / [path] [";" params] ["?" query].
Coronita follows "URI: Generic Syntax" (RFC 2396, sec. 3) by treating ";" as an ordinary character, and thus anything up to the first "?" as the filename (including any "params", which have no special semantics anyway).
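Splitting the Request-URI this way needs nothing more than a search for the first "?". A minimal sketch with a hypothetical `split_uri` helper:

```c
#include <string.h>

/* Split a Request-URI in place at the first '?'. ';' gets no special
   treatment, so any "params" stay part of the filename.
   Returns the query string (possibly ""), or NULL if there is no '?'. */
char *split_uri(char *uri)
{
    char *q = strchr(uri, '?');

    if (q == NULL)
        return NULL;
    *q = '\0';          /* uri now holds only the filename part */
    return q + 1;       /* later '?' characters belong to the query */
}
```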

RFC 1738 URLs (2.2) states that ``only (US_ASCII) alphanumerics, the special characters "$-_.+!*'(),", and reserved characters (";/?:@=&") used for their reserved purposes may be used unencoded within a URL´´. But ``on the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose´´.
RFC 2396 updates 1738 (and 1808, "relative URLs"), makes "$+," reserved and "~" unreserved, and states in sec. 2.3 that escaping the unreserved characters (US-ASCII alphanum and "-_.!~*'()") ``should not be done´´. Well.
HTTP/1.0 (3.2.1) refers to 1738 as the source of ``definitive information´´, and immediately violates it by allowing the unwise "[\]^`{|}~" and 128-255 as "national".

Anyway, according to HTTP/1.0 (5.1.2) an ``origin server must decode the Request-URI in order to properly interpret the request´´.

However, ``origin server´´ applies to the complete system including CGIs. Clearly the Request-URI's query part MUST be passed undecoded to CGIs; only its parts may be decoded by some application code, more or less carefully depending on the intended usage.
For the filename, there is no reason to decode it whatsoever, since user agents do not encode it (at least as long as it looks "reasonable", and especially does not contain blanks, which simply cannot be sent unencoded).
To be on the safe, fast and convenient side, Coronita recommends using URL-encoded representations of any weird characters both in the actual filenames and in URLs. Should some stupid user agent (probably IE 7) start to apply unwanted additional URL-encoding, the RFC's requirement can still be met by the .404 handler.

So, Coronita does not do any URL-decoding at all. In addition to refusing any controls, it refuses a filename (path) with a 400 if it does not start with "/" or if any segment starts with a "." or "/" (i.e. is empty).
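The refusal rules translate into a single pass over the path. `path_ok` is a hypothetical helper; note that it deliberately does not decode "%2e", in line with the text above.

```c
/* Return 1 if the path is acceptable, 0 if it should be refused with 400:
   it must start with '/', no '/' may be followed by '.' (dot-files, "..")
   or by another '/' (empty segment), and control characters are refused. */
int path_ok(const char *p)
{
    if (*p != '/')
        return 0;
    for (; *p != '\0'; p++) {
        unsigned char c = (unsigned char)*p;
        if (c < 32 || c == 127)
            return 0;
        if (c == '/' && (p[1] == '.' || p[1] == '/'))
            return 0;
    }
    return 1;
}
```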

request headers

Coronita honours Host, If-Modified-Since and Connection, and collects Referer, User-Agent and Cookie for logging. Content-Type and Content-Length are recognized for CGIs.

HTTP/1.0 specifies (RFC 1945, sec. 4.3, 5.2):

If-Modified-Since: honoured for GET/HEAD. Only the recommended RFC 1123 style dates are recognized, not the obsolete RFC 850 and asctime variants also allowed by RFC 1945, sec. 3.3.
Referer, User-Agent: logged
Date, Pragma, Authorization, From, Entity-Headers: ignored
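Recognizing only RFC 1123 dates keeps If-Modified-Since parsing to a fixed pattern. A sketch, assuming a parser built on sscanf and a portable days-from-civil calculation in place of the non-standard timegm(); `parse_rfc1123` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Days since 1970-01-01 for a Gregorian date (H. Hinnant's
   days_from_civil algorithm). */
static long days_from_civil(long y, unsigned m, unsigned d)
{
    y -= m <= 2;
    long era = (y >= 0 ? y : y - 399) / 400;
    unsigned yoe = (unsigned)(y - era * 400);
    unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
    unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (long)doe - 719468;
}

/* Parse an RFC 1123 date such as "Sun, 06 Nov 1994 08:49:37 GMT" into
   seconds since the epoch (the value CGIs see as X_IFMS). Returns -1 for
   anything else, including the obsolete RFC 850 and asctime forms. */
time_t parse_rfc1123(const char *s)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char wday[4], mon[4];
    int d, y, hh, mm, ss, n = 0;
    const char *p;

    /* %n is only reached if the trailing "GMT" matched */
    if (sscanf(s, "%3[A-Za-z], %2d %3[A-Za-z] %4d %2d:%2d:%2d GMT%n",
               wday, &d, mon, &y, &hh, &mm, &ss, &n) != 7 || n == 0)
        return -1;
    p = strstr(months, mon);
    if (strlen(mon) != 3 || p == NULL || (p - months) % 3 != 0)
        return -1;
    if (d < 1 || d > 31 || hh < 0 || hh > 23 || mm < 0 || mm > 59
        || ss < 0 || ss > 60)
        return -1;
    long days = days_from_civil(y, (unsigned)((p - months) / 3 + 1),
                                (unsigned)d);
    return (time_t)days * 86400 + hh * 3600L + mm * 60L + ss;
}
```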

HTTP/1.1 adds (RFC 2616, sec. 14.10 et al.):

Connection: close or keep-alive
Host: honoured
Range: ignored (scripts should check)
Transfer-Encoding: ignored (scripts handling POST should check)

(... and mucho more)

response headers

Coronita sends Date, Location, Content-Length, Content-Type, Last-Modified, Server, Connection and Keep-Alive. Dates are sent in RFC 1123 format.
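Producing an RFC 1123 date is the inverse operation. The sketch below uses fixed English day and month names rather than strftime's locale-dependent %a and %b; `format_rfc1123` is hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format t as an RFC 1123 date, e.g. "Sun, 06 Nov 1994 08:49:37 GMT".
   Returns the snprintf result (29 for such a date) or -1. */
int format_rfc1123(time_t t, char *buf, size_t n)
{
    static const char *const wday[] = { "Sun", "Mon", "Tue", "Wed",
                                        "Thu", "Fri", "Sat" };
    static const char *const mon[] = { "Jan", "Feb", "Mar", "Apr",
                                       "May", "Jun", "Jul", "Aug",
                                       "Sep", "Oct", "Nov", "Dec" };
    struct tm *tm = gmtime(&t);   /* static buffer: fine in a forking server */

    if (tm == NULL)
        return -1;
    return snprintf(buf, n, "%s, %02d %s %04d %02d:%02d:%02d GMT",
                    wday[tm->tm_wday], tm->tm_mday, mon[tm->tm_mon],
                    tm->tm_year + 1900, tm->tm_hour, tm->tm_min, tm->tm_sec);
}
```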

HTTP/1.0 specifies (RFC 1945, sec. 4.3, 6.2):

Date: current date (of receiving complete request header)
Location: used with redirect
Server: optionally set at compile time
WWW-Authenticate: not used

and the entity header fields (sec. 7.1):

Content-Length: file size
Content-Type: text/html or as of file extension
Last-Modified: file's ctime
Expires: current date + configured ttl
Allow, Content-Encoding: not used
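The Content-Type and Expires rules might look like this. The extension table is a small hypothetical stand-in for the compiled-in one, and the two TTL parameters correspond to the -h and -t options described under running and configuration.

```c
#include <string.h>
#include <time.h>

/* Hypothetical compiled-in table; Coronita's actual list is not shown here. */
static const struct { const char *ext, *type; } mime[] = {
    { "html", "text/html" }, { "htm", "text/html" },
    { "txt", "text/plain" }, { "css", "text/css" },
    { "gif", "image/gif" },  { "jpg", "image/jpeg" },
    { "png", "image/png" },
};

/* Content-Type by extension of the last path segment, else text/html. */
const char *mime_type(const char *name)
{
    const char *dot = strrchr(name, '.');
    const char *slash = strrchr(name, '/');

    if (dot != NULL && (slash == NULL || dot > slash))
        for (size_t i = 0; i < sizeof mime / sizeof mime[0]; i++)
            if (strcmp(dot + 1, mime[i].ext) == 0)
                return mime[i].type;
    return "text/html";
}

/* Expires = request date + ttl, using the text/html ttl (-h) or the
   ttl for all other types (-t). */
time_t expires_at(time_t now, const char *type, long ttl_html, long ttl_other)
{
    return now + (strcmp(type, "text/html") == 0 ? ttl_html : ttl_other);
}
```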

HTTP/1.0 additionally specifies (RFC 1945, sec. D.2) Accept, Accept-Charset, Accept-Encoding, Accept-Language, Content-Language, Link, MIME-Version, Retry-After, Title and URI, which are neither honoured nor sent, respectively.

HTTP/1.1 adds (RFC 2616, sec. 14.9):

Cache-Control: not used

CGIs, however, are free to send any headers they see fit, especially regarding Pragma: no-cache, Expires and Cache-Control. CGIs should NOT use a protocol version other than HTTP/1.0, as this would signal to proxies and clients capabilities we do not have.

CGI

Coronita supports a variant of non-parsed header (NPH) CGI called NCGI. NCGI scripts talk directly to the client socket on fd 0 and are responsible for logging to fd 1 (since Coronita does not learn the request's outcome). In other words, most of the CGI processing logic is moved from the webserver to the script, resulting in more speed and flexibility.
As a special application NCGI scripts can hijack the connection by backgrounding themselves and returning non-zero to Coronita, which is useful for large downloads, streaming or chat sessions.
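To make the division of labour concrete, here is a sketch of the reply half of an NCGI script in C: it writes a complete HTTP/1.0 response itself to the client socket (fd 0 in a real script) and returns the byte count a script would need for its own log line on fd 1. `ncgi_reply` and its parameters are hypothetical; passing NULL for proto models an unset SERVER_PROTOCOL, for which only the body is sent.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a minimal response to `client`. Returns the total number of bytes
   written (headers included, as in the log's size field) or -1. */
long ncgi_reply(int client, const char *proto, const char *date,
                const char *body)
{
    char head[256];
    size_t blen = strlen(body);
    long total = 0;

    if (proto != NULL) {        /* not a Simple-Request: send a header */
        int hlen = snprintf(head, sizeof head,
                            "HTTP/1.0 200 OK\r\nDate: %s\r\n"
                            "Content-Type: text/plain\r\n"
                            "Content-Length: %zu\r\n\r\n", date, blen);
        if (hlen < 0 || (size_t)hlen >= sizeof head)
            return -1;
        if (write(client, head, (size_t)hlen) != hlen)
            return -1;
        total += hlen;
    }
    if (write(client, body, blen) != (ssize_t)blen)
        return -1;
    return total + (long)blen;
}
```

The status line always says HTTP/1.0, whatever the client sent, in line with the advice on protocol versions above.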

The supported environment variables are listed below.
The fixed environment is passed on unmodified, as inherited. Suggested values to impress your CGI scripts are:

SERVER_SOFTWARE: Coronita
GATEWAY_INTERFACE: CGI/1.1

Per connection (unchanged on subsequent keep-alive requests):

REMOTE_ADDR: client IP address as of RFC 2373
REMOTE_HOST, REMOTE_IDENT: not used
X_CONN: the port, with a '$' prepended, if using a filter like sslio

Per request:

REQUEST_METHOD: GET, HEAD or POST
SCRIPT_NAME: the unmodified Request-URI up to ? for a found CGI
PATH_INFO: the unmodified Request-URI up to ? for a .404
QUERY_STRING: everything after the ?
SERVER_NAME: from the Host header or "default", lowercased, with any port stripped off into SERVER_PORT.
SERVER_PROTOCOL: HTTP/rev (actually the CLIENT protocol). This is unset for an HTTP/0.9 (Simple-)Request, indicating that a Simple-Response (entity body only) is expected. It is suggested that subrequests unset SERVER_PROTOCOL to get bodies only, or use INCLUDED to ask for naked bodies (HTML fragments without header).
CONTENT_LENGTH, CONTENT_TYPE: as given
PATH_TRANSLATED, AUTH_TYPE, REMOTE_USER: not set by Coronita

Coronita sets additional per-request variables:

X_KALI: request number (1, 2, ...) in the connection, if keep-alive (unset otherwise)
X_DATE, X_GTFM: request date in RFC 1123 (for response) and GTF+msec (for log)
X_IFMS: value of If-Modified-Since in secs

In general,

http$S://$SERVER_NAME[:$SERVER_PORT]$SCRIPT_NAME$PATH_INFO[?$QUERY_STRING]

should reference the current resource (with S = 's' if X_CONN starts with a '$').
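A sketch of that reconstruction from the variables above; `self_url` is a hypothetical helper, and NULL or empty strings stand for unset variables.

```c
#include <stdio.h>
#include <string.h>

/* Rebuild the URL of the current resource. S is "s" when X_CONN starts
   with '$' (the connection went through a filter like sslio); an unset
   port or query is left out together with its ':' or '?'. */
int self_url(char *buf, size_t n, const char *x_conn,
             const char *server_name, const char *port,
             const char *script_name, const char *path_info,
             const char *query)
{
    int tls = x_conn != NULL && x_conn[0] == '$';
    int has_port = port != NULL && *port != '\0';
    int has_query = query != NULL && *query != '\0';

    return snprintf(buf, n, "http%s://%s%s%s%s%s%s%s",
                    tls ? "s" : "", server_name,
                    has_port ? ":" : "", has_port ? port : "",
                    script_name ? script_name : "",
                    path_info ? path_info : "",
                    has_query ? "?" : "", has_query ? query : "");
}
```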

According to RFC 1945, sec. 4.2, header (field) names are tokens and may thus contain not only alphanumerics and "-", but also "!#$%&'*+.^_`|~". Coronita ignores (silently drops) such headers.
According to CGI/1.1, sec. 4.1.18, all other headers (i.e. all but the special Connection, Host, If-Modified-Since, Content-Length and Content-Type) are passed per request with HTTP_ prepended, all letters uppercased and any "-" converted to "_" (in the header name).
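The header-name mapping can be sketched as below. `cgi_env_name` is hypothetical; returning -1 covers both the silently dropped headers and the five that Coronita handles itself (Content-Length and Content-Type become CONTENT_LENGTH and CONTENT_TYPE instead).

```c
#include <ctype.h>
#include <string.h>

/* Map a header name to its CGI variable, e.g. "Accept-Language" to
   "HTTP_ACCEPT_LANGUAGE". Returns 0 on success, or -1 if the header is
   not passed this way: token characters other than alnum and '-', one of
   the special headers, or a name too long for out. */
int cgi_env_name(const char *name, char *out, size_t outsz)
{
    static const char *const special[] = {
        "CONNECTION", "HOST", "IF_MODIFIED_SINCE",
        "CONTENT_LENGTH", "CONTENT_TYPE"
    };
    size_t len = strlen(name);

    if (len == 0 || len + 6 > outsz)        /* "HTTP_" + name + NUL */
        return -1;
    memcpy(out, "HTTP_", 5);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (isalnum(c))
            out[5 + i] = (char)toupper(c);
        else if (c == '-')
            out[5 + i] = '_';
        else
            return -1;                      /* e.g. '_' or '.': dropped */
    }
    out[5 + len] = '\0';
    for (size_t k = 0; k < sizeof special / sizeof special[0]; k++)
        if (strcmp(out + 5, special[k]) == 0)
            return -1;
    return 0;
}
```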

Standard parsed header CGI scripts must be run by some wrapper, which creates a proper HTTP response, does logging, compensates for a couple of deviations from RFC 3875 "MUSTs" and may even support the command line (NPH scripts may check QUERY_STRING for a "=").

References:
Uniform Resource Locators (URL) (RFC 1738) and update thereof
HTTP/1.0 (RFC 1945)
HTTP/1.1 (RFC 2616)
HTTP Authentication (RFC 2617)
CGI ("current", RFC 3875)
CGI 1.1 (old draft)
HTML 4.01 forms
(or go to RFC.net for nice HTML versions with useful links)

logging

The log is written as tab-separated lines to fd 1 (i.e. vhost/.log).
Every line starts with

X_GTFM (request time with msec YYYYMMDDhhmmssttt)
REMOTE_ADDR
X_CONN[#X_KALI] (e.g. port#request)
msec (log time - request time)
code (3-digit response code or something like "LOG", "DBG")
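The fixed prefix of every log line could be produced as follows. `log_prefix` is hypothetical, and treating X_GTFM as UTC is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Write X_GTFM, REMOTE_ADDR, X_CONN[#X_KALI], msec and code, tab
   separated. kali <= 0 means no keep-alive (no "#n" suffix);
   elapsed_msec is log time minus request time. */
int log_prefix(char *buf, size_t n, time_t req_sec, int req_msec,
               const char *addr, const char *conn, int kali,
               long elapsed_msec, const char *code)
{
    char k[16] = "";
    struct tm *tm = gmtime(&req_sec);

    if (tm == NULL)
        return -1;
    if (kali > 0)
        snprintf(k, sizeof k, "#%d", kali);
    return snprintf(buf, n, "%04d%02d%02d%02d%02d%02d%03d\t%s\t%s%s\t%ld\t%s",
                    tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                    tm->tm_hour, tm->tm_min, tm->tm_sec, req_msec,
                    addr, conn, k, elapsed_msec, code);
}
```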

0xx codes are used for severe errors (not generating a response). Scripts may use non-numeric response codes for debug logging.
Request (i.e. non-debug) log lines continue with

size (complete response incl. headers)
B, H, G<ims> or P<cl> for body (simple), head, get and post, respectively
REQUEST_URI (up to '?')
QUERY_STRING
SERVER_PROTOCOL (with HTTP/1. stripped)
Refer(r)er
User-Agent
Cookie

and optionally (in that suggested order)

REMOTE_USER
remarks (e.g. session id)
other notes like some post data

running and configuration

CC='diet -Os gcc' make
env - SERVER_SOFTWARE=for-CG-eyes-only bin/coronita -p8080

runs coronita listening on port 8080 (any address).
Per-extension and default MIME types are compiled in.
The command line takes -options in any order. Everything from the first argument not starting with a '-' onwards is spawned as a filter for every connection, with fd 0 connected to the client and fd 1 to one end of a socketpair (to Coronita).

Each option is a single letter immediately followed by a number. Options are:

a address (IP) as a decimal number, e.g. 2130706433 for localhost (printf %d 0x7f000001)
b # backgrounded connections per child (default 256)
c # children to spawn (default 8)
h ttl for text/html (time to live in seconds) default 0 (now)
i initial timeout in msec before backgrounding after accept (default 10)
p port
s sendfilelimit: larger files use sendfile (default 24K)
t ttl for other types (default 86400 - one day)
u uid to set after binding the socket (if real and effective uid are 0)
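The option syntax is simple enough to parse by hand. A sketch with a hypothetical `struct opts`; reading the 24K sendfile default as 24 * 1024 bytes is an assumption.

```c
#include <stdlib.h>

/* Hypothetical holder for the options above, with their documented
   defaults; 0 for -a means any address, -1 for -u means "not given". */
struct opts {
    long long a, b, c, h, i, p, s, t, u;
};

/* Parse "-<letter><number>" options in any order. Returns the index of
   the first argument not starting with '-' (the filter command, e.g.
   sslio), argc if there is none, or -1 for an unknown or empty option. */
int parse_opts(int argc, char **argv, struct opts *o)
{
    struct opts def = { 0, 256, 8, 0, 10, 0, 24 * 1024, 86400, -1 };
    int k;

    *o = def;
    for (k = 1; k < argc && argv[k][0] == '-'; k++) {
        char letter = argv[k][1];
        long long v;

        if (letter == '\0')
            return -1;
        v = strtoll(argv[k] + 2, NULL, 10);
        switch (letter) {
        case 'a': o->a = v; break;
        case 'b': o->b = v; break;
        case 'c': o->c = v; break;
        case 'h': o->h = v; break;
        case 'i': o->i = v; break;
        case 'p': o->p = v; break;
        case 's': o->s = v; break;
        case 't': o->t = v; break;
        case 'u': o->u = v; break;
        default:  return -1;        /* unknown option letter */
        }
    }
    return k;
}
```

In a server, argv + k (when k < argc) would then be handed to execvp(3) in each connection's filter process.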

If the effective uid is 0, as required to bind reserved ports, Coronita chroots to '.' and then sets its uid to the real uid (if that is not 0, i.e. you made coronita setuid) or to the uid given with -u (if the real uid is 0).
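The uid rule above can be expressed as a small decision function plus the actual privilege drop. Both functions are hypothetical sketches; the order matters, since chroot(2) requires root and must therefore happen before setuid(2).

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <unistd.h>

/* Which uid to switch to after bind(2): (uid_t)-1 means no switch
   (not started as effective root); otherwise the real uid if non-zero
   (setuid binary), else the -u value. */
uid_t target_uid(uid_t ruid, uid_t euid, uid_t uopt)
{
    if (euid != 0)
        return (uid_t)-1;
    return ruid != 0 ? ruid : uopt;
}

/* chroot to the current directory while still root, then drop to uid. */
int drop_root(uid_t uid)
{
    if (chroot(".") != 0)
        return -1;
    if (uid != (uid_t)-1 && setuid(uid) != 0)
        return -1;
    return 0;
}
```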

env - bin/coronita -p443 -u65535 bin/sslio 'cert.pem;ca.pem'

runs coronita as an SSL server using the certificate chain from the files cert.pem and ca.pem and the default keyfile priv.pem.
Note: since this runs chrooted to ., you have to set up a minimal environment for sslio (instructions for sslio on Linux with dietlibc):

mkdir dev
mknod dev/random c 1 8
mknod dev/urandom c 1 9
mkdir proc
cp /proc/cpuinfo proc

Sslio also expects to find the files cert.pem and priv.pem, as found in the MatrixSSL distribution. Your scripts may also require some files in /etc (like passwd, group, resolv.conf) and/or /tmp and /var.

speed

We performed speed tests comparing coronita against next to anything we could get hold of, and found it outperforming most others in most tests (OneK.txt with 1000 bytes, LICENSE (GPL).txt with 15145 bytes, rfc1945.txt with 137582 bytes) when it comes to keep-alive and/or CGI.
With keep-alive on OneK.txt, 10 httpds and ab (Apache Bench) at concurrency 10, we found coronita to top 10,000 requests/sec on average (12,000 sometimes) on an 800MHz Pentium III, kernel 2.4.13.
See FeFe's tests for other numbers.

The http_load tests (no keep-alive) from the thttpd benchmarks showed Coronita to be on par with thttpd for static pages without keep-alive: doing the small test with -parallel 10, 100 and 250, thttpd went from about 3000/sec down to about 1700, Coronita from about 3700 to 2300, but a little less stable. However, these tests fry the TCP stack anyway.

In another test we compared coronita to lighttpd on a much faster machine (P4 2.8 GHz, kernel 2.6.13) with keep-alive on a 4K image. With concurrency 100, coronita does more than 13,500/sec, while lighty makes 11,500/sec. With concurrency 1000, numbers drop to 12,000 and 10,800, respectively.

On a Sun Fire X4200 with two single-core Opteron 248 CPUs running Solaris 10 we got more than 30,000 requests/sec (-k -c100, 1K file, almost the same locally and remotely over Gbit Ethernet) and still a whopping 12,500 without keep-alive. However, once 32K sockets are in TIME_WAIT, throughput drops to some 500 requests a second.

discussion

As an HTTP daemon, Coronita only catches the worst input. For example, it completely ignores any URL-encoding issues. The idea is that applications need to check anyway, e.g. when processing POST data.
No attempt is made to prevent stupid or malicious users from symlinking /etc/passwd and the like, since a webserver cannot prevent their CGIs from doing much more evil. Run a chrooted/ulimited/niced/quotad... httpd for untrusted vhosts.

Coronita strictly limits the amount of input and uses buffers of fixed but sufficiently large size for further processing. Please check the code for errors in these calculations!

In general, functionality like index.html, directory indexes, checking for scripts somewhere in the path, authorization etc. should be added via .404.
The only extension considered is an external connection manager that exchanges connections with coronita via a Unix socket. This would allow for throttling like thttpd does, and for delaying close to reduce the number of sockets in TIME_WAIT.

download

source (45K), static coronita binary (Linux/dietlibc, 13K), static sslio binary (48K) and the htld hypertext linker binary (10K)

friends

We owe much to FeFe's fnord and dietlibc
See also: lighttpd, mathopd, thttpd, boa, webfs, publicfile

$Id: coronita.txt,v 1.9 2006/03/04 15:19:00 krip Exp $