Shark CGI Function 0.3 - Sat Dec 28 2002
Copyright (C) 2002,2003 Michel Blomgren
shark@zebra.ath.cx
http://linuxego.mine.nu/

SourceForge project pages:
http://sharkcgi.sourceforge.net
http://sourceforge.net/projects/sharkcgi/


NOTICE

This is the C source version of the Shark CGI Function. There is also an
assembler source version (currently only for x86 Linux). I've dubbed this C
source version to 0.2.x since it's derived from the assembly source. The
current assembler source version is 0.1b and can be downloaded from
SourceForge.

See further down for license information.


WHAT IS THE SHARK CGI FUNCTION?

It's a one line function for programming CGI programs in C. It extracts
variables and contents from GET, POST & multipart/form-data forms, including
cookies, and makes them accessible as environment variables.

The Shark CGI Function was designed to allow a web developer to write CGIs in
C instead of PHP, ASP, JSP, Perl, etc., since compiled programs are generally
faster and consume fewer resources than scripts that need an interpreter to
execute.


VERSION

This is version 0.3 (C source) of the Shark CGI Function. See the CHANGELOG
file for version changes/improvements/bugfixes, etc.


WHY "SHARK"?

Well, I admit that the word "shark" is a cliche, but my idea behind the name
was this: The function (that is the "Shark CGI Function") was supposed to be
fast and furious, yet simple and (hopefully) elegant, and that's, IMHO, a
shark. :)

In order not to confuse my product with any other product that just by accident
also happens to be named "Shark", the full name of the routine is the "Shark
CGI Function" and it should always be described as that. Though I myself like
to refer to it as simply "Shark"... jadajada...


FEATURES

       o	Shark handles GET and POST forms by setting up the variable
		name and the variable's content, e.g. for a POST form you'll
		have the environment variable "sharkp_varname=varcontent" and
		for a GET form you'll have the environment variable
		"sharkg_varname=varcontent".

       o	Shark handles multipart/form-data by extracting all variables
		and putting them in the environment (e.g.
		"sharkmulti_varname=varcontent"). It extracts files that have
		been uploaded and puts them in temporary files, which could
		easily be accessed by the "sharkfile_" environment variable
		(e.g. "sharkfile_varname=/tmp/shark.temp/upload.4531").

       o	It also handles cookies by extracting variables from
		";"-terminated or URL-encoded cookie strings (or a mix of
		both), e.g. "varname=varcontents; varname=varcontents;" or
		"cookievar1=hello&cookievar2=hello+world;".

       o	It's not necessary for you, the CGI coder, to remove any
		temporary files that have been created during a file upload.
		Shark will automatically remove any tempfiles that are over 30
		seconds old. This check is done every time Shark is executed.

       o	In case of error, Shark will report an "Apache-like" error
		message to the requester (the end-user).

       o	Variables that already exist in the environ will be numbered,
		e.g. if variable "sharkp_fruit" is already defined and another
		"sharkp_fruit" is about to be defined it will be named
		"sharkp_fruit2". This is good for "<select multiple>"
		input-fields.
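
The numbering rule for duplicate variables can be read back with plain
getenv(3). The following is a minimal sketch, not part of Shark: the helper
names count_field_values() and field_value() are hypothetical, and it assumes
that a third duplicate would be named "sharkp_fruit3", a fourth
"sharkp_fruit4", and so on without gaps.

```c
#include <stdio.h>
#include <stdlib.h>

/* Count how many values exist for one variable, e.g. "sharkp_fruit".
 * Assumption: duplicates are named name2, name3, ... without gaps. */
int count_field_values(const char *name)
{
	char key[256];
	int n;

	if (getenv(name) == NULL)
		return 0;

	for (n = 2; ; n++) {
		int len = snprintf(key, sizeof(key), "%s%d", name, n);
		if (len < 0 || len >= (int)sizeof(key) || getenv(key) == NULL)
			break;
	}
	return n - 1;
}

/* Return value number i (1-based) of a variable, or NULL if it is not set. */
const char *field_value(const char *name, int i)
{
	char key[256];
	int len;

	if (i < 1)
		return NULL;
	if (i == 1)
		return getenv(name);

	len = snprintf(key, sizeof(key), "%s%d", name, i);
	if (len < 0 || len >= (int)sizeof(key))
		return NULL;
	return getenv(key);
}
```

After shark() has returned, a CGI could loop from 1 to
count_field_values("sharkp_fruit") and call field_value() for each index to
walk through every selected option.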

BAD FEATURES (or "so many good features and no bad ones?"):

       o	I have to admit that the multipart/form-data function is
		currently too slow; ideas on making it faster are highly
		appreciated! Here are some stats...

		While imitating a CGI environment and emulating a 970KB upload
		of two files, I time(1)'d it and I got these "real" times on
		different systems (this is _excluding_ any potential upload
		time, which was not emulated):

		System                                          OS        real
		AMD Athlon XP 1800+                             Linux     0.163s
		COMPAQ AlphaServer DS20E 666 MHz (2 processors) Linux     0.183s
		375MHz PPC (604ev5 - MachV)                     Linux     0.49s
		TI UltraSparc II (BlackBird)                    Linux     0.34s
		R220 (SPARC)                                    SunOS 5.8 0.5s
		Dual Intel P3 Xeon 500MHz                       FreeBSD   0.98s

       o	One idea to make it faster would be to mmap(2) stdin instead of
		first allocating enough space with malloc(3) and then reading
		stdin into that buffer, which is probably somewhat slower.
		Though the major time-consumer is probably shark__locateit()
		used by shark__get_filename().
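
To make the mmap(2) idea concrete, here is a hedged sketch of a loader that
maps the input when it can and otherwise falls back to malloc(3) plus
read(2). It is not taken from shark.c; struct body, body_load() and
body_free() are hypothetical names. One caveat worth noting: mmap(2) needs a
regular file, and a web server typically hands the request body to a CGI
through a pipe or socket, so the fallback path is the one most likely to run.

```c
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

struct body {
	char *data;	/* request body, NOT null-terminated */
	size_t len;	/* number of bytes in data */
	int mapped;	/* 1 = release with munmap(), 0 = with free() */
};

/* Load len bytes from fd. Returns 0 on success, -1 on error.
 * The mapping always starts at file offset 0. */
int body_load(struct body *b, int fd, size_t len)
{
	struct stat st;
	size_t got = 0;

	b->data = NULL;
	b->len = len;
	b->mapped = 0;

	/* mmap(2) only works on a regular file that is large enough */
	if (len > 0 && fstat(fd, &st) == 0 && S_ISREG(st.st_mode) &&
	    (size_t)st.st_size >= len) {
		void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
		if (p != MAP_FAILED) {
			b->data = p;
			b->mapped = 1;
			return 0;
		}
	}

	/* fallback for pipes and sockets: allocate, then read */
	b->data = malloc(len ? len : 1);
	if (b->data == NULL)
		return -1;
	while (got < len) {
		ssize_t n = read(fd, b->data + got, len - got);
		if (n <= 0) {	/* error or premature end of input */
			free(b->data);
			b->data = NULL;
			return -1;
		}
		got += (size_t)n;
	}
	return 0;
}

/* Release whatever body_load() obtained. */
void body_free(struct body *b)
{
	if (b->data == NULL)
		return;
	if (b->mapped)
		munmap(b->data, b->len);
	else
		free(b->data);
	b->data = NULL;
}
```

In a CGI the call would look like body_load(&b, STDIN_FILENO, length), with
length parsed from the CONTENT_LENGTH environment variable.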

NOT SUPPORTED

       o	The "multipart/mixed" format is not supported yet, sorry!


The handler for "multipart/form-data" was designed to strictly conform to a few
RFC documents (mainly RFC 2388 and RFC 1867). This proved to be the right
choice, since most browsers comply with the standard (RFC 2388). If a browser
doesn't strictly conform to the RFC "multipart/form-data" format, the end-user
will get the error message "Your browser doesn't conform to the RFC standard
for multipart/form-data!".

Browsers that are confirmed to support Shark's "multipart/form-data" handler
(and comply with the RFC standard) are:

* Netscape 4.79 (for Linux)
* Netscape 6.2 (for Linux)
* Mozilla 1.0 (Mozilla/5.0 - for Linux)
* Konqueror 3.0.1 (for Linux)
* Internet Explorer 5.00.2919 (for Windows)
* Opera 6.02 (for Linux)

Basically, the only feature not supported (yet) is "multipart/mixed", which is
used for uploading more than one file per file-field, though not many browsers
support this feature anyway (I only know that Opera 6.02 supports it; none of
my other browsers do). Shark will print an error message if "multipart/mixed"
contents have been uploaded.


USAGE

Instead of giving you a dense syntax, let me show you this small program:

-------------------------------------------------------------------------------

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

extern int shark(void);

int main(void)
{
	char program[] = "env";
	pid_t pid;

	/* a simple way to call shark
	 * shark() always returns 0 if OK, 1 if an error occurred...
	 */
	if (shark())
		return(1);	/* return from main if an error occurred */

	printf("Content-type: text/plain\r\n\r\n");
	fflush(stdout);		/* we must flush it, or else it will be printed
				   at the bottom, which we don't want! */

	pid = fork();		/* create a child process */

	if (pid == -1)
		return(1);	/* fork() failed */

	if (pid == 0)
	{	/* child */
		execlp(program, program, (char *)NULL);
		/* only reached if execlp() failed */
		printf("error executing %s!\n", program);
		return(-1);
	}

	wait(NULL);		/* parent: wait for the child to finish */

	return 0;
}


-------------------------------------------------------------------------------

Make a form in an HTML file with a submit button and see what the environment
looks like after Shark has been at it.


THE ENVIRONMENT VARIABLES

sharkg_varname - For "method=GET" forms (or variables defined after a
                 query URL, e.g. "http://localhost/cgi?hello=world").

sharkp_varname - For "method=POST" forms.

sharkcookie_varname - Variable used to obtain the contents of cookies. Cookie
                      strings could either be a simple "varname=varcontents;
                      varname=varcontents;" string or URL-encoded, for example
                      "cookievar1=hello&cookievar2=hello+world;".
                      Either way, the variables will be extracted.

sharkmulti_varname - Basically the same as "sharkp_varname", but this time the
                     form had the 'enctype="multipart/form-data"' attribute
                     added to it, and is thus a multipart/form-data form
                     instead of a simple "method=POST" form.

sharkfile_varname - If a file was uploaded through an '<input type="file">'
                    field, this variable will not contain the file's contents,
                    but the complete path to a temporary file that was created.
                    The temporary file expires if it's more than 30 seconds
                    old the next time shark() is executed. Thus, your CGI has
                    30 seconds to copy the file to its (optionally) permanent
                    location. This gives you another advantage: you don't have
                    to remove the temporary file, Shark will remove it for you
                    when it's old enough to be removed.
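
Copying an upload out of the temporary directory before it expires could look
roughly like this. This is a sketch under assumptions, not Shark code:
save_upload() is a hypothetical helper, and it only relies on the
"sharkfile_varname" variable described above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copy the temporary file of upload field `field` to `dest`.
 * Returns 0 on success, -1 if the variable is unset or the copy failed. */
int save_upload(const char *field, const char *dest)
{
	char key[256], buf[4096];
	const char *tmp;
	FILE *in, *out;
	size_t n;
	int len, rc = 0;

	len = snprintf(key, sizeof(key), "sharkfile_%s", field);
	if (len < 0 || len >= (int)sizeof(key))
		return -1;

	tmp = getenv(key);		/* path to the temporary file */
	if (tmp == NULL)
		return -1;

	in = fopen(tmp, "rb");
	if (in == NULL)
		return -1;
	out = fopen(dest, "wb");
	if (out == NULL) {
		fclose(in);
		return -1;
	}

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			rc = -1;
			break;
		}
	}
	if (ferror(in))
		rc = -1;

	fclose(in);
	if (fclose(out) != 0)		/* catches deferred write errors */
		rc = -1;
	return rc;
}
```

The temporary file itself is left in place, since Shark removes it on a later
run once it is old enough.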


VARIABLES IN THE CODE (shark.c)

Here are a few variables you might want to take a look at before coding your
CGI, the defaults are...

    /* clean_up_interval = number of seconds to keep old tempfiles */
    #define clean_up_interval 30

    /* Bytes to reserve for "content" temporary buffer.
     * This means that the content (unescaped) in any form-field (e.g. a
     * <textarea>) can only be tempbuffer_size bytes long, anything over that
     * will be truncated.
     */
    #define tempbuffer_size 1024*16	/* 16KB */

    /* Maximum bytes to malloc(3) for multipart/form-data content */
    #define max_multipart_malloc 1024*1024*1	/* 1MB */

    ...

    static char tempdir[] = "/tmp/shark.temp";


Since version 0.3 you can re-define these with the "-D" flag when compiling
your CGI (quote the value so that the shell leaves the "*" characters alone),
e.g. by running...

gcc -Wall -s -O3 -D 'max_multipart_malloc=(1024*1024*15)' -o test.cgi test.c shark.c

List of all definitions you can define like above...

clean_up_interval		- How old, in seconds, files in the temp-dir
				  may be before deleted.
tempbuffer_size			- See shark.c for thorough information.
max_shark_environment_items	- Maximum number of environment items allowed
				  to be allocated.
max_post_malloc			- Maximum bytes to malloc(3) for complete
				  method=POST data.
max_multipart_malloc		- Maximum bytes to malloc(3) for
				  multipart/form-data content.
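
One common way to make a default overridable from the command line is an
"#ifndef" guard. Whether shark.c does exactly this is an assumption, so treat
the following as an illustration of the pattern only. upload_size_ok() is a
hypothetical helper, and the value is parenthesized so that it stays intact
inside larger expressions.

```c
/* Use the value given with -D if there is one, else fall back to 1MB. */
#ifndef max_multipart_malloc
#define max_multipart_malloc (1024*1024*1)
#endif

/* Returns 1 if a declared content length fits within the compiled-in
 * limit, 0 otherwise. */
int upload_size_ok(unsigned long content_length)
{
	return content_length <= (unsigned long)(max_multipart_malloc);
}
```

Compiled without any "-D" flag, the limit is 1MB; compiled with
-D 'max_multipart_malloc=(1024*1024*15)' the same function accepts up to 15MB.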


FUNCTIONS IN THE CODE (shark.c)

There are a few functions in Shark that you might find useful. They can all be
used in your CGI by simply declaring them "extern", same as with shark(). This
is the complete list of useful functions:

extern int shark(void);

extern int makelowercase(char *);
extern int makeuppercase(char *);
extern void unescape(char *);
extern char *findstringci(char *, char *);
extern void removechar(char *, unsigned int);
extern char *findzero(char *);
extern void removelastlf(char *);
extern char *percentencode(char *);
extern int cat(char *, FILE *);
extern void namealize(char *);
extern void truncatebuf(char *, unsigned int);
extern char *mallocfile(char *, unsigned int *);
extern char *getarg(char *, unsigned int delim, unsigned int arg);
extern char *gethttpvar(char *src, char *find);
extern void removedoubles(char *, unsigned int c);
extern void replacechar(char *srcstr, unsigned int from, unsigned int to);
extern int mknicefilename(char *);
extern char *sharksig(void);
extern char *sharkhtmlsig(void);
extern char *sharksig2(void);
extern void foreachline(FILE *strm, int(*function)(char *line, FILE *strm));
extern int checkemail(char *emailaddress);
extern int initsrand(void);
extern char *generatepasswd(void);
extern int rmrf(char *dir);


EXPLANATION

NOTE: compare() and comparez() have been removed from Shark 0.3 and have been
replaced by strncmp() and strcmp().


int makelowercase(char *string)
-------------------------------

	Makes all upper case letters lower case in a string.

	example usage:

	       char string[] = "TEST STRNG";

	       makelowercase(string);


int makeuppercase(char *string)
-------------------------------

	You get it...


void unescape(char *string)
---------------------------

	Unescapes a %-encoded string.
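
For readers unfamiliar with %-encoding, the idea can be sketched as follows.
This is not Shark's unescape(); sketch_unescape() is a hypothetical stand-in,
and turning "+" into a space is an assumption borrowed from how form data is
usually encoded.

```c
#include <ctype.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode a %-encoded string in place, e.g. "%41" becomes "A".
 * Malformed sequences such as "%zz" are copied unchanged. */
void sketch_unescape(char *s)
{
	char *out = s;

	while (*s) {
		int hi, lo;

		if (*s == '%' &&
		    (hi = hexval((unsigned char)s[1])) >= 0 &&
		    (lo = hexval((unsigned char)s[2])) >= 0) {
			*out++ = (char)(hi * 16 + lo);
			s += 3;
		} else if (*s == '+') {
			*out++ = ' ';	/* assumption: form-style encoding */
			s++;
		} else {
			*out++ = *s++;
		}
	}
	*out = '\0';
}
```

Decoding in place is safe because the decoded string is never longer than the
encoded one.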


char *findstringci(char *nullterminatedbuf, char *cmpagainst)
-------------------------------------------------------------

	Case-insensitively finds cmpagainst in nullterminatedbuf and returns a
	pointer to whatever is after that string (Note: this function malloc's
	memory).


void removechar(char *string, unsigned int c)
---------------------------------------------

	Removes every occurrence of char c from a null-terminated string.

	example usage:

	       char string[] = "TEST\nSTRNG";

	       removechar(string, 0x0a);


char *findzero(char *buf)
-------------------------

	Finds and returns a pointer to the null in a null-terminated string.


void removelastlf(char *src)
----------------------------

	Removes the last linefeed (0x0A).


char *percentencode(char *string)
---------------------------------

	URL-encodes a string for use in a web environment.

	char *string is a null-terminated string to URL-encode. percentencode()
	will return a pointer to the URL-encoded string (null-terminated), or
	NULL if malloc() couldn't allocate the maximum space needed to
	URL-encode the string.
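
The "maximum space needed" is three bytes per input byte plus the terminator,
since every byte can expand to "%XX" at most. A minimal sketch of such an
encoder follows; it is not Shark's percentencode(), sketch_percentencode() is
a hypothetical name, and the choice of which characters are left alone (the
RFC 3986 unreserved set) is an assumption.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd, URL-encoded copy of src, or NULL if malloc() failed.
 * The caller is responsible for free()ing the result. */
char *sketch_percentencode(const char *src)
{
	static const char hex[] = "0123456789ABCDEF";
	char *dst = malloc(strlen(src) * 3 + 1);	/* worst case */
	char *out = dst;

	if (dst == NULL)
		return NULL;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
			*out++ = (char)c;	/* safe as-is */
		} else {
			*out++ = '%';
			*out++ = hex[c >> 4];
			*out++ = hex[c & 15];
		}
	}
	*out = '\0';
	return dst;
}
```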


int cat(char *filename, FILE *out)
----------------------------------

	cat(), kind of like the program "cat", outputs a file to a stream
	(could be "stdout" or whatever).

	cat() will return != 0 if an error occurred while doing either fopen(),
	fread(), or fwrite() - "errno" will speak the truth about the error.

	The return codes from this function are as follows:

		0 = no error (everything OK).
		1 = couldn't open "filename".
		2 = couldn't fread().
		3 = couldn't fwrite().


void namealize(char *src)
-------------------------

	Converts a string like "michel blomgren" (or "MICHel   BLomgren") to
	"Michel Blomgren".

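Judging from the two examples above, the conversion both fixes the letter case
and collapses runs of spaces. A sketch of that behaviour, under that
assumption, could look like this (sketch_namealize() is a hypothetical name,
not the function in shark.c):

```c
#include <ctype.h>

/* Capitalize the first letter of every word, lower-case the rest, and
 * reduce any run of whitespace to a single space. Works in place. */
void sketch_namealize(char *s)
{
	char *begin = s, *out = s;
	int new_word = 1;	/* the next letter starts a word */

	for (; *s; s++) {
		unsigned char c = (unsigned char)*s;

		if (isspace(c)) {
			/* keep one space between words, none at the start */
			if (out > begin && out[-1] != ' ')
				*out++ = ' ';
			new_word = 1;
		} else {
			*out++ = (char)(new_word ? toupper(c) : tolower(c));
			new_word = 0;
		}
	}
	if (out > begin && out[-1] == ' ')
		out--;		/* drop a trailing space */
	*out = '\0';
}
```
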

void truncatebuf(char *buf, unsigned int n)
-------------------------------------------

	Null-terminates buf at offset (buf+n) if it is longer than n bytes.


char *mallocfile(char *file, unsigned int *filesize_out)
--------------------------------------------------------

	This routine works much like mmap(2) on a whole file, with one
	exception: it always adds a trailing zero (null, '\0') to the end of
	the allocated file to ensure that ascii routines never read past an
	allocated ascii file.

	Example usage:	unsigned int filesize;
			char *pointer;

			pointer = mallocfile("/etc/passwd", &filesize);

	mallocfile() returns the pointer to the read file (same as malloc(3))
	or NULL if an error occurred (for example, the file couldn't be stat'd
	or open'd). The filesize integer will contain the exact size in bytes
	of the file (as reported by stat(2); statbuf.st_size).
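
The description above amounts to stat(2), malloc(3) of size plus one, a read,
and a terminating null. Here is a sketch of that sequence, offered as an
illustration rather than as the code in shark.c (sketch_mallocfile() is a
hypothetical name, and refusing non-regular files is an added assumption):

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Read a whole file into malloc'd memory and null-terminate it.
 * Returns NULL on any error; the caller free()s the result. */
char *sketch_mallocfile(const char *file, unsigned int *filesize_out)
{
	struct stat st;
	FILE *fp;
	char *buf;
	size_t size;

	if (stat(file, &st) != 0 || !S_ISREG(st.st_mode))
		return NULL;
	size = (size_t)st.st_size;

	fp = fopen(file, "rb");
	if (fp == NULL)
		return NULL;

	buf = malloc(size + 1);		/* +1 for the trailing '\0' */
	if (buf == NULL || fread(buf, 1, size, fp) != size) {
		free(buf);		/* free(NULL) is harmless */
		fclose(fp);
		return NULL;
	}
	fclose(fp);

	buf[size] = '\0';
	*filesize_out = (unsigned int)size;
	return buf;
}
```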


char *getarg(char *, unsigned int delim, unsigned int arg)
----------------------------------------------------------

	getarg() returns one argument from a string whose arguments are
	separated by char 'delim'. It returns NULL if argument number 'arg'
	was not found or malloc() failed; otherwise it returns a
	null-terminated string with the argument, which should be free()'d.
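
The text does not say whether 'arg' counts from 0 or from 1, so the sketch
below simply assumes 0. It shows the general technique of picking one field
out of a delimited string; sketch_getarg() is a hypothetical name and not the
function in shark.c.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of field number arg (assumed 0-based) of s,
 * where fields are separated by delim. NULL if there is no such field. */
char *sketch_getarg(const char *s, unsigned int delim, unsigned int arg)
{
	const char *end;
	size_t len;
	char *copy;

	if ((char)delim == '\0')
		return NULL;	/* the terminator cannot be a delimiter */

	/* skip over `arg` delimiters */
	while (arg > 0) {
		s = strchr(s, (int)delim);
		if (s == NULL)
			return NULL;
		s++;
		arg--;
	}

	/* the field ends at the next delimiter or at the end of s */
	end = strchr(s, (int)delim);
	len = end ? (size_t)(end - s) : strlen(s);

	copy = malloc(len + 1);
	if (copy == NULL)
		return NULL;
	memcpy(copy, s, len);
	copy[len] = '\0';
	return copy;
}
```

With a line from /etc/passwd and ':' as delimiter, field 0 would be the user
name under this numbering.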


char *gethttpvar(char *src, char *find)
---------------------------------------

	gethttpvar() attempts to find a variable in a METHOD=GET or METHOD=POST
	type of string. If found, it will return an unescaped string with its
	contents, or NULL if not found.


void removedoubles(char *string, unsigned int c)
------------------------------------------------

	Removes every doubled occurrence of character c from a null-terminated
	string.


void replacechar(char *string, unsigned int from, unsigned int to)
------------------------------------------------------------------

	Replaces every occurrence of char 'from' in a null-terminated string
	with char 'to'.

	example usage:

		char string[] = "TEST\nSTRNG";

		replacechar(string, 0x0a, 0x20);


int mknicefilename(char *src)
-----------------------------

	Turns a windows/unix filename into a "nice" basename. For use with
	Internet Exploder when uploaded files' filenames are e.g.
	"C:\Windows\Dsktpp\Suckfile.BMP" - which will be simply
	"suckfile.bmp" when passed through this function.

	Returns != 0 if an error occurred (basename() failed), otherwise - if
	all is well - it returns 0.


char *sharksig(void) - including sharkhtmlsig(), sharksig2()
------------------------------------------------------------

	These functions return a pointer to a null-terminated string with the
	Shark version signature. sharkhtmlsig() will also print html-links to
	the project website.


extern void foreachline(FILE *strm, int(*function)(char *line, FILE *strm))
---------------------------------------------------------------------------

	Executes a user-specified function for each line in a stream. As long
	as the "function" returns (int) 0, foreachline() will continue to call
	it. If the "function" returns != 0, it will stop.

	foreachline() handles UNIX, DOS and/or Macintosh text files.

	e.g.:

		int myfunction(char *line, FILE *strm)
		{
			printf("%s\n", line);	// echo the line to stdout.
			return 0;
		}

		...

		foreachline(stdin, myfunction);


extern int checkemail(char *emailaddress)
-----------------------------------------

	Checks whether or not 'emailaddress' is an authentic e-mail address.
	checkemail() will check if the domain exists, but not whether an MX
	record exists or whether an SMTP server responds.

	Returns: 0 if 'emailaddress' is OK. != 0 if 'emailaddress' is not OK.


extern int initsrand(void)
--------------------------

	An attempt at initializing srand() with a random number without using
	/dev/random.


extern char *generatepasswd(void)
---------------------------------

	Generates a password. Returns NULL if strdup() failed. Before calling
	this function you must call initsrand().


extern int rmrf(char *dir)
--------------------------

	Will delete a path in the same manner as "rm -rf path" will. Returns -1
	on error or if all/some part(s) of a path could not be deleted. Errors
	occur when a syscall/function fails, e.g. unlink(), rmdir(), chdir(),
	getcwd(), etc.



COMPILATION AND APPLICATION

Shark was originally made using the Netwide Assembler. Approximately two weeks
later I re-wrote it in C using gcc v2.95.3. It is known to compile using:

	gcc version 2.95.3 20010315 (release)
	gcc version 2.95.4 20020320 [FreeBSD]
	gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-112)

You cannot use Sun's "ucbcc"! Why? Because the d_name in the dirent structure
starts 2 bytes off, so dp->d_name returns a pointer where the first two
characters of each filename are missing. Apart from this quirk, Shark compiles
and works correctly (proven and tested) with "ucbcc: Sun WorkShop 6 update 1 C
5.2 2000/09/11".

The simplest way to produce a CGI would be to create a directory for your CGI
and then extract Shark into that directory (tar -xzf sharkcgi-version.tar.gz).
Write your CGI (see the template under "USAGE" for an example) and then compile
it...

$ gcc -Wall -s -O3 sharkcgi-version/shark.c yourcgi.c -o yourcgi.cgi

...and you're done.


REPORTING BUGS

Reporting bugs or any other faults (e.g. RFC non-compliance, etc.) can be done
using SourceForge's trackers: http://sourceforge.net/tracker/?group_id=60803


LICENSE

The Shark CGI Function is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by the
Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

This function is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more
details.


CREDITS

This program could not have been made without these factors...

* Linux/GNU (without it I wouldn't have the knowledge I have today).
* The Netwide Assembler (nasm, the assembler in which Shark was originally
  written -  http://nasm.sourceforge.net/).
* The GNU tools, especially the GNU C compiler and linker.
* The Un-CGI by Steven Grimm (http://www.midwinter.com/~koreth/uncgi.html).
* NEdit (the editor used to write it - http://nedit.org/).
* GNU Window Maker (window manager for X - http://windowmaker.org/).
* Slackware (the GNU/Linux distro I use - http://slackware.org).

Also, last but not least, a big Thank You to SourceForge.net and DynDNS.org
respectively for great services!


The Shark CGI Function documentation was written by Michel Blomgren in 2002.
Contact: shark@zebra.ath.cx.

