libu8
u8streamio.h File Reference

These functions and macros support I/O with UTF-8 streams.

Data Structures

struct  U8_STREAM
 struct U8_STREAM is an abstract structural type which is extended by U8_INPUT and U8_OUTPUT.
 
struct  U8_OUTPUT
 struct U8_OUTPUT is a structural type which provides for UTF-8 output.
 
struct  U8_INPUT
 struct U8_INPUT is the structure used for stream-based UTF-8 input.
 

Macros

#define U8_STREAM_MALLOCD   0x01
 This bit describes whether the stream is malloc'd or static.
 
#define U8_OUTPUT_STREAM   0x02
 This bit describes whether the stream is an output or input stream.
 
#define U8_FIXED_STREAM   0x04
 This bit describes whether the stream can grow to accommodate more input or output.
 
#define U8_STREAM_OWNS_BUF   0x08
 This bit describes whether the stream is responsible for freeing its buffer when closed.
 
#define U8_STREAM_OWNS_XBUF   0x10
 This bit describes whether an XFILE stream is responsible for freeing its translation buffer when closed.
 
#define U8_STREAM_OWNS_SOCKET   0x20
 This bit describes whether an XFILE stream is responsible for closing its socket/file descriptor when closed.
 
#define U8_STREAM_CAN_SEEK   0x40
 This bit describes whether seeks are possible on an XFILE's underlying socket/file descriptor.
 
#define U8_STREAM_CRLFS   0x80
 This bit describes whether the XFILE should do CRLF translation.
 
#define U8_STREAM_TACITURN   0x100
 This bit describes a verbosity level for the stream.
 
#define U8_STREAM_UTF8WARN   0x200
 This bit describes whether the stream should emit warnings for invalid UTF-8 bytes or sequences.
 
#define U8_STREAM_UTF8ERR   0x400
 This bit describes whether the stream generates an error and stops on invalid UTF-8.
 
#define U8_STREAM_UTF8FIX   0x800
 This bit describes whether the stream should try to fix UTF-8 errors.
 
#define U8_STREAM_OVERFLOW   0x1000
 This bit describes whether the stream is fixed length and has overflowed.
 
#define U8_INIT_OUTPUT(s, sz)   U8_INIT_OUTPUT_X((s),sz,NULL,0);
 Initializes a string output stream with a particular initial size. This always allocates a buffer and arranges for the buffer to grow as needed.
 
#define U8_INIT_OUTPUT_BUF(s, sz, buf)   U8_INIT_OUTPUT_X((s),sz,buf,0);
 Initializes a string output stream with an initial buffer.
 
#define U8_INIT_STATIC_OUTPUT(s, sz)   {memset((&s),0,sizeof(s)); U8_INIT_OUTPUT_X((&s),sz,NULL,0);}
 Initializes a string output stream with a particular initial size. This always allocates a buffer and arranges for the buffer to grow as needed.
 
#define U8_INIT_FIXED_OUTPUT(s, sz, buf)   U8_INIT_OUTPUT_X(s,sz,buf,U8_FIXED_STREAM)
 Initializes a string output stream with a fixed-size buffer. This stream will discard content after the buffer is exhausted.
 
#define u8_outstring(s)   ((s)->u8_outbuf)
 Returns the string content of the output stream.
 
#define u8_outlen(s)   (((s)->u8_outptr)-((s)->u8_outbuf))
 Returns the length in bytes of the string content of the output stream.
 
#define U8_INIT_INPUT(s, n)   U8_INIT_INPUT_X(s,n,NULL,0)
 Initializes an input stream with a buffer of n bytes. This allocates the buffer and sets the stream's U8_STREAM_OWNS_BUF bit.
 
#define u8_getrec(f, eos)   (u8_gets_x(NULL,0,f,eos,NULL))
 Returns a UTF-8 string from f terminated by eos or the end of the stream.
 
#define u8_gets(f)   (u8_gets_x(NULL,0,f,"\n",NULL))
 Returns a UTF-8 string from f terminated by a newline or the end of the stream.
 
#define u8_current_output   u8_global_output
 This macro (which looks like a variable) refers to the default output for the current thread or the global output (when no thread default has been defined).
 
#define u8_current_output   ((u8_default_output)?(u8_default_output):(u8_global_output))
 This macro (which looks like a variable) refers to the default output for the current thread or the global output (when no thread default has been defined).
 

Typedefs

typedef struct U8_STREAM U8_STREAM
 struct U8_STREAM is an abstract structural type which is extended by U8_INPUT and U8_OUTPUT.
 
typedef struct U8_OUTPUT U8_OUTPUT
 struct U8_OUTPUT is a structural type which provides for UTF-8 output.
 
typedef struct U8_INPUT U8_INPUT
 struct U8_INPUT is the structure used for stream-based UTF-8 input.
 

Functions

U8_EXPORT U8_OUTPUT * u8_open_output_string (int initial_size)
 Allocates and opens an output string with an initial size.
 
U8_EXPORT U8_INPUT * u8_open_input_string (u8_string input)
 Opens an input stream reading characters from the UTF-8 string input.
 
U8_EXPORT u8_string u8_gets_x (u8_byte *buf, int len, struct U8_INPUT *f, u8_string eos, int *sizep)
 Reads a string from f into buf up to the string eos.
 
U8_EXPORT int u8_ungetc (struct U8_INPUT *f, int c)
 Puts the character c back into the input stream f.
 
U8_EXPORT int u8_probec (struct U8_INPUT *f)
 Returns the next character to be read from f, fetching more data if needed.
 
U8_EXPORT int u8_peekc (struct U8_INPUT *f)
 Returns the next character to be read from f, without attempting to fill the buffer.
 
U8_EXPORT void u8_set_global_output (u8_output out)
 Sets the global output stream to out.
 
U8_EXPORT void u8_set_default_output (u8_output out)
 Sets the default output stream (for the current thread) to out.
 
U8_EXPORT void u8_reset_default_output (u8_output out)
 Resets the default output stream (for the current thread) to out.
 
U8_EXPORT U8_OUTPUT * u8_get_default_output (void)
 Gets the default output stream for the current thread.
 
U8_EXPORT int u8_get_entity (U8_INPUT *in)
 Reads and interprets an XML character entity from in.
 

Variables

U8_EXPORT u8_output u8_global_output
 This variable is the global output stream (a pointer to a U8_OUTPUT structure or equivalent)
 

Detailed Description

These functions and macros support I/O with UTF-8 streams.

The files here provide a generic buffered I/O layer and immediate operations with in-memory streams writing to UTF-8 byte buffers. Within libu8, this provides support for "xfiles" which provide automatic conversion to/from external character encodings.

Macro Definition Documentation

#define U8_FIXED_STREAM   0x04

This bit describes whether the stream can grow to accommodate more input or output.

#define u8_getrec (   f,
  eos 
)    (u8_gets_x(NULL,0,f,eos,NULL))

Returns a UTF-8 string from f terminated by eos or the end of the stream.

The terminating sequence itself is not included in the result.

Parameters
f      a pointer to a U8_INPUT stream
eos    a string indicating the end of a record
Returns
a UTF-8 string pointer

#define u8_gets (   f)    (u8_gets_x(NULL,0,f,"\n",NULL))

Returns a UTF-8 string from f terminated by a newline or the end of the stream.

Parameters
f      a pointer to a U8_INPUT stream
Returns
a UTF-8 string pointer

#define U8_INIT_FIXED_OUTPUT (   s,
  sz,
  buf 
)    U8_INIT_OUTPUT_X(s,sz,buf,U8_FIXED_STREAM)

Initializes a string output stream with a fixed-size buffer. This stream will discard content after the buffer is exhausted.

Parameters
s      a pointer to a U8_OUTPUT structure
sz     the number of bytes in the buffer
buf    a pointer to a byte buffer which must exist
Returns
void

Referenced by u8_server_status(), u8_server_status_raw(), u8_sessionid(), and u8_ungetc().
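
The discard-on-exhaustion behavior can be illustrated with a small stand-alone sketch. Everything named `sketch_*` or `SKETCH_*` below is hypothetical and is not the libu8 implementation; only the two flag values are copied from this reference. Setting the overflow bit when content is dropped is an assumption based on the description of U8_STREAM_OVERFLOW, and the sketch truncates at byte level, so it may split a multi-byte UTF-8 character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_FIXED_STREAM 0x04    /* same value as U8_FIXED_STREAM */
#define SKETCH_OVERFLOW     0x1000  /* same value as U8_STREAM_OVERFLOW */

/* Hypothetical stand-in for a fixed-size output stream. */
struct sketch_fixed_out {
  char *buf;   /* start of the caller-supplied buffer */
  char *ptr;   /* next byte to write */
  char *lim;   /* last byte of the buffer, reserved for the NUL terminator */
  int flags;
};

void sketch_init_fixed(struct sketch_fixed_out *s, size_t sz, char *buf) {
  assert(s != NULL && buf != NULL && sz > 0);
  s->buf = buf;
  s->ptr = buf;
  s->lim = buf + sz - 1;
  s->flags = SKETCH_FIXED_STREAM;
  buf[0] = '\0';
}

/* Appends as much of str as fits; the rest is discarded and the overflow
   bit is set. Returns the number of bytes actually stored. */
size_t sketch_fixed_puts(struct sketch_fixed_out *s, const char *str) {
  size_t want = strlen(str);
  size_t room = (size_t)(s->lim - s->ptr);
  size_t n = (want <= room) ? want : room;
  memcpy(s->ptr, str, n);
  s->ptr += n;
  *s->ptr = '\0';
  if (n < want) s->flags |= SKETCH_OVERFLOW;
  return n;
}
```

A caller would typically declare the buffer on the stack, initialize the stream over it, and check the overflow bit after writing to learn whether output was truncated.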

#define U8_INIT_INPUT (   s,
  n 
)    U8_INIT_INPUT_X(s,n,NULL,0)

Initializes an input stream with a buffer of n bytes. This allocates the buffer and sets the stream's U8_STREAM_OWNS_BUF bit.

Parameters
s      a pointer to a U8_INPUT stream
n      the size of the buffer for the stream to use
Returns
void

#define U8_INIT_OUTPUT (   s,
  sz 
)    U8_INIT_OUTPUT_X((s),sz,NULL,0);

Initializes a string output stream with a particular initial size. This always allocates a buffer and arranges for the buffer to grow as needed.

Parameters
s      a pointer to a U8_OUTPUT structure
sz     the number of bytes in the buffer
Returns
void

Referenced by u8_open_output_string().

#define U8_INIT_OUTPUT_BUF (   s,
  sz,
  buf 
)    U8_INIT_OUTPUT_X((s),sz,buf,0);

Initializes a string output stream with an initial buffer.

This allocates a new buffer if the output grows beyond the initial size.

Parameters
s      a pointer to a U8_OUTPUT structure
sz     the number of bytes in the buffer
buf    a pointer to a byte/character array with at least sz elements
Returns
void

#define U8_INIT_STATIC_OUTPUT (   s,
  sz 
)    {memset((&s),0,sizeof(s)); U8_INIT_OUTPUT_X((&s),sz,NULL,0);}

Initializes a string output stream with a particular initial size. This always allocates a buffer and arranges for the buffer to grow as needed.

Parameters
s      a U8_OUTPUT structure (the macro takes its address itself, as the definition shows)
sz     the number of bytes in the buffer
Returns
void

Referenced by u8_convert_crlfs(), u8_decompose(), u8_downcase(), u8_errstring(), u8_make_string(), u8_mime_convert(), u8_mkstring(), u8_rusage_string(), u8_server_status(), u8_server_status_raw(), u8_sessionid(), u8_string_append(), u8_string_subst(), u8_syslog(), u8_upcase(), u8_use_syslog(), and u8_valid_copy().

#define u8_outlen (   s)    (((s)->u8_outptr)-((s)->u8_outbuf))

Returns the length in bytes of the string content of the output stream.

#define U8_OUTPUT_STREAM   0x02

This bit describes whether the stream is an output or input stream.

Referenced by u8_open_output_string().

#define u8_outstring (   s)    ((s)->u8_outbuf)

Returns the string content of the output stream.

#define U8_STREAM_CAN_SEEK   0x40

This bit describes whether seeks are possible on an XFILE's underlying socket/file descriptor.

#define U8_STREAM_CRLFS   0x80

This bit describes whether the XFILE should do CRLF translation.

This is mostly necessary for dealing with DOS/Windows, and causes newlines (0x0A) to turn into the sequence (0x0D 0x0A).
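
As a minimal illustration of what CRLF translation does to a byte sequence, the hypothetical helper below (not part of libu8) rewrites each line feed as carriage return plus line feed. Because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence, a byte-level pass like this is safe for UTF-8 text. How libu8 itself performs the translation is not described in this reference.

```c
#include <assert.h>
#include <stddef.h>

/* Copies src to dst, writing CR LF (0x0D 0x0A) for each LF (0x0A).
   dst needs room for 2*strlen(src)+1 bytes in the worst case.
   Returns the number of bytes written, excluding the terminator. */
size_t sketch_lf_to_crlf(const char *src, char *dst) {
  assert(src != NULL && dst != NULL);
  size_t n = 0;
  for (; *src != '\0'; src++) {
    if (*src == '\n') dst[n++] = '\r';
    dst[n++] = *src;
  }
  dst[n] = '\0';
  return n;
}
```

Note that this sketch does not look for CR LF pairs already present in the input, so text that is already in DOS format would end up with doubled carriage returns.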

#define U8_STREAM_MALLOCD   0x01

This bit describes whether the stream is mallocd or static.

Mallocd streams are freed when closed.

Referenced by u8_open_input_string(), and u8_open_output_string().
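
The ownership bits can be read as instructions to the close routine. The sketch below is a hypothetical model, not libu8 code: `sketch_stream`, `sketch_open`, and `sketch_close` are invented names, and only the flag values are taken from this reference. It shows a close function that frees the buffer only when the stream owns it, and frees the stream structure only when it was malloc'd.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MALLOCD   0x01  /* same value as U8_STREAM_MALLOCD */
#define SKETCH_OUTPUT    0x02  /* same value as U8_OUTPUT_STREAM */
#define SKETCH_OWNS_BUF  0x08  /* same value as U8_STREAM_OWNS_BUF */

/* Hypothetical stand-in for a stream structure. */
struct sketch_stream {
  int streaminfo;  /* flag bits */
  char *buf;
};

/* Heap-allocates both the stream and its buffer, so both bits are set. */
struct sketch_stream *sketch_open(size_t sz) {
  assert(sz > 0);
  struct sketch_stream *s = malloc(sizeof *s);
  if (s == NULL) return NULL;
  s->buf = malloc(sz);
  if (s->buf == NULL) { free(s); return NULL; }
  s->streaminfo = SKETCH_MALLOCD | SKETCH_OUTPUT | SKETCH_OWNS_BUF;
  return s;
}

/* Releases only what the flag bits say the stream owns.
   Returns a bitmask of what was released. */
int sketch_close(struct sketch_stream *s) {
  int released = 0;
  assert(s != NULL);
  if (s->streaminfo & SKETCH_OWNS_BUF) {
    free(s->buf);
    s->buf = NULL;
    released |= SKETCH_OWNS_BUF;
  }
  if (s->streaminfo & SKETCH_MALLOCD) {
    free(s);
    released |= SKETCH_MALLOCD;
  }
  return released;
}
```

A stream declared on the stack over a caller-owned buffer would have neither bit set, and closing it would leave both the structure and the buffer untouched.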

#define U8_STREAM_OVERFLOW   0x1000

This bit describes whether the stream is fixed length and has overflowed.

#define U8_STREAM_OWNS_BUF   0x08

This bit describes whether the stream is responsible for freeing its buffer when closed.

Referenced by u8_clear_errors(), u8_log(), u8_open_input_string(), u8_open_output_string(), and u8_set_logfn().

#define U8_STREAM_OWNS_SOCKET   0x20

This bit describes whether an XFILE stream is responsible for closing its socket/file descriptor when closed.

#define U8_STREAM_OWNS_XBUF   0x10

This bit describes whether an XFILE stream is responsible for freeing its translation buffer when closed.

#define U8_STREAM_TACITURN   0x100

This bit describes a verbosity level for the stream.

This may be consulted by I/O routines to determine detail or decoration.

#define U8_STREAM_UTF8ERR   0x400

This bit describes whether the stream generates an error and stops on invalid UTF-8.

#define U8_STREAM_UTF8FIX   0x800

This bit describes whether the stream should try to fix UTF-8 errors.

(Not yet implemented.)

#define U8_STREAM_UTF8WARN   0x200

This bit describes whether the stream should emit warnings for invalid UTF-8 bytes or sequences.

Typedef Documentation

typedef struct U8_INPUT U8_INPUT

struct U8_INPUT is the structure used for stream-based UTF-8 input.

This structure is subclassed by other structures which share its initial fields, allowing casting into the more general class which input functions operate over. At any point, the stream has at least one internal buffer of UTF-8 characters, pointed to by u8_inbuf and with a current cursor of u8_inptr and a limit (the end of valid data) of u8_inlim. The size of the buffer is in u8_bufsz and various other bits are stored in u8_streaminfo. If an input operation needs more than the buffered data, the u8_fillfn is called on the stream, if non-NULL. Also provided is a u8_closefn which is used whenever the application indicates that it is done with a stream.
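
The interplay of cursor, limit, and fill function described above can be sketched as follows. This is a simplified, hypothetical model: the `sketch_in` structure and its functions are not libu8 definitions, the field names merely echo u8_inptr and u8_inlim, and the reader works on bytes instead of decoded characters. The fill function's return convention (bytes added, 0 at end of data) is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_in;
typedef int (*sketch_fillfn)(struct sketch_in *);

/* Hypothetical stand-in for a buffered input stream. */
struct sketch_in {
  char buf[8];           /* small on purpose, to force refills */
  char *inptr;           /* read cursor */
  char *inlim;           /* end of valid data */
  sketch_fillfn fillfn;  /* may be NULL for a stream that cannot refill */
  const char *source;    /* data the fill function still has to deliver */
};

/* Example fill function: copies the next chunk of source into buf.
   Returns the number of bytes added (0 when the source is exhausted). */
int sketch_fill_from_source(struct sketch_in *in) {
  size_t n = strlen(in->source);
  if (n > sizeof(in->buf)) n = sizeof(in->buf);
  memcpy(in->buf, in->source, n);
  in->source += n;
  in->inptr = in->buf;
  in->inlim = in->buf + n;
  return (int)n;
}

void sketch_in_init(struct sketch_in *in, const char *source) {
  assert(in != NULL && source != NULL);
  in->inptr = in->inlim = in->buf;
  in->fillfn = sketch_fill_from_source;
  in->source = source;
}

/* Returns the next byte, or -1 at end of input. The fill function is
   only consulted when the buffered data has been used up. */
int sketch_getbyte(struct sketch_in *in) {
  if (in->inptr == in->inlim) {
    if (in->fillfn == NULL || in->fillfn(in) <= 0) return -1;
  }
  return (unsigned char)*in->inptr++;
}
```

Reading twelve bytes through the eight-byte buffer triggers two refills, and a third call to the fill function reports the end of the data.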

typedef struct U8_OUTPUT U8_OUTPUT

struct U8_OUTPUT is a structural type which provides for UTF-8 output.

This structure is subclassed by other structures which share its initial fields, allowing casting into the more general class which output functions operate over. At any point, the stream has at least one internal buffer of UTF-8 characters, pointed to by u8_outbuf and with a current cursor of u8_outptr and a limit (the end of writable data) of u8_outlim. The size of the buffer is in u8_bufsz (note that this is redundant with u8_outlim) and various other bits are stored in u8_streaminfo. If an output operation overflows the buffer, the u8_flushfn (if non-NULL) is called on the stream. If space is still not available, the output buffer is automatically grown. Also provided is a u8_closefn which indicates that an application is done with a stream.
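
A growable output buffer of this general shape can be sketched as below. The structure and functions are hypothetical stand-ins, not libu8 code; the field names echo the u8_outbuf, u8_outptr, and u8_outlim names used by the u8_outstring and u8_outlen macros. The flush step is omitted and the doubling growth policy is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a growable string output stream. */
struct sketch_out {
  char *outbuf;  /* start of the buffer */
  char *outptr;  /* write cursor */
  char *outlim;  /* end of writable space */
};

/* Same idea as the u8_outlen macro: bytes written so far. */
#define sketch_outlen(s) ((size_t)((s)->outptr - (s)->outbuf))

int sketch_out_init(struct sketch_out *s, size_t sz) {
  assert(s != NULL && sz > 0);
  s->outbuf = malloc(sz);
  if (s->outbuf == NULL) return -1;
  s->outptr = s->outbuf;
  s->outlim = s->outbuf + sz;
  s->outbuf[0] = '\0';
  return 0;
}

/* Appends str, doubling the buffer until the bytes plus a NUL fit.
   Returns 0 on success, -1 if memory could not be obtained. */
int sketch_out_puts(struct sketch_out *s, const char *str) {
  size_t len = sketch_outlen(s), add = strlen(str);
  size_t cap = (size_t)(s->outlim - s->outbuf);
  if (len + add + 1 > cap) {
    size_t newcap = cap;
    while (len + add + 1 > newcap) newcap *= 2;
    char *grown = realloc(s->outbuf, newcap);
    if (grown == NULL) return -1;
    s->outbuf = grown;          /* the buffer may have moved */
    s->outptr = grown + len;
    s->outlim = grown + newcap;
  }
  memcpy(s->outptr, str, add + 1);  /* copies the terminator too */
  s->outptr += add;
  return 0;
}
```

Because the buffer can move when it grows, callers of a design like this should re-read the buffer pointer after every write instead of caching it.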

typedef struct U8_STREAM U8_STREAM

struct U8_STREAM is an abstract structural type which is extended by U8_INPUT and U8_OUTPUT.

The general layout of a stream structure is an integer buffer size and an integer holding the streaminfo flag bits. These are followed by three string pointers into a UTF-8 stream, either for input or output, a pointer to a close function, and a pointer to a transfer function (xfn).

Function Documentation

U8_EXPORT U8_OUTPUT* u8_get_default_output ( void  )

Gets the default output stream for the current thread.

This defaults to u8_global_output.

Returns
a pointer to a U8_OUTPUT structure

References u8_global_output, u8_reset_default_output(), and u8_set_default_output().

Referenced by u8_set_global_output().

U8_EXPORT int u8_get_entity ( U8_INPUT *  in)

Reads and interprets an XML character entity from in.

Parameters
in     a pointer to a U8_INPUT stream positioned just after the ampersand (&) of an XML character entity
Returns
a unicode code point

References u8_parse_entity().
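
To make the "positioned just after the ampersand" contract concrete, here is a hedged sketch of an entity parser that works on a plain C string. `sketch_parse_entity` is a hypothetical helper, not u8_get_entity or u8_parse_entity; it handles only numeric references and the five names predefined by XML, and its -1 error return is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parses the text after '&' in a character entity, e.g. "#x41;" or "amp;".
   On success returns the code point and, if endp is non-NULL, stores a
   pointer just past the ';'. Returns -1 if no valid entity is found. */
int sketch_parse_entity(const char *s, const char **endp) {
  static const struct { const char *name; int code; } named[] = {
    {"amp;", '&'}, {"lt;", '<'}, {"gt;", '>'}, {"quot;", '"'}, {"apos;", '\''}
  };
  assert(s != NULL);
  if (s[0] == '#') {
    int hex = (s[1] == 'x');
    const char *digits = s + (hex ? 2 : 1);
    char *stop;
    long v;
    /* Require a leading digit so strtol cannot skip whitespace or a sign. */
    if (!(hex ? isxdigit((unsigned char)*digits)
              : isdigit((unsigned char)*digits)))
      return -1;
    v = strtol(digits, &stop, hex ? 16 : 10);
    if (*stop != ';' || v < 0 || v > 0x10FFFF) return -1;
    if (endp) *endp = stop + 1;
    return (int)v;
  }
  for (size_t i = 0; i < sizeof(named) / sizeof(named[0]); i++) {
    size_t n = strlen(named[i].name);
    if (strncmp(s, named[i].name, n) == 0) {
      if (endp) *endp = s + n;
      return named[i].code;
    }
  }
  return -1;
}
```

For example, the input `#x41;rest` yields the code point 0x41 and leaves the end pointer at `rest`.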

U8_EXPORT u8_string u8_gets_x ( u8_byte *  buf,
int  len,
struct U8_INPUT *  f,
u8_string  eos,
int *  sizep 
)

Reads a string from f into buf up to the string eos.

This stores the number of bytes read into sizep and returns a pointer to buf. If there is not enough space in buf (which has len bytes), u8_gets_x returns NULL but deposits the number of bytes needed into sizep. If buf is NULL, this function allocates a new buffer/string with enough space to hold the requested data. The terminating sequence itself is not included in the result.

Parameters
buf      a buffer/string of len bytes
len      the number of bytes available in buf
f        a pointer to a U8_INPUT stream
eos      a UTF-8 string indicating the "end of record"
sizep    a pointer to an int used to record how many bytes were read (or are needed)
Returns
a pointer to the buffer of results, or NULL if the provided buffer was too small to contain the requested data.
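
The three cases described above (caller buffer large enough, caller buffer too small, no buffer supplied) can be modeled with a stand-alone sketch that reads from a C string instead of a U8_INPUT stream. `sketch_gets_x` is hypothetical and is not the libu8 function. Counting the NUL terminator in the "bytes needed" value and returning an empty record at the end of input are assumptions of this sketch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Reads from *cursor up to (not including) eos, or to the end of the
   string. On success, advances *cursor past the record and its
   terminator, stores the record length in *sizep, and returns the
   buffer. If buf is NULL a buffer is malloc'd. If buf is too small,
   returns NULL and stores the number of bytes needed in *sizep. */
char *sketch_gets_x(char *buf, int len, const char **cursor,
                    const char *eos, int *sizep) {
  assert(cursor != NULL && *cursor != NULL);
  assert(eos != NULL && *eos != '\0');
  assert(buf == NULL || len >= 0);
  const char *start = *cursor;
  const char *hit = strstr(start, eos);
  size_t n = hit ? (size_t)(hit - start) : strlen(start);
  if (buf == NULL) {
    buf = malloc(n + 1);
    if (buf == NULL) return NULL;
  } else if ((size_t)len < n + 1) {
    if (sizep) *sizep = (int)(n + 1);
    return NULL;  /* the cursor is left untouched so the caller can retry */
  }
  memcpy(buf, start, n);
  buf[n] = '\0';
  if (sizep) *sizep = (int)n;
  *cursor = hit ? hit + strlen(eos) : start + n;
  return buf;
}
```

A caller that gets NULL back with a buffer supplied can allocate the reported number of bytes and call again, or simply pass NULL for the buffer and free the result afterwards.
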
U8_EXPORT U8_INPUT* u8_open_input_string ( u8_string  input)

Opens an input stream reading characters from the UTF-8 string input.

This is the simplest kind of input stream and is malloc'd.

Parameters
input    a null-terminated UTF-8 string
Returns
a pointer to a U8_INPUT stream

References U8_STREAM_MALLOCD, and U8_STREAM_OWNS_BUF.

U8_EXPORT U8_OUTPUT* u8_open_output_string ( int  initial_size)

Allocates and opens an output string with an initial size.

Parameters
initial_size    the initial space allocated for the stream
Returns
a u8_output stream

References u8_global_output, U8_INIT_OUTPUT, U8_OUTPUT_STREAM, U8_STREAM_MALLOCD, and U8_STREAM_OWNS_BUF.

U8_EXPORT int u8_peekc ( struct U8_INPUT *  f)

Returns the next character to be read from f.

This does not advance the buffer pointer and does not attempt to fill the buffer.

Parameters
f      a pointer to a U8_INPUT stream
Returns
-1 on error

References u8_validate().
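
Peeking without advancing amounts to decoding the UTF-8 sequence at the cursor and leaving the cursor alone. The hypothetical helper below (not part of libu8) does this over a byte range. Its return convention, -1 when no complete character is buffered and -2 for malformed bytes, is an assumption, and it is simplified in that overlong encodings and surrogate code points are not rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Decodes the UTF-8 sequence starting at p without consuming it; lim
   marks the end of the buffered data. Stores the sequence length in
   *lenp. Returns the code point, -1 if no complete character is
   buffered, or -2 if the bytes are malformed. */
int sketch_peek_utf8(const unsigned char *p, const unsigned char *lim,
                     int *lenp) {
  int need, c;
  assert(p != NULL && lim != NULL && lenp != NULL);
  *lenp = 0;
  if (p >= lim) return -1;
  if (p[0] < 0x80) { c = p[0]; need = 0; }
  else if ((p[0] & 0xE0) == 0xC0) { c = p[0] & 0x1F; need = 1; }
  else if ((p[0] & 0xF0) == 0xE0) { c = p[0] & 0x0F; need = 2; }
  else if ((p[0] & 0xF8) == 0xF0) { c = p[0] & 0x07; need = 3; }
  else return -2;                     /* stray continuation or invalid lead */
  if (lim - p < need + 1) return -1;  /* sequence is cut off by the limit */
  for (int i = 1; i <= need; i++) {
    if ((p[i] & 0xC0) != 0x80) return -2;
    c = (c << 6) | (p[i] & 0x3F);
  }
  *lenp = need + 1;
  return c;
}
```

A matching "get" operation would call the same decoder and then advance the cursor by the stored length, which is what distinguishes it from a peek.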

U8_EXPORT int u8_probec ( struct U8_INPUT *  f)

Returns the next character to be read from f.

This does not advance the buffer pointer but will try to fetch data if needed. If there is a UTF-8 parsing error, this either returns -2 or issues a warning and returns , depending on whether the stream has its U8_STREAM_UTF8ERR bit set.

Parameters
f      a pointer to a U8_INPUT stream
Returns
-1 on error

U8_EXPORT void u8_reset_default_output ( u8_output  out)

Resets the default output stream (for the current thread) to out.

If out is the same as u8_global_output, this clears the thread-local output stream value (so that changes to u8_global_output will change this thread's default output stream).

Parameters
out    a pointer to a U8_OUTPUT stream

Referenced by u8_get_default_output(), and u8_set_global_output().
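
The relationship between the global stream, the per-thread default, and the reset behavior can be sketched with C11 thread-local storage. All `sketch_*` names are hypothetical and this is not the libu8 implementation; the macro merely mirrors the shape of the second u8_current_output definition listed in this reference, and the use of `_Thread_local` is an assumption about how a per-thread value might be stored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a U8_OUTPUT structure. */
struct sketch_output { const char *name; };

/* Process-wide fallback, plus a per-thread override. */
struct sketch_output *sketch_global_output = NULL;
_Thread_local struct sketch_output *sketch_default_output = NULL;

/* Thread default if one is set, otherwise the global stream. */
#define sketch_current_output \
  ((sketch_default_output) ? (sketch_default_output) : (sketch_global_output))

void sketch_set_default(struct sketch_output *out) {
  assert(out != NULL);  /* use sketch_reset_default to clear the override */
  sketch_default_output = out;
}

/* Resetting to the global stream clears the override, so later changes
   to the global stream become visible to this thread again. */
void sketch_reset_default(struct sketch_output *out) {
  sketch_default_output = (out == sketch_global_output) ? NULL : out;
}
```

Storing NULL instead of a copy of the global pointer is the point of the reset: a thread that holds its own copy would keep writing to the old stream after the global one is replaced.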

U8_EXPORT void u8_set_default_output ( u8_output  out)

Sets the default output stream (for the current thread) to out.

Parameters
out    a pointer to a U8_OUTPUT stream

Referenced by u8_get_default_output(), and u8_set_global_output().

U8_EXPORT void u8_set_global_output ( u8_output  out)

Sets the global output stream to out.

This is used as the default when a thread doesn't specify a default output stream.

Parameters
out    a pointer to a U8_OUTPUT stream

References u8_get_default_output(), u8_global_output, u8_reset_default_output(), and u8_set_default_output().

U8_EXPORT int u8_ungetc ( struct U8_INPUT *  f,
int  c 
)

Puts the character c back into the input stream f.

This can be used by parsing algorithms which get a character, look at it and then put it back before calling another procedure.

Parameters
f      a pointer to a U8_INPUT stream
c      the unicode code point last read from stream
Returns
-1 on error

References U8_INIT_FIXED_OUTPUT, and u8_seterr().