libu8
include/libu8/u8stringfns.h File Reference

These functions provide utilities over UTF-8 strings.

Defines

#define u8_bytelen(string)   (strlen(string))
 Returns the number of bytes in a UTF-8 string.

Functions

U8_EXPORT u8_string u8_upcase (u8_string string)
 Returns an uppercase version of a UTF-8 string.
U8_EXPORT u8_string u8_downcase (u8_string string)
 Returns a lowercase version of a UTF-8 string.
U8_EXPORT u8_string u8_decompose (u8_string string)
 Returns a decomposed version of a UTF-8 string.
U8_EXPORT u8_string u8_string_append (u8_string first_string,...)
 Appends together any number of UTF-8 strings. This takes any number of UTF-8 strings, terminated by a NULL pointer, and returns the result of appending them together.
U8_EXPORT u8_string u8_string_subst (u8_string input, u8_string key, u8_string replace)
 Substitutes one string for another within its input. This takes an input string, a key string, and a replacement string.
U8_EXPORT u8_string u8_slice (u8_byte *start, u8_byte *end)
 Extracts and copies a substring of a UTF-8 string.
U8_EXPORT int u8_strlen (u8_string string)
 Returns the number of characters in a UTF-8 string.
U8_EXPORT int u8_strlen_x (u8_string string, int len)
 Returns the number of characters in a UTF-8 string with an explicit length.
U8_EXPORT u8_string u8_substring (u8_string string, int i)
 Returns a pointer into string starting at the ith character.
U8_EXPORT int u8_string_ref (u8_string strptr)
 Returns the first codepoint in strptr.
U8_EXPORT int u8_validptr (u8_byte *s)
 Checks if the start of a string is a valid UTF-8 representation.
U8_EXPORT int u8_validp (u8_byte *s)
 Checks if a string is a valid UTF-8 representation.
U8_EXPORT int u8_validate (u8_byte *s, int n)
 Checks if the n bytes starting at s are a valid UTF-8 string.
U8_EXPORT u8_string u8_valid_copy (u8_byte *s)
 Checks the validity of a UTF-8 string and copies it.
U8_EXPORT u8_string u8_convert_crlfs (u8_byte *s)
 Checks the validity of a UTF-8 string and copies it, converting CRLFs.

Detailed Description

These functions provide utilities over UTF-8 strings.

Many of these work generically over NUL-terminated strings but some are particular to UTF-8.


Define Documentation

#define u8_bytelen (   string)    (strlen(string))

Returns the number of bytes in a UTF-8 string.

This is just an alias for the C library function strlen().

Parameters:
string  a UTF-8 string
Returns:
the number of bytes in the string

Function Documentation

U8_EXPORT u8_string u8_convert_crlfs ( u8_byte *  s)

Checks the validity of a UTF-8 string and copies it, converting CRLFs.

Parameters:
s  a possibly (probably) valid UTF-8 string.
Returns:
a valid UTF-8 string or NULL
U8_EXPORT u8_string u8_decompose ( u8_string  string)

Returns a decomposed version of a UTF-8 string.

Parameters:
string  a UTF-8 string
Returns:
a UTF-8 string with all composed characters broken down
U8_EXPORT u8_string u8_downcase ( u8_string  string)

Returns a lowercase version of a UTF-8 string.

Parameters:
string  a UTF-8 string
Returns:
a UTF-8 string with all uppercase characters converted to lowercase.
U8_EXPORT u8_string u8_slice ( u8_byte *  start,
u8_byte *  end 
)

Extracts and copies a substring of a UTF-8 string.

Parameters:
start  a pointer into a UTF-8 string
end  a pointer into a later location in the same string
Returns:
a UTF-8 string extracted from between the two pointers
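The copy-between-two-pointers behavior described for u8_slice() can be illustrated in plain C. This is only a sketch: sketch_slice is a hypothetical helper, not the libu8 implementation, and it assumes that end is exclusive and that the caller frees the result, neither of which this page states.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libu8: copies the bytes in
   [start, end) into a newly allocated NUL-terminated string.
   Assumption: end is exclusive and the caller frees the result. */
static char *sketch_slice(const char *start, const char *end)
{
  size_t n;
  char *copy;
  assert(start != NULL && end != NULL && start <= end);
  n = (size_t)(end - start);
  copy = malloc(n + 1);
  if (copy == NULL) return NULL;
  memcpy(copy, start, n);   /* copy the raw bytes between the pointers */
  copy[n] = '\0';           /* terminate the copy */
  return copy;
}
```

Both pointers should sit on character boundaries; cutting in the middle of a multi-byte sequence would yield a copy that is not valid UTF-8.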
U8_EXPORT int u8_string_ref ( u8_string  strptr)

Returns the first codepoint in strptr.

Parameters:
strptr  a pointer into a UTF-8 string
Returns:
the Unicode code point at the pointer.
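Reading the first code point of a UTF-8 sequence amounts to decoding one to four bytes. The following sketch shows the standard decoding in plain C. The name sketch_first_codepoint and the -1 error return are assumptions made for illustration; this page does not say what u8_string_ref() returns for malformed input. The sketch does not reject overlong encodings or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libu8: decodes the code point that
   starts at s. Returns -1 (an assumed convention) when the lead byte or
   a continuation byte is malformed. */
static int sketch_first_codepoint(const unsigned char *s)
{
  assert(s != NULL);
  if (s[0] < 0x80)                       /* 0xxxxxxx: ASCII */
    return s[0];
  if ((s[0] & 0xE0) == 0xC0 &&           /* 110xxxxx 10xxxxxx */
      (s[1] & 0xC0) == 0x80)
    return ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
  if ((s[0] & 0xF0) == 0xE0 &&           /* 1110xxxx + 2 continuation bytes */
      (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80)
    return ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
  if ((s[0] & 0xF8) == 0xF0 &&           /* 11110xxx + 3 continuation bytes */
      (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80 &&
      (s[3] & 0xC0) == 0x80)
    return ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12) |
           ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
  return -1;                             /* stray continuation or bad lead byte */
}
```

Because the checks short-circuit, a terminating NUL stops the decoder before it reads past the end of a NUL-terminated string.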
U8_EXPORT u8_string u8_string_subst ( u8_string  input,
u8_string  key,
u8_string  replace 
)

Substitutes one string for another within its input. This takes an input string, a key string, and a replacement string.

It returns a copy of the input string with the replacement string substituted for all occurrences of the key string. Note that this does not do any UTF-8 normalization.
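To make the normalization caveat concrete, here is a sketch of a byte-wise substitution in plain C. sketch_subst is a hypothetical stand-in, not the libu8 implementation; it assumes the result is heap-allocated and freed by the caller, and it returns an unchanged copy when the key is empty. Because matching is byte-wise, a precomposed character does not match its decomposed spelling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of libu8: returns a newly allocated
   copy of input with every occurrence of key replaced by replace.
   Matching is byte-wise, so no Unicode normalization takes place. */
static char *sketch_subst(const char *input, const char *key,
                          const char *replace)
{
  size_t key_len, rep_len, count = 0, out_len;
  const char *scan;
  char *result, *out;
  assert(input != NULL && key != NULL && replace != NULL);
  key_len = strlen(key);
  rep_len = strlen(replace);
  /* First pass: count matches so the output can be sized exactly.
     An empty key matches nothing in this sketch. */
  if (key_len > 0)
    for (scan = input; (scan = strstr(scan, key)) != NULL; scan += key_len)
      count++;
  out_len = strlen(input) - count * key_len + count * rep_len;
  result = malloc(out_len + 1);
  if (result == NULL) return NULL;
  out = result;
  /* Second pass: copy the text before each match, then the replacement. */
  while (count > 0 && (scan = strstr(input, key)) != NULL) {
    size_t prefix = (size_t)(scan - input);
    memcpy(out, input, prefix);
    out += prefix;
    memcpy(out, replace, rep_len);
    out += rep_len;
    input = scan + key_len;
    count--;
  }
  strcpy(out, input);   /* copy whatever follows the last match */
  return result;
}
```

With this sketch, replacing "-" by "::" in "a-b-c" gives "a::b::c", while searching for "e" followed by U+0301 (combining acute accent) in a string containing the precomposed U+00E9 finds nothing.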

U8_EXPORT int u8_strlen ( u8_string  string)

Returns the number of characters in a UTF-8 string.

This counts the number of Unicode codepoints, so combining characters are counted as separate characters. This assumes that the string is NUL terminated; to count characters given a particular end pointer, use u8_strlen_x().

Parameters:
string  a UTF-8 string
Returns:
the number of characters (codepoints) in the string
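The difference between a byte count and a codepoint count can be sketched in plain C. The helper below is hypothetical and is not how libu8 necessarily implements u8_strlen(); it assumes the input is already valid UTF-8 and simply counts every byte that is not a continuation byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libu8: counts codepoints in a
   NUL-terminated string by skipping continuation bytes (10xxxxxx).
   Assumes the input is valid UTF-8. */
static int sketch_count_codepoints(const char *s)
{
  int count = 0;
  assert(s != NULL);
  for (; *s != '\0'; s++) {
    if (((unsigned char)*s & 0xC0) != 0x80)
      count++;   /* ASCII byte or lead byte: start of a new codepoint */
  }
  return count;
}
```

For the string "na\xc3\xafve" (the word "naive" spelled with U+00EF), strlen() reports 6 bytes while this count is 5. The two-codepoint sequence "e" plus U+0301 counts as 2, which matches the remark above about combining characters.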
U8_EXPORT int u8_strlen_x ( u8_string  string,
int  len 
)

Returns the number of characters in a UTF-8 string with an explicit length.

This counts the number of Unicode codepoints, so combining characters are counted as separate characters.

Parameters:
string  a UTF-8 string
len  the number of bytes in the string to be measured
Returns:
the number of characters (codepoints) in the string
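An explicit byte length is useful when the text is a prefix of a larger buffer or is not NUL terminated. The following plain C sketch shows the idea; sketch_count_codepoints_n is a hypothetical name, and the sketch assumes that the len bytes are valid UTF-8 and end on a character boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libu8: counts the codepoints that
   begin within the first len bytes of s. It never looks for a NUL
   terminator, so s may point into the middle of a larger buffer. */
static int sketch_count_codepoints_n(const char *s, int len)
{
  int count = 0;
  int i;
  assert(s != NULL && len >= 0);
  for (i = 0; i < len; i++) {
    if (((unsigned char)s[i] & 0xC0) != 0x80)
      count++;   /* not a continuation byte, so a codepoint starts here */
  }
  return count;
}
```

Measuring only the first 3 bytes of "a\xc3\xa9" followed by "bcd" gives 2, since those bytes hold "a" and the two-byte encoding of U+00E9.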
U8_EXPORT u8_string u8_substring ( u8_string  string,
int  i 
)

Returns a pointer into string starting at the ith character.

This does not copy its result, so the returned string shares memory with the input string.

Parameters:
string  a UTF-8 string
i  the number of characters to skip before the start of the substring
Returns:
a substring (not copied)
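Finding the ith character means walking the string one encoded character at a time, because characters vary in width. The plain C sketch below illustrates this with a hypothetical helper, sketch_skip_codepoints. It assumes valid UTF-8 and returns a pointer to the terminating NUL when i exceeds the length; how u8_substring() handles that case is not stated on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libu8: returns a pointer to the
   i-th codepoint of s without copying. The result aliases s, so it is
   only valid while s is. Stops at the terminating NUL (an assumption). */
static const char *sketch_skip_codepoints(const char *s, int i)
{
  assert(s != NULL && i >= 0);
  while (i > 0 && *s != '\0') {
    s++;                                        /* step past the lead byte */
    while (((unsigned char)*s & 0xC0) == 0x80)  /* and its continuation bytes */
      s++;
    i--;
  }
  return s;
}
```

Skipping 3 characters of "na\xc3\xafve" lands on "ve", which is 4 bytes into the original string because U+00EF occupies two bytes.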
U8_EXPORT u8_string u8_upcase ( u8_string  string)

Returns an uppercase version of a UTF-8 string.

Parameters:
string  a UTF-8 string
Returns:
a UTF-8 string with all lowercase characters converted to uppercase.
U8_EXPORT u8_string u8_valid_copy ( u8_byte *  s)

Checks the validity of a UTF-8 string and copies it.

Parameters:
s  a possibly (probably) valid UTF-8 string.
Returns:
a valid UTF-8 string or NULL
U8_EXPORT int u8_validate ( u8_byte *  s,
int  n 
)

Checks if the n bytes starting at s are a valid UTF-8 string.

Parameters:
s  a possible UTF-8 string
n  the number of bytes in the string
Returns:
1 if the pointer refers to a valid UTF-8 sequence, 0 otherwise
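A structural validity check over n bytes can be sketched in plain C as follows. sketch_validate is a hypothetical helper, not the libu8 implementation. It checks only that each lead byte is followed by the right number of continuation bytes within the n-byte window; it does not reject overlong encodings or surrogates, and whether u8_validate() does so is not stated on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of libu8: returns 1 if the n bytes at s
   form structurally well-formed UTF-8 sequences, 0 otherwise. */
static int sketch_validate(const unsigned char *s, int n)
{
  int i = 0;
  assert(s != NULL && n >= 0);
  while (i < n) {
    unsigned char lead = s[i];
    int extra, k;
    if (lead < 0x80) extra = 0;                 /* ASCII */
    else if ((lead & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
    else if ((lead & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
    else if ((lead & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
    else return 0;                              /* stray continuation or bad lead */
    if (extra > n - i - 1) return 0;            /* sequence truncated by n */
    for (k = 1; k <= extra; k++)
      if ((s[i + k] & 0xC0) != 0x80) return 0;  /* expected 10xxxxxx */
    i += extra + 1;
  }
  return 1;
}
```

Under this sketch the two bytes 0xC3 0xA9 are accepted, while the same buffer limited to n = 1 is rejected because the sequence is cut short.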
U8_EXPORT int u8_validp ( u8_byte *  s)

Checks if a string is a valid UTF-8 representation.

Parameters:
s  a possible UTF-8 string
Returns:
1 if the string is a valid UTF-8 string, 0 otherwise
U8_EXPORT int u8_validptr ( u8_byte *  s)

Checks if the start of a string is a valid UTF-8 representation.

This checks only the single character representation at the start of the string.

Parameters:
s  a possible UTF-8 string
Returns:
1 if the pointer refers to a valid UTF-8 sequence, 0 otherwise