libu8
These functions provide utilities over UTF-8 strings.
Defines | |
| #define | u8_bytelen(string) (strlen(string)) |
| Returns the number of bytes in a UTF-8 string. | |
Functions | |
| U8_EXPORT u8_string | u8_upcase (u8_string string) |
| Returns an uppercase version of a UTF-8 string. | |
| U8_EXPORT u8_string | u8_downcase (u8_string string) |
| Returns a lowercase version of a UTF-8 string. | |
| U8_EXPORT u8_string | u8_decompose (u8_string string) |
| Returns a decomposed version of a UTF-8 string. | |
| U8_EXPORT u8_string | u8_string_append (u8_string first_string,...) |
| Appends together any number of UTF-8 strings. This takes any number of UTF-8 strings, terminated by a NULL pointer, and returns the result of appending them together. | |
| U8_EXPORT u8_string | u8_string_subst (u8_string input, u8_string key, u8_string replace) |
| Substitutes one string for another within its input. This takes an input string, a key string, and a replacement string. | |
| U8_EXPORT u8_string | u8_slice (u8_byte *start, u8_byte *end) |
| Extracts and copies a substring of a UTF-8 string. | |
| U8_EXPORT int | u8_strlen (u8_string string) |
| Returns the number of characters in a UTF-8 string. | |
| U8_EXPORT int | u8_strlen_x (u8_string string, int len) |
| Returns the number of characters in a UTF-8 string with an explicit length. | |
| U8_EXPORT u8_string | u8_substring (u8_string string, int i) |
| Returns a pointer into string starting at the ith character. | |
| U8_EXPORT int | u8_string_ref (u8_string strptr) |
| Returns the first codepoint in strptr. | |
| U8_EXPORT int | u8_validptr (u8_byte *s) |
| Checks if the start of a string is a valid UTF-8 representation. | |
| U8_EXPORT int | u8_validp (u8_byte *s) |
| Checks if a string is a valid UTF-8 representation. | |
| U8_EXPORT int | u8_validate (u8_byte *s, int n) |
| Checks if the n bytes starting at s are a valid UTF-8 string. | |
| U8_EXPORT u8_string | u8_valid_copy (u8_byte *s) |
| Checks the validity of a UTF-8 string and copies it. | |
| U8_EXPORT u8_string | u8_convert_crlfs (u8_byte *s) |
| Checks the validity of a UTF-8 string and copies it, converting CRLFs. | |
These functions provide utilities over UTF-8 strings.
Many of these work generically over NUL-terminated strings but some are particular to UTF-8.
#define u8_bytelen(string) (strlen(string))
Returns the number of bytes in a UTF-8 string.
This is just an alias for the C library function strlen().
| string | a UTF-8 string |
U8_EXPORT u8_string u8_convert_crlfs (u8_byte *s)
Checks the validity of a UTF-8 string and copies it, converting CRLFs.
| s | a possibly (probably) valid UTF-8 string. |
U8_EXPORT u8_string u8_decompose (u8_string string)
Returns a decomposed version of a UTF-8 string.
| string | a UTF-8 string |
U8_EXPORT u8_string u8_downcase (u8_string string)
Returns a lowercase version of a UTF-8 string.
| string | a UTF-8 string |
U8_EXPORT u8_string u8_slice (u8_byte *start, u8_byte *end)
Extracts and copies a substring of a UTF-8 string.
| start | a pointer into a UTF-8 string |
| end | a pointer into a later location in the same string |
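The copy described above can be pictured with a minimal standalone sketch. It is not libu8's implementation: `sketch_slice` is a hypothetical name, it uses plain `char` pointers and `malloc` instead of the library's types and allocator, and its behavior on a reversed range (returning NULL) is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for u8_slice: copies the bytes in [start, end)
   into a freshly allocated, NUL-terminated buffer. Returns NULL if the
   range is reversed or allocation fails (an assumption, not documented
   behavior). The caller frees the result. */
static char *sketch_slice(const char *start, const char *end)
{
  if (start == NULL || end == NULL || end < start) return NULL;
  size_t n = (size_t)(end - start);
  char *copy = malloc(n + 1);
  if (copy == NULL) return NULL;
  memcpy(copy, start, n);
  copy[n] = '\0';
  return copy;
}
```

Both pointers should fall on character boundaries; a pointer into the middle of a multi-byte sequence would yield a copy that is not valid UTF-8.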
U8_EXPORT int u8_string_ref (u8_string strptr)
Returns the first codepoint in strptr.
| strptr | a pointer into a UTF-8 string |
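As an illustration of what "the first codepoint" means for a multi-byte sequence, here is a minimal standalone decoder. It is a sketch rather than the library's code: `sketch_first_codepoint` is a hypothetical name, and returning -1 for malformed input is an assumption, since the source does not say what u8_string_ref does in that case. The sketch does not reject overlong encodings.

```c
/* Hypothetical stand-in for u8_string_ref: decodes the first codepoint
   of a NUL-terminated UTF-8 string. Returns -1 on a malformed or
   truncated sequence (assumed behavior). */
static int sketch_first_codepoint(const char *s)
{
  const unsigned char *p = (const unsigned char *)s;
  int cp, extra;
  if (p[0] < 0x80) return p[0];                           /* ASCII */
  else if ((p[0] & 0xE0) == 0xC0) { cp = p[0] & 0x1F; extra = 1; }
  else if ((p[0] & 0xF0) == 0xE0) { cp = p[0] & 0x0F; extra = 2; }
  else if ((p[0] & 0xF8) == 0xF0) { cp = p[0] & 0x07; extra = 3; }
  else return -1;                      /* stray continuation or bad lead */
  for (int i = 1; i <= extra; i++) {
    /* A NUL byte fails this test too, so we never read past the end. */
    if ((p[i] & 0xC0) != 0x80) return -1;
    cp = (cp << 6) | (p[i] & 0x3F);    /* append 6 payload bits */
  }
  return cp;
}
```

For example, the three bytes E2 82 AC decode to U+20AC (the euro sign).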
U8_EXPORT u8_string u8_string_subst (u8_string input, u8_string key, u8_string replace)
Substitutes one string for another within its input. This takes an input string, a key string, and a replacement string.
It returns a copy of the input string with the replacement string substituted for all occurrences of the key string. Note that this does not do any UTF-8 normalization.
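The substitution described above can be sketched in plain C. This is an illustration under stated assumptions, not the library's code: `sketch_subst` is a hypothetical name, it allocates with `malloc`, and its handling of an empty key (returning an unchanged copy) is an assumption. Byte-wise matching is enough here because UTF-8 is self-synchronizing: a valid key can only match at character boundaries of a valid input.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for u8_string_subst: returns a malloc'd copy of
   input with every non-overlapping occurrence of key replaced. The
   caller frees the result. */
static char *sketch_subst(const char *input, const char *key,
                          const char *replace)
{
  size_t klen = strlen(key), rlen = strlen(replace), count = 0;
  /* Count matches first so the output can be sized exactly. An empty
     key is treated as "no matches" (assumed behavior). */
  if (klen > 0)
    for (const char *p = strstr(input, key); p != NULL;
         p = strstr(p + klen, key))
      count++;
  char *out = malloc(strlen(input) - count * klen + count * rlen + 1);
  if (out == NULL) return NULL;
  char *w = out;
  const char *scan = input, *hit;
  while (klen > 0 && (hit = strstr(scan, key)) != NULL) {
    memcpy(w, scan, (size_t)(hit - scan));  /* text before the match */
    w += hit - scan;
    memcpy(w, replace, rlen);               /* the replacement */
    w += rlen;
    scan = hit + klen;
  }
  strcpy(w, scan);                          /* remaining tail */
  return out;
}
```

The lack of normalization is visible with this sketch: a key of precomposed "é" (bytes C3 A9) does not match an input that spells the same character as "e" followed by the combining acute accent (bytes 65 CC 81), so that input comes back unchanged.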
U8_EXPORT int u8_strlen (u8_string string)
Returns the number of characters in a UTF-8 string.
This counts Unicode codepoints, so combining characters are counted as separate characters. It assumes that the string is NUL-terminated; to count characters within an explicit byte length, use u8_strlen_x().
| string | a UTF-8 string |
U8_EXPORT int u8_strlen_x (u8_string string, int len)
Returns the number of characters in a UTF-8 string with an explicit length.
This counts Unicode codepoints, so combining characters are counted as separate characters.
| string | a UTF-8 string |
| len | the number of bytes in the string to be measured |
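To make the difference between byte length and character count concrete, here is a minimal standalone sketch of codepoint counting over an explicit byte length. It is not the library's implementation: `sketch_count_chars` is a hypothetical name, and it assumes the input is already valid UTF-8, counting every byte that is not a continuation byte (10xxxxxx).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for u8_strlen_x: counts codepoints in the first
   nbytes bytes of s by counting the bytes that start a character.
   Assumes s is valid UTF-8. */
static int sketch_count_chars(const char *s, size_t nbytes)
{
  int count = 0;
  for (size_t i = 0; i < nbytes; i++)
    if (((unsigned char)s[i] & 0xC0) != 0x80)  /* not a continuation byte */
      count++;
  return count;
}
```

Passing `strlen(s)` as the length gives the NUL-terminated behavior described for u8_strlen(). With this sketch, "e" followed by a combining acute accent (3 bytes) counts as 2 characters, which matches the note that combining characters are counted separately.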
U8_EXPORT u8_string u8_substring (u8_string string, int i)
Returns a pointer into string starting at the i-th character.
This does not copy its result, so the returned string shares memory with the original string.
| string | a UTF-8 string |
| i | how many characters in to start the string |
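The pointer-sharing behavior can be illustrated with a short standalone sketch. It is not libu8's code: `sketch_substring` is a hypothetical name, it assumes valid UTF-8 input, and stopping at the terminating NUL when i exceeds the string's length is an assumption about out-of-range handling.

```c
/* Hypothetical stand-in for u8_substring: returns a pointer to the
   i-th character of s (counting from 0) without copying. If s has
   fewer than i characters, returns a pointer to its terminating NUL
   (assumed behavior). Assumes s is valid UTF-8. */
static const char *sketch_substring(const char *s, int i)
{
  const unsigned char *p = (const unsigned char *)s;
  while (i > 0 && *p != '\0') {
    p++;                                /* step past the lead byte */
    while ((*p & 0xC0) == 0x80) p++;    /* and its continuation bytes */
    i--;
  }
  return (const char *)p;
}
```

Because the result points into the original buffer, it must not be freed separately and becomes invalid when the original string is freed or modified.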
U8_EXPORT u8_string u8_upcase (u8_string string)
Returns an uppercase version of a UTF-8 string.
| string | a UTF-8 string |
U8_EXPORT u8_string u8_valid_copy (u8_byte *s)
Checks the validity of a UTF-8 string and copies it.
| s | a possibly (probably) valid UTF-8 string. |
U8_EXPORT int u8_validate (u8_byte *s, int n)
Checks if the n bytes starting at s are a valid UTF-8 string.
| s | a possible UTF-8 string |
| n | the number of bytes in the string |
U8_EXPORT int u8_validp (u8_byte *s)
Checks if a string is a valid UTF-8 representation.
| s | a possible UTF-8 string |
U8_EXPORT int u8_validptr (u8_byte *s)
Checks if the start of a string is a valid UTF-8 representation.
This checks only the representation of a single character.
| s | a possible UTF-8 string |
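The distinction between checking one character and checking a whole buffer can be sketched as follows. This is an illustration, not the library's implementation: `sketch_valid_char` and `sketch_validate` are hypothetical names, the nonzero-means-valid return convention is an assumption, and the rejection of overlong encodings, surrogates, and values above U+10FFFF follows the Unicode definition of well-formed UTF-8, which libu8 may or may not enforce.

```c
#include <stddef.h>

/* Single-character check, in the spirit of u8_validptr: returns the
   byte length (1 to 4) of the well-formed sequence at s, or 0 if the
   first n bytes do not begin with one. */
static int sketch_valid_char(const unsigned char *s, size_t n)
{
  if (n == 0) return 0;
  unsigned char b = s[0];
  int len;
  unsigned min;                 /* smallest codepoint allowed at this length */
  if (b < 0x80) return 1;
  else if ((b & 0xE0) == 0xC0) { len = 2; min = 0x80; }
  else if ((b & 0xF0) == 0xE0) { len = 3; min = 0x800; }
  else if ((b & 0xF8) == 0xF0) { len = 4; min = 0x10000; }
  else return 0;
  if ((size_t)len > n) return 0;             /* truncated sequence */
  unsigned cp = b & (0xFFu >> (len + 1));    /* payload bits of the lead byte */
  for (int i = 1; i < len; i++) {
    if ((s[i] & 0xC0) != 0x80) return 0;     /* not a continuation byte */
    cp = (cp << 6) | (s[i] & 0x3Fu);
  }
  if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
    return 0;                                /* overlong, too large, surrogate */
  return len;
}

/* Whole-buffer check, in the spirit of u8_validate: returns 1 if the
   n bytes at s form a sequence of well-formed characters, else 0. */
static int sketch_validate(const char *s, size_t n)
{
  const unsigned char *p = (const unsigned char *)s;
  while (n > 0) {
    int len = sketch_valid_char(p, n);
    if (len == 0) return 0;
    p += len;
    n -= (size_t)len;
  }
  return 1;
}
```

A NUL-terminated variant along the lines of u8_validp could call `sketch_validate(s, strlen(s))`.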