|
M_bool | M_utf8_is_valid (const char *str, const char **endptr) |
|
M_bool | M_utf8_is_valid_cp (M_uint32 cp) |
|
size_t | M_utf8_cnt (const char *str) |
|
M_utf8_error_t | M_utf8_get_cp (const char *str, M_uint32 *cp, const char **next) |
|
M_utf8_error_t | M_utf8_get_chr (const char *str, char *buf, size_t buf_size, size_t *len, const char **next) |
|
M_utf8_error_t | M_utf8_get_chr_buf (const char *str, M_buf_t *buf, const char **next) |
|
char * | M_utf8_next_chr (const char *str) |
|
M_utf8_error_t | M_utf8_from_cp (char *buf, size_t buf_size, size_t *len, M_uint32 cp) |
|
M_utf8_error_t | M_utf8_from_cp_buf (M_buf_t *buf, M_uint32 cp) |
|
M_utf8_error_t | M_utf8_cp_at (const char *str, size_t idx, M_uint32 *cp) |
|
M_utf8_error_t | M_utf8_chr_at (const char *str, char *buf, size_t buf_size, size_t *len, size_t idx) |
|
Targets Unicode 10.0.
- Note
- Non-characters are considered an error condition because they do not have a defined meaning.
A utf-8 sequence is defined as the variable number of bytes that represent a single utf-8 display character.
◆ M_utf8_error_t
Error codes.
Enumerator |
---|
M_UTF8_ERROR_SUCCESS | Success.
|
M_UTF8_ERROR_BAD_START | Start of byte sequence is invalid.
|
M_UTF8_ERROR_TRUNCATED | The utf-8 character length exceeds the data length.
|
M_UTF8_ERROR_EXPECT_CONTINUE | A continuation marker was expected but not found.
|
M_UTF8_ERROR_BAD_CODE_POINT | Code point is invalid.
|
M_UTF8_ERROR_OVERLONG | Overlong encoding encountered.
|
M_UTF8_ERROR_INVALID_PARAM | Input parameter is invalid.
|
◆ M_utf8_is_valid()
M_bool M_utf8_is_valid |
( |
const char * |
str, |
|
|
const char ** |
endptr |
|
) |
| |
Check if a given string is valid utf-8 encoded.
- Parameters
-
[in] | str | utf-8 string. |
[out] | endptr | On success, will be set to the NULL terminator. On error, will be set to the character that caused the failure. |
- Returns
- M_TRUE if str is a valid utf-8 sequence. Otherwise, M_FALSE.
◆ M_utf8_is_valid_cp()
M_bool M_utf8_is_valid_cp |
( |
M_uint32 |
cp | ) |
|
Check if a given code point is valid for utf-8.
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if code point is valid for utf-8. Otherwise, M_FALSE.
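As an illustration of what "valid for utf-8" can mean, here is a minimal sketch of a code point check. It is not the library's implementation: `sketch_cp_is_valid` is a hypothetical name, and the exact set of values M_utf8_is_valid_cp rejects is an assumption based on the note above that non-characters are treated as errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of mstdlib. */
bool sketch_cp_is_valid(uint32_t cp)
{
    /* Beyond the last Unicode code point. */
    if (cp > 0x10FFFF)
        return false;

    /* UTF-16 surrogate halves are not encodable in utf-8. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;

    /* Non-characters: U+FDD0..U+FDEF ... */
    if (cp >= 0xFDD0 && cp <= 0xFDEF)
        return false;

    /* ... and the last two code points of every plane (U+xxFFFE, U+xxFFFF). */
    if ((cp & 0xFFFE) == 0xFFFE)
        return false;

    return true;
}
```

Under these assumptions U+0041 and U+20AC pass, while U+D800, U+FFFE and 0x110000 are rejected.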
◆ M_utf8_cnt()
size_t M_utf8_cnt |
( |
const char * |
str | ) |
|
Get the number of utf-8 characters in a string.
This is the number of characters, not the number of bytes, in the string. M_str_len returns the same value only if the string is entirely ASCII.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- Number of characters on success, 0 on failure. Use M_str_isempty to determine whether 0 indicates a failure or an empty string.
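To make the difference between a character count and a byte count concrete, the following sketch counts sequences by counting every byte that is not a continuation byte (bit pattern 10xxxxxx). `sketch_utf8_count` is a hypothetical name. Unlike M_utf8_cnt, this sketch performs no validation, so it cannot report failure.

```c
#include <stddef.h>

/* Hypothetical helper, not part of mstdlib.
 * Counts lead bytes only; continuation bytes (10xxxxxx) are skipped. */
size_t sketch_utf8_count(const char *str)
{
    size_t cnt = 0;

    if (str == NULL)
        return 0;

    for (; *str != '\0'; str++) {
        if (((unsigned char)*str & 0xC0) != 0x80)
            cnt++;
    }
    return cnt;
}
```

For the string "h\xC3\xA9llo" the byte length is 6 but the sketch counts 5 characters, because the two-byte sequence for U+00E9 is counted once.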
◆ M_utf8_get_cp()
M_utf8_error_t M_utf8_get_cp |
( |
const char * |
str, |
|
|
M_uint32 * |
cp, |
|
|
const char ** |
next |
|
) |
| |
Read a utf-8 sequence as a code point.
- Parameters
-
[in] | str | utf-8 string. |
[out] | cp | Code point. Can be NULL. |
[out] | next | Start of next character. Will point to NULL terminator if last character. |
- Returns
- Result.
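The sketch below shows one way a single sequence can be decoded into a code point, and where each class of error listed in M_utf8_error_t might arise. It is an illustration only: `sketch_err_t` and `sketch_utf8_decode` are hypothetical names, the mapping of conditions to error codes is an assumption, and so is the treatment of an empty string as an invalid parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes that loosely mirror M_utf8_error_t. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_BAD_START,
    SKETCH_TRUNCATED,
    SKETCH_EXPECT_CONTINUE,
    SKETCH_BAD_CODE_POINT,
    SKETCH_OVERLONG,
    SKETCH_INVALID_PARAM
} sketch_err_t;

/* Hypothetical helper, not part of mstdlib. cp and next may be NULL. */
sketch_err_t sketch_utf8_decode(const char *str, uint32_t *cp, const char **next)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    const unsigned char  *s = (const unsigned char *)str;
    uint32_t              v;
    size_t                len;
    size_t                i;

    if (s == NULL || s[0] == '\0')
        return SKETCH_INVALID_PARAM;

    /* The lead byte determines the sequence length and the first payload bits. */
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else return SKETCH_BAD_START;

    /* Each following byte must be 10xxxxxx and adds six payload bits. */
    for (i = 1; i < len; i++) {
        if (s[i] == '\0')
            return SKETCH_TRUNCATED;
        if ((s[i] & 0xC0) != 0x80)
            return SKETCH_EXPECT_CONTINUE;
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* A value that would fit in a shorter sequence is an overlong encoding. */
    if (v < min_cp[len])
        return SKETCH_OVERLONG;

    /* Out of range, surrogates, and non-characters. */
    if (v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF) ||
        (v >= 0xFDD0 && v <= 0xFDEF) || (v & 0xFFFE) == 0xFFFE)
        return SKETCH_BAD_CODE_POINT;

    if (cp != NULL)
        *cp = v;
    if (next != NULL)
        *next = str + len;
    return SKETCH_OK;
}
```

Calling the sketch in a loop and feeding `next` back in as `str` until it points at the terminator walks a string one code point at a time, which is the iteration pattern the `next` parameter is designed for.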
◆ M_utf8_get_chr()
M_utf8_error_t M_utf8_get_chr |
( |
const char * |
str, |
|
|
char * |
buf, |
|
|
size_t |
buf_size, |
|
|
size_t * |
len, |
|
|
const char ** |
next |
|
) |
| |
Read a utf-8 sequence.
Output is not NULL terminated.
- Parameters
-
[in] | str | utf-8 string. |
[in] | buf | Buffer to put utf-8 sequence. Can be NULL. |
[in] | buf_size | Size of the buffer. |
[out] | len | Length of the sequence that was put into buffer. |
[out] | next | Start of next character. Will point to NULL terminator if last character. |
- Returns
- Result.
◆ M_utf8_get_chr_buf()
Read a utf-8 sequence into an M_buf_t.
- Parameters
-
[in] | str | utf-8 string. |
[in] | buf | Buffer to put utf-8 sequence. |
[out] | next | Start of next character. Will point to NULL terminator if last character. |
- Returns
- Result.
◆ M_utf8_next_chr()
char * M_utf8_next_chr |
( |
const char * |
str | ) |
|
Get the location of the next utf-8 sequence.
Does not validate characters. Useful when parsing an invalid string and you need to move past invalid characters in order to ignore or replace them.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- Pointer to next character in sequence.
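One plausible way to advance without validating is to step over one byte and then over any continuation bytes that follow it. The sketch below does exactly that. `sketch_utf8_skip` is a hypothetical name, and whether M_utf8_next_chr uses the same strategy is an assumption.

```c
#include <stddef.h>

/* Hypothetical helper, not part of mstdlib.
 * Returns str unchanged if it is NULL or already at the terminator. */
const char *sketch_utf8_skip(const char *str)
{
    if (str == NULL || *str == '\0')
        return str;

    /* Step over the current byte, whatever it is ... */
    str++;

    /* ... then over any continuation bytes (10xxxxxx) that follow. */
    while (((unsigned char)*str & 0xC0) == 0x80)
        str++;

    return str;
}
```

Because nothing is validated, a run of stray continuation bytes is skipped as a single unit, which is convenient when replacing invalid input with a substitute character.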
◆ M_utf8_from_cp()
M_utf8_error_t M_utf8_from_cp |
( |
char * |
buf, |
|
|
size_t |
buf_size, |
|
|
size_t * |
len, |
|
|
M_uint32 |
cp |
|
) |
| |
Convert a code point to a utf-8 sequence.
Output is not NULL terminated.
- Parameters
-
[in] | buf | Buffer to put utf-8 sequence. |
[in] | buf_size | Size of the buffer. |
[out] | len | Length of the sequence that was put into buffer. |
[in] | cp | Code point to convert from. |
- Returns
- Result.
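For reference, encoding a code point as utf-8 can be sketched as follows. `sketch_utf8_encode` is a hypothetical name, and its return convention (bytes written, or 0 on any failure) is a simplification chosen for this example; M_utf8_from_cp reports the length through `len` and returns an M_utf8_error_t instead. As with the library function, the output is not terminated.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of mstdlib.
 * Returns the number of bytes written, or 0 if cp is out of range,
 * is a surrogate, or does not fit in buf. */
size_t sketch_utf8_encode(char *buf, size_t buf_size, uint32_t cp)
{
    size_t len;

    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    /* Pick the shortest sequence that can hold the value. */
    if (cp < 0x80)         len = 1;
    else if (cp < 0x800)   len = 2;
    else if (cp < 0x10000) len = 3;
    else                   len = 4;

    if (buf == NULL || buf_size < len)
        return 0;

    switch (len) {
        case 1:
            buf[0] = (char)cp;
            break;
        case 2:
            buf[0] = (char)(0xC0 | (cp >> 6));
            buf[1] = (char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            buf[0] = (char)(0xE0 | (cp >> 12));
            buf[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (char)(0x80 | (cp & 0x3F));
            break;
        default:
            buf[0] = (char)(0xF0 | (cp >> 18));
            buf[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (char)(0x80 | (cp & 0x3F));
            break;
    }
    return len;
}
```

A four-byte buffer is always large enough for a single sequence. With this sketch U+20AC produces the three bytes E2 82 AC.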
◆ M_utf8_from_cp_buf()
Convert a code point to a utf-8 sequence writing to an M_buf_t.
- Parameters
-
[in] | buf | Buffer to put utf-8 sequence. |
[in] | cp | Code point to convert from. |
- Returns
- Result.
◆ M_utf8_cp_at()
M_utf8_error_t M_utf8_cp_at |
( |
const char * |
str, |
|
|
size_t |
idx, |
|
|
M_uint32 * |
cp |
|
) |
| |
Get the code point at a given index.
The index is a character index as counted by M_utf8_cnt, not a byte offset. Each call has to scan the string from the beginning, so use M_utf8_get_cp when iterating.
- Parameters
-
[in] | str | utf-8 string. |
[in] | idx | Index. |
[out] | cp | Code point. |
- Returns
- Result.
◆ M_utf8_chr_at()
M_utf8_error_t M_utf8_chr_at |
( |
const char * |
str, |
|
|
char * |
buf, |
|
|
size_t |
buf_size, |
|
|
size_t * |
len, |
|
|
size_t |
idx |
|
) |
| |
Get the utf-8 sequence at a given index.
The index is a character index as counted by M_utf8_cnt, not a byte offset. Each call has to scan the string from the beginning, so use M_utf8_get_chr when iterating.
- Parameters
-
[in] | str | utf-8 string. |
[in] | buf | Buffer to put utf-8 sequence. |
[in] | buf_size | Size of the buffer. |
[out] | len | Length of the sequence that was put into buffer. |
[in] | idx | Index. |
- Returns
- Result.