UTF-8 Checking and Validation
Validate whether a UTF-8 code point, sequence, or string consists of a given type of character.
◆ M_utf8_islower_cp()
M_bool M_utf8_islower_cp(M_uint32 cp)
Checks for a lower-case code point.
Derived Core Properties: Lowercase. -> General Category: Ll + Other_Lowercase
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if lowercase. Otherwise M_FALSE.
◆ M_utf8_islower_chr()
M_bool M_utf8_islower_chr(const char *str, const char **next)
Checks if a utf-8 sequence is lower-case.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if lowercase. Otherwise M_FALSE.
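The relationship between a `_cp` check, a `_chr` check, and the `next` output can be sketched as follows. This is a minimal, self-contained illustration and not the library's implementation: `sketch_decode`, `sketch_islower_cp`, and `sketch_islower_chr` are hypothetical names, the predicate knows only ASCII and a few Latin-1 letters, and leaving `next` untouched on failure is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a code point predicate. It only knows ASCII
 * a-z and the Latin-1 letters U+00DF..U+00F6 and U+00F8..U+00FF. A real
 * implementation would consult the full Unicode Lowercase tables. */
static bool sketch_islower_cp(uint32_t cp)
{
    if (cp >= 0x61 && cp <= 0x7A)
        return true;
    return cp >= 0xDF && cp <= 0xFF && cp != 0xF7;
}

/* Decode one UTF-8 sequence starting at str. On success, store the code
 * point in *cp and the start of the following sequence in *next, then
 * return true. Return false on NULL, empty, or malformed input. */
static bool sketch_decode(const char *str, uint32_t *cp, const char **next)
{
    const unsigned char *s = (const unsigned char *)str;
    uint32_t             v;
    size_t               len;
    size_t               i;

    if (s == NULL || s[0] == 0)
        return false;

    if (s[0] < 0x80) {
        v = s[0];        len = 1;
    } else if ((s[0] & 0xE0) == 0xC0) {
        v = s[0] & 0x1F; len = 2;
    } else if ((s[0] & 0xF0) == 0xE0) {
        v = s[0] & 0x0F; len = 3;
    } else if ((s[0] & 0xF8) == 0xF0) {
        v = s[0] & 0x07; len = 4;
    } else {
        return false;
    }

    for (i = 1; i < len; i++) {
        /* A NUL terminator is not a continuation byte, so a truncated
         * sequence is rejected here without reading past the string. */
        if ((s[i] & 0xC0) != 0x80)
            return false;
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    /* Reject overlong encodings, surrogates, and out-of-range values. */
    if ((len == 2 && v < 0x80) || (len == 3 && v < 0x800) ||
        (len == 4 && v < 0x10000) || (v >= 0xD800 && v <= 0xDFFF) ||
        v > 0x10FFFF)
        return false;

    *cp   = v;
    *next = str + len;
    return true;
}

/* Sequence-level check: decode one character, report where the next one
 * starts, and apply the code point predicate. */
static bool sketch_islower_chr(const char *str, const char **next)
{
    uint32_t    cp;
    const char *end;

    if (!sketch_decode(str, &cp, &end))
        return false;
    if (next != NULL)
        *next = end;
    return sketch_islower_cp(cp);
}
```

Calling `sketch_islower_chr` on a string that starts with a two-byte sequence advances `next` by two bytes, so the caller can feed `next` straight back in to examine the following character.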
◆ M_utf8_islower()
M_bool M_utf8_islower(const char *str)
Checks if a utf-8 string is lower-case.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if lowercase. Otherwise M_FALSE.
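A whole-string check can plausibly be built by walking the string one character at a time with a `_chr`-style checker and following its `next` output. The sketch below shows that loop under stated assumptions: `sketch_chr_fn`, `sketch_ascii_islower_chr`, and `sketch_all` are hypothetical names, the checker is an ASCII-only stand-in, and treating an empty string as a failed check is an assumption rather than documented behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Shape of a *_chr style checker: test one character, report the next. */
typedef bool (*sketch_chr_fn)(const char *str, const char **next);

/* ASCII-only stand-in for a lower-case *_chr checker. */
static bool sketch_ascii_islower_chr(const char *str, const char **next)
{
    unsigned char c = (unsigned char)*str;

    if (c == 0 || c >= 0x80)
        return false;
    *next = str + 1;
    return c >= 0x61 && c <= 0x7A;
}

/* Return true only if every character in str passes check. */
static bool sketch_all(const char *str, sketch_chr_fn check)
{
    const char *next;

    /* Assumption: NULL or empty input does not pass. */
    if (str == NULL || *str == '\0')
        return false;

    while (*str != '\0') {
        next = str;
        if (!check(str, &next))
            return false;
        str = next;
    }
    return true;
}
```

The loop stops at the first failing character, so a single upper-case letter anywhere in the input makes the whole check fail.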
◆ M_utf8_isupper_cp()
M_bool M_utf8_isupper_cp(M_uint32 cp)
Checks for an upper-case code point.
Derived Core Properties: Uppercase. -> General Category: Lu + Other_Uppercase
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if uppercase. Otherwise M_FALSE.
◆ M_utf8_isupper_chr()
M_bool M_utf8_isupper_chr(const char *str, const char **next)
Checks if a utf-8 sequence is upper-case.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if uppercase. Otherwise M_FALSE.
◆ M_utf8_isupper()
M_bool M_utf8_isupper(const char *str)
Checks if a utf-8 string is upper-case.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if uppercase. Otherwise M_FALSE.
◆ M_utf8_isalpha_cp()
M_bool M_utf8_isalpha_cp(M_uint32 cp)
Checks for an alphabetic code point.
Derived Core Properties: Alphabetic. -> Lowercase + Uppercase + Lt + Lm + Lo + Nl + Other_Alphabetic
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if alphabetic. Otherwise M_FALSE.
◆ M_utf8_isalpha_chr()
M_bool M_utf8_isalpha_chr(const char *str, const char **next)
Checks if a utf-8 sequence is alphabetic.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if alphabetic. Otherwise M_FALSE.
◆ M_utf8_isalpha()
M_bool M_utf8_isalpha(const char *str)
Checks if a utf-8 string is alphabetic.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if alphabetic. Otherwise M_FALSE.
◆ M_utf8_isalnum_cp()
M_bool M_utf8_isalnum_cp(M_uint32 cp)
Checks for an alphabetic or numeric code point.
Alphabetic + Nd + Nl + No.
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if alphanumeric. Otherwise M_FALSE.
◆ M_utf8_isalnum_chr()
M_bool M_utf8_isalnum_chr(const char *str, const char **next)
Checks if a utf-8 sequence is alphabetic or numeric.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if alphanumeric. Otherwise M_FALSE.
◆ M_utf8_isalnum()
M_bool M_utf8_isalnum(const char *str)
Checks if a utf-8 string is alphabetic or numeric.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if alphanumeric. Otherwise M_FALSE.
◆ M_utf8_isnum_cp()
M_bool M_utf8_isnum_cp(M_uint32 cp)
Checks for a numeric code point.
General Category: Nd, Nl, No
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if numeric. Otherwise M_FALSE.
◆ M_utf8_isnum_chr()
M_bool M_utf8_isnum_chr(const char *str, const char **next)
Checks if a utf-8 sequence is numeric.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if numeric. Otherwise M_FALSE.
◆ M_utf8_isnum()
M_bool M_utf8_isnum(const char *str)
Checks if a utf-8 string is numeric.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if numeric. Otherwise M_FALSE.
◆ M_utf8_iscntrl_cp()
M_bool M_utf8_iscntrl_cp(M_uint32 cp)
Checks for a control character code point.
General Category: Cc
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if control. Otherwise M_FALSE.
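General Category Cc is small enough to write out by hand: it covers the C0 controls U+0000..U+001F, DELETE U+007F, and the C1 controls U+0080..U+009F. The sketch below encodes those two ranges directly. `sketch_is_cc` is a hypothetical name used for illustration, and how the library itself stores the category is not stated here.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp has Unicode General Category Cc:
 * U+0000..U+001F (C0 controls) or U+007F..U+009F (DELETE and C1 controls). */
static bool sketch_is_cc(uint32_t cp)
{
    return cp <= 0x1F || (cp >= 0x7F && cp <= 0x9F);
}
```

Note that U+0020 (space) and U+00A0 (no-break space) sit just outside the two ranges and are not control characters.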
◆ M_utf8_iscntrl_chr()
M_bool M_utf8_iscntrl_chr(const char *str, const char **next)
Checks if a utf-8 sequence is a control character.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if control. Otherwise M_FALSE.
◆ M_utf8_iscntrl()
M_bool M_utf8_iscntrl(const char *str)
Checks if a utf-8 string consists of control characters.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if control. Otherwise M_FALSE.
◆ M_utf8_ispunct_cp()
M_bool M_utf8_ispunct_cp(M_uint32 cp)
Checks for a punctuation code point.
General Category: Pc + Pd + Ps + Pe + Pi + Pf + Po
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if punctuation. Otherwise M_FALSE.
◆ M_utf8_ispunct_chr()
M_bool M_utf8_ispunct_chr(const char *str, const char **next)
Checks if a utf-8 sequence is punctuation.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if punctuation. Otherwise M_FALSE.
◆ M_utf8_ispunct()
M_bool M_utf8_ispunct(const char *str)
Checks if a utf-8 string is punctuation.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if punctuation. Otherwise M_FALSE.
◆ M_utf8_isprint_cp()
M_bool M_utf8_isprint_cp(M_uint32 cp)
Checks for a printable code point.
Defined as tables L, M, N, P, S, ASCII space, and UniHan.
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if printable. Otherwise M_FALSE.
◆ M_utf8_isprint_chr()
M_bool M_utf8_isprint_chr(const char *str, const char **next)
Checks if a utf-8 sequence is printable.
Defined as tables L, M, N, P, S, ASCII space, and UniHan.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if printable. Otherwise M_FALSE.
◆ M_utf8_isprint()
M_bool M_utf8_isprint(const char *str)
Checks if a utf-8 string is printable.
Defined as tables L, M, N, P, S, ASCII space, and UniHan.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if printable. Otherwise M_FALSE.
◆ M_utf8_isunihan_cp()
M_bool M_utf8_isunihan_cp(M_uint32 cp)
Checks for a unihan code point.
- Parameters
-
[in] | cp | Code point. |
- Returns
- M_TRUE if unihan. Otherwise M_FALSE.
◆ M_utf8_isunihan_chr()
M_bool M_utf8_isunihan_chr(const char *str, const char **next)
Checks if a utf-8 sequence is unihan.
- Parameters
-
[in] | str | utf-8 string. |
[out] | next | Start of the next character. Points to the NUL terminator if this is the last character. |
- Returns
- M_TRUE if unihan. Otherwise M_FALSE.
◆ M_utf8_isunihan()
M_bool M_utf8_isunihan(const char *str)
Checks if a utf-8 string is unihan.
- Parameters
-
[in] | str | utf-8 string. |
- Returns
- M_TRUE if unihan. Otherwise M_FALSE.