Mstdlib-1.24.0
Text codec conversion, for example UTF-8 to X and X to UTF-8.
UTF-8 is the base codec: input to encode must be UTF-8, and output from decode is always UTF-8.
Codec | Name | Alias |
---|---|---|
UTF8 | utf8 | utf-8, utf_8 |
ASCII | ascii | us-ascii |
CP037 | cp037 | ibm037, ibm-037, ibm039, ibm-039 |
CP500 | cp500 | ibm500, ibm-500, ebcdic-cp-be, ebcdic-cp-ch |
CP874 | cp874 | windows-874 |
CP1250 | cp1250 | windows-1250 |
CP1251 | cp1251 | windows-1251 |
CP1252 | cp1252 | windows-1252 |
CP1253 | cp1253 | windows-1253 |
CP1254 | cp1254 | windows-1254 |
CP1255 | cp1255 | windows-1255 |
CP1256 | cp1256 | windows-1256 |
CP1257 | cp1257 | windows-1257 |
CP1258 | cp1258 | windows-1258 |
ISO8859_1 | latin_1 | latin-1, latin1, latin 1, latin, l1, iso-8859-1, iso8859-1, iso8859_1, iso88591, 8859, 88591, cp819 |
ISO8859_2 | latin_2 | latin-2, latin2, latin 2, l2, iso-8859-2, iso8859-2, iso8859_2, iso88592, 88592 |
ISO8859_3 | latin_3 | latin-3, latin3, latin 3, l3, iso-8859-3, iso8859-3, iso8859_3, iso88593, 88593 |
ISO8859_4 | latin_4 | latin-4, latin4, latin 4, l4, iso-8859-4, iso8859-4, iso8859_4, iso88594, 88594 |
ISO8859_5 | cyrillic | iso-8859-5, iso8859-5, iso8859_5, iso88595, 88595 |
ISO8859_6 | arabic | iso-8859-6, iso8859-6, iso8859_6, iso88596, 88596 |
ISO8859_7 | greek | iso-8859-7, iso8859-7, iso8859_7, iso88597, 88597 |
ISO8859_8 | hebrew | iso-8859-8, iso8859-8, iso8859_8, iso88598, 88598 |
ISO8859_9 | latin_5 | latin-5, latin5, latin 5, l5, iso-8859-9, iso8859-9, iso8859_9, iso88599, 88599 |
ISO8859_10 | latin_6 | latin-6, latin6, latin 6, l6, iso-8859-10, iso8859-10, iso8859_10, iso885910, 885910 |
ISO8859_11 | thai | iso-8859-11, iso8859-11, iso8859_11, iso885911, 885911 |
ISO8859_13 | latin_7 | latin-7, latin7, latin 7, l7, iso-8859-13, iso8859-13, iso8859_13, iso885913, 885913 |
ISO8859_14 | latin_8 | latin-8, latin8, latin 8, l8, iso-8859-14, iso8859-14, iso8859_14, iso885914, 885914 |
ISO8859_15 | latin_9 | latin-9, latin9, latin 9, l9, iso-8859-15, iso8859-15, iso8859_15, iso885915, 885915 |
ISO8859_16 | latin_10 | latin-10, latin10, latin 10, l10, iso-8859-16, iso8859-16, iso8859_16, iso885916, 885916 |
PERCENT_URL | percent | url |
PERCENT_FORM | application/x-www-form-urlencoded | x-www-form-urlencoded, www-form-urlencoded, form-urlencoded, percent_plus, url_plus, percent-plus, url-plus, percentplus, urlplus |
PERCENT_URLMIN | percent_min | url_min |
PERCENT_FORMMIN | form_min | form-urlencoded-min |
PUNYCODE | punycode | puny |
QUOTED_PRINTABLE | quoted-printable | qp |
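The table lists both a percent (URL) codec and a form-urlencoded codec. As a rough illustration of how the two styles usually differ, the sketch below percent-encodes a byte string and, in form style, writes a space as `+` instead of `%20`. It is a standalone sketch and not Mstdlib's implementation: the `sketch_` names are hypothetical, and the set of characters left unescaped (the RFC 3986 unreserved set) is an assumption that may not match what the library's codecs use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* RFC 3986 "unreserved" characters; assumed here, not taken from Mstdlib. */
static int sketch_is_unreserved(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           c == '-' || c == '.' || c == '_' || c == '~';
}

/* Percent-encode `in`. When form_style is non-zero a space becomes '+'.
 * Returns a malloc'd string the caller must free, or NULL on allocation failure. */
char *sketch_percent_encode(const char *in, int form_style)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t            len, i, o = 0;
    char             *out;

    assert(in != NULL);
    len = strlen(in);
    /* Worst case: every byte becomes a three-character %XX escape. */
    out = malloc(len * 3 + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)in[i];
        if (sketch_is_unreserved(c)) {
            out[o++] = (char)c;
        } else if (form_style && c == ' ') {
            out[o++] = '+';
        } else {
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    out[o] = '\0';
    return out;
}
```

With this sketch, `sketch_percent_encode("a b&c", 0)` yields `a%20b%26c`, while passing `1` for `form_style` yields `a+b%26c`.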
If validating UTF-8 strings, use M_utf8_is_valid().
UTF-8 to UTF-8 conversion is supported for both decode and encode and is intended to be used with the replace error handler, specifically when dealing with UTF-8 strings that are known to be, or could be, invalid and need to be "sanitized" for continued use. The difference between encode and decode for UTF-8 to UTF-8 is the replacement character.
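To make the idea of sanitizing with a replacement character concrete, here is a minimal standalone sketch that copies well-formed UTF-8 sequences through unchanged and substitutes a caller-supplied replacement for each invalid byte. It does not call Mstdlib and is not its implementation: the `sketch_` names are hypothetical, replacing one invalid byte at a time is an assumed policy, and the two replacement strings shown in the usage note are illustrative choices only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length (1-4) of the well-formed UTF-8 sequence at s, or 0 if it is invalid. */
static size_t sketch_utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t        n, i;
    unsigned long cp;

    if (avail == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;
    if ((s[0] & 0xE0) == 0xC0)      { n = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; cp = s[0] & 0x07; }
    else
        return 0; /* stray continuation byte or invalid lead byte */

    if (avail < n)
        return 0; /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    /* Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF. */
    if ((n == 2 && cp < 0x80) || (n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000))
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;
    return n;
}

/* Copy `in`, substituting `replacement` for every invalid byte.
 * Returns a malloc'd string the caller must free, or NULL on allocation failure. */
char *sketch_utf8_sanitize(const char *in, const char *replacement)
{
    size_t len, rlen, i = 0, o = 0;
    char  *out;

    assert(in != NULL && replacement != NULL);
    len  = strlen(in);
    rlen = strlen(replacement);
    /* Worst case: every input byte is replaced. */
    out = malloc(len * (rlen > 1 ? rlen : 1) + 1);
    if (out == NULL)
        return NULL;

    while (i < len) {
        size_t n = sketch_utf8_seq_len((const unsigned char *)in + i, len - i);
        if (n == 0) {
            memcpy(out + o, replacement, rlen);
            o += rlen;
            i++;
        } else {
            memcpy(out + o, in + i, n);
            o += n;
            i += n;
        }
    }
    out[o] = '\0';
    return out;
}
```

Calling `sketch_utf8_sanitize("a\xFF" "b", "?")` produces `a?b`, while passing `"\xEF\xBF\xBD"` (the UTF-8 form of U+FFFD) as the replacement produces `a`, U+FFFD, `b`. Valid input is returned unchanged.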
enum M_textcodec_codec_t
Text codecs that can be used for encoding and decoding.
enum M_textcodec_error_t
Result of a codec conversion.
M_textcodec_error_t M_textcodec_encode(char **out, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Encode a utf-8 string using the requested text encoding.
[out] | out | Encoded string. |
[in] | in | Input utf-8 string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding to use for output. |
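The role of the error handler during encoding can be illustrated with a small standalone sketch that encodes UTF-8 as ISO-8859-1 and takes one of three actions when a code point has no Latin-1 representation: fail, replace, or ignore. This is not Mstdlib code. The `sketch_` names and enum values are hypothetical, using `?` as the replacement is an assumption, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler names for this sketch only. */
typedef enum {
    SKETCH_EH_FAIL,    /* stop and report failure */
    SKETCH_EH_REPLACE, /* substitute '?' (assumed replacement) */
    SKETCH_EH_IGNORE   /* drop the unrepresentable code point */
} sketch_ehandler_t;

/* Encode UTF-8 `in` as ISO-8859-1. Returns 0 on success and stores a
 * malloc'd string in *out; returns -1 on failure and leaves *out NULL. */
int sketch_utf8_to_latin1(char **out, const char *in, sketch_ehandler_t eh)
{
    size_t len, i = 0, o = 0;
    char  *buf;

    assert(out != NULL && in != NULL);
    *out = NULL;
    len  = strlen(in);
    buf  = malloc(len + 1); /* the output is never longer than the input */
    if (buf == NULL)
        return -1;

    while (i < len) {
        unsigned char b = (unsigned char)in[i];
        if (b < 0x80) {
            /* ASCII maps to itself. */
            buf[o++] = (char)b;
            i++;
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < len &&
                   ((unsigned char)in[i + 1] & 0xC0) == 0x80) {
            /* Two-byte sequences with these lead bytes cover U+0080..U+00FF. */
            unsigned char b2 = (unsigned char)in[i + 1];
            buf[o++] = (char)(((b & 0x03) << 6) | (b2 & 0x3F));
            i += 2;
        } else {
            /* Not representable: skip the lead byte and its continuation bytes. */
            i++;
            while (i < len && ((unsigned char)in[i] & 0xC0) == 0x80)
                i++;
            if (eh == SKETCH_EH_FAIL) {
                free(buf);
                return -1;
            }
            if (eh == SKETCH_EH_REPLACE)
                buf[o++] = '?';
        }
    }
    buf[o] = '\0';
    *out   = buf;
    return 0;
}
```

For the input `"\xE2\x82\xAC" "5"` (a euro sign followed by `5`), the replace handler yields `?5`, the ignore handler yields `5`, and the fail handler returns -1 without producing output.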
M_textcodec_error_t M_textcodec_encode_buf(M_buf_t *buf, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Encode a utf-8 string into an M_buf_t using the requested text encoding.
[in] | buf | Buffer to put encoded string data. |
[in] | in | Input utf-8 string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding to use for output. |
M_textcodec_error_t M_textcodec_encode_parser(M_parser_t *parser, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Encode a utf-8 string into an M_parser_t using the requested text encoding.
[in] | parser | Parser to put encoded string data. |
[in] | in | Input utf-8 string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding to use for output. |
M_textcodec_error_t M_textcodec_decode(char **out, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Decode a string to utf-8.
[out] | out | utf-8 string. |
[in] | in | Input encoded string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding of the input string. |
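As an illustration of what decoding to UTF-8 involves for a single-byte codec, the following standalone sketch converts an ISO-8859-1 string to UTF-8. It is not Mstdlib's implementation and the function name is hypothetical. ISO-8859-1 is a convenient case because every byte value maps directly to the Unicode code point with the same number, so no error handler is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode an ISO-8859-1 string to UTF-8.
 * Returns a malloc'd string the caller must free, or NULL on allocation failure. */
char *sketch_latin1_to_utf8(const char *in)
{
    size_t len, i, o = 0;
    char  *out;

    assert(in != NULL);
    len = strlen(in);
    /* Worst case: every input byte expands to two UTF-8 bytes. */
    out = malloc(len * 2 + 1);
    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned char b = (unsigned char)in[i];
        if (b < 0x80) {
            /* ASCII range is identical in both encodings. */
            out[o++] = (char)b;
        } else {
            /* U+0080..U+00FF encode as a two-byte UTF-8 sequence. */
            out[o++] = (char)(0xC0 | (b >> 6));
            out[o++] = (char)(0x80 | (b & 0x3F));
        }
    }
    out[o] = '\0';
    return out;
}
```

For example, the Latin-1 bytes `caf\xE9` decode to the UTF-8 bytes `caf\xC3\xA9`.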
M_textcodec_error_t M_textcodec_decode_buf(M_buf_t *buf, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Decode a string to utf-8 into an M_buf_t.
[in] | buf | Buffer to put decoded utf-8 string data. |
[in] | in | Input encoded string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding of the input string. |
M_textcodec_error_t M_textcodec_decode_parser(M_parser_t *parser, const char *in, M_textcodec_ehandler_t ehandler, M_textcodec_codec_t codec)
Decode a string to utf-8 into an M_parser_t.
[in] | parser | Parser to put decoded utf-8 string data. |
[in] | in | Input encoded string. |
[in] | ehandler | Error handling logic to use. |
[in] | codec | Encoding of the input string. |
M_bool M_textcodec_error_is_error(M_textcodec_error_t err)
Returns whether the error code represents a failure.
[in] | err | Error to evaluate. |
M_textcodec_codec_t M_textcodec_codec_from_str(const char *s)
Get the codec from the string name.
[in] | s | Codec as a string. |
const char *M_textcodec_codec_to_str(M_textcodec_codec_t codec)
Convert the codec to its string name.
[in] | codec | Codec. |
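The mapping between codec values, names, and aliases can be pictured as a simple lookup table. The sketch below is a standalone illustration built from a few rows of the codec table above. It is not Mstdlib's implementation: the `sketch_` identifiers are hypothetical, and matching names case-insensitively is an assumption of the sketch rather than documented library behavior.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical codec identifiers for this sketch only. */
typedef enum {
    SKETCH_CODEC_UNKNOWN = 0,
    SKETCH_CODEC_UTF8,
    SKETCH_CODEC_ASCII,
    SKETCH_CODEC_ISO8859_1
} sketch_codec_t;

/* The first entry for each codec is treated as its canonical name. */
static const struct {
    const char    *name;
    sketch_codec_t codec;
} sketch_names[] = {
    { "utf8",       SKETCH_CODEC_UTF8 },
    { "utf-8",      SKETCH_CODEC_UTF8 },
    { "utf_8",      SKETCH_CODEC_UTF8 },
    { "ascii",      SKETCH_CODEC_ASCII },
    { "us-ascii",   SKETCH_CODEC_ASCII },
    { "latin_1",    SKETCH_CODEC_ISO8859_1 },
    { "latin-1",    SKETCH_CODEC_ISO8859_1 },
    { "latin1",     SKETCH_CODEC_ISO8859_1 },
    { "iso-8859-1", SKETCH_CODEC_ISO8859_1 },
    { "cp819",      SKETCH_CODEC_ISO8859_1 }
};

/* Case-insensitive string equality (an assumption of this sketch). */
static int sketch_ieq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Look up a codec by name or alias. */
sketch_codec_t sketch_codec_from_str(const char *s)
{
    size_t i;

    if (s == NULL)
        return SKETCH_CODEC_UNKNOWN;
    for (i = 0; i < sizeof(sketch_names) / sizeof(sketch_names[0]); i++) {
        if (sketch_ieq(s, sketch_names[i].name))
            return sketch_names[i].codec;
    }
    return SKETCH_CODEC_UNKNOWN;
}

/* Return the canonical name for a codec, or NULL if it is not in the table. */
const char *sketch_codec_to_str(sketch_codec_t codec)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_names) / sizeof(sketch_names[0]); i++) {
        if (sketch_names[i].codec == codec)
            return sketch_names[i].name;
    }
    return NULL;
}
```

In this sketch, `sketch_codec_from_str("cp819")` resolves to the ISO-8859-1 value, and converting that value back with `sketch_codec_to_str` returns the canonical name `latin_1` rather than the alias that was supplied.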