pub const fn is_noncharacter(c: char) -> bool
Checks whether a codepoint is a “noncharacter”, that is, one of a fixed set of Unicode codepoints that are permanently reserved.
A noncharacter is a codepoint that is in the range U+FDD0 to U+FDEF, inclusive, or U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, or U+10FFFF.
Equivalently, the 66 noncharacters are:
- the 32 codepoints from U+FDD0 to U+FDEF, inclusive,
- the last two codepoints (U+nFFFE and U+nFFFF) of each of the 17 Unicode planes, 34 in total. These are guaranteed never to be assigned to a character, per the Unicode Standard (Section 3.2, Conformance Requirements, and Section 3.4, Characters and Encoding).
See also: WHATWG Infra Standard definition
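The definition above can be sketched as a range check plus a bit mask. This is a minimal illustration, not necessarily how the crate implements it; the name `is_noncharacter_sketch` is hypothetical and exists only for this example.

```rust
/// Hypothetical stand-in that mirrors the definition above.
/// It is not the crate's actual implementation.
pub const fn is_noncharacter_sketch(c: char) -> bool {
    let cp = c as u32;
    // The contiguous block U+FDD0..=U+FDEF (32 codepoints).
    let in_block = cp >= 0xFDD0 && cp <= 0xFDEF;
    // The last two codepoints of every plane: the low 16 bits are
    // 0xFFFE or 0xFFFF, so masking off the lowest bit leaves 0xFFFE.
    let plane_tail = (cp & 0xFFFE) == 0xFFFE;
    in_block || plane_tail
}
```

Masking the low 16 bits avoids listing all 34 plane-final codepoints individually, and every operation used is permitted in a `const fn`.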
§Examples
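use whatwg_infra::scalar::is_noncharacter;
// Codepoints inside the U+FDD0..=U+FDEF block.
assert!(is_noncharacter('\u{FDD0}'));
assert!(is_noncharacter('\u{FDD1}'));
// The last two codepoints of a plane.
assert!(is_noncharacter('\u{FFFE}'));
assert!(is_noncharacter('\u{10FFFF}'));
// An ordinary assigned character is not a noncharacter.
assert!(!is_noncharacter('a'));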