Primitive Type char [−]
A character type.
The char
type represents a single character. More specifically, since
'character' isn't a well-defined concept in Unicode, char
is a 'Unicode
scalar value', which is similar to, but not the same as, a 'Unicode code
point'.
This documentation describes a number of methods and trait implementations on the
char
type. For technical reasons, there is additional, separate
documentation in the std::char
module as well.
Representation
char
is always four bytes in size. This is a different representation than
a given character would have as part of a String
, for example:
let v = vec!['h', 'e', 'l', 'l', 'o']; // five elements times four bytes for each element assert_eq!(20, v.len() * std::mem::size_of::<char>()); let s = String::from("hello"); // five elements times one byte per element assert_eq!(5, s.len() * std::mem::size_of::<u8>());
As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' are more than one byte; ❤️ in particular is six:
fn main() { let s = String::from("❤️"); // six bytes times one byte for each element assert_eq!(6, s.len() * std::mem::size_of::<u8>()); }let s = String::from("❤️"); // six bytes times one byte for each element assert_eq!(6, s.len() * std::mem::size_of::<u8>());
This also means it won't fit into a char
, and so trying to create a
literal with let heart = '❤️';
gives an error:
error: character literal may only contain one codepoint: '❤
let heart = '❤️';
^~
Another implication of this is that if you want to do per-char
acter
processing, it can end up using a lot more memory:
let s = String::from("love: ❤️"); let v: Vec<char> = s.chars().collect(); assert_eq!(12, s.len() * std::mem::size_of::<u8>()); assert_eq!(32, v.len() * std::mem::size_of::<char>());
Or may give you results you may not expect:
fn main() { let s = String::from("❤️"); let mut iter = s.chars(); // we get two chars out of a single ❤️ assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next()); }let s = String::from("❤️"); let mut iter = s.chars(); // we get two chars out of a single ❤️ assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next());
Methods
impl char
fn is_digit(self, radix: u32) -> bool
Checks if a char
is a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexicdecimal, to give some common values. Arbitrary radicum are supported.
Compared to is_numeric()
, this function only recognizes the characters
0-9
, a-z
and A-Z
.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
For a more comprehensive understanding of 'digit', see is_numeric()
.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { let d = '1'; assert!(d.is_digit(10)); let d = 'f'; assert!(d.is_digit(16)); assert!(!d.is_digit(10)); }let d = '1'; assert!(d.is_digit(10)); let d = 'f'; assert!(d.is_digit(16)); assert!(!d.is_digit(10));
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { let d = '1'; // this panics d.is_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { let d = '1'; // this panics d.is_digit(37); }).join(); assert!(result.is_err());
fn to_digit(self, radix: u32) -> Option<u32>
Converts a char
to a digit in the given radix.
A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexicdecimal, to give some common values. Arbitrary radicum are supported.
'Digit' is defined to be only the following characters:
0-9
a-z
A-Z
Failure
Returns None
if the char
does not refer to a digit in the given radix.
Panics
Panics if given a radix larger than 36.
Examples
Basic usage:
fn main() { let d = '1'; assert_eq!(d.to_digit(10), Some(1)); let d = 'f'; assert_eq!(d.to_digit(16), Some(15)); }let d = '1'; assert_eq!(d.to_digit(10), Some(1)); let d = 'f'; assert_eq!(d.to_digit(16), Some(15));
Passing a non-digit results in failure:
fn main() { let d = 'f'; assert_eq!(d.to_digit(10), None); let d = 'z'; assert_eq!(d.to_digit(16), None); }let d = 'f'; assert_eq!(d.to_digit(10), None); let d = 'z'; assert_eq!(d.to_digit(16), None);
Passing a large radix, causing a panic:
fn main() { use std::thread; let result = thread::spawn(|| { let d = '1'; d.to_digit(37); }).join(); assert!(result.is_err()); }use std::thread; let result = thread::spawn(|| { let d = '1'; d.to_digit(37); }).join(); assert!(result.is_err());
fn escape_unicode(self) -> EscapeUnicode
Returns an iterator that yields the hexadecimal Unicode escape of a
character, as char
s.
All characters are escaped with Rust syntax of the form \\u{NNNN}
where NNNN
is the shortest hexadecimal representation.
Examples
Basic usage:
fn main() { for c in '❤'.escape_unicode() { print!("{}", c); } println!(""); }for c in '❤'.escape_unicode() { print!("{}", c); } println!("");
This prints:
\u{2764}
Collecting into a String
:
let heart: String = '❤'.escape_unicode().collect(); assert_eq!(heart, r"\u{2764}");
fn escape_default(self) -> EscapeDefault
Returns an iterator that yields the literal escape code of a char
.
The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:
- Tab is escaped as
\t
. - Carriage return is escaped as
\r
. - Line feed is escaped as
\n
. - Single quote is escaped as
\'
. - Double quote is escaped as
\"
. - Backslash is escaped as
\\
. - Any character in the 'printable ASCII' range
0x20
..0x7e
inclusive is not escaped. - All other characters are given hexadecimal Unicode escapes; see
escape_unicode
.
Examples
Basic usage:
fn main() { for i in '"'.escape_default() { println!("{}", i); } }for i in '"'.escape_default() { println!("{}", i); }
This prints:
\
"
Collecting into a String
:
let quote: String = '"'.escape_default().collect(); assert_eq!(quote, "\\\"");
fn len_utf8(self) -> usize
Returns the number of bytes this char
would need if encoded in UTF-8.
That number of bytes is always between 1 and 4, inclusive.
Examples
Basic usage:
fn main() { let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4); }let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4);
The &str
type guarantees that its contents are UTF-8, and so we can compare the length it
would take if each code point was represented as a char
vs in the &str
itself:
// as chars let eastern = '東'; let capitol = '京'; // both can be represented as three bytes assert_eq!(3, eastern.len_utf8()); assert_eq!(3, capitol.len_utf8()); // as a &str, these two are encoded in UTF-8 let tokyo = "東京"; let len = eastern.len_utf8() + capitol.len_utf8(); // we can see that they take six bytes total... assert_eq!(6, tokyo.len()); // ... just like the &str assert_eq!(len, tokyo.len());
fn len_utf16(self) -> usize
Returns the number of 16-bit code units this char
would need if
encoded in UTF-16.
See the documentation for len_utf8()
for more explanation of this
concept. This function is a mirror, but for UTF-16 instead of UTF-8.
Examples
Basic usage:
fn main() { let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2); }let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2);
fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>
Encodes this character as UTF-8 into the provided byte buffer, and then returns the number of bytes written.
If the buffer is not large enough, nothing will be written into it and a
None
will be returned. A buffer of length four is large enough to
encode any char
.
Examples
In both of these examples, 'ß' takes two bytes to encode.
#![feature(unicode)] fn main() { let mut b = [0; 2]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, Some(2)); }#![feature(unicode)] let mut b = [0; 2]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, Some(2));
A buffer that's too small:
#![feature(unicode)] fn main() { let mut b = [0; 1]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None); }#![feature(unicode)] let mut b = [0; 1]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None);
fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>
Encodes this character as UTF-16 into the provided u16
buffer, and
then returns the number of u16
s written.
If the buffer is not large enough, nothing will be written into it and a
None
will be returned. A buffer of length 2 is large enough to encode
any char
.
Examples
In both of these examples, 'ß' takes one u16
to encode.
#![feature(unicode)] let mut b = [0; 1]; let result = 'ß'.encode_utf16(&mut b); assert_eq!(result, Some(1));
A buffer that's too small:
#![feature(unicode)] fn main() { let mut b = [0; 0]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None); }#![feature(unicode)] let mut b = [0; 0]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None);
fn is_alphabetic(self) -> bool
Returns true if this char
is an alphabetic code point, and false if not.
Examples
Basic usage:
fn main() { let c = 'a'; assert!(c.is_alphabetic()); let c = '京'; assert!(c.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic()); }let c = 'a'; assert!(c.is_alphabetic()); let c = '京'; assert!(c.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic());
fn is_xid_start(self) -> bool
Returns true if this char
satisfies the 'XID_Start' Unicode property, and false
otherwise.
'XID_Start' is a Unicode Derived Property specified in
UAX #31,
mostly similar to ID_Start
but modified for closure under NFKx
.
fn is_xid_continue(self) -> bool
Returns true if this char
satisfies the 'XID_Continue' Unicode property, and false
otherwise.
'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.
fn is_lowercase(self) -> bool
Returns true if this char
is lowercase, and false otherwise.
'Lowercase' is defined according to the terms of the Unicode Derived Core
Property Lowercase
.
Examples
Basic usage:
fn main() { let c = 'a'; assert!(c.is_lowercase()); let c = 'δ'; assert!(c.is_lowercase()); let c = 'A'; assert!(!c.is_lowercase()); let c = 'Δ'; assert!(!c.is_lowercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_lowercase()); }let c = 'a'; assert!(c.is_lowercase()); let c = 'δ'; assert!(c.is_lowercase()); let c = 'A'; assert!(!c.is_lowercase()); let c = 'Δ'; assert!(!c.is_lowercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_lowercase());
fn is_uppercase(self) -> bool
Returns true if this char
is uppercase, and false otherwise.
'Uppercase' is defined according to the terms of the Unicode Derived Core
Property Uppercase
.
Examples
Basic usage:
fn main() { let c = 'a'; assert!(!c.is_uppercase()); let c = 'δ'; assert!(!c.is_uppercase()); let c = 'A'; assert!(c.is_uppercase()); let c = 'Δ'; assert!(c.is_uppercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_uppercase()); }let c = 'a'; assert!(!c.is_uppercase()); let c = 'δ'; assert!(!c.is_uppercase()); let c = 'A'; assert!(c.is_uppercase()); let c = 'Δ'; assert!(c.is_uppercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_uppercase());
fn is_whitespace(self) -> bool
Returns true if this char
is whitespace, and false otherwise.
'Whitespace' is defined according to the terms of the Unicode Derived Core
Property White_Space
.
Examples
Basic usage:
fn main() { let c = ' '; assert!(c.is_whitespace()); // a non-breaking space let c = '\u{A0}'; assert!(c.is_whitespace()); let c = '越'; assert!(!c.is_whitespace()); }let c = ' '; assert!(c.is_whitespace()); // a non-breaking space let c = '\u{A0}'; assert!(c.is_whitespace()); let c = '越'; assert!(!c.is_whitespace());
fn is_alphanumeric(self) -> bool
Returns true if this char
is alphanumeric, and false otherwise.
'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.
Examples
Basic usage:
fn main() { let c = '٣'; assert!(c.is_alphanumeric()); let c = '7'; assert!(c.is_alphanumeric()); let c = '৬'; assert!(c.is_alphanumeric()); let c = 'K'; assert!(c.is_alphanumeric()); let c = 'و'; assert!(c.is_alphanumeric()); let c = '藏'; assert!(c.is_alphanumeric()); let c = '¾'; assert!(!c.is_alphanumeric()); let c = '①'; assert!(!c.is_alphanumeric()); }let c = '٣'; assert!(c.is_alphanumeric()); let c = '7'; assert!(c.is_alphanumeric()); let c = '৬'; assert!(c.is_alphanumeric()); let c = 'K'; assert!(c.is_alphanumeric()); let c = 'و'; assert!(c.is_alphanumeric()); let c = '藏'; assert!(c.is_alphanumeric()); let c = '¾'; assert!(!c.is_alphanumeric()); let c = '①'; assert!(!c.is_alphanumeric());
fn is_control(self) -> bool
Returns true if this char
is a control code point, and false otherwise.
'Control code point' is defined in terms of the Unicode General
Category Cc
.
Examples
Basic usage:
fn main() { // U+009C, STRING TERMINATOR let c = ''; assert!(c.is_control()); let c = 'q'; assert!(!c.is_control()); }// U+009C, STRING TERMINATOR let c = ''; assert!(c.is_control()); let c = 'q'; assert!(!c.is_control());
fn is_numeric(self) -> bool
Returns true if this char
is numeric, and false otherwise.
'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.
Examples
Basic usage:
fn main() { let c = '٣'; assert!(c.is_numeric()); let c = '7'; assert!(c.is_numeric()); let c = '৬'; assert!(c.is_numeric()); let c = 'K'; assert!(!c.is_numeric()); let c = 'و'; assert!(!c.is_numeric()); let c = '藏'; assert!(!c.is_numeric()); let c = '¾'; assert!(!c.is_numeric()); let c = '①'; assert!(!c.is_numeric()); }let c = '٣'; assert!(c.is_numeric()); let c = '7'; assert!(c.is_numeric()); let c = '৬'; assert!(c.is_numeric()); let c = 'K'; assert!(!c.is_numeric()); let c = 'و'; assert!(!c.is_numeric()); let c = '藏'; assert!(!c.is_numeric()); let c = '¾'; assert!(!c.is_numeric()); let c = '①'; assert!(!c.is_numeric());
fn to_lowercase(self) -> ToLowercase
Returns an iterator that yields the lowercase equivalent of a char
.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its lowercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt
. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese scripts do not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山')); }let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese scripts do not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山'));
fn to_uppercase(self) -> ToUppercase
Returns an iterator that yields the uppercase equivalent of a char
.
If no conversion is possible then an iterator with just the input character is returned.
This performs complex unconditional mappings with no tailoring: it maps
one Unicode character to its uppercase equivalent according to the
Unicode database and the additional complex mappings
SpecialCasing.txt
. Conditional mappings (based on context or
language) are not considered here.
For a full reference, see here.
Examples
Basic usage:
fn main() { let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山')); }let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山'));
In Turkish, the equivalent of 'i' in Latin has five forms instead of two:
- 'Dotless': I / ı, sometimes written ï
- 'Dotted': İ / i
Note that the lowercase dotted 'i' is the same as the Latin. Therefore:
fn main() { let i = 'i'; let upper_i = i.to_uppercase().next(); }let i = 'i'; let upper_i = i.to_uppercase().next();
The value of upper_i
here relies on the language of the text: if we're
in en-US
, it should be Some('I')
, but if we're in tr_TR
, it should
be Some('İ')
. to_uppercase()
does not take this into account, and so:
let i = 'i'; let upper_i = i.to_uppercase().next(); assert_eq!(Some('I'), upper_i);
holds across languages.
Trait Implementations
impl PartialEq<char> for char
impl Eq for char
impl PartialOrd<char> for char
fn partial_cmp(&self, other: &char) -> Option<Ordering>
fn lt(&self, other: &char) -> bool
fn le(&self, other: &char) -> bool
fn ge(&self, other: &char) -> bool
fn gt(&self, other: &char) -> bool
impl Ord for char
impl Clone for char
fn clone(&self) -> char
fn clone_from(&mut self, source: &Self)
impl Default for char
impl<'a> Pattern<'a> for char
Searches for chars that are equal to a given char