Primitive Type char []

A character type.

The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.

Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String, for example:

fn main() { let v = vec!['h', 'e', 'l', 'l', 'o']; // five elements times four bytes for each element assert_eq!(20, v.len() * std::mem::size_of::<char>()); let s = String::from("hello"); // five elements times one byte per element assert_eq!(5, s.len() * std::mem::size_of::<u8>()); }
let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * std::mem::size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * std::mem::size_of::<u8>());

As always, remember that a human intuition for 'character' may not map to Unicode's definitions. For example, emoji symbols such as '❤️' are more than one byte; ❤️ in particular is six:

fn main() { let s = String::from("❤️"); // six bytes times one byte for each element assert_eq!(6, s.len() * std::mem::size_of::<u8>()); }
let s = String::from("❤️");

// six bytes times one byte for each element
assert_eq!(6, s.len() * std::mem::size_of::<u8>());

This also means it won't fit into a char, and so trying to create a literal with let heart = '❤️'; gives an error:

error: character literal may only contain one codepoint: '❤
let heart = '❤️';
            ^~

Another implication of this is that if you want to do per-character processing, it can end up using a lot more memory:

fn main() { let s = String::from("love: ❤️"); let v: Vec<char> = s.chars().collect(); assert_eq!(12, s.len() * std::mem::size_of::<u8>()); assert_eq!(32, v.len() * std::mem::size_of::<char>()); }
let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();

assert_eq!(12, s.len() * std::mem::size_of::<u8>());
assert_eq!(32, v.len() * std::mem::size_of::<char>());

Or may give you results you may not expect:

fn main() { let s = String::from("❤️"); let mut iter = s.chars(); // we get two chars out of a single ❤️ assert_eq!(Some('\u{2764}'), iter.next()); assert_eq!(Some('\u{fe0f}'), iter.next()); assert_eq!(None, iter.next()); }
let s = String::from("❤️");

let mut iter = s.chars();

// we get two chars out of a single ❤️
assert_eq!(Some('\u{2764}'), iter.next());
assert_eq!(Some('\u{fe0f}'), iter.next());
assert_eq!(None, iter.next());

Methods

impl char

fn is_digit(self, radix: u32) -> bool

Checks if a char is a digit in the given radix.

A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexicdecimal, to give some common values. Arbitrary radicum are supported.

Compared to is_numeric(), this function only recognizes the characters 0-9, a-z and A-Z.

'Digit' is defined to be only the following characters:

  • 0-9
  • a-z
  • A-Z

For a more comprehensive understanding of 'digit', see is_numeric().

Panics

Panics if given a radix larger than 36.

Examples

Basic usage:

fn main() { let d = '1'; assert!(d.is_digit(10)); let d = 'f'; assert!(d.is_digit(16)); assert!(!d.is_digit(10)); }
let d = '1';

assert!(d.is_digit(10));

let d = 'f';

assert!(d.is_digit(16));
assert!(!d.is_digit(10));

Passing a large radix, causing a panic:

fn main() { use std::thread; let result = thread::spawn(|| { let d = '1'; // this panics d.is_digit(37); }).join(); assert!(result.is_err()); }
use std::thread;

let result = thread::spawn(|| {
    let d = '1';

    // this panics
    d.is_digit(37);
}).join();

assert!(result.is_err());

fn to_digit(self, radix: u32) -> Option<u32>

Converts a char to a digit in the given radix.

A 'radix' here is sometimes also called a 'base'. A radix of two indicates a binary number, a radix of ten, decimal, and a radix of sixteen, hexicdecimal, to give some common values. Arbitrary radicum are supported.

'Digit' is defined to be only the following characters:

  • 0-9
  • a-z
  • A-Z

Failure

Returns None if the char does not refer to a digit in the given radix.

Panics

Panics if given a radix larger than 36.

Examples

Basic usage:

fn main() { let d = '1'; assert_eq!(d.to_digit(10), Some(1)); let d = 'f'; assert_eq!(d.to_digit(16), Some(15)); }
let d = '1';

assert_eq!(d.to_digit(10), Some(1));

let d = 'f';

assert_eq!(d.to_digit(16), Some(15));

Passing a non-digit results in failure:

fn main() { let d = 'f'; assert_eq!(d.to_digit(10), None); let d = 'z'; assert_eq!(d.to_digit(16), None); }
let d = 'f';

assert_eq!(d.to_digit(10), None);

let d = 'z';

assert_eq!(d.to_digit(16), None);

Passing a large radix, causing a panic:

fn main() { use std::thread; let result = thread::spawn(|| { let d = '1'; d.to_digit(37); }).join(); assert!(result.is_err()); }
use std::thread;

let result = thread::spawn(|| {
  let d = '1';

  d.to_digit(37);
}).join();

assert!(result.is_err());

fn escape_unicode(self) -> EscapeUnicode

Returns an iterator that yields the hexadecimal Unicode escape of a character, as chars.

All characters are escaped with Rust syntax of the form \\u{NNNN} where NNNN is the shortest hexadecimal representation.

Examples

Basic usage:

fn main() { for c in '❤'.escape_unicode() { print!("{}", c); } println!(""); }
for c in '❤'.escape_unicode() {
    print!("{}", c);
}
println!("");

This prints:

\u{2764}

Collecting into a String:

fn main() { let heart: String = '❤'.escape_unicode().collect(); assert_eq!(heart, r"\u{2764}"); }
let heart: String = '❤'.escape_unicode().collect();

assert_eq!(heart, r"\u{2764}");

fn escape_default(self) -> EscapeDefault

Returns an iterator that yields the literal escape code of a char.

The default is chosen with a bias toward producing literals that are legal in a variety of languages, including C++11 and similar C-family languages. The exact rules are:

  • Tab is escaped as \t.
  • Carriage return is escaped as \r.
  • Line feed is escaped as \n.
  • Single quote is escaped as \'.
  • Double quote is escaped as \".
  • Backslash is escaped as \\.
  • Any character in the 'printable ASCII' range 0x20 .. 0x7e inclusive is not escaped.
  • All other characters are given hexadecimal Unicode escapes; see escape_unicode.

Examples

Basic usage:

fn main() { for i in '"'.escape_default() { println!("{}", i); } }
for i in '"'.escape_default() {
    println!("{}", i);
}

This prints:

\
"

Collecting into a String:

fn main() { let quote: String = '"'.escape_default().collect(); assert_eq!(quote, "\\\""); }
let quote: String = '"'.escape_default().collect();

assert_eq!(quote, "\\\"");

fn len_utf8(self) -> usize

Returns the number of bytes this char would need if encoded in UTF-8.

That number of bytes is always between 1 and 4, inclusive.

Examples

Basic usage:

fn main() { let len = 'A'.len_utf8(); assert_eq!(len, 1); let len = 'ß'.len_utf8(); assert_eq!(len, 2); let len = 'ℝ'.len_utf8(); assert_eq!(len, 3); let len = '💣'.len_utf8(); assert_eq!(len, 4); }
let len = 'A'.len_utf8();
assert_eq!(len, 1);

let len = 'ß'.len_utf8();
assert_eq!(len, 2);

let len = 'ℝ'.len_utf8();
assert_eq!(len, 3);

let len = '💣'.len_utf8();
assert_eq!(len, 4);

The &str type guarantees that its contents are UTF-8, and so we can compare the length it would take if each code point was represented as a char vs in the &str itself:

fn main() { // as chars let eastern = '東'; let capitol = '京'; // both can be represented as three bytes assert_eq!(3, eastern.len_utf8()); assert_eq!(3, capitol.len_utf8()); // as a &str, these two are encoded in UTF-8 let tokyo = "東京"; let len = eastern.len_utf8() + capitol.len_utf8(); // we can see that they take six bytes total... assert_eq!(6, tokyo.len()); // ... just like the &str assert_eq!(len, tokyo.len()); }
// as chars
let eastern = '東';
let capitol = '京';

// both can be represented as three bytes
assert_eq!(3, eastern.len_utf8());
assert_eq!(3, capitol.len_utf8());

// as a &str, these two are encoded in UTF-8
let tokyo = "東京";

let len = eastern.len_utf8() + capitol.len_utf8();

// we can see that they take six bytes total...
assert_eq!(6, tokyo.len());

// ... just like the &str
assert_eq!(len, tokyo.len());

fn len_utf16(self) -> usize

Returns the number of 16-bit code units this char would need if encoded in UTF-16.

See the documentation for len_utf8() for more explanation of this concept. This function is a mirror, but for UTF-16 instead of UTF-8.

Examples

Basic usage:

fn main() { let n = 'ß'.len_utf16(); assert_eq!(n, 1); let len = '💣'.len_utf16(); assert_eq!(len, 2); }
let n = 'ß'.len_utf16();
assert_eq!(n, 1);

let len = '💣'.len_utf16();
assert_eq!(len, 2);

fn encode_utf8(self, dst: &mut [u8]) -> Option<usize>

Unstable (unicode #27784)

: pending decision about Iterator/Writer/Reader

Encodes this character as UTF-8 into the provided byte buffer, and then returns the number of bytes written.

If the buffer is not large enough, nothing will be written into it and a None will be returned. A buffer of length four is large enough to encode any char.

Examples

In both of these examples, 'ß' takes two bytes to encode.

#![feature(unicode)] fn main() { let mut b = [0; 2]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, Some(2)); }
#![feature(unicode)]

let mut b = [0; 2];

let result = 'ß'.encode_utf8(&mut b);

assert_eq!(result, Some(2));

A buffer that's too small:

#![feature(unicode)] fn main() { let mut b = [0; 1]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None); }
#![feature(unicode)]

let mut b = [0; 1];

let result = 'ß'.encode_utf8(&mut b);

assert_eq!(result, None);

fn encode_utf16(self, dst: &mut [u16]) -> Option<usize>

Unstable (unicode #27784)

: pending decision about Iterator/Writer/Reader

Encodes this character as UTF-16 into the provided u16 buffer, and then returns the number of u16s written.

If the buffer is not large enough, nothing will be written into it and a None will be returned. A buffer of length 2 is large enough to encode any char.

Examples

In both of these examples, 'ß' takes one u16 to encode.

#![feature(unicode)] fn main() { let mut b = [0; 1]; let result = 'ß'.encode_utf16(&mut b); assert_eq!(result, Some(1)); }
#![feature(unicode)]

let mut b = [0; 1];

let result = 'ß'.encode_utf16(&mut b);

assert_eq!(result, Some(1));

A buffer that's too small:

#![feature(unicode)] fn main() { let mut b = [0; 0]; let result = 'ß'.encode_utf8(&mut b); assert_eq!(result, None); }
#![feature(unicode)]

let mut b = [0; 0];

let result = 'ß'.encode_utf8(&mut b);

assert_eq!(result, None);

fn is_alphabetic(self) -> bool

Returns true if this char is an alphabetic code point, and false if not.

Examples

Basic usage:

fn main() { let c = 'a'; assert!(c.is_alphabetic()); let c = '京'; assert!(c.is_alphabetic()); let c = '💝'; // love is many things, but it is not alphabetic assert!(!c.is_alphabetic()); }
let c = 'a';

assert!(c.is_alphabetic());

let c = '京';
assert!(c.is_alphabetic());

let c = '💝';
// love is many things, but it is not alphabetic
assert!(!c.is_alphabetic());

fn is_xid_start(self) -> bool

Unstable (unicode #0)

: mainly needed for compiler internals

Returns true if this char satisfies the 'XID_Start' Unicode property, and false otherwise.

'XID_Start' is a Unicode Derived Property specified in UAX #31, mostly similar to ID_Start but modified for closure under NFKx.

fn is_xid_continue(self) -> bool

Unstable (unicode #0)

: mainly needed for compiler internals

Returns true if this char satisfies the 'XID_Continue' Unicode property, and false otherwise.

'XID_Continue' is a Unicode Derived Property specified in UAX #31, mostly similar to 'ID_Continue' but modified for closure under NFKx.

fn is_lowercase(self) -> bool

Returns true if this char is lowercase, and false otherwise.

'Lowercase' is defined according to the terms of the Unicode Derived Core Property Lowercase.

Examples

Basic usage:

fn main() { let c = 'a'; assert!(c.is_lowercase()); let c = 'δ'; assert!(c.is_lowercase()); let c = 'A'; assert!(!c.is_lowercase()); let c = 'Δ'; assert!(!c.is_lowercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_lowercase()); }
let c = 'a';
assert!(c.is_lowercase());

let c = 'δ';
assert!(c.is_lowercase());

let c = 'A';
assert!(!c.is_lowercase());

let c = 'Δ';
assert!(!c.is_lowercase());

// The various Chinese scripts do not have case, and so:
let c = '中';
assert!(!c.is_lowercase());

fn is_uppercase(self) -> bool

Returns true if this char is uppercase, and false otherwise.

'Uppercase' is defined according to the terms of the Unicode Derived Core Property Uppercase.

Examples

Basic usage:

fn main() { let c = 'a'; assert!(!c.is_uppercase()); let c = 'δ'; assert!(!c.is_uppercase()); let c = 'A'; assert!(c.is_uppercase()); let c = 'Δ'; assert!(c.is_uppercase()); // The various Chinese scripts do not have case, and so: let c = '中'; assert!(!c.is_uppercase()); }
let c = 'a';
assert!(!c.is_uppercase());

let c = 'δ';
assert!(!c.is_uppercase());

let c = 'A';
assert!(c.is_uppercase());

let c = 'Δ';
assert!(c.is_uppercase());

// The various Chinese scripts do not have case, and so:
let c = '中';
assert!(!c.is_uppercase());

fn is_whitespace(self) -> bool

Returns true if this char is whitespace, and false otherwise.

'Whitespace' is defined according to the terms of the Unicode Derived Core Property White_Space.

Examples

Basic usage:

fn main() { let c = ' '; assert!(c.is_whitespace()); // a non-breaking space let c = '\u{A0}'; assert!(c.is_whitespace()); let c = '越'; assert!(!c.is_whitespace()); }
let c = ' ';
assert!(c.is_whitespace());

// a non-breaking space
let c = '\u{A0}';
assert!(c.is_whitespace());

let c = '越';
assert!(!c.is_whitespace());

fn is_alphanumeric(self) -> bool

Returns true if this char is alphanumeric, and false otherwise.

'Alphanumeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No' and the Derived Core Property 'Alphabetic'.

Examples

Basic usage:

fn main() { let c = '٣'; assert!(c.is_alphanumeric()); let c = '7'; assert!(c.is_alphanumeric()); let c = '৬'; assert!(c.is_alphanumeric()); let c = 'K'; assert!(c.is_alphanumeric()); let c = 'و'; assert!(c.is_alphanumeric()); let c = '藏'; assert!(c.is_alphanumeric()); let c = '¾'; assert!(!c.is_alphanumeric()); let c = '①'; assert!(!c.is_alphanumeric()); }
let c = '٣';
assert!(c.is_alphanumeric());

let c = '7';
assert!(c.is_alphanumeric());

let c = '৬';
assert!(c.is_alphanumeric());

let c = 'K';
assert!(c.is_alphanumeric());

let c = 'و';
assert!(c.is_alphanumeric());

let c = '藏';
assert!(c.is_alphanumeric());

let c = '¾';
assert!(!c.is_alphanumeric());

let c = '①';
assert!(!c.is_alphanumeric());

fn is_control(self) -> bool

Returns true if this char is a control code point, and false otherwise.

'Control code point' is defined in terms of the Unicode General Category Cc.

Examples

Basic usage:

fn main() { // U+009C, STRING TERMINATOR let c = 'œ'; assert!(c.is_control()); let c = 'q'; assert!(!c.is_control()); }
// U+009C, STRING TERMINATOR
let c = 'œ';
assert!(c.is_control());

let c = 'q';
assert!(!c.is_control());

fn is_numeric(self) -> bool

Returns true if this char is numeric, and false otherwise.

'Numeric'-ness is defined in terms of the Unicode General Categories 'Nd', 'Nl', 'No'.

Examples

Basic usage:

fn main() { let c = '٣'; assert!(c.is_numeric()); let c = '7'; assert!(c.is_numeric()); let c = '৬'; assert!(c.is_numeric()); let c = 'K'; assert!(!c.is_numeric()); let c = 'و'; assert!(!c.is_numeric()); let c = '藏'; assert!(!c.is_numeric()); let c = '¾'; assert!(!c.is_numeric()); let c = '①'; assert!(!c.is_numeric()); }
let c = '٣';
assert!(c.is_numeric());

let c = '7';
assert!(c.is_numeric());

let c = '৬';
assert!(c.is_numeric());

let c = 'K';
assert!(!c.is_numeric());

let c = 'و';
assert!(!c.is_numeric());

let c = '藏';
assert!(!c.is_numeric());

let c = '¾';
assert!(!c.is_numeric());

let c = '①';
assert!(!c.is_numeric());

fn to_lowercase(self) -> ToLowercase

Returns an iterator that yields the lowercase equivalent of a char.

If no conversion is possible then an iterator with just the input character is returned.

This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its lowercase equivalent according to the Unicode database and the additional complex mappings SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.

For a full reference, see here.

Examples

Basic usage:

fn main() { let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese scripts do not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山')); }
let c = 'c';

assert_eq!(c.to_uppercase().next(), Some('C'));

// Japanese scripts do not have case, and so:
let c = '山';
assert_eq!(c.to_uppercase().next(), Some('山'));

fn to_uppercase(self) -> ToUppercase

Returns an iterator that yields the uppercase equivalent of a char.

If no conversion is possible then an iterator with just the input character is returned.

This performs complex unconditional mappings with no tailoring: it maps one Unicode character to its uppercase equivalent according to the Unicode database and the additional complex mappings SpecialCasing.txt. Conditional mappings (based on context or language) are not considered here.

For a full reference, see here.

Examples

Basic usage:

fn main() { let c = 'c'; assert_eq!(c.to_uppercase().next(), Some('C')); // Japanese does not have case, and so: let c = '山'; assert_eq!(c.to_uppercase().next(), Some('山')); }
let c = 'c';
assert_eq!(c.to_uppercase().next(), Some('C'));

// Japanese does not have case, and so:
let c = '山';
assert_eq!(c.to_uppercase().next(), Some('山'));

In Turkish, the equivalent of 'i' in Latin has five forms instead of two:

  • 'Dotless': I / ı, sometimes written ï
  • 'Dotted': İ / i

Note that the lowercase dotted 'i' is the same as the Latin. Therefore:

fn main() { let i = 'i'; let upper_i = i.to_uppercase().next(); }
let i = 'i';

let upper_i = i.to_uppercase().next();

The value of upper_i here relies on the language of the text: if we're in en-US, it should be Some('I'), but if we're in tr_TR, it should be Some('İ'). to_uppercase() does not take this into account, and so:

fn main() { let i = 'i'; let upper_i = i.to_uppercase().next(); assert_eq!(Some('I'), upper_i); }
let i = 'i';

let upper_i = i.to_uppercase().next();

assert_eq!(Some('I'), upper_i);

holds across languages.

Trait Implementations

impl PartialEq<char> for char

fn eq(&self, other: &char) -> bool

fn ne(&self, other: &char) -> bool

impl Eq for char

impl PartialOrd<char> for char

fn partial_cmp(&self, other: &char) -> Option<Ordering>

fn lt(&self, other: &char) -> bool

fn le(&self, other: &char) -> bool

fn ge(&self, other: &char) -> bool

fn gt(&self, other: &char) -> bool

impl Ord for char

fn cmp(&self, other: &char) -> Ordering

impl Clone for char

fn clone(&self) -> char

fn clone_from(&mut self, source: &Self)

impl Default for char

fn default() -> char

impl<'a> Pattern<'a> for char

Searches for chars that are equal to a given char

type Searcher = CharSearcher<'a>

fn into_searcher(self, haystack: &'a str) -> CharSearcher<'a>

fn is_contained_in(self, haystack: &'a str) -> bool

fn is_prefix_of(self, haystack: &'a str) -> bool

fn is_suffix_of(self, haystack: &'a str) -> bool where CharSearcher<'a>: ReverseSearcher<'a>

impl Hash for char

fn hash<H>(&self, state: &mut H) where H: Hasher

fn hash_slice<H>(data: &[Self], state: &mut H) where H: Hasher

impl Debug for char

fn fmt(&self, f: &mut Formatter) -> Result<(), Error>

impl Display for char

fn fmt(&self, f: &mut Formatter) -> Result<(), Error>

impl AsciiExt for char

type Owned = char

fn is_ascii(&self) -> bool

fn to_ascii_uppercase(&self) -> char

fn to_ascii_lowercase(&self) -> char

fn eq_ignore_ascii_case(&self, other: &char) -> bool

fn make_ascii_uppercase(&mut self)

fn make_ascii_lowercase(&mut self)