Characters (ECL Manual)


2.11 Characters

ECL is fully ANSI Common Lisp compliant in all aspects of the character data type, with the following peculiarities.

Unicode vs. POSIX locale
#\Newline characters
C Reference


2.11.1 Unicode vs. POSIX locale

There are two ways of building ECL: with C characters or with Unicode character codes. These build modes are selected with the --disable-unicode and --enable-unicode configuration options, the latter being the default.

When using C characters, ECL relies on the char type of the C language and on the C library functions for tasks such as character conversion, comparison, etc. In this case characters are typically 8 bits wide, and the character order and collation are determined by the current POSIX or C locale. This approach is limited: it leaves out many languages and character encodings, but it is sufficient for small applications that do not need multilingual support.
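To illustrate what relying on the C library means in practice, here is a minimal sketch in the spirit of a C-character build. The helper c_alpha_char_p is hypothetical and not part of ECL's API. Its answer depends on the active locale; in the default "C" locale the C standard only counts the 26 unaccented ASCII letters in each case as alphabetic, so a Latin-1 code such as 0xE9 (é) is rejected.

```c
#include <ctype.h>
#include <locale.h>   /* setlocale() selects the table that isalpha() consults */
#include <stdbool.h>

/* Hypothetical helper, not part of ECL: classify one 8-bit code with
   the C library, under whatever locale is currently active. */
bool c_alpha_char_p(unsigned char c)
{
    /* <ctype.h> functions expect an unsigned char value (or EOF), so
       taking unsigned char avoids undefined behaviour for codes above 127. */
    return isalpha(c) != 0;
}
```

Calling setlocale(LC_CTYPE, "") before classifying would switch to the user's environment locale, which is exactly why results in this build mode can vary from one system to another.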

When no option is specified, ECL builds with support for a larger character set, the Unicode 6.0 standard. This uses 24-bit character codes, also known as codepoints, together with a large database of character properties, which include their nature (alphanumeric, numeric, etc.), their case, their collation properties, whether they are standalone or composing characters, etc.

Character types
Character names

2.11.1.1 Character types

If ECL is compiled without Unicode support, all characters are implemented using 8-bit codes and the type extended-char is empty. If compiled with Unicode support, characters are implemented using 24 bits and the extended-char type covers characters above code 255.

Type	Codes without Unicode	Codes with Unicode
standard-char	#\Newline, 32-126	#\Newline, 32-126
base-char	0-255	0-255
extended-char	none (empty type)	256-16777215
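The ranges in the Unicode column can be read as a classification function. The following is a sketch only: classify_code and the char_kind enumeration are hypothetical names, not ECL API. Since standard-char is a subtype of base-char, the function reports the most specific type.

```c
#include <stdint.h>

/* Hypothetical names, not ECL API. */
typedef enum { STANDARD_CHAR, BASE_CHAR, EXTENDED_CHAR, NOT_A_CHAR } char_kind;

/* Most specific character type of a code in a Unicode build,
   following the table above. */
char_kind classify_code(uint32_t code)
{
    if (code == 10 || (code >= 32 && code <= 126))
        return STANDARD_CHAR;   /* #\Newline or printable ASCII */
    if (code <= 255)
        return BASE_CHAR;       /* remaining 8-bit codes */
    if (code <= 16777215)       /* 2^24 - 1, the 24-bit limit */
        return EXTENDED_CHAR;
    return NOT_A_CHAR;          /* outside the character code range */
}
```

In a build without Unicode, the EXTENDED_CHAR branch would never apply, since codes stop at 255.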

2.11.1.2 Character names

All characters have a name. For the non-printing characters from 0 to 32, and for 127, ECL uses the ordinary ASCII names. Characters above 127 are printed and read using hexadecimal Unicode notation: a U followed by a 24-bit hexadecimal number, as in U0126.

Character	Code
#\Null	0
#\Ack	6
#\Bell	7
#\Backspace	8
#\Tab	9
#\Newline	10
#\Linefeed	10
#\Page	12
#\Esc	27
#\Escape	27
#\Space	32
#\Rubout	127
#\U0080	128

Table 2.6

Note that #\Linefeed is synonymous with #\Newline and thus is a member of standard-char.
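The naming rule can be sketched as a small printer. This is not ECL's implementation: print_char is a hypothetical helper that knows only a subset of the names in Table 2.6 and reports failure for the other control codes. The minimum width of four hex digits is an assumption inferred from the examples U0080 and U0126.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Subset of the ASCII names from Table 2.6. */
static const struct { unsigned code; const char *name; } named_chars[] = {
    {0, "Null"}, {7, "Bell"}, {8, "Backspace"}, {9, "Tab"},
    {10, "Newline"}, {12, "Page"}, {27, "Esc"}, {32, "Space"},
    {127, "Rubout"},
};

/* Hypothetical helper: write the printed form of a character (#\ plus
   its name) into buf. Returns false for control codes this subset
   does not cover. */
bool print_char(unsigned code, char *buf, size_t size)
{
    for (size_t i = 0; i < sizeof named_chars / sizeof named_chars[0]; i++) {
        if (named_chars[i].code == code) {
            snprintf(buf, size, "#\\%s", named_chars[i].name);
            return true;
        }
    }
    if (code > 127) {
        snprintf(buf, size, "#\\U%04X", code);  /* hexadecimal codepoint */
        return true;
    }
    if (code > 32 && code < 127) {
        snprintf(buf, size, "#\\%c", (int)code); /* graphic ASCII as itself */
        return true;
    }
    return false;
}
```

With this sketch, code 128 prints as #\U0080 and code 0x126 as #\U0126, matching the notation described above.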


2.11.2 #\Newline characters

Internally, ECL represents the #\Newline character by a single code. However, when using external formats, ECL may read certain pairs of characters (for instance a carriage return followed by a line feed) as a single #\Newline and, conversely, write a single #\Newline as several characters; see External formats.
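The kind of translation involved can be sketched for the carriage-return/line-feed case. This is not ECL's stream code; decode_crlf and encode_crlf are hypothetical helpers that work on plain byte buffers. Decoding collapses each CR LF pair into one newline, and encoding expands each newline back into the pair.

```c
#include <stddef.h>

/* Hypothetical helper: collapse each CR LF pair of the n bytes in `in`
   into a single '\n'. `out` must hold n bytes; returns bytes written. */
size_t decode_crlf(const char *in, size_t n, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n') {
            out[j++] = '\n';
            i++;                /* the LF has been consumed too */
        } else {
            out[j++] = in[i];   /* a lone CR is kept as-is */
        }
    }
    return j;
}

/* Hypothetical helper: expand each '\n' into CR LF.
   `out` must hold up to 2*n bytes; returns bytes written. */
size_t encode_crlf(const char *in, size_t n, char *out)
{
    size_t j = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\n')
            out[j++] = '\r';
        out[j++] = in[i];
    }
    return j;
}
```

A round trip through both functions returns the original newline-only text, which is why the single internal #\Newline code is sufficient.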

2.11.3 C Reference

2.11.3.1 C types

C character types

Type names

ecl_character	character
ecl_base_char	base-char

Description

ECL defines two C types to hold its characters: ecl_base_char and ecl_character.

When ECL is built without Unicode, the two types coincide and typically match unsigned char, which covers the 256 codes that are needed.
When ECL is built with Unicode, the two types differ, and ecl_character is the larger one.

For your code to be portable and future-proof, use each type where it expresses your intent: ecl_base_char for data known to consist of base characters, and ecl_character for arbitrary characters.
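Keeping the two types apart matters mostly where a general character is narrowed to a base character. The sketch below uses hypothetical stand-in typedefs (demo_base_char, demo_character) so that it compiles without ECL headers; in real code you would include ECL's header and use ecl_base_char and ecl_character. A 32-bit unsigned integer for the wide type is an assumption, chosen only because it can hold 24-bit codes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for ecl_base_char and ecl_character,
   sized as in a Unicode build. */
typedef unsigned char demo_base_char;
typedef uint32_t demo_character;

/* Narrow a general character to a base character only when it lies in
   the base-char range (0-255). On failure *out is left unchanged. */
bool to_base_char(demo_character c, demo_base_char *out)
{
    if (c > 255)
        return false;
    *out = (demo_base_char)c;
    return true;
}
```

Making the narrowing explicit, rather than relying on an implicit conversion, keeps the code correct in both build modes.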

2.11.3.2 Constructors

Creating and extracting characters from Lisp objects

Functions

Macro: cl_object ECL_CODE_CHAR (ecl_character code);
Macro: ecl_character ECL_CHAR_CODE (cl_object o);

Function: ecl_character ecl_char_code (cl_object o);

Function: ecl_base_char ecl_base_char_code (cl_object o);

Description

These functions and macros convert between C character types and Lisp character objects. The macros ECL_CHAR_CODE and ECL_CODE_CHAR perform this coercion without checking their arguments. The functions ecl_char_code and ecl_base_char_code, on the other hand, verify that the argument has the right type and signal an error otherwise.

2.11.3.3 Predicates

C predicates for Lisp characters

Functions

Function: bool ecl_base_char_p (ecl_character c);

Function: bool ecl_alpha_char_p (ecl_character c);

Function: bool ecl_alphanumericp (ecl_character c);

Function: bool ecl_graphic_char_p (ecl_character c);

Function: bool ecl_digitp (ecl_character c);

Function: bool ecl_standard_char_p (ecl_character c);

Description

These functions behave like their Lisp counterparts, but take a C character code and return a C boolean.

2.11.3.4 Character case

C functions related to the character case

Functions

Function: bool ecl_upper_case_p (ecl_character c);

Function: bool ecl_lower_case_p (ecl_character c);

Function: bool ecl_both_case_p (ecl_character c);

Function: ecl_character ecl_char_downcase (ecl_character c);

Function: ecl_character ecl_char_upcase (ecl_character c);

Description

These functions check or change the case of a character. Note that in a Unicode context their output may not be accurate, for instance when the uppercase form of a character consists of two or more codepoints (German ß uppercases to "SS").
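The limitation follows from the shape of the interface: one codepoint in, one codepoint out. The sketch below contrasts such a simple mapping with a full mapping that may return several codepoints. Both functions are hypothetical and cover only ASCII letters plus U+00DF (LATIN SMALL LETTER SHARP S); they are not ECL's case tables.

```c
#include <stddef.h>
#include <stdint.h>

/* Simple mapping: always exactly one codepoint. In Unicode's simple
   case mapping U+00DF has no single-codepoint uppercase, so it is
   returned unchanged, as is anything outside a-z in this sketch. */
uint32_t simple_upcase(uint32_t c)
{
    if (c >= 'a' && c <= 'z')
        return c - ('a' - 'A');
    return c;
}

/* Full mapping: may produce several codepoints. Writes them to out
   (room for 2) and returns how many were written. */
size_t full_upcase(uint32_t c, uint32_t out[2])
{
    if (c == 0xDF) {            /* sharp s uppercases to "SS" */
        out[0] = 'S';
        out[1] = 'S';
        return 2;
    }
    out[0] = simple_upcase(c);
    return 1;
}
```

A function with the signature of ecl_char_upcase can only ever express the first kind of mapping.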

2.11.3.5 ANSI Dictionary

Common Lisp and C equivalence

Lisp symbol	C function
char=	cl_object cl_charE(cl_narg narg, ...)
char/=	cl_object cl_charNE(cl_narg narg, ...)
char<	cl_object cl_charL(cl_narg narg, ...)
char>	cl_object cl_charG(cl_narg narg, ...)
char<=	cl_object cl_charLE(cl_narg narg, ...)
char>=	cl_object cl_charGE(cl_narg narg, ...)
char-equal	cl_object cl_char_equal(cl_narg narg, ...)
char-not-equal	cl_object cl_char_not_equal(cl_narg narg, ...)
char-lessp	cl_object cl_char_lessp(cl_narg narg, ...)
char-greaterp	cl_object cl_char_greaterp(cl_narg narg, ...)
char-not-greaterp	cl_object cl_char_not_greaterp(cl_narg narg, ...)
char-not-lessp	cl_object cl_char_not_lessp(cl_narg narg, ...)
character	cl_object cl_character(cl_object char_designator)
characterp	cl_object cl_characterp(cl_object object)
alpha-char-p	cl_object cl_alpha_char_p(cl_object character)
alphanumericp	cl_object cl_alphanumericp(cl_object character)
digit-char	cl_object cl_digit_char(cl_narg narg, cl_object weight, ...)
digit-char-p	cl_object cl_digit_char_p(cl_narg narg, cl_object character, ...)
graphic-char-p	cl_object cl_graphic_char_p(cl_object character)
standard-char-p	cl_object cl_standard_char_p(cl_object character)
char-upcase	cl_object cl_char_upcase(cl_object character)
char-downcase	cl_object cl_char_downcase(cl_object character)
upper-case-p	cl_object cl_upper_case_p(cl_object character)
lower-case-p	cl_object cl_lower_case_p(cl_object character)
both-case-p	cl_object cl_both_case_p(cl_object character)
char-code	cl_object cl_char_code(cl_object character)
char-int	cl_object cl_char_int(cl_object character)
code-char	cl_object cl_code_char(cl_object code)
char-name	cl_object cl_char_name(cl_object character)
name-char	cl_object cl_name_char(cl_object name)
char-code-limit	ECL_CHAR_CODE_LIMIT
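The variadic entries in this table take the number of arguments that follow as their first parameter, narg. The sketch below mimics that calling convention in plain C so it can be compiled on its own; demo_char_less is a hypothetical stand-in that works on int codes, whereas the real ECL functions take and return cl_object values.

```c
#include <stdarg.h>
#include <stdbool.h>

/* Hypothetical stand-in, not ECL code. The first argument, narg, says
   how many character codes follow. Like char<, it returns true when
   the codes are strictly increasing. Callers should pass narg >= 1,
   since char< requires at least one character. */
bool demo_char_less(int narg, ...)
{
    va_list ap;
    bool result = true;

    va_start(ap, narg);
    if (narg > 0) {
        int prev = va_arg(ap, int);
        for (int i = 1; i < narg; i++) {
            int next = va_arg(ap, int);
            if (prev >= next) {
                result = false;   /* the remaining arguments need not be read */
                break;
            }
            prev = next;
        }
    }
    va_end(ap);
    return result;
}
```

For example, demo_char_less(3, 'a', 'b', 'c') passes three codes and yields true, while demo_char_less(2, 'b', 'a') yields false.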
