2.14 Strings


2.14.1 String types & Unicode

The ECL implementation of strings is ANSI Common Lisp compliant. There are four basic string types, listed in Table 2.7. As explained in Characters, when Unicode support is disabled, character and base-char are the same type, and the last two string types are equivalent to the first two.

Abbreviation          Expanded type                    Remarks
string                (array character (*))            8 or 32 bits per character; may be adjustable.
simple-string         (simple-array character (*))     8 or 32 bits per character; neither adjustable nor displaced.
base-string           (array base-char (*))            8 bits per character; may be adjustable.
simple-base-string    (simple-array base-char (*))     8 bits per character; neither adjustable nor displaced.

Table 2.7: Common Lisp string types

It is important to remember that strings containing Unicode characters can only be printed readably when the external format supports those characters. Otherwise, ECL signals a serious-condition, which aborts your program unless it is properly handled.


2.14.2 C reference

2.14.2.1 Base string constructors

Building strings of C data

Functions

Function: cl_object ecl_alloc_adjustable_base_string (cl_index length);
Function: cl_object ecl_alloc_simple_base_string (cl_index length);
Function: cl_object ecl_make_simple_base_string (const char* data, cl_fixnum length);
Function: cl_object ecl_make_constant_base_string (const char* data, cl_fixnum length);

Description

These are different ways to create a base string, that is, a string that holds only characters of type base-char, a small subset of characters whose codes range from 0 to 255.

ecl_alloc_simple_base_string creates a string with room for length characters and a fixed length. The string does not have a fill pointer and cannot be resized, and its initial contents are unspecified.

ecl_alloc_adjustable_base_string is similar to the previous function, but creates an adjustable string with a fill pointer. This means that the length of the string can be changed and the string itself can be resized to accommodate more data.

The other constructors create strings from preexisting C data. ecl_make_simple_base_string copies the data that the user supplies into freshly allocated memory. ecl_make_constant_base_string, on the other hand, does not allocate memory for the characters: it simply uses the supplied pointer as the buffer of the string. This last function should be used with care, ensuring that the supplied buffer is not deallocated while the string is still in use. If the length argument of these functions is -1, the length is determined by strlen.

2.14.2.2 String accessors

Reading characters from and writing characters into a string

Functions

Function: ecl_character ecl_char (cl_object string, cl_index index);
Function: ecl_character ecl_char_set (cl_object string, cl_index index, ecl_character c);

Description

Individual characters of a string should be accessed using these two functions. The first one, ecl_char, implements the equivalent of the Common Lisp function char, returning the character at position index in the string string.

The counterpart of the previous function is ecl_char_set, which implements (setf char) and stores character c at the position index in the given string.

Both functions check the types of their arguments and verify that index does not exceed the string boundaries. If either check fails, they signal a serious-condition.

2.14.2.3 Converting Unicode strings

Converting between different encodings. See External formats for the list of supported encodings.

Functions

Function: ext:octets-to-string octets &key (external-format :default) (start 0) (end nil)

Decode a sequence of octets (i.e. 8-bit bytes) into a string according to the given external format. octets must be a vector whose elements are 8 bits wide. The bounding index designators start and end optionally denote a subsequence to be decoded. Signals an ext:character-decoding-error if the decoding fails.

Function: ext:string-to-octets string &key (external-format :default) (start 0) (end nil) (null-terminate nil)

Encode a string into a sequence of octets according to the given external format. The bounding index designators start and end optionally denote a subsequence to be encoded. If null-terminate is true, add a terminating null byte. Signals an ext:character-encoding-error if the encoding fails.

Function: cl_object ecl_decode_from_cstring (const char *string, cl_fixnum length, cl_object external_format)

Decode a C string of the given length into a Lisp string using the specified external format. If length is -1, the length is determined by strlen. Returns NULL if the decoding fails.

Function: cl_fixnum ecl_encode_to_cstring (char *output, cl_fixnum output_length, cl_object input, cl_object external_format)

Encode the Lisp string input into the C buffer output, which holds output_length bytes, using the specified external format. Returns the number of bytes necessary to encode the Lisp string (including the null terminator). If this is larger than output_length, output is left unchanged. Returns -1 if the encoding fails.

Function: cl_object ecl_decode_from_unicode_wstring (const wchar_t *string, cl_fixnum length)
Function: cl_fixnum ecl_encode_to_unicode_wstring (wchar_t *output, cl_fixnum output_length, cl_object input)

These functions work like ecl_decode_from_cstring and ecl_encode_to_cstring, respectively, except that the external format used is either utf-8, utf-16 or utf-32, depending on whether sizeof(wchar_t) is 1, 2 or 4.

2.14.2.4 ANSI dictionary

Common Lisp and C equivalence

Lisp symbol           C function
simple-string-p       cl_object cl_simple_string_p(cl_object string)
char                  cl_object cl_char(cl_object string, cl_object index)
(setf char)           cl_object si_char_set(cl_object string, cl_object index, cl_object char)
schar                 cl_object cl_schar(cl_object string, cl_object index)
(setf schar)          cl_object si_char_set(cl_object string, cl_object index, cl_object char)
string                cl_object cl_string(cl_object x)
string-upcase         cl_object cl_string_upcase(cl_narg narg, cl_object string, ...)
string-downcase       cl_object cl_string_downcase(cl_narg narg, cl_object string, ...)
string-capitalize     cl_object cl_string_capitalize(cl_narg narg, cl_object string, ...)
nstring-upcase        cl_object cl_nstring_upcase(cl_narg narg, cl_object string, ...)
nstring-downcase      cl_object cl_nstring_downcase(cl_narg narg, cl_object string, ...)
nstring-capitalize    cl_object cl_nstring_capitalize(cl_narg narg, cl_object string, ...)
string-trim           cl_object cl_string_trim(cl_object character_bag, cl_object string)
string-left-trim      cl_object cl_string_left_trim(cl_object character_bag, cl_object string)
string-right-trim     cl_object cl_string_right_trim(cl_object character_bag, cl_object string)
string=               cl_object cl_stringE(cl_narg narg, cl_object string1, cl_object string2, ...)
string/=              cl_object cl_stringNE(cl_narg narg, cl_object string1, cl_object string2, ...)
string<               cl_object cl_stringL(cl_narg narg, cl_object string1, cl_object string2, ...)
string>               cl_object cl_stringG(cl_narg narg, cl_object string1, cl_object string2, ...)
string<=              cl_object cl_stringLE(cl_narg narg, cl_object string1, cl_object string2, ...)
string>=              cl_object cl_stringGE(cl_narg narg, cl_object string1, cl_object string2, ...)
string-equal          cl_object cl_string_equal(cl_narg narg, cl_object string1, cl_object string2, ...)
string-not-equal      cl_object cl_string_not_equal(cl_narg narg, cl_object string1, cl_object string2, ...)
string-lessp          cl_object cl_string_lessp(cl_narg narg, cl_object string1, cl_object string2, ...)
string-greaterp       cl_object cl_string_greaterp(cl_narg narg, cl_object string1, cl_object string2, ...)
string-not-greaterp   cl_object cl_string_not_greaterp(cl_narg narg, cl_object string1, cl_object string2, ...)
string-not-lessp      cl_object cl_string_not_lessp(cl_narg narg, cl_object string1, cl_object string2, ...)
stringp               cl_object cl_stringp(cl_object x)
make-string           cl_object cl_make_string(cl_narg narg, cl_object size, ...)