Bytes
This document describes the functions and methods used to manipulate and process Chapel bytes variables.
- type bytes
The bytes type is similar to a string but allows arbitrary data to be stored in it. Methods on bytes that interpret the data as characters assume that the bytes are ASCII characters.
Creating bytes
A bytes can be created using literals similar to strings:
var b = b"my bytes";
If you need to create bytes using a specific buffer (i.e. data in another bytes, a c_string or a C pointer) you can use the factory functions shown below, such as createBytesWithNewBuffer.
bytes and string
As bytes can store arbitrary data, any string can be cast to bytes. In that event, the bytes will store UTF-8 encoded character data. However, a bytes can contain non-UTF-8 bytes and needs to be decoded to be converted to string.
var s = "my string";
var b = s:bytes; // this is legal
/*
The reverse is not. The following is a compiler error:
var s2 = b:string;
*/
var s2 = b.decode(); // you need to decode a bytes to convert it to a string
See the documentation for the decode method for details.
Similarly, a bytes can be initialized using a string:
var s = "my string";
var b: bytes = s;
Casts from bytes to a Numeric Type
This module supports casts from bytes to numeric types. Such casts interpret the bytes as ASCII characters and convert them to the numeric type, throwing an error if the bytes does not match the expected format of a number. For example:
var b = b"a";
var number = b:int;
throws an error when it is executed, but
var b = b"1";
var number = b:int;
stores the value 1 in number.
To learn more about handling these errors, see the Error Handling technical note.
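As a minimal sketch of handling the error thrown by such a cast, the following wraps the cast in a try/catch and falls back to a default value. The helper name parseIntOr and its fallback parameter are hypothetical and are not part of this module.

```chapel
// Hypothetical helper (not part of the Bytes module): returns `fallback`
// when `b` does not match the expected format of an integer.
proc parseIntOr(b: bytes, fallback: int): int {
  var result = fallback;
  try {
    result = b:int;   // throws if b is not formatted as a number
  } catch {
    // the cast failed; keep the fallback value
  }
  return result;
}

const good = parseIntOr(b"1", -1);   // the cast succeeds
const bad = parseIntOr(b"a", -1);    // the cast throws, so the fallback is used
writeln(good);
writeln(bad);
```

A catch-all clause is used here so the sketch does not depend on the specific error class that the cast throws.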
- proc createBytesWithBorrowedBuffer(x: bytes): bytes
Creates a new bytes which borrows the internal buffer of another bytes. If the buffer is freed before the bytes returned from this function is deinitialized, accessing the returned bytes is undefined behavior.
- proc createBytesWithBorrowedBuffer(x: c_string, length = x.size): bytes
Creates a new bytes which borrows the internal buffer of a c_string. If the buffer is freed before the bytes returned from this function is deinitialized, accessing the returned bytes is undefined behavior.
- Arguments
x – c_string to borrow the buffer from
length : int – Length of x’s buffer, excluding the terminating null byte.
- Returns
A new bytes
- proc createBytesWithBorrowedBuffer(x: c_ptr(?t), length: int, size: int): bytes
Creates a new bytes which borrows the memory allocated for a c_ptr. If the buffer is freed before the bytes returned from this function is deinitialized, accessing the returned bytes is undefined behavior.
- Arguments
x – Buffer to borrow
length – Length of the buffer x, excluding the terminating null byte.
size – Size of memory allocated for x in bytes
- Returns
A new bytes
- proc createBytesWithOwnedBuffer(x: c_string, length = x.size): bytes
Creates a new bytes which takes ownership of the internal buffer of a c_string. The buffer will be freed when the bytes is deinitialized.
- Arguments
x – The c_string to take ownership of the buffer from
length : int – Length of x’s buffer, excluding the terminating null byte.
- Returns
A new bytes
- proc createBytesWithOwnedBuffer(x: c_ptr(?t), length: int, size: int): bytes
Creates a new bytes which takes ownership of the memory allocated for a c_ptr. The buffer will be freed when the bytes is deinitialized.
- Arguments
x – The buffer to take ownership of
length – Length of the buffer x, excluding the terminating null byte.
size – Size of memory allocated for x in bytes
- Returns
A new bytes
- proc createBytesWithNewBuffer(x: bytes): bytes
Creates a new bytes by creating a copy of the buffer of another bytes.
- proc createBytesWithNewBuffer(x: c_string, length = x.size): bytes
Creates a new bytes by creating a copy of the buffer of a c_string.
- Arguments
x – The c_string to copy the buffer from
length : int – Length of x’s buffer, excluding the terminating null byte.
- Returns
A new bytes
- proc createBytesWithNewBuffer(x: c_ptr(?t), length: int, size = length + 1): bytes
Creates a new bytes by creating a copy of a buffer.
- Arguments
x – The buffer to copy
length – Length of buffer x, excluding the terminating null byte.
size – Size of memory allocated for x in bytes
- Returns
A new bytes
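The copying factory can be illustrated with a short sketch. It uses only the bytes overload of createBytesWithNewBuffer described above; the variable names are illustrative. Because the new bytes has its own buffer, modifying the copy should leave the original untouched.

```chapel
var original = b"payload";

// Copy the buffer of `original` into a new, independent bytes.
var copy = createBytesWithNewBuffer(original);

// Appending to the copy does not affect the original.
copy += b"!";

writeln(original);
writeln(copy);
```

The borrowed and owned variants avoid this copy, at the cost of the lifetime rules stated in their descriptions.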
- proc bytes.indices: range
- Returns
The indices that can be used to index into the bytes (i.e., the range 0..<this.size)
- proc bytes.localize(): bytes
Gets a version of the bytes that is on the currently executing locale.
- Returns
A shallow copy if the bytes is already on the current locale, otherwise a deep copy.
- proc bytes.c_str(): c_string
Gets a c_string from a bytes. The returned c_string shares the buffer with the bytes.
Warning
This can only be called safely on a bytes whose home is the current locale. This property can be enforced by calling bytes.localize() before c_str(). If the bytes is remote, the program will halt.
For example:
var myBytes = b"Hello!";
on different_locale {
  printf("%s", myBytes.localize().c_str());
}
- Returns
A c_string that points to the underlying buffer used by this bytes. The returned c_string is only valid when used on the same locale as the bytes.
- proc bytes.item(i: int): bytes
Gets an ASCII character from the bytes.
- Arguments
i – The index
- Returns
A 1-length bytes
- proc bytes.this(i: int): uint(8)
Gets a byte from the bytes.
- Arguments
i – The index
- Returns
uint(8)
- proc bytes.byte(i: int): uint(8)
Gets a byte from the bytes.
- Arguments
i – The index
- Returns
The value of the i-th byte as an integer.
- iter bytes.items(): bytes
Iterates over the bytes, yielding ASCII characters.
- Yields
1-length bytes
- proc bytes.this(r: range(?)): bytes
Slices the bytes. Halts if r is non-empty and not completely inside the range this.indices when compiled with --checks. --fast disables this check.
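The indexing, iteration, and slicing methods above can be combined as in the following minimal sketch. The variable names are illustrative, and indices are assumed to start at 0, as stated in the description of bytes.indices.

```chapel
const data = b"abc";

// Indexing with an int yields a uint(8) byte value; sum them up.
var total = 0;
for i in data.indices do
  total += data[i];

const first = data.item(0);      // a 1-length bytes
const firstByte = data.byte(0);  // the same position as a uint(8)
const tail = data[1..2];         // slicing with a range yields a bytes

// items() yields one 1-length bytes per position.
var pieces = 0;
for ch in data.items() do
  pieces += 1;

writeln(total, " ", firstByte, " ", pieces);
```

Note the difference between data[0], which is a number, and data.item(0), which is itself a bytes.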
- proc bytes.isEmpty(): bool
Checks if the bytes is empty.
- Returns
true – when empty
false – otherwise
- proc bytes.startsWith(patterns: bytes ...): bool
Checks if the bytes starts with any of the given arguments.
- proc bytes.endsWith(patterns: bytes ...): bool
Checks if the bytes ends with any of the given arguments.
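Since both methods accept a variable number of patterns, one call can test several prefixes or suffixes at once. A minimal sketch, with illustrative values:

```chapel
const name = b"report.txt";

// True if the bytes ends with any one of the listed suffixes.
const isText = name.endsWith(b".txt", b".md");

// A single pattern works the same way.
const isDraft = name.startsWith(b"draft");

writeln(isText, " ", isDraft);
```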
- proc bytes.find(needle: bytes, region: range(?) = this.indices): idxType
Warning
The ‘needle’ and ‘region’ arguments are deprecated; use ‘pattern’ and ‘indices’ instead.
- proc bytes.find(pattern: bytes, indices: range(?) = this.indices): idxType
Finds the argument in the bytes.
- Arguments
pattern – The bytes to search for
indices – An optional range defining the indices to search within; the default is the whole bytes. Halts if the range is not within this.indices.
- Returns
The index of the first occurrence from the left of pattern within the bytes, or -1 if the pattern is not in the bytes.
- proc bytes.rfind(needle: bytes, region: range(?) = this.indices): idxType
Warning
The ‘needle’ and ‘region’ arguments are deprecated; use ‘pattern’ and ‘indices’ instead.
- proc bytes.rfind(pattern: bytes, indices: range(?) = this.indices): idxType
Finds the argument in the bytes.
- Arguments
pattern – The bytes to search for
indices – An optional range defining the indices to search within; the default is the whole bytes. Halts if the range is not within this.indices.
- Returns
The index of the first occurrence from the right of pattern within the bytes, or -1 if the pattern is not in the bytes.
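The difference between searching from the left and from the right can be seen in a short sketch. The pattern is passed positionally, on the assumption that such calls resolve to the non-deprecated overloads; the values are illustrative.

```chapel
const word = b"banana";

const firstAt = word.find(b"an");    // first occurrence from the left
const lastAt = word.rfind(b"an");    // first occurrence from the right
const missing = word.find(b"xyz");   // pattern not present

writeln(firstAt, " ", lastAt, " ", missing);
```

Because a missing pattern is reported as -1, callers should check the result before using it as an index.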
- proc bytes.count(needle: bytes, region: range(?) = this.indices): int
Warning
The ‘needle’ and ‘region’ arguments are deprecated; use ‘pattern’ and ‘indices’ instead.
- proc bytes.count(pattern: bytes, indices: range(?) = this.indices): int
Counts the number of occurrences of the argument in the bytes.
- proc bytes.replace(needle: bytes, replacement: bytes, count: int = -1): bytes
Warning
The ‘needle’ argument is deprecated; use ‘pattern’ instead.
- proc bytes.replace(pattern: bytes, replacement: bytes, count: int = -1): bytes
Replaces occurrences of a bytes with another.
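A minimal sketch of counting and replacing follows. Arguments are passed positionally, and the third argument of replace is assumed to limit the number of replacements, with the default of -1 meaning no limit.

```chapel
const word = b"banana";

const n = word.count(b"an");                 // occurrences of the pattern
const all = word.replace(b"an", b"AN");      // replace every occurrence
const once = word.replace(b"an", b"AN", 1);  // replace only the first one

writeln(n);
writeln(all);
writeln(once);
```

replace returns a new bytes, so word itself is unchanged.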
- iter bytes.split(sep: bytes, maxsplit: int = -1, ignoreEmpty: bool = false): bytes
Splits the bytes on sep, yielding the bytes between each occurrence, up to maxsplit times.
- iter bytes.split(maxsplit: int = -1): bytes
Works as above, but uses runs of whitespace as the delimiter.
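The two split overloads can be compared with a small sketch that counts the yielded pieces. The inputs are illustrative; the counts assume that adjacent separators produce an empty piece unless ignoreEmpty is true.

```chapel
const line = b"a,b,,c";

// Splitting on an explicit separator keeps the empty piece between ",,".
var fields = 0;
for f in line.split(b",") do
  fields += 1;

// With ignoreEmpty=true the empty piece is skipped.
var nonEmpty = 0;
for f in line.split(b",", ignoreEmpty=true) do
  nonEmpty += 1;

// Without a separator, runs of whitespace act as a single delimiter.
var words = 0;
for w in b"one  two\tthree".split() do
  words += 1;

writeln(fields, " ", nonEmpty, " ", words);
```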
- proc bytes.join(const ref x: bytes ...): bytes
Returns a new bytes, which is the concatenation of all of the bytes passed in with the contents of the method receiver inserted between them.
var myBytes = b"|".join(b"a",b"10",b"d");
writeln(myBytes); // prints: "a|10|d"
- proc bytes.join(const ref x): bytes
Returns a new bytes, which is the concatenation of all of the bytes passed in with the contents of the method receiver inserted between them.
var tup = (b"a",b"10",b"d");
var myJoinedTuple = b"|".join(tup);
writeln(myJoinedTuple); // prints: "a|10|d"
var myJoinedArray = b"|".join([b"a",b"10",b"d"]);
writeln(myJoinedArray); // prints: "a|10|d"
- proc bytes.strip(chars = b" \t\r\n", leading = true, trailing = true): bytes
Strips given set of leading and/or trailing characters.
- Arguments
chars – Characters to remove. Defaults to b" \t\r\n".
leading – Indicates if leading occurrences should be removed. Defaults to true.
trailing – Indicates if trailing occurrences should be removed. Defaults to true.
- Returns
A new bytes with leading and/or trailing occurrences of characters in chars removed as appropriate.
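A minimal sketch of the default and one-sided forms of strip, with illustrative inputs:

```chapel
// Default: remove spaces, tabs, carriage returns and newlines on both ends.
const trimmed = b"  hi \n".strip();

// Custom character set, leading side only.
const leadOnly = b"xxhixx".strip(b"x", trailing=false);

writeln(trimmed);
writeln(leadOnly);
```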
- proc bytes.dedent(columns = 0, ignoreFirst = true): bytes
Remove indentation from each line of bytes.
This can be useful when applied to multi-line bytes that are indented in the source code, but should not be indented in the output.
When columns == 0, determine the level of indentation to remove from all lines by finding the common leading whitespace across all non-empty lines. Empty lines are lines containing only whitespace. Tabs and spaces are the only whitespaces that are considered, but are not treated as the same characters when determining common whitespace.
When columns > 0, remove columns leading whitespace characters from each line. Tabs are not considered whitespace when columns > 0, so only leading spaces are removed.
- Arguments
columns – The number of columns of indentation to remove. Infer common leading whitespace if columns == 0.
ignoreFirst – When true, ignore first line when determining the common leading whitespace, and make no changes to the first line.
- Returns
A new bytes with indentation removed.
Warning
bytes.dedent is not considered stable and is subject to change in future Chapel releases.
- proc bytes.decode(policy = decodePolicy.strict): string throws
Returns a UTF-8 string from the given bytes. If the data is malformed for UTF-8, the policy argument determines the action.
- Arguments
policy –
decodePolicy.strict raises an error
decodePolicy.replace replaces the malformed character with the UTF-8 replacement character
decodePolicy.drop drops the data silently
decodePolicy.escape escapes each illegal byte with private use codepoints
- Throws
DecodeError if decodePolicy.strict is passed to the policy argument and the bytes contains non-UTF-8 characters.
- Returns
A UTF-8 string.
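The effect of the policy argument can be sketched as follows. The example assumes that bytes literals accept \x hexadecimal escapes and uses a catch-all clause rather than naming the error class; all variable names are illustrative.

```chapel
const valid = b"caf\xc3\xa9";   // the UTF-8 encoding of "café"
const invalid = b"abc\xff";     // 0xff cannot appear in valid UTF-8

// Strict decoding (the default) succeeds on well-formed data.
const text = try! valid.decode();

// Strict decoding throws on malformed data.
var failed = false;
try {
  const s = invalid.decode();
  writeln(s);
} catch {
  failed = true;
}

// The drop policy silently removes the malformed byte instead.
const dropped = try! invalid.decode(decodePolicy.drop);

writeln(text);
writeln(failed);
writeln(dropped);
```

try! is used where no error is expected; it halts the program if one occurs anyway.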
- proc bytes.isUpper(): bool
Checks if all the characters in the bytes are uppercase (A-Z) in ASCII. Ignores uncased (not a letter) and extended ASCII characters (decimal value larger than 127).
- Returns
true – there is at least one uppercase and no lowercase characters
false – otherwise
- proc bytes.isLower(): bool
Checks if all the characters in the bytes are lowercase (a-z) in ASCII. Ignores uncased (not a letter) and extended ASCII characters (decimal value larger than 127).
- Returns
true – there is at least one lowercase and no uppercase characters
false – otherwise
- proc bytes.isSpace(): bool
Checks if all the characters in the bytes are whitespace (' ', '\t', '\n', '\v', '\f', '\r') in ASCII.
- Returns
true – when all the characters are whitespace.
false – otherwise
- proc bytes.isAlpha(): bool
Checks if all the characters in the bytes are alphabetic (a-zA-Z) in ASCII.
- Returns
true – when the characters are alphabetic.
false – otherwise
- proc bytes.isDigit(): bool
Checks if all the characters in the bytes are digits (0-9) in ASCII.
- Returns
true – when the characters are digits.
false – otherwise
- proc bytes.isAlnum(): bool
Checks if all the characters in the bytes are alphanumeric (a-zA-Z0-9) in ASCII.
- Returns
true – when the characters are alphanumeric.
false – otherwise
- proc bytes.isPrintable(): bool
Checks if all the characters in the bytes are printable in ASCII.
- Returns
true – when the characters are printable.
false – otherwise
- proc bytes.isTitle(): bool
Checks if all uppercase characters are preceded by uncased characters, and if all lowercase characters are preceded by cased characters in ASCII.
- Returns
true – when the condition described above is met.
false – otherwise
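The character-class checks above can be exercised with a few illustrative inputs. The expected results follow from the descriptions given for each method.

```chapel
// Uncased characters (space, digits) are ignored by isUpper.
const allUpper = b"ABC 123".isUpper();

const digits = b"2024".isDigit();
const alnum = b"abc123".isAlnum();
const spaces = b" \t\n".isSpace();

// Each word starts with an uppercase letter followed by lowercase letters.
const title = b"Hello World".isTitle();

// The leading lowercase 'h' is not preceded by a cased character.
const notTitle = b"hello World".isTitle();

writeln(allUpper, " ", digits, " ", alnum, " ", spaces);
writeln(title, " ", notTitle);
```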
- proc bytes.toLower(): bytes
Creates a new bytes with all applicable characters converted to lowercase.
- Returns
A new bytes with all uppercase characters (A-Z) replaced with their lowercase counterpart in ASCII. Other characters remain untouched.
- proc bytes.toUpper(): bytes
Creates a new bytes with all applicable characters converted to uppercase.
- Returns
A new bytes with all lowercase characters (a-z) replaced with their uppercase counterpart in ASCII. Other characters remain untouched.
- proc bytes.toTitle(): bytes
Creates a new bytes with all applicable characters converted to title capitalization.
- Returns
A new bytes with all cased characters (a-zA-Z) following an uncased character converted to uppercase, and all cased characters following another cased character converted to lowercase.
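The three case conversions can be compared on one illustrative input. Each returns a new bytes and leaves the receiver unchanged.

```chapel
const mixed = b"hello WORLD 42";

const lower = mixed.toLower();   // uppercase letters become lowercase
const upper = mixed.toUpper();   // lowercase letters become uppercase
const title = mixed.toTitle();   // first letter of each word is capitalized

writeln(lower);
writeln(upper);
writeln(title);
```

Digits and spaces are uncased, so they pass through all three conversions untouched.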
- proc type bytes.+=(ref lhs: bytes, const ref rhs: bytes): void
- proc type bytes.=(ref lhs: bytes, rhs_c: c_string): void
Copies the c_string rhs_c into the bytes lhs.
Halts if lhs is a remote bytes.