mirror of
https://github.com/AleoHQ/leo.git
synced 2024-12-19 15:41:36 +03:00
266 lines
11 KiB
Markdown
266 lines
11 KiB
Markdown
|
# Leo RFC 009: Conversions with Bits and Bytes
|
||
|
|
||
|
## Authors
|
||
|
|
||
|
- Max Bruce
|
||
|
- Collin Chin
|
||
|
- Alessandro Coglio
|
||
|
- Eric McCarthy
|
||
|
- Jon Pavlik
|
||
|
- Damir Shamanaev
|
||
|
- Damon Sicore
|
||
|
- Howard Wu
|
||
|
|
||
|
## Status
|
||
|
|
||
|
DRAFT
|
||
|
|
||
|
# Summary
|
||
|
|
||
|
This RFC proposes the addition of natively implemented global functions to perform conversions
|
||
|
between Leo integer values and sequences of bits or bytes in big endian or little endian order.
|
||
|
This RFC also proposes a future transition from these functions to methods associated to the integer types.
|
||
|
|
||
|
# Motivation
|
||
|
|
||
|
Conversions of integers to bits and bytes are fairly common in programming languages.
|
||
|
Use case include communication with the external world
|
||
|
(since external data is sometimes represented as bits and bytes rather than higher-level data structures),
|
||
|
and serialization/deserialization for cryptographic purposes (e.g. hashing data).
|
||
|
|
||
|
# Design
|
||
|
|
||
|
## Concepts
|
||
|
|
||
|
The Leo integer values can be thought of sequences of bits.
|
||
|
Therefore, it makes sense to convert between integer values and their corresponding sequences of bits;
|
||
|
the sequences of bits can be in little or big endian order (i.e. least vs. most significant bit first),
|
||
|
naturally leading to two possible conversions.
|
||
|
Obviously, the bits represent the integers in base 2.
|
||
|
|
||
|
Since all the Leo integer values consist of multiples of 8 bits,
|
||
|
it also makes sense to convert between integer values and squences of bytes,
|
||
|
which represents the integers in base 256.
|
||
|
Again, the bytes may be in little or big endian order.
|
||
|
|
||
|
It could also make sense to convert between integers consisting of `N` bits
|
||
|
and sequences of "words" of `M` bits if `N` is a multiple of `M`,
|
||
|
e.g. convert a `u32` into a sequence of two `u16`s, or convert a `u128` into a sequence of four `u32`s.
|
||
|
However, the case in which `M` is 1 (bits) or 8 (bytes) is by far the most common,
|
||
|
and therefore the initial focus of this RFC;
|
||
|
nonetheless, it seems valuable to keep these possible generalizations in mind as we work though this initial design.
|
||
|
|
||
|
Another possible generalization is to lift these conversions to sequences,
|
||
|
e.g. converting from a sequence of integer values to a sequence of bits or bytes
|
||
|
by concatenating the results of converting the integer values,
|
||
|
and converting from a sequence of bits or bytes to a sequence of integer values
|
||
|
by grouping the bits or bytes into chunks and converting each chunk into an integer.
|
||
|
For instance, a sequence of 4 `u32` values can be turned into a sequence of 32 bytes or a sequence of 128 bits.
|
||
|
Note that, in these cases, the endianness only applies to the individual element conversion,
|
||
|
not to the ordering of the integer values, which should be preserved by the conversion.
|
||
|
|
||
|
Besides integers, it could make sense to consider converting other Leo values between bits and bytes,
|
||
|
namely characters, field elements, group elements, and addresses (but perhaps not booleans).
|
||
|
If this is further extended to aggregate values (tuples, arrays, and circuits),
|
||
|
then this moves towards a general serialization/deserialization library for Leo, which could be a separate feature.
|
||
|
|
||
|
## Representation of Bits
|
||
|
|
||
|
In Leo's current type system, bits can be represented as `bool` values.
|
||
|
These are not quite the numbers 0 and 1, but they are isomorphic, and it is easy to convert between booleans and bits:
|
||
|
```ts
|
||
|
// convert a boolean x to a bit:
|
||
|
(x ? 1 : 0)
|
||
|
|
||
|
// convert f bit y to a boolean:
|
||
|
(y == 1)
|
||
|
```
|
||
|
|
||
|
If Leo had a type `u1` for unsigned 1-bit integers, we could use that instead of `bool`.
|
||
|
Separately from this RFC, such a type could be added.
|
||
|
There is also an outstanding proposal (not in an RFC currently) to support types `uN` and `iN` for every positive `N`,
|
||
|
in which case `u1` would be an instance of that.
|
||
|
|
||
|
## Representation of Bytes
|
||
|
|
||
|
The type `u8` is the natural way to represent a byte.
|
||
|
The type `i8` is isomorphic to that, but we tend to think of bytes as unsigned.
|
||
|
|
||
|
## Representation of Sequences
|
||
|
|
||
|
This applies to the sequence of bits or bytes that a Leo integer converts to or from.
|
||
|
E.g. a `u32` is converted to/from a sequence of bits or bytes.
|
||
|
|
||
|
Sequences in Leo may be ntaurally represented as arrays or tuples.
|
||
|
Arrays are more flexible; in particular, they allow indexing via expressions rather than just numbers, unlike tuples.
|
||
|
Thus, arrays are the natural choice to represent these sequences.
|
||
|
|
||
|
## Conversion Functions
|
||
|
|
||
|
We propose the following global functions,
|
||
|
for which we write declarations without bodies below,
|
||
|
since the implementation is native.
|
||
|
(It is a separate issue whether the syntax below should be allowed,
|
||
|
in order to represent natively implemented functions,
|
||
|
or whether there should be a more explicit indication such as `native` in Java).
|
||
|
|
||
|
These are tentative names, which we can tweak.
|
||
|
What is more important is the selection of operations, and their input/output types.
|
||
|
|
||
|
### Conversions between Integers and Bits
|
||
|
|
||
|
```ts
|
||
|
// unsigned to bits, little and big endian
|
||
|
function u8_to_bits_le(x: u8) -> [bool; 8];
|
||
|
function u8_to_bits_be(x: u8) -> [bool; 8];
|
||
|
function u16_to_bits_le(x: u16) -> [bool; 16];
|
||
|
function u16_to_bits_be(x: u16) -> [bool; 16];
|
||
|
function u32_to_bits_le(x: u32) -> [bool; 32];
|
||
|
function u32_to_bits_be(x: u32) -> [bool; 32];
|
||
|
function u64_to_bits_le(x: u64) -> [bool; 64];
|
||
|
function u64_to_bits_be(x: u64) -> [bool; 64];
|
||
|
function u128_to_bits_le(x: u128) -> [bool; 128];
|
||
|
function u128_to_bits_be(x: u128) -> [bool; 128];
|
||
|
|
||
|
// signed to bits, little and big endian
|
||
|
function i8_to_bits_le(x: i8) -> [bool; 8];
|
||
|
function i8_to_bits_be(x: i8) -> [bool; 8];
|
||
|
function i16_to_bits_le(x: i16) -> [bool; 16];
|
||
|
function i16_to_bits_be(x: i16) -> [bool; 16];
|
||
|
function i32_to_bits_le(x: i32) -> [bool; 32];
|
||
|
function i32_to_bits_be(x: i32) -> [bool; 32];
|
||
|
function i64_to_bits_le(x: i64) -> [bool; 64];
|
||
|
function i64_to_bits_be(x: i64) -> [bool; 64];
|
||
|
function i128_to_bits_le(x: i128) -> [bool; 128];
|
||
|
function i128_to_bits_be(x: i128) -> [bool; 128];
|
||
|
|
||
|
// unsigned from bits, little and big endian
|
||
|
function u8_from_bits_le(x: [bool; 8]) -> u8;
|
||
|
function u8_from_bits_be(x: [bool; 8]) -> u8;
|
||
|
function u16_from_bits_le(x: [bool; 16]) -> u16;
|
||
|
function u16_from_bits_be(x: [bool; 16]) -> u16;
|
||
|
function u32_from_bits_le(x: [bool; 32]) -> u32;
|
||
|
function u32_from_bits_be(x: [bool; 32]) -> u32;
|
||
|
function u64_from_bits_le(x: [bool; 64]) -> u64;
|
||
|
function u64_from_bits_be(x: [bool; 64]) -> u64;
|
||
|
function u128_from_bits_le(x: [bool; 128]) -> u128;
|
||
|
function u128_from_bits_be(x: [bool; 128]) -> u128;
|
||
|
|
||
|
// signed from bits, little and big endian
|
||
|
function i8_from_bits_le(x: [bool; 8]) -> i8;
|
||
|
function i8_from_bits_be(x: [bool; 8]) -> i8;
|
||
|
function i16_from_bits_le(x: [bool; 16]) -> i16;
|
||
|
function i16_from_bits_be(x: [bool; 16]) -> i16;
|
||
|
function i32_from_bits_le(x: [bool; 32]) -> i32;
|
||
|
function i32_from_bits_be(x: [bool; 32]) -> i32;
|
||
|
function i64_from_bits_le(x: [bool; 64]) -> i64;
|
||
|
function i64_from_bits_be(x: [bool; 64]) -> i64;
|
||
|
function i128_from_bits_le(x: [bool; 128]) -> i128;
|
||
|
function i128_from_bits_be(x: [bool; 128]) -> i128;
|
||
|
```
|
||
|
|
||
|
### Conversions between Integers and Bytes
|
||
|
|
||
|
```ts
|
||
|
// unsigned to bytes, little and big endian
|
||
|
function u16_to_bytes_le(x: u16) -> [u8; 2];
|
||
|
function u16_to_bytes_be(x: u16) -> [u8; 2];
|
||
|
function u32_to_bytes_le(x: u32) -> [u8; 4];
|
||
|
function u32_to_bytes_be(x: u32) -> [u8; 4];
|
||
|
function u64_to_bytes_le(x: u64) -> [u8; 8];
|
||
|
function u64_to_bytes_be(x: u64) -> [u8; 8];
|
||
|
function u128_to_bytes_le(x: u128) -> [u8; 16];
|
||
|
function u128_to_bytes_be(x: u128) -> [u8; 16];
|
||
|
|
||
|
// signed to bytes, little and big endian
|
||
|
function i16_to_bytes_le(x: i16) -> [u8; 2];
|
||
|
function i16_to_bytes_be(x: i16) -> [u8; 2];
|
||
|
function i32_to_bytes_le(x: i32) -> [u8; 4];
|
||
|
function i32_to_bytes_be(x: i32) -> [u8; 4];
|
||
|
function i64_to_bytes_le(x: i64) -> [u8; 8];
|
||
|
function i64_to_bytes_be(x: i64) -> [u8; 8];
|
||
|
function i128_to_bytes_le(x: i128) -> [u8; 16];
|
||
|
function i128_to_bytes_be(x: i128) -> [u8; 16];
|
||
|
|
||
|
// unsigned from bytes, little and big endian
|
||
|
function u16_from_bytes_le(x: [u8; 2]) -> u16;
|
||
|
function u16_from_bytes_be(x: [u8; 2]) -> u16;
|
||
|
function u32_from_bytes_le(x: [u8; 4]) -> u32;
|
||
|
function u32_from_bytes_be(x: [u8; 4]) -> u32;
|
||
|
function u64_from_bytes_le(x: [u8; 8]) -> u64;
|
||
|
function u64_from_bytes_be(x: [u8; 8]) -> u64;
|
||
|
function u128_from_bytes_le(x: [u8; 16]) -> u128;
|
||
|
function u128_from_bytes_be(x: [u8; 16]) -> u128;
|
||
|
|
||
|
// signed from bytes, little and big endian
|
||
|
function i16_from_bytes_le(x: [u8; 2]) -> i16;
|
||
|
function i16_from_bytes_be(x: [u8; 2]) -> i16;
|
||
|
function i32_from_bytes_le(x: [u8; 4]) -> i32;
|
||
|
function i32_from_bytes_be(x: [u8; 4]) -> i32;
|
||
|
function i64_from_bytes_le(x: [u8; 8]) -> i64;
|
||
|
function i64_from_bytes_be(x: [u8; 8]) -> i64;
|
||
|
function i128_from_bytes_le(x: [u8; 16]) -> i128;
|
||
|
function i128_from_bytes_be(x: [u8; 16]) -> i128;
|
||
|
```
|
||
|
|
||
|
## Transition to Methods
|
||
|
|
||
|
Once a separate proposal for adding methods to Leo scalar types is realized,
|
||
|
we may want to turn the global functions listed above into methods,
|
||
|
deprecating the global functions, and eventually eliminating them.
|
||
|
|
||
|
Conversions to bits or bytes will be instance methods of the integer types,
|
||
|
e.g. `u8` will include an instance method `to_bits_le` that takes no arguments and that returns a `[bool; 8]`.
|
||
|
Example:
|
||
|
```ts
|
||
|
let int: u8 = 12;
|
||
|
let bits: [bool; 8] = int.to_bits_le();
|
||
|
console.assert(bits == [false, false, true, true, false, false, false, false]); // 00110000 (little endian)
|
||
|
```
|
||
|
|
||
|
Conversions from bits or bytes will be static methods of the integer types,
|
||
|
e.g. `u8` will include a static metod `from_bits_le` that takes a `[bool; 8]` argument and returns a `u8`.
|
||
|
Example:
|
||
|
```ts
|
||
|
let bits: [bool; 8] = [false, false, true, true, false, false, false, false]; // 00110000 (little endian)
|
||
|
let int = u8::from_bits_le(bits);
|
||
|
console.assert(int == 12);
|
||
|
```
|
||
|
|
||
|
# Drawbacks
|
||
|
|
||
|
This does not seem to bring any drawbacks.
|
||
|
|
||
|
# Effect on Ecosystem
|
||
|
|
||
|
None.
|
||
|
|
||
|
# Alternatives
|
||
|
|
||
|
## Pure Leo Implementation
|
||
|
|
||
|
These conversions can be realized in Leo (i.e. without native implementations),
|
||
|
provided that Leo is extended with certain operations that are already separately planned:
|
||
|
* Integer division and remainder, along with type casts, could be used.
|
||
|
* Bitwise shifts and masks, along with type casts, could be used.
|
||
|
|
||
|
However, compiling the Leo code that realizes the conversions may result in less efficient R1CS than the native ones.
|
||
|
|
||
|
## Naming Bit and Byte Types Explicitly
|
||
|
|
||
|
Names like `u8_to_bits_le` and `u32_to_bytes_le` talk about bits and bytes,
|
||
|
therefore relying on a choice of representation for bits and bytes,
|
||
|
which is `bool` for bits and `u8` for bytes as explained above.
|
||
|
An alternative is to have names like `u8_to_bools_le` and `u32_to_u8s_le`,
|
||
|
which explicate the representation of bits and bytes in the name,
|
||
|
and open the door to additional conversions to different representations.
|
||
|
In particular, if and when Leo is extended with a type `u1` for bits,
|
||
|
there could be additional operations like `u8_to_u1s_le`.
|
||
|
|
||
|
This more explicit naming scheme also provides a path towards extending
|
||
|
bit and byte conversions to more generic "word" conversions,
|
||
|
such as `u64_to_u16s_le`, which would turn a `u64` into a `[u16; 4]`.
|
||
|
In general, it makes sense to convert between `uN` or `iN` and `[uM; P]` when `N == M * P`.
|
||
|
If Leo were extended with types `uN` and `iN` for all positive `N` as proposed elsewhere,
|
||
|
there could be a family of all such conversions.
|