enso/docs/parser/reader.md

---
layout: developer-doc
title: Reading Source Code
category: parser
tags: [parser, reader]
order: 10
---

# Reading Source Code

The reader abstracts the interface for reading characters from a stream. It
hides both the various encodings that the project is going to use and the
backing formats of the stream.

<!-- MarkdownTOC levels="2,3" autolink="true" -->

- [Reader Functionality](#reader-functionality)
- [Reader Structure](#reader-structure)
  - [Read](#read)
  - [Decoder](#decoder)
- [Provided Encodings](#provided-encodings)
  - [UTF-8](#utf-8)
  - [UTF-16](#utf-16)
  - [UTF-32](#utf-32)
  - [Benchmarks](#benchmarks)

<!-- /MarkdownTOC -->

## Reader Functionality

The reader has the following functionality:

- It reads its input _lazily_, not requiring the entire input to be in memory.
- It provides the `next_character` interface, which returns Rust-native UTF-32
  characters and abstracts away the various underlying encodings.
- It allows the caller to bookmark the character that was last read and to
  return to it later by calling `rewind`.
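
The behaviour listed above can be illustrated with a minimal sketch. This is not
the project's implementation: `SketchReader`, its fields, and the `bookmark`
method are hypothetical names, and the input is assumed to be already decoded
into `char`s so that only laziness and rewinding are shown. The sketch also
assumes that, after `rewind`, reading resumes with the character following the
bookmarked one.

```rust
/// Hypothetical lazy reader over already-decoded characters.
pub struct SketchReader<I: Iterator<Item = char>> {
    /// Source of characters; items are pulled only on demand.
    source: I,
    /// Characters read since the bookmark was set.
    retained: Vec<char>,
    /// Index into `retained` of the next character to replay.
    replay_from: usize,
    /// Whether a bookmark is currently active.
    bookmarked: bool,
}

impl<I: Iterator<Item = char>> SketchReader<I> {
    pub fn new(source: I) -> Self {
        Self { source, retained: Vec::new(), replay_from: 0, bookmarked: false }
    }

    /// Returns the next character, or `None` at the end of the input.
    pub fn next_character(&mut self) -> Option<char> {
        // After a rewind, replay the retained characters first.
        if self.replay_from < self.retained.len() {
            let character = self.retained[self.replay_from];
            self.replay_from += 1;
            return Some(character);
        }
        // Otherwise pull exactly one more character from the source.
        let character = self.source.next()?;
        if self.bookmarked {
            self.retained.push(character);
            self.replay_from = self.retained.len();
        }
        Some(character)
    }

    /// Bookmarks the position just after the character that was last read.
    pub fn bookmark(&mut self) {
        // Forget what was already replayed; keep the not-yet-replayed tail.
        self.retained.drain(..self.replay_from);
        self.replay_from = 0;
        self.bookmarked = true;
    }

    /// Returns to the bookmarked position.
    pub fn rewind(&mut self) {
        self.replay_from = 0;
    }
}
```

Only the characters read since the bookmark are kept in memory, so the sketch
never needs the entire input at once.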

## Reader Structure

The lazy reader consists of the following parts:

### Read

The `Read` trait is similar to `std::io::Read`, but supports buffers other than
just `&[u8]`, so that different encodings can be read. It provides the interface
`fn read(&mut self, buffer:&mut [Self::Item]) -> usize`, which fills the
provided buffer with the data that is being read.

Any structure that implements `std::io::Read` also implements `Read<Item=u8>`.
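
A trait of this shape, together with the blanket implementation for
`std::io::Read`, might look like the sketch below. Only the `read` signature is
taken from the text above. The body is an assumption: the return value is
treated as the number of items written, and because the signature returns a
plain `usize`, the sketch treats I/O errors as the end of the input.

```rust
use std::io;

/// Sketch of a `Read`-like trait that is generic over the buffer item type.
pub trait Read {
    /// The unit stored in the buffer, e.g. `u8`, `u16` or `char`.
    type Item;

    /// Fills `buffer` and returns how many items were written (assumed).
    fn read(&mut self, buffer: &mut [Self::Item]) -> usize;
}

/// Every `std::io::Read` is also a `Read<Item = u8>`.
impl<R: io::Read> Read for R {
    type Item = u8;

    fn read(&mut self, buffer: &mut [u8]) -> usize {
        let mut filled = 0;
        // Keep reading until the buffer is full or the input is exhausted.
        while filled < buffer.len() {
            match io::Read::read(self, &mut buffer[filled..]) {
                Ok(0) => break,
                Ok(count) => filled += count,
                Err(ref error) if error.kind() == io::ErrorKind::Interrupted => continue,
                // Assumption: any other error is treated as end of input.
                Err(_) => break,
            }
        }
        filled
    }
}
```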

### Decoder

The `Decoder` trait is an interface for reading a single character from an
underlying buffer: `fn decode(words:&[Self::Word]) -> Char`. The type of the
buffer depends on the underlying encoding so that, e.g., UTF-32 can use
`&[char]` directly.
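
To make the role of `Self::Word` concrete, here is a sketch of two decoders
behind such a trait. The `decode` signature comes from the text above. The
layout of `Char` (a decoded character plus the number of words consumed) and
the names `SketchUtf8` and `SketchUtf32` are assumptions made for illustration.

```rust
/// Hypothetical result type: the character and how many words it occupied.
#[derive(Debug, PartialEq)]
pub struct Char {
    /// The decoded character, or `None` if the input is malformed or truncated.
    pub character: Option<char>,
    /// Number of buffer words to advance by.
    pub size: usize,
}

pub trait Decoder {
    /// The buffer element type of the encoding.
    type Word;

    /// Decodes the first character found in `words`.
    fn decode(words: &[Self::Word]) -> Char;
}

/// Hypothetical UTF-8 decoder working on bytes.
pub struct SketchUtf8;

impl Decoder for SketchUtf8 {
    type Word = u8;

    fn decode(words: &[u8]) -> Char {
        // The lead byte determines the length of the sequence.
        let size = match words.first() {
            None => return Char { character: None, size: 0 },
            Some(&byte) if byte < 0x80 => 1,
            Some(&byte) if byte & 0xE0 == 0xC0 => 2,
            Some(&byte) if byte & 0xF0 == 0xE0 => 3,
            Some(&byte) if byte & 0xF8 == 0xF0 => 4,
            Some(_) => return Char { character: None, size: 1 },
        };
        // Let the standard library validate the sequence.
        let character = words
            .get(..size)
            .and_then(|bytes| std::str::from_utf8(bytes).ok())
            .and_then(|text| text.chars().next());
        // On malformed input, skip a single byte.
        Char { character, size: if character.is_some() { size } else { 1 } }
    }
}

/// Hypothetical UTF-32 decoder: the buffer already holds `char`s.
pub struct SketchUtf32;

impl Decoder for SketchUtf32 {
    type Word = char;

    fn decode(words: &[char]) -> Char {
        match words.first() {
            Some(&character) => Char { character: Some(character), size: 1 },
            None => Char { character: None, size: 0 },
        }
    }
}
```

The UTF-32 case needs no conversion at all, which is why its word type can be
`char` itself.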

#### Example Usage

To put things into perspective, this is how the reader is constructed from a
string and from a file.

```rust
let string      = "Hello, World!";
let byte_reader = Reader::new(string.as_bytes(), DecoderUTF8(), 0);
let file_reader = Reader::new(File::open("foo.txt")?, DecoderUTF8(), 0);
```

## Provided Encodings

The decoders currently provide the following input encodings.

### UTF-8

Rust natively uses UTF-8 encoding for its strings. In order for the IDE to make
use of the parser, a simple Rust-native UTF-8 decoder is provided.

### UTF-16

Because the JVM uses UTF-16 to encode its strings, the reader must let JVM
clients of the parser provide the source code in a streaming fashion, without
needing to re-encode it before passing it to the parser.
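
As an illustration of decoding UTF-16 in a streaming fashion, the sketch below
uses the standard library's `std::char::decode_utf16`. It is not the project's
decoder: `utf16_chars` is a hypothetical helper, and replacing unpaired
surrogates with U+FFFD is an assumed error policy.

```rust
use std::char::{decode_utf16, REPLACEMENT_CHARACTER};

/// Lazily decodes UTF-16 code units into characters.
///
/// Code units are pulled from `units` only as characters are requested, so the
/// input never needs to be re-encoded or fully buffered first.
pub fn utf16_chars<I>(units: I) -> impl Iterator<Item = char>
where
    I: IntoIterator<Item = u16>,
{
    // Unpaired surrogates become U+FFFD (an assumption of this sketch).
    decode_utf16(units).map(|result| result.unwrap_or(REPLACEMENT_CHARACTER))
}
```

A character outside the Basic Multilingual Plane occupies two code units (a
surrogate pair), which the iterator combines into a single `char`.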

### UTF-32

Rust represents each `char` as a 32-bit Unicode scalar value, which corresponds
to UTF-32. This encoding is therefore required in order to support inputs as
`&[char]`.

### Benchmarks

As of 2020-07-17, the reader throughput is around 1e+8 chars/s (about 1e-8 s per
char).