kitty/protocol-extensions.asciidoc

= Extensions to the xterm protocol
:toc:
:toc-placement!:

kitty has a few extensions to the xterm protocol, to enable advanced features.
These are typically in the form of new or re-purposed escape codes. While these
extensions are currently kitty specific, it would be nice to get some of them
adopted more broadly.

toc::[]

== Colored and styled underlines

kitty supports colored and styled (wavy) underlines. This is of particular use
in terminal editors such as vim and emacs to display red, wavy underlines under
mis-spelled words and/or syntax errors. This is done by re-purposing some SGR escape codes

Setting the width and/or height to zero means that no drawing is done and the
cursor position remains unchanged.
that are not used in modern terminals (https://en.wikipedia.org/wiki/ANSI_escape_code#CSI_codes)

To change the underline style from straight line to curl (this used to be the
code for rapid blinking text, only previous use I know of was in MS-DOS ANSI.sys):

```
<ESC>[6m
```

To set the underline color (this is reserved and as far as I can tell not actually used for anything):

```
<ESC>[58...m
```

This works exactly like the codes `38, 48` that are used to set foreground and
background color respectively.

To reset the underline color (also previously reserved and unused):

```
<ESC>[59m
```


== Graphics rendering

The goal of this specification is to create a flexible and performant protocol
that allows the program running in the terminal, hereafter called the _client_,
to render arbitrary pixel (raster) graphics to the screen of the terminal
emulator. The major design goals are

 * Should not require terminal emulators to understand image formats.
 * Should allow specifying graphics to be drawn per individual character cell. This allows graphics to mix with text using
   the existing cursor based protocols.
 * Should use optimizations when the client is running on the same computer as the terminal emulator.

To achieve these goals the first extended escape code is to allow the client to
query the current character cell size in pixels:

```
<ESC>[?7n
```

To which the terminal emulator replies with:

```
<ESC>[?7;<width>;<height>n
```

where `width` and `height` are in pixels and refer to the current size of a single character cell.

=== Transferring pixel data

```
<ESC>_G<control data>;<payload><ESC>\
```

Before describing this escape code in detail, lets see some quick examples to get a flavor of it in action.

```
# Draw 10x20 pixels starting at the top-left corner of the current cell.
<ESC>_Gw=10,h=20,s=100;<pixel data><ESC>\

# Ditto, getting the pixel data from /tmp/pixel_data
<ESC>_Gw=10,h=20,t=f,s=100;<encoded /tmp/pixel_data><ESC>\

# Ditto, getting the pixel data from /dev/shm/pixel_data, deleting the file after reading data
<ESC>_Gw=10,h=20,t=t,s=100;<encoded /dev/shm/pixel_data><ESC>\

# Draw 10x20 pixels starting at the top-left corner of the current cell, ignoring the first 4 rows and 3 columns of the pixel data
<ESC>_Gw=10,h=20,x=3,y=4,s=100;<pixel data><ESC>\
```

This control code is an _Application-Programming Command (APC)_, indicated by
the leading `<ESC>_`. No modern terminals that I know of use APC codes, and
well-behaved terminals are supposed to ignore APC codes they do not understand.

The next character `G` indicates this APC code is for graphics data. In the future, we might
have different first letters for different needs.

The control data is a comma-separated list of key-value pairs with the restriction that
keys and values must contain only the characters `0-9a-zA-Z_-+/*`. The payload is arbitrary binary
data interpreted based on the control codes. The binary data must be base-64 encoded so as to minimize
the probability of problems with legacy systems that might interpret control
codes in the binary data incorrectly.

The key to the operation of this escape code is understanding the way the control data works.
The control data's keys are split up into categories for easier reference.

==== Controlling drawing

|===
| Key | Default     | Meaning

| w   | full width  | width -- number of columns to draw
| h   | full height | height -- number of columns to draw
| x   | zero        | x-offset -- the column in the pixel data to start from (0-based)
| y   | zero        | y-offset -- the row in the pixel data to start from (0-based)
|===

The origin for `(x, y)` is the top left corner of the pixel data, with `x`
increasing from left-to-right and `y` increasing downwards. The terminal
emulator will draw the specified region starting at the top-left corner of the
current cell. If the width is greater than a single cell, the cursor will be
moved one cell to the right and drawing will continue.  If the cursor reaches
the end of the line, it moves to the next line and starts drawing the next row
of data.  This means that the displayed image will be truncated at the right
edge of the screen. If the cursor needs to move past the bottom of the screen,
the screen is scrolled. After the entire region is drawn, the cursor will be
positioned at the first cell after the image.

Setting the width and/or height to zero means that no drawing is done and the
cursor position remains unchanged.


==== Transmitting data

The first consideration when transferring data between the client and the
terminal emulator is the format in which to do so. Since there is a vast and
growing number of image formats in existence, it does not make sense to have
every terminal emulator implement support for them. Instead, the client should
send simple pixel data to the terminal emulator. The obvious downside to this
is performance, especially when the client is running on a remote machine.
Techniques for remedying this limitation are discussed later. The terminal
emulator must understand pixel data in two formats, 24-bit RGB and 32-bit RGBA.
This is specified using the `f` key in the control data. `f=32` (which is the
default) indicates 32-bit RGBA data and `f=24` indicates 24-bit RGB data.

One additional parameter is needed to describe the pixel data, the _stride_,
that is the number of pixels per row. This is encoded using the `s` key, which
is **required**. For example, `s=100` means there are one hundred pixels per
row in the pixel data.

Now let us turn to considering how the data is actually transmitted.


===== Local client

When the client and the terminal emulator are on the same computer and share a
filesystem or shared memory, transfer can happen efficiently using files or
shared memory objects to pass the data around. The type of transfer is
controlled by the `t` key. When sending data via files/shared memory, `t` can
take three values, described below:

|===
| Value of `t` | Meaning

| f | A simple file
| t | A temporary file, the terminal emulator will delete the file after reading the pixel data
| s | A http://man7.org/linux/man-pages/man7/shm_overview.7.html[POSIX shared memory object]. The terminal emulator will delete it after reading the pixel data
|===

In all these cases, the payload data must be the base-64 encoded absolute file path.

[[query]]An important consideration is how the client can tell if the terminal emulator
and it share a filesystem. This can be done by using the _response mode_, specifying
the `q` key, with some unique id as the value. For example,

```
<ESC>_Gt=t,s=100,q=33;<encoded /tmp/pixel_data><ESC>\
```

When the terminal emulator receives this escape code, it will read and display
the pixel data as normal, and also send an escape code back to the client
indicating whether the reading of the data was successful or not. The returned
escape code will look like:

```
<ESC>_Gq=33;<encoded error message or OK><ESC>\
```

Here the `q` value will be the same as was sent by the client in the original
request.  The payload data will be a base-64 encoded UTF-8 string. The string
will be `OK` if reading the pixel data succeeded or an error message. Clients
can set the width and height to zero to avoid actually drawing anything on
screen during the test.


===== Remote client

Remote clients, those that are unable to use the filesystem/shared memory to
transmit data, must send the pixel data directly using escape codes. Since
escape codes are of limited maximum length, the data will need to be chunked up
for transfer. This is done using the `m` key. The pixel data must first be
base64 encoded then chunked up into chunks no larger than `4096` bytes. The client
then sends the graphics escape code as usual, with the addition os an `m` key that
must have the value `1` for all but the last chunk, where it must be `0`. For example,
if the data is split into three chunks, the client would send the following
sequence of escape codes to the terminal emulator:

```
<ESC>_Gw=100,h=30,s=100,m=1;<base-64 pixel data first chunk><ESC>\
<ESC>_Gm=1;<base-64 pixel data second chunk><ESC>\
<ESC>_Gm=0;<base-64 pixel data last chunk><ESC>\
```

Note that only the first escape code needs to have the full set of control
codes such as stride, width, height, format etc. Subsequent chunks must have
only the `m` key. The client must finish sending all chunks for a single image
before sending any other graphics related escape codes.


=== Image persistence

Full screen applications may need to render the same image multiple times or
even render different parts of an image, in different locations, for example,
if the image is sprite map. Resending the image data each time this happens is
wasteful. Instead this protocol allows the client to have the terminal emulator
manage a persistent store of images.

Persistence is implemented by simply assigning an id to transmitted pixel data using the
key `i`. So for example,

```
<ESC>_Gt=t,s=100,i=some-id;<encoded /tmp/pixel_data><ESC>\
```

Now, if the client wants to redraw that image in the future, all it has to do is send
a code with the keys `t=i,i=some-id`, and no payload, like this:

```
<ESC>_Gt=i,i=some-id;<ESC>\
```

The client can use the `w, h, x, y` keys to draw differnt parts of the image
and draw it at different locations by positioning the cursor before sending the
code.

Saved images can be deleted, to free up resources, by using the code:

```
<ESC>_Gt=d,i=some-id;<ESC>\
```

The special value of 'i=*' will cause the terminal emulator to delete all
stored images.  Well behaved clients should send this code before terminating.

Terminal emulators may limit the maximum amount of saved data to avoid denial-of-service
attacks.  Terminal emulators should make the limit fairly generous, at least a
few hundred, full screen RGBA images worth of data should be allowed.

Client applications can check if an image is still stored by sending the `q`
key, as described <<query,above>>. For example,

```
<ESC>_Gt=i,i=some-id,q=some-id;<ESC>\
```

The terminal emulator will respond with:

```
<ESC>_Gq=some-id;<encoded OK or error message><ESC>\
```

If `OK` is sent the image was successfully loaded from the persistent storage, if not,
then it must be resent.