daml/daml-lf/archive
Martin Huschenbett f0e5bed36f
DAML-LF: Add interning for type to DAML-LF 1.dev (#7893)
* DAML-LF: Add interning for type to DAML-LF 1.dev

We add two new features to DAML-LF 1.dev:

* a per package list (or table) of `Type` messages, and
* a new case in the `Type` message which is an index into this table.

In combination, these two features can be used to allow DAML-LF
encoders to perform hash-consing of `Type` messages. We also change the
Haskell implementation of our DAML-LF encoder to do exactly that when
targeting DAML-LF 1.dev.
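
As a rough illustration of the hash-consing idea (not the actual Haskell encoder), the sketch below interns structurally equal types into a single table and refers to them by index. The names `TypeRef`, `TypeCon` and `TypeInterner` are hypothetical and far simpler than the real `Type` message.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TypeRef:
    """Hypothetical stand-in for the new `Type` case: an index into the table."""
    index: int


@dataclass(frozen=True)
class TypeCon:
    """Hypothetical, simplified type node: a constructor applied to interned args."""
    name: str
    args: Tuple[TypeRef, ...] = ()


class TypeInterner:
    """Builds the per-package table, storing each distinct type exactly once."""

    def __init__(self) -> None:
        self.table: List[TypeCon] = []
        self._index: Dict[TypeCon, int] = {}

    def intern(self, name: str, *args: TypeRef) -> TypeRef:
        node = TypeCon(name, tuple(args))
        idx = self._index.get(node)
        if idx is None:
            # First occurrence: append to the table and remember its position.
            idx = len(self.table)
            self.table.append(node)
            self._index[node] = idx
        return TypeRef(idx)


interner = TypeInterner()
int64 = interner.intern("Int64")
list_a = interner.intern("List", int64)
# Building the same type a second time yields the same index.
list_b = interner.intern("List", interner.intern("Int64"))
```

Because arguments are themselves table indices, equality checks on nodes stay shallow, and a decoder that resolves each index once can share the resulting structure.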

Doing this has a few benefits:

1. The DALFs produced by `damlc` get smaller: I've seen a case where
   the size dropped from 69MB to 45MB.
2. DAML-LF decoders need to decode less data.
3. Decoded packages use less memory because identical structures are
   now shared. This is particularly helpful in situations where we need
   to keep the interface (or signature) of a package in memory for a
   long time.

This PR mostly takes care of the Haskell implementation. However, we
need to make the Scala implementation of the decoder aware of the new
features as well since we have tests that load DAML-LF 1.dev into the
engine. A decoder and _targeted_ tests on the Scala side will follow
in a separate PR.

CHANGELOG_BEGIN
CHANGELOG_END

* Make jq tests aware of type interning

CHANGELOG_BEGIN
CHANGELOG_END

* Improve jq test

CHANGELOG_BEGIN
CHANGELOG_END

* Apply Remy's suggestions

Co-authored-by: Remy <remy.haemmerle@daml.com>

* Improve the imperative bits

CHANGELOG_BEGIN
CHANGELOG_END

Co-authored-by: Remy <remy.haemmerle@daml.com>
2020-11-18 11:14:30 +00:00
src DAML-LF: Add interning for type to DAML-LF 1.dev (#7893) 2020-11-18 11:14:30 +00:00
archive.bzl replace DAML Authors with DA in copyright headers (#5228) 2020-03-27 01:26:10 +01:00
BUILD.bazel Remove vendored pkg_tar (#6934) 2020-07-30 15:53:16 +00:00
README.md Use com.daml as root package (#5343) 2020-04-05 19:49:57 +02:00

DAML-LF archive

This component contains the .proto definitions specifying the format in which DAML-LF packages are stored -- the DAML-LF archive. All the proto definitions are kept in the directory src/protobuf/com/daml/daml_lf_dev/.

The entry point definition is Archive in src/protobuf/com/daml/daml_lf_dev/daml_lf.proto. Archive contains some metadata about the actual archive (currently the hashing function and the hash), and then a binary blob containing the archive. The binary blob must be an ArchivePayload -- we keep it in binary form to facilitate hashing, signing, etc. The encoding and decoding of the payload is handled by Haskell and Java libraries in daml-core-package, so that consumers and producers do not really need to worry about it.
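
To illustrate why keeping the payload as an opaque blob helps with hashing, here is a minimal sketch. It assumes the hash is a hex digest computed over the exact payload bytes and that the hash function is SHA-256; `ArchiveSketch`, `make_archive` and `verify_archive` are hypothetical names, not part of the generated libraries.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchiveSketch:
    """Hypothetical mirror of the Archive message, for illustration only."""
    hash_function: str  # assumed to name a hashlib algorithm, e.g. "sha256"
    hash: str           # assumed to be the hex digest of `payload`
    payload: bytes      # serialized ArchivePayload, kept as opaque bytes


def make_archive(payload: bytes, hash_function: str = "sha256") -> ArchiveSketch:
    digest = hashlib.new(hash_function, payload).hexdigest()
    return ArchiveSketch(hash_function, digest, payload)


def verify_archive(archive: ArchiveSketch) -> bool:
    # The hash is recomputed over the stored bytes; the payload is not decoded.
    expected = hashlib.new(archive.hash_function, archive.payload).hexdigest()
    return expected == archive.hash


archive = make_archive(b"\x0a\x03abc")
tampered = ArchiveSketch(archive.hash_function, archive.hash, archive.payload + b"\x00")
```

Since the check never re-serializes the payload, it does not depend on two encoders producing byte-identical output for the same message.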

ArchivePayload is a sum type containing the various DAML-LF versions supported by the DAML-LF archive. Currently we have two major versions:

Snapshot versions

The component also contains an arbitrary number of snapshots of the protobuf definitions as they were at the time a particular version of DAML-LF was frozen. For versions <= 1.8, those snapshots are kept in the directories src/protobuf/com/digitalasset/daml_lf_x_y/, where x.y is an already frozen DAML-LF version. For newer versions, the directory is src/protobuf/com/daml/daml_lf_x_y/. A snapshot for version x.y can be used to read any DAML-LF version from 1.0 to x.y without suffering the breaking changes (at the generated code level) that are often introduced in the current version.
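
The directory rule above can be written down as a small helper. This is only a sketch: `snapshot_dir` is a hypothetical function, and the version strings in the usage are examples rather than a list of existing snapshots.

```python
def snapshot_dir(version: str) -> str:
    """Return the directory holding the .proto files for a DAML-LF version.

    `version` is either "dev" or a string of the form "x.y".
    """
    if version == "dev":
        return "src/protobuf/com/daml/daml_lf_dev/"
    major, minor = (int(part) for part in version.split("."))
    # Compare numerically: as strings, "1.10" would sort before "1.8".
    root = "com/digitalasset" if (major, minor) <= (1, 8) else "com/daml"
    return f"src/protobuf/{root}/daml_lf_{major}_{minor}/"


old_style = snapshot_dir("1.6")
new_style = snapshot_dir("1.11")  # hypothetical version newer than 1.8
```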

Building

The build produces several libraries containing code to encode and decode these definitions: a Haskell one and several Java ones. For example:

$ bazel build //daml-lf/archive:daml_lf_archive_haskell_proto
$ bazel build //daml-lf/archive:daml_lf_dev_archive_java_proto
$ bazel build //daml-lf/archive:daml_lf_1_6_archive_java_proto

Editing the .proto definitions

When editing the proto definitions, you must make sure to not change them in a backwards-incompatible way. To make sure this doesn't happen:

  • DO NOT delete message fields;
  • DO NOT change the number of a message field or an enum value;
  • DO NOT change the type of a message field.

Note that "fields" include oneof fields. Also note that the "don't delete fields" rule is there not because deleting a field is in itself a backwards-incompatible change, but because after a field has been deleted another committer might redefine it with a different type without realizing.
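
The three rules can be checked mechanically. The sketch below is a hypothetical helper, not a tool in this repository: it compares two versions of a message, each described as a map from field number to name and type, and covers message fields only (enum values are left out).

```python
from typing import Dict, List, Tuple

# A message described as {field_number: (field_name, field_type)}.
Schema = Dict[int, Tuple[str, str]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List the violations of the rules above when going from `old` to `new`."""
    problems: List[str] = []
    number_of = {name: number for number, (name, _) in new.items()}
    for number, (name, field_type) in sorted(old.items()):
        if number not in new:
            if name in number_of:
                # Heuristic: same name under another number looks like a renumbering.
                problems.append(f"'{name}' moved from number {number} to {number_of[name]}")
            else:
                problems.append(f"field {number} ('{name}') was deleted")
        elif new[number][1] != field_type:
            problems.append(f"field {number} changed type from {field_type} to {new[number][1]}")
    return problems


old = {1: ("name", "string"), 2: ("body", "bytes")}
renamed = {1: ("label", "string"), 2: ("body", "bytes"), 3: ("extra", "string")}
retyped = {1: ("name", "string"), 2: ("body", "string")}
dropped = {1: ("name", "string")}
```

Here `breaking_changes(old, renamed)` is empty, since a rename that keeps the number and type is allowed, while `retyped` and `dropped` each produce one violation.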

Renaming a message field is OK as long as its number and semantics stay unchanged, and so is adding a new field under a fresh number. For example, if you have

message Foo {
  bytes blah = 1;
}

it's OK to change it to

message Foo {
  // this field is deprecated -- use baz instead!
  bytes blah_deprecated = 1;
  string baz = 2;
}
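
The reason a rename is harmless is that the protobuf wire format identifies fields by number, not by name. The sketch below hand-encodes the two versions of `Foo` using a deliberately minimal encoder (length-delimited fields only, which covers both `bytes` and `string`); `encode_varint` and `encode_message` are illustrative helpers, not part of any generated library.

```python
from typing import Dict


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_message(field_numbers: Dict[str, int], values: Dict[str, bytes]) -> bytes:
    """Encode length-delimited fields (wire type 2) in field-number order."""
    out = bytearray()
    for name, data in sorted(values.items(), key=lambda item: field_numbers[item[0]]):
        key = (field_numbers[name] << 3) | 2  # field number plus wire type
        out += encode_varint(key) + encode_varint(len(data)) + data
    return bytes(out)


# Same field number, different field name: the encoded bytes are identical.
before = encode_message({"blah": 1}, {"blah": b"hi"})
after = encode_message({"blah_deprecated": 1, "baz": 2}, {"blah_deprecated": b"hi"})
```

A reader built from the old definition can therefore still decode data written with the new one, as long as the unset `baz` field is simply absent or ignored.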

Conversion from the .proto to AST

The .proto definitions specify the serialized format for DAML-LF packages; the code that converts from the .proto definitions to the actual AST lives elsewhere.