From f77e4229a25c05508c4bfe3e18ecc66dbdadeba0 Mon Sep 17 00:00:00 2001 From: Stephen Compall Date: Tue, 13 Aug 2019 17:56:33 -0400 Subject: [PATCH] document lf-value-json's encoding of various types (#2519) * copy design draft for JSON LF encoding with minimal changes * literal block formatting * :: on its own * only describe the variant notation we've chosen * wrap * mark some text literal * correct reason why null is not a valid Unit * alignment * explain that the presence of type variables has no bearing on nested Optional encoding * correct copyright header * quotes and subscripts * remove disallowed examples of variants * positive? negative? * you never know when JavaScript will ruin your day * what's {} anyway * we are talking about JSON, you know --- .../lf-value-json/specification.rst | 438 ++++++++++++++++++ 1 file changed, 438 insertions(+) create mode 100644 ledger-service/lf-value-json/specification.rst diff --git a/ledger-service/lf-value-json/specification.rst b/ledger-service/lf-value-json/specification.rst new file mode 100644 index 0000000000..d841e58994 --- /dev/null +++ b/ledger-service/lf-value-json/specification.rst @@ -0,0 +1,438 @@ +.. Copyright (c) 2019 The DAML Authors. All rights reserved. +.. SPDX-License-Identifier: Apache-2.0 + +DAML-LF JSON Encoding +===================== + +We describe how to decode and encode DAML-LF values as JSON. For each +DAML-LF type we explain what JSON inputs we accept (decoding), and what +JSON output we produce (encoding). + +The output format is parameterized by two flags:: + + encodeDecimalAsString: boolean + encodeInt64AsString: boolean + +The suggested defaults for both of these flags is false. If the +intended recipient is written in JavaScript, however, note that the +JavaScript data model will decode these as numbers, discarding data in +some cases; encode-as-String avoids this, as mentioned with respect to +``JSON.parse`` below. + +Note that throughout the document the decoding is type-directed. In +other words, the same JSON value can correspond to many DAML-LF values, +and the expected DAML-LF type is needed to decide which one. + +ContractId +---------- + +Contract ids are expressed as their string representation:: + + "123" + "XYZ" + "foo:bar#baz" + +Decimal +------- + +Input +~~~~~ + +Decimals can be expressed as JSON numbers or as JSON strings. JSON +strings are accepted using the same format that JSON accepts, and +treated them as the equivalent JSON number:: + + -?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)? + +Note that JSON numbers would be enough to represent all +Decimals. However, we also accept strings because in many languages +(most notably JavaScript) use IEEE Doubles to express JSON numbers, and +IEEE Doubles cannot express DAML-LF Decimals correctly. Therefore, we +also accept strings so that JavaScript users can use them to specify +Decimals that do not fit in IEEE Doubles. + +Numbers must be within the bounds of Decimal, [–(10³⁸–1)÷10¹⁰, +(10³⁸–1)÷10¹⁰]. Numbers outside those bounds will be rejected. Numbers +inside the bounds will always be accepted, using banker's rounding to +fit them within the precision supported by Decimal. + +A few valid examples:: + + 42 --> 42 + 42.0 --> 42 + "42" --> 42 + 9999999999999999999999999999.9999999999 --> + 9999999999999999999999999999.9999999999 + -42 --> -42 + "-42" --> -42 + 0 --> 0 + -0 --> 0 + 0.30000000000000004 --> 0.3 + 2e3 --> 2000 + +A few invalid examples:: + + " 42 " + "blah" + 99999999999999999999999999990 + +42 + +Output +~~~~~~ + +If encodeDecimalAsString is set, decimals are encoded as strings, using +the format ``-?[0-9]{1,28}(\.[0-9]{1,10})?``. If encodeDecimalAsString +is not set, they are encoded as JSON numbers, also using the format +``-?[0-9]{1,28}(\.[0-9]{1,10})?``. + +Note that the flag encodeDecimalAsString is useful because it lets +JavaScript consumers consume Decimals safely with the standard +JSON.parse. + +Int64 +----- + +Input +~~~~~ + +Int64, much like Decimal, can be represented as JSON numbers and as +strings, with the string representation being ``[+-]?[0-9]+``. The +numbers must fall within [-9223372036854775808, +9223372036854775807]. Moreover, if represented as JSON numbers, they +must have no fractional part. + +A few valid examples:: + + 42 + "+42" + -42 + 0 + -0 + 9223372036854775807 + "9223372036854775807" + -9223372036854775808 + "-9223372036854775808" + +A few invalid examples:: + + 42.3 + +42 + 9223372036854775808 + -9223372036854775809 + "garbage" + " 42 " + +Output +~~~~~~ + +If encodeInt64AsString is set, Int64s are encoded as strings, using the +format ``-?[0-9]+``. If encodeInt64AsString is not set, they are encoded as +JSON numbers, also using the format ``-?[0-9]+``. + +Note that the flag encodeInt64AsString is useful because it lets +JavaScript consumers consume Int64s safely with the standard +``JSON.parse``. + +Timestamp +--------- + +Input +~~~~~ + +Timestamps are represented as ISO 8601 strings, rendered using the +format ``yyyy-mm-ddThh:mm:ss[.ssssss]Z``:: + + 1990-11-09T04:30:23.1234569Z + 1990-11-09T04:30:23Z + 1990-11-09T04:30:23.123Z + 0001-01-01T00:00:00Z + 9999-12-31T23:59:59.999999Z + +It's OK to omit the microsecond part partially or entirely. Sub-second +data beyond microseconds will be dropped. The UTC timezone designator +must be included. The rationale behind the inclusion of the timezone +designator is minimizing the risk that users pass in local times. + +The timestamp must be between the bounds specified by DAML-LF and ISO +8601, [0001-01-01T00:00:00Z, 9999-12-31T23:59:59.999999Z]. + +JavaScript + +:: + + > new Date().toISOString() + '2019-06-18T08:59:34.191Z' + +Python + +:: + + >>> datetime.datetime.utcnow().isoformat() + 'Z' + '2019-06-18T08:59:08.392764Z' + +Java + +:: + + import java.time.Instant; + class Main { + public static void main(String[] args) { + Instant instant = Instant.now(); + // prints 2019-06-18T09:02:16.652Z + System.out.println(instant.toString()); + } + } + +Output +~~~~~~ + +Timestamps are encoded as ISO 8601 strings, rendered using the format +``yyyy-mm-ddThh:mm:ss[.ssssss]Z``. + +The sub-second part will be formatted as follows: + +- If no sub-second part is present in the timestamp (i.e. the timestamp + represents whole seconds), the sub-second part will be omitted + entirely; +- If the sub-second part does not go beyond milliseconds, the sub-second + part will be up to milliseconds, padding with trailing 0s if + necessary; +- Otherwise, the sub-second part will be up to microseconds, padding + with trailing 0s if necessary. + +In other words, the encoded timestamp will either have no sub-second +part, a sub-second part of length 3, or a sub-second part of length 6. + +Party +----- + +Represented using their string representation, without any additional +quotes:: + + "Alice" + "Bob" + +Unit +---- + +Represented as empty object ``{}``. Note that in JavaScript ``{} !== +{}``; however, ``null`` would be ambiguous; for the type ``Optional +Unit``, ``null`` decodes to ``None``, but ``{}`` decodes to ``Some ()``. + +Additionally, we think that this is the least confusing encoding for +Unit since unit is conceptually an empty record. We do not want to +imply that Unit is used similarly to null in JavaScript or None in +Python. + +Date +---- + +Represented as an ISO 8601 date rendered using the format +``yyyy-mm-dd``:: + + 2019-06-18 + 9999-12-31 + 0001-01-01 + +The dates must be between the bounds specified by DAML-LF and ISO 8601, +[0001-01-01, 9999-99-99]. + +Text +---- + +Represented as strings. + +Bool +---- + +Represented as booleans. + +Record +------ + +Input +~~~~~ + +Records can be represented in two ways. As objects:: + + { f₁: v₁, ..., fₙ: vₙ } + +And as arrays:: + + [ v₁, ..., vₙ ] + +Note that DAML-LF record fields are ordered. So if we have + +:: + + record Foo = {f1: Int64, f2: Bool} + +when representing the record as an array the user must specify the +fields in order:: + + [42, true] + +The motivation for the array format for records is to allow specifying +tuple types closer to what it looks like in DAML. Note that a DAML +tuple, i.e. (42, True), will be compiled to a DAML-LF record ``Tuple2 { +_1 = 42, _2 = True }``. + +Output +~~~~~~ + +Records are always encoded as objects. + +List +---- + +Lists are represented as + +:: + + [v₁, ..., vₙ] + +Map +--- + +Maps are represented as objects: + +:: + + { k₁: v₁, ..., kₙ: vₙ } + +Optional +-------- + +Input +~~~~~ + +Optionals are encoded using ``null`` if the value is None, and with the +value itself if it's Some. However, this alone does not let us encode +nested optionals unambiguously. Therefore, nested Optionals are encoded +using an empty list for None, and a list with one element for Some. Note +that after the top-level Optional, all the nested ones must be +represented using the list notation. + +A few examples, using the form + +:: + + JSON --> DAML-LF : Expected DAML-LF type + +to make clear what the target DAML-LF type is:: + + null --> None : Optional Int64 + null --> None : Optional (Optional Int64) + 42 --> Some 42 : Optional Int64 + [] --> Some None : Optional (Optional Int64) + [42] --> Some (Some 42) : Optional (Optional Int64) + [[]] --> Some (Some None) : Optional (Optional (Optional Int64)) + [[42]] --> Some (Some (Some 42)) : Optional (Optional (Optional Int64)) + ... + +Finally, if Optional values appear in records, they can be omitted to +represent None. Given DAML-LF types + +:: + + record Depth1 = { foo: Optional Int64 } + record Depth2 = { foo: Optional (Optional Int64) } + +We have + +:: + + { } --> Depth1 { foo: None } : Depth1 + { } --> Depth2 { foo: None } : Depth2 + { foo: 42 } --> Depth1 { foo: Some 42 } : Depth1 + { foo: [42] } --> Depth2 { foo: Some (Some 42) } : Depth2 + { foo: null } --> Depth1 { foo: None } : Depth1 + { foo: null } --> Depth2 { foo: None } : Depth2 + { foo: [] } --> Depth2 { foo: Some None } : Depth2 + +Note that the shortcut for records and Optional fields does not apply to +Map (which are also represented as objects), since Map relies on absence +of key to determine what keys are present in the Map to begin with. Nor +does it apply to the ``[f₁, ..., fₙ]`` record form; ``Depth1 None`` in +the array notation must be written as ``[null]``. + +Type variables may appear in the DAML-LF language, but are always +resolved before deciding on a JSON encoding. So, for example, even +though ``Oa`` doesn't appear to contain a nested ``Optional``, it may +contain a nested ``Optional`` by virtue of substituting the type +variable ``a``:: + + record Oa a = { foo: Optional a } + + { foo: 42 } --> Oa { foo: Some 42 } : Oa Int + { } --> Oa { foo: None } : Oa Int + { foo: [] } --> Oa { foo: Some None } : Oa (Optional Int) + { foo: [42] } --> Oa { foo: Some (Some 42) } : Oa (Optional Int) + +In other words, the correct JSON encoding for any LF value is the one +you get when you have eliminated all type variables. + +Output +~~~~~~ + +Encoded as described above, always applying the shortcut for None record +fields. + +Variant +------- + +Variants are expressed as + +:: + + { constructor: argument } + +For example, if we have + +:: + + variant Foo = Bar Int64 | Baz Unit | Quux (Optional Int64) + +These are all valid JSON encodings for values of type Foo:: + + {"Bar": 42} + {"Baz": {}} + {"Quux": null} + {"Quux": 42} + +Note that DAML data types with named fields are compiled by factoring +out the record. So for example if we have + +:: + + data Foo = Bar {f1: Int64, f2: Bool} | Baz + +We'll get in DAML-LF + +:: + + record Foo.Bar = {f1: Int64, f2: Bool} + variant Foo = Bar Foo.Bar | Baz Unit + +and then, from JSON + +:: + + {"Bar": {"f1": 42, "f2": true}} + {"Baz": {}} + +This can be encoded and used in TypeScript, including exhaustiveness +checking; see `a keyed example`_. + +.. _a keyed example: https://www.typescriptlang.org/play/#src=type%20Foo%20%3D%0D%0A%20%20%20%20%7B%20Bar%3A%20%7B%20f1%3A%20number%2C%20f2%3A%20boolean%20%7D%20%7D%0D%0A%20%20%7C%20%7B%20Baz%3A%20%7B%20f3%3A%20string%20%7D%20%7D%3B%0D%0A%0D%0Afunction%20test(v%3A%20Foo)%20%7B%0D%0A%20%20if%20(%22Bar%22%20in%20v)%20%7B%0D%0A%20%20%20%20console.log(v.Bar.f1%2C%20v.Bar.f2)%3B%0D%0A%20%20%7D%20else%20if%20(%22Baz%22%20in%20v)%20%7B%0D%0A%20%20%20%20console.log(v.Baz.f3)%3B%0D%0A%20%20%7D%20else%20%7B%0D%0A%20%20%20%20const%20_%3A%20never%20%3D%20v%3B%0D%0A%20%20%7D%0D%0A%7D%20%0D%0A + +Enum +---- + +Enums are represented as strings. So if we have + +:: + + enum Foo = Bar | Baz + +There are exactly two valid JSON values for Foo, "Bar" and "Baz".