Write some docs

This commit is contained in:
Richard Feldman 2020-05-11 23:38:18 -04:00
parent f688236118
commit 1a348d1731
6 changed files with 239 additions and 82 deletions

@ -4,9 +4,9 @@ interface List
## Types
## A list of values.
## A sequential list of values.
##
## >>> [ 1, 2, 3 ] # a list of ints
## >>> [ 1, 2, 3 ] # a list of numbers
##
## >>> [ "a", "b", "c" ] # a list of strings
##
@ -20,29 +20,43 @@ interface List
## mixedList = [ IntElem 1, IntElem 2, StrElem "a", StrElem "b" ]
## ```
##
## Lists are persistent data structures, meaning they are designed for fast copying.
## The maximum size of a #List is limited by the amount of heap memory available
## to the current process. If there is not enough memory available, attempting to
## create the list could crash. (On Linux, where [overcommit](https://www.etalabs.net/overcommit.html)
## is normally enabled, not having enough memory could result in the list appearing
## to be created just fine, but then crashing later.)
##
## > Under the hood, large lists are Reduced Radix Balanced Trees.
## > Small lists are stored as flat arrays. This "small list optimization"
## > applies to lists that take up 8 machine words in memory or fewer, so
## > for example on a 64-bit system, a list of 8 #Int values will be
## > stored as a flat array instead of as an RRBT.
## > The theoretical maximum length for a list created in Roc is
## > #Int.highestUlen divided by 2. Attempting to create a list bigger than that
## > in Roc code will always fail, although in practice it is likely to fail
## > at much smaller lengths due to insufficient memory being available.
##
## One #List can store up to 2,147,483,648 elements (just over 2 billion). If you need to store more
## elements than that, you can split them into smaller lists and operate
## on those instead of on one large #List. This often runs faster in practice,
## even for strings much smaller than 2 gigabytes.
## ## Performance notes
##
## Under the hood, a list is a record containing a `len : Ulen` field as well
## as a pointer to a flat list of bytes.
##
## This is not a [persistent data structure](https://en.wikipedia.org/wiki/Persistent_data_structure),
## so copying it is not cheap! The reason #List is designed this way is because:
##
## * Copying small lists is typically slightly faster than copying small persistent data structures. This is because, at small sizes, persistent data structures are usually thin wrappers around flat lists anyway. They don't start conferring copying advantages until crossing a certain minimum size threshold.
## * Many list operations are no faster with persistent data structures. For example, even if it were a persistent data structure, #List.map, #List.fold, and #List.keepIf would all need to traverse every element in the list and build up the result from scratch.
## * Roc's compiler optimizes many list operations into in-place mutations behind the scenes, depending on how the list is being used. For example, #List.map, #List.keepIf, and #List.set can all be optimized to perform in-place mutations.
## * If possible, it is usually best for performance to use large lists in a way where the optimizer can turn them into in-place mutations. If this is not possible, a persistent data structure might be faster - but this is a rare enough scenario that it would not be good for the average Roc program's performance if this were the way #List worked by default. Instead, you can look outside Roc's standard modules for an implementation of a persistent data structure - likely built using #List under the hood!
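##
## As a rough sketch of the kind of usage the optimizer can work with (this assumes
## #List.map takes the list first and the function second, as #List.keepIf does;
## `nums` and `doubled` are hypothetical names):
##
## ```
## nums = [ 1, 2, 3 ]
##
## # `nums` is never referenced again after this call, so the compiler
## # may be able to update it in place rather than copying it.
## doubled = List.map nums (\num -> num * 2)
## ```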
List elem : @List elem
## Initialize
single : elem -> List elem
## If given any number less than 1, returns #[].
repeat : elem, Int -> List elem
empty : List *
range : Int, Int -> List Int
repeat : elem, Ulen -> List elem
range : Int a, Int a -> List (Int a)
## TODO I don't think we should have this after all. *Maybe* have an Ok.toList instead?
##
## When given an #Err, returns #[], and when given #Ok, returns a list containing
## only that value.
##
@ -59,8 +73,7 @@ fromResult : Result elem * -> List elem
reverse : List elem -> List elem
sort : List elem, (elem, elem -> [ Eq, Lt, Gt ]) -> List elem
sortBy : List elem, (elem -> field), (field, field -> [ Eq, Lt, Gt ]) -> List elem
sort : List elem, Sorter elem -> List elem
## Convert each element in the list to something new, by calling a conversion
## function on each of them. Then return a new list of the converted values.
@ -123,7 +136,7 @@ joinMap : List before, (before -> List after) -> List after
## >>> |> List.joinOks
joinOks : List (Result elem *) -> List elem
## Iterates over the shortest of the given lists and returns a list of `Tuple`
## Iterates over the shortest of the given lists and returns a list of `Pair`
## tags, each wrapping one of the elements in that list, along with the elements
## in the same position in the other lists.
##
@ -131,12 +144,12 @@ joinOks : List (Result elem *) -> List elem
##
## Accepts up to 8 lists.
##
## > For a generalized version that returns whatever you like, instead of a `Tup`,
## > For a generalized version that returns whatever you like, instead of a `Pair`,
## > see `zipMap`.
zip :
List a, List b, -> List [ Tup a b ]*
List a, List b, List c, -> List [ Tup a b c ]*
List a, List b, List c, List d -> List [ Tup a b c d ]*
List a, List b, -> List [ Pair a b ]*
List a, List b, List c, -> List [ Pair a b c ]*
List a, List b, List c, List d -> List [ Pair a b c d ]*
## Like `zip` but you can specify what to do with each element.
##
@ -157,13 +170,35 @@ zipMap :
## elements for which the function returned `True`.
##
## >>> List.keepIf [ 1, 2, 3, 4 ] (\num -> num > 2)
keepIf : List elem, (elem -> Bool) -> List elem
##
## ## Performance Notes
##
## #List.keepIf always returns a list that takes up exactly the same amount
## of memory as the original, even if its length decreases. This is because it
## can't know in advance exactly how much space it will need, and if it guesses a
## length that's too low, it would have to re-allocate.
##
## (If you want to do an operation like this which reduces the memory footprint
## of the resulting list, you can do two passes over the list with #List.fold - one
## to calculate the precise new size, and another to populate the new list.)
##
## If given a unique list, #List.keepIf will mutate it in place to assemble the appropriate list.
## If that happens, this function will not allocate any new memory on the heap.
## If all elements in the list end up being kept, Roc will return the original
## list unaltered.
##
keepIf : List elem, (elem -> [True, False]) -> List elem
## Run the given function on each element of a list, and return all the
## elements for which the function returned `False`.
##
## >>> List.dropIf [ 1, 2, 3, 4 ] (\num -> num > 2)
dropIf : List elem, (elem -> Bool) -> List elem
##
## ## Performance Notes
##
## #List.dropIf has the same performance characteristics as #List.keepIf.
## See its documentation for details on those characteristics!
dropIf : List elem, (elem -> [True, False]) -> List elem
## Takes the requested number of elements from the front of a list
## and returns them.
@ -187,6 +222,22 @@ take : List elem, Int -> List elem
## >>> drop 5 [ 1, 2 ]
drop : List elem, Int -> List elem
## Access
first : List elem -> [Ok elem, ListWasEmpty]*
last : List elem -> [Ok elem, ListWasEmpty]*
get : List elem, Ulen -> [Ok elem, OutOfBounds]*
max : List (Num a) -> [Ok (Num a), ListWasEmpty]*
min : List (Num a) -> [Ok (Num a), ListWasEmpty]*
## Modify
set : List elem, Ulen, elem -> List elem
## Deconstruct
split : List elem, Int -> { before: List elem, remainder: List elem }
@ -202,7 +253,7 @@ walkBackwards : List elem, { start : state, step : (state, elem -> state) } -> s
## One #List can store up to 2,147,483,647 elements (just over 2 billion), which
## is exactly equal to the highest valid #I32 value. This means the #U32 this function
## returns can always be safely converted to an #I32 without losing any data.
len : List * -> U32
len : List * -> Ulen
isEmpty : List * -> Bool
@ -211,4 +262,3 @@ contains : List elem, elem -> Bool
all : List elem, (elem -> Bool) -> Bool
any : List elem, (elem -> Bool) -> Bool

@ -1,12 +1,9 @@
interface Bool
exposes [ Bool, not, equal, notEqual ]
exposes [ not, isEq, isNe ]
imports []
## Either #True or #False.
Bool : [ False, True ]
## Returns #False when given #True, and vice versa.
not : Bool -> Bool
not : [True, False] -> [True, False]
## Returns #True when given #True and #True, and #False when either argument is #False.
##
@ -60,7 +57,7 @@ not : Bool -> Bool
## That said, in practice the `&& Str.isEmpty str` approach will typically run
## faster than the `&& emptyStr` approach - both for `Str.isEmpty` in particular
## as well as for most functions in general.
and : Bool, Bool -> Bool
and : [True, False], [True, False] -> [True, False]
## Returns #True when given #True for either argument, and #False only when given #False and #False.
@ -82,7 +79,10 @@ and : Bool, Bool -> Bool
## #True (causing it to immediately return #True).
##
## See the performance notes for #Bool.and for details.
or : Bool, Bool -> Bool
or : [True, False], [True, False] -> [True, False]
## Exclusive or
xor : [True, False], [True, False] -> [True, False]
## Returns #True if the two values are *structurally equal*, and #False otherwise.
##
@ -95,12 +95,16 @@ or : Bool, Bool -> Bool
## 5. Collections (#String, #List, #Map, #Set, and #Bytes) are equal if they are the same length, and also all their corresponding elements are equal.
## 6. All functions are considered equal. (So `Bool.not == Bool.not` will return #True, as you might expect, but also `Num.abs == Num.negate` will return #True, as you might not. This design is because function equality has been formally proven to be undecidable in the general case, and returning #True in all cases turns out to be mostly harmless - especially compared to alternative designs like crashing, making #equal inconvenient to use, and so on.)
##
## This function always crashes when given two functions, or an erroneous
## #Float value (see #Float.isErroneous).
##
## This is the same as the #== operator.
eq : val, val -> Bool
isEq : val, val -> [True, False]
## Calls #isEq on the given values, then calls #not on the result.
##
## This is the same as the #=/= operator.
notEq : val, val -> Bool
notEq = \left, right ->
## This is the same as the #!= operator.
isNe : val, val -> [True, False]
isNe = \left, right ->
not (isEq left right)

@ -71,10 +71,44 @@ interface Float
## These are very error-prone values, so if you see an assertion fail in
## development because of one of them, take it seriously - and try to fix
## the code so that it can't come up in a release!
#FloatingPoint := FloatingPoint
## Returned in an #Err by #Float.sqrt when given a negative number.
#InvalidSqrt := InvalidSqrt
##
## ## Loud versus Quiet errors
##
## Besides precision problems, another reason floats are error-prone
## is that they have quiet error handling built in. For example, in
## a 64-bit floating point number, there are certain patterns of those
## 64 bits which do not represent valid floats; instead, they represent
## erroneous results of previous operations.
##
## Whenever any arithmetic operation is performed on an erroneous float,
## the result is also erroneous. This is called *error propagation*, and
## it is notoriously error-prone. In Roc, using equality operations like
## `==` and `!=` on an erroneous float causes a crash. (See #Float.isErroneous
## for other ways to check what erroneous value you have.)
##
## Because erroneous floats are so error-prone, Roc discourages using them.
## Instead, by default it treats them the same way as overflow: by
## crashing whenever any #Float function would otherwise return one.
## You can also use functions like #Float.tryAdd to get an `Ok` or an error
## back so you can gracefully recover from erroneous values.
##
## Quiet errors can be useful sometimes. For example, you might want to
## do three floating point calculations in a row, and then gracefully handle
## the situation where any one of the three was erroneous. In that situation,
## quiet errors can be more efficient than using three `try` functions, because
## you can check one condition at the end instead of three along the way.
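##
## A minimal sketch of that pattern, assuming #Float.sqrtQuiet passes an
## erroneous input through as an erroneous result (`x` and `result` are
## hypothetical names, and the `0.0` fallback is only a placeholder):
##
## ```
## result = Float.sqrtQuiet (Float.sqrtQuiet (Float.sqrtQuiet x))
##
## if Float.isErroneous result then
##     # One check at the end covers all three calculations.
##     0.0
## else
##     result
## ```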
##
## ## Performance Notes
##
## Currently, loud errors are implemented using an extra conditional. Although
## this conditional will always be correctly branch-predicted unless an error
## occurs, there is a small effect on the instruction cache, which means
## quiet errors are very slightly more efficient.
##
## Long-term, it's possible that the Roc compiler may be able to implement
## loud errors using *signalling errors* in some situations, which could
## eliminate the performance difference between loud and quiet errors in
## the situation where no error occurs.
## Conversions
@ -167,7 +201,7 @@ tryRecip : Float a -> Result (Float a) [ DivByZero ]*
## Return an approximation of the absolute value of the square root of the #Float.
##
## Return #InvalidSqrt if given a negative number. The square root of a negative number is an imaginary number, and #Float only supports real numbers.
## Return #InvalidSqrt if given a negative number or an erroneous #Float. The square root of a negative number is an imaginary number, and #Float only supports real numbers.
##
## >>> Float.sqrt 4.0
##
@ -176,7 +210,22 @@ tryRecip : Float a -> Result (Float a) [ DivByZero ]*
## >>> Float.sqrt 0.0
##
## >>> Float.sqrt -4.0
sqrt : Float a -> Result (Float a) [ InvalidSqrt ]
sqrt : Float a -> [Ok (Float a), InvalidSqrt]*
## Like #Float.sqrt, but returning a *quiet NaN* if given a negative number.
##
## Quiet NaNs are notoriously more error-prone than explicit #Ok unions,
## so if you're using this instead of #Float.sqrt, be very careful not to let
## potential error cases go unhandled.
##
## ## Performance Notes
##
## This runs faster than #Float.sqrt, but is more error-prone because it makes
## it easier to forget to handle potential error cases. You may not forget
## when you just got done reading this paragraph, but the next person who
## comes along to modify the code may not have read it at all, and might not
## realize the need for security checks because the requirement is implicit.
sqrtQuiet : Float a -> Float a
## Constants
@ -198,17 +247,28 @@ asc : Float a, Float a -> [ Eq, Lt, Gt ]
##
desc : Float a, Float a -> [ Eq, Lt, Gt ]
## Returns `True` when given #NaN, #Infinity, or -#Infinity,
## and `False` otherwise.
##
## >>> Float.isErroneous (Float.sqrtQuiet -2)
##
## >>> Float.isErroneous (Float.sqrtQuiet 2)
##
## To check more specifically which erroneous value you have, see
## #Float.isNaN, #Float.isInfinite, #Float.isInfinity, and #Float.isNegativeInfinity
isErroneous : Float * -> Bool
## Limits
## The highest supported #Float value you can have, which is approximately 1.8 × 10^308.
##
## If you go higher than this, your running Roc code will crash - so be careful not to!
highest : Float *
maxF64 : Float *
## The lowest supported #Float value you can have, which is approximately -1.8 × 10^308.
##
## If you go lower than this, your running Roc code will crash - so be careful not to!
lowest : Float *
minF64 : Float *
## The highest integer that can be represented as a #Float without losing precision.
## It is equal to 2^53, which is approximately 9 × 10^15.
@ -220,7 +280,7 @@ lowest : Float *
## >>> Float.highestInt + 100 # Increasing may lose precision
##
## >>> Float.highestInt - 100 # Decreasing is fine - but watch out for lowestLosslessInt!
highestInt : Float *
maxPreciseInt : Float *
## The lowest integer that can be represented as a #Float without losing precision.
## It is equal to -2^53, which is approximately -9 × 10^15.
@ -232,4 +292,4 @@ highestInt : Float *
## >>> Float.lowestIntVal - 100 # Decreasing may lose precision
##
## >>> Float.lowestIntVal + 100 # Increasing is fine - but watch out for highestInt!
lowestInt : Float *
minPreciseInt : Float *

@ -52,7 +52,7 @@ interface Int
##
## * Start by deciding if this integer should allow negative numbers, and choose signed or unsigned accordingly.
## * Next, think about the range of numbers you expect this number to hold. Choose the smallest size you will never expect to overflow, no matter the inputs your program receives. (Validating inputs for size, and presenting the user with an error if they are too big, can help guard against overflow.)
## * Finally, if a particular operation is too slow at runtime, and you know the native machine word size on which it will be running (most often either 64-bit or 32-bit), try switching to an integer of that size and see if it makes a meaningful difference. (The difference is typically extremely small.)
## * Finally, if a particular numeric calculation is running too slowly, you can try experimenting with other number sizes. This rarely makes a meaningful difference, but some processors can operate on different number sizes at different speeds.
Int size : Num (@Int size)
## A signed 8-bit integer, ranging from -128 to 127
@ -66,8 +66,8 @@ I64 : Int @I64
U64 : Int @U64
I128 : Int @I128
U128 : Int @U128
ILen : Int @ILen
ULen : Int @ULen
Ilen : Int @Ilen
Ulen : Int @Ulen
## A 64-bit signed integer. All number literals without decimal points are compatible with #Int values.
##
@ -91,7 +91,7 @@ ULen : Int @ULen
##
## * Larger integer sizes can represent a wider range of numbers. If you absolutely need to represent numbers in a certain range, make sure to pick an integer size that can hold them!
## * Smaller integer sizes take up less memory. This savings rarely matters in variables and function arguments, but the sizes of integers that you use in data structures can add up. This can also affect whether those data structures fit in [cache lines](https://en.wikipedia.org/wiki/CPU_cache#Cache_performance), which can be a performance bottleneck.
## * CPUs typically work fastest on their native [word size](https://en.wikipedia.org/wiki/Word_(computer_architecture)). For example, 64-bit CPUs tend to work fastest on 64-bit integers. Especially if your performance profiling shows that you are CPU bound rather than memory bound, consider #ILen or #ULen.
## * Certain CPUs work faster on some numeric sizes than others. If the CPU is taking too long to run numeric calculations, you may find a performance improvement by experimenting with numeric sizes that are larger than otherwise necessary. However, in practice, doing this typically degrades overall performance, so be careful to measure properly!
##
## Here are the different fixed size integer types:
##
@ -127,24 +127,19 @@ ULen : Int @ULen
## | ` (over 340 undecillion) 0` | #U128 | 16 Bytes |
## | ` 340_282_366_920_938_463_463_374_607_431_768_211_455` | | |
##
## There are also two variable-size integer types: #Iword and #Uword.
## Their sizes are determined by the machine word size for the system you're
## compiling for. For example, on a 64-bit system, #Iword is the same as #I64,
## and #Uword is the same as #U64.
## There are also two variable-size integer types: #Ulen and #Ilen. Their sizes
## are determined by the [machine word length](https://en.wikipedia.org/wiki/Word_(computer_architecture))
## of the system you're compiling for. (The "len" in their names is short for "length of a machine word.")
## For example, when compiling for a 64-bit target, #Ulen is the same as #U64,
## and #Ilen is the same as #I64. When compiling for a 32-bit target, #Ulen is the same as #U32,
## and #Ilen is the same as #I32. In practice, #Ulen sees much more use than #Ilen.
##
## If any operation would result in an #Int that is either too big
## or too small to fit in that range (e.g. calling `Int.highest32 + 1`),
## then the operation will *overflow* or *underflow*, respectively.
##
## When this happens:
##
## * In a development build, you'll get an assertion failure.
## * In a release build, you'll get [wrapping overflow](https://en.wikipedia.org/wiki/Integer_overflow), which is almost always a mathematically incorrect outcome for the requested operation. (If you actually want wrapping, because you're writing something like a hash function, use functions like #Int.addWrapping.)
## or too small to fit in that range (e.g. calling `Int.highestI32 + 1`),
## then the operation will *overflow*. When an overflow occurs, the program will crash.
##
## As such, it's very important to design your code not to exceed these bounds!
## If you need to do math outside these bounds, consider using
## a different representation other than #Int. The reason #Int has these
## bounds is for performance reasons.
## If you need to do math outside these bounds, consider using a larger numeric size.
# Int size : Num [ @Int size ]
## Arithmetic
@ -217,6 +212,11 @@ desc : Int a, Int a -> [ Eq, Lt, Gt ]
## TODO should we offer hash32 etc even if someday it has to do a hash64 and truncate?
##
## This function can crash under these circumstances:
##
## * It receives a function, or any type that contains a function (for example a record, tag, or #List containing a function)
## * It receives an erroneous #Float (`NaN`, `Infinity`, or `-Infinity` - these values can only originate from hosts)
##
## CAUTION: This function may give different answers in future releases of Roc,
## so be aware that if you rely on the exact answer this gives today, your
## code may break in a future Roc release.
@ -229,11 +229,11 @@ hash64 : a -> U64
##
## Note that this is smaller than the positive version of #Int.lowestI32
## which means if you call #Num.abs on #Int.lowestI32, it will overflow and crash!
highestI32 : I32
maxI32 : I32
## The lowest number that can be stored in an #I32 without overflowing its
## available memory and crashing.
##
## Note that the positive version of this number is larger than
## #Int.highestI32, which means if you call #Num.abs on #Int.lowestI32, it will overflow and crash!
lowest : I32
minI32 : I32

@ -1,4 +1,6 @@
api Num provides Num, DivByZero..., neg, abs, add, sub, mul, isOdd, isEven, isPositive, isNegative, isZero
interface Num
exposes [ Num, neg, abs, add, sub, mul, isOdd, isEven, isPositive, isNegative, isZero ]
imports []
## Types
@ -12,13 +14,43 @@ api Num provides Num, DivByZero..., neg, abs, add, sub, mul, isOdd, isEven, isPo
##
## The number 1.5 technically has the type `Num FloatingPoint`, so when you pass two of them to `Num.add`, the answer you get is `3.0 : Num FloatingPoint`.
##
## The type #Float is defined to be an alias for `Num FloatingPoint`, so `3.0 : Num FloatingPoint` is the same answer as `3.0 : Float`.
##
## Similarly, the number 1 technically has the type `Num Integer`, so when you pass two of them to `Num.add`, the answer you get is `2 : Num Integer`.
##
## The type #Int is defined to be an alias for `Num Integer`, so `2 : Num Integer` is the same answer as `2 : Int`.
##
## The type #Float is defined to be an alias for `Num FloatingPoint`, so `3.0 : Num FloatingPoint` is the same answer as `3.0 : Float`. # # Similarly, the number 1 technically has the type `Num Integer`, so when you pass two of them to `Num.add`, the answer you get is `2 : Num Integer`. # # The type #Int is defined to be an alias for `Num Integer`, so `2 : Num Integer` is the same answer as `2 : Int`. #
## In this way, the `Num` type makes it possible to have `1 + 1` return `2 : Int` and `1.5 + 1.5` return `3.0 : Float`.
##
## ## Number Literals
##
## Number literals without decimal points (like `0`, `4` or `360`)
## have the type `Num *` at first, but usually end up taking on
## a more specific type based on how they're used.
##
## For example, in `(1 + List.len myList)`, the `1` has the type `Num *` at first,
## but because `List.len` returns a `Ulen`, the `1` ends up changing from
## `Num *` to the more specific `Ulen`, and the expression as a whole
## ends up having the type `Ulen`.
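##
## A small sketch of that inference (`myList` and `count` are hypothetical names):
##
## ```
## myList = [ "a", "b", "c" ]
##
## # `List.len myList` is a `Ulen`, so the literal `1` becomes a `Ulen` too.
## count = 1 + List.len myList
## ```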
##
## Sometimes number literals don't become more specific. For example,
## the #Num.toStr function has the type `Num * -> Str`. This means that
## when calling `Num.toStr (5 + 6)`, the expression `(5 + 6)`
## still has the type `Num *`. When this happens, `Num *` defaults to
## being an #I32 - so this addition expression would overflow
## if either 5 or 6 were replaced with a number big enough to cause
## addition overflow on an #I32.
##
## If this default of #I32 is not big enough for your purposes,
## you can add an `i64` to the end of the number literal, like so:
##
## >>> Num.toStr 5_000_000_000i64
##
## This `i64` suffix specifies that you want this number literal to be
## an #I64 instead of a `Num *`. All the other numeric types have
## suffixes just like `i64`; here are some other examples:
##
## * `215u8` is a `215` value of type #U8
## * `76.4f32` is a `76.4` value of type #F32
## * `12345ulen` is a `12345` value of type #Ulen
##
## In practice, these are rarely needed. It's most common to write
## number literals without any suffix.
Num range : @Num range
## Convert
@ -137,3 +169,19 @@ sub : Num range, Num range -> Num range
## >>> Float.pi
## >>> |> Num.mul 2.0
mul : Num range, Num range -> Num range
## Convert
## Convert a number to a string, formatted as the traditional base 10 (decimal).
##
## >>> Num.toStr 42
##
## Only #Float values will show a decimal point, and they will always have one.
##
## >>> Num.toStr 4.2
##
## >>> Num.toStr 4.0
##
## For other bases see #toHexStr, #toOctalStr, and #toBinaryStr.
toStr : Num * -> Str

@ -63,14 +63,9 @@ Str : [ @Str ]
## Since #Float values are imprecise, it's usually best to limit this to the lowest
## number you can choose that will make sense for what you want to display.
##
## If you want to keep all the digits, passing #Int.highestSupported will accomplish this,
## but it's recommended to pass much smaller numbers instead.
##
## Passing a negative number for decimal places is equivalent to passing 0.
decimal : Float *, ULen -> Str
## Convert an #Int to a string.
int : Int * -> Str
## If you want to keep all the digits, passing the same float to #Str.num
## will do that.
decimal : Float *, Ulen -> Str
## Split a string around a separator.
##