From e91df35902801b15d0bf2762953473eedf71ed68 Mon Sep 17 00:00:00 2001 From: Ara Adkins Date: Tue, 11 Jun 2019 17:07:54 +0100 Subject: [PATCH] Set up the repository (#1) * Add scalafmt configuration * Add docs and issue/PR templates * Update gitignore, add readme and license * Add contributing and code of conduct --- .github/ISSUE_TEMPLATE/bug-report.md | 53 + .github/ISSUE_TEMPLATE/epic.md | 31 + .github/ISSUE_TEMPLATE/feature-request.md | 24 + .github/ISSUE_TEMPLATE/task.md | 31 + .github/PULL_REQUEST_TEMPLATE.md | 19 + .gitignore | 54 + .scalafmt.conf | 53 + CODE_OF_CONDUCT.md | 88 + CONTRIBUTING.md | 197 ++ LICENSE | 201 ++ README.md | 59 + build.sbt | 2 +- doc/.DS_Store | Bin 0 -> 6148 bytes doc/design/.DS_Store | Bin 0 -> 6148 bytes doc/design/enso-philosophy.md | 100 + doc/design/runtime/graalvm.md | 76 + doc/design/runtime/runtime.md | 394 +++ doc/design/syntax/syntax.md | 2917 +++++++++++++++++++++ doc/design/type-system/.DS_Store | Bin 0 -> 6148 bytes doc/design/type-system/types.md | 1049 ++++++++ doc/graphmod/run.sh | 7 + doc/haskell-style-guide.md | 1375 ++++++++++ doc/scala-style-guide.md | 306 +++ 23 files changed, 7035 insertions(+), 1 deletion(-) create mode 100644 .github/ISSUE_TEMPLATE/bug-report.md create mode 100644 .github/ISSUE_TEMPLATE/epic.md create mode 100644 .github/ISSUE_TEMPLATE/feature-request.md create mode 100644 .github/ISSUE_TEMPLATE/task.md create mode 100644 .github/PULL_REQUEST_TEMPLATE.md create mode 100644 .scalafmt.conf create mode 100644 CODE_OF_CONDUCT.md create mode 100644 CONTRIBUTING.md create mode 100644 LICENSE create mode 100644 README.md create mode 100644 doc/.DS_Store create mode 100644 doc/design/.DS_Store create mode 100644 doc/design/enso-philosophy.md create mode 100644 doc/design/runtime/graalvm.md create mode 100644 doc/design/runtime/runtime.md create mode 100644 doc/design/syntax/syntax.md create mode 100644 doc/design/type-system/.DS_Store create mode 100644 doc/design/type-system/types.md create mode 100755 doc/graphmod/run.sh create mode 100644 doc/haskell-style-guide.md create mode 100644 doc/scala-style-guide.md diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md new file mode 100644 index 00000000000..bc0369287c3 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug-report.md @@ -0,0 +1,53 @@ +--- +name: Bug Report +about: Report a bug in Enso. +title: '' +labels: 'Type: Bug' +assignees: '' + +--- + + + +### General Summary + + +### Steps to Reproduce + + +### Expected Result + + +### Actual Result + + +### Luna Version + diff --git a/.github/ISSUE_TEMPLATE/epic.md b/.github/ISSUE_TEMPLATE/epic.md new file mode 100644 index 00000000000..03461506333 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/epic.md @@ -0,0 +1,31 @@ +--- +name: Epic +about: Create a new epic for Enso development. +title: '' +labels: '' +assignees: '' + +--- + +### Summary + + +### Value + + +### Specification + + +### Acceptance Criteria & Test Cases + diff --git a/.github/ISSUE_TEMPLATE/feature-request.md b/.github/ISSUE_TEMPLATE/feature-request.md new file mode 100644 index 00000000000..a6ef4f8a57f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature-request.md @@ -0,0 +1,24 @@ +--- +name: Feature Request +about: Request a new feature in Enso. +title: '' +labels: 'Type: Enhancement' +assignees: '' + +--- + + + +### General Summary + + +### Motivation + diff --git a/.github/ISSUE_TEMPLATE/task.md b/.github/ISSUE_TEMPLATE/task.md new file mode 100644 index 00000000000..f436156b873 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/task.md @@ -0,0 +1,31 @@ +--- +name: Task +about: Create a new development task for Enso. +title: '' +labels: '' +assignees: '' + +--- + +### Summary + + +### Value + + +### Specification + + +### Acceptance Criteria & Test Cases + diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md new file mode 100644 index 00000000000..5d33b1e2b25 --- /dev/null +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -0,0 +1,19 @@ +### Pull Request Description + + +### Important Notes + + +### Checklist +Please include the following checklist in your PR: + +- [ ] The documentation has been updated if necessary. +- [ ] The code conforms to the [Scala](https://github.com/luna/enso/blob/master/doc/scala-style-guide.md) or [Haskell](https://github.com/luna/enso/blob/master/doc/haskell-style-guide.md) style guides as appropriate. +- [ ] The code has been tested where possible. + diff --git a/.gitignore b/.gitignore index 96ef862d54e..1dc72453da1 100644 --- a/.gitignore +++ b/.gitignore @@ -1,2 +1,56 @@ + +########### +## Scala ## +########### + target/ +*.class +*.log + +############# +## Haskell ## +############# + +dist +cabal-dev +*.o +*.hi +*.chi +*.chs.h +*.dyn_o +*.dyn_hi +.hpc +.hsenv +.cabal-sandbox/ +cabal.sandbox.config +*.cabal +*.prof +*.aux +*.hp +*.DS_Store +.stack-work/ + +############ +## System ## +############ + +# OSX +.DS_Store + +############ +## Images ## +############ + +*.jpg +*.jpeg +*.png +*.bmp +*.psd + +###################### +## Tooling Specific ## +###################### + .idea/ +*.swp +.projections.json diff --git a/.scalafmt.conf b/.scalafmt.conf new file mode 100644 index 00000000000..ba07504a467 --- /dev/null +++ b/.scalafmt.conf @@ -0,0 +1,53 @@ +// Scala Formatting Configuration for Enso + +// All Scala files should be reformatted through this formatter before being +// committed to the repositories. + +version = "2.0.0-RC8" + +// Wrapping and Alignment +align = most +align.openParenCallSite = false +align.openParenDefnSite = false +align.tokens = [ + {code = "=>", owner = "Case"} + {code = "%", owner = "Term.ApplyInfix"} + {code = "%%", owner = "Term.ApplyInfix"} + {code = "="} + {code = "<-"} + {code = "extends"} + {code = ":", owner = "Defn.Def"} +] +maxColumn = 80 +verticalAlignMultilineOperators = true + +// Comments and Documentation +docstrings = "scaladoc" + +// Indentation +assumeStandardLibraryStripMargin = true +continuationIndent.callSite = 2 +continuationIndent.defnSite = 2 + +// Newlines +newlines.alwaysBeforeElseAfterCurlyIf = true +newlines.alwaysBeforeTopLevelStatements = true + +// Rewrite Rules +rewrite.rules = [ + ExpandImportSelectors, + PreferCurlyFors, + RedundantParens, + SortModifiers, +] +rewrite.sortModifiers.order = [ + "implicit", "final", "sealed", "abstract", + "override", "private", "protected", "lazy" +] + +// Multiline Configuration +verticalMultiline.atDefnSite = true +verticalMultiline.arityThreshold = 4 + +// Please remember that `//format: off` and `//format: on` directives should be +// used sparingly, if at all. diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md new file mode 100644 index 00000000000..f1de5c7bf9b --- /dev/null +++ b/CODE_OF_CONDUCT.md @@ -0,0 +1,88 @@ +# The Enso Code of Conduct + +## Conduct + +**Contact**: [luna-mods@luna-lang.org](mailto:luna-mods@luna-lang.org) + +- We are committed to providing a friendly, safe and welcoming environment for + all, regardless of level of experience, gender identity and expression, sexual + orientation, disability, personal appearance, body size, race, ethnicity, age, + religion, nationality, or other similar characteristic. +- On Discord, please avoid using overtly sexual nicknames or other nicknames + that might detract from a friendly, safe and welcoming environment for all. +- Please be kind and courteous. There's no need to be mean or rude. +- Respect that people have differences of opinion and that every design or + implementation choice carries a trade-off and numerous costs. There is seldom + a right answer. +- Please keep unstructured critique to a minimum. If you have solid ideas you + want to experiment with, make a fork and see how it works. +- We will exclude you from interaction if you insult, demean or harass anyone. + That is not welcome behaviour. We interpret the term "harassment" as including + the definition in the + [Citizen Code of Conduct](http://citizencodeofconduct.org/); if you have any + lack of clarity about what might be included in that concept, please read + their definition. In particular, we don't tolerate behaviour that excludes + people in socially marginalized groups. +- Private harassment is also unacceptable. No matter who you are, if you feel + you have been or are being harassed or made uncomfortable by a community + member, please contact one of the moderators or any of the + [Enso moderation team][mod_team] immediately. Whether you're a regular + contributor or a newcomer, we care about making this community a safe place + for you and we've got your back. +- Likewise any spamming, trolling, flaming, baiting or other attention-stealing + behaviour is not welcome. + +## Moderation +These are the policies for upholding our community's standards of conduct. If +you feel that a thread needs moderation, please contact the +[Enso moderation team][mod_team]. + +1. Remarks that violate the Enso standards of conduct, including hateful, + hurtful, oppressive, or exclusionary remarks, are not allowed. Cursing is + allowed, but never targeting another user, and never in a hateful manner. +2. Remarks that moderators find inappropriate, whether listed in the code of + conduct or not, are also not allowed. +3. Moderators will first respond to such remarks with a warning. +4. If the warning is unheeded, the user will be "kicked," i.e., kicked out of + the communication channel to cool off. +5. If the user comes back and continues to make trouble, they will be banned, + i.e., indefinitely excluded. +6. Moderators may choose at their discretion to un-ban the user if it was a + first offense and they offer the offended party a genuine apology. +7. If a moderator bans someone and you think it was unjustified, please take it + up with that moderator, or with a different moderator, **in private**. + Complaints about bans in-server are not allowed. +8. Moderators are held to a higher standard than other community members. If a + moderator creates an inappropriate situation, they should expect less leeway + than others. + +In the Enso community we all aim to go the extra mile and look out for each +other. We don't just aim to be technically unimpeachable, but we try to be our +very best selves. In particular, avoid flirting with offensive or sensitive +topics, particularly if they're off-topic. Doing so all too often leads to +unnecessary fights, hurt feelings and damaged trust; worse, it can drive people +away from the community entirely. + +If someone takes issue with something you said or did, resist the urge to be +defensive. Just stop what it was they complained about and apologise. Even if +you feel that you were misinterpreted or unfairly accused, it is likely that +there was something you could've communicated better — remember it is _your_ +responsibility to make your fellow Enso contributors comfortable. Everyone wants +to get along, and everyone in this community is here first and foremost to talk +about cool technology! You will find that people will be eager to assume good +intent and forgive as long as there is an atmosphere of trust. + +The enforcement policies listed above apply to all official Enso venues. This +includes the official discord and GitHub repositories under `luna`. + +_Adapted from the_ +_[Node.js Policy on Trolling](http://blog.izs.me/post/30036893703/policy-on-trolling)_ +_as well as the_ +_[Contributor Covenant v1.3.0](https://www.contributor-covenant.org/version/1/3/0/)._ + +[mod_team]: [luna-mods@luna-lang.org](mailto:luna-mods@luna-lang.org) + +## Credits +This code of conduct is adapted from the [Rust](https://rust-lang.org) code of +conduct. Many thanks to the Rust community for being such an exemplar of the +open-source spirit! diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 00000000000..9878eeaa61f --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,197 @@ +# Contributing to Enso +Thank you for your interest in contributing to Enso! We believe that only +through community involvement can Enso be the best it can be! There are a whole +host of ways to contribute, and every single one is appreciated. The major +sections of this document are linked below: + + + +- [Issues](#issues) +- [Feature Enhancements](#feature-enhancements) +- [Bug Reports](#bug-reports) +- [Hacking on Enso](#hacking-on-enso) +- [Pull Requests](#pull-requests) +- [Documentation](#documentation) +- [Issue Triage](#issue-triage) +- [Out-of-Tree Contributions](#out-of-tree-contributions) +- [Helpful Documentation and Links](#helpful-documentation-and-links) + + + +All contributions to Luna should be in keeping with our +[Code of Conduct](https://github.com/luna/luna/blob/CODE_OF_CONDUCT.md). + +## Issues +If you are looking for somewhere to start, check out the `Help Wanted` tag in +the following repositories: +- [Enso](https://github.com/luna/luna/labels/Status%3A%20Help%20Wanted) +- [Luna Studio](https://github.com/luna/luna-studio/labels/Status%3A%20Help%20Wanted) +- [Luna Manager](https://github.com/luna/luna-manager/labels/Status%3A%20Help%20Wanted) +- [Luna Dataframes](https://github.com/luna/dataframes/labels/Status%3A%20Help%20Wanted) + +## Feature Enhancements +If you feel like you have a suggestion for a change to the way that Enso works +as a language, please open an issue in our +[RFCs Repository](https://github.com/luna/luna-rfcs), rather than in this one! +New features and other significant language changes must go through the RFC +process so they can be properly discussed. + +## Bug Reports +While it's never great to find a bug, they are a reality of software and +software development! We can't fix or improve on the things that we don't know +about, so report as many bugs as you can! If you're not sure whether something +is a bug, file it anyway! + +**If you are concerned that your bug publicly presents a security risk to the +users of Enso, please contact +[security@luna-lang.org](mailto:security@luna-lang.org).** + +Even though GitHub search can be a bit hard to use sometimes, we'd appreciate if +you could +[search](https://github.com/luna/enso/search?q=&type=Issues&utf8=%E2%9C%93) for +your issue before filing a bug as it's possible that someone else has already +reported the issue. We know the search isn't the best, and it can be hard to +know what to search for, so we really don't mind if you do submit a duplicate! + +Opening an issue is as easy as following [this link](https://github.com/luna/enso/issues/new?template=bug-report.md) +and filling out the fields. The template is intended to collect all the +information we need to best diagnose the issue, so please take the time to fill +it out accurately. + +The reproduction steps are particularly important, as the more easily we can +reproduce it, the faster we can fix the bug! It's also helpful to have the +output of `enso --version`, as that will let us know if the bug is Operating +System or Architecture specific. + +## Hacking on Enso +This will get you up and running for Enso development, with only a minimal +amount of setup required. Enso's build system is fairly simple, allowing you to +bootstrap the compiler as long as you have... + +### Design Documentation +If you're going to start contributing to Enso, it is often a good idea to take a +look at the design documentation for the language. These files explain provide +both a rigorous specification of Enso's design, but also insight into the _why_ +behind the decisions that have been made. + +These can be found in [`doc/design/`](doc/design/), and are organised by the +part of the compiler that they relate to. + +### System Requirements +TBC + +### Getting the Sources +Given you've probably been reading this document on GitHub, you might have an +inkling where to look!. You can clone Luna using two methods: + +- **Via HTTPS:** We recommend you only use HTTPS if checking out the sources as + read-only. + +``` +git clone https://github.com/luna/enso.git +``` + +- **Via SSH:** For those who plan on regularly making direct commits, cloning + over SSH may provide a better user experience (but requires setting up your + SSH Keys with GitHub). + +``` +git clone git@github.com:luna/enso.git +``` + +### Building Enso +TBC + +#### Building Enso Components +TBC + +#### Developing Enso's Libraries +TBC + +#### Building Enso for Release +TBC + +#### Packaging Enso +TBC + +### Running Luna +TBC + +#### Projects + +#### Standalone Files + +#### Language Server Mode + +## Pull Requests +Pull Requests are the primary method for making changes to Enso. GitHub has +[fantastic documentation](https://help.github.com/articles/about-pull-requests/) +on using the pull request feature. Luna uses the 'fork-and-pull' model of +development. It is as described +[here](https://help.github.com/articles/about-collaborative-development-models/) +and involves people pushing changes to their own fork and creating pull requests +to bring those changes into the main Luna repository. + +Please make all pull requests against the `master` branch. + +Before making your PR, please make sure that the commit passes the Enso test +suite. You can run all the tests by ... TBC. In addition, please ensure that +your code conforms to the Enso [Scala Style Guide](./doc/scala-style-guide.md) +and [Haskell Style Guide](./doc/haskell-style-guide.md) as relevant. + +Make sure you perform these checks before _every_ pull request. You can even add +[git hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks) before +every push to make sure that you can't forget. + +Every pull request for Enso is reviewed by another person! You'll get a +reviewer from the core team assigned at random, but feel free to ask for a +specific person if you've dealt with them in a certain area before! + +Once the reviewer approves your pull request it will be tested by our continuous +integration provider before being merged! + +## Documentation +Documentation improvements are very welcome! The source for the Enso, Book can be +found in [`luna/luna-book`](https://github.com/luna/luna-book), but most of the +API documentation is generated directly from the code! + +Documentation pull requests are reviewed in exactly the same way as normal pull +requests. + +To find documentation-related issues, sort by the +[Category: Documentation](hhttps://github.com/luna/enso/labels/Category%3A%20Documentation) +label. + +## Issue Triage +Sometimes issues can be left open long after the bug has been fixed. Other +times, a bug might go stale because something has changed in the meantime. + +It can be helpful to go through older bug reports and make sure that they are +still valid. Load up an older issue, double check that it's still true, and +leave a comment letting us know if it is or is not. The +[least recently updated](https://github.com/luna/enso/issues?q=is%3Aissue+is%3Aopen+sort%3Aupdated-asc) +sort is good for finding issues like this. + +Contributors with sufficient permissions can help by adding labels to help with +issue triage. + +If you're looking for somewhere to start, take a look at the +[Difficulty: Beginner](https://github.com/luna/enso/labels/Difficulty%3A%20Beginner) +issue label, as well as the +[Status: Help Wanted](https://github.com/luna/enso/labels/Status%3A%20Help%20Wanted) +label. + +## Out-of-Tree Contributions +As helpful as contributing to Enso directly is, it can also be just as helpful +to contribute in other ways outside this repository: + +- Answer questions in the [Discord](https://discordapp.com/invite/YFEZz3y) or + on [StackOverflow](https://stackoverflow.com/questions/tagged/luna). +- Participate in the [RFC Process](https://github.com/luna/luna-rfcs). + +## Helpful Documentation and Links +For people new to Luna, and just starting to contribute, or even for more +seasoned developers, some useful places to look for information are: + +- [The Enso Book](https://luna-lang.gitbooks.io/docs/) +- The community! Don't be afraid to ask questions. diff --git a/LICENSE b/LICENSE new file mode 100644 index 00000000000..261eeb9e9f8 --- /dev/null +++ b/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md new file mode 100644 index 00000000000..af34189f35e --- /dev/null +++ b/README.md @@ -0,0 +1,59 @@ + + +

+ +

+

Enso Programming Language

+

+Visual and textual functional programming language with a focus on productivity, collaboration and development ergonomics. +

+ +Enso is a developer’s whiteboard on steroids. Design, prototype, develop and +refactor any application simply by connecting visual elements together. +Collaborate with co-workers, interactively fine tune parameters, inspect the +results and visually profile the performance in real-time. + +Visit [The Enso Website](http://www.luna-lang.org) to learn more! + +This repository contains the Enso interpreter and runtime, as well as its +command-line interface. For the full (visual) Enso Studio, please take a look at +[Enso Studio](https://github.com/luna/luna-studio). + +## Contributing to Enso +If you are interested in contributing to the development of Enso, please read +the [`CONTRIBUTING.md`](./CONTRIBUTING.md) file. It describes all the ways in +which you can help the project, as well as provides instructions for how to +build Enso. + +## Enso's Design +If you would like to gain a better understanding of the principles on which Enso +is based, or just delve into the why's and what's of Luna's design, please take +a look in the [`doc/design/` folder](./doc/design). + +This documentation will evolve as Enso does, both to help newcomers to the +project understand the reasoning behind the code, but also to act as a record of +the decisions that have been made through Enso's evolution. + +## License +This repository is licensed under the +[Apache 2.0](https://opensource.org/licenses/apache-2.0), as specified in the +[LICENSE](https://github.com/luna/luna/blob/master/LICENSE) file. + +Please be aware that, as the commercial backing for Enso, +**New Byte Order Sp. z o. o.** reserves the right under the CLA to use +contributions made to this repository as part of commercially available Enso +products. + +If these terms are unacceptable to you, please do not contribute to the +repository. + +### The Contributor License Agreement +As part of your first contribution to this repository, you need to accept the +Contributor License Agreement. You will automatically be asked to sign the CLA +when you make your first pull request. + +Any work intentionally submitted for inclusion in Luna shall be licensed under +this CLA. + +The CLA you sign applies to all repositories associated with the Enso project, +so you will only have to sign it once at the start of your contributions. diff --git a/build.sbt b/build.sbt index c7dcca525b4..ff33868d8dd 100644 --- a/build.sbt +++ b/build.sbt @@ -34,4 +34,4 @@ lazy val syntax = (project in file("syntax")) .settings(mainClass in (Compile,run) := Some("org.enso.syntax.Main")) .settings(bench := { (test in Benchmark).value - }) \ No newline at end of file + }) diff --git a/doc/.DS_Store b/doc/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..9dc9da12da6681c6e005303ec73eafbf65d368fa GIT binary patch literal 6148 zcmeHK!AiqG5Z!I7rij>sV2^w8)Olyx9=r(=J*cz^32mSzrAaLsBl!*eBfr4k zab|a02-SlZ5jz93Z+3QOmwgF4yNoezO}!RlHO80$idd>b^M&9z>XMYS2a(G;1}t=E z@z@4{vreT?beycd=|5S6~EuRJ01mb(rkY4Qn|9WzG0wj8F%(Dn%RkyOk>NL z++gp*NrtYrFQaI*upRd*@Mpc+&S?}TP7wMNnH+dNgxp>Sp%=}pXc~I4oX1uJRg_+> zo~G^AQ9~SbjOly#9=r(=J*cz^2{zCsrAZAnR`MJAM}C37 zXPKN2a(G;4p_t{ zpfU57qRH_e8NjnE!Ff&C5YFuT`3;;T8an;Zfrt-h*!QEjR{P|+d|_jA%Rt#S9;{xN zSg{?Cqn16o!`_u0_Z)3shhcwi+s=*eC7tr_c^JgDA9y2~?7JR>+~4|v8z!xA9JrC3 z$5I1Dluo&lrj7bZRU9=>r&WFb-~n8W8NWk=%- z-oaS0-6v}l^uyp0B4*jMEJ9*{7$64Li~)Vk8KpJb7j1_aAO?QV0NxJ*6w%a}DU??S zH2C|7;{`+%u<uXFVzCrgK&60RpaE!V%oKtLgnk4h4b%_=f6BlY>Wx&= literal 0 HcmV?d00001 diff --git a/doc/design/enso-philosophy.md b/doc/design/enso-philosophy.md new file mode 100644 index 00000000000..9968a83779d --- /dev/null +++ b/doc/design/enso-philosophy.md @@ -0,0 +1,100 @@ +# Enso's Philosophy +Enso is a programming language unlike any that have come before it, a seamless +blend of code and visual communication that can span organisations. + +> "The compiler is your assistant, and you interact with it to arrive at a +> working program via a conversation, with information going both ways." +> +> – [Edwin Brady](https://twitter.com/edwinbrady/status/1115361451005423617) + + + +- [Tenets of Enso](#tenets-of-enso) + - [Explicit Overrides Implicit](#explicit-overrides-implicit) +- [Designing Enso](#designing-enso) + + + +## Tenets of Enso +As we design Enso, we rely on a small set of principles to guide the design and +evolution of the language. They are elucidated below. + +- **Visual and Textual:** The visual and textual syntaxes are both first-class. + One is not translated to the other, but they are instead equivalent. New + features must work across both. +- **Unified Syntax:** The language syntax should be simple, concise, and be + usable at both the type and term levels. +- **Visual Communication:** A pure and functional language that lends itself + easily to visualisation of data-flows. +- **One Right Way:** There should, overwhelmingly, be only one way to perform a + given task. Limited choice breeds simplicity. +- **Help the User:** Enso should do its utmost to make things easier for the + user, even if that involves accepting additional tooling complexity. This does + not come at the exclusion of letting users access that power. +- **Simple Complexity:** Though the language is backed by powerful functionality + and compiler smarts, this should be invisible to the users until they care to + engage with it. +- **Powerful When Necessary:** Designed to employ powerful techniques that + confer safety and speed, allowing the users to write correct programs. +- **Performance and Predictability:** Predictable performance, with integrated + debugging and profiling features to help users diagnose their problems. +- **Explicit Overrides Implicit:** The design of Enso and its libraries should + _always_ account for making all behaviour implicit. + +### Explicit Overrides Implicit +When designing Enso and its libraries, we don't want to have any behaviour of a +function that is not recorded in its type, or its defaults. This gives rise to +two main principles for designing Enso's APIs: + +1. Use the correct types to inform both users and the tools about the behaviour + of the function. +2. Use the inbuilt capabilities for named and default arguments to provide + _sensible defaults_, for an API, without hiding behaviour. + +An example of this trade-off is reading a opening a file handle with +`openHandle`. + +Unlike more specialised functions such as `readFile` and `writeFile`, +`openHandle` is much more flexible about how it opens the file. In such cases, +users making use of this file function can generally be assumed to want to open +the file for both reading _and_ writing. + +This is the sensible, default, but it is made properly explicit by inclusion as +a defaulted keyword argument to the function: + +``` +type File.Mode : + Read + Write + RW + Append + RWA + +openHandle : File.Path -> File.Mode -> File.Handle +openHandle path -> fileMode = RW -> ... +``` + +In doing so, the design of the function tells the user the following things: + +- It takes a file path, which represents the location of a file on the user's + local system (as opposed to a `Resource.Path`, which is a location for a + generic resource). +- It has an _explicit_ default behaviour that it opens the file for both reading + and writing, that is defaulted as part of the definition, but can be + overridden if necessary. + +As a result, this design allows for _explicit_ communication of the behaviour of +the function, both under default, and non-default circumstances. Hence, the user +who would like to open a file for appending (via `appendHandle`), as well as +reading and writing, can call it as follows: `openHandle path (fileMode = RWA)`, +or just `openHandle path RWA`. + +## Designing Enso +As Enso's design and functionality evolves, we have to take the utmost care to +ensure that it doesn't balloon beyond control. As a result, every new feature +that we contemplate adding to the language should advance these core tenets, and +thereby ensure that it matches with the overall vision for Enso. + +We stick to the above principles when we're _building_ the compiler as well, +with code being liberally commented with need-to-know information, as well as +for clean and clear design documents to accompany it. diff --git a/doc/design/runtime/graalvm.md b/doc/design/runtime/graalvm.md new file mode 100644 index 00000000000..c4ad90ba2ae --- /dev/null +++ b/doc/design/runtime/graalvm.md @@ -0,0 +1,76 @@ +# GraalVM +The following are notes based on questions that we have about using GraalVM to +provide a backend for Enso. + +- C-API is a potentially useful tool for combining the Java/Truffle side and the + Haskell side. +- The polyglot API is the bindings to Graal itself, with access from both C and + Java. This is _not_ the same thing as the truffle framework for building new + programming languages. +- You can define your own C-API (as C-entry points) for your hosted language, + which then allows for you to call into the hosted language. +- There are no current plans to allow for compilation of the hosted language to + native code via SubstrateVM. You can store part of the JVM heap in the native + image, allowing for standalone executables. Serialisation of heap into the + image. +- JavaScript and LLVM are possible backends for SubstrateVM, so we can compile + Enso code indirectly to JS. The LLVM backend is on the roadmap, while the + former is possible. +- No explicit control of memory layout for the hosted language, instead with a + high level (JS-style) API for working with objects. There is no + non-encapsulated mechanism for accessing objects due to optimisation reasons. + If we wanted explicit memory layout control. Hybrid systems are supported. +- GraalPython is at an early stage of development, with a critical focus on the + scientific libraries. It is very much a first-class language on the Graal + platform. +- The Graal garbage collector is evolving to be a Java port of G1. Fairly high + priority to improve the GC, and its support for immutability. +- Graal doesn't currently have support for saving the heap except via something + like SubstrateVM. +- Caching: Truffle is designed for live specialisation, and has support for + inline caching. No support for retention or eviction at this stage. +- Truffle provides some support for reusable front-end optimisations in the form + of AST specialisation. Strong support for AST re-writing as an optimisation + technique. You can walk the tree at creation time, or at execution time, in + order to implement more global optimisations. Separation between allocation + and re-writing. +- Commercial support for bugfixes, but not necessarily for feature requests yet. +- Introspection/Debugging framework lets you introspect all portions of the code + during execution. We can dynamically add those introspection hooks in order to + do this optimally. +- There is no way to force compilation, but could modify truffle to add some + support for an API that allows this control. There is currently only one tier + of compilation, but it is _possible_ to write additional JIT phases. +- There is the polyglot REPL, but it is definitely possible to build a proper + polyglot repl based around Enso. Take a look at `polyglot - shell`. + +Sit down with Chris in London with some concrete code and examples. + +## A Design Based on GraalVM + +- The parser remains in Haskell. +- The backend works on AST changesets instead. +- The compiler is written in Scala using Truffle, implementing the following set + of processes: + + Ingestion: Transformation of AST to Graal internal AST. + + Desugaring: Removal of all syntax sugar from the expressions to create a + core language. + + Typechecking: Typechecking of the core language. + + Optimisation: Optimisation of the core language. + + Interpretation: Using GraalVM. + +Roadmap: + +1. Write up a specification of the runtime, type system, and syntax. +2. WD fixes the parser for the new syntax including markers, double rep, and AST + printing. +3. Build the first version of the GraalVM backend to operate in a standalone + fashion, with no interactivity. This will have no standard library support. +4. Build the standard library, initially focusing on the basics, HTTP, and the + primary interchange formats (JSON and YAML). +5. Design the IDE protocol and implement the necessary introspection + functionality. The messages should only be implemented enough to support the + Enso Studio use-cases. +6. Implement the caching in earnest. +7. Implement the typechecker and rework caching as needed. +8. Implement the necessary sets of optimisation passes in GraalVM. diff --git a/doc/design/runtime/runtime.md b/doc/design/runtime/runtime.md new file mode 100644 index 00000000000..bb81aa91505 --- /dev/null +++ b/doc/design/runtime/runtime.md @@ -0,0 +1,394 @@ +# Enso Runtime +This document contains a detailed specification of Enso's runtime. It includes +a description of the technologies on which it is built, as well as the features +and functionality that it is required to support. In addition, the document aims +to explain why _this_ design, rather than one of the many alternatives available +to the team. + +When we refer to the Enso 'runtime' in this document, we are referring to the +combination of the language communication protocol, typechecker, optimiser, and +interpreter. Though the interpreter itself has its own runtime, it is these +components that make up _Enso's_ runtime. + +The runtime is built on top of [GraalVM](https://www.graalvm.org/), a universal +virtual machine on which you can run any language with an appropriate +interpreter. In basing Enso's runtime on GraalVM, we not only have access to a +comprehensive toolkit for building high-performance language interpreters, but +also to the ecosystems of all the other languages (e.g. C++, Python, R) that can +run on top of it. GraalVM also brings some additional important tooling, such as +the JVM ecosystem's performance monitoring, analysis, and debugging toolsets. + +The runtime described below is a complex beast, so this document is broken up +into a number of sections. These aim to provide an architectural overview, and +then describe the design of each component in detail. + + + +- [Architectural Overview](#architectural-overview) + - [The Broader Enso Ecosystem](#the-broader-enso-ecosystem) + - [The Runtime's Architecture](#the-runtimes-architecture) +- [Choosing GraalVM](#choosing-graalvm) +- [The Runtime Components](#the-runtime-components) + - [Language Server](#language-server) + - [Filesystem Driver](#filesystem-driver) + - [Typechecker](#typechecker) + - [Optimiser](#optimiser) + - [Interpreter and JIT](#interpreter-and-jit) +- [Cross-Cutting Concerns](#cross-cutting-concerns) + - [Caching](#caching) + - [Profiling and Debugging](#profiling-and-debugging) + - [Foreign Language Interoperability](#foreign-language-interoperability) + - [Lightweight Concurrency](#lightweight-concurrency) +- [The Initial Version of the Runtime](#the-initial-version-of-the-runtime) +- [Development Considerations](#development-considerations) + + + +# Architectural Overview +The Enso runtime is just one of the many components of the Enso ecosystem. This +section provides an overview of how it fits into the broader ecosystem, with a +particular focus on how it enables workflows for Enso Studio, the Enso CLI, and +Language Server integration. In addition, this section also explores the +architecture of the runtime itself, breaking down the opaque 'runtime' label +into the + +## The Broader Enso Ecosystem +While the runtime is arguably the core part of Enso, for the language would not +be able to exist without it, the language's success is just as dependent on the +surrounding ecosystem. + +TBC... + +It is worth providing a brief explanation of each of the components to aid in +understanding how the runtime fits into the ecosystem. + +- **Enso Studio GUI:** This is the interface with which most of Enso's users + will interact. It handles the drawing of and interaction with the Enso graph + for users, as well as the searcher and other user-facing functionality. It + also provides a text editor. +- **Project Manager:** This allows for management of one or more Enso projects, + and is primarily responsible for file-system-agnostic interaction with the + project structure, and spawning of the Enso runtime. +- **GUI Backend:** The GUI backend is instantiated for each project, and is + responsible for all of the user-facing logic that goes into interaction with + the Enso runtime. + + **Graph State Manager:** This component handles management of the state + required to draw the graph in the GUI. + + **Double Representation Manager:** This component handles the encoding and + decoding of the Enso program to and from the intermediate representation. + + **Undo/Redo Manager:** This component handles undo and redo for the graph, a + somewhat novel operation as it does not not always have a 1:1 correspondence + with textual editing. +- **CLI:** This provides a command-line (specifically a terminal) interface to + the Enso runtime. This allows both for the CLI invocation of Enso, as well as + an interactive REPL. This communicates with the runtime itself via the + language server protocol. +- **Enso Runtime:** This is what is described in this document, and is + responsible for the execution of Enso programs. It handles the typechecking, + optimisation, and interpretation of Enso code, as well as the provision of + interfaces to foreign languages. + +## The Runtime's Architecture +In order to better appreciate how the components specified below interact, it is +important to have an understanding of the high-level architecture of the runtime +itself. The design in this document pertains _only_ to the 'Enso Runtime' +component of the diagram above, and hence makes no mention of the others. + +In the diagram below, the direction of arrows is used to represent the 'flow' of +information between the various components. + +TBC... + +# Choosing GraalVM +Building the runtime on top of GraalVM was of course not the only choice that +could've been made, but it was overwhelmingly the most sensible option out of +those considered. + +At the time the runtime was designed, there were three main options that were +being considered. + +- **LLVM:** A battle-tested and comprehensive toolchain for the creation of + language compilers, [LLVM](https://llvm.org/) includes facilities for + compilation, optimisation, JIT, and linking. +- **GHC:** The [Glasgow Haskell Compiler](https://gitlab.haskell.org/ghc/) is + a sophisticated compiler and runtime for Haskell that provides a + language-agnostic set of internal representations that could be leveraged to + compile and/or interpret other functional languages. +- **JVM:** The [JVM](https://openjdk.java.net/) is a high-performance virtual + machine that includes sophisticated garbage collection, profiling tools, and + a JIT compiler. +- **GraalVM:** A universal virtual machine and language development toolkit, + [GraalVM](https://www.graalvm.org/) provides a framework for building language + interpreters, as well as a JIT compiler. Most importantly, it provides tools + for seamless interoperability between languages that can run on Graal, which + include Python and R. + +The decision to build Enso's runtime using GraalVM was primarily motivated by +business concerns, but these concerns did not override the technical as well. +Addressing them one by one provides a comprehensive picture of why the decision +was made. + +Overall, it is clear that GraalVM is an optimal choice for Enso at this stage of +the language's development. While the other potential targets do have their +upsides (e.g. the JVM's sophisticated garbage collection machinery), they all +had at least one 'fatal flaw' for Enso's use case. + +### Speed of Development +A language runtime is a complex beast, so any solution that could remove some of +the implementation burden would be beneficial to Enso as a product. + +Where LLVM provides comprehensive tools for compiling languages, it provides no +actual runtime. This would require significant implementation effort, requiring +the implementation of facilities for concurrency, as well as garbage collection, +neither of which are simple tasks. + +GHC, on the other hand, provides a comprehensive runtime system that includes +both a garbage collector and sophisticated concurrency system. However, while it +does provide language-agnostic intermediate representations, these are tied to +Haskell from a development perspective. Unlike LLVM, GraalVM, or even the JVM, +if GHC Haskell requires a change to these representations, that change will be +made. + +With many languages already targeting the JVM it also seemed like an attractive +option. The stable bytecode target would be useful, but other languages have +proven the challenges of generating sensible bytecode to provide good language +performance. + +GraalVM manages to provide excellent performance with a sensible, high-level +interface, thereby enabling rapid development of a performant runtime without +the need to implement complex components such as a GC and concurrency. + +### Language Interoperability Support +With Enso aiming to be the be-all and end-all for the data-science world, the +ability to seamlessly interoperate with other programming languages is key. This +means that a user should be able to paste in some Python or R code and have it +work properly. + +From a simple perspective, there were no other options in this category. While +the JVM would allow for interoperability with other JVM languages such as Scala, +Kotlin, and Java itself, the two 'most important' languages for interoperation +had no support. LLVM's story is similar, allowing users to use LLVM IR as a +common interoperation format, but this is far less practical than the JVM. With +GHC, any interoperation would have to be developed from-scratch and by hand, +essentially ruling it out in this category. + +With GraalVM supporting not only our primary interoperability targets, but also +the whole JVM ecosystem and any language that targets LLVM, it is an absolute +dream for ensuring that Luna can seamlessly communicate with a whole host of +other programming languages. + +### Implementation Performance +Data science often involves the manipulation of very large amounts of data, and +ensuring that an interactive environment like Enso doesn't slow down as it does +so requires a high level of performance. + +GraalVM's partially-evaluated-interpreter based approach allows the developers +to write a 'naive interpreter' and automatically have the platform provide +better performance. This is a stark contrast to all of the other listed options, +each of which would require significant complexity around generating the right +intermediate representation structures, as well as significant work on front-end +language optimisations. + +In essence, GraalVM provides for the best performance with the smallest amount +of effort, while still providing comprehensive facilities to improve performance +further in the future. + +### Maintenance Burden +Just as important as getting a working runtime is the ability for the developers +to improve and evolve it. This encompasses many factors, but Enso is primarily +concerned with being able to evolve without having to account for undue changes +to the runtime. + +LLVM provides a relatively stable IR target, so the maintenance burden wouldn't +have been too onerous. Similarly for the JVM, where the bytecode format has been +stable for many years. Though both projects add new instructions, they very +rarely remove them, meaning that Enso's potential code generator would be able +to work as the underlying platform evolves. + +As mentioned before, however, the intermediate representations in GHC that Enso +would have used as a target are very much changeable. This is due to their +primary existence being to support GHC's version of Haskell, which means that +they change often. Furthermore, their generation would require copying of many +of the idiosyncrasies of GHC's lowering mechanisms, and in all likelihood place +a significant burden on Enso's developers. + +GraalVM, on the other hand, provides a stable interface to writing an +interpreter that is far higher level than any of the other options. This API is +very unlikely to change, but even if it does the high-level nature means that +the maintenance burden of coping with those changes is significantly reduced. +Furthermore, GraalVM comes with the truffle toolkit for building interpreters, +and as a result provides many of the facilities required by Enso for free or at +least for little effort. + +# The Runtime Components +Like any sensible large software project, Enso's runtime is modular and broken +down into components. These are described in detail below. + +## Language Server +The language server component is responsible for controlling the runtime itself. +It communicates with other portions of the ecosystem (such as the REPL and the +Enso Studio backend) via a protocol. While this protocol is based on the +[Language Server Protocol](https://microsoft.github.io/language-server-protocol/specification), +it has been extended significantly to better support Enso's use-cases. + + + +## Filesystem Driver +This component of the runtime deals with access from the runtime to external +devices. This includes the Enso code files on disk, but is also responsible for +watching filesystem resources (such as databases, files, and sockets) that are +used by Enso programs. + + + +## Typechecker +The typechecker is the portion of the runtime that handles the type-inference +and type-checking of Enso code. This is a sophisticated piece of machinery, with +the primary theory under which it operates being described in the specification +of [the type-system](../type-system/types.md). + + + +## Optimiser +With much of Enso's performance relying on the JIT optimiser built into Graal, +the native language optimiser instead relies on handling more front-end specific +optimisations. + + + +## Interpreter and JIT +The interpreter component is responsible for the actual execution of Enso code. +It is built on top of the Truffle framework provided by GraalVM, and is JIT +compiled by GraalVM. + + + +# Cross-Cutting Concerns +The runtime also has to deal with a number of concerns that don't fit directly +into the above components, but are nevertheless important parts of the design. + +## Caching +The runtime cache for Enso is a key part of how it delivers exceptional +performance when working on big data sets. The key recognition, as seen in many +data processing tools, is that changing code or data often doesn't require the +interpreter to recompute the entire program. Instead, it can only recompute the +portions that are required of it, while using cached results for the rest. + + + +## Profiling and Debugging +Similarly important to the Enso user experience is the ability to visually +debug and profile programs. This component deals with the retrieval, storage, +and manipulation of profiling data, as well as the ability to debug programs in +Enso using standard and non-standard debugging paradigms. + + + +## Foreign Language Interoperability +This component deals with using the GraalVM language interoperability features +to provide a seamless interface to foreign code from inside Enso. + + + +## Lightweight Concurrency +Though not strictly a component, this section deals with how Enso can provide +its users with lightweight concurrency primitives in the form of green threads. + + + +# The Initial Version of the Runtime +In order to have a working version of the new runtime as quickly as possible, it +was decided to design and build an initial, stripped-down version of the final +design. This design focused on development of a minimal working subset of the +runtime that would allow Enso to run. + + + +# Development Considerations +As part of developing the new Enso runtime, the following things need to be +accounted for. This is to ensure that the eventual quality of the software is +high, and that we also provide a product that is actually useful to our users. + +- **Benchmarking:** A comprehensive micro and macro benchmark suite that tests + all the components of the runtime. This should be accompanied by a regression + suite to catch performance regressions. +- **Execution Tests:** A test suite that checks that executing Enso programs + results in the correct outputs. +- **Typechecker Tests:** A test suite that ensures that changes made to the + typechecker do not result in acceptance of ill-typed programs, or rejection of + well-typed programs. +- **Caching Tests:** A test suite that ensures that data is evicted from the + cache when it should be, and retained when it should be. diff --git a/doc/design/syntax/syntax.md b/doc/design/syntax/syntax.md new file mode 100644 index 00000000000..ea7e40789ce --- /dev/null +++ b/doc/design/syntax/syntax.md @@ -0,0 +1,2917 @@ +# Enso: Simplicity and Correctness + +Enso is an award winning, general purpose, purely functional programming +language with a double visual and textual representations and a very expressive, +dependent type system. Enso was designed to be easy to use and reason about. It +was equipped with a novel type system which automatically limits the possible +human errors significantly and thus drastically improves the quality of the +final solution. Enso is intended to be a production language, not a research +one, and so, the design avoids including new and untested features in its main +development branch. + +From the technical point of view, Enso incorporates many recent innovations in +programming language design. It provides higher-order functions, strict and +non-strict semantics, first-class algebraic data types, and a novel type system, +which merges the worlds of dependent types and refinement types under a single +umbrella. Enso is both the culmination and solidification of many years of +research on functional languages and proof assistants, such as Haskell, Idris, +Agda, or Liquid Haskell. + +**Note [To be included somewhere]**: Enso is dependently typed because we can +run arbitrary code on type-level. + +## Why a New Programming Language? + +Since the 1980s, the way programmers work and the tools they use have changed +remarkably little. There is a small but growing chorus that worries the status +quo is unsustainable. The problem is that programmers are having a hard time +keeping up with their own creations. “Even very good programmers are struggling +to make sense of the systems that they are working with,” says Chris Granger, a +software developer who worked as a lead at Microsoft on Visual Studio. + +We believe that without a drastic change to how a software is created, the +humanity is not able to progress to the next level. However, in contrast to many +approaches to find the next generation human-computer interaction interface, we +believe that the textual code should not be replaced, it should be enhanced +instead. The same way as writing co-exists with speaking, there are use cases +where the code is just more convenient than any other approach. To learn more +about why we have created Enso, please refer to our +[blog post about it](https://medium.com/@luna_language/luna-the-visual-way-to-create-software-c4db520d6d1e). + +## Software Correctness Matters + +In September 2007, Jean Bookout was driving on the highway with her best friend +in a Toyota Camry when the accelerator seemed to get stuck. When she took her +foot off the pedal, the car didn't slow down. She tried the brakes but they +seemed to have lost their power. As she swerved toward an off-ramp going 50 +miles per hour, she pulled the emergency brake. The car left a skid mark 150 +feet long before running into an embankment by the side of the road. The +passenger was killed. Bookout woke up in a hospital a month later. + +The incident was one of many in a nearly decade-long investigation into claims +of so-called unintended acceleration in Toyota cars. Toyota blamed the incidents +on poorly designed floor mats, “sticky” pedals, and driver error, but outsiders +suspected that faulty software might be responsible. The National Highway +Traffic Safety Administration enlisted software experts from NASA to perform an +intensive review of Toyota’s code. After nearly 10 months, the NASA team hadn't +found evidence that software was the cause—but said they couldn't prove it +wasn't. + +It was during litigation of the Bookout accident that someone finally found a +convincing connection. Michael Barr, an expert witness for the plaintiff, had a +team of software experts spend 18 months with the Toyota code, picking up where +NASA left off. Using the same model as the Camry involved in the accident, +Barr’s team demonstrated that there were more than 10 million ways for key tasks +on the on-board computer to fail, potentially leading to unintended +acceleration. They showed that as little as a single bit flip could make a car +run out of control. + +The above text is part of an amazing article +[The Coming Software Apocalypse](https://www.theatlantic.com/technology/archive/2017/09/saving-the-world-from-code/540393/) +by James Somers, we strongly encourage you to read it all in order to understand +many of Enso design principles. + +We believe that everyone should be able to process data and create software. +Thus, we strongly disagree with the assumption that developers should learn how +to formally prove properties about their programs, as it requires a very rare +theoretical background. We believe that it's the responsibility of the language +and its tooling to prove the correctness of the users' creation. _“Human +intuition is poor at estimating the true probability of supposedly ‘extremely +rare’ combinations of events in systems operating at a scale of millions of +requests per second. That human fallibility means that some of the more subtle, +dangerous bugs turn out to be errors in design; the code faithfully implements +the intended design, but the design fails to correctly handle a particular +‘rare’ scenario.”_, wrote Chris Newcombe, who was a leader on Amazon Web +Services team and one of Steam creators. Enso was designed to prove the +correctness and provide as much valuable information to the user as possible in +a completely automatic and interactive fashion. + +## Immediate Connection To Data + +Software creation is a very creative process. However, while using conventional +languages, programmers are like chess players trying to play with a blindfold on +– so much of their mental energy is spent just trying to picture where the +pieces are that there’s hardly any left over to think about the game itself. + +Enso was designed around the idea that _people need an immediate connection to +what they are making_, which was introduced by Brett Victor in his amazing talk +[Inventing on Principle](https://vimeo.com/36579366). Any violation of this +principle alienates users from the actual problems they are trying to solve, +which consequently decreases the understanding and increases the number of +mistakes. + +Enso visual representation targets domains where data processing is the primary +focus, including data science, machine learning, IoT, bioinformatics, predictive +maintenance, computer vision, computer graphics, sound processing or +architecture. Each such domain requires a highly tailored data processing +toolbox and Enso provides both an unified foundation for building such toolboxes +as well as growing library of existing ones. At its core, Enso delivers a +powerful data flow modeling environment and an extensive data visualization and +manipulation framework. + +## Simplicity. The Ultimate Sophistication + +The language design can drastically affect the required +[cognitive load](https://en.wikipedia.org/wiki/Cognitive_load), the total amount +of mental effort being used in the brains' working memory. The easier it is to +both express thoughts and understand the existing logic, the faster and less +error prone the whole software creation process is. Enso bases on a set of +principles designed to keep the required cognitive effort low: + +1. **The textual and visual representations must play well with each other.** + Both visual and textual representations are equivalently important, as the + user is allowed to switch between them on demand. Moreover, the textual + representation is an integral part of the visual one, in the form of + expressions above nodes. Any functionality which does not fit both worlds at + the same time will be rejected. +2. **Simplicity and expressiveness are more important than design + minimalism.** + Enso targets a broad range of domain experts, not necessarily professional + developers. Thus it should be expressive, easy to understand, and reason + about, yet it should never stand in a way of power users. +3. **There should be one (and preferably only one) way to achieve a goal.** One + of the greatest power of a good syntax is that it is easy to understand by a + wide range of users in the sense of both skill set as well as background + (different organizations). The more layout styles or design patterns, the + more libraries with different, often incompatible approaches will appear. In + the ideal world, a language would provide one and only one way to express + intention and format source. +4. **Type level syntax = value level syntax.** + Enso type system is designed to be as expressive and as natural to use as + rest of the code. We believe that the only true solution for next generation + programming languages is a well designed dependent type system which will + blend type level and value level computations into a single scope. Creating a + future proof dependent type system is still an open research field and we can + observe many different approaches among modern programming languages and + learn from their mistakes. One of such biggest mistake is using different + syntax forms or namespaces for type and value level expressions. It leads to + having special mechanisms to promote values between the namespaces, like + prefixing value level data with apostrophe to bring it to type level and + prevent name clash (see `-XDataKinds` in Haskell). +5. **Small number of rules is better than large.** + Any special case or syntactic rule has to be remembered by the user and + consumes important cognitive power. On the other hand, the syntax can easily + be oversimplified, which usually leads to complex, hard to understand errors. + Usually it is preferred to choose a solution which does not introduce any new + special cases. +6. **Predictable performance and behavior.** + Predictable performance and behavior is one of the most important principles + which separates well designed languages from the bad designed ones. A + language provides a predicable behavior when its user can write code which + will not break because of some external conditions, like not-dependent code + change. A good examples of breaking this rule are standard extension methods + mechanism (monkey patching in Ruby, Python, JavaScript) or orphan overlapping + instances in Haskell. Moreover, simple refactoring of the code should never + affect the performance. Again, consider Haskell here. Changing + `func2 a = func1 a` to `func2 = func1` + [can affect performance](https://gitlab.haskell.org/ghc/ghc/issues/8099) + which it makes Haskell programs very hard to reason about. +7. **Guidance is better than on-boarding.** + In particular, it should provide guidance regarding possible next steps and + human readable error messages. + +# Textual Representation + +## Encoding + +Enso accepts UTF8 encoded source code. Tabs are disallowed and every tab is +always automatically converted to four spaces. There is no configuration option +provided on purpose. All variables and operators identifiers are restricted to +ASCII characters. Enso libraries should be widely accessible and users cannot +struggle with typing the function names. However, we understand that there are +situations when using Unicode characters is desirable, for example to design a +high level visual library targeting a narrow domain in a particular country. +That's why Enso allows users to specify optional localized names as part of +function documentation and provides a special support for searching them in Enso +Studio. + +We do not plan to support the usage of Unicode characters for operators. +Paraphrasing the +[Idris wiki](https://github.com/idris-lang/Idris-dev/wiki/Unofficial-FAQ#will-there-be-support-for-unicode-characters-for-operators), +which we highly agree with: + +- Unicode operators are hard to type. This is important, as it often disables + the possibility of using someone else's code. Various code editors provide + their users with their own input methods, but we haven't experienced an + efficient UX yet. +- Not every piece of software easily supports it. Unicode does not render + properly on some phone browsers, email clients, or IRC clients to name a few. + All of these can be fixed by the end user, for example by using a different + software. However, it sets a higher barrier to entry to using a programming + language. +- Many Unicode characters look very similar. We had enough trouble with + confusion between 0 and O without worrying about all the different kinds of + colons and brackets. + +Surely, Unicode operators can make the code look pretty, however, proper font +with a well-designed ligatures is able to provide the same, or very similar +results. We are very open to revisit this topic in a few years from now, +however, for now Unicode characters are disallowed in Enso operators. If you +want to help us design an Enso Font, don't hesitate to tell us about it! + +## Layout rules + +The layout rules were designed to be both flexible yet enforce good practices. + +### Maximum Line Length + +The maximum line length is 80 characters. If your code exceeds that, the +compiler will emit warning message about it. There is no way to change this +setting on purpose. Limiting the required editor window width makes it possible +to have several files open side-by-side, and works well when using code review +tools that present the two versions in adjacent columns. The default wrapping in +most tools disrupts the visual structure of the code, making it more difficult +to understand. The limits are chosen to avoid wrapping in editors with the +window width set to 80. + +### Indentation Blocks + +Enso uses indentation to determine the structure of the code. + +In general, every indented line consists a sub-structure of the nearest previous +line with a smaller indentation. We refer to them as child line and parent line, +respectively. There are a few additional layout rules: + +- **Operator on the end of a parent line** + If a line ends with an operator then all of its child lines form a code block. + Code blocks in Enso are a syntactic sugar for a monadic bindings, you will + learn about them in later chapters. The most common usage is a function + definition body after the last arrow operator: + + ```haskell + test = a -> b -> + sum = a + b + print 'The sum is `sum`' + ``` + +* **Operator on the beginning of a child line** + If all the children lines start with operators, they form a single expression, + while the operators behave left associative with the lowest precedence level. + In other words, every line forms a separate expression and the final + expression is build line-by-line, top to bottom. The most common usage is to + use the dot operator to create chained method calls. Please note that the + operator on the beginning of a child line is used after the line expression is + formed, so in the following code both `tst1` and `tst2` have exactly the same + value. + + ```haskell + nums = 1..100 + . each random + . sort + . take 100 + + tst1 = 12 * (1 + 2) + tst2 = 12 + * 1 + 2 + ``` + +* **Otherwise** + In all other cases, every child line is considered to form a separate + expression passed as an argument to the parent expression. The most common + usage is to split long expressions to multiple lines. The following example + uses the named argument mechanism. + + ```haskell + geo1 = sphere (radius = 15) (position = vector 10 0 10) (color = rgb 0 1 0) + geo2 = sphere + radius = 15 + position = vector 10 0 10 + color = rgb 0 1 0 + ``` + +- **Debug line breaker `\\`** + + There is also a special, debug line-break operator `\\` which placed on the + beginning of a child line tells Enso to just glue the line with the previous + one. However, the line-break operator should not be used in production code, + as it's always better to re-structure the code to separate method calls + instead. In the following code, both `debugFunc` and `validFunc` work in the + same way, but the definition of `validFunc` is formatted properly. + + ```haskell + debugFunc = v -> v2 -> + print (v2.normalize * ((v.x * v.x) + (v.y * v.y) + \\ + (v.z * v.z)).sqrt) + + validFunc = v -> v2 -> + len = ((v.x * v.x) + (v.y * v.y) + (v.z * v.z)).sqrt + v' = v2.normalize * len + print v' + ``` + +## Naming Rules + +Naming convention unifies how code is written by different developers, increases +the immediate understanding and allows to provide compiler with useful +information in a very convenient fashion. In particular, pattern matching +requires a way to distinguish between free variables and already declared ones. +Enso uses a simple naming convention to provide such information: + +- You are free to use capitalized and uncapitalized identifiers as function and + variable names. Type definition identifier should always be capitalized. +- Capitalized and uncapitalized identifiers are not distinguishable and always + refer to the same value. +- Using capitalized identifier to refer to uncapitalized one is allowed in + pattern matching only. Using uncapitalized identifier to refer to capitalized + one is disallowed. +- In pattern matching capitalized identifiers refer to values in the scope, + while uncapitalized identifiers are used for free variables only. + +Using upper case letters for constructors in pattern matching has important +benefits. Whenever you see an upper case identifier, you know it is a data +structure being taken apart, which makes it much easier for a human to see what +is going on in a piece of code. Moreover, while using this convention, +construction and pattern matching is as simple as writing the right name and +does not require any magic from the compiler or usage of special symbols. + +# Type System + +Enso is a statically typed language. It means that every variable is tagged with +an information about its possible values. Enso's type system bases on the idea +that each type is denoted by a set of values, called `constructors`. Formally, +this makes the type system a +[Modular Lattice](https://en.wikipedia.org/wiki/Modular_lattice). For an +example, the type `Nat` contains constructors `1, 2, 3, ...`, and is hence +denotable by a set of the possible values. + +As a result, type checking doesn't work via _unification_ as one might expect if +they are familiar with other functional programming languages, but instead +checks if a given set of values is a valid substitution for another. There is, +of course, the empty set `Void`, and a set of all possible values `Any`. + +Each value forms a set with a single member, the value itself. This notion is +supported by an enforced equivalence between value-level and type-level syntax, +as the compiler makes no distinction between the two. This means that it is +perfectly valid to type `7 : 7`. Because we can describe infinite number of sets +containing a particular value, every value in Enso has infinite number of types. +Taking in consideration the lucky number `7`, it is a `Natural` number, +`Integer` number, a `Number`, and also `Any` value at the same time! This +relation could be expressed as follow: + +```haskell +7 : 7 : Natural : Integer : Number : Any : Any : ... +``` + +## Type Signatures + +Enso allows providing explicit type information by using the colon operator. The +compiler considers type signatures as hints and is free to discard them if they +do not provide any new information. However, if the provided hint is incorrect, +an error is reported. + +For example, the following code contains an explicit type signature for the `a` +variable. Although the provided type tells that `a` is either an integer number +or a text, the compiler knows its exact value and is free to use it instead of +the more general type. Thus, no error is reported when the value is incremented +in the next line. + +```haskell +a = 17 : Int | Text +b = a + 1 +print b +``` + +However, if the provided type contains more information than the currently +inferred one, both are merged together. Consider the following example for +reference. + +```haskell +test : Int -> Int -> Int +test = a -> b -> + c = a + b + print c + c +``` + +Without the explicit type signature, the inferred type would be very generic, +allowing the arguments to be of any type as long as it allows for adding the +values and printing them to the screen. The provided type is more specific, so +Enso would allow to provide this function only with integer numbers now. +However, the provided type does not mention the context of the computations. The +compiler knows that `print` uses the `IO` context, so considering the provided +hint, the final inferred type would be +`Int in c1 -> Int in c2 -> Int in IO | c1 | c2`. + +TODO: The above information about contexts could be removed from here as its +pretty advanced. We should just mention that explicit type signatures are hints +everywhere BUT function definitions and new type definitions, where they are +constraining possible values. + +It's worth to note that the type operator is just a regular operator with a very +low precedence and it is defined in the standard library. + +# Variables + +## Bringing Variables to Scope + +The only way to bring new variables into scope is by using pattern matching. +There are two places in the code where pattern matching occurs – on the left +side of assignment operator and on the left side of lambda operator. + +```haskell + = ... + -> ... +``` + +## To Be Described + +- explicit typing +- immutable memory + +# Functions + +Enso is a purely functional programming language. It supports +[first-class and higher-order functions](https://en.wikipedia.org/wiki/Functional_programming#First-class_and_higher-order_functions), +which means that you can pass functions as arguments to other functions, return +them as functions results, assign them to variables, and store them in data +structures. + +## Creating and Using Functions + +Functions are defined in a similar way to variables. The only difference is that +the function name is followed by parameters seperated by spaces. For example, +the following code defines a function taking two values and returning their sum. + +```haskell +sum x y = x + y +``` + +Putting a space between two things in expressions is simply _function +application_. For example, to sum two numbers by using the function defined +above, simply write `sum 1 2`. + +Under the hood, the function definition is translated to a much more primitive +construct, a variable assigned with an expression of nested, unnamed functions, +often referred to as lambdas. In contrast to the function definition, lambda +definition accepts a single argument only: + +```haskell +sum = x -> y -> x + y +``` + +Expressing functions accepting multiple arguments as nested lambda expressions +is a clever trick, often referred to as _curried functions_. What does that +mean? The `sum` function looks like it takes two parameters and returns their +sum. In reality, doing `sum 1 2` first creates a function that takes a parameter +and returns either sum of `1` and that parameter. Then, `2` is applied to that +function and that function produces our desired result. That sounds like a +mouthful but it's actually a really cool concept. The following two calls are +equivalent: + +```haskell +sum 1 2 +(sum 1) 2 +``` + +Functions allow expressing complex logic easily by encapsulating and reusing +common behaviors. The following code defines a sequence of one hundred numbers, +uses each of them to get a new random number, discards everything but the first +10 numbers, and then sorts them. Please note the usage of the `each` function, +which takes an action and a list as arguments, and applies the action to every +element of the list. The `random` returns a pseudo-random number if applied with +a seed value (it always returns the same value for the same seed argument). + +```haskell +list = 1 .. 100 +randomList = each random list +headOfList = head 10 randomList +result = sort headOfList +``` + +## Function Type + +As the function definition translates under the hood to an ordinary variable +assignment, you can use the type expression to provide the compiler with an +additional information about arguments and the result type. In the same fashion +to variables, if no explicit type is provided, the type is assigned with the +value itself: + +```haskell +sum : x -> y -> x + y +sum = x -> y -> x + y +``` + +By using an explicit type, you can narrow the scope of possible values accepted +by the function. For example, the above definition accepts any type which can be +concatenated, like numbers or texts, while the following one accepts numbers +only: + +```haskell +sum : Number -> Number -> Number +sum = x -> y -> x + y +``` + +Each function is assigned with an _arity_. Although you will not often use this +term when writing the code, it's a useful concept used later in this document. +Arity is the number of arguments and lambdas statically used in the function +definition. Note that arity is not deducible from the type. For example, the +function `fn` has the arity of `2` even though its type suggests it takes `3` +arguments: + +```haskell +fn : Bool -> Bool -> Bool -> Bool + fn a = b -> case a && b of + True -> not + False -> id +``` + +## Code Blocks + +You can think of code blocks like about functions without arguments. Code blocks +do not accept arguments, however they can invoke actions when used. Let's just +see how to define and use a code block. The definition is just like a variable +definition, however, there is a new line immediately after the `=` sign. +Consider the following code. It just ask the user about name and stores the +answer in the `name` variable: + +```haskell +print "What is your name?" +name = Console.get +``` + +We can now define a main function, or to be more precise, the main code block: + +```haskell +getName = + print "What is your name?" + Console.get +``` + +In contrast to expression, code blocks are not evaluated immediately. In order +to evaluate the code block, simply refer to it in your code: + +```haskell +greeter = + name = getName + print "It's nice to meet you, #{name}!" +``` + +You may now wonder, what the type of a code block is. The code block `getName` +returns a `Text`, so your first guess may be that it's type is simply +`getName : Text`. Although the compiler is very permissive and will accept this +type signature, the more detailed one is `getName : Text in IO`, or to be really +precise `getName : Text in IO.Read ! IO.ReadError`. A detailed description of +how code blocks work and what this type means will be provided in the chapter +about contexts later in this book. + +There are rare situations when you want to evaluate the code block in place. You +can use the `do` keyword for exactly this purpose. The do function just accepts +a code block, evaluates it and returns its result. An example usage is shown +below: + +```haskell +greeter = + name = do + print "What is your name?" + Console.get + print "It's nice to meet you, #{name}" +``` + +Without the `do` keyword the code block would not be executed and `name` would +refer to the code block itself, not its final value. + +## Uniform Calling Syntax (UCS) + +Enso uses Uniform Calling Syntax which generalizes two function call notations +`lst.map +1` and `map +1 lst`. The generalization assumes flipped argument order +for operators, so `a + b` is equal to `a.+ b`. Paraphrasing Bjarne Stroustrup +and Herb Sutter, having two call syntaxes makes it hard to write generic code. +Libraries authors will either have to support both syntaxes (verbose, +potentially doubling the size of the implementation) or make assumptions about +how objects of certain types are to be invoked (and we may be wrong close to 50% +of the time). + +Each of these notations has advantages but to a user the need to know which +syntax is provided by a library is a bother. Thus implementation concerns can +determine the user interface. We consider that needlessly constraining. + +The following rules apply: + +- Two notations, one semantics. Both notations are equivalent and always resolve + to the same behavior. + +- The expression `base.fn` is a syntactic sugar for `fn (this=base)`. In most + cases, the `this` argument is the last argument to a function, however, + sometimes the argument name could be omitted. Consider the following example + including sample implementation of the concatenation operator: + + ```haskell + >> : (a -> b) -> (b -> c) -> a -> c + >> f g this = g $ f this + + vecLength = map (^2) >> sum >> sqrt + print $ [3,4].vecLength -- Result: 5 + ``` + +Function resolution: + +- Always prefer a member function for both `x.f y` and `f y x` notations. +- Only member functions, current module's functions, and imported functions are + considered to be in scope. Local variable `f` could not be used in the `x.f y` + syntax. +- Selecting the matching function: + 1. Look up the member function. If it exists, select it. + 2. If not, find all functions with the matching name in the current module and + all directly imported modules. These functions are the _candidates_. + 3. Eliminate any candidate `X` for which there is another candidate `Y` whose + `this` argument type is strictly more specific. That is, `Y` self type is a + substitution of `X` self type but not vice versa. + 4. If not all of the remaining candidates have the same self type, the search + fails. + 5. Eliminate any candidate `X` for which there is another candidate `Y` which + type signature is strictly more specific. That is, `Y` type signature is a + substitution of `X` type signature. + 6. If exactly one candidate remains, select it. Otherwise, the search fails. + +For example, the following code results in a compile time error. The self type +`[Int, Int]` is strictly more specific than the type `[a,b]` and thus this +candidate was selected in the step 3 of the algorithm. However, it is impossible +to unify `1` and `Text`. + +```haskell +test = n -> [a,b] -> + [a+n, b+n] + +test : Text -> [Int, Int] -> [Text, Text] +test = s -> [a,b] -> + [s + a.show , s + b.show] + +[1,2].test 1 +``` + +## Operators + +Operators are functions with non alphanumeric names, like `+`, `-` or `*`. +Operators are always provided with two arguments, one on the left, one one the +right side, for example, in order to add two numbers together you can simply +write `1 + 2`. It could be a surprise, but we've been using a lot of operators +so far – a space is a special operator which applies arguments to functions! +Space has a relatively high precedence, higher than any operator, so the code +`max 0 10 + max 0 -10` is equivalent to `(max 0 10) + (max 0 -10)`. Another +interesting operator is the field accessor operator, often referred to as the +dot operator. It is used to access fields of structures. For example, to print +the first coordinate of a point `pt` you can simply write `print pt.x`. However, +please note that the way the accessor function behaves differs from probably +every language you've learned so far. You'll learn more about it in the +following sections. + +Enso gives a lot of flexibility to developers to define custom operators. +Formally, any sequence of the following characters forms an operator +`.!$%&*+-/<>?^~\`. The operator definition is almost the same as function +definition, with an optional precedence relation declaration. Consider the +following definition from the standard library: + +```haskel +@prec [> *, < $] +@assoc left +a ^ n = a * a ^ (n-1) +``` + +The `prec` decorator specifies the +[precedence relation](https://en.wikipedia.org/wiki/Order_of_operations) to +other operators. Here, we specified that the precedence is bigger than the +multiplication operator. The precedences are inherited in Enso, so if the +multiplication operator was provided with information that it has a bigger +precedence than addition, the new operator above will inherit this dependency as +well. The `assoc` decorator defines the +[operator associativity](https://en.wikipedia.org/wiki/Operator_associativity) – +it is either left, right or none. If you do not provide the information, no +precedence relations would be defined and the associativity will default to +left. + +### Precedence + +Operator precedence is a collection of rules that reflect conventions about +which procedures to perform first in order to evaluate a given mathematical +expression. For example, multiplication operator is granted with a higher +precedence than addition operator, which means that multiplication will be +performed before addition in a single expression like `2 + 5 * 10`. + +However, in contrast to most languages, the operator precedence depends on the +fact if a particular operator was surrounded with spaces or not. **The +precedence of any operator not surrounded with spaces is always higher than the +precedence of any operator surrounded with spaces.** For example, the code +`2+5 * 10` results in `70`, not `50`! + +The space-based precedence allows for writing much cleaner code than any other +functional language, including all languages from the ML family, like Haskell, +Agda or Idris. Let's consider the previous example: + +```haskell +list = 1 .. 100 +randomList = each random list +headOfList = head 10 randomList +result = sort headOfList +``` + +It could be easily refactored to a long one-liner: + +```haskell +result = sort (head 10 (each random (1 .. 100))) +``` + +Such expression is arguably much less readable than the original code, as it +does not allow to read in a top-bottom, left-right fashion. However, by using +the Uniform Calling Syntax, we can further transform the code: + +```haskell +result = (((1 .. 100).each random).head 10).sort +``` + +Much better. We can now read the expression from left to right. The result is +still a little bit verbose, as we need to use many nested parentheses. The +space-based precedence combined with the fact that the accessor is just a +regular operator in Enso allow us to throw them away! The rule is simple – the +space operator has higher precedence than any operator surrounded with spaces: + +```haskell +result = 1..100 . each random . head 10 . sort +``` + +### Sections + +Operator section is just a handy way to apply the left or the right argument to +an operator and return a curried function. For example, the expression `(+1)` is +a function accepting a single argument and returning an incremented value. +Incrementing every value in a list is a pure joy when using sections: + +```haskell +list = 1 .. 100 +list2 = list.each (+1) +``` + +Because the space-based precedence applies to sections as well, the above code +may be further simplified to: + +```haskell +list = 1 .. 100 +list2 = list.each +1 +``` + +Another interesting example is using the accessor operator with the section +syntax. The following code creates a list of one hundred spheres with random +positions sorts them based on the first position coordinate. The `.position.x` +is just a section which defines a function taking a parameter and returning its +nested field value. + +```haskell +spheres = 1..100 . each i -> sphere (position = point i.random 0 0) +sortedSpheres = spheres . sortBy .position.x +``` + +## Mixfix Functions + +Mixfix functions are just functions containing multiple sections, like +`if ... then ... else ...`. In Enso, every identifier containing underscores +indicates a mixfix operator. between each section there is always a single +argument and there is a special syntactic sugar for defining mixfix operators. +Consider the implementation of the `if_then_else` function from the standard +library: + +```haskell +if cond _then (ok in m) _else (fail in n) = + case cond of + True -> ok + False -> fail +``` + +For now, please ignore the `in m` and `in n` parts, you will learn about them in +the following chapters. When using mixfix functions, all the layout rules apply +like if every section was a separate operator, so you can write an indented +block of code after each section. Consider the following example, which asks the +user to guess a random number: + +```haskell +main = + print 'Guess the number (1-10)!' + guess = Console.get + target = System.random 1 10 + + if guess == target then print 'You won!' else + print 'The correct answer was #{target}' + answerLoop + + answerLoop = + print 'Do you want to try again? [yes / no]' + answer = Console.get + case answer of + 'yes' -> main + 'no' -> nothing + _ -> + print "I don't understand." + answerLoop +``` + +## Arguments + +### Named Arguments + +Unlike the majority of purely functional programming languages, Enso supports +calling functions by providing arguments by name. Consider a function that +creates a sphere based on the provided radius, position, color and geometry type +(like polygons or +[NURBS](https://en.wikipedia.org/wiki/Non-uniform_rational_B-spline)). All the +arguments are named and can be used explicitly when evaluating the function. + +```haskell +sphere : Number -> Point -> Color -> Geometry.Type +sphere radius position color type = undefined +``` + +Remembering the order of the arguments is cumbersome. Such code is also often +hard to understand and reason about: + +```haskell +s1 = sphere 10 (point 0 0 0) (color.rgb 0.5 0.5 0.5) geometry.NURBS +``` + +By using named arguments, we can transform the code to: + +```haskell +s1 = sphere (radius = 10) (position = point 0 0 0) (color = color.rgb 0.5 0.5 0.5) + (creator = geometry.NURBS) +``` + +By applying the layout rules described above, we can transform the code to a +much more readable form: + +```haskell +s1 = sphere + radius = 10 + position = point 0 0 0 + color = color.rgb 0.5 0.5 0.5 + creator = geometry.NURBS +``` + +### Default Arguments + +Consider the sphere example above again. Providing always all the arguments +manually is both cumbersome and error prone: + +```haskell +s1 = sphere 10 (point 0 0 0) (color.rgb 0.5 0.5 0.5) geometry.NURBS +``` + +Function definition allows providing a default value to some of the arguments. +The value will be automatically applied if not provided explicitly. For example, +the above code is equivalent to: + +```haskell +s1 = sphere 10 +``` + +Informally, when you call a function, Enso will traverse all not provided +arguments in order and will apply the default values unless it founds the first +argument without a default value defined. To disable this behavior, you can use +the special `...` operator. The following code creates a curried function which +accepts radius, color and geometry type and creates a sphere with radius of +placed in the center of the coordinate system: + +```haskell +centeredSphere radius = sphere radius (point 0 0 0) ... +``` + +By using the `...` operator in combination with named arguments, we can make the +code much more readable: + +```haskell +centeredSphere = sphere + position = point 0 0 0 + ... +``` + +### Positional Arguments + +Enso supports so called positional arguments call syntax. Consider the sphere +example above. How can you define a new function which accepts radius, color and +geometry type and returns a sphere always placed in the center of the coordinate +system? There are few ways. First, you can create the function explicitly (you +will learn more about function definition in the following chapters): + +```haskell +originSphere radius color creator = sphere radius (point 0 0 0) color creator +``` + +Alternatively, you can use the positional arguments call syntax: + +```haskell +originSphere = sphere _ (point 0 0 0) _ _ +``` + +Of course, you can combine it with the operator canceling default argument +application: + +```haskell +originSphere = sphere _ (point 0 0 0) ... +``` + +There is an important rule to remember. Enso gathers all positional arguments +inside a particular function body or expression enclosed in parentheses in order +to create a new function, so the following code creates a function accepting two +arguments. It will result the sum of the first argument squared and the second +argument. + +```haskell +squareFirstAndAddSecond = _ ^2 + _ +``` + +### Optional Arguments + +Optional arguments are not a new feature, they are modeled using the default +arguments mechanism. Consider the following implementation of a `read` function, +which reads a text and outputs a value of a particular type: + +```haskell +read : Text -> t -> t +read text this = t.fromText text +``` + +You can use this function by explicitly providing the type information in either +of the following ways: + +```haskell +val1 = read '5' Int +val2 = Int.read '5' +``` + +However, the need to provide the type information manually could be tedious, +especially in context when such information could be inferred, like when passing +`val1` to a function accepting argument of the `Int` type. Let's re-write the +function providing the default value for `this` argument to be ... itself! + +```haskell +read : Text -> (t=t) -> t +read text (this=this) = t.fromText text +``` + +The way it works is really simple. You can provide the argument explicitly, +however, if you don't provide it, it's just assigned to itself, so no new +information is provided to the compiler. If the compiler would not be able to +infer it then, an error would be raised. Now, we are able to use it in all the +following ways: + +```haskell +fn : Int -> Int +fn = id + +val1 = read '5' Int +val2 = Int.read '5' +val3 = read '5' +fn val3 +``` + +Enso provides a syntactic sugar for the `t=t` syntax. The above code can be +written in a much nicer way as: + +```haskell +read : Text -> t? -> t +read text this? = t.fromText text +``` + +### Splats Arguments + +Enso provides both args and kwargs splats arguments. You can easily construct +functions accepting variable number of both positional as well as keyword +arguments. Splats arguments are an amazing utility mostly for creating very +expressive EDSLs. Consider the following function which just prints arguments, +each in a separate line: + +```haskell +multiPring : args... -> Nothing in IO +multiPrint = args... -> args.each case + Simple val -> print val + Keyword key val -> print '#{key} = #{val}' + +multiPrint 1 2 (a=3) 4 +``` + +```haskell +--- Results --- +1 +2 +a = 3 +4 +``` + +## Variable Scoping + +Type variables, function variables, and lambda variables live in the same space +in Enso. Moreover, as there is no distinction between types and values, you can +use the same names both on type level as well as on value level because they +refer to the same information in the end: + +```haskell +mergeAndMap : f -> lst1 -> lst2 -> out +mergeAndMap = f -> lst1 -> lst2 -> (lst1 + lst2) . map f +``` + +If different names are used on type level and on a value level to refer a +variable, it's natural to think that this variable could be pointed just by two +separate names. The following code is valid as well: + +```haskell +mergeAndMap = (f : a -> b) -> (lst1 : List a) -> (lst2 : List a) -> (lst1 + lst2) . map f +``` + +Which, could also be distributed across separate lines as: + +```haskell +mergeAndMap : (a -> b) -> List a -> List a -> List b +mergeAndMap f lst1 lst2 = (lst1 + lst2) . map f +``` + +An interesting pattern can be observed in the code above. Taking in +consideration that every value and type level expressions have always the same +syntax, the lambda `(a -> b) -> List a -> List a -> List b` could not have much +sense at the first glance, as there are three pattern matches shadowing the +variable `a`. So how does it work? There is an important shadowing rule. **If in +a chain of lambdas the same name was used in several pattern matches, the names +are unified and have to be provided with the same value.** The chain of lambdas +could be broken with any expression or code block. + +This rule could not make a lot of sense in the value level, as if you define +function `add a a = a + a` you will be allowed to evaluate it as `add 2 2`, but +not as `add 2 3`, however, you should not try to use the same names when +defining functions on value level nevertheless. + +By applying this rule, the type of the above example makes much more sense now. +Especially, when evaluated as `mergeAndMap show [1,2] ['a','b']`, the variables +will be consequently instantiated as `a = Number | Text`, and `b = Text`, or to +be very precise, `a = 1|2|'a'|'b'`, and `b = '1'|'2'|'a'|'b'`. + +Because type variables are accessible in the scope of a function, it's +straightforward to define a polymorphic variable, which value will be an empty +value for the expected type: + +```haskell +empty : t +empty = t.empty + +print (empty : List Int) -- Result: [] +print (empty : Text) -- Result: '' +``` + +### Type Applications + +All libraries that sometimes need passing an explicit type from user should be +designed as the `read` utility, so you can optionally pass the type if you want +to, while it defaults to the inference otherwise. However, sometimes it's handy +to just ad-hoc refine a type of a particular function, often for the debugging +purposes. Luna allows to both apply values by name as well as refining types by +name. The syntax is very similar, consider this simple function: + +```haskell +checkLength : this -> Bool +checkLength this = + isZero = this.length == 0 + if isZero + then print "Oh, no!" + else print "It's OK!" + isZero +``` + +This function works on any type which has a method `length` returning a number. +We can easily create a function with exactly the same functionality but it's +input type restricted to accept lists only: + +```haskell +checkListLength = checkLength (this := List a) +``` + +As stated earlier, in most cases there is a nicer way of expressing such logic. +In this case, it would be just to create a function with an explicit type. The +following code is equivalent to the previous one: + +```haskell +checkListLength : List a -> Bool +checkListLength = checkLength +``` + +### Open Questions + +- Do we want to support explicit signatures for the following use case? The + function `f` is applied with two named arguments, but we do not know their + ordering. We only know that the function accepts at least `3` arguments and + that the 3rd argument can be `7`. + + ```haskell + test : a -> b -> (<> -> 7 -> x) -> x + test a b f = f (x=a) (y=b) 7 + ``` + +- Should this code work (double `b` name)? If so, what is the type signature + containing names? + + ```haskell + fn1 c b d = a + 1 + fn2 a b = fn1 + + value = fn2 + a = 1 + b = 2 + c = 3 + b = 4 + d = 5 + ``` + +## + +# Data Types + +## Constructor Types + +Constructors define the most primitive way to construct a type, it's where they +name comes from. Formally, they are +[product types](https://en.wikipedia.org/wiki/Product_type). Their fields are +always named and fully polymorphic (each field has a distinct polymorphic type). +Constructors are distinguishable. You are not allowed to pass an constructor to +a function accepting other constructor, even if their fields are named the same +way. + +```haskell +type Vec3 x y z +type Point3 x y z + +vec1 = Vec3 1 2 3 : Vec3 1 2 3 : Vec3 Int Int Int +pt1 = Point3 1 2 3 : Point3 1 2 3 : Point3 Int Int Int + +test : Vec3 Int Int Int -> Int +test v = v.x + v.y + v.z + +test pt1 -- Compile time error. Expected Vec3, got Point3. +``` + +## Algebraic Data Types + +Enso allows you to define new types by combining existing ones into so called +[algebraic data types](https://en.wikipedia.org/wiki/Algebraic_data_type). There +are several algebraic operations on types available: + +- **Intersection** + A type intersection combines multiple types into one type that has all the + features combined. For example, `Serializable & Showable` describes values + that provide mechanisms for both serialization and printing. + +- **Difference** + A type difference combines multiple types into one type that has all the + features of the first type but not the features of the second one. For + example, `Int \ Negative` describes all positive integer values or zero. + +- **Union** + A type union combines multiple types into one type that describes a value + being of one of the types. For example, `Int | String` describes values that + are either `Int` or `String`. + +```haskell +type Just value +type Nothing +maybe a = just a | nothing + +map : (a -> b) -> Maybe a -> Maybe b +map f = case of + Just a -> Just (f a) + Nothing -> Nothing +``` + +### Syntax sugar + +Enso provides a syntactic sugar for easy definition of algebraic data types and +related methods. You are always required to provide explicit name for all the +constructors and all its fields. + +```haskell +type Maybe a + Just value:a + Nothing + + map : (a -> b) -> Maybe b + map f = case this + Just a -> Just (f a) + Nothing -> Nothing +``` + +Please note, that all functions defined in the type definition scope are +desugared to global functions operating on that type. However, all functions +defined as constructor field are considered to be record fields. They can be +provided with a default implementation and their definition can be changed in +runtime. + +### To Be Described + +```haskell +-- Difference between method and a function component +type Foo + MkFoo + function : Int -> self + function = default implementation + + method : Int -> self + method = implementation +``` + +## Data Types as Values + +```haskell +sum : a -> b -> a + b +sum = a -> b -> a + b + +lessThan : a -> b -> a < b +lessThan = a -> b -> a < b + +main = + print $ sum 1 2 -- 3 + print $ sum Int Int -- Int + print $ lessThan 1 2 -- True + print $ lessThan Int Int -- Bool + print $ lessThan 0 Int -- Bool + print $ lessThan -1 Natural -- True +``` + +Please note, that `lessThan -1 Natural` returns `True`, which is just more +specific than `Bool` because it holds true for every natural number. +<<<<<<< HEAD +======= + +## Interfaces + +- **TO BE DONE [WD - research]** + +## Field Modifiers + +You can add the equal sign `=` as an operator suffix to transform it into a +modifier. Modifiers allow updating nested structures fields. + +In general, the following expressions are equivalent. The `+` is used as an +example and can be freely replaced with any other operator: + +```haskell +foo' = foo.bar += t +-- <=> +bar' = foo.bar +bar'' = t + bar' +foo' = foo.bar = bar'' +``` + +Please note the inversed order in the `t + bar` application. In most cases it +does not change anything, however, it simplifies the usage of such modifiers as +`foo.bar $= f` in order to modify a nested field with an `f` function. + +Examples: + +```haskell +type Vector + V3 x:Number y:Number z:Number + +type Sphere + MkSphere + radius : Number + position : Vector + +-- Position modification +s1 = MkSphere 10 (V3 0 0 0) +s2 = s1.position.x += 1 + +-- Which could be also expressed as +p1 = s1.position +p2 = p1.x += 1 +s2 = s1.position = p2 + +-- Or as a curried modification +s2 = s1.position.x $= +1 +``` + +## Prisms + +Alternative map implementations: + +```haskell +type Shape a + Circle + radius:a + Rectangle + width:a + height:a + +map1 : (a -> b) -> Shape a -> Shape b +map1 f self = case self of + Circle r -> Circle (f r) + Rectangle w h -> Rectangle (f w) (f h) + +map2 : (a -> b) -> Shape a -> Shape b +map2 f self = self + ? radius $= f + ? width $= f + ? height $= f + +map3 : (a -> b) -> Shape a -> Shape b +map3 f self = if self.is Circle + then self . radius $= f + else self . width $= f + . height $= f + +map4 : (a -> b) -> Shape a -> Shape b +map4 f self = + maybeNewCircle = self.circle.radius $= f + maybeNewRectangle = self.rectangle.[width,height] $= f + case maybeNewCircle of + Just a -> a + Nothing -> case maybeNewRectangle of + Just a -> a + Nothing -> error "impossible" +``` +>>>>>>> syntax-docs + +# Refinement Types + +### Ordered Lists + +Sometimes, it's desired to prove some structure behaviors, like the fact that a +list contains sorted values. Enso allows expressing such constraints in a simple +way. They are often called behavioral types, as they describe the behavior to be +checked. First, let's consider a simple List implementation and see how we can +create a refined type using the high level interface: + +```haskell +type List elems + Empty + Cons + head : elems + tail : List elems + +ordered = refined lst -> + if lst is empty + then true + else lst.head < lst.tail.elems + && isOrdered lst.tail +``` + +That's it! Now we can use it like this: + +```haskell +lst1 = [] : Ordered List Int -- OK +lst1 = [1,2,3] : Ordered List Int -- OK +lst1 = [3,2,1] : Ordered List Int -- ERROR +``` + +#### Under the Hood + +Let's understand how the above example works. First, let's implement it in an +inextendible way, just as a data type which cannot be used for other purpose: + +```haskell +data OrderedList elems + Empty + Cons + head : elems + tail : OrderedList (elems & Refinement (> this.head)) +``` + +The implementation is almost the same, however, the type of the `tail` is much +more interesting. It's an intersection of `elems` and a `Refinement` type. A +refinement type defines a set of values matching the provided requirement. Here, +values in `tail` have to be a subtype of `elems` and also have to be bigger than +the `head` element. Alternatively, you could express the type as: + +```haskell +data OrderedList elems + Empty + Cons + head : elems + tail : OrderedList (t:elems & if t > this.head then t else Void) +``` + +In both cases, we are using functions applied with type sets. For example, +`this.head` may resolve to a specific negative number while `t` may resolve to +any natural one. + +Let's extract the `isOrdered` function from the original example. The function +takes a list as an argument and checks if all of its elements are in an +ascending order. It's worth noting that Enso allows accessing the named type +variable parameters like `lst.tail.elems`. Moreover, let's define a helper +function `refine`: + +```haskell +isOrdered : List elems -> Bool +isOrdered lst = + if lst is Empty + then true + else lst.head < lst.tail.elems + && isOrdered lst.tail + +refine f = $ Refinement f +``` + +Having this function, we could now use it like: + +```haskell +lst1 = [] : Refine IsOrdered (List Int) -- OK +lst1 = [1,2,3] : Refine IsOrdered (List Int) -- OK +lst1 = [3,2,1] : Refine IsOrdered (List Int) -- ERROR +``` + +We can now define an alias `ordered = refine isOrdered`, however it would have +to be used like `Ordered (List Int)`, but in the first example we've been using +it like `Ordered List Int`. It was possible because there is a very special +function defined in the standard library: + +```haskell +applyToResult f tgt = case tgt of + (_ -> _) -> applyToResult << tgt + _ -> f tgt + +refined = applyToResult << refine +``` + +The `applyToResult` function is very simple, although, from the first sight it +may look strange. It just takes a function `f` and an argument and if the +argument was not a function, then it applies `f` to it. If the argument was a +function, it just skips it and does the same to the result of the function. Now, +we can define the `refined` function which we used on the beginning as: + +```haskell +refined = applyToResult << refine +``` + +It can be used either as shown in the original example or on the result of the +type expression directly: + +```haskell +ordered = refined isOrdered +lst1 = [] : Ordered (List Int) -- OK +lst1 = [1,2,3] : Ordered (List Int) -- OK +lst1 = [3,2,1] : Ordered (List Int) -- ERROR +``` + +# Type Inference + +Because every value belongs to infinite number of types, it's not always obvious +what type to infer by looking only at the variable definitions. The expression +`fib 10` could be typed as `55`, `Int` or `Any`, `Int`, to mention a few. The +way we type it depends on two factors: + +- **The optimizations we want to perform** + The performance implications are obvious. By computing the value during + compilation, we do not have to compute it during runtime anymore. On the other + side, compile time function evaluation is often costly, so such optimization + opportunities should be always chosen carefully. + +- **The information we need to proof the corectness of the program** + In a case we drop the results, like `print $ const 10 (fib 10)`, it's + completely OK to stop the type checking process on assuming that the type of + `fib 10` is just any type, or to be more precise, a `fib 10` itself. Its value + is always discarded and we do not need anymore information to prove that the + type flow is correct. However, if the result of `fib 10` would be passed to a + function accepting only numbers smaller than `100`, the value have to be + computed during compilation time. + +## Explicit type signatures + +Enso was designed in a way to minimize the need for explicit type signatures. +However, you are always free to provide one to check your assumptions regarding +the types. There are two major ways explicit type signatures are used in Enso: + +- **Explicit type constraints** + Explicit type signatures in type and function definitions constrain the + possible value set. For example, you will not be allowed to pass a text to a + function provided with an explicit type `fn : Int -> Int`. + +- **Explicit type checks** + Explicit type signatures in other places in the code are used as type checks. + If you type your variable as `Number` it does not mean that enso will forget + about other information inferred so far. It will always check if the signature + is correct and report an error in case it's not. For example, the following + code will type check correctly. + + ```haskell + dayNumber = 1 | ... | 7 + printDay : DayNumber -> Nothing + printDay = print + + myDay = 1 : Number + printDay myDay + ``` + +**Example 1** + +```haskell +square : (Text -> Text) | (Number -> Number) +square val = case val of + Text -> 'squared #{val}' + Number -> val * val + +action f a b = print 'The results are #{f a} and #{f b}' + +main = action square "10" 10 +``` + +**Example 2** + +```haskell +foo : Number -> Text | Integer +foo = if x < 10 then "test" else 16 + +fn1 : Text | Number -> Number +fn1 = ... + +fn2 : Text | Vector Number -> Number +fn2 = ... + +fn3 : 16 -> 17 +fn3 = +1 + +main = + val = foo 12 + fn1 val -- OK + fn2 val -- ERROR + fn3 val -- OK +``` + +### Simplified Type Signatures + +Types in Enso can be expressed in a very detailed form. Consider an `open` +function, which reads a file from disc. It's type could be expressed as: + +```haskell +open : FilePath -> Text ! FileReadError in IO +``` + +The are two important operators used here. The first one is the `!` operator, +which just means that instead of this value, we can get an error. The second one +is the `in` operator, which tells + +```haskell +openReadAndCompare + : FilePath -> Bool ! (IOError | ConversionError) in IO & State Int +openReadAndCompare path = + currentNumber = State.get + contents = open path + convertedNum = contents.as Int + convertedNum < currentNumber +``` + +## Record Types + +```haskell + +Point = {x: Number, y: Number, z: Number} + +type Point + x: Number + y: Number + z: Number + +``` + +Pattern matching works in a structural manner. The same applies to `|`, +`&`, etc. + +# ==== TO BE DESCRIBED NICER ==== + +# Monadic arguments + +Before evaluating a function, monads of all arguments are applied to host +function, so arguments are passed as `in Pure`. Why? Consider: + +```haskell +foo a = + if a == "hi" then print "hello" + if a == "no" then print "why?" + +main = + foo $ read "test.txt" +``` + +We've got here `read : Text -> Text in IO ! IO.Error`, but when evaluating +`foo`, the `a` argument is assigned with `Text in Pure`, because `IO` was merged +into main before passing the argument. Otherwise, the file would be read twice +(!) in the body of foo. + +Very rarely it is desirable to postpone the monad merging and just pass the +arguments in monads "as is". Example: + +```haskell +main = + a = ... + if a then read "a.txt" else read "b.txt" +``` + +You don't want to read both files, that's why these monads sohuld not be +unpacked with `if_then_else`. Thats why its definition is + +```haskell +if cond _then (ok in m) _else (fail in n) = + case cond of + True -> ok + False -> fail +``` + +If you don't provide the explicit `in m` and `in n`, the args are considered to +be `in Pure` + +# How `=` works + +Consider: + +```haskell +test = + body + a = f + out +``` + +Assume: + +```haskell +f : F in FM2 in FM1 +``` + +Then: + +```haskell +a : F in FM2 in Pure +body : _ in BM +test : out in FM1 & BM +``` + +Basically `=` transforms right side to left side like +`(right : R in RM2 in RM1) -> (left : R in RM2 in Pure)`, and it merges `RM1` +with host monad. + +# The Dynamic Type + +When calling a foreign python we get the result typed as `Dynamic`. Basically, +values typed as `Dynamic` work just like in Python. You can access their fields +/ methods by string, you can add or remove fields, and you always get the +`Dynamic` as result. Every operation on `Dynamic` results in `a ! DynamicError`. + +Everything that is possible to express on the `Dynamic` type should be possible +to be expressed using normal data types (see the "Dynamic access" chapter +above). + +There is an important change to how UCS works with dynamic types, namely, the +dot syntax always means the field access. + +```haskell +num = untypedNumberFromPythonCode +num2 = num + 1 -- : Dynamic ! DynamicError +num3 = num2 catch case + DynamicError -> 0 +-- num3 : Dynamic +num4 = num3 - 1 -- : Dynamic ! DynamicError + +``` + +```haskell +obj.__model__ = + { atom : Text + , dict : Map Text Any + , info : + { doc : Text + , name : Text + , code : Text + , loc : Location + } + , arg : -- used only when calling like a function + { doc : Text + , default : Maybe Any + } + } +``` + +## Dynamic access + +Even typed data in Enso behaves like it was fully dynamic. You can access the +field dictionary of each object and alter it. It's amazing for type level +programming, as you could be able to generate types by defining their +dictionaries during "module compilation time". To be described – how to do it – +type is just a named record, which is like a dictionary. + +Basically, every property of object (let them behave like classes, modules or +interfaces) should be accessible and extendible in such way. + +```haskell +class Point a + P3 x:a y:a z:a + fnfield : this + fnfield = P3 this.x this.x this.x + + length : a + length = this.x^2 + this.y^2 + this.z^2 . sqrt + +p1 = P3 1 2 3 +print $ p1.fields -- +f1 = p1.fields.get "fnfield" -- V3 a b c -> V3 a a a +print $ f1 p1 -- V3 1 1 1 +p2 = p1.fields.set "fnfield" $ p -> V3 p.y 0 p.y +print $ p2.fnfield -- V3 2 0 2 + +p3 = p1.fields.set "tupleFields" $ p -> [p.x, p.y, p.z] +print $ typeOf p3 -- P3 1 2 3 & {tupleFields: [this.x, this.y, this.z]} +print p3.tupleFields -- [1,2,3] +p4 = p3.tupleFields = [7,8,9] +print p4 -- P3 7 8 9 + +-- What if the name is not known at compilation time? +name : Text +field1 = p1.fields.get name -- field1 : Dynamic +``` + +# Lists + +Lists in Luna are defined as follow: + +```haskell +type List a + Cons value:a tail:(List a) + End +``` + +And can be used like: + +```haskell +lst1 = List.Cons 1 (List.Cons "foo" List.End) + : List.Cons 1 (List.Cons "foo" List.End) + : List (Int | String) +lst2 = [1,"foo"] : [1,"foo"] : List (Int | String) +``` + +# Proving the Software Correctness + +**Note [To be included somewhere]**: Enso is dependently typed because we can +run arbitrary code on type-level. + +**So, what are dependent types?** Dependent types are types expressed in terms +of data, explicitly relating their inhabitants to that data. As such, they +enable you to express more of what matters about data. While conventional type +systems allow us to validate our programs with respect to a fixed set of +criteria, dependent types are much more flexible, they realize a continuum of +precision from the basic assertions we are used to expect from types up to a +complete specification of the program’s behaviour. It is the programmer’s choice +to what degree he wants to exploit the expressiveness of such a powerful type +discipline. While the price for formally certified software may be high, it is +good to know that we can pay it in installments and that we are free to decide +how far we want to go. Dependent types reduce certification to type checking, +hence they provide a means to convince others that the assertions we make about +our programs are correct. Dependently typed programs are, by their nature, proof +carrying code. + +**If dependent types are so great, why they are not used widely?** Basically, +there are two problems. First, there is a small set of languages allowing for +dependent types, like Agda or Idris. Second, both writing as well as using +dependently typed code is significantly harder than a code using conventional +type system. The second problem is even bigger because it stands in a way to +easily refactor the code base and keep it in a good shape. + +**I've heard that dependent type system in Enso is different, how?** The Enso +type system provides a novel approach to dependent types. It allows to just +write simple code and in many cases provides the dependent type system benefits +for free! + +## Power and Simplicity + +Consider the following code snippets in Idris. This is a simple, but not very +robust implementation of List. If you try to get the head element of an empty +list, you'll get the runtime error and there is no way to prevent the developer +from using it by mistake: + +```Haskell +----------------------- +--- LANGUAGE: IDRIS --- +----------------------- + +data List elem + = Cons elem (List elem) + | Empty + +index : Int -> List a -> a +index 0 (Cons x xs) = x +index i (Cons x xs) = index (i-1) xs + +main : IO () +main = do + let lst1 : List String = (Cons "Hello!" Nil) + let lst2 : List String = Nil + print $ index 0 lst1 + print $ index 0 lst2 +``` + +```haskell +--- Runtime Output --- +Hello! +*** test.idr:18:23:unmatched case in Main.index *** +``` + +The above program crashed in the middle of execution. Such mistakes as the +possibility of the index to be out of bounds are very hard to catch and most of +programming languages do not provide a standard, easy mechanism to prevent them +from happening. Let's improve the situation and use the power of dependent types +to keep the information about the length of the list visible to the compiler: + +```haskell +----------------------- +--- LANGUAGE: IDRIS --- +----------------------- + +data List : (len : Nat) -> (elem : Type) -> Type where + Cons : (x : elem) -> (xs : List len elem) -> List (S len) elem + Empty : List Z elem + +index : Fin len -> Vect len elem -> elem +index FZ (Cons x xs) = x +index (FS k) (Cons x xs) = index k xs + +main : IO () +main = do + let lst1 : List 1 String = Cons "hello" Empty + let lst2 : List 0 String = Empty + print $ index 0 lst1 + print $ index 0 lst2 +``` + +```haskell +--- Compilation Error --- +test.idr:18:21: +When elaborating right hand side of main: +When elaborating argument prf to function Data.Fin.fromInteger: + When using 0 as a literal for a Fin 0 + 0 is not strictly less than 0 +``` + +This time the error was catched by the compiler, however, both the +implementation as well as the library interface are much more complex now. + +Let's now write the same implementation in Luna: + +```haskell +---------------------- +--- LANGUAGE: ENSO --- +---------------------- + +type List a + Cons value:a tail:a + Empty + +index : Natural -> List a -> a +index = case + 0 -> value + i -> tail >> index (i-1) + +main = + lst1 = Cons "hello" Empty + lst2 = Empty + print $ index 0 lst1 + print $ index 0 lst2 +``` + +```haskell +--- Compilation Error --- +Error in test.enso at line 18: + The field Empty.tail is not defined. + Arising from ... +``` + +Although the Enso implementation is over 15% shorter that the insecure Idris +implementation and over 50% shorter than the secure implementation, it provides +the same robustness as the secure Idris implementation. Moreover, the user +facing interface is kept simple, without information provided explicitly for the +compiler. + +## Another Example + +```haskell +----------------------- +--- LANGUAGE: IDRIS --- +----------------------- + +import Data.So + +countOcc : Eq a => a -> List a -> Nat +countOcc x xs = length (findIndices ((==) x) xs) + +validate : String -> Bool +validate x = let + containsOneAt = (countOcc '@' (unpack x)) == 1 + atNotAtStart = not (isPrefixOf "@" x) + atNotAtEnd = not (isSuffixOf "@" x) + in containsOneAt && atNotAtStart && atNotAtEnd + +data Email : Type where + MkEmail : (s : String) -> {auto p : So (validate s)} -> Email + +implicit emailString : (e : Email) -> String +emailString (MkEmail s) = s + +main : IO () +main = do + maybeEmail <- getLine + + case choose (validate maybeEmail) of + Left _ => putStrLn ("Your email: " ++ (MkEmail maybeEmail)) + Right _ => putStrLn "No email." +``` + +```haskell +---------------------- +--- LANGUAGE: ENSO --- +---------------------- + +isValid : String -> Bool +isValid address + = address.count '@' == 1 + && not $ address.startWith '@' + && not $ address.endsWith '@' + +type Email + Data address : Refine IsValid Text + +main = + mail = Email.Data Console.get + if mail.error + then 'Not a valid address.' + else print mail +``` + +## Types Resolution + +The natural next question is, how it was possible to get such a drastic quality +improvement? As already mentioned, dependent types are types expressed in terms +of data, explicitly relating their inhabitants to that data. Enso atom types +make it possible to expose all data structures to the compiler automatically, so +they can be statically analyzed. There is no need to explicitly provide some +selected data to the compiler, as it has access to every structural information +by design. + +Let's describe where the compiler gets the required information from. Please +note, that the following description is shown for illustration purposes only and +do not represent the real compilation algorithm. First, lets focus on the +definition of the `index` function: + +```haskell +index : Natural -> List a -> a +index = case + 0 -> value + i -> tail >> index (i-1) +``` + +Without using currying and after applying the Uniform Syntax Call, we can write +it's more explicit form: + +```haskell +index : Natural -> List a -> a +index i lst = case i of + 0 -> lst.value + i -> index (i-1) lst.tail +``` + +Let's break the function apart: + +```haskell +index_1 : 0 -> List a -> a +index_1 0 lst = lst.value + +index_2 : ((j:Natural) + 1) -> List a -> a +index_2 i lst = index (i-1) lst.tail + +index : Natural -> List a -> a +index i = case i of + 0 -> index_1 i + i -> index_2 i +``` + +Based on the provided information, including the fact that the `value` and +`tail` fields are defined only for the `Cons` atom, we can further refine the +types of `index_1` and `index_2`: + +```haskell +index_1 : 0 -> Cons t1 (List t2) -> t1 +index_2 : ((j:Natural) + 1) -> Cons t1 (List t2) -> t1 +``` + +Please note that the type `a` was refined to `t1 | t2`. We can now infer a much +more precise type of `index`, which makes it obvious why the code was incorrect. + +```haskell +index : Natural -> Cons t1 (List t2) -> t1 +``` + +A similar, but a little more complex case applies if we try to access a nested +element. We leave this exercise to the reader. + +### Bigger Example (to be finished) + +```haskell +type List a + Cons value:a tail:(List a) + End + +head : Cons a (List b) -> a +head = value + +last : Cons a (List a) -> a +last = case + Cons a End -> a + Cons a tail -> last tail + +init : Cons a (List a) -> List a +init = case + Cons a End -> End + Cons a (Cons b tail) -> Cons a $ init (Cons b tail) + +index :: Natural.range lst.length -> lst +``` + +## Autolifting functions to types + +```haskell +-- Consider +fn : Int -> Int -> Int +fn = a -> b -> a + b + +-- If we provide it with 1 and 2 then +fn 1 2 : fn 1 2 : 3 + +-- Howevere this is true as well +fn 1 2 : fn Int Int : Int + +-- Please note that 1:Int AND Int:Int +-- It means that functions can always be provided with type-sets and return type sets, so +fn Int Int -- returns Int +``` + +# Function composition + +```haskell +sumIncremented1 = map +1 >> fold (+) +sumIncremented2 = fold (+) << map +1 +``` + +However, the following is preferred: + +```haskell +sumIncremented1 = . map +1 . fold (+) +``` + +# Lazy / Strict + +```haskell +if_then_else :: Bool -> Lazy a in n -> Lazy a in m -> a in n | m +if cond _then ok _else fail = + case cond of + True -> ok + False -> fail + +test cond = if_then_else cond -- The arguments are still lazy and accept monads +test cond ok fail = if_then_else cond ok fail -- The arguments are strict and does not accept monads +``` + +**TODO:** ARA + WD - check with bigger examples if this really holds. +Alternatively we can think of `Lazy a` as a part of the `a` parameter, which +should not be dropped. WD feels it needs to be re-considered. + +# Context Defaults + +- Function arguments default to `in Pure` if not provided with an explicit type. +- Function results and variables default to `in m` if not provided with an + explicit type. + +For example: + +```haskell +test a b = ... +``` + +Has the inferred type of + +```haskell +test : a in Pure -> b in Pure -> out in m +``` + +Thus if used like + +```haskell +test (print 1) (print 2) +``` + +The prints will be evaluated before their results are passed to `test`. However, +when provided with explicit signature: + +```haskell +test2 : a in m -> b in n -> out in o +test2 a b = ... +``` + +Then the evaluation + +```haskell +test2 (print 1) (print 2) +``` + +Will pass both arguments as "actions" and their evaluation depends on the +`test2` body definition. + +# Type Based Implementations + +```haskell +default : a +default = a . default +``` + +# Explicit Types And Subtyping + +When explicit type is provided, the value is checked to be the subtype of the +provided type, so all the following lines are correct: + +```haskell +a = 1 +a : 1 +a : Natural +a : Integer +a : Number +a : Type +``` + +The same applies to functions – the inferred signature needs to be a subtype of +the provided one. However, the intuiting of what a subtype of a function is +could not be obvious, so lets describe it better. Consider a function `foo`: + +```haskell +foo : (Natural -> Int) -> String +``` + +From definition, we can provide it with any value, which type is the subtype of +`Natural -> Int`. This argument needs to handle all possible values of `Natural` +as an input. Moreover, we know that `foo` assumes that the result of the +argument is any value from the set `Int`, so we cannot provide a function with a +broader result, cause it may make `foo` ill-working (for example if it pattern +matches on the result inside). So the following holds: + +```haskell +(Natural -> Natural) : (Natural -> Int) +``` + +Please note, that we can provide a function accepting broader set of arguments, +so this holds as well: + +```haskell +(Int -> Natural) : (Natural -> Natural) +``` + +So, this holds as well: + +```haskell +(Int -> Natural) : (Natural -> Int) +``` + +(todo: describe variants and contravariants better here). + +Consider the following, more complex example: + +```haskell +add a name = + b = open name . to Int + result = (a + b).show + print result + result +``` + +This function works on any type which implements the `+` method, like `String`, +however, we can narrow it down by providing explicit type signature: + +```haskell +add : Natural in Pure -> Text in Pure -> String in IO ! IO.ReadError +add = ... + +addAlias = add +``` + +Now we can create an alias to this function and provide explicit type signature +as well. As long as the `add` signature will be the subtype of `addAlias` +signature, it will be accepted. First, we can skip the explicit error mention: + +```haskell +addAlias : Natural in Pure -> Text in Pure -> String in IO +``` + +Next, we can skip the explicit contexts, because the contexts of arguments +default to `Pure` while the context of the result does not have any restrictions +by default so will be correctly inferred: + +```haskell +addAlias : Natural -> Text -> String +``` + +We can also type the whole function using any broader type. In order to +understand what a subtype of a function, visualize it's transformation as arrows +between categories. The above function takes any value from a set `Natural` and +set `Text` and transforms it to some value in set `String`. We can use any wider +type instead: + +```haskell +addAlias : Natural -> Text -> Type +``` + +However, please note that the following will be not accepted: + +```haskell +addAlias : Int -> Type -> Type -- WRONG! +``` + +# Underscore in Pattern Matching + +`const a _ = a` behaves differently than underscore in expressions (implicit +lambda). + +# Type Holes + +```haskell +a :: ?? +``` + +Creates a type hole, which will be reported by the compiler. Describe the +programming with type holes model. A good reference: http://hazel.org + +# Mutable Fields (FIXME) + +```haskell +type Graph a + Node + inputs : List (Mutable (Graph a)) + value : a + +-- THIS MAY BE WRONG, we need to have semantics how to assign mutable vars to mutable vars to crete mutual refs and also pure vars to create new refs +n1 = Node [n2] 1 +n2 = Node [n1] 2 +``` + +# Other Things To Be Described + +- Implicit conversions + +- modules and imports (from the deprecated section) + +- Using and creating Monads, example State implementation (Monad = always + transformer, on the bottom Pure or IO) + +- IO should be more precise, like `IO.Read` or `IO.Wrtie`, while `IO.Read : IO` + +- Constrained types (like all numbers bigger that `10`) + +- Errors and the catch construct like + + ```haskell + num3 = num2 catch case + DynamicError -> 0 + ``` + +- Catching Errors when not catched explicitly – important for correctness + +- Type-level / meta programming – like taking an interface and returning + interface with more generic types (move a lot of examples from TypeScript + docs) + +- Question – should it be accessed like `End` or like `List.End` ? The later is + rather better! If so, we need to make changes across the whole doc! + + ```haskell + type List a + Cons a (List a) + End + ``` + +- monadfix + +- implementing custom contexts (monads). Including example how to implement a + "check monad" which have lines checking dataframes for errors. + +### + +# ==== DEPRECATED (Useful parts) ==== + +# Types + +## Types. Unified Classes, Modules and Interfaces + +Enso unifies the abstraction of classes, modules and interfaces under a single +first-class umbrella. All of the following functionalities are provided by the +`type` keyword, resulting in a highly flexible language construct: + +- **Classes.** Types provide containers for data and associated behavior. +- **Modules.** Types provide namespacing for code and data. +- **Interfaces.** Types provide behavior description required of a type. + +At a fundamental level, the definition of a new `type` in Enso is the creation +of a (usually named) category of values described by the data and behavior it +possesses. These are first-class values in Enso, and can be created and +manipulated at runtime. + +## Type Signatures + +Enso allows providing explicit type information by using the colon operator. The +compiler considers type signatures as hints and is free to discard them if they +do not provide any new information. However, if the provided hint is incorrect, +an error is reported. + +For example, the following code contains an explicit type signature for the `a` +variable. Although the provided type tells that `a` is either an integer number +or a text, the compiler knows its exact value and is free to use it instead of +the more general type. Thus, no error is reported when the value is incremented +in the next line. + +```haskell +a = 17 : Int | Text +b = a + 1 +print b +``` + +However, if the provided type contains more information than the currently +inferred one, both are merged together. Consider the following example for +reference. + +```haskell +test : Int -> Int -> Int +test = a -> b -> + c = a + b + print c + c +``` + +Without the explicit type signature, the inferred type would be very generic, +allowing the arguments to be of any type as long as it allows for adding the +values and printing them to the screen. The provided type is more specific, so +Enso would allow to provide this function only with integer numbers now. +However, the provided type does not mention the context of the computations. The +compiler knows that `print` uses the `IO` context, so considering the provided +hint, the final inferred type would be +`Int in c1 -> Int in c2 -> Int in IO | c1 | c2`. + +It's worth to note that the type operator is just a regular operator with a very +low precedence and it is defined in the standard library. + +## Types as Classes + +The following chapter describes the replacement for the currently used concept +of _classes_. We have been always dreaming about true dependent typed language +and the way classes currently work stands on the way to achieve the dreams. The +change is, however, not as drastic as it seems. It is rather a process of +extending the current model to provide more fine grained control over the +objects and types. + +Enso is an Object Oriented programming language. It provides the notion of +objects and methods so at first glance, Enso types may seem like conventional +_classes_ from traditional object-oriented languages. However, these concepts +differ significantly. Enso types have much more power, yet much simpler design, +disallowing concepts like inheritance in favour of composition and algebraic +data types. + +### Constructors + +While types in Enso describe categories of values, the constructors are the +values themselves. Constructors are used for defining new data structures +containing zero or more values, so called fields. Formally, constructors are +product types, a primitive building block of algebraic data types. + +A constructor definition starts with the `type` keyword followed by the +constructor name and lists its fields by name with possible default values. It +is possible to create unnamed fields by using wildcard symbol instead of the +name. Constructors cannot be parametrized and their fields cannot be provided +with explicit type annotations. The formal syntax description is presented +below. + +``` +consDef = "type" consName [{consField}] +fieldName = varName | wildcard +consField = fieldName ["=" value] +``` + +Below we present code snippets with constructors definitions. Constructors with +the same name are just alternative syntactic forms used to describe the same +entity. We will refer to these definitions in later sections of this chapter. + +```haskell +-- Boolean values +type True +type False + +-- Structure containing two unnamed fields +type Tuple _ _ + +-- Alternative Point definitions: +type Point x y z + +type Point (x = 0) (y = 0) (z = 0) + +type Point x=0 y=0 z=0 + +type Point + x = 0 + y = 0 + z = 0 +``` + +### Methods + +A method is a function associated with a given constructor. The primitive method +definition syntax is very similar to function definition, however it also +includes the constructor in its head: + +```haskell +True.not = False +False.not = True +Point x y z . length = (x^2 + y^2 + z^2).sqrt +Tuple a b . swap = Tuple b a +``` + +Most often methods are defined in the same module as the appropriate +constructors. Please refer to sections about interfaces and extension methods to +learn more about other possibilities. + +### Constructors as types + +As Enso is a dependently-typed language with no distinction between value- and +type-level syntax, we are allowed to write _very_ specific type for a given +value. As described earlier, constructors are the values belonging to categories +defined by Enso types. However, they are not only members of categories, they +are also useful to describe very specific categories per se. Formally, a +constructor is capable of describing any subset of the set of all possible +values of its fields. + +For example, the `True` constructor could be used to describe the set of all +possible values of its fields. While it does not have any fields, the set +contains only two value, the `True` constructor itself and an `undefined` value. +Thus it is correct to write in Enso `True : True` and assume that the only +possible values of a variable typed as `a : True` are either `True` or +`undefined`. + +On the other hand, The `Point` constructor do contain fields, thus it could be +used for example to describe all possible points, whose first coordinate is an +integral number, while the second and third coordinates are equal to zero: +`a : Point int 0 0`. + +### Type combinators + +The careful reader will notice here, that `int` is a category of all possible +integral numbers, while the numbers are considered constructors themselves. Enso +provides an operator used to join types together, the so called pipe operator. +The hypothetical `int` definition could look like `int = .. | -1 | 0 | 1 | ...`. +We can use this mechanism to easily express even complex type dependencies. For +example we can tell Enso that a particular value has the type of `int | text`. +Enso will allow us to either use pattern matching to discover at runtime which +type are we really dealing with or will allow to use only methods which have +common interface among all constructors described by the type. It will for +example allow us to print such value to the screen. + +### Pattern matching + +The proposed syntax changes allow us to improve pattern matching rules and make +them much more understandable, especially for new users. As we have described +earlier, there is no need to use qualified constructor names or special cases in +patterns anymore. Moreover, a new form of pattern matching is introduced, the so +called "type pattern matching". + +While constructors allow combining fields into a single structure and type +combinators allow joining types into more general ones, the pattern matching +mechanism allows going the opposite direction. In the most common use case +pattern matching will be performed during runtime, however it is worth to note +that the Enso compiler has enough information to perform pattern matching during +compilation if the appropriate values could be deduced in the compilation +process. There are two forms of pattern matching, namely constructor pattern +matching and generalized type pattern matching. The former syntax is practically +identical to the existing one, while the later uses the `type` keyword to denote +that we are performing pattern matching on type descriptor. Let's see how the +new syntax looks like in practice: + +```haskell +type shape a + type Circle + radius :: a + + type Rectangle + width :: a + height :: a + + +main = + c1 = Circle 5 :: shape int + v = if something then c1 else 0 + + print case v of + Circle r -> 'it is circle' + type shape -> 'it is other shape' + _ -> 'it is something else' + + print case v of type + shape -> 'it is shape' + int -> 'it is int' + +``` + +### Polymorphism + +Formally polymorphism is the provision of a single interface to entities of +different types. Enso does not provide any special construction to support +polymorphism, because even very complex polymorphic types could be described +just by using type-level functions. Consider the following example code: + +```haskell +type Point x y z +point a = Point a a a + +main = + p1 = Point 1 2 3 :: point int + print p1 +``` + +The `point` function is the most basic form of polymorphic type definition in +Enso. It defines all such sets of points, whose all components belong to the +provided type. To better understand this relation, please consider the following +valid expressions: + +```haskell +p1 = Point 1 2 3 : Point 1 2 3 +p1 = Point 1 2 3 : Point int int int +p1 = Point 1 2 3 : point int + +Point 1 2 3 : Point 1 2 3 : Point int int int : point int +``` + +This is a very flexible mechanism, allowing expressing even complex ideas in a +simple and flexible manner. An example is always worth more than 1000 words, so +please consider yet another example usage: + +```haskell +taxiDistance : point real -> point real -> real +taxiDistance p1 p2 = (p2.x - p1.x).abs + (p2.y - p1.y).abs + (p2.z - p1.z).abs + +main = + p1 = Point 1 2 3 + print $ taxiDistance p1 +``` + +### Generalized type definitions + +While we can define constructors, methods and compose them to create more +powerful types using the described methods, such definitions require significant +amount of code and do not reflect the real dependencies between the definitions. +This is the reason why Enso provides a syntactic sugar allowing to define +everything we have learned so far in more concise form. + +It is worth emphasizing that generalized type definitions are only a simpler way +to define multiple constructors, combine them into a common type and define +common methods. They do not provide any additional value or functionality. The +generalized type definition syntax is presented below: + +``` +typeDef = "type" varName [":" interface] [({consDef} | {consField})] [method] +``` + +The body of a type can contain functions, data, or even _other types_, and _yes_ +because ytrou were wondering, types _can_ be defined inductively or using a GADT +style. We can re-write the earlier provided definitions using this form as +follow: + +```haskell +type bool + type True + type False + + not = case self of + True -> False + False -> True +``` + +```haskell +type point a + x y z = 0 : a + + length = (x^2 + y^2 + z^2).sqrt +``` + +```haskell +type tuple a b + _ : a + _ : b + + swap = Tuple b a +``` + +While using this form we define common methods on a set of constructors, like +the method `not` and we use pattern matching to chose the right algorithm path, +this approach does not have any performance penalties, because the compiler is +provided with enough information to optimize this check away if the value was +known at compile time. + +One important thing to note here is that if you don't define any explicit +constructors, an implicit one will be generated automatically and will be named +the same way as the type but starting with an upper-letter instead. Now we can +use the above definitions as follow: + +```haskell +test check = + p1 = Point 1 2 3 : point int + p2 = Point 4 5 6 : point real + px = if check then p1 else p2 + print px.length +``` + +**Bonus question** +What is the most concrete type of the `px` variable above if we do not have any +information about the value of `check`? The answer is of course +`px : (Point 1 2 3 | Point 4 5 6)`, which is a sub type of the type +`Point (1|4) (2|5) (3|6)`. + +## Types as Modules + +The same notion of a type can be used to provide the functionality that is +traditionally expected of a _module_ (in the common, not ML sense). In most +programming languages, their module system provides a mechanism for code-reuse +through grouping and namespacing. Indeed, Enso's types provide both of these +functionalities: + +- **Grouping of Code** + A `type` declaration acts as a container for code, with functions able to be + declared in its scope. +- **Namespacing** + Unless otherwise declared (through a direct import statement), a `type` in + Enso also provides a namespace to constructs declared inside its scope. + +### Files and modules + +Files in Enso should contain at least one `type` definition, with one type named +the same as the file. This `type` is known as the 'primary' type, and it is this +type that is referred to when importing the 'module'. A file `data/map.luna` may +contain `type map`, `type helper` and various other types, but the only things +visible outside the file are the primary type and things defined or imported +into its scope. Inside the file, however, everything can be seen, with no need +to forward-declare. + +### Module Examples + +The concepts are best illustrated by example. Consider the following type. If it +is imported simply as `import math` (see [Importing Types](#importing-types)), +then `pi` value is only accessible within the scope of `math` (by using +`math.pi`). + +However, please note that `math.pi` is not some kind of a special construct for +a qualified data access. The `math` is a zero-argument constructor of the `math` +module (type). The expression `math.pi` is creating the module object and then +accessing its `pi` field. Of course such creation would be optimized away during +the compilation process. + +File `math.luna`: + +```haskell +type math + pi: 3.14 +``` + +File `main.luna`: + +```haskell +type main + import math + main = print math.pi +``` + +## Types as Interfaces + +A type in Enso can also act as a 'contract', a specification of the behavior +expected of a type. The use of types as interfaces in Enso is, as you might +expect, contravariant. As long as the type satisfies the category defined by the +interface, it can be used in its place. This leads to the expected semantics +where a type `Foo` implementing `Bar` can be used where a `Bar` is expected. + +Interfaces in Enso can range from general to very specific. As they define a +_category_ of values, interfaces can specify anything from function signatures +that must be present, all the way to names that must be present in the type's +scope and default behavior. The following are all valid ways to define types for +use as interfaces in Enso. + +```haskell +-- This interface requires a function called someFunction with the correct sig. +type Interface1 + someFunction : Int -> String + +-- This interface requires a function and a variable both named appropriately. +type (a : Numeric) => Interface2 a + someVar : a + + someFunction : a -> a + someFunction = ... + +-- This interface requires a function foo with the appropriate type. +type Interface3 a = { foo : a -> a } +``` + +For more information on the last example, please read the section on +[anonymous types](#anonymous-types). + +### Implementing Interfaces + +TODO: This section needs discussion. It is a very draft proposal for now. + +The nature of Enso's type system means that any type that _satisfies_ an +interface, even without explicitly implementing it, will be able to be used in +places where that interface is expected. However, in the cases of named +interfaces (not [anonymous types](#anonymous-types)), it is a compiler warning +to do so. (TODO: Explain why. What bad would happen otherwise?) + +You can explicitly implement an interface in two ways. Examples of both can be +found at the end of the section. + +1. **Implementation on the Type** + Interfaces can be directly implemented as part of the type's definition. In + this case the type header is annotated with `: InterfaceName` (and filled + type parameters as appropriate). The interface can then be used (if it has a + default implementation), or the implementation can be provided in the type + body. + +2. **Standalone Implementation:** + Interfaces can be implemented for types in a standalone implementation block. + These take the form of `instance Interface for Type`, with any type + parameters filled appropriately. + +Both of these methods will support extension to automatic deriving strategies in +future iterations of the Enso compiler. + +It should also be noted that it is not possible to implement orphan instances of +interfaces in Enso, as it leads to difficult to understand code. This means that +an interface must either be implemented in the same file as the interface +definition, or in the same file as the definition of the type for which the +interface is being implemented. (TODO: To be discussed) + +Consider an interface `PrettyPrinter` as follows, which has a default +implementation for its `prettyPrint` method. + +```haskell +type (t : Textual) => PrettyPrinter t = + prettyPrint : t + prettyPrint = self.show +``` + +For types we own, we can implement this interface directly on the type. Consider +this example `Point` type. + +```haskell +type Point : PrettyPrinter Text + x : Double + y : Double + z : Double + + prettyPrint : Text + prettyPrint = ... +``` + +If we have a type defined in external library that we want to pretty print, we +can define a standalone instance instead. Consider a type `External`. + +```haskell +instance PrettyPrint Text for External = + prettyPrint = ... +``` + +HOLES!!! + + + +## Imports + +To go along with the new system proposed in this RFC around code modularity, the +syntax for dealing with imports has been tweaked slightly. The following import +syntaxes are valid: + +- **Direct Imports:** These import the primary module from the file. This brings + the type and its constructors into scope. For example `import Data.Map` would + bring `Map` and its constructors into scope. +- **Specified Imports:** These allow the specification of additional functions + to be brought into the current scope. For example `import Data.Map: fromList` + would bring `Map`, its constructors and `fromList` into scope. +- **Renamed Imports:** These allow for the programmer to rename the imported + type. For example `import Data.Containers.Map as MapInterface` brings `Map` + into scope named as `MapInterface`. Here, constructors are also imported. +- **Specialised Imports:** These allow the programmer to specialise type + arguments as part of the import. For example `import Data.Map String` will + import `Map` and its constructors with their first type arguments specialised + to `String`. + +These above import styles can be combined, for example renaming a partially +specialised import (`import Data.Map String as StringMap`), or specialising +functions imported into scope (`import Data.Map String: fromList`). Much like +curried type application seen elsewhere in this proposal, it is possible to +partially apply the type arguments of an import, as seen above. + + + +### Scoping Rules and Code Modularity + +Imports in Enso can be performed in _any_ scope, and are accessible from the +scope into which they are imported. This gives rise to a particularly intuitive +way of handling re-exports. + +Consider the following file `Test.luna`. In this file, the imports of `Thing` +and `PrettyPrint` are not visible when `Test.luna` is imported. However, +`PrettyPrint` and `printer` are made visible from within the scope of `Test`. +This means that a user can write `import Test: printer` and have it work. + +``` +import Experiment.Thing +import Utils.PrettyPrint + +type Test a : PrettyPrint Text (Test a) = + import Utils.PrettyPrint: printer + + runTest : a -> Text + runTest test = ... + + prettyPrint : Test a -> Text + prettyPrint self = ... +``` + +## Anonymous Types + +In addition to the syntax proposed above in [Declaring Types](#declaring-types), +this RFC also proposes a mechanism for quickly declaring anonymous types. These +types are anonymous in that they provide a category of values without applying a +name to their category, and can be created both as types and as values. + +While it is possible to use the primary type declaration syntax without +providing an explicit name, this is highly impractical for most places where an +anonymous type becomes useful. This shorthand provides a way to get the same +benefit without the syntactic issues of the former. + +### Anonymous Types as Types + +When used in a type context, an anonymous type acts as a specification for an +interface that must be filled. This specification can contain anything from +types to names, and features its own syntax for inline declarations. + +Consider the following examples: + +- `{Int, Int, Int}`: This type declares a set of values where each value + contains three integers. +- `{Int, foo : Self -> Int}`: This type declares a set of values with an integer + and a function from `Self` to an Integer with name `foo`. +- `{Self -> Text -> Text}`: This defines an unnamed function. This may seem + useless at first, but the input argument can be pattern-matched on as in the + following example: + + ``` + foo : { Int, Int, Self -> Int } -> Int + foo rec@{x, y, fn} = fn rec + ``` + +`Self` is a piece of reserved syntax that allows anonymous types to refer to +their own type without knowing its name. + +### Anonymous Types as Values + +Anonymous types can also be constructed as values using similar syntax. You can +provide values directly, which will work in a context where names are not +required, or you can provide named values as in the following examples: + +- `{0, 0}`: This anonymous value will work anywhere a type with two numbers and + no other behaviour is expected. +- `{x = 0, y = 0, z = 0}`: This one provides explicit names for its values, and + will work where names are required. +- `{x = 0, fn = someFunction}`: This will also work, defining the value for `fn` + by use of a function visible in the scope. +- `{x = 0, fn = (f -> pure f)}`: Lambda functions can also be used. diff --git a/doc/design/type-system/.DS_Store b/doc/design/type-system/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 GIT binary patch literal 6148 zcmeH~Jr2S!425mzP>H1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0 + +- [Motivation](#motivation) +- [Goals for the Type System](#goals-for-the-type-system) +- [Type Conversions](#type-conversions) + - [Convertible](#convertible) + - [Coercible](#coercible) +- [Principles for Enso's Type System](#principles-for-ensos-type-system) +- [Structural Type Shorthand](#structural-type-shorthand) +- [Interfaces](#interfaces) + - [Interfaces as Names for Structures](#interfaces-as-names-for-structures) + - [Interfaces as a Global Mapping](#interfaces-as-a-global-mapping) + - [Interface Generality](#interface-generality) +- [Subtyping and User-Facing Type Definitions](#subtyping-and-user-facing-type-definitions) +- [Row Polymorphism and Inference](#row-polymorphism-and-inference) +- [Unresolved Questions](#unresolved-questions) +- [Dependency and Enso](#dependency-and-enso) +- [Steps](#steps) +- [References](#references) + + + +# Motivation +At present, Enso has nothing but a rudimentary module system to aid in the +complexity management and scoping of code. For a sophisticated language this is +a bad state of affairs, and so this RFC aims to propose a redesign of certain +portions of Enso to present a _unified_ design which unifies modules (in both +the conventional and ML senses), classes, and interfaces under a single +first-class umbrella. + +In doing so, the proposal supports a diversity of use-cases, allowing the +first-class manipulation of types, including the creation of anonymous types. In +doing so, it provides users with first-class modularity for their code, and +intuitive mechanisms for working with types in Enso's type system. This concept +thus brings a massive simplification to the Enso ecosystem, providing one +powerful mechanism to accomplish many key language features, without requiring +the user to understand more than the mechanisms of `type`, and the principle +behind Enso's type system. + +In the end, Enso's current module system is insufficient for serious development +in the language, and needs to be replaced. In doing so, this RFC proposes to +take the time to bring a vast simplification to the language in the same swoop. + +# Goals for the Type System +In our design for Enso, we firmly believe that the type system should be able to +aid the user in writing correct programs, far and above anything else. However, +with so much of our targeted user-base being significantly non-technical, it +needs to be as unobtrusive as possible. + +- Inference should have maximal power. We want users to be _forced_ to write + type annotations in as few situations as possible. This means that, ideally, + we are able to infer higher-rank types and make impredicative instantiations + without annotations. +- Error messages must be informative. This is usually down to the details of the + implementation, but we'd rather not employ an algorithm that discards + contextual information that would be useful for crafting useful errors. +- Dependent types are a big boon for safety in programming languages, allowing + the users that _want to_ to express additional properties of their programs + in their types. We would like to introduce dependent types in future, but + would welcome insight on whether it is perhaps easier to do so from the get + go. If doing so, we would prefer to go with `Type : Type`. +- Our aim is to create a powerful type system to support development, rather + than turn Enso into a research language. We want users to be able to add + safety gradually. + +# Type Conversions +Like in any programming language, Enso requires the ability to convert between +types. Sometimes these conversions have to happen at runtime, incurring a +computational cost, but sometimes these conversions can be 'free', in cases +where the compiler can prove that the types have the same representation at +runtime. Enter the `Coercible` and `Convertible` mechanisms. + +## Convertible +There is a tension in Enso's design around conversions between types. In many +cases, our users performing data analysis will just 'want it to do the right +thing', whereas users writing production software will likely want control. + +The resolution for this tension comes in the form of `Convertible`, one of the +wired-in interfaces in the compiler. This type is defined as follows, and +represents the category of runtime conversions between types. As these +conversions must take place at runtime, they are able to perform computations. + +``` +type Convertible a b: + convert : a -> b + +convertVia : => (t : [Type]) -> a -> b +``` + +The fact that `Convertible` is wired in lets the compiler treat it in a special +fashion. When it encounters a type mismatch between types `A` and `B`, the +compiler is able to look up all instances of `Convertible` to see if there is a +matching conversion. If there is, it will be automatically (invisibly) inserted. + +Now this initially sounds like a recipe for a lack of control, but there are a +few elements of this design to keep in mind: + +- Due to the nature of Enso's type system and how default arguments count + towards the saturation of a function, an instance of convert can actually be + defined to be configurable. Consider the following example, which defines an + instance with an alternative signature. + + ``` + instance Convertible Text File.Path for Text: + convert : (a : Text) -> Bool -> (b : File.Path) + convert in -> expandEnvVars = True -> ... + ``` + + Under such a circumstance, the implicit call uses the default, but if a user + wants to configure or control elements of the conversion behaviour, then they + can be explicit `convert (expandVars = True)`. In cases where the types need + to be made explicit to ensure instance resolution, they too can be applied + (`convert (a = Text) (b = File.Path) foo`). +- This mechanism is unobtrusive in exploratory code, and contributes to the idea + that things 'just work'. +- It is accompanied by an optional warning `-Wimplicit-conversions` that warns + when a conversion is made without an explicit call to `convert`. This ensures + that users can opt in to having more feedback as their codebase evolves from + exploration to development. This warning will be accompanied by an IDE + protocol quick-fix that allows local (or global) insertion of explicit + conversions. +- The `Convertible` type is represented as an interface because users may need + to constrain the types of their functions based upon the ability to convert + between two types `a` and `b`. When used in this case, there is of course no + access to any defaulted arguments without extending the interface (see the + discussion on row extension above). +- A call to `convert` will only be implicitly inserted when the two types to be + converted between are explicitly known by the type-checker. It will _not_ + insert speculative calls. +- Conversion takes place at runtime at the last possible moment. This means that + if `a : Foo` is convertible to `Bar`, with some function `f : Bar -> ...`, the + call `f a` is implicitly rewritten `f (convert a)`. +- A call to `convert` is only inserted implicitly when the type mismatch can be + resolved +- The function `convertVia` lets you give a hint to the compiler as to the type + to convert through. This has a default implementation in every instance of + the interface. It is _never_ inserted by the compiler. + +Finally, you may be wondering about the quality of error messages that this can +produce in the case where there is a type mismatch and not enough information to +resolve the type variables. To this end, there has been some discussion about +making `convert` a reserved name, but this is not certain yet. + +It is an open question as to _where_ we infer convertible. + +## Coercible +It is often necessary, particularly when working with structs over the C-FFI or +in the case of embedded syntaxes, to need to be able to convert between types +with zero runtime cost. The `Coercible` mechanism provides a way to do this that +is safe and checked by the compiler. + +This is a wired-in type in the compiler, and unlike for `Convertible` above, it +cannot be defined by users. Instead, the compiler will automatically generate +pairs of coercions between _types that have identical runtime representations_. + +``` +type Coercible a b: + coerce : a -> b +``` + +Unlike calls to `convert`, calls to `coerce` are never inserted by the compiler. +This is due to the fact that, while two types may have the same runtime +representation, the semantics of these types can differ wildly. + +Much like above, there are some elements of this design that bear stating +explicitly: + +- `Coercible` is represented as an interface to allow users to parametrise their + functions on the availability of a coercion between two types. + +# Principles for Enso's Type System + +- Types in Enso are functions on sets (constructors included), and are based on + the theory of rows described in the "Abstracting Extensible Data Types" paper. + +- All interface definitions must resolve to a 1:1 mapping between resolution + type (e.g. `Text`) and result type (e.g. `Functor Char Char`), modulo type + annotations. + +- Scope lookup proceeds from function scope outwards, and the body is in scope + for the signature. The signature is in scope for the body. + +- There is no distinction between types and kinds. That means that all kinds + (e.g. Constraint, Type, Representation) are Type (Type in Type). + +- We want to provide the ability to explicitly quantify type variables, for all + of: dependent, visible, required, and relevant (e.g. `forall`, `foreach`). + +- It should be clear from a signature or pattern match in isolation which + variables are implicitly quantified. + +- Naming in Enso is case-insensitive, so to hoist a bare function to the type + level, you are required to capitalise it. `foo` and `Foo` are the same thing + in values, but when used in a type, the former will be inferred as a free var. + +- Implicitly quantified variables are parsed as explicit but hidden. + +- Applications of types are done via named arguments. + +- All arguments are named. + +- There is inbuilt support for inference of hole-fits. + +- Default arguments are always applied when left unfilled. We provide a syntax + `...` for preventing use of a default. + +- Argument names at the type and term level must not shadow each other. + +- There is no syntactic sugar for multiple argument functions. + +- Contexts (e.g. IO) are represented using `T in IO`. Multiple contexts are + combined as standard `(IO | State Int)`, and it is written the same in arg + position. Monadic Contexts. + +- Types define namespaces. + +- A more generic type can be written than will be inferred. + +- Contexts may be omitted when writing types. + +- Laziness may be omitted when writing types, but has explicit syntax. + +- Computation defaults to strict. + +- Laziness and Strictness is controlled by the type. + +- Type definitions are, by default, desugared to open polymorphic rows. + +- Type definitions that do not include data members, do not have generated + constructors. + +- Type definitions that do include data members are given autogenerated + constructors. + +- The desugaring of all type definitions will include default implementations + and values. + +- Provide some keyword (`prove`) to tell the compiler that a certain property + should hold when typechecking. It takes an unrestricted expression on types, + and utilises this when typechecking. It may also take a string description of + the property to prove, allowing for nicer error messages: + + ``` + append : (v1 : Vector a) -> (v2 : Vector a) -> (v3 : Vector a) + append = vec1 -> vec2 -> + prove (v3.size == v1.size + v2.size) "appending augments length" + ... + ```` + + Sample error: + + ``` + [line, col] Unable to prove that "appending augments length": + Required Property: v3.size == v1.size + v2.size + Proof State: + + + ``` + + This gives rise to the question as to how we determine which properties (or + data) are able to be reasoned about statically. + +- Dependent types in Enso will desugar to an application of Quantitative Type + Theory. + +- Tuples are strictly a less powerful case of rows. + +- The type system requires support for impredicative types and higher-rank + types with inference. + +- Passing a lazy type to a function expecting a strict one (i.e. one that is not + strictness-polymorphic) should force the computation. + +- The typechecker should support interleaving of metaprogramming and + type-checking for elab-style scripts. + +- Supermonadic theory, as a strictly more-powerful theory over standard monads, + allows trivial definitions in terms of standard monads with inferred trivial + contexts. Enso's Monadic Contexts are based on this theory. + +- The context inference algorithm should support a high-granularity (e.g. read, + write, FFI, print, etc). + +- The base theory of rows that underlies the typechecker must support projection + of values based on arbitrary types. + +- All of the above concepts are represented as operations over row types. + +- Row projections are first-class citizens in the language. They are based on + the projection mechanism described in "Abstracting Extensible Data Types". + +- When a function has defaulted arguments, these arguments should be treated as + filled for the purposes of matching types. This point is somewhat subsumed by + ones above, but bears making explicit. `f : A -> B` should match any function + of that type (accounting for defaulted arguments). + +- The type-system will support wired in `Convertible` and `Coercible` instances. + The former deals with runtime conversions between types, while the latter + deals with zero-cost conversions between types of the same runtime + representation. Both can be inserted by the compiler where necessary. + +- Enso will make use of whatever advanced type-system features are at its + disposal to improve safety and performance. + +- Complex errors will be explained simply, even if this requires additional + annotation from programmers who use the advanced features. + +- In the context of dependent types, relevance inference will be performed. + +- The Enso typechecker will integrate an aggressive proof rewrite engine to + automate as much of the proof burden as possible. + +- Inference is employed heavily, letting types span from invisible to fancy, + with each level providing as many guarantees as is practicable. + +- Enso _is not a proof system_, and its implementation of dependent types will + reflect said fact, being biased towards more practical usage. + +- The future implementation of dependent types into Enso will be based on RAE's + thesis about dependent types in Haskell (particularly PICO and BAKE). + +- Inference will propagate as far as possible, but under some circumstances it + will require users to write types. + +- Value-level names become part of the function interface. + + ``` + replicate : (n : Nat) -> a -> Vector n a + replicate = num -> val -> ... + + # Alternatively + replicate : Nat -> a -> Vector sz a + replicate = sz -> val -> ... + ``` + + The issue here is that after an `=`, you want a bare name to be a _name_, and + after a `:`, you want it to represent a type. In the above example, `a` is a + free type variable (`forall a`), while `n` is a name. + +- If it comes to a tension between typechecker speed and inference capability, + Enso will err on the side of inference capability in order to promote ease of + use. Speed will be increased by performing incremental type-checking where + possible on subsequent changes. + +- 'Categorical Typing' (e.g. the `1 <: Int`) relationship, is supported by two + key realisations: + + Atoms can be represented as elements of a row. + + Labels can be the atom they project (e.g. `1 : 1`) in the context of + polymorphic labels. + Thereby, when `Int : (1:1, 2:2, 3:3, ...)`, we trivially have `12:12 <: Int`. + We just need to ensure that the 'subtyping' rules for the primitives are + wired-in. This still supports `Vector3D 1 2 3 <: Vector3D Int Int Int`. A + close examination of how this works with functions is required. + +- We do not intend to support duplicate row labels, and will use appropriate + constraints on combination and projection to achieve this. + +- Deallocation of resources will be performed by 'drop', which can be explicitly + implemented by users if need be (a wired-in interface). + +- Record syntax is `{}`. + +- We do not want to support invisible arguments. + +- Enso must support nested type definitions. These nested types are + automatically labelled with their name (so constructors are `mkFoo`, rather + than `Foo`). The nested type is constructed as part of the containing type. + +- When writing a method, the type on which the method is defined must be the + last argument. + +- Every name uses the following syntax `name : type = value`, where either + `type` or `value` can be omitted. This has additional sugar that allows users + to write it as follows: + + ``` + name : type + name = value + ``` + + This sugar is most likely to be seen for function definitions, but it works in + all cases. + +- Rows in Enso are open, and have polymorphic projections. A projection without + a given type is assumed to be a label, but `{ (Foo : Type) : Int }` lets users + use other types as projections. A row `{ myLabel : Int }` is syntax sugar for + an explicit projection based on a label: `{ (myLabel : Label) : Int }`. + +- Enso features built-in first-class lenses and prisms based on row projections, + as described in "Abstracting Extensible Datatypes". + +- Rows where no explicit labels are given are assumed to be tuples. This means + that the row `{Int, Int, Char}` is syntax sugar for a row with autogenerated + labels for projection: `{ 1 : Int, 2 : Int, 3 : Char }`. Names of the labels + are TBC. In doing this, you enforce the expected 'ordering' constraint of a + tuple through the names. + +- Bare types, such as `Int` or `a : Int` are assumed to be rows with a single + member. In the case where no name is given, a it is treated as a 1-tuple. + +- We want the theory to have possible future support for GADTs. + +- The operation `foo a b` desugars to `b.foo a` so as to construct a + transformation applied between two types. + +- We want error messages to be as informative as possible, and are willing to + retain significant extra algorithmic state in the typechecker to ensure that + they are. This means both _formatting_ and _useful information_. + +- There is a formalisation of how this system lowers to System-FC as implemented + in GHC Core. + +- Constructors are treated specially for the purposes of pattern matching. + +- The implementation supports `Dynamic`, a type that allows typechecked + interactions with external data. The properties expected of a `Dynamic` value + become part of the `Dynamic` type signature, so if you need to access a + property `foo` on a value `a` that is dynamic, `a` has type + `Dynamic { foo : T }`, and thereby represents a contract expected of the + external data. Dynamic is a strict subtype of `Type`. + +- The typechecker will work efficiently in the presence of defaulted arguments + by lazily generating the possible permutations of a function's signature at + its use sites. This can be made to interact favourably with unification, much + like for prolog. +- Users need to explicitly run their contexts to provide the lifting. + +- It is going to be important to retain as much information as possible in order + to provide informative error messages. This means that the eventual algorithm + is likely to combine techniques from both W and M (context-insensitive and + context-sensitive respectively). + +- Type errors need to track possible fixes in the available context. + +- There is explicit support for constraints appearing at any point in a + polymorphic type. + +- Type equality in Enso is represented by both representational and structural + equality. Never nominal equality. There is no inbuilt notion of nominal + equality. + +- The type-system includes a mechanism for reasoning about the runtime + representation of types. It will allow the programmer to constrain an API + based upon runtime representations. While this is described as a type-level + mechanism, it only is insofar as kinds are also types. The kind of a type is + an expression that contains the following information: + + Levity: Types that are represented by a pointer can be both lifted (those + that may contain 'bottom') and unlifted (those that cannot). Thunks are + lifted. + + Boxiness: Whether the type is boxed (represented by a pointer) or unboxed. + + Representation: A description of the machine-level representation of the + type. + + This is effectively represented by a set of data declarations as follows: + + ```hs + data Levity + = Lifted -- Can contain bottom (is a lazy computation) + | Unlifted -- Cannot contain bottom (has been forced or is strict) + + data Size = UInt64 + + data MachineRep + = FlatRep [MachineRep] -- For unboxed rows + | UInt32 + | UInt64 + | SInt32 + | SInt64 + | Float32 + | Float64 + | ... + + data RuntimeRep + = Boxed Levity + | Unboxed MachineRep + + data TYPE where TYPE :: RuntimeRep -> ? -- Wired in + + -- Where does `Constraint` come into this? + ``` + + Doing this allows programs to abstract over the representation of their types, + and is very similar to the implementation described in the Levity Polymorphism + paper. The one change we make is that `FlatRep` is recursive; with most of + Enso's types able to be represented flat in memory. This means that the list + `[MachineRep]` is able to account for any row of unboxed types. + + The return type of `TYPE` is still an open question. What does it mean for a + dependently typed language to deal in unboxed types at runtime? Reference to + the Levity Polymorphism paper will be required. We don't want the usage of + this to rely on the JIT for code-generation, as it should operate in a static + context as well. + + An example of where this is useful is the implementation of unboxed arrays, + for which we want a flat in-memory layout with no indirections. Being able to + parametrise the array type over the kind `forall k. RuntimeRep (Unboxed k)`, + means that the type will only accept unboxed types. + +- We want to support contexts on types such that instantiation can be guarded. + For more information see the Partial Type-Constructors paper. If you have a + type `type Num a => Foo a = ...`, then it should be a type error to + instantiate `Foo a` where `Num a` doesn't hold. This allows a treatment of + partial data. However, this isn't easily extensible across interfaces. Could + we propagate the constraint to the constructors in a GADT-alike? Nevertheless, + they act as well-formedness constraints on the type definition. This means + that the desugaring for type definitions involves GADTs. The construction of + a constrained type creates evidence that is discharged at the type's use site + (pattern match or similar). This should be based on the reasoning in the + Partial Data paper, and so all types should automatically be generalised with + the 'well-formed type' constraints, where possible. + +- Type constructors are special entities. Not to be confused with value + constructors `Vector Int` vs. `mkVector 1`. + +- The syntax is as follows: + + Row Alternation: `|` + + Row Subtraction: `\` + + Row Concatenation: `&` or `,` + +- We should be able to infer variants and records, but this behaviour can be + overridden by explicit signatures. + +- With regards to Monadic Contexts, `in` allows nesting of contexts, as opposed + to composition of transformers. `=` 'peels off' the last layer of contexts. + When composing contexts, we have `&` for composing transformers, and `|` for + alternating them, similarly to the dual with rows. + +- To provide more predictable inference in dependent contexts, this system will + draw on the notion of _matchability polymorphism_ as outlined in the + higher-order type-level programming paper. The key recognition to make, + however is that where that paper was required to deal with + backwards-compatibility concerns, we are not, and hence can generalise all + definitions to be matchability polymorphic where appropriate. + +- Implicit parameters (e.g. `{f : Type -> Type} -> f a -> f a` in Idris) are + able to be declared as part of the signature context. + +- As of yet it is undecided as to what form of type-checking and inference will + be used. The idea is to match our goals against existing theory to inform + implementation. + + Boxy: Provides greater inference power for higher-rank types and + impredicative instantiation. Integrates well with wobbly inference for GADTs + and has a relatively simple metatheory. + + Complete Bidirectional: An interesting theory with particular power to infer + higher-rank types, but with no support for impredicative instantiation. Has + some additional complexity through use of subtyping relationships. This is + fully decidable, and subsumes standard Damas-Milner style inference. + + Flexible Types/HML: Provides a translation from MLF to System-F, which may + be useful during translation to GHC Core. Requires annotations only on + polymorphic function parameters (e.g. `fn f = (f 1, f True)` would require + annotation of `f :: forall a . a -> a`). However, in the context of this + requirement, all other instantiations (including impredicative and + higher-rank) can be inferred automatically. Could this be combined with + decidable rank-2 inference to increase expressiveness? + + FPH: Lifts restrictions on polymorphic instantiation through use of an + internal theory based on MLF. Focuses on impredicativity, but provides some + commentary on how to extend the theory with approaches to higher-rank + inference. This is based upon the Boxy work, but has trade-offs with regards + to typeability in absence of annotations. + + MLF: A highly complex type theory with the ability to represent strictly + more types than System-F (and its variants). There are various theories that + allow for translation of these types to System-F. It has theory supporting + qualified types (a separate paper), which are necessary for our type system. + I worry that choice of MLF will expose greater complexity to users. + + Practical: Provides a local inference-based foundation for propagation of + higher-rank type annotations at the top level to reduce the annotation + burden. We can likely use this to inform our design, but it is somewhat + subsumed by the HML approach. + + QML: A simple system supporting impredicative instantiation, but with a much + higher annotation burden than others. The underlying specification of + checking and inference is very clear, however, so there are still potential + lessons to be learned. + + Wobbly: Provides a unified foundation for treating all types as GADTs, and + hence allowing for bounded recursive type definitions. It precisely + describes where type-signatures are required in the presence of GADTs. Has + some interesting insight with regards to polymorphic recursion. + + It should be noted that none of these theories explicitly address extension to + dependent type theories, so doing so would be entirely on our plate. + Furthermore any theory chosen must have support for qualified types (e.g. + those with constraints, existentials). + + To this end, I don't know if it is possible to always transparently support + eta expansion of functions without annotation. + + My _current_ recommendation is to base things on HML. This is for the + following reasons: + + It maximises inference power for both higher-rank and impredicative type + instantiations. + + It has a simple rule for where annotations are _required_. + + While it requires them in more places than MLF, the notion of where to put + an annotation is defined by the function's external type, not its body. + + The inference mechanism is far simpler than MLF as it only makes use of + flexible types rather than constrained types. + + There is existing work for adding qualified types to HML. + +- There is an integration of constraints with interfaces. An implementation of + an interface may be _more specific_ than the interface definition itself, + through use of GADTs. + +- Interfaces in Enso are inherently multi-parameter, by virtue of specifying a + structure as a row. + +- In an ideal world, we would like to only require programmer-provided type + annotations in the following circumstances: + 1. Polymorphic Recursion (technically a case of #2) + 2. Higher-Rank Function Parameters + 3. Constrained Data-Types (GADTs) + + In order to achieve this, the final design will employ techniques from both + unification-based Damas-Milner inference techniques, and annotation + propagation inspired by local type-inference techniques. + +# Structural Type Shorthand +In Enso, we want to be able to write a type-signature that represents types in +terms of the operations that take place on the input values. A classical example +is `add`: + +``` +add : a -> b -> b + a +add = a -> b -> b + a +``` + +There are a few things to note about this signature from a design standpoint: + +- `a` and `b` are not the same type. This may, at first glance, seem like a + signature that can't work, but the return type, in combination with our + integrated `Convertible` mechanism gives us the tools to make it work. +- The return type is `a + b`. This is a shorthand expression for a detailed + desugaring. The desugaring provided below is what the typechecker would infer + based on such a signature. + +``` +add : forall a b c d. ({+ : Convertible b c => a -> c -> d} <: a) => a -> b -> d +``` + +This may look fairly strange at first, but we can work through the process as +follows: + +1. The expression `b + a` is syntactic sugar for a method call on a: `a.+ b`. +2. This means that there must be a `+` method on a that takes both an `a` and a + `b`, with return-type unspecified as of yet: `+ : a -> b -> ?` +3. However, as `Convertible` is built into the language, we have to consider + that for `a.+ b` to work, the `+` method can actually take any type to which + `b` converts. This introduces the constraint `Convertible b c`, and we get + `+ : a -> c -> ?` +4. The return type from a function need not be determined by its arguments, so + hence in the general case we have to default to an unrestricted type variable + giving `+ a -> c -> d`. +5. This method must exist on the type `a`, resulting in the constraint that the + row `{+ : a -> c -> d} <: a` must conform to that interface. +6. We now know the return type of `a + b`, and can rewrite it in the signature + as `d`. + +Please note that `a <: b` (which can be flipped as `:>`) means that `a` is a row +that is a sub-row contained within the row `b`. The containment relation allows +for the possibility that `a == b`. + +The inferred type for any function should, in general, match the given type +signature. Cases where this break down should only exist where the type +signature acts to constrain the function type further than would be inferred. + +# Interfaces +An interface in Enso is a representation of the (partial) structure of a type +given name. For a type in Enso to conform to an interface, under the hood it is +one that structurally conforms to the projections and result types provided by +the interface. + +There are two possible treatments of interfaces as far as Enso's type system is +concerned: + +1. **Names Only:** An interface is purely a name given to a type that describes + a certain structure. +2. **Global Mapping:** More akin to how they behave in Haskell, an interface + acts as a global mapping between a name and a structure (row) that contains + the associated behaviour. + +Both have their upsides and trade-offs, and each of which are explored below. +Each uses a separate keyword `interface` to define the interface, as this allows +for us to avoid generation of automatic constructors. In the below sections we +look at the following example. + +``` +interface Iterable : (a : Type) -> Type = + ElemType : Type + map : (a.ElemType -> a.ElemType) -> a -> a + + # A default implemented method + <$> : (a.ElemType -> a.ElemType -> a -> a) + <$> = a.map +``` + +We also use the `instance` keyword to denote a standalone implementation of an +interface. + +There are a few elements of the design for interfaces that will hold regardless +of the choice made below. + +- Interfaces are inherently type constructors that generate a row. This means + that they can use the standard dependency machinery included in Enso to + compute some or all of their types. +- Interfaces, as type constructors, are inherently multi-parameter should they + need to be. + +## Interfaces as Names for Structures +This first option is the most 'pure' with regards to the structural nature of +the type system. It works as follows: + +- The `interface` keyword differs from the `type` keyword only in that it will + never generate an automatic type constructor for the type. +- An interface definition creates a row constructor and associated row that can + represent the operations required of a type that conforms to the interface. + For example, the above definition would desugar as follows: + + ``` + Iterable : (a : Type) -> Type = + { ElemType : Type + , map : (a.ElemType -> a.ElemType) -> a -> a + , <$> : (a.ElemType -> a.ElemType) -> a -> a = a.map + } + + # The following is equivalent, but longer. + Iterable : (a : Type) -> Type + Iterable = a -> { ElemType : Type + , map : (a.ElemType -> a.ElemType) -> a -> a + , <$> : (a.ElemType -> a.ElemType) -> a -> a = a.map + } + ``` + +- Any type that conforms to this structure is an implementation of this + interface, regardless of an explicit definition. + +The primary benefits of choosing such a design are as follows: + +- Any type that conforms to the row declared by the interface is counted as an + implementation of the interface, regardless of any keyword. +- We do not have to reserve names in scope, meaning that `mplus` could be + represented by a `+`, as could numeric addition. +- It is very pure from a type-system perspective, with interfaces just being a + way to declare a row without a constructor. +- It is trivial to provide default implementations, as you would for any type + definition. + +However, it is not all rosy. This design has some downsides with regards to +usability, particularly around the provision of useful diagnostics to users in +the form of inferred type signatures and type errors. + +- Interface names are only useful as shorthand for programmers, as we can never + infer a name based on the structure of a type. They can be used for explicit + `implements` declarations to ensure that the methods of the interface are + implemented, and they can be used in explicit signatures. +- As we can never infer interface usage, all inferred signatures will need to + represent types in terms of their structure. This means that we can never say + that `a` needs to be a `Number` for you to use `+`, and instead we can only + say `a` needs to have a method `+ : a -> a -> a` in our type errors. While + this is not inherently a problem, it does limit our ability to give users + _named_ concepts to reason about their code with. + +## Interfaces as a Global Mapping +The alternative design is to use the `interface` keyword to provide the compiler +(and hence the users) with a global mapping from method names to the names of +the interfaces. This would work as follows: + +- The `interface` keyword generates a global mapping from the interface name to + the names of the interface methods and properties. For the above interface, it + would be `Iterable <-> (ElemType, map, <$>)`. +- This definition creates, in addition to the global mapping, a type constructor + that produces a row, as for above. The desugaring is identical to the above. +- Types may still implicitly conform to interfaces. + +The primary benefits of this design are as follows: + +- Given that there is a global mapping of interface names to method and + property names, whenever we see the usage of such a method we can infer that + the type(s) in question implement the interface. +- We can use the interface name to method mapping to infer signatures that + explicitly name the interface. This gives programmers the ability to reason + about concepts (or structures) with names. + +That is not to say that this approach, too, is without its downsides. In fact, +where the first approach has benefits, this approach tends to miss out on them: + +- A name used in an interface is globally reserved, meaning that it can't be + used for anything else. This means that numeric addition could be `+`, but + monoidal concatenation would have to be named differently (e.g. `<>`). +- It is far more complex from the perspective of the type system. While, in this + case, interfaces do generate rows, they also generate significant amounts of + extra baggage to support this global mapping. + +A variant on this approach would include the types of the methods and properties +in the mapping. This allows names to no longer need to be globally reserved, but +often means that inference still won't be able to match a name (e.g. if we +define `foo = a -> b -> a + b`, then we still can't guarantee that it belongs to +the `Num` interface, for example). + +As a result, if we want to include types in the mapping, the recommendation is +for the first option, not this variant of the second. + +## Interface Generality +One of the larger pain-points in Haskell comes from the fact that typeclass +structure places restrictions on what types can become an instance of the class. +Enso, instead, works from a foundation of rows. This means that we can trivially +make use of associated types in interfaces to compute more general signatures. + +Consider the following `Iterable` interface, which expresses a map operation in +a way that is not easy in Haskell, and that is more general than `Functor`. + +``` +# The `=` is used here for consistency with unified definitions of the form +# (name : type = val). + +interface Iterable : (a : Type) -> Type = + elemType : Type + map : (elemType -> elemType) -> a -> a + +# Needs to account for internal type transformations (map toString) for example + +instance Iterable Text = + elemType = Char + map = ... +``` + +Checking such a type is still able to be done through standard qualified +typechecking algorithms. There are, however, a few things to keep in mind: + +- For a type to conform to an interface, it is sufficient for its implementation + of the interface methods to be _callable_ with the signature given in the + interface. This means that an implementation can add additional function + parameters as long as they are defaulted. +- Of course, in the case where these function parameters are _used_, the type is + no longer conforming with the interface. This is a bit nasty, but unavoidable. + +Even nicer is the fact that such an approach can be combined with partial data +(restricted instantiation) to allow definition of `Iterable` across sets. +Consider the following: + +``` +instance Iterable (Set a) = + elemType = a + map = ... +``` + +# Subtyping and User-Facing Type Definitions +As complex as the internal type language of Enso may need to be, we ideally want +to only present a limited set of concepts to the user for use in writing their +own types. Internally, we have the following relations between types: + +- `:` - This gives a type to an expression (this expression can be a name, a + program fragment, or any valid expression). A program expression `a : b` means + that the expression `a` has the type given by `b`. +- `~` - This relation asserts structural equality between types. If `a ~ b`, + then the two types have exactly the same structure. This can be represented by + the containment relation: `a ~ b = a <: b && b <: a`. +- `<:` - This relation asserts that one type is wholly contained in another, and + can be thought of as a form of structural subtyping. If `a <: b`, then the + type `a` is structurally equal to or a subtype of `b`. In essence, this means + +Please note that this may initially be a little confusing. What, then, does it +mean to have an expression like `(name : type = value)`. Simply enough, this +universal definition format says that 'the identifier given by `name` has type +given by `type`, and has value given by `value`.' This works globally, including +for functions, and for type definitions. + +``` +type MyExample : (a : Type) -> (b : Type) -> Type = + # The body of the type goes here, as it is the VALUE of the type constructor + # In a truly dependently typed language, a type constructor is just a + # function on types, and the `type` keyword isn't even needed other than to + # indicate automatic constructor generation. +``` + +Externally, however, it is a different story. We want users to not have to think +at such depth about their types to express what they want. This means that we do +not intend to require users to write expressions involving structural equality +or containment (though we may still expose these relations to allow simpler +programming with types). + +- In reality, we want users to really only make use of `:` when defining their + types. +- This means that we need to be very clear about what it means, especially in + the face of recursive types. + +Let's examine the major cases: + +1. **Polymorphic Function Arguments:** +2. **Partial Data Types:** +3. **Qualified Types in Functions:** +4. **Variable Definitions:** + +Is variance always valid in the ways we need? + +# Row Polymorphism and Inference +The foundation of Enso's usability is based on a _structural_ type system. This +means that there is no concept of nominal equality. In Enso types are equal if +they have the same structure. Under such a system, the only reason that users +should _need_ to write types is to provide a type that the inference engine +cannot infer (though this may be more general, more specific, or for a case that +cannot be inferred). + +From a philosophical standpoint, we want Enso's type system to infer sensible +types for 99% of expressions that users write, and allow the users to _refine_ +these inferred types. This refinement process can involve: + +- Giving a more general (more permissive) type to an expression. +- Constrain an expression by giving a less permissive type to it. +- Add additional safety and checking by introducing more complex type-system + usage (e.g. dependency, partial data, etc). + +It should be noted that, in this sense, Enso's structural type system has no +concept of a 'principle' (or most-general) type for an expression, at least not +in the sense of traditional System-F. + +# Unresolved Questions + +1. How to integrate row polymorphism with the inference story? + +2. How to do dependent types? + +5. Representing ADTs as rows (variants and records). + +6. Auto-injectivity for Generalised inductive types (GADTS)? Are our type + constructors _matchable_ (injective and generative)? + +Points that need to be accounted for in the 'wishlist' design: + +- Basic types +- Constrained types +- Inference +- Dependent Types +- Containment/subtyping +- Dynamic +- Exception handling +- Monadic Contexts +- Complexity tradeoff between execution and implementation + +``` +read : String -> Ty? -> Ty +``` + +Named state items? Parametrise the state over a type and get via names, which +are projections. + +- State parametrised over a record giving both name and type +- Do we want to look things up based on type? +- `State.get name` where `name` is a row projection. + +- Monadic contexts desugared as transformers. + + +- Initially treat the value _in_ the state as a dynamic. + +- Dependent types at runtime compared with dynamics. + +- Encoding of strictness in the type system. + +IO, State, Exception (not ! error) + +- Dependent types as constructing proofs through combining types. Combining + types provides evidence which can be discharged to create a proof. A value can + then only be constructed by discharging a proof. + +# Dependency and Enso +The ability to evaluate arbitrary functions on the type level inherently makes +Enso a dependently typed language, as arbitrary values can appear in types. + +- The initial implementation will provide this facilities, but no system for + automating the proof steps (c.f. f-star), or interactive theorem proving. +- While this allows people to express safety guarantees in the type system, it + is a natural consequence of Enso's design. + +# Steps + +1. Wojciech produces examples of program fragments and the types that should be + inferred based on the expressions. +2. Based on these fragments, try and synthesis a theoretical approach to both + inference and type-checking based upon this. +3. Write down a high-level design based on this theory, stating the properties + of the system that we want. +4. Formalise the necessary parts thereof. +5. Implement based upon this design + formalisation. +6. Formalise the remaining parts of the system. + +By the end of working day on the 23rd of May, WD and AA need to have a +comprehensive mutual understanding of what the type system is going to be. + +# References +The design of the type-system described in this document is based on prior work +by others in the PL design community. The (probably) complete list of references +is as below. + +#### Rows +- [Abstracting Extensible Data Types](http://ittc.ku.edu/~garrett/pubs/morris-popl2019-rows.pdf) + +#### Maximum Inference Power +- [A Theory of Qualified Types](https://github.com/sdiehl/papers/blob/master/A_Theory_Of_Qualified_Types.pdf) +- [Boxy Type-Inference for Higher-Rank Types and Impredicativity](https://www.microsoft.com/en-us/research/publication/boxy-type-inference-for-higher-rank-types-and-impredicativity/) +- [Complete and Easy Bidirectional Typechecking for Higher-Rank Polymorphism](https://www.cl.cam.ac.uk/~nk480/bidir.pdf) +- [Flexible Types: Robust Type Inference for First-class Polymorphism](https://www.microsoft.com/en-us/research/publication/flexible-types-robust-type-inference-for-first-class-polymorphism/) +- [FPH: First-Class Polymorphism for Haskell](https://www.microsoft.com/en-us/research/publication/fph-first-class-polymorphism-for-haskell/) +- [MLF: Raising ML to the Power of System-F](http://gallium.inria.fr/~remy/work/mlf/icfp.pdf) +- [Practical Type Inference for Arbitrary-Rank Types](https://www.microsoft.com/en-us/research/publication/practical-type-inference-for-arbitrary-rank-types/) +- [QML: Explicit, First-Class Polymorphism for ML](https://www.microsoft.com/en-us/research/wp-content/uploads/2009/09/QML-Explicit-First-Class-Polymorphism-for-ML.pdf) +- [Wobbly Types: Type Inference for GADTs](https://www.microsoft.com/en-us/research/publication/wobbly-types-type-inference-for-generalised-algebraic-data-types/) + +#### Dependent Types +- [Dependent Types in Haskell: Theory and Practice](https://cs.brynmawr.edu/~rae/papers/2016/thesis/eisenberg-thesis.pdf) +- [Higher-Order Type-Level Programming in Haskell](https://www.microsoft.com/en-us/research/uploads/prod/2019/03/ho-haskell-5c8bb4918a4de.pdf) +- [Practical Erasure in Dependently-Typed Languages](https://eb.host.cs.st-andrews.ac.uk/drafts/dtp-erasure-draft.pdf) +- [Syntax and Semantics of Quantitative Type Theory](https://bentnib.org/quantitative-type-theory.pdf) + +#### Monadic Contexts +- [Supermonads](http://eprints.nottingham.ac.uk/36156/1/paper.pdf) + +#### Types and Performance +- [Levity Polymorphism](https://cs.brynmawr.edu/~rae/papers/2017/levity/levity-extended.pdf) +- [Partial Type-Constructors](https://cs.brynmawr.edu/~rae/papers/2019/partialdata/partialdata.pdf) + + diff --git a/doc/graphmod/run.sh b/doc/graphmod/run.sh new file mode 100755 index 00000000000..71ff5cf036b --- /dev/null +++ b/doc/graphmod/run.sh @@ -0,0 +1,7 @@ +#!/bin/bash + +SELF="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )" +ROOT=$SELF/../.. + +find $ROOT/core/src -name "*.hs" -print | xargs graphmod -p > $SELF/dist/overview.dot +dot -Tpng -Gdpi=600 $SELF/dist/overview.dot > $SELF/dist/overview.png diff --git a/doc/haskell-style-guide.md b/doc/haskell-style-guide.md new file mode 100644 index 00000000000..25c2980111b --- /dev/null +++ b/doc/haskell-style-guide.md @@ -0,0 +1,1375 @@ +# Haskell Style Guide +Like many style guides, this Haskell style guide exists for two primary reasons. +The first is to provide guidelines that result in a consistent code style across +all of the Luna codebases, while the second is to guide people towards a style +that is expressive while still easy to read and understand. + +In general, it aims to create a set of 'zero-thought' rules in order to ease the +programmer burden; there is usually only _one way_ to lay out code correctly. + + + +- [Code Formatting](#code-formatting) + - [Whitespace](#whitespace) + - [Line Wrapping](#line-wrapping) + - [Alignment](#alignment) + - [Naming](#naming) + - [Imports](#imports) + - [Exports](#exports) + - [Section Headers](#section-headers) + - [Auto-Formatting](#auto-formatting) +- [Commenting](#commenting) + - [Documentation Comments](#documentation-comments) + - [Source Notes](#source-notes) + - [TODO Comments](#todo-comments) + - [Other Comment Usage](#other-comment-usage) +- [Program Design](#program-design) + - [Libraries](#libraries) + - [Modules](#modules) + - [Data Declarations](#data-declarations) + - [Testing and Benchmarking](#testing-and-benchmarking) + - [Warnings, and Lints](#warnings-and-lints) +- [Language Extensions](#language-extensions) + - [Default Extensions](#default-extensions) + - [Allowed Extensions](#allowed-extensions) + - [Allowed With Care](#allowed-with-care) + - [Disallowed Extensions](#disallowed-extensions) + + + +## Code Formatting +This section explains the rules for visually laying out your code. They provide +a robust set of guidelines for creating a consistent visual to the code. + +### Whitespace +The rules for whitespace in the Luna codebases are relatively simple: + +- 4 spaces are used for indentation, with no tabs. +- There should not be any trailing whitespace. +- There should be no spurious whitespace within the lines, unless it is used for + [alignment](#alignment) as discussed below. + +### Line Wrapping +In order to provide visual consistency across our codebases, and also to +contribute to making our code easier to scan, we enforce that all code should be +wrapped to 80 characters width at a maximum. + +The nature of Haskell, however, means that it can sometimes be unclear where to +break lines. We use the following guidelines: + +- Wrap all lines to a maximum length of 80 characters. +- Break the lines on operators where possible, rather than wrapping function + arguments. + + ```hs + -- This + foo <- veryLongFunction1 veryLongArgument1 + $ veryLongFunction2 veryLongArgument2 veryLongArgument3 + + -- Not this + foo <- veryLongFunction1 veryLongArgument1 $ veryLongFunction2 + veryLongArgument2 veryLongArgument3 + ``` + +- When you have a choice of operators on which you could break, choose the one + with the highest precedence. We find that this makes code significantly more + readable. + + ```hs + -- This + potentialPkgRoot <- liftIO $ Directory.canonicalizePath + =<< (canPath ) <$> pkgRootFromExe @a + + -- Not this + potentialPkgRoot <- liftIO $ Directory.canonicalizePath =<< (canPath ) + <$> pkgRootFromExe @a + ``` + +- Wrap operators to the _start_ of the line, rather than leaving them trailing + on a line. + + ```hs + -- This + foo <- veryLongFunction1 veryLongArgument1 + $ veryLongFunction2 veryLongArgument2 veryLongArgument3 + + -- Not this + foo <- veryLongFunction1 veryLongArgument1 $ + veryLongFunction2 veryLongArgument2 veryLongArgument3 + ``` + +- Function signatures should wrap on the `=>` and `->`, and in the context of + a doc comment should have each argument on a separate line. +- Lists (and all list-like constructs e.g. constraint tuples, import lists) + should be wrapped with a _leading_ comma, aligned with the opening bracket, + and a space between the opening bracket and the first item. This is also used + in record declarations. + + ```hs + -- This + myFunctionWithAVeryLongName :: forall a m . ( SomeConstraintOnA a + , SomeMonadConstraint m ) + => a -> SomeOtherType -> m a + + -- Not this + myFunctionWithAVeryLongName :: forall a m . (SomeConstraintOnA a, + SomeMonadConstraint m) + => a -> SomeOtherType -> m a + ``` + +- If all else fails, wrap the lines using your best effort (usually what you + find to be most readable). This may result in discussion during code review, + but will provide a learning experience to augment this guide with more + examples. + +Please _do not_ shorten sensible names in order to make things fit into a single +line. We would much prefer that the code wraps to two lines and that naming +remains intelligible than names become so shortened as to be useless. + +### Alignment +When there are multiple lines that are visually similar, we try to align the +similar portions of the lines vertically. + +```hs +people <- getAllPeople <$> worlds +names <- getName <$> people +surnames <- getSurnames <$> names +``` + +This should _only_ be done when the lines don't need to be wrapped. If you have +lines long enough that this visual justification would cause them to wrap, you +should prefer to _not_ wrap the lines and forego the visual alignment. + +Furthermore, if you have to wrap a visually similar line such that it now spans +multiple lines, it _no longer counts_ as visually similar, and hence subsequent +lines should not be aligned with it. + +### Naming +Luna has some fairly simple general naming conventions, though the sections +below may provide more rules for use in specific cases. + +- Types are written using `UpperCamelCase`. +- Variables and function names are written using `camelCase`. +- If a name contains an initialism or acronym, all parts of that initialism + should be of the same case: `httpRequest` or `makeHTTPRequest`. +- Short variable names such as `a` and `b` should only be used in contexts where + there is no other appropriate name (e.g. `flip (a, b) = (b, a)`). They should + _never_ be used to refer to temporary data in a `where` or `let` expression. + +### Imports +Organising imports properly means that it's easy to find the provenance of a +given function even in the absence of IDE-style tooling. We organise our imports +in four sections, each of which may be omitted if empty. + +1. **Re-Exports:** These are the modules that are to be re-exported from the + current module. We import these qualified under a name `X` (for export), and + then re-export these in the module header (see below for an example). +2. **Preludes:** As we recommend the use of `-XNoImplicitPrelude`, we then + explicitly import the prelude in use. This is almost always going to be + `Prologue` as described in the section on [libraries](#libraries) below. +3. **Qualified Imports:** A list of all modules imported qualified. The `as` + portion of the import expressions should be vertically aligned. +4. **Unqualified Imports:** These must _always_ have an explicit import list. + There are _no_ circumstances under which we allow a truly unqualified import. + The import lists should be vertically aligned. + +Imports within each section should be listed in alphabetical order, and should +be vertically aligned. + +When we have a module that exports a type the same as its name, we import the +module qualified as its name, but we _also_ import the primary type from the +module unqualified. This can be seen with `Map` in the examples below. + +This example is for a module that re-exports some names: + +```hs +module Luna.MyModule (module Luna.MyModule, module X) where + +import Luna.MyModule.Class as X (foo, bar) + +import Prologue + +import qualified Control.Monad.State as State +import qualified Data.Map as Map + +import Data.Map (Map) +import Rectangle (Rectangle) +import Vector (Vector (Vector), test) +``` + +However, in the context where your module doesn't re-export anything, you can +use the simplified form: + +```hs +module Luna.MyModule where + +import Prologue + +import qualified Control.Monad.State as State +import qualified Data.Map as Map + +import Data.Map (Map) +import Rectangle (Rectangle) +import Vector (Vector (Vector), test) +``` + +### Exports +There is nothing more frustrating than having a need to use a function in a +module that hasn't been exported. To that end, we do not allow for restricted +export lists in our modules. + +Instead, if you want to indicate that something is for internal use, you need to +define it in an internal module. For a module named `Luna.MyModule`, we can +define internal functions and data-types in `Luna.MyModule.Internal`. This means +that these functions can be imported by clients of the API if they need to, but +that we provide no guarantees about API stability when using those functions. + +### Section Headers +In order to visually break up the code for easier 'visual grepping', we organise +it using section headers. These allow us to easily find the section that we are +looking for, even in a large file. + +For each type defined in a file, it can be broken into sections as follows: + +```hs +-------------------- +-- === MyType === -- +-------------------- + +-- === Definition === -- +{- The definition of the type goes here -} + + +-- === API === -- +{- The API of the type goes here -} + + +-- === Instances === -- +{- Any instances for the type go here -} + +``` + +The section header must be preceded by three blank lines, while the subsection +headers (except the first) should be preceded by two blank lines. Any of these +subsections may be omitted if they don't exist, and a file may contain multiple +of these sections as relevant. + +### Auto-Formatting +While we have attempted to use haskell auto-formatters to enforce many of the +above stylistic choices in this document, none have been found to be flexible +enough for our needs. However, as tools evolve or new ones emerge, we are open +to revisiting this decision; if you know of a tool that would let us automate +the above stylistic rules, then please speak up. + +## Commenting +Comments are a tricky area to get right, as we have found that comments often +expire quickly and, in absence of a way to validate them, remain incorrect for +long periods of time. That is not to say, however, that we eschew comments +entirely. Instead, we make keeping comments up to date an integral part of our +programming practice, while also limiting the types of comments that we allow. + +When we write comments, we try to follow one general guideline. A comment should +explain _what_ and _why_, without mentioning _how_. The _how_ should be +self-explanatory from reading the code, and if you find that it is not, that is +a sign that the code in question needs refactoring. + +Code should be written in such a way that it guides you over what it does, and +comments should not be used as a crutch for badly-designed code. + +### Documentation Comments +One of the primary forms of comment that we allow across the Luna codebases is +the doc comment. These are intended to be consumed by users of the API, and use +the standard Haddock syntax. Doc comments should: + +- Provide a short one-line explanation of the object being documented. +- Provide a longer description of the object, including examples where relevant. +- Explain the arguments to a function where relevant. + +They should not reference internal implementation details, or be used to explain +choices made in the function's implementation. See [Source Notes](#source-notes) +below for how to indicate that kind of information. + +### Source Notes +Source Notes is a mechanism for moving detailed design information about a piece +of code out of the code itself. In doing so, it retains the key information +about the design while not impeding the flow of the code. + +Source notes are detailed comments that, like all comments, explain both the +_what_ and the _why_ of the code being described. In very rare cases, it may +include some _how_, but only to refer to why a particular method was chosen to +achieve the goals in question. + +A source note comment is broken into two parts: + +1. **Referrer:** This is a small comment left at the point where the explanation + is relevant. It takes the following form: `-- Note [Note Name]`, where + `Note Name` is a unique identifier across the codebase. These names should be + descriptive, and make sure you search for it before using it, in case it is + already in use. +2. **Source Note:** This is the comment itself, which is a large block comment + placed after the first function in which it is referred to in the module. It + uses the haskell block-comment syntax `{- ... -}`, and the first line names + the note using the same referrer as above: `{- Note [Note Name]`. The name(s) + in the note are underlined using a string of the `~` (tilde) character. + +A source note may contain sections within it where necessary. These are titled +using the following syntax: `== Note [Note Name (Section Name)]`, and can be +referred to from a referrer much as the main source note can be. + +Sometimes it is necessary to reference a source note in another module, but this +should never be done in-line. Instead, a piece of code should reference a source +note in the same module that references the other note while providing +additional context. + +An example, taken from the GHC codebase, can be seen below. + +```hs +prepareRhs :: SimplEnv -> OutExpr -> SimplM (SimplEnv, OutExpr) +-- Adds new floats to the env iff that allows us to return a good RHS +prepareRhs env (Cast rhs co) -- Note [Float Coercions] + | (ty1, _ty2) <- coercionKind co -- Do *not* do this if rhs is unlifted + , not (isUnLiftedType ty1) -- see Note [Float Coercions (Unlifted)] + = do { (env', rhs') <- makeTrivial env rhs + ; return (env', Cast rhs' co) } + + ...more equations for prepareRhs.... + +{- Note [Float Coercions] +~~~~~~~~~~~~~~~~~~~~~~~~~ +When we find the binding + x = e `cast` co +we'd like to transform it to + x' = e + x = x `cast` co -- A trivial binding +There's a chance that e will be a constructor application or function, or +something like that, so moving the coercion to the usage site may well cancel +the coercions and lead to further optimisation. + ...more stuff about coercion floating... + +== Note [Float Coercions (Unlifted)] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ...explanations of floating for unlifted types... +-} +``` + +A source note like this is useful whenever you have design decisions to explain, +but can also be used for: + +- **Formulae and Algorithms:** If your code makes use of a mathematical formula, + or algorithm, it should note where the design element came from, preferably + with a link. +- **Safety:** Sometimes it is necessary to use an unsafe API in a context where + it is trivially made safe. You should always use a source note to explain why + its usage is safe in this context. + +### TODO Comments +We follow a simple convention for `TODO` comments in our codebases: + +- The line starts with `TODO` or `FIXME`. +- It is then followed by the author's initials `[ARA]`, or for multiple people + `[ARA, WD]`, in square brackets. +- It is then followed by an explanation of what needs to be done. + +For example: + +```hs +-- TODO [ARA] This is a bit of a kludge. Instead of X it should to Y, accounting +-- for the fact that Z. +``` + +### Other Comment Usage +There are, of course, a few other situations where commenting is very useful: + +- **Commenting Out:** You may comment out code while developing it, but if you + commit any commented out code, it should be accompanied by an explanation of + why said code can't just be deleted. +- **Bugs:** You can use comments to indicate bugs in our code, as well as + third-party bugs. In both cases, the comment should link to the issue tracker + where the bug has been reported. + +## Program Design +Any good style guide goes beyond purely stylistic rules, and also talks about +design styles to use in code. + +### Libraries +The Luna project has many internal libraries that are useful, but we have found +that maintaining these on Hackage while they are under such active development +is counterproductive. + +Instead, libraries live in the `lib/` folder of the primary project with which +they are associated (Luna, Luna Studio, or Dataframes). These libraries may be +freely used by others of our projects by depending on a git commit of the +project that they live in. All of these are safe to use. + +#### Prologue +`Prologue` is our replacement for Haskell's `Prelude`. For the most part it is +compatible with the prelude, though it is designed with a safe API as the first +port of call. + +As a rule of thumb, if the prelude exports a partial function, that function has +been made total in Prologue. This usually takes the form of returning `Maybe`, +rather than throwing an error (e.g. `head :: [a] -> Maybe a`). In the case where +a function has been redefined like this, the original version is available using +an unsafe name (e.g. `unsafeHead` in the case above). + +Prologue also exports additional useful functionality from across the Haskell +ecosystem, such as utilities for working with Lenses and for writing type-level +computation. + +It is highly recommended that you scan the code of Prologue. + +#### Safety +It is incredibly important that we can trust the code that we use, and hence we +tend to disallow the definition of unsafe functions in our public API. When +defining an unsafe function, you must account for the following: + +- It must be named `unsafeX`. +- Unsafe functions should only be used in the minimal scope in which it can be + shown correct, not in larger pieces of code. +- Unsafe function definition must be accompanied by a source note explaining why + it is not defined safely (e.g. performance). +- Unsafe function usage must be accompanied by a source note explaining why this + usage of it is safe. + +Furthermore, we do not allow for code containing pattern matches that can fail. + +#### Control.Monad.Exception +We have our own exception framework based on `ExceptT` that encodes exception +usage at the type level. This ensures that all synchronous exceptions must be +dealt with. + +It is defined in [`lib/exception/`](https://github.com/luna/luna/tree/master/lib/exception) +and contains utilities for declaring that a function throws an exception, as +well as throwing and catching exceptions. + +The primary part of this API is the `Throws` constraint, which can be passed +both a single exception type or a list of exceptions. It is a monadic exception +framework. + +```hs +myFunction :: Throws '[MyErrorOne, MyErrorTwo] m => ArgType -> m ReturnType +``` + +We encourage our programmers to define their own exception types, and when doing +so they should use the following guidelines: + +- We name them using 'Error' rather than 'Exception', so `MyError`, rather than + `MyException`. +- We always provide an instance of `Exception` for our exception type. +- We avoid encoding error information as strings, instead passing a strongly + typed representation of the problem around. This often means that we end up + re-wrapping an error thrown inside our function. + +### Modules +Unlike much of the Haskell ecosystem, we tend to design modules to be imported +_qualified_ rather than unqualified. This means that we have a few rules to keep +in mind: + +- When designing a module that exports a type, the module should be named after + that type. If it exports multiple types, there should be a primary type, or + the other types should be factored out into their own modules. +- We import modules as their name. If you have a module `Luna.Space.MyType`, we + import it qualified as `MyType`. +- Functions should be named with the assumption of being used qualified. This + means that we rarely refer to the module name in the function name (e.g. + `State.run` rather than `State.runState`). + +### Data Declarations +When declaring data types in the Luna codebases, please make sure to keep the +following rules of thumb in mind: + +- For single-constructor types: + + Write the definition across multiple lines. + + Always name your fields. + + Always generate lenses. + + ```hs + data Rectangle = MkRectangle + { _width :: Double + , _height :: Double + } deriving (Eq, Ord, Show) + makeLenses ''Rectangle + ``` +- For multiple-constructor data-types: + + Write the definition across multiple lines. + + Never name your fields. + + Generate prisms only when necessary. + + ```hs + data Shape + = ShapeCircle Circle + | ShapeRect Rectangle + deriving (Eq, Ord, Show) + ``` + +- Always prefer named fields over unnamed ones. You should only use unnamed + fields if one or more of the following hold: + + Your data type is one where you are are _sure_ that separate field access + will never be needed. + + You are defining a multiple-constructor data type. +- Sort deriving clauses in alphabetical order, and derive the following for your + type if logically correct: + + General Types: `Eq`, `Generic`, `NFData`, `Ord`, `Show`. + + Parametric 1-Types: `Applicative`, `Alternative`, `Functor`. + + Monads: `Monad`, `MonadFix`. + + Monad Transformers: `MonadTrans`. + +#### Lenses +The Luna codebases make significant use of Lenses, and so we have some rules for +their use: + +- Always use the `makeLenses` wrapper exported from `Prologue`. +- Always generate lenses for single-constructor data types. +- Never generate lenses for multi-constructor data types (though you may + sometimes want to generate prisms). +- Fields in data types should be named with a single underscore. +- If you have multiple types where the fields need the same name, the `Prologue` + lens wrappers will disambiguate the names for you as follows as long as you + use a double underscore in the data declaration (e.g. `__x`). + +```hs +data Vector = Vector + { __x :: Double + , __y :: Double + , __z :: Double + } deriving (Show) +makeLenses ''Vector + +data Point = Point + { __x :: Double + , __y :: Double + } deriving (Show) +makeLenses ''Point +``` + +This will generate lenses with names like `vector_x`, `vector_y`, and `point_x`, +`point_y`. + +### Testing and Benchmarking +New code should always be accompanied by tests. These can be unit, integration, +or some combination of the two, and they should always aim to test the new code +in a rigorous fashion. + +- We tend to use `HSpec`, but also make use of QuickCheck for property-based + testing. +- Tests should be declared in the project configuration so they can be trivially + run, and should use the mechanisms HSpec provides for automatic test + discovery. +- A test file should be named after the module it tests. If the module is named + `Luna.MyModule`, then the test file should be named `Luna.MyModuleSpec`. + +Any performance-critical code should also be accompanied by a set of benchmarks. +These are intended to allow us to catch performance regressions as the code +evolves, but also ensure that we have some idea of the code's performance in +general. + +- We use `Criterion` for our benchmarks. +- We measure time, but also memory usage and CPU time where possible. +- Where relevant, benchmarks may set thresholds which, when surpassed, cause the + benchmark to fail. These thresholds should be set for a release build, and not + for a development build. + +_Do not benchmark a development build_ as the data you get will often be +entirely useless. + +### Warnings, and Lints +In general, we aim for a codebase that is free of warnings and lints, and we do +this using the following ideas: + +#### Warnings +New code should introduce no new warnings onto master. You may build with +warnings on your own branch, but the code that is submitted as part of a PR +should not introduce new warnings. You should also endeavour to fix any warnings +that you come across during development. + +Sometimes it is impossible to fix a warning (e.g. TemplateHaskell generated code +often warns about unused pattern matches). In such cases, you are allowed to +suppress the warning at the module level using an `OPTIONS_GHC` pragma, but this +must be accompanied by a source note explaining _why_ the warning cannot be +fixed otherwise. + +#### Lints +We also recommend using HLint on your code as a stylistic guide, as we find that +its suggestions in general lead to more readable code. If you don't know how to +set up automatic linting for your editor, somebody will be able to help. + +An example of an anti-pattern that HLint will catch is the repeated-`$`. Instead +of `foo $ bar $ baz $ bam quux`, you should write `foo . bar. baz $ bam quux` +to use function composition. + +## Language Extensions +Much like any sophisticated Haskell codebase, Luna makes heavy use of the GHC +language extensions. We have a broad swath of extensions that are enabled by +default across our projects, and a further set which are allowed whenever +necessary. We also have a set of extensions that are allowed with care, which +must be used sparingly. + +When enabling a non-default extension, we never do it at the project or package +level. Instead, they are enabled on a file-by-file basis using a `LANGUAGE` +pragma. You may also negate default extensions, if necessary, using this same +technique. + +It should be noted that not all of the extensions listed below are available +across all compiler versions. If you are unsure whether an extension is +available to you, we recommend checking the GHC Users Guide entry for that +extension (linked from the extension's table below). + +### Default Extensions +The following language extensions are considered to be so safe, or to have such +high utility, that they are considered to be Luna's set of default extensions. +You can find said set of extensions for Luna itself defined in a +[common configuration file](https://github.com/luna/luna/blob/master/config/hpack-common.yaml). + +#### AllowAmbiguousTypes + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`AllowAmbiguousTypes`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-AllowAmbiguousTypes) | +| **Flag** | `-XAllowAmbiguousTypes` | + +This extension is particularly useful in the context of `-XTypeApplications` +where the use of type applications can disambiguate the call to an ambiguous +function. + +We often use the design pattern where a function has a free type variable not +used by any of its arguments, which is then applied via type applications. This +would not be possible without `-XAllowAmbiguousTypes`. + +#### ApplicativeDo + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`ApplicativeDo`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-ApplicativeDo) | +| **Flag** | `-XApplicativeDo` | + +This extension allows desugaring of do-notation based on applicative operations +(`<$>`, `<*>`, and `join`) as far as is possible. This will preserve the +original semantics as long as the type has an appropriate applicative instance. + +Applicative operations are often easier to optimise than monadic ones, so if +you can write a computation using applicatives please do. This is the same +reason that we prefer `pure` to `return`. + +#### BangPatterns + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`BangPatterns`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-BangPatterns) | +| **Flag** | `-XBangPatterns` | + +This extension allows for strict pattern matching, where the type being matched +against is evaluated to WHNF before the match takes place. This is very useful +in performance critical code where you want more control over strictness and +laziness. + +#### BinaryLiterals + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`BinaryLiterals`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-BinaryLiterals) | +| **Flag** | `-XBinaryLiterals` | + +This extensions allow for binary literals to be written using the `0b` prefix. +This can be very useful when writing bit-masks, and other low-level code. + +#### ConstraintKinds + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`ConstraintKinds`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-ConstraintKinds) | +| **Flag** | `-XConstraintKinds` | + +This allows any types which have kind `Constraint` to be used in contexts (in +functions, type-classes, etc). This works for class constraints, implicit +parameters, and type quality constraints. It also enables type constraint +synonyms. + +All of these are very useful. + +#### DataKinds + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DataKinds`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DataKinds) | +| **Flag** | `-XDataKinds` | + +This extension enables the promotion of data types to be kinds. All data types +are promoted to kinds and the value constructors are promoted to type +constructors. + +This is incredibly useful, and used heavily in the type-level programming that +makes the Luna codebase so expressive and yet so safe. + +#### DefaultSignatures + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DefaultSignatures`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DefaultSignatures) | +| **Flag** | `-XDefaultSignatures` | + +When you declare a default in a typeclass, it conventionally has to have exactly +the same type signature as the typeclass method. This extension lifts this +restriction to allow you to specify a more-specific signature for the default +implementation of a typeclass method. + +#### DeriveDataTypeable + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DeriveDataTypeable`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DeriveDataTypeable) | +| **Flag** | `-XDeriveDataTypeable` | + +This extension enables deriving of the special kind-polymorphic `Typeable` +typeclass. Instances of this class cannot be written by hand, and they associate +type representations with types. This is often useful for low-level programming. + +#### DeriveFoldable + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DeriveFoldable`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DeriveFoldable) | +| **Flag** | `-XDeriveFoldable` | + +This enables deriving of the `Foldable` typeclass, which represents structures +that can be folded over. This allows automated deriving for any data type with +kind `Type -> Type`. + +#### DeriveFunctor + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DeriveFunctor`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DeriveFunctor) | +| **Flag** | `-XDeriveFunctor` | + +This enables automated deriving of the `Functor` typeclass for any data type +with kind `Type -> Type`. + +#### DeriveGeneric + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DeriveGeneric`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DeriveGeneric) | +| **Flag** | `-XDeriveGeneric` | + +Enables automated deriving of the `Generic` typeclass. Generic is a typeclass +that represents the structure of data types in a generic fashion, allowing for +generic programming. + +#### DeriveTraversable + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DeriveTraversable`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DeriveTraversable) | +| **Flag** | `-XDeriveTraversable` | + +Enables automated deriving of the `Traversable` typeclass that represents types +that can be traversed. It is a valid derivation for any data type with kind +`Type -> Type`. + +#### DerivingStrategies + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DerivingStrategies`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DerivingStrategies) | +| **Flag** | `-XDerivingStrategies` | + +Under certain circumstances it can be ambiguous as to which method to use to +derive an instance of a class for a data type. This extension allows users to +manually supply the strategy by which an instance is derived. + +If it is not specified, it uses the defaulting rules as described at the above +link. + +#### DerivingVia + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DerivingVia`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DerivingVia) | +| **Flag** | `-XDerivingVia` | + +This allows deriving a class instance for a type by specifying another type of +equal runtime representation (such that there exists a Coercible instance +between the two). It is indicated by use of the `via` deriving strategy, and +requires the specification of another type (the via-type) to coerce through. + +#### DuplicateRecordFields + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`DuplicateRecordFields`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-DuplicateRecordFields) | +| **Flag** | `-XDuplicateRecordFields` | + +This extension allows definitions of records with identically named fields. This +is very useful in the context of Prologue's `makeLenses` wrapper as discussed +above in the section on [lenses](#lenses). + +#### EmptyDataDecls + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`EmptyDataDecls`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-EmptyDataDecls) | +| **Flag** | `-XEmptyDataDecls` | + +Allows the definition of data types with no value constructors. This is very +useful in conjunction with `-XDataKinds` to allow the creation of more safety +properties in types through the use of rich kinds. + +#### FlexibleContexts + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`FlexibleContexts`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-FlexibleContexts) | +| **Flag** | `-XFlexibleContexts` | + +This enables the use of complex constraints in class declarations. This means +that anything with kind `Constraint` is usable in a class declaration's context. + +#### FlexibleInstances + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`FlexibleInstances`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-FlexibleInstances) | +| **Flag** | `-XFlexibleInstances` | + +This allows the definition of typeclasses with arbitrarily-nested types in the +instance head. This, like many of these extensions, is enabled by default to +support rich type-level programming. + +#### Functional Dependencies + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`FunctionalDependencies`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-FunctionalDependencies) | +| **Flag** | `-XFunctionalDependencies` | + +Despite this extension being on the 'defaults' list, this is only for the very +rare 1% of cases where Functional Dependencies allow you to express a construct +that Type Families do not. + +You should never need to use a Functional Dependency, and if you think you do it +is likely that your code can be expressed in a far more clean manner by using +Type Families. + +#### GeneralizedNewtypeDeriving + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`GeneralizedNewtypeDeriving`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-GeneralizedNewtypeDeriving) | +| **Flag** | `-XGeneralizedNewtypeDeriving` | + +This enables the generalised deriving mechanism for `newtype` definitions. This +means that a newtype can inherit some instances from its representation. This +has been somewhat superseded by `-XDerivingVia` + +#### InstanceSigs + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`InstanceSigs`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-InstanceSigs) | +| **Flag** | `-XInstanceSigs` | + +This extension allows you to write type signatures in the instance definitions +for type classes. This signature must be identical to, or more polymorphic than, +the signature provided in the class definition. + +#### LambdaCase + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------| +| **Name** | [`LambdaCase`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-LambdaCase) | +| **Flag** | `-XLambdaCase` | + +Enables `\case` as an alternative to `case <...> of`. This often results in +much cleaner code. + +#### LiberalTypeSynonyms + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`LiberalTypeSynonyms`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-LiberalTypeSynonyms) | +| **Flag** | `-XLiberalTypeSynonyms` | + +This extension moves the type synonym validity check to _after_ the synonym is +expanded, rather than before. This makes said synonyms more useful in the +context of type-level programming constructs. + +#### MonadComprehensions + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`GeneralizedNewtypeDeriving`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-GeneralizedNewtypeDeriving) | +| **Flag** | `-XGeneralizedNewtypeDeriving` | + +Enables a generalisation of the list comprehension notation that works across +any type that is an instance of `Monad`. + +#### MultiParamTypeClasses + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`MultiParamTypeClasses`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-MultiParamTypeClasses) | +| **Flag** | `-XMultiParamTypeClasses` | + +Enables the ability to write type classes with multiple parameters. This is very +useful for type-level programming, and to express relationships between types in +typeclasses. + +#### MultiWayIf + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------| +| **Name** | [`MultiWayIf`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-MultiWayIf) | +| **Flag** | `-XMultiWayIf` | + +This extension allows GHC to accept conditional expressions with multiple +branches, using the guard-style notation familiar from function definitions. + +#### NamedWildCards +#### NegativeLiterals +#### NoImplicitPrelude + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`NoImplicitPrelude`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-NoImplicitPrelude) | +| **Flag** | `-XNoImplicitPrelude` | + +Disables the implicit import of the prelude into every module. This enables us +to use `Prologue`, our own custom prelude (discussed in the section on +[prologue](#prologue)). + +#### NumDecimals + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`NumDecimals`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-NumDecimals) | +| **Flag** | `-XNumDecimals` | + +Enables writing integer literals using exponential syntax. + +#### OverloadedLabels + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`OverloadedLabels`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-OverloadedLabels) | +| **Flag** | `-XOverloadedLabels` | + +Enables support for Overloaded Labels, a type of identifier whose type depends +both on its literal text and its kind. This is similar to `-XOverloadedStrings`. + +#### OverloadedStrings + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`OverloadedStrings`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-OverloadedStrings) | +| **Flag** | `-XOverloadedStrings` | + +Enables overloading of the native `String` type. This means that string literals +are given their type based on contextual information as, and a string literal +can be used to represent any type that is an instance of `IsString`. + +#### PatternSynonyms + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`PatternSynonyms`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-PatternSynonyms) | +| **Flag** | `-XPatternSynonyms` | + +Pattern synonyms enable giving names to parametrized pattern schemes. They can +also be thought of as abstract constructors that don’t have a bearing on data +representation. They can be unidirectional or bidirectional, and are incredibly +useful for defining clean APIs to not-so-clean data. + +#### QuasiQuotes + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`QuasiQuotes`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-QuasiQuotes) | +| **Flag** | `-XQuasiQuotes` | + +Quasi-quotation allows patterns and expressions to be written using +programmer-defined concrete syntax. This extension enables the use of quotations +in Haskell source files. + +#### RankNTypes + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------| +| **Name** | [`RankNTypes`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-RankNTypes) | +| **Flag** | `-XRankNTypes` | + +Enables the ability to express arbitrary-rank polymorphic types (those with a +`forall` which is not on the far left of the type). These are incredibly useful +for defining clean and safe APIs. + +#### RecursiveDo + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`RecursiveDo`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-RecursiveDo) | +| **Flag** | `-XRecursiveDo` | + +This extension enables recursive binding in do-notation for any monad which is +an instance of MonadFix. Bindings introduced in this context are recursively +defined, much as for an ordinary `let`-expression. + +#### ScopedTypeVariables + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`ScopedTypeVariables`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-ScopedTypeVariables) | +| **Flag** | `-XScopedTypeVariables` | + +This enables lexical scoping of type variables introduced using an explicit +`forall` in the type signature of a function. With this extension enabled, the +scope of this variables is extended to the function body. + +#### StandaloneDeriving + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`StandaloneDeriving`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-StandaloneDeriving) | +| **Flag** | `-XStandaloneDeriving` | + +Allows the creation of `deriving` declarations that are not directly associated +with the class that is being derived. This is useful in the context where you +need to create orphan instances, or to derive some non-default classes. + +#### Strict + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------| +| **Name** | [`Strict`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-Strict) | +| **Flag** | `-XStrict` | + +We have found that making our code strict-by-default allows us to reason much +more easily about its performance. When we want lazy evaluation, we use a +combination of the negation flags and lazy pattern matching to achieve our +goals. + +When disabling strict for a module using `-XNoStrict`, you also need to add +`-XNoStrictData`. + +#### StrictData + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------| +| **Name** | [`StrictData`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-StrictData) | +| **Flag** | `-XStrictData` | + +Much like the above, this helps with reasoning about performance, but needs to +be explicitly disabled in contexts where the strictness is undesirable. + +#### TemplateHaskell + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TemplateHaskell`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TemplateHaskell) | +| **Flag** | `-XTemplateHaskell` | + +Enables the usage of Template Haskell, including the syntax for splices and +quotes. TH is a meta-language that allows for generating Haskell code from +arbitrary input. + +#### TupleSections + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TupleSections`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TupleSections) | +| **Flag** | `-XTupleSections` | + +Much like we can do operator sections to partially apply operators, this +extension enables partial application of tuple constructors. + +#### TypeApplications + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TypeApplications`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TypeApplications) | +| **Flag** | `-XTypeApplications` | + +This extension allows you to use visible type application in expressions. This +allows for easily providing types that are ambiguous (or otherwise) to GHC in a +way that doesn't require writing complete type signatures. We make heavy use of +type applications in our type-level programming and API. + +These should always be used as an alternative to `Proxy`, as they are just as +useful for passing type information around without provision of data, and lead +to nice and clean APIs. + +#### TypeFamilies + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TypeFamilies`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TypeFamilies) | +| **Flag** | `-XTypeFamilies` | + +Type Families can be thought of as type-level functions, or functions on types. +They are slightly more verbose than functional dependencies, but provide much +better reusability, clearer contexts, and are far easier to compose. They should +always be used in preference to functional dependencies. + +When using Type Families, please keep the following things in mind: + +- Prefer open type families to closed type families. +- Use closed type families if you want a fall-back when checking types. + + ```hs + type family SumOf where + SumOf Vector a = Vector + SumOf a Vector = Vector + SumOf a a = a + ``` + +- Do not use closed type families unless you are absolutely sure that your type + family should not be able to be extended in the future. + +#### TypeFamilyDependencies + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TypeFamilyDependencies`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TypeFamilyDependencies) | +| **Flag** | `-XTypeFamilyDependencies` | + +This extension allows type families to be annotated with injectivity information +using syntax similar to that used for functional dependencies. This information +is used by GHC during type-checking to resolve the types of expressions that +would otherwise be ambiguous. + +#### TypeOperators + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`TypeOperators`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-TypeOperators) | +| **Flag** | `-XTypeOperators` | + +Type operators is a simple extension that allows for the definition of types +with operators as their names. Much like you can define term-level operators, +this lets you define type-level operators. This is a big boon for the +expressiveness of type-level APIs. + +#### UnicodeSyntax + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`UnicodeSyntax`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-UnicodeSyntax) | +| **Flag** | `-XUnicodeSyntax` | + +Enables unicode syntax for certain parts of the Haskell language. + +#### ViewPatterns + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`ViewPatterns`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-ViewPatterns) | +| **Flag** | `-XViewPatterns` | + +View patterns provide a mechanism for pattern matching against abstract types by +letting the programmer execute arbitrary logic as part of a pattern match. This +is very useful for the creation of clean APIs. + +### Allowed Extensions +These extensions can be used in your code without reservation, but are not +enabled by default because they may interact negatively with other parts of the +codebase. + +#### BlockArguments + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`BlockArguments`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-BlockArguments) | +| **Flag** | `-XBlockArguments` | + +Block arguments allow expressions such as `do`, `\`, `if`, `case`, and `let`, +to be used as both arguments to operators and to functions. This can often make +code more readable than it otherwise would be. + +#### GADTs + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------| +| **Name** | [`GADTs`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-GADTs) | +| **Flag** | `-XGADTs` | + +This enables Generalised Algebraic Data Types, which expand upon normal data +definitions to allow both contexts for constructors and richer return types. +When this extension is enabled, there is a new style of data declaration +available for declaring GADTs. + +#### HexFloatLiterals + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`HexFloatLiterals`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-HexFloatLiterals) | +| **Flag** | `-XHexFloatLiterals` | + +Enables the ability to specify floating point literals using hexadecimal to +ensure that no rounding or truncation takes place. + +#### MagicHash + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------| +| **Name** | [`MagicHash`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-MagicHash) | +| **Flag** | `-XMagicHash` | + +Allows the use of `#` as a postfix modifier to identifiers. This allows the +programmer to refer to the names of many of GHC's internal unboxed types for use +in surface Haskell. + +#### NumericUnderscores + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`NumericUnderscores`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-NumericUnderscores) | +| **Flag** | `-XNumericUnderscores` | + +This extension allows breaking up of long numeric literals using underscores +(e.g `1_000_000_000`), which can often aid readability. + +#### PolyKinds + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------| +| **Name** | [`PolyKinds`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-PolyKinds) | +| **Flag** | `-XPolyKinds` | + +This extension enables users to access the full power of GHC's kind system, and +allows for programming with kinds as well as types and values. It expands the +scope of kind inference to bring additional power, but is sometimes unable to +infer types at the value level as a result. + +You should only enable `-XPolyKinds` in contexts where you need the power that +it brings. + +#### Quantified Constraints + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`QuantifiedConstraints`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-QuantifiedConstraints) | +| **Flag** | `-XQuantifiedConstraints` | + +Quantified constraints bring additional expressiveness to the constraint +language used in contexts in GHC Haskell. In essence, it allows for contexts to +contain nested contexts, and hence for users to express more complex constraints +than they would otherwise be able to. + +#### RoleAnnotations + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`RoleAnnotations`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-RoleAnnotations) | +| **Flag** | `-XRoleAnnotations` | + +Role annotations allow programmers to constrain the type inference process by +specifying the roles of the class and type parameters that they declare. + +#### UnboxedSums + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`UnboxedSums`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-UnboxedSums) | +| **Flag** | `-XUnboxedSums` | + +Enables the syntax for writing anonymous, unboxed sum types. These can be very +useful for writing performance critical code, as they can be used as a standard +anonymous sum type, including in pattern matching and at the type level. + +#### UnboxedTuples + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`UnboxedTuples`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-UnboxedTuples) | +| **Flag** | `-XUnboxedTuples` | + +This extension enables the syntax and use of unboxed tuples. This can be thought +of as a dual to the above `-XunboxedSums` as it allows for the declaration and +manipulation of unboxed product types. + +### Allowed With Care +If you make use of any of these extensions in your code, you should accompany +their usage by a source note that explains why they are used. + +#### CApiFFI + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------| +| **Name** | [`CApiFFI`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-CApiFFI) | +| **Flag** | `-XCApiFFI` | + +Enables the `capi` calling convention for foreign function declarations. This +should only be used when you need to call a foreign function using the C-level +API, rather than across the platform's ABI. This enables the programmer to make +use of preprocessor macros and the like, as the call is resolved as if it was +against the C language. + +#### CPP + +| | | +|:---------|:---------------------------------------------------------------------------------------------------------| +| **Name** | [`CPP`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-CPP) | +| **Flag** | `-XCPP` | + +Enables the C preprocessor. We strongly discourage use of the preprocessor, but +it is sometimes unavoidable when you need to do purely string-based +preprocessing of a Haskell source file. It should only be used if you have _no_ +_other_ solution to your problem. + +#### PostfixOperators + +| | | +|:---------|:-----------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`PostfixOperators`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-PostfixOperators) | +| **Flag** | `-XPostfixOperators` | + +Enables the definition of left-sections for postfix operators. Please take care +if you enable this that it does not lead to unclear code. You should not export +a postfix operator from a module, as we do not condone enabling this throughout +the entire codebase. + +#### StaticPointers + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`StaticPointers`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-StaticPointers) | +| **Flag** | `-XStaticPointers` | + +Allows static resolution of a limited subset of expressions to a value at +compile time. This allows for some precomputation of values during compilation +for later lookup at runtime. While this can be useful for some low-level code, +much care must be taken when it is used. + +#### UndecidableInstances + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`UndecidableInstances`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-UndecidableInstances) | +| **Flag** | `-XUndecidableInstances` | + +This extension permits the definition of typeclass instances that could +potentially lead to non-termination of the type-checker. This is sometimes +necessary to define the instance you want, but care must be taken to ensure that +you only enable this extension when you are _sure_ that your instances are +terminating. + +#### UndecidableSuperclasses + +| | | +|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------------| +| **Name** | [`UndecidableSuperclasses`](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/glasgow_exts.html#extension-UndecidableSuperclasses) | +| **Flag** | `-XUndecidableSuperclasses` | + +Permits the definition of superclass constraints which can potentially lead to +the non-termination of the type-checker. Much like the above, this is sometimes +necessary but should only be enabled when you are _sure_ that you will not +cause the typechecker to loop. + +### Disallowed Extensions +If a language extension hasn't been listed in the above sections, then it is +considered to be disallowed throughout the Luna codebases. If you have a good +reason to want to use one of these disallowed extensions, please talk to Ara or +Wojciech to discuss its usage. + +If an extension not listed above is _implied_ by one of the extensions listed +above (e.g. `-XRankNTypes` implies `-XExplicitForall`), then the implied +extension is also considered at least as safe as the category the implying +extension is in. diff --git a/doc/scala-style-guide.md b/doc/scala-style-guide.md new file mode 100644 index 00000000000..7835b5a58ce --- /dev/null +++ b/doc/scala-style-guide.md @@ -0,0 +1,306 @@ +# Scala Style Guide +Like many style guides, this Scala style guide exists for two primary reasons. +The first is to provide guidelines that result in a consistent code style across +all of the Luna codebases, while the second is to guide people towards a style +that is expressive while still easy to read and understand. + +In general, it aims to create a set of 'zero-thought' rules in order to ease the +programmer burden; there is usually only _one way_ to lay out code correctly. + + + +- [Code Formatting](#code-formatting) + - [Naming](#naming) + - [Imports](#imports) + - [Visibility](#visibility) + - [Section Headers](#section-headers) +- [Build Tooling](#build-tooling) +- [Commenting](#commenting) + - [Documentation Comments](#documentation-comments) + - [Source Notes](#source-notes) + - [TODO Comments](#todo-comments) + - [Other Comment Usage](#other-comment-usage) +- [Program Design](#program-design) + - [Safety](#safety) + - [Testing and Benchmarking](#testing-and-benchmarking) + - [Warnings, and Lints](#warnings-and-lints) + + + +## Code Formatting +This section explains the rules for visually laying out your code. They provide +a robust set of guidelines for creating a consistent visual to the code. + +Primary formatting is dealt with through use of the Scala formatting tool +[`scalafmt`](https://scalameta.org/scalafmt/), which enforces rules around +whitespace, line-wrapping, and alignment. The Luna repository contains the main +[`.scalafmt.conf`](../.scalafmt.conf) configuration file, and this is what +should be used for all new Scala projects. + +All files must be formatted using `scalafmt` before commit, and this should be +set up as either a precommit hook, or using the integration in IntelliJ. If you +use the IntelliJ integration, please note that you need only have the official +[Scala Plugin](https://www.jetbrains.com/help/idea/discover-intellij-idea-for-scala.html) installed, and be using IntelliJ 2019.1 +or later. You should _not_ use the independent Scalafmt plugin. + +### Naming +Luna has some fairly simple general naming conventions, though the sections +below may provide more rules for use in specific cases. + +- Types are written using `UpperCamelCase`. +- Variables and function names are written using `camelCase`. +- If a name contains an initialism or acronym, all parts of that initialism + should be of the same case: `httpRequest` or `makeHTTPRequest`. +- Short variable names such as `a` and `b` should only be used in contexts where + there is no other appropriate name, and should _never_ be used to refer to + temporary data in a function. +- Names should be descriptive, even if this makes them longer. + +### Imports +Organisation of imports is simple, and should be ordered alphabetically within +each of the following sections. Each section should be separated from the one +above using a blank line. + +1. **Standard Library:** Any imports from the standard library. +2. **Java Standard Library:** Any imports from the Java standard library. +3. **Additional Dependencies:** Imports from project dependencies. + +In general, we prefer not to import unqualified into the package scope, as this +just leads to additional clutter. + +### Visibility +There is nothing more frustrating than needing to use a function that hasn't +been exported from a package. To this end, we strongly discourage making things +private or protected in our codebase. + +If, however, you want to indicate that something is for internal use, you use +one of the following two methods. + +1. **Nested Types:** Declaration of inner types called `Internal`. +2. **Internal Packages:** For a package named `com.luna-lang.package` that + contains `MyType`, we can define internal functions and data-types in a + package named `com.luna-lang.package.mytype`. This means that these functions + can be imported by clients of the API if they need to, but that we provide no + guarantees about API stability when using those functions. + +### Section Headers +In order to visually break up the code for easier 'visual grepping', we organise +it using section headers. These allow us to easily find the section that we are +looking for, even in a large file. + +For each Scala type, within the body of the type, we organise functions as +follows: + +```hs +-- === Definition === -- +{- The definition of the type goes here -} + + +-- === API === -- +{- The API of the type goes here -} + + +-- === Instances === -- +{- Any instances for the type go here -} + +``` + +The section header must be preceded by three blank lines, while the subsection +headers (except the first) should be preceded by two blank lines. Any of these +subsections may be omitted if they don't exist. + +## Build Tooling +All Scala projects in the Luna organisation should manage their dependencies and +build setup using [SBT](hhttps://www.scala-sbt.org/1.x/docs/index.html). + +If you are using IntelliJ, please ensure that you select to use the SBT shell +for both imports and builds. + +## Commenting +Comments are a tricky area to get right, as we have found that comments often +expire quickly and, in absence of a way to validate them, remain incorrect for +long periods of time. That is not to say, however, that we eschew comments +entirely. Instead, we make keeping comments up to date an integral part of our +programming practice, while also limiting the types of comments that we allow. + +When we write comments, we try to follow one general guideline. A comment should +explain _what_ and _why_, without mentioning _how_. The _how_ should be +self-explanatory from reading the code, and if you find that it is not, that is +a sign that the code in question needs refactoring. + +Code should be written in such a way that it guides you over what it does, and +comments should not be used as a crutch for badly-designed code. + +### Documentation Comments +One of the primary forms of comment that we allow across the Luna codebases is +the doc comment. These are intended to be consumed by users of the API, and use +the standard [scaladoc](https://docs.scala-lang.org/style/scaladoc.html) syntax. +Doc comments should: + +- Provide a short one-line explanation of the object being documented. +- Provide a longer description of the object, including examples where relevant. +- Explain the arguments to a function where relevant. + +They should not reference internal implementation details, or be used to explain +choices made in the function's implementation. See [Source Notes](#source-notes) +below for how to indicate that kind of information. + +### Source Notes +Source Notes is a mechanism for moving detailed design information about a piece +of code out of the code itself. In doing so, it retains the key information +about the design while not impeding the flow of the code. + +Source notes are detailed comments that, like all comments, explain both the +_what_ and the _why_ of the code being described. In very rare cases, it may +include some _how_, but only to refer to why a particular method was chosen to +achieve the goals in question. + +A source note comment is broken into two parts: + +1. **Referrer:** This is a small comment left at the point where the explanation + is relevant. It takes the following form: `// Note [Note Name]`, where + `Note Name` is a unique identifier across the codebase. These names should be + descriptive, and make sure you search for it before using it, in case it is + already in use. +2. **Source Note:** This is the comment itself, which is a large block comment + placed after the first function in which it is referred to in the module. It + uses the scala block-comment syntax `/* ... */`, and the first line names + the note using the same referrer as above: `/* Note [Note Name]`. The name(s) + in the note are underlined using a string of the `~` (tilde) character. + +A source note may contain sections within it where necessary. These are titled +using the following syntax: `== Note [Note Name (Section Name)]`, and can be +referred to from a referrer much as the main source note can be. + +Sometimes it is necessary to reference a source note in another module, but this +should never be done in-line. Instead, a piece of code should reference a source +note in the same module that references the other note while providing +additional context to that reference. + +An example, based on some code in the GHC codebase, can be seen below: + +```scala +{ +def prepRHS (env : SimplEnv, outExpr : OutExpr) : SimplM (SimplEnv, OutExpr) = { + (ty1, _ty2) <- coercionKind env // Note [Float Coercions] + + if (!isUnliftedType ty1) { + newTy1 = convertTy ty1 // Note [Float Coercions (Unlifted)] + + ...more expressions defining prepRHS... + } +} + +/* Note [Float Coercions] +~~~~~~~~~~~~~~~~~~~~~~~~~ +When we find the binding + x = cast(e, co) +we'd like to transform it to + x' = e + x = cast(x, co) // A trivial binding +There's a chance that e will be a constructor application or function, or +something like that, so moving the coercion to the usage site may well cancel +the coercions and lead to further optimisation. + ...more stuff about coercion floating... + +== Note [Float Coercions (Unlifted)] +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + ...explanations of floating for unlifted types... +*/ +} +``` + +A source note like this is useful whenever you have design decisions to explain, +but can also be used for: + +- **Formulae and Algorithms:** If your code makes use of a mathematical formula, + or algorithm, it should note where the design element came from, preferably + with a link. +- **Safety:** Sometimes it is necessary to use an unsafe API in a context where + it is trivially made safe. You should always use a source note to explain why + its usage is safe in this context. + +### TODO Comments +We follow a simple convention for `TODO` comments in our codebases: + +- The line starts with `TODO` or `FIXME`. +- It is then followed by the author's initials `[ARA]`, or for multiple people + `[ARA, WD]`, in square brackets. +- It is then followed by an explanation of what needs to be done. + +For example: + +```scala +{ +// TODO [ARA] This is a bit of a kludge. Instead of X it should to Y, accounting +// for the fact that Z. +} +``` + +### Other Comment Usage +There are, of course, a few other situations where commenting is very useful: + +- **Commenting Out:** You may comment out code while developing it, but if you + commit any commented out code, it should be accompanied by an explanation of + why said code can't just be deleted. +- **Bugs:** You can use comments to indicate bugs in our code, as well as + third-party bugs. In both cases, the comment should link to the issue tracker + where the bug has been reported. + +## Program Design +Any good style guide goes beyond purely stylistic rules, and also talks about +design styles to use in code. + +### Safety +It is incredibly important that we can trust the code that we use, and hence we +tend to disallow the definition of unsafe functions in our public API. When +defining an unsafe function, you must account for the following: + +- It must be named `unsafeX`. +- Unsafe functions should only be used in the minimal scope in which it can be + shown correct, not in larger pieces of code. +- Unsafe function definition must be accompanied by a source note explaining why + it is not defined safely (e.g. performance). +- Unsafe function usage must be accompanied by a source note explaining why this + usage of it is safe. + +Furthermore, we do not allow for code containing pattern matches that can fail. + +### Testing and Benchmarking +New code should always be accompanied by tests. These can be unit, integration, +or some combination of the two, and they should always aim to test the new code +in a rigorous fashion. + +- We tend to use ScalaTest, but also make use of ScalaCheck for property-based + testing. +- Tests should be declared in the project configuration so they can be trivially + run. +- A test file should be named after the module it tests. + +Any performance-critical code should also be accompanied by a set of benchmarks. +These are intended to allow us to catch performance regressions as the code +evolves, but also ensure that we have some idea of the code's performance in +general. + +- We use Caliper for our benchmarks. +- We measure time, but also memory usage and CPU time where possible. +- Where relevant, benchmarks may set thresholds which, when surpassed, cause the + benchmark to fail. These thresholds should be set for a release build, and not + for a development build. + +_Do not benchmark a development build_ as the data you get will often be +entirely useless. + +### Warnings, and Lints +In general, we aim for a codebase that is free of warnings and lints, and we do +this using the following ideas: + +#### Warnings +New code should introduce no new warnings onto master. You may build with +warnings on your own branch, but the code that is submitted as part of a PR +should not introduce new warnings. You should also endeavour to fix any warnings +that you come across during development. + +Sometimes it is impossible to fix a warning (often in situations involving the +use of macros). In such cases, you are allowed to suppress the warning locally, +but this must be accompanied by a source note explaining why.