- As asked for by @hubertp who was encountering flaky test failures on CI in the Http_Spec and related ones, I'm adding retry logic to make such cases much less likely.
- I've made the test server randomly fail 50% of tests and with the retry logic the tests are still passing, so I think that should be much more robust, in practice the failure rate is much much less (I imagine <1% as most of the time these tests were working and we do a ton of requests in a single CI run).
- I move the `with_retries` method to now be `Test.with_retries` which can be used anywhere in our tests for the retry logic.
- It sleeps for 0.1s between retries. Not all kinds of tests need it, this was mostly for propagation delays in the Cloud in our tests. I was thinking if the delay should be configurable, but I think the 0.1s delay is not problematic and if our tests are sometimes failing due to high machine load, the delay could also help.
- This _does not_ add retry logic to raw HTTP operations or `Data.fetch`. We may add that later, but that needs some further design. In such case we may remove some retries from tests if they become unnecessary.
`Jackson_Object` supported parsing but not creating JSON from text. With this change, `Jackson_Object` is on par with `JS_Object` API and replaces the latter.
The most visible differences come from more detailed parsing exception's messages. Had to add some special cases for corner cases like `NaN` or infinity.
Closes#9473.
`42 == (Error.throw "foo")` now correctly returns an `Error` rather than False
# Important Notes
The error was in the wrong usage of the `org.enso.interpreter.dsl.AcceptsError` DSL annotation.
Move the types from `Standard.Table.Data` to `Standard.Table`.
Exceptions:
- `Standard.Table.Data.Report_Unmatched` => `Standard.Table.Constants`.
- `Standard.Table.Data.Join_Kind_Cross` => `Standard.Table.Internal.Join_Kind_Cross`.
Also removed constructor as an atom type.
- `Standard.Table.Extensions.Table_Ref` => `Standard.Table.Internal.Table_Ref`.
- `Standard.Table.Data.Type.Value_Type_Helpers` => `Standard.Table.Internal.Value_Type_Helpers`.
- `Standard.Table.Data.Type.Enso_Types` => `Standard.Table.Internal.Value_Type_Helpers`.
- `Standard.Table.Data.Type.Storage` => `Standard.Table.Internal.Storage`.
Changed all `Standard.Table` imports inside project to be project.
Favoured importing from `Standard.Table.Main` in `Standard.Database`.
Also fixed some linting in Enso_File.
- Closes#9284
- Now our tests run without the default `AWS_` config, thus ensuring that the tested setups work in a clean environment.
- After all, more complicated logic was needed for buckets access - apparently the AWS SDK only allows for some operations on buckets to happen if the client is connected to the correct region. Thus detection of bucket regions had to be implemented.
- Added `AWS_Region` widget based on autoscoping.
- Fixed `AWS_Credential.profile_names` crashing if no AWS config was found. Now it returns no profiles if not found. Added a regression test.
Make expand_to_rows work for Table and Column
Make expand_column work for Column and Row
Makes solving the book club exercise easier
![image](https://github.com/enso-org/enso/assets/1720119/4532fc38-8765-43d9-b6a0-61254f52239f)
# Important Notes
We decided expand_column did not make sense for Table as the resulting Table of Columns would rarely be what was wanted.
* Initial connection to Snowflake via an account, username and password.
* Fix databases and schemas in Snowflake.
Add warehouses.
* Add warehouse.
Update schema dropdowns.
* Add ability to set warehouse and pass at connect.
* Fix for NPE in license review
* scalafmt
* Separate Snowflake from Database.
* Scala fmt.
* Legal Review
* Avoid using ARROW for snowflake.
* Tidy up Entity_Naming_Properties.
* Fix for separating Entity_Namimg_Properties.
* Allow some tweaking of Postgres dialect to allow snowflake to use as well.
* Working on reading Date, Time and Date Times.
* Changelog.
* Java format.
* Make Snowflake Time and TimeStamp stuff work.
Move some responsibilities to Type_Mapping.
* Make Snowflake Time and TimeStamp stuff work.
Move some responsibilities to Type_Mapping.
* fix
* Update distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
* PR comments.
* Last refactor for PR.
* Fix.
---------
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
- Closes#9300
- Now the Enso libraries are themselves capable of refreshing the access token, thus there is no more problems if the token expires during a long running workflow.
- Adds `get_optional_field` sibling to `get_required_field` for more unified parsing of JSON responses from the Cloud.
- Adds `expected_type` that checks the type of extracted fields. This way, if the response is malformed we get a nice Enso Cloud error telling us what is wrong with the payload instead of a `Type_Error` later down the line.
- Fixes `Test.expect_panic_with` to actually catch only panics. Before it used to also handle dataflow errors - but these have `.should_fail_with` instead. We should distinguish these scenarios.
Table.from_union creates a new table when passed in a vector of tables. This is especially helpful when a grouped method is run multiple times, as it can create a unified result set.
- Fix `Excel_Workbook.sheet` and add a test.
- Add icon for `Table.row_count` and `DB_Table.row_count`.
- Make `join_kind` widget `Display.Always`.
- Add expression as an option to `Aggregate_Column`.
- Add `Simple_Calculation.Copy` to create a copy.
- Add defaults to `Simple_Expression` so less errory.
- Set period to default to day for `date_diff` allowing use in expressions.
- Add `Text_Left`, `Text_Right`, `Text_Length` and `Format` to `Simple_Expression`.
Follow up on #9150 - making sure that Arrow builder is not accidentally treated as an Array by disallowing reading elements.
# Important Notes
Also making sure that the length of the resulting Arrow Array is consistent with what user requested.
This is a first naïve implementation of Table.running that only supports count. Adding it as a scaffold for the rest of the functionality and to give us a place to agree on the API. (Which I changed slightly from the design)
![image](https://github.com/enso-org/enso/assets/1720119/a62a83ed-f864-4295-98ea-1007f62381b1)
# Important Notes
Only supports Statistic.Count. Other functionality to follow.
- Adds the Excel format as one of the formats supported when creating a data link.
- The data link can choose to read the file as a workbook, or read a sheet or range from it as a table, like `Excel_Format`.
- Also updated Delimited format dialog to allow customizing the quote style.
Including arrow language in the distribution by default. Added a basic example for creating an Arrow array.
Making sure that memory layout agrees with Arrow specification (padding, continuous allocation of memory chunks).
Related to #9118.
This should unblock work on allowing serialization/deserialization to/from Parquet but I'd like to delay it to a follow up ticket as it is going to be a significant amount of specialized work.
- Implements the core parts of #9048
- Currently the path resolution is done by resolving each segment, one by one - requiring as many API calls as there are segments in the path.
- This should be replaced in a followup PR, once https://github.com/enso-org/cloud-v2/issues/899 is implemented.
- Added `to_table` extensions on some core types.
- Added ICON to `Any.to`.
- Added ICON to `Column.info`, `Table.info`, `DB_Column.info` and `DB_Table.info`.
- Added defaults to `Table.cross_tab` and `DB_Table.cross_tab`.
- Added `name`, `get`, `at`, `inner_xml` and `outer_xml` to `XML_Document`.
- Added constants into left hand side of simple expressions.
- Added widget to `get` and `at` on `XML_Document` and `XML_Element`. (Some bug in annotation code with Dmitry)
- Altered `get` and `at` to not allow XPath and just get direct child/attribute values.
- Added `get_xpath` to `XML_Document`.
- Renamed `get_elements_by_tag_name` to `get_descendants_by_tag_name` and added new `get_children_by_tag_name`.
- Added `child_names` and `attribute_names` to `XML_Document` and `XML_Element`.
- 4 weeks ago we discussed with @jdunkerley we may want to move this module.
- `Data` is a submodule mostly for various data-structures etc., the cloud stuff did not fit that well in there.
* Test describing the current behavior of chained if then else application
* Chained block should behave just like Group around if_then_else
* Finishing line on BlockStart fixes if_then_else_chained_block
* Only finish the line when there was not start of a macro segment
* Fix tests
* Refine else-body with macro patterns.
* Update test syntax to maintain original semantics
* Few additional tests
---------
Co-authored-by: Kaz Wesley <kaz@lambdaverse.org>
Simplify the `Test.Suite.run_with_filter` to accept a single filter parameter that searches for all the groups and specs that matches that filter. This filter can be a simple text provided from the command line.
# Important Notes
- Pending groups are now printed at the end of the run
- `Test.Suite.run_with_filter` is simplified to accept a single filter parameter that is either `Text` or `Nothing`. See the docs.
- Passing a filter from the command line is therefore straightforward, it is treated as a regex.
- For convenience, I have left all the `main` methods in all the test sources. I have just refactored them to accept the `filter` argument from the command line.
- For example, to run only a single spec from `Vector_Spec.enso`, invoke `enso --run test/Base_Tests/src/Data/Vector_Spec.enso "should allow vector creation with a programmatic constructor"`
- **Majority of the PR is a regex replace** of `^main =` for `main filter=Nothing =` and of `suite.run_with_filter` for `suite.run_with_filter filter`.
- **Fixed some internal engine bugs:**
- `AtomWithHole` allows to specify only one hole - https://github.com/enso-org/enso/pull/9065/files#diff-0f7bb7e85cf86a965de133aa7e6b5958ceb889bd1921c01e00d3a9ceb19626ef
- NaN keys in hash maps are handled in polyglot maps as well - c5257f6c2b78f893214ff67300893b593ea05e21..db4b3c0e9828ee79208d52e02586b24bb845b0d6
- Added `regex` function to `Standard.Base` as a way to easily and cleanly make regular expressions.
- Added `expr` and `Expression.Value` to distinguish expressions from text values.
- Fixed issues with `Table.join` widget so dropdown exists.
- Needed fully qualified name.
- Added default empty text values for right column to provided text input boxes.
- Deprecate `Table.filter_by_expression` and allow `Table.filter` to take an `Expression`.
- Added `Simple_Expression` and deprecated `Column_Operation`. Changes the order so takes a column then a calculation.
- Rename `column` to `value` and `new_name` to `as` on `Table.set`.
- Rename `name_column` to `names` on `Table.cross_tab`.
- Removed `Column_Ref.Expression` in favour of using `Expression.Value`.
In order to allow clever masking, slicing, filtering and arrow backing stores...
- Adding ColumnStorage interface with the base API a storage will need.
- Refactored each of the unary operations to a new `UnaryOperation` interface which makes them responsible for deciding if they can be executed.
Cleaning up some of the structures in Storage before working on UnaryOperations.
- Removed some legacy code: `countMask`, `Index` and `DefaultIndex`.
- Renamed `mask` to `applyFilter` on `Column` and `Storage`.
- Renamed `Table.mask` to `Table.filter`.
* Use glob pattern to discover stdlib tests (rather than a hardcoded list).
* Don't fail CI check immediately after failing Scala test.
* Remove meta test suite tests.
# Important Notes
The meta test suite tests are removed following the discussion with @radeusgd. In short, these were failing anyway and were supposed to be rewritten (probably using a different technology, like JUnit). The current code will be a useful reference but it doesn't have to be kept on a repository head. The relevant information and references shall be added to the task.
Disables two Order_By tests that fail when the Postgres Unicode collation locale is not en_GB.UTF8. Further research would be needed to figure out exactly how to handle locale-specific collation.
Follow-up of #8890
Refactor the rest of the tests to the builder API (`Test_New`):
- `Image_Tests`
- `Geo_Tests`
- `Google_Api_Test`
- `Examples_Test`
- `AWS_Tests`
- `Meta_Test_Suite_Tests`
- `Visualization_Tests`
# Important Notes
- Unrelated: Fix NPE in `File.new "/" . name`
Fixes teardown of SQLite spec. There used to be only `connection.close`, but we also have to call `connection.drop_table` for every created table.
This causes problems only in `[SQLite File]` tests. These are backed by a sqlite file and some tables are persistent in that table. It is possible, that before tests are run, this file is non-empty and contains garbage from previous runs.
### Important Notes
Should fix https://github.com/enso-org/enso/actions/runs/7724599547/job/21057063900#step:10:7243 that was triggered when `Table_Tests` were run on a non-clean runner.
---------
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
- ✅Linting fixes and groups.
- ✅Add `File.from that:Text` and use `File` conversions instead of taking both `File` and `Text` and calling `File.new`.
- ✅Align Unix Epoc with the UTC timezone and add converting from long value to `Date_Time` using it.
- ❌Add simple first logging API allowing writing to log messages from Enso.
- ✅Fix minor style issue where a test type had a empty constructor.
- ❌Added a `long` based array builder.
- Added `File_By_Line` to read a file line by line.
- Added "fast" JSON parser based off Jackson.
- ✅Altered range `to_vector` to be a proxy Vector.
- ✅Added `at` and `get` to `Database.Column`.
- ✅Added `get` to `Table.Column`.
- ✅Added ability to expand `Vector`, `Array` `Range`, `Date_Range` to columns.
- ✅Altered so `expand_to_column` default column name will be the same as the input column (i.e. no `Value` suffix).
- ✅Added ability to expand `Map`, `JS_Object` and `Jackson_Object` to rows with two columns coming out (and extra key column).
- ✅ Fixed bug where couldn't use integer index to expand to rows.
Refactor `Base_Tests` to `Test_New` testing framework. Mostly automatic text replacements.
# Important Notes
List of changes that were not done automatically (not via automatic text replacement):
- Fix indexes in Instrumentor_Spec - f590c4a398
- If group or spec is pending, its block is not evaluated - 8d797f1a4a
- Spec_Result is not private - 5767535af2
Tests marked as *pending*:
- #8913
- #8910
- Closes#8808 - adds tests for various scenarios.
- Implements `size` using HEAD.
- Updates existing functions to changes in Cloud API.
- Adds stubs for `*_time` methods, `parent`, `path`.
- [x] TODO: resolve the `Enso_File.current_working_directory` from an environment variable.
- ~~TODO: recursive directory deletion?~~ left for later
# Important Notes
- Currently, the Cloud API does not offer an easy way to extract metadata for a file, in particular to get the parent folder from the file `id`.
- We should be able to get the parent, and stuff like creation/modified time.
- We need a way to resolve paths to asset ids, for `path` to work as well as `current_working_directory`.
- What is the environment variable that will be used to feed the `current_working_directory` property?
Refactor `test/Table_Test` to the builder API. The builder API is in a new library called `Test_New` that is alongside the old `Test` library. There will be follow-up PRs that will migrate the rest of the tests. Meanwhile, let's keep these two libraries, and merge them after the last PR.
# Important Notes
- For a brief introduction into the new API, see **Prototype 1** section in https://github.com/enso-org/enso/pull/8622#issuecomment-1889706168
- When executing all the tests, the behavior should be the same as with the old library. With the only exception that if `ENSO_TEST_ANSI_COLORS` env var is set, the output is more colorful than it used to be.
Goal of this PR is to refactor the design of OrderMask and avoid copying arrays or lists wherever possible.
We have removed a few legacy functions which were not being used.
On a poor mans benchmark seems to be quicker (13s vs 16s) and memory usage should be lower.
Adds new benchmark joining a large table to a small table in preparation for a coming optimisation that will only index the smaller of the 2 tables in #8342
- Closes#8723
- Adds some missing features that were needed to make this work:
- `Enso_File.create_directory` and `Enso_File.delete`, and basic tests for it
- Changes how `Enso_Secret.list` is obtained - using a different Cloud endpoint allows us to implement the desired logic, the default endpoint was giving us _all_ secrets which was not what we wanted here.
- Implements `Enso_Secret.update` and tests for it
# Important Notes
Notes describing any problems with the current Cloud API:
https://docs.google.com/document/d/1x8RUt3KkwyhlxGux7XUGfOdtFSAZV3fI9lSSqQ3XsXk/edit
Apparently, everything that was needed to make this feature work has already been implemented, although a few features needed workarounds on Enso side to work properly.
- `up_to` should have the `step` as an optional argument.
- `Date_Range` conversion can do a clever auto rename, so if a period use the name of it.
- Add `to Table` for `Range`, `Date_Range`.
Implements `Warnings.get_all wrap_errors=True` which wraps warnings attached to values inside vectors with `Map_Error`, which includes the position of the value within the vector. See [the documentation](https://github.com/enso-org/enso/blob/develop/docs/semantics/wrapped-errors.md) for more details.
`get_all wrap_errors=True` does not change the warnings that are attached to values -- it wraps them before returning them to the caller, but does not change the original warnings attached to the values.
Wrapped warnings only appear attached to the vector itself. The values inside the vector do not have their warnings wrapped.
Warning propagation is not changed at all; `Warnings.get_all` (with default `wrap_errors=False`) behaves as before. `get_all wrap_errors=True` is meant to be used primarily by the IDE, although it can be used anywhere this wrapping is desired.
This is the follow up PR addressing the last couple of points from https://github.com/enso-org/enso/pull/8643 around what the return type from fill_nothing.
# Important Notes
The biggest change is changing what we size we need for an empty string. This change says a variable length string of length 1 and does it at a low enough level that it will effect the whole language. But I think that is correct.
* Add new test for required behaviour
* Handle case where strArg is an empty string
* More tests around fixed width field. Remove unneeded duplicate logic
* javafmtAll
* Further simplification
* SQLite doesn't have full type system
* SQLite doesn't have full type system
Give file read its own helper widget for delimiters. Remove newline add none. The file read delimiter is similar but different to the split one and so should have its own set of options.
- Closes#8555
- Refactors the file format detection logic, compacting lots of repetitive logic for HTTP handling into helper functions.
- Some updates to CODEOWNERS.
- After [suggestion](https://github.com/enso-org/enso/pull/8497#discussion_r1429543815) from @JaroslavTulach I have tried reimplementing the URL encoding using just `URLEncode` builtin util. I will see if this does not complicate other followup improvements, but most likely all should work so we should be able to get rid of the unnecessary bloat.
- Closes#8354
- Extends `simple-httpbin` with a simple mock of the Cloud API (currently it checks the token and serves the `/users` endpoint).
- Renames `simple-httpbin` to `http-test-helper`.
- Closes#8352
- ~~Proposed fix for #8493~~
- The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.
- Closes#8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient).
- If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`.
- Adds tests fetching Table formats from HTTP.
- Extends `simple-httpbin` with ability to serve files for our tests.
- Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers).
- Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.
* tests
* wip
* wip
* additional warnings
* wip
* wip
* cleanup
* nested wrapping
* multiple nestings
* wraps_error uses looks_for, test for should_fail_with
* wip
* stack trace line fix
* use catch_primitive internally
* fix warning mapping, dtf spec
* just one wrapper checker, vector spec
* missing ctor, back to non-primitive catch
* back to c_p
* put old map back
* wip
* unnest tests
* Array.map on_problems
* wip
* Revert "wip"
This reverts commit c30d171457.
* better test names
* warning logging
* wip
* wip
* move logic into ALH
* doc
* constant
* My_Error.Error
* nested
* doc
* map_primtiive in warning mapper
* composition
* ref spec
* Remove warnings prior to matching on the value
If an expression has warnings and is matched we:
1) extract the warnings
2) execute the branch of a pattern that matches the value
3) attach extracted warnings to the result
This caused warnings to reappear when doing the custom warnings
manipulation.
This is also consistent with how `CaseNode`'s `doWarning` specialization
is defined.
* fix 1
* do not auto unwrap in test error checkers
* nested error matcher
* in problems too
* dtf
* v
* statistics
* wip
* Table_Spec, map_with_index_primitive
* Column_Operations_Spec
* disable warning wrapping and Report_Warning
* unimpl test
* Warnings_Spec
* DCS
* ACG JP
* zip_primitive
* join_helpers
* Lookup_Helpers
* Table
* Data_Formatter
* Value_Type_Helpers
* revert check types changes
* table_helpers
* table tests
* remove st
* do not remove warnings from value
* vec docs, tests for zip, mwi, flat_map
* docs, fixes
* remove nested_error_matcher
* cleanup
* benchmark
* one error
* alter
* add bench to main
* review
* review
* review
* tail call
* changelog
* tail call was not a tail call
* ws
* bad import
* Added missing import
* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Array.enso
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
* review, ref example
* lazy benchmark data
* extra paren
* check outside of catch
* review
* vector too
* actually lazy
* disambiguate Map_Error
* finish rename
* move to extensions
* combine Additional_Warnings error
* rename to map_no_wrap
* do not catch and rethrow
* review
* wip
* remove _primitives entirely
* remove unused should_fail_with function options
* remove expected_warning as function in Problems
---------
Co-authored-by: Hubert Plociniczak <hubert.plociniczak@gmail.com>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Adds these JAR modules to the `component` directory inside Engine distribution:
- `graal-language-23.1.0`
- `org.bouncycastle.*` - these need to be added for graalpy language
# Important Notes
- Remove `org.bouncycastle.*` packages from `runtime.jar` fat jar.
- Make sure that the `./run` script preinstalls GraalPy standalone distribution before starting engine tests
- Note that using `python -m venv` is only possible from standalone distribution, we cannot distribute `graalpython-launcher`.
- Make sure that installation of `numpy` and its polyglot execution example works.
- Convert `Text` to `TruffleString` before passing to GraalPy - 8ee9a2816f
Fixes#5233 by removing `EconomicMap` & co. and using plain old good _linear hashing_. Fixes#8090 by introducing `StorageEntry.removed()` rather than copying the builder on each removal.
With herein proposed change one can pass an optional filter to `enso --run test/Benchmarks` to execute only groups and specs that contain given string in its name.
While trying to speed `MetadataStorage` up - #8324 - I felt the need to have an independent (on my computer) measurement of startup time. Here is a benchmark that measures how long execution of two simple hello world programs take.
# Important Notes
There are two benchmarks:
- `empty_startup` measures the time needed to boot without using any `Standard` library - basically _an overhead of the JVM and engine_
- `hello_world_startup` measures the use of `IO.println` - it shall take longer than `empty_startup` and show the overhead we have while processing the standard library
- Add a `File_For_Read` type. Used for `File_Format` to read files.
- Added `Enso_User` representing the current user in `Enso_Cloud`.
- *Will be later able to list known users.*
- Added `Enso_Secret` representing a value defined in `Enso_Cloud`.
- Value not used within Enso only accessed within polyglot Java.
- Integrated into `Username_And_Password` and can be used within JDBC connections.
- Integrated into HTTP Headers so a secret can be used as a value.
- New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value.
- *Will be integrated with AWS credentials.*
- Added `Enso_File` representing a file or a folder in the cloud.
- Support the same API as `File` (like the `S3_File`).
- *Will support `enso://` URI style access.*
Upgrade to GraalVM JDK 21.
```
> java -version
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing)
```
With SDKMan, download with `sdk install java 21-graalce`.
# Important Notes
- After this PR, one can theoretically run enso with any JRE with version at least 21.
- Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution.
- `project-manager` remains backward compatible - it can open older engines with runtimes. New engines now do no longer require a separate runtime to be downloaded.
- sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368
- Which means that we can have `module-info.java` files only for Java-only projects.
- Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar.
- `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent.
- Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory.
- The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module.
- The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725)
- [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.
- Closes#5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes#8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes#8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
- Sets the default limit for `Table.read` in Database to be max 1000 rows.
- The limit for in-memory compatible API still defaults to `Nothing`.
- Adds a warning if there are more rows than limit.
- Enables a few unrelated asserts.
* doc
* one test
* date tests
* empty and nothing
* ints floats
* bools
* all columns
* regex and index
* locales
* bad formats
* all with one format
* docs
* examples, not impl db
* docs, more errors
* cleanup
* changelog
* check list
* reorder
* clue
* review
* review
* review
* review
* review
* review
* specify time zone
Added Blank_Selector constructor and applied to remove_blank_columns, select_blank_columns, filter_blank_rows for #7931 . Changed when_any to when for readability.
Previously, constant columns were given generated names with UUIDs in them, which are long and provide no information. Instead, we now use the constant value itself to form the name.
Since these new generated names are less unique, we must explicitly make them unique, in cases where the caller did not explicilty set a name.
- Closes#7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
After a discussion, I was really curious that our panics are supposed to be almost free - and while trusting that statement, it was really hard to believe - so I wanted to see for myself - knowing that an experiment is the most robust source of this kind of information - testing that in practice.
So I wrote a benchmark comparing various ways of reporting errors, also testing them both at 'shallow' and 'deep' stack traces (adding 200 additional frames) - to see how stack depth affects them, if at all.
The panics are indeed blazing fast! Kudos to the engine team. However, it seems that our dataflow errors are relatively slow (and we tend to use them _more_ than panics and want to be using them more and more). This uncovers a possible optimization opportunity. Can we make them as fast as panics??
Analysis of the benchmark results in comment below.
- Fixes#8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
- Fixes issue with `to_display_text` on `Date_Time`. Now a reversible format.
- Adjusted the auto width code on table so it works.
- Manually sized columns should remain fixed size.
- Enable Excel range style selection in table.
- Removed page support as not implemented yet.
# Important Notes
Will need to apply to Vue based viz at some point as well.
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
- Adds the ability to use numbers, date/time and Boolean values as constants in `set`.
- `Table.set` can take a `Column_Operation`, allowing for deriving of a new column based on other columns.
- Added `Column_Ref` type to refer to a column in `filter`.
- Added some ALIASes.
- Added `sheet` to `Excel_Workbook` to give familiar API to read sheet.
- Added conversion from range to vector allowing easy use with Zip.
- Add `Map.from_keys_and_values.
- Fixes a bug where creating a temporary table could accidentally issue a `DROP` statement of the table name that the user provided, risking destruction of user data.
- Fortunately, the bad scenario was almost impossible, because the `DROP` statement was only issued _if_ we previously checked that the mentioned table _does not exist_ - dropping a nonexistent table does not do any harm.
- It could have been dangerous in a very unlikely scenario that a table was created just between the _existence check_ and the _drop_.
- After the fix the existence check and any modifications are done within a transaction to avoid interference from concurrent modifications, and the DROP is correctly applied to a temporary Enso table instead of the original one.
- Replaced a temporary log with proper simple logging of SQL statements into a file, if an Environment variable is set.
- Used that feature to test that no unexpected statements occur.
- Closes#7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
This PR includes
* Reading XML from a file, stream, or string
* Reading XML via Data.fetch
* Accessing the root element, element children, and attributes
* Accessing tag text contents
* Get tags by name
* Inner / Outer XML string
- Fixes#7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed.
- Added more benchmarks for comparing performance of constructing columns.
- Fixes missing implementations that caused `Table.union` crashing on some type pairs.
- Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column.
- Adds test for all of the above cases.
- Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.
- Shuffle a few `from`s into correct places:
- `Day_Of_Week.from` removing `Day_Of_Week_From` module.
- Adding short cut for `http` and `https` in `Data.read` so it calls onto `Data.fetch` giving a single entry point.
- Moved `URI` extensions from `Standard.Base.Data` module into `Standard.Base.Network.Extensions`.
- Added `post` extension for `URI`.
- Added `contains_key` to `JS_Object`.
- Restored `into` in `JS_Object`:
- Follows old logic populating a constructor.
- Will use conversion from `JS_Object` if present.
- Added automatic deserialization of `Date`, `Time_Of_Day` and `Date_Time` from JSON.
- Uses conversion from `JS_Object`.
- Added conversion from `Text` to a `HTTP_Method` and type checking where `HTTP_Method` used in public APIs.
- Added support for `Date`, `Time_Of_Day` and `Date_Time` in `Table.from_objects`.
- Added `expand_column` to `Table` to expand `JS_Object` to values.
- Add type checking for `Table` in `right` arguments (allowing `Column`s to be used).
- Use type checking in `Table.set` to allow for conversion to a `Column`.
- Remove some unused imports.
- Fix for bug in S3 edge case.
* Add Text.substring function and get_position helper function
For #7876 adds a Text.substring function which supports negative indexes and returns a part of a string from 0-based index 'start' and continuing for 'length'
* added substring and simplified get function
For #7876 adds a Text.substring function which supports negative indexes and returns a part of a string from 0-based index 'start' and continuing for 'length'.
Also simplified get function as it looped unnecessarily.
* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
punctuation corrections
Co-authored-by: GregoryTravis <greg.m.travis@gmail.com>
* Update Text_Spec.enso
Added test for start index larger than string
* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
updated Arguments: section to use consistent style
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
updated Index_Out_Of_Bounds error to reference cached length
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
* Removed Slice label and added changelog entry
* Re-added slice tag to substring
Per conversation with James, added slice back to substring
* Update CHANGELOG.md add link
---------
Co-authored-by: GregoryTravis <greg.m.travis@gmail.com>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
* Transform Range.iterate test to a benchmark
a) it slows down regular unit testing
b) it is not a unit test
c) it behaves like a benchmark
So it should be a benchmark.
* missing Range import
- Closes#7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer.
- The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone.
- The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`.
- Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`.
- Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`).
- Added a `FileSystemSPI` allowing protocol resolution to a target type.
- Separated `Input_Stream` and `Output_Stream` from `File` to allow use in other spaces.
- `File_Format` types `read_web` changed to be `read_stream` working with `InputStream`.
- Added directory listing to `Auto_Detect` allowing for `Data.read` to list a folder.
- Adjusted HTTP to return an `InputStream` not a `byte[]`:
- `Response_Body` adjusted to wrap an `InputStream`.
- Added ability to materialize to either and in-memory vector (<4KB) or a temporary file.
- `Data.fetch` will materialize if not a recognized mime-type.
- Added `HTTP_Error` to handle IO exceptions from the stream.
- `Excel_Format` now supports mime-type and reading a stream.
- `Excel_Workbook` can now get a `Excel_Section` using `read_section`.
- Added S3 APIs:
- `parse_uri`: splits an S3 URI into bucket and key.
- `list_objects`: list the items in a S3 bucket with specified prefix.
- `read_bucket`: list prefixes and keys with a delimiter in a S3 bucket with specified prefix.
- `head`: either head_bucket (tests existance) or head_object API (reads object meta data).
- `get_object`: gets an object from S3 returning as a `Response_Body`.
- Added `S3_File` type acting like a `File`:
- No support for writing in this PR.
- **ToDo:** recursive listing, glob filtering, exists, size.
- Fixed a few invalid type signature line.
- Moved `create` methods for `Postgres_Connection` and `SQLite_Connection` into type instead of module.
- Renamed `Column_Fetcher.Builder` to `Column_Fetcher_Builder`.
- Fixed bug with `select_into` in Dry Run mode creating permanent tables.
**ToDo:** Unit tests.
Closes#7677 by eliminating the _stackoverflow execption_. In general it seems _too adventurous_ to walk members of random foreign objects. There can be anything including cycles. Rather than trying to be too smart in these cases, let's just rely on `InteropLibrary.isIdentical` message.
# Important Notes
Calling `sort` on the `numpy` array no longer yields an error, but the array isn't sorted - that needs a fix on the Python side: https://github.com/oracle/graalpython/issues/354 - once it is in, the elements will be treated as numbers and the sorting happens automatically (without any changes in Enso code).