* Add new test for required behaviour
* Handle case where strArg is an empty string
* More tests around fixed width field. Remove unneeded duplicate logic
* javafmtAll
* Further simplification
* SQLite doesn't have full type system
* SQLite doesn't have full type system
Give file read its own helper widget for delimiters. Remove newline add none. The file read delimiter is similar but different to the split one and so should have its own set of options.
- After [suggestion](https://github.com/enso-org/enso/pull/8497#discussion_r1429543815) from @JaroslavTulach I have tried reimplementing the URL encoding using just `URLEncode` builtin util. I will see if this does not complicate other followup improvements, but most likely all should work so we should be able to get rid of the unnecessary bloat.
- Closes#8354
- Extends `simple-httpbin` with a simple mock of the Cloud API (currently it checks the token and serves the `/users` endpoint).
- Renames `simple-httpbin` to `http-test-helper`.
- Closes#8352
- ~~Proposed fix for #8493~~
- The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.
Add a local clone of javaFormatter plugin. The upstream is not maintained anymore. And we need to update it to use the newest Google java formatter because the old one, that we use, cannot format sources with Java 8+ syntax.
# Important Notes
Update to Google java formatter 1.18.1 - https://github.com/google/google-java-format/releases/tag/v1.18.1
- Closes#8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient).
- If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`.
- Adds tests fetching Table formats from HTTP.
- Extends `simple-httpbin` with ability to serve files for our tests.
- Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers).
- Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.
- Add a `File_For_Read` type. Used for `File_Format` to read files.
- Added `Enso_User` representing the current user in `Enso_Cloud`.
- *Will be later able to list known users.*
- Added `Enso_Secret` representing a value defined in `Enso_Cloud`.
- Value not used within Enso only accessed within polyglot Java.
- Integrated into `Username_And_Password` and can be used within JDBC connections.
- Integrated into HTTP Headers so a secret can be used as a value.
- New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value.
- *Will be integrated with AWS credentials.*
- Added `Enso_File` representing a file or a folder in the cloud.
- Support the same API as `File` (like the `S3_File`).
- *Will support `enso://` URI style access.*
Upgrade to GraalVM JDK 21.
```
> java -version
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing)
```
With SDKMan, download with `sdk install java 21-graalce`.
# Important Notes
- After this PR, one can theoretically run enso with any JRE with version at least 21.
- Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution.
- `project-manager` remains backward compatible - it can open older engines with runtimes. New engines now do no longer require a separate runtime to be downloaded.
- sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368
- Which means that we can have `module-info.java` files only for Java-only projects.
- Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar.
- `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent.
- Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory.
- The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module.
- The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725)
- [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.
- Closes#5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes#8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes#8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
- Closes#7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
- Fixes#8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
- Closes#7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
This PR includes
* Reading XML from a file, stream, or string
* Reading XML via Data.fetch
* Accessing the root element, element children, and attributes
* Accessing tag text contents
* Get tags by name
* Inner / Outer XML string
- Fixes#7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed.
- Added more benchmarks for comparing performance of constructing columns.
- Fixes missing implementations that caused `Table.union` crashing on some type pairs.
- Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column.
- Adds test for all of the above cases.
- Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.
- Closes#7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer.
- The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone.
- The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`.
- Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`.
- Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`).
- Added a `FileSystemSPI` allowing protocol resolution to a target type.
- Separated `Input_Stream` and `Output_Stream` from `File` to allow use in other spaces.
- `File_Format` types `read_web` changed to be `read_stream` working with `InputStream`.
- Added directory listing to `Auto_Detect` allowing for `Data.read` to list a folder.
- Adjusted HTTP to return an `InputStream` not a `byte[]`:
- `Response_Body` adjusted to wrap an `InputStream`.
- Added ability to materialize to either and in-memory vector (<4KB) or a temporary file.
- `Data.fetch` will materialize if not a recognized mime-type.
- Added `HTTP_Error` to handle IO exceptions from the stream.
- `Excel_Format` now supports mime-type and reading a stream.
- `Excel_Workbook` can now get a `Excel_Section` using `read_section`.
- Added S3 APIs:
- `parse_uri`: splits an S3 URI into bucket and key.
- `list_objects`: list the items in a S3 bucket with specified prefix.
- `read_bucket`: list prefixes and keys with a delimiter in a S3 bucket with specified prefix.
- `head`: either head_bucket (tests existance) or head_object API (reads object meta data).
- `get_object`: gets an object from S3 returning as a `Response_Body`.
- Added `S3_File` type acting like a `File`:
- No support for writing in this PR.
- **ToDo:** recursive listing, glob filtering, exists, size.
- Fixed a few invalid type signature line.
- Moved `create` methods for `Postgres_Connection` and `SQLite_Connection` into type instead of module.
- Renamed `Column_Fetcher.Builder` to `Column_Fetcher_Builder`.
- Fixed bug with `select_into` in Dry Run mode creating permanent tables.
**ToDo:** Unit tests.
- Fixes#7354
- And also closes#7712
- Refactors how we handle numeric ops - ensuring that the 'kernels' are placed all in one place and selected based on storage types.
- Closes#6111
- Aligns semantics of handling Mixed columns.
- Now, if an operation like `iif` or `fill_nothing` is given a `Mixed` column, the result will also be `Mixed` regardless of the `inferred_precise_value_type`.
- Enables a few old tests that were pending but could be enabled since the types work is advanced enough.
- Closes#5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109
- Fixes#7565
- Fixes#7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on particular database.
Closes#7353
I introduce a new type `WithAggregatedProblems`, because `WithProblems` was too simple - it only allowed to hold a `List<Problem>` but `AggregatedProblems` is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify all to use `WithAggregatedProblems` - but after starting this, I realised it will likely just take too much effort to do for this little PR. So instead, I created a follow-up task for this: #7514
# Important Notes
#### The Plot
- there used to be two kinds of benchmarks: in Java and in Enso
- those in Java got quite a good treatment
- there even are results updated daily: https://enso-org.github.io/engine-benchmark-results/
- the benchmarks written in Enso used to be 2nd class citizen
#### The Revelation
This PR has the potential to fix it all!
- It designs new [Bench API](88fd6fb988) ready for non-batch execution
- It allows for _single benchmark in a dedicated JVM_ execution
- It provides a simple way to wrap such an Enso benchmark as a Java benchmark
- thus the results of Enso and Java benchmarks are [now unified](https://github.com/enso-org/enso/pull/7101#discussion_r1257504440)
Long live _single benchmarking infrastructure for Java and Enso_!
- Fixes#7412
- Also adds tests and fixes some more edge cases:
- Ensures correct handling of existing Database tables whose column names may be invalid from Enso perspective, or clashing from Enso perspective (e.g. for most DBs `ś` and `s\u0301` are different names, but for Enso they are basically the same so this would cause issues - thus Enso now renames such columns when accessed (still using the correct column reference in the generated SQL under the hood).
- Closes#5951
- Ensures any SQL warnings reported by the database through the JDBC driver are processed and forwarded to the user.
- These warnings show issues like the implicit name truncation that this PR is also solving. It's good to make sure they are visible as they can help avoid and understand unexpected problems. They should not show up in most standard workflows.
- Adds simple history to our REPL.
- Fixes#7231
- Cleans up vectorized operations to distinguish unary and binary operations.
- Introduces MixedStorage which may pretend to be a more specialized storage on demand.
- Ensures that operations request a more specialized storage on right-hand side to ensure compatibility with reported inferred storage type.
- Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso
- Tests for comparison of Mixed storages with each other and other types
- Started using `Set` for `Filter_Condition.Is_In` for better performance.
- ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~
- This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out #7371
- ~~Trying to avoid conversions when calling Enso functions from Java.~~
- Needs extra care as dataflow errors may not be handled right then. So only works for simple functions that should not error.
- Not sure how much it really helps. [Benchmarks](https://github.com/enso-org/enso/pull/7270#issuecomment-1635618393) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results.
- Created #7378 to track this.
The added benchmark is a basis for a performance investigation.
We compare the performance of the same operation run in Java vs Enso to see what is the overhead and try to get the Enso operations closer to the pure-Java performance.
- Adds `Column.date_diff` for computing date/time difference as integer multiply of some unit.
- Adds `Column.date_add` for shifting date/time by a unit.
- Adds `Column.date_part` for extracting various parts of the date/time value as integer.
- Adds widgets for the 3 methods above whose content depends on the column value type.
- Adds shorthands: `Column.hour`, `Column.minute` and `Column.second` to extract these date parts.
- Extends `Time_Period` with support for milli-, micro- and nano- seconds; and adapts functions taking `Time_Period` to support these wherever possible.
- Removed `module` argument from `enso_project` (new `Project_Description.new` API).
- Removed the custom option from date and time parse/format dropdowns.
- The `format` dropdown uses the value to create the dropdown. (Screenshot below)
- Removed `StorageType` coalescing rules and replaced them with simpler logic in `ObjectStorage`.
- Update signature for `add_row_number` and add aliases.
- Add type detection for `Mixed` columns when calling column functions.
- Excel uses column name for missing headers.
- Add aliases for parse functions on text.
- Adjust `Date`, `Time_Of_Day` and `Date_Time` parse functions to not take `Nothing` anymore and provide dropdowns.
- Removed built-in parses.
- All support Locale.
- Add support for missing day or year for parsing a Date.
- All will trim values automatically.
- Added ability to list AWS profiles.
- Added ability to list S3 buckets.
- Workaround for Table.aggregate so default item added works.
Related to #6912
It essentially solves it by removing any builtins that would take an EnsoDate/EnsoTimeOfDay/EnsoTimeZone and replacing them with Java utils that do the same operation.
This is not a proper solution - the builtin conversion is still invalid for the date/time types - but at this moment we may just no longer use the invalid conversion so it is much less of an issue. We still need to be aware of this if we want to introduce builtins taking date/time in the future.
Closes#5227
# Important Notes
- This lays first steps towards #6292 - we get pure Enso variants of MultiValueKey.
- Another part refactors `LongStorage` into `AbstractLongStorage` allowing it to provide alternative implementations of the underlying storage, in our case `LongRangeStorage` generating the values ad-hoc and `LongConstantStorage` - currently unused but in the future it can be adapted to support constant columns (once we implement similar facilities for other types).
Add format to the in-memory Column
# Important Notes
Also updates .format in date types.
Some rearrangement of date formatting builtins / Java libraries.
- Add dropdown to tokenize and split `column`.
- Remove the custom `Join_Kind` dropdown.
- Adjust split and tokenize names to start numbering from 1, not 0.
- Add JS_Object serialization for Period.
- Add `days_until` and `until` to `Date`.
- Add `Date_Period.Day` and create `next` and `previous` on `Date`.
- Use simple names with `File_Format` dropdown.
- Avoid using `Main.enso` based imports in `Standard.Base.Data.Map` and `Standard.Base.Data.Text.Helpers`.
- Remove an incorrect import from `Standard.Database.Data.Table`.
From #6587:
A few small changes, lots of lines because this affected lots of tests:
- `Table.join` now defaults to `Join_Kind.Left_Outer`, to avoid losing rows in the left table unexpectedly. If the user really wants to have an Inner join, they can switch to it.
- `Table.join` now defaults to joining columns by name not by index - it looks in the right table for a column with the same name as the first column in left table.
- Missing Input Column errors now specify which table they refer to in the join.
- The unique name suffix in column renaming / default column names when loading from file is now a space instead of underscore.
- Adjusted `Context.is_enabled` to support default argument (moved built in so can have defaults).
- Made `environment` case-insensitive.
- Bug fix for play button.
- Short hand to execute within an enabled context.
- Forbid file writing if the Output context is disabled with a `Forbidden_Operation` error.
- Add temporary file support via `File.create_temporary_file` which is deleted on exit of JVM.
- Execution Context first pass in `Text.write`.
- Added dry run warning.
- Writes to a temporary file if disabled.
- Created a `DryRunFileManager` which will create and manage the temporary files.
- Added `format` dropdown to `File.read` and `Data.read`.
- Renamed `JSON_File` to `JSON_Format` to be consistent.
(still to unit test).
* Rename makeUnique overloads to avoid issue when Nothing is passed.
Suspend warnings when building the output table to avoid mass warning duplication.
* Add test for mixed invalid names.
Adjust so a single warning attached.
* PR comments.
Follow up of #6298 as it grew too much. Adds the needed typechecks to aggregate operations. Ensures that the DB operations report `Floating_Point_Equality` warning consistently with in-memory.
- Add `replace` with same syntax as on `Text` to an in-memory `Column`.
- Add `trim` with same syntax as on `Text` to an in-memory `Column`.
- Add `trim` to in-database `Column`.
- Added `is_supported` to dialects and exposed the dialect consistently on the `Connection`.
- Add `write_table` support to `JSON_File` allowing `Table.write` to write JSON.
- Updated the parsing for integers and decimals:
- Support for currency symbols.
- Support for brackets for negative numbers.
- Automatic detection of decimal points and thousand separators.
- Tighter rules for scientific and thousand separated numbers.
- Remove `replace_text` from `Table`.
- Remove `write_json` from `Table`.
- `Process.run` now returns a `Process_Result` allowing the easy capture of stdout and stderr.
- Joining a column with a column name does not warn if adding just the prefix.
- Stop the table viz from changing case and adding spaces to the headers.
This is the first part of the #5158 umbrella task. It closes#5158, follow-up tasks are listed as a comment in the issue.
- Updates all prototype methods dealing with `Value_Type` with a proper implementation.
- Adds a more precise mapping from in-memory storage to `Value_Type`.
- Adds a dialect-dependent mapping between `SQL_Type` and `Value_Type`.
- Removes obsolete methods and constants on `SQL_Type` that were not portable.
- Ensures that in the Database backend, operation results are computed based on what the Database is meaning to return (by asking the Database about expected types of each operation).
- But also ensures that the result types are sane.
- While SQLite does not officially support a BOOLEAN affinity, we add a set of type overrides to our operations to ensure that Boolean operations will return Boolean values and will not be changed to integers as SQLite would suggest.
- Some methods in SQLite fallback to a NUMERIC affinity unnecessarily, so stuff like `max(text, text)` will keep the `text` type instead of falling back to numeric as SQLite would suggest.
- Adds ability to use custom fetch / builder logic for various types, so that we can support vendor specific types (for example, Postgres dates).
# Important Notes
- There are some TODOs left in the code. I'm still aligning follow-up tasks - once done I will try to add references to relevant tasks in them.
Enables `distinct`, `aggregate` and `cross_tab` to use the Enso hashing and equality operations.
Also, I rewired the way the ObjectComparators are obtained in polyglot code to be more consistent.
Add Comparator for `Day_Of_Week`, `Header`, `SQL_Type`, `Image` and `Matrix`.
Also, removed the custom `==` from these types as needed. (Closes#5626)
Closes#5151 and adds some additional tests for `cross_tab` that verify duplicated and invalid names.
I decided that for empty or `Nothing` names, instead of replacing them with `Column` and implicitly losing connection with the value that was in the column, we should just error on such values.
To make handling of these easier, `fill_empty` was added allowing to easily replace the empty values with something else.
Also, `{is,fill}_missing` was renamed to `{is,fill}_nothing` to align with `Filter_Condition.Is_Nothing`.
Adds a common project that allows sharing code between the `runtime` and `std-bits`.
Due to classpath separation and the way it is compiled, the classes will be duplicated - we will have one copy for the `runtime` classpath and another copy as a small JAR for `Standard.Base` library.
This is still much better than having the code duplicated - now at least we have a single source of truth for the shared implementations.
Due to the copying we should not expand this project too much, but I encourage to put here any methods that would otherwise require us to copy the code itself.
This may be a good place to put parts of the hashing logic to then allow sharing the logic between the `runtime` and the `MultiValueKey` in the `Table` library (cc: @Akirathan).
Follow-up to #5113 - I add some more edge case tests as we discussed with @jdunkerley
When debugging some quoting issues, I also realised the current `Mismatched_Quote` error provided not enough information. So I amended it to at least include some context indicating which was the 'offending' cell.
- Fix issue with Geo Map viz.
- Handle invalid format strings better in `Data_Formatter`.
- New constants for the ISO format strings (and a special ENSO_ZONED_DATE_TIME)
- Consistent Date Time format for parsing in all places.
- Avoid throwing exception in datetime parsing.
- Support for milliseconds (well nanoseconds) in Date_Time and Time_Of_Day.
- `Column.map` stays within Enso.
- Allow `Aggregate_Column.Group_By` in `cross_tab` group_by parameter.