Commit Graph

252 Commits

Author SHA1 Message Date
James Dunkerley
f2d2f73e89
Starting to refactor Storage and Operations (#9076)
Cleaning up some of the structures in Storage before working on UnaryOperations.

- Removed some legacy code: `countMask`, `Index` and `DefaultIndex`.
- Renamed `mask` to `applyFilter` on `Column` and `Storage`.
- Renamed `Table.mask` to `Table.filter`.
2024-02-15 18:21:07 +00:00
Pavel Marek
5919eda753
Fix incremental compilation of runtime/test (#8975) 2024-02-13 10:05:31 +01:00
AdRiley
9339672e0e
Remove _new and actually run the new tests (#9006)
Merge conflict on develop meant this one got left with a new_test.
2024-02-09 14:19:02 +00:00
AdRiley
e3f6ff1772
Add to_xml component (#8979)
Adds new to_xml component
2024-02-07 20:54:48 +00:00
James Dunkerley
0c39f8ec04
Allow Filter_Condition to be inverted. (#8861)
- Various linting fixes (doc comments and type annotations etc.).
- Add an action to determine if a `Filter_Condition` is keep or remove.

https://github.com/enso-org/enso/assets/4699705/69ba2bd3-8893-4237-acc4-eb01f534a209

- Remove `Not_In`, `Not_Contains` and `Not_Like` from `Filter_Condition`.

- Ability to use an `Expression` as a `Column_Ref`.

https://github.com/enso-org/enso/assets/4699705/16a2e030-f8f9-4f59-beca-2646f56fcb90
2024-02-07 14:36:14 +00:00
AdRiley
340a3eec4e
Split HashJoin to SimpleHashJoin and CompoundHashJoin (#8850)
Completes #8342 . Creates a SimpleHashJoin and CompoundHashJoin.

# Important Notes
Creates SimpleHashJoin and CompoundHashJoin.

CompoundHashJoin is what was HashJoin.
SimpleHashJoin is a new implementation that only indexs the smaller of the 2 tables being joined together.

The rest is refactor and clean-up of the shared join code.
2024-02-01 18:48:44 +00:00
James Dunkerley
eeaddbc434
Add parser for line by line processing (#8719)
- Linting fixes and groups.
- Add `File.from that:Text` and use `File` conversions instead of taking both `File` and `Text` and calling `File.new`.
- Align Unix Epoc with the UTC timezone and add converting from long value to `Date_Time` using it.
- Add simple first logging API allowing writing to log messages from Enso.
- Fix minor style issue where a test type had a empty constructor.
- Added a `long` based array builder.
- Added `File_By_Line` to read a file line by line.
- Added "fast" JSON parser based off Jackson.
- Altered range `to_vector` to be a proxy Vector.
- Added `at` and `get` to `Database.Column`.
- Added `get` to `Table.Column`.
- Added ability to expand `Vector`, `Array` `Range`, `Date_Range` to columns.
- Altered so `expand_to_column` default column name will be the same as the input column (i.e. no `Value` suffix).
- Added ability to expand `Map`, `JS_Object` and `Jackson_Object` to rows with two columns coming out (and extra key column).
-  Fixed bug where couldn't use integer index to expand to rows.
2024-02-01 07:29:50 +00:00
GregoryTravis
7436848e90
Implement relational NULL/Nothing for join for in-memory tables (#8849)
Implements relational NULL for join, for all `Join_Kind`s.
2024-01-29 16:19:07 +00:00
Radosław Waśko
ca4f98c78e
Adding tests and missing methods for Enso_File. (#8815)
- Closes #8808 - adds tests for various scenarios.
- Implements `size` using HEAD.
- Updates existing functions to changes in Cloud API.
- Adds stubs for `*_time` methods, `parent`, `path`.
- [x] TODO: resolve the `Enso_File.current_working_directory` from an environment variable.
- ~~TODO: recursive directory deletion?~~ left for later

# Important Notes
- Currently, the Cloud API does not offer an easy way to extract metadata for a file, in particular to get the parent folder from the file `id`.
- We should be able to get the parent, and stuff like creation/modified time.
- We need a way to resolve paths to asset ids, for `path` to work as well as `current_working_directory`.
- What is the environment variable that will be used to feed the `current_working_directory` property?
2024-01-26 19:04:42 +00:00
James Dunkerley
0b6db5797c
Refactor OrderMask to avoid memory copying (#8863)
Goal of this PR is to refactor the design of OrderMask and avoid copying arrays or lists wherever possible.
We have removed a few legacy functions which were not being used.

On a poor mans benchmark seems to be quicker (13s vs 16s) and memory usage should be lower.
2024-01-26 11:16:16 +00:00
GregoryTravis
5eb3f3bd1d
Implement relational NULL semantics for Nothing for in-memory Column operations (#8816)
Updates in-memory table column operations to treat Nothing as a relational NULL.
This PR does not include changes to Table.join.
2024-01-24 17:02:45 +00:00
AdRiley
6eb00a7c5f
Remove Conditions Helper Class (#8842)
* Tests green checkpoint

* Remove ConditionsHelper

* javafmtAll

* Dedupe
2024-01-24 16:56:39 +00:00
Radosław Waśko
edfcfde11c
Tests and improvements for secrets in cloud subdirectories (#8791)
- Closes #8723
- Adds some missing features that were needed to make this work:
- `Enso_File.create_directory` and `Enso_File.delete`, and basic tests for it
- Changes how `Enso_Secret.list` is obtained - using a different Cloud endpoint allows us to implement the desired logic, the default endpoint was giving us _all_ secrets which was not what we wanted here.
- Implements `Enso_Secret.update` and tests for it

# Important Notes
Notes describing any problems with the current Cloud API:
https://docs.google.com/document/d/1x8RUt3KkwyhlxGux7XUGfOdtFSAZV3fI9lSSqQ3XsXk/edit

Apparently, everything that was needed to make this feature work has already been implemented, although a few features needed workarounds on Enso side to work properly.
2024-01-24 10:17:22 +00:00
Radosław Waśko
368e4867b4
Allow secrets in AWS_Credential (#8774)
- Closes #8722
2024-01-19 19:00:56 +00:00
Radosław Waśko
14be36c401
Allow secrets in Header.authorization_* (#8761)
- Closes #8739
2024-01-18 12:49:47 +00:00
AdRiley
b8e93b3cba
Add new text_left and text_right functions (#8691)
Added text_left and text_right functions for in-memory and databases
2024-01-15 23:43:23 +00:00
Radosław Waśko
5b70ff25f7
Remove set_user_info from URI (#8738)
I have added this in #8591, but I have realised it may not be a good idea to have it, so I am removing that particular change.
2024-01-15 17:35:17 +00:00
Radosław Waśko
f34abeda0c
Add tests for Enso_Secrets, update to new cloud API (#8736)
- Closes #8556
2024-01-15 16:12:08 +00:00
Jaroslav Tulach
0e6952710a
Executing (parts of) Truffle TCK with Enso values (#8685) 2024-01-12 07:21:16 +01:00
AdRiley
f31ecc7c87
Make fill_nothing take an empty string (#8643)
* Add new test for required behaviour

* Handle case where strArg is an empty string

* More tests around fixed width field. Remove unneeded duplicate logic

* javafmtAll

* Further simplification

* SQLite doesn't have full type system

* SQLite doesn't have full type system
2024-01-10 11:59:10 +00:00
AdRiley
bf8dd1888c
Give file read its own helper widget for delimiters. (#8627)
Give file read its own helper widget for delimiters. Remove newline add none. The file read delimiter is similar but different to the split one and so should have its own set of options.
2024-01-04 11:59:42 +00:00
James Dunkerley
ffa06c9476
Sort handling of Nothing within Column || and && (#8656)
Follows the database logic:
![image](https://github.com/enso-org/enso/assets/4699705/328a0e36-5508-4c63-a60b-ac9a280cd93a)

Results:
![image](https://github.com/enso-org/enso/assets/4699705/77d6bf82-21f8-4aed-b4c5-45e429798189)
2024-01-03 10:40:40 +00:00
Radosław Waśko
d41d48e8a0
Merge URI_With_Query into URI, extend API of URI (#8591)
- Closes #8544
- Adds `reset_query_arguments` and `/` operators allowing to transform a URI.
- Adding tests for handling of various edge cases.
2023-12-21 18:39:26 +00:00
AdRiley
cfe0cbe0c1
Add text_length to column for in-memory and database (#8606)
Closes #8521
Adds text_length to Column
2023-12-21 11:31:13 +00:00
Radosław Waśko
d56b800c11
Remove the Apache dependency from std-base (#8571)
- After [suggestion](https://github.com/enso-org/enso/pull/8497#discussion_r1429543815) from @JaroslavTulach I have tried reimplementing the URL encoding using just `URLEncode` builtin util. I will see if this does not complicate other followup improvements, but most likely all should work so we should be able to get rid of the unnecessary bloat.
2023-12-20 18:01:08 +00:00
Radosław Waśko
724f8d2a56
Add tests for Enso Cloud auth + simple API mock for Enso_User (#8511)
- Closes #8354
- Extends `simple-httpbin` with a simple mock of the Cloud API (currently it checks the token and serves the `/users` endpoint).
- Renames `simple-httpbin` to `http-test-helper`.
2023-12-19 17:41:09 +00:00
Radosław Waśko
940b8f7d51
Improving tests and edge cases for URI and HTTP (#8497)
- Closes #8352
- ~~Proposed fix for #8493~~
- The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.
2023-12-15 17:58:45 +00:00
James Dunkerley
9e27b6487b
Minor fixes and tweak for Cloud APIs. (#8557)
- Fix secret to at least be working again
- Tweak to allow a MIMIC flow to work with value types (revisit in 2024).
2023-12-15 17:10:07 +00:00
Pavel Marek
c1098865f2
Update java formatter sbt plugin (#8543)
Add a local clone of javaFormatter plugin. The upstream is not maintained anymore. And we need to update it to use the newest Google java formatter because the old one, that we use, cannot format sources with Java 8+ syntax.

# Important Notes
Update to Google java formatter 1.18.1 - https://github.com/google/google-java-format/releases/tag/v1.18.1
2023-12-15 14:45:23 +00:00
Pavel Marek
4b65e44ef3
EpbLanguage re-uses other TruffleContext support to run tests with assertions enabled (#7882) 2023-12-15 13:31:32 +01:00
Radosław Waśko
b5c995a7bf
Reworking Excel support to allow for reading of big files (#8403)
- Closes #8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient).
- If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`.
- Adds tests fetching Table formats from HTTP.
- Extends `simple-httpbin` with ability to serve files for our tests.
- Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers).
- Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.
2023-12-15 00:02:15 +00:00
Radosław Waśko
c6b6384fe6
Improve performance of anti-join (#8338)
- Closes #8217
2023-11-24 02:44:57 +00:00
James Dunkerley
ecaca12df1
Integrating Enso Cloud with the libraries (part 1...) (#8006)
- Add a `File_For_Read` type. Used for `File_Format` to read files.
- Added `Enso_User` representing the current user in `Enso_Cloud`.
- *Will be later able to list known users.*
- Added `Enso_Secret` representing a value defined in `Enso_Cloud`.
- Value not used within Enso only accessed within polyglot Java.
- Integrated into `Username_And_Password` and can be used within JDBC connections.
- Integrated into HTTP Headers so a secret can be used as a value.
- New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value.
- *Will be integrated with AWS credentials.*
- Added `Enso_File` representing a file or a folder in the cloud.
- Support the same API as `File` (like the `S3_File`).
- *Will support `enso://` URI style access.*
2023-11-20 23:21:14 +00:00
Pavel Marek
5a7ad6bfe4
Upgrade enso to GraalVM for jdk 21 (#7991)
Upgrade to GraalVM JDK 21.
```
> java -version
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing)
```

With SDKMan, download with `sdk install java 21-graalce`.

# Important Notes
- After this PR, one can theoretically run enso with any JRE with version at least 21.
- Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution.
- `project-manager` remains backward compatible - it can open older engines with runtimes. New engines now do no longer require a separate runtime to be downloaded.
- sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368
- Which means that we can have `module-info.java` files only for Java-only projects.
- Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar.
- `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent.
- Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory.
- The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module.
- The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725)
- [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.
2023-11-17 18:02:36 +00:00
GregoryTravis
ea3d778456
Allow the creation of a constant column on an in-memory table with no rows. (#8218) 2023-11-09 14:40:51 +00:00
Radosław Waśko
1b8b30a68d
Improve performance of Join_Condition.Between by sorting on one dimension (#8212)
- Closes #5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes #8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
2023-11-08 12:59:55 +00:00
Radosław Waśko
237aae33c7
Simplify internal logic of Table.order_by, avoid unnecessary warning (#8221)
- Fixes #8213
2023-11-06 11:00:01 +00:00
GregoryTravis
1480f50207
Overhaul the random number and item generation code (#8127)
Rewrite most of Random.enso.
2023-10-31 15:25:37 +00:00
Radosław Waśko
79011bd550
Implement Table.lookup_and_replace in Database (#8146)
- Closes #7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
2023-10-31 15:19:55 +00:00
Radosław Waśko
0c278391fe
Test and improve handling of Date_Time with_timezone=False in Postgres (#8114)
- Fixes #8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
2023-10-21 21:35:13 +00:00
Radosław Waśko
8172896065
Support Previous_Value in fill_nothing and fill_missing (#8105)
- Adds `Previous_Value` to `fill_nothing` and `fill_empty`, as requested by #7192.
2023-10-20 13:18:53 +00:00
Radosław Waśko
93a31fcc8b
Add benchmarks related to add_row_number performance investigation (#8091)
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
2023-10-18 17:21:59 +00:00
Radosław Waśko
e9fa12763e
Improve performance of add_row_number (#8076)
Fixes #8055
2023-10-17 00:42:35 +00:00
Radosław Waśko
08b717eb54
Refactor Table problem handling to a more robust and hopefully cleaner approach (#7879)
Closes #7514
2023-10-16 15:09:08 +00:00
GregoryTravis
f18d1323e1
Add Table.expand_to_rows to allow flattening vector and array values in table (#8042)
# Important Notes
Also includes a fix for a reallocation bug in `InferredBuilder`.
2023-10-13 20:54:06 +00:00
Radosław Waśko
cd84ac16ce
Restructure Table.from_objects to use conversions (#8020)
Closes #7957
2023-10-11 22:25:18 +00:00
somebody1234
826127d8ff
Eliminate line feeds from XML.outer_xml on Windows (#8013)
- Closes #7999

# Important Notes
None
2023-10-10 23:21:34 +00:00
Radosław Waśko
6e0bd86753
Implement Table.lookup_and_replace for in-memory (#7979)
- Closes #7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
2023-10-10 10:42:06 +00:00
GregoryTravis
9ba7be20af
Basic XML support (#7947)
This PR includes
* Reading XML from a file, stream, or string
* Reading XML via Data.fetch
* Accessing the root element, element children, and attributes
* Accessing tag text contents
* Get tags by name
* Inner / Outer XML string
2023-10-06 17:52:19 +00:00
Radosław Waśko
0cd446432f
Fix inconsistency when building a Mixed column, fixes to Union (#7919)
- Fixes #7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed.
   - Added more benchmarks for comparing performance of constructing columns.
- Fixes missing implementations that caused `Table.union` crashing on some type pairs.
- Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column.
- Adds test for all of the above cases.
- Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.
2023-10-03 20:33:34 +02:00