Commit Graph

345 Commits

Author SHA1 Message Date
GregoryTravis
7436848e90
Implement relational NULL/Nothing for join for in-memory tables (#8849)
Implements relational NULL for join, for all `Join_Kind`s.
2024-01-29 16:19:07 +00:00
Radosław Waśko
ca4f98c78e
Adding tests and missing methods for Enso_File. (#8815)
- Closes #8808 - adds tests for various scenarios.
- Implements `size` using HEAD.
- Updates existing functions to changes in Cloud API.
- Adds stubs for `*_time` methods, `parent`, `path`.
- [x] TODO: resolve the `Enso_File.current_working_directory` from an environment variable.
- ~~TODO: recursive directory deletion?~~ left for later

# Important Notes
- Currently, the Cloud API does not offer an easy way to extract metadata for a file, in particular to get the parent folder from the file `id`.
- We should be able to get the parent, and stuff like creation/modified time.
- We need a way to resolve paths to asset ids, for `path` to work as well as `current_working_directory`.
- What is the environment variable that will be used to feed the `current_working_directory` property?
2024-01-26 19:04:42 +00:00
James Dunkerley
0b6db5797c
Refactor OrderMask to avoid memory copying (#8863)
Goal of this PR is to refactor the design of OrderMask and avoid copying arrays or lists wherever possible.
We have removed a few legacy functions which were not being used.

On a poor mans benchmark seems to be quicker (13s vs 16s) and memory usage should be lower.
2024-01-26 11:16:16 +00:00
GregoryTravis
5eb3f3bd1d
Implement relational NULL semantics for Nothing for in-memory Column operations (#8816)
Updates in-memory table column operations to treat Nothing as a relational NULL.
This PR does not include changes to Table.join.
2024-01-24 17:02:45 +00:00
AdRiley
6eb00a7c5f
Remove Conditions Helper Class (#8842)
* Tests green checkpoint

* Remove ConditionsHelper

* javafmtAll

* Dedupe
2024-01-24 16:56:39 +00:00
Radosław Waśko
edfcfde11c
Tests and improvements for secrets in cloud subdirectories (#8791)
- Closes #8723
- Adds some missing features that were needed to make this work:
- `Enso_File.create_directory` and `Enso_File.delete`, and basic tests for it
- Changes how `Enso_Secret.list` is obtained - using a different Cloud endpoint allows us to implement the desired logic, the default endpoint was giving us _all_ secrets which was not what we wanted here.
- Implements `Enso_Secret.update` and tests for it

# Important Notes
Notes describing any problems with the current Cloud API:
https://docs.google.com/document/d/1x8RUt3KkwyhlxGux7XUGfOdtFSAZV3fI9lSSqQ3XsXk/edit

Apparently, everything that was needed to make this feature work has already been implemented, although a few features needed workarounds on Enso side to work properly.
2024-01-24 10:17:22 +00:00
Radosław Waśko
368e4867b4
Allow secrets in AWS_Credential (#8774)
- Closes #8722
2024-01-19 19:00:56 +00:00
Radosław Waśko
14be36c401
Allow secrets in Header.authorization_* (#8761)
- Closes #8739
2024-01-18 12:49:47 +00:00
AdRiley
b8e93b3cba
Add new text_left and text_right functions (#8691)
Added text_left and text_right functions for in-memory and databases
2024-01-15 23:43:23 +00:00
Radosław Waśko
5b70ff25f7
Remove set_user_info from URI (#8738)
I have added this in #8591, but I have realised it may not be a good idea to have it, so I am removing that particular change.
2024-01-15 17:35:17 +00:00
Radosław Waśko
f34abeda0c
Add tests for Enso_Secrets, update to new cloud API (#8736)
- Closes #8556
2024-01-15 16:12:08 +00:00
Jaroslav Tulach
0e6952710a
Executing (parts of) Truffle TCK with Enso values (#8685) 2024-01-12 07:21:16 +01:00
AdRiley
f31ecc7c87
Make fill_nothing take an empty string (#8643)
* Add new test for required behaviour

* Handle case where strArg is an empty string

* More tests around fixed width field. Remove unneeded duplicate logic

* javafmtAll

* Further simplification

* SQLite doesn't have full type system

* SQLite doesn't have full type system
2024-01-10 11:59:10 +00:00
AdRiley
bf8dd1888c
Give file read its own helper widget for delimiters. (#8627)
Give file read its own helper widget for delimiters. Remove newline add none. The file read delimiter is similar but different to the split one and so should have its own set of options.
2024-01-04 11:59:42 +00:00
James Dunkerley
ffa06c9476
Sort handling of Nothing within Column || and && (#8656)
Follows the database logic:
![image](https://github.com/enso-org/enso/assets/4699705/328a0e36-5508-4c63-a60b-ac9a280cd93a)

Results:
![image](https://github.com/enso-org/enso/assets/4699705/77d6bf82-21f8-4aed-b4c5-45e429798189)
2024-01-03 10:40:40 +00:00
Radosław Waśko
d41d48e8a0
Merge URI_With_Query into URI, extend API of URI (#8591)
- Closes #8544
- Adds `reset_query_arguments` and `/` operators allowing to transform a URI.
- Adding tests for handling of various edge cases.
2023-12-21 18:39:26 +00:00
AdRiley
cfe0cbe0c1
Add text_length to column for in-memory and database (#8606)
Closes #8521
Adds text_length to Column
2023-12-21 11:31:13 +00:00
Radosław Waśko
d56b800c11
Remove the Apache dependency from std-base (#8571)
- After [suggestion](https://github.com/enso-org/enso/pull/8497#discussion_r1429543815) from @JaroslavTulach I have tried reimplementing the URL encoding using just `URLEncode` builtin util. I will see if this does not complicate other followup improvements, but most likely all should work so we should be able to get rid of the unnecessary bloat.
2023-12-20 18:01:08 +00:00
Radosław Waśko
724f8d2a56
Add tests for Enso Cloud auth + simple API mock for Enso_User (#8511)
- Closes #8354
- Extends `simple-httpbin` with a simple mock of the Cloud API (currently it checks the token and serves the `/users` endpoint).
- Renames `simple-httpbin` to `http-test-helper`.
2023-12-19 17:41:09 +00:00
Radosław Waśko
940b8f7d51
Improving tests and edge cases for URI and HTTP (#8497)
- Closes #8352
- ~~Proposed fix for #8493~~
- The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.
2023-12-15 17:58:45 +00:00
James Dunkerley
9e27b6487b
Minor fixes and tweak for Cloud APIs. (#8557)
- Fix secret to at least be working again
- Tweak to allow a MIMIC flow to work with value types (revisit in 2024).
2023-12-15 17:10:07 +00:00
Pavel Marek
c1098865f2
Update java formatter sbt plugin (#8543)
Add a local clone of javaFormatter plugin. The upstream is not maintained anymore. And we need to update it to use the newest Google java formatter because the old one, that we use, cannot format sources with Java 8+ syntax.

# Important Notes
Update to Google java formatter 1.18.1 - https://github.com/google/google-java-format/releases/tag/v1.18.1
2023-12-15 14:45:23 +00:00
Pavel Marek
4b65e44ef3
EpbLanguage re-uses other TruffleContext support to run tests with assertions enabled (#7882) 2023-12-15 13:31:32 +01:00
Radosław Waśko
b5c995a7bf
Reworking Excel support to allow for reading of big files (#8403)
- Closes #8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient).
- If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`.
- Adds tests fetching Table formats from HTTP.
- Extends `simple-httpbin` with ability to serve files for our tests.
- Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers).
- Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.
2023-12-15 00:02:15 +00:00
Radosław Waśko
c6b6384fe6
Improve performance of anti-join (#8338)
- Closes #8217
2023-11-24 02:44:57 +00:00
James Dunkerley
ecaca12df1
Integrating Enso Cloud with the libraries (part 1...) (#8006)
- Add a `File_For_Read` type. Used for `File_Format` to read files.
- Added `Enso_User` representing the current user in `Enso_Cloud`.
- *Will be later able to list known users.*
- Added `Enso_Secret` representing a value defined in `Enso_Cloud`.
- Value not used within Enso only accessed within polyglot Java.
- Integrated into `Username_And_Password` and can be used within JDBC connections.
- Integrated into HTTP Headers so a secret can be used as a value.
- New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value.
- *Will be integrated with AWS credentials.*
- Added `Enso_File` representing a file or a folder in the cloud.
- Support the same API as `File` (like the `S3_File`).
- *Will support `enso://` URI style access.*
2023-11-20 23:21:14 +00:00
Pavel Marek
5a7ad6bfe4
Upgrade enso to GraalVM for jdk 21 (#7991)
Upgrade to GraalVM JDK 21.
```
> java -version
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing)
```

With SDKMan, download with `sdk install java 21-graalce`.

# Important Notes
- After this PR, one can theoretically run enso with any JRE with version at least 21.
- Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution.
- `project-manager` remains backward compatible - it can open older engines with runtimes. New engines now do no longer require a separate runtime to be downloaded.
- sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368
- Which means that we can have `module-info.java` files only for Java-only projects.
- Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar.
- `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent.
- Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory.
- The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module.
- The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725)
- [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.
2023-11-17 18:02:36 +00:00
GregoryTravis
ea3d778456
Allow the creation of a constant column on an in-memory table with no rows. (#8218) 2023-11-09 14:40:51 +00:00
Radosław Waśko
1b8b30a68d
Improve performance of Join_Condition.Between by sorting on one dimension (#8212)
- Closes #5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes #8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
2023-11-08 12:59:55 +00:00
Radosław Waśko
237aae33c7
Simplify internal logic of Table.order_by, avoid unnecessary warning (#8221)
- Fixes #8213
2023-11-06 11:00:01 +00:00
GregoryTravis
1480f50207
Overhaul the random number and item generation code (#8127)
Rewrite most of Random.enso.
2023-10-31 15:25:37 +00:00
Radosław Waśko
79011bd550
Implement Table.lookup_and_replace in Database (#8146)
- Closes #7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
2023-10-31 15:19:55 +00:00
Radosław Waśko
0c278391fe
Test and improve handling of Date_Time with_timezone=False in Postgres (#8114)
- Fixes #8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
2023-10-21 21:35:13 +00:00
Radosław Waśko
8172896065
Support Previous_Value in fill_nothing and fill_missing (#8105)
- Adds `Previous_Value` to `fill_nothing` and `fill_empty`, as requested by #7192.
2023-10-20 13:18:53 +00:00
Radosław Waśko
93a31fcc8b
Add benchmarks related to add_row_number performance investigation (#8091)
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
2023-10-18 17:21:59 +00:00
Radosław Waśko
e9fa12763e
Improve performance of add_row_number (#8076)
Fixes #8055
2023-10-17 00:42:35 +00:00
Radosław Waśko
08b717eb54
Refactor Table problem handling to a more robust and hopefully cleaner approach (#7879)
Closes #7514
2023-10-16 15:09:08 +00:00
GregoryTravis
f18d1323e1
Add Table.expand_to_rows to allow flattening vector and array values in table (#8042)
# Important Notes
Also includes a fix for a reallocation bug in `InferredBuilder`.
2023-10-13 20:54:06 +00:00
Radosław Waśko
cd84ac16ce
Restructure Table.from_objects to use conversions (#8020)
Closes #7957
2023-10-11 22:25:18 +00:00
somebody1234
826127d8ff
Eliminate line feeds from XML.outer_xml on Windows (#8013)
- Closes #7999

# Important Notes
None
2023-10-10 23:21:34 +00:00
Radosław Waśko
6e0bd86753
Implement Table.lookup_and_replace for in-memory (#7979)
- Closes #7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
2023-10-10 10:42:06 +00:00
GregoryTravis
9ba7be20af
Basic XML support (#7947)
This PR includes
* Reading XML from a file, stream, or string
* Reading XML via Data.fetch
* Accessing the root element, element children, and attributes
* Accessing tag text contents
* Get tags by name
* Inner / Outer XML string
2023-10-06 17:52:19 +00:00
Radosław Waśko
0cd446432f
Fix inconsistency when building a Mixed column, fixes to Union (#7919)
- Fixes #7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed.
   - Added more benchmarks for comparing performance of constructing columns.
- Fixes missing implementations that caused `Table.union` crashing on some type pairs.
- Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column.
- Adds test for all of the above cases.
- Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.
2023-10-03 20:33:34 +02:00
Radosław Waśko
08cd449a99
Fix NumberParser to avoid thousandSeparator==decimalPoint and prefer US decimal format (#7946)
Closes #7930
2023-10-03 20:07:54 +02:00
Radosław Waśko
8d926166ea
Follow up improvements to Date_Time_Formatter (#7875)
- Closes #7872
- Also closes #7866
2023-09-28 09:38:00 +00:00
Radosław Waśko
c690559ec4
Implement auto_value_type operation (#7908)
Closes #6113
2023-09-27 15:45:34 +00:00
Radosław Waśko
12c4f2981d
More robust Date/Time format patterns parsing (#7826)
- Closes #7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer.
- The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone.
- The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`.
- Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`.
- Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`).
2023-09-22 10:12:18 +00:00
Jaroslav Tulach
ad34a701e4
Upgrading to Frgaal compiler 20.0.1 (#7860) 2023-09-22 09:58:19 +02:00
James Dunkerley
74d1d0861c
S3 Read Access, Input Stream based reading (#7776)
- Added a `FileSystemSPI` allowing protocol resolution to a target type.
- Separated `Input_Stream` and `Output_Stream` from `File` to allow use in other spaces.
- `File_Format` types `read_web` changed to be `read_stream` working with `InputStream`.
- Added directory listing to `Auto_Detect` allowing for `Data.read` to list a folder.
- Adjusted HTTP to return an `InputStream` not a `byte[]`:
- `Response_Body` adjusted to wrap an `InputStream`.
- Added ability to materialize to either and in-memory vector (<4KB) or a temporary file.
- `Data.fetch` will materialize if not a recognized mime-type.
- Added `HTTP_Error` to handle IO exceptions from the stream.
- `Excel_Format` now supports mime-type and reading a stream.
- `Excel_Workbook` can now get a `Excel_Section` using `read_section`.
- Added S3 APIs:
- `parse_uri`: splits an S3 URI into bucket and key.
- `list_objects`: list the items in a S3 bucket with specified prefix.
- `read_bucket`: list prefixes and keys with a delimiter in a S3 bucket with specified prefix.
- `head`: either head_bucket (tests existance) or head_object API (reads object meta data).
- `get_object`: gets an object from S3 returning as a `Response_Body`.
- Added `S3_File` type acting like a `File`:
- No support for writing in this PR.
- **ToDo:** recursive listing, glob filtering, exists, size.
- Fixed a few invalid type signature line.
- Moved `create` methods for `Postgres_Connection` and `SQLite_Connection` into type instead of module.
- Renamed `Column_Fetcher.Builder` to `Column_Fetcher_Builder`.
- Fixed bug with `select_into` in Dry Run mode creating permanent tables.

**ToDo:** Unit tests.
2023-09-20 15:09:11 +00:00
Hubert Plociniczak
1ee3d8f4f0
Rename Decimal to Float (#7807)
Implements #6889.
2023-09-14 15:01:30 +00:00
Radosław Waśko
8b6e70b155
Support for BigInteger values in Table (#7715)
- Fixes #7354
- And also closes #7712
- Refactors how we handle numeric ops - ensuring that the 'kernels' are placed all in one place and selected based on storage types.
2023-09-12 13:18:04 +00:00
Radosław Waśko
255b424b72
Add value_type to Column.from_vector and expected_value_type to Column.map and Column.zip (#7637)
- Closes #6111
- Aligns semantics of handling Mixed columns.
- Now, if an operation like `iif` or `fill_nothing` is given a `Mixed` column, the result will also be `Mixed` regardless of the `inferred_precise_value_type`.
- Enables a few old tests that were pending but could be enabled since the types work is advanced enough.
2023-08-31 13:20:49 +00:00
Radosław Waśko
2385f5b357
Add size-limited strings and varying bit-width integer Value_Types to in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)
- Closes #5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on particular database.
2023-08-22 18:10:46 +00:00
GregoryTravis
c9d7c5cb2b
Convert in-memory Column.round to Java (#7521) 2023-08-16 14:45:23 +00:00
Jaroslav Tulach
7a272ec152
Encapsulating array-like data and operations into a single package (#7544) 2023-08-15 13:00:47 +02:00
Radosław Waśko
b656b336c7
Report Loss_Of_Integer_Precision when an integer is not exactly representable as a float during conversion (#7509)
Closes #7353

I introduce a new type `WithAggregatedProblems`, because `WithProblems` was too simple - it only allowed to hold a `List<Problem>` but `AggregatedProblems` is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify all to use `WithAggregatedProblems` - but after starting this, I realised it will likely just take too much effort to do for this little PR. So instead, I created a follow-up task for this: #7514
2023-08-08 12:30:44 +00:00
Pavel Marek
8e49255d92
Invoke all Enso benchmarks via JMH (#7101)
# Important Notes
#### The Plot

- there used to be two kinds of benchmarks: in Java and in Enso
- those in Java got quite a good treatment
- there even are results updated daily: https://enso-org.github.io/engine-benchmark-results/
- the benchmarks written in Enso used to be 2nd class citizen

#### The Revelation
This PR has the potential to fix it all!
- It designs new [Bench API](88fd6fb988) ready for non-batch execution
- It allows for _single benchmark in a dedicated JVM_ execution
- It provides a simple way to wrap such an Enso benchmark as a Java benchmark
- thus the results of Enso and Java benchmarks are [now unified](https://github.com/enso-org/enso/pull/7101#discussion_r1257504440)

Long live _single benchmarking infrastructure for Java and Enso_!
2023-08-07 12:39:01 +00:00
GregoryTravis
758b3b31b9
Avoid indexing the table twice for Cross Tab (#7417)
Rewrites MultiValueIndex.makeCrossTabTable to build only a single index.
2023-08-04 21:14:18 +00:00
Radosław Waśko
bc9cde6543
Fix column naming edge cases - invalid and duplicated columns, case-insensitive name aliasing for case-insensitive backends (#7495)
- Fixes #7412
- Also adds tests and fixes some more edge cases:
- Ensures correct handling of existing Database tables whose column names may be invalid from Enso perspective, or clashing from Enso perspective (e.g. for most DBs `ś` and `s\u0301` are different names, but for Enso they are basically the same so this would cause issues - thus Enso now renames such columns when accessed (still using the correct column reference in the generated SQL under the hood).
2023-08-04 09:04:38 +00:00
GregoryTravis
037a687401
Expose Unicode normalization methods on Texts (#7425)
Exposes Text_Utils.normalize().
2023-08-03 18:07:00 +00:00
Radosław Waśko
c61c741476
Respect database backend naming limitations when generating table/column names and validate user-provided names to avoid silent name clashes; process JDBC warnings reported from backends (#7428)
- Closes #5951
- Ensures any SQL warnings reported by the database through the JDBC driver are processed and forwarded to the user.
- These warnings show issues like the implicit name truncation that this PR is also solving. It's good to make sure they are visible as they can help avoid and understand unexpected problems. They should not show up in most standard workflows.
- Adds simple history to our REPL.
2023-08-03 09:44:27 +00:00
James Dunkerley
7345f0fd9a
Speed up statistics (#7390)
- Allow `parse_to_columns` to take a `Regex` object.
- Add `pattern` to the `Regex` object.
- Add `column_names` to the `Row` object.
- Improve statistics performance.
- Add benchmarks for stats.

| Benchmark | Reference | New | Improvement |
| --- | --- | --- | --- |
| Max (by reduce) | 16.4ms | 16.3ms | - |
| Max (stats) | 703ms | 224ms | 68% |
| Sum (by reduce) | 38ms | 38ms | - |
| Sum (stats) | 753ms | 420ms | 44% |
| Variance (stats) | 745ms | 553s | 26% |

Also tried using a Ref approach for stats but as slower (7e13c45224).
2023-07-26 10:01:18 +00:00
Radosław Waśko
4b5a2e2176
Fixing operations on Mixed types (#7368)
- Fixes #7231
- Cleans up vectorized operations to distinguish unary and binary operations.
- Introduces MixedStorage which may pretend to be a more specialized storage on demand.
- Ensures that operations request a more specialized storage on right-hand side to ensure compatibility with reported inferred storage type.
- Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso
- Tests for comparison of Mixed storages with each other and other types
- Started using `Set` for `Filter_Condition.Is_In` for better performance.
- ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~
- This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out #7371
- ~~Trying to avoid conversions when calling Enso functions from Java.~~
- Needs extra care as dataflow errors may not be handled right then. So only works for simple functions that should not error.
- Not sure how much it really helps. [Benchmarks](https://github.com/enso-org/enso/pull/7270#issuecomment-1635618393) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results.
- Created #7378 to track this.
2023-07-25 23:25:17 +00:00
Radosław Waśko
56635c9a88
Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso (#7270)
The added benchmark is a basis for a performance investigation.

We compare the performance of the same operation run in Java vs Enso to see what is the overhead and try to get the Enso operations closer to the pure-Java performance.
2023-07-21 17:25:02 +00:00
Radosław Waśko
620cc361ce
Add date_diff, date_add and date_part to scalar Enso date-time values. (#7273)
Followup of #7221, adding `date_diff`, `date_add` and `date_part` to scalar Enso date-time values.
2023-07-13 15:17:21 +00:00
Radosław Waśko
ca68dd94da
Adding new Date/Time operations (-, date_add, date_diff, date_part) (#7221)
- Adds `Column.date_diff` for computing date/time difference as integer multiply of some unit.
- Adds `Column.date_add` for shifting date/time by a unit.
- Adds `Column.date_part` for extracting various parts of the date/time value as integer.
- Adds widgets for the 3 methods above whose content depends on the column value type.
- Adds shorthands: `Column.hour`, `Column.minute` and `Column.second` to extract these date parts.
- Extends `Time_Period` with support for milli-, micro- and nano- seconds; and adapts functions taking `Time_Period` to support these wherever possible.
2023-07-13 12:56:54 +00:00
James Dunkerley
0adab6c68c
Round on a column was always adding a warning (#7246)
- Only warn if outside allowed range.
- Added `is_infinite` to In-Memory column.
- Allow integer value type for `is_nan` and `is_infinite`.
2023-07-10 17:35:23 +00:00
James Dunkerley
1fb60df61b
Fixes from the live demo. (#7243)
- Removed defaults from `cross_tab`. It caused an out-of-heap space error when it attempted to build a 205k x 205k table. Now has a hard limit of 10,000 columns - we can increase this once we have more concrete test data.
![image](https://github.com/enso-org/enso/assets/4699705/bc38d41c-56dc-41bd-8a7c-fa89ecfa7f79)

- Adjusted the dropdowns on `Aggregate_Column` for `columns` and `order_by` to be dropdowns as nested Vector editors are not supported.
![image](https://github.com/enso-org/enso/assets/4699705/f4a7c7cc-6a21-462c-a39e-65fbab82c367)

- Altered `Aggregate_Column` so `new_name` now `new_name:Text=""` and not taking `Nothing` anymore. Makes it appear correctly in IDE.
![image](https://github.com/enso-org/enso/assets/4699705/196a49ba-4274-44bb-b876-0372c8f62746)

- Added dropdowns for `fill_empty`, `fill_nothing` and `replace` on `Table`.
![image](https://github.com/enso-org/enso/assets/4699705/9ee5cec2-82d5-4452-b650-67015ac9fee5)

- Added `replace` to Database table throwing `Unsupport_Database_Operation`.
2023-07-09 18:03:05 +00:00
Radosław Waśko
78545b4402
Add safepoints to standard libraries Java polyglot helpers (#7183)
Closes #7129
2023-07-05 14:12:13 +00:00
Radosław Waśko
2d73277238
Fix a bug that somehow went under CI (#7204) 2023-07-05 08:54:27 +00:00
GregoryTravis
550d146493
Add round, ceil, floor, truncate to the In-Database Column type (#6988) 2023-06-30 16:47:40 +00:00
Radosław Waśko
2bac9cc844
Execution Context integration for Database write operations (#7072)
Closes #6887
2023-06-27 15:51:21 +00:00
James Dunkerley
937651f696
Code Clean Up, Fix Weird Namespace, S3 List Objects and Read Object (#7114)
Mostly a tidy up as part of looking over the function catalogue for groups.
Sorted some whitespaces issues.
2023-06-24 23:18:58 +00:00
James Dunkerley
1859ccbab5
Improving widgets and other minor tweaks. (#7052)
- Removed `module` argument from `enso_project` (new `Project_Description.new` API).
- Removed the custom option from date and time parse/format dropdowns.
- The `format` dropdown uses the value to create the dropdown. (Screenshot below)
- Removed `StorageType` coalescing rules and replaced them with simpler logic in `ObjectStorage`.
- Update signature for `add_row_number` and add aliases.
2023-06-19 19:03:36 +00:00
James Dunkerley
760fb71798
First part of AWS S3 API, various small fixes. (#6973)
- Add type detection for `Mixed` columns when calling column functions.
- Excel uses column name for missing headers.
- Add aliases for parse functions on text.
- Adjust `Date`, `Time_Of_Day` and `Date_Time` parse functions to not take `Nothing` anymore and provide dropdowns.
- Removed built-in parses.
- All support Locale.
- Add support for missing day or year for parsing a Date.
- All will trim values automatically.
- Added ability to list AWS profiles.
- Added ability to list S3 buckets.
- Workaround for Table.aggregate so default item added works.
2023-06-15 16:20:13 +00:00
Dmitry Bushev
6249c79ffd
Update sbt-java-formatter plugin (#7011)
Update java formatter plugin. The new version can remove unused imports.
2023-06-12 14:18:48 +00:00
James Dunkerley
578ba59f1d
Use US Locale for Date and Time parsing and formatting (#6967)
Sorts out parsing and printing long form names of months and weekdays.
2023-06-06 21:44:25 +00:00
Radosław Waśko
1931e9e51f
Workaround for to_date_time type errors (#6964)
Related to #6912

It essentially solves it by removing any builtins that would take an EnsoDate/EnsoTimeOfDay/EnsoTimeZone and replacing them with Java utils that do the same operation.

This is not a proper solution - the builtin conversion is still invalid for the date/time types - but at this moment we may just no longer use the invalid conversion so it is much less of an issue. We still need to be aware of this if we want to introduce builtins taking date/time in the future.
2023-06-06 20:28:11 +00:00
GregoryTravis
912fbce97b
Reimplement Column.truncate, .ceil, and .floor as vectorized Java ops (#6941)
Reimplement these in Java.

Benchmarks:

Before:

Column.truncate floats average: 124.4ms
Column.ceil floats average: 121.47ms
Column.floor floats average: 120.18ms
Column.truncate ints average: 124.78ms
Column.ceil ints average: 120.41ms
Column.floor ints average: 102.35ms

After (boxed):

Column.truncate floats average: 3.75ms
Column.ceil floats average: 2.25ms
Column.floor floats average: 1.89ms
Column.truncate ints average: 2ms
Column.ceil ints average: 1.77ms
Column.floor ints average: 1.74ms

After (unboxed):
Column.truncate floats average: 3.32ms
Column.ceil floats average: 2.15ms
Column.floor floats average: 1.69ms
Column.truncate ints average: 1.74ms
Column.ceil ints average: 1.61ms
Column.floor ints average: 1.99ms
2023-06-06 18:07:12 +00:00
Radosław Waśko
d44b1250b7
Implement Table.add_row_number (#6890)
Closes #5227

# Important Notes
- This lays first steps towards #6292 - we get pure Enso variants of MultiValueKey.
- Another part refactors `LongStorage` into `AbstractLongStorage` allowing it to provide alternative implementations of the underlying storage, in our case `LongRangeStorage` generating the values ad-hoc and `LongConstantStorage` - currently unused but in the future it can be adapted to support constant columns (once we implement similar facilities for other types).
2023-06-02 10:13:13 +00:00
GregoryTravis
0337180384
Add rounding functions to the Column type (#6817) 2023-06-01 20:06:23 +00:00
Radosław Waśko
c3e771c75c
Allow casting a Mixed column into a concrete type (#6777)
Follow-up of #6711

Closes #6838
2023-05-26 13:25:53 +00:00
Radosław Waśko
447786a304
Implement cast for Table and Column (#6711)
Closes #6112
2023-05-19 10:00:20 +00:00
Radosław Waśko
cd7fb73232
Add Date_Range (#6621)
Closes #6543
2023-05-11 16:03:02 +00:00
GregoryTravis
4ba8409def
Add format to the in-memory Column (#6538)
Add format to the in-memory Column

# Important Notes
Also updates .format in date types.
Some rearrangement of date formatting builtins / Java libraries.
2023-05-09 08:47:40 +00:00
James Dunkerley
bc0db18a6e
Small changes from Book Club issues (#6533)
- Add dropdown to tokenize and split `column`.
- Remove the custom `Join_Kind` dropdown.
- Adjust split and tokenize names to start numbering from 1, not 0.
- Add JS_Object serialization for Period.
- Add `days_until` and `until` to `Date`.
- Add `Date_Period.Day` and create `next` and `previous` on `Date`.
- Use simple names with `File_Format` dropdown.
- Avoid using `Main.enso` based imports in `Standard.Base.Data.Map` and `Standard.Base.Data.Text.Helpers`.
- Remove an incorrect import from `Standard.Database.Data.Table`.

From #6587:

A few small changes, lots of lines because this affected lots of tests:
- `Table.join` now defaults to `Join_Kind.Left_Outer`, to avoid losing rows in the left table unexpectedly. If the user really wants to have an Inner join, they can switch to it.
- `Table.join` now defaults to joining columns by name not by index - it looks in the right table for a column with the same name as the first column in left table.
- Missing Input Column errors now specify which table they refer to in the join.
- The unique name suffix in column renaming / default column names when loading from file is now a space instead of underscore.
2023-05-06 10:10:24 +00:00
Radosław Waśko
41a8257e8d
Separating Redshift connector from Database library into a new AWS library (#6550)
Related to #5777
2023-05-04 17:36:51 +00:00
James Dunkerley
6b0c682b08
Add Execution Context control to Text.write (#6459)
- Adjusted `Context.is_enabled` to support default argument (moved built in so can have defaults).
- Made `environment` case-insensitive.
- Bug fix for play button.
- Short hand to execute within an enabled context.
- Forbid file writing if the Output context is disabled with a `Forbidden_Operation` error.
- Add temporary file support via `File.create_temporary_file` which is deleted on exit of JVM.
- Execution Context first pass in `Text.write`.
- Added dry run warning.
- Writes to a temporary file if disabled.
- Created a `DryRunFileManager` which will create and manage the temporary files.
- Added `format` dropdown to `File.read` and `Data.read`.
- Renamed `JSON_File` to `JSON_Format` to be consistent.

(still to unit test).
2023-04-29 08:39:18 +00:00
James Dunkerley
0c7c3bdeaf
Fix for the massive number of warnings when renaming with invalid names. (#6450)
* Rename makeUnique overloads to avoid issue when Nothing is passed.
Suspend warnings when building the output table to avoid mass warning duplication.

* Add test for mixed invalid names.
Adjust so a single warning attached.

* PR comments.
2023-04-27 14:51:59 +01:00
Radosław Waśko
a43d524336
Add typechecks to Aggregate and Cross Tab (#6380)
Follow up of #6298 as it grew too much. Adds the needed typechecks to aggregate operations. Ensures that the DB operations report `Floating_Point_Equality` warning consistently with in-memory.
2023-04-24 08:55:54 +00:00
Radosław Waśko
8db2ad51a1
Adding typechecks to Column Operations (#6298)
Closes #6106
2023-04-21 12:20:12 +00:00
James Dunkerley
0350762386
Add replace, trim to Column. Better number parsing. (#6253)
- Add `replace` with same syntax as on `Text` to an in-memory `Column`.
- Add `trim` with same syntax as on `Text` to an in-memory `Column`.
- Add `trim` to in-database `Column`.
- Added `is_supported` to dialects and exposed the dialect consistently on the `Connection`.
- Add `write_table` support to `JSON_File` allowing `Table.write` to write JSON.
- Updated the parsing for integers and decimals:
- Support for currency symbols.
- Support for brackets for negative numbers.
- Automatic detection of decimal points and thousand separators.
- Tighter rules for scientific and thousand separated numbers.
- Remove `replace_text` from `Table`.
- Remove `write_json` from `Table`.
2023-04-20 16:04:59 +00:00
Radosław Waśko
f5db35af07
Adjust {Table|Column}.parse to use Value_Type (#6213)
Closes #5660
2023-04-06 10:58:55 +00:00
Jaroslav Tulach
4805193428
Text.to_display_text is (shortened) identity (#6174)
Fixes #5971.
2023-04-05 19:53:07 +00:00
GregoryTravis
d9bc5246ba
Remove old (Java) Regex library and replace with new (Truffle) library. (#6195)
Remove old (Java) Regex library and replace with new (Truffle) library.
2023-04-04 19:58:26 +00:00
GregoryTravis
fb77f42fd5
Update Text.split to take a Vector Text parameter (#6156)
Allows you to pass a vector of delimiters to `split`.
2023-04-04 14:44:47 +00:00
James Dunkerley
f26bcf6ab6
Small issues from working with Ned (#6160)
- `Process.run` now returns a `Process_Result` allowing the easy capture of stdout and stderr.
- Joining a column with a column name does not warn if adding just the prefix.
- Stop the table viz from changing case and adding spaces to the headers.
2023-04-03 13:01:42 +00:00
Radosław Waśko
6ddcb553e5
Date/time support for Postgres. Year/month/day operations on Columns. (#6153)
Closes #6115
2023-03-31 18:37:04 +00:00
Radosław Waśko
6f86115498
Proper implementation of Value Types in Table (#6073)
This is the first part of the #5158 umbrella task. It closes #5158, follow-up tasks are listed as a comment in the issue.

- Updates all prototype methods dealing with `Value_Type` with a proper implementation.
- Adds a more precise mapping from in-memory storage to `Value_Type`.
- Adds a dialect-dependent mapping between `SQL_Type` and `Value_Type`.
- Removes obsolete methods and constants on `SQL_Type` that were not portable.
- Ensures that in the Database backend, operation results are computed based on what the Database is meaning to return (by asking the Database about expected types of each operation).
- But also ensures that the result types are sane.
- While SQLite does not officially support a BOOLEAN affinity, we add a set of type overrides to our operations to ensure that Boolean operations will return Boolean values and will not be changed to integers as SQLite would suggest.
- Some methods in SQLite fallback to a NUMERIC affinity unnecessarily, so stuff like `max(text, text)` will keep the `text` type instead of falling back to numeric as SQLite would suggest.
- Adds ability to use custom fetch / builder logic for various types, so that we can support vendor specific types (for example, Postgres dates).

# Important Notes
- There are some TODOs left in the code. I'm still aligning follow-up tasks - once done I will try to add references to relevant tasks in them.
2023-03-31 16:16:18 +00:00
GregoryTravis
6b9cbeacb2
Implement Regular Expression replace and update Text.replace to the new API (#5959)
Re-implement replace on top of Truffle regex.
2023-03-28 06:13:12 +00:00
James Dunkerley
bf2545fa04
Use new common parse method throwing less exceptions. (#6075)
Avoiding exceptions by not using parseBest.

Time now in CLI is 1.15s for 500k rows vs 1.65s in GUI.

CLI:
![image](https://user-images.githubusercontent.com/4699705/227711266-bc005b0d-5011-450f-964b-65dd2e437c2e.png)

GUI:
![image](https://user-images.githubusercontent.com/4699705/227711259-f7ddda29-86c7-4eef-a002-4bf0bda6063f.png)

Added it as a function in the shared library so used by both engine and polyglot.
2023-03-27 11:02:10 +00:00
James Dunkerley
58f2c7643f
Use new Enso Hash Codes and Comparable (#6060)
Enables `distinct`, `aggregate` and `cross_tab` to use the Enso hashing and equality operations.
Also, I rewired the way the ObjectComparators are obtained in polyglot code to be more consistent.

Add Comparator for `Day_Of_Week`, `Header`, `SQL_Type`, `Image` and `Matrix`.
Also, removed the custom `==` from these types as needed. (Closes #5626)
2023-03-24 15:02:25 +00:00
Radosław Waśko
952beba8d1
Fix cross_tab column naming edge cases, add fill_empty (#5863)
Closes #5151 and adds some additional tests for `cross_tab` that verify duplicated and invalid names.

I decided that for empty or `Nothing` names, instead of replacing them with `Column` and implicitly losing connection with the value that was in the column, we should just error on such values.

To make handling of these easier, `fill_empty` was added allowing to easily replace the empty values with something else.

Also, `{is,fill}_missing` was renamed to `{is,fill}_nothing` to align with `Filter_Condition.Is_Nothing`.
2023-03-11 11:58:54 +00:00
Radosław Waśko
263c3ad651
Add a common-polyglot-core-utils project (#5855)
Adds a common project that allows sharing code between the `runtime` and `std-bits`.

Due to classpath separation and the way it is compiled, the classes will be duplicated - we will have one copy for the `runtime` classpath and another copy as a small JAR for `Standard.Base` library.

This is still much better than having the code duplicated - now at least we have a single source of truth for the shared implementations.

Due to the copying we should not expand this project too much, but I encourage to put here any methods that would otherwise require us to copy the code itself.

This may be a good place to put parts of the hashing logic to then allow sharing the logic between the `runtime` and the `MultiValueKey` in the `Table` library (cc: @Akirathan).
2023-03-11 09:27:26 +00:00
Radosław Waśko
91ef8acf35
Review generated Column names (#5850)
Closes #5583 and closes #5157
2023-03-10 19:07:58 +00:00
Radosław Waśko
62e57f5557
Test some Mismatched Quote edge cases in Delimited reader (#5810)
Follow-up to #5113 - I add some more edge case tests as we discussed with @jdunkerley

When debugging some quoting issues, I also realised the current `Mismatched_Quote` error provided not enough information. So I amended it to at least include some context indicating which was the 'offending' cell.
2023-03-10 15:47:57 +00:00
James Dunkerley
299bfd6b7d
Fixes from the Demo on 2nd March (#5823)
- Fix issue with Geo Map viz.
- Handle invalid format strings better in `Data_Formatter`.
- New constants for the ISO format strings (and a special ENSO_ZONED_DATE_TIME)
- Consistent Date Time format for parsing in all places.
- Avoid throwing exception in datetime parsing.
- Support for milliseconds (well nanoseconds) in Date_Time and Time_Of_Day.
- `Column.map` stays within Enso.
- Allow `Aggregate_Column.Group_By` in `cross_tab` group_by parameter.
2023-03-07 20:58:00 +00:00
Pavel Marek
b6e2319fcc
Comparators support partial ordering (#5778) 2023-03-07 04:16:38 +00:00
Radosław Waśko
2d29456ed1
Review File/Data read and read_text warnings (#5799)
Closes #5113

Fixes a bug where read-only files would be overwritten if File.write was used in backup mode, and added tests to avoid such regression. To implement it, introduced a `is_writable` property on `File`.
2023-03-06 03:43:38 +00:00
James Dunkerley
01fc34c18a
Improving Expression Support for In Database (#5790)
- Adjust Excel Workbook write behaviour.
- Support Nothing / Null constants.
- Deduce the type of arithmetic operations and `iif`.
- Allow Date_Time constants, treating as local timezone.
- Removed the `to_column_name` and `ensure_sane_name` code.
2023-03-03 12:03:05 +00:00
Radosław Waśko
793eafc866
Improve Table.parse_values API (#5692)
Closes #5111
2023-02-24 13:35:01 +00:00
James Dunkerley
652b8d5db3
Update rename_columns to new API design, add first_row, second_row and last_row functions to the table. (#5719)
- Updates the `rename_columns` API.
- Add `first_row`, `second_row` and `last_row` to the Table types.
- New option for reading only last row of ResultSet.
2023-02-23 19:42:45 +00:00
Radosław Waśko
4dcf802831
Ensure that warnings are preserved on Nothing values passing back to Enso through polyglot boundary (#5677)
Fixes #5672

# Important Notes
- Added a subproject `enso-test-java-helpers` which allows the in-Enso tests to add Java helpers for testing.
2023-02-17 13:38:26 +00:00
Radosław Waśko
3027c6f3a2
Ensure entries containing newlines are quoted when writing Delimited Files (#5652)
Fixes #5638
2023-02-17 00:57:48 +00:00
James Dunkerley
1bc27501e6
Remove Column type from Aggregate_Column, simplify Column_Selector, some new File_Formats (#5646)
- Updated `Widget.Vector_Editor` ready for use by IDE team.
- Added `get` to `Row` to make API more aligned.
- Added `first_column`, `second_column` and `last_column` to `Table` APIs.
- Adjusted `Column_Selector` and associated methods to have simpler API.
- Removed `Column` from `Aggregate_Column` constructors.
- Added new `Excel_Workbook` type and added to `Excel_Section`.
- Added new `SQLiteFormatSPI` and `SQLite_Format`.
- Added new `IamgeFormatSPI` and `Image_Format`.
2023-02-16 15:15:49 +00:00
Radosław Waśko
a02eab451e
Implement basic warnings for column arithmetic, review warnings on expressions and filter (#5605)
Closes #5109

# Important Notes
- Currently the tests pass for the in-memory parts of Common_Table_Operations, but still some stuff not working on DB backends - in progress.
2023-02-14 09:33:04 +00:00
James Dunkerley
1c821e22cf
Some fixed form the Anagrams experiment. (#5592)
- Fixes the display of Date, Time_Of_Day and Date_Time so doesn't wrap.
- Adjust serialization of large integer values for JS and display within table.
- Workaround for issue with using `.lines` in the Table (new bug filed).
- Disabled warning on no specified `separator` on `Concatenate`.

Does not include fix for aggregation on integer values outside of `long` range.
2023-02-08 22:17:00 +00:00
Radosław Waśko
4f90946d1e
Rework Invalid Aggregations (#5579)
Closes #5108
2023-02-08 18:39:09 +00:00
Radosław Waśko
778d28fba3
Table with no columns is not valid, No_Output_Columns is always an error (#4073)
Implements https://www.pivotaltracker.com/story/show/184226020
2023-01-25 02:40:23 +00:00
Radosław Waśko
d2e57edc8b
Add Table.cross_join and Table.zip to In-Memory Table (#4063)
Implements https://www.pivotaltracker.com/story/show/184239059
2023-01-23 13:19:52 +00:00
Radosław Waśko
8853053020
Division in Columns within InDB is integer based if both columns are integers (#4057)
Fixes https://www.pivotaltracker.com/story/show/184073099

# Important Notes
- Since now the only operator on columns for division, `/`, returns floats, it may be worth creating an additional `div` operator exposing integer division. But that will be done as a separate task aligning column operator APIs.
2023-01-17 20:29:25 +00:00
Radosław Waśko
082e0bfd0d
Add Table.union to the In-Memory Table. (#4052)
Implements https://www.pivotaltracker.com/story/show/183854144
2023-01-17 00:34:57 +00:00
Radosław Waśko
0088096a58
Implement Distinct for the Database backends (#4027)
Implements https://www.pivotaltracker.com/story/show/182307281
2023-01-11 22:46:54 +00:00
Radosław Waśko
8c661fdb74
Database Joins (#4007)
Implements https://www.pivotaltracker.com/story/show/184032869

# Important Notes
- Currently we get failures in Full joins on Postgres which show a more serious problem - amending equality to ensure that `[NULL = NULL] == True` breaks hash/merge based indexing - so such joins will be extremely inefficient. All our joins currently rely on this notion of equality which will mean all of our DB joins will be extremely inefficient.
- We need to find a solution that will support nulls and still work OK with indices (but after exploring a few approaches: `COALESCE(a = b, a IS NULL AND b is NULL)`, `a IS NOT DISTINCT FROM b`, `(a = b) OR (a IS NULL AND b is NULL)`; all of which did not work (they all result in `ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions`) I'm less certain that it is possible. Alternatively, we may need to change the NULL semantics to align it with SQL - this seems like likely the simpler solution, allowing us to generate simple, reliable SQL - the NULL=NULL solution will be cornering us into nasty workarounds very dependent on the particular backend.
2023-01-05 10:36:22 +00:00
Dmitry Bushev
1e5e2327ab
Improve performance of Text.compare_to (#4012)
PR adds a flag to `Text` implementation tracking whether it is in a FCD normal form. Then this information can be used in the `Normalizer.compare` method.



| Benchmark name | Old (ms) | With flag (ms)
| --- | --- | ---
| Unicode very short | 40.29 | 40.04
| Unicode medium | 9.07 | 1.99
| Unicode big - random | 115.39 | 0.35
| Unicode big - early difference | 107.02 | 0.54
| Unicode big - late difference | 749.81 | 94.73
| ASCII very short | 28.13 | 31.13
| ASCII medium | 4.58 | 2.26
| ASCII big - random | 42.68 | 0.26
| ASCII big - early difference | 30.91 | 0.32
| ASCII big - late difference | 66.29 | 42.72

Full benchmark output.
[bench_old.txt](https://github.com/enso-org/enso/files/10325202/bench_old.txt)
[bench_new.txt](https://github.com/enso-org/enso/files/10325201/bench_new.txt)
2023-01-02 17:09:03 +00:00
Jaroslav Tulach
7252af6d62
Enso.getMetaObject, Type.isMetaInstance and Meta.is_a consolidation (#3949)
Implements `getMetaObject` and related messages from Truffle interop for Enso values and types. Turns `Meta.is_a` into builtin and re-uses the same functionality.

# Important Notes
Adds `ValueGenerator` testing infrastructure to provide unified access to special Enso values and builtin types that can be reused by other tests, not just `MetaIsATest` and `MetaObjectTest`.
2022-12-22 08:00:06 +00:00
James Dunkerley
579d3fc397
Adds Date, Time_Of_Day and Date_Time support to Excel IO (#3997)
- Allow date time inputs from Excel.
- Enables disabled test.
- Fix for Map.==.
- Allow nulls in crosstab name.
2022-12-20 16:12:00 +00:00
James Dunkerley
ace459ed53
Let JavaScript parse JSON and write JSON ... (#3987)
Use JavaScript to parse and serialise to JSON. Parses to native Enso object.
- `.to_json` now returns a `Text` of the JSON.
- Json methods now `parse`, `stringify` and `from_pairs`.
- New `JSON_Object` representing a JavaScript Object.
- `.to_js_object` allows for types to custom serialize. Returning a `JS_Object`.
- Default JSON format for Atom now has a `type` and `constructor` property (or method to call for as needed to deserialise).
- Removed `.into` support for now.
- Added JSON File Format and SPI to allow `Data.read` to work.
- Added `Data.fetch` API for easy Web download.
- Default visualization for JS Object trunctes, and made Vector default truncate children too.

Fixes defect where types with no constructor crashed on `to_json` (e.g. `Matching_Mode.Last.to_json`.
Adjusted default visualisation for Vector, so it doesn't serialise an array of arrays forever.
Likewise, JS_Object default visualisation is truncated to a small subset.

New convention:
- `.get` returns `Nothing` if a key or index is not present. Takes an `other` argument allowing control of default.
- `.at` error if key or index is not present.
- `Nothing` gains a `get` method allowing for easy propagation.
2022-12-20 10:33:46 +00:00
Radosław Waśko
b9bf958f2c
Efficient joining for Equals and Equals_Ignore_Case using a hashmap (#3978)
- Implemented https://www.pivotaltracker.com/story/show/183913276
- Refactored MultiValueIndex and MultiValueKeys to be more type-safe and more direct about using ordered or unordered maps.
- Added performance tests ensuring we use an efficient algorithm for the joins (the tests will fail for a full O(N*M) scan).
- Removed some duplicate code in the Table library.
- Added optional coloring of test results in terminal to make failures easier to spot.
2022-12-14 22:56:20 +00:00
James Dunkerley
77fe69dfd9
JSON Improvements, small Table stuff, Statistic in Enso not Java and few other minor bits. (#3964)
- Aligned `compare_to` so returns `Type_Error` if `that` is wrong type for `Text`, `Ordering` and `Duration`.
- Add `empty_object`, `empty_array`. `get_or_else`, `at`, `field_names` and `length` to `Json`.
- Fix `Json` serialisation of NaN and Infinity (to "null").
- Added `length`, `at` and `to_vector` to Pair (allowing it to be treated as a Vector).
- Added `running_fold` to the `Vector` and `Range`.
- Added `first` and `last` to the `Vector.Builder`.
- Allow `order_by` to take a single `Sort_Column` or have a mix of `Text` and `Sort_Column.Name` in a `Vector`.
- Allow `select_columns_helper` to take a `Text` value. Allows for a single field in group_by in cross_tab.
- Added `Patch` and `Custom` to HTTP_Method.
- Added running `Statistic` calculation and moved more of the logic from Java to Enso. Performance seems similar to pure Java version now.
2022-12-14 19:40:27 +00:00
Radosław Waśko
8e880e430b
Improve basic join implementation (#3958)
Implements https://www.pivotaltracker.com/story/show/183913232

# Important Notes
Added counts of succeeded/failed tests within a group and global summary, to easier see how many tests failed.
2022-12-09 00:55:07 +00:00
James Dunkerley
11e07f8676
Use the MultiValueIndex for the JoinStrategy. (#3959)
Use the MultiValueStrategy for pure equals Joins.
2022-12-08 12:24:53 +00:00
James Dunkerley
0ad70c6332
Tidy Standard.Base part 5 of n ... (hopefully the end...) (#3929)
- Moved `Any`, `Error` and `Panic` to `Standard.Base`.
- Separated `Json` and `Range` extensions into own modules.
- Tidied `Case`, `Case_Sensitivity`, `Encoding`, `Matching`, `Regex_Matcher`, `Span`, `Text_Matcher`, `Text_Ordering` and `Text_Sub_Range` in `Standard.Base.Data.Text`.
- Tidied `Standard.Base.Data.Text.Extensions` and stopped it re-exporting anything.
- Tidied `Regex_Mode`. Renamed `Option` to `Regex_Option` and added type to export.
- Tidied up `Regex` space.
- Tidied up `Meta` space.
- Remove `Matching` from export.
- Moved `Standard.Base.Data.Boolean` to `Standard.Base.Boolean`.

# Important Notes
- Moved `to_json` and `to_default_visualization_data` from base types to extension methods.
2022-12-02 18:08:14 +00:00
James Dunkerley
4518f8303d
Implementing transpose and cross_tab for the InMemory table. (#3919)
- Adds transpose and cross_tab to the In-Memory table.
- Cross Tab is built on top of aggregate and hence allows for expressions and has same error trapping as in aggregate.

# Important Notes
Only basic tests have been implemented. Error and warning tests will be added as a follow up task.
2022-11-30 01:19:25 +00:00
Radosław Waśko
85cbf7d9f9
Initial (naive) implementation for in memory join (#3918)
Implements https://www.pivotaltracker.com/story/show/183854123

It features a naive full scan join and only allows equality conditions. More advanced conditions and better optimized algorithms will be implemented in a subsequent PR.
2022-11-29 19:37:31 +00:00
Jaroslav Tulach
35c9ef7680
Enhanced Vector Builder (#3809)
Manual implementation of vector builder that avoid any copying (if the initial `capacity` is exact). Moreover the builder optimizes for  storage of `double` and `long` values - if the array homogeneously consists of these values, then no boxing happens and only primitive types are stored.

# Important Notes
Added few tests to [Vector_Spec.enso](76d2f38247).
2022-11-29 04:41:06 +00:00
Jaroslav Tulach
ecd1fdc3f8
Caching the grapheme_length of a Text (#3864)
Computing length of a text takes time. Let's cache it after first computation.

# Important Notes
Wrote `StringBenchmarks` that sums lengths of (the same) `Text` present many time in a `Vector`. Initially it took `383.673 ms` per operation. Then it took `0.031 ms/op`. Looks like the `length` calls are returning instantly as they get cached.
2022-11-14 15:53:10 +00:00
James Dunkerley
45276b243d
Expanding Derived Columns and Expression Syntax (#3782)
- Added expression ANTLR4 grammar and sbt based build.
- Added expression support to `set` and `filter` on the Database and InMemory `Table`.
- Added expression support to `aggregate` on the Database and InMemory `Table`.
- Removed old aggregate functions (`sum`, `max`, `min` and `mean`) from `Column` types.
- Adjusted database `Column` `+` operator to do concatenation (`||`) when text types.
- Added power operator `^` to both `Column` types.
- Adjust `iif` to allow for columns to be passed for `when_true` and `when_false` parameters.
- Added `is_present` to database `Column` type.
- Added `coalesce`, `min` and `max` functions to both `Column` types performing row based operation.
- Added support for `Date`, `Time_Of_Day` and `Date_Time` constants in database.
- Added `read` method to InMemory `Column` returning `self` (or a slice).

# Important Notes
- Moved approximate type computation to `SQL_Type`.
- Fixed issue in `LongNumericOp` where it was always casting to a double.
- Removed `head` from InMemory Table (still has `first` method).
2022-11-08 15:57:59 +00:00
Pavel Marek
f8a4e2a9d2
Add Period type (#3818)
This PR adds `Period` type, which is a date-only complement to `Duration` builtin type.

# Important Notes
- `Period` replaces `Date_Period`, and `Time_Period`.
- Added shorthand constructors for `Duration` and `Period`. For example: `Period.days 10` instead of `Period.new days=10`.
- `Period` can be compared to other `Period` in some cases, other cases throw an error.
2022-10-28 17:27:20 +00:00
Radosław Waśko
2bc0611869
Add support for using Columns within Is_In (#3822)
Implements https://www.pivotaltracker.com/story/show/183560222
2022-10-24 12:51:15 +00:00
James Dunkerley
f0f6deef2a
Load the File_Format types via a ServiceLoader (#3813)
Moves the File.read method into the `File` type.
Uses the ServiceLoader to find all types for the File_Format.
2022-10-24 09:55:18 +00:00
Radosław Waśko
cc76e7d36a
Add support for Blank_Columns to Table and Database (#3812)
Implements https://www.pivotaltracker.com/story/show/183390281 and https://www.pivotaltracker.com/story/show/183390394
2022-10-20 09:11:08 +00:00
Radosław Waśko
17f73988e8
Update drop_missing_rows to filter_blank_rows API. (#3805)
Implements https://www.pivotaltracker.com/story/show/183390042 and https://www.pivotaltracker.com/story/show/183390370
2022-10-18 15:58:50 +00:00
Radosław Waśko
82de8f88bd
Add support for Is_In and Not_In to Filter_Condition (#3790)
Implements https://www.pivotaltracker.com/story/show/183389945
2022-10-15 11:29:59 +00:00
Pavel Marek
e9260227c4
Duration type is a builtin type (#3759)
- Reimplement the `Duration` type to a built-in type.
- `Duration` is an interop type.
- Allow Enso method dispatch on `Duration` interop coming from different languages.

# Important Notes
- The older `Duration` type should now be split into new `Duration` builtin type and a `Period` type.
- This PR does not implement `Period` type, so all the `Period`-related functionality is currently not working, e.g., `Date - Period`.
- This PR removes `Integer.milliseconds`, `Integer.seconds`, ..., `Integer.years` extension methods.
2022-10-14 18:08:08 +00:00
Radosław Waśko
592a8516a8
Add Is_Empty, Not_Empty, Like and Not_Like to Filter_Condition (#3775)
Implements https://www.pivotaltracker.com/story/show/183389890
2022-10-10 23:11:04 +00:00
Hubert Plociniczak
841b2e6e7a
Suppress some obvious warnings (#3768) 2022-10-07 10:07:40 +00:00
Radosław Waśko
61a4120cfb
Fix date comparisons and test sorting of tables and vectors with dates (#3745)
Implements https://www.pivotaltracker.com/story/show/183402892

# Important Notes
- Fixes inconsistent `compare_to` vs `==` behaviour in date/time types and adds test for that.
- Adds test for `Table.order_by` on dates and custom types.
- Fixes an issue with `Table.order_by` for custom types.
- Unifies how incomparable objects are reported by `Table.order_by` and `Vector.sort`.
- Adds benchmarks comparing `Table.order_by` and `Vector.sort` performance.
2022-09-29 08:48:00 +00:00
Radosław Waśko
cd10b5d34d
Add Date_Period.Week to start_of and end_of methods (#3733)
Implements https://www.pivotaltracker.com/story/show/183349732
2022-09-23 22:14:35 +00:00
Radosław Waśko
e9ebc663c1
Add business days functions to Date and Date_Time (#3726)
Implements https://www.pivotaltracker.com/story/show/183082087

# Important Notes
- Removed unnecessary invocations of `Error.throw` improving performance of `Vector.distinct`. The time of the `add_work_days and work_days_until should be consistent with each other` test suite came down from 15s to 3s after the changes.
2022-09-22 08:31:15 +00:00