Commit Graph

659 Commits

Author SHA1 Message Date
Radosław Waśko
f0c2a5fa7f
Opt-in return type checks (#8502)
- Closes #8240
2023-12-19 15:32:30 +00:00
Radosław Waśko
d4714af826
Add a few new Filter_Conditions (#8539)
- Closes #8045
2023-12-16 15:12:23 +00:00
Radosław Waśko
940b8f7d51
Improving tests and edge cases for URI and HTTP (#8497)
- Closes #8352
- ~~Proposed fix for #8493~~
- The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.
2023-12-15 17:58:45 +00:00
Pavel Marek
4b65e44ef3
EpbLanguage re-uses other TruffleContext support to run tests with assertions enabled (#7882) 2023-12-15 13:31:32 +01:00
Radosław Waśko
b5c995a7bf
Reworking Excel support to allow for reading of big files (#8403)
- Closes #8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient).
- If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`.
- Adds tests fetching Table formats from HTTP.
- Extends `simple-httpbin` with ability to serve files for our tests.
- Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers).
- Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.
2023-12-15 00:02:15 +00:00
Radosław Waśko
7a05e679c3
Improve details attached to No_Output_Columns reported from various operations (#8528)
- Closes #7635
2023-12-14 10:49:07 +00:00
Hubert Plociniczak
a978d70a9e
Minor improvement to Startup.enso error (#8536)
Unable to parse the current error message due to newlines.

# Important Notes
For example https://github.com/enso-org/enso/actions/runs/7174245013/job/19535275294#step:10:16133
2023-12-14 09:38:58 +00:00
GregoryTravis
1c815a3d45
Better Error Trapping in map (#8307)
* tests

* wip

* wip

* additional warnings

* wip

* wip

* cleanup

* nested wrapping

* multiple nestings

* wraps_error uses looks_for, test for should_fail_with

* wip

* stack trace line fix

* use catch_primitive internally

* fix warning mapping, dtf spec

* just one wrapper checker, vector spec

* missing ctor, back to non-primitive catch

* back to c_p

* put old map back

* wip

* unnest tests

* Array.map on_problems

* wip

* Revert "wip"

This reverts commit c30d171457.

* better test names

* warning logging

* wip

* wip

* move logic into ALH

* doc

* constant

* My_Error.Error

* nested

* doc

* map_primtiive in warning mapper

* composition

* ref spec

* Remove warnings prior to matching on the value

If an expression has warnings and is matched we:
1) extract the warnings
2) execute the branch of a pattern that matches the value
3) attach extracted warnings to the result

This caused warnings to reappear when doing the custom warnings
manipulation.
This is also consistent with how `CaseNode`'s `doWarning` specialization
is defined.

* fix 1

* do not auto unwrap in test error checkers

* nested error matcher

* in problems too

* dtf

* v

* statistics

* wip

* Table_Spec, map_with_index_primitive

* Column_Operations_Spec

* disable warning wrapping and Report_Warning

* unimpl test

* Warnings_Spec

* DCS

* ACG JP

* zip_primitive

* join_helpers

* Lookup_Helpers

* Table

* Data_Formatter

* Value_Type_Helpers

* revert check types changes

* table_helpers

* table tests

* remove st

* do not remove warnings from value

* vec docs, tests for zip, mwi, flat_map

* docs, fixes

* remove nested_error_matcher

* cleanup

* benchmark

* one error

* alter

* add bench to main

* review

* review

* review

* tail call

* changelog

* tail call was not a tail call

* ws

* bad import

* Added missing import

* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Array.enso

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

* review, ref example

* lazy benchmark data

* extra paren

* check outside of catch

* review

* vector too

* actually lazy

* disambiguate Map_Error

* finish rename

* move to extensions

* combine Additional_Warnings error

* rename to map_no_wrap

* do not catch and rethrow

* review

* wip

* remove _primitives entirely

* remove unused should_fail_with function options

* remove expected_warning as function in Problems

---------

Co-authored-by: Hubert Plociniczak <hubert.plociniczak@gmail.com>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
2023-12-13 09:38:09 -05:00
somebody1234
3419c77bc6
Fix typo in Match.enso (#8515)
Fix issue causing visualizations to break:
![image](https://github.com/enso-org/enso/assets/4046547/d63239e2-3fde-424a-ba6c-3e163c6ce983)

# Important Notes
None
2023-12-12 17:59:44 +00:00
Jaroslav Tulach
80f94a21e1
Compare long and double and BigInteger properly (#8510) 2023-12-12 06:41:43 +01:00
Jaroslav Tulach
e4b2b56a40
Turning Sieve benchmarks into Enso benchmarks (#8475) 2023-12-08 10:27:52 +01:00
Jaroslav Tulach
7f0cb88fa1
Consistent simple and qualified type name (#8448)
Fixes #8255 by unifying `get_qualified_type_name` and `get_simple_type_name` implementations.
2023-12-06 04:30:24 +00:00
Pavel Marek
a67297aebf
Add graalpy packages to the component directory (#8351)
Adds these JAR modules to the `component` directory inside Engine distribution:
- `graal-language-23.1.0`
- `org.bouncycastle.*` - these need to be added for graalpy language

# Important Notes
- Remove `org.bouncycastle.*` packages from `runtime.jar` fat jar.
- Make sure that the `./run` script preinstalls GraalPy standalone distribution before starting engine tests
- Note that using `python -m venv` is only possible from standalone distribution, we cannot distribute `graalpython-launcher`.
- Make sure that installation of `numpy` and its polyglot execution example works.
- Convert `Text` to `TruffleString` before passing to GraalPy - 8ee9a2816f
2023-12-04 11:50:59 +00:00
Jaroslav Tulach
81f06456bf
400x faster with linear hashing of the hash map entries (#8425)
Fixes #5233 by removing `EconomicMap` & co. and using plain old good _linear hashing_. Fixes #8090 by introducing `StorageEntry.removed()` rather than copying the builder on each removal.
2023-12-01 06:43:13 +00:00
James Dunkerley
dd28517d9e
Separating Table tests from the Tests project. (#8397)
Moving a couple of tests to Table_Tests so Tests no longer depends on Tests.
2023-11-27 16:46:35 +00:00
Jaroslav Tulach
c6eb61a055
Filter for test/Benchmarks (#8391)
With herein proposed change one can pass an optional filter to `enso --run test/Benchmarks` to execute only groups and specs that contain given string in its name.
2023-11-27 15:27:12 +00:00
Jaroslav Tulach
893965ed5c
3% speedup with LazyMap and MetadataStorage (#8359) 2023-11-27 10:28:12 +01:00
Radosław Waśko
c6b6384fe6
Improve performance of anti-join (#8338)
- Closes #8217
2023-11-24 02:44:57 +00:00
Jaroslav Tulach
4464a15035
Benchmark to measure startup time (#8378)
While trying to speed `MetadataStorage` up - #8324 - I felt the need to have an independent (on my computer) measurement of startup time. Here is a benchmark that measures how long execution of two simple hello world programs take.

# Important Notes
There are two benchmarks:
- `empty_startup` measures the time needed to boot without using any `Standard` library - basically _an overhead of the JVM and engine_
- `hello_world_startup` measures the use of `IO.println` - it shall take longer than `empty_startup` and show the overhead we have while processing the standard library
2023-11-23 21:31:01 +00:00
James Dunkerley
347b5a7cf5
Linting and Groups update (#8357)
- Fix issues from the linter.
- Rename the constructors for `Blank_Selector`.
- Update various GROUP tags.
2023-11-21 18:12:27 +00:00
James Dunkerley
ecaca12df1
Integrating Enso Cloud with the libraries (part 1...) (#8006)
- Add a `File_For_Read` type. Used for `File_Format` to read files.
- Added `Enso_User` representing the current user in `Enso_Cloud`.
- *Will be later able to list known users.*
- Added `Enso_Secret` representing a value defined in `Enso_Cloud`.
- Value not used within Enso only accessed within polyglot Java.
- Integrated into `Username_And_Password` and can be used within JDBC connections.
- Integrated into HTTP Headers so a secret can be used as a value.
- New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value.
- *Will be integrated with AWS credentials.*
- Added `Enso_File` representing a file or a folder in the cloud.
- Support the same API as `File` (like the `S3_File`).
- *Will support `enso://` URI style access.*
2023-11-20 23:21:14 +00:00
Jaroslav Tulach
1138dfe147
Specify expression to get more advanced results on_return callback (#8331) 2023-11-20 18:47:11 +01:00
Pavel Marek
5a7ad6bfe4
Upgrade enso to GraalVM for jdk 21 (#7991)
Upgrade to GraalVM JDK 21.
```
> java -version
openjdk version "21" 2023-09-19
OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15)
OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing)
```

With SDKMan, download with `sdk install java 21-graalce`.

# Important Notes
- After this PR, one can theoretically run enso with any JRE with version at least 21.
- Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution.
- `project-manager` remains backward compatible - it can open older engines with runtimes. New engines now do no longer require a separate runtime to be downloaded.
- sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368
- Which means that we can have `module-info.java` files only for Java-only projects.
- Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar.
- `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent.
- Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory.
- The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module.
- The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725)
- [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.
2023-11-17 18:02:36 +00:00
GregoryTravis
ea3d778456
Allow the creation of a constant column on an in-memory table with no rows. (#8218) 2023-11-09 14:40:51 +00:00
GregoryTravis
6be94a854b
Implement truncate Date_Time for database backend (#8235)
Also adds some checks for column names generated for floor, ceil, truncate, round.
2023-11-08 23:23:59 +00:00
Radosław Waśko
1b8b30a68d
Improve performance of Join_Condition.Between by sorting on one dimension (#8212)
- Closes #5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes #8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
2023-11-08 12:59:55 +00:00
Radosław Waśko
2ce1567384
Limit max_rows that are downloaded in Table.read by default, and warn if more rows are available (#8159)
- Sets the default limit for `Table.read` in Database to be max 1000 rows.
- The limit for in-memory compatible API still defaults to `Nothing`.
- Adds a warning if there are more rows than limit.
- Enables a few unrelated asserts.
2023-11-06 16:41:47 +00:00
Radosław Waśko
237aae33c7
Simplify internal logic of Table.order_by, avoid unnecessary warning (#8221)
- Fixes #8213
2023-11-06 11:00:01 +00:00
GregoryTravis
3c371adbef
Implement Table.format similar to Table.parse allowing to format columns in bulk (#8150)
* doc

* one test

* date tests

* empty and nothing

* ints floats

* bools

* all columns

* regex and index

* locales

* bad formats

* all with one format

* docs

* examples, not impl db

* docs, more errors

* cleanup

* changelog

* check list

* reorder

* clue

* review

* review

* review

* review

* review

* review

* specify time zone
2023-11-02 09:36:36 -04:00
Cassandra-Clark
b5d6628c57
Change filter_blank_rows when_any parameter to have a more user-friendly type (#7935)
Added Blank_Selector constructor and applied to remove_blank_columns, select_blank_columns, filter_blank_rows for #7931 . Changed when_any to when for readability.
2023-11-01 16:51:15 +00:00
GregoryTravis
d467683ed1
Constant columns (in expressions and Column_Operations) should have clearer names (#8188)
Previously, constant columns were given generated names with UUIDs in them, which are long and provide no information. Instead, we now use the constant value itself to form the name.

Since these new generated names are less unique, we must explicitly make them unique, in cases where the caller did not explicilty set a name.
2023-11-01 14:41:03 +00:00
GregoryTravis
1480f50207
Overhaul the random number and item generation code (#8127)
Rewrite most of Random.enso.
2023-10-31 15:25:37 +00:00
Radosław Waśko
79011bd550
Implement Table.lookup_and_replace in Database (#8146)
- Closes #7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
2023-10-31 15:19:55 +00:00
Pavel Marek
b084e097c9
Add special handling for Dataflow_Error passed to Runtime.assert (#8168)
Add special handling for `Dataflow_Error` passed as action to `Runtime.assert`. It caused an infinite recursion.
2023-10-27 17:22:15 +00:00
Radosław Waśko
c1259cb4d2
Compare performance of Panic / Java Exception / Dataflow error (#8130)
After a discussion, I was really curious that our panics are supposed to be almost free - and while trusting that statement, it was really hard to believe - so I wanted to see for myself - knowing that an experiment is the most robust source of this kind of information - testing that in practice.

So I wrote a benchmark comparing various ways of reporting errors, also testing them both at 'shallow' and 'deep' stack traces (adding 200 additional frames) - to see how stack depth affects them, if at all.

The panics are indeed blazing fast! Kudos to the engine team. However, it seems that our dataflow errors are relatively slow (and we tend to use them _more_ than panics and want to be using them more and more). This uncovers a possible optimization opportunity. Can we make them as fast as panics??

Analysis of the benchmark results in comment below.
2023-10-24 12:03:44 +00:00
Jaroslav Tulach
e283c78977
Properly convert defaulted arguments before on_call back (#8143) 2023-10-24 13:39:54 +02:00
Radosław Waśko
0c278391fe
Test and improve handling of Date_Time with_timezone=False in Postgres (#8114)
- Fixes #8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
2023-10-21 21:35:13 +00:00
Radosław Waśko
8172896065
Support Previous_Value in fill_nothing and fill_missing (#8105)
- Adds `Previous_Value` to `fill_nothing` and `fill_empty`, as requested by #7192.
2023-10-20 13:18:53 +00:00
James Dunkerley
ad95dc9815
Fixes for Table viz (#8102)
- Fixes issue with `to_display_text` on `Date_Time`. Now a reversible format.
- Adjusted the auto width code on table so it works.
- Manually sized columns should remain fixed size.
- Enable Excel range style selection in table.
- Removed page support as not implemented yet.

# Important Notes
Will need to apply to Vue based viz at some point as well.
2023-10-19 08:22:23 +00:00
GregoryTravis
7383db0e04
Restructuring XML into Table form (#8083)
# Important Notes
Adds `.to Table` support, as well as XML support for `expand_column`.
2023-10-19 07:02:48 +00:00
Radosław Waśko
28fc183f92
Review places where we can use Column_Ref (#8101)
Closes #8046
2023-10-18 19:03:50 +00:00
Radosław Waśko
93a31fcc8b
Add benchmarks related to add_row_number performance investigation (#8091)
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
2023-10-18 17:21:59 +00:00
Radosław Waśko
e9fa12763e
Improve performance of add_row_number (#8076)
Fixes #8055
2023-10-17 00:42:35 +00:00
Radosław Waśko
08b717eb54
Refactor Table problem handling to a more robust and hopefully cleaner approach (#7879)
Closes #7514
2023-10-16 15:09:08 +00:00
GregoryTravis
f18d1323e1
Add Table.expand_to_rows to allow flattening vector and array values in table (#8042)
# Important Notes
Also includes a fix for a reallocation bug in `InferredBuilder`.
2023-10-13 20:54:06 +00:00
James Dunkerley
b7d7910a88
Use US locale for Date/Time parsing by default. (#8053)
Fixes issue with parsing long format month names.
2023-10-13 17:47:14 +00:00
James Dunkerley
fac9e7a420
Expand capabilities of Table.set and better dropdown support, (#8005)
- Adds the ability to use numbers, date/time and Boolean values as constants in `set`.
- `Table.set` can take a `Column_Operation`, allowing for deriving of a new column based on other columns.
- Added `Column_Ref` type to refer to a column in `filter`.
2023-10-13 16:03:28 +00:00
Jaroslav Tulach
31624b5314
Helper method to turn iterator to a Vector (#8035) 2023-10-12 17:50:51 +02:00
James Dunkerley
0dcfc3e9bf
Minor improvements from last couple of Book Clubs (#8034)
- Added some ALIASes.
- Added `sheet` to `Excel_Workbook` to give familiar API to read sheet.
- Added conversion from range to vector allowing easy use with Zip.
- Add `Map.from_keys_and_values.
2023-10-12 14:29:59 +00:00
Radosław Waśko
cd84ac16ce
Restructure Table.from_objects to use conversions (#8020)
Closes #7957
2023-10-11 22:25:18 +00:00