Commit Graph

651 Commits

Author SHA1 Message Date
Radosław Waśko
1b8b30a68d
Improve performance of Join_Condition.Between by sorting on one dimension (#8212)
- Closes #5303
- Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`)
- Adds benchmarks for join
- Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks.
- We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the amount of `Between` conditions used in the query and `k` is the result set size).
- Follow up ticket for consideration later:
#8216
- Closes #8215
- After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task.
- Closes #8204
- Unrelated, but I ran into this here: adds type checks to other arguments of `set`.
- Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments), lead to a hard to understand `Method `if_then_else` of type Column could not be found.`, instead now it would file with type error 'expected Text got Column`.
2023-11-08 12:59:55 +00:00
Radosław Waśko
2ce1567384
Limit max_rows that are downloaded in Table.read by default, and warn if more rows are available (#8159)
- Sets the default limit for `Table.read` in Database to be max 1000 rows.
- The limit for in-memory compatible API still defaults to `Nothing`.
- Adds a warning if there are more rows than limit.
- Enables a few unrelated asserts.
2023-11-06 16:41:47 +00:00
James Dunkerley
a850ecb787
Change widget for Order By. (#8226)
Simplifies the Order By drop and always adds as a `Sort_Column`. If user wants to do descending is a two step process but feels more natural.
2023-11-06 16:27:36 +00:00
Radosław Waśko
237aae33c7
Simplify internal logic of Table.order_by, avoid unnecessary warning (#8221)
- Fixes #8213
2023-11-06 11:00:01 +00:00
GregoryTravis
3c371adbef
Implement Table.format similar to Table.parse allowing to format columns in bulk (#8150)
* doc

* one test

* date tests

* empty and nothing

* ints floats

* bools

* all columns

* regex and index

* locales

* bad formats

* all with one format

* docs

* examples, not impl db

* docs, more errors

* cleanup

* changelog

* check list

* reorder

* clue

* review

* review

* review

* review

* review

* review

* specify time zone
2023-11-02 09:36:36 -04:00
Hubert Plociniczak
2db4f4c5d9
Upgrade directory-watcher library (#8201)
The change upgrades `directory-watcher` library, hoping that it will fix the problem reported in #7695 (there has been a number of bug fixes in MacOS listener since then).

Once upgraded, tests in `WatcherAdapterSpec` because the logic that attempted to ensure the proper initialization order in the test using semaphore was wrong. Now starting the watcher using `watchAsync` which only returns the future when the watcher successfully registers for paths. Ideally authors of the library would make the registration bit public
(3218d68a84/core/src/main/java/io/methvin/watcher/DirectoryWatcher.java (L229C7-L229C20)) but it is the best we can do so far.

Had to adapt to the new API in PathWatcher as well, ensuring the right order of initialization.

Should fix #7695.
2023-11-02 11:24:26 +00:00
Cassandra-Clark
b5d6628c57
Change filter_blank_rows when_any parameter to have a more user-friendly type (#7935)
Added Blank_Selector constructor and applied to remove_blank_columns, select_blank_columns, filter_blank_rows for #7931 . Changed when_any to when for readability.
2023-11-01 16:51:15 +00:00
GregoryTravis
d467683ed1
Constant columns (in expressions and Column_Operations) should have clearer names (#8188)
Previously, constant columns were given generated names with UUIDs in them, which are long and provide no information. Instead, we now use the constant value itself to form the name.

Since these new generated names are less unique, we must explicitly make them unique, in cases where the caller did not explicilty set a name.
2023-11-01 14:41:03 +00:00
GregoryTravis
1480f50207
Overhaul the random number and item generation code (#8127)
Rewrite most of Random.enso.
2023-10-31 15:25:37 +00:00
Radosław Waśko
79011bd550
Implement Table.lookup_and_replace in Database (#8146)
- Closes #7981
- Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized.
- Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.
2023-10-31 15:19:55 +00:00
Radosław Waśko
0c278391fe
Test and improve handling of Date_Time with_timezone=False in Postgres (#8114)
- Fixes #8049
- Adds tests for handling of Date_Time upload/download in Postgres.
- Adds tests for edge cases of handling of Decimal and Binary types in Postgres.
2023-10-21 21:35:13 +00:00
Radosław Waśko
8172896065
Support Previous_Value in fill_nothing and fill_missing (#8105)
- Adds `Previous_Value` to `fill_nothing` and `fill_empty`, as requested by #7192.
2023-10-20 13:18:53 +00:00
Kaz Wesley
2edd2bd7ff
Ensure all spans have document offsets (#8039)
- Validate spans during existing lexer and parser unit tests, and in `enso_parser_debug`.
- Fix lost span info causing failures of updated tests.

# Important Notes
- [x] Output of `parse_all_enso_files.sh` is unchanged since before #7881 (modulo libs changes since then).
- When the parser encounters an input with the first line indented, it now creates a sub-block for lines at than indent level, and emits a syntax error (every indented block must have a parent).
- When the parser encounters a number with a base but no digits (e.g. `0x`), it now emits a `Number` with `None` in the digits field rather than a 0-length digits token.
2023-10-19 12:36:42 +00:00
James Dunkerley
ad95dc9815
Fixes for Table viz (#8102)
- Fixes issue with `to_display_text` on `Date_Time`. Now a reversible format.
- Adjusted the auto width code on table so it works.
- Manually sized columns should remain fixed size.
- Enable Excel range style selection in table.
- Removed page support as not implemented yet.

# Important Notes
Will need to apply to Vue based viz at some point as well.
2023-10-19 08:22:23 +00:00
GregoryTravis
7383db0e04
Restructuring XML into Table form (#8083)
# Important Notes
Adds `.to Table` support, as well as XML support for `expand_column`.
2023-10-19 07:02:48 +00:00
Radosław Waśko
28fc183f92
Review places where we can use Column_Ref (#8101)
Closes #8046
2023-10-18 19:03:50 +00:00
Radosław Waśko
93a31fcc8b
Add benchmarks related to add_row_number performance investigation (#8091)
- Follow-up of #8055
- Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090
2023-10-18 17:21:59 +00:00
Hubert Plociniczak
0b58a361ed
Eliminate VCS TimeoutExceptions on startup (#8080)
Having a modest-size files in a project would lead to a timeout when the project was first initialized. This became apparent when testing delivered `.enso-project` files with some data files. After some digging there was a bug in JGit
(https://bugs.eclipse.org/bugs/show_bug.cgi?id=494323) which meant that adding such files was really slow. The implemented fix is not on by default but even with `--renormalization` turned off I did not see improvement.
In the end it didn't make sense to add `data` directory to our version control, or any other files than those in `src` or some meta files in `.enso`. Not including such files eliminates first-use initialization problems.

# Important Notes
To test, pick an existing Enso project with some data files in it (> 100MB) and remove `.enso/.vcs` directory. Previously it would timeout on first try (and work in successive runs). Now it works even on the first try.

The crash:
```
[org.enso.languageserver.requesthandler.vcs.InitVcsHandler] Initialize project request [Number(2)] for [f9a7cd0d-529c-4e1d-a4fa-9dfe2ed79008] failed with: null.
java.util.concurrent.TimeoutException: null
at org.enso.languageserver.effect.ZioExec$.<clinit>(Exec.scala:134)
at org.enso.languageserver.effect.ZioExec.$anonfun$exec$3(Exec.scala:60)
at org.enso.languageserver.effect.ZioExec.$anonfun$exec$3$adapted(Exec.scala:60)
at zio.ZIO.$anonfun$foldCause$4(ZIO.scala:683)
at zio.internal.FiberRuntime.runLoop(FiberRuntime.scala:904)
at zio.internal.FiberRuntime.evaluateEffect(FiberRuntime.scala:381)
at zio.internal.FiberRuntime.evaluateMessageWhileSuspended(FiberRuntime.scala:504)
at zio.internal.FiberRuntime.drainQueueOnCurrentThread(FiberRuntime.scala:220)
at zio.internal.FiberRuntime.run(FiberRuntime.scala:139)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1395)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
```
2023-10-18 09:34:08 +00:00
Radosław Waśko
e9fa12763e
Improve performance of add_row_number (#8076)
Fixes #8055
2023-10-17 00:42:35 +00:00
Radosław Waśko
08b717eb54
Refactor Table problem handling to a more robust and hopefully cleaner approach (#7879)
Closes #7514
2023-10-16 15:09:08 +00:00
Hubert Plociniczak
352ad06d2f
Reduce extra output in compilation and tests (#7809)
* Reduce extra output in compilation and tests

I couldn't stand the amount of extra output that we got when compiling
a clean project and when executing regular tests. We should strive to
keep output clean and not print anything additional to stdout/stderr.

* Getting rid of explicit setup by service loading

In order for SL4J to use service loading correctly had to upgrade to
latest slf4j. Unfortunately `TestLogProvider` which essentially
delegates to `logback` provider will lead to spurious ambiguous warnings
on multiple providers. In order to dictate which one to use and
therefore eliminate the warnings we can use the `slf4j.provider` env
var, which is only available in slf4j 2.x.

Now, there is no need to explicitly call `LoggerSetup.get().setup()` as
that is being called during service setup.

* legal review

* linter

* Ensure ConsoleHandler uses the default level

ConsoleHandler's constructor uses `Level.INFO` which is unnecessary for
tests.

* report warnings
2023-10-16 10:57:44 +02:00
GregoryTravis
f18d1323e1
Add Table.expand_to_rows to allow flattening vector and array values in table (#8042)
# Important Notes
Also includes a fix for a reallocation bug in `InferredBuilder`.
2023-10-13 20:54:06 +00:00
James Dunkerley
b7d7910a88
Use US locale for Date/Time parsing by default. (#8053)
Fixes issue with parsing long format month names.
2023-10-13 17:47:14 +00:00
James Dunkerley
fac9e7a420
Expand capabilities of Table.set and better dropdown support, (#8005)
- Adds the ability to use numbers, date/time and Boolean values as constants in `set`.
- `Table.set` can take a `Column_Operation`, allowing for deriving of a new column based on other columns.
- Added `Column_Ref` type to refer to a column in `filter`.
2023-10-13 16:03:28 +00:00
Jaroslav Tulach
31624b5314
Helper method to turn iterator to a Vector (#8035) 2023-10-12 17:50:51 +02:00
James Dunkerley
0dcfc3e9bf
Minor improvements from last couple of Book Clubs (#8034)
- Added some ALIASes.
- Added `sheet` to `Excel_Workbook` to give familiar API to read sheet.
- Added conversion from range to vector allowing easy use with Zip.
- Add `Map.from_keys_and_values.
2023-10-12 14:29:59 +00:00
Radosław Waśko
cd84ac16ce
Restructure Table.from_objects to use conversions (#8020)
Closes #7957
2023-10-11 22:25:18 +00:00
Jaroslav Tulach
7ae0971d11
Runtime checks of signatures with polyglot java classes (#8016) 2023-10-11 15:05:36 +02:00
Radosław Waśko
6f78570115
Fix a DROP table bug, add SQL debug logging (#8007)
- Fixes a bug where creating a temporary table could accidentally issue a `DROP` statement of the table name that the user provided, risking destruction of user data.
- Fortunately, the bad scenario was almost impossible, because the `DROP` statement was only issued _if_ we previously checked that the mentioned table _does not exist_ - dropping a nonexistent table does not do any harm.
- It could have been dangerous in a very unlikely scenario that a table was created just between the _existence check_ and the _drop_.
- After the fix the existence check and any modifications are done within a transaction to avoid interference from concurrent modifications, and the DROP is correctly applied to a temporary Enso table instead of the original one.
- Replaced a temporary log with proper simple logging of SQL statements into a file, if an Environment variable is set.
- Used that feature to test that no unexpected statements occur.
2023-10-10 13:16:06 +00:00
Radosław Waśko
6e0bd86753
Implement Table.lookup_and_replace for in-memory (#7979)
- Closes #7749 implementing the in-memory logic.
- Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981
2023-10-10 10:42:06 +00:00
Jaroslav Tulach
a234e82ee9
Instrumenter to observe behavior of nodes with UUID (#7833)
Exposing instrumentation capabilities to Enso. Fixes #7683.
2023-10-10 02:36:59 +00:00
Pavel Marek
4db61210c0
Exporting non-existing symbols fails with compiler error (#7960)
Adds a new compiler analysis pass that ensures that all the symbols exported via `export ...` or `from ... export ...` statements exist. If not, generates an IR Error.

# Important Notes
We already have such a compiler pass for the version with imports in [ImportSymbolAnalysis.scala](https://github.com/enso-org/enso/blob/develop/engine/runtime/src/main/scala/org/enso/compiler/pass/analyse/ImportSymbolAnalysis.scala)
2023-10-09 09:48:43 +00:00
GregoryTravis
a90af9f844
Cache filtered children in XML (#7998)
We currently scan the entire child list for each call to children, num_children, and get_child_element.

Instead, we should cache the filtered child list on first access.
2023-10-06 19:44:01 +00:00
GregoryTravis
9ba7be20af
Basic XML support (#7947)
This PR includes
* Reading XML from a file, stream, or string
* Reading XML via Data.fetch
* Accessing the root element, element children, and attributes
* Accessing tag text contents
* Get tags by name
* Inner / Outer XML string
2023-10-06 17:52:19 +00:00
Radosław Waśko
0cd446432f
Fix inconsistency when building a Mixed column, fixes to Union (#7919)
- Fixes #7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed.
   - Added more benchmarks for comparing performance of constructing columns.
- Fixes missing implementations that caused `Table.union` crashing on some type pairs.
- Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column.
- Adds test for all of the above cases.
- Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.
2023-10-03 20:33:34 +02:00
Radosław Waśko
08cd449a99
Fix NumberParser to avoid thousandSeparator==decimalPoint and prefer US decimal format (#7946)
Closes #7930
2023-10-03 20:07:54 +02:00
Radosław Waśko
3222e5af62
Avoid exponential growth of column names (#7934)
- Fixes #7933
- Avoids duplicating `]` as `]]` in generated column names - now column names grow linearly.
2023-10-02 16:05:28 +00:00
Jaroslav Tulach
6a59fa5e93
Meta.Type.find a type by FQN (#7885) 2023-10-02 17:02:00 +02:00
James Dunkerley
fb50eb7595
Using conversions in a few places (#7859)
- Shuffle a few `from`s into correct places:
- `Day_Of_Week.from` removing `Day_Of_Week_From` module.
- Adding short cut for `http` and `https` in `Data.read` so it calls onto `Data.fetch` giving a single entry point.
- Moved `URI` extensions from `Standard.Base.Data` module into `Standard.Base.Network.Extensions`.
- Added `post` extension for `URI`.
- Added `contains_key` to `JS_Object`.
- Restored `into` in `JS_Object`:
- Follows old logic populating a constructor.
- Will use conversion from `JS_Object` if present.
- Added automatic deserialization of `Date`, `Time_Of_Day` and `Date_Time` from JSON.
- Uses conversion from `JS_Object`.
- Added conversion from `Text` to a `HTTP_Method` and type checking where `HTTP_Method` used in public APIs.
- Added support for `Date`, `Time_Of_Day` and `Date_Time` in `Table.from_objects`.
- Added `expand_column` to `Table` to expand `JS_Object` to values.
- Add type checking for `Table` in `right` arguments (allowing `Column`s to be used).
- Use type checking in `Table.set` to allow for conversion to a `Column`.
- Remove some unused imports.
- Fix for bug in S3 edge case.
2023-10-02 14:54:22 +00:00
Cassandra-Clark
d7258abbf5
Add Text.substring to allow for an easy short hand of Text.take (start.up_to end) (#7913)
* Add Text.substring function and get_position helper function

For #7876 adds a Text.substring function which supports negative indexes and returns a part of a string from 0-based index 'start' and continuing for 'length'

* added substring and simplified get function

For #7876 adds a Text.substring function which supports negative indexes and returns a part of a string from 0-based index 'start' and continuing for 'length'.

Also simplified get function as it looped unnecessarily.

* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso

punctuation corrections

Co-authored-by: GregoryTravis <greg.m.travis@gmail.com>

* Update Text_Spec.enso

Added test for start index larger than string

* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso

updated Arguments: section to use consistent style

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

* Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso

updated Index_Out_Of_Bounds error to reference cached length

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

* Removed Slice label and added changelog entry

* Re-added slice tag to substring

Per conversation with James, added slice back to substring

* Update CHANGELOG.md add link

---------

Co-authored-by: GregoryTravis <greg.m.travis@gmail.com>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
2023-09-29 10:57:57 -06:00
Pavel Marek
9f15b90caa
Implement Enso-specific assert (#7883)
Implement Enso-specific assert - `Runtime.assert` that works like asserts in any other runtime.

# Important Notes
- Enso-specific assertions are enabled when JVM assertions are enabled, or when `ENSO_ENABLE_ASSERTIONS` env var is not empty (See 72cd8361cb/engine/runtime/src/main/java/org/enso/interpreter/runtime/EnsoContext.java (L139))
2023-09-29 14:46:58 +00:00
Jaroslav Tulach
ffa036411d
Make sure Integer can be treated as Float (#7909) 2023-09-28 11:56:54 +02:00
Radosław Waśko
8d926166ea
Follow up improvements to Date_Time_Formatter (#7875)
- Closes #7872
- Also closes #7866
2023-09-28 09:38:00 +00:00
Radosław Waśko
c690559ec4
Implement auto_value_type operation (#7908)
Closes #6113
2023-09-27 15:45:34 +00:00
GregoryTravis
b03712390c
Improve HTTP tests (#7847)
* simple-httpbin encodes response using the Content-encoding header value
* Return sent body verbatim
2023-09-27 14:02:32 +00:00
Hubert Plociniczak
18b2491a41
Always log to console and file (#7825)
* Always log verbose to a file

The change adds an option by default to always log to a file with
verbose log level.
The implementation is a bit tricky because in the most common use-case
we have to always log in verbose mode to a socket and only later apply
the desired log levels. Previously socket appender would respect the
desired log level already before forwarding the log.

If by default we log to a file, verbose mode is simply ignored and does
not override user settings.

To test run `project-manager` with `ENSO_LOGSERVER_APPENDER=console` env
variable. That will output to the console with the default `INFO` level
and `TRACE` log level for the file.

* add docs

* changelog

* Address some PR requests

1. Log INFO level to CONSOLE by default
2. Change runner's default log level from ERROR to WARN

Took a while to figure out why the correct log level wasn't being passed
to the language server, therefore ignoring the (desired) verbose logs
from the log file.

* linter

* 3rd party uses log4j for logging

Getting rid of the warning by adding a log4j over slf4j bridge:
```
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
```

* legal review update

* Make sure tests use test resources

Having `application.conf` in `src/main/resources` and `test/resources`
does not guarantee that in Tests we will pick up the latter. Instead, by
default it seems to do some kind of merge of different configurations,
which is far from desired.

* Ensure native launcher test log to console only

Logging to console and (temporary) files is problematic for Windows.
The CI also revealed a problem with the native configuration because it
was not possible to modify the launcher via env variables as everything
was initialized during build time.

* Adapt to method changes

* Potentially deal with Windows failures
2023-09-26 11:32:04 +02:00
Radosław Waśko
12c4f2981d
More robust Date/Time format patterns parsing (#7826)
- Closes #7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer.
- The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone.
- The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`.
- Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`.
- Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`).
2023-09-22 10:12:18 +00:00
Jaroslav Tulach
5150c14afd
Avoid duplicated conversion & search target type scope (#7849) 2023-09-20 17:23:09 +02:00
James Dunkerley
74d1d0861c
S3 Read Access, Input Stream based reading (#7776)
- Added a `FileSystemSPI` allowing protocol resolution to a target type.
- Separated `Input_Stream` and `Output_Stream` from `File` to allow use in other spaces.
- `File_Format` types `read_web` changed to be `read_stream` working with `InputStream`.
- Added directory listing to `Auto_Detect` allowing for `Data.read` to list a folder.
- Adjusted HTTP to return an `InputStream` not a `byte[]`:
- `Response_Body` adjusted to wrap an `InputStream`.
- Added ability to materialize to either and in-memory vector (<4KB) or a temporary file.
- `Data.fetch` will materialize if not a recognized mime-type.
- Added `HTTP_Error` to handle IO exceptions from the stream.
- `Excel_Format` now supports mime-type and reading a stream.
- `Excel_Workbook` can now get a `Excel_Section` using `read_section`.
- Added S3 APIs:
- `parse_uri`: splits an S3 URI into bucket and key.
- `list_objects`: list the items in a S3 bucket with specified prefix.
- `read_bucket`: list prefixes and keys with a delimiter in a S3 bucket with specified prefix.
- `head`: either head_bucket (tests existance) or head_object API (reads object meta data).
- `get_object`: gets an object from S3 returning as a `Response_Body`.
- Added `S3_File` type acting like a `File`:
- No support for writing in this PR.
- **ToDo:** recursive listing, glob filtering, exists, size.
- Fixed a few invalid type signature line.
- Moved `create` methods for `Postgres_Connection` and `SQLite_Connection` into type instead of module.
- Renamed `Column_Fetcher.Builder` to `Column_Fetcher_Builder`.
- Fixed bug with `select_into` in Dry Run mode creating permanent tables.

**ToDo:** Unit tests.
2023-09-20 15:09:11 +00:00
GregoryTravis
b0c1f3b00e
New Data.post for sending a payload to a Web API (#7700) 2023-09-19 11:26:29 +00:00