Commit Graph

421 Commits

Author SHA1 Message Date
AdRiley
c7476c10f3
Add data cleanse component (#9879)
* Signature

* More API work

* Red

* Still red

* Green

* Red

* Green

* Red

* Green

* Refactor

* Refactor

* Refactor

* Red

* Green

* Red

* Green

* Red

* Green

* More tests

* non-ascii

* Ordering tests

* Remove tabs

* Numbers and Letters

* Changelog.md

* Add documentation

* Tests for non-text columns

* Add punctuation

* Add symbols

* Fix

* Refactor

* Refactor

* Move to base

* Fix

* Start of in-db tests. Not working

* DB versions

* Update widget

* Fix widgets

* Move tests to base

* Code review changes
2024-05-20 09:33:18 +01:00
GregoryTravis
4d49b00375
Combine builders for Vector.build and Vector.new_builder (#9922)
We have decided to keep the old new_builder, and to combine the new and old builders into one builder.
2024-05-17 16:18:47 +00:00
Radosław Waśko
38ad7b0afa
Creating datalinks from code (#9957)
- Closes #9673
- Adds the ability to save an existing Postgres connection as a datalink into the Enso Cloud, automatically promoting plain-text passwords into a Secret.
- Fixes dataflow error propagation in `JS_Object.from_pairs`.
2024-05-16 13:29:41 +00:00
Radosław Waśko
a2481036e6
Prepare for the Path resolver change to GET (#9958)
- Prepares for the Cloud API change, together with a fallback for the old API to avoid problems during migration.
- This PR should be merged before the https://github.com/enso-org/cloud-v2/pull/1236 PR is _deployed_.
2024-05-15 13:00:32 +00:00
James Dunkerley
b2aeb9fc84
Moving away from Integer | Nothing to Rows_To_Read for limiting number of rows. (#9925)
- Added a new `Rows_To_Read` type with conversions from `Nothing` and integers.
- Updated `read` on `Table`, `Column`, `DB_Table` and `DB_Column`.
- Updated `Delimited_Format.Delimited` to use `Rows_To_Read` for `row_limit`.
- Updated `Excel_Format.Sheet` and `Excel_Format.Range` to use `Rows_To_Read` for `row_limit`.
- Updated `Excel_Workbook.read` to use `Rows_To_Read`.
- Updated `Connection.read` (in all connection types) to use `Rows_To_Read`.

![image](https://github.com/enso-org/enso/assets/4699705/553c027f-f4c3-4855-9f51-2c4bcaec48a0)

![image](https://github.com/enso-org/enso/assets/4699705/a06c3912-77e0-4c10-abb8-73aed667458d)
2024-05-14 16:31:26 +00:00
James Dunkerley
2d6735edc7
Allow a Table to be used to rename_columns. (#9940)
Following an example from Steve, adjusted `rename_column` to allow a `Table` to define the mapping.

- A single Text column table gives a list of new names.
- A two Text column table gives a map from old name to new name.
- If a `DB_Table` is passed errors and tells the user to materialize the table.

![image](https://github.com/enso-org/enso/assets/4699705/3ed98330-1fce-465e-bf96-4a86e0872dd3)
2024-05-14 15:57:10 +00:00
Radosław Waśko
5f0a16c87c
Audit Logs for Postgres connections opened through a data link (#9873)
- Closes #9599
- Implemented API for sending audit logs to the cloud on a background thread.
- If the Postgres connection is opened through a datalink, its internal JDBC connection is replaced by a wrapper that reports executed queries to the audit log.
- Also introduces `EnsoMeta` - a helper Java class that can be used in our helper libraries to access Enso types.
- I have replaced the common pattern scattered throughout the codebase with calls to this 'library' to avoid repetitive code.
- Refactored `Table.display` to share code between in-memory and DB - it was needed as the function stopped working for `DB_Table` after adding making the `Table` constructor `private`.
- Clearer error when reading a SQLite database from a remote file (tells the user to download it first).
- Follow up - correlate asset id of the data link:
#9869
- Follow up - include project name (once bug is fixed):
#9875
- Some problems/improvements of the audit log:
- The audit log system is not yet ready for high throughput of logs
#9870
- The logs may be lost if `System.exit` is used
#9871
2024-05-11 08:54:33 +00:00
James Dunkerley
d97754da17
Error messages for rename_columns and Vector.duplicates (#9917)
- Improve error message for `rename_columns`.
- Add `length` to `Set` and `Map`.
- Add `duplicates` to `Vector` (and `Array`).

![image](https://github.com/enso-org/enso/assets/4699705/623df253-52e8-4bdc-a69c-ac8dc3ca594e)
2024-05-10 17:43:50 +00:00
GregoryTravis
720d32cbe3
Convert most remaining uses of Vector.new_builder to Vector.build (#9891) 2024-05-09 15:52:53 +00:00
AdRiley
55be1057fa
Add missing tests (#9897)
* Add missing tests

* Fix

* Fix tests
2024-05-09 15:47:31 +01:00
AdRiley
e25ec96aaa
Add table running variance skew sd and kurtosis (#9854)
Adds support for Variance, Skew, Standard Deviation and Kurtosis to Table.Running.
2024-05-09 08:45:29 +00:00
AdRiley
15976a8505
Make table.Running return integer typed columns for min/max (#9853)
* New Tests

* Green

* Running min for longs

* Unsupported types test

* Revert

* Add support for all the integer types

* Another test
2024-05-07 10:49:12 +01:00
James Dunkerley
c0fd6eed2d
Small tweaks from Ned, Mark, Cass and Steve's feedback. (#9859)
- Rename `.info` to `.column_info`.
- Rename `.select_by_type` to `.select_columns_by_type`. (to be merged into `select_columns` functionality)
- Rename `.remove_by_type` to `.remove_columns_by_type`. (to be merged into `remove_columns` functionality)
- Adding "field" `ALIAS`es.
- Making `Table.Value` private (and hence hiding `java_table`).
- Making `Column.Value` private (and hence hiding `java_column`, added method to `Storage` to allow tests to work).
- Making `DB_Table.Value` private (and hence hiding `connection`, `context` and `internal_columns`).
- Making `DB_Column.Value` private (and hence hiding `connection`, `sql_type_reference`, `expression` and `context`).
- Add widgets for everywhere `Filter_Condition` used (fix missing entry for lower, can't use auto-scoping).
![image](https://github.com/enso-org/enso/assets/4699705/bad7d5c8-cd01-4384-a1c8-dbcd7a5ad92b)
- Fix for widgets in `Simple_Calculation`: `date_add` and `format`.
![image](https://github.com/enso-org/enso/assets/4699705/14eeef2e-5069-4a88-8b68-eb675497addf)
![image](https://github.com/enso-org/enso/assets/4699705/75d4e097-f171-4bf6-a27e-f9d396c44a92)
- Fix for nested `Row`, `Column` and `DB_Column` rendering in Table Viz.
![image](https://github.com/enso-org/enso/assets/4699705/8c50c519-7c34-4228-bab7-be938a6ce1bf)
![image](https://github.com/enso-org/enso/assets/4699705/c83ad301-f31e-451d-a8be-b022e1456b4e)

# Important Notes
- Can't use `.` to access a private constructor field when mapping, have to use a lambda.
- We should agree convention for private backing fields which are wrapped by a public getter (went with `internal_` for this pass).
2024-05-06 13:49:12 +00:00
AdRiley
f647045214
Make excel writer work for all types (#9846)
* New Test

* Improve DateTime recognition

* Re-enable slow test

* If there is a time take it regardless of format

* If there is a time take it regardless of format

* Code Review Changes
2024-05-03 07:09:54 +01:00
James Dunkerley
d2e6ff260e
Restructure SQLite_Details. (#9832)
```
type SQLite_Details
SQLite location:File|In_Memory

type In_Memory
```
to
```
type SQLite
From_File location:File

In_Memory
```

# Important Notes
Splits the In-Memory entry for Database Connect but still works nicely.

![image](https://github.com/enso-org/enso/assets/4699705/ec798ce0-9f41-4903-a2fd-722a9e37743c)

![image](https://github.com/enso-org/enso/assets/4699705/f233b055-893e-4c56-a23d-562e982560f6)
2024-05-01 22:15:41 +00:00
AdRiley
d1bf4cb771
Add Ignored_Nothing_Values (#9770)
Add a `IgnoredNothing` warning for Table.Running

![image](https://github.com/enso-org/enso/assets/1720119/1941d278-2c33-43fe-a175-8bcc65bae51a)

![image](https://github.com/enso-org/enso/assets/1720119/b5f6b235-d939-4868-9490-de0f226ea1a2)

![image](https://github.com/enso-org/enso/assets/1720119/a1d617a6-a684-4cc1-be13-c4907d2e6876)
2024-04-30 13:30:40 +00:00
AdRiley
32c3f5f3e8
Make Table.should_equal and Column.should_equal consider NaN equal (#9799)
* Make Column.should_equal detect colums of different types and think nan==nan

* Refactor Table.should_equal

* More Column tests

* Adjust spacing

* Tests Green

* Check same number of columns

* Refactor

* Extra test

* Code Review Changes

* Fix

* Fix

* Fix tests

* Fix Tests

* Fix Test

* Fix test

* Code review change
2024-04-29 22:21:34 +01:00
Pavel Marek
660c5e7a9d
Atom constructors can be private (#9692)
Closes #8836.

Atom constructors can be declared as private (project-private). project-private constructors can be called only from the same project. See the encapsulation.md docs for more info.

---------

Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Co-authored-by: Hubert Plociniczak <hubert.plociniczak@gmail.com>
Co-authored-by: Kaz Wesley <kaz@lambdaverse.org>
2024-04-29 14:43:18 +02:00
Radosław Waśko
0c989c6938
Writing to Data Links (#9750)
- Closes #9292
- and also closes #9522
2024-04-25 17:55:49 +00:00
GregoryTravis
b150b091b0
Convert Vector.new_builder to Vector.build in Standard.Base (#9754)
* Vector.build

* new_builder deprecated

* convert one

* convert vector_spec

* limit param

* convert vector

* docs

* changelog

* wip

* rename limit to capacity

* wip

* review

* append comments

* review

* wip

* wip

* fix tests

* docs

* wip

* wip

* fix filter warning propagation

* doc weird dependencies

* fix tests

* cleanup

* check builder result
2024-04-25 10:38:29 -04:00
Jaroslav Tulach
0d495ffd97
Make conversion of double to BigDecimal exact (#9740)
Resolves #9607 by computing `Number.hash` by converting given number to `Float` first and then computing the hash. Also the conversion from `Float.to Decimal` is exact - done via `new BigDecimal(double)`. There is `Decimal.new` that handles the user-friendly conversion. However as a result `Decimal.from 2.1 != Decimal.new 2.1` - that's the only way to ensure consistency between hash code and conversions.
2024-04-25 11:22:50 +00:00
James Dunkerley
fb9cf38914
Excel_Workbook.read_many (#9759)
- Some minor linting fixes.
- Adjust `headers` parameter so a dedicated type.
![image](https://github.com/enso-org/enso/assets/4699705/989f464d-df95-410e-a03b-36661f1c4a37)
- Fix bug with `read` on an `Excel_Workbook` so error handled more gracefully and not panicking to UI.
![image](https://github.com/enso-org/enso/assets/4699705/23b4575f-daad-4719-a5cc-30d064bd7f7a)
- Fix bug when writing to a file with an `Excel_Format` with an invalid extension which was causing a panic.
![image](https://github.com/enso-org/enso/assets/4699705/dc0e055c-c1b6-482f-b129-eb69f6554d72)
- Add `read_many` to `Excel_Workbook` allowing reading more than one sheet at a time.
2024-04-24 13:16:44 +00:00
AdRiley
6f3e649e7a
Make errors propagate out of each (#9742)
Make errors propagate out of each
2024-04-23 17:14:54 +00:00
AdRiley
4a97bfa31f
Add table running functionality for Sum, Mean, Min, Max. (#9577)
* Add Table.Running

* Code Review fixes

* Code Review changes

* Change null handling
2024-04-23 09:45:43 +01:00
AdRiley
ceaba7f48d
Make excel writer work for custom types (#9752) 2024-04-20 10:34:06 +01:00
Radosław Waśko
fda41cbfd1
Writing Cloud files (#9686)
- Closes #9291
2024-04-16 14:01:03 +00:00
GregoryTravis
e3afa5561d
Add Decimal.round (#9672) 2024-04-11 15:47:50 +00:00
Radosław Waśko
354ee94a2f
Make HTTP tests more robust by adding retries to the tests (#9652)
- As asked for by @hubertp who was encountering flaky test failures on CI in the Http_Spec and related ones, I'm adding retry logic to make such cases much less likely.
- I've made the test server randomly fail 50% of tests and with the retry logic the tests are still passing, so I think that should be much more robust, in practice the failure rate is much much less (I imagine <1% as most of the time these tests were working and we do a ton of requests in a single CI run).
- I move the `with_retries` method to now be `Test.with_retries` which can be used anywhere in our tests for the retry logic.
- It sleeps for 0.1s between retries. Not all kinds of tests need it, this was mostly for propagation delays in the Cloud in our tests. I was thinking if the delay should be configurable, but I think the 0.1s delay is not problematic and if our tests are sometimes failing due to high machine load, the delay could also help.
- This _does not_ add retry logic to raw HTTP operations or `Data.fetch`. We may add that later, but that needs some further design. In such case we may remove some retries from tests if they become unnecessary.
2024-04-09 10:07:22 +00:00
Jaroslav Tulach
d9c7bf4138
Testing autoscoped constructors in a vector (#9630) 2024-04-04 17:13:33 +02:00
James Dunkerley
e262801daa
Restructure Standard.Table. (#9559)
Move the types from `Standard.Table.Data` to `Standard.Table`.

Exceptions:
- `Standard.Table.Data.Report_Unmatched` => `Standard.Table.Constants`.
- `Standard.Table.Data.Join_Kind_Cross` => `Standard.Table.Internal.Join_Kind_Cross`.
Also removed constructor as an atom type.
- `Standard.Table.Extensions.Table_Ref` => `Standard.Table.Internal.Table_Ref`.
- `Standard.Table.Data.Type.Value_Type_Helpers` => `Standard.Table.Internal.Value_Type_Helpers`.
- `Standard.Table.Data.Type.Enso_Types` => `Standard.Table.Internal.Value_Type_Helpers`.
- `Standard.Table.Data.Type.Storage` => `Standard.Table.Internal.Storage`.

Changed all `Standard.Table` imports inside project to be project.
Favoured importing from `Standard.Table.Main` in `Standard.Database`.
Also fixed some linting in Enso_File.
2024-03-27 17:10:43 +00:00
AdRiley
60fa83cb84
Make expand_to_rows, expand_column support Rows, Tables, Column data types (#9533)
Make expand_to_rows work for Table and Column
Make expand_column work for Column and Row

Makes solving the book club exercise easier

![image](https://github.com/enso-org/enso/assets/1720119/4532fc38-8765-43d9-b6a0-61254f52239f)

# Important Notes
We decided expand_column did not make sense for Table as the resulting Table of Columns would rarely be what was wanted.
2024-03-26 12:31:02 +00:00
Radosław Waśko
6665c22eb9
Make data-links behave more like 'symlinks' (#9485)
- Closes #9324
2024-03-22 17:01:54 +00:00
James Dunkerley
2f0d99a1cb
Snowflake Connectivity (#9435)
* Initial connection to Snowflake via an account, username and password.

* Fix databases and schemas in Snowflake.
Add warehouses.

* Add warehouse.
Update schema dropdowns.

* Add ability to set warehouse and pass at connect.

* Fix for NPE in license review

* scalafmt

* Separate Snowflake from Database.

* Scala fmt.

* Legal Review

* Avoid using ARROW for snowflake.

* Tidy up Entity_Naming_Properties.

* Fix for separating Entity_Namimg_Properties.

* Allow some tweaking of Postgres dialect to allow snowflake to use as well.

* Working on reading Date, Time and Date Times.

* Changelog.

* Java format.

* Make Snowflake Time and TimeStamp stuff work.
Move some responsibilities to Type_Mapping.

* Make Snowflake Time and TimeStamp stuff work.
Move some responsibilities to Type_Mapping.

* fix

* Update distribution/lib/Standard/Database/0.0.0-dev/src/Connection/Connection.enso

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>

* PR comments.

* Last refactor for PR.

* Fix.

---------

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2024-03-20 10:06:12 +00:00
Radosław Waśko
6e5b4d93a3
Implement refreshing the Cloud token in Enso libraries (#9390)
- Closes #9300
- Now the Enso libraries are themselves capable of refreshing the access token, thus there is no more problems if the token expires during a long running workflow.
- Adds `get_optional_field` sibling to `get_required_field` for more unified parsing of JSON responses from the Cloud.
- Adds `expected_type` that checks the type of extracted fields. This way, if the response is malformed we get a nice Enso Cloud error telling us what is wrong with the payload instead of a `Type_Error` later down the line.
- Fixes `Test.expect_panic_with` to actually catch only panics. Before it used to also handle dataflow errors - but these have `.should_fail_with` instead. We should distinguish these scenarios.
2024-03-19 19:26:34 +00:00
GregoryTravis
53e2636b8c
Allow Table.replace to take mutiple target columns (#9406) 2024-03-19 19:02:26 +00:00
Cassandra-Clark
f7295f3060
Added table.from_union and respective tests (#9343)
Table.from_union creates a new table when passed in a vector of tables. This is especially helpful when a grouped method is run multiple times, as it can create a unified result set.
2024-03-15 18:09:35 +00:00
GregoryTravis
bb080b54b6
Fix sporadic Table_Tests failures caused by reliance on test ordering. (#9438)
* wip

* 4 failures

* fix

* todo
2024-03-15 10:13:31 -04:00
James Dunkerley
19f15b8f97
Small fixes from building up another demo. (#9385)
- Fix `Excel_Workbook.sheet` and add a test.
- Add icon for `Table.row_count` and `DB_Table.row_count`.
- Make `join_kind` widget `Display.Always`.
- Add expression as an option to `Aggregate_Column`.
- Add `Simple_Calculation.Copy` to create a copy.
- Add defaults to `Simple_Expression` so less errory.
- Set period to default to day for `date_diff` allowing use in expressions.
- Add `Text_Left`, `Text_Right`, `Text_Length` and `Format` to `Simple_Expression`.
2024-03-13 18:13:33 +00:00
AdRiley
2fdb2fca62
Added Table.running (#9382)
This is a first naïve implementation of Table.running that only supports count. Adding it as a scaffold for the rest of the functionality and to give us a place to agree on the API. (Which I changed slightly from the design)

![image](https://github.com/enso-org/enso/assets/1720119/a62a83ed-f864-4295-98ea-1007f62381b1)

# Important Notes
Only supports Statistic.Count. Other functionality to follow.
2024-03-13 13:28:33 +00:00
Radosław Waśko
e98306f170
Excel DataLink (#9346)
- Adds the Excel format as one of the formats supported when creating a data link.
- The data link can choose to read the file as a workbook, or read a sheet or range from it as a table, like `Excel_Format`.
- Also updated Delimited format dialog to allow customizing the quote style.
2024-03-11 16:12:12 +00:00
Radosław Waśko
2e35189d83
Better error message in Data.read when an argument to format is missing (#9337) 2024-03-11 14:50:32 +00:00
James Dunkerley
9f9cf58d28
Add selecting by type to the table (#9334)
- Adds `select_by_type` and `remove_by_type` to tables.
2024-03-08 18:40:53 +00:00
Radosław Waśko
a3bf5a0be5
Split Excel_Format into 3 constructors (#9308)
Splitting the `Excel_Format` into 3 constructors.
2024-03-08 14:26:30 +00:00
James Dunkerley
c7d693dfc8
Move Standard.Database.Data to Standard.Database. (#9321)
Moves the types out of `Data`.
2024-03-07 14:43:38 +00:00
Radosław Waśko
e37862b09d
Implement a Data Link for Postgres (#9269)
- Closes #9124
2024-03-06 11:57:12 +00:00
AdRiley
8b889f0977
Make Table.To_Xml return a XML_Document (#9263)
As part of the XML improvements it makes more sense for Table.To_Xml to return a XML_Document.
2024-03-04 15:19:20 +00:00
Jaroslav Tulach
5676618bad
Autoscoped constructors (#9190)
Fixes #8645 by recognizing `~` prefix to constructor names.
2024-03-04 11:41:02 +00:00
GregoryTravis
54675b1e4d
Implement Table.replace for the database backend (#8986) 2024-02-29 18:36:42 +00:00
James Dunkerley
0e2a91cfe1
Remove countMissing from Storage and replace with a new CountNothing operation. (#9137)
Removing another small piece of logic from the storages to it's own operation.
2024-02-22 19:32:46 +00:00
Jaroslav Tulach
ad2f5b031e
Chained if_then_else application change (#8671)
* Test describing the current behavior of chained if then else application

* Chained block should behave just like Group around if_then_else

* Finishing line on BlockStart fixes if_then_else_chained_block

* Only finish the line when there was not start of a macro segment

* Fix tests

* Refine else-body with macro patterns.

* Update test syntax to maintain original semantics

* Few additional tests

---------

Co-authored-by: Kaz Wesley <kaz@lambdaverse.org>
2024-02-22 09:17:25 -05:00