251 Commits

Author SHA1 Message Date
Radosław Waśko
255b424b72
Add value_type to Column.from_vector and expected_value_type to Column.map and Column.zip (#7637)
- Closes #6111
- Aligns semantics of handling Mixed columns.
- Now, if an operation like `iif` or `fill_nothing` is given a `Mixed` column, the result will also be `Mixed` regardless of the `inferred_precise_value_type`.
- Enables a few old tests that were pending but could be enabled since the types work is advanced enough.
2023-08-31 13:20:49 +00:00
James Dunkerley
7d83b3d7b4
Add GROUP to functions (#7622)
- Update list of groups to agreed list.
- Lower case `ALIAS` names to be consistent with function names.
- Add `GROUP` to methods.
- All constructors and functions have doc comments.
- Correct a few typos (e.g. `PRVIATE`).
- Mark some more things as `PRIVATE`.
- Use `ToDo:` and `Note:` consistently.
- Order tags in doc comment.

# Important Notes
We don't have all the doc comments on types and will want to add them in future.
2023-08-23 13:20:38 +00:00
Radosław Waśko
2385f5b357
Add size-limited strings and varying bit-width integer Value_Types to in-memory backend and check for ArithmeticOverflow in LongStorage (#7557)
- Closes #5159
- Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types).
- Cast also exposes these types.
- The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109
- Fixes #7565
- Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on particular database.
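
As an illustration of the bounds-checking idea described above (values kept in one wide representation, with each narrower type enforced by a range check), here is a minimal Python sketch. It is not the PR's Java code; the function names `int_bounds`, `fits` and `checked_add` are hypothetical, and Python's unbounded integers stand in for the 64-bit `long` storage.

```python
def int_bounds(bits):
    """Inclusive (min, max) range of a signed two's-complement integer."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def fits(value, bits):
    """Check whether a value is representable at the given bit width."""
    low, high = int_bounds(bits)
    return low <= value <= high


def checked_add(a, b, bits=64):
    """Add two integers, raising instead of silently wrapping around."""
    result = a + b  # Python ints are unbounded, so this sum never wraps
    if not fits(result, bits):
        raise OverflowError(f"{a} + {b} does not fit in {bits} bits")
    return result
```

For example, `checked_add(2**63 - 1, 1)` raises `OverflowError`, whereas plain 64-bit arithmetic would wrap around to a negative number.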
2023-08-22 18:10:46 +00:00
GregoryTravis
c9d7c5cb2b
Convert in-memory Column.round to Java (#7521)
2023-08-16 14:45:23 +00:00
James Dunkerley
296c95d414
Fix for empty column on replace and out of memory catching for join and tab (#7593)
- Added a Panic.catch to catch heap memory error in joins and cross_tab.
- Adjusted column replace so type is correct.
2023-08-15 17:06:51 +00:00
Radosław Waśko
8541a9e1ac
Improve generation of long operation in presence of column name length limit (#7556)
I planned to do this as part of #7428, but I forgot. Making up for that now.
2023-08-14 16:58:36 +00:00
Radosław Waśko
b656b336c7
Report Loss_Of_Integer_Precision when an integer is not exactly representable as a float during conversion (#7509)
Closes #7353

I introduce a new type `WithAggregatedProblems`, because `WithProblems` was too simple: it could only hold a `List<Problem>`, but `AggregatedProblems` is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify everything to use `WithAggregatedProblems`, but after starting this, I realised it would likely take too much effort for this little PR. So instead, I created a follow-up task: #7514
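
To make the "not exactly representable as a float" condition concrete, here is a small Python sketch. It is an illustration only, not the PR's implementation; the helper names `converts_exactly` and `to_floats_with_problems` are hypothetical. It relies on the fact that a 64-bit float has a 53-bit significand, so not every integer above 2^53 in magnitude survives a round trip.

```python
def converts_exactly(n):
    """True if the integer n survives a round trip through a 64-bit float."""
    return int(float(n)) == n


def to_floats_with_problems(values):
    """Convert integers to floats and collect the ones that lost precision.

    Returns the converted values together with the list of offending
    inputs, loosely mirroring the idea of a result plus aggregated problems.
    """
    lossy = [v for v in values if not converts_exactly(v)]
    return [float(v) for v in values], lossy
```

Here `2**53` converts exactly, while `2**53 + 1` does not and would be reported.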
2023-08-08 12:30:44 +00:00
GregoryTravis
758b3b31b9
Avoid indexing the table twice for Cross Tab (#7417)
Rewrites MultiValueIndex.makeCrossTabTable to build only a single index.
2023-08-04 21:14:18 +00:00
Radosław Waśko
bc9cde6543
Fix column naming edge cases - invalid and duplicated columns, case-insensitive name aliasing for case-insensitive backends (#7495)
- Fixes #7412
- Also adds tests and fixes some more edge cases:
- Ensures correct handling of existing Database tables whose column names may be invalid or clashing from Enso's perspective. For example, for most DBs `ś` and `s\u0301` are different names, but for Enso they are basically the same, which would cause issues. Enso now renames such columns when accessed, while still using the correct column reference in the generated SQL under the hood.
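
The `ś` versus `s\u0301` clash can be reproduced with Unicode normalization. The sketch below is a Python illustration under the assumption that "the same for Enso" means canonical equivalence, with NFC normalization as a stand-in; the function name `clashing_names` is hypothetical and this is not the PR's code.

```python
import unicodedata
from collections import defaultdict


def clashing_names(names):
    """Group column names that become identical after NFC normalization."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    # Only groups with more than one member are actual clashes.
    return [group for group in groups.values() if len(group) > 1]


precomposed = "\u015b"   # 'ś' as a single code point
decomposed = "s\u0301"   # 's' followed by a combining acute accent
clashes = clashing_names([precomposed, decomposed, "id"])
```

The two spellings differ code point by code point, which is why a database can treat them as distinct names, yet they normalize to the same text.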
2023-08-04 09:04:38 +00:00
Radosław Waśko
c61c741476
Respect database backend naming limitations when generating table/column names and validate user-provided names to avoid silent name clashes; process JDBC warnings reported from backends (#7428)
- Closes #5951
- Ensures any SQL warnings reported by the database through the JDBC driver are processed and forwarded to the user.
- These warnings show issues like the implicit name truncation that this PR is also solving. It's good to make sure they are visible as they can help avoid and understand unexpected problems. They should not show up in most standard workflows.
- Adds simple history to our REPL.
2023-08-03 09:44:27 +00:00
James Dunkerley
7345f0fd9a
Speed up statistics (#7390)
- Allow `parse_to_columns` to take a `Regex` object.
- Add `pattern` to the `Regex` object.
- Add `column_names` to the `Row` object.
- Improve statistics performance.
- Add benchmarks for stats.

| Benchmark | Reference | New | Improvement |
| --- | --- | --- | --- |
| Max (by reduce) | 16.4ms | 16.3ms | - |
| Max (stats) | 703ms | 224ms | 68% |
| Sum (by reduce) | 38ms | 38ms | - |
| Sum (stats) | 753ms | 420ms | 44% |
| Variance (stats) | 745ms | 553ms | 26% |

Also tried using a Ref approach for stats, but it was slower (7e13c45224).
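
The Improvement column is consistent with the relative reduction in run time, `(reference - new) / reference`. The Python sketch below recomputes it, assuming all timings are in milliseconds (the Variance row is read as 553 ms, which is what its 26% figure implies); the function name is hypothetical.

```python
def improvement_percent(reference_ms, new_ms):
    """Relative reduction in run time, rounded to a whole percent."""
    return round(100 * (reference_ms - new_ms) / reference_ms)


# (reference, new) timings in milliseconds for the stats-based benchmarks.
timings = {
    "Max (stats)": (703, 224),
    "Sum (stats)": (753, 420),
    "Variance (stats)": (745, 553),
}
improvements = {name: improvement_percent(*pair) for name, pair in timings.items()}
```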
2023-07-26 10:01:18 +00:00
Radosław Waśko
4b5a2e2176
Fixing operations on Mixed types (#7368)
- Fixes #7231
- Cleans up vectorized operations to distinguish unary and binary operations.
- Introduces MixedStorage which may pretend to be a more specialized storage on demand.
- Ensures that operations request a more specialized storage on right-hand side to ensure compatibility with reported inferred storage type.
- Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso
- Tests for comparison of Mixed storages with each other and other types
- Started using `Set` for `Filter_Condition.Is_In` for better performance.
- ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~
- This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out #7371
- ~~Trying to avoid conversions when calling Enso functions from Java.~~
- Needs extra care as dataflow errors may not be handled right then. So only works for simple functions that should not error.
- Not sure how much it really helps. [Benchmarks](https://github.com/enso-org/enso/pull/7270#issuecomment-1635618393) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results.
- Created #7378 to track this.
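
The `Set`-based `Is_In` change above follows a common pattern: build a hash set from the allowed values once, then test each element against it in roughly constant time instead of scanning a list per element. A minimal Python sketch of that pattern (not the PR's code; `filter_is_in` is a hypothetical name, and values are assumed to be hashable):

```python
def filter_is_in(values, allowed):
    """Keep the values that occur in `allowed`, preserving their order."""
    allowed_set = set(allowed)  # built once, outside the loop
    return [v for v in values if v in allowed_set]
```

With `n` values and `m` allowed items this does about `n + m` hash operations, rather than up to `n * m` comparisons with a list lookup.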
2023-07-25 23:25:17 +00:00
Adam Obuchowicz
1d2371f986
Groups in DocTags (#7337)
Fixes #7336 in a quick way.

Next to the old way of defining groups, the library can just add a `GROUP` tag to some entities, and each will be added to the group specified in the tag's description.

The group name may be qualified (with the project name, like `Standard.Base.Input/Output`) or just the name; in the latter case, the IDE will assume a group defined in the same library as the entity.

Also moved some entities from "export" list in package.yaml to GROUP tag to give an example. I didn't move all of those, as I assume the library team will reorganize those groups anyway.

### Important Notes

@jdunkerley @radeusgd @GregoryTravis When you start specifying groups in tags, remember that:
* A group still belongs to a concrete project; if some entity outside that project wants to be added to its group, the "qualified" name should be specified. See the `Table.new` example in this PR.
* If the group name does not match any group in package.yaml, **the tag is ignored**.
* A single entity may be in only one group. If it's specified both in package.yaml and in a tag, the tag takes precedence.
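
The three rules above can be sketched as a small resolution function. This is a Python illustration, not the IDE's implementation: the function name, the data shapes, and every group name other than `Input/Output` are hypothetical. It also assumes that an ignored tag falls back to the package.yaml entry, and that a qualified name is recognised by its library-name prefix.

```python
def resolve_group(tag_group, entity_library, declared_groups, package_yaml_group=None):
    """Return the (library, group) an entity ends up in, or None.

    declared_groups maps a library name to the set of group names
    defined in that library's package.yaml.
    """
    if tag_group is not None:
        library, name = entity_library, tag_group  # unqualified by default
        # Prefer the longest library prefix so nested names resolve correctly.
        for lib in sorted(declared_groups, key=len, reverse=True):
            if tag_group.startswith(lib + "."):
                library, name = lib, tag_group[len(lib) + 1:]
                break
        if name in declared_groups.get(library, set()):
            return (library, name)  # a valid tag takes precedence
        # Unknown group: the tag is ignored.
    return package_yaml_group


declared = {
    "Standard.Base": {"Input/Output", "Constants"},
    "Standard.Table": {"Selections"},
}
```

For instance, an entity in `Standard.Table` tagged `Standard.Base.Input/Output` lands in the `Standard.Base` group, while the unqualified tag `Selections` resolves within `Standard.Table` itself.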

---------

Co-authored-by: Ilya Bogdanov <fumlead@gmail.com>
2023-07-24 15:54:16 +02:00
Radosław Waśko
56635c9a88
Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso (#7270)
The added benchmark is a basis for a performance investigation.

We compare the performance of the same operation run in Java vs Enso to see what the overhead is and try to get the Enso operations closer to the pure-Java performance.
2023-07-21 17:25:02 +00:00
GregoryTravis
2fb5c3710b
Add Fallback to Prim_Text_Helper.compile_regex; accept Regex in Text.parse_to_table (#7297)
This PR does three related things:
- Fails more gracefully when a non-string is passed to compile_regex
- Don't pass a non-string to compile_regex
- Allow a Regex param to parse_to_table
2023-07-18 19:55:56 +00:00
James Dunkerley
fd0bdc86dd
Fix issue with rename_columns and revert order of parameter change on select_columns. (#7321)
The Regex change introduced some issues.
Added a test for a missed case in `rename_columns` when using a vector of pairs.
Reverted parameter order change for `select_columns`.
2023-07-18 13:30:23 +00:00
James Dunkerley
aaa235fbad
Add drop down for replace, remove Column_Selector (#7295)
- Add dropdowns for `replace` functions.
- Retire `Column_Selector` type.
- Add `select_blank_columns` and `remove_blank_columns` functions to table types.
- Allow Regex to be used to pick columns.
2023-07-14 17:30:52 +00:00
Radosław Waśko
866283c0a8
Improve error message on Filter_Condition missing arguments in Table.filter (#7290)
In #7148 I improved the error message when a `Filter_Condition` constructor without arguments is provided to `Vector.filter` and its friends. This PR applies the same check to the `Table.filter`.

This is useful because when we select a `Filter_Condition` from a widget, initially it does not have all its arguments applied. This used to lead to confusing errors being reported to the user; now a much clearer error is shown:

![image](https://github.com/enso-org/enso/assets/1436948/19140a7b-d6fc-4292-81d3-dc6d61135cb9)
2023-07-14 08:00:13 +00:00
Radosław Waśko
620cc361ce
Add date_diff, date_add and date_part to scalar Enso date-time values. (#7273)
Followup of #7221, adding `date_diff`, `date_add` and `date_part` to scalar Enso date-time values.
2023-07-13 15:17:21 +00:00
Radosław Waśko
ca68dd94da
Adding new Date/Time operations (-, date_add, date_diff, date_part) (#7221)
- Adds `Column.date_diff` for computing date/time difference as an integer multiple of some unit.
- Adds `Column.date_add` for shifting date/time by a unit.
- Adds `Column.date_part` for extracting various parts of the date/time value as integer.
- Adds widgets for the 3 methods above whose content depends on the column value type.
- Adds shorthands: `Column.hour`, `Column.minute` and `Column.second` to extract these date parts.
- Extends `Time_Period` with support for milli-, micro- and nano- seconds; and adapts functions taking `Time_Period` to support these wherever possible.
2023-07-13 12:56:54 +00:00
James Dunkerley
0adab6c68c
Round on a column was always adding a warning (#7246)
- Only warn if outside allowed range.
- Added `is_infinite` to In-Memory column.
- Allow integer value type for `is_nan` and `is_infinite`.
2023-07-10 17:35:23 +00:00
GregoryTravis
345d6b9cb1
Add cross_join support to Database Table (#7234)
2023-07-10 16:29:37 +00:00
James Dunkerley
1fb60df61b
Fixes from the live demo. (#7243)
- Removed defaults from `cross_tab`. It caused an out-of-heap space error when it attempted to build a 205k x 205k table. Now has a hard limit of 10,000 columns - we can increase this once we have more concrete test data.
![image](https://github.com/enso-org/enso/assets/4699705/bc38d41c-56dc-41bd-8a7c-fa89ecfa7f79)

- Adjusted the widgets on `Aggregate_Column` for `columns` and `order_by` to be dropdowns, as nested Vector editors are not supported.
![image](https://github.com/enso-org/enso/assets/4699705/f4a7c7cc-6a21-462c-a39e-65fbab82c367)

- Altered `Aggregate_Column` so `new_name` is now `new_name:Text=""` and no longer takes `Nothing`. This makes it appear correctly in the IDE.
![image](https://github.com/enso-org/enso/assets/4699705/196a49ba-4274-44bb-b876-0372c8f62746)

- Added dropdowns for `fill_empty`, `fill_nothing` and `replace` on `Table`.
![image](https://github.com/enso-org/enso/assets/4699705/9ee5cec2-82d5-4452-b650-67015ac9fee5)

- Added `replace` to Database table, throwing `Unsupported_Database_Operation`.
2023-07-09 18:03:05 +00:00
GregoryTravis
bd26e95fd6
Add Table.replace; Change Text.replace to take a Text|Pattern, and remove the use_regex param. (#7223)
2023-07-06 16:13:11 +00:00
James Dunkerley
7749286c69
Tidy up the imports using script (#7220)
Ordering the imports to test a script.
2023-07-06 14:22:50 +00:00
GregoryTravis
6eb46afb40
Do not rename column on fill_nothing and add version to the Table allowing filling multiple (include fill_empty as well). (#7166)
Updated Column.fill_nothing and .fill_empty, and added the same to Table. (Both in-memory and db.)
2023-07-05 17:20:23 +00:00
Radosław Waśko
78545b4402
Add safepoints to standard libraries Java polyglot helpers (#7183)
Closes #7129
2023-07-05 14:12:13 +00:00
GregoryTravis
966f8b773a
Combine Regex and Pattern (#7172)
Merge Pattern into Regex.
2023-07-05 13:51:53 +00:00
Radosław Waśko
2d73277238
Fix a bug that somehow went under CI (#7204)
2023-07-05 08:54:27 +00:00
James Dunkerley
4fbe7e3830
Remove Array.new and Array.copy and move Vector functions to builtins. (#7147)
- Removed Array methods: `new`, `copy` and `new_[1234]`.
- New builtins for `Vector.insert`, `Vector.remove` and `Vector.flatten`.
- Replaced `Vector_Builder` use of `Array.copy` to a `Vector.Builder` approach.
2023-07-03 12:41:41 +00:00
Radosław Waśko
4ccf3566ce
Implement add_row_number for Database backends, fix primary key inference for SQLite (#7174)
Closes #6921 and also closes #7037
2023-07-03 11:51:42 +00:00
GregoryTravis
c866aa7fb5
parse_to_columns should generate at least one row for a non-match (#7171)
2023-06-30 18:10:33 +00:00
GregoryTravis
550d146493
Add round, ceil, floor, truncate to the In-Database Column type (#6988)
2023-06-30 16:47:40 +00:00
Paweł Grabarz
cb9d4c4607
move method icon definition to documentation tag (#7123)
2023-06-29 14:48:55 +00:00
James Dunkerley
56688ec1e7
Minor fixes. (#7122)
Mostly stuff to tidy up the static methods in the CB.

- Remove default pattern from `parse_to_table` (caused IDE to freeze).
- Rename any `_` arguments to what they are.
- Merge `Date.now` into `Date.today`
- Merge the Interval constructors into a single constructor.
- Hide various methods.
2023-06-27 18:18:15 +00:00
James Dunkerley
937651f696
Code Clean Up, Fix Weird Namespace, S3 List Objects and Read Object (#7114)
Mostly a tidy up as part of looking over the function catalogue for groups.
Sorted some whitespace issues.
2023-06-24 23:18:58 +00:00
James Dunkerley
1859ccbab5
Improving widgets and other minor tweaks. (#7052)
- Removed `module` argument from `enso_project` (new `Project_Description.new` API).
- Removed the custom option from date and time parse/format dropdowns.
- The `format` dropdown uses the value to create the dropdown. (Screenshot below)
- Removed `StorageType` coalescing rules and replaced them with simpler logic in `ObjectStorage`.
- Update signature for `add_row_number` and add aliases.
2023-06-19 19:03:36 +00:00
James Dunkerley
760fb71798
First part of AWS S3 API, various small fixes. (#6973)
- Add type detection for `Mixed` columns when calling column functions.
- Excel uses column name for missing headers.
- Add aliases for parse functions on text.
- Adjust `Date`, `Time_Of_Day` and `Date_Time` parse functions to not take `Nothing` anymore and provide dropdowns.
- Removed built-in parses.
- All support Locale.
- Add support for missing day or year for parsing a Date.
- All will trim values automatically.
- Added ability to list AWS profiles.
- Added ability to list S3 buckets.
- Workaround for Table.aggregate so default item added works.
2023-06-15 16:20:13 +00:00
Radosław Waśko
dad57e6c7d
Implement remaining Update_Actions for update_database_table. (#7035)
Closes #6498
2023-06-15 08:48:22 +00:00
Dmitry Bushev
48f0c6f5e8
Scala 2.13.11 and libraries update (#7010)
Update Scala and libraries.
2023-06-14 13:15:57 +00:00
Pavel Marek
67821bf8df
Add compiler pass that discovers ambiguous imports (#6868)
Add a new compiler pass that analyses duplicated and ambiguous symbols from imports
2023-06-14 12:18:57 +02:00
Radosław Waśko
d9ed63fb89
Implement Insert update action for update_database_table. (#6990)
This adds the spec for all update actions, but implements only the common input validation framework and `Insert`. Tests for the remaining actions are marked as pending; these will be implemented in a subsequent PR.
2023-06-14 00:14:32 +00:00
GregoryTravis
912fbce97b
Reimplement Column.truncate, .ceil, and .floor as vectorized Java ops (#6941)
Reimplement these in Java.

Benchmarks:

Before:

Column.truncate floats average: 124.4ms
Column.ceil floats average: 121.47ms
Column.floor floats average: 120.18ms
Column.truncate ints average: 124.78ms
Column.ceil ints average: 120.41ms
Column.floor ints average: 102.35ms

After (boxed):

Column.truncate floats average: 3.75ms
Column.ceil floats average: 2.25ms
Column.floor floats average: 1.89ms
Column.truncate ints average: 2ms
Column.ceil ints average: 1.77ms
Column.floor ints average: 1.74ms

After (unboxed):
Column.truncate floats average: 3.32ms
Column.ceil floats average: 2.15ms
Column.floor floats average: 1.69ms
Column.truncate ints average: 1.74ms
Column.ceil ints average: 1.61ms
Column.floor ints average: 1.99ms
2023-06-06 18:07:12 +00:00
Radosław Waśko
b513839418
Refactor create_database_table into Connection.create_table and select_into_database_table, implement Set. (#6925)
First part for #6498 - refactoring of the upload infrastructure, in preparation for `update_database_table`.

Implemented a `Set` data structure which was long needed.

The APIs are added and an initial implementation is created, but it is not complete. It has already grown significantly, so the remaining implementation will be done in a separate PR.

Adds some basic ability for a function to ensure that it is only executed from within a transaction.
2023-06-06 10:36:05 +00:00
James Dunkerley
db96bd2e2c
Small fixes from book club. (#6933)
- Add the missing dropdowns for `Locale` and `Encoding`.
- Correct a few mismatched type signatures.
- Adjust `order_by` calls with a single `Sort_Column` to call in a Vector.
- Adjust parameter names for `transpose`.
- Fix for the table viz: escape HTML and `suppressFieldDotNotation`.
- Use `Filter_Condition.Equal True` for the default filter.
- Adjust `Data.fetch` to return the response on success when parse fails. Rename `parse` to `try_auto_parse`.
- Add various aliases for methods.
- Add tests for `Table.set` when using a `Vector`, `Range` or `Date_Range`.
- Add check for mismatched length on `Table.set`.

![image](https://github.com/enso-org/enso/assets/4699705/23ea0ba3-2b05-4af8-afd9-f35b55446c24)

![image](https://github.com/enso-org/enso/assets/4699705/8b0253e6-e9e8-490a-9607-0da51ab5a215)
2023-06-05 13:57:30 +00:00
Radosław Waśko
cfb2f2916e
Merge Column_Indexes_Out_Of_Range into Missing_Input_Columns. (#6901)
Implements #6869
2023-06-02 12:09:20 +00:00
Radosław Waśko
d44b1250b7
Implement Table.add_row_number (#6890)
Closes #5227

# Important Notes
- This lays first steps towards #6292 - we get pure Enso variants of MultiValueKey.
- Another part refactors `LongStorage` into `AbstractLongStorage` allowing it to provide alternative implementations of the underlying storage, in our case `LongRangeStorage` generating the values ad-hoc and `LongConstantStorage` - currently unused but in the future it can be adapted to support constant columns (once we implement similar facilities for other types).
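
The idea of a storage that generates its values ad hoc can be sketched as follows. This is a Python illustration, not the Java classes from the PR: the class names are borrowed from the description above, but the constructor parameters are assumptions, and modelling the constant storage as a range with step 0 is a choice made for this sketch.

```python
class LongRangeStorage:
    """Read-only integer column start, start + step, ... computed on access."""

    def __init__(self, start, step, size):
        if size < 0:
            raise ValueError("size must be non-negative")
        self.start, self.step, self.size = start, step, size

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        if not 0 <= index < self.size:
            raise IndexError(index)
        # No backing array: the value is derived from the index.
        return self.start + index * self.step


class LongConstantStorage(LongRangeStorage):
    """Every row holds the same value; modelled here as a range with step 0."""

    def __init__(self, value, size):
        super().__init__(value, 0, size)
```

A row-number column for a million-row table then costs three integers of memory instead of a million-element array.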
2023-06-02 10:13:13 +00:00
James Dunkerley
343b5fb085
Execution control for Table.write and various widget tweaks... (#6835)
- Adds execution control to `Table.write`.
- Refactored the `Text.write` to make part reusable.
- Tidied up some legacy mess in tests.
- Add easier flow to go from `Text` to a `URI` to fetching data.
- Add decode functions to `Response` and `Response_Body`.
- Fix issue with zero-length regex matches (using the same approach as Python and .NET).
- Add various ALIAS entries to make function discovery easier.
- Sort a lot of drop down and vector editors out (including switch to fully qualified names).
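
For reference, this is how Python's `re` module treats zero-length matches, which is the behaviour the regex fix above says it follows. The snippet only demonstrates Python; whether Enso produces identical output in every case is not stated in the commit message.

```python
import re

# `x*` matches the empty string at every position of "abc",
# including the position after the last character.
empty_spans = [m.span() for m in re.finditer(r"x*", "abc")]

# Each empty match is replaced once, and the scan always moves forward,
# so the substitution terminates instead of looping at one position.
replaced = re.sub(r"x*", "-", "abc")
```

Here `empty_spans` is `[(0, 0), (1, 1), (2, 2), (3, 3)]` and `replaced` is `"-a-b-c-"`.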
2023-06-01 22:10:03 +00:00
GregoryTravis
0337180384
Add rounding functions to the Column type (#6817)
2023-06-01 20:06:23 +00:00
GregoryTravis
7e53cd9af1
Add drop down for Locale like Encoding (#6654)
Add dropdowns for locale parameters for format and parse methods.
2023-05-31 12:43:20 +00:00