enso-org/enso

mirror of https://github.com/enso-org/enso.git synced 2024-11-23 16:18:23 +03:00

Author	SHA1	Message	Date
Radosław Waśko	940b8f7d51	Improving tests and edge cases for URI and HTTP (#8497 ) - Closes #8352 - ~~Proposed fix for #8493~~ - The temporary fix is deemed not viable. I will try to figure out a workaround and leave fixing #8493 to the engine team.	2023-12-15 17:58:45 +00:00
James Dunkerley	9e27b6487b	Minor fixes and tweak for Cloud APIs. (#8557 ) - Fix secret to at least be working again - Tweak to allow a MIMIC flow to work with value types (revisit in 2024).	2023-12-15 17:10:07 +00:00
Pavel Marek	c1098865f2	Update java formatter sbt plugin (#8543 ) Add a local clone of javaFormatter plugin. The upstream is not maintained anymore. And we need to update it to use the newest Google java formatter because the old one, that we use, cannot format sources with Java 8+ syntax. # Important Notes Update to Google java formatter 1.18.1 - https://github.com/google/google-java-format/releases/tag/v1.18.1	2023-12-15 14:45:23 +00:00
Pavel Marek	4b65e44ef3	EpbLanguage re-uses other TruffleContext support to run tests with assertions enabled (#7882 )	2023-12-15 13:31:32 +01:00
Radosław Waśko	b5c995a7bf	Reworking Excel support to allow for reading of big files (#8403 ) - Closes #8111 by making sure that all Excel workbooks are read using a backing file (which should be more memory efficient). - If the workbook is being opened from an input stream, that stream is materialized to a `Temporary_File`. - Adds tests fetching Table formats from HTTP. - Extends `simple-httpbin` with ability to serve files for our tests. - Ensures that the `Infer` option on `Excel` format also works with streams, if content-type metadata is available (e.g. from HTTP headers). - Implements a `Temporary_File` facility that can be used to create a temporary file that is deleted once all references to the `Temporary_File` instance are GCed.	2023-12-15 00:02:15 +00:00
Radosław Waśko	c6b6384fe6	Improve performance of anti-join (#8338 ) - Closes #8217	2023-11-24 02:44:57 +00:00
James Dunkerley	ecaca12df1	Integrating Enso Cloud with the libraries (part 1...) (#8006 ) - Add a `File_For_Read` type. Used for `File_Format` to read files. - Added `Enso_User` representing the current user in `Enso_Cloud`. - Will be later able to list known users. - Added `Enso_Secret` representing a value defined in `Enso_Cloud`. - Value not used within Enso only accessed within polyglot Java. - Integrated into `Username_And_Password` and can be used within JDBC connections. - Integrated into HTTP Headers so a secret can be used as a value. - New `URI_With_Query` with the same API as `URI`. Supporting secrets in the value. - Will be integrated with AWS credentials. - Added `Enso_File` representing a file or a folder in the cloud. - Support the same API as `File` (like the `S3_File`). - Will support `enso://` URI style access.	2023-11-20 23:21:14 +00:00
Pavel Marek	5a7ad6bfe4	Upgrade enso to GraalVM for jdk 21 (#7991 ) Upgrade to GraalVM JDK 21. ``` > java -version openjdk version "21" 2023-09-19 OpenJDK Runtime Environment GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15) OpenJDK 64-Bit Server VM GraalVM CE 21+35.1 (build 21+35-jvmci-23.1-b15, mixed mode, sharing) ``` With SDKMan, download with `sdk install java 21-graalce`. # Important Notes - After this PR, one can theoretically run enso with any JRE with version at least 21. - Removed `sbt bootstrap` hack and all the other build time related hacks related to the handling of GraalVM distribution. - `project-manager` remains backward compatible - it can open older engines with runtimes. New engines no longer require a separate runtime to be downloaded. - sbt does not support compilation of `module-info.java` files in mixed projects - https://github.com/sbt/sbt/issues/3368 - Which means that we can have `module-info.java` files only for Java-only projects. - Anyway, we need just a single `module-info.class` in the resulting `runtime.jar` fat jar. - `runtime.jar` is assembled in `runtime-with-instruments` with a custom merge strategy (`sbt-assembly` plugin). Caching is disabled for custom merge strategies, which means that re-assembly of `runtime.jar` will be more frequent. - Engine distribution contains multiple JAR archives (modules) in `component` directory, along with `runner/runner.jar` that is hidden inside a nested directory. - The new entry point to the engine runner is [EngineRunnerBootLoader](https://github.com/enso-org/enso/pull/7991/files#diff-9ab172d0566c18456472aeb95c4345f47e2db3965e77e29c11694d3a9333a2aa) that contains a custom ClassLoader - to make sure that everything that does not have to be loaded from a module is loaded from `runner.jar`, which is not a module. - The new command line for launching the engine runner is in [distribution/bin/enso](https://github.com/enso-org/enso/pull/7991/files#diff-0b66983403b2c329febc7381cd23d45871d4d555ce98dd040d4d1e879c8f3725) - [Newest version of Frgaal](https://repo1.maven.org/maven2/org/frgaal/compiler/20.0.1/) (20.0.1) does not recognize `--source 21` option, only `--source 20`.	2023-11-17 18:02:36 +00:00
GregoryTravis	ea3d778456	Allow the creation of a constant column on an in-memory table with no rows. (#8218 )	2023-11-09 14:40:51 +00:00
Radosław Waśko	1b8b30a68d	Improve performance of `Join_Condition.Between` by sorting on one dimension (#8212 ) - Closes #5303 - Refactors `JoinStrategy` allowing us to 'stack' join strategies on top of each other (to some extent) - currently a `HashJoin` can be followed by another join strategy (currently `SortJoin`) - Adds benchmarks for join - Due to limitations of the sorting approach this will still not be as fast as possible for cases where there is more than 1 `Between` condition in a single query - trying to demonstrate that in benchmarks. - We can replace sorting by d-dimensional [RangeTrees](https://en.wikipedia.org/wiki/Range_tree) to get `O((n + m) log^d n + k)` performance (where `n` and `m` are sizes of joined tables, `d` is the number of `Between` conditions used in the query and `k` is the result set size). - Follow up ticket for consideration later: #8216 - Closes #8215 - After all, it turned out that `TreeSet` was problematic (because of not enough flexibility with duplicate key handling), so the simplest solution was to immediately implement this sub-task. - Closes #8204 - Unrelated, but I ran into this here: adds type checks to other arguments of `set`. - Before, putting in a Column as `new_name` (i.e. mistakenly messing up the order of arguments) led to a hard to understand `Method `if_then_else` of type Column could not be found.`; now it fails with a type error `expected Text got Column`.	2023-11-08 12:59:55 +00:00
Radosław Waśko	237aae33c7	Simplify internal logic of `Table.order_by`, avoid unnecessary warning (#8221 ) - Fixes #8213	2023-11-06 11:00:01 +00:00
GregoryTravis	1480f50207	Overhaul the random number and item generation code (#8127 ) Rewrite most of Random.enso.	2023-10-31 15:25:37 +00:00
Radosław Waśko	79011bd550	Implement `Table.lookup_and_replace` in Database (#8146 ) - Closes #7981 - Adds a `RUNTIME_ERROR` operation into the DB dialect, that may be used to 'crash' a query if a condition is met - used to validate if `lookup_and_replace` invariants are still satisfied when the query is materialized. - Removes old `Table_Helpers.is_table` and `same_backend` checks, in favour of the new way of checking this that relies on `Table.from` conversions, and is much simpler to use and also more robust.	2023-10-31 15:19:55 +00:00
Radosław Waśko	0c278391fe	Test and improve handling of `Date_Time with_timezone=False` in Postgres (#8114 ) - Fixes #8049 - Adds tests for handling of Date_Time upload/download in Postgres. - Adds tests for edge cases of handling of Decimal and Binary types in Postgres.	2023-10-21 21:35:13 +00:00
Radosław Waśko	8172896065	Support `Previous_Value` in `fill_nothing` and `fill_missing` (#8105 ) - Adds `Previous_Value` to `fill_nothing` and `fill_empty`, as requested by #7192.	2023-10-20 13:18:53 +00:00
Radosław Waśko	93a31fcc8b	Add benchmarks related to `add_row_number` performance investigation (#8091 ) - Follow-up of #8055 - Adds a benchmark comparing performance of Enso Map and Java HashMap in two scenarios - _only incremental_ updates (like `Vector.distinct`) and _replacing_ updates (like keeping a counter for each key). These benchmarks can be used as a metric for #8090	2023-10-18 17:21:59 +00:00
Radosław Waśko	e9fa12763e	Improve performance of `add_row_number` (#8076 ) Fixes #8055	2023-10-17 00:42:35 +00:00
Radosław Waśko	08b717eb54	Refactor Table problem handling to a more robust and hopefully cleaner approach (#7879 ) Closes #7514	2023-10-16 15:09:08 +00:00
GregoryTravis	f18d1323e1	Add Table.expand_to_rows to allow flattening vector and array values in table (#8042 ) # Important Notes Also includes a fix for a reallocation bug in `InferredBuilder`.	2023-10-13 20:54:06 +00:00
Radosław Waśko	cd84ac16ce	Restructure `Table.from_objects` to use conversions (#8020 ) Closes #7957	2023-10-11 22:25:18 +00:00
somebody1234	826127d8ff	Eliminate line feeds from `XML.outer_xml` on Windows (#8013 ) - Closes #7999 # Important Notes None	2023-10-10 23:21:34 +00:00
Radosław Waśko	6e0bd86753	Implement `Table.lookup_and_replace` for in-memory (#7979 ) - Closes #7749 implementing the in-memory logic. - Additional complications have surfaced regarding the Database logic, so it has been split off into a separate ticket: #7981	2023-10-10 10:42:06 +00:00
GregoryTravis	9ba7be20af	Basic XML support (#7947 ) This PR includes * Reading XML from a file, stream, or string * Reading XML via Data.fetch * Accessing the root element, element children, and attributes * Accessing tag text contents * Get tags by name * Inner / Outer XML string	2023-10-06 17:52:19 +00:00
Radosław Waśko	0cd446432f	Fix inconsistency when building a Mixed column, fixes to Union (#7919 ) - Fixes #7352 by remembering original value types in type inference mode to be able to reconstruct them for Mixed. - Added more benchmarks for comparing performance of constructing columns. - Fixes missing implementations that caused `Table.union` crashing on some type pairs. - Ensures that `Loss_Of_Integer_Precision` warning is not swallowed when numeric columns are unioned to create a `Float` column. - Adds test for all of the above cases. - Allow to output benchmark results to a CSV by setting an environment variable - useful for quickly comparing benchmarks, e.g. in Enso.	2023-10-03 20:33:34 +02:00
Radosław Waśko	08cd449a99	Fix `NumberParser` to avoid `thousandSeparator==decimalPoint` and prefer US decimal format (#7946 ) Closes #7930	2023-10-03 20:07:54 +02:00
Radosław Waśko	8d926166ea	Follow up improvements to `Date_Time_Formatter` (#7875 ) - Closes #7872 - Also closes #7866	2023-09-28 09:38:00 +00:00
Radosław Waśko	c690559ec4	Implement `auto_value_type` operation (#7908 ) Closes #6113	2023-09-27 15:45:34 +00:00
Radosław Waśko	12c4f2981d	More robust Date/Time format patterns parsing (#7826 ) - Closes #7461 by introducing a `Date_Time_Formatter` type and making parsing date time formats more robust and safer. - The default ('simple') set of patterns is slightly simplified and made case insensitive (except for `M/m` and `H/h`) to avoid the `YYYY` vs `yyyy` issues and make it less error prone. - The `YYYY` now has the same meaning as `yyyy` in simple mode. The old meaning (week-based year) is moved to a _separate mode_, triggered by `Date_Time_Formatter.from_iso_week_date_pattern`. - Full Java syntax, as well as custom-built Java `DateTimeFormatter` can also be used by `Date_Time_Formatter.from_java`. - Text-based constants (e.g. `ISO_ZONED_DATE_TIME`) have now become methods on `Date_Time_Formatter`, e.g. `Date_Time_Formatter.iso_zoned_date_time`.	2023-09-22 10:12:18 +00:00
Jaroslav Tulach	ad34a701e4	Upgrading to Frgaal compiler 20.0.1 (#7860 )	2023-09-22 09:58:19 +02:00
James Dunkerley	74d1d0861c	S3 Read Access, Input Stream based reading (#7776 ) - Added a `FileSystemSPI` allowing protocol resolution to a target type. - Separated `Input_Stream` and `Output_Stream` from `File` to allow use in other spaces. - `File_Format` types `read_web` changed to be `read_stream` working with `InputStream`. - Added directory listing to `Auto_Detect` allowing for `Data.read` to list a folder. - Adjusted HTTP to return an `InputStream` not a `byte[]`: - `Response_Body` adjusted to wrap an `InputStream`. - Added ability to materialize to either an in-memory vector (<4KB) or a temporary file. - `Data.fetch` will materialize if not a recognized mime-type. - Added `HTTP_Error` to handle IO exceptions from the stream. - `Excel_Format` now supports mime-type and reading a stream. - `Excel_Workbook` can now get a `Excel_Section` using `read_section`. - Added S3 APIs: - `parse_uri`: splits an S3 URI into bucket and key. - `list_objects`: list the items in a S3 bucket with specified prefix. - `read_bucket`: list prefixes and keys with a delimiter in a S3 bucket with specified prefix. - `head`: either head_bucket (tests existence) or head_object API (reads object meta data). - `get_object`: gets an object from S3 returning as a `Response_Body`. - Added `S3_File` type acting like a `File`: - No support for writing in this PR. - ToDo: recursive listing, glob filtering, exists, size. - Fixed a few invalid type signature lines. - Moved `create` methods for `Postgres_Connection` and `SQLite_Connection` into type instead of module. - Renamed `Column_Fetcher.Builder` to `Column_Fetcher_Builder`. - Fixed bug with `select_into` in Dry Run mode creating permanent tables. ToDo: Unit tests.	2023-09-20 15:09:11 +00:00
Hubert Plociniczak	1ee3d8f4f0	Rename Decimal to Float (#7807 ) Implements #6889.	2023-09-14 15:01:30 +00:00
Radosław Waśko	8b6e70b155	Support for BigInteger values in Table (#7715 ) - Fixes #7354 - And also closes #7712 - Refactors how we handle numeric ops - ensuring that the 'kernels' are placed all in one place and selected based on storage types.	2023-09-12 13:18:04 +00:00
Radosław Waśko	255b424b72	Add `value_type` to `Column.from_vector` and `expected_value_type` to `Column.map` and `Column.zip` (#7637 ) - Closes #6111 - Aligns semantics of handling Mixed columns. - Now, if an operation like `iif` or `fill_nothing` is given a `Mixed` column, the result will also be `Mixed` regardless of the `inferred_precise_value_type`. - Enables a few old tests that were pending but could be enabled since the types work is advanced enough.	2023-08-31 13:20:49 +00:00
Radosław Waśko	2385f5b357	Add size-limited strings and varying bit-width integer Value_Types to in-memory backend and check for ArithmeticOverflow in LongStorage (#7557 ) - Closes #5159 - Now data downloaded from the database can keep the type much closer to the original type (like string length limits or smaller integer types). - Cast also exposes these types. - The integers are still all stored as 64-bit Java `long`s, we just check their bounds. Changing underlying storage for memory efficiency may come in the future: #6109 - Fixes #7565 - Fixes #7529 by checking for arithmetic overflow in in-memory integer arithmetic operations that could overflow. Adds a documentation note saying that the behaviour for Database backends is unspecified and depends on particular database.	2023-08-22 18:10:46 +00:00
GregoryTravis	c9d7c5cb2b	Convert in-memory Column.round to Java (#7521 )	2023-08-16 14:45:23 +00:00
Jaroslav Tulach	7a272ec152	Encapsulating array-like data and operations into a single package (#7544 )	2023-08-15 13:00:47 +02:00
Radosław Waśko	b656b336c7	Report `Loss_Of_Integer_Precision` when an integer is not exactly representable as a float during conversion (#7509 ) Closes #7353 I introduce a new type `WithAggregatedProblems`, because `WithProblems` was too simple - it only allowed to hold a `List<Problem>` but `AggregatedProblems` is more than that. Ideally we shouldn't multiply entities like this too much. We should probably unify all to use `WithAggregatedProblems` - but after starting this, I realised it will likely just take too much effort to do for this little PR. So instead, I created a follow-up task for this: #7514	2023-08-08 12:30:44 +00:00
Pavel Marek	8e49255d92	Invoke all Enso benchmarks via JMH (#7101 ) # Important Notes #### The Plot - there used to be two kinds of benchmarks: in Java and in Enso - those in Java got quite a good treatment - there even are results updated daily: https://enso-org.github.io/engine-benchmark-results/ - the benchmarks written in Enso used to be 2nd class citizen #### The Revelation This PR has the potential to fix it all! - It designs new [Bench API](`88fd6fb988`) ready for non-batch execution - It allows for _single benchmark in a dedicated JVM_ execution - It provides a simple way to wrap such an Enso benchmark as a Java benchmark - thus the results of Enso and Java benchmarks are [now unified](https://github.com/enso-org/enso/pull/7101#discussion_r1257504440) Long live _single benchmarking infrastructure for Java and Enso_!	2023-08-07 12:39:01 +00:00
GregoryTravis	758b3b31b9	Avoid indexing the table twice for Cross Tab (#7417 ) Rewrites MultiValueIndex.makeCrossTabTable to build only a single index.	2023-08-04 21:14:18 +00:00
Radosław Waśko	bc9cde6543	Fix column naming edge cases - invalid and duplicated columns, case-insensitive name aliasing for case-insensitive backends (#7495 ) - Fixes #7412 - Also adds tests and fixes some more edge cases: - Ensures correct handling of existing Database tables whose column names may be invalid from Enso perspective, or clashing from Enso perspective (e.g. for most DBs `ś` and `s\u0301` are different names, but for Enso they are basically the same so this would cause issues) - thus Enso now renames such columns when accessed (still using the correct column reference in the generated SQL under the hood).	2023-08-04 09:04:38 +00:00
GregoryTravis	037a687401	Expose Unicode normalization methods on Texts (#7425 ) Exposes Text_Utils.normalize().	2023-08-03 18:07:00 +00:00
Radosław Waśko	c61c741476	Respect database backend naming limitations when generating table/column names and validate user-provided names to avoid silent name clashes; process JDBC warnings reported from backends (#7428 ) - Closes #5951 - Ensures any SQL warnings reported by the database through the JDBC driver are processed and forwarded to the user. - These warnings show issues like the implicit name truncation that this PR is also solving. It's good to make sure they are visible as they can help avoid and understand unexpected problems. They should not show up in most standard workflows. - Adds simple history to our REPL.	2023-08-03 09:44:27 +00:00
James Dunkerley	7345f0fd9a	Speed up statistics (#7390 ) - Allow `parse_to_columns` to take a `Regex` object. - Add `pattern` to the `Regex` object. - Add `column_names` to the `Row` object. - Improve statistics performance. - Add benchmarks for stats. \| Benchmark \| Reference \| New \| Improvement \| \| --- \| --- \| --- \| --- \| \| Max (by reduce) \| 16.4ms \| 16.3ms \| - \| \| Max (stats) \| 703ms \| 224ms \| 68% \| \| Sum (by reduce) \| 38ms \| 38ms \| - \| \| Sum (stats) \| 753ms \| 420ms \| 44% \| \| Variance (stats) \| 745ms \| 553ms \| 26% \| Also tried using a Ref approach for stats but it was slower (`7e13c45224`).	2023-07-26 10:01:18 +00:00
Radosław Waśko	4b5a2e2176	Fixing operations on Mixed types (#7368 ) - Fixes #7231 - Cleans up vectorized operations to distinguish unary and binary operations. - Introduces MixedStorage which may pretend to be a more specialized storage on demand. - Ensures that operations request a more specialized storage on right-hand side to ensure compatibility with reported inferred storage type. - Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso - Tests for comparison of Mixed storages with each other and other types - Started using `Set` for `Filter_Condition.Is_In` for better performance. - ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~ - This does not forward warnings. IMO we should not be losing them. We can switch and add a ticket to fix the warnings, but that would be a regression (current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created a ticket to get this figured out #7371 - ~~Trying to avoid conversions when calling Enso functions from Java.~~ - Needs extra care as dataflow errors may not be handled right then. So only works for simple functions that should not error. - Not sure how much it really helps. [Benchmarks](https://github.com/enso-org/enso/pull/7270#issuecomment-1635618393) suggested it could improve the performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results. - Created #7378 to track this.	2023-07-25 23:25:17 +00:00
Radosław Waśko	56635c9a88	Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso (#7270 ) The added benchmark is a basis for a performance investigation. We compare the performance of the same operation run in Java vs Enso to see what is the overhead and try to get the Enso operations closer to the pure-Java performance.	2023-07-21 17:25:02 +00:00
Radosław Waśko	620cc361ce	Add `date_diff`, `date_add` and `date_part` to scalar Enso date-time values. (#7273 ) Followup of #7221, adding `date_diff`, `date_add` and `date_part` to scalar Enso date-time values.	2023-07-13 15:17:21 +00:00
Radosław Waśko	ca68dd94da	Adding new Date/Time operations (`-`, `date_add`, `date_diff`, `date_part`) (#7221 ) - Adds `Column.date_diff` for computing date/time difference as an integer multiple of some unit. - Adds `Column.date_add` for shifting date/time by a unit. - Adds `Column.date_part` for extracting various parts of the date/time value as integer. - Adds widgets for the 3 methods above whose content depends on the column value type. - Adds shorthands: `Column.hour`, `Column.minute` and `Column.second` to extract these date parts. - Extends `Time_Period` with support for milli-, micro- and nano- seconds; and adapts functions taking `Time_Period` to support these wherever possible.	2023-07-13 12:56:54 +00:00
James Dunkerley	0adab6c68c	Round on a column was always adding a warning (#7246 ) - Only warn if outside allowed range. - Added `is_infinite` to In-Memory column. - Allow integer value type for `is_nan` and `is_infinite`.	2023-07-10 17:35:23 +00:00
James Dunkerley	1fb60df61b	Fixes from the live demo. (#7243 ) - Removed defaults from `cross_tab`. It caused an out-of-heap space error when it attempted to build a 205k x 205k table. Now has a hard limit of 10,000 columns - we can increase this once we have more concrete test data. ![image](https://github.com/enso-org/enso/assets/4699705/bc38d41c-56dc-41bd-8a7c-fa89ecfa7f79) - Adjusted the dropdowns on `Aggregate_Column` for `columns` and `order_by` to be dropdowns as nested Vector editors are not supported. ![image](https://github.com/enso-org/enso/assets/4699705/f4a7c7cc-6a21-462c-a39e-65fbab82c367) - Altered `Aggregate_Column` so `new_name` now `new_name:Text=""` and not taking `Nothing` anymore. Makes it appear correctly in IDE. ![image](https://github.com/enso-org/enso/assets/4699705/196a49ba-4274-44bb-b876-0372c8f62746) - Added dropdowns for `fill_empty`, `fill_nothing` and `replace` on `Table`. ![image](https://github.com/enso-org/enso/assets/4699705/9ee5cec2-82d5-4452-b650-67015ac9fee5) - Added `replace` to Database table throwing `Unsupport_Database_Operation`.	2023-07-09 18:03:05 +00:00
Radosław Waśko	78545b4402	Add safepoints to standard libraries Java polyglot helpers (#7183 ) Closes #7129	2023-07-05 14:12:13 +00:00

226 Commits