enso-org/enso (Gitea mirror)

mirror of https://github.com/enso-org/enso.git synced 2024-12-02 02:14:12 +03:00

Author	SHA1	Message	Date
Radosław Waśko	bc9cde6543	Fix column naming edge cases: invalid and duplicated columns, case-insensitive name aliasing for case-insensitive backends (#7495) - Fixes #7412 - Also adds tests and fixes some more edge cases: - Ensures correct handling of existing Database tables whose column names may be invalid or clashing from Enso's perspective (e.g. for most DBs `ś` and `s\u0301` are different names, but for Enso they are essentially the same, which would cause issues). Enso now renames such columns when they are accessed, while still using the correct column reference in the generated SQL under the hood.	2023-08-04 09:04:38 +00:00
GregoryTravis	037a687401	Expose Unicode normalization methods on Texts (#7425) Exposes `Text_Utils.normalize()`.	2023-08-03 18:07:00 +00:00
Radosław Waśko	c61c741476	Respect database backend naming limitations when generating table/column names and validate user-provided names to avoid silent name clashes; process JDBC warnings reported from backends (#7428) - Closes #5951 - Ensures any SQL warnings reported by the database through the JDBC driver are processed and forwarded to the user. - These warnings show issues like the implicit name truncation that this PR also solves. Making them visible helps users avoid and understand unexpected problems; they should not show up in most standard workflows. - Adds simple history to our REPL.	2023-08-03 09:44:27 +00:00
James Dunkerley	7345f0fd9a	Speed up statistics (#7390) - Allow `parse_to_columns` to take a `Regex` object. - Add `pattern` to the `Regex` object. - Add `column_names` to the `Row` object. - Improve statistics performance. - Add benchmarks for stats. Benchmarks (reference / new / improvement): Max (by reduce) 16.4ms / 16.3ms / -; Max (stats) 703ms / 224ms / 68%; Sum (by reduce) 38ms / 38ms / -; Sum (stats) 753ms / 420ms / 44%; Variance (stats) 745ms / 553ms / 26%. Also tried a `Ref`-based approach for stats, but it was slower (`7e13c45224`).	2023-07-26 10:01:18 +00:00
Radosław Waśko	4b5a2e2176	Fixing operations on Mixed types (#7368) - Fixes #7231 - Cleans up vectorized operations to distinguish unary and binary operations. - Introduces MixedStorage which may pretend to be a more specialized storage on demand. - Ensures that operations request a more specialized storage on the right-hand side, for compatibility with the reported inferred storage type. - Ensures that a dataflow error returned by an Enso callback in Java is propagated as a polyglot exception and can be caught back in Enso. - Tests for comparison of Mixed storages with each other and with other types. - Started using `Set` for `Filter_Condition.Is_In` for better performance. - ~~Migrated `Column.map` and `Column.zip` to use the Java-to-Enso callbacks.~~ - This does not forward warnings. IMO we should not be losing them. We could switch and add a ticket to fix the warnings, but that would be a regression (the current implementation handles them correctly). Instead, we should first gain some ability to work with warnings in polyglot. I created #7371 to get this figured out. - ~~Trying to avoid conversions when calling Enso functions from Java.~~ - Needs extra care, as dataflow errors may not be handled at that point, so it only works for simple functions that should not error. - Not sure how much it really helps. [Benchmarks](https://github.com/enso-org/enso/pull/7270#issuecomment-1635618393) suggested it could improve performance quite significantly, but the practical solution is not exactly the same as the one measured, so we may have to measure and tune it to get the best results. - Created #7378 to track this.	2023-07-25 23:25:17 +00:00
Radosław Waśko	56635c9a88	Add benchmarks comparing performance of Table operations 'vectorized' in Java vs performed in Enso (#7270) The added benchmark is a basis for a performance investigation. We compare the performance of the same operation run in Java vs Enso to see what the overhead is and try to get the Enso operations closer to the pure-Java performance.	2023-07-21 17:25:02 +00:00
Radosław Waśko	620cc361ce	Add `date_diff`, `date_add` and `date_part` to scalar Enso date-time values. (#7273) Follow-up of #7221, adding `date_diff`, `date_add` and `date_part` to scalar Enso date-time values.	2023-07-13 15:17:21 +00:00
Radosław Waśko	ca68dd94da	Adding new Date/Time operations (`-`, `date_add`, `date_diff`, `date_part`) (#7221) - Adds `Column.date_diff` for computing the date/time difference as an integer multiple of a unit. - Adds `Column.date_add` for shifting date/time by a unit. - Adds `Column.date_part` for extracting various parts of the date/time value as an integer. - Adds widgets for the 3 methods above whose content depends on the column value type. - Adds shorthands: `Column.hour`, `Column.minute` and `Column.second` to extract these date parts. - Extends `Time_Period` with support for milli-, micro- and nanoseconds, and adapts functions taking `Time_Period` to support these wherever possible.	2023-07-13 12:56:54 +00:00
James Dunkerley	0adab6c68c	Round on a column was always adding a warning (#7246) - Only warn if outside allowed range. - Added `is_infinite` to In-Memory column. - Allow integer value type for `is_nan` and `is_infinite`.	2023-07-10 17:35:23 +00:00
James Dunkerley	1fb60df61b	Fixes from the live demo. (#7243) - Removed defaults from `cross_tab`: they caused an out-of-heap-space error when it attempted to build a 205k x 205k table. It now has a hard limit of 10,000 columns; we can increase this once we have more concrete test data. (screenshot) - Changed the `columns` and `order_by` widgets on `Aggregate_Column` to dropdowns, as nested Vector editors are not supported. (screenshot) - Altered `Aggregate_Column` so that `new_name` is now `new_name:Text=""` and no longer takes `Nothing`. This makes it appear correctly in the IDE. (screenshot) - Added dropdowns for `fill_empty`, `fill_nothing` and `replace` on `Table`. (screenshot) - Added `replace` to Database table, throwing `Unsupported_Database_Operation`.	2023-07-09 18:03:05 +00:00
Radosław Waśko	78545b4402	Add safepoints to standard libraries Java polyglot helpers (#7183) Closes #7129	2023-07-05 14:12:13 +00:00
Radosław Waśko	2d73277238	Fix a bug that somehow slipped past CI (#7204)	2023-07-05 08:54:27 +00:00
GregoryTravis	550d146493	Add round, ceil, floor, truncate to the In-Database Column type (#6988)	2023-06-30 16:47:40 +00:00
Radosław Waśko	2bac9cc844	Execution Context integration for Database write operations (#7072) Closes #6887	2023-06-27 15:51:21 +00:00
James Dunkerley	937651f696	Code Clean Up, Fix Weird Namespace, S3 List Objects and Read Object (#7114) Mostly a tidy-up as part of looking over the function catalogue for groups. Sorted out some whitespace issues.	2023-06-24 23:18:58 +00:00
James Dunkerley	1859ccbab5	Improving widgets and other minor tweaks. (#7052) - Removed `module` argument from `enso_project` (new `Project_Description.new` API). - Removed the custom option from date and time parse/format dropdowns. - The `format` dropdown uses the value to create the dropdown. - Removed `StorageType` coalescing rules and replaced them with simpler logic in `ObjectStorage`. - Updated signature for `add_row_number` and added aliases.	2023-06-19 19:03:36 +00:00
James Dunkerley	760fb71798	First part of AWS S3 API, various small fixes. (#6973) - Add type detection for `Mixed` columns when calling column functions. - Excel uses the column name for missing headers. - Add aliases for parse functions on text. - Adjust `Date`, `Time_Of_Day` and `Date_Time` parse functions to no longer take `Nothing` and to provide dropdowns. - Removed built-in parses. - All support Locale. - Add support for missing day or year when parsing a Date. - All will trim values automatically. - Added ability to list AWS profiles. - Added ability to list S3 buckets. - Workaround for `Table.aggregate` so that adding the default item works.	2023-06-15 16:20:13 +00:00
Dmitry Bushev	6249c79ffd	Update sbt-java-formatter plugin (#7011) Update Java formatter plugin. The new version can remove unused imports.	2023-06-12 14:18:48 +00:00
James Dunkerley	578ba59f1d	Use US Locale for Date and Time parsing and formatting (#6967) Sorts out parsing and printing of long-form month and weekday names.	2023-06-06 21:44:25 +00:00
Radosław Waśko	1931e9e51f	Workaround for `to_date_time` type errors (#6964) Related to #6912. It essentially solves it by removing any builtins that would take an EnsoDate/EnsoTimeOfDay/EnsoTimeZone and replacing them with Java utils that do the same operation. This is not a proper solution (the builtin conversion is still invalid for the date/time types), but for now we simply no longer use the invalid conversion, so it is much less of an issue. We still need to be aware of this if we want to introduce builtins taking date/time in the future.	2023-06-06 20:28:11 +00:00
GregoryTravis	912fbce97b	Reimplement Column.truncate, .ceil, and .floor as vectorized Java ops (#6941) Reimplement these in Java. Benchmarks (averages): Before: Column.truncate floats 124.4ms; Column.ceil floats 121.47ms; Column.floor floats 120.18ms; Column.truncate ints 124.78ms; Column.ceil ints 120.41ms; Column.floor ints 102.35ms. After (boxed): Column.truncate floats 3.75ms; Column.ceil floats 2.25ms; Column.floor floats 1.89ms; Column.truncate ints 2ms; Column.ceil ints 1.77ms; Column.floor ints 1.74ms. After (unboxed): Column.truncate floats 3.32ms; Column.ceil floats 2.15ms; Column.floor floats 1.69ms; Column.truncate ints 1.74ms; Column.ceil ints 1.61ms; Column.floor ints 1.99ms.	2023-06-06 18:07:12 +00:00
Radosław Waśko	d44b1250b7	Implement `Table.add_row_number` (#6890) Closes #5227. Important Notes: - This lays the first steps towards #6292: we get pure Enso variants of MultiValueKey. - Another part refactors `LongStorage` into `AbstractLongStorage`, allowing it to provide alternative implementations of the underlying storage; in our case `LongRangeStorage`, which generates the values ad hoc, and `LongConstantStorage`, which is currently unused but can be adapted in the future to support constant columns (once we implement similar facilities for other types).	2023-06-02 10:13:13 +00:00
GregoryTravis	0337180384	Add rounding functions to the Column type (#6817)	2023-06-01 20:06:23 +00:00
Radosław Waśko	c3e771c75c	Allow casting a Mixed column into a concrete type (#6777) Follow-up of #6711. Closes #6838	2023-05-26 13:25:53 +00:00
Radosław Waśko	447786a304	Implement `cast` for Table and Column (#6711) Closes #6112	2023-05-19 10:00:20 +00:00
Radosław Waśko	cd7fb73232	Add `Date_Range` (#6621) Closes #6543	2023-05-11 16:03:02 +00:00
GregoryTravis	4ba8409def	Add format to the in-memory Column (#6538) Important Notes: Also updates `.format` in date types. Some rearrangement of date formatting builtins / Java libraries.	2023-05-09 08:47:40 +00:00
James Dunkerley	bc0db18a6e	Small changes from Book Club issues (#6533) - Add dropdown to tokenize and split `column`. - Remove the custom `Join_Kind` dropdown. - Adjust split and tokenize names to start numbering from 1, not 0. - Add JS_Object serialization for Period. - Add `days_until` and `until` to `Date`. - Add `Date_Period.Day` and create `next` and `previous` on `Date`. - Use simple names with `File_Format` dropdown. - Avoid using `Main.enso` based imports in `Standard.Base.Data.Map` and `Standard.Base.Data.Text.Helpers`. - Remove an incorrect import from `Standard.Database.Data.Table`. From #6587: a few small changes that touch many lines because they affected lots of tests: - `Table.join` now defaults to `Join_Kind.Left_Outer`, to avoid losing rows in the left table unexpectedly. If the user really wants to have an Inner join, they can switch to it. - `Table.join` now defaults to joining columns by name, not by index: it looks in the right table for a column with the same name as the first column in the left table. - Missing Input Column errors now specify which table they refer to in the join. - The unique name suffix in column renaming / default column names when loading from file is now a space instead of an underscore.	2023-05-06 10:10:24 +00:00
Radosław Waśko	41a8257e8d	Separating Redshift connector from `Database` library into a new `AWS` library (#6550) Related to #5777	2023-05-04 17:36:51 +00:00
James Dunkerley	6b0c682b08	Add Execution Context control to Text.write (#6459) - Adjusted `Context.is_enabled` to support a default argument (moved the builtin so it can have defaults). - Made `environment` case-insensitive. - Bug fix for the play button. - Shorthand to execute within an enabled context. - Forbid file writing if the Output context is disabled, with a `Forbidden_Operation` error. - Add temporary file support via `File.create_temporary_file`; the file is deleted when the JVM exits. - Execution Context first pass in `Text.write`. - Added dry run warning. - Writes to a temporary file if disabled. - Created a `DryRunFileManager` which will create and manage the temporary files. - Added `format` dropdown to `File.read` and `Data.read`. - Renamed `JSON_File` to `JSON_Format` to be consistent. (Still to be unit tested.)	2023-04-29 08:39:18 +00:00
James Dunkerley	0c7c3bdeaf	Fix for the massive number of warnings when renaming with invalid names. (#6450) * Rename makeUnique overloads to avoid an issue when `Nothing` is passed. Suspend warnings when building the output table to avoid mass warning duplication. * Add test for mixed invalid names. Adjust so that a single warning is attached. * PR comments.	2023-04-27 14:51:59 +01:00
Radosław Waśko	a43d524336	Add typechecks to Aggregate and Cross Tab (#6380) Follow-up of #6298, split out because that PR grew too large. Adds the needed typechecks to aggregate operations. Ensures that the DB operations report the `Floating_Point_Equality` warning consistently with the in-memory backend.	2023-04-24 08:55:54 +00:00
Radosław Waśko	8db2ad51a1	Adding typechecks to Column Operations (#6298) Closes #6106	2023-04-21 12:20:12 +00:00
James Dunkerley	0350762386	Add `replace`, `trim` to Column. Better number parsing. (#6253) - Add `replace` with same syntax as on `Text` to an in-memory `Column`. - Add `trim` with same syntax as on `Text` to an in-memory `Column`. - Add `trim` to in-database `Column`. - Added `is_supported` to dialects and exposed the dialect consistently on the `Connection`. - Add `write_table` support to `JSON_File` allowing `Table.write` to write JSON. - Updated the parsing for integers and decimals: - Support for currency symbols. - Support for brackets for negative numbers. - Automatic detection of decimal points and thousand separators. - Tighter rules for scientific and thousand separated numbers. - Remove `replace_text` from `Table`. - Remove `write_json` from `Table`.	2023-04-20 16:04:59 +00:00
Radosław Waśko	f5db35af07	Adjust `{Table|Column}.parse` to use `Value_Type` (#6213) Closes #5660	2023-04-06 10:58:55 +00:00
Jaroslav Tulach	4805193428	Text.to_display_text is (shortened) identity (#6174) Fixes #5971.	2023-04-05 19:53:07 +00:00
GregoryTravis	d9bc5246ba	Remove old (Java) Regex library and replace with new (Truffle) library. (#6195)	2023-04-04 19:58:26 +00:00
GregoryTravis	fb77f42fd5	Update `Text.split` to take a `Vector Text` parameter (#6156) Allows you to pass a vector of delimiters to `split`.	2023-04-04 14:44:47 +00:00
James Dunkerley	f26bcf6ab6	Small issues from working with Ned (#6160) - `Process.run` now returns a `Process_Result` allowing the easy capture of stdout and stderr. - Joining a column with a column name does not warn if adding just the prefix. - Stop the table viz from changing case and adding spaces to the headers.	2023-04-03 13:01:42 +00:00
Radosław Waśko	6ddcb553e5	Date/time support for Postgres. Year/month/day operations on Columns. (#6153) Closes #6115	2023-03-31 18:37:04 +00:00
Radosław Waśko	6f86115498	Proper implementation of Value Types in Table (#6073) This is the first part of the #5158 umbrella task. It closes #5158; follow-up tasks are listed as a comment in the issue. - Updates all prototype methods dealing with `Value_Type` with a proper implementation. - Adds a more precise mapping from in-memory storage to `Value_Type`. - Adds a dialect-dependent mapping between `SQL_Type` and `Value_Type`. - Removes obsolete methods and constants on `SQL_Type` that were not portable. - Ensures that in the Database backend, operation results are computed based on what the Database intends to return (by asking the Database about the expected types of each operation). - But also ensures that the result types are sane. - While SQLite does not officially support a BOOLEAN affinity, we add a set of type overrides to our operations to ensure that Boolean operations will return Boolean values and will not be changed to integers as SQLite would suggest. - Some methods in SQLite fall back to a NUMERIC affinity unnecessarily, so operations like `max(text, text)` will keep the `text` type instead of falling back to numeric as SQLite would suggest. - Adds the ability to use custom fetch / builder logic for various types, so that we can support vendor-specific types (for example, Postgres dates). Important Notes: - There are some TODOs left in the code. I'm still aligning follow-up tasks; once done I will try to add references to relevant tasks in them.	2023-03-31 16:16:18 +00:00
GregoryTravis	6b9cbeacb2	Implement Regular Expression replace and update `Text.replace` to the new API (#5959) Re-implement replace on top of Truffle regex.	2023-03-28 06:13:12 +00:00
James Dunkerley	bf2545fa04	Use new common parse method throwing fewer exceptions. (#6075) Avoiding exceptions by not using `parseBest`. Time is now 1.15s for 500k rows in the CLI vs 1.65s in the GUI. CLI: (screenshot) GUI: (screenshot) Added it as a function in the shared library so that it is used by both the engine and polyglot code.	2023-03-27 11:02:10 +00:00
James Dunkerley	58f2c7643f	Use new Enso Hash Codes and Comparable (#6060) Enables `distinct`, `aggregate` and `cross_tab` to use the Enso hashing and equality operations. Also, I rewired the way the ObjectComparators are obtained in polyglot code to be more consistent. Add Comparator for `Day_Of_Week`, `Header`, `SQL_Type`, `Image` and `Matrix`. Also, removed the custom `==` from these types as needed. (Closes #5626)	2023-03-24 15:02:25 +00:00
Radosław Waśko	952beba8d1	Fix `cross_tab` column naming edge cases, add `fill_empty` (#5863) Closes #5151 and adds some additional tests for `cross_tab` that verify duplicated and invalid names. I decided that for empty or `Nothing` names, instead of replacing them with `Column` and implicitly losing the connection with the value that was in the column, we should just raise an error on such values. To make handling these easier, `fill_empty` was added, allowing users to easily replace the empty values with something else. Also, `{is,fill}_missing` was renamed to `{is,fill}_nothing` to align with `Filter_Condition.Is_Nothing`.	2023-03-11 11:58:54 +00:00
Radosław Waśko	263c3ad651	Add a `common-polyglot-core-utils` project (#5855) Adds a common project that allows sharing code between the `runtime` and `std-bits`. Due to classpath separation and the way it is compiled, the classes will be duplicated: we will have one copy for the `runtime` classpath and another copy as a small JAR for the `Standard.Base` library. This is still much better than duplicating the code itself; now at least we have a single source of truth for the shared implementations. Because of the copying we should not expand this project too much, but I encourage putting here any methods that would otherwise require us to copy the code itself. This may be a good place to put parts of the hashing logic, to then allow sharing the logic between the `runtime` and the `MultiValueKey` in the `Table` library (cc: @Akirathan).	2023-03-11 09:27:26 +00:00
Radosław Waśko	91ef8acf35	Review generated Column names (#5850) Closes #5583 and closes #5157	2023-03-10 19:07:58 +00:00
Radosław Waśko	62e57f5557	Test some Mismatched Quote edge cases in Delimited reader (#5810) Follow-up to #5113: I added some more edge case tests, as we discussed with @jdunkerley. When debugging some quoting issues, I also realised the current `Mismatched_Quote` error did not provide enough information, so I amended it to at least include some context indicating which cell was the 'offending' one.	2023-03-10 15:47:57 +00:00
James Dunkerley	299bfd6b7d	Fixes from the Demo on 2nd March (#5823) - Fix issue with Geo Map viz. - Handle invalid format strings better in `Data_Formatter`. - New constants for the ISO format strings (and a special ENSO_ZONED_DATE_TIME) - Consistent Date Time format for parsing in all places. - Avoid throwing exception in datetime parsing. - Support for milliseconds (well, nanoseconds) in Date_Time and Time_Of_Day. - `Column.map` stays within Enso. - Allow `Aggregate_Column.Group_By` in `cross_tab` group_by parameter.	2023-03-07 20:58:00 +00:00
Pavel Marek	b6e2319fcc	Comparators support partial ordering (#5778)	2023-03-07 04:16:38 +00:00
Radosław Waśko	2d29456ed1	Review File/Data read and read_text warnings (#5799) Closes #5113. Fixes a bug where read-only files would be overwritten if File.write was used in backup mode, and adds tests to avoid such a regression. To implement it, introduced an `is_writable` property on `File`.	2023-03-06 03:43:38 +00:00
James Dunkerley	01fc34c18a	Improving Expression Support for In Database (#5790) - Adjust Excel Workbook write behaviour. - Support Nothing / Null constants. - Deduce the type of arithmetic operations and `iif`. - Allow Date_Time constants, treating as local timezone. - Removed the `to_column_name` and `ensure_sane_name` code.	2023-03-03 12:03:05 +00:00
Radosław Waśko	793eafc866	Improve Table.parse_values API (#5692) Closes #5111	2023-02-24 13:35:01 +00:00
James Dunkerley	652b8d5db3	Update `rename_columns` to new API design, add `first_row`, `second_row` and `last_row` functions to the table. (#5719) - Updates the `rename_columns` API. - Add `first_row`, `second_row` and `last_row` to the Table types. - New option for reading only the last row of a ResultSet.	2023-02-23 19:42:45 +00:00
Radosław Waśko	4dcf802831	Ensure that warnings are preserved on Nothing values passing back to Enso through polyglot boundary (#5677) Fixes #5672. Important Notes: - Added a subproject `enso-test-java-helpers` which allows the in-Enso tests to add Java helpers for testing.	2023-02-17 13:38:26 +00:00
Radosław Waśko	3027c6f3a2	Ensure entries containing newlines are quoted when writing Delimited Files (#5652) Fixes #5638	2023-02-17 00:57:48 +00:00
James Dunkerley	1bc27501e6	Remove `Column` type from Aggregate_Column, simplify Column_Selector, some new `File_Format`s (#5646) - Updated `Widget.Vector_Editor` so it is ready for use by the IDE team. - Added `get` to `Row` to make the API more aligned. - Added `first_column`, `second_column` and `last_column` to `Table` APIs. - Adjusted `Column_Selector` and associated methods to have a simpler API. - Removed `Column` from `Aggregate_Column` constructors. - Added new `Excel_Workbook` type and added to `Excel_Section`. - Added new `SQLiteFormatSPI` and `SQLite_Format`. - Added new `ImageFormatSPI` and `Image_Format`.	2023-02-16 15:15:49 +00:00
Radosław Waśko	a02eab451e	Implement basic warnings for column arithmetic, review warnings on expressions and `filter` (#5605) Closes #5109. Important Notes: - Currently the tests pass for the in-memory parts of Common_Table_Operations, but some things still do not work on DB backends (in progress).	2023-02-14 09:33:04 +00:00
James Dunkerley	1c821e22cf	Some fixes from the Anagrams experiment. (#5592) - Fixes the display of Date, Time_Of_Day and Date_Time so it doesn't wrap. - Adjust serialization of large integer values for JS and display within table. - Workaround for issue with using `.lines` in the Table (new bug filed). - Disabled the warning when no `separator` is specified on `Concatenate`. Does not include a fix for aggregation on integer values outside the `long` range.	2023-02-08 22:17:00 +00:00
Radosław Waśko	4f90946d1e	Rework Invalid Aggregations (#5579) Closes #5108	2023-02-08 18:39:09 +00:00
Radosław Waśko	778d28fba3	Table with no columns is not valid, No_Output_Columns is always an error (#4073) Implements https://www.pivotaltracker.com/story/show/184226020	2023-01-25 02:40:23 +00:00
Radosław Waśko	d2e57edc8b	Add Table.cross_join and Table.zip to In-Memory Table (#4063) Implements https://www.pivotaltracker.com/story/show/184239059	2023-01-23 13:19:52 +00:00
Radosław Waśko	8853053020	Division in Columns within InDB is integer based if both columns are integers (#4057) Fixes https://www.pivotaltracker.com/story/show/184073099 Important Notes: - Since the only division operator on columns, `/`, now returns floats, it may be worth creating an additional `div` operator exposing integer division. That will be done as a separate task aligning column operator APIs.	2023-01-17 20:29:25 +00:00
Radosław Waśko	082e0bfd0d	Add `Table.union` to the In-Memory Table. (#4052) Implements https://www.pivotaltracker.com/story/show/183854144	2023-01-17 00:34:57 +00:00
Radosław Waśko	0088096a58	Implement Distinct for the Database backends (#4027) Implements https://www.pivotaltracker.com/story/show/182307281	2023-01-11 22:46:54 +00:00
Radosław Waśko	8c661fdb74	Database Joins (#4007) Implements https://www.pivotaltracker.com/story/show/184032869 Important Notes: - Currently we get failures in Full joins on Postgres which show a more serious problem: amending equality to ensure that `[NULL = NULL] == True` breaks hash/merge-based indexing, so such joins will be extremely inefficient. All our joins currently rely on this notion of equality, which means all of our DB joins will be extremely inefficient. - We need to find a solution that supports nulls and still works OK with indices. After exploring a few approaches (`COALESCE(a = b, a IS NULL AND b IS NULL)`, `a IS NOT DISTINCT FROM b`, `(a = b) OR (a IS NULL AND b IS NULL)`), all of which failed with `ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions`, I'm less certain that it is possible. Alternatively, we may need to change the NULL semantics to align with SQL. This seems likely to be the simpler solution, allowing us to generate simple, reliable SQL; the NULL=NULL solution would corner us into nasty workarounds that depend heavily on the particular backend.	2023-01-05 10:36:22 +00:00
Dmitry Bushev	1e5e2327ab	Improve performance of Text.compare_to (#4012) The PR adds a flag to the `Text` implementation tracking whether it is in FCD normal form. This information can then be used in the `Normalizer.compare` method. Benchmarks (old ms / with flag ms): Unicode very short 40.29 / 40.04; Unicode medium 9.07 / 1.99; Unicode big - random 115.39 / 0.35; Unicode big - early difference 107.02 / 0.54; Unicode big - late difference 749.81 / 94.73; ASCII very short 28.13 / 31.13; ASCII medium 4.58 / 2.26; ASCII big - random 42.68 / 0.26; ASCII big - early difference 30.91 / 0.32; ASCII big - late difference 66.29 / 42.72. Full benchmark output: [bench_old.txt](https://github.com/enso-org/enso/files/10325202/bench_old.txt) [bench_new.txt](https://github.com/enso-org/enso/files/10325201/bench_new.txt)	2023-01-02 17:09:03 +00:00
Jaroslav Tulach	7252af6d62	Enso.getMetaObject, Type.isMetaInstance and Meta.is_a consolidation (#3949) Implements `getMetaObject` and related messages from Truffle interop for Enso values and types. Turns `Meta.is_a` into builtin and re-uses the same functionality. Important Notes: Adds `ValueGenerator` testing infrastructure to provide unified access to special Enso values and builtin types that can be reused by other tests, not just `MetaIsATest` and `MetaObjectTest`.	2022-12-22 08:00:06 +00:00
James Dunkerley	579d3fc397	Adds Date, Time_Of_Day and Date_Time support to Excel IO (#3997) - Allow date time inputs from Excel. - Enables disabled test. - Fix for `Map.==`. - Allow nulls in crosstab name.	2022-12-20 16:12:00 +00:00
James Dunkerley	ace459ed53	Let JavaScript parse JSON and write JSON ... (#3987) Use JavaScript to parse and serialise to JSON. Parses to native Enso object. - `.to_json` now returns a `Text` of the JSON. - Json methods are now `parse`, `stringify` and `from_pairs`. - New `JSON_Object` representing a JavaScript Object. - `.to_js_object` allows types to customise their serialization, returning a `JS_Object`. - Default JSON format for Atom now has a `type` and `constructor` property (or a method to call as needed to deserialise). - Removed `.into` support for now. - Added JSON File Format and SPI to allow `Data.read` to work. - Added `Data.fetch` API for easy Web download. - Default visualization for JS Object truncates, and made Vector default truncate children too. Fixes a defect where types with no constructor crashed on `to_json` (e.g. `Matching_Mode.Last.to_json`). Adjusted default visualisation for Vector, so it doesn't serialise an array of arrays forever. Likewise, JS_Object default visualisation is truncated to a small subset. New convention: - `.get` returns `Nothing` if a key or index is not present. Takes an `other` argument allowing control of the default. - `.at` errors if the key or index is not present. - `Nothing` gains a `get` method allowing for easy propagation.	2022-12-20 10:33:46 +00:00
Radosław Waśko	b9bf958f2c	Efficient joining for Equals and Equals_Ignore_Case using a hashmap (#3978) - Implemented https://www.pivotaltracker.com/story/show/183913276 - Refactored MultiValueIndex and MultiValueKeys to be more type-safe and more direct about using ordered or unordered maps. - Added performance tests ensuring we use an efficient algorithm for the joins (the tests will fail for a full O(N*M) scan). - Removed some duplicate code in the Table library. - Added optional coloring of test results in terminal to make failures easier to spot.	2022-12-14 22:56:20 +00:00
James Dunkerley	77fe69dfd9	JSON Improvements, small Table stuff, Statistic in Enso not Java and a few other minor bits. (#3964) - Aligned `compare_to` so it returns `Type_Error` if `that` is the wrong type for `Text`, `Ordering` and `Duration`. - Add `empty_object`, `empty_array`, `get_or_else`, `at`, `field_names` and `length` to `Json`. - Fix `Json` serialisation of NaN and Infinity (to "null"). - Added `length`, `at` and `to_vector` to Pair (allowing it to be treated as a Vector). - Added `running_fold` to the `Vector` and `Range`. - Added `first` and `last` to the `Vector.Builder`. - Allow `order_by` to take a single `Sort_Column` or have a mix of `Text` and `Sort_Column.Name` in a `Vector`. - Allow `select_columns_helper` to take a `Text` value. Allows for a single field in group_by in cross_tab. - Added `Patch` and `Custom` to HTTP_Method. - Added running `Statistic` calculation and moved more of the logic from Java to Enso. Performance now seems similar to the pure Java version.	2022-12-14 19:40:27 +00:00
Radosław Waśko	8e880e430b	Improve basic join implementation (#3958) Implements https://www.pivotaltracker.com/story/show/183913232 Important Notes: Added counts of succeeded/failed tests within a group and a global summary, to make it easier to see how many tests failed.	2022-12-09 00:55:07 +00:00
James Dunkerley	11e07f8676	Use the MultiValueIndex for the JoinStrategy. (#3959 ) Use the MultiValueStrategy for pure equals Joins.	2022-12-08 12:24:53 +00:00
James Dunkerley	0ad70c6332	Tidy Standard.Base part 5 of n ... (hopefully the end...) (#3929 ) - Moved `Any`, `Error` and `Panic` to `Standard.Base`. - Separated `Json` and `Range` extensions into own modules. - Tidied `Case`, `Case_Sensitivity`, `Encoding`, `Matching`, `Regex_Matcher`, `Span`, `Text_Matcher`, `Text_Ordering` and `Text_Sub_Range` in `Standard.Base.Data.Text`. - Tidied `Standard.Base.Data.Text.Extensions` and stopped it re-exporting anything. - Tidied `Regex_Mode`. Renamed `Option` to `Regex_Option` and added type to export. - Tidied up `Regex` space. - Tidied up `Meta` space. - Remove `Matching` from export. - Moved `Standard.Base.Data.Boolean` to `Standard.Base.Boolean`. # Important Notes - Moved `to_json` and `to_default_visualization_data` from base types to extension methods.	2022-12-02 18:08:14 +00:00
James Dunkerley	4518f8303d	Implementing transpose and cross_tab for the InMemory table. (#3919 ) - Adds transpose and cross_tab to the In-Memory table. - Cross Tab is built on top of aggregate and hence allows for expressions and has the same error trapping as `aggregate`. # Important Notes Only basic tests have been implemented. Error and warning tests will be added as a follow-up task.	2022-11-30 01:19:25 +00:00
Radosław Waśko	85cbf7d9f9	Initial (naive) implementation for in memory join (#3918 ) Implements https://www.pivotaltracker.com/story/show/183854123 It features a naive full scan join and only allows equality conditions. More advanced conditions and better optimized algorithms will be implemented in a subsequent PR.	2022-11-29 19:37:31 +00:00
Jaroslav Tulach	35c9ef7680	Enhanced Vector Builder (#3809 ) Manual implementation of a vector builder that avoids any copying (if the initial `capacity` is exact). Moreover, the builder optimizes for storage of `double` and `long` values - if the array homogeneously consists of these values, then no boxing happens and only primitive types are stored. # Important Notes Added a few tests to [Vector_Spec.enso](`76d2f38247`).	2022-11-29 04:41:06 +00:00
Jaroslav Tulach	ecd1fdc3f8	Caching the grapheme_length of a Text (#3864 ) Computing the length of a text takes time. Let's cache it after the first computation. # Important Notes Wrote `StringBenchmarks` that sums lengths of (the same) `Text` present many times in a `Vector`. Initially it took `383.673 ms` per operation. Then it took `0.031 ms/op`. Looks like the `length` calls are returning instantly as they get cached.	2022-11-14 15:53:10 +00:00
James Dunkerley	45276b243d	Expanding Derived Columns and Expression Syntax (#3782 ) - Added expression ANTLR4 grammar and sbt based build. - Added expression support to `set` and `filter` on the Database and InMemory `Table`. - Added expression support to `aggregate` on the Database and InMemory `Table`. - Removed old aggregate functions (`sum`, `max`, `min` and `mean`) from `Column` types. - Adjusted database `Column` `+` operator to do concatenation (`||`) for text types. - Added power operator `^` to both `Column` types. - Adjust `iif` to allow for columns to be passed for `when_true` and `when_false` parameters. - Added `is_present` to database `Column` type. - Added `coalesce`, `min` and `max` functions to both `Column` types performing row based operation. - Added support for `Date`, `Time_Of_Day` and `Date_Time` constants in database. - Added `read` method to InMemory `Column` returning `self` (or a slice). # Important Notes - Moved approximate type computation to `SQL_Type`. - Fixed issue in `LongNumericOp` where it was always casting to a double. - Removed `head` from InMemory Table (still has `first` method).	2022-11-08 15:57:59 +00:00
Pavel Marek	f8a4e2a9d2	Add `Period` type (#3818 ) This PR adds `Period` type, which is a date-only complement to `Duration` builtin type. # Important Notes - `Period` replaces `Date_Period`, and `Time_Period`. - Added shorthand constructors for `Duration` and `Period`. For example: `Period.days 10` instead of `Period.new days=10`. - `Period` can be compared to other `Period` in some cases, other cases throw an error.	2022-10-28 17:27:20 +00:00
Radosław Waśko	2bc0611869	Add support for using Columns within `Is_In` (#3822 ) Implements https://www.pivotaltracker.com/story/show/183560222	2022-10-24 12:51:15 +00:00
James Dunkerley	f0f6deef2a	Load the File_Format types via a ServiceLoader (#3813 ) Moves the File.read method into the `File` type. Uses the ServiceLoader to find all types for the File_Format.	2022-10-24 09:55:18 +00:00
Radosław Waśko	cc76e7d36a	Add support for `Blank_Columns` to Table and Database (#3812 ) Implements https://www.pivotaltracker.com/story/show/183390281 and https://www.pivotaltracker.com/story/show/183390394	2022-10-20 09:11:08 +00:00
Radosław Waśko	17f73988e8	Update `drop_missing_rows` to `filter_blank_rows` API. (#3805 ) Implements https://www.pivotaltracker.com/story/show/183390042 and https://www.pivotaltracker.com/story/show/183390370	2022-10-18 15:58:50 +00:00
Radosław Waśko	82de8f88bd	Add support for `Is_In` and `Not_In` to `Filter_Condition` (#3790 ) Implements https://www.pivotaltracker.com/story/show/183389945	2022-10-15 11:29:59 +00:00
Pavel Marek	e9260227c4	Duration type is a builtin type (#3759 ) - Reimplement the `Duration` type as a built-in type. - `Duration` is an interop type. - Allow Enso method dispatch on `Duration` interop coming from different languages. # Important Notes - The older `Duration` type should now be split into new `Duration` builtin type and a `Period` type. - This PR does not implement `Period` type, so all the `Period`-related functionality is currently not working, e.g., `Date - Period`. - This PR removes `Integer.milliseconds`, `Integer.seconds`, ..., `Integer.years` extension methods.	2022-10-14 18:08:08 +00:00
Radosław Waśko	592a8516a8	Add `Is_Empty`, `Not_Empty`, `Like` and `Not_Like` to `Filter_Condition` (#3775 ) Implements https://www.pivotaltracker.com/story/show/183389890	2022-10-10 23:11:04 +00:00
Hubert Plociniczak	841b2e6e7a	Suppress some obvious warnings (#3768 )	2022-10-07 10:07:40 +00:00
Radosław Waśko	61a4120cfb	Fix date comparisons and test sorting of tables and vectors with dates (#3745 ) Implements https://www.pivotaltracker.com/story/show/183402892 # Important Notes - Fixes inconsistent `compare_to` vs `==` behaviour in date/time types and adds test for that. - Adds test for `Table.order_by` on dates and custom types. - Fixes an issue with `Table.order_by` for custom types. - Unifies how incomparable objects are reported by `Table.order_by` and `Vector.sort`. - Adds benchmarks comparing `Table.order_by` and `Vector.sort` performance.	2022-09-29 08:48:00 +00:00
Radosław Waśko	cd10b5d34d	Add `Date_Period.Week` to `start_of` and `end_of` methods (#3733 ) Implements https://www.pivotaltracker.com/story/show/183349732	2022-09-23 22:14:35 +00:00
Radosław Waśko	e9ebc663c1	Add business days functions to Date and Date_Time (#3726 ) Implements https://www.pivotaltracker.com/story/show/183082087 # Important Notes - Removed unnecessary invocations of `Error.throw` improving performance of `Vector.distinct`. The time of the `add_work_days and work_days_until should be consistent with each other` test suite came down from 15s to 3s after the changes.	2022-09-22 08:31:15 +00:00
Radosław Waśko	8fa8d12cc3	String functionality in std-table should use std-base (#3717 ) Implements https://www.pivotaltracker.com/story/show/181754646	2022-09-17 14:38:02 +00:00
Radosław Waśko	5ed388930e	Additional tests for handling Dates in Table (#3707 ) Resolves https://www.pivotaltracker.com/story/show/183285801 @JaroslavTulach suggested the current implementation may not handle these correctly, which suggests that the logic is not completely trivial - so I added a test to ensure that it works as we'd expect. Fortunately, it did work - but it's good to keep the tests to avoid regressions.	2022-09-15 23:18:19 +00:00
Radosław Waśko	b304402d8e	Add Period Start and End functions to Date and DateTime (#3695 ) Implements https://www.pivotaltracker.com/story/show/183081152	2022-09-13 09:51:08 +00:00
Hubert Plociniczak	fba5047acc	Improved Vector/Array interop (#3667 ) `Vector` type is now a builtin type. This requires a bunch of additional builtin methods for its creation: - Use `Vector.from_array` to convert any array-like structure into a `Vector` [by copy](`f628b28f5f`) - Use (already existing) `Vector.from_polyglot_array` to convert any array-like structure into a `Vector` without copying - Use (already existing) `Vector.fill 1 item` to create a singleton `Vector` Additionally, for pattern matching purposes, we had to implement a `VectorBranchNode`. Use the following to match on `x` being an instance of `Vector` type: ``` import Standard.Base.Data.Vector size = case x of Vector.Vector -> x.length _ -> 0 ``` Finally, the `VectorLiterals` pass that transforms `[1,2,3]` to (roughly) ``` a1 = 1 a2 = 2 a3 = 3 Vector (Array (a1, a2, a3)) ``` had to be modified to generate ``` a1 = 1 a2 = 2 a3 = 3 Vector.from_array (Array (a1, a2, a3)) ``` instead, to accommodate the API changes. As of `025acaa676` all the known CI checks pass. Let's start the review. # Important Notes Matching in `case` statement is currently done via `Vector_Data`. Use: ``` case x of Vector.Vector_Data -> True ``` until a better alternative is found.	2022-09-13 03:07:17 +00:00
James Dunkerley	2b425f8e08	Restructuring `Database.Connection` to allow for database specific types. (#3632 ) - Added `databases`, `database`, `set_database`. - Added `schemas`, `schema`, `set_schema`. - Added `table_types`. - Added `tables`. - Moved the vast majority of the connection work into a lower level `JDBC_Connection` object. - `Connection` represents the standard API for database connections and provides a base JDBC implementation. - `SQLite_Connection` has the `Connection` API but with custom `databases` and `schemas` methods for SQLite. - `Postgres_Connection` has the `Connection` API but with custom `set_database`, `databases`, `set_schema` and `schemas` methods for Postgres. - Updated `Redshift` - no public API change.	2022-09-07 17:32:28 +00:00
Radosław Waśko	551100af3b	Add `Table.distinct` function to In-Memory table (#3684 ) Implements https://www.pivotaltracker.com/story/show/182307143 # Important Notes - Modified standard library Java helpers dependencies so that `std-table` module depends on `std-base`, as a provided dependency. This is allowed, because `std-table` is used by the `Standard.Table` Enso module which depends on `Standard.Base` which ensures that the `std-base` is loaded onto the classpath, thus whenever `std-table` is loaded by `Standard.Table`, so is `std-base`. Thus we can rely on classes from `std-base` and its dependencies being _provided_ on the classpath. Thanks to that we can use utilities like `Text_Utils` also in `std-table`, avoiding code duplication. An additional advantage is that we don't need to specify ICU4J as a separate dependency for `std-table`, since it is 'taken' from `std-base` already - so we avoid including it in our build packages twice.	2022-09-07 12:28:41 +00:00
Radosław Waśko	eafba079d9	Make In Memory Table Aggregator types more specific where possible (#3679 ) Many aggregation types fell back to the general `Any` type where they could have used the type of input column - for example `First` of a column of integers is guaranteed to fit the `Integer` storage type, so it doesn't have to fall back to `Any`. This PR fixes that and adds a test that checks this.	2022-09-05 09:17:41 +00:00
Radosław Waśko	65140f48ca	Add storage support for Date, Time and DateTime to InMemory table (#3673 ) Implements https://www.pivotaltracker.com/story/show/183080911	2022-08-31 22:06:29 +00:00
Radosław Waśko	d7ebc4a338	Add `Table.take` and `Table.drop` functions to In-Memory table (#3647 ) Implements https://www.pivotaltracker.com/story/show/182307347	2022-08-26 19:41:36 +00:00
James Dunkerley	a20d43390e	Adding DateTime part functions (#3669 ) - Added `Zone`, `Date_Time` and `Time_Of_Day` to `Standard.Base`. - Renamed `Zone` to `Time_Zone`. - Added `century`. - Added `is_leap_year`. - Added `length_of_year`. - Added `length_of_month`. - Added `quarter`. - Added `day_of_year`. - Added `Day_Of_Week` type and `day_of_week` function. - Updated `week_of_year` to support ISO. # Important Notes - Had to pass locale to formatter for date/time tests to work on my PC. - Changed default of `week_of_year` to use ISO.	2022-08-26 15:47:58 +00:00
Radosław Waśko	fd318cfa96	Remove `Array.set_at` (#3634 ) Implements https://www.pivotaltracker.com/story/show/182879865 # Important Notes Note that removing `set_at` still does not make our arrays fully immutable - `Array.copy` can still be used to mutate them.	2022-08-26 09:34:33 +00:00
Hubert Plociniczak	d87a32d019	Builtin Date_Time, Time_Of_Day, Zone (#3658 ) * Builtin Date_Time, Time_Of_Day, Zone Improved polyglot support for Date_Time (formerly Time), Time_Of_Day and Zone. This follows the pattern introduced for Enso Date. Minor caveat - in tests for Date, had to bend a lot for JS Date to pass. This is because JS Date is not really only a Date, but also a Time and Timezone; previously we just didn't consider the latter. Also, JS Date does not deal well with setting timezones, so the trick I used is to first call a foreign function returning a polyglot JS Date, which is converted to ZonedDateTime, and only then set the correct timezone. That way none of the existing tests had to be changed or special-cased. Additionally, JS deals with milliseconds rather than nanoseconds so there is a loss in precision, as noted in Time_Spec. * Add tests for Java's LocalTime * changelog * Make date formatters in table happy * PR review, add more tests for zone * More tests and fixed a bug in column reader Column reader didn't take into account timezone but that was a mistake since then it wouldn't map to Enso's Date_Time. Added tests that check it now. * remove redundant conversion * Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Time.enso Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org> * First round of addressing PR review * don't leak java exceptions in Zone * Move Date_Time to top-level module * PR review Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org> Co-authored-by: Jaroslav Tulach <jaroslav.tulach@enso.org>	2022-08-24 12:31:29 +02:00
Radosław Waśko	3dca738cf7	Add `Vector.take` and `Vector.drop` functions (#3629 ) Implements https://www.pivotaltracker.com/story/show/182307048	2022-08-10 16:02:02 +00:00
Dmitry Bushev	5e114acbb5	Update Scala to 2.13.8 (#3631 ) Update Scala compiler and libraries.	2022-08-08 19:32:55 +00:00
Radosław Waśko	0a2fea925c	Create `Index_Sub_Range` type and update `Text.take` and `Text.drop` (#3617 )	2022-08-03 11:41:34 +00:00
Radosław Waśko	ee91656f30	Remove duplicate `Line_Ending_Style` and update defaults (#3597 ) Implements https://www.pivotaltracker.com/story/show/182749831	2022-07-27 09:43:51 +00:00
James Dunkerley	be311457bd	Add Linear Regression support for Vectors. (#3601 ) Adds least squares regression APIs. Covers the basic 4 trend line types from Excel (doesn't cover Polynomial or Moving Average). Removes the old `Model` from the `Standard.Table`.	2022-07-22 08:41:17 +00:00
Radosław Waśko	16fd038c1a	Add support for `.pgpass` to PostgreSQL (#3593 ) Implements https://www.pivotaltracker.com/story/show/182582924	2022-07-21 13:32:37 +00:00
Jaroslav Tulach	4465d63dd8	Improved polyglot Date support (#3559 ) Significantly improves the polyglot Date support (as introduced by #3374). It enhances the `Date_Spec` to run it in four flavors: - with Enso Date (as of now) - with JavaScript Date - with JavaScript Date wrapped in (JavaScript) array - with Java LocalDate allocated directly The code is then improved by necessary modifications to make the `Date_Spec` pass. # Important Notes James has requested in [#181755990](https://www.pivotaltracker.com/n/projects/2539304/stories/181755990) - i.e. _Review and improve InMemory Table support for Dates, Times, DateTimes, BigIntegers_ - that the following program work: ``` foreign js dateArr = """ return [1, new Date(), 7] main = IO.println <| (dateArr.at 1).week_of_year ``` The program works with the changes provided here and prints `27` as of today. @jdunkerley has provided tests for proper behavior of date in `Table` and `Column`. Those tests are working as of [`f16d07e`](`f16d07e640`). One just needs to accept `List<Value>` and then query `Value` for `isDate()` when needed. The last round of changes is related to exception handling. `8b686b12bd` makes sure `makePolyglotError` accepts only polyglot values. Then it wraps plain Java exceptions into `WrapPlainException` with `has_type` method - `60da5e70ed` - the remaining changes in the PR are only trying to get all tests working in the new setup. The support for `Time` isn't part of this PR yet.	2022-07-21 06:32:40 +00:00
Radosław Waśko	35ddd2a89e	Add new options to the Delimited format (#3581 ) Implements https://www.pivotaltracker.com/story/show/182662195 and https://www.pivotaltracker.com/story/show/182651884	2022-07-14 15:01:26 +00:00
James Dunkerley	9578dc1e43	Move `write_bytes` to be part of `Vector`. (#3583 ) Updates `write_bytes` API to be part of `Vector` and to conform to `write` APIs. # Important Notes Ensures the file is not touched if the byte array is invalid.	2022-07-14 11:30:40 +00:00
James Dunkerley	e41936f436	Additional tests for Excel Append (#3580 ) Add some additional scenarios to Excel append tests: - Non-A1 start - Name duplication - Hitting another range # Important Notes Also fixed a warning in the Image library.	2022-07-13 13:02:39 +00:00
Radosław Waśko	df10e4ba7c	Add appending support for Delimited files (#3573 ) Implements https://www.pivotaltracker.com/story/show/182309839	2022-07-11 12:36:01 +00:00
James Dunkerley	16e6f2fa08	Adding Append support to Excel.Write (#3558 ) Adds support for appending to an existing Excel table. # Important Notes - Renamed `Column_Mapping` to `Column_Name_Mapping` - Changed new type name to `Map_Column` - Added last modified time and creation time to `File`.	2022-07-07 06:41:33 +00:00
Radosław Waśko	7c94fa6a77	Custom Encoding support when writing Delimited files (#3564 ) Implements https://www.pivotaltracker.com/story/show/182545847	2022-07-07 00:20:00 +00:00
James Dunkerley	5174cc6ece	Update `Database.connect` to match new API (#3542 ) Initial work restructuring the `Database.connect` API - New SQLite API with support for InMemory. - Updated PostgreSQL API with SSL and Client Certificate Support. - Updated Redshift API. # Important Notes Follow up tasks: - PostgreSQL SSL additional testing. - Driver version updating. - `.pgpass` support.	2022-07-04 20:26:44 +00:00
James Dunkerley	4ca2097488	Adding write support to `File_Format.Excel` (#3551 ) Support for writing tables to Excel. # Important Notes Has custom support for Error mode, as it allows appending a new table to the file in this mode.	2022-07-04 18:32:16 +00:00
Radosław Waśko	972b34d1a9	Implement value formatting and writing new files in Delimited format. (#3528 ) Implements https://www.pivotaltracker.com/story/show/182309429 and https://www.pivotaltracker.com/story/show/182309573	2022-06-23 16:51:52 +00:00
James Dunkerley	7a2d304fa0	Update Excel reading API (#3523 ) - Remove `from_xls` and `from_xlsx`. - Add `headers` support to `File_Format.Excel`. - Altered default read for Excel to be the first sheet. - Altered behavior so that single cells grow down and right when reading sheet. - Altered `Excel_Range` so it knows whether the address is a single cell or a 1x1 range. # Important Notes - Renamed `Range` to `Cell_Range` to avoid name clash.	2022-06-21 13:39:32 +00:00
James Dunkerley	a0c6fa9c96	Removing old functions and tidy up of Table types (#3519 ) - Removed `select` method. - Removed `group` method. - Removed `Aggregate_Table` type. - Removed `Order_Rule` type. - Removed `sort` method from Table. - Expanded comments on `order_by`. - Update comment on `aggregate` on Database. - Update Visualisation to use new APIs. - Updated Data Science examples to use new APIs. - Moved Examples test out of Tests into its own test. # Important Notes Need to get Examples_Tests added to CI.	2022-06-14 13:37:20 +00:00
James Dunkerley	e97d27e1e0	Adjusting First and Last order_by to use Sort_Column_Selector (#3517 )	2022-06-10 09:59:03 +00:00
James Dunkerley	8afba43add	Implement In-Memory Table order_by (#3515 ) Implemented the `order_by` function with support for all modes of operation. Added support for case insensitive natural order. # Important Notes - Improved MultiValueIndex/Key to not create loads of arrays. - Adjusted HashCode for MultiValueKey to have a simple algorithm. - Added Text_Utils.compare_normalized_ignoring_case to allow for case insensitive comparisons. - Fixed issues with ObjectComparator and added some unit tests for it.	2022-06-08 12:30:50 +00:00
Radosław Waśko	2af970fe52	Basic changes to File_Format (#3516 ) Implements https://www.pivotaltracker.com/story/show/182308987	2022-06-08 09:53:18 +00:00
James Dunkerley	ba5d6823a9	Merge the Unique Name Strategy with NameDeduplicator (#3490 ) - Merge the two approaches and make them consistent - Add warning support into Reader # Important Notes - Added support for JUnit format XML generation on tests. Use `ENSO_TEST_JUNIT_DIR`	2022-06-01 12:52:23 +00:00
James Dunkerley	1aa0bb3552	Rank Data, Correlation, Covariance, R Squared (#3484 ) - Added new `Statistic`s: Covariance, Pearson, Spearman, R Squared - Added `covariance_matrix` function - Added `pearson_correlation` function to compute correlation matrix - Added `rank_data` and Rank_Method type to create rankings of a Vector - Added `spearman_correlation` function to compute Spearman Rank correlation matrix # Important Notes - Added `Panic.throw_wrapped_if_error` and `Panic.handle_wrapped_dataflow_error` to help with errors within a loop. - Removed `Array.set_at` use from `Table.Vector_Builder`	2022-05-30 17:13:06 +00:00
Radosław Waśko	db611e1581	Remove obsolete `Csv` reading module (#3482 ) Completes https://www.pivotaltracker.com/story/show/182037405 # Important Notes - Some tests had to be adapted to the new parsing logic.	2022-05-28 10:01:14 +00:00
Radosław Waśko	7f572bf3e4	The user should be able to have the headers Inferred when reading a Delimited file (#3472 ) Implements https://www.pivotaltracker.com/story/show/181986831	2022-05-25 13:29:17 +00:00
Hubert Plociniczak	4918ccb5a3	Make sure formatting is applied to std-bits projects (#3477 ) @radeusgd discovered that no formatting was being applied to std-bits projects. This was caused by the fact that `enso` project didn't aggregate them. Compilation and packaging still worked because one relied on the output of some tasks but ``` sbt> javafmtAll ``` didn't apply it to `std-bits`. # Important Notes Apart from `build.sbt` no manual changes were made.	2022-05-25 09:26:50 +00:00
Radosław Waśko	ec1b072824	Integrate value parsing with Delimited file reading (#3463 ) Implements https://www.pivotaltracker.com/story/show/182200028	2022-05-24 17:59:00 +02:00
Radosław Waśko	ff7700ebb1	Automatic inference of value types when parsing table columns (#3462 ) Implements https://www.pivotaltracker.com/story/show/182199966	2022-05-20 15:08:36 +00:00
Radosław Waśko	8430ce2625	Parsing values with known types (#3455 ) Implements https://www.pivotaltracker.com/story/show/181824146	2022-05-18 15:27:48 +00:00
James Dunkerley	4f3a76817c	Statistics on a Vector (#3442 ) - Implements various statistics on Vector # Important Notes Some minor codebase improvements: - Some tweaks to Any/Nothing to improve performance - Fixed bug in ObjectComparator - Added if_nothing - Removed Group_By_Key	2022-05-11 13:25:06 +00:00
Radosław Waśko	64f178f7a8	Delimited File Encoding (#3430 ) Implements https://www.pivotaltracker.com/story/show/181998375	2022-05-10 22:44:05 +00:00
James Dunkerley	078c665a60	File_Format.Excel work (#3425 ) - Read in Excel files following the specification. - Support for XLSX and XLS formats. - Ability to select ranges and sheets. - Skip Rows and Row Limits. # Important Notes - Minor fix to DelimitedReader for Windows	2022-05-06 13:21:10 +00:00
Radosław Waśko	8219dca400	Improve support for reading Delimited files (#3424 ) Implements https://www.pivotaltracker.com/story/show/181823957	2022-04-29 17:12:19 +00:00
Radosław Waśko	14257d07aa	Data analysts should be able to use `Text.split`, `Text.lines` and `Text.words` to break up strings (#3415 ) Implements https://www.pivotaltracker.com/story/show/181266184 ### Important Notes Changed example image download to only proceed if the file did not exist before - thus cutting on the build time (the build used to download it _every_ time - which completely failed the build if network is down). A redownload can be forced by performing a fresh repository checkout.	2022-04-26 17:22:53 +02:00
James Dunkerley	5a6b6749cc	Restructuring for File.read (#3390 ) - Added Encoding type - Added `Text.bytes`, `Text.from_bytes` with Encoding support - Renamed `File.read` to `File.read_text` - Renamed `File.write` to `File.write_text` - Added Encoding support to `File.read_text` and `File.write_text` - Added warnings to invalid encodings	2022-04-19 16:50:03 +00:00
Radosław Waśko	0ea5dc2a6f	Data analysts should be able to use `Text.replace` to substitute parts of the text (#3393 ) Implements https://www.pivotaltracker.com/story/show/181266274	2022-04-13 19:21:47 +00:00
Radosław Waśko	891f064a6a	Extend Aggregate_Spec test suite with tests for missed edge-cases to ensure the feature is well-tested on all backends (#3383 ) Implements https://www.pivotaltracker.com/story/show/181805693 and finishes the basic set of features of the Aggregate component. Still not all aggregations are supported everywhere, because for example SQLite has quite limited support for aggregations. Currently the workaround is to bring the table into memory (if possible) and perform the computation locally. Later on, we may add more complex generator features to emulate the missing aggregations with complex sub-queries.	2022-04-12 11:02:01 +00:00
James Dunkerley	bade0c31de	First and Last ordering (#3380 ) Add the missing `order_by` support to First and Last aggregations for InMemory table.	2022-04-06 12:36:46 +00:00
James Dunkerley	a4dbc9a37b	Moving Aggregation to Java (#3364 )	2022-04-04 09:12:48 +00:00
James Dunkerley	02bcfbb2a8	Refactor Aggregate Column (#3349 ) - Make it easier to understand the computations. - Fix issue with First. - Improve quote handling in Concatenate - Added validation and warnings to input	2022-03-22 18:18:46 +00:00
Radosław Waśko	247b284316	Data analysts should be able to use `Text.location_of` to find indexes within string using various matchers (#3324 ) Implements https://www.pivotaltracker.com/n/projects/2539304/stories/181266029	2022-03-12 19:42:00 +00:00
Hubert Plociniczak	ac5c02ed8c	Use `.isEmpty()` instead of `.length() == 0` (#3314 ) Minor nit - since String is a CharSequence it's advisable to use the corresponding method for checking the condition rather than writing it by hand.	2022-03-04 16:41:48 +01:00
Radosław Waśko	40c851bf8b	Text.pad and Text.trim (#3309 ) Implements https://www.pivotaltracker.com/story/show/181265516	2022-03-02 17:19:39 +00:00
Radosław Waśko	2ae636f63c	Data analysts should be able to use `Text.starts_with` and `Text.ends_with` (#3292 ) Implements https://www.pivotaltracker.com/story/show/181265900	2022-02-23 16:48:33 +00:00
James Dunkerley	2e2c5562a8	Text.take and Text.drop (#3287 ) Implementation of the Text take and drop APIs - Added `Range.contains` function - Added `Text_Sub_Range` type - Added `Text_Utils.index_of` and `Text_Utils.last_index_of` based on ICU StringSearcher	2022-02-22 18:50:59 +00:00
Radosław Waśko	ae9d51555f	Data analysts should be able to use `Text.contains` to check for substring using various matcher techniques. (#3285 ) * Add matching mode definitions * Add stub for new method API and an initial test suite * Fix tests, implement exact matching * Implement Regex matching * changelog * Add benchmarks * Workaround for case insensitive regex locale support * minor tweaks * Unify Case_Insensitive * Update edge cases * Fix other affected places * minor style change * Add a problematic test * Add a regex test for a similar situation * Migrate to StringSearch * Add test cases for scharfes S (ß) edge case * Add problematic Regex Unicode normalization test * Document the regex accents peculiarity * Do not apply the normalization in ASCII only mode * cr	2022-02-22 15:41:56 +00:00

287 Commits