325 Commits

Author SHA1 Message Date
James Dunkerley
02bcfbb2a8
Refactor Aggregate Column (#3349)
- Make it easier to understand the computations.
- Fix issue with First.
- Improve quote handling in Concatenate
- Added validation and warnings to input
2022-03-22 18:18:46 +00:00
Radosław Waśko
cc7333812d
The library developer should be able to handle specific types of Panics while passing through others (#3344)
Implements https://www.pivotaltracker.com/story/show/181569176

Also ensures that Dataflow Errors have proper stack traces (earlier they did not point at the right location).
2022-03-18 16:57:06 +00:00
Radosław Waśko
08183f59f2
Minor fixes for Text (#3340)
* Avoid unnecessary copies

* Add tests for conversions

* Add guidelines for Text tests

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-15 16:11:46 +00:00
Radosław Waśko
dedd1eac96
Refactor library warnings to use the new system (#3337)
Implements https://www.pivotaltracker.com/story/show/181536964
2022-03-15 12:52:57 +01:00
Radosław Waśko
247b284316
Data analysts should be able to use Text.location_of to find indexes within string using various matchers (#3324)
Implements https://www.pivotaltracker.com/n/projects/2539304/stories/181266029
2022-03-12 19:42:00 +00:00
Marcin Kostrzewa
4653bfeeab
Decorate values with arbitrary warnings (#3248)
2022-03-09 16:40:02 +01:00
James Dunkerley
65465fb8ef
Restructuring the Faker type and creating tests for Group_By (#3318)
- Added Minimum, Maximum, Longest, Shortest, Mode, Percentile
- Added first and last to Map
- Restructured Faker type more in line with FakerJS
- Created 2,500 row data set
- Tests for group_by
- Performance tests for group_by
2022-03-09 10:31:02 +00:00
Hubert Plociniczak
8bdca89917
New Text.insert function (#3311)
Implements https://www.pivotaltracker.com/n/projects/2539304
2022-03-04 16:40:34 +01:00
James Dunkerley
fb68f18739
Within Vector, use Array.Copy wherever possible (#3236)
Following the Slice and Array.Copy experiment, took just the Array.Copy parts and built them into the Vector class.

This gives big performance wins in common operations:

| Test | Ref | New |
| --- | --- | --- |
| New Vector | 41.5 | 41.4 |
| Append Single | 26.6 | 4.2 |
| Append Large | 26.6 | 4.2 |
| Sum | 230.1 | 99.1 |
| Drop First 20 and Sum | 343.5 | 96.9 |
| Drop Last 20 and Sum | 311.7 | 96.9 |
| Filter | 240.2 | 92.5 |
| Filter With Index | 364.9 | 237.2 |
| Partition | 772.6 | 280.4 |
| Partition With Index | 912.3 | 427.9 |
| Each | 110.2 | 113.3 |

*Benchmarks run on an AWS EC2 r5a.xlarge with 1,000,000 item count, 100 iteration size, run 10 times.*
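
To put the size of these wins in numbers, the Ref and New columns can be turned into speedup ratios. The following is a small Python sketch that uses only the figures from the table above; the `timings` dictionary and the `speedup` helper are names introduced here for illustration, and since the message does not state the units, only the ratios are meaningful.

```python
# Timings copied from the benchmark table in this commit message: (Ref, New).
# Units are not stated in the message, so only the ratios are interpreted.
timings = {
    "New Vector": (41.5, 41.4),
    "Append Single": (26.6, 4.2),
    "Append Large": (26.6, 4.2),
    "Sum": (230.1, 99.1),
    "Drop First 20 and Sum": (343.5, 96.9),
    "Drop Last 20 and Sum": (311.7, 96.9),
    "Filter": (240.2, 92.5),
    "Filter With Index": (364.9, 237.2),
    "Partition": (772.6, 280.4),
    "Partition With Index": (912.3, 427.9),
    "Each": (110.2, 113.3),
}


def speedup(ref: float, new: float) -> float:
    """Return Ref / New; values above 1 mean the new code is faster."""
    return ref / new


if __name__ == "__main__":
    for name, (ref, new) in timings.items():
        print(f"{name:24s} {speedup(ref, new):5.2f}x")
```

On these figures the append tests improve by roughly 6.3x, while `Each` is marginally slower than the reference (a ratio just under 1).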

# Important Notes
Have generally tried to push the `@Tail_Call` down from the Vector class and move to calling functions on the range class.

- Expanded benchmarks on Vector
- Added `take` method to Vector
- Added `each_with_index` method to Vector
- Added `filter_with_index` method to Vector
2022-03-03 15:40:48 +00:00
James Dunkerley
ad1130587d
Updating Text.repeat and adding Text.* (#3310)
Updating the `Text.repeat` function:
- fix issue with negative count
- add * operator

Add tests of the function.
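
A minimal sketch of the behaviour described above, written in Python rather than Enso. The message does not say how a negative count is handled after the fix, so this sketch assumes it is treated as zero and yields empty text; the `Text` class and its methods here are hypothetical stand-ins, not the library's implementation.

```python
class Text:
    """Hypothetical stand-in for Enso's Text type, for illustration only."""

    def __init__(self, value: str) -> None:
        self.value = value

    def repeat(self, count: int) -> "Text":
        # Assumption: a negative count is clamped to zero, giving empty text.
        return Text(self.value * max(count, 0))

    def __mul__(self, count: int) -> "Text":
        # The * operator delegates to repeat, so both share one code path.
        return self.repeat(count)
```

Routing `*` through `repeat` means the negative-count handling only has to be tested once.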
2022-03-02 19:00:47 +00:00
Radosław Waśko
40c851bf8b
Text.pad and Text.trim (#3309)
Implements https://www.pivotaltracker.com/story/show/181265516
2022-03-02 17:19:39 +00:00
Radosław Waśko
0d96f59f44
Data analysts should be able to use Text.to_case to change the case of Text values (#3302)
* Move to_upper_case and to_lower_case into to_case

* Add an export, not sure about it

* Implement title case

TODO: some more tests would be good

* Add more tests

* explain title case

* fix todo

* changelog
2022-02-28 23:20:41 +00:00
Radosław Waśko
b03416f907
Update Column_Selector and Column_Mapping to use Matcher over Matching_Strategy (#3299)
Implements https://www.pivotaltracker.com/story/show/181339748
2022-02-25 18:39:10 +00:00
Radosław Waśko
2ae636f63c
Data analysts should be able to use Text.starts_with and Text.ends_with (#3292)
Implements https://www.pivotaltracker.com/story/show/181265900
2022-02-23 16:48:33 +00:00
James Dunkerley
2e2c5562a8
Text.take and Text.drop (#3287)
Implementation of the Text take and drop APIs
- Added `Range.contains` function
- Added `Text_Sub_Range` type
- Added `Text_Utils.index_of` and `Text_Utils.last_index_of` based on ICU StringSearch
2022-02-22 18:50:59 +00:00
Radosław Waśko
ae9d51555f
Data analysts should be able to use Text.contains to check for substring using various matcher techniques. (#3285)
* Add matching mode definitions

* Add stub for new method API and an initial test suite

* Fix tests, implement exact matching

* Implement Regex matching

* changelog

* Add benchmarks

* Workaround for case insensitive regex locale support

* minor tweaks

* Unify Case_Insensitive

* Update edge cases

* Fix other affected places

* minor style change

* Add a problematic test

* Add a regex test for a similar situation

* Migrate to StringSearch

* Add test cases for scharfes S edge case

* Add problematic Regex Unicode normalization test

* Document the regex accents peculiarity

* Do not apply the normalization in ASCII only mode

* cr
2022-02-22 15:41:56 +00:00
Radosław Waśko
14f57271a2
Ensure that Text.compare_to compares strings according to grapheme clusters (#3282)
https://www.pivotaltracker.com/story/show/181175238
2022-02-17 17:09:41 +00:00
James Dunkerley
7afc8c48c5
Adding Integer.Parse (#3283)
* Integer parse via Longs

* Integer parse via Longs

* Benchmark for Number Parse

* CHANGELOG.md and Natural Order

* Expanded test set

* Number base tests

* Few more negative tests
2022-02-17 15:04:00 +00:00
James Dunkerley
68b85dea82
Improvement to the Natural Order Sort (#3276)
* Improved Natural Order
Data generator for benchmarking

* Missing Import
Benchmark script

* Update Natural_Order.enso

Restore missing ToDo

* Changelog

* PR Comments

* PR Comments

* Additional comments.

* Correction
2022-02-16 17:40:33 +00:00
Marcin Kostrzewa
67b4e59506
Properly expose stacktraces and related data to user code (#3271)
2022-02-16 10:36:19 +03:00
Radosław Waśko
fbf747d6cf
Implement Vector.flatten (#3259)
2022-02-15 16:16:08 +01:00
James Dunkerley
585afd83ce
Adding Text.at and Text.is_digit functions (#3269)
* Add Text.at function

* Add tests for Text.at

* Add tests for Text.is_digit

* Change log

* Avoid memory allocation
2022-02-14 09:03:55 +00:00
James Dunkerley
1814d3c4f1
Data analysts should be able to transform a Table using the rename_columns functions (#3249)
* Implement Natural_Order and sort_columns

* Starting on Rename

Align Column_Mapping

Add By_Position
Separating off the validation for By_Index so can reuse for rename

By_Position implemented

By_Index implemented
Adjusted behaviour following discussion with Ned, so that renames dominate untouched columns.

Moving to validation style checks for problems

Putting accumulator back

Rename work

* Add Range.find

* More work

* Regex support
Tidy of Unique Name Strategy

* Fix Regex support

* Warning messages
Tests for Unique Naming Strategy
Table rename working

* Database Table rename_columns
Fix for Table
**Must follow up on slice**

* Some tests

* More tests

* Complete test set
(and associated fixes)

* Functional use_first_row_as_names
Tests to go...

* Test for use_first_row_as_names

* Change log

* trailing space

Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
2022-02-11 10:18:51 +00:00
Radosław Waśko
8b24336604
Data analysts should be able to reorder columns into name order using sort_columns functions (#3250)
2022-02-08 17:28:46 +01:00
Michał Wawrzyniec Urbańczyk
4baad5f146
Nightly process preparations: Setting Enso version through the environment (#3241)
Co-authored-by: Radosław Waśko <radoslaw.wasko@enso.org>
Co-authored-by: Radosław Waśko <wasko.radek@gmail.com>
2022-02-07 15:14:32 +01:00