Implement Table.replace for the in-memory backend (#8935)

GregoryTravis 2024-02-06 15:57:50 -05:00 committed by GitHub
parent e0ba39ed3e
commit 6554972b7d
5 changed files with 309 additions and 3 deletions

@ -611,6 +611,7 @@
`Filter_Condition`.][8865]
- [Added `File_By_Line` type allowing processing a file line by line. New faster
JSON parser based off Jackson.][8719]
- [Implemented `Table.replace` for the in-memory backend.][8935]
[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@ -878,6 +879,7 @@
[8816]: https://github.com/enso-org/enso/pull/8816
[8849]: https://github.com/enso-org/enso/pull/8849
[8865]: https://github.com/enso-org/enso/pull/8865
[8935]: https://github.com/enso-org/enso/pull/8935
#### Enso Compiler

@ -1403,7 +1403,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
@ -1420,6 +1420,85 @@ type Table
Helpers.ensure_same_connection "table" [self, lookup_table] <|
Lookup_Query_Helper.build_lookup_query self lookup_table key_columns add_new_columns allow_unmatched_rows on_problems
## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.
Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case passing `from_column` or
`to_column` is disallowed and will throw an `Illegal_Argument` error).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `True` (the default), the unmatched rows are left unchanged. If `False`,
an `Unmatched_Rows_In_Lookup` error is raised.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
? Result Ordering
When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.
? Error Conditions
- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If any of the key columns is of a floating-point type,
a `Floating_Point_Equality` problem is reported.
> Example
Replace values in column 'x' using a lookup table.
table = Table.new [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h
lookup_table = Table.new [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | x | new_x
# ---+---+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40
result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
replace : Table | Map -> (Text | Integer) -> (Text | Integer) -> (Text | Integer) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer)=0 to_column:(Text | Integer)=1 allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
    _ = [lookup_table, column, from_column, to_column, allow_unmatched_rows, on_problems]
    Error.throw (Unsupported_Database_Operation.Error "Table.replace is not implemented yet for the Database backends.")
## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
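
The behaviour described in the `replace` doc comment above can be modelled on a single column of values. The following is a minimal sketch in Python rather than Enso, so that it can be run standalone; it is not the library's implementation. The function name `replace_values`, the exception classes, and the representation of the lookup as `(old, new)` pairs are all assumptions made for illustration. In particular, the sketch rejects any duplicated lookup key up front, which may be stricter than the real `Non_Unique_Key` check.

```python
class NullValuesInKeyColumns(Exception):
    """Hypothetical stand-in for Enso's Null_Values_In_Key_Columns."""


class NonUniqueKey(Exception):
    """Hypothetical stand-in for Enso's Non_Unique_Key."""


class UnmatchedRowsInLookup(Exception):
    """Hypothetical stand-in for Enso's Unmatched_Rows_In_Lookup."""


def replace_values(values, lookup_pairs, allow_unmatched_rows=True):
    """Map each value through the lookup, keeping the original row order."""
    mapping = {}
    for old, new in lookup_pairs:
        if old is None:
            raise NullValuesInKeyColumns("lookup key must not be None")
        if old in mapping:
            # Assumption: any duplicated key is an error, matched or not.
            raise NonUniqueKey(f"duplicate lookup key: {old!r}")
        mapping[old] = new

    result = []
    for value in values:
        if value in mapping:
            result.append(mapping[value])
        elif allow_unmatched_rows:
            result.append(value)  # unmatched rows are left unchanged
        else:
            raise UnmatchedRowsInLookup(f"no lookup entry for {value!r}")
    return result


# Mirrors the "can allow unmatched rows" case: 2 has no lookup entry.
print(replace_values([1, 2, 3, 4], [(4, 40), (3, 30), (1, 10)]))  # [10, 2, 30, 40]
```

The default of `allow_unmatched_rows=True` follows the method signature in the diff.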

@ -1913,7 +1913,7 @@ type Table
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values, either in the lookup table,
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
@ -1959,6 +1959,112 @@ type Table
java_table = LookupJoin.lookupAndReplace java_keys java_descriptions allow_unmatched_rows java_problem_aggregator
Table.Value java_table
## ALIAS find replace
GROUP Standard.Base.Calculations
ICON join
Replaces values in `column` using `lookup_table` to specify a
mapping from old to new values.
Arguments:
- lookup_table: the table to use as a mapping from old to new values. A
`Map` can also be used here (in which case passing `from_column` or
`to_column` is disallowed and will throw an `Illegal_Argument` error).
- column: the column within `self` to perform the replace on.
- from_column: the column within `lookup_table` to match against `column`
in `self`.
- to_column: the column within `lookup_table` to get new values from.
- allow_unmatched_rows: Specifies how to handle missing rows in the lookup.
If `True` (the default), the unmatched rows are left unchanged. If `False`,
an `Unmatched_Rows_In_Lookup` error is raised.
- on_problems: Specifies how to handle problems if they occur, reporting
them as warnings by default.
? Result Ordering
When operating in-memory, this operation preserves the order of rows
from this table (unlike `join`).
In the Database backend, there are no guarantees related to ordering of
results.
? Error Conditions
- If this table or the lookup table is lacking any of the columns
specified by `from_column`, `to_column`, or `column`, a
`Missing_Input_Columns` error is raised.
- If a single row is matched by multiple entries in the lookup table,
a `Non_Unique_Key` error is raised.
- If a column that is being updated from the lookup table has a type
that is not compatible with the type of the corresponding column in
this table, a `No_Common_Type` error is raised.
- If a key column contains `Nothing` values in the lookup table,
a `Null_Values_In_Key_Columns` error is raised.
- If `allow_unmatched_rows` is `False` and there are rows in this table
that do not have a matching row in the lookup table, an
`Unmatched_Rows_In_Lookup` error is raised.
- The following problems may be reported according to the `on_problems`
setting:
- If any of the key columns is of a floating-point type,
a `Floating_Point_Equality` problem is reported.
> Example
Replace values in column 'x' using a lookup table.
table = Table.new [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']], ['z', ['e', 'f', 'g', 'h']]]
# | x | y | z
# ---+---+---+---
# 0 | 1 | a | e
# 1 | 2 | b | f
# 2 | 3 | c | g
# 3 | 4 | d | h
lookup_table = Table.new [['x', [1, 2, 3, 4]], ['new_x', [10, 20, 30, 40]]]
# | x | new_x
# ---+---+-------
# 0 | 1 | 10
# 1 | 2 | 20
# 2 | 3 | 30
# 3 | 4 | 40
result = table.replace lookup_table 'x'
# | x | y | z
# ---+----+---+---
# 0 | 10 | a | e
# 1 | 20 | b | f
# 2 | 30 | c | g
# 3 | 40 | d | h
@column Widget_Helpers.make_column_name_selector
@from_column Widget_Helpers.make_column_name_selector
@to_column Widget_Helpers.make_column_name_selector
replace : Table | Map -> (Text | Integer) -> (Text | Integer | Nothing) -> (Text | Integer | Nothing) -> Boolean -> Problem_Behavior -> Table ! Missing_Input_Columns | Non_Unique_Key | Unmatched_Rows_In_Lookup
replace self lookup_table:(Table | Map) column:(Text | Integer) from_column:(Text | Integer | Nothing)=Nothing to_column:(Text | Integer | Nothing)=Nothing allow_unmatched_rows:Boolean=True on_problems:Problem_Behavior=Problem_Behavior.Report_Warning =
    case lookup_table of
        _ : Map ->
            if from_column.is_nothing.not || to_column.is_nothing.not then Error.throw (Illegal_Argument.Error "If a Map is provided as the lookup_table, then from_column and to_column should not also be specified.") else
                self.replace (map_to_lookup_table lookup_table 'from' 'to') column 'from' 'to' allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
        _ : Table ->
            from_column_resolved = from_column.if_nothing 0
            to_column_resolved = to_column.if_nothing 1
            selected_lookup_columns = lookup_table.select_columns [from_column_resolved, to_column_resolved]
            self.select_columns column . if_not_error <| selected_lookup_columns . if_not_error <|
                unique = self.column_naming_helper.create_unique_name_strategy
                unique.mark_used (self.column_names)
                ## We perform a `merge` into `column`, using a duplicate of `column`
                   as the key column to join with `from_column`.
                duplicate_key_column_name = unique.make_unique "duplicate_key"
                duplicate_key_column = self.at column . rename duplicate_key_column_name
                self_with_duplicate = self.set duplicate_key_column set_mode=Set_Mode.Add
                ## Create a lookup table with just `to_column` and `from_column`,
                   renamed to match the base table's `column` and its duplicate,
                   respectively.
                lookup_table_renamed = selected_lookup_columns . rename_columns (Map.from_vector [[from_column_resolved, duplicate_key_column_name], [to_column_resolved, column]])
                merged = self_with_duplicate.merge lookup_table_renamed duplicate_key_column_name add_new_columns=False allow_unmatched_rows=allow_unmatched_rows on_problems=on_problems
                merged.remove_columns duplicate_key_column_name
## ALIAS join by row position
GROUP Standard.Base.Calculations
ICON dataframes_join
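
The in-memory `replace` above is built on `merge`: it copies `column` under a fresh name, uses that copy as the join key, lets the merge overwrite `column` with the lookup's `to_column`, and then drops the copy. A rough Python sketch of that sequence follows. Tables are modelled as lists of row dictionaries, `merge_update` is a toy stand-in for `merge` with `add_new_columns=False` and unmatched rows allowed, and the suffixing scheme in `unique_name` is a guess rather than Enso's actual naming strategy; all of these names are hypothetical.

```python
def unique_name(base, used):
    """Return `base`, or `base` with a numeric suffix, so it is not in `used`."""
    if base not in used:
        return base
    i = 1
    while f"{base} {i}" in used:
        i += 1
    return f"{base} {i}"


def merge_update(rows, lookup_rows, key):
    """Toy merge: overwrite existing columns of each row that has a key match."""
    index = {lookup[key]: lookup for lookup in lookup_rows}
    merged = []
    for row in rows:
        match = index.get(row[key])
        if match is None:
            merged.append(dict(row))  # unmatched rows stay as they are
        else:
            merged.append({name: match.get(name, value) for name, value in row.items()})
    return merged


def replace_via_merge(rows, lookup_rows, column, from_column, to_column):
    used = set(rows[0]) if rows else {column}
    dup = unique_name("duplicate_key", used)
    # 1. Duplicate `column` under a name that cannot clash with existing columns.
    with_dup = [{**row, dup: row[column]} for row in rows]
    # 2. Keep only the two lookup columns, renamed to match the base table.
    renamed = [{dup: lk[from_column], column: lk[to_column]} for lk in lookup_rows]
    # 3. Merge on the duplicate, which overwrites `column` with the new values.
    merged = merge_update(with_dup, renamed, dup)
    # 4. Drop the temporary key column again.
    return [{n: v for n, v in row.items() if n != dup} for row in merged]


table = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
lookup = [{"x": 2, "z": 20}, {"x": 1, "z": 10}]
print(replace_via_merge(table, lookup, "x", "x", "z"))
# [{'x': 10, 'y': 'a'}, {'x': 20, 'y': 'b'}]
```

Joining on a duplicate rather than on `column` itself keeps the key values intact while `column` is being overwritten, which is also what makes the self-lookup edge case in the tests work.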
@ -2701,6 +2807,13 @@ concat_columns column_set all_tables result_type result_row_count on_problems =
sealed_storage = storage_builder.seal
Column.from_storage column_set.name sealed_storage
## PRIVATE
A helper that creates a two-column table from a map.
map_to_lookup_table : Map Any Any -> Text -> Text -> Table
map_to_lookup_table map key_column value_column =
    keys_and_values = map.to_vector
    Table.new [[key_column, keys_and_values.map .first], [value_column, keys_and_values.map .second]]
## PRIVATE
Conversion method to a Table from a Column.
Table.from (that:Column) = that.to_table
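
How `replace` treats its `lookup_table` argument can be summarised as a small resolution step: a `Map` is converted to a two-column table with columns `from` and `to` and must not be combined with explicit `from_column` or `to_column`, while a `Table` falls back to column indices 0 and 1. The Python sketch below illustrates that rule under a few assumptions: a `dict` stands in for `Map`, a list of `[name, values]` pairs stands in for `Table`, `ValueError` stands in for `Illegal_Argument`, and the helper name `resolve_lookup` is hypothetical. Key order follows Python's insertion order, which may differ from the order `Map.to_vector` produces in Enso.

```python
def resolve_lookup(lookup, from_column=None, to_column=None):
    """Return (lookup_table, from_column, to_column) with defaults applied."""
    if isinstance(lookup, dict):
        if from_column is not None or to_column is not None:
            raise ValueError(
                "If a Map is provided as the lookup_table, then from_column "
                "and to_column should not also be specified."
            )
        # Analogue of map_to_lookup_table: one column of keys, one of values.
        table = [["from", list(lookup.keys())], ["to", list(lookup.values())]]
        return table, "from", "to"
    # Table case: default to the first two columns by position.
    resolved_from = 0 if from_column is None else from_column
    resolved_to = 1 if to_column is None else to_column
    return lookup, resolved_from, resolved_to


print(resolve_lookup({2: 20, 1: 10}))
# ([['from', [2, 1]], ['to', [20, 10]]], 'from', 'to')
```

Using `None` as the "not specified" marker is what allows the `Map` branch to tell an omitted argument apart from an explicit `0` or `1`, which matches the switch to `Nothing` defaults in the in-memory signature.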

@ -19,7 +19,7 @@ type Data
setup create_connection_fn =
Data.Value (create_connection_fn Nothing)
teardown self = self.connection.close

@ -0,0 +1,112 @@
from Standard.Base import all
import Standard.Base.Errors.Illegal_Argument.Illegal_Argument
from Standard.Table import all
from Standard.Table.Errors import all
from Standard.Database import all
from Standard.Test_New import all
from project.Common_Table_Operations.Util import run_default_backend
import project.Util
main = run_default_backend add_specs
type Data
Value ~connection
setup create_connection_fn =
Data.Value (create_connection_fn Nothing)
teardown self = self.connection.close
add_specs suite_builder setup =
prefix = setup.prefix
create_connection_fn = setup.create_connection_func
suite_builder.group prefix+"Table.replace" group_builder->
data = Data.setup create_connection_fn
group_builder.teardown <|
data.teardown
table_builder cols =
setup.table_builder cols connection=data.connection
group_builder.specify "should be able to replace values via a lookup table, using from/to column defaults" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected
group_builder.specify "should be able to replace values via a lookup table, specifying from/to columns" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['d', [4, 5, 6, 7]], ['x', [2, 1, 4, 3]], ['d2', [5, 6, 7, 8]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x' 'x' 'z'
result . should_equal expected
group_builder.specify "should be able to replace values via a lookup table provided as a Map" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']]]
result = table.replace lookup_table 'x'
result . should_equal expected
group_builder.specify "should fail with Missing_Input_Columns if the specified columns do not exist" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'q' 'x' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'q' 'z' . should_fail_with Missing_Input_Columns
table.replace lookup_table 'x' 'x' 'q' . should_fail_with Missing_Input_Columns
group_builder.specify "can allow unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
expected = table_builder [['x', [10, 2, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'x'
result . should_equal expected
group_builder.specify "fails on unmatched rows" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [4, 3, 1]], ['z', [40, 30, 10]]]
table.replace lookup_table 'x' allow_unmatched_rows=False . should_fail_with Unmatched_Rows_In_Lookup
group_builder.specify "fails on non-unique keys" <|
table = table_builder [['x', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 1, 3]], ['z', [20, 10, 40, 11, 30]]]
table.replace lookup_table 'x' . should_fail_with Non_Unique_Key
group_builder.specify "should avoid name clashes in the (internally) generated column name" <|
table = table_builder [['duplicate_key', [1, 2, 3, 4]], ['y', ['a', 'b', 'c', 'd']]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]]]
expected = table_builder [['duplicate_key', [10, 20, 30, 40]], ['y', ['a', 'b', 'c', 'd']]]
result = table.replace lookup_table 'duplicate_key'
result . should_equal expected
group_builder.specify "(edge-case) should allow lookup with itself" <|
table = table_builder [['x', [2, 1, 4, 3]], ['y', [20, 10, 40, 30]]]
expected = table_builder [['x', [20, 10, 40, 30]], ['y', [20, 10, 40, 30]]]
result = table.replace table 'x'
result . should_equal expected
group_builder.specify "should not merge columns other than the one specified in the `column` param" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]]
lookup_table = table_builder [['x', [2, 1, 4, 3]], ['z', [20, 10, 40, 30]], ['q', [40, 50, 60, 70]]]
expected = table_builder [['x', [10, 20, 30, 40, 20]], ['y', ['a', 'b', 'c', 'd', 'e']], ['q', [4, 5, 6, 7, 8]]]
result = table.replace lookup_table 'x'
result . should_equal expected
group_builder.specify "should fail on null key values in lookup table" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = table_builder [['x', [2, 1, Nothing, 3]], ['z', [20, 10, 40, 30]]]
table.replace lookup_table 'x' . should_fail_with Null_Values_In_Key_Columns
group_builder.specify "should not allow from/to_column to be specified if the argument is a Map" <|
table = table_builder [['x', [1, 2, 3, 4, 2]], ['y', ['a', 'b', 'c', 'd', 'e']]]
lookup_table = Map.from_vector [[2, 20], [1, 10], [4, 40], [3, 30]]
table.replace lookup_table 'x' from_column=8 . should_fail_with Illegal_Argument
table.replace lookup_table 'x' to_column=9 . should_fail_with Illegal_Argument
table.replace lookup_table 'x' from_column=8 to_column=9 . should_fail_with Illegal_Argument