mirror of https://github.com/enso-org/enso.git
synced 2024-11-22 22:10:15 +03:00

Add Table.cross_join and Table.zip to In-Memory Table (#4063)

Implements https://www.pivotaltracker.com/story/show/184239059

This commit is contained in:
parent aa995110e9
commit d2e57edc8b
@ -280,6 +280,8 @@
  to the types.][4026]
- [Implemented `Table.distinct` for Database backends.][4027]
- [Implemented `Table.union` for the in-memory backend.][4052]
- [Implemented `Table.cross_join` and `Table.zip` for the in-memory
  backend.][4063]

[debug-shortcuts]:
  https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug

@ -438,6 +440,7 @@
[4027]: https://github.com/enso-org/enso/pull/4027
[4044]: https://github.com/enso-org/enso/pull/4044
[4052]: https://github.com/enso-org/enso/pull/4052
[4063]: https://github.com/enso-org/enso/pull/4063

#### Enso Compiler
@ -788,6 +788,94 @@ type Table
        problem_builder.attach_problems_before on_problems <|
            self.connection.dialect.prepare_join self.connection sql_join_kind new_table_name left_setup.subquery right_setup.subquery on_expressions where_expressions columns_to_select=result_columns

    ## ALIAS Cartesian Join
       Joins tables by pairing every row of the left table with every row of
       the right table.

       Arguments:
       - right: The table to join with.
       - right_row_limit: If the number of rows in the right table exceeds
         this, a `Cross_Join_Row_Limit_Exceeded` problem is raised. The check
         exists to avoid accidentally exploding the size of the table. It can
         be disabled by setting this parameter to `Nothing`.
       - right_prefix: The prefix added to right table column names in case of
         a name conflict. See "Column Renaming" below for more information.
       - on_problems: Specifies how to handle problems if they occur, reporting
         them as warnings by default.

         - If the `right` table has more rows than `right_row_limit` allows,
           a `Cross_Join_Row_Limit_Exceeded` is reported. In warning/ignore
           mode, the join is still executed.

       ? Column Renaming

         If columns from the two tables have colliding names, a prefix (by
         default `Right_`) is added to the name of the column from the right
         table. The left column remains unchanged. If the new name is already
         taken, it is resolved using the normal renaming strategy - appending
         `_1`, `_2`, etc.

       ? Result Ordering

         Rows in the result are ordered first by the order of the
         corresponding rows from the left table and then by the order of rows
         from the right table. This applies only if the order of the rows was
         specified (for example, by sorting the table); in-memory tables keep
         their memory layout order, while for database tables the order may be
         unspecified.
    cross_join : Table -> Integer | Nothing -> Text -> Problem_Behavior -> Table
    cross_join self right right_row_limit=100 right_prefix="Right_" on_problems=Report_Warning =
        _ = [right, right_row_limit, right_prefix, on_problems]
        Error.throw (Unsupported_Database_Operation.Error "Table.cross_join is not implemented yet for the Database backends.")

    ## ALIAS Join By Row Position
       Joins two tables by zipping rows from both tables together - the first
       row of the left table is paired with the first row of the right table,
       and so on.

       Arguments:
       - right: The table to join with.
       - keep_unmatched: If set to `True`, the result has as many rows as the
         larger of the two tables - the trailing rows of the larger table get
         `Nothing` values for the columns of the smaller one. If set to
         `False`, the result has as many rows as the smaller of the two
         tables - the additional rows of the larger table are discarded. The
         default value, `Report_Unmatched`, means that the two tables are
         expected to have the same number of rows; if they do not, the
         behaviour is the same as for `True` - the unmatched rows are kept
         with `Nothing` values for the other table - but a
         `Row_Count_Mismatch` problem is also reported.
       - right_prefix: The prefix added to right table column names in case of
         a name conflict. See "Column Renaming" below for more information.
       - on_problems: Specifies how to handle problems if they occur, reporting
         them as warnings by default.

         - If the tables have a different number of rows and `keep_unmatched`
           is set to `Report_Unmatched`, the join reports `Row_Count_Mismatch`.

       ? Column Renaming

         If columns from the two tables have colliding names, a prefix (by
         default `Right_`) is added to the name of the column from the right
         table. The left column remains unchanged. If the new name is already
         taken, it is resolved using the normal renaming strategy - appending
         `_1`, `_2`, etc.

       ? Row Ordering

         This operation requires a well-defined order of rows in the input
         tables. In-memory tables rely on the ordering stemming directly from
         their layout in memory. Database tables may not impose a
         deterministic ordering. If the table defines a primary key, it is
         used by default to ensure deterministic ordering. That can be
         overridden by specifying a different ordering using
         `Table.order_by`. If no primary key was defined and no ordering was
         specified explicitly by the user, the order of rows is undefined and
         the operation will fail, reporting an `Undefined_Column_Order`
         problem and returning an empty table.
    zip : Table -> Boolean | Report_Unmatched -> Text -> Problem_Behavior -> Table
    zip self right keep_unmatched=Report_Unmatched right_prefix="Right_" on_problems=Report_Warning =
        _ = [right, keep_unmatched, right_prefix, on_problems]
        Error.throw (Unsupported_Database_Operation.Error "Table.zip is not implemented yet for the Database backends.")

    ## ALIAS append, concat
       Appends records from other table(s) to this table.
@ -42,7 +42,7 @@ import project.Delimited.Delimited_Format.Delimited_Format

from project.Data.Column_Type_Selection import Column_Type_Selection, Auto
from project.Internal.Rows_View import Rows_View
-from project.Errors import Column_Count_Mismatch, Missing_Input_Columns, Column_Indexes_Out_Of_Range, Duplicate_Type_Selector, No_Index_Set_Error, No_Such_Column, No_Input_Columns_Selected, No_Output_Columns, Invalid_Value_Type
+from project.Errors import Column_Count_Mismatch, Missing_Input_Columns, Column_Indexes_Out_Of_Range, Duplicate_Type_Selector, No_Index_Set_Error, No_Such_Column, No_Input_Columns_Selected, No_Output_Columns, Invalid_Value_Type, Cross_Join_Row_Limit_Exceeded, Row_Count_Mismatch

from project.Data.Column import get_item_string
from project.Internal.Filter_Condition_Helpers import make_filter_column
@ -1113,6 +1113,113 @@ type Table
        problems = new_java_table.getProblems
        Java_Problems.parse_aggregated_problems problems

    ## ALIAS Cartesian Join
       Joins tables by pairing every row of the left table with every row of
       the right table.

       Arguments:
       - right: The table to join with.
       - right_row_limit: If the number of rows in the right table exceeds
         this, a `Cross_Join_Row_Limit_Exceeded` problem is raised. The check
         exists to avoid accidentally exploding the size of the table. It can
         be disabled by setting this parameter to `Nothing`.
       - right_prefix: The prefix added to right table column names in case of
         a name conflict. See "Column Renaming" below for more information.
       - on_problems: Specifies how to handle problems if they occur, reporting
         them as warnings by default.

         - If the `right` table has more rows than `right_row_limit` allows,
           a `Cross_Join_Row_Limit_Exceeded` is reported. In warning/ignore
           mode, the join is still executed.

       ? Column Renaming

         If columns from the two tables have colliding names, a prefix (by
         default `Right_`) is added to the name of the column from the right
         table. The left column remains unchanged. If the new name is already
         taken, it is resolved using the normal renaming strategy - appending
         `_1`, `_2`, etc.

       ? Result Ordering

         Rows in the result are ordered first by the order of the
         corresponding rows from the left table and then by the order of rows
         from the right table. This applies only if the order of the rows was
         specified (for example, by sorting the table); in-memory tables keep
         their memory layout order, while for database tables the order may be
         unspecified.
    cross_join : Table -> Integer | Nothing -> Text -> Problem_Behavior -> Table
    cross_join self right right_row_limit=100 right_prefix="Right_" on_problems=Report_Warning =
        if check_table "right" right then
            limit_problems = case right_row_limit.is_nothing.not && (right.row_count > right_row_limit) of
                True ->
                    [Cross_Join_Row_Limit_Exceeded.Error right_row_limit right.row_count]
                False -> []
            on_problems.attach_problems_before limit_problems <|
                new_java_table = self.java_table.crossJoin right.java_table right_prefix
                renaming_problems = new_java_table.getProblems |> Java_Problems.parse_aggregated_problems
                on_problems.attach_problems_before renaming_problems (Table.Value new_java_table)
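The in-memory implementation above delegates the actual row pairing to the Java `crossJoin` helper. The semantics it guarantees (left-major ordering, `left.row_count * right.row_count` result rows) can be sketched as a standalone snippet; this is a hypothetical illustration, not the Table library code, and the class and method names are invented:

```java
import java.util.ArrayList;
import java.util.List;

public class CrossJoinSketch {
    // Pairs every left row index with every right row index, in left-major
    // order, mirroring the "Result Ordering" guarantee documented above.
    static List<int[]> crossJoinIndices(int leftRows, int rightRows) {
        List<int[]> pairs = new ArrayList<>(leftRows * rightRows);
        for (int l = 0; l < leftRows; l++) {
            for (int r = 0; r < rightRows; r++) {
                pairs.add(new int[] {l, r});
            }
        }
        return pairs;
    }
}
```

The multiplicative growth of the result is exactly why `right_row_limit` exists: a 10,000-row table cross-joined with itself already yields 100,000,000 rows.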

    ## ALIAS Join By Row Position
       Joins two tables by zipping rows from both tables together - the first
       row of the left table is paired with the first row of the right table,
       and so on.

       Arguments:
       - right: The table to join with.
       - keep_unmatched: If set to `True`, the result has as many rows as the
         larger of the two tables - the trailing rows of the larger table get
         `Nothing` values for the columns of the smaller one. If set to
         `False`, the result has as many rows as the smaller of the two
         tables - the additional rows of the larger table are discarded. The
         default value, `Report_Unmatched`, means that the two tables are
         expected to have the same number of rows; if they do not, the
         behaviour is the same as for `True` - the unmatched rows are kept
         with `Nothing` values for the other table - but a
         `Row_Count_Mismatch` problem is also reported.
       - right_prefix: The prefix added to right table column names in case of
         a name conflict. See "Column Renaming" below for more information.
       - on_problems: Specifies how to handle problems if they occur, reporting
         them as warnings by default.

         - If the tables have a different number of rows and `keep_unmatched`
           is set to `Report_Unmatched`, the join reports `Row_Count_Mismatch`.

       ? Column Renaming

         If columns from the two tables have colliding names, a prefix (by
         default `Right_`) is added to the name of the column from the right
         table. The left column remains unchanged. If the new name is already
         taken, it is resolved using the normal renaming strategy - appending
         `_1`, `_2`, etc.

       ? Row Ordering

         This operation requires a well-defined order of rows in the input
         tables. In-memory tables rely on the ordering stemming directly from
         their layout in memory. Database tables may not impose a
         deterministic ordering. If the table defines a primary key, it is
         used by default to ensure deterministic ordering. That can be
         overridden by specifying a different ordering using
         `Table.order_by`. If no primary key was defined and no ordering was
         specified explicitly by the user, the order of rows is undefined and
         the operation will fail, reporting an `Undefined_Column_Order`
         problem and returning an empty table.
    zip : Table -> Boolean | Report_Unmatched -> Text -> Problem_Behavior -> Table
    zip self right keep_unmatched=Report_Unmatched right_prefix="Right_" on_problems=Report_Warning =
        if check_table "right" right then
            keep_unmatched_bool = case keep_unmatched of
                Report_Unmatched -> True
                b : Boolean -> b
            report_mismatch = keep_unmatched == Report_Unmatched

            left_row_count = self.row_count
            right_row_count = right.row_count
            problems = if (left_row_count == right_row_count) || report_mismatch.not then [] else
                [Row_Count_Mismatch.Error left_row_count right_row_count]
            on_problems.attach_problems_before problems <|
                new_java_table = self.java_table.zip right.java_table keep_unmatched_bool right_prefix
                renaming_problems = new_java_table.getProblems |> Java_Problems.parse_aggregated_problems
                on_problems.attach_problems_before renaming_problems (Table.Value new_java_table)
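The padding rule described for `keep_unmatched` can be sketched in Java: the result length is the maximum of the two row counts when unmatched rows are kept, the minimum otherwise, and a kept unmatched row pairs a real index with a "not found" placeholder. This is a hypothetical illustration of the semantics, not the actual `zip` helper:

```java
import java.util.ArrayList;
import java.util.List;

public class ZipSketch {
    static final int NOT_FOUND = -1; // placeholder for a missing row

    // Pairs the i-th left row with the i-th right row. With keepUnmatched the
    // longer table's extra rows are paired with NOT_FOUND (rendered as
    // Nothing in the resulting table); without it, they are dropped.
    static List<int[]> zipIndices(int leftRows, int rightRows, boolean keepUnmatched) {
        int limit = keepUnmatched ? Math.max(leftRows, rightRows) : Math.min(leftRows, rightRows);
        List<int[]> pairs = new ArrayList<>(limit);
        for (int i = 0; i < limit; i++) {
            pairs.add(new int[] {i < leftRows ? i : NOT_FOUND, i < rightRows ? i : NOT_FOUND});
        }
        return pairs;
    }
}
```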

    ## ALIAS append, concat
       Appends records from other table(s) to this table.
@ -358,3 +358,20 @@ type Unmatched_Columns
    to_display_text : Text
    to_display_text self =
        "The following columns were not present in some of the provided tables: " + (self.column_names.map (n -> "["+n+"]") . join ", ") + ". The missing values have been filled with `Nothing`."

type Cross_Join_Row_Limit_Exceeded
    ## Indicates that a `cross_join` has been attempted where the right table
       has more rows than allowed by the limit.
    Error (limit : Integer) (existing_rows : Integer)

    to_display_text : Text
    to_display_text self =
        "The cross join operation exceeded the maximum number of rows allowed. The limit is "+self.limit.to_text+" and the number of rows in the right table was "+self.existing_rows.to_text+". The limit may be turned off by setting the `right_row_limit` option to `Nothing`."

type Row_Count_Mismatch
    ## Indicates that the row counts of zipped tables do not match.
    Error (left_rows : Integer) (right_rows : Integer)

    to_display_text : Text
    to_display_text self =
        "The number of rows in the left table ("+self.left_rows.to_text+") does not match the number of rows in the right table ("+self.right_rows.to_text+")."
@ -2,6 +2,10 @@ from Standard.Base import all

import project.Extensions

## Returns the values of warnings attached to the value.
get_attached_warnings v =
    Warning.get_all v . map .value

## UNSTABLE
   Tests how a specific operation behaves depending on the requested
   `Problem_Behavior`.

@ -58,12 +62,10 @@ test_advanced_problem_handling action error_checker warnings_checker result_chec
    # Lastly, we check the report warnings mode and ensure that both the result is correct and the warnings are as expected.
    result_warning = action Problem_Behavior.Report_Warning
    result_checker result_warning
-   warnings = Warning.get_all result_warning . map .value
-   warnings_checker warnings
+   warnings_checker (get_attached_warnings result_warning)

## UNSTABLE
   Checks if the provided value does not have any attached problems.
assume_no_problems result =
    result.is_error.should_be_false
-   warnings = Warning.get_all result . map .value
-   warnings.should_equal []
+   (get_attached_warnings result).should_equal []
@ -1,4 +1,4 @@
-package org.enso.base.text;
+package org.enso.base.arrays;

/** A helper to efficiently build an array of unboxed integers of arbitrary length. */
public class IntArrayBuilder {

@ -62,4 +62,10 @@ public class IntArrayBuilder {
    this.storage = null;
    return tmp;
  }

  /** Returns a copy of the accumulated elements, trimmed to the current length. */
  public int[] build() {
    int[] result = new int[length];
    System.arraycopy(storage, 0, result, 0, length);
    return result;
  }
}
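The `build` method added above copies exactly `length` elements out of a possibly larger backing array. The grow-and-copy builder pattern behind `IntArrayBuilder` can be sketched in a minimal self-contained form (an invented class for illustration, not the library's implementation):

```java
import java.util.Arrays;

// Minimal sketch of the grow-and-copy pattern: append into a backing array,
// doubling capacity when full; build() copies out exactly `length` elements
// so the result carries no trailing slack.
public class MiniIntArrayBuilder {
    private int[] storage = new int[4];
    private int length = 0;

    public void add(int value) {
        if (length == storage.length) {
            storage = Arrays.copyOf(storage, storage.length * 2);
        }
        storage[length++] = value;
    }

    public int[] build() {
        return Arrays.copyOf(storage, length);
    }
}
```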
@ -3,6 +3,8 @@ package org.enso.base.text;

import com.ibm.icu.text.BreakIterator;
import com.ibm.icu.text.CaseMap;
import com.ibm.icu.text.CaseMap.Fold;
import org.enso.base.arrays.IntArrayBuilder;

import java.util.Locale;

/**
@ -1,5 +1,7 @@
package org.enso.table.data.column.builder.object;

+import java.util.Arrays;
+import java.util.BitSet;
import org.enso.base.polyglot.NumericConverter;
import org.enso.table.data.column.storage.BoolStorage;
import org.enso.table.data.column.storage.DoubleStorage;

@ -7,9 +9,6 @@ import org.enso.table.data.column.storage.LongStorage;
import org.enso.table.data.column.storage.Storage;
import org.enso.table.util.BitSets;

-import java.util.Arrays;
-import java.util.BitSet;

/** A builder for numeric columns. */
public class NumericBuilder extends TypedBuilder {
  private final BitSet isMissing = new BitSet();

@ -103,11 +102,11 @@ public class NumericBuilder extends TypedBuilder {

  @Override
  public void appendBulkStorage(Storage<?> storage) {
    if (isDouble) {
      appendBulkDouble(storage);
    } else {
      appendBulkLong(storage);
    }
  }

  private void ensureFreeSpaceFor(int additionalSize) {

@ -125,7 +124,10 @@ public class NumericBuilder extends TypedBuilder {
        BitSets.copy(doubleStorage.getIsMissing(), isMissing, currentSize, n);
        currentSize += n;
      } else {
-       throw new IllegalStateException("Unexpected storage implementation for type DOUBLE: " + storage + ". This is a bug in the Table library.");
+       throw new IllegalStateException(
+           "Unexpected storage implementation for type DOUBLE: "
+               + storage
+               + ". This is a bug in the Table library.");
      }
    } else if (storage.getType() == Storage.Type.LONG) {
      if (storage instanceof LongStorage longStorage) {

@ -135,7 +137,10 @@ public class NumericBuilder extends TypedBuilder {
          data[currentSize++] = Double.doubleToRawLongBits(longStorage.getItem(i));
        }
      } else {
-       throw new IllegalStateException("Unexpected storage implementation for type LONG: " + storage + ". This is a bug in the Table library.");
+       throw new IllegalStateException(
+           "Unexpected storage implementation for type LONG: "
+               + storage
+               + ". This is a bug in the Table library.");
      }
    } else if (storage.getType() == Storage.Type.BOOL) {
      if (storage instanceof BoolStorage boolStorage) {

@ -149,7 +154,10 @@ public class NumericBuilder extends TypedBuilder {
          }
        }
      } else {
-       throw new IllegalStateException("Unexpected storage implementation for type BOOLEAN: " + storage + ". This is a bug in the Table library.");
+       throw new IllegalStateException(
+           "Unexpected storage implementation for type BOOLEAN: "
+               + storage
+               + ". This is a bug in the Table library.");
      }
    } else {
      throw new StorageTypeMismatch(getType(), storage.getType());

@ -165,7 +173,10 @@ public class NumericBuilder extends TypedBuilder {
        BitSets.copy(longStorage.getIsMissing(), isMissing, currentSize, n);
        currentSize += n;
      } else {
-       throw new IllegalStateException("Unexpected storage implementation for type DOUBLE: " + storage + ". This is a bug in the Table library.");
+       throw new IllegalStateException(
+           "Unexpected storage implementation for type DOUBLE: "
+               + storage
+               + ". This is a bug in the Table library.");
      }
    } else if (storage.getType() == Storage.Type.BOOL) {
      if (storage instanceof BoolStorage boolStorage) {

@ -178,7 +189,10 @@ public class NumericBuilder extends TypedBuilder {
          }
        }
      } else {
-       throw new IllegalStateException("Unexpected storage implementation for type BOOLEAN: " + storage + ". This is a bug in the Table library.");
+       throw new IllegalStateException(
+           "Unexpected storage implementation for type BOOLEAN: "
+               + storage
+               + ". This is a bug in the Table library.");
      }
    } else {
      throw new StorageTypeMismatch(getType(), storage.getType());
@ -5,6 +5,8 @@ import java.util.List;
import java.util.function.IntFunction;

import org.enso.base.polyglot.Polyglot_Utils;
import org.enso.table.data.column.builder.object.BoolBuilder;
import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.InferredBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.MapOperation;

@ -364,6 +366,11 @@ public final class BoolStorage extends Storage<Boolean> {
        negated);
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new BoolBuilder(capacity);
  }

  @Override
  public BoolStorage slice(List<SliceRange> ranges) {
    int newSize = SliceRange.totalLength(ranges);
@ -1,6 +1,9 @@
package org.enso.table.data.column.storage;

import java.time.LocalDate;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.DateBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.SpecializedIsInOp;
import org.enso.table.data.column.operation.map.datetime.DateTimeIsInOp;

@ -36,4 +39,9 @@ public final class DateStorage extends SpecializedStorage<LocalDate> {
  public int getType() {
    return Type.DATE;
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new DateBuilder(capacity);
  }
}
@ -1,5 +1,7 @@
package org.enso.table.data.column.storage;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.DateTimeBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.SpecializedIsInOp;
import org.enso.table.data.column.operation.map.datetime.DateTimeIsInOp;

@ -39,4 +41,9 @@ public final class DateTimeStorage extends SpecializedStorage<ZonedDateTime> {
  public int getType() {
    return Type.DATE_TIME;
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new DateTimeBuilder(capacity);
  }
}
@ -2,6 +2,8 @@ package org.enso.table.data.column.storage;

import java.util.BitSet;
import java.util.List;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.NumericBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.UnaryMapOperation;

@ -296,6 +298,11 @@ public final class DoubleStorage extends NumericStorage<Double> {
    return new DoubleStorage(newData, newSize, newMask);
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return NumericBuilder.createDoubleBuilder(capacity);
  }

  @Override
  public Storage<Double> slice(List<SliceRange> ranges) {
    int newSize = SliceRange.totalLength(ranges);
@ -2,6 +2,8 @@ package org.enso.table.data.column.storage;

import java.util.BitSet;
import java.util.List;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.NumericBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.UnaryMapOperation;

@ -356,6 +358,11 @@ public final class LongStorage extends NumericStorage<Long> {
    return new LongStorage(newData, newSize, newMask);
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return NumericBuilder.createLongBuilder(capacity);
  }

  @Override
  public LongStorage slice(List<SliceRange> ranges) {
    int newSize = SliceRange.totalLength(ranges);
@ -1,6 +1,9 @@
package org.enso.table.data.column.storage;

import java.util.BitSet;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.ObjectBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.UnaryMapOperation;

@ -29,6 +32,11 @@ public final class ObjectStorage extends SpecializedStorage<Object> {
    return Type.OBJECT;
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new ObjectBuilder(capacity);
  }

  private static final MapOpStorage<Object, SpecializedStorage<Object>> ops = buildObjectOps();

  static <T, S extends SpecializedStorage<T>> MapOpStorage<T, S> buildObjectOps() {
@ -257,6 +257,24 @@ public abstract class Storage<T> {
  /** @return a copy of the storage containing a slice of the original data */
  public abstract Storage<T> slice(int offset, int limit);

  /**
   * @return a new storage instance, containing the same elements as this one, with {@code count}
   *     nulls appended at the end
   */
  public Storage<?> appendNulls(int count) {
    Builder builder = new InferredBuilder(size() + count);
    builder.appendBulkStorage(this);
    builder.appendNulls(count);
    return builder.seal();
  }

  /**
   * Creates a builder that is capable of creating storages of the same type as the current one.
   *
   * <p>This is useful for example when copying the current storage with some modifications.
   */
  public abstract Builder createDefaultBuilderOfSameType(int capacity);

  /** @return a copy of the storage consisting of slices of the original data */
  public abstract Storage<T> slice(List<SliceRange> ranges);
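The default `appendNulls` above never mutates the receiver: it rebuilds the data through a builder and pads the tail with nulls, returning a fresh storage. The same copy-then-pad pattern can be sketched on a plain list (a hypothetical helper, not the Table library API):

```java
import java.util.ArrayList;
import java.util.List;

public class AppendNullsSketch {
    // Returns a new list with `count` nulls appended, leaving the input
    // untouched - mirroring how appendNulls returns a fresh Storage
    // instead of modifying the existing one.
    static <T> List<T> appendNulls(List<T> storage, int count) {
        List<T> result = new ArrayList<>(storage.size() + count);
        result.addAll(storage);
        for (int i = 0; i < count; i++) {
            result.add(null);
        }
        return result;
    }
}
```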
@ -3,6 +3,7 @@ package org.enso.table.data.column.storage;
import java.util.BitSet;
import java.util.HashSet;
import org.enso.base.Text_Utils;
import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.StringBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.MapOperation;

@ -60,6 +61,11 @@ public final class StringStorage extends SpecializedStorage<String> {
    }
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new StringBuilder(capacity);
  }

  private static MapOpStorage<String, SpecializedStorage<String>> buildOps() {
    MapOpStorage<String, SpecializedStorage<String>> t = ObjectStorage.buildObjectOps();
    t.add(
@ -1,6 +1,9 @@
package org.enso.table.data.column.storage;

import java.time.LocalTime;

import org.enso.table.data.column.builder.object.Builder;
import org.enso.table.data.column.builder.object.TimeOfDayBuilder;
import org.enso.table.data.column.operation.map.MapOpStorage;
import org.enso.table.data.column.operation.map.SpecializedIsInOp;
import org.enso.table.data.column.operation.map.datetime.DateTimeIsInOp;

@ -36,4 +39,9 @@ public final class TimeOfDayStorage extends SpecializedStorage<LocalTime> {
  public int getType() {
    return Type.TIME_OF_DAY;
  }

  @Override
  public Builder createDefaultBuilderOfSameType(int capacity) {
    return new TimeOfDayBuilder(capacity);
  }
}
@ -46,4 +46,18 @@ public class OrderMask {
    }
    return new OrderMask(result);
  }

  /** Concatenates the positions of the given masks into a single mask. */
  public static OrderMask concat(List<OrderMask> masks) {
    int size = 0;
    for (OrderMask mask : masks) {
      size += mask.positions.length;
    }
    int[] result = new int[size];
    int offset = 0;
    for (OrderMask mask : masks) {
      System.arraycopy(mask.positions, 0, result, offset, mask.positions.length);
      offset += mask.positions.length;
    }
    return new OrderMask(result);
  }
}
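An order mask is a list of source-row positions: applying it to a column produces output row `i` from the source value at `positions[i]`. The following standalone sketch illustrates that application (hypothetical code, with a negative position standing in for a missing row such as the padding produced by `zip`; it is not the library's actual materialization logic):

```java
public class OrderMaskSketch {
    // Applies an order mask to a source column: output row i takes the value
    // at positions[i]; a negative position yields null (a missing row).
    static Integer[] apply(Integer[] source, int[] positions) {
        Integer[] result = new Integer[positions.length];
        for (int i = 0; i < positions.length; i++) {
            result[i] = positions[i] < 0 ? null : source[positions[i]];
        }
        return result;
    }
}
```

Concatenating masks, as `OrderMask.concat` does, then corresponds to concatenating the reordered row ranges of the result.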
@ -158,4 +158,20 @@ public class Column {
  public Column duplicateCount() {
    return new Column(name + "_duplicate_count", storage.duplicateCount());
  }

  /**
   * Resizes the given column to the provided new size.
   *
   * <p>If the new size is smaller than the current size, the column is truncated. If the new size
   * is larger than the current size, the column is padded with nulls.
   */
  public Column resize(int newSize) {
    if (newSize == getSize()) {
      return this;
    } else if (newSize < getSize()) {
      return slice(0, newSize);
    } else {
      int nullsToAdd = newSize - getSize();
      return new Column(name, storage.appendNulls(nullsToAdd));
    }
  }
}
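The three-way `resize` rule (same size: return unchanged; smaller: truncate; larger: pad with nulls) maps neatly onto `Arrays.copyOf` for boxed arrays, which both truncates and null-pads. A minimal sketch of the same semantics (an invented helper, not the `Column` class):

```java
import java.util.Arrays;

public class ResizeSketch {
    // Mirrors Column.resize: truncate when shrinking, pad with nulls when
    // growing, and return the input unchanged when the size already matches.
    static Integer[] resize(Integer[] column, int newSize) {
        if (newSize == column.length) {
            return column;
        }
        // Arrays.copyOf truncates for a smaller size and pads with null for a
        // larger one, matching both remaining branches at once.
        return Arrays.copyOf(column, newSize);
    }
}
```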
@ -12,10 +12,7 @@ import org.enso.table.data.index.Index;
|
||||
import org.enso.table.data.index.MultiValueIndex;
|
||||
import org.enso.table.data.mask.OrderMask;
|
||||
import org.enso.table.data.mask.SliceRange;
|
||||
import org.enso.table.data.table.join.IndexJoin;
|
||||
import org.enso.table.data.table.join.JoinCondition;
|
||||
import org.enso.table.data.table.join.JoinResult;
|
||||
import org.enso.table.data.table.join.JoinStrategy;
|
||||
import org.enso.table.data.table.join.*;
|
||||
import org.enso.table.problems.AggregatedProblems;
|
||||
import org.enso.table.error.UnexpectedColumnTypeException;
|
||||
import org.enso.table.operations.Distinct;
|
||||
@ -219,57 +216,47 @@ public class Table {
   */
  public Table join(Table right, List<JoinCondition> conditions, boolean keepLeftUnmatched, boolean keepMatched, boolean keepRightUnmatched, boolean includeLeftColumns, boolean includeRightColumns, List<String> rightColumnsToDrop, String right_prefix, Comparator<Object> objectComparator, BiFunction<Object, Object, Boolean> equalityFallback) {
    NameDeduplicator nameDeduplicator = new NameDeduplicator();
    JoinResult joinResult = null;
    // Only compute the join if there are any results to be returned.
    if (keepLeftUnmatched || keepMatched || keepRightUnmatched) {
      JoinStrategy strategy = new IndexJoin(objectComparator, equalityFallback);
      joinResult = strategy.join(this, right, conditions);
    if (!keepLeftUnmatched && !keepMatched && !keepRightUnmatched) {
      throw new IllegalArgumentException("At least one of keepLeftUnmatched, keepMatched or keepRightUnmatched must be true.");
    }

    List<Integer> leftRows = new ArrayList<>();
    List<Integer> rightRows = new ArrayList<>();
    JoinStrategy strategy = new IndexJoin(objectComparator, equalityFallback);
    JoinResult joinResult = strategy.join(this, right, conditions);

    List<JoinResult> resultsToKeep = new ArrayList<>();

    if (keepMatched) {
      for (var match : joinResult.matchedRows()) {
        leftRows.add(match.getLeft());
        rightRows.add(match.getRight());
      }
      resultsToKeep.add(joinResult);
    }

    if (keepLeftUnmatched) {
      HashSet<Integer> matchedLeftRows = new HashSet<>();
      for (var match : joinResult.matchedRows()) {
        matchedLeftRows.add(match.getLeft());
      }

      Set<Integer> matchedLeftRows = joinResult.leftMatchedRows();
      JoinResult.Builder leftUnmatchedBuilder = new JoinResult.Builder();
      for (int i = 0; i < this.rowCount(); i++) {
        if (!matchedLeftRows.contains(i)) {
          leftRows.add(i);
          rightRows.add(Index.NOT_FOUND);
          leftUnmatchedBuilder.addRow(i, Index.NOT_FOUND);
        }
      }

      resultsToKeep.add(leftUnmatchedBuilder.build(AggregatedProblems.of()));
    }

    if (keepRightUnmatched) {
      HashSet<Integer> matchedRightRows = new HashSet<>();
      for (var match : joinResult.matchedRows()) {
        matchedRightRows.add(match.getRight());
      }

      Set<Integer> matchedRightRows = joinResult.rightMatchedRows();
      JoinResult.Builder rightUnmatchedBuilder = new JoinResult.Builder();
      for (int i = 0; i < right.rowCount(); i++) {
        if (!matchedRightRows.contains(i)) {
          leftRows.add(Index.NOT_FOUND);
          rightRows.add(i);
          rightUnmatchedBuilder.addRow(Index.NOT_FOUND, i);
        }
      }
    }

    OrderMask leftMask = OrderMask.fromList(leftRows);
    OrderMask rightMask = OrderMask.fromList(rightRows);
      resultsToKeep.add(rightUnmatchedBuilder.build(AggregatedProblems.of()));
    }

    List<Column> newColumns = new ArrayList<>();

    if (includeLeftColumns) {
      OrderMask leftMask = OrderMask.concat(resultsToKeep.stream().map(JoinResult::getLeftOrderMask).collect(Collectors.toList()));
      for (Column column : this.columns) {
        Column newColumn = column.applyMask(leftMask);
        newColumns.add(newColumn);
@ -277,6 +264,7 @@ public class Table {
    }

    if (includeRightColumns) {
      OrderMask rightMask = OrderMask.concat(resultsToKeep.stream().map(JoinResult::getRightOrderMask).collect(Collectors.toList()));
      List<String> leftColumnNames = newColumns.stream().map(Column::getName).collect(Collectors.toList());

      HashSet<String> toDrop = new HashSet<>(rightColumnsToDrop);
@ -288,17 +276,74 @@ public class Table {
      for (int i = 0; i < rightColumnsToKeep.size(); ++i) {
        Column column = rightColumnsToKeep.get(i);
        String newName = newRightColumnNames.get(i);
        Storage<?> newStorage = column.getStorage().applyMask(rightMask);
        Column newColumn = new Column(newName, newStorage);
        Column newColumn = column.applyMask(rightMask).rename(newName);
        newColumns.add(newColumn);
      }
    }

    AggregatedProblems joinProblems = joinResult != null ? joinResult.problems() : null;
    AggregatedProblems aggregatedProblems = AggregatedProblems.merge(joinProblems, AggregatedProblems.of(nameDeduplicator.getProblems()));
    AggregatedProblems aggregatedProblems = AggregatedProblems.merge(AggregatedProblems.of(nameDeduplicator.getProblems()), joinProblems);
    return new Table(newColumns.toArray(new Column[0]), aggregatedProblems);
  }

  /**
   * Performs a cross-join of this table with the right table.
   */
  public Table crossJoin(Table right, String rightPrefix) {
    NameDeduplicator nameDeduplicator = new NameDeduplicator();

    List<String> leftColumnNames = Arrays.stream(this.columns).map(Column::getName).collect(Collectors.toList());
    List<String> rightColumnNames = Arrays.stream(right.columns).map(Column::getName).collect(Collectors.toList());

    List<String> newRightColumnNames = nameDeduplicator.combineWithPrefix(leftColumnNames, rightColumnNames, rightPrefix);

    JoinResult joinResult = CrossJoin.perform(this.rowCount(), right.rowCount());
    OrderMask leftMask = joinResult.getLeftOrderMask();
    OrderMask rightMask = joinResult.getRightOrderMask();

    Column[] newColumns = new Column[this.columns.length + right.columns.length];

    int leftColumnCount = this.columns.length;
    int rightColumnCount = right.columns.length;
    for (int i = 0; i < leftColumnCount; i++) {
      newColumns[i] = this.columns[i].applyMask(leftMask);
    }
    for (int i = 0; i < rightColumnCount; i++) {
      newColumns[leftColumnCount + i] = right.columns[i].applyMask(rightMask).rename(newRightColumnNames.get(i));
    }

    AggregatedProblems aggregatedProblems = AggregatedProblems.merge(AggregatedProblems.of(nameDeduplicator.getProblems()), joinResult.problems());
    return new Table(newColumns, aggregatedProblems);
  }

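The cross join above pairs every left row index with every right row index and then masks each column by the resulting index lists. A minimal Python sketch of that semantics (the names here are illustrative, not the actual Java API):

```python
# Illustrative sketch of cross-join semantics: generate parallel left/right
# row-index lists, then "mask" each column by the indices for its side.
def cross_join_indices(left_count, right_count):
    left_idx, right_idx = [], []
    for l in range(left_count):
        for r in range(right_count):
            left_idx.append(l)
            right_idx.append(r)
    return left_idx, right_idx

def apply_mask(column, mask):
    # Reorder/duplicate the column's values according to the index mask.
    return [column[i] for i in mask]

left = {"X": [1, 2], "Y": [4, 5]}
right = {"Z": ["a", "b"]}
li, ri = cross_join_indices(2, 2)
result = {name: apply_mask(col, li) for name, col in left.items()}
result.update({name: apply_mask(col, ri) for name, col in right.items()})
```

Every left column is masked by the left index list and every right column by the right one, which is what keeps the row pairing consistent even when the input contains duplicate rows.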
  /**
   * Zips rows of this table with rows of the right table.
   */
  public Table zip(Table right, boolean keepUnmatched, String rightPrefix) {
    NameDeduplicator nameDeduplicator = new NameDeduplicator();

    int leftRowCount = this.rowCount();
    int rightRowCount = right.rowCount();
    int resultRowCount = keepUnmatched ? Math.max(leftRowCount, rightRowCount) : Math.min(leftRowCount, rightRowCount);

    List<String> leftColumnNames = Arrays.stream(this.columns).map(Column::getName).collect(Collectors.toList());
    List<String> rightColumnNames = Arrays.stream(right.columns).map(Column::getName).collect(Collectors.toList());
    List<String> newRightColumnNames = nameDeduplicator.combineWithPrefix(leftColumnNames, rightColumnNames, rightPrefix);

    Column[] newColumns = new Column[this.columns.length + right.columns.length];

    int leftColumnCount = this.columns.length;
    int rightColumnCount = right.columns.length;
    for (int i = 0; i < leftColumnCount; i++) {
      newColumns[i] = this.columns[i].resize(resultRowCount);
    }
    for (int i = 0; i < rightColumnCount; i++) {
      newColumns[leftColumnCount + i] = right.columns[i].resize(resultRowCount).rename(newRightColumnNames.get(i));
    }

    return new Table(newColumns, AggregatedProblems.of(nameDeduplicator.getProblems()));
  }

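`zip` picks a target row count (`max` when unmatched rows are kept, `min` when they are dropped) and then resizes every column to it, truncating or padding with nulls. A hedged Python sketch of those semantics (`resize` and `zip_tables` are illustrative names, not the library API):

```python
# Illustrative sketch of the resize-based zip: truncate a column or pad it
# with nulls (None) so all columns share the target row count.
def resize(column, new_size):
    if new_size <= len(column):
        return column[:new_size]
    return column + [None] * (new_size - len(column))

def zip_tables(left, right, keep_unmatched=True):
    left_count = len(next(iter(left.values()), []))
    right_count = len(next(iter(right.values()), []))
    # keep_unmatched pads to the longer table; otherwise truncate to the shorter.
    n = max(left_count, right_count) if keep_unmatched else min(left_count, right_count)
    result = {name: resize(col, n) for name, col in left.items()}
    result.update({name: resize(col, n) for name, col in right.items()})
    return result
```

With `keep_unmatched=True`, zipping a 3-row table with a 1-row table yields 3 rows where the short side is null-padded; with `keep_unmatched=False` it yields 1 row.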
  /**
   * Applies an order mask to all columns and indexes of this array.
   *

@ -0,0 +1,16 @@
package org.enso.table.data.table.join;

import org.enso.table.problems.AggregatedProblems;

public class CrossJoin {
  public static JoinResult perform(int leftRowCount, int rightRowCount) {
    JoinResult.Builder resultBuilder = new JoinResult.Builder(leftRowCount * rightRowCount);
    for (int l = 0; l < leftRowCount; ++l) {
      for (int r = 0; r < rightRowCount; ++r) {
        resultBuilder.addRow(l, r);
      }
    }

    return resultBuilder.build(AggregatedProblems.of());
  }
}

@ -56,13 +56,13 @@ public class IndexJoin implements JoinStrategy {
    MatcherFactory factory = new MatcherFactory(objectComparator, equalityFallback);
    Matcher remainingMatcher = factory.create(remainingConditions);

    List<Pair<Integer, Integer>> matches = new ArrayList<>();
    JoinResult.Builder resultBuilder = new JoinResult.Builder();
    for (var leftKey : leftIndex.keys()) {
      if (rightIndex.contains(leftKey)) {
        for (var leftRow : leftIndex.get(leftKey)) {
          for (var rightRow : rightIndex.get(leftKey)) {
            if (remainingMatcher.matches(leftRow, rightRow)) {
              matches.add(Pair.create(leftRow, rightRow));
              resultBuilder.addRow(leftRow, rightRow);
            }
          }
        }
@ -70,11 +70,8 @@ public class IndexJoin implements JoinStrategy {
    }

    AggregatedProblems problems =
        AggregatedProblems.merge(
            new AggregatedProblems[] {
              leftIndex.getProblems(), rightIndex.getProblems(), remainingMatcher.getProblems()
            });
    return new JoinResult(matches, problems);
        AggregatedProblems.merge(leftIndex.getProblems(), rightIndex.getProblems(), remainingMatcher.getProblems());
    return resultBuilder.build(problems);
  }

  private static boolean isSupported(JoinCondition condition) {

@ -1,8 +1,50 @@
package org.enso.table.data.table.join;

import org.enso.base.arrays.IntArrayBuilder;
import org.enso.table.data.mask.OrderMask;
import org.enso.table.problems.AggregatedProblems;
import org.graalvm.collections.Pair;

import java.util.List;
import java.util.*;
import java.util.stream.Collectors;

public record JoinResult(List<Pair<Integer, Integer>> matchedRows, AggregatedProblems problems) {}
public record JoinResult(int[] matchedRowsLeftIndices, int[] matchedRowsRightIndices, AggregatedProblems problems) {

  public OrderMask getLeftOrderMask() {
    return new OrderMask(matchedRowsLeftIndices);
  }

  public OrderMask getRightOrderMask() {
    return new OrderMask(matchedRowsRightIndices);
  }

  public Set<Integer> leftMatchedRows() {
    return new HashSet<>(Arrays.stream(matchedRowsLeftIndices).boxed().collect(Collectors.toList()));
  }

  public Set<Integer> rightMatchedRows() {
    return new HashSet<>(Arrays.stream(matchedRowsRightIndices).boxed().collect(Collectors.toList()));
  }

  public static class Builder {
    IntArrayBuilder leftIndices;
    IntArrayBuilder rightIndices;

    public Builder(int initialCapacity) {
      leftIndices = new IntArrayBuilder(initialCapacity);
      rightIndices = new IntArrayBuilder(initialCapacity);
    }

    public Builder() {
      this(128);
    }

    public void addRow(int leftIndex, int rightIndex) {
      leftIndices.add(leftIndex);
      rightIndices.add(rightIndex);
    }

    public JoinResult build(AggregatedProblems problemsToInherit) {
      return new JoinResult(leftIndices.build(), rightIndices.build(), problemsToInherit);
    }
  }
}

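The reworked `JoinResult` replaces the list of boxed pairs with two parallel growable int arrays built incrementally, which then double as the left/right order masks. A Python sketch of that builder idea (names mirror the Java record but are illustrative only):

```python
# Illustrative sketch of the JoinResult builder: accumulate matched
# (left, right) row indices in two parallel lists; a sentinel marks an
# unmatched side. The finished lists act as the left/right order masks.
NOT_FOUND = -1

class JoinResultBuilder:
    def __init__(self):
        self._left = []
        self._right = []

    def add_row(self, left_index, right_index):
        self._left.append(left_index)
        self._right.append(right_index)

    def build(self):
        # Return copies so the builder can keep growing independently.
        return list(self._left), list(self._right)

b = JoinResultBuilder()
b.add_row(0, 1)
b.add_row(2, NOT_FOUND)  # left row 2 had no right match
left_mask, right_mask = b.build()
```

Storing primitive indices instead of `Pair<Integer, Integer>` objects avoids per-match boxing and lets the masks be handed to column masking directly.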
@ -21,21 +21,22 @@ public class ScanJoin implements JoinStrategy {

  @Override
  public JoinResult join(Table left, Table right, List<JoinCondition> conditions) {
    List<Pair<Integer, Integer>> matches = new ArrayList<>();
    int ls = left.rowCount();
    int rs = right.rowCount();

    MatcherFactory factory = new MatcherFactory(objectComparator, equalityFallback);
    Matcher compoundMatcher = factory.create(conditions);

    JoinResult.Builder resultBuilder = new JoinResult.Builder();

    for (int l = 0; l < ls; ++l) {
      for (int r = 0; r < rs; ++r) {
        if (compoundMatcher.matches(l, r)) {
          matches.add(Pair.create(l, r));
          resultBuilder.addRow(l, r);
        }
      }
    }

    return new JoinResult(matches, compoundMatcher.getProblems());
    return resultBuilder.build(compoundMatcher.getProblems());
  }
}

@ -10,6 +10,12 @@ public class BitSets {
   * something on our own that would operate on whole longs instead of bit by bit.
   */
  public static void copy(BitSet source, BitSet destination, int destinationOffset, int length) {
    if (destinationOffset == 0) {
      destination.clear(0, length);
      destination.or(source.get(0, length));
      return;
    }

    for (int i = 0; i < length; i++) {
      if (source.get(i)) {
        destination.set(destinationOffset + i);

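`BitSets.copy` takes a fast path for offset 0 (clear, then OR in the sliced source) and otherwise falls back to a bit-by-bit loop. A small Python sketch of the fallback loop's semantics, modelling a bitset as a set of set-bit positions (this is an illustration, not `java.util.BitSet`):

```python
# Illustrative sketch of the bit-by-bit fallback: for each of the first
# `length` bits of `source` that is set, set the corresponding bit in
# `destination`, shifted by `destination_offset`.
def copy_bits(source, destination, destination_offset, length):
    for i in range(length):
        if i in source:
            destination.add(destination_offset + i)

dst = {0}
copy_bits({0, 2}, dst, 3, 3)  # bits 0 and 2 of source land at positions 3 and 5
```

The offset-0 fast path exists because `BitSet.get(from, to)` and `or` operate on whole words at once, which is much cheaper than testing bits individually.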
@ -1324,7 +1324,7 @@ spec setup =

        Test.specify "should merge Invalid Aggregation warnings" <|
            new_table = table.aggregate [Group_By "Key", Concatenate "Value"]
            problems = Warning.get_all new_table . map .value
            problems = Problems.get_attached_warnings new_table
            problems.length . should_equal 1
            problems.at 0 . is_a Invalid_Aggregation.Error . should_be_true
            problems.at 0 . column . should_equal "Concatenate Value"
@ -1332,7 +1332,7 @@ spec setup =

        Test.specify "should merge Floating Point Grouping warnings" <|
            new_table = table.aggregate [Group_By "Float", Count]
            problems = Warning.get_all new_table . map .value
            problems = Problems.get_attached_warnings new_table
            problems.length . should_equal 1
            problems.at 0 . is_a Floating_Point_Grouping.Error . should_be_true
            problems.at 0 . column . should_equal "Float"
@ -1343,7 +1343,7 @@ spec setup =
            result.column_count . should_equal 1
            result.row_count . should_equal 1
            result.columns.first.to_vector . should_equal [6]
            warnings = Warning.get_all result . map .value
            warnings = Problems.get_attached_warnings result
            warnings.length . should_equal error_count
            warnings.each warning->
                warning.should_be_an Unsupported_Database_Operation.Error

@ -0,0 +1,152 @@
from Standard.Base import all
import Standard.Base.Error.Illegal_State.Illegal_State

from Standard.Table import all hiding Table
from Standard.Table.Errors import all

from Standard.Database.Errors import Unsupported_Database_Operation

from Standard.Test import Test, Problems
import Standard.Test.Extensions

from project.Common_Table_Operations.Util import expect_column_names, run_default_backend


main = run_default_backend spec

spec setup =
    prefix = setup.prefix
    table_builder = setup.table_builder
    materialize = setup.materialize
    db_todo = if prefix.contains "In-Memory" then Nothing else "Table.cross_join is still WIP for the DB backend."
    Test.group prefix+"Table.cross_join" pending=db_todo <|
        Test.specify "should allow to create a cross product of two tables in the right order" <|
            t1 = table_builder [["X", [1, 2]], ["Y", [4, 5]]]
            t2 = table_builder [["Z", ['a', 'b']], ["W", ['c', 'd']]]

            t3 = t1.cross_join t2
            expect_column_names ["X", "Y", "Z", "W"] t3
            t3.row_count . should_equal 4
            r = materialize t3 . rows . map .to_vector
            r.length . should_equal 4

            r0 = [1, 4, 'a', 'c']
            r1 = [1, 4, 'b', 'd']
            r2 = [2, 5, 'a', 'c']
            r3 = [2, 5, 'b', 'd']
            expected_rows = [r0, r1, r2, r3]

            case setup.is_database of
                True -> r.should_contain_the_same_elements_as expected_rows
                False -> r.should_equal expected_rows

        Test.specify "should work correctly with empty tables" <|
            t1 = table_builder [["X", [1, 2]], ["Y", [4, 5]]]
            t2 = table_builder [["Z", ['a']], ["W", ['c']]]
            # Workaround to easily create empty table until table builder allows that directly.
            empty = t2.filter "Z" Filter_Condition.Is_Nothing
            empty.row_count . should_equal 0

            t3 = t1.cross_join empty
            expect_column_names ["X", "Y", "Z", "W"] t3
            t3.row_count.should_equal 0
            t3.at "X" . to_vector . should_equal []

            t4 = empty.cross_join t1
            expect_column_names ["Z", "W", "X", "Y"] t4
            t4.row_count.should_equal 0
            t4.at "X" . to_vector . should_equal []

        Test.specify "should respect the right row limit" <|
            t2 = table_builder [["X", [1, 2]]]
            t3 = table_builder [["X", [1, 2, 3]]]
            t100 = table_builder [["Y", 0.up_to 100 . to_vector]]
            t101 = table_builder [["Y", 0.up_to 101 . to_vector]]

            t2.cross_join t100 . row_count . should_equal 200
            t101.cross_join t2 . row_count . should_equal 202
            action = t2.cross_join t101 on_problems=_
            tester table =
                table.row_count . should_equal 202
            problems = [Cross_Join_Row_Limit_Exceeded.Error 100 101]
            Problems.test_problem_handling action problems tester

            t2.cross_join t101 right_row_limit=Nothing . row_count . should_equal 202
            t2.cross_join t3 right_row_limit=2 on_problems=Problem_Behavior.Report_Error . should_fail_with Cross_Join_Row_Limit_Exceeded

        Test.specify "should ensure 1-1 mapping even with duplicate rows" <|
            t1 = table_builder [["X", [2, 1, 2, 2]], ["Y", [5, 4, 5, 5]]]
            t2 = table_builder [["Z", ['a', 'a']]]

            t3 = t1.cross_join t2
            expect_column_names ["X", "Y", "Z"] t3
            t3.row_count . should_equal 8
            r = materialize t3 . rows . map .to_vector
            r.length . should_equal 8
            r1 = [2, 5, 'a']
            r2 = [1, 4, 'a']
            expected_rows = [r1, r1, r2, r2, r1, r1, r1, r1]
            case setup.is_database of
                True -> r.should_contain_the_same_elements_as expected_rows
                False -> r.should_equal expected_rows

        Test.specify "should allow self-joins" <|
            t1 = table_builder [["X", [1, 2]], ["Y", [4, 5]]]
            t2 = t1.cross_join t1

            expect_column_names ["X", "Y", "Right_X", "Right_Y"] t2
            t2.row_count . should_equal 4
            r = materialize t2 . rows . map .to_vector
            r.length . should_equal 4
            r0 = [1, 4, 1, 4]
            r1 = [1, 4, 2, 5]
            r2 = [2, 5, 1, 4]
            r3 = [2, 5, 2, 5]
            expected_rows = [r0, r1, r2, r3]
            case setup.is_database of
                True -> r.should_contain_the_same_elements_as expected_rows
                False -> r.should_equal expected_rows

        Test.specify "should rename columns of the right table to avoid duplicates" <|
            t1 = table_builder [["X", [1]], ["Y", [5]]]
            t2 = table_builder [["X", ['a']], ["Y", ['d']]]

            t3 = t1.cross_join t2
            expect_column_names ["X", "Y", "Right_X", "Right_Y"] t3
            Problems.get_attached_warnings t3 . should_equal [Duplicate_Output_Column_Names.Error ["X", "Y"]]
            t3.row_count . should_equal 1
            t3.at "X" . to_vector . should_equal [1]
            t3.at "Y" . to_vector . should_equal [5]
            t3.at "Right_X" . to_vector . should_equal ['a']
            t3.at "Right_Y" . to_vector . should_equal ['d']

            t1.cross_join t2 on_problems=Problem_Behavior.Report_Error . should_fail_with Duplicate_Output_Column_Names

            expect_column_names ["X", "Y", "X_1", "Y_1"] (t1.cross_join t2 right_prefix="")

            t4 = table_builder [["X", [1]], ["Right_X", [5]]]
            expect_column_names ["X", "Y", "Right_X_1", "Right_X"] (t1.cross_join t4)
            expect_column_names ["X", "Right_X", "Right_X_1", "Y"] (t4.cross_join t1)

        Test.specify "should respect the column ordering" <|
            t1 = table_builder [["X", [100, 2]], ["Y", [4, 5]]]
            t2 = table_builder [["Z", ['a', 'b', 'c']], ["W", ['x', 'd', 'd']]]

            t3 = t1.order_by "X"
            t4 = t2.order_by (Sort_Column_Selector.By_Name [Sort_Column.Name "Z" Sort_Direction.Descending])

            t5 = t3.cross_join t4
            expect_column_names ["X", "Y", "Z", "W"] t5
            t5.row_count . should_equal 6
            r = materialize t5 . rows . map .to_vector
            r.length . should_equal 6

            r0 = [2, 5, 'c', 'd']
            r1 = [2, 5, 'b', 'd']
            r2 = [2, 5, 'a', 'x']
            r3 = [100, 4, 'c', 'd']
            r4 = [100, 4, 'b', 'd']
            r5 = [100, 4, 'a', 'x']
            expected_rows = [r0, r1, r2, r3, r4, r5]
            r.should_equal expected_rows

@ -1,7 +1,7 @@
from Standard.Base import all
import Standard.Base.Error.Illegal_State.Illegal_State

from Standard.Table import all
from Standard.Table import all hiding Table
from Standard.Table.Errors import all
import Standard.Table.Data.Value_Type.Value_Type

@ -417,12 +417,15 @@ spec setup =
            t2 = table_builder [["X", [2, 1]], ["Y", [2, 2]]]

            t3 = t1.join t2 on=(Join_Condition.Equals "X" "Y") |> materialize |> _.order_by ["Right_X"]
            Problems.get_attached_warnings t3 . should_equal [Duplicate_Output_Column_Names.Error ["X", "Y"]]
            expect_column_names ["X", "Y", "Right_X", "Right_Y"] t3
            t3.at "X" . to_vector . should_equal [2, 2]
            t3.at "Right_Y" . to_vector . should_equal [2, 2]
            t3.at "Y" . to_vector . should_equal [4, 4]
            t3.at "Right_X" . to_vector . should_equal [1, 2]

            t1.join t2 on=(Join_Condition.Equals "X" "Y") on_problems=Problem_Behavior.Report_Error . should_fail_with Duplicate_Output_Column_Names

            t4 = table_builder [["Right_X", [1, 1]], ["X", [1, 2]], ["Y", [3, 4]], ["Right_Y_2", [2, 2]]]
            t5 = table_builder [["Right_X", [2, 1]], ["X", [2, 2]], ["Y", [2, 2]], ["Right_Y", [2, 2]], ["Right_Y_1", [2, 2]], ["Right_Y_4", [2, 2]]]

@ -431,6 +434,7 @@ spec setup =

            t7 = t1.join t2 right_prefix=""
            expect_column_names ["X", "Y", "Y_1"] t7
            Problems.get_attached_warnings t7 . should_equal [Duplicate_Output_Column_Names.Error ["Y"]]

            t8 = t1.join t2 right_prefix="P"
            expect_column_names ["X", "Y", "PY"] t8
@ -20,7 +20,7 @@ main = run_default_backend spec
spec setup =
    prefix = setup.prefix
    table_builder = setup.table_builder
    db_todo = if prefix.contains "In-Memory" then Nothing else "Union API is not yet implemented for the DB backend."
    db_todo = if prefix.contains "In-Memory" then Nothing else "Table.union is not yet implemented for the DB backend."
    Test.group prefix+"Table.union" pending=db_todo <|
        Test.specify "should merge columns from multiple tables" <|
            t1 = table_builder [["A", [1, 2, 3]], ["B", ["a", "b", "c"]]]
@ -148,7 +148,7 @@ spec setup =
        t3 = t1.union t2 match_columns=Match_Columns.By_Position
        within_table t3 <|
            check t3
            Warning.get_all t3 . map .value . should_equal [Column_Count_Mismatch.Error 2 1]
            Problems.get_attached_warnings t3 . should_equal [Column_Count_Mismatch.Error 2 1]

        t4 = t1.union t2 match_columns=Match_Columns.By_Position keep_unmatched_columns=True
        within_table t4 <|
238
test/Table_Tests/src/Common_Table_Operations/Join/Zip_Spec.enso
Normal file
@ -0,0 +1,238 @@
from Standard.Base import all
|
||||
import Standard.Base.Error.Illegal_State.Illegal_State
|
||||
|
||||
from Standard.Table import all hiding Table
|
||||
from Standard.Table.Errors import all
|
||||
import Standard.Table.Data.Value_Type.Value_Type
|
||||
|
||||
from Standard.Database.Errors import Unsupported_Database_Operation
|
||||
|
||||
from Standard.Test import Test, Problems
|
||||
import Standard.Test.Extensions
|
||||
|
||||
from project.Common_Table_Operations.Util import expect_column_names, run_default_backend
|
||||
|
||||
|
||||
main = run_default_backend spec
|
||||
|
||||
spec setup =
|
||||
prefix = setup.prefix
|
||||
table_builder = setup.table_builder
|
||||
materialize = setup.materialize
|
||||
db_todo = if prefix.contains "In-Memory" then Nothing else "Table.zip is still WIP for the DB backend."
|
||||
Test.group prefix+"Table.zip" pending=db_todo <|
|
||||
if setup.is_database.not then
|
||||
Test.specify "should allow to zip two tables, preserving memory layout order" <|
|
||||
t1 = table_builder [["X", [1, 2, 3]], ["Y", [4, 5, 6]]]
|
||||
t2 = table_builder [["Z", ['a', 'b', 'c']], ["W", ['x', 'y', 'z']]]
|
||||
|
||||
t3 = t1.zip t2
|
||||
expect_column_names ["X", "Y", "Z", "W"] t3
|
||||
t3.row_count . should_equal 3
|
||||
r = materialize t3 . rows . map .to_vector
|
||||
r.length . should_equal 3
|
||||
r0 = [1, 4, 'a', 'x']
|
||||
r1 = [2, 5, 'b', 'y']
|
||||
r2 = [3, 6, 'c', 'z']
|
||||
expected_rows = [r0, r1, r2]
|
||||
r.should_equal expected_rows
|
||||
|
||||
Test.specify "should allow to zip two tables, preserving the order defined by `order_by`" <|
|
||||
t1 = table_builder [["X", [100, 2]], ["Y", [4, 5]]]
|
||||
t2 = table_builder [["Z", ['a', 'b']], ["W", ['x', 'd']]]
|
||||
|
||||
t3 = t1.order_by "X"
|
||||
t4 = t2.order_by (Sort_Column_Selector.By_Name [Sort_Column.Name "Z" Sort_Direction.Descending])
|
||||
|
||||
t5 = t3.zip t4
|
||||
expect_column_names ["X", "Y", "Z", "W"] t5
|
||||
t5.row_count . should_equal 2
|
||||
r = materialize t5 . rows . map .to_vector
|
||||
r.length . should_equal 2
|
||||
|
||||
r0 = [2, 5, 'b', 'd']
|
||||
r1 = [100, 4, 'a', 'x']
|
||||
expected_rows = [r0, r1]
|
||||
r.should_equal expected_rows
|
||||
|
||||
Test.specify "should report unmatched rows if the row counts do not match and pad them with nulls" <|
|
||||
t1 = table_builder [["X", [1, 2, 3]], ["Y", [4, 5, 6]]]
|
||||
t2 = table_builder [["Z", ['a', 'b']], ["W", ['x', 'd']]]
|
||||
|
||||
action_1 = t1.zip t2 on_problems=_
|
||||
tester_1 table =
|
||||
expect_column_names ["X", "Y", "Z", "W"] table
|
||||
table.at "X" . to_vector . should_equal [1, 2, 3]
|
||||
table.at "Y" . to_vector . should_equal [4, 5, 6]
|
||||
table.at "Z" . to_vector . should_equal ['a', 'b', Nothing]
|
||||
table.at "W" . to_vector . should_equal ['x', 'd', Nothing]
|
||||
problems_1 = [Row_Count_Mismatch.Error 3 2]
|
||||
Problems.test_problem_handling action_1 problems_1 tester_1
|
||||
|
||||
action_2 = t2.zip t1 on_problems=_
|
||||
tester_2 table =
|
||||
expect_column_names ["Z", "W", "X", "Y"] table
|
||||
table.at "Z" . to_vector . should_equal ['a', 'b', Nothing]
|
||||
table.at "W" . to_vector . should_equal ['x', 'd', Nothing]
|
||||
table.at "X" . to_vector . should_equal [1, 2, 3]
|
||||
table.at "Y" . to_vector . should_equal [4, 5, 6]
|
||||
problems_2 = [Row_Count_Mismatch.Error 2 3]
|
||||
Problems.test_problem_handling action_2 problems_2 tester_2
|
||||
|
||||
Test.specify "should allow to keep the unmatched rows padded with nulls without reporting problems" <|
|
||||
t1 = table_builder [["X", [1, 2, 3]], ["Y", [4, 5, 6]]]
|
||||
t2 = table_builder [["Z", ['a']], ["W", ['x']]]
|
||||
|
||||
t3 = t1.zip t2 keep_unmatched=True on_problems=Problem_Behavior.Report_Error
|
||||
Problems.assume_no_problems t3
|
||||
expect_column_names ["X", "Y", "Z", "W"] t3
|
||||
t3.at "X" . to_vector . should_equal [1, 2, 3]
|
||||
t3.at "Y" . to_vector . should_equal [4, 5, 6]
|
||||
t3.at "Z" . to_vector . should_equal ['a', Nothing, Nothing]
|
||||
t3.at "W" . to_vector . should_equal ['x', Nothing, Nothing]
|
||||
|
||||
Test.specify "should allow to drop the unmatched rows" <|
|
||||
t1 = table_builder [["X", [1, 2, 3]], ["Y", [4, 5, 6]]]
|
||||
t2 = table_builder [["Z", ['a']], ["W", ['x']]]
|
||||
|
||||
t3 = t1.zip t2 keep_unmatched=False on_problems=Problem_Behavior.Report_Error
|
||||
Problems.assume_no_problems t3
|
||||
expect_column_names ["X", "Y", "Z", "W"] t3
|
||||
t3.at "X" . to_vector . should_equal [1]
|
||||
t3.at "Y" . to_vector . should_equal [4]
|
||||
t3.at "Z" . to_vector . should_equal ['a']
|
||||
t3.at "W" . to_vector . should_equal ['x']
|
||||
|
||||
Test.specify "should work when zipping with an empty table" <|
|
||||
t1 = table_builder [["X", [1, 2]], ["Y", [4, 5]]]
|
||||
t2 = table_builder [["Z", ['a']], ["W", ['c']]]
|
||||
# Workaround to easily create empty table until table builder allows that directly.
|
||||
empty = t2.filter "Z" Filter_Condition.Is_Nothing
|
||||
empty.row_count . should_equal 0
|
||||
|
||||
t3 = t1.zip empty
|
||||
expect_column_names ["X", "Y", "Z", "W"] t3
|
||||
t3.row_count . should_equal 2
|
||||
t3.at "X" . to_vector . should_equal [1, 2]
|
||||
t3.at "Y" . to_vector . should_equal [4, 5]
|
||||
t3.at "Z" . to_vector . should_equal [Nothing, Nothing]
|
||||
t3.at "W" . to_vector . should_equal [Nothing, Nothing]
|
||||
|
||||
t4 = empty.zip t1
|
||||
expect_column_names ["Z", "W", "X", "Y"] t4
|
||||
t4.row_count . should_equal 2
|
||||
t4.at "X" . to_vector . should_equal [1, 2]
|
||||
t4.at "Y" . to_vector . should_equal [4, 5]
|
||||
t4.at "Z" . to_vector . should_equal [Nothing, Nothing]
|
||||
t4.at "W" . to_vector . should_equal [Nothing, Nothing]
|
||||
|
||||
t5 = t1.zip empty keep_unmatched=False
|
||||
expect_column_names ["X", "Y", "Z", "W"] t5
|
||||
t5.row_count . should_equal 0
|
||||
t5.at "X" . to_vector . should_equal []
|
||||
|
||||
t6 = empty.zip t1 keep_unmatched=False
|
||||
expect_column_names ["Z", "W", "X", "Y"] t6
|
||||
t6.row_count . should_equal 0
|
||||
t6.at "X" . to_vector . should_equal []
|
||||
|
||||
Test.specify "should not report unmatched rows for rows that simply are all null" <|
|
||||
t1 = table_builder [["X", [1, 2, 3]], ["Y", [4, 5, 6]]]
|
||||
t2 = table_builder [["Z", ['a', Nothing, Nothing]], ["W", ['b', Nothing, Nothing]]]
|
||||
t3 = t1.zip t2 on_problems=Problem_Behavior.Report_Error
|
||||
Problems.assume_no_problems t3
|
||||
expect_column_names ["X", "Y", "Z", "W"] t3
|
||||
t3.at "X" . to_vector . should_equal [1, 2, 3]
|
||||
t3.at "Y" . to_vector . should_equal [4, 5, 6]
|
||||
t3.at "Z" . to_vector . should_equal ['a', Nothing, Nothing]
|
||||
t3.at "W" . to_vector . should_equal ['b', Nothing, Nothing]
|
||||
|
||||
Test.specify "should rename columns of the right table to avoid duplicates" <|
|
||||
t1 = table_builder [["X", [1, 2]], ["Y", [5, 6]]]
|
||||
t2 = table_builder [["X", ['a']], ["Y", ['d']]]
|
||||
|
||||
t3 = t1.zip t2 keep_unmatched=True
|
||||
expect_column_names ["X", "Y", "Right_X", "Right_Y"] t3
|
||||
Problems.get_attached_warnings t3 . should_equal [Duplicate_Output_Column_Names.Error ["X", "Y"]]
|
||||
t3.row_count . should_equal 2
|
||||
t3.at "X" . to_vector . should_equal [1, 2]
|
||||
t3.at "Y" . to_vector . should_equal [5, 6]
|
||||
t3.at "Right_X" . to_vector . should_equal ['a', Nothing]
|
||||
t3.at "Right_Y" . to_vector . should_equal ['d', Nothing]
|
||||
|
||||
t1.zip t2 keep_unmatched=False on_problems=Problem_Behavior.Report_Error . should_fail_with Duplicate_Output_Column_Names
|
||||
|
||||
expect_column_names ["X", "Y", "X_1", "Y_1"] (t1.zip t2 right_prefix="")
|
||||
|
||||
t4 = table_builder [["X", [1]], ["Right_X", [5]]]
|
||||
expect_column_names ["X", "Y", "Right_X_1", "Right_X"] (t1.zip t4)
|
||||
expect_column_names ["X", "Right_X", "Right_X_1", "Y"] (t4.zip t1)
|
||||
|
||||
Test.specify "should report both row count mismatch and duplicate column warnings at the same time" <|
|
||||
t1 = table_builder [["X", [1, 2]], ["Y", [5, 6]]]
|
||||
t2 = table_builder [["X", ['a']], ["Z", ['d']]]
|
||||
|
||||
t3 = t1.zip t2
|
||||
expected_problems = [Row_Count_Mismatch.Error 2 1, Duplicate_Output_Column_Names.Error ["X"]]
|
||||
Problems.get_attached_warnings t3 . should_contain_the_same_elements_as expected_problems
|
||||
|
||||
Test.specify "should allow to zip the table with itself" <|
|
||||
## Even though this does not seem very useful, we should verify that
|
||||
this edge case works correctly. It may especially be fragile in
|
||||
the Database backend.
|
||||
t1 = table_builder [["X", [1, 2]], ["Y", [4, 5]]]
|
||||
t2 = t1.zip t1
|
||||
expect_column_names ["X", "Y", "Right_X", "Right_Y"] t2
|
||||
t2.row_count . should_equal 2
|
||||
t2.at "X" . to_vector . should_equal [1, 2]
|
||||
t2.at "Y" . to_vector . should_equal [4, 5]
|
||||
t2.at "Right_X" . to_vector . should_equal [1, 2]
|
||||
t2.at "Right_Y" . to_vector . should_equal [4, 5]
|
||||
|
||||
    if setup.is_database.not then
        Test.specify "should correctly pad/truncate all kinds of column types" <|
            primitives = [["ints", [1, 2, 3]], ["strs", ['a', 'b', 'c']], ["bools", [True, Nothing, False]]]
            times = [["dates", [Date.new 1999 1 1, Date.new 2000 4 1, Date.new 2001 1 2]], ["times", [Time_Of_Day.new 23 59, Time_Of_Day.new 0 0, Time_Of_Day.new 12 34]], ["datetimes", [Date_Time.new 1999 1 1 23 59, Date_Time.new 2000 4 1 0 0, Date_Time.new 2001 1 2 12 34]]]
            t = table_builder <|
                primitives + times + [["mixed", ['a', 2, True]]]

            t1 = table_builder [["X", [1]]]
            t5 = table_builder [["X", 0.up_to 5 . to_vector]]

            truncated = t.zip t1 keep_unmatched=False
            expect_column_names ["ints", "strs", "bools", "dates", "times", "datetimes", "mixed", "X"] truncated
            truncated.row_count . should_equal 1
            truncated.at "ints" . to_vector . should_equal [1]
            truncated.at "strs" . to_vector . should_equal ['a']
            truncated.at "bools" . to_vector . should_equal [True]
            truncated.at "dates" . to_vector . should_equal [Date.new 1999 1 1]
            truncated.at "times" . to_vector . should_equal [Time_Of_Day.new 23 59]
            truncated.at "datetimes" . to_vector . should_equal [Date_Time.new 1999 1 1 23 59]
            truncated.at "mixed" . to_vector . should_equal ['a']

            truncated.at "ints" . value_type . should_equal Value_Type.Integer
            truncated.at "strs" . value_type . should_equal Value_Type.Char
            truncated.at "bools" . value_type . should_equal Value_Type.Boolean
            truncated.at "dates" . value_type . should_equal Value_Type.Date
            truncated.at "times" . value_type . should_equal Value_Type.Time
            truncated.at "datetimes" . value_type . should_equal Value_Type.Date_Time
            truncated.at "mixed" . value_type . should_equal Value_Type.Mixed

            padded = t.zip t5 keep_unmatched=True
            expect_column_names ["ints", "strs", "bools", "dates", "times", "datetimes", "mixed", "X"] padded
            padded.row_count . should_equal 5
            padded.at "ints" . to_vector . should_equal [1, 2, 3, Nothing, Nothing]
            padded.at "strs" . to_vector . should_equal ['a', 'b', 'c', Nothing, Nothing]
            padded.at "bools" . to_vector . should_equal [True, Nothing, False, Nothing, Nothing]
            padded.at "dates" . to_vector . should_equal [Date.new 1999 1 1, Date.new 2000 4 1, Date.new 2001 1 2, Nothing, Nothing]
            padded.at "times" . to_vector . should_equal [Time_Of_Day.new 23 59, Time_Of_Day.new 0 0, Time_Of_Day.new 12 34, Nothing, Nothing]
            padded.at "datetimes" . to_vector . should_equal [Date_Time.new 1999 1 1 23 59, Date_Time.new 2000 4 1 0 0, Date_Time.new 2001 1 2 12 34, Nothing, Nothing]
            padded.at "mixed" . to_vector . should_equal ['a', 2, True, Nothing, Nothing]

            padded.at "ints" . value_type . should_equal Value_Type.Integer
            padded.at "strs" . value_type . should_equal Value_Type.Char
            padded.at "bools" . value_type . should_equal Value_Type.Boolean
            padded.at "dates" . value_type . should_equal Value_Type.Date
            padded.at "times" . value_type . should_equal Value_Type.Time
            padded.at "datetimes" . value_type . should_equal Value_Type.Date_Time
            padded.at "mixed" . value_type . should_equal Value_Type.Mixed
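The pad/truncate test above checks the `keep_unmatched` behaviour: when the two tables have different row counts, `keep_unmatched=False` truncates every column to the shorter length, while `keep_unmatched=True` pads the shorter side's columns with `Nothing`. A minimal hypothetical sketch of that per-column rule (Python, with `None` standing in for `Nothing`):

```python
# Hypothetical sketch of the keep_unmatched pad/truncate rule tested above.
def pad_or_truncate(column, target_len):
    if len(column) >= target_len:
        return column[:target_len]              # keep_unmatched=False: truncate
    return column + [None] * (target_len - len(column))  # keep_unmatched=True: pad

ints = [1, 2, 3]
print(pad_or_truncate(ints, 1))  # [1]
print(pad_or_truncate(ints, 5))  # [1, 2, 3, None, None]
```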
@ -7,12 +7,14 @@ import project.Common_Table_Operations.Distinct_Spec
import project.Common_Table_Operations.Expression_Spec
import project.Common_Table_Operations.Filter_Spec
import project.Common_Table_Operations.Integration_Tests
import project.Common_Table_Operations.Join_Spec
import project.Common_Table_Operations.Join.Join_Spec
import project.Common_Table_Operations.Join.Cross_Join_Spec
import project.Common_Table_Operations.Join.Zip_Spec
import project.Common_Table_Operations.Join.Union_Spec
import project.Common_Table_Operations.Missing_Values_Spec
import project.Common_Table_Operations.Order_By_Spec
import project.Common_Table_Operations.Select_Columns_Spec
import project.Common_Table_Operations.Take_Drop_Spec
import project.Common_Table_Operations.Union_Spec

from project.Common_Table_Operations.Util import run_default_backend

@ -95,6 +97,8 @@ spec setup =
    Take_Drop_Spec.spec setup
    Expression_Spec.spec detailed=False setup
    Join_Spec.spec setup
    Cross_Join_Spec.spec setup
    Zip_Spec.spec setup
    Union_Spec.spec setup
    Distinct_Spec.spec setup
    Integration_Tests.spec setup
@ -83,11 +83,15 @@ spec setup =
        selector = By_Index [0, -7, -6, 1]
        action = table.select_columns selector on_problems=_
        tester = expect_column_names ["foo", "bar"]
        problems = [Input_Indices_Already_Matched.Error [-7, 1]]
        problem_checker problem =
            problem.should_be_a Input_Indices_Already_Matched.Error
            problem.indices.should_contain_the_same_elements_as [-7, 1]
            True
        err_checker err =
            err.catch.should_be_a Input_Indices_Already_Matched.Error
            err.catch.indices.should_contain_the_same_elements_as [-7, 1]
        Problems.test_advanced_problem_handling action err_checker (x-> x) tester
            problem_checker err.catch
        warn_checker warnings =
            warnings.all problem_checker
        Problems.test_advanced_problem_handling action err_checker warn_checker tester

    Test.specify "should correctly handle problems: duplicate names" <|
        selector = By_Name ["foo", "foo"]
@ -61,7 +61,7 @@ spec =

        r1 = plain_formatter.parse "1E3" Decimal
        r1.should_equal Nothing
        Warning.get_all r1 . map .value . should_equal [(Invalid_Format.Error Nothing Decimal ["1E3"])]
        Problems.get_attached_warnings r1 . should_equal [(Invalid_Format.Error Nothing Decimal ["1E3"])]

        exponential_formatter.parse "1E3" . should_equal 1000.0
        exponential_formatter.parse "1E3" Decimal . should_equal 1000.0
@ -35,15 +35,15 @@ spec = Test.group "Table.parse_values" <|
        t1_zeros = ["+00", "-00", "+01", "-01", "01", "000", "0010"]
        t3 = t1.parse_values column_types=[Column_Type_Selection.Value 0 Integer]
        t3.at "ints" . to_vector . should_equal t1_parsed
        Warning.get_all t3 . map .value . should_equal [Leading_Zeros.Error "ints" Integer t1_zeros]
        Problems.get_attached_warnings t3 . should_equal [Leading_Zeros.Error "ints" Integer t1_zeros]

        t4 = t1.parse_values column_types=[Column_Type_Selection.Value 0 Decimal]
        t4.at "ints" . to_vector . should_equal t1_parsed
        Warning.get_all t4 . map .value . should_equal [Leading_Zeros.Error "ints" Decimal t1_zeros]
        Problems.get_attached_warnings t4 . should_equal [Leading_Zeros.Error "ints" Decimal t1_zeros]

        t5 = t2.parse_values column_types=[Column_Type_Selection.Value 0 Decimal]
        t5.at "floats" . to_vector . should_equal [0.0, 0.0, Nothing, Nothing, Nothing, 1.0]
        Warning.get_all t5 . map .value . should_equal [Leading_Zeros.Error "floats" Decimal ["00.", "01.0", '-0010.0000']]
        Problems.get_attached_warnings t5 . should_equal [Leading_Zeros.Error "floats" Decimal ["00.", "01.0", '-0010.0000']]

        opts = Data_Formatter.Value allow_leading_zeros=True
        t1_parsed_zeros = [0, 0, 0, 1, -1, 1, 0, 10, 12345, Nothing]
@ -203,10 +203,10 @@ spec = Test.group "Table.parse_values" <|
        t3 = Table.new [["xs", ["1,2", "1.2", "_0", "0_", "1_0_0"]]]
        t4 = t3.parse_values opts column_types=[Column_Type_Selection.Value 0 Decimal]
        t4.at "xs" . to_vector . should_equal [1.2, Nothing, Nothing, Nothing, 100.0]
        Warning.get_all t4 . map .value . should_equal [Invalid_Format.Error "xs" Decimal ["1.2", "_0", "0_"]]
        Problems.get_attached_warnings t4 . should_equal [Invalid_Format.Error "xs" Decimal ["1.2", "_0", "0_"]]
        t5 = t3.parse_values opts column_types=[Column_Type_Selection.Value 0 Integer]
        t5.at "xs" . to_vector . should_equal [Nothing, Nothing, Nothing, Nothing, 100.0]
        Warning.get_all t5 . map .value . should_equal [Invalid_Format.Error "xs" Integer ["1,2", "1.2", "_0", "0_"]]
        Problems.get_attached_warnings t5 . should_equal [Invalid_Format.Error "xs" Integer ["1,2", "1.2", "_0", "0_"]]

    Test.specify "should allow to specify custom values for booleans" <|
        opts_1 = Data_Formatter.Value true_values=["1", "YES"] false_values=["0"]
@ -217,7 +217,7 @@ spec = Test.group "Table.parse_values" <|
        t3 = Table.new [["bools", ["1", "NO", "False", "True", "YES", "no", "oui", "0"]]]
        t4 = t3.parse_values opts_1 column_types=[Column_Type_Selection.Value 0 Boolean]
        t4.at "bools" . to_vector . should_equal [True, Nothing, Nothing, Nothing, True, Nothing, Nothing, False]
        Warning.get_all t4 . map .value . should_equal [Invalid_Format.Error "bools" Boolean ["NO", "False", "True", "no", "oui"]]
        Problems.get_attached_warnings t4 . should_equal [Invalid_Format.Error "bools" Boolean ["NO", "False", "True", "no", "oui"]]

    whitespace_table =
        ints = ["ints", ["0", "1 ", "0 1", " 2"]]
@ -236,7 +236,7 @@ spec = Test.group "Table.parse_values" <|
        t1.at "dates" . to_vector . should_equal [Date.new 2022 1 1, Date.new 2022 7 17, Nothing, Nothing]
        t1.at "datetimes" . to_vector . should_equal [Date_Time.new 2022 1 1 11 59, Nothing, Nothing, Nothing]
        t1.at "times" . to_vector . should_equal [Time_Of_Day.new 11 0 0, Time_Of_Day.new, Nothing, Nothing]
        warnings = Warning.get_all t1 . map .value
        warnings = Problems.get_attached_warnings t1
        expected_warnings = Vector.new_builder
        expected_warnings.append (Invalid_Format.Error "ints" Integer ["0 1"])
        expected_warnings.append (Invalid_Format.Error "floats" Decimal ["- 1"])
@ -256,7 +256,7 @@ spec = Test.group "Table.parse_values" <|
        t1.at "dates" . to_vector . should_equal nulls
        t1.at "datetimes" . to_vector . should_equal nulls
        t1.at "times" . to_vector . should_equal nulls
        warnings = Warning.get_all t1 . map .value
        warnings = Problems.get_attached_warnings t1
        expected_warnings = Vector.new_builder
        expected_warnings.append (Invalid_Format.Error "ints" Integer ["1 ", "0 1", " 2"])
        expected_warnings.append (Invalid_Format.Error "floats" Decimal ["0 ", " 2.0", "- 1"])
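The parse tests above exercise two separate rejection rules: text that does not match the numeric format at all is reported as `Invalid_Format`, while well-formed numerals with leading zeros (like `"0010"`) are rejected as `Leading_Zeros` unless `allow_leading_zeros=True` is set. A rough hypothetical Python sketch of the integer case (the names and regex here are illustrative, not the Enso implementation):

```python
import re

# Hypothetical sketch of the integer-parsing rules the tests above rely on.
def parse_int(text, allow_leading_zeros=False):
    if re.fullmatch(r"[+-]?\d+", text) is None:
        return None  # reported as Invalid_Format in Enso
    digits = text.lstrip("+-")
    if not allow_leading_zeros and len(digits) > 1 and digits[0] == "0":
        return None  # reported as Leading_Zeros in Enso
    return int(text)

print(parse_int("0010"))                            # None (leading zeros)
print(parse_int("0010", allow_leading_zeros=True))  # 10
print(parse_int("1,2"))                             # None (invalid format)
```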
@ -202,7 +202,7 @@ spec =
        positions = [7, 8, 15]
        msg = "Encoding issues at codepoints " +
            positions.map .to_text . join separator=", " suffix="."
        Warning.get_all result . map .value . should_equal [Encoding_Error.Error msg]
        Problems.get_attached_warnings result . should_equal [Encoding_Error.Error msg]
        file.delete

    Test.specify "should allow only text columns if no formatter is specified" <|
@ -2,7 +2,7 @@ from Standard.Base import all
import Standard.Base.Error.Common.Type_Error
import Standard.Base.Error.Time_Error.Time_Error

from Standard.Test import Test, Test_Suite
from Standard.Test import Problems, Test, Test_Suite
import Standard.Test.Extensions

import project.Data.Time.Date_Part_Spec
@ -146,7 +146,7 @@ spec_with name create_new_date parse_date =
        is_time_error v = case v of
            _ : Time_Error -> True
            _ -> False
        expect_warning value = (Warning.get_all value . map .value . any is_time_error) . should_be_true
        expect_warning value = (Problems.get_attached_warnings value . any is_time_error) . should_be_true
        dates_before_epoch = [(create_new_date 100), (create_new_date 500 6 3)]
        dates_before_epoch.each date->
            expect_warning date.week_of_year
@ -93,7 +93,7 @@ spec =
        expected_problems = [Encoding_Error.Error "Encoding issues at bytes 14, 15, 16."]
        contents_1 = read_file_one_by_one windows_file encoding expected_contents.length on_problems=Problem_Behavior.Report_Warning
        contents_1.should_equal expected_contents
        Warning.get_all contents_1 . map .value . should_equal expected_problems
        Problems.get_attached_warnings contents_1 . should_equal expected_problems

        contents_2 = windows_file.with_input_stream [File_Access.Read] stream->
            stream.with_stream_decoder encoding Problem_Behavior.Report_Warning reporting_stream_decoder->
@ -104,7 +104,7 @@ spec =
                reporting_stream_decoder.read.should_equal -1
                Text.from_codepoints <| [codepoint_1]+codepoints_1+codepoints_2+codepoints_3
        contents_2.should_equal expected_contents
        Warning.get_all contents_2 . map .value . should_equal expected_problems
        Problems.get_attached_warnings contents_2 . should_equal expected_problems

    Test.specify "should work correctly if no data is read from it" <|
        result = windows_file.with_input_stream [File_Access.Read] stream->
@ -61,7 +61,7 @@ spec =
            stream.with_stream_encoder encoding Problem_Behavior.Report_Warning reporting_stream_encoder->
                reporting_stream_encoder.write contents
        result.should_succeed
        Warning.get_all result . map .value . should_equal [Encoding_Error.Error "Encoding issues at codepoints 1, 3."]
        Problems.get_attached_warnings result . should_equal [Encoding_Error.Error "Encoding issues at codepoints 1, 3."]
        f.read_text encoding . should_equal "S?o?wka!"

        f.delete_if_exists
@ -74,7 +74,7 @@ spec =
            reporting_stream_encoder.write "bar"

        result_2.should_succeed
        Warning.get_all result_2 . map .value . should_equal [Encoding_Error.Error "Encoding issues at codepoints 3, 9."]
        Problems.get_attached_warnings result_2 . should_equal [Encoding_Error.Error "Encoding issues at codepoints 3, 9."]
        f.read_text encoding . should_equal "ABC?foo -?- bar"

    Test.specify "should work correctly if no data is written to it" <|