enso/test/Visualization_Tests/src/Scatter_Plot_Spec.enso

146 lines
6.7 KiB
Plaintext
Raw Normal View History

from Standard.Base import all
import Standard.Table.Data.Column
import Standard.Table.Data.Table
import Standard.Visualization.Scatter_Plot
import Standard.Test
import project
spec =
Lazy scatterplot for Vector & Table (#3655) First of all this PR demonstrates how to implement _lazy visualization_: - one needs to write/enhance Enso visualization libraries - this PR adds two optional parameters (`bounds` and `limit`) to `process_to_json_text` function. - the `process_to_json_text` can be tested by standard Enso test harness which this PR also does - then one has to modify JavaScript on the IDE side to construct `setPreprocessor` expression using the optional parameters The idea of _scatter plot lazy visualization_ is to limit the amount of points the IDE requests. Initially the limit is set to `limit=1024`. The `Scatter_Plot.enso` then processes the data and selects/generates the `limit` subset. Right now it includes `min`, `max` in both `x`, `y` axis plus randomly chosen points up to the `limit`. ![Zooming In](https://user-images.githubusercontent.com/26887752/185336126-f4fbd914-7fd8-4f0b-8377-178095401f46.png) The D3 visualization widget is capable of _zooming in_. When that happens the JavaScript widget composes new expression with `bounds` set to the newly visible area. By calling `setPreprocessor` the engine recomputes the visualization data, filters out any data outside of the `bounds` and selects another `limit` points from the new data. The IDE visualization then updates itself to display these more detailed data. Users can zoom-in to see the smallest detail where the number of points gets bellow `limit` or they can select _Fit all_ to see all the data without any `bounds`. # Important Notes Randomly selecting `limit` samples from the dataset may be misleading. Probably implementing _k-means clustering_ (where `k=limit`) would generate more representative approximation.
2022-08-23 15:12:22 +03:00
expect_text text axis_expected_text data_expected_text =
json = Json.parse text
json.fields.keys.should_equal ['axis','data']
expected_axis_labels = ['axis', Json.parse axis_expected_text]
expected_data_pair = ['data', Json.parse data_expected_text]
expected_result = Json.from_pairs [expected_axis_labels, expected_data_pair]
json.should_equal expected_result
Lazy scatterplot for Vector & Table (#3655) First of all this PR demonstrates how to implement _lazy visualization_: - one needs to write/enhance Enso visualization libraries - this PR adds two optional parameters (`bounds` and `limit`) to `process_to_json_text` function. - the `process_to_json_text` can be tested by standard Enso test harness which this PR also does - then one has to modify JavaScript on the IDE side to construct `setPreprocessor` expression using the optional parameters The idea of _scatter plot lazy visualization_ is to limit the amount of points the IDE requests. Initially the limit is set to `limit=1024`. The `Scatter_Plot.enso` then processes the data and selects/generates the `limit` subset. Right now it includes `min`, `max` in both `x`, `y` axis plus randomly chosen points up to the `limit`. ![Zooming In](https://user-images.githubusercontent.com/26887752/185336126-f4fbd914-7fd8-4f0b-8377-178095401f46.png) The D3 visualization widget is capable of _zooming in_. When that happens the JavaScript widget composes new expression with `bounds` set to the newly visible area. By calling `setPreprocessor` the engine recomputes the visualization data, filters out any data outside of the `bounds` and selects another `limit` points from the new data. The IDE visualization then updates itself to display these more detailed data. Users can zoom-in to see the smallest detail where the number of points gets bellow `limit` or they can select _Fit all_ to see all the data without any `bounds`. # Important Notes Randomly selecting `limit` samples from the dataset may be misleading. Probably implementing _k-means clustering_ (where `k=limit`) would generate more representative approximation.
2022-08-23 15:12:22 +03:00
expect value axis_expected_text data_expected_text =
text = Scatter_Plot.process_to_json_text value
expect_text text axis_expected_text data_expected_text
index = Scatter_Plot.index_name
axis label = Json.from_pairs [['label',label]]
labels x y = Json.from_pairs [['x', axis x], ['y', axis y]] . to_text
no_labels = 'null'
Test.group "Scatter Plot Visualization" <|
2021-04-28 12:47:57 +03:00
Test.specify "deals with an empty table" <|
table = Table.from_rows [] []
expect table 'null' '[]'
2021-04-28 12:47:57 +03:00
Test.specify "plots first column if none recognized" <|
header = ['α', 'ω']
row_1 = [11 , 10 ]
row_2 = [21 , 20 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'α') '[{"x":0,"y":11},{"x":1,"y":21}]'
2021-04-28 12:47:57 +03:00
Test.specify "plots 'y' against indices when no 'x' recognized" <|
header = ['α', 'y']
row_1 = [11 , 10 ]
row_2 = [21 , 20 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'y') '[{"x":0,"y":10},{"x":1,"y":20}]'
2021-04-28 12:47:57 +03:00
Test.specify "recognizes all relevant columns" <|
header = ['x' , 'y' , 'size' , 'shape' , 'label' , 'color' ]
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
table = Table.from_rows header [row_1]
expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
2021-04-28 12:47:57 +03:00
Test.specify "is case-insensitive" <|
header = ['X' , 'Y' , 'Size' , 'Shape' , 'Label' , 'Color' ]
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
table = Table.from_rows header [row_1]
expect table (labels 'X' 'Y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
2021-04-28 12:47:57 +03:00
Test.specify "uses first unrecognized numeric column as `y` fallback" <|
header = ['x' , 'size' , 'name' , 'z' , 'ω']
row_1 = [11 , 50 , 'circul' , 20 , 30]
table = Table.from_rows header [row_1]
expect table (labels 'x' 'z') '[{"size":50,"x":11,"y":20}]'
Test.specify "provided only recognized columns" <|
header = ['x', 'y' , 'bar' , 'size']
row_1 = [11 , 10 , 'aa' , 40 ]
row_2 = [21 , 20 , 'bb' , 50 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
Lazy scatterplot for Vector & Table (#3655) First of all this PR demonstrates how to implement _lazy visualization_: - one needs to write/enhance Enso visualization libraries - this PR adds two optional parameters (`bounds` and `limit`) to `process_to_json_text` function. - the `process_to_json_text` can be tested by standard Enso test harness which this PR also does - then one has to modify JavaScript on the IDE side to construct `setPreprocessor` expression using the optional parameters The idea of _scatter plot lazy visualization_ is to limit the amount of points the IDE requests. Initially the limit is set to `limit=1024`. The `Scatter_Plot.enso` then processes the data and selects/generates the `limit` subset. Right now it includes `min`, `max` in both `x`, `y` axis plus randomly chosen points up to the `limit`. ![Zooming In](https://user-images.githubusercontent.com/26887752/185336126-f4fbd914-7fd8-4f0b-8377-178095401f46.png) The D3 visualization widget is capable of _zooming in_. When that happens the JavaScript widget composes new expression with `bounds` set to the newly visible area. By calling `setPreprocessor` the engine recomputes the visualization data, filters out any data outside of the `bounds` and selects another `limit` points from the new data. The IDE visualization then updates itself to display these more detailed data. Users can zoom-in to see the smallest detail where the number of points gets bellow `limit` or they can select _Fit all_ to see all the data without any `bounds`. # Important Notes Randomly selecting `limit` samples from the dataset may be misleading. Probably implementing _k-means clustering_ (where `k=limit`) would generate more representative approximation.
2022-08-23 15:12:22 +03:00
Test.specify "provided only recognized columns within bounds" <|
header = ['x', 'y' , 'bar' , 'size']
row_1 = [1 , 1 , '11' , 30 ]
row_2 = [11 , 10 , 'aa' , 40 ]
row_3 = [21 , 20 , 'bb' , 50 ]
row_4 = [31 , 30 , 'cc' , 60 ]
table = Table.from_rows header [row_1, row_2, row_3, row_4]
bounds = [0,5,25,25]
text = Scatter_Plot.process_to_json_text table bounds
expect_text text (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
Test.specify "used specified numeric index for x if missing 'x' column from table" <|
header = [ 'y' , 'foo', 'bar', 'baz' , 'size']
row_1 = [ 10 , 'aa' , 12 , 14 , 40 ]
row_2 = [ 20 , 'bb' , 13 , 15 , 50 ]
table = Table.from_rows header [row_1, row_2] . set_index 'baz'
# [TODO] mwu: When it is possible to set multiple index columns, test such case.
expect table (labels 'baz' 'y') '[{"size":40,"x":14,"y":10},{"size":50,"x":15,"y":20}]'
Test.specify "prefers explicit 'x' to index and looks into indices for recognized fields" <|
header = [ 'x' , 'size']
row_1 = [ 10 , 21 ]
row_2 = [ 20 , 22 ]
table = Table.from_rows header [row_1, row_2] . set_index 'size'
expect table (labels 'x' 'size') '[{"size":21,"x":10,"y":21},{"size":22,"x":20,"y":22}]'
Test.specify "used default index for `x` if none set" <|
header = [ 'y' , 'bar' , 'size']
row_1 = [ 10 , 'aa' , 40 ]
row_2 = [ 20 , 'bb' , 50 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'y') '[{"size":40,"x":0,"y":10},{"size":50,"x":1,"y":20}]'
Test.specify "using indices for x if given a vector" <|
vector = [0,10,20]
expect vector no_labels '[{"x":0,"y":0},{"x":1,"y":10},{"x":2,"y":20}]'
Lazy scatterplot for Vector & Table (#3655) First of all this PR demonstrates how to implement _lazy visualization_: - one needs to write/enhance Enso visualization libraries - this PR adds two optional parameters (`bounds` and `limit`) to `process_to_json_text` function. - the `process_to_json_text` can be tested by standard Enso test harness which this PR also does - then one has to modify JavaScript on the IDE side to construct `setPreprocessor` expression using the optional parameters The idea of _scatter plot lazy visualization_ is to limit the amount of points the IDE requests. Initially the limit is set to `limit=1024`. The `Scatter_Plot.enso` then processes the data and selects/generates the `limit` subset. Right now it includes `min`, `max` in both `x`, `y` axis plus randomly chosen points up to the `limit`. ![Zooming In](https://user-images.githubusercontent.com/26887752/185336126-f4fbd914-7fd8-4f0b-8377-178095401f46.png) The D3 visualization widget is capable of _zooming in_. When that happens the JavaScript widget composes new expression with `bounds` set to the newly visible area. By calling `setPreprocessor` the engine recomputes the visualization data, filters out any data outside of the `bounds` and selects another `limit` points from the new data. The IDE visualization then updates itself to display these more detailed data. Users can zoom-in to see the smallest detail where the number of points gets bellow `limit` or they can select _Fit all_ to see all the data without any `bounds`. # Important Notes Randomly selecting `limit` samples from the dataset may be misleading. Probably implementing _k-means clustering_ (where `k=limit`) would generate more representative approximation.
2022-08-23 15:12:22 +03:00
Test.specify "limit the number of elements" <|
vector = [0,10,20,30]
text = Scatter_Plot.process_to_json_text vector limit=2
json = Json.parse text
json.fields.keys.should_equal ['axis','data']
data = json.fields.get 'data'
data.unwrap.length . should_equal 2
Test.specify "limit the number of squared elements" <|
vector = (-15).up_to 15 . map (x -> x * x)
text = Scatter_Plot.process_to_json_text vector limit=10
json = Json.parse text
json.fields.keys.should_equal ['axis','data']
data = (json.fields.get 'data') . unwrap
data.length . should_equal 10
(data.take (First 3)).to_text . should_equal '[[[\'x\', 0], [\'y\', 225]], [[\'x\', 15], [\'y\', 0]], [[\'x\', 29], [\'y\', 196]]]'
Test.specify "filter the elements" <|
vector = [0,10,20,30]
bounds = [0,5,10,25]
text = Scatter_Plot.process_to_json_text vector bounds
expect_text text no_labels '[{"x":1,"y":10},{"x":2,"y":20}]'
Test.specify "using indices for x if given a column" <|
column = Column.from_vector 'some_col' [10,2,3]
expect column (labels 'index' 'some_col') '[{"x":0,"y":10},{"x":1,"y":2},{"x":2,"y":3}]'
Test.specify "using indices for x if given a range" <|
2021-04-28 12:47:57 +03:00
value = 2.up_to 5
expect value no_labels '[{"x":0,"y":2},{"x":1,"y":3},{"x":2,"y":4}]'
Lazy scatterplot for Vector & Table (#3655) First of all this PR demonstrates how to implement _lazy visualization_: - one needs to write/enhance Enso visualization libraries - this PR adds two optional parameters (`bounds` and `limit`) to `process_to_json_text` function. - the `process_to_json_text` can be tested by standard Enso test harness which this PR also does - then one has to modify JavaScript on the IDE side to construct `setPreprocessor` expression using the optional parameters The idea of _scatter plot lazy visualization_ is to limit the amount of points the IDE requests. Initially the limit is set to `limit=1024`. The `Scatter_Plot.enso` then processes the data and selects/generates the `limit` subset. Right now it includes `min`, `max` in both `x`, `y` axis plus randomly chosen points up to the `limit`. ![Zooming In](https://user-images.githubusercontent.com/26887752/185336126-f4fbd914-7fd8-4f0b-8377-178095401f46.png) The D3 visualization widget is capable of _zooming in_. When that happens the JavaScript widget composes new expression with `bounds` set to the newly visible area. By calling `setPreprocessor` the engine recomputes the visualization data, filters out any data outside of the `bounds` and selects another `limit` points from the new data. The IDE visualization then updates itself to display these more detailed data. Users can zoom-in to see the smallest detail where the number of points gets bellow `limit` or they can select _Fit all_ to see all the data without any `bounds`. # Important Notes Randomly selecting `limit` samples from the dataset may be misleading. Probably implementing _k-means clustering_ (where `k=limit`) would generate more representative approximation.
2022-08-23 15:12:22 +03:00
main = Test.Suite.run_main spec