enso/test/Visualization_Tests/src/Scatter_Plot_Spec.enso
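# Tests for the Scatter_Plot visualization preprocessor
# (Scatter_Plot.process_to_json_text).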

from Standard.Base import all
from Standard.Table import Table, Column
import Standard.Visualization.Scatter_Plot
from Standard.Test import Test, Test_Suite
import Standard.Test.Extensions
import project

spec =
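    # Parses the JSON text produced by the preprocessor and checks that it has
    # exactly the 'data' and 'axis' fields with the expected payloads.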
    expect_text text axis_expected_text data_expected_text =
        json = Json.parse text
        json.field_names.should_equal ['data', 'axis']
        expected_text = '{"axis": ' + axis_expected_text + ', "data": ' + data_expected_text + '}'
        expected_result = Json.parse expected_text
        json.should_equal expected_result
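    # Runs the preprocessor on `value` and checks the resulting JSON text.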
    expect value axis_expected_text data_expected_text =
        text = Scatter_Plot.process_to_json_text value
        expect_text text axis_expected_text data_expected_text
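    # Helpers for building the expected 'axis' label objects.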
    index = Scatter_Plot.index_name
    axis label = JS_Object.from_pairs [['label',label]]
    labels x y = JS_Object.from_pairs [['x', axis x], ['y', axis y]] . to_text
    no_labels = 'null'

    Test.group "Scatter Plot Visualization" <|
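        # The preprocessor recognizes x/y/size/shape/label/color columns
        # (case-insensitively); when 'x' or 'y' is missing it falls back to the
        # table index or the first numeric column, as the tests below show.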
Test.specify "deals with an empty table" <|
table = Table.from_rows [] []
expect table 'null' '[]'
Test.specify "plots first column if none recognized" <|
header = ['α', 'ω']
row_1 = [11 , 10 ]
row_2 = [21 , 20 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'α') '[{"x":0,"y":11},{"x":1,"y":21}]'
Test.specify "plots 'y' against indices when no 'x' recognized" <|
header = ['α', 'y']
row_1 = [11 , 10 ]
row_2 = [21 , 20 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'y') '[{"x":0,"y":10},{"x":1,"y":20}]'
Test.specify "recognizes all relevant columns" <|
header = ['x' , 'y' , 'size' , 'shape' , 'label' , 'color' ]
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
table = Table.from_rows header [row_1]
expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
Test.specify "is case-insensitive" <|
header = ['X' , 'Y' , 'Size' , 'Shape' , 'Label' , 'Color' ]
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
table = Table.from_rows header [row_1]
expect table (labels 'X' 'Y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
Test.specify "uses first unrecognized numeric column as `y` fallback" <|
header = ['x' , 'size' , 'name' , 'z' , 'ω']
row_1 = [11 , 50 , 'circul' , 20 , 30]
table = Table.from_rows header [row_1]
expect table (labels 'x' 'z') '[{"size":50,"x":11,"y":20}]'
Test.specify "provided only recognized columns" <|
header = ['x', 'y' , 'bar' , 'size']
row_1 = [11 , 10 , 'aa' , 40 ]
row_2 = [21 , 20 , 'bb' , 50 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
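        # In these tests `bounds` reads as [min_x, min_y, max_x, max_y];
        # points outside that rectangle are dropped.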
Test.specify "provided only recognized columns within bounds" <|
header = ['x', 'y' , 'bar' , 'size']
row_1 = [1 , 1 , '11' , 30 ]
row_2 = [11 , 10 , 'aa' , 40 ]
row_3 = [21 , 20 , 'bb' , 50 ]
row_4 = [31 , 30 , 'cc' , 60 ]
table = Table.from_rows header [row_1, row_2, row_3, row_4]
bounds = [0,5,25,25]
text = Scatter_Plot.process_to_json_text table bounds
expect_text text (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
Test.specify "used default index for `x` if none set" <|
header = [ 'y' , 'bar' , 'size']
row_1 = [ 10 , 'aa' , 40 ]
row_2 = [ 20 , 'bb' , 50 ]
table = Table.from_rows header [row_1, row_2]
expect table (labels index 'y') '[{"size":40,"x":0,"y":10},{"size":50,"x":1,"y":20}]'
Test.specify "using indices for x if given a vector" <|
vector = [0,10,20]
expect vector no_labels '[{"x":0,"y":0},{"x":1,"y":10},{"x":2,"y":20}]'
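        # `limit` caps the number of returned points: the extremes on each axis
        # are kept and the remaining points are sampled.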
Test.specify "limit the number of elements" <|
vector = [0,10,20,30]
text = Scatter_Plot.process_to_json_text vector limit=2
json = Json.parse text
json.field_names.should_equal ['data','axis']
data = json.get 'data'
data.length . should_equal 2
Test.specify "limit the number of squared elements" <|
vector = (-15).up_to 15 . map (x -> x * x)
text = Scatter_Plot.process_to_json_text vector limit=10
json = Json.parse text
json.field_names.should_equal ['data','axis']
data = json.get 'data'
            data.length . should_equal 10
            (data.take (First 3)).to_text . should_equal '[{"x":0,"y":225}, {"x":15,"y":0}, {"x":29,"y":196}]'
Test.specify "filter the elements" <|
vector = [0,10,20,30]
bounds = [0,5,10,25]
text = Scatter_Plot.process_to_json_text vector bounds
expect_text text no_labels '[{"x":1,"y":10},{"x":2,"y":20}]'
Test.specify "using indices for x if given a column" <|
column = Column.from_vector 'some_col' [10,2,3]
expect column (labels 'index' 'some_col') '[{"x":0,"y":10},{"x":1,"y":2},{"x":2,"y":3}]'
Test.specify "using indices for x if given a range" <|
            value = 2.up_to 5
            expect value no_labels '[{"x":0,"y":2},{"x":1,"y":3},{"x":2,"y":4}]'
main = Test_Suite.run_main spec