enso/test/Visualization_Tests/src/Scatter_Plot_Spec.enso

from Standard.Base import all
from Standard.Table import Table, Column
import Standard.Visualization.Scatter_Plot
from Standard.Test import Test, Test_Suite
import Standard.Test.Extensions
import project
spec =
    expect_text text axis_expected_text data_expected_text =
        json = Json.parse text
        json.field_names.should_equal ['data', 'axis']
        expect_text = '{"axis": ' + axis_expected_text + ', "data": ' + data_expected_text + '}'
        expected_result = Json.parse expect_text
        json.should_equal expected_result
    expect value axis_expected_text data_expected_text =
        text = Scatter_Plot.process_to_json_text value
        expect_text text axis_expected_text data_expected_text
    index = Scatter_Plot.index_name
    axis label = JS_Object.from_pairs [['label',label]]
    labels x y = JS_Object.from_pairs [['x', axis x], ['y', axis y]] . to_text
    no_labels = 'null'
    Test.group "Scatter Plot Visualization" <|
        Test.specify "plots first column if none recognized" <|
            header = ['α', 'ω']
            row_1 = [11 , 10 ]
            row_2 = [21 , 20 ]
            table = Table.from_rows header [row_1, row_2]
            expect table (labels index 'α') '[{"x":0,"y":11},{"x":1,"y":21}]'
        Test.specify "plots 'y' against indices when no 'x' recognized" <|
            header = ['α', 'y']
            row_1 = [11 , 10 ]
            row_2 = [21 , 20 ]
            table = Table.from_rows header [row_1, row_2]
            expect table (labels index 'y') '[{"x":0,"y":10},{"x":1,"y":20}]'
        Test.specify "recognizes all relevant columns" <|
            header = ['x' , 'y' , 'size' , 'shape' , 'label' , 'color' ]
            row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
            table = Table.from_rows header [row_1]
            expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
        Test.specify "is case-insensitive" <|
            header = ['X' , 'Y' , 'Size' , 'Shape' , 'Label' , 'Color' ]
            row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
            table = Table.from_rows header [row_1]
            expect table (labels 'X' 'Y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
        Test.specify "uses first unrecognized numeric column as `y` fallback" <|
            header = ['x' , 'size' , 'name' , 'z' , 'ω']
            row_1 = [11 , 50 , 'circul' , 20 , 30]
            table = Table.from_rows header [row_1]
            expect table (labels 'x' 'z') '[{"size":50,"x":11,"y":20}]'
        Test.specify "provided only recognized columns" <|
            header = ['x', 'y' , 'bar' , 'size']
            row_1 = [11 , 10 , 'aa' , 40 ]
            row_2 = [21 , 20 , 'bb' , 50 ]
            table = Table.from_rows header [row_1, row_2]
            expect table (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
        Test.specify "provided only recognized columns within bounds" <|
            header = ['x', 'y' , 'bar' , 'size']
            row_1 = [1 , 1 , '11' , 30 ]
            row_2 = [11 , 10 , 'aa' , 40 ]
            row_3 = [21 , 20 , 'bb' , 50 ]
            row_4 = [31 , 30 , 'cc' , 60 ]
            table = Table.from_rows header [row_1, row_2, row_3, row_4]
            bounds = [0,5,25,25]
            text = Scatter_Plot.process_to_json_text table bounds
            expect_text text (labels 'x' 'y') '[{"size":40,"x":11,"y":10},{"size":50,"x":21,"y":20}]'
        Test.specify "used default index for `x` if none set" <|
            header = [ 'y' , 'bar' , 'size']
            row_1 = [ 10 , 'aa' , 40 ]
            row_2 = [ 20 , 'bb' , 50 ]
            table = Table.from_rows header [row_1, row_2]
            expect table (labels index 'y') '[{"size":40,"x":0,"y":10},{"size":50,"x":1,"y":20}]'
        Test.specify "using indices for x if given a vector" <|
            vector = [0,10,20]
            expect vector no_labels '[{"x":0,"y":0},{"x":1,"y":10},{"x":2,"y":20}]'
        Test.specify "limit the number of elements" <|
            vector = [0,10,20,30]
            text = Scatter_Plot.process_to_json_text vector limit=2
            json = Json.parse text
            json.field_names.should_equal ['data','axis']
            data = json.get 'data'
            data.should_be_a Vector
            data.length . should_equal 2
        Test.specify "limit the number of squared elements" <|
            vector = (-15).up_to 15 . map (x -> x * x)
            text = Scatter_Plot.process_to_json_text vector limit=10
            json = Json.parse text
            json.field_names.should_equal ['data','axis']
            data = json.get 'data'
            data.should_be_a Vector
            data.length . should_equal 10
            (data.take (First 3)).to_text . should_equal '[{"x":0,"y":225}, {"x":29,"y":196}, {"x":15,"y":0}]'
        Test.specify "filter the elements" <|
            vector = [0,10,20,30]
            bounds = [0,5,10,25]
            text = Scatter_Plot.process_to_json_text vector bounds
            expect_text text no_labels '[{"x":1,"y":10},{"x":2,"y":20}]'
        Test.specify "using indices for x if given a column" <|
            column = Column.from_vector 'some_col' [10,2,3]
            expect column (labels 'index' 'some_col') '[{"x":0,"y":10},{"x":1,"y":2},{"x":2,"y":3}]'
        Test.specify "using indices for x if given a range" <|
            value = 2.up_to 5
            expect value no_labels '[{"x":0,"y":2},{"x":1,"y":3},{"x":2,"y":4}]'
main = Test_Suite.run_main spec