q benchmark (#241)

2024-10-03 22:39:52 +03:00 · 2020-09-19 12:56:06 +03:00 · 2020-09-19 12:56:06 +03:00 · 9b492b829a
commit 9b492b829a
parent 865f591a10
18 changed files with 835 additions and 4 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -12,3 +12,6 @@ packages
 .idea/
 dist/windows/
 generated-site/
+benchmark_data.tar.gz
+_benchmark_data/
+q.egg-info/
--- a/VERSION_BUMP.md
+++ b/VERSION_BUMP.md
@@ -0,0 +1,18 @@
+
+# Version bump
+Currently, releasing a new version requires a few manual steps:
+
+* Make sure that you're working on a branch
+* Change the version in the following three files: `bin/q.py`, `setup.py` and `do-manual-release.sh`, and commit them to the branch
+* Merge the branch into master
+* Add a tag for the release version
+* `git push --tags origin master`
+* Create a release in GitHub with the tag you've just created
+
+Pushing to master triggers a build/release and uploads the artifacts to the new release as assets.
+
+These manual steps are needed because of limitations in the way pyci uploads the binaries to GitHub.
+
+#
+
+TBD: continue the flow by wrapping the artifacts with rpm/deb, copying the files to packages-for-q, and updating the website.
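The version change described in VERSION_BUMP.md has to be made in three files, which is easy to get out of sync. The following is a minimal sketch of a helper that rewrites all three in one go, assuming the version lines keep the exact shapes this commit touches (`q_version = '...'` in `bin/q.py` and `setup.py`, `VERSION=...` in `do-manual-release.sh`). `bump_version` and `bump_version_text` are hypothetical names; the repository does not ship such a script.

```python
import os
import re

# Hypothetical helper, not part of the repository. The patterns mirror the version
# lines changed in this commit: q_version = '...' in bin/q.py and setup.py, and
# VERSION=... in do-manual-release.sh.
VERSION_PATTERNS = {
    "bin/q.py": r"^(q_version = ')[^']*(')",
    "setup.py": r"^(q_version = ')[^']*(')",
    "do-manual-release.sh": r"^(VERSION=)\S*()",
}


def bump_version_text(text, pattern, new_version):
    """Replace the single version string matched by `pattern` in `text`."""
    new_text, count = re.subn(
        pattern,
        lambda m: m.group(1) + new_version + m.group(2),
        text,
        flags=re.MULTILINE,
    )
    if count != 1:
        raise ValueError("expected exactly one version line, found %d" % count)
    return new_text


def bump_version(new_version, root="."):
    """Rewrite the version in all three files; nothing is written unless all three match."""
    updated = {}
    for rel_path, pattern in VERSION_PATTERNS.items():
        path = os.path.join(root, rel_path)
        with open(path) as f:
            updated[path] = bump_version_text(f.read(), pattern, new_version)
    for path, text in updated.items():
        with open(path, "w") as f:
            f.write(text)
```

Running `bump_version("2.0.18")` from the repository root would produce the three version changes in this commit; committing, merging, tagging and creating the GitHub release remain manual. All new contents are computed before any file is written, so a file whose version line has an unexpected shape aborts the bump instead of leaving the three files inconsistent.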
--- a/bin/q.py
+++ b/bin/q.py
@@ -33,7 +33,7 @@ from __future__ import print_function

 from collections import OrderedDict

-q_version = '2.0.17'
+q_version = '2.0.18'

 __all__ = [ 'QTextAsData' ]

--- a/do-manual-release.sh
+++ b/do-manual-release.sh
@@ -2,7 +2,7 @@

 set -e

-VERSION=2.0.17
+VERSION=2.0.18

 if [[ "$TRAVIS_BRANCH" != "master" ]]
 then
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,2 +1,3 @@
 six==1.11.0
 flake8==3.6.0
+setuptools<45.0.0
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@

 from setuptools import setup

-q_version = '2.0.17'
+q_version = '2.0.18'

 setup(
    name='q',
--- a/test/BENCHMARK.md
+++ b/test/BENCHMARK.md
@@ -0,0 +1,159 @@
+
+
+NOTE: *Please don't use or publish this benchmark data yet. See below for details*
+
+# Overview
+This is just a preliminary benchmark, originally created for validating performance optimizations and suggestions from users, and for analyzing q's move to python3. After writing it, I thought it might be interesting to test q's speed against textql and octosql as well.
+
+The results I'm getting are somewhat surprising, to the point that I question them a bit, so it would be great to validate them further before finalizing the benchmark results.
+
+The most surprising results are as follows:
+* python3 vs python2 - A huge improvement (for large files, execution times with python 3 are around 30%-40% of the times for python 2)
+* python3 vs textql (written in golang) - textql seems to become slower than the python3 version of q as the data size grows (both rows and columns)
+
+I would love to validate these results by having other people run the benchmark as well and send me their results. 
+
+If you're interested, follow the instructions and run the benchmark on your machine. When the benchmark finishes, send me the final results file along with some details about your hardware, and I'll add it to the spreadsheet: <harelba@gmail.com>
+
+I've tried to make running the benchmark as seamless as possible, but there might still be errors or issues. Please contact me if you encounter any, or just open a ticket.
+
+# Benchmark
+This is an initial version of the benchmark, along with some results. The following is compared:
+* q running on multiple python versions
+* textql 2.0.3
+* octosql v0.3.0
+
+The specific python versions which are being tested are specified in `benchmark-config.sh`.
+
+This is by no means a scientific benchmark. It focuses only on data loading time, which is the only significant factor being compared (the query itself is a very simple count query). It also does not try to provide any usability comparison between q and textql/octosql, an interesting topic in its own right.
+
+## Methodology
+The idea was to compare how sensitive run time is to row count and to column count.
+
+* Row counts: 1,10,100,1000,10000,100000,1000000
+* Column counts: 1,5,10,20,50,100
+* Iterations for each combination: 10
+
+File sizes:
+* 1M rows by 100 columns - 976MB (~1GB) - Largest file
+* 1M rows by 50 columns - 477MB
+
+The benchmark executes simple `select count(*) from <file>` queries for each combination, calculating the mean and stddev of each set of iterations. The stddev is used to gauge how reliable each mean is.
+
+The graphs below compare only the means of the results; the standard deviations are written into the google sheet itself and can be viewed there if needed.
+
+Instructions on how to run the benchmark are at the bottom section of this document, after the results section.
+
+## Hardware
+OSX Catalina on a 15" MacBook Pro (Mid 2015) with 16GB of RAM and a 256GB internal flash drive.
+
+## Results
+(Results are automatically updated from the baseline tab in the google spreadsheet).
+
+Detailed results below.
+
+Summary:
+* All python 3 versions (3.6/3.7/3.8) provide similar results across all scales.
+* python 3.x provides significantly better results than python2. The improvement grows with file size (20% improvement for small files, up to ~70% improvement for the largest file)
+* textql seems to be faster than q (py3) for smaller files, up to around 30MB of data
+* As file size grows further, textql becomes slower than q-py3, taking up to ~80% more time for the largest file (74 seconds vs 41 seconds)
+* octosql is significantly slower than both q and textql, even for small files with a low number of rows and columns
+
+### Data for 1M rows
+
+#### Run time durations (seconds) for 1M rows and different column counts:
+|   rows  	| columns 	| File Size 	| python 2.7 	| python 3.6 	| python 3.7 	| python 3.8 	| textql 	| octosql 	|
+|:-------:	|:-------:	|:---------:	|:----------:	|:----------:	|:----------:	|:----------:	|:------:	|:-------:	|
+| 1000000 	|    1    	|    17M    	|    5.15    	|    4.24    	|    4.08    	|    3.98    	|  2.90  	|  49.95  	|
+| 1000000 	|    5    	|    37M    	|    10.68   	|    5.37    	|    5.26    	|    5.14    	|  5.88  	|  54.69  	|
+| 1000000 	|    10   	|    89M    	|    17.56   	|    7.25    	|    7.15    	|    7.01    	|  9.69  	|  65.32  	|
+| 1000000 	|    20   	|    192M   	|    30.28   	|    10.96   	|    10.78   	|    10.64   	|  17.34 	|  83.94  	|
+| 1000000 	|    50   	|    477M   	|    71.56   	|    21.98   	|    21.59   	|    21.70   	|  38.57 	|  158.26 	|
+| 1000000 	|   100   	|    986M   	|   131.86   	|    41.71   	|    40.82   	|    41.02   	|  74.62 	|  289.58 	|
+
+#### Comparison between python 3.x and python 2 run times (1M rows):
+(>100% is slower than q-py2, <100% is faster than q-py2)
+
+|   rows    | columns 	| file size 	| q-py2 runtime 	| q-py3.6 vs q-py2 runtime 	| q-py3.7 vs q-py2 runtime 	| q-py3.8 vs q-py2 runtime 	|
+|:-------:	|:-------:	|:---------:	|:-------------:	|:------------------------:	|:------------------------:	|:------------------------:	|
+| 1000000 	|    1    	|    17M    	|    100.00%    	|          82.34%          	|          79.34%          	|          77.36%          	|
+| 1000000 	|    5    	|    37M    	|    100.00%    	|          50.25%          	|          49.22%          	|          48.08%          	|
+| 1000000 	|    10   	|    89M    	|    100.00%    	|          41.30%          	|          40.69%          	|          39.93%          	|
+| 1000000 	|    20   	|    192M   	|    100.00%    	|          36.18%          	|          35.59%          	|          35.14%          	|
+| 1000000 	|    50   	|    477M   	|    100.00%    	|          30.71%          	|          30.17%          	|          30.32%          	|
+| 1000000 	|   100   	|    986M   	|    100.00%    	|          31.63%          	|          30.96%          	|          31.11%          	|
+
+#### textql and octosql comparison against q-py3 run time (1M rows):
+(>100% is slower than q-py3, <100% is faster than q-py3)
+
+|   rows  	| columns 	| file size 	| avg q-py3 runtime 	| textql vs q-py3 runtime 	| octosql vs q-py3 runtime 	|
+|:-------:	|:-------:	|:---------:	|:-----------------:	|:-----------------------:	|:------------------------:	|
+| 1000000 	|    1    	|    17M    	|      100.00%      	|          70.67%         	|         1217.76%         	|
+| 1000000 	|    5    	|    37M    	|      100.00%      	|         111.86%         	|         1040.70%         	|
+| 1000000 	|    10   	|    89M    	|      100.00%      	|         135.80%         	|          915.28%         	|
+| 1000000 	|    20   	|    192M   	|      100.00%      	|         160.67%         	|          777.92%         	|
+| 1000000 	|    50   	|    477M   	|      100.00%      	|         177.26%         	|          727.40%         	|
+| 1000000 	|   100   	|    986M   	|      100.00%      	|         181.19%         	|          703.15%         	|
+
+### Sensitivity to column count 
+Based on the largest files, with 1,000,000 rows.
+
+![Sensitivity to column count](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=1585602598&format=image)
+
+### Sensitivity to line count (per column count)
+
+#### 1 Column Table
+![1 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=1119350798&format=image)
+
+#### 5 Column Table
+![5 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=599223098&format=image)
+
+#### 10 Column Table
+![10 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=82695414&format=image)
+
+#### 20 Column Table
+![20 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=1573199483&format=image)
+
+#### 50 Column Table
+![50 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=448568670&format=image)
+
+#### 100 Column Table
+![100 column table](https://docs.google.com/spreadsheets/d/e/2PACX-1vQy9Zm4I322Tdf5uoiFFJx6Oi3Z4AMq7He3fUUtsEQVQIdTGfWgjxFD6k8PAy9wBjvFkqaG26oBgNTP/pubchart?oid=2101488258&format=image)
+
+## Running the benchmark
+Please note that the initial run generates large files, so you'll need more than 3GB of free space. All the generated files reside in the `_benchmark_data/` folder.
+
+Part of the preparation flow will download the benchmark data as needed.
+
+### Preparations
+* Prerequisites:
+  * pyenv installed
+  * pyenv-virtualenv installed
+  * [`textql`](https://github.com/dinedal/textql#install)
+  * [`octosql`](https://github.com/cube2222/octosql#installation)
+
+Run `./prepare-benchmark-env`
+
+### Execution
+Run `./run-benchmark <benchmark-id>`.
+
+Benchmark output files will be written to `./benchmark-results/<q-executable>/<benchmark-id>/`.
+
+* `benchmark-id` is the ID you want to give the benchmark.
+* `q-executable` is the name of the q executable being used for the benchmark. If none has been provided through Q_EXECUTABLE, the value will be the last commit hash. Note that the benchmark does not check whether the working tree is clean.
+
+The summary of the benchmark will be written to `./benchmark-results/<q-executable>/<benchmark-id>/summary.benchmark-results`.
+
+By default, the benchmark uses the source python files inside the project. If you want to run it against one of the standalone binary executables, set Q_EXECUTABLE to the full path of the q binary.
+
+If you're helping to run the benchmark, please don't use this parameter for now; just test against a clean checkout of the code using `./run-benchmark <benchmark-id>`.
+
+## Benchmark Development info
+### Running against the standalone binary
+* `./run-benchmark` can accept a second parameter: the path to the q executable. If this parameter is given, it is used for running q, which provides a way to test the standalone q binaries in the new packaging format. When it is not given, the benchmark is executed directly from the source code.
+
+### Updating the benchmark markdown document file
+The results should reside in the following [google sheet](https://docs.google.com/spreadsheets/d/1Ljr8YIJwUQ5F4wr6ATga5Aajpu1CvQp1pe52KGrLkbY/edit?usp=sharing). 
+
+Add a new tab to the google sheet, and paste the content of `summary.benchmark-results` into it.
+
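The percentages in the two comparison tables in BENCHMARK.md are each tool's mean run time divided by a baseline: q-py2 for the python 3.x table, and the average of the three q-py3 means for the textql/octosql table. The second baseline is inferred from the numbers rather than stated (2.90 / avg(4.24, 4.08, 3.98) ≈ 70.67%). A minimal Python sketch that reproduces the 1M rows × 1 column cells from the raw means in the committed results files:

```python
from statistics import mean


def relative_runtime(runtime, baseline):
    """Run time as a percentage of the baseline (>100% means slower than the baseline)."""
    return 100.0 * runtime / baseline


# Raw means (seconds) for 1M rows x 1 column, copied from the *.benchmark-results files.
q_py2 = 5.14807910919
q_py3 = {"3.6": 4.238853335380554, "3.7": 4.084399747848511, "3.8": 3.982502889633179}
textql = 2.89862303734
octosql = 49.9513648033

# Assumption: the "avg q-py3 runtime" baseline is the plain mean of the three py3 means.
avg_q_py3 = mean(list(q_py3.values()))

py3_vs_py2 = {version: relative_runtime(t, q_py2) for version, t in q_py3.items()}
textql_vs_py3 = relative_runtime(textql, avg_q_py3)
octosql_vs_py3 = relative_runtime(octosql, avg_q_py3)

for version, pct in py3_vs_py2.items():
    print("q-py%s vs q-py2: %.2f%%" % (version, pct))
print("textql vs q-py3: %.2f%%" % textql_vs_py3)
print("octosql vs q-py3: %.2f%%" % octosql_vs_py3)
```

Applied to the 100-column means, the same ratio gives the ~31% (python 3.6 vs python 2) and ~181% (textql vs q-py3) cells in the last rows of the tables.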
--- a/test/benchmark-config.sh
+++ b/test/benchmark-config.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+
+BENCHMARK_PYTHON_VERSIONS=(2.7.18 3.6.4 3.7.9 3.8.5)
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/octosql_v0.3.0.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/octosql_v0.3.0.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	1	0.582091641426	0.0235290239617
+10	1	0.596219730377	0.0320124029461
+100	1	0.575977492332	0.0199296245316
+1000	1	0.56785056591	0.00846389017466
+10000	1	1.1466334343	0.00760108698846
+100000	1	5.49565172195	0.131791932977
+1000000	1	49.9513648033	0.443430523063
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	5	0.582160949707	0.0274409391571
+10	5	0.57046456337	0.0199413000359
+100	5	0.585747480392	0.0372543971623
+1000	5	0.572268772125	0.00384300349763
+10000	5	1.15530762672	0.0117990775856
+100000	5	6.10629923344	0.146711842919
+1000000	5	54.6851765394	0.315486399525
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	10	0.586222410202	0.0232479065914
+10	10	0.59000480175	0.0186508192447
+100	10	0.581873703003	0.0331332482772
+1000	10	0.569027900696	0.0103675493106
+10000	10	1.40067322254	0.00583352224401
+100000	10	7.30705575943	0.0165839217599
+1000000	10	65.3242264032	0.512552576414
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	20	0.571048212051	0.0166919396871
+10	20	0.594776701927	0.0368900941023
+100	20	0.561370825768	0.00907051791451
+1000	20	0.577527880669	0.00983965108957
+10000	20	1.90710241795	0.00757011452155
+100000	20	9.8267291069	0.127844155326
+1000000	20	83.9448960066	0.46121344046
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	50	0.572030115128	0.0253648479103
+10	50	0.56993534565	0.0230474303306
+100	50	0.563336873055	0.00964411866903
+1000	50	0.826378440857	0.00941629472813
+10000	50	3.27872717381	0.126592845956
+100000	50	17.890055728	0.116794666005
+1000000	50	158.262442636	0.826290454446
+lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	100	0.569358110428	0.0279801762531
+10	100	0.580981063843	0.0272341107532
+100	100	0.559471726418	0.00668155858429
+1000	100	1.08161640167	0.00698594638512
+10000	100	5.67823712826	0.0123398407167
+100000	100	32.2797194242	0.315508270241
+1000000	100	289.582628798	0.929455236817
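Each per-tool results file stores one mean and one stddev per (rows, columns) combination, computed over 10 iterations of a `select count(*)` query. The scripts that produce these numbers are not part of this excerpt, so the following is only a minimal Python sketch of such a measurement loop, under two labeled assumptions: wall-clock time is measured around a subprocess, and the stddev is the population stddev (`statistics.pstdev`). `time_command`, `summarize` and `coefficient_of_variation` are hypothetical names.

```python
import itertools
import statistics
import subprocess
import sys
import time

# Matrix from the Methodology section: 7 row counts x 6 column counts, 10 iterations each.
ROW_COUNTS = [1, 10, 100, 1000, 10000, 100000, 1000000]
COLUMN_COUNTS = [1, 5, 10, 20, 50, 100]
ITERATIONS = 10


def time_command(argv, iterations=ITERATIONS):
    """Run argv `iterations` times; return the wall-clock duration (seconds) of each run."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Mean and stddev of one set of iterations (population stddev is an assumption)."""
    return statistics.mean(durations), statistics.pstdev(durations)


def coefficient_of_variation(mean, stddev):
    """Relative spread of a set of runs, so short and long runs can be compared."""
    return stddev / mean


if __name__ == "__main__":
    combinations = list(itertools.product(ROW_COUNTS, COLUMN_COUNTS))
    print("%d combinations, %d runs per tool" % (len(combinations), len(combinations) * ITERATIONS))
    # Placeholder command: the real benchmark runs q/textql/octosql on generated files.
    mean, stddev = summarize(time_command([sys.executable, "-c", "pass"], iterations=3))
    print("mean=%.4fs stddev=%.4fs cv=%.1f%%" % (mean, stddev, 100 * coefficient_of_variation(mean, stddev)))
```

The coefficient of variation helps spot noisy cells in the committed data: for the python 2.7 run at 1M rows × 50 columns it is 5.02 / 71.56 ≈ 7%, while the other 1M-row runs of q and octosql stay at or below about 3%.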
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-2.7.18.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-2.7.18.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	1	0.106449890137	0.002010027753
+10	1	0.106737875938	0.00224112203891
+100	1	0.107839012146	0.00102954061006
+1000	1	0.113026666641	0.00147361890226
+10000	1	0.160376381874	0.00569766179806
+100000	1	0.608236479759	0.00604026519608
+1000000	1	5.14807910919	0.0584474028762
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	5	0.106719517708	0.00236752032369
+10	5	0.107823801041	0.00238873169438
+100	5	0.109785079956	0.0013047675259
+1000	5	0.120395207405	0.00207224422629
+10000	5	0.21783041954	0.00522254475716
+100000	5	1.17115747929	0.0221394865225
+1000000	5	10.6830974817	0.339822977934
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	10	0.104981088638	0.00166552032929
+10	10	0.108320140839	0.00204034349199
+100	10	0.112528729439	0.00168376477305
+1000	10	0.13019015789	0.00253773120965
+10000	10	0.284891676903	0.00384009140782
+100000	10	1.84725661278	0.00860738744089
+1000000	10	17.5610994339	0.228322442172
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	20	0.106477689743	0.00254429925697
+10	20	0.108580899239	0.00173704653824
+100	20	0.118750286102	0.00247623639866
+1000	20	0.146431708336	0.00249685551944
+10000	20	0.419492387772	0.00248210434668
+100000	20	3.15847921371	0.0550301268026
+1000000	20	30.279082489	0.124978814506
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	50	0.105411934853	0.00171651054128
+10	50	0.109102797508	0.00111620290512
+100	50	0.135682177544	0.00196166766665
+1000	50	0.198261427879	0.00396172489054
+10000	50	0.821499919891	0.0111642692132
+100000	50	7.05980975628	0.121182371277
+1000000	50	71.5645889759	5.02009516291
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev
+1	100	0.10662381649	0.00193146624495
+10	100	0.110662698746	0.00171461379583
+100	100	0.163547992706	0.00166570196628
+1000	100	0.280023741722	0.00337543024145
+10000	100	1.46053376198	0.0221691284465
+100000	100	13.2369835854	0.309375896258
+1000000	100	131.864977288	1.22415449691
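Each per-tool `.benchmark-results` file in this commit is tab-separated and repeats its header line (`lines`, `columns`, `<tool>_mean`, `<tool>_stddev`) before every column-count block. A minimal sketch of a parser for that layout, which can be used to recompute the BENCHMARK.md tables without the spreadsheet; `parse_benchmark_results` is a hypothetical helper, and the layout is inferred from the committed files rather than from a documented format.

```python
import csv
import io


def parse_benchmark_results(text):
    """Parse one per-tool .benchmark-results file.

    Returns {(lines, columns): (mean_seconds, stddev_seconds)}. Assumes the layout
    seen in this commit: tab-separated rows, with the header line repeated before
    each column-count block.
    """
    results = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0] == "lines":
            continue  # skip blank lines and the repeated header lines
        lines, columns, mean, stddev = row
        results[(int(lines), int(columns))] = (float(mean), float(stddev))
    return results


if __name__ == "__main__":
    # Two rows copied from q-benchmark-2.7.18.benchmark-results.
    sample = (
        "lines\tcolumns\tq-benchmark-2.7.18_mean\tq-benchmark-2.7.18_stddev\n"
        "1000000\t50\t71.5645889759\t5.02009516291\n"
        "lines\tcolumns\tq-benchmark-2.7.18_mean\tq-benchmark-2.7.18_stddev\n"
        "1000000\t100\t131.864977288\t1.22415449691\n"
    )
    for key, (mean, stddev) in sorted(parse_benchmark_results(sample).items()):
        print(key, mean, stddev)
```

To load a whole file, pass it the file contents, e.g. `parse_benchmark_results(open(path).read())`, and look up a cell by `(lines, columns)`.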
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.6.4.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.6.4.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	1	0.10342762470245362	0.0017673875851759295
+10	1	0.10239293575286865	0.0012505611685910795
+100	1	0.10317318439483643	0.0010581783881541751
+1000	1	0.10687050819396973	0.0014050135772919004
+10000	1	0.1447664737701416	0.001841256227287192
+100000	1	0.5162809371948243	0.006962985088492867
+1000000	1	4.238853335380554	0.04834401143632507
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	5	0.10211825370788574	0.0022568191323651568
+10	5	0.1025341272354126	0.0016446470901070106
+100	5	0.1053577184677124	0.0015298114223855884
+1000	5	0.10980842113494874	0.002536098780902228
+10000	5	0.1590113162994385	0.003123074098301634
+100000	5	0.6348223447799682	0.0082691507829872
+1000000	5	5.368562030792236	0.11628913334105236
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	10	0.10251858234405517	0.0015963869535345293
+10	10	0.10278875827789306	0.0009920577082124496
+100	10	0.10715732574462891	0.002033320000941064
+1000	10	0.11389360427856446	0.0023603847702423973
+10000	10	0.17806434631347656	0.001114054252191835
+100000	10	0.8252989768981933	0.0037080843359275904
+1000000	10	7.252838873863221	0.029052130546213153
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	20	0.10367965698242188	0.003661761341842434
+10	20	0.10489590167999267	0.001977141196109372
+100	20	0.11108210086822509	0.0014801173497056886
+1000	20	0.12110791206359864	0.001648524669420912
+10000	20	0.2178968906402588	0.0019298316207276716
+100000	20	1.1962245225906372	0.010541407803235559
+1000000	20	10.956057572364807	0.12677108174061705
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	50	0.10458300113677979	0.0016367630302744722
+10	50	0.10616152286529541	0.002345135740908088
+100	50	0.12375867366790771	0.00238414904864133
+1000	50	0.14462883472442628	0.0022428030896492978
+10000	50	0.34488487243652344	0.004867441221052092
+100000	50	2.3394312858581543	0.02263239858944125
+1000000	50	21.979821610450745	0.09080404939303836
+lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev
+1	100	0.10372309684753418	0.0010299126833031144
+10	100	0.10784556865692138	0.0016557634029464607
+100	100	0.14526791572570802	0.0028194506905186724
+1000	100	0.18315494060516357	0.0023585311962114673
+10000	100	0.5586131334304809	0.004808492789681402
+100000	100	4.287398314476013	0.00957500108409644
+1000000	100	41.706851434707644	0.4161526076289425
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.7.9.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.7.9.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	1	0.08099310398101807	0.001417385651688644
+10	1	0.0822291374206543	0.0014809900020001858
+100	1	0.08169686794281006	0.002108157069167563
+1000	1	0.08690853118896484	0.0012595326919263487
+10000	1	0.12215542793273926	0.0020152625320395434
+100000	1	0.4825761795043945	0.0050418000028856335
+1000000	1	4.084399747848511	0.027731958079814215
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	5	0.0817826271057129	0.002665533758836163
+10	5	0.08261749744415284	0.0019205430658525572
+100	5	0.08472237586975098	0.002571239449841039
+1000	5	0.08973510265350342	0.002323797583077552
+10000	5	0.13746986389160157	0.001964971666036654
+100000	5	0.60649254322052	0.007131635266871318
+1000000	5	5.2585612535476685	0.05661789407928516
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	10	0.08112843036651611	0.002251300165899426
+10	10	0.08175232410430908	0.0014557171018568637
+100	10	0.08572309017181397	0.0019643550214810675
+1000	10	0.09268453121185302	0.001816414236580489
+10000	10	0.15538835525512695	0.0024978076091814994
+100000	10	0.7879442930221557	0.009412516078916211
+1000000	10	7.146207928657532	0.06659760176757985
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	20	0.08142082691192627	0.001304584466639188
+10	20	0.08197519779205323	0.0014842098503865223
+100	20	0.08949971199035645	0.0009937446141285785
+1000	20	0.09955930709838867	0.0013978961740806384
+10000	20	0.1966566801071167	0.0028489273218240147
+100000	20	1.1518636226654053	0.006410720031542237
+1000000	20	10.776052689552307	0.04739925571001746
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	50	0.08237688541412354	0.0016494314799953837
+10	50	0.08519520759582519	0.002610550182895596
+100	50	0.10423583984375	0.0018808335751867933
+1000	50	0.12195603847503662	0.0023611894043373983
+10000	50	0.3163540124893188	0.002761333651520998
+100000	50	2.237372374534607	0.009955353920396077
+1000000	50	21.59097549915314	0.081188190530421
+lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev
+1	100	0.08336784839630126	0.0013840724401561887
+10	100	0.0864112138748169	0.0017946939354350697
+100	100	0.12199611663818359	0.0013003743156634682
+1000	100	0.15871686935424806	0.0035993681064501234
+10000	100	0.5243751525878906	0.004370273273595629
+100000	100	4.175828623771667	0.016127303710583043
+1000000	100	40.82292411327362	0.12328165162380703
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.8.5.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/q-benchmark-3.8.5.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	1	0.10138180255889892	0.0017947074090971444
+10	1	0.10056869983673096	0.003442371291904885
+100	1	0.10126984119415283	0.0016392348107127808
+1000	1	0.10484635829925537	0.0019743937339163262
+10000	1	0.1400548219680786	0.0024523366133394117
+100000	1	0.4901275157928467	0.003970374711691596
+1000000	1	3.982502889633179	0.045292138461945054
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	5	0.09946837425231933	0.0018876161478998787
+10	5	0.099178147315979	0.0014194733014858227
+100	5	0.10171806812286377	0.0017580984705406846
+1000	5	0.10602672100067138	0.002000261880840017
+10000	5	0.15207929611206056	0.0015802680033212048
+100000	5	0.609218978881836	0.006150144273259608
+1000000	5	5.13688440322876	0.03649575898109647
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	10	0.09925477504730225	0.002168389758635997
+10	10	0.09943633079528809	0.0016154501074880502
+100	10	0.10376312732696533	0.0017275485891005433
+1000	10	0.11087138652801513	0.0016934328033239559
+10000	10	0.17246220111846924	0.0023824485659318527
+100000	10	0.7999232530593872	0.003442975393506892
+1000000	10	7.012071299552917	0.059217904448851263
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	20	0.10027089118957519	0.0020291529595204906
+10	20	0.10038816928863525	0.001957086760826999
+100	20	0.10723590850830078	0.0013833918448622436
+1000	20	0.11735000610351562	0.0020318895390750882
+10000	20	0.21264209747314453	0.00482341642419078
+100000	20	1.1567201137542724	0.002987096441878969
+1000000	20	10.640758633613586	0.06116581724028616
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	50	0.10066506862640381	0.002051307639276982
+10	50	0.10588631629943848	0.0035835389655972105
+100	50	0.11841504573822022	0.001608174845404568
+1000	50	0.14032282829284667	0.002640027148889162
+10000	50	0.33160474300384524	0.0027796660009712947
+100000	50	2.258401036262512	0.011041280982383895
+1000000	50	21.70080256462097	0.15897944629180621
+lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev
+1	100	0.10147004127502442	0.0021285682695135768
+10	100	0.10471885204315186	0.001248479289219899
+100	100	0.13894760608673096	0.002307980025026551
+1000	100	0.17586205005645753	0.0023822296091426
+10000	100	0.5414002418518067	0.0036291866664635458
+100000	100	4.222555088996887	0.08562968951916528
+1000000	100	41.021552324295044	0.16033566363076862
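The summary file places the per-tool files side by side: each summary row is the corresponding row of every per-tool file (python 2.7.18, 3.6.4, 3.7.9, 3.8.5, then textql and octosql) joined with tabs, which is the same layout `paste` produces. How the benchmark scripts actually build it is not shown here, so this is only a minimal Python sketch of that layout, using a hypothetical `build_summary` helper.

```python
def build_summary(per_tool_texts):
    """Join per-tool .benchmark-results texts side by side with tabs, like `paste`.

    per_tool_texts must be in column order (e.g. python 2.7.18, 3.6.4, 3.7.9,
    3.8.5, textql, octosql) and all have the same number of lines.
    """
    split = [text.rstrip("\n").split("\n") for text in per_tool_texts]
    if len({len(lines) for lines in split}) != 1:
        raise ValueError("all result files must have the same number of lines")
    return "".join("\t".join(parts) + "\n" for parts in zip(*split))


if __name__ == "__main__":
    # First data row of two of the committed per-tool files.
    py27 = "lines\tcolumns\tq-benchmark-2.7.18_mean\tq-benchmark-2.7.18_stddev\n1\t1\t0.106449890137\t0.002010027753\n"
    octo = "lines\tcolumns\toctosql_v0.3.0_mean\toctosql_v0.3.0_stddev\n1\t1\t0.582091641426\t0.0235290239617\n"
    print(build_summary([py27, octo]), end="")
```

Unlike `paste`, which keeps going with empty fields when one file is shorter, this sketch refuses files of different lengths, so misaligned rows fail loudly instead of silently shifting values into the wrong columns of the spreadsheet.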
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/summary.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/summary.benchmark-results
@@ -0,0 +1,48 @@
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	1	0.106449890137	0.002010027753	1	1	0.10342762470245362	0.0017673875851759295	1	1	0.08099310398101807	0.001417385651688644	1	1	0.10138180255889892	0.0017947074090971444	1	1	0.0196103572845	0.00207355214257	1	1	0.582091641426	0.0235290239617
+10	1	0.106737875938	0.00224112203891	10	1	0.10239293575286865	0.0012505611685910795	10	1	0.0822291374206543	0.0014809900020001858	10	1	0.10056869983673096	0.003442371291904885	10	1	0.0186784029007	0.000970810220668	10	1	0.596219730377	0.0320124029461
+100	1	0.107839012146	0.00102954061006	100	1	0.10317318439483643	0.0010581783881541751	100	1	0.08169686794281006	0.002108157069167563	100	1	0.10126984119415283	0.0016392348107127808	100	1	0.019472026825	0.00181951524514	100	1	0.575977492332	0.0199296245316
+1000	1	0.113026666641	0.00147361890226	1000	1	0.10687050819396973	0.0014050135772919004	1000	1	0.08690853118896484	0.0012595326919263487	1000	1	0.10484635829925537	0.0019743937339163262	1000	1	0.022180891037	0.00116649968967	1000	1	0.56785056591	0.00846389017466
+10000	1	0.160376381874	0.00569766179806	10000	1	0.1447664737701416	0.001841256227287192	10000	1	0.12215542793273926	0.0020152625320395434	10000	1	0.1400548219680786	0.0024523366133394117	10000	1	0.051066827774	0.0018168767618	10000	1	1.1466334343	0.00760108698846
+100000	1	0.608236479759	0.00604026519608	100000	1	0.5162809371948243	0.006962985088492867	100000	1	0.4825761795043945	0.0050418000028856335	100000	1	0.4901275157928467	0.003970374711691596	100000	1	0.307463979721	0.00246268029188	100000	1	5.49565172195	0.131791932977
+1000000	1	5.14807910919	0.0584474028762	1000000	1	4.238853335380554	0.04834401143632507	1000000	1	4.084399747848511	0.027731958079814215	1000000	1	3.982502889633179	0.045292138461945054	1000000	1	2.89862303734	0.022182722976	1000000	1	49.9513648033	0.443430523063
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	5	0.106719517708	0.00236752032369	1	5	0.10211825370788574	0.0022568191323651568	1	5	0.0817826271057129	0.002665533758836163	1	5	0.09946837425231933	0.0018876161478998787	1	5	0.0195286750793	0.0017840569109	1	5	0.582160949707	0.0274409391571
+10	5	0.107823801041	0.00238873169438	10	5	0.1025341272354126	0.0016446470901070106	10	5	0.08261749744415284	0.0019205430658525572	10	5	0.099178147315979	0.0014194733014858227	10	5	0.0183676958084	0.000925251595491	10	5	0.57046456337	0.0199413000359
+100	5	0.109785079956	0.0013047675259	100	5	0.1053577184677124	0.0015298114223855884	100	5	0.08472237586975098	0.002571239449841039	100	5	0.10171806812286377	0.0017580984705406846	100	5	0.0199447393417	0.000907007099218	100	5	0.585747480392	0.0372543971623
+1000	5	0.120395207405	0.00207224422629	1000	5	0.10980842113494874	0.002536098780902228	1000	5	0.08973510265350342	0.002323797583077552	1000	5	0.10602672100067138	0.002000261880840017	1000	5	0.0263328790665	0.00165486505938	1000	5	0.572268772125	0.00384300349763
+10000	5	0.21783041954	0.00522254475716	10000	5	0.1590113162994385	0.003123074098301634	10000	5	0.13746986389160157	0.001964971666036654	10000	5	0.15207929611206056	0.0015802680033212048	10000	5	0.0826982736588	0.00152451583229	10000	5	1.15530762672	0.0117990775856
+100000	5	1.17115747929	0.0221394865225	100000	5	0.6348223447799682	0.0082691507829872	100000	5	0.60649254322052	0.007131635266871318	100000	5	0.609218978881836	0.006150144273259608	100000	5	0.60660867691	0.00395761320274	100000	5	6.10629923344	0.146711842919
+1000000	5	10.6830974817	0.339822977934	1000000	5	5.368562030792236	0.11628913334105236	1000000	5	5.2585612535476685	0.05661789407928516	1000000	5	5.13688440322876	0.03649575898109647	1000000	5	5.87811236382	0.0304332294491	1000000	5	54.6851765394	0.315486399525
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	10	0.104981088638	0.00166552032929	1	10	0.10251858234405517	0.0015963869535345293	1	10	0.08112843036651611	0.002251300165899426	1	10	0.09925477504730225	0.002168389758635997	1	10	0.0191783189774	0.00107718516178	1	10	0.586222410202	0.0232479065914
+10	10	0.108320140839	0.00204034349199	10	10	0.10278875827789306	0.0009920577082124496	10	10	0.08175232410430908	0.0014557171018568637	10	10	0.09943633079528809	0.0016154501074880502	10	10	0.0185215950012	0.000840353961363	10	10	0.59000480175	0.0186508192447
+100	10	0.112528729439	0.00168376477305	100	10	0.10715732574462891	0.002033320000941064	100	10	0.08572309017181397	0.0019643550214810675	100	10	0.10376312732696533	0.0017275485891005433	100	10	0.0209223031998	0.00164494657684	100	10	0.581873703003	0.0331332482772
+1000	10	0.13019015789	0.00253773120965	1000	10	0.11389360427856446	0.0023603847702423973	1000	10	0.09268453121185302	0.001816414236580489	1000	10	0.11087138652801513	0.0016934328033239559	1000	10	0.0309282779694	0.00110848590345	1000	10	0.569027900696	0.0103675493106
+10000	10	0.284891676903	0.00384009140782	10000	10	0.17806434631347656	0.001114054252191835	10000	10	0.15538835525512695	0.0024978076091814994	10000	10	0.17246220111846924	0.0023824485659318527	10000	10	0.121016025543	0.00105071105139	10000	10	1.40067322254	0.00583352224401
+100000	10	1.84725661278	0.00860738744089	100000	10	0.8252989768981933	0.0037080843359275904	100000	10	0.7879442930221557	0.009412516078916211	100000	10	0.7999232530593872	0.003442975393506892	100000	10	0.987622976303	0.00699348302979	100000	10	7.30705575943	0.0165839217599
+1000000	10	17.5610994339	0.228322442172	1000000	10	7.252838873863221	0.029052130546213153	1000000	10	7.146207928657532	0.06659760176757985	1000000	10	7.012071299552917	0.059217904448851263	1000000	10	9.69240145683	0.0354453778052	1000000	10	65.3242264032	0.512552576414
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	20	0.106477689743	0.00254429925697	1	20	0.10367965698242188	0.003661761341842434	1	20	0.08142082691192627	0.001304584466639188	1	20	0.10027089118957519	0.0020291529595204906	1	20	0.0202306985855	0.00159619251952	1	20	0.571048212051	0.0166919396871
+10	20	0.108580899239	0.00173704653824	10	20	0.10489590167999267	0.001977141196109372	10	20	0.08197519779205323	0.0014842098503865223	10	20	0.10038816928863525	0.001957086760826999	10	20	0.0187650680542	0.000845692486156	10	20	0.594776701927	0.0368900941023
+100	20	0.118750286102	0.00247623639866	100	20	0.11108210086822509	0.0014801173497056886	100	20	0.08949971199035645	0.0009937446141285785	100	20	0.10723590850830078	0.0013833918448622436	100	20	0.0211876153946	0.000993808448942	100	20	0.561370825768	0.00907051791451
+1000	20	0.146431708336	0.00249685551944	1000	20	0.12110791206359864	0.001648524669420912	1000	20	0.09955930709838867	0.0013978961740806384	1000	20	0.11735000610351562	0.0020318895390750882	1000	20	0.0404737234116	0.00122415059261	1000	20	0.577527880669	0.00983965108957
+10000	20	0.419492387772	0.00248210434668	10000	20	0.2178968906402588	0.0019298316207276716	10000	20	0.1966566801071167	0.0028489273218240147	10000	20	0.21264209747314453	0.00482341642419078	10000	20	0.197762489319	0.00198188642677	10000	20	1.90710241795	0.00757011452155
+100000	20	3.15847921371	0.0550301268026	100000	20	1.1962245225906372	0.010541407803235559	100000	20	1.1518636226654053	0.006410720031542237	100000	20	1.1567201137542724	0.002987096441878969	100000	20	1.75432097912	0.00692372147543	100000	20	9.8267291069	0.127844155326
+1000000	20	30.279082489	0.124978814506	1000000	20	10.956057572364807	0.12677108174061705	1000000	20	10.776052689552307	0.04739925571001746	1000000	20	10.640758633613586	0.06116581724028616	1000000	20	17.3383012295	0.0410164637448	1000000	20	83.9448960066	0.46121344046
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	50	0.105411934853	0.00171651054128	1	50	0.10458300113677979	0.0016367630302744722	1	50	0.08237688541412354	0.0016494314799953837	1	50	0.10066506862640381	0.002051307639276982	1	50	0.0205577373505	0.00133922342068	1	50	0.572030115128	0.0253648479103
+10	50	0.109102797508	0.00111620290512	10	50	0.10616152286529541	0.002345135740908088	10	50	0.08519520759582519	0.002610550182895596	10	50	0.10588631629943848	0.0035835389655972105	10	50	0.0195438146591	0.000791630611893	10	50	0.56993534565	0.0230474303306
+100	50	0.135682177544	0.00196166766665	100	50	0.12375867366790771	0.00238414904864133	100	50	0.10423583984375	0.0018808335751867933	100	50	0.11841504573822022	0.001608174845404568	100	50	0.0246078014374	0.00108949795701	100	50	0.563336873055	0.00964411866903
+1000	50	0.198261427879	0.00396172489054	1000	50	0.14462883472442628	0.0022428030896492978	1000	50	0.12195603847503662	0.0023611894043373983	1000	50	0.14032282829284667	0.002640027148889162	1000	50	0.063302564621	0.00058195987294	1000	50	0.826378440857	0.00941629472813
+10000	50	0.821499919891	0.0111642692132	10000	50	0.34488487243652344	0.004867441221052092	10000	50	0.3163540124893188	0.002761333651520998	10000	50	0.33160474300384524	0.0027796660009712947	10000	50	0.410061001778	0.00294901155085	10000	50	3.27872717381	0.126592845956
+100000	50	7.05980975628	0.121182371277	100000	50	2.3394312858581543	0.02263239858944125	100000	50	2.237372374534607	0.009955353920396077	100000	50	2.258401036262512	0.011041280982383895	100000	50	3.87797718048	0.0123467913678	100000	50	17.890055728	0.116794666005
+1000000	50	71.5645889759	5.02009516291	1000000	50	21.979821610450745	0.09080404939303836	1000000	50	21.59097549915314	0.081188190530421	1000000	50	21.70080256462097	0.15897944629180621	1000000	50	38.5674883366	0.0602820291386	1000000	50	158.262442636	0.826290454446
+lines	columns	q-benchmark-2.7.18_mean	q-benchmark-2.7.18_stddev	lines	columns	q-benchmark-3.6.4_mean	q-benchmark-3.6.4_stddev	lines	columns	q-benchmark-3.7.9_mean	q-benchmark-3.7.9_stddev	lines	columns	q-benchmark-3.8.5_mean	q-benchmark-3.8.5_stddev	lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev	lines	columns	octosql_v0.3.0_mean	octosql_v0.3.0_stddev
+1	100	0.10662381649	0.00193146624495	1	100	0.10372309684753418	0.0010299126833031144	1	100	0.08336784839630126	0.0013840724401561887	1	100	0.10147004127502442	0.0021285682695135768	1	100	0.0216581106186	0.00103280947157	1	100	0.569358110428	0.0279801762531
+10	100	0.110662698746	0.00171461379583	10	100	0.10784556865692138	0.0016557634029464607	10	100	0.0864112138748169	0.0017946939354350697	10	100	0.10471885204315186	0.001248479289219899	10	100	0.021723818779	0.000920429257416	10	100	0.580981063843	0.0272341107532
+100	100	0.163547992706	0.00166570196628	100	100	0.14526791572570802	0.0028194506905186724	100	100	0.12199611663818359	0.0013003743156634682	100	100	0.13894760608673096	0.002307980025026551	100	100	0.0299471855164	0.00130217326679	100	100	0.559471726418	0.00668155858429
+1000	100	0.280023741722	0.00337543024145	1000	100	0.18315494060516357	0.0023585311962114673	1000	100	0.15871686935424806	0.0035993681064501234	1000	100	0.17586205005645753	0.0023822296091426	1000	100	0.0996923923492	0.00155352212734	1000	100	1.08161640167	0.00698594638512
+10000	100	1.46053376198	0.0221691284465	10000	100	0.5586131334304809	0.004808492789681402	10000	100	0.5243751525878906	0.004370273273595629	10000	100	0.5414002418518067	0.0036291866664635458	10000	100	0.767001605034	0.00328944029633	10000	100	5.67823712826	0.0123398407167
+100000	100	13.2369835854	0.309375896258	100000	100	4.287398314476013	0.00957500108409644	100000	100	4.175828623771667	0.016127303710583043	100000	100	4.222555088996887	0.08562968951916528	100000	100	7.46734063625	0.0262039846119	100000	100	32.2797194242	0.315508270241
+1000000	100	131.864977288	1.22415449691	1000000	100	41.706851434707644	0.4161526076289425	1000000	100	40.82292411327362	0.12328165162380703	1000000	100	41.021552324295044	0.16033566363076862	1000000	100	74.6216712952	0.0994037504394	1000000	100	289.582628798	0.929455236817
--- a/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/textql_2.0.3.benchmark-results
+++ b/test/benchmark-results/source-files-1443b7418b46594ad256abd9db4a7671cb251e6a/2020-09-17-v2.0.17/textql_2.0.3.benchmark-results
@ -0,0 +1,48 @@
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	1	0.0196103572845	0.00207355214257
+10	1	0.0186784029007	0.000970810220668
+100	1	0.019472026825	0.00181951524514
+1000	1	0.022180891037	0.00116649968967
+10000	1	0.051066827774	0.0018168767618
+100000	1	0.307463979721	0.00246268029188
+1000000	1	2.89862303734	0.022182722976
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	5	0.0195286750793	0.0017840569109
+10	5	0.0183676958084	0.000925251595491
+100	5	0.0199447393417	0.000907007099218
+1000	5	0.0263328790665	0.00165486505938
+10000	5	0.0826982736588	0.00152451583229
+100000	5	0.60660867691	0.00395761320274
+1000000	5	5.87811236382	0.0304332294491
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	10	0.0191783189774	0.00107718516178
+10	10	0.0185215950012	0.000840353961363
+100	10	0.0209223031998	0.00164494657684
+1000	10	0.0309282779694	0.00110848590345
+10000	10	0.121016025543	0.00105071105139
+100000	10	0.987622976303	0.00699348302979
+1000000	10	9.69240145683	0.0354453778052
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	20	0.0202306985855	0.00159619251952
+10	20	0.0187650680542	0.000845692486156
+100	20	0.0211876153946	0.000993808448942
+1000	20	0.0404737234116	0.00122415059261
+10000	20	0.197762489319	0.00198188642677
+100000	20	1.75432097912	0.00692372147543
+1000000	20	17.3383012295	0.0410164637448
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	50	0.0205577373505	0.00133922342068
+10	50	0.0195438146591	0.000791630611893
+100	50	0.0246078014374	0.00108949795701
+1000	50	0.063302564621	0.00058195987294
+10000	50	0.410061001778	0.00294901155085
+100000	50	3.87797718048	0.0123467913678
+1000000	50	38.5674883366	0.0602820291386
+lines	columns	textql_2.0.3_mean	textql_2.0.3_stddev
+1	100	0.0216581106186	0.00103280947157
+10	100	0.021723818779	0.000920429257416
+100	100	0.0299471855164	0.00130217326679
+1000	100	0.0996923923492	0.00155352212734
+10000	100	0.767001605034	0.00328944029633
+100000	100	7.46734063625	0.0262039846119
+1000000	100	74.6216712952	0.0994037504394
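Each `.benchmark-results` file above is tab-separated and repeats a `lines columns <name>_mean <name>_stddev` header before every column-count block. `_perform_test_performance_matrix` writes `None` for mean and stddev when any attempt fails. The following Python 3 sketch reads such a file back into records, assuming exactly that layout. `parse_benchmark_results` and `Measurement` are hypothetical helpers for illustration and are not part of the q test suite.

```python
from collections import namedtuple

# Hypothetical record type (not part of q): one (lines, columns) cell for one tool, times in seconds.
Measurement = namedtuple("Measurement", ["tool", "lines", "columns", "mean", "stddev"])


def _to_float(value):
    # The suite writes str(None) == "None" when any attempt returned a non-zero exit code.
    return None if value == "None" else float(value)


def parse_benchmark_results(text):
    """Parse a single-tool, tab-separated .benchmark-results file (assumed layout)."""
    results = []
    tool = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # tolerate blank lines
        fields = raw.split("\t")
        if fields[0] == "lines":
            # Header row: recover the tool name from the "<tool>_mean" column.
            tool = fields[2][: -len("_mean")]
            continue
        lines, columns, mean, stddev = fields
        results.append(
            Measurement(tool, int(lines), int(columns), _to_float(mean), _to_float(stddev))
        )
    return results


# Two rows copied from the textql results file above.
sample = (
    "lines\tcolumns\ttextql_2.0.3_mean\ttextql_2.0.3_stddev\n"
    "1\t1\t0.0196103572845\t0.00207355214257\n"
    "1000000\t1\t2.89862303734\t0.022182722976\n"
)
for m in parse_benchmark_results(sample):
    print(m.tool, m.lines, m.columns, m.mean)
```

Data rows are attributed to the most recent header, so a file with several column-count blocks parses into one flat list.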
--- a/test/prepare-benchmark-env
+++ b/test/prepare-benchmark-env
@ -0,0 +1,44 @@
+#!/bin/bash
+
+set -e
+
+eval "$(pyenv init -)"
+eval "$(pyenv virtualenv-init -)"
+
+source benchmark-config.sh
+
+if [ ! -f ./benchmark_data.tar.gz ];
+then
+  echo benchmark data not found. downloading it
+  curl "https://s3.amazonaws.com/harelba-q-public/benchmark_data.tar.gz" -o ./benchmark_data.tar.gz
+else
+  echo no need to download benchmark data
+fi
+
+if [ ! -d ./_benchmark_data ];
+then
+  echo extracting benchmark data
+  tar xvfz benchmark_data.tar.gz
+  echo benchmark data is ready
+else
+  echo no need to extract benchmark data
+fi
+
+for ver in "${BENCHMARK_PYTHON_VERSIONS[@]}"
+do
+  echo installing $ver
+  pyenv install -s $ver
+
+  venv_name=q-benchmark-$ver
+  echo create venv $venv_name
+  pyenv virtualenv -f $ver $venv_name
+  echo activate venv $venv_name
+  pyenv activate $venv_name
+  pyenv version
+  echo installing requirements $venv_name
+  pip install -r ../requirements.txt
+  echo deactivating $venv_name
+  pyenv deactivate
+done
+
+
--- a/test/run-benchmark
+++ b/test/run-benchmark
@ -0,0 +1,77 @@
+#!/bin/bash
+
+# Usage: ./run-benchmark <benchmark-id> [<q-executable>]  (without a q executable, q is run from the source files)
+set -e
+
+get_abs_filename() {
+  # $1 : relative filename
+  echo "$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
+}
+
+eval "$(pyenv init -)"
+eval "$(pyenv virtualenv-init -)"
+
+if [ "x$1" == "x" ];
+then
+  echo Benchmark id must be provided as a parameter
+  exit 1
+fi
+Q_BENCHMARK_ID=$1
+
+if [ "x$2" == "x" ];
+then
+  EFFECTIVE_Q_EXECUTABLE="source-files-$(git rev-parse HEAD)"
+else
+  ABS_Q_EXECUTABLE="$(get_abs_filename "$2")"
+  export Q_EXECUTABLE="$ABS_Q_EXECUTABLE"
+  if [ ! -f "$ABS_Q_EXECUTABLE" ]
+  then
+    echo "q executable must exist ($ABS_Q_EXECUTABLE)"
+    exit 1
+  fi
+  EFFECTIVE_Q_EXECUTABLE="${ABS_Q_EXECUTABLE//\//__}"
+fi
+
+echo "Q executable to use is $EFFECTIVE_Q_EXECUTABLE"
+
+# Must be provided to the benchmark code so it knows where to write the results to
+export Q_BENCHMARK_RESULTS_FOLDER="./benchmark-results/${EFFECTIVE_Q_EXECUTABLE}/${Q_BENCHMARK_ID}/"
+echo Benchmark results folder is $Q_BENCHMARK_RESULTS_FOLDER
+mkdir -p $Q_BENCHMARK_RESULTS_FOLDER
+
+source benchmark-config.sh
+
+ALL_FILES=()
+
+for ver in "${BENCHMARK_PYTHON_VERSIONS[@]}"
+do
+  venv_name=q-benchmark-$ver
+  echo activating $venv_name
+  pyenv activate $venv_name
+  echo "==== testing inside $venv_name ==="
+  ./test-all BenchmarkTests.test_q_matrix -v
+  RESULT_FILE="${Q_BENCHMARK_RESULTS_FOLDER}/$venv_name.benchmark-results"
+  echo "==== Done. Results are in $RESULT_FILE"
+  ALL_FILES[${#ALL_FILES[@]}]="$RESULT_FILE"
+  echo "Deactivating"
+  pyenv deactivate
+done
+
+echo "==== testing textql ==="
+./test-all BenchmarkTests.test_textql_matrix -v
+RESULT_FILE="textql*.benchmark-results"
+ALL_FILES[${#ALL_FILES[@]}]="${Q_BENCHMARK_RESULTS_FOLDER}/$RESULT_FILE"
+echo "Done. Results are in ${Q_BENCHMARK_RESULTS_FOLDER}/${RESULT_FILE}"
+
+echo "==== testing octosql ==="
+./test-all BenchmarkTests.test_octosql_matrix -v
+RESULT_FILE="octosql*.benchmark-results"
+ALL_FILES[${#ALL_FILES[@]}]="${Q_BENCHMARK_RESULTS_FOLDER}/$RESULT_FILE"
+echo "Done. Results are in ${Q_BENCHMARK_RESULTS_FOLDER}/${RESULT_FILE}"
+
+summary_file="$Q_BENCHMARK_RESULTS_FOLDER/summary.benchmark-results"
+
+rm -vf $summary_file
+
+paste ${ALL_FILES[*]} > $summary_file
+echo "Done. final results file is $summary_file"
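`paste` joins the N-th line of every per-tool results file with a tab. As a result, each row of `summary.benchmark-results` repeats `lines` and `columns` once per tool, followed by that tool's mean and stddev. The Python 3 sketch below reproduces that merge in memory. It assumes every input has the four-column layout written by the test suite and the same number of rows. `paste_texts` and `split_summary_row` are hypothetical names used only for illustration.

```python
from itertools import zip_longest


def paste_texts(*texts):
    """Hypothetical helper: join line i of every input with tabs, like `paste file1 file2 ...`.

    A shorter input contributes an empty field, which is how paste(1)
    behaves when one of its files runs out of lines.
    """
    per_file = [t.splitlines() for t in texts]
    rows = ["\t".join(cells) for cells in zip_longest(*per_file, fillvalue="")]
    return "".join(row + "\n" for row in rows)


def split_summary_row(row, fields_per_tool=4):
    """Hypothetical helper: split one summary row into [lines, columns, mean, stddev] groups."""
    fields = row.split("\t")
    return [fields[i:i + fields_per_tool] for i in range(0, len(fields), fields_per_tool)]


# Header plus the (1 line, 10 columns) row from two of the result sets shown above.
q_results = (
    "lines\tcolumns\tq-benchmark-2.7.18_mean\tq-benchmark-2.7.18_stddev\n"
    "1\t10\t0.104981088638\t0.00166552032929\n"
)
textql_results = (
    "lines\tcolumns\ttextql_2.0.3_mean\ttextql_2.0.3_stddev\n"
    "1\t10\t0.0191783189774\t0.00107718516178\n"
)
summary = paste_texts(q_results, textql_results)
data_row = summary.splitlines()[1]
print(split_summary_row(data_row))
```

The tools are pasted side by side rather than joined on `(lines, columns)`. The summary is therefore only meaningful when every input lists the same matrix in the same order, which holds for files produced by the same `_perform_test_performance_matrix` loop.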
--- a/test/test-suite
+++ b/test/test-suite
@ -10,6 +10,7 @@
 # in order to test the resulting binary executables as well, instead of just executing the q python source code.
 #

+from __future__ import print_function
 import unittest
 import random
 import json
@ -24,7 +25,7 @@ import pprint
 import six
 from six.moves import range
 import codecs
-
+import itertools

 sys.path.append(os.path.join(os.path.abspath(os.path.dirname(sys.argv[0])),'..','bin'))
 from q import QTextAsData,QOutput,QOutputPrinter,QInputParams
@ -2599,6 +2600,195 @@ class BasicModuleTests(AbstractQTestCase):
        self.assertTrue(table_structure.materialized_files['my_data'].filename,'my_data')
        self.assertTrue(table_structure.materialized_files['my_data'].is_stdin)

+
+class BenchmarkAttemptResults(object):
+    def __init__(self, attempt, lines, columns, duration,return_code):
+        self.attempt = attempt
+        self.lines = lines
+        self.columns = columns
+        self.duration = duration
+        self.return_code = return_code
+
+    def __str__(self):
+        return "{}".format(self.__dict__)
+    __repr__ = __str__
+
+class BenchmarkResults(object):
+    def __init__(self, lines, columns, attempt_results, mean, stddev):
+        self.lines = lines
+        self.columns = columns
+        self.attempt_results = attempt_results
+        self.mean = mean
+        self.stddev = stddev
+
+    def __str__(self):
+        return "{}".format(self.__dict__)
+    __repr__ = __str__
+
+class BenchmarkTests(AbstractQTestCase):
+
+    BENCHMARK_DIR = './_benchmark_data'
+
+    def _ensure_benchmark_data_dir_exists(self):
+        try:
+            os.mkdir(BenchmarkTests.BENCHMARK_DIR)
+        except OSError:  # the directory already exists
+            pass
+
+    def _create_benchmark_file_if_needed(self):
+        self._ensure_benchmark_data_dir_exists()
+
+        if os.path.exists('{}/benchmark-file.csv'.format(BenchmarkTests.BENCHMARK_DIR)):
+            return
+
+        g = GzipFile('unit-file.csv.gz')
+        d = g.read().decode('utf-8')
+        f = open('{}/benchmark-file.csv'.format(BenchmarkTests.BENCHMARK_DIR), 'w')
+        for i in range(100):
+            f.write(d)
+        f.close()
+
+    def _prepare_test_file(self, lines, columns):
+
+        filename = '{}/_benchmark_data__lines_{}_columns_{}.csv'.format(BenchmarkTests.BENCHMARK_DIR,lines, columns)
+
+        if os.path.exists(filename):
+            return filename
+
+        c = ['c{}'.format(x + 1) for x in range(columns)]
+
+        # write a header line
+        ff = open(filename,'w')
+        ff.write(",".join(c))
+        ff.write('\n')
+        ff.close()
+
+        r, o, e = run_command('head -{} {}/benchmark-file.csv | {} -d , "select {} from -" >> {}'.format(lines, BenchmarkTests.BENCHMARK_DIR, Q_EXECUTABLE, ','.join(c), filename))
+        self.assertEqual(r, 0)
+        return filename
+
+    def _decide_result(self,attempt_results):
+
+        failed = list(filter(lambda a: a.return_code != 0,attempt_results))
+
+        if len(failed) == 0:
+            mean = sum([x.duration for x in attempt_results]) / len(attempt_results)
+            sum_squared = sum([(x.duration - mean)**2 for x in attempt_results])
+            ddof = 0
+            pvar = sum_squared / (len(attempt_results) - ddof)
+            stddev = pvar ** 0.5
+        else:
+            mean = None
+            stddev = None
+
+        return BenchmarkResults(
+            attempt_results[0].lines,
+            attempt_results[0].columns,
+            attempt_results,
+            mean,
+            stddev
+        )
+
+    def _perform_test_performance_matrix(self,name,generate_cmd_function):
+        results = []
+
+        benchmark_results_folder = os.environ.get("Q_BENCHMARK_RESULTS_FOLDER",'')
+        if benchmark_results_folder == "":
+            raise Exception("Q_BENCHMARK_RESULTS_FOLDER must be provided as an environment variable")
+
+        self._create_benchmark_file_if_needed()
+        for columns in [1, 5, 10, 20, 50, 100]:
+            for lines in [1, 10, 100, 1000, 10000, 100000, 1000000]:
+                attempt_results = []
+                for attempt in range(10):
+                    filename = self._prepare_test_file(lines, columns)
+                    if DEBUG:
+                        print("Testing {}".format(filename))
+                    t0 = time.time()
+                    r, o, e = run_command(generate_cmd_function(filename,lines,columns))
+                    duration = time.time() - t0
+                    attempt_result = BenchmarkAttemptResults(attempt, lines, columns, duration, r)
+                    attempt_results += [attempt_result]
+                    if DEBUG:
+                        print("Results: {}".format(attempt_result.__dict__))
+                final_result = self._decide_result(attempt_results)
+                results += [final_result]
+
+        series_fields = [six.u('lines'),six.u('columns')]
+        value_fields = [six.u('mean'),six.u('stddev')]
+
+        all_fields = series_fields + value_fields
+
+        output_filename = '{}/{}.benchmark-results'.format(benchmark_results_folder,name)
+        output_file = open(output_filename,'w')
+        for columns,g in itertools.groupby(sorted(results,key=lambda x:x.columns),key=lambda x:x.columns):
+            x = six.u("\t").join(series_fields + [six.u('{}_{}').format(name, f) for f in value_fields])
+            print(x,file = output_file)
+            for result in g:
+                print(six.u("\t").join(map(str,[getattr(result,f) for f in all_fields])),file=output_file)
+        output_file.close()
+
+        print("results have been written to : {}".format(output_filename))
+        if DEBUG:
+            print("RESULTS FOR {}".format(name))
+            print(open(output_filename,'r').read())
+
+    def test_q_matrix(self):
+        venv = os.path.basename(os.environ.get('VIRTUAL_ENV') or 'unknown-virtual-env')
+
+        def generate_q_cmd(data_filename,line_count,column_count):
+            if column_count == 1:
+                additional_params = '-c 1'
+            else:
+                additional_params = ''
+            return '{} -d , {} "select count(*) from {}"'.format(Q_EXECUTABLE,additional_params, data_filename)
+        self._perform_test_performance_matrix(venv,generate_q_cmd)
+
+    def _get_textql_version(self):
+        r,o,e = run_command("textql --version")
+        if r != 0:
+            raise Exception("Could not find textql")
+        if len(e) != 0:
+            raise Exception("Errors while getting textql version")
+        return o[0]
+
+    def _get_octosql_version(self):
+        r,o,e = run_command("octosql --version")
+        if r != 0:
+            raise Exception("Could not find octosql")
+        if len(e) != 0:
+            raise Exception("Errors while getting octosql version")
+        import re
+        version = re.findall(r'v[0-9]+\.[0-9]+\.[0-9]+',o[0])[0]
+        return version
+
+    def test_textql_matrix(self):
+        def generate_textql_cmd(data_filename,line_count,column_count):
+            return 'textql -dlm , -sql "select count(*)" {}'.format(data_filename)
+
+        name = 'textql_%s' % self._get_textql_version()
+        self._perform_test_performance_matrix(name,generate_textql_cmd)
+
+    def test_octosql_matrix(self):
+        config_fn = self.random_tmp_filename('octosql', 'config')
+        def generate_octosql_cmd(data_filename,line_count,column_count):
+            j = """
+dataSources:
+  - name: bmdata
+    type: csv
+    config:
+      path: "{}"
+      headerRow: false
+      batchSize: 10000
+""".format(data_filename)[1:]
+            f = open(config_fn,'w')
+            f.write(j)
+            f.close()
+            return 'octosql -c {} -o batch-csv "select count(*) from bmdata a"'.format(config_fn)
+
+        name = 'octosql_%s' % self._get_octosql_version()
+        self._perform_test_performance_matrix(name,generate_octosql_cmd)
+
 def suite():
    tl = unittest.TestLoader()
    basic_stuff = tl.loadTestsFromTestCase(BasicTests)