mirror of
https://github.com/adambard/learnxinyminutes-docs.git
synced 2024-12-13 03:11:53 +03:00
---
language: kdb+
contributors:
    - ["Matt Doherty", "https://github.com/picodoc"]
    - ["Jonny Press", "https://github.com/jonnypress"]
filename: learnkdb.q
---

The q language and its database component kdb+ were developed by Arthur Whitney
and released by Kx Systems in 2003. q is a descendant of APL and as such is
very terse and a little strange looking for anyone from a "C heritage" language
background. Its expressiveness and vector-oriented nature make it well suited
to performing complex calculations on large amounts of data (while also
encouraging some amount of [code golf](https://en.wikipedia.org/wiki/Code_golf)).
The fundamental structure in the language is not the object but instead the list,
and tables are built as collections of lists. This means that, unlike most
traditional RDBMSs, tables are column oriented. The language has both an in-memory
and on-disk database built in, giving a large amount of flexibility. kdb+ is most
widely used in the world of finance to store, analyze, process and retrieve large
time-series data sets.

The terms *q* and *kdb+* are usually used interchangeably: the language and the
database are not separable in practice, so the distinction is rarely useful.

To learn more about kdb+ you can join the
[KX Community forums](https://learninghub.kx.com/forums/) or
the [TorQ kdb+](https://groups.google.com/forum/#!forum/kdbtorq) group.

```q
/ Single line comments start with a forward-slash
/ They can also be used in-line, so long as at least one whitespace character
/ separates them from the text to the left
/
  A forward-slash on a line by itself starts a multiline comment
  and a backward-slash on a line by itself terminates it
\

/ Run this file in an empty directory


////////////////////////////////////
// Basic Operators and Datatypes  //
////////////////////////////////////

/ We have integers, which are 8 byte by default
3 / => 3

/ And floats, also 8 byte as standard. Trailing f distinguishes from int
3.0 / => 3f

/ 4 byte numerical types can also be specified with trailing chars
3i / => 3i
3.0e / => 3e

/ Math is mostly what you would expect
1+1 / => 2
8-1 / => 7
10*2 / => 20
/ Except division, which uses percent (%) instead of forward-slash (/)
35%5 / => 7f (the result of division is always a float)

/ For integer division we have the keyword div
4 div 3 / => 1

/ Modulo also uses a keyword, since percent (%) is taken
4 mod 3 / => 1

/ And exponentiation...
2 xexp 4 / => 16f (xexp always returns a float)

/ ...and rounding down...
floor 3.14159 / => 3

/ ...getting the absolute value...
abs -3.14159 / => 3.14159
/ ...and many other things
/ see http://code.kx.com/q/ref/ for more

/ q has no operator precedence, everything is evaluated right to left
/ so results like this might take some getting used to
2*1+1 / => 4 / (no operator precedence tables to remember!)

/ Precedence can be modified with parentheses (restoring the 'normal' result)
(2*1)+1 / => 3

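/ A further sketch of right-to-left evaluation (an added illustration, not
/ part of the original walkthrough): the rightmost operation always runs first
10-2-3 / => 11 (evaluated as 10-(2-3))
(10-2)-3 / => 5 (parentheses force the left subtraction first)
2*3+4 / => 14 (evaluated as 2*(3+4))
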
/ Assignment uses colon (:) instead of equals (=)
/ No need to declare variables before assignment
a:3
a / => 3

/ Variables can also be assigned in-line
/ this does not affect the value passed on
c:3+b:2+a:1 / (data "flows" from right to left)
a / => 1
b / => 3
c / => 6

/ In-place operations are also as you might expect
a+:2
a / => 3

/ There are no "true" or "false" keywords in q
/ boolean values are indicated by the bit value followed by b
1b / => true value
0b / => false value

/ Equality comparisons use equals (=) (since we don't need it for assignment)
1=1 / => 1b
2=1 / => 0b

/ Inequality uses <>
1<>1 / => 0b
2<>1 / => 1b

/ The other comparisons are as you might expect
1<2 / => 1b
1>2 / => 0b
2<=2 / => 1b
2>=2 / => 1b

/ Comparison is not strict with regard to types...
42=42.0 / => 1b

/ ...unless we use the match operator (~)
/ which only returns true if entities are identical
42~42.0 / => 0b

/ The not operator returns true if the underlying value is zero
not 0b / => 1b
not 1b / => 0b
not 42 / => 0b
not 0.0 / => 1b

/ The max operator (|) reduces to logical "or" for bools
42|2.0 / => 42f
1b|0b / => 1b

/ The min operator (&) reduces to logical "and" for bools
42&2.0 / => 2f
1b&0b / => 0b

/ q provides two ways to store character data
/ Chars in q are stored in a single byte and use double-quotes (")
ch:"a"
/ Strings are simply lists of char (more on lists later)
str:"This is a string"
/ Escape characters work as normal
str:"This is a string with \"quotes\""

/ Char data can also be stored as symbols using backtick (`)
symbol:`sym
/ Symbols are NOT LISTS, they are an enumeration
/ the q process stores internally a vector of strings
/ symbols are enumerated against this vector
/ this can be more space and speed efficient as these are constant width

/ The string function converts to strings
string `symbol / => "symbol"
string 1.2345 / => "1.2345"

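/ As an added illustration (not from the original text), the type function
/ reports the datatype of a value as a short code: negative for atoms,
/ positive for uniform lists. The codes shown assume a 64-bit q 3.x or later
type 3 / => -7h (long atom)
type 3.0 / => -9h (float atom)
type 1b / => -1h (boolean atom)
type "a" / => -10h (char atom)
type "abc" / => 10h (a string is a list of char)
type `sym / => -11h (symbol atom)
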
/ q has a time type...
t:01:00:00.000
/ date type...
d:2015.12.25
/ and a timestamp type (among other time types)
dt:2015.12.25D12:00:00.000000000

/ These support some arithmetic for easy manipulation
dt + t / => 2015.12.25D13:00:00.000000000
t - 00:10:00.000 / => 00:50:00.000
/ and can be decomposed using dot notation
d.year / => 2015i
d.mm / => 12i
d.dd / => 25i
/ see http://code.kx.com/q4m3/2_Basic_Data_Types_Atoms/#25-temporal-data for more

/ q also has an infinity value so div by zero will not throw an error
1%0 / => 0w
-1%0 / => -0w

/ And null types for representing missing values
0N / => null long (the default integer type)
0n / => null float
/ see http://code.kx.com/q4m3/2_Basic_Data_Types_Atoms/#27-nulls for more

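/ A brief added sketch of working with nulls (an illustration, not part of
/ the original text; lists such as 1 0N 3 are covered in the next section)
/ null tests for missing values and fill (^) replaces them
null 0N / => 1b
null 1 0N 3 / => 010b
0^1 0N 3 / => 1 0 3 (nulls replaced with 0)
1+0N / => 0N (arithmetic on a null yields a null)
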
/ q has standard control structures
/ if is as you might expect (; separates the condition and instructions)
if[1=1;a:"hi"]
a / => "hi"
/ if-else uses $ (and unlike if, returns a value)
$[1=0;a:"hi";a:"bye"] / => "bye"
a / => "bye"
/ if-else can be extended to multiple clauses by adding args separated by ;
$[1=0;a:"hi";0=1;a:"bye";a:"hello again"]
a / => "hello again"

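/ Added illustration (not in the original): $ tests a single condition, while
/ the vector conditional ?[c;x;y] chooses item by item between two lists
/ (lists themselves are introduced in the next section)
?[011b;10 20 30;1 2 3] / => 1 20 30
?[1 5 3>2;10 20 30;1 2 3] / => 1 20 30 (the condition can be any boolean list)
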
////////////////////////////////////
////      Data Structures       ////
////////////////////////////////////

/ q is not an object oriented language
/ instead complexity is built through ordered lists
/ and mapping them into higher order structures: dictionaries and tables

/ Lists (or arrays if you prefer) are simple ordered collections
/ they are defined using parentheses () and semi-colons (;)
(1;2;3) / => 1 2 3
(-10.0;3.14159e;1b;`abc;"c")
/ => -10f
/ => 3.14159e
/ => 1b
/ => `abc
/ => "c" (mixed type lists are displayed on multiple lines)
((1;2;3);(4;5;6);(7;8;9))
/ => 1 2 3
/ => 4 5 6
/ => 7 8 9

/ Lists of uniform type can also be defined more concisely
1 2 3 / => 1 2 3
`list`of`syms / => `list`of`syms
`list`of`syms ~ (`list;`of;`syms) / => 1b

/ List length
count (1;2;3) / => 3
count "I am a string" / => 13 (strings are lists of char)

/ Empty lists are defined with parentheses
l:()
count l / => 0

/ Simple variables and single item lists are not equivalent
/ parentheses syntax cannot create a single item list (they indicate precedence)
(1)~1 / => 1b
/ single item lists can be created using enlist
singleton:enlist 1
/ or appending to an empty list
singleton:(),1
1~(),1 / => 0b

/ Speaking of appending, comma (,) is used for this, not plus (+)
1 2 3,4 5 6 / => 1 2 3 4 5 6
"hello ","there" / => "hello there"

/ Indexing uses square brackets []
l:1 2 3 4
l[0] / => 1
l[1] / => 2
/ indexing out of bounds returns a null value rather than an error
l[5] / => 0N
/ and indexed assignment
l[0]:5
l / => 5 2 3 4

/ Lists can also be used for indexing and indexed assignment
l[1 3] / => 2 4
l[1 3]: 1 3
l / => 5 1 3 3

/ Lists can be untyped/mixed type
l:(1;2;`hi)
/ but once they are uniformly typed, q will enforce this
l[2]:3
l / => 1 2 3
l[2]:`hi / throws a type error
/ this makes sense in the context of lists as table columns (more later)

/ For a nested list we can index at depth
l:((1;2;3);(4;5;6);(7;8;9))
l[1;1] / => 5

/ We can elide the indexes to return entire rows or columns
l[;1] / => 2 5 8
l[1;] / => 4 5 6

/ All the functions mentioned in the previous section work on lists natively
1+(1;2;3) / => 2 3 4 (single variable and list)
(1;2;3) - (3;2;1) / => -2 0 2 (list and list)

/ And there are many more that are designed specifically for lists
avg 1 2 3 / => 2f
sum 1 2 3 / => 6
sums 1 2 3 / => 1 3 6 (running sum)
last 1 2 3 / => 3
1 rotate 1 2 3 / => 2 3 1
/ etc.
/ Using and combining these functions to manipulate lists is where much of the
/ power and expressiveness of the language comes from

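/ A small added sketch of combining list functions (an illustration, not
/ from the original; the variable name v is arbitrary)
v:3 1 4 1 5
v>2 / => 10101b (comparison applies item by item)
where v>2 / => 0 2 4 (indexes where the condition holds)
v where v>2 / => 3 4 5 (filter by indexing with those positions)
sum v where 1=v mod 2 / => 10 (sum of the odd values 3 1 1 5)
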
/ Take (#), drop (_) and find (?) are also useful working with lists
l:1 2 3 4 5 6 7 8 9
l:1+til 9 / til is a useful shortcut for generating ranges
/ take the first 5 elements
5#l / => 1 2 3 4 5
/ drop the first 5
5_l / => 6 7 8 9
/ take the last 5
-5#l / => 5 6 7 8 9
/ drop the last 5
-5_l / => 1 2 3 4
/ find the first occurrence of 4
l?4 / => 3
l[3] / => 4

/ Dictionaries in q are a generalization of lists
/ they map a list to another list (of equal length)
/ the bang (!) symbol is used for defining a dictionary
d:(`a;`b;`c)!(1;2;3)
/ or more simply with concise list syntax
d:`a`b`c!1 2 3
/ the keyword key returns the first list
key d / => `a`b`c
/ and value the second
value d / => 1 2 3

/ Indexing is identical to lists
/ with the first list as a key instead of the position
d[`a] / => 1
d[`b] / => 2

/ As is assignment
d[`c]:4
d
/ => a| 1
/ => b| 2
/ => c| 4

/ Arithmetic and comparison work natively, just like lists
e:(`a;`b;`c)!(2;3;4)
d+e
/ => a| 3
/ => b| 5
/ => c| 8
d-2
/ => a| -1
/ => b| 0
/ => c| 2
d > (1;1;1)
/ => a| 0
/ => b| 1
/ => c| 1

/ And the take, drop and find operators are remarkably similar too
`a`b#d
/ => a| 1
/ => b| 2
`a`b _ d
/ => c| 4
d?2
/ => `b

/ Tables in q are basically a subset of dictionaries
/ a table is a dictionary where all values must be lists of the same length
/ as such tables in q are column oriented (unlike most RDBMS)
/ the flip keyword is used to convert a dictionary to a table
/ i.e. flip the indices
flip `c1`c2`c3!(1 2 3;4 5 6;7 8 9)
/ => c1 c2 c3
/ => --------
/ => 1  4  7
/ => 2  5  8
/ => 3  6  9
/ we can also define tables using this syntax
t:([]c1:1 2 3;c2:4 5 6;c3:7 8 9)
t
/ => c1 c2 c3
/ => --------
/ => 1  4  7
/ => 2  5  8
/ => 3  6  9

/ Tables can be indexed and manipulated in a similar way to dicts and lists
t[`c1]
/ => 1 2 3
/ table rows are returned as dictionaries
t[1]
/ => c1| 2
/ => c2| 5
/ => c3| 8

/ meta returns table type information
meta t
/ => c | t f a
/ => --| -----
/ => c1| j
/ => c2| j
/ => c3| j
/ now we see why type is enforced in lists (to protect column types)
t[1;`c1]:3
t[1;`c1]:3.0 / throws a type error

/ Most traditional databases have primary key columns
/ in q we have keyed tables, where one table containing key columns
/ is mapped to another table using bang (!)
k:([]id:1 2 3)
k!t
/ => id| c1 c2 c3
/ => --| --------
/ => 1 | 1  4  7
/ => 2 | 3  5  8
/ => 3 | 3  6  9

/ We can also use this shortcut for defining keyed tables
kt:([id:1 2 3]c1:1 2 3;c2:4 5 6;c3:7 8 9)

/ Records can then be retrieved based on this key
kt[1]
/ => c1| 1
/ => c2| 4
/ => c3| 7
/ or with the full key record, a dictionary built using enlist on both sides
kt[(enlist `id)!enlist 1]
/ => c1| 1
/ => c2| 4
/ => c3| 7

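/ An added sketch of moving between keyed and unkeyed tables (illustration
/ only, reusing kt from above): keys lists the key columns, n!table makes the
/ first n columns the key, and 0!table removes the key
keys kt / => ,`id
0!kt / => an unkeyed table with columns id, c1, c2 and c3
kt~1!0!kt / => 1b (re-keying on the first column restores kt)
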
////////////////////////////////////
////////     Functions      ////////
////////////////////////////////////

/ In q the function is similar to a mathematical map, mapping inputs to outputs
/ curly braces {} are used for function definition
/ and square brackets [] for calling functions (just like list indexing)
/ a very minimal function
f:{x+x}
f[2] / => 4

/ Functions can be anonymous and called at point of definition
{x+x}[2] / => 4

/ By default the last expression is returned
/ colon (:) can be used to specify return
{x+x}[2] / => 4
{:x+x}[2] / => 4
/ semi-colon (;) separates expressions
{r:x+x;:r}[2] / => 4

/ Function arguments can be specified explicitly (separated by ;)
{[arg1;arg2] arg1+arg2}[1;2] / => 3
/ or if omitted will default to x, y and z
{x+y+z}[1;2;3] / => 6

/ Built in functions are no different, and can be called the same way (with [])
+[1;2] / => 3
<[1;2] / => 1b

/ Functions are first class in q, so can be returned, stored in lists etc.
{:{x+y}}[] / => {x+y}
(1;"hi";{x+y})
/ => 1
/ => "hi"
/ => {x+y}

/ There is no overloading and no keyword arguments for custom q functions
/ however using a dictionary as a single argument can overcome this
/ this allows for optional arguments or differing functionality
d:`arg1`arg2`arg3!(1.0;2;"my function argument")
{x[`arg1]+x[`arg2]}[d] / => 3f

/ Functions in q see the global scope
a:1
{:a}[] / => 1

/ However local scope obscures this
a:1
{a:2;:a}[] / => 2
a / => 1

/ Functions cannot see nested scopes (only local and global)
{local:1;{:local}[]}[] / throws error as local is not defined in inner function

/ A function can have one or more of its arguments fixed (projection)
f:+[4]
f[4] / => 8
f[5] / => 9
f[6] / => 10

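/ Added illustration of projection with a user-defined function (the name g
/ is arbitrary and not from the original): leaving an argument slot empty
/ fixes the other arguments
g:{x-y}[;1] / fix the second argument, leave the first open
g[5] / => 4
g[10] / => 9
{x-y}[10;] 3 / => 7 (fix the first argument instead)
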
////////////////////////////////////
//////////     q-sql      //////////
////////////////////////////////////

/ q has its own syntax for manipulating tables, similar to standard SQL
/ This contains the usual suspects of select, insert, update etc.
/ and some new functionality not typically available
/ q-sql has two significant differences (other than syntax) to normal SQL:
/ - q tables have well defined record orders
/ - tables are stored as a collection of columns
/   (so vectorized column operations are fast)
/ a full description of q-sql is a little beyond the scope of this intro
/ so we will just cover enough of the basics to get you going

/ First define ourselves a table
t:([]name:`Arthur`Thomas`Polly;age:35 32 52;height:180 175 160;sex:`m`m`f)

/ equivalent of SELECT * FROM t
select from t / (must be lower case, and the wildcard is not necessary)
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f

/ Select specific columns
select name,age from t
/ => name   age
/ => ----------
/ => Arthur 35
/ => Thomas 32
/ => Polly  52

/ And name them (equivalent of using AS in standard SQL)
select charactername:name, currentage:age from t
/ => charactername currentage
/ => ------------------------
/ => Arthur        35
/ => Thomas        32
/ => Polly         52

/ This SQL syntax is integrated with the q language
/ so q can be used seamlessly in SQL statements
select name, feet:floor height*0.032, inches:12*(height*0.032) mod 1 from t
/ => name   feet inches
/ => ------------------
/ => Arthur 5    9.12
/ => Thomas 5    7.2
/ => Polly  5    1.44

/ Including custom functions
select name, growth:{[h;a]h%a}[height;age] from t
/ => name   growth
/ => ---------------
/ => Arthur 5.142857
/ => Thomas 5.46875
/ => Polly  3.076923

/ The where clause can contain multiple statements separated by commas
select from t where age>33,height>175
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m

/ The where statements are executed sequentially (not the same as logical AND)
select from t where age<40,height=min height
/ => name   age height sex
/ => ---------------------
/ => Thomas 32  175    m
select from t where (age<40)&(height=min height)
/ => name age height sex
/ => -------------------

/ The by clause falls between select and from
/ and is equivalent to SQL's GROUP BY
select avg height by sex from t
/ => sex| height
/ => ---| ------
/ => f  | 160
/ => m  | 177.5

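/ Added illustration (not in the original): several aggregations can be
/ computed per group in one query; i is the virtual row-index column and
/ the result column names n and avgage are arbitrary
select n:count i, avgage:avg age by sex from t
/ => sex| n avgage
/ => ---| --------
/ => f  | 1 52
/ => m  | 2 33.5
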
/ If no aggregation function is specified, last is assumed
select by sex from t
/ => sex| name   age height
/ => ---| -----------------
/ => f  | Polly  52  160
/ => m  | Thomas 32  175

/ Update has the same basic form as select
update sex:`male from t where sex=`m
/ => name   age height sex
/ => ----------------------
/ => Arthur 35  180    male
/ => Thomas 32  175    male
/ => Polly  52  160    f

/ As does delete
delete from t where sex=`m
/ => name  age height sex
/ => --------------------
/ => Polly 52  160    f

/ None of these sql operations are carried out in place
t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f

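/ Added sketch (not from the original): passing the table name as a symbol
/ makes update or delete modify the table in place. A copy named t2 (an
/ arbitrary name) is used here so that t is left untouched for what follows
t2:t
update age:age+1 from `t2 where name=`Polly / => `t2
select name,age from t2
/ => name   age
/ => ----------
/ => Arthur 35
/ => Thomas 32
/ => Polly  53
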
/ Insert however is in place, it takes a table name, and new data
`t insert (`John;25;178;`m) / => ,3
t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => Polly  52  160    f
/ => John   25  178    m

/ Upsert is similar (but doesn't have to be in-place)
t upsert (`Chester;58;179;`m)
/ => name    age height sex
/ => ----------------------
/ => Arthur  35  180    m
/ => Thomas  32  175    m
/ => Polly   52  160    f
/ => John    25  178    m
/ => Chester 58  179    m

/ it will also upsert dicts or tables
t upsert `name`age`height`sex!(`Chester;58;179;`m)
t upsert ([]name:enlist `Chester;age:enlist 58;height:enlist 179;sex:enlist `m)
/ => name    age height sex
/ => ----------------------
/ => Arthur  35  180    m
/ => Thomas  32  175    m
/ => Polly   52  160    f
/ => John    25  178    m
/ => Chester 58  179    m

/ And if our table is keyed
kt:`name xkey t
/ upsert will replace records where required
kt upsert ([]name:`Thomas`Chester;age:33 58;height:175 179;sex:`f`m)
/ => name   | age height sex
/ => -------| --------------
/ => Arthur | 35  180    m
/ => Thomas | 33  175    f
/ => Polly  | 52  160    f
/ => John   | 25  178    m
/ => Chester| 58  179    m

/ There is no ORDER BY clause in q-sql, instead use xasc/xdesc
`name xasc t
/ => name   age height sex
/ => ---------------------
/ => Arthur 35  180    m
/ => John   25  178    m
/ => Polly  52  160    f
/ => Thomas 32  175    m

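/ Added illustration (not in the original): xdesc sorts in descending order,
/ shown here on t as it stands after the insert of John above
`age xdesc t
/ => name   age height sex
/ => ---------------------
/ => Polly  52  160    f
/ => Arthur 35  180    m
/ => Thomas 32  175    m
/ => John   25  178    m
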
/ Most of the standard SQL joins are present in q-sql, plus a few new friends
/ see http://code.kx.com/q4m3/9_Queries_q-sql/#99-joins
/ the two most important (commonly used) are lj and aj

/ lj is basically the same as SQL LEFT JOIN
/ where the join is carried out on the key columns of the right table
le:([sex:`m`f]lifeexpectancy:78 85)
t lj le
/ => name   age height sex lifeexpectancy
/ => ------------------------------------
/ => Arthur 35  180    m   78
/ => Thomas 32  175    m   78
/ => Polly  52  160    f   85
/ => John   25  178    m   78

/ aj is an asof join. This is not a standard SQL join, and can be very powerful
/ The canonical example of this is joining financial trades and quotes tables
trades:([]time:10:01:01 10:01:03 10:01:04;sym:`msft`ibm`ge;qty:100 200 150)
quotes:([]time:10:01:00 10:01:01 10:01:01 10:01:03;
          sym:`ibm`msft`msft`ibm; px:100 99 101 98)
/ the column list gives the exact-match columns first and the time column last
aj[`sym`time;trades;quotes]
/ => time     sym  qty px
/ => ---------------------
/ => 10:01:01 msft 100 101
/ => 10:01:03 ibm  200 98
/ => 10:01:04 ge   150
/ for each row in the trade table, the last (prevailing) quote (px) for that sym
/ is joined on.
/ see http://code.kx.com/q4m3/9_Queries_q-sql/#998-as-of-joins

////////////////////////////////////
/////     Extra/Advanced      //////
////////////////////////////////////

////// Adverbs //////
/ You may have noticed the total lack of loops to this point
/ This is not a mistake!
/ q is a vector language so explicit loops (for, while etc.) are not encouraged
/ where possible functionality should be vectorized (i.e. operations on lists)
/ adverbs supplement this, modifying the behaviour of functions
/ and providing loop type functionality when required
/ (in q functions are sometimes referred to as verbs, hence adverbs)
/ the "each" adverb modifies a function to treat a list as individual variables
first each (1 2 3;4 5 6;7 8 9)
/ => 1 4 7

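/ Added illustration (not in the original): each also works with anonymous
/ functions, and each-both (') pairs up the items of two lists
{x*x} each 1 2 3 / => 1 4 9
count each ("a";"bc";"def") / => 1 2 3
1 2 3,'4 5 6
/ => 1 4
/ => 2 5
/ => 3 6
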
/ each-left (\:) and each-right (/:) modify a two-argument function
/ to treat one of the arguments as individual variables instead of a list
1 2 3 +\: 11 22 33
/ => 12 23 34
/ => 13 24 35
/ => 14 25 36
1 2 3 +/: 11 22 33
/ => 12 13 14
/ => 23 24 25
/ => 34 35 36

/ The true alternatives to loops in q are the adverbs scan (\) and over (/)
/ their behaviour differs based on the number of arguments the function they
/ are modifying receives. Here I'll summarise some of the most useful cases
/ a single argument function modified by scan given 2 args behaves like "do"
{x * 2}\[5;1] / => 1 2 4 8 16 32 (i.e. multiply by 2, 5 times)
{x * 2}/[5;1] / => 32 (using over only the final result is shown)

/ If the first argument is a function, we have the equivalent of "while"
{x * 2}\[{x<100};1] / => 1 2 4 8 16 32 64 128 (iterates until returns 0b)
{x * 2}/[{x<100};1] / => 128 (again returns only the final result)

/ If the function takes two arguments, and we pass a list, we have "for"
/ where the result of the previous execution is passed back into the next loop
/ along with the next member of the list
{x + y}\[1 2 3 4 5] / => 1 3 6 10 15 (i.e. the running sum)
{x + y}/[1 2 3 4 5] / => 15 (only the final result)

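/ Two added sketches of these forms (illustrations, not from the original)
/ the "for" form can also be given an explicit starting value
{x+y}/[10;1 2 3] / => 16 (10+1, then +2, then +3)
/ the "do" form can build a list, e.g. extending a Fibonacci sequence 5 times
{x,sum -2#x}/[5;1 1] / => 1 1 2 3 5 8 13
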
/ There are other iterators and uses, this is only intended as a quick overview
/ http://code.kx.com/q4m3/6_Functions/#67-iterators

////// Scripts //////
/ q scripts can be loaded from a q session using the "\l" command
/ for example "\l learnkdb.q" will load this script
/ or from the command prompt passing the script as an argument
/ for example "q learnkdb.q"

////// On-disk data //////
/ Tables can be persisted to disk in several formats
/ the two most fundamental are serialized and splayed
t:([]a:1 2 3;b:1 2 3f)
`:serialized set t / saves the table as a single serialized file
`:splayed/ set t / saves the table splayed into a directory

/ the dir structure will now look something like:
/ db/
/ ├── serialized
/ └── splayed
/     ├── a
/     └── b

/ Loading this directory (as if it were a script, see above)
/ loads these tables into the q session
\l .
/ the serialized table will be loaded into memory
/ however the splayed table will only be mapped, not loaded
/ both tables can be queried using q-sql
select from serialized
/ => a b
/ => ---
/ => 1 1
/ => 2 2
/ => 3 3
select from splayed / (the columns are read from disk on request)
/ => a b
/ => ---
/ => 1 1
/ => 2 2
/ => 3 3
/ see http://code.kx.com/q4m3/14_Introduction_to_Kdb+/ for more

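/ Added illustration (not in the original), assuming the two set calls above
/ were run from the current directory: get reads a saved object back by path
get `:serialized
/ => a b
/ => ---
/ => 1 1
/ => 2 2
/ => 3 3
t~get `:serialized / => 1b (the round trip preserves the table)
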
////// Frameworks //////
/ kdb+ is typically used for data capture and analysis.
/ This involves using an architecture with multiple processes
/ working together. kdb+ frameworks are available to streamline the setup
/ and configuration of this architecture and add additional functionality
/ such as disaster recovery, logging, access, load balancing etc.
/ https://github.com/DataIntellectTech/TorQ
```

## Want to know more?

* [*q for mortals* q language tutorial](http://code.kx.com/q4m3/)
* [*Introduction to Kdb+* on disk data tutorial](http://code.kx.com/q4m3/14_Introduction_to_Kdb+/)
* [q language reference](https://code.kx.com/q/ref/)
* [TorQ production framework](https://github.com/DataIntellectTech/TorQ)