Bloodhound [![TravisCI](https://travis-ci.org/bitemyapp/bloodhound.svg)](https://travis-ci.org/bitemyapp/bloodhound) [![Hackage](https://img.shields.io/hackage/v/bloodhound.svg?style=flat)](https://hackage.haskell.org/package/bloodhound)
==========

![Bloodhound (dog)](./bloodhound.jpg)

Elasticsearch client and query DSL for Haskell
==============================================
Why?
----
Search doesn't have to be hard. Let the dog do it.

Endorsements
------------

"Bloodhound makes Elasticsearch almost tolerable!" - Almost-gruntled user

"ES is a nightmare but Bloodhound at least makes it tolerable." - Same user, later opinion.
Version compatibility
---------------------
Elasticsearch >= 1.0 is recommended. Bloodhound mostly works with 0.9.x, but I don't recommend it if you expect everything to work. As of Bloodhound 0.3, all >= 1.0 versions of Elasticsearch work.

Current versions we test against are 1.0.3, 1.1.2, 1.2.3, 1.3.2, and 1.4.0. We also check that GHC 7.6 and 7.8 both build and pass tests. See our [TravisCI](https://travis-ci.org/bitemyapp/bloodhound) to learn more.

Stability
---------

Bloodhound is stable for production use. I will strive to avoid breaking API compatibility from here forward, but dramatic features like a type-safe, fully integrated mapping API may require breaking changes in the future.
Hackage page and Haddock documentation
======================================
<http://hackage.haskell.org/package/bloodhound>

Elasticsearch Tutorial
======================

It doesn't use Bloodhound, but if you need an introduction to or an overview of Elasticsearch and how to use it, you can watch [this screencast](http://vimeo.com/106463167).
Examples
========
Index Operations
----------------

### Create Index

``` {.haskell}
-- Formatted for use in ghci, so there are "let"s in front of the decls.
-- If you see :{ and :}, they're there so you can copy-paste
-- the multi-line examples into your ghci REPL.

:set -XDeriveGeneric
:{
import Control.Applicative
import Database.Bloodhound
import Data.Aeson
import Data.Either (Either(..))
import Data.Maybe (fromJust)
import Data.Time.Calendar (Day(..))
import Data.Time.Clock (secondsToDiffTime, UTCTime(..))
import Data.Text (Text)
import GHC.Generics (Generic)
import Network.HTTP.Client
import qualified Network.HTTP.Types.Status as NHTS

-- No trailing slashes in servers; the library handles building the path.
let testServer = (Server "http://localhost:9200")
let testIndex = IndexName "twitter"
let testMapping = MappingName "tweet"
let withBH' = withBH defaultManagerSettings testServer

-- defaultIndexSettings is exported by Database.Bloodhound as well
let defaultIndexSettings = IndexSettings (ShardCount 3) (ReplicaCount 2)

-- createIndex returns MonadBH m => m Reply. You can use withBH for
-- one-off commands or you can use runBH to group commands together
-- and to pass in your own HTTP manager for pipelining.
-- response :: Reply; Reply is a synonym for Network.HTTP.Conduit.Response
response <- withBH' $ createIndex defaultIndexSettings testIndex
:}
```
### Delete Index

#### Code

``` {.haskell}
-- response :: Reply
response <- withBH' $ deleteIndex testIndex
```

#### Example Response

``` {.haskell}
-- print response if it was a success
Response {responseStatus = Status {statusCode = 200, statusMessage = "OK"}
, responseVersion = HTTP/1.1
, responseHeaders = [("Content-Type", "application/json; charset=UTF-8")
, ("Content-Length", "21")]
, responseBody = "{\"acknowledged\":true}"
, responseCookieJar = CJ {expose = []}
, responseClose' = ResponseClose}

-- if the index to be deleted didn't exist anyway
Response {responseStatus = Status {statusCode = 404, statusMessage = "Not Found"}
, responseVersion = HTTP/1.1
, responseHeaders = [("Content-Type", "application/json; charset=UTF-8")
, ("Content-Length","65")]
, responseBody = "{\"error\":\"IndexMissingException[[twitter] missing]\",\"status\":404}"
, responseCookieJar = CJ {expose = []}
, responseClose' = ResponseClose}
```

### Refresh Index

#### Note: you **have** to do this if you expect to read what you just wrote

``` {.haskell}
resp <- withBH' $ refreshIndex testIndex
```
#### Example Response

``` {.haskell}
-- print resp on success
Response {responseStatus = Status {statusCode = 200, statusMessage = "OK"}
, responseVersion = HTTP/1.1
, responseHeaders = [("Content-Type", "application/json; charset=UTF-8")
, ("Content-Length","50")]
, responseBody = "{\"_shards\":{\"total\":10,\"successful\":5,\"failed\":0}}"
, responseCookieJar = CJ {expose = []}
, responseClose' = ResponseClose}
```

Mapping Operations
------------------

### Create Mapping

``` {.haskell}
-- don't forget the imports and the like at the top.
data TweetMapping = TweetMapping deriving (Eq, Show)

-- I know writing the JSON manually sucks.
-- I don't have a proper data type for Mappings yet.
-- Let me know if this is something you need.
:{
instance ToJSON TweetMapping where
  toJSON TweetMapping =
    object ["tweet" .=
      object ["properties" .=
        object ["location" .=
          object ["type" .= ("geo_point" :: Text)]]]]
:}
resp <- withBH' $ putMapping testIndex testMapping TweetMapping
```

### Delete Mapping

``` {.haskell}
resp <- withBH' $ deleteMapping testIndex testMapping
```
Document Operations
-------------------

### Indexing Documents

``` {.haskell}
-- don't forget the imports and the DeriveGeneric setting for ghci
-- at the beginning of the examples.
:{
data Location = Location { lat :: Double
                         , lon :: Double } deriving (Eq, Generic, Show)

data Tweet = Tweet { user     :: Text
                   , postDate :: UTCTime
                   , message  :: Text
                   , age      :: Int
                   , location :: Location } deriving (Eq, Generic, Show)

exampleTweet = Tweet { user     = "bitemyapp"
                     , postDate = UTCTime
                                  (ModifiedJulianDay 55000)
                                  (secondsToDiffTime 10)
                     , message  = "Use haskell!"
                     , age      = 10000
                     , location = Location 40.12 (-71.34) }

-- automagic (generic) derivation of instances because we're lazy.
instance ToJSON Tweet
instance FromJSON Tweet
instance ToJSON Location
instance FromJSON Location
:}

-- You should be able to toJSON and encode the data structures like this:
-- λ> toJSON $ Location 10.0 10.0
-- Object fromList [("lat",Number 10.0),("lon",Number 10.0)]
-- λ> encode $ Location 10.0 10.0
-- "{\"lat\":10,\"lon\":10}"

resp <- withBH' $ indexDocument testIndex testMapping defaultIndexDocumentSettings exampleTweet (DocId "1")
```

#### Example Response

``` {.haskell}
Response {responseStatus =
            Status {statusCode = 200, statusMessage = "OK"}
, responseVersion = HTTP/1.1
, responseHeaders =
    [("Content-Type","application/json; charset=UTF-8"),
     ("Content-Length","75")]
, responseBody = "{\"_index\":\"twitter\",\"_type\":\"tweet\",\"_id\":\"1\",\"_version\":2,\"created\":false}"
, responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}
```
### Deleting Documents

``` {.haskell}
resp <- withBH' $ deleteDocument testIndex testMapping (DocId "1")
```

### Getting Documents

``` {.haskell}
-- n.b., you'll need the earlier imports. responseBody is from http-conduit.
resp <- withBH' $ getDocument testIndex testMapping (DocId "1")

-- responseBody :: Response body -> body
let body = responseBody resp

-- You have two options: use decode and get Maybe (EsResult Tweet),
-- or use eitherDecode and get Either String (EsResult Tweet).
let maybeResult = decode body :: Maybe (EsResult Tweet)

-- The explicit typing is so Aeson knows how to parse the JSON.
-- Use Either if you want to know why something failed to parse.
-- (string errors, sadly)
let eitherResult = eitherDecode body :: Either String (EsResult Tweet)

-- print eitherResult should look like:
Right (EsResult {_index = "twitter"
, _type = "tweet"
, _id = "1"
, _version = 2
, found = Just True
, _source = Tweet {user = "bitemyapp"
, postDate = 2009-06-18 00:00:10 UTC
, message = "Use haskell!"
, age = 10000
, location = Location {lat = 40.12, lon = -71.34}}})

-- _source in EsResult is parametric; we dispatch the type by passing in
-- what we expect (Tweet) as a parameter to EsResult.
-- Use the _source record accessor to get at your document:
fmap _source eitherResult
Right (Tweet {user = "bitemyapp"
, postDate = 2009-06-18 00:00:10 UTC
, message = "Use haskell!"
, age = 10000
, location = Location {lat = 40.12, lon = -71.34}})
```
Bulk Operations
---------------

### Bulk create, index

``` {.haskell}
-- don't forget the imports and the DeriveGeneric setting for ghci
-- at the beginning of the examples.
:{
-- Using the earlier Tweet datatype and exampleTweet data,
-- just changing up the data a bit.
let bulkTest = exampleTweet { user = "blah" }
let bulkTestTwo = exampleTweet { message = "woohoo!" }

-- create-only bulk operation
-- BulkCreate :: IndexName -> MappingName -> DocId -> Value -> BulkOperation
let firstOp = BulkCreate testIndex
              testMapping (DocId "3") (toJSON bulkTest)

-- index operation: "create or update"
let sndOp = BulkIndex testIndex
            testMapping (DocId "4") (toJSON bulkTestTwo)

-- Some explanation: the final "Value" that BulkIndex,
-- BulkCreate, and BulkUpdate accept is the actual document
-- data that your operation applies to. BulkDelete doesn't
-- take a value because it's just deleting whatever DocId
-- you pass.

-- vector of bulk operations (needs: import qualified Data.Vector as V)
let stream = V.fromList [firstOp, sndOp]

-- Fire off the actual bulk request
-- bulk :: MonadBH m => V.Vector BulkOperation -> m Reply
resp <- withBH' $ bulk stream
:}
```
### Encoding individual bulk API operations

``` {.haskell}
-- The following functions are exported by Bloodhound so
-- you can build up bulk operations yourself:
-- encodeBulkOperations :: V.Vector BulkOperation -> L.ByteString
-- encodeBulkOperation :: BulkOperation -> L.ByteString

-- How to use the above:
data BulkTest = BulkTest { name :: Text } deriving (Eq, Generic, Show)
instance FromJSON BulkTest
instance ToJSON BulkTest

let firstTest = BulkTest "blah"
let secondTest = BulkTest "bloo"
let firstDoc = BulkIndex testIndex
               testMapping (DocId "2") (toJSON firstTest)
let secondDoc = BulkCreate testIndex
                testMapping (DocId "3") (toJSON secondTest)
let stream = V.fromList [firstDoc, secondDoc] :: V.Vector BulkOperation

-- to encode a single operation yourself
let firstDocEncoded = encodeBulkOperation firstDoc :: L.ByteString

-- to encode a vector of bulk operations
let encodedOperations = encodeBulkOperations stream

-- to send the operations to a particular server
-- bulk :: MonadBH m => V.Vector BulkOperation -> m Reply
_ <- withBH' $ bulk stream
```
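For reference, the wire format Elasticsearch's bulk endpoint expects (and that `encodeBulkOperations` targets) is newline-delimited JSON: an action metadata line, followed by a source document line for index/create operations. A minimal self-contained sketch of that shape, using plain strings and hypothetical `Op`/`encodeOps` names rather than Bloodhound's actual encoder:

```haskell
-- Toy model of the ES bulk wire format: each operation becomes an
-- action line plus, for index/create, a source document line,
-- all newline-separated (NDJSON).
data Op = OpIndex String String String String  -- index, type, docId, source JSON
        | OpDelete String String String        -- index, type, docId

encodeOp :: Op -> [String]
encodeOp (OpIndex i t d src) =
  [ "{\"index\":{\"_index\":\"" ++ i ++ "\",\"_type\":\"" ++ t
    ++ "\",\"_id\":\"" ++ d ++ "\"}}"
  , src ]
encodeOp (OpDelete i t d) =
  [ "{\"delete\":{\"_index\":\"" ++ i ++ "\",\"_type\":\"" ++ t
    ++ "\",\"_id\":\"" ++ d ++ "\"}}" ]

encodeOps :: [Op] -> String
encodeOps = unlines . concatMap encodeOp

main :: IO ()
main = putStr (encodeOps [ OpIndex "twitter" "tweet" "2" "{\"name\":\"blah\"}"
                         , OpDelete "twitter" "tweet" "9" ])
```

The trailing newline matters: the bulk endpoint requires every line, including the last, to be newline-terminated.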
Search
------

### Querying

#### Term Query

``` {.haskell}
-- mkSearch is exported by the Client module; it just defaults some fields.
-- mkSearch :: Maybe Query -> Maybe Filter -> Search
-- mkSearch query filter = Search query filter Nothing False (From 0) (Size 10)

let query = TermQuery (Term "user" "bitemyapp") Nothing

-- AND'ing the identity filter with itself and then tacking it onto a query
-- search should be a no-op. I include it for the sake of example.
-- <||> (or/plus) should make it into a search that returns everything.
let filter = IdentityFilter <&&> IdentityFilter

-- constructing the Search object that the searchByIndex function dispatches on.
let search = mkSearch (Just query) (Just filter)

-- you can also searchByType and specify the mapping name.
reply <- withBH' $ searchByIndex testIndex search

let result = eitherDecode (responseBody reply) :: Either String (SearchResult Tweet)
λ> fmap (hits . searchHits) result
Right [Hit {hitIndex = IndexName "twitter"
, hitType = MappingName "tweet"
, hitDocId = DocId "1"
, hitScore = 0.30685282
, hitSource = Tweet {user = "bitemyapp"
, postDate = 2009-06-18 00:00:10 UTC
, message = "Use haskell!"
, age = 10000
, location = Location {lat = 40.12, lon = -71.34}}}]
```
#### Match Query

``` {.haskell}
let query = QueryMatchQuery $ mkMatchQuery (FieldName "user") (QueryString "bitemyapp")
let search = mkSearch (Just query) Nothing
```

#### Multi-Match Query

``` {.haskell}
let fields = [FieldName "user", FieldName "message"]
let query = QueryMultiMatchQuery $ mkMultiMatchQuery fields (QueryString "bitemyapp")
let search = mkSearch (Just query) Nothing
```

#### Bool Query

``` {.haskell}
let innerQuery = QueryMatchQuery $
                 mkMatchQuery (FieldName "user") (QueryString "bitemyapp")
let query = QueryBoolQuery $
            mkBoolQuery [innerQuery] [] []
let search = mkSearch (Just query) Nothing
```

#### Boosting Query

``` {.haskell}
let posQuery = QueryMatchQuery $
               mkMatchQuery (FieldName "user") (QueryString "bitemyapp")
let negQuery = QueryMatchQuery $
               mkMatchQuery (FieldName "user") (QueryString "notmyapp")
let query = QueryBoostingQuery $
            BoostingQuery posQuery negQuery (Boost 0.2)
```

#### Rest of the query/filter types

Just follow the pattern you've seen here and check the Hackage API documentation.
### Sorting

``` {.haskell}
let sortSpec = DefaultSortSpec $ mkSort (FieldName "age") Ascending

-- mkSort is a shortcut function that takes a FieldName and a SortOrder
-- to generate a vanilla DefaultSort.
-- Check the DefaultSort type for the full list of customizable options.

-- From and Size are integers for pagination.
-- When sorting on a field, scores are not computed. By setting
-- TrackSortScores to True, scores will still be computed and tracked.
-- type Sort = [SortSpec]
-- type TrackSortScores = Bool

-- Search takes Maybe Query
--   -> Maybe Filter
--   -> Maybe Sort
--   -> TrackSortScores
--   -> From -> Size

-- just add more SortSpecs to the list if you want tie-breakers.
let search = Search Nothing (Just IdentityFilter) (Just [sortSpec]) False (From 0) (Size 10)
```
### Filtering

#### And, Not, and Or filters

Filters form a monoid and a seminearring.

``` {.haskell}
instance Monoid Filter where
  mempty = IdentityFilter
  mappend a b = AndFilter [a, b] defaultCache

instance Seminearring Filter where
  a <||> b = OrFilter [a, b] defaultCache

-- AndFilter and OrFilter take [Filter] as an argument.

-- This will return everything, because IdentityFilter returns everything
OrFilter [IdentityFilter, someOtherFilter] False

-- This will return exactly what someOtherFilter returns
AndFilter [IdentityFilter, someOtherFilter] False

-- Thanks to the seminearring and monoid, the above can be expressed as:
-- "and"
IdentityFilter <&&> someOtherFilter
-- "or"
IdentityFilter <||> someOtherFilter

-- There is also NotFilter; it accepts a single filter, not a list.
NotFilter someOtherFilter False
```
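Because the composition is just Monoid structure, a whole list of filters can be collapsed in one go with `mconcat`. A self-contained toy model of the same shape (a simplified `Filter` type with a hypothetical `TermF` leaf, not Bloodhound's real types):

```haskell
-- Toy model of Bloodhound's Filter Monoid: composing filters with
-- (<>)/mconcat builds an AndFilter, with IdentityFilter as mempty.
data Filter
  = IdentityFilter
  | TermF String String          -- hypothetical stand-in for a real filter
  | AndFilter [Filter] Bool      -- Bool models the cache flag
  deriving (Eq, Show)

instance Semigroup Filter where
  a <> b = AndFilter [a, b] False

instance Monoid Filter where
  mempty = IdentityFilter

main :: IO ()
main = do
  -- two filters AND'ed together, as in IdentityFilter <&&> someOtherFilter
  print (IdentityFilter <> TermF "user" "bitemyapp")
  -- a list of filters collapsed in one go
  print (mconcat [TermF "user" "bitemyapp", TermF "age" "10000"])
```

Note that `mconcat` right-folds with `mempty`, so the collapsed result still nests `AndFilter`s; the "Collapsing redundantly nested and/or structures" section at the end of this README discusses flattening that.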
#### Identity Filter

``` {.haskell}
-- AND'ing two identity filters
let queryFilter = IdentityFilter <&&> IdentityFilter
let search = mkSearch Nothing (Just queryFilter)
reply <- withBH' $ searchByType testIndex testMapping search
```
#### Boolean Filter

Similar to boolean queries.

``` {.haskell}
-- Will return only items whose "user" field contains the term "bitemyapp"
let queryFilter = BoolFilter (MustMatch (Term "user" "bitemyapp") False)

-- Will return only items whose "user" field does not contain the term "bitemyapp"
let queryFilter = BoolFilter (MustNotMatch (Term "user" "bitemyapp") False)

-- The should clause (query) should appear in the matching document.
-- In a boolean query with no must clauses, one or more should
-- clauses must match a document. The minimum number of should
-- clauses to match can be set using the minimum_should_match parameter.
let queryFilter = BoolFilter (ShouldMatch [(Term "user" "bitemyapp")] False)
```
#### Exists Filter

``` {.haskell}
-- Will filter for documents that have the field "user"
let existsFilter = ExistsFilter (FieldName "user")
```

#### Geo BoundingBox Filter

``` {.haskell}
-- topLeft and bottomRight corners of the box
let box = GeoBoundingBox (LatLon 40.73 (-74.1)) (LatLon 40.10 (-71.12))
let constraint = GeoBoundingBoxConstraint (FieldName "tweet.location") box False GeoFilterMemory
```
#### Geo Distance Filter

``` {.haskell}
let geoPoint = GeoPoint (FieldName "tweet.location") (LatLon 40.12 (-71.34))

-- coefficient and units
let distance = Distance 10.0 Miles

-- GeoFilterType or NoOptimizeBbox
let optimizeBbox = OptimizeGeoFilterType GeoFilterMemory

-- SloppyArc is the usual/default optimization in Elasticsearch today,
-- but pre-1.0 versions will need to pick Arc or Plane.
let geoFilter = GeoDistanceFilter geoPoint distance SloppyArc optimizeBbox False
```

#### Geo Distance Range Filter

Think of a donut and you won't be far off.

``` {.haskell}
let geoPoint = GeoPoint (FieldName "tweet.location") (LatLon 40.12 (-71.34))
let distanceRange = DistanceRange (Distance 0.0 Miles) (Distance 10.0 Miles)
let geoFilter = GeoDistanceRangeFilter geoPoint distanceRange
```
#### Geo Polygon Filter

``` {.haskell}
-- I think I drew a square here.
let points = [LatLon 40.0 (-70.00),
              LatLon 40.0 (-72.00),
              LatLon 41.0 (-70.00),
              LatLon 41.0 (-72.00)]
let geoFilter = GeoPolygonFilter (FieldName "tweet.location") points
```

#### Document IDs filter

``` {.haskell}
-- takes a mapping name and a list of DocIds
IdsFilter (MappingName "tweet") [DocId "1"]
```
#### Range Filter

``` {.haskell}
-- RangeFilter :: FieldName
--   -> RangeValue
--   -> RangeExecution
--   -> Cache -> Filter
let filter = RangeFilter (FieldName "age")
             (RangeGtLt (GreaterThan 1000.0) (LessThan 100000.0))
             RangeExecutionIndex False
```

``` {.haskell}
let filter = RangeFilter (FieldName "age")
             (RangeLte (LessThanEq 100000.0))
             RangeExecutionIndex False
```

##### Date Ranges

Date ranges are expressed in UTCTime. Date ranges use the same range-bound constructors as numerics, except that they end in "D".

Note that compatibility with ES is tested only down to seconds.

``` {.haskell}
let filter = RangeFilter (FieldName "postDate")
             (RangeDateGtLte
              (GreaterThanD (UTCTime
                             (ModifiedJulianDay 55000)
                             (secondsToDiffTime 9)))
              (LessThanEqD (UTCTime
                            (ModifiedJulianDay 55000)
                            (secondsToDiffTime 11))))
             RangeExecutionIndex False
```
#### Regexp Filter

``` {.haskell}
-- RegexpFilter
--   :: FieldName
--   -> Regexp
--   -> RegexpFlags
--   -> CacheName
--   -> Cache
--   -> CacheKey
--   -> Filter
let filter = RegexpFilter (FieldName "user") (Regexp "bite.*app")
             AllRegexpFlags (CacheName "test") False (CacheKey "key")

-- n.b.
-- data RegexpFlags = AllRegexpFlags
--                  | NoRegexpFlags
--                  | SomeRegexpFlags (NonEmpty RegexpFlag) deriving (Eq, Show)
-- data RegexpFlag = AnyString
--                 | Automaton
--                 | Complement
--                 | Empty
--                 | Intersection
--                 | Interval deriving (Eq, Show)
```
### Aggregations

#### Adding aggregations to search

Aggregations can be added to search queries, or run on their own.

``` {.haskell}
type Aggregations = M.Map Text Aggregation

data Aggregation
  = TermsAgg TermsAggregation
  | DateHistogramAgg DateHistogramAggregation
```

For convenience, `mkAggregations` exists; it creates an `Aggregations` map containing the single aggregation provided. For example:

``` {.haskell}
let a = mkAggregations "users" $ TermsAgg $ mkTermsAggregation "user"
let search = mkAggregateSearch Nothing a
```
Aggregations can be added to an existing search using the `aggBody` field:

``` {.haskell}
let search = mkSearch (Just (MatchAllQuery Nothing)) Nothing
let search' = search {aggBody = Just a}
```

Since the `Aggregations` structure is just a `Map Text Aggregation`, `M.insert` can be used to add additional aggregations.

``` {.haskell}
let a' = M.insert "age" (TermsAgg $ mkTermsAggregation "age") a
```
#### Extracting aggregations from results

Aggregations are part of the reply structure of every search; they come back in the `aggregations` field of the `SearchResult`:

``` {.haskell}
-- Lift decode and responseBody into the IO monad.
-- (liftM is from Control.Monad)
let decode' = liftM decode
let responseBody' = liftM responseBody
let reply = withBH' $ searchByIndex testIndex search
let response = decode' $ responseBody' reply :: IO (Maybe (SearchResult Tweet))

-- Now that we have our response, we can extract our terms aggregation
-- result, which is a list of buckets.
let terms = do { response' <- response; return $ response' >>= aggregations >>= toTerms "users" }
terms
Just (Bucket {buckets = [TermsResult {termKey = "bitemyapp", termsDocCount = 1, termsAggs = Nothing}]})
```

Note that bucket aggregation results, such as `TermsResult`, are members of the `BucketAggregation` type class:

``` {.haskell}
class BucketAggregation a where
  key :: a -> Text
  docCount :: a -> Int
  aggs :: a -> Maybe AggregationResults
```

You can use the `aggs` function to get at any nested results, if there are any. For example, if a `TermsResult` contained a nested terms aggregation keyed to "age", you would extract it with `aggs` followed by `toTerms "age"`.
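The extraction pipeline is just Maybe-chaining with `>>=`. A self-contained toy model of the same shape (simplified stand-in types, not Bloodhound's real `AggregationResults`/`TermsResult`) may make the flow clearer:

```haskell
import qualified Data.Map.Strict as M

-- Simplified stand-ins: each named aggregation result is a list of
-- buckets, and each bucket may itself carry nested results.
type AggResults = M.Map String [TermsResult]

data TermsResult = TermsResult
  { termKey       :: String
  , termsDocCount :: Int
  , termsAggs     :: Maybe AggResults  -- nested aggregations, if any
  } deriving (Eq, Show)

-- analogue of toTerms: look up a named terms aggregation
toTerms :: String -> AggResults -> Maybe [TermsResult]
toTerms = M.lookup

main :: IO ()
main = do
  let ageBucket  = TermsResult "10000" 1 Nothing
      userBucket = TermsResult "bitemyapp" 1
                     (Just (M.fromList [("age", [ageBucket])]))
      results    = M.fromList [("users", [userBucket])] :: AggResults
  -- outer lookup, then chase the nested "age" aggregation with >>=
  print (toTerms "users" results)
  print (toTerms "users" results >>= \(b:_) -> termsAggs b >>= toTerms "age")
```

Each `>>=` short-circuits on `Nothing`, which is exactly what makes the `response' >>= aggregations >>= toTerms "users"` pipeline above safe against missing aggregations.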
#### Terms Aggregation

``` {.haskell}
data TermsAggregation = TermsAggregation
  { term              :: Either Text Text
  , termInclude       :: Maybe TermInclusion
  , termExclude       :: Maybe TermInclusion
  , termOrder         :: Maybe TermOrder
  , termMinDocCount   :: Maybe Int
  , termSize          :: Maybe Int
  , termShardSize     :: Maybe Int
  , termCollectMode   :: Maybe CollectionMode
  , termExecutionHint :: Maybe ExecutionHint
  , termAggs          :: Maybe Aggregations
  }
```

Terms Aggregations have two factory functions, `mkTermsAggregation` and `mkTermsScriptAggregation`, and can be used as follows:

``` {.haskell}
let ta = TermsAgg $ mkTermsAggregation "user"
```
There are of course other options that can be added to a Terms Aggregation, such as the collection mode:

``` {.haskell}
let ta = mkTermsAggregation "user"
let ta' = ta { termCollectMode = Just BreadthFirst }
let ta'' = TermsAgg ta'
```

For more documentation on how the Terms Aggregation works, see <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html>
#### Date Histogram Aggregation

``` {.haskell}
data DateHistogramAggregation = DateHistogramAggregation
  { dateField      :: FieldName
  , dateInterval   :: Interval
  , dateFormat     :: Maybe Text
  , datePreZone    :: Maybe Text
  , datePostZone   :: Maybe Text
  , datePreOffset  :: Maybe Text
  , datePostOffset :: Maybe Text
  , dateAggs       :: Maybe Aggregations
  }
```

The Date Histogram Aggregation works much the same as the Terms Aggregation. Relevant functions include `mkDateHistogram`:

``` {.haskell}
let dh = DateHistogramAgg (mkDateHistogram (FieldName "postDate") Minute)
```
Date histograms also accept a `FractionalInterval`:

``` {.haskell}
FractionalInterval :: Float -> TimeInterval -> Interval
-- TimeInterval is the following:
data TimeInterval = Weeks | Days | Hours | Minutes | Seconds
```

It can be used as follows:

``` {.haskell}
let dh = DateHistogramAgg (mkDateHistogram (FieldName "postDate") (FractionalInterval 1.5 Minutes))
```
The `DateHistogramResult` is defined as:

``` {.haskell}
data DateHistogramResult = DateHistogramResult
  { dateKey           :: Int
  , dateKeyStr        :: Maybe Text
  , dateDocCount      :: Int
  , dateHistogramAggs :: Maybe AggregationResults
  }
```

It is an instance of `BucketAggregation` and can have nested aggregations in each bucket.

Buckets can be extracted from an `AggregationResults` using `toDateHistogram`.

For more information on the Date Histogram Aggregation, see: <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html>
Contributors
============
* [Chris Allen](https://github.com/bitemyapp)
* [Liam Atkinson](https://github.com/latkins)
* [Christopher Guiney](https://github.com/chrisguiney)
* [Curtis Carter](https://github.com/ccarter)
* [Michael Xavier](https://github.com/MichaelXavier)
* [Bob Long](https://github.com/bobjflong)
* [Maximilian Tagher](https://github.com/MaxGabriel)
* [Anna Kopp](https://github.com/annakopp)
* [Matvey B. Aksenov](https://github.com/supki)
Possible future functionality
=============================

Span Queries
------------

Beginning here: <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-span-first-query.html>

Function Score Query
--------------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html>

Node discovery and failover
---------------------------

Might require TCP support.
Support for TCP access to Elasticsearch
---------------------------------------

Pretend to be a transport client?

Bulk cluster-join merge
-----------------------

Might require making a Lucene index on disk with the appropriate format.

GeoShapeQuery
-------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-query.html>

GeoShapeFilter
--------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geo-shape-filter.html>

Geohash cell filter
-------------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-geohash-cell-filter.html>
HasChild Filter
---------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-child-filter.html>

HasParent Filter
----------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-has-parent-filter.html>

Indices Filter
--------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-indices-filter.html>

Query Filter
------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html>

Script based sorting
--------------------

<http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-sort.html#_script_based_sorting>
Collapsing redundantly nested and/or structures
-----------------------------------------------

The Seminearring instance, if deeply nested, can produce redundant nested structures. Depending on how this affects ES performance, reducing that structure might be valuable.
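A sketch of what such a collapse could look like, on a toy filter AST (hypothetical `TermF`/`AndF` types and a hypothetical `collapse` function, not part of Bloodhound):

```haskell
-- Toy filter AST mirroring the shape of AndFilter nesting.
data Filter
  = IdentityFilter
  | TermF String String              -- hypothetical leaf filter
  | AndF [Filter]
  deriving (Eq, Show)

-- Flatten nested AndF nodes and drop redundant IdentityFilters,
-- so AndF [a, AndF [b, IdentityFilter]] becomes AndF [a, b].
collapse :: Filter -> Filter
collapse (AndF fs) =
  case concatMap flatten fs of
    []  -> IdentityFilter
    [f] -> f
    fs' -> AndF fs'
  where
    flatten f = case collapse f of
      IdentityFilter -> []
      AndF gs        -> gs
      g              -> [g]
collapse f = f

main :: IO ()
main = print (collapse (AndF [TermF "user" "bitemyapp",
                              AndF [TermF "age" "10000", IdentityFilter]]))
```

A real version would have to be careful not to merge AndFilters with different cache settings.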
Runtime checking for cycles in data structures
----------------------------------------------

Check for n > 1 occurrences in a DFS:

<http://hackage.haskell.org/package/stable-maps-0.0.5/docs/System-Mem-StableName-Dynamic.html>

<http://hackage.haskell.org/package/stable-maps-0.0.5/docs/System-Mem-StableName-Dynamic-Map.html>
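As a sketch of the idea, `System.Mem.StableName` from base is enough to detect revisiting the same heap node during a traversal. A toy self-contained model on lists (not something Bloodhound does today):

```haskell
import System.Mem.StableName

-- Walk a list's cons cells, recording a StableName for each one.
-- Seeing the same StableName twice means we've looped back to a
-- node we already visited, i.e. the structure is cyclic.
hasCycle :: [a] -> IO Bool
hasCycle = go []
  where
    go _ [] = return False
    go seen ys@(_:rest) = do
      sn <- makeStableName ys
      if sn `elem` seen
        then return True
        else go (sn : seen) rest

main :: IO ()
main = do
  hasCycle [1, 2, 3 :: Int] >>= print   -- finite list: no cycle
  let xs = 1 : 2 : xs :: [Int]          -- knot-tied cyclic list
  hasCycle xs >>= print
```

Equal StableNames are guaranteed to mean the same object, so false positives aren't possible; the `seen` list makes this O(n²), and a real implementation would use a stable-name map like the packages linked above.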
Photo Origin
============

Photo from HA! Designs: <https://www.flickr.com/photos/hadesigns/>