A Haskell library for fuzzy text search
Rúnar 213b717428
Merge pull request #5 from neduard/update-build-dependencies
Update cabal file - tested with GHC 9.8.2
2024-06-13 15:00:37 -04:00

fuzzyfind

A package that provides an API for fuzzy text search in Haskell, using a modified version of the Smith-Waterman algorithm. The search is intended to behave similarly to the excellent fzf tool by Junegunn Choi.
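
For orientation, here is a minimal sketch of the textbook Smith-Waterman local-alignment score, the algorithm this package builds on. It is not the library's implementation: the name smithWaterman and the scoring constants (match +2, mismatch -1, gap -1) are assumptions chosen for illustration, and the library's modified version layers its own preferences on top.

```haskell
-- Illustrative scoring constants (assumptions, not fuzzyfind's values).
matchScore, mismatchScore, gapPenalty :: Int
matchScore    = 2
mismatchScore = -1
gapPenalty    = 1

-- Score for aligning one character against another.
substitution :: Char -> Char -> Int
substitution x y
  | x == y    = matchScore
  | otherwise = mismatchScore

-- Best local-alignment score between two strings (textbook Smith-Waterman
-- with a linear gap penalty). Only the score is computed, not the alignment.
smithWaterman :: String -> String -> Int
smithWaterman a b = maximum (concat rows)
  where
    firstRow = replicate (length b + 1) 0
    rows     = scanl nextRow firstRow a

    -- Build one row of the dynamic-programming table from the previous row.
    nextRow prev x = scanl (cell x) 0 (zip3 b prev (tail prev))

    -- `left` is the cell to the left, `diag` the upper-left, `up` the one above.
    -- Clamping at 0 is what makes the alignment local rather than global.
    cell x left (y, diag, up) =
      maximum [0, diag + substitution x y, up - gapPenalty, left - gapPenalty]
```

Loaded into GHCi, smithWaterman "ab" "axb" evaluates to 3 (two matches at +2 each, one gap at -1), and smithWaterman "abc" "xyz" evaluates to 0 because nothing aligns.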

The core functionality of the library is provided by the bestMatch function:

bestMatch :: String -> String -> Maybe Alignment

Calling bestMatch query string returns Nothing if query is not a subsequence of string. Otherwise, it returns the "best" way to line up the characters in query with the characters in string. Lower-case characters in the query match case-insensitively, and upper-case characters match case-sensitively.
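
The subsequence and case rules above can be sketched as a standalone check. This is only one reading of the rule, not the library's code: charMatches and isFuzzySubsequence are hypothetical helpers, and the sketch predicts only whether a match exists, not which alignment is chosen.

```haskell
import Data.Char (isUpper, toLower)

-- Hypothetical helper: does query character q accept input character c?
-- Upper-case query characters must match exactly; anything else is
-- compared against the lower-cased input character.
charMatches :: Char -> Char -> Bool
charMatches q c
  | isUpper q = q == c
  | otherwise = q == toLower c

-- Hypothetical helper: True when the query characters appear in the input
-- in the same order, possibly with other characters in between.
isFuzzySubsequence :: String -> String -> Bool
isFuzzySubsequence [] _ = True
isFuzzySubsequence _ [] = False
isFuzzySubsequence (q:qs) (c:cs)
  | charMatches q c = isFuzzySubsequence qs cs
  | otherwise       = isFuzzySubsequence (q:qs) cs
```

Under this reading, isFuzzySubsequence "ff" "FuzzyFind" is True, while isFuzzySubsequence "FF" "fuzzyfind" is False because the upper-case query characters find no upper-case F in the input.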

For example:

> bestMatch "ff" "FuzzyFind"
Just (Alignment {score = 25, result = Result [Match "F", Gap "uzzy", Match "F", Gap "ind"]})

The score indicates how "good" the match is. Better matches have higher scores. There's no maximum score (except for the upper limit of the Int datatype), but the lowest score is 0.

A substring from the query will generate a Match, and any characters from the input that don't result in a Match will generate a Gap. Concatenating all the Match and Gap results should yield the original input string.
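
To illustrate the concatenation property, here is a small sketch that uses a local stand-in for the segment type. The Segment type and the reassemble and highlight functions are assumptions made for this example; the real types come from the package itself.

```haskell
-- Stand-in for the library's result segments (an assumption for this
-- sketch; the real type lives in the fuzzyfind package).
data Segment = Gap String | Match String
  deriving (Eq, Show)

-- The text carried by a segment, regardless of its kind.
segmentText :: Segment -> String
segmentText (Gap s)   = s
segmentText (Match s) = s

-- Concatenating every segment gives back the original input string.
reassemble :: [Segment] -> String
reassemble = concatMap segmentText

-- Mark the matched parts, for example to display results to a user.
highlight :: [Segment] -> String
highlight = concatMap render
  where
    render (Gap s)   = s
    render (Match s) = "[" ++ s ++ "]"
```

With the segments from the example above, reassemble [Match "F", Gap "uzzy", Match "F", Gap "ind"] gives "FuzzyFind", and highlight on the same list gives "[F]uzzy[F]ind".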

Note that the matched characters in the input always occur in the same order as they do in the query pattern.

The algorithm prefers (and will generate higher scores for) the following kinds of matches:

  1. Contiguous characters from the query string. For example, bestMatch "pp" will find the last two ps in "pickled pepper".
  2. Characters at the beginnings of words. For example, bestMatch "pp" will find the first two Ps in "Peter Piper".
  3. The first character of the query pattern is strongly preferred to fall at the beginning of a word. For example, bestMatch "mn" "Bat Man" will score higher than bestMatch "mn" "Batman".

All else being equal, matches that occur later in the input string are preferred.

The fuzzyFind function finds input strings that match all the given query patterns. For each input that matches, it returns one Alignment. The output is sorted by score, ascending.

fuzzyFind :: [String] -> [String] -> [Alignment]

For example:

> fuzzyFind ["dad", "mac", "dam"] ["tinned macadamia"]
[Alignment {score = 296, result = Result [Gap "tinne", Match "d", Gap " ", Match "macadam", Gap "ia"]}]
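
The "match all patterns" rule can be sketched as a plain filter over the inputs. This is a simplified illustration rather than the library's code: matchesPattern, matchesAll, and candidates are hypothetical names, every comparison here is case-insensitive (the upper-case rule is ignored), and no scores or alignments are produced.

```haskell
import Data.Char (toLower)
import Data.List (isSubsequenceOf)

-- Simplification: compare everything case-insensitively.
matchesPattern :: String -> String -> Bool
matchesPattern pat input =
  map toLower pat `isSubsequenceOf` map toLower input

-- An input qualifies only if every pattern is a subsequence of it.
matchesAll :: [String] -> String -> Bool
matchesAll pats input = all (`matchesPattern` input) pats

-- Keep the inputs that qualify, preserving their original order.
candidates :: [String] -> [String] -> [String]
candidates pats = filter (matchesAll pats)
```

For instance, candidates ["dad", "mac", "dam"] ["tinned macadamia", "tinned tuna"] keeps only "tinned macadamia", since "tinned tuna" contains no m and only one d.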