mirror of
https://github.com/enso-org/enso.git
synced 2024-11-23 06:34:35 +03:00
Data analysts should be able to Text.match, Text.match_all, Text.is_match to find or check matches (#3841)
Implements https://www.pivotaltracker.com/story/show/181266092

# Important Notes
Also renaming `Text.location_of` and `Text.location_of_all` to `Text.locate` and `Text.locate_all`.
parent 8f3bfe8ce2
commit 5b6fd74929
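The semantics named in the commit title can be modelled roughly in Python with the standard `re` module. This is only an illustrative analogue: the function names mirror the Enso methods in the diff below, but the bodies are assumptions written for this sketch, not the Enso implementation, and they cover only the regex matcher.

```python
import re
from typing import List, Optional


def match(text: str, term: str, last: bool = False) -> Optional[str]:
    """Return the first (or last) substring matching `term`, or None."""
    found = [m.group(0) for m in re.finditer(term, text)]
    if not found:
        return None
    return found[-1] if last else found[0]


def match_all(text: str, term: str = ".*") -> List[str]:
    """Return every non-overlapping substring matching `term`."""
    return [m.group(0) for m in re.finditer(term, text)]


def is_match(text: str, pattern: str = ".*") -> bool:
    """Check whether the whole text (not just a part of it) matches."""
    return re.fullmatch(pattern, text) is not None
```

The inputs used in the doc comments of the diff behave the same way under this model: `match` yields a single text or nothing, `match_all` yields a list, and `is_match` only succeeds when the pattern covers the entire input.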
@@ -241,6 +241,7 @@
create derived Columns.][3782]
- [Added support for milli and micro seconds, new short form for rename_columns
and fixed issue with compare_to versus Nothing][3874]
- [Aligned `Text.match`/`Text.locate` API][3841]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -383,6 +384,7 @@
[3782]: https://github.com/enso-org/enso/pull/3782
[3863]: https://github.com/enso-org/enso/pull/3863
[3874]: https://github.com/enso-org/enso/pull/3874
[3841]: https://github.com/enso-org/enso/pull/3841

#### Enso Compiler

@@ -126,119 +126,64 @@ Text.characters self =
self.each bldr.append
bldr.to_vector

## ALIAS Match Text
## ALIAS find

Matches the text in `self` against the provided regex `pattern`, returning
the match(es) if present or `Nothing` if there are no matches.
Matches the text in `self` against the provided `term`, returning the first
or last match if present or `Nothing` if there are no matches.

Arguments:
- pattern: The pattern to match `self` against. We recommend using _raw text_
- term: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- mode: This argument specifies how many matches the engine will try and
find. When mode is set to either `Regex_Mode.First` or `Regex_Mode.Full`,
this method will return either a single `Match` or `Nothing`. If set to an
`Integer` or `Regex_Mode.All`, this method will return either
a `Vector Match` or `Nothing`.
- match_ascii: Enables or disables pure-ASCII matching for the regex. If you
know your data only contains ASCII then you can enable this for a
performance boost on some regex engines.
- case_insensitive: Enables or disables case-insensitive matching. Case
insensitive matching behaves as if it normalises the case of all input
text before matching on it.
- dot_matches_newline: Enables or disables the dot matches newline option.
This specifies that the `.` special character should match everything
_including_ newline characters. Without this flag, it will match all
characters _except_ newlines.
- multiline: Enables or disables the multiline option. Multiline specifies
that the `^` and `$` pattern characters match the start and end of lines,
as well as the start and end of the input respectively.
- comments: Enables or disables the comments mode for the regular expression.
In comments mode, the following changes apply:
- Whitespace within the pattern is ignored, except when within a
character class or when preceded by an unescaped backslash, or within
grouping constructs (e.g. `(?...)`).
- When a line contains a `#`, that is not in a character class and is not
preceded by an unescaped backslash, all characters from the leftmost
such `#` to the end of the line are ignored. That is to say, they act
as _comments_ in the regex.
- extra_opts: Specifies additional options in a vector. This allows options
to be supplied and computed without having to break them out into arguments
to the function. Where these overlap with one of the flags (`match_ascii`,
`case_insensitive`, `dot_matches_newline`, `multiline` and `verbose`), the
flags take precedence.

! Boolean Flags and Extra Options
This function contains a number of arguments that are boolean flags that
enable or disable common options for the regex. At the same time, it also
provides the ability to specify options in the `extra_opts` argument.

Where one of the flags is _set_ (has the value `True` or `False`), the
value of the flag takes precedence over the value in `extra_opts` when
merging the options to the engine. The flags are _unset_ (have value
`Nothing`) by default.
- mode: This argument specifies whether the first or last match should be
returned.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.

> Example
Find matches for a basic email regex in some text. NOTE: This regex is
_not_ compliant with RFC 5322.
Find the first substring matching the regex.

example_match =
regex = ".+@.+"
"contact@enso.org".match regex
Text.match : Text | Engine.Pattern -> (Regex_Mode | Matching_Mode) -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Vector Option.Option -> Match | Vector Match | Nothing ! Regex.Compile_Error
Text.match self pattern mode=Regex_Mode.All match_ascii=Nothing case_insensitive=Nothing dot_matches_newline=Nothing multiline=Nothing comments=Nothing extra_opts=[] =
compiled_pattern = Regex.compile pattern match_ascii=match_ascii case_insensitive=case_insensitive dot_matches_newline=dot_matches_newline multiline=multiline comments=comments extra_opts=extra_opts
compiled_pattern.match self mode
regex = "a[ab]c"
"aabbbbccccaabcaaaa".match regex == "abc"
Text.match : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Text | Nothing
Text.match self term mode=Matching_Mode.First matcher=Regex_Matcher.Regex_Matcher_Data =
case self.locate term mode matcher of
Nothing -> Nothing
span -> span.text

## ALIAS find_all

Matches all occurrences of the provided `term` in the text in `self`, returning
a vector of matches.

Arguments:
- term: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.

> Example
Find all substrings matching the regex.

example_match =
regex = "a[ab]c"
"aabcbbccaacaa".match_all regex == ["abc", "aac"]
Text.match_all : Text -> (Text_Matcher | Regex_Matcher) -> Vector Text
Text.match_all self term=".*" matcher=Regex_Matcher.Regex_Matcher_Data =
self.locate_all term matcher . map .text

## ALIAS Check Matches

Matches the text in `self` against the provided regex `pattern`, returning
`True` if the text matches at least once, and `False` otherwise.
Checks if the whole text in `self` matches a provided `pattern`.

Arguments:
- pattern: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- mode: This argument specifies how many matches the engine will try and
find. When mode is set to either `Regex_Mode.First` or `Regex_Mode.Full`,
this method will return either a single `Match` or `Nothing`. If set to an
`Integer` or `Regex_Mode.All`, this method will return either
a `Vector Match` or `Nothing`.
- match_ascii: Enables or disables pure-ASCII matching for the regex. If you
know your data only contains ASCII then you can enable this for a
performance boost on some regex engines.
- case_insensitive: Enables or disables case-insensitive matching. Case
insensitive matching behaves as if it normalises the case of all input
text before matching on it.
- dot_matches_newline: Enables or disables the dot matches newline option.
This specifies that the `.` special character should match everything
_including_ newline characters. Without this flag, it will match all
characters _except_ newlines.
- multiline: Enables or disables the multiline option. Multiline specifies
that the `^` and `$` pattern characters match the start and end of lines,
as well as the start and end of the input respectively.
- comments: Enables or disables the comments mode for the regular expression.
In comments mode, the following changes apply:
- Whitespace within the pattern is ignored, except when within a
character class or when preceded by an unescaped backslash, or within
grouping constructs (e.g. `(?...)`).
- When a line contains a `#`, that is not in a character class and is not
preceded by an unescaped backslash, all characters from the leftmost
such `#` to the end of the line are ignored. That is to say, they act
as _comments_ in the regex.
- extra_opts: Specifies additional options in a vector. This allows options
to be supplied and computed without having to break them out into arguments
to the function. Where these overlap with one of the flags (`match_ascii`,
`case_insensitive`, `dot_matches_newline`, `multiline` and `verbose`), the
flags take precedence.

! Boolean Flags and Extra Options
This function contains a number of arguments that are boolean flags that
enable or disable common options for the regex. At the same time, it also
provides the ability to specify options in the `extra_opts` argument.

Where one of the flags is _set_ (has the value `True` or `False`), the
value of the flag takes precedence over the value in `extra_opts` when
merging the options to the engine. The flags are _unset_ (have value
`Nothing`) by default.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.

> Example
Checks if some text matches a basic email regex. NOTE: This regex is _not_
@@ -246,74 +191,15 @@ Text.match self pattern mode=Regex_Mode.All match_ascii=Nothing case_insensitive

example_match =
regex = ".+@.+"
"contact@enso.org".matches regex
Text.matches : Text | Engine.Pattern -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Vector Option.Option -> Boolean ! Regex.Compile_Error
Text.matches self pattern match_ascii=Nothing case_insensitive=Nothing dot_matches_newline=Nothing multiline=Nothing comments=Nothing extra_opts=[] =
compiled_pattern = Regex.compile pattern match_ascii=match_ascii case_insensitive=case_insensitive dot_matches_newline=dot_matches_newline multiline=multiline comments=comments extra_opts=extra_opts
"contact@enso.org".is_match regex
Text.is_match : Text -> Matcher -> Boolean ! Regex.Compile_Error
Text.is_match self pattern=".*" matcher=Regex_Matcher.Regex_Matcher_Data = case matcher of
Text_Matcher.Case_Sensitive -> self == pattern
Text_Matcher.Case_Insensitive locale -> self.equals_ignore_case pattern locale
_ : Regex_Matcher.Regex_Matcher ->
compiled_pattern = matcher.compile pattern
compiled_pattern.matches self

## ALIAS Find Text

Finds all occurrences of `pattern` in the text `self`, returning the text(s)
if present, or `Nothing` if there are no matches.

Arguments:
- pattern: The pattern to match `self` against. We recommend using _raw text_
to write your patterns.
- mode: This argument specifies how many matches the engine will try and
find. When mode is set to either `Regex_Mode.First` or `Regex_Mode.Full`,
this method will return either a single `Text` or `Nothing`. If set to an
`Integer` or `Regex_Mode.All`, this method will return either
a `Vector Text` or `Nothing`.
- match_ascii: Enables or disables pure-ASCII matching for the regex. If you
know your data only contains ASCII then you can enable this for a
performance boost on some regex engines.
- case_insensitive: Enables or disables case-insensitive matching. Case
insensitive matching behaves as if it normalises the case of all input
text before matching on it.
- dot_matches_newline: Enables or disables the dot matches newline option.
This specifies that the `.` special character should match everything
_including_ newline characters. Without this flag, it will match all
characters _except_ newlines.
- multiline: Enables or disables the multiline option. Multiline specifies
that the `^` and `$` pattern characters match the start and end of lines,
as well as the start and end of the input respectively.
- comments: Enables or disables the comments mode for the regular expression.
In comments mode, the following changes apply:
- Whitespace within the pattern is ignored, except when within a
character class or when preceded by an unescaped backslash, or within
grouping constructs (e.g. `(?...)`).
- When a line contains a `#`, that is not in a character class and is not
preceded by an unescaped backslash, all characters from the leftmost
such `#` to the end of the line are ignored. That is to say, they act
as _comments_ in the regex.
- extra_opts: Specifies additional options in a vector. This allows options
to be supplied and computed without having to break them out into arguments
to the function. Where these overlap with one of the flags (`match_ascii`,
`case_insensitive`, `dot_matches_newline`, `multiline` and `verbose`), the
flags take precedence.

! Boolean Flags and Extra Options
This function contains a number of arguments that are boolean flags that
enable or disable common options for the regex. At the same time, it also
provides the ability to specify options in the `extra_opts` argument.

Where one of the flags is _set_ (has the value `True` or `False`), the
value of the flag takes precedence over the value in `extra_opts` when
merging the options to the engine. The flags are _unset_ (have value
`Nothing`) by default.

> Example
Find words that contain three or fewer letters in text with `"\w{1,3}"`

example_find =
text = "Now I know my ABCs"
text.find "\w{1,3}"
Text.find : Text | Engine.Pattern -> (Regex_Mode | Matching_Mode) -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Vector Option.Option -> Text | Vector Text | Nothing
Text.find self pattern mode=Regex_Mode.All match_ascii=Nothing case_insensitive=Nothing dot_matches_newline=Nothing multiline=Nothing comments=Nothing extra_opts=[] =
compiled_pattern = Regex.compile pattern match_ascii=match_ascii case_insensitive=case_insensitive dot_matches_newline=dot_matches_newline multiline=multiline comments=comments extra_opts=extra_opts
compiled_pattern.find self mode

## ALIAS Split Text

Takes a delimiter and returns the vector that results from splitting `self`
@@ -343,8 +229,8 @@ Text.find self pattern mode=Regex_Mode.All match_ascii=Nothing case_insensitive=
'abc def\tghi'.split '\\s+' Regex_Matcher.Regex_Matcher_Data == ["abc", "def", "ghi"]
Text.split : Text -> (Text_Matcher | Regex_Matcher) -> Vector Text
Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter.is_empty then Error.throw (Illegal_Argument_Error_Data "The delimiter cannot be empty.") else
case Meta.type_of matcher of
Text_Matcher.Text_Matcher ->
case matcher of
_ : Text_Matcher.Text_Matcher ->
delimiters = Vector.from_polyglot_array <| case matcher of
Text_Matcher.Case_Sensitive ->
Text_Utils.span_of_all self delimiter
@@ -356,7 +242,7 @@ Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter
end = if i == delimiters.length then (Text_Utils.char_length self) else
delimiters.at i . codeunit_start
Text_Utils.substring self start end
Regex_Matcher.Regex_Matcher ->
_ : Regex_Matcher.Regex_Matcher ->
compiled_pattern = matcher.compile delimiter
compiled_pattern.split self mode=Regex_Mode.All

@@ -438,8 +324,8 @@ Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
Text.replace : Text -> Text -> Matching_Mode | Regex_Mode -> (Text_Matcher | Regex_Matcher) -> Text
Text.replace self term="" new_text="" mode=Regex_Mode.All matcher=Text_Matcher.Case_Sensitive = if term.is_empty then self else
case Meta.type_of matcher of
Text_Matcher.Text_Matcher ->
case matcher of
_ : Text_Matcher.Text_Matcher ->
array_from_single_result result = case result of
Nothing -> Array.empty
_ -> Array.new_1 result
@@ -463,7 +349,7 @@ Text.replace self term="" new_text="" mode=Regex_Mode.All matcher=Text_Matcher.C
Text_Utils.span_of_case_insensitive self term locale.java_locale True
_ -> Error.throw (Illegal_Argument_Error_Data "Invalid mode.")
Text_Utils.replace_spans self spans_array new_text
Regex_Matcher.Regex_Matcher ->
_ : Regex_Matcher.Regex_Matcher ->
compiled_pattern = matcher.compile term
compiled_pattern.replace self new_text mode=mode

@@ -882,7 +768,7 @@ Text.starts_with self prefix matcher=Text_Matcher.Case_Sensitive = case matcher
Text_Matcher.Case_Sensitive -> Text_Utils.starts_with self prefix
Text_Matcher.Case_Insensitive locale ->
self.take (Index_Sub_Range.First prefix.length) . equals_ignore_case prefix locale=locale
Regex_Matcher.Regex_Matcher_Data _ _ _ _ _ ->
_ : Regex_Matcher.Regex_Matcher ->
preprocessed_pattern = "\A(?:" + prefix + ")"
compiled_pattern = matcher.compile preprocessed_pattern
match = compiled_pattern.match self Regex_Mode.First
@@ -917,7 +803,7 @@ Text.ends_with self suffix matcher=Text_Matcher.Case_Sensitive = case matcher of
Text_Matcher.Case_Sensitive -> Text_Utils.ends_with self suffix
Text_Matcher.Case_Insensitive locale ->
self.take (Index_Sub_Range.Last suffix.length) . equals_ignore_case suffix locale=locale
Regex_Matcher.Regex_Matcher_Data _ _ _ _ _ ->
_ : Regex_Matcher.Regex_Matcher ->
preprocessed_pattern = "(?:" + suffix + ")\z"
compiled_pattern = matcher.compile preprocessed_pattern
match = compiled_pattern.match self Regex_Mode.First
@@ -979,7 +865,7 @@ Text.contains self term="" matcher=Text_Matcher.Case_Sensitive = case matcher of
Text_Matcher.Case_Sensitive -> Text_Utils.contains self term
Text_Matcher.Case_Insensitive locale ->
Text_Utils.contains_case_insensitive self term locale.java_locale
Regex_Matcher.Regex_Matcher_Data _ _ _ _ _ ->
_ : Regex_Matcher.Regex_Matcher ->
compiled_pattern = matcher.compile term
match = compiled_pattern.match self Regex_Mode.First
match.is_nothing.not
@@ -1031,9 +917,7 @@ Text.* self count = self.repeat count
"Hello ".repeat 2 == "Hello Hello "
Text.repeat : Integer -> Text
Text.repeat self count=1 =
## TODO max is a workaround until Range is sorted to make 0..-1 not cause an infinite loop
https://www.pivotaltracker.com/story/show/181435598
0.up_to (count.max 0) . fold "" acc-> _-> acc + self
0.up_to count . fold "" acc-> _-> acc + self

## ALIAS first, last, left, right, mid, substring
Creates a new Text by selecting the specified range of the input.
@@ -1262,7 +1146,7 @@ Text.trim self where=Location.Both what=_.is_whitespace =
if start_index >= end_index then "" else
Text_Utils.substring self start_index end_index

## ALIAS find, index_of, position_of, span_of
## ALIAS index_of, position_of, span_of
Find the location of the `term` in the input.
Returns a Span representing the location at which the term was found, or
`Nothing` if the term was not found in the input.
@@ -1286,9 +1170,9 @@ Text.trim self where=Location.Both what=_.is_whitespace =
> Example
Finding location of a substring.

"Hello World!".location_of "J" == Nothing
"Hello World!".location_of "o" == Span (Range 4 5) "Hello World!"
"Hello World!".location_of "o" mode=Matching_Mode.Last == Span (Range 7 8) "Hello World!"
"Hello World!".locate "J" == Nothing
"Hello World!".locate "o" == Span (Range 4 5) "Hello World!"
"Hello World!".locate "o" mode=Matching_Mode.Last == Span (Range 7 8) "Hello World!"

! Match Length
The function returns not only the index of the match but a `Span` instance
@@ -1308,7 +1192,7 @@ Text.trim self where=Location.Both what=_.is_whitespace =

term = "straße"
text = "MONUMENTENSTRASSE 42"
match = text . location_of term matcher=(Text_Matcher Case_Insensitive)
match = text . locate term matcher=(Text_Matcher Case_Insensitive)
term.length == 6
match.length == 7

@@ -1329,11 +1213,11 @@ Text.trim self where=Location.Both what=_.is_whitespace =
ligatures = "ffiffl"
ligatures.length == 2
term_1 = "IFF"
match_1 = ligatures . location_of term_1 matcher=(Text_Matcher Case_Insensitive)
match_1 = ligatures . locate term_1 matcher=(Text_Matcher Case_Insensitive)
term_1.length == 3
match_1.length == 2
term_2 = "ffiffl"
match_2 = ligatures . location_of term_2 matcher=(Text_Matcher Case_Insensitive)
match_2 = ligatures . locate term_2 matcher=(Text_Matcher Case_Insensitive)
term_2.length == 6
match_2.length == 2
# After being extended to full grapheme clusters, both terms "IFF" and "ffiffl" match the same span of grapheme clusters.
@@ -1349,13 +1233,13 @@ Text.trim self where=Location.Both what=_.is_whitespace =
> Example
Comparing Matching in Last Mode in Regex and Text mode

"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 1 3) "aaa"
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 0 2) "aaa"
"aaa".locate "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 1 3) "aaa"
"aaa".locate "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 0 2) "aaa"

"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 5 7) "aaa aaa"
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 4 6) "aaa aaa"
Text.location_of : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Span | Nothing
Text.location_of self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case_Sensitive = case matcher of
"aaa aaa".locate "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 5 7) "aaa aaa"
"aaa aaa".locate "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 4 6) "aaa aaa"
Text.locate : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Span | Nothing
Text.locate self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case_Sensitive = case matcher of
Text_Matcher.Case_Sensitive ->
codepoint_span = case mode of
Matching_Mode.First -> Text_Utils.span_of self term
@@ -1391,7 +1275,7 @@ Text.location_of self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case
Nothing -> Nothing
matches -> matches.last.span 0 . to_grapheme_span

## ALIAS find_all, index_of_all, position_of_all, span_of_all
## ALIAS index_of_all, position_of_all, span_of_all
Finds all the locations of the `term` in the input.
If not found, the function returns an empty Vector.

@@ -1411,8 +1295,8 @@ Text.location_of self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case
> Example
Finding locations of all occurrences of a substring.

"Hello World!".location_of_all "J" == []
"Hello World!".location_of_all "o" . map .start == [4, 7]
"Hello World!".locate_all "J" == []
"Hello World!".locate_all "o" . map .start == [4, 7]

! Match Length
The function returns not only the index of the match but a `Span` instance
@@ -1432,7 +1316,7 @@ Text.location_of self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case

term = "strasse"
text = "MONUMENTENSTRASSE ist eine große Straße."
match = text . location_of_all term matcher=(Text_Matcher Case_Insensitive)
match = text . locate_all term matcher=(Text_Matcher Case_Insensitive)
term.length == 7
match . map .length == [7, 6]

@@ -1452,12 +1336,12 @@ Text.location_of self term="" mode=Matching_Mode.First matcher=Text_Matcher.Case

ligatures = "ffifflFFIFF"
ligatures.length == 7
match_1 = ligatures . location_of_all "IFF" matcher=(Text_Matcher Case_Insensitive)
match_1 = ligatures . locate_all "IFF" matcher=(Text_Matcher Case_Insensitive)
match_1 . map .length == [2, 3]
match_2 = ligatures . location_of_all "ffiff" matcher=(Text_Matcher Case_Insensitive)
match_2 = ligatures . locate_all "ffiff" matcher=(Text_Matcher Case_Insensitive)
match_2 . map .length == [2, 5]
Text.location_of_all : Text -> Matcher -> [Span]
Text.location_of_all self term="" matcher=Text_Matcher.Case_Sensitive = if term.is_empty then Vector.new (self.length + 1) (ix -> Span_Data (Range_Data ix ix) self) else case matcher of
Text.locate_all : Text -> Matcher -> [Span]
Text.locate_all self term="" matcher=Text_Matcher.Case_Sensitive = if term.is_empty then Vector.new (self.length + 1) (ix -> Span_Data (Range_Data ix ix) self) else case matcher of
Text_Matcher.Case_Sensitive ->
codepoint_spans = Vector.from_polyglot_array <| Text_Utils.span_of_all self term
grahpeme_ixes = Vector.from_polyglot_array <| Text_Utils.utf16_indices_to_grapheme_indices self (codepoint_spans.map .codeunit_start).to_array
@@ -1472,7 +1356,7 @@ Text.location_of_all self term="" matcher=Text_Matcher.Case_Sensitive = if term.
grapheme_spans = Vector.from_polyglot_array <| Text_Utils.span_of_all_case_insensitive self term locale.java_locale
grapheme_spans.map grapheme_span->
Span_Data (Range_Data grapheme_span.grapheme_start grapheme_span.grapheme_end) self
Regex_Matcher.Regex_Matcher_Data _ _ _ _ _ ->
_ : Regex_Matcher.Regex_Matcher ->
case matcher.compile term . match self Regex_Mode.All of
Nothing -> []
matches -> matches.map m-> m.span 0 . to_grapheme_span

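The "Last mode" examples in the `Text.locate` doc comments give different spans for the text matcher and the regex matcher. One plausible reading, sketched here in Python as an analogue rather than as the Enso implementation, is that the text matcher searches from the end of the input while the regex matcher takes the last of the non-overlapping matches found left to right. Both helper names are hypothetical, and the indices are Python code-point indices, which coincide with grapheme indices only for simple inputs such as these.

```python
import re
from typing import Optional, Tuple


def locate_last_text(text: str, term: str) -> Optional[Tuple[int, int]]:
    """Last occurrence of a literal term, searching from the end."""
    start = text.rfind(term)
    return None if start < 0 else (start, start + len(term))


def locate_last_regex(text: str, term: str) -> Optional[Tuple[int, int]]:
    """Last of the non-overlapping regex matches, scanning left to right."""
    spans = [m.span() for m in re.finditer(term, text)]
    return spans[-1] if spans else None
```

With the input `"aaa"` and the term `"aa"`, the literal search may start at index 1, whereas the regex scan consumes indices 0 to 2 first and has nothing left to match afterwards.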
@@ -27,7 +27,7 @@ type Span
Arguments:
- range: The range of characters over which the span exists. The range is
assumed to have `step` equal to 1.
- text: The text over which the span exists.
- parent: The text over which the span exists.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
@@ -43,7 +43,7 @@ type Span
text = "Hello!"
range = 0.up_to 3
Span.Span_Data range text
Span_Data (range : Range.Range) (text : Text)
Span_Data (range : Range.Range) (parent : Text)

## The index of the first character included in the span.

@@ -73,6 +73,10 @@ type Span
length : Integer
length self = self.range.length

## Returns the part of the text that this span covers.
text : Text
text self = self.to_utf_16_span.text

## Converts the span of extended grapheme clusters to a corresponding span
of UTF-16 code units.

@@ -83,7 +87,7 @@ type Span
(Span_Data (Range 1 3) text).to_utf_16_span == (Utf_16_Span_Data (Range 1 4) text)
to_utf_16_span : Utf_16_Span
to_utf_16_span self =
Utf_16_Span_Data (range_to_char_indices self.text self.range) self.text
Utf_16_Span_Data (range_to_char_indices self.parent self.range) self.parent

# TODO Dubious constructor export
from project.Data.Text.Span.Utf_16_Span import all
@@ -96,7 +100,7 @@ type Utf_16_Span
Arguments:
- range: The range of code units over which the span exists. The range is
assumed to have `step` equal to 1.
- text: The text over which the span exists.
- parent: The text over which the span exists.

> Example
Creating a span over the first three code units of the text 'a\u{301}bc'.
@@ -106,7 +110,7 @@ type Utf_16_Span
example_span =
text = 'a\u{301}bc'
Span.Utf_16_Span_Data (Range 0 3) text
Utf_16_Span_Data (range : Range.Range) (text : Text)
Utf_16_Span_Data (range : Range.Range) (parent : Text)

## The index of the first code unit included in the span.
start : Integer
@@ -121,6 +125,10 @@ type Utf_16_Span
length : Integer
length self = self.range.length

## Returns the part of the text that this span covers.
text : Text
text self = Text_Utils.substring self.parent self.start self.end

## Returns a span of extended grapheme clusters which is the closest
approximation of this span of code units.

@@ -139,14 +147,14 @@ type Utf_16_Span
extended == Span_Data (Range 0 3) text # The span is extended to the whole string since it contained code units from every grapheme cluster.
extended.to_utf_16_span == Utf_16_Span_Data (Range 0 6) text
to_grapheme_span : Span
to_grapheme_span self = if (self.start < 0) || (self.end > Text_Utils.char_length self.text) then Error.throw (Illegal_State_Error "Utf_16_Span indices are out of range of the associated text.") else
to_grapheme_span self = if (self.start < 0) || (self.end > Text_Utils.char_length self.parent) then Error.throw (Illegal_State_Error_Data "Utf_16_Span indices are out of range of the associated text.") else
if self.end < self.start then Error.throw (Illegal_State_Error "Utf_16_Span invariant violation: start <= end") else
case self.start == self.end of
True ->
grapheme_ix = Text_Utils.utf16_index_to_grapheme_index self.text self.start
Span_Data (Range_Data grapheme_ix grapheme_ix) self.text
grapheme_ix = Text_Utils.utf16_index_to_grapheme_index self.parent self.start
Span_Data (Range_Data grapheme_ix grapheme_ix) self.parent
False ->
grapheme_ixes = Text_Utils.utf16_indices_to_grapheme_indices self.text [self.start, self.end - 1].to_array
grapheme_ixes = Text_Utils.utf16_indices_to_grapheme_indices self.parent [self.start, self.end - 1].to_array
grapheme_first = grapheme_ixes.at 0
grapheme_last = grapheme_ixes.at 1
## We find the grapheme index of the last code unit actually contained within our span and set the
@@ -154,7 +162,7 @@ type Utf_16_Span
only a part of a grapheme were contained in our original span, the resulting span will be
extended to contain this whole grapheme.
grapheme_end = grapheme_last + 1
Span_Data (Range_Data grapheme_first grapheme_end) self.parent

## PRIVATE
Utility function taking a range pointing at grapheme clusters and converting

@ -500,7 +500,7 @@ type File
|
||||
extension : Text
|
||||
extension self =
|
||||
name = self.name
|
||||
last_dot = name.location_of "." mode=Matching_Mode.Last
|
||||
last_dot = name.locate "." mode=Matching_Mode.Last
|
||||
if last_dot.is_nothing then "" else
|
||||
extension = name.drop (First last_dot.start)
|
||||
if extension == "." then "" else extension
|
||||
|
@ -9,7 +9,7 @@ type Test_Result
|
||||
Arguments:
|
||||
- message: The reason why the test failed.
|
||||
- details: Additional context of the error, for example the stack trace.
|
||||
Failure message details
|
||||
Failure message details=Nothing
|
||||
|
||||
## Represents a pending behavioral test.
|
||||
|
||||
|
@ -10,11 +10,14 @@ spec = Test.group "Text.Span" <|
|
||||
span = Span_Data (Range_Data 0 3) text
|
||||
span.start . should_equal 0
|
||||
span.end . should_equal 3
|
||||
span.text . should_equal text
|
||||
span.parent . should_equal text
|
||||
span.text . should_equal "Hel"
|
||||
|
||||
Test.specify "should be able to be converted to code units" <|
|
||||
text = 'ae\u{301}fz'
|
||||
(Span_Data (Range_Data 1 3) text).to_utf_16_span . should_equal (Utf_16_Span_Data (Range_Data 1 4) text)
|
||||
span = Span_Data (Range_Data 1 3) text
|
||||
span.to_utf_16_span . should_equal (Utf_16_Span_Data (Range_Data 1 4) text)
|
||||
span.text . should_equal 'e\u{301}f'
|
||||
|
||||
Test.specify "should expand to the associated grapheme clusters" <|
|
||||
text = 'a\u{301}e\u{302}o\u{303}'
|
||||
|
@ -1110,16 +1110,16 @@ spec =
|
||||
|
||||
'✨🚀🚧'*2 . should_equal '✨🚀🚧✨🚀🚧'
|
||||
|
||||
Test.specify "location_of should work as shown in examples" <|
|
||||
Test.specify "locate should work as shown in examples" <|
|
||||
example_1 =
|
||||
"Hello World!".location_of "J" == Nothing
|
||||
"Hello World!".location_of "o" == Span_Data (Range_Data 4 5) "Hello World!"
|
||||
"Hello World!".location_of "o" mode=Matching_Mode.Last == Span_Data (Range_Data 4 5) "Hello World!"
|
||||
"Hello World!".locate "J" == Nothing
|
||||
"Hello World!".locate "o" == Span_Data (Range_Data 4 5) "Hello World!"
|
||||
"Hello World!".locate "o" mode=Matching_Mode.Last == Span_Data (Range_Data 4 5) "Hello World!"
|
||||
|
||||
example_2 =
|
||||
term = "straße"
|
||||
text = "MONUMENTENSTRASSE 42"
|
||||
match = text . location_of term matcher=Text_Matcher.Case_Insensitive
|
||||
match = text . locate term matcher=Text_Matcher.Case_Insensitive
|
||||
term.length . should_equal 6
|
||||
match.length . should_equal 7
|
||||
|
||||
@ -1127,32 +1127,32 @@ spec =
|
||||
ligatures = "ffiffl"
|
||||
ligatures.length . should_equal 2
|
||||
term_1 = "IFF"
|
||||
match_1 = ligatures . location_of term_1 matcher=Text_Matcher.Case_Insensitive
|
||||
match_1 = ligatures . locate term_1 matcher=Text_Matcher.Case_Insensitive
|
||||
term_1.length . should_equal 3
|
||||
match_1.length . should_equal 2
|
||||
term_2 = "ffiffl"
|
||||
match_2 = ligatures . location_of term_2 matcher=Text_Matcher.Case_Insensitive
|
||||
match_2 = ligatures . locate term_2 matcher=Text_Matcher.Case_Insensitive
|
||||
term_2.length . should_equal 6
|
||||
match_2.length . should_equal 2
|
||||
match_1 . should_equal match_2
|
||||
|
||||
example_4 =
|
||||
"Hello World!".location_of_all "J" . should_equal []
|
||||
"Hello World!".location_of_all "o" . map .start . should_equal [4, 7]
|
||||
"Hello World!".locate_all "J" . should_equal []
|
||||
"Hello World!".locate_all "o" . map .start . should_equal [4, 7]
|
||||
|
||||
example_5 =
|
||||
term = "strasse"
|
||||
text = "MONUMENTENSTRASSE ist eine große Straße."
|
||||
match = text . location_of_all term matcher=Text_Matcher.Case_Insensitive
|
||||
match = text . locate_all term matcher=Text_Matcher.Case_Insensitive
|
||||
term.length . should_equal 7
|
||||
match . map .length . should_equal [7, 6]
|
||||
|
||||
example_6 =
|
||||
ligatures = "ffifflFFIFF"
|
||||
ligatures.length . should_equal 7
|
||||
match_1 = ligatures . location_of_all "IFF" matcher=Text_Matcher.Case_Insensitive
|
||||
match_1 = ligatures . locate_all "IFF" matcher=Text_Matcher.Case_Insensitive
|
||||
match_1 . map .length . should_equal [2, 3]
|
||||
match_2 = ligatures . location_of_all "ffiff" matcher=Text_Matcher.Case_Insensitive
|
||||
match_2 = ligatures . locate_all "ffiff" matcher=Text_Matcher.Case_Insensitive
|
||||
match_2 . map .length . should_equal [2, 5]
|
||||
|
||||
# Put them in blocks to avoid name clashes.
|
||||
@ -1163,165 +1163,216 @@ spec =
|
||||
example_5
|
||||
example_6
|
||||
|
||||
Test.specify "should allow to find location_of occurrences within a text" <|
|
||||
"Hello World!".location_of_all "J" . should_equal []
|
||||
"Hello World!".location_of_all "o" . map .start . should_equal [4, 7]
|
||||
Test.specify "should allow to find locate occurrences within a text" <|
|
||||
"Hello World!".locate_all "J" . should_equal []
|
||||
"Hello World!".locate_all "o" . map .start . should_equal [4, 7]
|
||||
|
||||
accents = 'a\u{301}e\u{301}o\u{301}'
|
||||
accents.location_of accent_1 . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
accents.locate accent_1 . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
|
||||
"".location_of "foo" . should_equal Nothing
|
||||
"".location_of "foo" mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".location_of_all "foo" . should_equal []
|
||||
"".location_of "" . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".location_of "" mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".location_of_all "" . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
"".locate "foo" . should_equal Nothing
|
||||
"".locate "foo" mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".locate_all "foo" . should_equal []
|
||||
"".locate "" . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate "" mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate_all "" . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
abc = 'A\u{301}ßC'
|
||||
abc.location_of "" . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.location_of "" mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
abc.location_of_all "" . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
abc.locate "" . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.locate "" mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
abc.locate_all "" . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
|
||||
Test.specify "should allow case-insensitive matching in location_of" <|
|
||||
Test.specify "should allow case-insensitive matching in locate" <|
|
||||
hello = "Hello WORLD!"
|
||||
case_insensitive = Text_Matcher.Case_Insensitive
|
||||
hello.location_of "world" . should_equal Nothing
|
||||
hello.location_of "world" matcher=case_insensitive . should_equal (Span_Data (Range_Data 6 11) hello)
|
||||
hello.locate "world" . should_equal Nothing
|
||||
hello.locate "world" matcher=case_insensitive . should_equal (Span_Data (Range_Data 6 11) hello)
|
||||
|
||||
hello.location_of "o" mode=Regex_Mode.First matcher=case_insensitive . should_equal (Span_Data (Range_Data 4 5) hello)
|
||||
hello.location_of "o" mode=Matching_Mode.Last matcher=case_insensitive . should_equal (Span_Data (Range_Data 7 8) hello)
|
||||
hello.locate "o" mode=Regex_Mode.First matcher=case_insensitive . should_equal (Span_Data (Range_Data 4 5) hello)
|
||||
hello.locate "o" mode=Matching_Mode.Last matcher=case_insensitive . should_equal (Span_Data (Range_Data 7 8) hello)
|
||||
|
||||
accents = 'A\u{301}E\u{301}O\u{301}'
|
||||
accents.location_of accent_1 matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
accents.locate accent_1 matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
|
||||
"Strasse".location_of "ß" matcher=case_insensitive . should_equal (Span_Data (Range_Data 4 6) "Strasse")
|
||||
"Monumentenstraße 42".location_of "STRASSE" matcher=case_insensitive . should_equal (Span_Data (Range_Data 10 16) "Monumentenstraße 42")
|
||||
"Strasse".locate "ß" matcher=case_insensitive . should_equal (Span_Data (Range_Data 4 6) "Strasse")
|
||||
"Monumentenstraße 42".locate "STRASSE" matcher=case_insensitive . should_equal (Span_Data (Range_Data 10 16) "Monumentenstraße 42")
|
||||
|
||||
'\u0390'.location_of '\u03B9\u0308\u0301' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 1) '\u0390')
|
||||
'ԵՒ'.location_of 'և' . should_equal Nothing
|
||||
'ԵՒ'.location_of 'և' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) 'ԵՒ')
|
||||
'և'.location_of 'ԵՒ' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 1) 'և')
|
||||
'\u0390'.locate '\u03B9\u0308\u0301' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 1) '\u0390')
|
||||
'ԵՒ'.locate 'և' . should_equal Nothing
|
||||
'ԵՒ'.locate 'և' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) 'ԵՒ')
|
||||
'և'.locate 'ԵՒ' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 1) 'և')
|
||||
|
||||
ligatures = 'ffafffiflffifflſtstZ'
|
||||
ligatures.location_of 'FFI' matcher=case_insensitive . should_equal (Span_Data (Range_Data 3 5) ligatures)
|
||||
ligatures.location_of 'FF' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) ligatures)
|
||||
ligatures.location_of 'ff' matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 7 8) ligatures)
|
||||
ligatures.location_of_all 'ff' . should_equal [Span_Data (Range_Data 0 2) ligatures]
|
||||
ligatures.location_of_all 'FF' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 2) ligatures, Span_Data (Range_Data 3 4) ligatures, Span_Data (Range_Data 6 7) ligatures, Span_Data (Range_Data 7 8) ligatures]
|
||||
ligatures.location_of_all 'ffi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 3 5) ligatures, Span_Data (Range_Data 6 7) ligatures]
|
||||
'fffi'.location_of_all 'ff' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 2) 'fffi']
|
||||
'fffi'.location_of_all 'ffi' . should_equal []
|
||||
'fffi'.location_of_all 'ffi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 4) 'fffi']
|
||||
'FFFI'.location_of 'ffi' matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 4) 'FFFI')
|
||||
ligatures.locate 'FFI' matcher=case_insensitive . should_equal (Span_Data (Range_Data 3 5) ligatures)
|
||||
ligatures.locate 'FF' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) ligatures)
|
||||
ligatures.locate 'ff' matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 7 8) ligatures)
|
||||
ligatures.locate_all 'ff' . should_equal [Span_Data (Range_Data 0 2) ligatures]
|
||||
ligatures.locate_all 'FF' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 2) ligatures, Span_Data (Range_Data 3 4) ligatures, Span_Data (Range_Data 6 7) ligatures, Span_Data (Range_Data 7 8) ligatures]
|
||||
ligatures.locate_all 'ffi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 3 5) ligatures, Span_Data (Range_Data 6 7) ligatures]
|
||||
'fffi'.locate_all 'ff' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 2) 'fffi']
|
||||
'fffi'.locate_all 'ffi' . should_equal []
|
||||
'fffi'.locate_all 'ffi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 4) 'fffi']
|
||||
'FFFI'.locate 'ffi' matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 4) 'FFFI')
|
||||
|
||||
'ffiffl'.location_of 'IF' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) 'ffiffl')
|
||||
'ffiffl'.location_of 'F' Matching_Mode.Last matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 2) 'ffiffl')
|
||||
'ffiffl'.location_of_all 'F' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 1) 'ffiffl', Span_Data (Range_Data 0 1) 'ffiffl', Span_Data (Range_Data 1 2) 'ffiffl', Span_Data (Range_Data 1 2) 'ffiffl']
|
||||
'aaffibb'.location_of_all 'af' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 3) 'aaffibb']
|
||||
'aaffibb'.location_of_all 'affi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 3) 'aaffibb']
|
||||
'aaffibb'.location_of_all 'ib' matcher=case_insensitive . should_equal [Span_Data (Range_Data 2 4) 'aaffibb']
|
||||
'aaffibb'.location_of_all 'ffib' matcher=case_insensitive . should_equal [Span_Data (Range_Data 2 4) 'aaffibb']
|
||||
'ffiffl'.locate 'IF' matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 2) 'ffiffl')
|
||||
'ffiffl'.locate 'F' Matching_Mode.Last matcher=case_insensitive . should_equal (Span_Data (Range_Data 1 2) 'ffiffl')
|
||||
'ffiffl'.locate_all 'F' matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 1) 'ffiffl', Span_Data (Range_Data 0 1) 'ffiffl', Span_Data (Range_Data 1 2) 'ffiffl', Span_Data (Range_Data 1 2) 'ffiffl']
|
||||
'aaffibb'.locate_all 'af' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 3) 'aaffibb']
|
||||
'aaffibb'.locate_all 'affi' matcher=case_insensitive . should_equal [Span_Data (Range_Data 1 3) 'aaffibb']
|
||||
'aaffibb'.locate_all 'ib' matcher=case_insensitive . should_equal [Span_Data (Range_Data 2 4) 'aaffibb']
|
||||
'aaffibb'.locate_all 'ffib' matcher=case_insensitive . should_equal [Span_Data (Range_Data 2 4) 'aaffibb']
|
||||
|
||||
"".location_of "foo" matcher=case_insensitive . should_equal Nothing
|
||||
"".location_of "foo" matcher=case_insensitive mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".location_of_all "foo" matcher=case_insensitive . should_equal []
|
||||
"".location_of "" matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".location_of "" matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".location_of_all "" matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
"".locate "foo" matcher=case_insensitive . should_equal Nothing
|
||||
"".locate "foo" matcher=case_insensitive mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".locate_all "foo" matcher=case_insensitive . should_equal []
|
||||
"".locate "" matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate "" matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate_all "" matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
abc = 'A\u{301}ßC'
|
||||
abc.location_of "" matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.location_of "" matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
abc.location_of_all "" matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
abc.locate "" matcher=case_insensitive . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.locate "" matcher=case_insensitive mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
abc.locate_all "" matcher=case_insensitive . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
|
||||
Test.specify "should allow regexes in location_of" <|
|
||||
Test.specify "should allow regexes in locate" <|
|
||||
hello = "Hello World!"
|
||||
regex = Regex_Matcher.Regex_Matcher_Data
|
||||
regex_insensitive = Regex_Matcher.Regex_Matcher_Data case_sensitivity=Case_Sensitivity.Insensitive
|
||||
hello.location_of ".o" Matching_Mode.First matcher=regex . should_equal (Span_Data (Range_Data 3 5) hello)
|
||||
hello.location_of ".o" Matching_Mode.Last matcher=regex . should_equal (Span_Data (Range_Data 6 8) hello)
|
||||
hello.location_of_all ".o" matcher=regex . map .start . should_equal [3, 6]
|
||||
hello.locate ".o" Matching_Mode.First matcher=regex . should_equal (Span_Data (Range_Data 3 5) hello)
|
||||
hello.locate ".o" Matching_Mode.Last matcher=regex . should_equal (Span_Data (Range_Data 6 8) hello)
|
||||
hello.locate_all ".o" matcher=regex . map .start . should_equal [3, 6]
|
||||
|
||||
"foobar".location_of "BAR" Regex_Mode.First matcher=regex_insensitive . should_equal (Span_Data (Range_Data 3 6) "foobar")
|
||||
"foobar".locate "BAR" Regex_Mode.First matcher=regex_insensitive . should_equal (Span_Data (Range_Data 3 6) "foobar")
|
||||
|
||||
## Regex matching does not do case folding
|
||||
"Strasse".location_of "ß" Regex_Mode.First matcher=regex_insensitive . should_equal Nothing
|
||||
"Strasse".locate "ß" Regex_Mode.First matcher=regex_insensitive . should_equal Nothing
|
||||
|
||||
## But it should handle the Unicode normalization
|
||||
accents = 'a\u{301}e\u{301}o\u{301}'
|
||||
accents.location_of accent_1 Regex_Mode.First matcher=regex . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
Test.specify "should correctly handle regex edge cases in location_of" pending="Figure out how to make Regex correctly handle empty patterns." <|
|
||||
accents.locate accent_1 Regex_Mode.First matcher=regex . should_equal (Span_Data (Range_Data 1 2) accents)
|
||||
Test.specify "should correctly handle regex edge cases in locate" pending="Figure out how to make Regex correctly handle empty patterns." <|
|
||||
regex = Regex_Matcher.Regex_Matcher_Data
|
||||
"".location_of "foo" matcher=regex . should_equal Nothing
|
||||
"".location_of "foo" matcher=regex mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".location_of_all "foo" matcher=regex . should_equal []
|
||||
"".location_of "" matcher=regex . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".location_of_all "" matcher=regex . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
"".location_of "" matcher=regex mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate "foo" matcher=regex . should_equal Nothing
|
||||
"".locate "foo" matcher=regex mode=Matching_Mode.Last . should_equal Nothing
|
||||
"".locate_all "foo" matcher=regex . should_equal []
|
||||
"".locate "" matcher=regex . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
"".locate_all "" matcher=regex . should_equal [Span_Data (Range_Data 0 0) ""]
|
||||
"".locate "" matcher=regex mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 0 0) "")
|
||||
abc = 'A\u{301}ßC'
|
||||
abc.location_of "" matcher=regex . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.location_of_all "" matcher=regex . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
abc.location_of "" matcher=regex mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
abc.locate "" matcher=regex . should_equal (Span_Data (Range_Data 0 0) abc)
|
||||
abc.locate_all "" matcher=regex . should_equal [Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 0 0) abc, Span_Data (Range_Data 1 1) abc, Span_Data (Range_Data 2 2) abc, Span_Data (Range_Data 3 3) abc]
|
||||
abc.locate "" matcher=regex mode=Matching_Mode.Last . should_equal (Span_Data (Range_Data 3 3) abc)
|
||||
|
||||
Test.specify "should handle overlapping matches as shown in the examples" <|
|
||||
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher.Case_Sensitive . should_equal (Span_Data (Range_Data 1 3) "aaa")
|
||||
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher.Regex_Matcher_Data . should_equal (Span_Data (Range_Data 0 2) "aaa")
|
||||
"aaa".locate "aa" mode=Matching_Mode.Last matcher=Text_Matcher.Case_Sensitive . should_equal (Span_Data (Range_Data 1 3) "aaa")
|
||||
"aaa".locate "aa" mode=Matching_Mode.Last matcher=Regex_Matcher.Regex_Matcher_Data . should_equal (Span_Data (Range_Data 0 2) "aaa")
|
||||
|
||||
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher.Case_Sensitive . should_equal (Span_Data (Range_Data 5 7) "aaa aaa")
|
||||
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher.Regex_Matcher_Data . should_equal (Span_Data (Range_Data 4 6) "aaa aaa")
|
||||
"aaa aaa".locate "aa" mode=Matching_Mode.Last matcher=Text_Matcher.Case_Sensitive . should_equal (Span_Data (Range_Data 5 7) "aaa aaa")
|
||||
"aaa aaa".locate "aa" mode=Matching_Mode.Last matcher=Regex_Matcher.Regex_Matcher_Data . should_equal (Span_Data (Range_Data 4 6) "aaa aaa")
|
||||
|
||||
Test.specify "should allow to match one or more occurrences of a pattern in the text" <|
|
||||
"abacadae".match_all "a[bc]" . should_equal ["ab", "ac"]
|
||||
"abacadae".match_all "a." . should_equal ["ab", "ac", "ad", "ae"]
|
||||
"abacadae".match_all "a.*" . should_equal ["abacadae"]
|
||||
"abacadae".match_all "a.+?" . should_equal ["ab", "ac", "ad", "ae"]
|
||||
|
||||
"abacadae".match "a[bc]" mode=Matching_Mode.Last . should_equal "ac"
|
||||
"abacadae".match "a." mode=Matching_Mode.Last . should_equal "ae"
|
||||
"abacadae".match "a.*" mode=Matching_Mode.Last . should_equal "abacadae"
|
||||
"abacadae".match "a.+?" mode=Matching_Mode.Last . should_equal "ae"
|
||||
|
||||
"abacadae".match "a[bc]" matcher=Text_Matcher.Case_Sensitive . should_equal Nothing
|
||||
"abABacAC".match "ab" matcher=Text_Matcher.Case_Sensitive mode=Matching_Mode.Last . should_equal "ab"
|
||||
"abABacAC".match "ab" matcher=Text_Matcher.Case_Insensitive mode=Matching_Mode.Last . should_equal "AB"
|
||||
|
||||
"abABacAC".match_all "ab" matcher=Text_Matcher.Case_Sensitive . should_equal ["ab"]
|
||||
"abABacAC".match_all "ab" matcher=Text_Matcher.Case_Insensitive . should_equal ["ab", "AB"]
|
||||
"abacadae".match_all "a[bc]" matcher=Text_Matcher.Case_Sensitive . should_equal []
|
||||
|
||||
"Strasse and Straße".match_all "STRASSE" matcher=Text_Matcher.Case_Sensitive . should_equal []
|
||||
"Strasse and Straße".match_all "STRASSE" matcher=Text_Matcher.Case_Insensitive . should_equal ["Strasse", "Straße"]
|
||||
|
||||
Test.specify "should default to exact matching for locate but regex for match" <|
|
||||
txt = "aba[bc]adacae"
|
||||
"ab".locate "ab" . should_equal (Span_Data (Range_Data 0 2) "ab")
|
||||
"ab".locate "a[bc]" . should_equal Nothing
|
||||
"ab".locate_all "a[bc]" . should_equal []
|
||||
|
||||
txt.locate "a[bc]" . should_equal (Span_Data (Range_Data 2 7) txt)
|
||||
txt.locate_all "a[bc]" . should_equal [Span_Data (Range_Data 2 7) txt]
|
||||
|
||||
"ab".match "a[bc]" . should_equal "ab"
|
||||
"a[bc]".match "a[bc]" . should_equal Nothing
|
||||
"a[bc]".match_all "a[bc]" . should_equal []
|
||||
|
||||
txt.match "a[bc]" . should_equal "ab"
|
||||
txt.match_all "a[bc]" . should_equal ["ab", "ac"]
|
||||
|
||||
Test.group "Regex matching" <|
|
||||
Test.specify "should be possible on text" <|
|
||||
match = "My Text: Goes Here".match "^My Text: (.+)$" mode=Regex_Mode.First
|
||||
match . should_be_a Default_Engine.Match_Data
|
||||
match.group 1 . should_equal "Goes Here"
|
||||
match = "My Text: Goes Here".match "^My Text: (.+)$"
|
||||
match.should_equal "My Text: Goes Here"
|
||||
|
||||
Test.specify "should be possible on unicode text" <|
|
||||
match = "Korean: 건반".match "^Korean: (.+)$" mode=Regex_Mode.First
|
||||
match . should_be_a Default_Engine.Match_Data
|
||||
match.group 1 . should_equal "건반"
|
||||
txt = "maza건반zaa"
|
||||
txt.match "^a..z$" . should_equal Nothing
|
||||
txt.match "^m..a..z.a$" . should_equal txt
|
||||
txt.match "a..z" . should_equal "a건반z"
|
||||
|
||||
Test.specify "should be possible in ascii mode" <|
|
||||
match = "İ".match "\w" mode=Regex_Mode.First match_ascii=True
|
||||
match = "İ".match "\w" matcher=(Regex_Matcher.Regex_Matcher_Data match_ascii=True)
|
||||
match.should_equal Nothing
|
||||
|
||||
Test.specify "should be possible in case-insensitive mode" <|
|
||||
match = "MY".match "my" mode=Regex_Mode.First case_insensitive=True
|
||||
match . should_be_a Default_Engine.Match_Data
|
||||
match.group 0 . should_equal "MY"
|
||||
match = "MY".match "my" matcher=(Regex_Matcher.Regex_Matcher_Data case_sensitivity=Case_Sensitivity.Insensitive)
|
||||
match.should_equal "MY"
|
||||
|
||||
Test.specify "should be possible in dot_matches_newline mode" <|
|
||||
match = 'Foo\n'.match "(....)" mode=Regex_Mode.First dot_matches_newline=True
|
||||
match . should_be_a Default_Engine.Match_Data
|
||||
match.group 0 . should_equal 'Foo\n'
|
||||
match = 'Foo\n'.match "(....)" matcher=(Regex_Matcher.Regex_Matcher_Data dot_matches_newline=True)
|
||||
match.should_equal 'Foo\n'
|
||||
|
||||
Test.specify "should be possible in multiline mode" <|
|
||||
text = """
|
||||
Foo
|
||||
bar
|
||||
match = text.match "^(...)$" multiline=True
|
||||
match.length . should_equal 2
|
||||
match.at 0 . group 1 . should_equal "Foo"
|
||||
match.at 1 . group 1 . should_equal "bar"
|
||||
match = text.match_all "^(...)$" matcher=(Regex_Matcher.Regex_Matcher_Data multiline=True)
|
||||
match.should_equal ["Foo", "bar"]
|
||||
|
||||
Test.specify "should be possible in comments mode" <|
|
||||
match = "abcde".match "(..) # Match two of any character" comments=True mode=Regex_Mode.First
|
||||
match . should_be_a Default_Engine.Match_Data
|
||||
match.group 0 . should_equal "ab"
|
||||
match = "abcde".match "(..) # Match two of any character" matcher=(Regex_Matcher.Regex_Matcher_Data comments=True)
|
||||
match.should_equal "ab"
|
||||
|
||||
Test.group "Regex matches" <|
|
||||
Test.specify "should be possible on text" <|
|
||||
"My Text: Goes Here".matches "^My Text: (.+)$" . should_be_true
|
||||
Test.group "Text.is_match" <|
|
||||
Test.specify "should default to regex" <|
|
||||
"My Text: Goes Here".is_match "^My Text: (.+)$" . should_be_true
|
||||
"555-801-1923".is_match "^\d{3}-\d{3}-\d{4}$" . should_be_true
|
||||
"Hello".is_match "^[a-z]+$" . should_be_false
|
||||
"Hello".is_match "^[a-z]+$" (Regex_Matcher.Regex_Matcher_Data case_sensitivity=Case_Sensitivity.Insensitive) . should_be_true
|
||||
|
||||
Test.specify "should only match whole input" <|
|
||||
"Hello".is_match "[a-z]" . should_be_false
|
||||
"x".is_match "[a-z]" . should_be_true
|
||||
|
||||
Test.specify "should allow Text_Matcher too" <|
|
||||
"foobar".is_match "foobar" matcher=Text_Matcher.Case_Sensitive . should_be_true
|
||||
"foobar".is_match "FOOBAR" matcher=Text_Matcher.Case_Sensitive . should_be_false
|
||||
"foobar".is_match "foo.*" matcher=Text_Matcher.Case_Sensitive . should_be_false
|
||||
"foobar".is_match "foo" matcher=Text_Matcher.Case_Sensitive . should_be_false
|
||||
|
||||
"foobar".is_match "foobar" matcher=Text_Matcher.Case_Insensitive . should_be_true
|
||||
"foobar".is_match "FOOBAR" matcher=Text_Matcher.Case_Insensitive . should_be_true
|
||||
"foobar".is_match "foo.*" matcher=Text_Matcher.Case_Insensitive . should_be_false
|
||||
"foobar".is_match "foo" matcher=Text_Matcher.Case_Insensitive . should_be_false
|
||||
|
||||
Test.specify "should be possible on unicode text" <|
|
||||
"Korean: 건반".matches "^Korean: (.+)$" . should_be_true
|
||||
"Korean: 건반".is_match "^Korean: (.+)$" . should_be_true
|
||||
|
||||
Test.specify "should be possible in ascii mode" <|
|
||||
"İ".matches "\w" match_ascii=True . should_be_false
|
||||
"İ".is_match "\w" (Regex_Matcher.Regex_Matcher_Data match_ascii=True) . should_be_false
|
||||
|
||||
Test.specify "should be possible in case-insensitive mode" <|
|
||||
"MY".matches "my" case_insensitive=True . should_be_true
|
||||
"MY".is_match "my" (Regex_Matcher.Regex_Matcher_Data case_sensitivity=Case_Sensitivity.Insensitive) . should_be_true
|
||||
|
||||
Test.specify "should be possible in dot_matches_newline mode" <|
|
||||
'Foo\n'.matches "(....)" dot_matches_newline=True . should_be_true
|
||||
'Foo\n'.is_match "(....)" (Regex_Matcher.Regex_Matcher_Data dot_matches_newline=True) . should_be_true
|
||||
|
||||
multiline_matches_message = """
|
||||
This test does not make sense once we require matches to match the
|
||||
@ -1332,33 +1383,33 @@ spec =
|
||||
text = """
|
||||
Foo
|
||||
bar
|
||||
text.matches "^(...)$" multiline=True . should_be_true
|
||||
text.is_match "^(...)$" (Regex_Matcher.Regex_Matcher_Data multiline=True) . should_be_true
|
||||
|
||||
Test.specify "should be possible in comments mode" <|
|
||||
"abcde".matches "(.....) # Match any five characters" comments=True . should_be_true
|
||||
"abcde".is_match "(.....) # Match any five characters" (Regex_Matcher.Regex_Matcher_Data comments=True) . should_be_true
|
||||
|
||||
Test.group "Regex finding" <|
|
||||
Test.specify "should be possible on text" <|
|
||||
match = "My Text: Goes Here".find "^My Text: (.+)$" mode=Regex_Mode.First
|
||||
match = "My Text: Goes Here".match "^My Text: (.+)$" mode=Matching_Mode.First
|
||||
match . should_be_a Text
|
||||
match . should_equal "My Text: Goes Here"
|
||||
|
||||
Test.specify "should be possible on unicode text" <|
|
||||
match = "Korean: 건반".find "^Korean: (.+)$" mode=Regex_Mode.First
|
||||
match = "Korean: 건반".match "^Korean: (.+)$" mode=Matching_Mode.First
|
||||
match . should_be_a Text
|
||||
match . should_equal "Korean: 건반"
|
||||
|
||||
Test.specify "should be possible in ascii mode" <|
|
||||
match = "İ".find "\w" mode=Regex_Mode.First match_ascii=True
|
||||
match = "İ".match "\w" matcher=(Regex_Matcher.Regex_Matcher_Data match_ascii=True)
|
||||
match . should_equal Nothing
|
||||
|
||||
Test.specify "should be possible in case-insensitive mode" <|
|
||||
match = "MY".find "my" mode=Regex_Mode.First case_insensitive=True
|
||||
match = "MY".match "my" matcher=(Regex_Matcher.Regex_Matcher_Data case_sensitivity=Case_Sensitivity.Insensitive)
|
||||
match . should_be_a Text
|
||||
match . should_equal "MY"
|
||||
|
||||
Test.specify "should be possible in dot_matches_newline mode" <|
|
||||
match = 'Foo\n'.find "(....)" mode=Regex_Mode.First dot_matches_newline=True
|
||||
match = 'Foo\n'.match "(....)" matcher=(Regex_Matcher.Regex_Matcher_Data dot_matches_newline=True)
|
||||
match . should_be_a Text
|
||||
match . should_equal 'Foo\n'
|
||||
|
||||
@ -1366,13 +1417,11 @@ spec =
|
||||
text = """
|
||||
Foo
|
||||
bar
|
||||
match = text.find "^(...)$" multiline=True
|
||||
match.length . should_equal 2
|
||||
match.at 0 . should_equal "Foo"
|
||||
match.at 1 . should_equal "bar"
|
||||
match = text.match_all "^(...)$" matcher=(Regex_Matcher.Regex_Matcher_Data multiline=True)
|
||||
match . should_equal ["Foo", "bar"]
|
||||
|
||||
Test.specify "should be possible in comments mode" <|
|
||||
match = "abcde".find "(..) # Match two of any character" comments=True mode=Regex_Mode.First
|
||||
match = "abcde".match "(..) # Match two of any character" matcher=(Regex_Matcher.Regex_Matcher_Data comments=True)
|
||||
match . should_be_a Text
|
||||
match . should_equal "ab"
|
||||
|
||||
|