Mirror of https://github.com/enso-org/enso.git (synced 2024-12-23 16:32:18 +03:00)
Data analysts should be able to use Text.replace to substitute parts of the text (#3393)
Implements https://www.pivotaltracker.com/story/show/181266274
Parent: 0ab46bc6f8
Commit: 0ea5dc2a6f
@ -105,6 +105,7 @@
|
|||||||
- [Implemented `Text.reverse`][3377]
|
- [Implemented `Text.reverse`][3377]
|
||||||
- [Implemented support for most Table aggregations in the Database
|
- [Implemented support for most Table aggregations in the Database
|
||||||
backend.][3383]
|
backend.][3383]
|
||||||
|
- [Update `Text.replace` to new API.][3393]
|
||||||
|
|
||||||
[debug-shortcuts]:
|
[debug-shortcuts]:
|
||||||
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
|
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
|
||||||
@ -160,6 +161,7 @@
|
|||||||
[3383]: https://github.com/enso-org/enso/pull/3383
|
[3383]: https://github.com/enso-org/enso/pull/3383
|
||||||
[3385]: https://github.com/enso-org/enso/pull/3385
|
[3385]: https://github.com/enso-org/enso/pull/3385
|
||||||
[3392]: https://github.com/enso-org/enso/pull/3392
|
[3392]: https://github.com/enso-org/enso/pull/3392
|
||||||
|
[3393]: https://github.com/enso-org/enso/pull/3393
|
||||||
|
|
||||||
#### Enso Compiler
|
#### Enso Compiler
|
||||||
|
|
||||||
|
@ -424,52 +424,21 @@ Text.split separator=Split_Kind.Whitespace mode=Mode.All match_ascii=Nothing cas
|
|||||||
pattern.split this mode=mode
|
pattern.split this mode=mode
|
||||||
|
|
||||||
## ALIAS Replace Text
|
## ALIAS Replace Text
|
||||||
|
Replaces the first, last, or all occurrences of term with new_text in the
|
||||||
Replaces each occurrence of `old_sequence` with `new_sequence`, returning
|
input. If `term` is empty, the function returns the input unchanged.
|
||||||
`this` unchanged if no matches are found.
|
|
||||||
|
|
||||||
Arguments:
|
Arguments:
|
||||||
- old_sequence: The pattern to search for in `this`.
|
- term: The term to find.
|
||||||
- new_sequence: The text to replace every occurrence of `old_sequence` with.
|
- new_text: The new text to replace occurrences of `term` with.
|
||||||
- mode: This argument specifies how many matches the engine will try to
|
If `matcher` is a `Regex_Matcher`, `new_text` can include replacement
|
||||||
replace.
|
patterns (such as `$<n>`) for a marked group.
|
||||||
- match_ascii: Enables or disables pure-ASCII matching for the regex. If you
|
- mode: Specifies which instances of term the engine tries to find. When the
|
||||||
know your data only contains ASCII then you can enable this for a
|
mode is `First` or `Last`, this method replaces the first or last instance
|
||||||
performance boost on some regex engines.
|
of term in the input. If set to `All`, it replaces all instances of term in
|
||||||
- case_insensitive: Enables or disables case-insensitive matching. Case
|
the input.
|
||||||
insensitive matching behaves as if it normalises the case of all input
|
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
|
||||||
text before matching on it.
|
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
|
||||||
- dot_matches_newline: Enables or disables the dot matches newline option.
|
regular expression and matched using the associated options.
|
||||||
This specifies that the `.` special character should match everything
|
|
||||||
_including_ newline characters. Without this flag, it will match all
|
|
||||||
characters _except_ newlines.
|
|
||||||
- multiline: Enables or disables the multiline option. Multiline specifies
|
|
||||||
that the `^` and `$` pattern characters match the start and end of lines,
|
|
||||||
as well as the start and end of the input respectively.
|
|
||||||
- comments: Enables or disables the comments mode for the regular expression.
|
|
||||||
In comments mode, the following changes apply:
|
|
||||||
- Whitespace within the pattern is ignored, except when within a
|
|
||||||
character class or when preceeded by an unescaped backslash, or within
|
|
||||||
grouping constructs (e.g. `(?...)`).
|
|
||||||
- When a line contains a `#`, that is not in a character class and is not
|
|
||||||
preceeded by an unescaped backslash, all characters from the leftmost
|
|
||||||
such `#` to the end of the line are ignored. That is to say, they act
|
|
||||||
as _comments_ in the regex.
|
|
||||||
- extra_opts: Specifies additional options in a vector. This allows options
|
|
||||||
to be supplied and computed without having to break them out into arguments
|
|
||||||
to the function. Where these overlap with one of the flags (`match_ascii`,
|
|
||||||
`case_insensitive`, `dot_matches_newline`, `multiline` and `verbose`), the
|
|
||||||
flags take precedence.
|
|
||||||
|
|
||||||
! Boolean Flags and Extra Options
|
|
||||||
This function contains a number of arguments that are boolean flags that
|
|
||||||
enable or disable common options for the regex. At the same time, it also
|
|
||||||
provides the ability to specify options in the `extra_opts` argument.
|
|
||||||
|
|
||||||
Where one of the flags is _set_ (has the value `True` or `False`), the
|
|
||||||
value of the flag takes precedence over the value in `extra_opts` when
|
|
||||||
merging the options to the engine. The flags are _unset_ (have value
|
|
||||||
`Nothing`) by default.
|
|
||||||
|
|
||||||
> Example
|
> Example
|
||||||
Replace letters in the text "aaa".
|
Replace letters in the text "aaa".
|
||||||
@ -477,15 +446,87 @@ Text.split separator=Split_Kind.Whitespace mode=Mode.All match_ascii=Nothing cas
|
|||||||
'aaa'.replace 'aa' 'b' == 'ba'
|
'aaa'.replace 'aa' 'b' == 'ba'
|
||||||
|
|
||||||
> Example
|
> Example
|
||||||
Replace every word of two letters or less with the string "SMOL".
|
Replace all occurrences of letters 'l' and 'o' with '#'.
|
||||||
|
|
||||||
example_replace =
|
"Hello World!".replace "[lo]" "#" matcher=Regex_Matcher == "He### W#r#d!"
|
||||||
text = "I am a very smol word."
|
|
||||||
text.replace "\w\w(?!\w)"
|
> Example
|
||||||
Text.replace : Text | Engine.Pattern -> Text -> Mode.Mode -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Vector.Vector Option.Option -> Text
|
Replace the first occurrence of letter 'l' with '#'.
|
||||||
Text.replace old_sequence new_sequence mode=Mode.All match_ascii=Nothing case_insensitive=Nothing dot_matches_newline=Nothing multiline=Nothing comments=Nothing extra_opts=[] =
|
|
||||||
compiled_pattern = Regex.compile old_sequence match_ascii=match_ascii case_insensitive=case_insensitive dot_matches_newline=dot_matches_newline multiline=multiline comments=comments extra_opts=extra_opts
|
"Hello World!".replace "l" "#" mode=Matching_Mode.First == "He#lo World!"
|
||||||
compiled_pattern.replace this new_sequence mode
|
|
||||||
|
> Example
|
||||||
|
Replace texts in quotes with parentheses.
|
||||||
|
|
||||||
|
'"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' matcher=Regex_Matcher == '(abc) foo (bar) baz'
|
||||||
|
|
||||||
|
! Matching Grapheme Clusters
|
||||||
|
In case-insensitive mode, a single character can match multiple characters,
|
||||||
|
for example `ß` will match `ss` and `SS`, and the ligature `ffi` will match
|
||||||
|
`ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
|
||||||
|
match only a part of some single grapheme cluster, for example in the text
|
||||||
|
`ffia` the term `ia` will match just one-third of the first grapheme `ffi`.
|
||||||
|
Since we do not have the resolution to distinguish such partial matches, a
|
||||||
|
match which matched just a part of some grapheme cluster is extended and
|
||||||
|
treated as if it matched the whole grapheme cluster. Thus the whole
|
||||||
|
grapheme cluster may be replaced with the replacement text even if just a
|
||||||
|
part of it was matched.
|
||||||
|
|
||||||
|
> Example
|
||||||
|
Extended partial matches in case-insensitive mode.
|
||||||
|
|
||||||
|
# The ß symbol matches the letter `S` twice in case-insensitive mode, because it folds to `ss`.
|
||||||
|
'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
|
||||||
|
# The 'ffi' ligature is a single grapheme cluster, so even if just a part of it is matched, the whole grapheme is replaced.
|
||||||
|
'affib'.replace 'i' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
|
||||||
|
|
||||||
|
! Last Match in Regex Mode
|
||||||
|
Regex always performs the search from the front and matching the last
|
||||||
|
occurrence means selecting the last of the matches while still generating
|
||||||
|
matches from the beginning. This will lead to slightly different behavior
|
||||||
|
for overlapping occurrences of a pattern in Regex mode than in exact text
|
||||||
|
matching mode where the matches are searched for from the back.
|
||||||
|
|
||||||
|
> Example
|
||||||
|
Comparing Matching in Last Mode in Regex and Text mode
|
||||||
|
|
||||||
|
"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "ac"
|
||||||
|
"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ca"
|
||||||
|
|
||||||
|
"aaa aaa".replace "aa" "c" matcher=Text_Matcher . should_equal "ca ca"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Text_Matcher . should_equal "ca aaa"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "aaa ac"
|
||||||
|
"aaa aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca ca"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca aaa"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
|
||||||
|
Text.replace : Text -> Text -> (Matching_Mode.First | Matching_Mode.Last | Mode.All) -> (Text_Matcher | Regex_Matcher) -> Text
|
||||||
|
Text.replace term="" new_text="" mode=Mode.All matcher=Text_Matcher = if term.is_empty then this else
|
||||||
|
case matcher of
|
||||||
|
Text_Matcher case_sensitivity ->
|
||||||
|
array_from_single_result result = case result of
|
||||||
|
Nothing -> Array.empty
|
||||||
|
_ -> Array.new_1 result
|
||||||
|
spans_array = case case_sensitivity of
|
||||||
|
True -> case mode of
|
||||||
|
Mode.All ->
|
||||||
|
Text_Utils.span_of_all this term
|
||||||
|
Matching_Mode.First ->
|
||||||
|
array_from_single_result <| Text_Utils.span_of this term
|
||||||
|
Matching_Mode.Last ->
|
||||||
|
array_from_single_result <| Text_Utils.last_span_of this term
|
||||||
|
Case_Insensitive locale -> case mode of
|
||||||
|
Mode.All ->
|
||||||
|
Text_Utils.span_of_all_case_insensitive this term locale.java_locale
|
||||||
|
Matching_Mode.First ->
|
||||||
|
array_from_single_result <|
|
||||||
|
Text_Utils.span_of_case_insensitive this term locale.java_locale False
|
||||||
|
Matching_Mode.Last ->
|
||||||
|
array_from_single_result <|
|
||||||
|
Text_Utils.span_of_case_insensitive this term locale.java_locale True
|
||||||
|
Text_Utils.replace_spans this spans_array new_text
|
||||||
|
Regex_Matcher _ _ _ _ _ ->
|
||||||
|
compiled_pattern = matcher.compile term
|
||||||
|
compiled_pattern.replace this new_text mode=mode
|
||||||
|
|
||||||
## ALIAS Get Words
|
## ALIAS Get Words
|
||||||
|
|
||||||
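The "Last Match in Regex Mode" note in the hunk above can be illustrated outside Enso. The following is a minimal Java sketch, not the code from this commit: `LastMatchDemo` and its methods are hypothetical names, and `String.lastIndexOf` is used as an assumed stand-in for the back-to-front search that exact text matching performs. It shows why the two modes pick different "last" occurrences when matches overlap.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the Enso code base.
final class LastMatchDemo {
    /** Start of the last literal occurrence, searching from the back; -1 if none. */
    static int lastLiteralStart(String text, String term) {
        return text.lastIndexOf(term);
    }

    /** Start of the last regex match, generating matches from the front; -1 if none. */
    static int lastRegexStart(String text, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int last = -1;
        // Each find() resumes after the previous match, so overlapping
        // candidates such as the "aa" starting at index 1 are never produced.
        while (matcher.find()) {
            last = matcher.start();
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(lastLiteralStart("aaa", "aa"));     // 1
        System.out.println(lastRegexStart("aaa", "aa"));       // 0
        System.out.println(lastLiteralStart("aaa aaa", "aa")); // 5
        System.out.println(lastRegexStart("aaa aaa", "aa"));   // 4
    }
}
```

The start indices agree with the `Span` ranges quoted in the documentation examples of this commit: the text search finds the occurrence closest to the end, while the regex search returns the last of the non-overlapping matches produced from the front.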
@ -1223,16 +1264,16 @@ Text.trim where=Location.Both what=_.is_whitespace =
|
|||||||
which contains both the start and end indices, allowing to determine the
|
which contains both the start and end indices, allowing to determine the
|
||||||
length of the match. This is useful not only with regex matches (where a
|
length of the match. This is useful not only with regex matches (where a
|
||||||
regular expression can have matches of various lengths) but also for case
|
regular expression can have matches of various lengths) but also for case
|
||||||
insensitive matching. In case insensitive mode, a single character can
|
insensitive matching. In case-insensitive mode, a single character can
|
||||||
match multiple characters, for example `ß` will match `ss` and `SS`, and
|
match multiple characters, for example `ß` will match `ss` and `SS`, and
|
||||||
the ligature `ffi` will match `ffi` or `f` etc. Thus in case insensitive
|
the ligature `ffi` will match `ffi` or `f` etc. Thus in case-insensitive
|
||||||
mode, the length of the match can be shorter or longer than the term that
|
mode, the length of the match can be shorter or longer than the term that
|
||||||
was being matched, so it is extremely important to not rely on the length
|
was being matched, so it is extremely important to not rely on the length
|
||||||
of the matched term when analysing the matches as they may have different
|
of the matched term when analysing the matches as they may have different
|
||||||
lengths.
|
lengths.
|
||||||
|
|
||||||
> Example
|
> Example
|
||||||
Match length differences in case insensitive matching.
|
Match length differences in case-insensitive matching.
|
||||||
|
|
||||||
term = "straße"
|
term = "straße"
|
||||||
text = "MONUMENTENSTRASSE 42"
|
text = "MONUMENTENSTRASSE 42"
|
||||||
@ -1241,7 +1282,7 @@ Text.trim where=Location.Both what=_.is_whitespace =
|
|||||||
match.length == 7
|
match.length == 7
|
||||||
|
|
||||||
! Matching Grapheme Clusters
|
! Matching Grapheme Clusters
|
||||||
In case insensitive mode, a single character can match multiple characters,
|
In case-insensitive mode, a single character can match multiple characters,
|
||||||
for example `ß` will match `ss` and `SS`, and the ligature `ffi` will match
|
for example `ß` will match `ss` and `SS`, and the ligature `ffi` will match
|
||||||
`ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
|
`ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
|
||||||
match only a part of some single grapheme cluster, for example in the text
|
match only a part of some single grapheme cluster, for example in the text
|
||||||
@ -1266,6 +1307,22 @@ Text.trim where=Location.Both what=_.is_whitespace =
|
|||||||
match_2.length == 2
|
match_2.length == 2
|
||||||
# After being extended to full grapheme clusters, both terms "IFF" and "ffiffl" match the same span of grapheme clusters.
|
# After being extended to full grapheme clusters, both terms "IFF" and "ffiffl" match the same span of grapheme clusters.
|
||||||
match_1 == match_2
|
match_1 == match_2
|
||||||
|
|
||||||
|
! Last Match in Regex Mode
|
||||||
|
Regex always performs the search from the front and matching the last
|
||||||
|
occurrence means selecting the last of the matches while still generating
|
||||||
|
matches from the beginning. This will lead to slightly different behavior
|
||||||
|
for overlapping occurrences of a pattern in Regex mode than in exact text
|
||||||
|
matching mode where the matches are searched for from the back.
|
||||||
|
|
||||||
|
> Example
|
||||||
|
Comparing Matching in Last Mode in Regex and Text mode
|
||||||
|
|
||||||
|
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 1 3) "aaa"
|
||||||
|
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 0 2) "aaa"
|
||||||
|
|
||||||
|
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 5 7) "aaa aaa"
|
||||||
|
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 4 6) "aaa aaa"
|
||||||
Text.location_of : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Span | Nothing
|
Text.location_of : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Span | Nothing
|
||||||
Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = case matcher of
|
Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = case matcher of
|
||||||
Text_Matcher case_sensitive -> case case_sensitive of
|
Text_Matcher case_sensitive -> case case_sensitive of
|
||||||
@ -1274,7 +1331,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
|
|||||||
Matching_Mode.First -> Text_Utils.span_of this term
|
Matching_Mode.First -> Text_Utils.span_of this term
|
||||||
Matching_Mode.Last -> Text_Utils.last_span_of this term
|
Matching_Mode.Last -> Text_Utils.last_span_of this term
|
||||||
if codepoint_span.is_nothing then Nothing else
|
if codepoint_span.is_nothing then Nothing else
|
||||||
start = Text_Utils.utf16_index_to_grapheme_index this codepoint_span.start
|
start = Text_Utils.utf16_index_to_grapheme_index this codepoint_span.codeunit_start
|
||||||
## While the codepoint_span may have different code unit length
|
## While the codepoint_span may have different code unit length
|
||||||
from our term, the `length` counted in grapheme clusters is
|
from our term, the `length` counted in grapheme clusters is
|
||||||
guaranteed to be the same.
|
guaranteed to be the same.
|
||||||
@ -1293,7 +1350,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
|
|||||||
case Text_Utils.span_of_case_insensitive this term locale.java_locale search_for_last of
|
case Text_Utils.span_of_case_insensitive this term locale.java_locale search_for_last of
|
||||||
Nothing -> Nothing
|
Nothing -> Nothing
|
||||||
grapheme_span ->
|
grapheme_span ->
|
||||||
Span (Range grapheme_span.start grapheme_span.end) this
|
Span (Range grapheme_span.grapheme_start grapheme_span.grapheme_end) this
|
||||||
Regex_Matcher _ _ _ _ _ -> case mode of
|
Regex_Matcher _ _ _ _ _ -> case mode of
|
||||||
Matching_Mode.First ->
|
Matching_Mode.First ->
|
||||||
case matcher.compile term . match this Mode.First of
|
case matcher.compile term . match this Mode.First of
|
||||||
@ -1332,16 +1389,16 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
|
|||||||
which contains both the start and end indices, allowing to determine the
|
which contains both the start and end indices, allowing to determine the
|
||||||
length of the match. This is useful not only with regex matches (where a
|
length of the match. This is useful not only with regex matches (where a
|
||||||
regular expression can have matches of various lengths) but also for case
|
regular expression can have matches of various lengths) but also for case
|
||||||
insensitive matching. In case insensitive mode, a single character can
|
insensitive matching. In case-insensitive mode, a single character can
|
||||||
match multiple characters, for example `ß` will match `ss` and `SS`, and
|
match multiple characters, for example `ß` will match `ss` and `SS`, and
|
||||||
the ligature `ffi` will match `ffi` or `f` etc. Thus in case insensitive
|
the ligature `ffi` will match `ffi` or `f` etc. Thus in case-insensitive
|
||||||
mode, the length of the match can be shorter or longer than the term that
|
mode, the length of the match can be shorter or longer than the term that
|
||||||
was being matched, so it is extremely important to not rely on the length
|
was being matched, so it is extremely important to not rely on the length
|
||||||
of the matched term when analysing the matches as they may have different
|
of the matched term when analysing the matches as they may have different
|
||||||
lengths.
|
lengths.
|
||||||
|
|
||||||
> Example
|
> Example
|
||||||
Match length differences in case insensitive matching.
|
Match length differences in case-insensitive matching.
|
||||||
|
|
||||||
term = "strasse"
|
term = "strasse"
|
||||||
text = "MONUMENTENSTRASSE ist eine große Straße."
|
text = "MONUMENTENSTRASSE ist eine große Straße."
|
||||||
@ -1350,7 +1407,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
|
|||||||
match . map .length == [7, 6]
|
match . map .length == [7, 6]
|
||||||
|
|
||||||
! Matching Grapheme Clusters
|
! Matching Grapheme Clusters
|
||||||
In case insensitive mode, a single character can match multiple characters,
|
In case-insensitive mode, a single character can match multiple characters,
|
||||||
for example `ß` will match `ss` and `SS`, and the ligature `ffi` will match
|
for example `ß` will match `ss` and `SS`, and the ligature `ffi` will match
|
||||||
`ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
|
`ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
|
||||||
match only a part of some single grapheme cluster, for example in the text
|
match only a part of some single grapheme cluster, for example in the text
|
||||||
@ -1374,7 +1431,7 @@ Text.location_of_all term="" matcher=Text_Matcher.new = case matcher of
|
|||||||
Text_Matcher case_sensitive -> if term.is_empty then Vector.new (this.length + 1) (ix -> Span (Range ix ix) this) else case case_sensitive of
|
Text_Matcher case_sensitive -> if term.is_empty then Vector.new (this.length + 1) (ix -> Span (Range ix ix) this) else case case_sensitive of
|
||||||
True ->
|
True ->
|
||||||
codepoint_spans = Vector.Vector <| Text_Utils.span_of_all this term
|
codepoint_spans = Vector.Vector <| Text_Utils.span_of_all this term
|
||||||
grahpeme_ixes = Vector.Vector <| Text_Utils.utf16_indices_to_grapheme_indices this (codepoint_spans.map .start).to_array
|
grahpeme_ixes = Vector.Vector <| Text_Utils.utf16_indices_to_grapheme_indices this (codepoint_spans.map .codeunit_start).to_array
|
||||||
## While the codepoint_spans may have different code unit lengths
|
## While the codepoint_spans may have different code unit lengths
|
||||||
from our term, the `length` counted in grapheme clusters is
|
from our term, the `length` counted in grapheme clusters is
|
||||||
guaranteed to be the same.
|
guaranteed to be the same.
|
||||||
@ -1385,7 +1442,7 @@ Text.location_of_all term="" matcher=Text_Matcher.new = case matcher of
|
|||||||
Case_Insensitive locale ->
|
Case_Insensitive locale ->
|
||||||
grapheme_spans = Vector.Vector <| Text_Utils.span_of_all_case_insensitive this term locale.java_locale
|
grapheme_spans = Vector.Vector <| Text_Utils.span_of_all_case_insensitive this term locale.java_locale
|
||||||
grapheme_spans.map grapheme_span->
|
grapheme_spans.map grapheme_span->
|
||||||
Span (Range grapheme_span.start grapheme_span.end) this
|
Span (Range grapheme_span.grapheme_start grapheme_span.grapheme_end) this
|
||||||
Regex_Matcher _ _ _ _ _ ->
|
Regex_Matcher _ _ _ _ _ ->
|
||||||
case matcher.compile term . match this Mode.All of
|
case matcher.compile term . match this Mode.All of
|
||||||
Nothing -> []
|
Nothing -> []
|
||||||
|
@ -39,6 +39,7 @@ import Standard.Base.Data.Text.Regex
|
|||||||
import Standard.Base.Data.Text.Regex.Engine
|
import Standard.Base.Data.Text.Regex.Engine
|
||||||
import Standard.Base.Data.Text.Regex.Option as Global_Option
|
import Standard.Base.Data.Text.Regex.Option as Global_Option
|
||||||
import Standard.Base.Data.Text.Regex.Mode
|
import Standard.Base.Data.Text.Regex.Mode
|
||||||
|
import Standard.Base.Data.Text.Matching_Mode
|
||||||
import Standard.Base.Polyglot.Java as Java_Ext
|
import Standard.Base.Polyglot.Java as Java_Ext
|
||||||
from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
|
from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
|
||||||
|
|
||||||
@ -533,7 +534,7 @@ type Pattern
|
|||||||
pattern = engine.compile "aa" []
|
pattern = engine.compile "aa" []
|
||||||
input = "aabbaabbbbbaab"
|
input = "aabbaabbbbbaab"
|
||||||
pattern.replace input "REPLACED"
|
pattern.replace input "REPLACED"
|
||||||
replace : Text -> Text -> (Mode.First | Integer | Mode.All | Mode.Full) -> Text
|
replace : Text -> Text -> (Mode.First | Integer | Mode.All | Mode.Full | Matching_Mode.Last) -> Text
|
||||||
replace input replacement mode=Mode.All =
|
replace input replacement mode=Mode.All =
|
||||||
do_replace_mode mode start end = case mode of
|
do_replace_mode mode start end = case mode of
|
||||||
Mode.First ->
|
Mode.First ->
|
||||||
@ -559,8 +560,26 @@ type Pattern
|
|||||||
internal_matcher.replaceAll replacement
|
internal_matcher.replaceAll replacement
|
||||||
Mode.Full ->
|
Mode.Full ->
|
||||||
case this.match input mode=Mode.Full of
|
case this.match input mode=Mode.Full of
|
||||||
Match _ _ _ _ -> replacement
|
Match _ _ _ _ -> this.replace input replacement Mode.First
|
||||||
Nothing -> input
|
Nothing -> input
|
||||||
|
Matching_Mode.Last ->
|
||||||
|
all_matches = this.match input
|
||||||
|
all_matches_count = if all_matches.is_nothing then 0 else all_matches.length
|
||||||
|
|
||||||
|
if all_matches_count == 0 then input else
|
||||||
|
internal_matcher = this.build_matcher input start end
|
||||||
|
buffer = StringBuffer.new
|
||||||
|
last_match_index = all_matches_count - 1
|
||||||
|
|
||||||
|
go match_index =
|
||||||
|
internal_matcher.find
|
||||||
|
case match_index == last_match_index of
|
||||||
|
True -> internal_matcher.appendReplacement buffer replacement
|
||||||
|
False -> @Tail_Call go (match_index + 1)
|
||||||
|
|
||||||
|
go 0
|
||||||
|
internal_matcher.appendTail buffer
|
||||||
|
buffer.to_text
|
||||||
Mode.Bounded _ _ _ -> Panic.throw <|
|
Mode.Bounded _ _ _ -> Panic.throw <|
|
||||||
Mode_Error "Modes cannot be recursive."
|
Mode_Error "Modes cannot be recursive."
|
||||||
|
|
||||||
|
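The `Matching_Mode.Last` branch added to `Pattern.replace` above counts the matches, advances a fresh matcher to the final one, and then replaces only that match. A rough Java equivalent, assuming plain `java.util.regex` and using the hypothetical name `ReplaceLastMatch`, might look like this:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a sketch of the two-pass idea, not the Enso implementation.
final class ReplaceLastMatch {
    static String replaceLast(String input, String regex, String replacement) {
        Pattern pattern = Pattern.compile(regex);

        // First pass: count the non-overlapping matches.
        int count = 0;
        Matcher counter = pattern.matcher(input);
        while (counter.find()) {
            count++;
        }
        if (count == 0) {
            return input;
        }

        // Second pass: step a fresh matcher forward to the last match.
        Matcher matcher = pattern.matcher(input);
        for (int i = 0; i < count; i++) {
            matcher.find();
        }

        // No earlier appendReplacement call was made, so everything before the
        // last match (including earlier matches) is copied unchanged.
        StringBuffer buffer = new StringBuffer();
        matcher.appendReplacement(buffer, replacement);
        matcher.appendTail(buffer);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceLast("aaa aaa", "aa", "c")); // aaa ca
        System.out.println(replaceLast("aaa", "aa", "c"));     // ca
    }
}
```

Because `appendReplacement` interprets the replacement string, group references such as `$1` keep working for the last match, which is in line with the replacement patterns described for `Regex_Matcher`.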
@ -81,22 +81,22 @@ type Text_Sub_Range
|
|||||||
if delimiter.is_empty then (Range 0 0) else
|
if delimiter.is_empty then (Range 0 0) else
|
||||||
span = Text_Utils.span_of text delimiter
|
span = Text_Utils.span_of text delimiter
|
||||||
if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
|
if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
|
||||||
(Range 0 span.start)
|
(Range 0 span.codeunit_start)
|
||||||
Before_Last delimiter ->
|
Before_Last delimiter ->
|
||||||
if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
|
if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
|
||||||
span = Text_Utils.last_span_of text delimiter
|
span = Text_Utils.last_span_of text delimiter
|
||||||
if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
|
if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
|
||||||
(Range 0 span.start)
|
(Range 0 span.codeunit_start)
|
||||||
After delimiter ->
|
After delimiter ->
|
||||||
if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
|
if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
|
||||||
span = Text_Utils.span_of text delimiter
|
span = Text_Utils.span_of text delimiter
|
||||||
if span.is_nothing then (Range 0 0) else
|
if span.is_nothing then (Range 0 0) else
|
||||||
(Range span.end (Text_Utils.char_length text))
|
(Range span.codeunit_end (Text_Utils.char_length text))
|
||||||
After_Last delimiter ->
|
After_Last delimiter ->
|
||||||
if delimiter.is_empty then (Range 0 0) else
|
if delimiter.is_empty then (Range 0 0) else
|
||||||
span = Text_Utils.last_span_of text delimiter
|
span = Text_Utils.last_span_of text delimiter
|
||||||
if span.is_nothing then (Range 0 0) else
|
if span.is_nothing then (Range 0 0) else
|
||||||
(Range span.end (Text_Utils.char_length text))
|
(Range span.codeunit_end (Text_Utils.char_length text))
|
||||||
While predicate ->
|
While predicate ->
|
||||||
indices = find_sub_range_end text _-> start-> end->
|
indices = find_sub_range_end text _-> start-> end->
|
||||||
predicate (Text_Utils.substring text start end) . not
|
predicate (Text_Utils.substring text start end) . not
|
||||||
|
@ -12,6 +12,7 @@ import java.util.List;
|
|||||||
import java.util.Locale;
|
import java.util.Locale;
|
||||||
import java.util.regex.Pattern;
|
import java.util.regex.Pattern;
|
||||||
import org.enso.base.text.CaseFoldedString;
|
import org.enso.base.text.CaseFoldedString;
|
||||||
|
import org.enso.base.text.CaseFoldedString.Grapheme;
|
||||||
import org.enso.base.text.GraphemeSpan;
|
import org.enso.base.text.GraphemeSpan;
|
||||||
import org.enso.base.text.Utf16Span;
|
import org.enso.base.text.Utf16Span;
|
||||||
|
|
||||||
@ -231,19 +232,6 @@ public class Text_Utils {
|
|||||||
return CaseFoldedString.simpleFold(string, locale);
|
return CaseFoldedString.simpleFold(string, locale);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
|
||||||
* Replaces all occurrences of {@code oldSequence} within {@code str} with {@code newSequence}.
|
|
||||||
*
|
|
||||||
* @param str the string to process
|
|
||||||
* @param oldSequence the substring that is searched for and will be replaced
|
|
||||||
* @param newSequence the string that will replace occurrences of {@code oldSequence}
|
|
||||||
* @return {@code str} with all occurrences of {@code oldSequence} replaced with {@code
|
|
||||||
* newSequence}
|
|
||||||
*/
|
|
||||||
public static String replace(String str, String oldSequence, String newSequence) {
|
|
||||||
return str.replace(oldSequence, newSequence);
|
|
||||||
}
|
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Gets the length of char array of a string
|
* Gets the length of char array of a string
|
||||||
*
|
*
|
||||||
@ -306,7 +294,7 @@ public class Text_Utils {
|
|||||||
|
|
||||||
StringSearch search = new StringSearch(needle, haystack);
|
StringSearch search = new StringSearch(needle, haystack);
|
||||||
ArrayList<Utf16Span> occurrences = new ArrayList<>();
|
ArrayList<Utf16Span> occurrences = new ArrayList<>();
|
||||||
long ix;
|
int ix;
|
||||||
while ((ix = search.next()) != StringSearch.DONE) {
|
while ((ix = search.next()) != StringSearch.DONE) {
|
||||||
occurrences.add(new Utf16Span(ix, ix + search.getMatchLength()));
|
occurrences.add(new Utf16Span(ix, ix + search.getMatchLength()));
|
||||||
}
|
}
|
||||||
@ -456,13 +444,21 @@ public class Text_Utils {
|
|||||||
* @return a minimal {@code GraphemeSpan} which contains all code units from the match
|
* @return a minimal {@code GraphemeSpan} which contains all code units from the match
|
||||||
*/
|
*/
|
||||||
private static GraphemeSpan findExtendedSpan(CaseFoldedString string, int position, int length) {
|
private static GraphemeSpan findExtendedSpan(CaseFoldedString string, int position, int length) {
|
||||||
int firstGrapheme = string.codeUnitToGraphemeIndex(position);
|
Grapheme firstGrapheme = string.findGrapheme(position);
|
||||||
if (length == 0) {
|
if (length == 0) {
|
||||||
return new GraphemeSpan(firstGrapheme, firstGrapheme);
|
return new GraphemeSpan(
|
||||||
|
firstGrapheme.index,
|
||||||
|
firstGrapheme.index,
|
||||||
|
firstGrapheme.codeunit_start,
|
||||||
|
firstGrapheme.codeunit_start);
|
||||||
} else {
|
} else {
|
||||||
int lastGrapheme = string.codeUnitToGraphemeIndex(position + length - 1);
|
Grapheme lastGrapheme = string.findGrapheme(position + length - 1);
|
||||||
int endGrapheme = lastGrapheme + 1;
|
int endGraphemeIndex = lastGrapheme.index + 1;
|
||||||
return new GraphemeSpan(firstGrapheme, endGrapheme);
|
return new GraphemeSpan(
|
||||||
|
firstGrapheme.index,
|
||||||
|
endGraphemeIndex,
|
||||||
|
firstGrapheme.codeunit_start,
|
||||||
|
lastGrapheme.codeunit_end);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
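The idea behind `findExtendedSpan`, widening a match so that it covers whole grapheme clusters, can be sketched with the JDK alone. The example below is an assumption-laden illustration: it works directly on the original string with `java.text.BreakIterator`, whereas the method above operates on a `CaseFoldedString` and its index mappings. `GraphemeExtendDemo` and `extendToGraphemes` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical demo; uses the JDK's BreakIterator rather than Enso's CaseFoldedString.
final class GraphemeExtendDemo {
    /**
     * Widens the half-open code-unit range [start, end) so that both ends
     * fall on grapheme boundaries. Returns {extendedStart, extendedEnd}.
     */
    static int[] extendToGraphemes(String text, int start, int end) {
        BreakIterator boundaries = BreakIterator.getCharacterInstance(Locale.ROOT);
        boundaries.setText(text);
        // Move the start back to the nearest boundary at or before it.
        int extendedStart = boundaries.isBoundary(start) ? start : boundaries.preceding(start);
        // Move the end forward to the nearest boundary at or after it.
        int extendedEnd = boundaries.isBoundary(end) ? end : boundaries.following(end);
        return new int[] {extendedStart, extendedEnd};
    }

    public static void main(String[] args) {
        // "e" followed by a combining acute accent forms one grapheme cluster.
        String text = "ae\u0301b";
        int[] span = extendToGraphemes(text, 1, 2); // a match on the bare "e"
        System.out.println(span[0] + ".." + span[1]);
    }
}
```

A match that covers only the base letter is extended to include its combining mark, which mirrors the documented behaviour that a partially matched grapheme cluster is treated as fully matched.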
@ -485,4 +481,30 @@ public class Text_Utils {
|
|||||||
public static boolean is_all_whitespace(String text) {
|
public static boolean is_all_whitespace(String text) {
|
||||||
return text.codePoints().allMatch(UCharacter::isUWhiteSpace);
|
return text.codePoints().allMatch(UCharacter::isUWhiteSpace);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Replaces all provided spans within the text with {@code newSequence}.
|
||||||
|
*
|
||||||
|
* @param str the string to process
|
||||||
|
* @param spans the spans to replace; the spans should be sorted by their starting point in the
|
||||||
|
* non-decreasing order; the behaviour is undefined if these requirements are not satisfied.
|
||||||
|
* @param newSequence the string that will replace the spans
|
||||||
|
* @return {@code str} with all provided spans replaced with {@code newSequence}
|
||||||
|
*/
|
||||||
|
public static String replace_spans(String str, List<Utf16Span> spans, String newSequence) {
|
||||||
|
StringBuilder sb = new StringBuilder();
|
||||||
|
int current_ix = 0;
|
||||||
|
for (Utf16Span span : spans) {
|
||||||
|
if (span.codeunit_start > current_ix) {
|
||||||
|
sb.append(str, current_ix, span.codeunit_start);
|
||||||
|
}
|
||||||
|
|
||||||
|
sb.append(newSequence);
|
||||||
|
current_ix = span.codeunit_end;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Add the remaining part of the string (if any).
|
||||||
|
sb.append(str, current_ix, str.length());
|
||||||
|
return sb.toString();
|
||||||
|
}
|
||||||
}
|
}
|
||||||
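As a rough illustration of the loop in `replace_spans` above, the following self-contained sketch reproduces it outside the Enso codebase. `ReplaceSpansSketch` and its nested `Span` class are hypothetical stand-ins for `Text_Utils` and `Utf16Span`; only the loop body mirrors the diff. Like the original, it assumes the spans are sorted by start index.

```java
import java.util.Arrays;
import java.util.List;

public class ReplaceSpansSketch {
  /** Hypothetical stand-in for Utf16Span: a half-open range of UTF-16 code units. */
  public static final class Span {
    public final int codeunitStart, codeunitEnd;

    public Span(int codeunitStart, int codeunitEnd) {
      this.codeunitStart = codeunitStart;
      this.codeunitEnd = codeunitEnd;
    }
  }

  /** Replaces every span in {@code str} with {@code newSequence}; spans must be sorted by start. */
  public static String replaceSpans(String str, List<Span> spans, String newSequence) {
    StringBuilder sb = new StringBuilder();
    int currentIx = 0;
    for (Span span : spans) {
      // Copy the untouched text between the previous span and this one.
      if (span.codeunitStart > currentIx) {
        sb.append(str, currentIx, span.codeunitStart);
      }
      sb.append(newSequence);
      currentIx = span.codeunitEnd;
    }
    // Copy whatever remains after the last span (possibly nothing).
    sb.append(str, currentIx, str.length());
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Span> spans = Arrays.asList(new Span(0, 2), new Span(4, 6));
    System.out.println(replaceSpans("aaa aaa", spans, "c")); // prints "ca ca"
  }
}
```

The two spans used in `main` correspond to the non-overlapping matches of `"aa"` in `"aaa aaa"`, which is the case the `Text.replace` tests in this change expect to yield `"ca ca"`.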
|
@ -13,6 +13,20 @@ import java.util.Locale;
|
|||||||
* indices back in the original string.
|
* indices back in the original string.
|
||||||
*/
|
*/
|
||||||
public class CaseFoldedString {
|
public class CaseFoldedString {
|
||||||
|
public static class Grapheme {
|
||||||
|
/** The grapheme index of the given grapheme in the string. */
|
||||||
|
public final int index;
|
||||||
|
|
||||||
|
/** The codeunit indices of start and end of the given grapheme in the original string. */
|
||||||
|
public final int codeunit_start, codeunit_end;
|
||||||
|
|
||||||
|
public Grapheme(int index, int codeunit_start, int codeunit_end) {
|
||||||
|
this.index = index;
|
||||||
|
this.codeunit_start = codeunit_start;
|
||||||
|
this.codeunit_end = codeunit_end;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
private final String foldedString;
|
private final String foldedString;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@ -24,33 +38,67 @@ public class CaseFoldedString {
|
|||||||
*/
|
*/
|
||||||
private final int[] graphemeIndexMapping;
|
private final int[] graphemeIndexMapping;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A mapping from code units in the transformed string to the first code-unit of the corresponding
|
||||||
|
* grapheme in the original string.
|
||||||
|
*
|
||||||
|
* <p>The mapping must be valid for indices from 0 to {@code foldedString.length()+1}
|
||||||
|
* (inclusive).
|
||||||
|
*/
|
||||||
|
private final int[] codeunitStartIndexMapping;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* A mapping from code units in the transformed string to the end code-unit of the corresponding
|
||||||
|
* grapheme in the original string.
|
||||||
|
*
|
||||||
|
* <p>The mapping must be valid for indices from 0 to {@code foldedString.length()+1}
|
||||||
|
* (inclusive).
|
||||||
|
*/
|
||||||
|
private final int[] codeunitEndIndexMapping;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Constructs a new instance of the folded string.
|
* Constructs a new instance of the folded string.
|
||||||
*
|
*
|
||||||
* @param foldeString the string after applying the case folding transformation
|
* @param foldeString the string after applying the case folding transformation
|
||||||
* @param graphemeIndexMapping a mapping created during the transformation which maps code units
|
* @param graphemeIndexMapping a mapping created during the transformation which maps code units
|
||||||
* in the transformed string to their corresponding graphemes in the original string
|
* in the transformed string to their corresponding graphemes in the original string
|
||||||
|
* @param codeunitStartIndexMapping a mapping created during the transformation which maps code
|
||||||
|
* units in the transformed string to first codeunits of corresponding graphemes in the
|
||||||
|
* original string
|
||||||
|
* @param codeunitEndIndexMapping a mapping created during the transformation which maps code
|
||||||
|
* units in the transformed string to end codeunits of corresponding graphemes in the original
|
||||||
|
* string
|
||||||
*/
|
*/
|
||||||
private CaseFoldedString(String foldeString, int[] graphemeIndexMapping) {
|
private CaseFoldedString(
|
||||||
|
String foldeString,
|
||||||
|
int[] graphemeIndexMapping,
|
||||||
|
int[] codeunitStartIndexMapping,
|
||||||
|
int[] codeunitEndIndexMapping) {
|
||||||
this.foldedString = foldeString;
|
this.foldedString = foldeString;
|
||||||
this.graphemeIndexMapping = graphemeIndexMapping;
|
this.graphemeIndexMapping = graphemeIndexMapping;
|
||||||
|
this.codeunitStartIndexMapping = codeunitStartIndexMapping;
|
||||||
|
this.codeunitEndIndexMapping = codeunitEndIndexMapping;
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Maps a code unit in the folded string to the corresponding grapheme in the original string.
|
* Finds the grapheme corresponding to a code unit in the folded string.
|
||||||
*
|
*
|
||||||
* @param codeunitIndex the index of the code unit in the folded string, valid indices range from
|
* @param codeunitIndex the index of the code unit in the folded string, valid indices range from
|
||||||
* 0 to {@code getFoldedString().length()+1} (inclusive), allowing to also ask for the
|
* 0 to {@code getFoldedString().length()+1} (inclusive), allowing to also ask for the
|
||||||
* position of the end code unit which is located right after the end of the string - which
|
* position of the end code unit which is located right after the end of the string - which
|
||||||
* should always map to the analogous end grapheme.
|
* should always map to the analogous end grapheme.
|
||||||
* @return the index of the grapheme from the original string that after applying the
|
* @return the grapheme (its index and code-unit bounds) from the original string that after
|
||||||
* transformation contains the requested code unit
|
* applying the transformation contains the requested code unit
|
||||||
*/
|
*/
|
||||||
public int codeUnitToGraphemeIndex(int codeunitIndex) {
|
public Grapheme findGrapheme(int codeunitIndex) {
|
||||||
if (codeunitIndex < 0 || codeunitIndex > this.foldedString.length()) {
|
if (codeunitIndex < 0 || codeunitIndex > this.foldedString.length()) {
|
||||||
throw new IndexOutOfBoundsException(codeunitIndex);
|
throw new IndexOutOfBoundsException(codeunitIndex);
|
||||||
}
|
}
|
||||||
return graphemeIndexMapping[codeunitIndex];
|
|
||||||
|
return new Grapheme(
|
||||||
|
graphemeIndexMapping[codeunitIndex],
|
||||||
|
codeunitStartIndexMapping[codeunitIndex],
|
||||||
|
codeunitEndIndexMapping[codeunitIndex]);
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Returns the transformed string. */
|
/** Returns the transformed string. */
|
||||||
@ -74,7 +122,9 @@ public class CaseFoldedString {
|
|||||||
breakIterator.setText(charSequence);
|
breakIterator.setText(charSequence);
|
||||||
StringBuilder stringBuilder = new StringBuilder(charSequence.length());
|
StringBuilder stringBuilder = new StringBuilder(charSequence.length());
|
||||||
Fold foldAlgorithm = caseFoldAlgorithmForLocale(locale);
|
Fold foldAlgorithm = caseFoldAlgorithmForLocale(locale);
|
||||||
IntArrayBuilder index_mapping = new IntArrayBuilder(charSequence.length() + 1);
|
IntArrayBuilder grapheme_mapping = new IntArrayBuilder(charSequence.length() + 1);
|
||||||
|
IntArrayBuilder codeunit_start_mapping = new IntArrayBuilder(charSequence.length() + 1);
|
||||||
|
IntArrayBuilder codeunit_end_mapping = new IntArrayBuilder(charSequence.length() + 1);
|
||||||
|
|
||||||
// We rely on the fact that ICU Case Folding is _not_ context-sensitive, i.e. the mapping of
|
// We rely on the fact that ICU Case Folding is _not_ context-sensitive, i.e. the mapping of
|
||||||
// each grapheme cluster is independent of surrounding ones. Regular casing is
|
// each grapheme cluster is independent of surrounding ones. Regular casing is
|
||||||
@ -87,7 +137,9 @@ public class CaseFoldedString {
|
|||||||
String foldedGrapheme = foldAlgorithm.apply(grapheme);
|
String foldedGrapheme = foldAlgorithm.apply(grapheme);
|
||||||
stringBuilder.append(foldedGrapheme);
|
stringBuilder.append(foldedGrapheme);
|
||||||
for (int i = 0; i < foldedGrapheme.length(); ++i) {
|
for (int i = 0; i < foldedGrapheme.length(); ++i) {
|
||||||
index_mapping.add(grapheme_index);
|
grapheme_mapping.add(grapheme_index);
|
||||||
|
codeunit_start_mapping.add(current);
|
||||||
|
codeunit_end_mapping.add(next);
|
||||||
}
|
}
|
||||||
|
|
||||||
grapheme_index++;
|
grapheme_index++;
|
||||||
@ -96,10 +148,13 @@ public class CaseFoldedString {
|
|||||||
|
|
||||||
// The mapping should also be able to handle a {@code str.length()} query, so we add one more
|
// The mapping should also be able to handle a {@code str.length()} query, so we add one more
|
||||||
// element to the mapping pointing to a non-existent grapheme after the end of the text.
|
// element to the mapping pointing to a non-existent grapheme after the end of the text.
|
||||||
index_mapping.add(grapheme_index);
|
grapheme_mapping.add(grapheme_index);
|
||||||
|
|
||||||
return new CaseFoldedString(
|
return new CaseFoldedString(
|
||||||
stringBuilder.toString(), index_mapping.unsafeGetStorageAndInvalidateTheBuilder());
|
stringBuilder.toString(),
|
||||||
|
grapheme_mapping.unsafeGetStorageAndInvalidateTheBuilder(),
|
||||||
|
codeunit_start_mapping.unsafeGetStorageAndInvalidateTheBuilder(),
|
||||||
|
codeunit_end_mapping.unsafeGetStorageAndInvalidateTheBuilder());
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
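The three parallel mappings built above can be sketched in isolation as follows. This is an approximation under stated assumptions, not the Enso implementation: `FoldMappingSketch` is a hypothetical class, `java.text.BreakIterator` stands in for the ICU break iterator, and `toUpperCase` followed by `toLowerCase` stands in for ICU case folding. Unlike the diff, the sketch also appends the end-of-text sentinel to the code-unit arrays, which is a choice of this sketch only.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class FoldMappingSketch {
  public final String folded;
  // For each code unit of `folded` (plus one sentinel entry):
  public final int[] graphemeIndex; // index of the source grapheme
  public final int[] codeunitStart; // first code unit of that grapheme in the original
  public final int[] codeunitEnd;   // end code unit (exclusive) of that grapheme in the original

  private FoldMappingSketch(String folded, List<int[]> entries) {
    this.folded = folded;
    int n = entries.size();
    graphemeIndex = new int[n];
    codeunitStart = new int[n];
    codeunitEnd = new int[n];
    for (int i = 0; i < n; i++) {
      graphemeIndex[i] = entries.get(i)[0];
      codeunitStart[i] = entries.get(i)[1];
      codeunitEnd[i] = entries.get(i)[2];
    }
  }

  public static FoldMappingSketch fold(String original) {
    BreakIterator it = BreakIterator.getCharacterInstance(Locale.ROOT);
    it.setText(original);
    StringBuilder sb = new StringBuilder();
    List<int[]> entries = new ArrayList<>();
    int graphemeIx = 0;
    int current = it.first();
    for (int next = it.next(); next != BreakIterator.DONE; next = it.next()) {
      String grapheme = original.substring(current, next);
      // Stand-in for case folding; it may lengthen the text (e.g. "ß" becomes "ss").
      String foldedGrapheme = grapheme.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
      sb.append(foldedGrapheme);
      // Every folded code unit points back at the grapheme it came from.
      for (int i = 0; i < foldedGrapheme.length(); i++) {
        entries.add(new int[] {graphemeIx, current, next});
      }
      graphemeIx++;
      current = next;
    }
    // Sentinel entry so that a query at folded.length() is answerable.
    entries.add(new int[] {graphemeIx, original.length(), original.length()});
    return new FoldMappingSketch(sb.toString(), entries);
  }
}
```

With these arrays, a match found at code units `[position, position + length)` of the folded text can be translated back to whole graphemes of the original by reading `codeunitStart[position]` and `codeunitEnd[position + length - 1]`, which is the lookup `findExtendedSpan` performs through `findGrapheme`.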
|
@ -9,20 +9,21 @@ package org.enso.base.text;
|
|||||||
* <p>Represents an empty span if start and end indices are equal. Such an empty span refers to the
|
* <p>Represents an empty span if start and end indices are equal. Such an empty span refers to the
|
||||||
* space just before the grapheme corresponding to index start.
|
* space just before the grapheme corresponding to index start.
|
||||||
*/
|
*/
|
||||||
public class GraphemeSpan {
|
public class GraphemeSpan extends Utf16Span {
|
||||||
|
|
||||||
public final long start, end;
|
public final int grapheme_start, grapheme_end;
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Constructs a span of characters (understood as extended grapheme clusters).
|
* Constructs a span of characters (understood as extended grapheme clusters).
|
||||||
*
|
* @param grapheme_start index of the first extended grapheme cluster contained within the span (or
|
||||||
* @param start index of the first extended grapheme cluster contained within the span (or
|
|
||||||
* location of the span if it is empty)
|
* location of the span if it is empty)
|
||||||
* @param end index of the first extended grapheme cluster after start that is not contained
|
* @param grapheme_end index of the first extended grapheme cluster after start that is not contained
|
||||||
* within the span
|
* @param codeunit_start code unit index of {@code grapheme_start}
|
||||||
|
* @param codeunit_end code unit index of {@code grapheme_end}
|
||||||
*/
|
*/
|
||||||
public GraphemeSpan(long start, long end) {
|
public GraphemeSpan(int grapheme_start, int grapheme_end, int codeunit_start, int codeunit_end) {
|
||||||
this.start = start;
|
super(codeunit_start, codeunit_end);
|
||||||
this.end = end;
|
this.grapheme_start = grapheme_start;
|
||||||
|
this.grapheme_end = grapheme_end;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -8,11 +8,11 @@ package org.enso.base.text;
|
|||||||
*/
|
*/
|
||||||
public class Utf16Span {
|
public class Utf16Span {
|
||||||
|
|
||||||
public final long start, end;
|
public final int codeunit_start, codeunit_end;
|
||||||
|
|
||||||
/** Constructs a span of UTF-16 code units. */
|
/** Constructs a span of UTF-16 code units. */
|
||||||
public Utf16Span(long start, long end) {
|
public Utf16Span(int codeunit_start, int codeunit_end) {
|
||||||
this.start = start;
|
this.codeunit_start = codeunit_start;
|
||||||
this.end = end;
|
this.codeunit_end = codeunit_end;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
@ -376,7 +376,7 @@ spec prefix table_builder supports_case_sensitive_columns pending=Nothing =
|
|||||||
expect_column_names ["bar", "foo_001", "foo_1", "Foo_2", "foo_3", "foo_21", "foo_100"] <| table.sort_columns (Sort_Method natural_order=True case_sensitive=Case_Insensitive.new)
|
expect_column_names ["bar", "foo_001", "foo_1", "Foo_2", "foo_3", "foo_21", "foo_100"] <| table.sort_columns (Sort_Method natural_order=True case_sensitive=Case_Insensitive.new)
|
||||||
expect_column_names ["foo_3", "foo_21", "foo_100", "foo_1", "foo_001", "bar", "Foo_2"] <| table.sort_columns (Sort_Method order=Sort_Order.Descending)
|
expect_column_names ["foo_3", "foo_21", "foo_100", "foo_1", "foo_001", "bar", "Foo_2"] <| table.sort_columns (Sort_Method order=Sort_Order.Descending)
|
||||||
|
|
||||||
Test.specify "should correctly handle case insensitive sorting" <|
|
Test.specify "should correctly handle case-insensitive sorting" <|
|
||||||
expect_column_names ["bar", "foo_001", "foo_1", "foo_100", "Foo_2", "foo_21", "foo_3"] <| table.sort_columns (Sort_Method case_sensitive=Case_Insensitive.new)
|
expect_column_names ["bar", "foo_001", "foo_1", "foo_100", "Foo_2", "foo_21", "foo_3"] <| table.sort_columns (Sort_Method case_sensitive=Case_Insensitive.new)
|
||||||
|
|
||||||
Test.specify "should correctly handle natural order sorting" <|
|
Test.specify "should correctly handle natural order sorting" <|
|
||||||
@ -412,7 +412,7 @@ spec prefix table_builder supports_case_sensitive_columns pending=Nothing =
|
|||||||
expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
|
expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
|
||||||
table.rename_columns (Column_Mapping.By_Name map (Text_Matcher True))
|
table.rename_columns (Column_Mapping.By_Name map (Text_Matcher True))
|
||||||
|
|
||||||
Test.specify "should work by name case insensitively" <|
|
Test.specify "should work by name case-insensitively" <|
|
||||||
map = Map.from_vector [["ALPHA", "FirstColumn"], ["DELTA", "Another"]]
|
map = Map.from_vector [["ALPHA", "FirstColumn"], ["DELTA", "Another"]]
|
||||||
expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
|
expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
|
||||||
table.rename_columns (Column_Mapping.By_Name map (Text_Matcher Case_Insensitive.new))
|
table.rename_columns (Column_Mapping.By_Name map (Text_Matcher Case_Insensitive.new))
|
||||||
|
@ -5,6 +5,7 @@ import Standard.Test
|
|||||||
import Standard.Base.Data.Text.Regex
|
import Standard.Base.Data.Text.Regex
|
||||||
import Standard.Base.Data.Text.Regex.Engine.Default as Default_Engine
|
import Standard.Base.Data.Text.Regex.Engine.Default as Default_Engine
|
||||||
import Standard.Base.Data.Text.Regex.Mode
|
import Standard.Base.Data.Text.Regex.Mode
|
||||||
|
import Standard.Base.Data.Text.Matching_Mode
|
||||||
import Standard.Base.Data.Text.Regex.Option as Global_Option
|
import Standard.Base.Data.Text.Regex.Option as Global_Option
|
||||||
from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
|
from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
|
||||||
|
|
||||||
@ -399,6 +400,11 @@ spec =
|
|||||||
match = pattern.replace input "REPLACED" mode=Mode.Full
|
match = pattern.replace input "REPLACED" mode=Mode.Full
|
||||||
match . should_equal "REPLACED"
|
match . should_equal "REPLACED"
|
||||||
|
|
||||||
|
Test.specify "should correctly replace entire input in Full mode even if partial matches are possible" <|
|
||||||
|
pattern = engine.compile "(aa)+" []
|
||||||
|
pattern.replace "aaa" "REPLACED" mode=Mode.Full . should_equal "aaa"
|
||||||
|
pattern.replace "aaaa" "REPLACED" mode=Mode.Full . should_equal "REPLACED"
|
||||||
|
|
||||||
Test.specify "should return the input for a full replace if the pattern doesn't match the entire input" <|
|
Test.specify "should return the input for a full replace if the pattern doesn't match the entire input" <|
|
||||||
pattern = engine.compile "(..)" []
|
pattern = engine.compile "(..)" []
|
||||||
input = "aa ab"
|
input = "aa ab"
|
||||||
@ -417,6 +423,35 @@ spec =
|
|||||||
match = pattern.replace input "REPLACED" mode=Mode.All
|
match = pattern.replace input "REPLACED" mode=Mode.All
|
||||||
match . should_equal "REPLACEDREPLACEDb"
|
match . should_equal "REPLACEDREPLACEDb"
|
||||||
|
|
||||||
|
Test.specify "should handle capture groups in replacement" <|
|
||||||
|
pattern = engine.compile "(?<capture>[a-z]+)" []
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=Mode.All . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=0 . should_equal "foo bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=1 . should_equal "[foo] bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=2 . should_equal "[foo] [bar], baz"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=3 . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=4 . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=Mode.First . should_equal "[foo] bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=Matching_Mode.Last . should_equal "foo bar, [baz]"
|
||||||
|
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=Mode.All . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=0 . should_equal "foo bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=1 . should_equal "[foo] bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=2 . should_equal "[foo] [bar], baz"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=3 . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=4 . should_equal "[foo] [bar], [baz]"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=Mode.First . should_equal "[foo] bar, baz"
|
||||||
|
pattern.replace "foo bar, baz" "[${capture}]" mode=Matching_Mode.Last . should_equal "foo bar, [baz]"
|
||||||
|
|
||||||
|
Test.specify "should handle capture groups in replacement in Full mode" <|
|
||||||
|
pattern = engine.compile "([a-z]+)" []
|
||||||
|
pattern.replace "foo bar, baz" "[$1]" mode=Mode.Full . should_equal "foo bar, baz"
|
||||||
|
pattern.replace "foo" "[$1]" mode=Mode.Full . should_equal "[foo]"
|
||||||
|
|
||||||
|
pattern_2 = engine.compile '<a href="(?<addr>.*?)">(?<name>.*?)</a>' []
|
||||||
|
pattern_2.replace '<a href="url">content</a>' "$2 <- $1" mode=Mode.Full . should_equal "content <- url"
|
||||||
|
pattern_2.replace '<a href="url">content</a>' "${name} <- ${addr}" mode=Mode.Full . should_equal "content <- url"
|
||||||
|
|
||||||
Test.group "Match.group" <|
|
Test.group "Match.group" <|
|
||||||
engine = Default_Engine.new
|
engine = Default_Engine.new
|
||||||
pattern = engine.compile "(.. .. )(?<letters>.+)()??(?<empty>)??" []
|
pattern = engine.compile "(.. .. )(?<letters>.+)()??(?<empty>)??" []
|
||||||
|
@ -52,10 +52,10 @@ spec =
|
|||||||
|
|
||||||
codeunits = Vector.new folded.getFoldedString.utf_16.length+1 ix->ix
|
codeunits = Vector.new folded.getFoldedString.utf_16.length+1 ix->ix
|
||||||
grapheme_ixes = codeunits.map ix->
|
grapheme_ixes = codeunits.map ix->
|
||||||
folded.codeUnitToGraphemeIndex ix
|
folded.findGrapheme ix . index
|
||||||
grapheme_ixes . should_equal [0, 0, 1, 2, 3, 3, 4, 4, 4, 5, 6]
|
grapheme_ixes . should_equal [0, 0, 1, 2, 3, 3, 4, 4, 4, 5, 6]
|
||||||
|
|
||||||
Test.expect_panic_with (folded.codeUnitToGraphemeIndex -1) Polyglot_Error
|
Test.expect_panic_with (folded.findGrapheme -1) Polyglot_Error
|
||||||
Test.expect_panic_with (folded.codeUnitToGraphemeIndex folded.getFoldedString.utf_16.length+1) Polyglot_Error
|
Test.expect_panic_with (folded.findGrapheme folded.getFoldedString.utf_16.length+1) Polyglot_Error
|
||||||
|
|
||||||
main = Test.Suite.run_main here.spec
|
main = Test.Suite.run_main here.spec
|
||||||
|
@ -942,7 +942,7 @@ spec =
|
|||||||
abc.location_of "" mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
|
abc.location_of "" mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
|
||||||
abc.location_of_all "" . should_equal [Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
|
abc.location_of_all "" . should_equal [Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
|
||||||
|
|
||||||
Test.specify "should allow case insensitive matching in location_of" <|
|
Test.specify "should allow case-insensitive matching in location_of" <|
|
||||||
hello = "Hello WORLD!"
|
hello = "Hello WORLD!"
|
||||||
case_insensitive = Text_Matcher Case_Insensitive.new
|
case_insensitive = Text_Matcher Case_Insensitive.new
|
||||||
hello.location_of "world" . should_equal Nothing
|
hello.location_of "world" . should_equal Nothing
|
||||||
@ -1022,6 +1022,13 @@ spec =
|
|||||||
abc.location_of_all "" matcher=regex . should_equal [Span (Range 0 0) abc, Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
|
abc.location_of_all "" matcher=regex . should_equal [Span (Range 0 0) abc, Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
|
||||||
abc.location_of "" matcher=regex mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
|
abc.location_of "" matcher=regex mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
|
||||||
|
|
||||||
|
Test.specify "should handle overlapping matches as shown in the examples" <|
|
||||||
|
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal (Span (Range 1 3) "aaa")
|
||||||
|
"aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal (Span (Range 0 2) "aaa")
|
||||||
|
|
||||||
|
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal (Span (Range 5 7) "aaa aaa")
|
||||||
|
"aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal (Span (Range 4 6) "aaa aaa")
|
||||||
|
|
||||||
Test.group "Regex matching" <|
|
Test.group "Regex matching" <|
|
||||||
Test.specify "should be possible on text" <|
|
Test.specify "should be possible on text" <|
|
||||||
match = "My Text: Goes Here".match "^My Text: (.+)$" mode=Regex_Mode.First
|
match = "My Text: Goes Here".match "^My Text: (.+)$" mode=Regex_Mode.First
|
||||||
@ -1179,35 +1186,144 @@ spec =
|
|||||||
splits.at 1 . should_equal "c"
|
splits.at 1 . should_equal "c"
|
||||||
splits.at 2 . should_equal "e"
|
splits.at 2 . should_equal "e"
|
||||||
|
|
||||||
Test.group "Regex replacement" <|
|
Test.group "Text.replace" <|
|
||||||
Test.specify "should be possible on text" <|
|
Test.specify "should work as in examples" <|
|
||||||
result = "ababab".replace "b" "a"
|
'aaa'.replace 'aa' 'b' . should_equal 'ba'
|
||||||
result . should_equal "aaaaaa"
|
"Hello World!".replace "[lo]" "#" matcher=Regex_Matcher . should_equal "He### W#r#d!"
|
||||||
|
"Hello World!".replace "l" "#" mode=Matching_Mode.First . should_equal "He#lo World!"
|
||||||
|
'"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' matcher=Regex_Matcher . should_equal '(abc) foo (bar) baz'
|
||||||
|
'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
|
||||||
|
'affib'.replace 'i' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
|
||||||
|
|
||||||
|
Test.specify "should correctly handle empty-string edge cases" <|
|
||||||
|
[Mode.All, Matching_Mode.First, Matching_Mode.Last] . each mode->
|
||||||
|
'aaa'.replace '' 'foo' mode=mode . should_equal 'aaa'
|
||||||
|
''.replace '' '' mode=mode . should_equal ''
|
||||||
|
'a'.replace 'a' '' mode=mode . should_equal ''
|
||||||
|
''.replace 'a' 'b' mode=mode . should_equal ''
|
||||||
|
|
||||||
|
'aba' . replace 'a' '' Matching_Mode.First . should_equal 'ba'
|
||||||
|
'aba' . replace 'a' '' Matching_Mode.Last . should_equal 'ab'
|
||||||
|
'aba' . replace 'a' '' . should_equal 'b'
|
||||||
|
'aba' . replace 'c' '' . should_equal 'aba'
|
||||||
|
|
||||||
|
Test.specify "should correctly handle first, all and last matching with overlapping occurrences" <|
|
||||||
|
"aaa aaa".replace "aa" "c" . should_equal "ca ca"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First . should_equal "ca aaa"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last . should_equal "aaa ac"
|
||||||
|
|
||||||
|
Test.specify "should correctly handle case-insensitive matches" <|
|
||||||
|
'AaąĄ' . replace "A" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal '--ąĄ'
|
||||||
|
'AaąĄ' . replace "A" "-" . should_equal '-aąĄ'
|
||||||
|
'HeLlO wOrLd' . replace 'hElLo' 'Hey,' matcher=(Text_Matcher True) . should_equal 'HeLlO wOrLd'
|
||||||
|
'HeLlO wOrLd' . replace 'hElLo' 'Hey,' matcher=(Text_Matcher Case_Insensitive) . should_equal 'Hey, wOrLd'
|
||||||
|
|
||||||
|
"Iiİı" . replace "i" "-" . should_equal "I-İı"
|
||||||
|
"Iiİı" . replace "I" "-" . should_equal "-iİı"
|
||||||
|
"Iiİı" . replace "İ" "-" . should_equal "Ii-ı"
|
||||||
|
"Iiİı" . replace "ı" "-" . should_equal "Iiİ-"
|
||||||
|
|
||||||
|
"Iiİı" . replace "i" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "--İı"
|
||||||
|
"Iiİı" . replace "I" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "--İı"
|
||||||
|
"Iiİı" . replace "İ" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "Ii-ı"
|
||||||
|
"Iiİı" . replace "ı" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "Iiİ-"
|
||||||
|
|
||||||
|
tr_insensitive = Text_Matcher (Case_Insensitive (Locale.new "tr"))
|
||||||
|
"Iiİı" . replace "i" "-" matcher=tr_insensitive . should_equal "I--ı"
|
||||||
|
"Iiİı" . replace "I" "-" matcher=tr_insensitive . should_equal "-iİ-"
|
||||||
|
"Iiİı" . replace "İ" "-" matcher=tr_insensitive . should_equal "I--ı"
|
||||||
|
"Iiİı" . replace "ı" "-" matcher=tr_insensitive . should_equal "-iİ-"
|
||||||
|
|
||||||
|
Test.specify "should correctly handle Unicode edge cases" <|
|
||||||
|
'sśs\u{301}' . replace 's' 'O' . should_equal 'Ośs\u{301}'
|
||||||
|
'sśs\u{301}' . replace 's' 'O' Matching_Mode.Last . should_equal 'Ośs\u{301}'
|
||||||
|
'śs\u{301}s' . replace 's' 'O' Matching_Mode.First . should_equal 'śs\u{301}O'
|
||||||
|
|
||||||
|
'sśs\u{301}' . replace 'ś' 'O' . should_equal 'sOO'
|
||||||
|
'sśs\u{301}' . replace 's\u{301}' 'O' . should_equal 'sOO'
|
||||||
|
|
||||||
|
'SŚS\u{301}' . replace 's' 'O' . should_equal 'SŚS\u{301}'
|
||||||
|
'SŚS\u{301}' . replace 's' 'O' Matching_Mode.Last . should_equal 'SŚS\u{301}'
|
||||||
|
'ŚS\u{301}S' . replace 's' 'O' Matching_Mode.First . should_equal 'ŚS\u{301}S'
|
||||||
|
|
||||||
|
'SŚS\u{301}' . replace 'ś' 'O' . should_equal 'SŚS\u{301}'
|
||||||
|
'SŚS\u{301}' . replace 's\u{301}' 'O' . should_equal 'SŚS\u{301}'
|
||||||
|
|
||||||
|
'SŚS\u{301}' . replace 's' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'OŚS\u{301}'
|
||||||
|
'SŚS\u{301}' . replace 's' 'O' Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'OŚS\u{301}'
|
||||||
|
'ŚS\u{301}S' . replace 's' 'O' Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'ŚS\u{301}O'
|
||||||
|
|
||||||
|
'SŚS\u{301}' . replace 'ś' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'SOO'
|
||||||
|
'SŚS\u{301}' . replace 's\u{301}' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'SOO'
|
||||||
|
|
||||||
|
'✨🚀🚧😍😃😍😎😙😉☺' . replace '🚧😍' '|-|:)' . should_equal '✨🚀|-|:)😃😍😎😙😉☺'
|
||||||
|
'Rocket Science' . replace 'Rocket' '🚀' . should_equal '🚀 Science'
|
||||||
|
|
||||||
Test.specify "should be possible on unicode text" <|
|
|
||||||
"Korean: 건반".replace "건반" "keyboard" . should_equal "Korean: keyboard"
|
"Korean: 건반".replace "건반" "keyboard" . should_equal "Korean: keyboard"
|
||||||
|
|
||||||
Test.specify "should be possible in ascii mode" <|
|
Test.specify "will approximate ligature matches" <|
|
||||||
result = "İiİ".replace "\w" "a" match_ascii=True
|
# TODO do we want to improve this? highly non-trivial for very rare edge cases
|
||||||
result . should_equal "İaİ"
|
## Currently we lack 'resolution' to extract a partial match from
|
||||||
|
the ligature to keep it, probably would need some special
|
||||||
|
mapping.
|
||||||
|
'ffiffi'.replace 'ff' 'aa' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aaaa'
|
||||||
|
'ffiffi'.replace 'ff' 'aa' mode=Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'aaffi'
|
||||||
|
'ffiffi'.replace 'ff' 'aa' mode=Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'ffiaa'
|
||||||
|
'affiffib'.replace 'IF' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
|
||||||
|
'aiffiffz' . replace 'if' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'a--fz'
|
||||||
|
'AFFIB'.replace 'ffi' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'A-B'
|
||||||
|
|
||||||
Test.specify "should be possible in case-insensitive mode" <|
|
'ß'.replace 'SS' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
|
||||||
result = "abaBa".replace "b" "a" case_insensitive=True
|
'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
|
||||||
result . should_equal "aaaaa"
|
'ß'.replace 'S' 'A' mode=Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
|
||||||
|
'ß'.replace 'S' 'A' mode=Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
|
||||||
|
'STRASSE'.replace 'ß' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'STRA-E'
|
||||||
|
|
||||||
Test.specify "should be possible in dot_matches_newline mode" <|
|
Test.specify "should perform simple replacement in Regex mode" <|
|
||||||
result = 'ab\na'.replace "b." "a" dot_matches_newline=True
|
"ababab".replace "b" "a" matcher=Regex_Matcher . should_equal "aaaaaa"
|
||||||
result . should_equal "aaa"
|
"ababab".replace "b" "a" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "aaabab"
|
||||||
|
"ababab".replace "b" "a" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ababaa"
|
||||||
|
|
||||||
|
"aaaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "cc"
|
||||||
|
"aaaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "caa"
|
||||||
|
"aaaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aac"
|
||||||
|
|
||||||
|
"aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca"
|
||||||
|
"aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca"
|
||||||
|
"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "ac"
|
||||||
|
"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ca"
|
||||||
|
|
||||||
|
"aaa aaa".replace "aa" "c" matcher=Text_Matcher . should_equal "ca ca"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Text_Matcher . should_equal "ca aaa"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "aaa ac"
|
||||||
|
"aaa aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca ca"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca aaa"
|
||||||
|
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
|
||||||
|
|
||||||
|
Test.specify "in Regex mode should work with Unicode" <|
|
||||||
|
"Korean: 건반".replace "건반" "keyboard" matcher=Regex_Matcher . should_equal "Korean: keyboard"
|
||||||
|
'sśs\u{301}'.replace 'ś' '-' matcher=Regex_Matcher . should_equal 's--'
|
||||||
|
'sśs\u{301}'.replace 's\u{301}' '-' matcher=Regex_Matcher . should_equal 's--'
|
||||||
|
|
||||||
|
Test.specify "in Regex mode should support various Regex options" <|
|
||||||
|
r1 = "İiİ".replace "\w" "a" matcher=(Regex_Matcher match_ascii=True)
|
||||||
|
r1 . should_equal "İaİ"
|
||||||
|
r2 = "abaBa".replace "b" "a" matcher=(Regex_Matcher case_sensitive=Case_Insensitive)
|
||||||
|
r2 . should_equal "aaaaa"
|
||||||
|
r3 = 'ab\na'.replace "b." "a" matcher=(Regex_Matcher dot_matches_newline=True)
|
||||||
|
r3 . should_equal "aaa"
|
||||||
|
|
||||||
Test.specify "should be possible in multiline mode" <|
|
|
||||||
text = """
|
text = """
|
||||||
Foo
|
Foo
|
||||||
bar
|
bar
|
||||||
result = text.replace '\n' "" multiline=True
|
r4 = text.replace '\n' "" matcher=(Regex_Matcher multiline=True)
|
||||||
result . should_equal "Foobar"
|
r4 . should_equal "Foobar"
|
||||||
|
|
||||||
Test.specify "should be possible in comments mode" <|
|
r5 = "ababd".replace "b\w # Replacing a `b` followed by any word character" "a" matcher=(Regex_Matcher comments=True)
|
||||||
result = "ababd".replace "b\w # Replacing a `b` followed by any word character" "a" comments=True
|
r5 . should_equal "aaa"
|
||||||
result . should_equal "aaa"
|
|
||||||
|
Test.specify "in Regex mode should allow referring to capture groups in substitutions" <|
|
||||||
|
'<a href="url">content</a>'.replace '<a href="(.*?)">(.*?)</a>' '$2 is at $1' matcher=Regex_Matcher . should_equal 'content is at url'
|
||||||
|
'<a href="url">content</a>'.replace '<a href="(?<address>.*?)">(?<text>.*?)</a>' '${text} is at ${address}' matcher=Regex_Matcher . should_equal 'content is at url'
|
||||||
|
|
||||||
main = Test.Suite.run_main here.spec
|
main = Test.Suite.run_main here.spec
|
||||||
|
@ -28,7 +28,7 @@ spec =
|
|||||||
table = Table.from_rows header [row_1]
|
table = Table.from_rows header [row_1]
|
||||||
expect table '{"df_color":["red"],"df_label":["name"],"df_latitude":[11],"df_longitude":[10],"df_radius":[195]}'
|
expect table '{"df_color":["red"],"df_label":["name"],"df_latitude":[11],"df_longitude":[10],"df_radius":[195]}'
|
||||||
|
|
||||||
Test.specify "is case insensitive" <|
|
Test.specify "is case-insensitive" <|
|
||||||
header = ['latitude' , 'LONGITUDE' , 'LaBeL']
|
header = ['latitude' , 'LONGITUDE' , 'LaBeL']
|
||||||
row_1 = [11 , 10 , 09 ]
|
row_1 = [11 , 10 , 09 ]
|
||||||
row_2 = [21 , 20 , 19 ]
|
row_2 = [21 , 20 , 19 ]
|
||||||
|
@ -46,7 +46,7 @@ spec =
|
|||||||
table = Table.from_rows header [row_1, row_2]
|
table = Table.from_rows header [row_1, row_2]
|
||||||
expect table 'value' [10,20]
|
expect table 'value' [10,20]
|
||||||
|
|
||||||
Test.specify "is case insensitive" <|
|
Test.specify "is case-insensitive" <|
|
||||||
header = ['α', 'Value']
|
header = ['α', 'Value']
|
||||||
row_1 = [11 , 10 ]
|
row_1 = [11 , 10 ]
|
||||||
row_2 = [21 , 20 ]
|
row_2 = [21 , 20 ]
|
||||||
|
@ -49,7 +49,7 @@ spec =
|
|||||||
table = Table.from_rows header [row_1]
|
table = Table.from_rows header [row_1]
|
||||||
expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
|
expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
|
||||||
|
|
||||||
Test.specify "is case insensitive" <|
|
Test.specify "is case-insensitive" <|
|
||||||
header = ['X' , 'Y' , 'Size' , 'Shape' , 'Label' , 'Color' ]
|
header = ['X' , 'Y' , 'Size' , 'Shape' , 'Label' , 'Color' ]
|
||||||
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
|
row_1 = [11 , 10 , 50 , 'square' , 'label' , 'ff0000']
|
||||||
table = Table.from_rows header [row_1]
|
table = Table.from_rows header [row_1]
|
||||||
|