Data analysts should be able to use Text.replace to substitute parts of the text (#3393)

Implements https://www.pivotaltracker.com/story/show/181266274
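
For reviewers, a minimal Java sketch of the span-based replacement and of why `Matching_Mode.Last` can differ between the two matchers on overlapping occurrences. It is an illustration under stated assumptions, not the code in this commit: `ReplaceSketch`, `Span`, `replaceLastText` and `replaceLastRegex` are hypothetical names, `String.lastIndexOf` stands in for the search done in `Text_Utils`, and the regex replacement is inserted literally (no `$1` group expansion).

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceSketch {
  /** Half-open span [start, end) of UTF-16 code units; a hypothetical stand-in for Utf16Span. */
  static final class Span {
    final int start, end;

    Span(int start, int end) {
      this.start = start;
      this.end = end;
    }
  }

  /** Replaces every span with the replacement; spans are assumed sorted and non-overlapping. */
  public static String replaceSpans(String str, List<Span> spans, String replacement) {
    StringBuilder sb = new StringBuilder();
    int current = 0;
    for (Span span : spans) {
      if (span.start > current) {
        // Copy the untouched text between the previous span and this one.
        sb.append(str, current, span.start);
      }
      sb.append(replacement);
      current = span.end;
    }
    // Copy whatever remains after the last span.
    sb.append(str, current, str.length());
    return sb.toString();
  }

  /** Exact-text mode (assumed behaviour): the last occurrence is searched for from the back. */
  public static String replaceLastText(String str, String term, String replacement) {
    if (term.isEmpty()) return str;
    int ix = str.lastIndexOf(term);
    if (ix < 0) return str;
    return replaceSpans(str, List.of(new Span(ix, ix + term.length())), replacement);
  }

  /** Regex mode: matches are generated from the front and the last one is kept. */
  public static String replaceLastRegex(String str, String regex, String replacement) {
    Matcher matcher = Pattern.compile(regex).matcher(str);
    Span last = null;
    while (matcher.find()) {
      last = new Span(matcher.start(), matcher.end());
    }
    if (last == null) return str;
    return replaceSpans(str, List.of(last), replacement);
  }

  public static void main(String[] args) {
    System.out.println(replaceLastText("aaa", "aa", "c")); // ac
    System.out.println(replaceLastRegex("aaa", "aa", "c")); // ca
  }
}
```

In `"aaa"` the regex scan consumes the first `aa` and finds no further match, so that first match is also the last one, while the backward text search picks the occurrence starting at index 1.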
2024-12-23 21:12:44 +03:00 · 2022-04-13 21:21:47 +02:00 · 2022-04-13 21:21:47 +02:00 · 0ea5dc2a6f
commit 0ea5dc2a6f
parent 0ab46bc6f8
15 changed files with 448 additions and 141 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -105,6 +105,7 @@
 - [Implemented `Text.reverse`][3377]
 - [Implemented support for most Table aggregations in the Database
  backend.][3383]
 - [Update `Text.replace` to new API.][3393]
 [debug-shortcuts]:
  https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@ -160,6 +161,7 @@
 [3383]: https://github.com/enso-org/enso/pull/3383
 [3385]: https://github.com/enso-org/enso/pull/3385
 [3392]: https://github.com/enso-org/enso/pull/3392
 [3393]: https://github.com/enso-org/enso/pull/3393
 #### Enso Compiler
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Extensions.enso
@ -424,52 +424,21 @@ Text.split separator=Split_Kind.Whitespace mode=Mode.All match_ascii=Nothing cas
            pattern.split this mode=mode
 ## ALIAS Replace Text
-
+   Replaces the first, last, or all occurrences of term with new_text in the
-   Replaces each occurrence of `old_sequence` with `new_sequence`, returning
+   input. If `term` is empty, the function returns the input unchanged.
   `this` unchanged if no matches are found.
   Arguments:
-   - old_sequence: The pattern to search for in `this`.
+   - term: The term to find.
-   - new_sequence: The text to replace every occurrence of `old_sequence` with.
+   - new_text: The new text to replace occurrences of `term` with.
-   - mode: This argument specifies how many matches the engine will try to
+     If `matcher` is a `Regex_Matcher`, `new_text` can include replacement
-     replace.
+     patterns (such as `$<n>`) for a marked group.
-   - match_ascii: Enables or disables pure-ASCII matching for the regex. If you
+   - mode: Specifies which instances of term the engine tries to find. When the
-     know your data only contains ASCII then you can enable this for a
+     mode is `First` or `Last`, this method replaces the first or last instance
-     performance boost on some regex engines.
+     of term in the input. If set to `All`, it replaces all instances of term in
-   - case_insensitive: Enables or disables case-insensitive matching. Case
+     the input.
-     insensitive matching behaves as if it normalises the case of all input
+   - matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
-     text before matching on it.
+     rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
-   - dot_matches_newline: Enables or disables the dot matches newline option.
+     regular expression and matched using the associated options.
     This specifies that the `.` special character should match everything
     _including_ newline characters. Without this flag, it will match all
     characters _except_ newlines.
   - multiline: Enables or disables the multiline option. Multiline specifies
     that the `^` and `$` pattern characters match the start and end of lines,
     as well as the start and end of the input respectively.
   - comments: Enables or disables the comments mode for the regular expression.
     In comments mode, the following changes apply:
     - Whitespace within the pattern is ignored, except when within a
       character class or when preceded by an unescaped backslash, or within
       grouping constructs (e.g. `(?...)`).
     - When a line contains a `#`, that is not in a character class and is not
       preceded by an unescaped backslash, all characters from the leftmost
       such `#` to the end of the line are ignored. That is to say, they act
       as _comments_ in the regex.
   - extra_opts: Specifies additional options in a vector. This allows options
     to be supplied and computed without having to break them out into arguments
     to the function. Where these overlap with one of the flags (`match_ascii`,
     `case_insensitive`, `dot_matches_newline`, `multiline` and `verbose`), the
     flags take precedence.
   ! Boolean Flags and Extra Options
     This function contains a number of arguments that are boolean flags that
     enable or disable common options for the regex. At the same time, it also
     provides the ability to specify options in the `extra_opts` argument.
     Where one of the flags is _set_ (has the value `True` or `False`), the
     value of the flag takes precedence over the value in `extra_opts` when
     merging the options to the engine. The flags are _unset_ (have value
     `Nothing`) by default.
   > Example
     Replace letters in the text "aaa".
@ -477,15 +446,87 @@ Text.split separator=Split_Kind.Whitespace mode=Mode.All match_ascii=Nothing cas
         'aaa'.replace 'aa' 'b' == 'ba'
   > Example
-     Replace every word of two letters or less with the string "SMOL".
+     Replace all occurrences of letters 'l' and 'o' with '#'.
-         example_replace =
+         "Hello World!".replace "[lo]" "#" matcher=Regex_Matcher == "He### W#r#d!"
-             text = "I am a very smol word."
+
-             text.replace "\w\w(?!\w)"
+   > Example
-Text.replace : Text | Engine.Pattern -> Text -> Mode.Mode -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Boolean | Nothing -> Vector.Vector Option.Option -> Text
+     Replace the first occurrence of letter 'l' with '#'.
-Text.replace old_sequence new_sequence mode=Mode.All match_ascii=Nothing case_insensitive=Nothing dot_matches_newline=Nothing multiline=Nothing comments=Nothing extra_opts=[] =
+
-    compiled_pattern = Regex.compile old_sequence match_ascii=match_ascii case_insensitive=case_insensitive dot_matches_newline=dot_matches_newline multiline=multiline comments=comments extra_opts=extra_opts
+         "Hello World!".replace "l" "#" mode=Matching_Mode.First == "He#lo World!"
-    compiled_pattern.replace this new_sequence mode
+
   > Example
     Replace texts in quotes with parentheses.
          '"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' matcher=Regex_Matcher == '(abc) foo (bar) baz'
   ! Matching Grapheme Clusters
     In case-insensitive mode, a single character can match multiple characters,
     for example `ß` will match `ss` and `SS`, and the ligature `ﬃ` will match
     `ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
     match only a part of some single grapheme cluster, for example in the text
     `ﬃa` the term `ia` will match just one-third of the first grapheme `ﬃ`.
     Since we do not have the resolution to distinguish such partial matches, a
     match which matched just a part of some grapheme cluster is extended and
     treated as if it matched the whole grapheme cluster. Thus the whole
     grapheme cluster may be replaced with the replacement text even if just a
     part of it was matched.
   > Example
     Extended partial matches in case-insensitive mode.
          # The ß symbol matches the letter `S` twice in case-insensitive mode, because it folds to `ss`.
         'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
         # The 'ﬃ' ligature is a single grapheme cluster, so even if just a part of it is matched, the whole grapheme is replaced.
         'aﬃb'.replace 'i' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
   ! Last Match in Regex Mode
     Regex always performs the search from the front and matching the last
     occurrence means selecting the last of the matches while still generating
     matches from the beginning. This will lead to slightly different behavior
     for overlapping occurrences of a pattern in Regex mode than in exact text
     matching mode where the matches are searched for from the back.
   > Example
     Comparing matching in `Last` mode between Regex and Text matchers.
         "aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "ac"
         "aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ca"
         "aaa aaa".replace "aa" "c" matcher=Text_Matcher . should_equal "ca ca"
         "aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Text_Matcher . should_equal "ca aaa"
         "aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "aaa ac"
         "aaa aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca ca"
         "aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca aaa"
         "aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
 Text.replace : Text -> Text -> (Matching_Mode.First | Matching_Mode.Last | Mode.All) -> (Text_Matcher | Regex_Matcher) -> Text
 Text.replace term="" new_text="" mode=Mode.All matcher=Text_Matcher = if term.is_empty then this else
    case matcher of
        Text_Matcher case_sensitivity ->
            array_from_single_result result = case result of
                Nothing -> Array.empty
                _ -> Array.new_1 result
            spans_array = case case_sensitivity of
                True -> case mode of
                    Mode.All ->
                        Text_Utils.span_of_all this term
                    Matching_Mode.First ->
                        array_from_single_result <| Text_Utils.span_of this term
                    Matching_Mode.Last ->
                        array_from_single_result <| Text_Utils.last_span_of this term
                Case_Insensitive locale -> case mode of
                    Mode.All ->
                        Text_Utils.span_of_all_case_insensitive this term locale.java_locale
                    Matching_Mode.First ->
                        array_from_single_result <|
                            Text_Utils.span_of_case_insensitive this term locale.java_locale False
                    Matching_Mode.Last ->
                        array_from_single_result <|
                            Text_Utils.span_of_case_insensitive this term locale.java_locale True
            Text_Utils.replace_spans this spans_array new_text
        Regex_Matcher _ _ _ _ _ ->
            compiled_pattern = matcher.compile term
            compiled_pattern.replace this new_text mode=mode
 ## ALIAS Get Words
@ -1223,16 +1264,16 @@ Text.trim where=Location.Both what=_.is_whitespace =
     which contains both the start and end indices, allowing to determine the
     length of the match. This is useful not only with regex matches (where a
     regular expression can have matches of various lengths) but also for case
-     insensitive matching. In case insensitive mode, a single character can
+     insensitive matching. In case-insensitive mode, a single character can
     match multiple characters, for example `ß` will match `ss` and `SS`, and
-     the ligature `ﬃ` will match `ffi` or `f` etc. Thus in case insensitive
+     the ligature `ﬃ` will match `ffi` or `f` etc. Thus in case-insensitive
     mode, the length of the match can be shorter or longer than the term that
     was being matched, so it is extremely important to not rely on the length
     of the matched term when analysing the matches as they may have different
     lengths.
   > Example
-     Match length differences in case insensitive matching.
+     Match length differences in case-insensitive matching.
         term = "straße"
         text = "MONUMENTENSTRASSE 42"
@ -1241,7 +1282,7 @@ Text.trim where=Location.Both what=_.is_whitespace =
         match.length == 7
   ! Matching Grapheme Clusters
-     In case insensitive mode, a single character can match multiple characters,
+     In case-insensitive mode, a single character can match multiple characters,
     for example `ß` will match `ss` and `SS`, and the ligature `ﬃ` will match
     `ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
     match only a part of some single grapheme cluster, for example in the text
@ -1266,6 +1307,22 @@ Text.trim where=Location.Both what=_.is_whitespace =
         match_2.length == 2
         # After being extended to full grapheme clusters, both terms "IFF" and "ffiffl" match the same span of grapheme clusters.
         match_1 == match_2
   ! Last Match in Regex Mode
     Regex always performs the search from the front and matching the last
     occurrence means selecting the last of the matches while still generating
     matches from the beginning. This will lead to slightly different behavior
     for overlapping occurrences of a pattern in Regex mode than in exact text
     matching mode where the matches are searched for from the back.
   > Example
     Comparing matching in `Last` mode between Regex and Text matchers.
         "aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 1 3) "aaa"
         "aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 0 2) "aaa"
         "aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher == Span (Range 5 7) "aaa aaa"
         "aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher == Span (Range 4 6) "aaa aaa"
 Text.location_of : Text -> (Matching_Mode.First | Matching_Mode.Last) -> Matcher -> Span | Nothing
 Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = case matcher of
    Text_Matcher case_sensitive -> case case_sensitive of
@ -1274,7 +1331,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
                Matching_Mode.First -> Text_Utils.span_of this term
                Matching_Mode.Last -> Text_Utils.last_span_of this term
            if codepoint_span.is_nothing then Nothing else
-                start = Text_Utils.utf16_index_to_grapheme_index this codepoint_span.start
+                start = Text_Utils.utf16_index_to_grapheme_index this codepoint_span.codeunit_start
                ## While the codepoint_span may have different code unit length
                   from our term, the `length` counted in grapheme clusters is
                   guaranteed to be the same.
@ -1293,7 +1350,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
                case Text_Utils.span_of_case_insensitive this term locale.java_locale search_for_last of
                    Nothing -> Nothing
                    grapheme_span ->
-                        Span (Range grapheme_span.start grapheme_span.end) this
+                        Span (Range grapheme_span.grapheme_start grapheme_span.grapheme_end) this
    Regex_Matcher _ _ _ _ _ -> case mode of
        Matching_Mode.First ->
            case matcher.compile term . match this Mode.First of
@ -1332,16 +1389,16 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
     which contains both the start and end indices, allowing to determine the
     length of the match. This is useful not only with regex matches (where a
     regular expression can have matches of various lengths) but also for case
-     insensitive matching. In case insensitive mode, a single character can
+     insensitive matching. In case-insensitive mode, a single character can
     match multiple characters, for example `ß` will match `ss` and `SS`, and
-     the ligature `ﬃ` will match `ffi` or `f` etc. Thus in case insensitive
+     the ligature `ﬃ` will match `ffi` or `f` etc. Thus in case-insensitive
     mode, the length of the match can be shorter or longer than the term that
     was being matched, so it is extremely important to not rely on the length
     of the matched term when analysing the matches as they may have different
     lengths.
   > Example
-     Match length differences in case insensitive matching.
+     Match length differences in case-insensitive matching.
         term = "strasse"
         text = "MONUMENTENSTRASSE ist eine große Straße."
@ -1350,7 +1407,7 @@ Text.location_of term="" mode=Matching_Mode.First matcher=Text_Matcher.new = cas
         match . map .length == [7, 6]
   ! Matching Grapheme Clusters
-     In case insensitive mode, a single character can match multiple characters,
+     In case-insensitive mode, a single character can match multiple characters,
     for example `ß` will match `ss` and `SS`, and the ligature `ﬃ` will match
     `ffi` or `f` etc. Thus in this mode, it is sometimes possible for a term to
     match only a part of some single grapheme cluster, for example in the text
@ -1374,7 +1431,7 @@ Text.location_of_all term="" matcher=Text_Matcher.new = case matcher of
    Text_Matcher case_sensitive -> if term.is_empty then Vector.new (this.length + 1) (ix -> Span (Range ix ix) this) else case case_sensitive of
        True ->
            codepoint_spans = Vector.Vector <| Text_Utils.span_of_all this term
-            grahpeme_ixes = Vector.Vector <| Text_Utils.utf16_indices_to_grapheme_indices this (codepoint_spans.map .start).to_array
+            grahpeme_ixes = Vector.Vector <| Text_Utils.utf16_indices_to_grapheme_indices this (codepoint_spans.map .codeunit_start).to_array
            ## While the codepoint_spans may have different code unit lengths
               from our term, the `length` counted in grapheme clusters is
               guaranteed to be the same.
@ -1385,7 +1442,7 @@ Text.location_of_all term="" matcher=Text_Matcher.new = case matcher of
        Case_Insensitive locale ->
            grapheme_spans = Vector.Vector <| Text_Utils.span_of_all_case_insensitive this term locale.java_locale
            grapheme_spans.map grapheme_span->
-                Span (Range grapheme_span.start grapheme_span.end) this
+                Span (Range grapheme_span.grapheme_start grapheme_span.grapheme_end) this
    Regex_Matcher _ _ _ _ _ ->
        case matcher.compile term . match this Mode.All of
            Nothing -> []
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Regex/Engine/Default.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Regex/Engine/Default.enso
@ -39,6 +39,7 @@ import Standard.Base.Data.Text.Regex
 import Standard.Base.Data.Text.Regex.Engine
 import Standard.Base.Data.Text.Regex.Option as Global_Option
 import Standard.Base.Data.Text.Regex.Mode
 import Standard.Base.Data.Text.Matching_Mode
 import Standard.Base.Polyglot.Java as Java_Ext
 from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
@ -533,7 +534,7 @@ type Pattern
                 pattern = engine.compile "aa" []
                 input = "aabbaabbbbbaab"
                 pattern.replace input "REPLACED"
-    replace : Text -> Text -> (Mode.First | Integer | Mode.All | Mode.Full) -> Text
+    replace : Text -> Text -> (Mode.First | Integer | Mode.All | Mode.Full | Matching_Mode.Last) -> Text
    replace input replacement mode=Mode.All =
        do_replace_mode mode start end = case mode of
            Mode.First ->
@ -559,8 +560,26 @@ type Pattern
                internal_matcher.replaceAll replacement
            Mode.Full ->
                case this.match input mode=Mode.Full of
-                    Match _ _ _ _ -> replacement
+                    Match _ _ _ _ -> this.replace input replacement Mode.First
                    Nothing -> input
            Matching_Mode.Last ->
                all_matches = this.match input
                all_matches_count = if all_matches.is_nothing then 0 else all_matches.length
                if all_matches_count == 0 then input else
                    internal_matcher = this.build_matcher input start end
                    buffer = StringBuffer.new
                    last_match_index = all_matches_count - 1
                    go match_index =
                        internal_matcher.find
                        case match_index == last_match_index of
                            True -> internal_matcher.appendReplacement buffer replacement
                            False -> @Tail_Call go (match_index + 1)
                    go 0
                    internal_matcher.appendTail buffer
                    buffer.to_text
            Mode.Bounded _ _ _ -> Panic.throw <|
                Mode_Error "Modes cannot be recursive."
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Text_Sub_Range.enso
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Text_Sub_Range.enso
@ -81,22 +81,22 @@ type Text_Sub_Range
                if delimiter.is_empty then (Range 0 0) else
                    span = Text_Utils.span_of text delimiter
                    if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
-                        (Range 0 span.start)
+                        (Range 0 span.codeunit_start)
            Before_Last delimiter ->
                if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
                    span = Text_Utils.last_span_of text delimiter
                    if span.is_nothing then (Range 0 (Text_Utils.char_length text)) else
-                        (Range 0 span.start)
+                        (Range 0 span.codeunit_start)
            After delimiter ->
                if delimiter.is_empty then (Range 0 (Text_Utils.char_length text)) else
                    span = Text_Utils.span_of text delimiter
                    if span.is_nothing then (Range 0 0) else
-                        (Range span.end (Text_Utils.char_length text))
+                        (Range span.codeunit_end (Text_Utils.char_length text))
            After_Last delimiter ->
                if delimiter.is_empty then (Range 0 0) else
                    span = Text_Utils.last_span_of text delimiter
                    if span.is_nothing then (Range 0 0) else
-                        (Range span.end (Text_Utils.char_length text))
+                        (Range span.codeunit_end (Text_Utils.char_length text))
            While predicate ->
                indices = find_sub_range_end text _-> start-> end->
                    predicate (Text_Utils.substring text start end) . not
--- a/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
+++ b/std-bits/base/src/main/java/org/enso/base/Text_Utils.java
@ -12,6 +12,7 @@ import java.util.List;
 import java.util.Locale;
 import java.util.regex.Pattern;
 import org.enso.base.text.CaseFoldedString;
 import org.enso.base.text.CaseFoldedString.Grapheme;
 import org.enso.base.text.GraphemeSpan;
 import org.enso.base.text.Utf16Span;
@ -231,19 +232,6 @@ public class Text_Utils {
    return CaseFoldedString.simpleFold(string, locale);
  }
  /**
   * Replaces all occurrences of {@code oldSequence} within {@code str} with {@code newSequence}.
   *
   * @param str the string to process
   * @param oldSequence the substring that is searched for and will be replaced
   * @param newSequence the string that will replace occurrences of {@code oldSequence}
   * @return {@code str} with all occurrences of {@code oldSequence} replaced with {@code
   *     newSequence}
   */
  public static String replace(String str, String oldSequence, String newSequence) {
    return str.replace(oldSequence, newSequence);
  }
  /**
   * Gets the length of char array of a string
   *
@ -306,7 +294,7 @@ public class Text_Utils {
    StringSearch search = new StringSearch(needle, haystack);
    ArrayList<Utf16Span> occurrences = new ArrayList<>();
-    long ix;
+    int ix;
    while ((ix = search.next()) != StringSearch.DONE) {
      occurrences.add(new Utf16Span(ix, ix + search.getMatchLength()));
    }
@ -456,13 +444,21 @@ public class Text_Utils {
   * @return a minimal {@code GraphemeSpan} which contains all code units from the match
   */
  private static GraphemeSpan findExtendedSpan(CaseFoldedString string, int position, int length) {
-    int firstGrapheme = string.codeUnitToGraphemeIndex(position);
+    Grapheme firstGrapheme = string.findGrapheme(position);
    if (length == 0) {
-      return new GraphemeSpan(firstGrapheme, firstGrapheme);
+      return new GraphemeSpan(
          firstGrapheme.index,
          firstGrapheme.index,
          firstGrapheme.codeunit_start,
          firstGrapheme.codeunit_start);
    } else {
-      int lastGrapheme = string.codeUnitToGraphemeIndex(position + length - 1);
+      Grapheme lastGrapheme = string.findGrapheme(position + length - 1);
-      int endGrapheme = lastGrapheme + 1;
+      int endGraphemeIndex = lastGrapheme.index + 1;
-      return new GraphemeSpan(firstGrapheme, endGrapheme);
+      return new GraphemeSpan(
          firstGrapheme.index,
          endGraphemeIndex,
          firstGrapheme.codeunit_start,
          lastGrapheme.codeunit_end);
    }
  }
@ -485,4 +481,30 @@ public class Text_Utils {
  public static boolean is_all_whitespace(String text) {
    return text.codePoints().allMatch(UCharacter::isUWhiteSpace);
  }
  /**
   * Replaces all provided spans within the text with {@code newSequence}.
   *
   * @param str the string to process
   * @param spans the spans to replace; the spans should be sorted by their starting point in the
   *     non-decreasing order; the behaviour is undefined if these requirements are not satisfied.
   * @param newSequence the string that will replace the spans
   * @return {@code str} with all provided spans replaced with {@code newSequence}
   */
  public static String replace_spans(String str, List<Utf16Span> spans, String newSequence) {
    StringBuilder sb = new StringBuilder();
    int current_ix = 0;
    for (Utf16Span span : spans) {
      if (span.codeunit_start > current_ix) {
        sb.append(str, current_ix, span.codeunit_start);
      }
      sb.append(newSequence);
      current_ix = span.codeunit_end;
    }
    // Add the remaining part of the string (if any).
    sb.append(str, current_ix, str.length());
    return sb.toString();
  }
 }
--- a/std-bits/base/src/main/java/org/enso/base/text/CaseFoldedString.java
+++ b/std-bits/base/src/main/java/org/enso/base/text/CaseFoldedString.java
@ -13,6 +13,20 @@ import java.util.Locale;
 * indices back in the original string.
 */
 public class CaseFoldedString {
  public static class Grapheme {
    /** The grapheme index of the given grapheme in the string. */
    public final int index;
    /** The codeunit indices of start and end of the given grapheme in the original string. */
    public final int codeunit_start, codeunit_end;
    public Grapheme(int index, int codeunit_start, int codeunit_end) {
      this.index = index;
      this.codeunit_start = codeunit_start;
      this.codeunit_end = codeunit_end;
    }
  }
  private final String foldedString;
  /**
@ -24,33 +38,67 @@ public class CaseFoldedString {
   */
  private final int[] graphemeIndexMapping;
  /**
   * A mapping from code units in the transformed string to the first code-unit of the corresponding
   * grapheme in the original string.
   *
   * <p>The mapping must be valid for indices from 0 to {@code foldedString.length()+1}
   * (inclusive).
   */
  private final int[] codeunitStartIndexMapping;
  /**
   * A mapping from code units in the transformed string to the end code-unit of the corresponding
   * grapheme in the original string.
   *
   * <p>The mapping must be valid for indices from 0 to {@code foldedString.length()+1}
   * (inclusive).
   */
  private final int[] codeunitEndIndexMapping;
  /**
   * Constructs a new instance of the folded string.
   *
   * @param foldeString the string after applying the case folding transformation
   * @param graphemeIndexMapping a mapping created during the transformation which maps code units
   *     in the transformed string to their corresponding graphemes in the original string
   * @param codeunitStartIndexMapping a mapping created during the transformation which maps code
   *     units in the transformed string to first codeunits of corresponding graphemes in the
   *     original string
   * @param codeunitEndIndexMapping a mapping created during the transformation which maps code
   *     units in the transformed string to end codeunits of corresponding graphemes in the original
   *     string
   */
-  private CaseFoldedString(String foldeString, int[] graphemeIndexMapping) {
+  private CaseFoldedString(
      String foldeString,
      int[] graphemeIndexMapping,
      int[] codeunitStartIndexMapping,
      int[] codeunitEndIndexMapping) {
    this.foldedString = foldeString;
    this.graphemeIndexMapping = graphemeIndexMapping;
    this.codeunitStartIndexMapping = codeunitStartIndexMapping;
    this.codeunitEndIndexMapping = codeunitEndIndexMapping;
  }
  /**
-   * Maps a code unit in the folded string to the corresponding grapheme in the original string.
+   * Finds the grapheme corresponding to a code unit in the folded string.
   *
   * @param codeunitIndex the index of the code unit in the folded string, valid indices range from
   *     0 to {@code getFoldedString().length()+1} (inclusive), allowing to also ask for the
   *     position of the end code unit which is located right after the end of the string - which
   *     should always map to the analogous end grapheme.
-   * @return the index of the grapheme from the original string that after applying the
+   * @return the grapheme (index and code unit bounds) from the original string that after
-   *     transformation contains the requested code unit
+   *     applying the transformation contains the requested code unit
   */
-  public int codeUnitToGraphemeIndex(int codeunitIndex) {
+  public Grapheme findGrapheme(int codeunitIndex) {
    if (codeunitIndex < 0 || codeunitIndex > this.foldedString.length()) {
      throw new IndexOutOfBoundsException(codeunitIndex);
    }
-    return graphemeIndexMapping[codeunitIndex];
+
    return new Grapheme(
        graphemeIndexMapping[codeunitIndex],
        codeunitStartIndexMapping[codeunitIndex],
        codeunitEndIndexMapping[codeunitIndex]);
  }
  /** Returns the transformed string. */
@ -74,7 +122,9 @@ public class CaseFoldedString {
    breakIterator.setText(charSequence);
    StringBuilder stringBuilder = new StringBuilder(charSequence.length());
    Fold foldAlgorithm = caseFoldAlgorithmForLocale(locale);
-    IntArrayBuilder index_mapping = new IntArrayBuilder(charSequence.length() + 1);
+    IntArrayBuilder grapheme_mapping = new IntArrayBuilder(charSequence.length() + 1);
    IntArrayBuilder codeunit_start_mapping = new IntArrayBuilder(charSequence.length() + 1);
    IntArrayBuilder codeunit_end_mapping = new IntArrayBuilder(charSequence.length() + 1);
    // We rely on the fact that ICU Case Folding is _not_ context-sensitive, i.e. the mapping of
    // each grapheme cluster is independent of surrounding ones. Regular casing is
@ -87,7 +137,9 @@ public class CaseFoldedString {
      String foldedGrapheme = foldAlgorithm.apply(grapheme);
      stringBuilder.append(foldedGrapheme);
      for (int i = 0; i < foldedGrapheme.length(); ++i) {
-        index_mapping.add(grapheme_index);
+        grapheme_mapping.add(grapheme_index);
        codeunit_start_mapping.add(current);
        codeunit_end_mapping.add(next);
      }
      grapheme_index++;
@ -96,10 +148,13 @@ public class CaseFoldedString {
    // The mapping should also be able to handle a {@code str.length()} query, so we add one more
    // element to the mapping pointing to a non-existent grapheme after the end of the text.
-    index_mapping.add(grapheme_index);
+    grapheme_mapping.add(grapheme_index);
    return new CaseFoldedString(
-        stringBuilder.toString(), index_mapping.unsafeGetStorageAndInvalidateTheBuilder());
+        stringBuilder.toString(),
        grapheme_mapping.unsafeGetStorageAndInvalidateTheBuilder(),
        codeunit_start_mapping.unsafeGetStorageAndInvalidateTheBuilder(),
        codeunit_end_mapping.unsafeGetStorageAndInvalidateTheBuilder());
  }
  /**
--- a/std-bits/base/src/main/java/org/enso/base/text/GraphemeSpan.java
+++ b/std-bits/base/src/main/java/org/enso/base/text/GraphemeSpan.java
@ -9,20 +9,21 @@ package org.enso.base.text;
 * <p>Represents an empty span if start and end indices are equal. Such an empty span refers to the
 * space just before the grapheme corresponding to index start.
 */
-public class GraphemeSpan {
+public class GraphemeSpan extends Utf16Span {
-  public final long start, end;
+  public final int grapheme_start, grapheme_end;
  /**
   * Constructs a span of characters (understood as extended grapheme clusters).
   *
-   * @param start index of the first extended grapheme cluster contained within the span (or
+   * @param grapheme_start index of the first extended grapheme cluster contained within the span (or
   *     location of the span if it is empty)
-   * @param end index of the first extended grapheme cluster after start that is not contained
+   * @param grapheme_end index of the first extended grapheme cluster after start that is not contained
   *     within the span
+   * @param codeunit_start code unit index of {@code grapheme_start}
+   * @param codeunit_end code unit index of {@code grapheme_end}
   */
-  public GraphemeSpan(long start, long end) {
+  public GraphemeSpan(int grapheme_start, int grapheme_end, int codeunit_start, int codeunit_end) {
-    this.start = start;
+    super(codeunit_start, codeunit_end);
-    this.end = end;
+    this.grapheme_start = grapheme_start;
    this.grapheme_end = grapheme_end;
  }
 }
--- a/std-bits/base/src/main/java/org/enso/base/text/Utf16Span.java
+++ b/std-bits/base/src/main/java/org/enso/base/text/Utf16Span.java
@ -8,11 +8,11 @@ package org.enso.base.text;
 */
 public class Utf16Span {
-  public final long start, end;
+  public final int codeunit_start, codeunit_end;
  /** Constructs a span of UTF-16 code units. */
-  public Utf16Span(long start, long end) {
+  public Utf16Span(int codeunit_start, int codeunit_end) {
-    this.start = start;
+    this.codeunit_start = codeunit_start;
-    this.end = end;
+    this.codeunit_end = codeunit_end;
  }
 }
--- a/test/Table_Tests/src/Common_Table_Spec.enso
+++ b/test/Table_Tests/src/Common_Table_Spec.enso
@ -376,7 +376,7 @@ spec prefix table_builder supports_case_sensitive_columns pending=Nothing =
            expect_column_names ["bar", "foo_001", "foo_1", "Foo_2", "foo_3", "foo_21", "foo_100"] <| table.sort_columns (Sort_Method natural_order=True case_sensitive=Case_Insensitive.new)
            expect_column_names ["foo_3", "foo_21", "foo_100", "foo_1", "foo_001", "bar", "Foo_2"] <| table.sort_columns (Sort_Method order=Sort_Order.Descending)
-        Test.specify "should correctly handle case insensitive sorting" <|
+        Test.specify "should correctly handle case-insensitive sorting" <|
            expect_column_names ["bar", "foo_001", "foo_1", "foo_100", "Foo_2", "foo_21", "foo_3"] <| table.sort_columns (Sort_Method case_sensitive=Case_Insensitive.new)
        Test.specify "should correctly handle natural order sorting" <|
@ -412,7 +412,7 @@ spec prefix table_builder supports_case_sensitive_columns pending=Nothing =
            expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
                table.rename_columns (Column_Mapping.By_Name map (Text_Matcher True))
-        Test.specify "should work by name case insensitively" <|
+        Test.specify "should work by name case-insensitively" <|
            map = Map.from_vector [["ALPHA", "FirstColumn"], ["DELTA", "Another"]]
            expect_column_names ["FirstColumn", "beta", "gamma", "Another"] <|
                table.rename_columns (Column_Mapping.By_Name map (Text_Matcher Case_Insensitive.new))
--- a/test/Tests/src/Data/Text/Default_Regex_Engine_Spec.enso
+++ b/test/Tests/src/Data/Text/Default_Regex_Engine_Spec.enso
@ -5,6 +5,7 @@ import Standard.Test
 import Standard.Base.Data.Text.Regex
 import Standard.Base.Data.Text.Regex.Engine.Default as Default_Engine
 import Standard.Base.Data.Text.Regex.Mode
 import Standard.Base.Data.Text.Matching_Mode
 import Standard.Base.Data.Text.Regex.Option as Global_Option
 from Standard.Base.Data.Text.Span as Span_Module import Utf_16_Span
@ -399,6 +400,11 @@ spec =
            match = pattern.replace input "REPLACED" mode=Mode.Full
            match . should_equal "REPLACED"
        Test.specify "should correctly replace entire input in Full mode even if partial matches are possible" <|
            pattern = engine.compile "(aa)+" []
            pattern.replace "aaa" "REPLACED" mode=Mode.Full . should_equal "aaa"
            pattern.replace "aaaa" "REPLACED" mode=Mode.Full . should_equal "REPLACED"
        Test.specify "should return the input for a full replace if the pattern doesn't match the entire input" <|
            pattern = engine.compile "(..)" []
            input = "aa ab"
@ -417,6 +423,35 @@ spec =
            match = pattern.replace input "REPLACED" mode=Mode.All
            match . should_equal "REPLACEDREPLACEDb"
        Test.specify "should handle capture groups in replacement" <|
            pattern = engine.compile "(?<capture>[a-z]+)" []
            pattern.replace "foo bar, baz" "[$1]" mode=Mode.All . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[$1]" mode=0 . should_equal "foo bar, baz"
            pattern.replace "foo bar, baz" "[$1]" mode=1 . should_equal "[foo] bar, baz"
            pattern.replace "foo bar, baz" "[$1]" mode=2 . should_equal "[foo] [bar], baz"
            pattern.replace "foo bar, baz" "[$1]" mode=3 . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[$1]" mode=4 . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[$1]" mode=Mode.First . should_equal "[foo] bar, baz"
            pattern.replace "foo bar, baz" "[$1]" mode=Matching_Mode.Last . should_equal "foo bar, [baz]"
            pattern.replace "foo bar, baz" "[${capture}]" mode=Mode.All . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[${capture}]" mode=0 . should_equal "foo bar, baz"
            pattern.replace "foo bar, baz" "[${capture}]" mode=1 . should_equal "[foo] bar, baz"
            pattern.replace "foo bar, baz" "[${capture}]" mode=2 . should_equal "[foo] [bar], baz"
            pattern.replace "foo bar, baz" "[${capture}]" mode=3 . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[${capture}]" mode=4 . should_equal "[foo] [bar], [baz]"
            pattern.replace "foo bar, baz" "[${capture}]" mode=Mode.First . should_equal "[foo] bar, baz"
            pattern.replace "foo bar, baz" "[${capture}]" mode=Matching_Mode.Last . should_equal "foo bar, [baz]"
        Test.specify "should handle capture groups in replacement in Full mode" <|
            pattern = engine.compile "([a-z]+)" []
            pattern.replace "foo bar, baz" "[$1]" mode=Mode.Full . should_equal "foo bar, baz"
            pattern.replace "foo" "[$1]" mode=Mode.Full . should_equal "[foo]"
            pattern_2 = engine.compile '<a href="(?<addr>.*?)">(?<name>.*?)</a>' []
            pattern_2.replace '<a href="url">content</a>' "$2 <- $1" mode=Mode.Full . should_equal "content <- url"
            pattern_2.replace '<a href="url">content</a>' "${name} <- ${addr}" mode=Mode.Full . should_equal "content <- url"
    Test.group "Match.group" <|
        engine = Default_Engine.new
        pattern = engine.compile "(.. .. )(?<letters>.+)()??(?<empty>)??" []
--- a/test/Tests/src/Data/Text/Utils_Spec.enso
+++ b/test/Tests/src/Data/Text/Utils_Spec.enso
@ -52,10 +52,10 @@ spec =
            codeunits = Vector.new folded.getFoldedString.utf_16.length+1 ix->ix
            grapheme_ixes = codeunits.map ix->
-                folded.codeUnitToGraphemeIndex ix
+                folded.findGrapheme ix . index
            grapheme_ixes . should_equal [0, 0, 1, 2, 3, 3, 4, 4, 4, 5, 6]
-            Test.expect_panic_with (folded.codeUnitToGraphemeIndex -1) Polyglot_Error
+            Test.expect_panic_with (folded.findGrapheme -1) Polyglot_Error
-            Test.expect_panic_with (folded.codeUnitToGraphemeIndex folded.getFoldedString.utf_16.length+1) Polyglot_Error
+            Test.expect_panic_with (folded.findGrapheme folded.getFoldedString.utf_16.length+1) Polyglot_Error
 main = Test.Suite.run_main here.spec
--- a/test/Tests/src/Data/Text_Spec.enso
+++ b/test/Tests/src/Data/Text_Spec.enso
@ -942,7 +942,7 @@ spec =
            abc.location_of "" mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
            abc.location_of_all "" . should_equal [Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
-        Test.specify "should allow case insensitive matching in location_of" <|
+        Test.specify "should allow case-insensitive matching in location_of" <|
            hello = "Hello WORLD!"
            case_insensitive = Text_Matcher Case_Insensitive.new
            hello.location_of "world" . should_equal Nothing
@ -1022,6 +1022,13 @@ spec =
            abc.location_of_all "" matcher=regex . should_equal [Span (Range 0 0) abc, Span (Range 0 0) abc, Span (Range 1 1) abc, Span (Range 2 2) abc, Span (Range 3 3) abc]
            abc.location_of "" matcher=regex mode=Matching_Mode.Last . should_equal (Span (Range 3 3) abc)
        Test.specify "should handle overlapping matches as shown in the examples" <|
            "aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal (Span (Range 1 3) "aaa")
            "aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal (Span (Range 0 2) "aaa")
            "aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal (Span (Range 5 7) "aaa aaa")
            "aaa aaa".location_of "aa" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal (Span (Range 4 6) "aaa aaa")
    Test.group "Regex matching" <|
        Test.specify "should be possible on text" <|
            match = "My Text: Goes Here".match "^My Text: (.+)$" mode=Regex_Mode.First
@ -1179,35 +1186,144 @@ spec =
            splits.at 1 . should_equal "c"
            splits.at 2 . should_equal "e"
-    Test.group "Regex replacement" <|
+    Test.group "Text.replace" <|
-        Test.specify "should be possible on text" <|
+        Test.specify "should work as in examples" <|
-            result = "ababab".replace "b" "a"
+            'aaa'.replace 'aa' 'b' . should_equal 'ba'
-            result . should_equal "aaaaaa"
+            "Hello World!".replace "[lo]" "#" matcher=Regex_Matcher . should_equal "He### W#r#d!"
            "Hello World!".replace "l" "#" mode=Matching_Mode.First . should_equal "He#lo World!"
            '"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' matcher=Regex_Matcher . should_equal '(abc) foo (bar) baz'
            'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
            'aﬃb'.replace 'i' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
        Test.specify "should correctly handle empty-string edge cases" <|
            [Mode.All, Matching_Mode.First, Matching_Mode.Last] . each mode->
                'aaa'.replace '' 'foo' mode=mode . should_equal 'aaa'
                ''.replace '' '' mode=mode . should_equal ''
                'a'.replace 'a' '' mode=mode . should_equal ''
                ''.replace 'a' 'b' mode=mode . should_equal ''
            'aba' . replace 'a' '' Matching_Mode.First . should_equal 'ba'
            'aba' . replace 'a' '' Matching_Mode.Last . should_equal 'ab'
            'aba' . replace 'a' '' . should_equal 'b'
            'aba' . replace 'c' '' . should_equal 'aba'
        Test.specify "should correctly handle first, all and last matching with overlapping occurrences" <|
            "aaa aaa".replace "aa" "c" . should_equal "ca ca"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.First . should_equal "ca aaa"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.Last . should_equal "aaa ac"
        Test.specify "should correctly handle case-insensitive matches" <|
            'AaąĄ' . replace "A" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal '--ąĄ'
            'AaąĄ' . replace "A" "-" . should_equal '-aąĄ'
            'HeLlO wOrLd' . replace 'hElLo' 'Hey,' matcher=(Text_Matcher True) . should_equal 'HeLlO wOrLd'
            'HeLlO wOrLd' . replace 'hElLo' 'Hey,' matcher=(Text_Matcher Case_Insensitive) . should_equal 'Hey, wOrLd'
            "Iiİı" . replace "i" "-" . should_equal "I-İı"
            "Iiİı" . replace "I" "-" . should_equal "-iİı"
            "Iiİı" . replace "İ" "-" . should_equal "Ii-ı"
            "Iiİı" . replace "ı" "-" . should_equal "Iiİ-"
            "Iiİı" . replace "i" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "--İı"
            "Iiİı" . replace "I" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "--İı"
            "Iiİı" . replace "İ" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "Ii-ı"
            "Iiİı" . replace "ı" "-" matcher=(Text_Matcher Case_Insensitive) . should_equal "Iiİ-"
            tr_insensitive = Text_Matcher (Case_Insensitive (Locale.new "tr"))
            "Iiİı" . replace "i" "-" matcher=tr_insensitive . should_equal "I--ı"
            "Iiİı" . replace "I" "-" matcher=tr_insensitive . should_equal "-iİ-"
            "Iiİı" . replace "İ" "-" matcher=tr_insensitive . should_equal "I--ı"
            "Iiİı" . replace "ı" "-" matcher=tr_insensitive . should_equal "-iİ-"
        Test.specify "should correctly handle Unicode edge cases" <|
            'sśs\u{301}' . replace 's' 'O' . should_equal 'Ośs\u{301}'
            'sśs\u{301}' . replace 's' 'O' Matching_Mode.Last . should_equal 'Ośs\u{301}'
            'śs\u{301}s' . replace 's' 'O' Matching_Mode.First . should_equal 'śs\u{301}O'
            'sśs\u{301}' . replace 'ś' 'O' . should_equal 'sOO'
            'sśs\u{301}' . replace 's\u{301}' 'O' . should_equal 'sOO'
            'SŚS\u{301}' . replace 's' 'O' . should_equal 'SŚS\u{301}'
            'SŚS\u{301}' . replace 's' 'O' Matching_Mode.Last . should_equal 'SŚS\u{301}'
            'ŚS\u{301}S' . replace 's' 'O' Matching_Mode.First . should_equal 'ŚS\u{301}S'
            'SŚS\u{301}' . replace 'ś' 'O' . should_equal 'SŚS\u{301}'
            'SŚS\u{301}' . replace 's\u{301}' 'O' . should_equal 'SŚS\u{301}'
            'SŚS\u{301}' . replace 's' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'OŚS\u{301}'
            'SŚS\u{301}' . replace 's' 'O' Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'OŚS\u{301}'
            'ŚS\u{301}S' . replace 's' 'O' Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'ŚS\u{301}O'
            'SŚS\u{301}' . replace 'ś' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'SOO'
            'SŚS\u{301}' . replace 's\u{301}' 'O' matcher=(Text_Matcher Case_Insensitive) . should_equal 'SOO'
            '✨🚀🚧😍😃😍😎😙😉☺' . replace '🚧😍' '|-|:)' . should_equal '✨🚀|-|:)😃😍😎😙😉☺'
            'Rocket Science' . replace 'Rocket' '🚀' . should_equal '🚀 Science'
        Test.specify "should be possible on unicode text" <|
            "Korean: 건반".replace "건반" "keyboard" . should_equal "Korean: keyboard"
-        Test.specify "should be possible in ascii mode" <|
+        Test.specify "will approximate ligature matches" <|
-            result = "İiİ".replace "\w" "a" match_ascii=True
+            # TODO: Do we want to improve this? It is highly non-trivial and only affects very rare edge cases.
-            result . should_equal "İaİ"
+            ## Currently we lack the 'resolution' to extract a partial match
               from a ligature and keep the rest of it; doing so would probably
               need some special mapping.
            'ﬃﬃ'.replace 'ff' 'aa' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aaaa'
            'ﬃﬃ'.replace 'ff' 'aa' mode=Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'aaﬃ'
            'ﬃﬃ'.replace 'ff' 'aa' mode=Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'ﬃaa'
            'aﬃﬃb'.replace 'IF' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'
            'aiﬃffz' . replace 'if' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'a--fz'
            'AFFIB'.replace 'ﬃ' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'A-B'
-        Test.specify "should be possible in case-insensitive mode" <|
+            'ß'.replace 'SS' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
-            result = "abaBa".replace "b" "a" case_insensitive=True
+            'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
-            result . should_equal "aaaaa"
+            'ß'.replace 'S' 'A' mode=Matching_Mode.First matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
            'ß'.replace 'S' 'A' mode=Matching_Mode.Last matcher=(Text_Matcher Case_Insensitive) . should_equal 'A'
            'STRASSE'.replace 'ß' '-' matcher=(Text_Matcher Case_Insensitive) . should_equal 'STRA-E'
-        Test.specify "should be possible in dot_matches_newline mode" <|
+        Test.specify "should perform simple replacement in Regex mode" <|
-            result = 'ab\na'.replace "b." "a" dot_matches_newline=True
+            "ababab".replace "b" "a" matcher=Regex_Matcher . should_equal "aaaaaa"
-            result . should_equal "aaa"
+            "ababab".replace "b" "a" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "aaabab"
            "ababab".replace "b" "a" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ababaa"
            "aaaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "cc"
            "aaaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "caa"
            "aaaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aac"
            "aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca"
            "aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca"
            "aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "ac"
            "aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ca"
            "aaa aaa".replace "aa" "c" matcher=Text_Matcher . should_equal "ca ca"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Text_Matcher . should_equal "ca aaa"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "aaa ac"
            "aaa aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca ca"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca aaa"
            "aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
        Test.specify "in Regex mode should work with Unicode" <|
            "Korean: 건반".replace "건반" "keyboard" matcher=Regex_Matcher . should_equal "Korean: keyboard"
            'sśs\u{301}'.replace 'ś' '-' matcher=Regex_Matcher . should_equal 's--'
            'sśs\u{301}'.replace 's\u{301}' '-' matcher=Regex_Matcher . should_equal 's--'
        Test.specify "in Regex mode should support various Regex options" <|
            r1 = "İiİ".replace "\w" "a" matcher=(Regex_Matcher match_ascii=True)
            r1 . should_equal "İaİ"
            r2 = "abaBa".replace "b" "a" matcher=(Regex_Matcher case_sensitive=Case_Insensitive)
            r2 . should_equal "aaaaa"
            r3 = 'ab\na'.replace "b." "a"  matcher=(Regex_Matcher dot_matches_newline=True)
            r3 . should_equal "aaa"
        Test.specify "should be possible in multiline mode" <|
            text = """
                Foo
                bar
-            result = text.replace '\n' "" multiline=True
+            r4 = text.replace '\n' ""  matcher=(Regex_Matcher multiline=True)
-            result . should_equal "Foobar"
+            r4 . should_equal "Foobar"
-        Test.specify "should be possible in comments mode" <|
+            r5 = "ababd".replace "b\w # Replacing a `b` followed by any word character" "a" matcher=(Regex_Matcher comments=True)
-            result = "ababd".replace "b\w # Replacing a `b` followed by any word character" "a" comments=True
+            r5 . should_equal "aaa"
-            result . should_equal "aaa"
+
        Test.specify "in Regex mode should allow referring to capture groups in substitutions" <|
            '<a href="url">content</a>'.replace '<a href="(.*?)">(.*?)</a>' '$2 is at $1' matcher=Regex_Matcher . should_equal 'content is at url'
            '<a href="url">content</a>'.replace '<a href="(?<address>.*?)">(?<text>.*?)</a>' '${text} is at ${address}' matcher=Regex_Matcher . should_equal 'content is at url'
 main = Test.Suite.run_main here.spec
--- a/test/Visualization_Tests/src/Geo_Map_Spec.enso
+++ b/test/Visualization_Tests/src/Geo_Map_Spec.enso
@ -28,7 +28,7 @@ spec =
            table  = Table.from_rows header [row_1]
            expect table '{"df_color":["red"],"df_label":["name"],"df_latitude":[11],"df_longitude":[10],"df_radius":[195]}'
-        Test.specify "is case insensitive" <|
+        Test.specify "is case-insensitive" <|
            header = ['latitude' , 'LONGITUDE' , 'LaBeL']
            row_1  = [11         , 10          , 09     ]
            row_2  = [21         , 20          , 19     ]
--- a/test/Visualization_Tests/src/Histogram_Spec.enso
+++ b/test/Visualization_Tests/src/Histogram_Spec.enso
@ -46,7 +46,7 @@ spec =
            table  = Table.from_rows header [row_1, row_2]
            expect table 'value' [10,20]
-        Test.specify "is case insensitive" <|
+        Test.specify "is case-insensitive" <|
            header = ['α', 'Value']
            row_1  = [11 , 10 ]
            row_2  = [21 , 20 ]
--- a/test/Visualization_Tests/src/Scatter_Plot_Spec.enso
+++ b/test/Visualization_Tests/src/Scatter_Plot_Spec.enso
@ -49,7 +49,7 @@ spec =
            table = Table.from_rows header [row_1]
            expect table (labels 'x' 'y') '[{"color":"ff0000","label":"label","shape":"square","size":50,"x":11,"y":10}]'
-        Test.specify "is case insensitive" <|
+        Test.specify "is case-insensitive" <|
            header = ['X' , 'Y' , 'Size' , 'Shape'  , 'Label' , 'Color' ]
            row_1 =  [11  , 10  , 50     , 'square' , 'label' , 'ff0000']
            table = Table.from_rows header [row_1]