Merge pull request #64 from rrrepsac/patch-1

Patch 1
tiedemann 2023-02-05 22:13:25 +02:00 committed by GitHub
commit d1ffe9106c


@@ -73,7 +73,7 @@ Note that the data splits are done on-the-fly from shuffled data sets and this m
## Generate back-translations
-Back-translation requires a moel in the opposite direction. First thing to do is to reverse the data. This can be done without generating them from scratch:
+Back-translation requires a model in the opposite direction. The first thing to do is to reverse the data. This can be done without generating it from scratch:
```
make SRCLANGS=en TRGLANGS=br reverse-data
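# Conceptual sketch only (hypothetical file names, not the actual Makefile recipe):
# "reversing" a bitext just swaps the source and target sides, so the same aligned
# sentences can be used to train the opposite br-en model needed for back-translation.
paste corpus.en-br.en corpus.en-br.br | awk -F'\t' '{print $2 "\t" $1}' > corpus.br-en.tsv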
@@ -141,7 +141,7 @@ backtranslate/br-en/latest/wiki.aa.br-en.en.gz
Another way of augmenting the training data is to translate one side of existing bitexts to create more data for the language pair we are interested in. For example, for English-Breton translation we can translate the French side of French-Breton bitexts to English using an existing French-English translation model. The latter is a high-resource language pair and decent translations can be expected. All of this is supported by the recipes in `pivoting`.
-Forst of all, you can check what kind of bitexts are available for a pivot language like French:
+First of all, you can check what kind of bitexts are available for a pivot language like French:
```
make -C pivoting SRC=en TRG=br PIVOT=fr print-all-data
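# Conceptual sketch only, not the pivoting recipes themselves: translate the French
# side of a French-Breton bitext with an existing fr-en model and pair the output
# with the untouched Breton side to obtain synthetic en-br training data.
# Model and file paths are hypothetical; pre-/post-processing (e.g. subword
# segmentation) is omitted.
marian-decoder -m fr-en/model.npz -v fr-en/vocab.yml fr-en/vocab.yml < bitext.fr-br.fr > bitext.fr-br.en
paste bitext.fr-br.en bitext.fr-br.br > synthetic.en-br.tsv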
@@ -303,7 +303,7 @@ For the other direction, the additional back-translation loop does not seem to w
## Multilingual models
Another common approach to improving low-resource translation is to rely on transfer learning and multilingual models.
-The basic steps are the same, only some variables need to be adjusted. Most importantly, you need to set several source and target languages to be covered by the model. All combinations of thos languages will be considered. Furthermore, it might be useful to activate over and under-sampling of data to have more equal proportions of data for each language pair. This is done by setting a value to `FIT_DATA_SIZE` (number of training examples, i.e. aligned sentence pairs). Here would be the example for training a mode between English and French to a number of celtic languages (including Breton):
+The basic steps are the same, only some variables need to be adjusted. Most importantly, you need to set several source and target languages to be covered by the model. All combinations of those languages will be considered. Furthermore, it might be useful to activate over- and under-sampling of the data to have more equal proportions of data for each language pair. This is done by setting a value for `FIT_DATA_SIZE` (the number of training examples, i.e. aligned sentence pairs). Here is an example for training a model between English and French and a number of Celtic languages (including Breton):
```
make SRCLANGS="en fr" TRGLANGS="ga cy br gd kw gv" FIT_DATA_SIZE=500000 config