Cross-lingual Retrieval for Iterative Self-Supervised Training
https://arxiv.org/pdf/2006.09526.pdf
Introduction
CRISS is a multilingual sequence-to-sequence pretraining method in which mining and training are applied iteratively, improving cross-lingual alignment and translation ability at the same time.
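The iterative mine-then-train loop described above can be sketched as a toy skeleton. Everything below is an illustrative stand-in (not a fairseq API): the "model" is just a counter and the mining step pairs sentences naively, to show only the control flow of alternating mining and training.

```python
# Illustrative skeleton of the CRISS outer loop: alternately mine
# pseudo-parallel pairs and retrain on them. All functions here are
# hypothetical stand-ins, not fairseq APIs.
def mine_pseudo_parallel(model, mono_src, mono_tgt):
    # Stand-in: pair each source sentence with a target sentence.
    return list(zip(mono_src, mono_tgt))

def train_on(model, bitext):
    # Stand-in: "training" just records how much data the model has seen.
    return model + len(bitext)

mono_src = ["src sentence"] * 3
mono_tgt = ["tgt sentence"] * 3
model = 0                      # toy "model state"
for iteration in range(2):     # CRISS repeats mine -> train for several rounds
    bitext = mine_pseudo_parallel(model, mono_src, mono_tgt)
    model = train_on(model, bitext)
print(model)
```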
Requirements:
- faiss: https://github.com/facebookresearch/faiss
- mosesdecoder: https://github.com/moses-smt/mosesdecoder
- flores: https://github.com/facebookresearch/flores
- LASER: https://github.com/facebookresearch/LASER
Unsupervised Machine Translation
1. Download and decompress CRISS checkpoints
cd examples/criss
wget https://dl.fbaipublicfiles.com/criss/criss_3rd_checkpoints.tar.gz
tar -xf criss_3rd_checkpoints.tar.gz
2. Download and preprocess Flores test dataset
Make sure to run all scripts from the examples/criss directory
bash download_and_preprocess_flores_test.sh
3. Run Evaluation on Sinhala-English
bash unsupervised_mt/eval.sh
Sentence Retrieval
1. Download and preprocess Tatoeba dataset
bash download_and_preprocess_tatoeba.sh
2. Run Sentence Retrieval on Tatoeba Kazakh-English
bash sentence_retrieval/sentence_retrieval_tatoeba.sh
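The retrieval script above scores each source sentence against every target sentence by encoder-embedding similarity and picks the nearest neighbor. A minimal sketch of that scoring on toy vectors (plain NumPy; the real script uses CRISS encoder outputs, and all names and data here are illustrative):

```python
import numpy as np

def retrieve_nearest(src_emb, tgt_emb):
    """For each source row, return the index of the most similar target
    row under cosine similarity (toy stand-in for encoder-based
    sentence retrieval)."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T                       # pairwise cosine similarities
    return sim.argmax(axis=1)

# Toy "embeddings": row i of src should match row i of tgt.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(5, 8))
src = tgt + 0.01 * rng.normal(size=(5, 8))  # slightly perturbed copies

pred = retrieve_nearest(src, tgt)
accuracy = (pred == np.arange(5)).mean()
print(accuracy)
```

On real Tatoeba data this accuracy is the metric the script reports: the fraction of source sentences whose nearest target neighbor is the true translation.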
Mining
1. Install faiss
Follow the instructions at https://github.com/facebookresearch/faiss/blob/master/INSTALL.md
2. Mine pseudo-parallel data between Kazakh and English
bash mining/mine_example.sh
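Mining keeps a sentence pair only when it stands out from both sides' neighborhoods. A common criterion for this kind of mining in LASER-style pipelines is the margin (ratio) score: the pair's cosine similarity divided by the mean similarity of each side's k nearest neighbors. The NumPy sketch below is a simplified illustration of that idea, not the exact logic of mine_example.sh (which runs over faiss indexes at scale):

```python
import numpy as np

def margin_scores(src_emb, tgt_emb, k=2):
    """Margin (ratio) score for every source/target pair: cos(x, y)
    divided by the mean cosine similarity of each side's k nearest
    neighbors. Pairs that stand out from their neighborhoods score
    above 1 and are kept as pseudo-parallel data."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T
    # Mean similarity of the k nearest neighbors on each side.
    knn_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)
    knn_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)
    return sim / (0.5 * (knn_src + knn_tgt))

# Toy "embeddings": row i of src is a noisy copy of row i of tgt.
rng = np.random.default_rng(1)
tgt = rng.normal(size=(6, 8))
src = tgt + 0.05 * rng.normal(size=(6, 8))

scores = margin_scores(src, tgt)
best = scores.argmax(axis=1)   # highest-margin target per source sentence
print(best)
```

Thresholding these scores (rather than taking a plain argmax) is what lets a mining pipeline drop source sentences that have no good translation in the target pool.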
Citation
@article{tran2020cross,
title={Cross-lingual retrieval for iterative self-supervised training},
author={Tran, Chau and Tang, Yuqing and Li, Xian and Gu, Jiatao},
journal={arXiv preprint arXiv:2006.09526},
year={2020}
}