Commit Graph

4905 Commits

Author SHA1 Message Date
Marcin Junczys-Dowmunt
3a478fc47d update version and changelog 2021-07-09 13:46:18 -07:00
Marcin Junczys-Dowmunt
35c822eb4e Merged PR 19685: Marianize LSH as operators for mmapping and use in Quicksand
This PR turns the LSH index and search into a set of operators that live in the expression graph. This makes creation etc. thread-safe (one index per graph) and allows GPU versions to be implemented later.

This allows the LSH to be mmapped as a Marian parameter, since we now only need to turn the index into something that can be saved to disk using the existing tensors. This happens in marian_conv or the equivalent interface function in the Quicksand interface.
2021-07-09 20:35:09 +00:00
Hieu Hoang
d6c09b24de Merged PR 19409: Unify LSH and short list interface
This PR unifies the shortlist and LSH interfaces, achieving a significant speed-up for the LSH.
2021-07-03 20:19:39 +00:00
Marcin Junczys-Dowmunt
9772aa293f remaining comments 2021-07-03 12:13:26 -07:00
Marcin Junczys-Dowmunt
8bfa6a44e3 Merge branch 'hihoan/lsh7' of vs-ssh.visualstudio.com:v3/machinetranslation/Marian/marian-dev into hihoan/lsh7 2021-07-03 12:01:22 -07:00
Hieu Hoang
4ace42f35a paper 2021-07-02 20:30:50 -07:00
Hieu Hoang
5ad0edf6df remove todo 2021-07-02 19:07:42 -07:00
Hieu Hoang
9acf27d6bc credit SLIDE 2021-07-02 12:16:57 -07:00
Hieu Hoang
bd1f1ee9cb marcin's review changes 2021-07-02 12:06:03 -07:00
Hieu Hoang
ff8af52624 lock index before creation 2021-07-01 15:39:18 -07:00
Marcin Junczys-Dowmunt
64e787afce Merge branch 'master' into hihoan/lsh7 2021-06-29 10:42:13 -07:00
Hieu Hoang
7e2fce8f09 Merge branch 'hihoan/lsh7' of vs-ssh.visualstudio.com:v3/machinetranslation/Marian/marian-dev into hihoan/lsh7 2021-06-28 21:26:27 -07:00
Hieu Hoang
24c644bae0 pass shortlist regression tests 2021-06-28 21:26:02 -07:00
Marcin Junczys-Dowmunt
8daa0a4255 fix compilation errors due to narrowing conversion 2021-06-28 19:26:40 -07:00
Marcin Junczys-Dowmunt
fc0f41f24a Merged PR 19597: Enable mpi wrapper to use size larger than MAX_INT
Enable mpi wrapper to use size larger than MAX_INT.
2021-06-28 23:15:23 +00:00
Hieu Hoang
cd292d3b32 changes for review 2021-06-18 10:18:31 -07:00
Hieu Hoang
a332e550a5 debug 2021-06-16 12:56:36 -07:00
Marcin Junczys-Dowmunt
85eb6adce0 update sentencepiece pointer to version with case-awareness 2021-06-16 12:40:38 -07:00
Hieu Hoang
9b4a845cc7 clean up bias 2021-06-16 11:19:23 -07:00
Hieu Hoang
892554129e lemma Et is optional 2021-06-16 10:19:24 -07:00
Hieu Hoang
395a4f94d0 init vector 2021-06-15 18:08:45 -07:00
Hieu Hoang
6981b21f4e Merge branch 'hihoan/lsh7' of vs-ssh.visualstudio.com:v3/machinetranslation/Marian/marian-dev into hihoan/lsh7 2021-06-15 16:54:57 -07:00
Hieu Hoang
488a532bdf get lemma size from vocab class 2021-06-15 16:54:17 -07:00
Hieu Hoang
7e6ec58507 delete variables altogether 2021-06-14 19:07:17 -07:00
Hieu Hoang
82fa059a03 'use' variables 2021-06-14 19:03:53 -07:00
Hieu Hoang
5362c2cc0e don't define BLAS_FOUND 2021-06-14 18:47:08 -07:00
Hieu Hoang
5b7b1f7e5c no need for args in getIndicesExpr(). Deleted debugging 2021-06-14 18:44:05 -07:00
Hieu Hoang
8c04f66474 reverse batch beam argument order 2021-06-15 00:10:08 +00:00
Hieu Hoang
dffbb47eea rename broadcast -> createCachedTensors 2021-06-14 23:18:21 +00:00
Hieu Hoang
cc295938ce incorrect dimension order 2021-06-14 22:29:27 +00:00
Hieu Hoang
8649034760 no need to broadcast 2021-06-11 23:26:10 +00:00
Hieu Hoang
700dc7fdd1 don't transpose lastIndices. Works for lsh & sl 2021-06-11 22:55:26 +00:00
Hieu Hoang
49998217d9 don't transpose lastIndices. Works for lsh 2021-06-11 22:47:15 +00:00
Hieu Hoang
f0251889f2 debug 2021-06-11 00:57:02 -07:00
Hieu Hoang
fef7202bc8 batch-beam -> beam-batch 2021-06-10 23:58:25 -07:00
Hieu Hoang
5a93c67185 origBatchIdx -> currentBatchIdx. Doesn't crash but bad results 2021-06-09 23:31:06 +00:00
Hieu Hoang
fe97259d3d debug 2021-06-09 22:58:25 +00:00
Hieu Hoang
6f0f534a4a debug 2021-06-09 22:36:34 +00:00
Hieu Hoang
1e3db86a94 batch based filtering. Comment out debug 2021-06-09 21:56:51 +00:00
Hieu Hoang
4b9082bc39 don't manually broadcast lemma 2021-06-09 20:57:02 +00:00
Hieu Hoang
79dbde7efc don't manually broadcast weights 2021-06-09 20:45:40 +00:00
Hieu Hoang
0bc9b22b15 separate broadcast 2021-06-09 20:18:19 +00:00
Hieu Hoang
5d1946ebd3 filter & broadcast every word. SL works 2021-06-09 20:07:57 +00:00
Hieu Hoang
e07e0368c9 debug 2021-06-08 18:20:56 -07:00
Hieu Hoang
92c6c07786 reshape cachedShortWt_ 2021-06-07 15:43:54 -07:00
Hieu Hoang
eb3f540d42 debug 2021-06-07 15:37:22 -07:00
Hieu Hoang
b5f97dc605 reshape cachedShortLemmaEt 2021-06-07 15:35:18 -07:00
Hieu Hoang
acdff77688 reduce transform for no-shortlist 2021-06-07 15:29:20 -07:00
Hieu Hoang
0949a4c914 start using bdot 2021-06-07 15:05:56 -07:00
Hieu Hoang
bc4ad2408c Merge branch 'mjd/bdot' into hihoan/lsh7 2021-06-07 14:16:41 -07:00