mirror of
https://github.com/facebook/sapling.git
synced 2024-10-08 15:57:43 +03:00
77c63b0f24
The original code uses the similarity score 1 - len(diff(after, before)) / len(after). The diff can be at most the size of the 'before' file, so any small 'before' file would be considered very similar. Removing an empty file would cause all files added in the same revision to be considered copies of the removed file. This changes the metric to bytes_overlap(before, after) / len(before + after), i.e. the actual percentage of bytes shared between the two files.
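The difference between the two metrics can be illustrated with a rough sketch. This is not Mercurial's implementation: the function names `shared_bytes`, `old_style_score`, and `overlap_score` are hypothetical, line matching is done with Python's `difflib` rather than Mercurial's own diff code, the diff size is modeled as the bytes of 'before' that have no match in 'after', and shared bytes are counted in both files so that identical files score 1.0.

```python
import difflib


def shared_bytes(before: bytes, after: bytes) -> int:
    """Total size of the lines that match between the two files (counted once)."""
    a = before.splitlines(keepends=True)
    b = after.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        len(line)
        for block in matcher.get_matching_blocks()
        for line in a[block.a:block.a + block.size]
    )


def old_style_score(before: bytes, after: bytes) -> float:
    """Model of 1 - len(diff) / len(after).

    Assumption: the diff size is approximated by the bytes of 'before'
    that are not shared with 'after', so it is bounded by len(before).
    """
    if not after:
        return 0.0
    diff_size = len(before) - shared_bytes(before, after)
    return 1.0 - diff_size / len(after)


def overlap_score(before: bytes, after: bytes) -> float:
    """Fraction of all bytes in both files that belong to shared lines.

    Assumption: shared bytes are counted once per file (hence the factor 2).
    """
    total = len(before) + len(after)
    if total == 0:
        return 0.0
    return 2.0 * shared_bytes(before, after) / total


def numbered_lines(start: int, stop: int) -> bytes:
    """Build a file body with one decimal number per line."""
    return b"".join(b"%d\n" % x for x in range(start, stop))


if __name__ == "__main__":
    empty = b""
    large = numbered_lines(0, 1000)
    another = numbered_lines(10, 1000)
    print("old, empty -> another:", old_style_score(empty, another))
    print("new, empty -> another:", overlap_score(empty, another))
    print("new, large -> another:", overlap_score(large, another))
```

Under this model an empty 'before' file gets the maximum old-style score against any non-empty added file, because there is nothing to diff away, while the overlap-based score for the same pair is zero.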
38 lines
543 B
Bash
Executable File
#!/bin/sh

# Case 1: an empty file is removed while a near-copy of large-file is added.
hg init rep; cd rep

touch empty-file
python -c 'for x in range(10000): print(x)' > large-file

hg addremove

hg commit -m A

rm large-file empty-file
python -c 'for x in range(10,10000): print(x)' > another-file

# -s50: detect renames among removed/added files at a similarity threshold of 50%
hg addremove -s50

hg commit -m B

cd ..

# Case 2: a tiny file is removed while a slightly larger, overlapping file is added.
hg init rep2; cd rep2

python -c 'for x in range(10000): print(x)' > large-file
python -c 'for x in range(50): print(x)' > tiny-file

hg addremove

hg commit -m A

python -c 'for x in range(70): print(x)' > small-file
rm tiny-file
rm large-file

hg addremove -s50

hg commit -m B