348 Commits

Author SHA1 Message Date
Xavier Deguillard
5002d01e0a revisionstore: allow indexedlogutil users to lookup in different indices
Summary:
The OpenOptions allow for multiple indices to be added, but lookup had no way
to query these multiple indices.

Reviewed By: quark-zju

Differential Revision: D20445627

fbshipit-source-id: 0cb754ba17b452d892b7bcb56d502d5753ef963a
2020-03-13 19:03:28 -07:00
Xavier Deguillard
01fb3c0a77 revisionstore: add a new StoreKey type
Summary:
This type can be either a Mercurial-type key or a content-hash-based key. Both
prefetch and get_missing can now handle these properly. This is essential
for stores where data can be fetched either way, or where the data is
split in two. For LFS, for instance, it is possible to have the LFS pointer (via
getpackv2) but not the actual blob. In that case, get_missing will simply
return the content hash version of the StoreKey to signify what is actually
missing.

Reviewed By: quark-zju

Differential Revision: D20445631

fbshipit-source-id: 06282f70214966cc96e805e9891f220b438c91a7
2020-03-13 19:03:28 -07:00
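
For illustration, the idea in commit 01fb3c0a77 above can be sketched as a two-variant enum. This is a minimal sketch and not the revisionstore definition: only the names `StoreKey` and `get_missing` come from the commit message, while `HgKey`, `ContentHash`, `LocalStore`, the field layouts, and the method signature are assumptions made for this example.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a Mercurial-style key (path plus node hash).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct HgKey {
    pub path: String,
    pub node: [u8; 20],
}

/// Hypothetical stand-in for a content hash.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct ContentHash(pub [u8; 32]);

/// A key that addresses data either by Mercurial key or by content hash.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum StoreKey {
    HgId(HgKey),
    Content(ContentHash),
}

/// Toy LFS-like store (assumed layout): pointers map a Mercurial key to a
/// content hash, and blobs are tracked separately by content hash.
#[derive(Default)]
pub struct LocalStore {
    pub pointers: Vec<(HgKey, ContentHash)>,
    pub blobs: HashSet<ContentHash>,
}

impl LocalStore {
    /// Returns the keys that still need fetching. When the pointer is present
    /// but the blob is not, the content-hash form of the key is returned.
    pub fn get_missing(&self, keys: &[StoreKey]) -> Vec<StoreKey> {
        keys.iter()
            .filter_map(|key| match key {
                StoreKey::HgId(hg) => match self.pointers.iter().find(|(k, _)| k == hg) {
                    // No pointer at all: the Mercurial key itself is missing.
                    None => Some(key.clone()),
                    // Pointer and blob are both present: nothing is missing.
                    Some((_, hash)) if self.blobs.contains(hash) => None,
                    // Pointer present, blob absent: report the content hash.
                    Some((_, hash)) => Some(StoreKey::Content(hash.clone())),
                },
                StoreKey::Content(hash) => {
                    if self.blobs.contains(hash) {
                        None
                    } else {
                        Some(key.clone())
                    }
                }
            })
            .collect()
    }
}
```

In this sketch a caller can tell from the returned variant whether it still needs the pointer or only the blob.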
Xavier Deguillard
d900874401 revisionstore: rename HistoryStore to HgIdHistoryStore
Summary:
Similarly to the DataStore trait, this makes it easier to understand that they
deal with a Mercurial type Key.

Reviewed By: quark-zju

Differential Revision: D20445621

fbshipit-source-id: a1143d5f5d6a2c8686d517a6ea3c25b07c0df072
2020-03-13 19:03:27 -07:00
Xavier Deguillard
2e4742cefc revisionstore: rename DataStore traits to HgIdDataStore
Summary: This makes it clear that these traits deal with a Mercurial Key.

Reviewed By: quark-zju

Differential Revision: D20445626

fbshipit-source-id: d5acbf442e9407b973e95e40af69b5a61bff0a4d
2020-03-13 19:03:27 -07:00
Jun Wu
cf04fe3e1f thrift-types: recompile Thrift sources
Summary: The thrift compiler and sources are changed.

Reviewed By: xavierd

Differential Revision: D20445164

fbshipit-source-id: f20f16ae02a922042f366a9a80a3642577f60e57
2020-03-13 14:25:23 -07:00
Jun Wu
7a7f98f1b2 configparser: migrate from Bytes to Text
Summary:
Since configparser enforces utf-8 config files (because pest wants Rust strings),
let's migrate from Bytes to Text to remove extra encoding conversions.

Previously this was blocked by the lack of ref-counted text (since the "source"
of each config location is the entire config file). Now minibytes provides Text
so we can use it.

This unfortunately requires dependent code to be updated. The pyconfigparser
interface is in theory wrong - it shouldn't return utf-8 bytes but
local-encoded bytes. I think it's cleaner to make pyconfigparser unaware of
HGENCODING, so I changed pyconfigparser to use unicode and added a compatibility
layer in uiconfig.py.

This also fixes non-ascii encoding issues with user names (especially on Windows).
The hgrc config file should be in utf-8, the config parser returns explicit
unicode types, and Python code round-trips them with local encodings.

Reviewed By: markbt

Differential Revision: D20432938

fbshipit-source-id: b1359429b8f1c133ab2d6b2deea6048377dfeca1
2020-03-13 10:51:41 -07:00
Jun Wu
715bc5d451 configparser: migrate from bytes to minibytes
Summary:
This makes it easier to further migrate to `Text` interface.
Dependent crate (`auth`) is updated.

Reviewed By: markbt

Differential Revision: D20432941

fbshipit-source-id: 1dc29d52c9b17ce14676ef0555470c6d36a09c2b
2020-03-13 10:51:41 -07:00
Jun Wu
c4ec99ded4 minibytes: implement Text
Summary:
Text is a reference-counted shared String.
It's similar to Bytes but works for utf-8 strings.

The motivation is to replace configparser's use of Bytes with Text.

Reviewed By: markbt

Differential Revision: D20432940

fbshipit-source-id: ef990255d269e60d433c6520819f60ccdcbe488f
2020-03-13 10:51:41 -07:00
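
To make the "reference-counted shared String" idea in commit c4ec99ded4 above concrete, here is a minimal sketch built on `Arc<str>` from the standard library. It is not the minibytes implementation: the name `SharedText` and all of its methods are assumptions chosen for this example.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical sketch of a cheaply clonable UTF-8 slice. Clones and
/// sub-slices share one heap buffer instead of copying the text.
#[derive(Clone, Debug)]
pub struct SharedText {
    buf: Arc<str>,
    range: Range<usize>,
}

impl SharedText {
    /// Copies `s` once into a shared buffer.
    pub fn new(s: &str) -> Self {
        SharedText { buf: Arc::from(s), range: 0..s.len() }
    }

    /// Borrows the visible part of the buffer.
    pub fn as_str(&self) -> &str {
        &self.buf[self.range.clone()]
    }

    /// Returns a sub-slice that shares the same buffer, or `None` if the
    /// range is out of bounds or does not fall on UTF-8 char boundaries.
    pub fn slice(&self, range: Range<usize>) -> Option<Self> {
        // `str::get` performs both the bounds and the char-boundary check.
        self.as_str().get(range.clone())?;
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        Some(SharedText { buf: Arc::clone(&self.buf), range: start..end })
    }

    /// Number of handles currently sharing the buffer.
    pub fn ref_count(&self) -> usize {
        Arc::strong_count(&self.buf)
    }
}
```

A config parser could keep the whole file in one such buffer and hand out slices for each value, which is the use case the commit message describes.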
Jun Wu
7895e70dcf minibytes: make Bytes abstract
Summary: This makes it possible to implement "Text". See the next diff.

Reviewed By: markbt

Differential Revision: D20432943

fbshipit-source-id: 94b3810ab205c260d33f57bd637e4accc3ee871d
2020-03-13 10:51:40 -07:00
Jun Wu
e9b14b3608 minibytes: implement From<&'static {str,[u8]}>
Summary:
This makes the API easier to use.

Practically this makes it easier for configparser to migrate to minibytes.

Reviewed By: markbt

Differential Revision: D20432942

fbshipit-source-id: ad08eb118d2216054dc24c86b0b129ae82b9d17c
2020-03-13 10:51:40 -07:00
Jun Wu
ad8190713b cpython-ext: serialize Rust str into Python str type
Summary:
Previously Rust str was serialized into bytes. To be Python 3 friendly, let's
serialize it into `str`.

Reviewed By: markbt

Differential Revision: D19797706

fbshipit-source-id: 388eb044dc7e25cdc438f0c3d6fa5a5740f22e3d
2020-03-12 12:19:38 -07:00
Jun Wu
3376363721 tracing-collector: add is_event to TreeSpan
Summary: Expose the is_event property via public APIs.

Reviewed By: DurhamG

Differential Revision: D19797705

fbshipit-source-id: f441825e98208964f7b3d6815a177b464430cbb7
2020-03-12 12:19:38 -07:00
Stanislau Hlebik
ba871d3bdc xdiff: allow rendering diff for large files
Summary:
The goal of the stack is to support "rendering" diffs for large files in the scs
server. Note that rendering is in quotes - we are fine with just showing a
placeholder like "Binary file ... differs". This is still better than the
current behaviour, which just returns an error.

In order to do that, I suggest tweaking the xdiff library to accept a FileContentType,
which can be either Normal(...), meaning that we have the file content available, or
Omitted, which usually means the file is large and we don't even want to fetch it;
we just want xdiff to generate a placeholder.

Reviewed By: markbt, krallin

Differential Revision: D20389226

fbshipit-source-id: 0b776d4f143e2ac657d664aa9911f6de8ccfea37
2020-03-12 04:27:23 -07:00
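
The Normal/Omitted split proposed in commit ba871d3bdc above might look roughly like the following. This is a sketch under assumptions, not the xdiff API: the enum name `FileContent`, the function `render_diff`, and the exact placeholder text are invented for illustration, and the textual diff itself is deliberately left out.

```rust
/// Hypothetical version of the content type described in the commit:
/// either the bytes are available, or they were deliberately not fetched.
pub enum FileContent<'a> {
    Normal(&'a [u8]),
    Omitted,
}

/// Renders a diff, falling back to a placeholder when either side is
/// omitted. Real hunk generation is elided in this sketch.
pub fn render_diff(path: &str, old: FileContent<'_>, new: FileContent<'_>) -> String {
    match (old, new) {
        // Identical content: nothing to show.
        (FileContent::Normal(a), FileContent::Normal(b)) if a == b => String::new(),
        // Both sides available: emit a header only (hunks elided here).
        (FileContent::Normal(_), FileContent::Normal(_)) => {
            format!("--- a/{0}\n+++ b/{0}\n", path)
        }
        // At least one side was omitted: emit a placeholder, not an error.
        _ => format!("Binary file {} differs\n", path),
    }
}
```

The point of the design is that the caller never has to fetch a large file just to report that it changed.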
Jun Wu
194b38385a nameset: add a way to convert between NameSet and SpanSet
Summary:
This will be used in the Python world for legacy reasons. It shouldn't be used
in new Rust code.

To use it, the name `LegacyCodeNeedIdAccess` has to be used so we can do a code
search to find all users of it.

Reviewed By: sfilipco

Differential Revision: D20367834

fbshipit-source-id: 9b93a29f1461ce24bba6f31a2bbb1f327e216c6d
2020-03-11 20:37:30 -07:00
Jun Wu
eef56d9c5b namedag: add a sort API
Summary: This will be useful to actually sort commits.

Reviewed By: sfilipco

Differential Revision: D20367835

fbshipit-source-id: 43bc7835277af3a14ef323ce34247e0c03878dc8
2020-03-11 20:37:29 -07:00
Jun Wu
2ecc0bb757 namedag: move "all" concept to DagSet
Summary:
The old "AllSet" implementation is not very practical - it does not support
iteration. Practically, the "all()" set comes from the DAG. Change the "all"
concept to a hint similar to "is_topo_sorted", and update the fast path
(intersection) accordingly.

Reviewed By: sfilipco

Differential Revision: D20367837

fbshipit-source-id: fdbf370897c93058bfcab0571c1f6fa4b99b0f6b
2020-03-11 20:37:29 -07:00
Jun Wu
ef1696b4db namedag: rename arc_map to snapshot_map
Summary: The word "snapshot" more accurately describes its purpose.

Reviewed By: sfilipco

Differential Revision: D20367836

fbshipit-source-id: c91a0bd402fa1718b5d805beedc0e062824c53d3
2020-03-11 20:37:29 -07:00
Jun Wu
c5c75c9f59 fsinfo: autocorrect "" to "."
Summary:
Without this:

  In [3]: util.getfstype('')
  IOError: [Errno 2] No such file or directory (os error 2)

And there is a code path hitting this:

  File "edenscm/mercurial/util.py", line 1483, in checknlink
    fstype = getfstype(os.path.dirname(testfile))
    # testfile = '.'
    # os.path.dirname(".") = ""

The old implementation works fine for an empty path:

  In [2]: m.util.getfstype('')
  Out[2]: 'eden'

So let's make the new Rust implementation consistent.

Reviewed By: xavierd

Differential Revision: D20313387

fbshipit-source-id: 258c424a3e8a796d983e20b0d4656e8e3f413706
2020-03-11 17:35:40 -07:00
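
The fix described in commit c5c75c9f59 above amounts to mapping an empty path to the current directory before querying the filesystem. A minimal sketch of that normalization, assuming a helper named `normalize_path` (a hypothetical name, not taken from the fsinfo crate), could be:

```rust
use std::path::Path;

/// Maps an empty path to "." so that callers passing the result of
/// something like `os.path.dirname(".")` still get a usable path.
/// Any other path is returned unchanged.
pub fn normalize_path(path: &Path) -> &Path {
    if path.as_os_str().is_empty() {
        Path::new(".")
    } else {
        path
    }
}
```

The filesystem-type lookup would then run on the normalized path instead of failing with "No such file or directory".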
Jun Wu
61bebcaacc fsinfo: try harder to get fuse fs type
Summary: Similar to D13982877. Try to get names like "fuse.ntfs".

Reviewed By: farnz

Differential Revision: D20313392

fbshipit-source-id: 8363d3d92843e6afb53a0003950be083034bd841
2020-03-11 17:35:39 -07:00
Jun Wu
13374f9d74 fsinfo: drop most type parameters
Summary:
Only keep type parameters at the top-level function.
This reduces the binary size and speeds up rustc.

Reviewed By: xavierd

Differential Revision: D20313388

fbshipit-source-id: 29d77731ff462fee1f1bb9f234601e3430198ae7
2020-03-11 17:35:39 -07:00
Jun Wu
c83006002c fsinfo: return unknown on unsupported platforms
Summary: This makes the code a bit more portable.

Reviewed By: xavierd

Differential Revision: D20313389

fbshipit-source-id: 080538939fa4d2d72e5905f23ad9be987d952748
2020-03-11 17:35:38 -07:00
Jun Wu
9cdc818915 fsinfo: drop "repo" from method names
Summary:
Rename the main method to "fstype". The API has no relation with repo.
So let's rename it.

Reviewed By: xavierd

Differential Revision: D20313386

fbshipit-source-id: 80dd1231ccccfe945150b117b151bce773f0dfeb
2020-03-11 17:35:38 -07:00
Jun Wu
951c8ab082 fsinfo: backport from telemetry
Summary: The fsinfo crate provides the "filesystem type" information.

Reviewed By: xavierd

Differential Revision: D20313391

fbshipit-source-id: f717f5edb32957d59d03090117cfdb8123f03933
2020-03-11 17:35:37 -07:00
Xavier Deguillard
f466037b4b revisionstore: fix memcache test flakiness
Summary:
Since the mocked memcache is shared between the tests, we need to make sure the
keys used by the tests are different, otherwise they are just caching each
other's data.

Reviewed By: ikostia

Differential Revision: D20388783

fbshipit-source-id: 0f2f926e0ffe0e52e55291e46142808ce0921288
2020-03-11 15:58:03 -07:00
Jun Wu
97e9b81ba5 indexedlog: remove compiler warnings on Windows
Summary:
Some `use`s are not used on Windows. The code was also formatted using the
latest rustfmt.

Reviewed By: xavierd

Differential Revision: D20379704

fbshipit-source-id: ffadcd68e4e0440dcbd2a4e1ad8532b47a9d83e2
2020-03-11 15:54:19 -07:00
Xavier Deguillard
c98b9cfff9 revisionstore: remove Arc from MetadataStore
Summary: Similarly to the ContentStore, remove the Arc from MetadataStore.

Reviewed By: quark-zju

Differential Revision: D20376838

fbshipit-source-id: 4321600b752c919b6d9fa7bdee6f6cb7ae083b10
2020-03-11 13:39:06 -07:00
Xavier Deguillard
7e704ec7fb revisionstore: remove the Arc from ContentStore
Summary:
The clients should use an Rc/Arc if they need the ability to clone it. This
makes it more obvious and reduces the number of pointer indirections.

Reviewed By: quark-zju

Differential Revision: D20376839

fbshipit-source-id: c56e7e8f89ab17727be621894c329e344a7f3adb
2020-03-11 13:39:05 -07:00
Jun Wu
4960709aa3 dag: do not depend on types
Summary:
The dag crate is designed to work with any kind of binary commit hashes (ex. bonsai,
git or hg). The only use of `types` is to convert from binary to hex. Since dag
already has its own `to_hex` logic in `VertexName`, let's use that instead.

Reviewed By: sfilipco

Differential Revision: D20378447

fbshipit-source-id: 00ecb551ea927fdb60dd91e5e645064f23139bcd
2020-03-11 10:49:31 -07:00
Jun Wu
009ea22175 indexedlog: retry rename in atomic_write on Windows
Summary:
Recently there has been some Windows-related test flakiness in . All of it is
caused by `file.persist(path)` in `atomic_write_plain` failing with
"Access Denied". Since that can be caused by Windows Anti-Virus scans or other
weird stuff, let's work around it using automatic retries.

Process Explorer does not provide extra information:

    indexedlog-d0c6135fd7ed9ece.exe	5868	SetRenameInformationFile	C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\.tmpcfDsQQ	ACCESS DENIED	ReplaceIfExists: True, FileName: C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\meta

A successful rename looks like:

    indexedlog-d0c6135fd7ed9ece.exe	5868	SetRenameInformationFile	C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\.tmpbXEVw0	SUCCESS	ReplaceIfExists: True, FileName: C:\Users\quark\AppData\Local\Temp\.tmpKERc5G\meta

Reviewed By: ikostia

Differential Revision: D20379618

fbshipit-source-id: db3e6be3d785875486f7a517df11cbf58bf65ddd
2020-03-11 10:06:47 -07:00
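
The automatic retry described in commit 009ea22175 above can be sketched with the standard library alone. This is an illustration under assumptions, not the indexedlog code: the helper name `retry_on_access_denied`, the attempt limit, and the fixed delay are all invented here. It relies on Rust reporting the Windows "Access Denied" error as `io::ErrorKind::PermissionDenied`.

```rust
use std::io;
use std::thread;
use std::time::Duration;

/// Runs `op` up to `max_attempts` times, retrying only when it fails with
/// `PermissionDenied`. Any other error, or the last failure, is returned.
pub fn retry_on_access_denied<T, F>(
    max_attempts: u32,
    delay: Duration,
    mut op: F,
) -> io::Result<T>
where
    F: FnMut() -> io::Result<T>,
{
    let mut attempt = 1;
    loop {
        match op() {
            // Transient failure (for example a scanner holding the file):
            // wait briefly and try again while attempts remain.
            Err(e) if e.kind() == io::ErrorKind::PermissionDenied && attempt < max_attempts => {
                attempt += 1;
                thread::sleep(delay);
            }
            // Success, a different error, or attempts exhausted.
            other => return other,
        }
    }
}
```

A rename could then be wrapped as `retry_on_access_denied(5, Duration::from_millis(10), || std::fs::rename(&tmp, &dst))`, where `tmp` and `dst` are placeholder paths.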
Xavier Deguillard
5d230aef68 backingstore: use get_file_content to strip metadata
Summary:
Now that the ContentStore can automatically strip the metadata header, no need
for duplicated code in the backingstore.

Reviewed By: fanzeyi

Differential Revision: D20376812

fbshipit-source-id: e863e1cc2fcdc8b9e612a464b305fa25ceb66e13
2020-03-11 09:40:26 -07:00
Xavier Deguillard
40bbe7b4da merge: add a Rust threaded file updater
Summary:
During `hg update`, Mercurial forks multiple processes to write files to disk
concurrently. This is done because fetching blobs from the content store and
writing them to disk is CPU bound. Usually, threads would be the preferred way
of speeding up such a process, but unfortunately, Python has a GIL that severely
limits the available concurrency. So, multiple processes were chosen.

Unfortunately, the multi-process solution also brings a lot of other issues.
More recently, we've had cases where the connections to the server and memcache
had to be dropped after the fork. In some other cases, this caused deadlocks.
And the solution is not effective on Windows.

Now that Mercurial is getting more and more Rust, we could instead go back to
the threads solution by using them in Rust, and have Python just push work to
them. This is exactly what this change does.

Things that are left to be done, but I wanted to get a diff out first:
 - no file path audit
 - no file backup
 - no symlink creation
 - probably other things I'm missing

Reviewed By: quark-zju

Differential Revision: D20102888

fbshipit-source-id: d47829fd7818b97710586b9851880f178048e27b
2020-03-11 01:13:54 -07:00
Xavier Deguillard
185bc0f437 revisionstore: add an LfsMultiplexer store
Summary:
With this new store, blobs will be transparently written to either an LFS
store, or a non-LFS one, depending on their size.

Initially, and as long as getpackv2 is supported, we also need to support
parsing the LFS pointer data that the server is sending and write it to the LFS
pointer store. This code is very ad hoc and parses the pointer data manually,
which is definitely not great; suggestions for a simpler and better solution are
welcome :).

From a migration standpoint, the read-only LFS stores are added to the
ContentStore. This allows blobs written to them to be readable at all times, even
when `remotefilelog.lfs` isn't set. The code will effectively be dormant for a
while until the option is turned on; if we need to disable it, the dormant code
will still be able to read all the blobs written to disk. This forces us to
deploy a release that contains this code to stable first, before setting
`remotefilelog.lfs`.

Reviewed By: quark-zju

Differential Revision: D19986878

fbshipit-source-id: 260f5a542d52e748c0c703bfa7bb8ffac0e7b388
2020-03-10 18:14:54 -07:00
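
The size-based routing described in commit 185bc0f437 above can be illustrated with a small in-memory model. This is a sketch under assumptions and not the `LfsMultiplexer` itself: the type name `SizeMultiplexer`, the string keys, the hash-map backends, and the choice of a strict "greater than" comparison against the threshold are all invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical store that writes each blob to one of two backends
/// depending on its size, and reads transparently from both.
pub struct SizeMultiplexer {
    threshold: usize,
    pub lfs: HashMap<String, Vec<u8>>,
    pub non_lfs: HashMap<String, Vec<u8>>,
}

impl SizeMultiplexer {
    pub fn new(threshold: usize) -> Self {
        SizeMultiplexer {
            threshold,
            lfs: HashMap::new(),
            non_lfs: HashMap::new(),
        }
    }

    /// Routes blobs larger than the threshold to the LFS backend and
    /// everything else to the non-LFS backend.
    pub fn add(&mut self, key: &str, data: Vec<u8>) {
        if data.len() > self.threshold {
            self.lfs.insert(key.to_string(), data);
        } else {
            self.non_lfs.insert(key.to_string(), data);
        }
    }

    /// Looks in the non-LFS backend first, then in the LFS backend.
    pub fn get(&self, key: &str) -> Option<&[u8]> {
        self.non_lfs
            .get(key)
            .or_else(|| self.lfs.get(key))
            .map(|data| data.as_slice())
    }
}
```

Because reads consult both backends, blobs stay readable regardless of which side they were written to, which mirrors the migration concern raised in the commit message.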
Jun Wu
5f84fc8222 indexedlog: use dev-logger
Summary: This makes `RUST_LOG` work for indexedlog tests.

Reviewed By: xavierd

Differential Revision: D20286515

fbshipit-source-id: ff4a1476eb01a9067dabe3622fd598f65fe86a18
2020-03-10 14:16:39 -07:00
Jun Wu
7a12c33163 dev-logger: a simple library to enable env_logger for testing
Summary:
The tracing / env_logger integration works for hg as a binary. However I'd also
like to use it in library tests. This crate makes it easier to do so.

Reviewed By: xavierd

Differential Revision: D20286507

fbshipit-source-id: f5bf3288ce950591ddfe64b524ad51ce21ee4099
2020-03-10 14:16:38 -07:00
Jun Wu
cf72dc45f5 indexedlog: add some tracing information
Summary: These have helped me debug some issues.

Reviewed By: xavierd

Differential Revision: D20286513

fbshipit-source-id: 012ddb16c2d0efd8f8697a5ecd4564ea31d65630
2020-03-10 14:16:38 -07:00
Jun Wu
e7ed737a64 hgcommands: make env_logger show exit code
Summary: Move the scope of spans so the exit code is shown.

Reviewed By: xavierd

Differential Revision: D20286516

fbshipit-source-id: f39cbf60c86ea19a1bb0a09958748f04ff6a42e8
2020-03-10 14:16:37 -07:00
Jun Wu
bb9023c2cb hgcommands: move env_logger initialization to hgcommands
Summary:
Previously, env_logger was only initialized if Python was initialized.
This diff initializes env_logger for Rust native commands as well.

Reviewed By: xavierd

Differential Revision: D20286517

fbshipit-source-id: 18fee96c2b41db1da9648d615d1e18809de90a63
2020-03-10 14:16:37 -07:00
Jun Wu
97d0a976fd tracing: make it write to the log eco-system
Summary:
This means crates like env_logger (which reads $RUST_LOG, and writes to stderr)
can be used for convenient debugging.

Reviewed By: xavierd

Differential Revision: D20286514

fbshipit-source-id: e3b80cc4830ba5cc6dbf7aa1cbb92a4f4f046a54
2020-03-10 14:16:37 -07:00
Jun Wu
796f199130 tracing: save static metadata from tracing to Spans
Summary:
This metadata includes module_path, target, line number, etc., in Rust native
format. It will be used for the upcoming `log` integration.

Reviewed By: xavierd

Differential Revision: D20286510

fbshipit-source-id: 27019b941bef08c0bb3e505bbdae642282dcb141
2020-03-10 14:16:36 -07:00
Stefan Filip
d8b4ddcecf dag: split lock file acquisition to own function
Summary:
Split lock file acquisition out of `IdDag::prepare_filesystem_sync` into its own
function.
This is useful when looking ahead to splitting IdDag from IndexedLog.

Reviewed By: quark-zju

Differential Revision: D20316443

fbshipit-source-id: a0fd43439730376920706bb4349ce497f6624335
2020-03-09 10:18:07 -07:00
Stefan Filip
620cdd96f2 dag: add IdDag::iter_segments_with_parent
Summary:
This removes an inline use of the indexedlog indexes.
This is going to be useful when we try to separate IndexedLog specifics from
IdDag functionality.

Reviewed By: quark-zju

Differential Revision: D20316058

fbshipit-source-id: 942a0a71660bb327376c81fd3ac435d002ecca6e
2020-03-09 10:18:07 -07:00
Kuba Zika
6a25dbee81 Simplify error pattern matching
Summary:
Instead of returning `anyhow::Error` wrapping an `ErrorKind` enum
from each Thrift client method, just return an error type specific
to that method. This will make error handling simpler and less
error-prone by removing the need to downcast the returned error.

This diff also removes the `ErrorKind` enums so that we can be sure
that there are no leftover places trying to downcast to them.

(Note: this ignores all push blocking failures!)

Reviewed By: dtolnay

Differential Revision: D20260398

fbshipit-source-id: f0dd96a7b83dd49f6b30948660456539012f82e6
2020-03-06 12:09:38 -08:00
Jun Wu
3103fcf62b indexedlog: reload content after obtaining a lock at open time
Summary:
The old code does "read, lock, write", which is unsound because after "lock"
the data just read can be outdated and needs a reload.

Reviewed By: xavierd

Differential Revision: D20306137

fbshipit-source-id: a1c29d5078b2d47ee95cf00db8c1fcbe3447cccf
2020-03-06 08:12:02 -08:00
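
The hazard named in commit 3103fcf62b above ("read, lock, write") and its fix (reload after taking the lock) can be shown with a toy model. This is only an analogy under assumptions: a `Mutex` stands in for the file lock, a `Vec<String>` stands in for on-disk content, and the types `Disk` and `Handle` are hypothetical and unrelated to the indexedlog API.

```rust
use std::sync::Mutex;

/// Toy shared storage. The mutex plays the role of the file lock.
pub struct Disk {
    entries: Mutex<Vec<String>>,
}

impl Disk {
    pub fn new() -> Self {
        Disk { entries: Mutex::new(Vec::new()) }
    }

    /// Returns a copy of the current content.
    pub fn snapshot(&self) -> Vec<String> {
        self.entries.lock().unwrap().clone()
    }
}

/// A handle holding a possibly stale in-memory copy of the content.
pub struct Handle<'a> {
    disk: &'a Disk,
    cached: Vec<String>,
}

impl<'a> Handle<'a> {
    /// Takes a snapshot at open time; it may be outdated by the time
    /// the handle writes.
    pub fn open(disk: &'a Disk) -> Self {
        Handle { disk, cached: disk.snapshot() }
    }

    /// Lock, reload, then write. Without the reload, the stale snapshot
    /// would overwrite entries appended by other handles.
    pub fn append(&mut self, entry: &str) {
        let mut on_disk = self.disk.entries.lock().unwrap();
        self.cached = on_disk.clone(); // reload after obtaining the lock
        self.cached.push(entry.to_string());
        *on_disk = self.cached.clone();
    }
}
```

With two handles opened before either writes, both appended entries survive in this model because each write starts from freshly reloaded content.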
Jun Wu
75e4ffc17f indexedlog: change IndexDef.lag_threshold back from entries to bytes
Summary:
I thought the index function could be the bottleneck. However, the Log reading
(xxhash, decoding vlqs) can be much slower for very long entries. Therefore
using bytes as the lag threshold is better. It does leak the Log
implementation details (how it encodes an entry) to some extent, though.

Reverts D20042045 and D20043116 logically. The lagging calculation is using
the new Index::get_original_meta API, which makes correctness easier to verify
(in fact, it seems the old code is wrong - it might skip Index flushes if
sync() is called multiple times without flushing).

This should mitigate an issue where a huge entry (generated by `hg trace`) in
blackbox does not get indexed in time and causes performance regressions.

Reviewed By: DurhamG

Differential Revision: D20286508

fbshipit-source-id: 7cd694b58b95537490047fb1834c16b30d102f18
2020-03-05 13:29:48 -08:00
Jun Wu
efff6f3592 indexedlog: add an API to get the Index meta that is not dirty
Summary: This will be used to more reliably detect index lags.

Reviewed By: DurhamG

Differential Revision: D20286518

fbshipit-source-id: c553b6587363a55603b75df12580588e3100e35f
2020-03-05 13:29:47 -08:00
Jun Wu
66e60bacb9 rotatelog: build indexes for older logs on access
Summary:
This ensures indexes are complete even if index format or definition has been
changed.

Reviewed By: DurhamG

Differential Revision: D20286509

fbshipit-source-id: fcc4ebc616a4501e4b6fd2f1a9826f54f40b99b8
2020-03-05 13:29:47 -08:00
Jun Wu
669c58bd56 blackbox: use RotateLog::iter_dirty()
Summary:
This avoids loading all blackbox logs when `init()` gets called multiple times
(for example, once in Rust and once in Python).

Reviewed By: DurhamG

Differential Revision: D20286511

fbshipit-source-id: ef985e454782b787feac90a6249651a882b6552e
2020-03-05 13:29:47 -08:00
Jun Wu
1c6310b9d6 rotatelog: add iter_dirty() API
Summary: This API has the benefit that it does not trigger loading older logs.

Reviewed By: DurhamG

Differential Revision: D20286512

fbshipit-source-id: 426421691ad1130cdbb2305612d76f18c9f8798c
2020-03-05 13:29:46 -08:00
Jun Wu
64ba669a51 nameset: add some tests for DagSet
Summary:
With the new crate-public interfaces and Debug implementations it's possible to
write tests for DagSet. So let's do it.

Reviewed By: sfilipco

Differential Revision: D20242561

fbshipit-source-id: 180e04d9535f79471c79c4307f6ab6e8e8815067
2020-03-05 11:46:18 -08:00
Xavier Deguillard
34bce8690f revisionstore: silence compiler warning
Summary:
Don't restrict constructing a c_api datapack store to only Unix; we can
construct it on Windows too by assuming that the path will be valid UTF-8.

Reviewed By: quark-zju

Differential Revision: D20250718

fbshipit-source-id: 07234b6a71b50c803cfe3b962fa727f57037c919
2020-03-05 09:35:57 -08:00