sapling/eden/fs/model/ObjectId.h
Andrey Chursin 08f337f7ab embed proxy hashes into object id [proxy hash removal 7/n]
Summary:
This diff introduces the config option store:embed-proxy-hashes.

When this config is set, we store the HgId directly in the ObjectId instead of using a proxy hash object.
This lets file reads bypass the proxy hash RocksDB storage.
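
The difference between the two read paths can be sketched as follows. This is a minimal illustration, not the encoding used by this diff: the `ProxyTable` map stands in for the RocksDB-backed proxy hash storage, and the one-byte `kEmbeddedTag` prefix, `makeEmbeddedId`, and `resolveHgId` are hypothetical names. The HgId is treated as opaque bytes.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the proxy hash storage:
// 20-byte proxy id -> HgId bytes.
using ProxyTable = std::map<std::string, std::string>;

// Hypothetical tag byte marking an id that carries its HgId inline.
constexpr char kEmbeddedTag = 0x01;

// Builds a variable-length id: tag byte followed by the HgId bytes.
inline std::string makeEmbeddedId(const std::string& hgId) {
  std::string id;
  id.push_back(kEmbeddedTag);
  id.append(hgId);
  return id;
}

// Resolves an object id to an HgId. Embedded ids are decoded in place;
// legacy 20-byte ids require one lookup in the proxy table, which is counted
// in `lookups` so the saved work is visible.
inline std::optional<std::string> resolveHgId(
    const std::string& objectId,
    const ProxyTable& table,
    int& lookups) {
  if (objectId.size() > 20 && objectId[0] == kEmbeddedTag) {
    return objectId.substr(1); // no storage access needed
  }
  ++lookups;
  auto it = table.find(objectId);
  if (it == table.end()) {
    return std::nullopt;
  }
  return it->second;
}
```

In this sketch a legacy id costs one table lookup per read, while an embedded id costs none, which is the saving the config is meant to enable.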

**Compatibility notes**

This diff is compatible with previous versions unless the store:embed-proxy-hashes config is set.

Once the config is set, the new ObjectId format is used and serialized into inodes. After that, previous versions of EdenFS cannot read overlay inodes created by this version.

This means we need to be careful when setting this config: once it is set, we cannot easily roll back the EdenFS version, because doing so essentially requires re-creating the eden checkout.

Inodes created before this config is set remain in the old format; the new format is used only when a new inode is written.

**Git tree format issue**

We use the git tree serialization format in the LocalStore to serialize trees.
This format assumes 20-byte hashes and is not compatible with variable-length ObjectIds.
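
To see why the fixed width matters: a git tree entry is laid out as `<mode> <name>\0<20 raw hash bytes>`, with no length field for the hash, so a reader has to skip exactly 20 bytes to find the next entry. The sketch below is a simplified standalone parser written for illustration (it is not the LocalStore code; `TreeEntry`, `serializeGitEntry`, and `parseGitTree` are made-up names).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct TreeEntry {
  std::string mode;
  std::string name;
  std::string id; // raw hash bytes
};

// The git tree format has no per-entry hash length; 20 is baked in.
constexpr std::size_t kGitHashSize = 20;

inline std::string serializeGitEntry(const TreeEntry& e) {
  std::string out = e.mode;
  out.push_back(' ');
  out += e.name;
  out.push_back('\0');
  out += e.id; // written as-is; the reader will assume 20 bytes
  return out;
}

inline std::vector<TreeEntry> parseGitTree(const std::string& data) {
  std::vector<TreeEntry> entries;
  std::size_t pos = 0;
  while (pos < data.size()) {
    auto space = data.find(' ', pos);
    if (space == std::string::npos) {
      throw std::invalid_argument("missing mode terminator");
    }
    auto nul = data.find('\0', space + 1);
    if (nul == std::string::npos) {
      throw std::invalid_argument("missing name terminator");
    }
    if (data.size() - (nul + 1) < kGitHashSize) {
      throw std::invalid_argument("truncated hash");
    }
    entries.push_back(
        {data.substr(pos, space - pos),
         data.substr(space + 1, nul - space - 1),
         data.substr(nul + 1, kGitHashSize)});
    // The next entry is assumed to start exactly 20 bytes after the NUL.
    pos = nul + 1 + kGitHashSize;
  }
  return entries;
}
```

With a 21-byte id, this parser reads only the first 20 bytes and treats the leftover byte as the start of the next entry's mode, so every following entry is misparsed.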

In this diff we bypass this issue by not storing trees in the local store. This seems OK in terms of correctness, because tree information can always be fetched from Mercurial.

However, this seems to impose a performance penalty on some workloads (see below).

We can solve this either by introducing a new format that supports variable-length object IDs (short term), or by getting rid of the tree cache and efficiently fetching the data directly from Mercurial (long term).
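
One possible shape for the short-term option is to keep the git-like entry layout but add a length prefix before the id. The sketch below is only an assumption about what such a format could look like, not a design taken from this diff: the one-byte length prefix (ids up to 255 bytes) and the names `VarTreeEntry`, `serializeVarEntry`, and `parseVarTree` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

struct VarTreeEntry {
  std::string mode;
  std::string name;
  std::string id; // raw id bytes, any length up to 255 in this sketch
};

// Layout (hypothetical): <mode> ' ' <name> '\0' <1-byte id length> <id bytes>
inline std::string serializeVarEntry(const VarTreeEntry& e) {
  if (e.id.size() > 255) {
    throw std::length_error("object id too long for 1-byte length prefix");
  }
  std::string out = e.mode;
  out.push_back(' ');
  out += e.name;
  out.push_back('\0');
  out.push_back(static_cast<char>(static_cast<unsigned char>(e.id.size())));
  out += e.id;
  return out;
}

inline std::vector<VarTreeEntry> parseVarTree(const std::string& data) {
  std::vector<VarTreeEntry> entries;
  std::size_t pos = 0;
  while (pos < data.size()) {
    auto space = data.find(' ', pos);
    if (space == std::string::npos) {
      throw std::invalid_argument("missing mode terminator");
    }
    auto nul = data.find('\0', space + 1);
    if (nul == std::string::npos) {
      throw std::invalid_argument("missing name terminator");
    }
    if (nul + 1 >= data.size()) {
      throw std::invalid_argument("missing id length");
    }
    // The length byte tells the reader how far to skip, so ids of
    // different sizes can coexist in one tree.
    std::size_t len = static_cast<unsigned char>(data[nul + 1]);
    std::size_t idStart = nul + 2;
    if (data.size() - idStart < len) {
      throw std::invalid_argument("truncated id");
    }
    entries.push_back(
        {data.substr(pos, space - pos),
         data.substr(space + 1, nul - space - 1),
         data.substr(idStart, len)});
    pos = idStart + len;
  }
  return entries;
}
```

A real format would also need a version marker so readers can tell old git-style trees from new ones; that is left out here.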

**Performance numbers**

Hot file access latency is reduced by about 30% (throughput is up by about 45%):
```
$ fsprobe.sh run cat.targets

Before:
lat: 0.2331 ms, qps: 4, dur: 28.697384178s, 123092 files, 217882490 bytes, 1641 errors, rate 7.59 Mb/s

After:
lat: 0.1611 ms, qps: 6, dur: 19.835917353s, 123092 files, 217882490 bytes, 1641 errors, rate 10.98 Mb/s
```

However, we do not see an improvement with arc focus, most likely because tree serialization is bypassed, so we will need to resolve that issue.

We can still merge this diff and see whether enabling this feature on other workloads, such as sandcastle, is beneficial.

Reviewed By: chadaustin

Differential Revision: D31777929

fbshipit-source-id: fc4b678477d0737c9f242968f0be99ed04f4f58a
2021-11-05 17:05:43 -07:00

170 lines
4.1 KiB
C++

/*
 * Copyright (c) Facebook, Inc. and its affiliates.
 *
 * This software may be used and distributed according to the terms of the
 * GNU General Public License version 2.
 */
#pragma once
#include <boost/operators.hpp>
#include <fmt/format.h>
#include <folly/FBString.h>
#include <folly/Range.h>
#include <stdint.h>
#include <array>
#include <iosfwd>
#include <string>
#include <utility>
namespace folly {
class IOBuf;
}
namespace facebook::eden {
/**
 * Identifier of objects in local store.
 * This identifier is a variable length string.
 */
class ObjectId : boost::totally_ordered<ObjectId> {
 public:
  // fbstring has more SSO space (23 bytes!) than std::string and thus can hold
  // 20-byte hashes inline.
  using Storage = folly::fbstring;

  /**
   * Create an empty object id
   */
  ObjectId() noexcept : bytes_{} {}

  explicit ObjectId(Storage fbs) noexcept : bytes_{std::move(fbs)} {}

  explicit ObjectId(folly::ByteRange bytes)
      : bytes_{constructFromByteRange(bytes)} {}

  /**
   * Compute the SHA1 hash of an IOBuf chain.
   */
  static ObjectId sha1(const folly::IOBuf& buf);

  /**
   * Compute the SHA1 hash of a std::string.
   */
  static ObjectId sha1(const std::string& str) {
    return sha1(folly::ByteRange{folly::StringPiece{str}});
  }

  /**
   * Compute the SHA1 hash of a ByteRange.
   */
  static ObjectId sha1(folly::ByteRange data);

  /**
   * Returns bytes content of the ObjectId
   */
  folly::ByteRange getBytes() const {
    return folly::ByteRange{folly::StringPiece{bytes_}};
  }

  char operator[](size_t pos) const {
    return bytes_[pos];
  }

  /**
   * Returns size of this ObjectId
   */
  size_t size() const {
    return bytes_.size();
  }

  /** @return [lowercase] hex representation of this ObjectId. */
  std::string toLogString() const {
    return asHexString();
  }

  std::string asHexString() const;

  /** @return bytes of this ObjectId. */
  std::string asString() const;

  size_t getHashCode() const noexcept;

  bool operator==(const ObjectId&) const;
  bool operator<(const ObjectId&) const;

  static ObjectId fromHex(folly::StringPiece hex) {
    return ObjectId{constructFromHex(hex)};
  }

 private:
  static Storage constructFromByteRange(folly::ByteRange bytes) {
    return Storage{reinterpret_cast<const char*>(bytes.data()), bytes.size()};
  }

  static Storage constructFromHex(folly::StringPiece hex) {
    if (hex.size() % 2 != 0) {
      throwInvalidArgument(
          "incorrect data size for Hash constructor from string: ", hex.size());
    }
    folly::fbstring result;
    auto size = hex.size() / 2;
    result.reserve(size);
    for (size_t i = 0; i < size; i++) {
      result.push_back(hexByteAt(hex, i));
    }
    return result;
  }

  static constexpr char hexByteAt(folly::StringPiece hex, size_t index) {
    return (nibbleToHex(hex.data()[index * 2]) * 16) +
        nibbleToHex(hex.data()[(index * 2) + 1]);
  }

  static constexpr char nibbleToHex(char c) {
    if ('0' <= c && c <= '9') {
      return c - '0';
    } else if ('a' <= c && c <= 'f') {
      return 10 + c - 'a';
    } else if ('A' <= c && c <= 'F') {
      return 10 + c - 'A';
    } else {
      throwInvalidArgument(
          "invalid hex digit supplied to Hash constructor from string: ", c);
    }
  }

  [[noreturn]] static void throwInvalidArgument(
      const char* message,
      size_t number);

  Storage bytes_;
};
using ObjectIdRange = folly::Range<const ObjectId*>;
/**
* Output stream operator for ObjectId.
*
* This makes it possible to easily use ObjectId in glog statements.
*/
std::ostream& operator<<(std::ostream& os, const ObjectId& hash);
/* Define toAppend() so folly::to<std::string>(ObjectId) will work */
void toAppend(const ObjectId& hash, std::string* result);
} // namespace facebook::eden
namespace std {
template <>
struct hash<facebook::eden::ObjectId> {
  size_t operator()(const facebook::eden::ObjectId& hash) const noexcept {
    return hash.getHashCode();
  }
};
} // namespace std

namespace fmt {
template <>
struct formatter<facebook::eden::ObjectId> : formatter<std::string> {
  auto format(const facebook::eden::ObjectId& id, format_context& ctx) {
    return formatter<std::string>::format(id.toLogString(), ctx);
  }
};
} // namespace fmt