A Scalable, User-Friendly Source Control System.
Jun Wu 8d54b2b3ed hgsql: add more checks before running tests
Summary:
Add two tests so tests can be skipped early instead of dumping stack traces:
- If the Python module `mysql.connector` is not available, skip.
- If the `mysql` command fails, skip.

Test Plan:
Normally, test passed:

  $ run-tests.py test-sync.t
  .

Without mysql.connector:

  $ mv /usr/lib/python2.7/site-packages/mysql/connector{,2}
  $ run-tests.py test-sync.t
  s
  Skipped test-sync.t: mysql-connector-python missing

With a wrong `getdb.sh`:

  $ echo 'DBPORT=9999' >> getdb.sh
  $ run-tests.py test-sync.t
  s
  Skipped test-sync.t: unable to initialize the database. check your getdb.sh


Reviewers: #sourcecontrol, rmcelroy

Reviewed By: rmcelroy

Differential Revision: https://phabricator.intern.facebook.com/D4772997

Signature: t1:4772997:1490400442:cf67ee406ee12905042cb0faf1053e355eda2ea7
2017-03-24 14:01:13 -07:00
tests hgsql: add more checks before running tests 2017-03-24 14:01:13 -07:00
.hgignore ignore: add .testtimes to hgignore 2016-03-07 02:54:17 -08:00
COPYING Add README and COPYING 2014-08-04 19:38:52 -07:00
hgsql.py hgsql: make it test-friendly for external users 2017-03-23 19:14:52 -07:00
Makefile Add make/setup files 2014-02-03 10:51:06 -08:00
README.md hgsql: make it test-friendly for external users 2017-03-23 19:14:52 -07:00
schema.sql hgsql: make it test-friendly for external users 2017-03-23 19:14:52 -07:00
setup.py Add make/setup files 2014-02-03 10:51:06 -08:00

hgsql

The hgsql extension allows multiple Mercurial servers to provide read and write access to a single repository at once. It does this by using a MySQL database to manage write locks and to propagate changes to the other servers.

This improves server scalability by allowing load to be distributed amongst multiple servers, and improves reliability by allowing individual servers to be taken down for repair without any downtime to the system as a whole.

Installing

hgsql can be installed like any other Mercurial extension. Download the source code and add the path to hgsql.py to your repository's hgrc:

:::ini
[extensions]
hgsql=path/to/hgsql/hgsql.py

Configuring

Server

To set up a new hgsql repo, hg init an empty repository and add the appropriate hgsql configuration to the hgrc. Populate the database by pushing commits into this new repository.

To set up other servers for an existing hgsql repo, hg init a new empty repository on each additional machine and give it the same configuration as the existing repo. Run any read command (e.g. hg log -l 1) to synchronize the repo with the database.

  • database (required) - The name of the database to use.
  • enabled (required) - Must be set to 'True'.
  • host (required) - The host name of the database.
  • password (required) - The password of the database to use. For testing only. DO NOT actually store your database password in plain text on your Mercurial servers. At Facebook we use an alternative mechanism for authenticating with the database. Users of hgsql are welcome to submit pull requests that enable other authentication mechanisms for their use cases.
  • port (required) - The port of the database.
  • reponame (required) - A unique name for this repository in the database. This is used to distinguish between multiple repositories being stored in the same database.
  • user (required) - The name of the user for connecting to the database.
  • waittimeout (optional) - The MySQL connection timeout to use. Useful when importing large repositories. Defaults to 300 seconds.

An example server configuration:

:::ini
[hgsql]
database = mydatabase
enabled = True
host = localhost
password = aaa
port = 12345
reponame = myreponame
user = mysqluser
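
Before starting a server it can help to confirm that every required option is present. The following is a minimal sketch, not part of hgsql: it uses Python's `configparser` as a stand-in for Mercurial's own config reader (which hgsql actually uses and which may parse some files differently), and the function name `load_hgsql_config` is hypothetical.

```python
import configparser

# Option names taken from the list above.
REQUIRED = ("database", "enabled", "host", "password", "port", "reponame", "user")
DEFAULT_WAITTIMEOUT = 300  # seconds, the documented default


def load_hgsql_config(text):
    """Parse hgrc-style text and return the [hgsql] settings as a dict.

    Hypothetical helper: raises ValueError if the section or a required
    option is missing, or if `enabled` is not the literal string 'True'.
    """
    # Disable interpolation so a '%' in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("hgsql"):
        raise ValueError("missing [hgsql] section")
    section = parser["hgsql"]
    missing = [key for key in REQUIRED if key not in section]
    if missing:
        raise ValueError("missing required hgsql options: " + ", ".join(missing))
    if section["enabled"] != "True":
        raise ValueError("hgsql.enabled must be set to True")
    settings = {key: section[key] for key in REQUIRED}
    settings["port"] = int(settings["port"])
    settings["waittimeout"] = section.getint("waittimeout", fallback=DEFAULT_WAITTIMEOUT)
    return settings


# The example server configuration from this README.
EXAMPLE_HGRC = """
[hgsql]
database = mydatabase
enabled = True
host = localhost
password = aaa
port = 12345
reponame = myreponame
user = mysqluser
"""

settings = load_hgsql_config(EXAMPLE_HGRC)
```

Because `waittimeout` is absent from the example, the sketch falls back to the documented default of 300 seconds.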

Database

See schema.sql for required tables. You can create them using commands like:

:::bash
mysql -uroot -D hgsql < ./schema.sql

Client

Clients do not need hgsql installed, nor any special configuration, to talk to hgsql-based Mercurial servers.

Caveats & Troubleshooting

Because hgsql synchronizes when any request comes in (even read requests), all users who perform such requests must have write access to the repository.

Since hgsql synchronizes changes between servers, a server can fall out of sync if it receives a write while the hgsql extension is not enabled. If this happens, that server will refuse to receive any new data from the database and will throw an exception. To fix it, strip the recent commits on the offending server using 'hg strip -r "badcommit:" --config extensions.hgsql=!', then resync with the database by running any read command (e.g. hg log -l 1).
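
The recovery steps above can be scripted. This is a minimal sketch under assumptions: the helper names `recovery_commands` and `recover` are hypothetical, the commands are exactly the two quoted above, and the script defaults to a dry run so nothing is stripped unless you ask for it.

```python
import subprocess


def recovery_commands(badcommit):
    """Return the two hg invocations described above as argv lists."""
    # Strip badcommit and everything after it, with hgsql disabled.
    strip = ["hg", "strip", "-r", badcommit + ":", "--config", "extensions.hgsql=!"]
    # Any read command triggers a resync; this is the one the README suggests.
    resync = ["hg", "log", "-l", "1"]
    return [strip, resync]


def recover(repo_path, badcommit, dry_run=True):
    """Run (or, with dry_run=True, only return) the recovery commands."""
    commands = recovery_commands(badcommit)
    if not dry_run:
        for argv in commands:
            # check=True stops at the first failing command.
            subprocess.run(argv, cwd=repo_path, check=True)
    return commands


planned = recover(".", "abc123")  # "abc123" is a placeholder revision
```

Review the planned commands before calling `recover(..., dry_run=False)`, since stripping is destructive on the local server.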

hgsql generally assumes that your repositories are append-only and provides only rudimentary support for deleting commits. If you absolutely need to delete a commit, you can use hg sqlstrip <rev> to delete <rev> and every commit newer than it. You will need to run this command on every hgsql server, since deletes are not propagated automatically.

Implementation Details

hgsql works by keeping a table of all commit, manifest, and file revisions in the repository.

When a Mercurial server receives a request from a client, it first checks whether it has the latest data from the MySQL database. If there is new data, it downloads it before serving the request. Otherwise it serves the request from disk as usual. This means the majority of the read load falls on the Mercurial server, and the database is used only for minimal synchronization.
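
The read path described above can be illustrated with a toy model. This is a sketch, not hgsql's code: the classes `SharedDatabase` and `Server` are hypothetical, the database is an in-memory list rather than MySQL, and revisions are plain strings.

```python
class SharedDatabase:
    """Stand-in for the MySQL database: an append-only list of revisions."""

    def __init__(self):
        self.revisions = []

    def tip(self):
        # Number of revisions recorded so far.
        return len(self.revisions)

    def fetch_since(self, count):
        # Revisions that a server holding `count` revisions is missing.
        return self.revisions[count:]


class Server:
    """Stand-in for one Mercurial server with its own on-disk copy."""

    def __init__(self, db):
        self.db = db
        self.local = []      # revisions stored locally
        self.downloads = 0   # how many times new data was pulled from the db

    def sync(self):
        # Cheap check first; download only when the database is ahead.
        if self.db.tip() > len(self.local):
            self.local.extend(self.db.fetch_since(len(self.local)))
            self.downloads += 1

    def handle_read(self):
        self.sync()
        # The request itself is served from the local copy.
        return list(self.local)


db = SharedDatabase()
db.revisions.extend(["rev0", "rev1"])
server = Server(db)
first = server.handle_read()   # pulls rev0 and rev1, then serves them
second = server.handle_read()  # already up to date, served from local data
```

In this model the second read touches the database only for the tip comparison, which mirrors the point that the database handles synchronization while reads are served from disk.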

When a client issues a write request to the Mercurial server (such as a push), the Mercurial server obtains both the local Mercurial write lock and a MySQL application-level write lock that prevents all other servers from writing to that repo at the same time.
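
The two-lock write path can be sketched in the same spirit. This is an illustration under stated assumptions, not hgsql's implementation: both locks are in-process `threading.Lock` objects standing in for the Mercurial write lock and the MySQL lock, the acquisition order (local first, then shared) is an assumption, and the names `write_locks`, `push`, and `LockTimeout` are hypothetical.

```python
import threading
from contextlib import contextmanager


class LockTimeout(Exception):
    """Raised when a lock cannot be obtained within the timeout."""


@contextmanager
def write_locks(local_lock, shared_lock, timeout=1.0):
    """Hold the local lock and the shared lock for the duration of a write."""
    if not local_lock.acquire(timeout=timeout):
        raise LockTimeout("local write lock is busy")
    try:
        if not shared_lock.acquire(timeout=timeout):
            raise LockTimeout("shared write lock is held by another server")
        try:
            yield
        finally:
            shared_lock.release()
    finally:
        # Always release the local lock, even if the shared lock timed out.
        local_lock.release()


def push(local_log, shared_log, local_lock, shared_lock, revision, timeout=1.0):
    """Record a revision locally and in the shared log while holding both locks."""
    with write_locks(local_lock, shared_lock, timeout):
        shared_log.append(revision)
        local_log.append(revision)


shared_lock = threading.Lock()   # stand-in for the MySQL application-level lock
local_lock_a = threading.Lock()  # stand-in for server A's Mercurial write lock
shared_log, log_a = [], []
push(log_a, shared_log, local_lock_a, shared_lock, "rev0")
```

While another server holds the shared lock, a `push` in this model fails with `LockTimeout` instead of writing, which is the exclusion property the paragraph above describes.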

Contributing

Patches are welcome as pull requests, though they will be collapsed and rebased to maintain a linear history.

To run tests, copy tests/getdb.sh.example to tests/getdb.sh, and edit it to provide MySQL host, port, user and password. Then run the actual tests via:

:::bash
cd tests
./run-tests.py --with-hg=/path/to/hg
# Alternatively, you can use run-tests.py from a checkout of the hg repo
/path/to/repo/hg/tests/run-tests.py
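
The tests need the Python module `mysql.connector` and a working `mysql` command, and they are skipped when either is unavailable. A quick pre-flight check can be sketched as follows. This is a hypothetical helper, not the check the test suite itself performs: it only verifies that the module imports and that the command is on `PATH`, which is weaker than confirming the command can reach the database.

```python
import importlib
import shutil


def missing_test_prerequisites(module="mysql.connector", command="mysql"):
    """Return a list of human-readable problems; empty means nothing obvious is missing."""
    problems = []
    try:
        importlib.import_module(module)
    except ImportError:
        problems.append("Python module %s is not importable" % module)
    if shutil.which(command) is None:
        problems.append("command %s is not on PATH" % command)
    return problems


for problem in missing_test_prerequisites():
    print("warning:", problem)
```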

We (Facebook) have to ask for a "Contributor License Agreement" from someone who sends in a patch or code that we want to include in the codebase. This is a legal requirement; a similar situation applies to Apache and other ASF projects.

If we ask you to fill out a CLA, we'll direct you to our online CLA page where you can complete it easily. We use the same form as the Apache CLA so that friction is minimal.

License

hgsql is made available under the terms of the GNU General Public License version 2, or any later version. See the COPYING file that accompanies this distribution for the full text of the license.