Improve MBTiles documentation (#961)

Fixes #918 

---------

Co-authored-by: Yuri Astrakhan <YuriAstrakhan@gmail.com>
This commit is contained in:
Lucas 2023-10-31 12:41:21 +08:00 committed by GitHub
parent 3a1f7acc95
commit c6170c5913
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 213 additions and 135 deletions

View File

@ -1,139 +1,12 @@
# Tools
# CLI Tools
Martin has a few additional tools that can be used to interact with the data.
Martin project contains additional tooling to help manage the data servable with Martin tile server.
## MBTiles tool
A small utility that allows users to interact with the `*.mbtiles` files from the command line. Use `mbtiles --help` to see a list of available commands, and `mbtiles <command> --help` to see help for a specific command.
## `mbtiles`
`mbtiles` is a small utility to interact with the `*.mbtiles` files from the command line. It allows users to examine, copy, validate, compare, and apply diffs between them.
Use `mbtiles --help` to see a list of available commands, and `mbtiles <command> --help` to see help for a specific command.
This tool can be installed by compiling the latest released version with `cargo install mbtiles`, or by downloading a pre-built binary from the [releases page](https://github.com/maplibre/martin/releases/latest).
### meta-all
Print all metadata values to stdout, as well as the results of tile detection. The format of the values printed is not stable, and should only be used for visual inspection.
```shell
mbtiles meta-all my_file.mbtiles
```
### meta-get
Retrieve raw metadata value by its name. The value is printed to stdout without any modifications. For example, to get the `description` value from an mbtiles file:
```shell
mbtiles meta-get my_file.mbtiles description
```
### meta-set
Set metadata value by its name, or delete the key if no value is supplied. For example, to set the `description` value to `A vector tile dataset`:
```shell
mbtiles meta-set my_file.mbtiles description "A vector tile dataset"
```
### copy
Copy an mbtiles file, optionally filtering its content by zoom levels.
```shell
mbtiles copy src_file.mbtiles dst_file.mbtiles \
--min-zoom 0 --max-zoom 10
```
Copy command can also be used to compare two mbtiles files and generate a delta (diff) file. The diff file can be applied to the `src_file.mbtiles` elsewhere, to avoid copying/transmitting the entire modified dataset. The delta file will contain all tiles that are different between the two files (modifications, insertions, and deletions as `NULL` values), for both the tile and metadata tables.
There is one exception: `agg_tiles_hash` metadata value will be renamed to `agg_tiles_hash_in_diff`, and a new `agg_tiles_hash` will be generated for the diff file itself. This is done to avoid confusion when applying the diff file to the original file, as the `agg_tiles_hash` value will be different after the diff is applied. The `apply-diff` command will automatically rename the `agg_tiles_hash_in_diff` value back to `agg_tiles_hash` when applying the diff.
```shell
mbtiles copy src_file.mbtiles diff_file.mbtiles \
--diff-with-file modified_file.mbtiles
```
This command can also be used to generate files of different [supported schema](##supported-schema).
```shell
mbtiles copy normalized.mbtiles dst.mbtiles \
--dst-mbttype flat-with-hash
```
### apply-diff
Apply the diff file generated from `copy` command above to an mbtiles file. The diff file can be applied to the `src_file.mbtiles` elsewhere, to avoid copying/transmitting the entire modified dataset.
Note that the `agg_tiles_hash_in_diff` metadata value will be renamed to `agg_tiles_hash` when applying the diff. This is done to avoid confusion when applying the diff file to the original file, as the `agg_tiles_hash` value will be different after the diff is applied.
```shell
mbtiles apply_diff src_file.mbtiles diff_file.mbtiles
```
Another way to apply the diff is to use the `sqlite3` command line tool directly. This SQL will delete all tiles from `src_file.mbtiles` that are set to `NULL` in `diff_file.mbtiles`, and then insert or update all new tiles from `diff_file.mbtiles` into `src_file.mbtiles`, where both files are of `flat` type. The name of the diff file is passed as a query parameter to the sqlite3 command line tool, and then used in the SQL statements.
```shell
sqlite3 src_file.mbtiles \
-bail \
-cmd ".parameter set @diffDbFilename diff_file.mbtiles" \
"ATTACH DATABASE @diffDbFilename AS diffDb;" \
"DELETE FROM tiles WHERE (zoom_level, tile_column, tile_row) IN (SELECT zoom_level, tile_column, tile_row FROM diffDb.tiles WHERE tile_data ISNULL);" \
"INSERT OR REPLACE INTO tiles (zoom_level, tile_column, tile_row, tile_data) SELECT * FROM diffDb.tiles WHERE tile_data NOTNULL;"
```
### validate
If the `.mbtiles` file is of `flat_with_hash` or `normalized` type, then verify that the data stored in columns `tile_hash` and `tile_id` respectively are MD5 hashes of the `tile_data` column.
```shell
mbtiles validate src_file.mbtiles
```
## Content Validation
The original [MBTiles specification](https://github.com/mapbox/mbtiles-spec#readme) does not provide any guarantees for the content of the tile data in MBTiles. This tool adds a few additional conventions to ensure that the content of the tile data is valid.
A typical Normalized schema generated by tools like [tilelive-copy](https://github.com/mapbox/TileLive#bintilelive-copy) use MD5 hash in the `tile_id` column. The Martin's `mbtiles` tool can use this hash to verify the content of each tile. We also define a new `flat-with-hash` schema that stores the hash and tile data in the same table. This schema is more efficient than the `normalized` schema when data has no duplicate tiles (see below). Per tile validation is not available for `flat` schema.
Per-tile validation will catch individual invalid tiles, but it will not detect overall datastore corruption (e.g. missing tiles or tiles that shouldn't exist, or tiles with incorrect z/x/y values).
For that, Martin `mbtiles` tool defines a new metadata value called `agg_tiles_hash`. The value is computed by hashing `cast(zoom_level AS text), cast(tile_column AS text), cast(tile_row AS text), tile_data` combined for all rows in the `tiles` table/view, ordered by z,x,y.
In case there are no rows or all are NULL, the hash value of an empty string is used. Note that SQLite allows any value type to be stored as in any column, so if `tile_data` accidentally contains non-blob/text/null value, validation will fail.
The `mbtiles` tool will compute `agg_tiles_hash` value when copying or validating mbtiles files.
## Supported Schema
The `mbtiles` tool supports three different kinds of schema for `tiles` data in `.mbtiles` files. See also the original [specification](https://github.com/mapbox/mbtiles-spec#readme).
### flat
```sql, ignore
CREATE TABLE tiles (zoom_level integer, tile_column integer, tile_row integer, tile_data blob);
CREATE UNIQUE INDEX tile_index on tiles (zoom_level, tile_column, tile_row);
```
### flat-with-hash
```sql, ignore
CREATE TABLE tiles_with_hash (
zoom_level integer NOT NULL,
tile_column integer NOT NULL,
tile_row integer NOT NULL,
tile_data blob,
tile_hash text);
CREATE UNIQUE INDEX tiles_with_hash_index on tiles_with_hash (zoom_level, tile_column, tile_row);
CREATE VIEW tiles AS SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles_with_hash;
```
### normalized
```sql, ignore
CREATE TABLE map (zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_id TEXT);
CREATE UNIQUE INDEX map_index ON map (zoom_level, tile_column, tile_row);
CREATE TABLE images (tile_id text, tile_data blob);
CREATE UNIQUE INDEX images_id ON images (tile_id);
CREATE VIEW tiles AS
SELECT
map.zoom_level AS zoom_level,
map.tile_column AS tile_column,
map.tile_row AS tile_row,
images.tile_data AS tile_data
FROM map
JOIN images ON images.tile_id = map.tile_id;
```
Optionally, `.mbtiles` files with `normalized` schema can include a `tiles_with_hash` view:
```sql, ignore
CREATE VIEW tiles_with_hash AS
SELECT
map.zoom_level AS zoom_level,
map.tile_column AS tile_column,
map.tile_row AS tile_row,
images.tile_data AS tile_data,
images.tile_id AS tile_hash
FROM map LEFT JOIN images ON map.tile_id = images.tile_id;
```
**__Note:__** All `normalized` files created by the `mbtiles` tool will contain this view.
The `mbtiles` utility builds on top of the [MBTiles specification](https://github.com/mapbox/mbtiles-spec). It adds a few additional conventions to ensure that the content of the tile data is valid, and can be used for reliable diffing and patching of the tilesets.

View File

@ -0,0 +1,22 @@
# `mbtiles` Metadata Access
## meta-all
Print all metadata values to stdout, as well as the results of tile detection. The format of the values printed is not stable, and should only be used for visual inspection.
```shell
mbtiles meta-all my_file.mbtiles
```
## meta-get
Retrieve raw metadata value by its name. The value is printed to stdout without any modifications. For example, to get the `description` value from an mbtiles file:
```shell
mbtiles meta-get my_file.mbtiles description
```
## meta-set
Set metadata value by its name, or delete the key if no value is supplied. For example, to set the `description` value to `A vector tile dataset`:
```shell
mbtiles meta-set my_file.mbtiles description "A vector tile dataset"
```

View File

@ -0,0 +1,57 @@
# Copying, Diffing, and Patching MBTiles
## `mbtiles copy`
Copy command copies an mbtiles file, optionally filtering its content by zoom levels.
```shell
mbtiles copy src_file.mbtiles dst_file.mbtiles \
--min-zoom 0 --max-zoom 10
```
This command can also be used to generate files of different [supported schema](##supported-schema).
```shell
mbtiles copy normalized.mbtiles dst.mbtiles \
--dst-mbttype flat-with-hash
```
## `mbtiles copy --diff-with-file`
Copy command can also be used to compare two mbtiles files and generate a delta (diff) file. The diff file can be applied to the `src_file.mbtiles` elsewhere, to avoid copying/transmitting the entire modified dataset. The delta file will contain all tiles that are different between the two files (modifications, insertions, and deletions as `NULL` values), for both the tile and metadata tables.
There is one exception: `agg_tiles_hash` metadata value will be renamed to `agg_tiles_hash_in_diff`, and a new `agg_tiles_hash` will be generated for the diff file itself. This is done to avoid confusion when applying the diff file to the original file, as the `agg_tiles_hash` value will be different after the diff is applied. The `apply-diff` command will automatically rename the `agg_tiles_hash_in_diff` value back to `agg_tiles_hash` when applying the diff.
```shell
mbtiles copy src_file.mbtiles diff_file.mbtiles \
--diff-with-file modified_file.mbtiles
```
## `mbtiles copy --apply-patch`
Copy a source file to destination while also applying the diff file generated by `copy --diff-with-file` command above to the destination mbtiles file. This allows safer application of the diff file, as the source file is not modified.
```shell
mbtiles copy src_file.mbtiles dst_file.mbtiles \
--apply-patch diff_file.mbtiles
```
## `mbtiles apply-patch`
Apply the diff file generated from `copy` command above to an mbtiles file. The diff file can be applied to the `src_file.mbtiles` elsewhere, to avoid copying/transmitting the entire modified dataset.
Note that the `agg_tiles_hash_in_diff` metadata value will be renamed to `agg_tiles_hash` when applying the diff. This is done to avoid confusion when applying the diff file to the original file, as the `agg_tiles_hash` value will be different after the diff is applied.
```shell
mbtiles apply_diff src_file.mbtiles diff_file.mbtiles
```
#### Applying diff with SQLite
Another way to apply the diff is to use the `sqlite3` command line tool directly. This SQL will delete all tiles from `src_file.mbtiles` that are set to `NULL` in `diff_file.mbtiles`, and then insert or update all new tiles from `diff_file.mbtiles` into `src_file.mbtiles`, where both files are of `flat` type. The name of the diff file is passed as a query parameter to the sqlite3 command line tool, and then used in the SQL statements. Note that this does not update the `agg_tiles_hash` metadata value, so it will be incorrect after the diff is applied.
```shell
sqlite3 src_file.mbtiles \
-bail \
-cmd ".parameter set @diffDbFilename diff_file.mbtiles" \
"ATTACH DATABASE @diffDbFilename AS diffDb;" \
"DELETE FROM tiles WHERE (zoom_level, tile_column, tile_row) IN (SELECT zoom_level, tile_column, tile_row FROM diffDb.tiles WHERE tile_data ISNULL);" \
"INSERT OR REPLACE INTO tiles (zoom_level, tile_column, tile_row, tile_data) SELECT * FROM diffDb.tiles WHERE tile_data NOTNULL;"
```

View File

@ -0,0 +1,38 @@
# MBTiles Validation
The original [MBTiles specification](https://github.com/mapbox/mbtiles-spec#readme) does not provide any guarantees for the content of the tile data in MBTiles. `mbtiles validate` assumes a few additional conventions and uses them to ensure that the content of the tile data is valid performing several validation steps. If the file is not valid, the command will print an error message and exit with a non-zero exit code.
```shell
mbtiles validate src_file.mbtiles
```
## SQLite Integrity check
The `validate` command will run `PRAGMA integrity_check` on the file, and will fail if the result is not `ok`. The `--integrity-check` flag can be used to disable this check, or to make it more thorow with `full` value. Default is `quick`.
## Schema check
The `validate` command will verify that the `tiles` table/view exists, and that it has the expected columns and indexes. It will also verify that the `metadata` table/view exists, and that it has the expected columns and indexes.
## Per-tile validation
If the `.mbtiles` file uses [flat_with_hash](54-mbtiles-schema.md#flat-with-hash) or [normalized](54-mbtiles-schema.md#normalized) schema, the `validate` command will verify that the MD5 hash of the `tile_data` column matches the `tile_hash` or `tile_id` columns (depending on the schema).
A typical Normalized schema generated by tools like [tilelive-copy](https://github.com/mapbox/TileLive#bintilelive-copy) use MD5 hash in the `tile_id` column. The Martin's `mbtiles` tool can use this hash to verify the content of each tile. We also define a new [flat-with-hash](54-mbtiles-schema.md#flat-with-hash) schema that stores the hash and tile data in the same table, allowing per-tile validation without the multiple table layout.
Per-tile validation is not available for the `flat` schema, and will be skipped.
## Aggregate Content Validation
Per-tile validation will catch individual tile corruption, but it will not detect overall datastore corruption such as missing tiles, tiles that should not exist, or tiles with incorrect z/x/y values. For that, the `mbtiles` tool defines a new metadata value called `agg_tiles_hash`.
The value is computed by hashing the combined value for all rows in the `tiles` table/view, ordered by z,x,y. The value is computed using the following SQL expression, which uses a custom `md5_concat_hex` function from [sqlite-hashes crate](https://crates.io/crates/sqlite-hashes):
```sql, ignore
md5_concat_hex(
CAST(zoom_level AS TEXT),
CAST(tile_column AS TEXT),
CAST(tile_row AS TEXT),
tile_data)
```
In case there are no rows or all are NULL, the hash value of an empty string is used. Note that SQLite allows any value type to be stored as in any column, so if `tile_data` accidentally contains non-blob/text/null value, validation will fail.
The `mbtiles` tool will compute `agg_tiles_hash` value when copying or validating mbtiles files. Use `--update-agg-tiles-hash` to force the value to be updated, even if it is incorrect or does not exist.

View File

@ -0,0 +1,80 @@
# MBTiles Schemas
The `mbtiles` tool builds on top of the original [MBTiles specification](https://github.com/mapbox/mbtiles-spec#readme) by specifying three different kinds of schema for `tiles` data: `flat`, `flat-with-hash`, and `normalized`. The `mbtiles` tool can convert between these schemas, and can also generate a diff between two files of any schemas, as well as merge multiple schema files into one file.
## flat
Flat schema is the closest to the original MBTiles specification. It stores all tiles in a single table. This schema is the most efficient when the tileset contains no duplicate tiles.
```sql, ignore
CREATE TABLE tiles (
zoom_level INTEGER,
tile_column INTEGER,
tile_row INTEGER,
tile_data BLOB);
CREATE UNIQUE INDEX tile_index on tiles (
zoom_level, tile_column, tile_row);
```
## flat-with-hash
Similar to the `flat` schema, but also includes a `tile_hash` column that contains a hash value of the `tile_data` column. Use this schema when the tileset has no duplicate tiles, but you still want to be able to validate the content of each tile individually.
```sql, ignore
CREATE TABLE tiles_with_hash (
zoom_level INTEGER NOT NULL,
tile_column INTEGER NOT NULL,
tile_row INTEGER NOT NULL,
tile_data BLOB,
tile_hash TEXT);
CREATE UNIQUE INDEX tiles_with_hash_index on tiles_with_hash (
zoom_level, tile_column, tile_row);
CREATE VIEW tiles AS
SELECT zoom_level, tile_column, tile_row, tile_data
FROM tiles_with_hash;
```
## normalized
Normalized schema is the most efficient when the tileset contains duplicate tiles. It stores all tile blobs in the `images` table, and stores the tile Z,X,Y coordinates in a `map` table. The `map` table contains a `tile_id` column that is a foreign key to the `images` table. The `tile_id` column is a hash of the `tile_data` column, making it possible to both validate each individual tile like in the `flat-with-hash` schema, and also to optimize storage by storing each unique tile only once.
```sql, ignore
CREATE TABLE map (
zoom_level INTEGER,
tile_column INTEGER,
tile_row INTEGER,
tile_id TEXT);
CREATE TABLE images (
tile_id TEXT,
tile_data BLOB);
CREATE UNIQUE INDEX map_index ON map (
zoom_level, tile_column, tile_row);
CREATE UNIQUE INDEX images_id ON images (
tile_id);
CREATE VIEW tiles AS
SELECT
map.zoom_level AS zoom_level,
map.tile_column AS tile_column,
map.tile_row AS tile_row,
images.tile_data AS tile_data
FROM
map JOIN images
ON images.tile_id = map.tile_id;
```
Optionally, `.mbtiles` files with `normalized` schema can include a `tiles_with_hash` view. All `normalized` files created by the `mbtiles` tool will contain this view.
```sql, ignore
CREATE VIEW tiles_with_hash AS
SELECT
map.zoom_level AS zoom_level,
map.tile_column AS tile_column,
map.tile_row AS tile_row,
images.tile_data AS tile_data,
images.tile_id AS tile_hash
FROM
map JOIN images
ON map.tile_id = images.tile_id;
```

View File

@ -22,4 +22,8 @@
- [Using with Mapbox](44-using-with-mapbox.md)
- [Recipes](45-recipes.md)
- [Tools](50-tools.md)
- [MBTiles Metadata](51-mbtiles-meta.md)
- [MBTiles Copying / Diffing](52-mbtiles-copy.md)
- [MBTiles Validation](53-mbtiles-validation.md)
- [MBTiles Schemas](54-mbtiles-schema.md)
- [Development](60-development.md)

View File

@ -16,7 +16,11 @@ export CARGO_TERM_COLOR := "always"
# Start Martin server
run *ARGS:
cargo run -- {{ ARGS }}
cargo run -p martin -- {{ ARGS }}
# Run mbtiles command
mbtiles *ARGS:
cargo run -p mbtiles -- {{ ARGS }}
# Start release-compiled Martin server and a test database
run-release *ARGS: start