# MBTiles Validation
The original [MBTiles specification](https://github.com/mapbox/mbtiles-spec#readme) does not provide any guarantees for
the content of the tile data in MBTiles. `mbtiles validate` assumes a few additional conventions and uses them to verify
that the tile data is valid by performing several validation steps. If the file is not valid, the command
prints an error message and exits with a non-zero exit code.
```bash
mbtiles validate src_file.mbtiles
```
## SQLite Integrity check
The `validate` command runs `PRAGMA integrity_check` on the file and fails if the result is not `ok`.
The `--integrity-check` flag can be used to disable this check or to make it more thorough with the `full` value.
The default is `quick`.
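
As a rough illustration of what this step checks, the sketch below runs the SQLite pragmas directly with Python's standard `sqlite3` module. The mapping of `quick` to `PRAGMA quick_check` and the `off` value are assumptions made for this example, not a description of how `mbtiles` is implemented.

```python
import sqlite3

# Assumed mapping from a check level to a SQLite pragma (illustration only).
PRAGMAS = {"quick": "PRAGMA quick_check", "full": "PRAGMA integrity_check"}


def integrity_ok(conn: sqlite3.Connection, level: str = "quick") -> bool:
    """Return True if SQLite reports the database as intact."""
    if level == "off":  # hypothetical value meaning "skip the check"
        return True
    rows = conn.execute(PRAGMAS[level]).fetchall()
    # A healthy database yields exactly one row containing the text "ok".
    return rows == [("ok",)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,"
    " tile_row INTEGER, tile_data BLOB)"
)
print(integrity_ok(conn, "quick"), integrity_ok(conn, "full"))
```

Any result other than a single `ok` row is treated as a failure, which mirrors the rule stated above.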
## Schema check
The `validate` command will verify that the `tiles` table/view exists, and that it has the expected columns and indexes.
It will also verify that the `metadata` table/view exists, and that it has the expected columns and indexes.
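
The column part of this check can be approximated outside the tool. The sketch below uses `PRAGMA table_info`, which works for both tables and views, and compares the result with the column names from the MBTiles specification. The helper name `missing_columns` is hypothetical, and index checks are not covered.

```python
import sqlite3

# Column names defined by the MBTiles specification.
EXPECTED = {
    "tiles": {"zoom_level", "tile_column", "tile_row", "tile_data"},
    "metadata": {"name", "value"},
}


def missing_columns(conn: sqlite3.Connection, name: str) -> set:
    """Return the expected columns that are absent from table/view `name`."""
    # PRAGMA table_info returns no rows when the object does not exist;
    # index 1 of each returned row is the column name.
    found = {row[1] for row in conn.execute(f"PRAGMA table_info({name})")}
    return EXPECTED[name] - found


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,"
    " tile_row INTEGER, tile_data BLOB)"
)
print(missing_columns(conn, "tiles"))     # empty set: all columns present
print(missing_columns(conn, "metadata"))  # both columns reported missing
```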
## Per-tile validation
If the `.mbtiles` file uses the [flat-with-hash](mbtiles-schema.md#flat-with-hash)
or [normalized](mbtiles-schema.md#normalized) schema, the `validate` command will verify that the MD5 hash of
the `tile_data` column matches the `tile_hash` or `tile_id` column (depending on the schema).
A typical normalized schema generated by tools like [tilelive-copy](https://github.com/mapbox/TileLive#bintilelive-copy)
uses an MD5 hash in the `tile_id` column. Martin's `mbtiles` tool can use this hash to verify the content of each tile.
We also define a new [flat-with-hash](mbtiles-schema.md#flat-with-hash) schema that stores the hash and tile data in the
same table, allowing per-tile validation without the multiple table layout.
Per-tile validation is not available for the `flat` schema, and will be skipped.
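
A minimal sketch of the per-tile comparison for the flat-with-hash layout follows. It assumes a table named `tiles_with_hash` that holds both `tile_data` and `tile_hash`, and that `tile_data` is stored as a blob. The hashes are compared case-insensitively because the hex case of stored values is not specified here. The helper name `corrupt_tiles` is hypothetical.

```python
import hashlib
import sqlite3


def corrupt_tiles(conn: sqlite3.Connection) -> list:
    """Return (z, x, y) for every tile whose stored hash does not match."""
    bad = []
    query = (
        "SELECT zoom_level, tile_column, tile_row, tile_data, tile_hash"
        " FROM tiles_with_hash"  # assumed table name
    )
    for z, x, y, data, stored in conn.execute(query):
        # Hash the tile bytes; a NULL tile is treated as empty data.
        actual = hashlib.md5(data or b"").hexdigest()
        if stored is None or actual.lower() != stored.lower():
            bad.append((z, x, y))
    return bad


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles_with_hash (zoom_level INTEGER, tile_column INTEGER,"
    " tile_row INTEGER, tile_data BLOB, tile_hash TEXT)"
)
good = hashlib.md5(b"abc").hexdigest()
conn.execute("INSERT INTO tiles_with_hash VALUES (0, 0, 0, ?, ?)", (b"abc", good))
conn.execute("INSERT INTO tiles_with_hash VALUES (1, 0, 0, ?, ?)", (b"xyz", "00" * 16))
print(corrupt_tiles(conn))  # only the second tile is reported
```

For the normalized schema the same comparison would apply to `tile_id` in place of `tile_hash`.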
## Aggregate Content Validation
Per-tile validation will catch individual tile corruption, but it will not detect overall datastore corruption such as
missing tiles, tiles that should not exist, or tiles with incorrect z/x/y values. For that, the `mbtiles` tool defines a
new metadata value called `agg_tiles_hash`.
The value is computed by hashing the combined value for all rows in the `tiles` table/view, ordered by z, x, y. The value
is computed using the following SQL expression, which uses a custom `md5_concat_hex` function
from the [sqlite-hashes crate](https://crates.io/crates/sqlite-hashes):
```sql, ignore
md5_concat_hex(
CAST(zoom_level AS TEXT),
CAST(tile_column AS TEXT),
CAST(tile_row AS TEXT),
tile_data)
```
If there are no rows, or all values are NULL, the hash of an empty string is used. Note that SQLite allows any value
type to be stored in any column, so if `tile_data` accidentally contains a value that is not a blob, text, or NULL,
validation will fail.
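
To make the computation concrete, here is a sketch of the same idea in Python. It assumes that `md5_concat_hex` feeds the arguments of every row, in order, into a single MD5 digest while skipping NULL values, and that the result is rendered as uppercase hex. Both points should be checked against the sqlite-hashes documentation. The function name `agg_tiles_hash` is reused here only for illustration.

```python
import hashlib
import sqlite3


def agg_tiles_hash(conn: sqlite3.Connection) -> str:
    """Hash z, x, y (as text) and tile_data of every tile, ordered by z, x, y."""
    digest = hashlib.md5()
    query = (
        "SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles"
        " ORDER BY zoom_level, tile_column, tile_row"
    )
    for z, x, y, data in conn.execute(query):
        for coord in (z, x, y):
            if coord is not None:
                # Equivalent of CAST(... AS TEXT) for an integer value.
                digest.update(str(coord).encode())
        if isinstance(data, str):
            digest.update(data.encode())
        elif isinstance(data, bytes):
            digest.update(data)
        elif data is not None:
            # Mirrors the rule above: other value types fail validation.
            raise ValueError(f"unexpected tile_data type at {z}/{x}/{y}")
    return digest.hexdigest().upper()  # uppercase hex is an assumption


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tiles (zoom_level INTEGER, tile_column INTEGER,"
    " tile_row INTEGER, tile_data BLOB)"
)
print(agg_tiles_hash(conn))  # MD5 of the empty string when there are no rows
```

Because every row contributes to one digest, a missing tile, an extra tile, or a tile stored under the wrong z/x/y changes the result even when each individual tile still matches its own hash.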
The `mbtiles` tool will compute the `agg_tiles_hash` value when copying or validating MBTiles files. Use `--agg-hash update`
to force the value to be updated, even if it is incorrect or does not exist.