martin/docs/src/mbtiles-validation.md
Lucas a9cb0c972f
Adjust readme and martin book (#1253)
Fixes #1245

Large book and README refactoring, adding a quick start section and a list of available features to the README, and moving most of the content from the README to the book.

---------

Co-authored-by: Yuri Astrakhan <YuriAstrakhan@gmail.com>
2024-05-26 07:06:28 -04:00


MBTiles Validation

The original MBTiles specification does not provide any guarantees for the content of the tile data in MBTiles. The mbtiles validate command assumes a few additional conventions and uses them to check the tile data in several validation steps. If the file is not valid, the command prints an error message and exits with a non-zero exit code.

mbtiles validate src_file.mbtiles
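Because the result is reported through the exit code, the command is easy to call from a script. The following is a minimal Python sketch, assuming the mbtiles binary is on the PATH; the helper name is_valid_mbtiles and its command parameter are hypothetical and exist only for this illustration.

```python
import subprocess
import sys


def is_valid_mbtiles(path, command=("mbtiles", "validate")):
    """Return True when the validation command exits with code 0.

    `command` is a hypothetical parameter so that the binary location
    (assumed to be `mbtiles` on the PATH) can be overridden.
    """
    result = subprocess.run([*command, path], capture_output=True, text=True)
    if result.returncode != 0:
        # Relay whatever error text the tool produced.
        print(result.stderr or result.stdout, file=sys.stderr)
    return result.returncode == 0
```

A build script could then call is_valid_mbtiles("src_file.mbtiles") and stop when it returns False.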

SQLite Integrity check

The validate command will run PRAGMA integrity_check on the file, and will fail if the result is not ok. The --integrity-check flag can be used to disable this check, or to make it more thorough with the full value. The default is quick.
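For readers who want to see what such a check looks like at the SQLite level, here is a rough Python sketch using the standard sqlite3 module. It is not the tool's implementation. The mapping of quick to PRAGMA quick_check and the off mode name are assumptions made for this example, as is the helper name sqlite_integrity_ok.

```python
import sqlite3


def sqlite_integrity_ok(path, mode="quick"):
    """Run an SQLite integrity pragma and report whether it returned "ok".

    Assumption: "quick" maps to PRAGMA quick_check and "full" to
    PRAGMA integrity_check; "off" is a hypothetical way to skip the check.
    """
    if mode == "off":
        return True
    pragma = {"quick": "quick_check", "full": "integrity_check"}[mode]
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(f"PRAGMA {pragma}").fetchall()
    finally:
        conn.close()
    # A healthy database yields exactly one row containing the text "ok".
    return rows == [("ok",)]
```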

Schema check

The validate command will verify that the tiles table/view exists, and that it has the expected columns and indexes. It will also verify that the metadata table/view exists, and that it has the expected columns and indexes.
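The column part of this check can be approximated with PRAGMA table_info, which works for both tables and views. The sketch below is illustrative only: the column names come from the MBTiles specification, the index check is left out, and the helper name missing_columns is hypothetical.

```python
import sqlite3

# Column names as defined by the MBTiles specification.
EXPECTED_COLUMNS = {
    "tiles": {"zoom_level", "tile_column", "tile_row", "tile_data"},
    "metadata": {"name", "value"},
}


def missing_columns(conn):
    """Map each table/view name to the expected columns it lacks."""
    problems = {}
    for name, expected in EXPECTED_COLUMNS.items():
        # table_info returns no rows when the table/view does not exist,
        # so a missing object shows up as "all columns missing".
        found = {row[1] for row in conn.execute(f"PRAGMA table_info({name})")}
        missing = expected - found
        if missing:
            problems[name] = sorted(missing)
    return problems
```

An empty dictionary means both objects exist with the expected columns.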

Per-tile validation

If the .mbtiles file uses flat_with_hash or normalized schema, the validate command will verify that the MD5 hash of the tile_data column matches the tile_hash or tile_id columns (depending on the schema).

A typical normalized schema generated by tools like tilelive-copy uses an MD5 hash in the tile_id column. Martin's mbtiles tool can use this hash to verify the content of each tile. We also define a new flat-with-hash schema that stores the hash and tile data in the same table, allowing per-tile validation without the multiple-table layout.

Per-tile validation is not available for the flat schema, and will be skipped.
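To make the per-tile check concrete, here is a small Python sketch for the flat-with-hash case. It rests on several assumptions that are not stated above: the table is named tiles_with_hash, the hash is stored as hex text (compared case-insensitively here), and a NULL tile is hashed as empty bytes. The helper name find_hash_mismatches is hypothetical.

```python
import hashlib
import sqlite3


def find_hash_mismatches(conn):
    """Return (z, x, y) for every tile whose stored hash does not match.

    Assumes a `tiles_with_hash` table with a hex-encoded `tile_hash` column.
    """
    query = (
        "SELECT zoom_level, tile_column, tile_row, tile_data, tile_hash "
        "FROM tiles_with_hash "
        "ORDER BY zoom_level, tile_column, tile_row"
    )
    mismatches = []
    for z, x, y, data, stored in conn.execute(query):
        if isinstance(data, str):
            data = data.encode("utf-8")
        # Assumption: a NULL tile is treated as empty bytes.
        computed = hashlib.md5(data or b"").hexdigest()
        if stored is None or computed.lower() != stored.lower():
            mismatches.append((z, x, y))
    return mismatches
```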

Aggregate Content Validation

Per-tile validation will catch individual tile corruption, but it will not detect overall datastore corruption such as missing tiles, tiles that should not exist, or tiles with incorrect z/x/y values. For that, the mbtiles tool defines a new metadata value called agg_tiles_hash.

The value is computed by hashing the combined values of all rows in the tiles table/view, ordered by z,x,y, using the following SQL expression, which relies on a custom md5_concat_hex function from the sqlite-hashes crate:

md5_concat_hex(
    CAST(zoom_level  AS TEXT),
    CAST(tile_column AS TEXT),
    CAST(tile_row    AS TEXT),
    tile_data)

If there are no rows, or all values are NULL, the hash of an empty string is used. Note that SQLite allows a value of any type to be stored in any column, so if tile_data accidentally contains a value that is not a blob, text, or NULL, validation will fail.
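The same aggregate can be approximated outside SQLite. The Python sketch below feeds the text form of z, x, y and the tile bytes of every row into a single MD5 digest, in z,x,y order. It is an illustration, not the sqlite-hashes implementation: skipping individual NULL values and returning uppercase hex are assumptions, and the helper name compute_agg_tiles_hash is hypothetical.

```python
import hashlib
import sqlite3


def compute_agg_tiles_hash(conn):
    """Hash all rows of `tiles` into one MD5 digest, ordered by z, x, y."""
    digest = hashlib.md5()
    query = (
        "SELECT zoom_level, tile_column, tile_row, tile_data FROM tiles "
        "ORDER BY zoom_level, tile_column, tile_row"
    )
    for z, x, y, data in conn.execute(query):
        for part in (z, x, y):
            # Mirrors CAST(... AS TEXT); NULLs are skipped (assumption).
            if part is not None:
                digest.update(str(part).encode("utf-8"))
        if isinstance(data, str):
            digest.update(data.encode("utf-8"))
        elif isinstance(data, bytes):
            digest.update(data)
        elif data is not None:
            # Any other value type in tile_data makes validation fail.
            raise TypeError(f"unexpected tile_data type at {z}/{x}/{y}")
    # With no rows this is the MD5 of an empty string.
    # Uppercase hex output is an assumption of this sketch.
    return digest.hexdigest().upper()
```

Because the coordinates are part of the hashed input, a missing tile, an extra tile, or a tile stored under the wrong z/x/y all change the result.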

The mbtiles tool will compute the agg_tiles_hash value when copying or validating mbtiles files. Use --agg-hash update to force the value to be updated, even if it is incorrect or does not exist.