mbtiles summary tool cleanup (#1000)

After some thinking, `mbtiles summary` (aliased as `mbtiles info`)
seems a better name than `stats`. I renamed the command and adjusted the
documentation, consolidating it into one doc page.

Other changes:
* use the file system's file size, reporting 'unknown' if it is unavailable
* report page count
* moved bbox computation into a separate function
* inlined a number of things for readability
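A minimal sketch of the file-size change, using only the standard library. The commit builds a `PathBuf` from `self.filepath` and turns any failure into `None`. That is why an in-memory database URI such as `file:...?mode=memory` ends up as `file_size: ~` in the empty-file snapshot. The `FileSize` wrapper below is hypothetical. To stay dependency-free, it prints raw bytes instead of using `SizeFormatterBinary`.

```rust
use std::fmt;
use std::path::Path;

/// File size in bytes as reported by the file system, or `None` if the path
/// does not exist or cannot be inspected (e.g. an in-memory SQLite URI).
fn file_size_from_fs(path: &str) -> Option<u64> {
    Path::new(path).metadata().ok().map(|m| m.len())
}

/// Hypothetical display wrapper: prints the size, or `unknown` when absent.
struct FileSize(Option<u64>);

impl fmt::Display for FileSize {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Some(bytes) => write!(f, "File size: {bytes}B"),
            None => write!(f, "File size: unknown"),
        }
    }
}

fn main() {
    // An in-memory URI is not a real file, so the size is unknown.
    let size = file_size_from_fs("file:mbtiles_empty_summary?mode=memory&cache=shared");
    println!("{}", FileSize(size));
}
```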
Yuri Astrakhan 2023-11-13 02:50:10 -05:00 committed by GitHub
parent 0398336114
commit 021cddcccd
GPG Key ID: 4AEE18F83AFDEB23
10 changed files with 202 additions and 134 deletions

Cargo.lock generated

@ -330,6 +330,15 @@ version = "1.0.75"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a4668cab20f66d8d020e1fbc0ebe47217433c1b6c8f2040faf858554e394ace6"
[[package]]
name = "approx"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "cab112f0a86d568ea0e627cc1d6be74a1e9cd55214684db5561995f6dad897c6"
dependencies = [
"num-traits",
]
[[package]]
name = "arrayref"
version = "0.3.7"
@ -470,7 +479,7 @@ version = "0.10.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3078c7629b62d3f0439517fa394996acacc5cbc91c5a20d8c658e77abd503a71"
dependencies = [
"generic-array",
"generic-array 0.14.7",
]
[[package]]
@ -842,7 +851,7 @@ version = "0.1.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1bfb12502f3fc46cca1bb51ac28df9d618d813cdc3d2f25b9fe775a34af26bb3"
dependencies = [
"generic-array",
"generic-array 0.14.7",
"typenum",
]
@ -1362,6 +1371,15 @@ dependencies = [
"slab",
]
[[package]]
name = "generic-array"
version = "0.12.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ffdf9f34f1447443d37393cc6c2b8313aebddcd96906caf34e54c68d8e57d7bd"
dependencies = [
"typenum",
]
[[package]]
name = "generic-array"
version = "0.14.7"
@ -1586,7 +1604,7 @@ dependencies = [
"bytemuck",
"byteorder",
"color_quant",
"num-rational",
"num-rational 0.4.1",
"num-traits",
"png",
]
@ -1872,6 +1890,7 @@ version = "0.7.2"
dependencies = [
"actix-rt",
"anyhow",
"approx",
"clap",
"ctor",
"enum-display",
@ -1886,6 +1905,7 @@ dependencies = [
"serde_json",
"serde_with",
"serde_yaml",
"size_format",
"sqlite-hashes",
"sqlx",
"thiserror",
@ -1989,6 +2009,19 @@ dependencies = [
"minimal-lexical",
]
[[package]]
name = "num"
version = "0.2.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b8536030f9fea7127f841b45bb6243b27255787fb4eb83958aa1ef9d2fdc0c36"
dependencies = [
"num-complex",
"num-integer",
"num-iter",
"num-rational 0.2.4",
"num-traits",
]
[[package]]
name = "num-bigint-dig"
version = "0.8.4"
@ -2006,6 +2039,16 @@ dependencies = [
"zeroize",
]
[[package]]
name = "num-complex"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b6b19411a9719e753aff12e5187b74d60d3dc449ec3f4dc21e3989c3f554bc95"
dependencies = [
"autocfg",
"num-traits",
]
[[package]]
name = "num-integer"
version = "0.1.45"
@ -2027,6 +2070,17 @@ dependencies = [
"num-traits",
]
[[package]]
name = "num-rational"
version = "0.2.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5c000134b5dbf44adc5cb772486d335293351644b801551abe8f75c84cfa4aef"
dependencies = [
"autocfg",
"num-integer",
"num-traits",
]
[[package]]
name = "num-rational"
version = "0.4.1"
@ -3110,6 +3164,16 @@ version = "0.3.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "38b58827f4464d87d377d175e90bf58eb00fd8716ff0a62f80356b5e61555d0d"
[[package]]
name = "size_format"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "6ed5f6ab2122c6dec69dca18c72fa4590a27e581ad20d44960fe74c032a0b23b"
dependencies = [
"generic-array 0.12.4",
"num",
]
[[package]]
name = "slab"
version = "0.4.9"
@ -3317,7 +3381,7 @@ dependencies = [
"futures-core",
"futures-io",
"futures-util",
"generic-array",
"generic-array 0.14.7",
"hex",
"hkdf",
"hmac",


@ -1,4 +1,24 @@
# `mbtiles` Metadata Access
# MBTiles information and metadata
## summary
Use `mbtiles summary` to get a summary of the contents of an MBTiles file. The command will print a table with the number of tiles per zoom level, the size of the smallest and largest tiles, and the average size of tiles at each zoom level. The command will also print the bounding box of the covered area per zoom level.
```shell
File: tests/fixtures/mbtiles/world_cities.mbtiles
Schema: flat
File size: 48.00KiB
Page size: 4.00KiB
Page count: 12
| Zoom | Count |Smallest | Largest | Average | BBox |
| 0| 1| 1.08KiB| 1.08KiB| 1.08KiB|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
| 1| 4| 160B| 650B| 366B|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
| 2| 7| 137B| 495B| 239B|-179.99999997494382,-66.51326042021836,180.00000015460688,66.51326049182072|
| 3| 17| 67B| 246B| 134B|-134.99999995874995,-40.9798980140281,180.00000015460688,66.51326049182072|
| 4| 38| 64B| 175B| 86B|-134.99999995874995,-40.9798980140281,180.00000015460688,66.51326049182072|
| 5| 57| 64B| 107B| 72B|-123.74999995470151,-40.9798980140281,180.00000015460688,61.60639642757953|
| 6| 72| 64B| 97B| 68B|-123.74999995470151,-40.9798980140281,180.00000015460688,61.60639642757953|
| all| 196| 64B| 1.0KiB| 96B|
```
## meta-all
Print all metadata values to stdout, as well as the results of tile detection. The format of the values printed is not stable, and should only be used for visual inspection.
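The `all` row at the bottom of the summary table is derived from the per-zoom rows. In the commit, `smallest` and `largest` are folded with `reduce(u64::min)` / `reduce(u64::max)`, which yields `None` for a file with no tiles, so the row is skipped. `average` is each zoom's average weighted by its tile count and divided by the total count. The sketch below mirrors that logic with simplified, hypothetical types (`ZoomRow`, `AllRow`; bbox omitted). Unlike the commit, it returns `0.0` instead of dividing by zero when there are no tiles.

```rust
/// Simplified per-zoom statistics (hypothetical; bbox omitted).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ZoomRow {
    zoom: u8,
    count: u64,
    smallest: u64,
    largest: u64,
    average: f64,
}

/// Totals shown in the `all` row.
#[derive(Debug, PartialEq)]
struct AllRow {
    count: u64,
    smallest: Option<u64>,
    largest: Option<u64>,
    average: f64,
}

fn aggregate(rows: &[ZoomRow]) -> AllRow {
    let count: u64 = rows.iter().map(|r| r.count).sum();
    // Weight each zoom's average by its tile count so the result is the mean
    // over all tiles, not the mean of the per-zoom means.
    let weighted_sum: f64 = rows.iter().map(|r| r.average * r.count as f64).sum();
    AllRow {
        count,
        // `reduce` returns `None` for an empty slice.
        smallest: rows.iter().map(|r| r.smallest).reduce(u64::min),
        largest: rows.iter().map(|r| r.largest).reduce(u64::max),
        // Guard added in this sketch to avoid 0.0 / 0.0 for an empty file.
        average: if count == 0 { 0.0 } else { weighted_sum / count as f64 },
    }
}

fn main() {
    // Illustrative values loosely based on the first two rows of the example table.
    let rows = [
        ZoomRow { zoom: 0, count: 1, smallest: 1107, largest: 1107, average: 1107.0 },
        ZoomRow { zoom: 1, count: 4, smallest: 160, largest: 650, average: 366.0 },
    ];
    println!("{:?}", aggregate(&rows));
}
```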


@ -1,19 +0,0 @@
# Get a tile statistics from MBTiles file
For the concern of efficiency, you could figure out the page size and file size, tile size and bounds(represented as WGS 84 latitude and longitude values) of covered area with `mbtiles stats` command.
```shell
File: /path/to/world_cities.mbtiles
FileSize: 48.00KiB
Schema: flat
Page size: 4.00KiB
| Zoom | Count |Smallest | Largest | Average | BBox |
| 0| 1| 1.08KiB| 1.08KiB| 1.08KiB|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
| 1| 4| 160B| 650B| 366B|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
| 2| 7| 137B| 495B| 239B|-179.99999997494382,-66.51326042021836,180.00000015460688,66.51326049182072|
| 3| 17| 67B| 246B| 134B|-134.99999995874995,-40.9798980140281,180.00000015460688,66.51326049182072|
| 4| 38| 64B| 175B| 86B|-134.99999995874995,-40.9798980140281,180.00000015460688,66.51326049182072|
| 5| 57| 64B| 107B| 72B|-123.74999995470151,-40.9798980140281,180.00000015460688,61.60639642757953|
| 6| 72| 64B| 97B| 68B|-123.74999995470151,-40.9798980140281,180.00000015460688,61.60639642757953|
| all| 196| 64B| 1.0KiB| 96B|
```


@ -22,9 +22,8 @@
- [Using with Mapbox](44-using-with-mapbox.md)
- [Recipes](45-recipes.md)
- [Tools](50-tools.md)
- [MBTiles Metadata](51-mbtiles-meta.md)
- [MBTiles Info and Metadata](51-mbtiles-meta.md)
- [MBTiles Copying / Diffing](52-mbtiles-copy.md)
- [MBTiles Validation](53-mbtiles-validation.md)
- [MBTiles Schemas](54-mbtiles-schema.md)
- [MBTiles statistics](55-mbtiles-stats.md)
- [Development](60-development.md)


@ -1,10 +1,10 @@
{
"db_name": "SQLite",
"query": "SELECT page_count * page_size as file_size FROM pragma_page_count(), pragma_page_size();",
"query": "PRAGMA page_count;",
"describe": {
"columns": [
{
"name": "file_size",
"name": "page_count",
"ordinal": 0,
"type_info": "Int"
}
@ -16,5 +16,5 @@
null
]
},
"hash": "b771f1a10c396b9317db059015aa4c2714b57d964b06c0b06d4e44076773b37c"
"hash": "73b5d12b379c0fb2d8560d99653729d96dd1288005f47872c6a79b5bbf1ca8de"
}
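The removed query derived `file_size` as `page_count * page_size`. The commit now reports the page count on its own and takes the file size from the file system instead. The arithmetic below is a trivial sketch, and `db_image_size` is a hypothetical name. It shows that the two agree for the snapshot values in this commit. 12 pages of 4096 bytes is 49152 bytes (48.00 KiB), the `file_size` of `world_cities.mbtiles`. The 5 pages of the empty in-memory database give the 20480 bytes that the old snapshot reported.

```rust
/// Size of the SQLite database image in bytes, computed the way the removed
/// `SELECT page_count * page_size ...` query did.
fn db_image_size(page_count: u64, page_size: u64) -> u64 {
    page_count * page_size
}

fn main() {
    // Values from the world_cities.mbtiles snapshot.
    let bytes = db_image_size(12, 4096);
    println!("{bytes} bytes = {:.2} KiB", bytes as f64 / 1024.0);
}
```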


@ -20,15 +20,15 @@ pub struct Args {
#[derive(Subcommand, PartialEq, Eq, Debug)]
enum Commands {
/// Show MBTiels file summary statistics
#[command(name = "summary", alias = "info")]
Summary { file: PathBuf },
/// Prints all values in the metadata table in a free-style, unstable YAML format
#[command(name = "meta-all")]
MetaAll {
/// MBTiles file to read from
file: PathBuf,
},
/// Gets tile statistics from MBTiels file
#[command(name = "stats")]
Stats { file: PathBuf },
/// Gets a single value from the MBTiles metadata table.
#[command(name = "meta-get")]
MetaGetValue {
@ -117,12 +117,10 @@ async fn main_int() -> anyhow::Result<()> {
let mbt = Mbtiles::new(file.as_path())?;
mbt.validate(integrity_check, update_agg_tiles_hash).await?;
}
Commands::Stats { file } => {
Commands::Summary { file } => {
let mbt = Mbtiles::new(file.as_path())?;
let mut conn = mbt.open_readonly().await?;
let statistics = mbt.statistics(&mut conn).await?;
println!("{statistics}");
println!("{}", mbt.summary(&mut conn).await?);
}
}


@ -3,7 +3,7 @@
use std::collections::HashSet;
use std::ffi::OsStr;
use std::fmt::{Display, Formatter};
use std::path::Path;
use std::path::{Path, PathBuf};
use std::str::FromStr;
#[cfg(feature = "cli")]
@ -41,7 +41,7 @@ pub struct Metadata {
}
#[derive(Clone, Debug, PartialEq, Serialize)]
pub struct ZoomStats {
pub struct ZoomInfo {
pub zoom: u8,
pub count: u64,
pub smallest: u64,
@ -51,29 +51,33 @@ pub struct ZoomStats {
}
#[derive(Clone, Debug, PartialEq, Serialize)]
pub struct Statistics {
pub struct Summary {
pub file_path: String,
pub file_size: u64,
pub file_size: Option<u64>,
pub mbt_type: MbtType,
pub page_size: u64,
pub zoom_stats_list: Vec<ZoomStats>,
pub page_count: u64,
pub zoom_info: Vec<ZoomInfo>,
pub count: u64,
pub smallest: Option<u64>,
pub largest: Option<u64>,
pub average: f64,
}
impl Display for Statistics {
impl Display for Summary {
fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
writeln!(f, "File: {}", self.file_path)?;
let file_size = SizeFormatterBinary::new(self.file_size);
writeln!(f, "FileSize: {file_size:.2}B")?;
writeln!(f, "Schema: {}", self.mbt_type)?;
if let Some(file_size) = self.file_size {
let file_size = SizeFormatterBinary::new(file_size);
writeln!(f, "File size: {file_size:.2}B")?;
} else {
writeln!(f, "File size: unknown")?;
}
let page_size = SizeFormatterBinary::new(self.page_size);
writeln!(f, "Page size: {page_size:.2}B")?;
writeln!(f, "Page count: {:.2}", self.page_count)?;
writeln!(
f,
@ -81,7 +85,7 @@ impl Display for Statistics {
"Zoom", "Count", "Smallest", "Largest", "Average", "BBox"
)?;
for l in &self.zoom_stats_list {
for l in &self.zoom_info {
let smallest = SizeFormatterBinary::new(l.smallest);
let largest = SizeFormatterBinary::new(l.largest);
let average = SizeFormatterBinary::new(l.average as u64);
@ -97,28 +101,26 @@ impl Display for Statistics {
l.bbox
)?;
}
if self.count != 0 {
if let (Some(smallest), Some(largest)) = (self.smallest, self.largest) {
let smallest = SizeFormatterBinary::new(smallest);
let largest = SizeFormatterBinary::new(largest);
let average = SizeFormatterBinary::new(self.average as u64);
writeln!(
f,
"|{:>9}|{:>9}|{:>9}|{:>9}|{:>9}|",
"all",
self.count,
format!("{smallest}B"),
format!("{largest}B"),
format!("{average}B")
)?
}
if let (Some(smallest), Some(largest)) = (self.smallest, self.largest) {
let smallest = SizeFormatterBinary::new(smallest);
let largest = SizeFormatterBinary::new(largest);
let average = SizeFormatterBinary::new(self.average as u64);
writeln!(
f,
"|{:>9}|{:>9}|{:>9}|{:>9}|{:>9}|",
"all",
self.count,
format!("{smallest}B"),
format!("{largest}B"),
format!("{average}B")
)?;
}
Ok(())
}
}
#[allow(clippy::trivially_copy_pass_by_ref)]
fn serialize_ti<S: Serializer>(ti: &TileInfo, serializer: S) -> Result<S::Ok, S::Error> {
let mut s = serializer.serialize_struct("TileInfo", 2)?;
s.serialize_field("format", &ti.format.to_string())?;
@ -296,23 +298,25 @@ impl Mbtiles {
self.check_agg_tiles_hashes(&mut conn).await
}
}
pub async fn statistics<T>(&self, conn: &mut T) -> MbtResult<Statistics>
/// Compute MBTiles file summary
pub async fn summary<T>(&self, conn: &mut T) -> MbtResult<Summary>
where
for<'e> &'e mut T: SqliteExecutor<'e>,
{
let file_size = query!(
"SELECT page_count * page_size as file_size FROM pragma_page_count(), pragma_page_size();"
).fetch_one(&mut *conn)
.await?
.file_size
.expect("The file size of the MBTiles file shouldn't be None") as u64;
let mbt_type = self.detect_type(&mut *conn).await?;
let file_size = PathBuf::from_str(&self.filepath)
.ok()
.and_then(|p| p.metadata().ok())
.map(|m| m.len());
let page_size = query!("PRAGMA page_size;")
.fetch_one(&mut *conn)
.await?
.page_size
.unwrap() as u64;
let tile_infos_query = query!(
let sql = query!("PRAGMA page_size;");
let page_size = sql.fetch_one(&mut *conn).await?.page_size.unwrap() as u64;
let sql = query!("PRAGMA page_count;");
let page_count = sql.fetch_one(&mut *conn).await?.page_count.unwrap() as u64;
let zoom_info = query!(
"
SELECT zoom_level AS zoom,
count() AS count,
@ -325,66 +329,66 @@ impl Mbtiles {
max(tile_row) AS max_tile_y
FROM tiles
GROUP BY zoom_level"
);
let mbt_type = self.detect_type(&mut *conn).await?;
let level_rows = tile_infos_query.fetch_all(&mut *conn).await?;
let zoom_stats_list: Vec<ZoomStats> = level_rows
)
.fetch_all(&mut *conn)
.await?;
let zoom_info: Vec<ZoomInfo> = zoom_info
.into_iter()
.map(|r| {
let zoom = r.zoom.unwrap() as u8;
let count = r.count as u64;
let tile_length = 40075016.7 / (2_u32.pow(zoom as u32)) as f64;
let smallest = r.smallest.unwrap_or(0) as u64;
let largest = r.largest.unwrap_or(0) as u64;
let average = r.average.unwrap_or(0.0);
let min_tile_x = r.min_tile_x.unwrap();
let min_tile_y = r.min_tile_y.unwrap();
let max_tile_x = r.max_tile_x.unwrap();
let max_tile_y = r.max_tile_y.unwrap();
let (minx, miny) = webmercator_to_wgs84(
-20037508.34 + min_tile_x as f64 * tile_length,
-20037508.34 + min_tile_y as f64 * tile_length,
);
let (maxx, maxy) = webmercator_to_wgs84(
-20037508.34 + (max_tile_x as f64 + 1.0) * tile_length,
-20037508.34 + (max_tile_y as f64 + 1.0) * tile_length,
);
let bbox = Bounds::new(minx, miny, maxx, maxy);
ZoomStats {
let zoom = u8::try_from(r.zoom.unwrap()).expect("zoom_level is not a u8");
ZoomInfo {
zoom,
count,
smallest,
largest,
average,
bbox,
count: r.count as u64,
smallest: r.smallest.unwrap_or(0) as u64,
largest: r.largest.unwrap_or(0) as u64,
average: r.average.unwrap_or(0.0),
bbox: Self::xyz_to_bbox(
zoom,
r.min_tile_x.unwrap(),
r.min_tile_y.unwrap(),
r.max_tile_x.unwrap(),
r.max_tile_y.unwrap(),
),
}
})
.collect();
let count = zoom_stats_list.iter().map(|l| l.count).sum();
let smallest = zoom_stats_list.iter().map(|l| l.smallest).reduce(u64::min);
let largest = zoom_stats_list.iter().map(|l| l.largest).reduce(u64::max);
let average = zoom_stats_list
let count = zoom_info.iter().map(|l| l.count).sum();
let avg_sum = zoom_info
.iter()
.map(|l| l.average * l.count as f64)
.sum::<f64>()
/ count as f64;
Ok(Statistics {
.sum::<f64>();
Ok(Summary {
file_path: self.filepath.clone(),
file_size,
mbt_type,
page_size,
zoom_stats_list,
page_count,
count,
smallest,
largest,
average,
smallest: zoom_info.iter().map(|l| l.smallest).reduce(u64::min),
largest: zoom_info.iter().map(|l| l.largest).reduce(u64::max),
average: avg_sum / count as f64,
zoom_info,
})
}
/// Convert min/max XYZ tile coordinates to a bounding box
fn xyz_to_bbox(zoom: u8, min_x: i32, min_y: i32, max_x: i32, max_y: i32) -> Bounds {
let tile_size = 40075016.7 / (2_u32.pow(zoom as u32)) as f64;
let (min_lng, min_lat) = webmercator_to_wgs84(
-20037508.34 + min_x as f64 * tile_size,
-20037508.34 + min_y as f64 * tile_size,
);
let (max_lng, max_lat) = webmercator_to_wgs84(
-20037508.34 + (max_x as f64 + 1.0) * tile_size,
-20037508.34 + (max_y as f64 + 1.0) * tile_size,
);
Bounds::new(min_lng, min_lat, max_lng, max_lat)
}
/// Get the aggregate tiles hash value from the metadata table
pub async fn get_agg_tiles_hash<T>(&self, conn: &mut T) -> MbtResult<Option<String>>
where
@ -861,15 +865,14 @@ fn webmercator_to_wgs84(x: f64, y: f64) -> (f64, f64) {
mod tests {
use std::collections::HashMap;
use approx::assert_relative_eq;
use insta::assert_yaml_snapshot;
use martin_tile_utils::Encoding;
use sqlx::Executor as _;
use tilejson::VectorLayer;
use crate::create_flat_tables;
use super::*;
use approx::assert_relative_eq;
use crate::create_flat_tables;
async fn open(filepath: &str) -> MbtResult<(SqliteConnection, Mbtiles)> {
let mbt = Mbtiles::new(filepath)?;
@ -1022,17 +1025,18 @@ mod tests {
}
#[actix_rt::test]
async fn stats_empty_file() -> MbtResult<()> {
let (mut conn, mbt) = open("file:mbtiles_empty_stats?mode=memory&cache=shared").await?;
async fn summary_empty_file() -> MbtResult<()> {
let (mut conn, mbt) = open("file:mbtiles_empty_summary?mode=memory&cache=shared").await?;
create_flat_tables(&mut conn).await.unwrap();
let res = mbt.statistics(&mut conn).await?;
let res = mbt.summary(&mut conn).await?;
assert_yaml_snapshot!(res, @r###"
---
file_path: "file:mbtiles_empty_stats?mode=memory&cache=shared"
file_size: 20480
file_path: "file:mbtiles_empty_summary?mode=memory&cache=shared"
file_size: ~
mbt_type: Flat
page_size: 4096
zoom_stats_list: []
page_count: 5
zoom_info: []
count: 0
smallest: ~
largest: ~
@ -1063,7 +1067,7 @@ mod tests {
#[actix_rt::test]
async fn stat() -> MbtResult<()> {
let (mut conn, mbt) = open("../tests/fixtures/mbtiles/world_cities.mbtiles").await?;
let res = mbt.statistics(&mut conn).await?;
let res = mbt.summary(&mut conn).await?;
assert_yaml_snapshot!(res, @r###"
---
@ -1071,7 +1075,8 @@ mod tests {
file_size: 49152
mbt_type: Flat
page_size: 4096
zoom_stats_list:
page_count: 12
zoom_info:
- zoom: 0
count: 1
smallest: 1107
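The new `xyz_to_bbox` helper converts an inclusive tile range into WGS 84 bounds. Its magic numbers are the spherical Web Mercator extents: 20037508.34 is about πR and 40075016.7 about 2πR for R = 6378137 m. So `tile_size` is the width of one tile in meters at the given zoom. `webmercator_to_wgs84` is not shown in this diff. The sketch below assumes the standard inverse spherical Mercator formula, and its constant names are hypothetical. The `+ 1` on the max tile selects that tile's far edge. MBTiles stores `tile_row` in TMS order (row 0 at the south), so the y arithmetic needs no flipping.

```rust
use std::f64::consts::PI;

/// Sphere radius used by spherical Web Mercator (EPSG:3857), in meters.
const EARTH_RADIUS: f64 = 6_378_137.0;
/// Half the width of the Mercator square (about pi * R), as rounded in the commit.
const ORIGIN_SHIFT: f64 = 20_037_508.34;
/// Full width of the Mercator square (about 2 * pi * R), as rounded in the commit.
const WORLD_SIZE: f64 = 40_075_016.7;

/// Inverse spherical Mercator: meters -> (longitude, latitude) in degrees.
/// Assumption: standard formula; the commit's own function is not shown.
fn webmercator_to_wgs84(x: f64, y: f64) -> (f64, f64) {
    let lng = (x / EARTH_RADIUS).to_degrees();
    let lat = (2.0 * (y / EARTH_RADIUS).exp().atan() - PI / 2.0).to_degrees();
    (lng, lat)
}

/// (min_lng, min_lat, max_lng, max_lat) covered by an inclusive TMS tile range.
fn xyz_to_bbox(zoom: u8, min_x: i32, min_y: i32, max_x: i32, max_y: i32) -> (f64, f64, f64, f64) {
    let tile_size = WORLD_SIZE / f64::from(2_u32.pow(u32::from(zoom)));
    // Near (south-west) corner of the first tile.
    let (min_lng, min_lat) = webmercator_to_wgs84(
        -ORIGIN_SHIFT + f64::from(min_x) * tile_size,
        -ORIGIN_SHIFT + f64::from(min_y) * tile_size,
    );
    // Far (north-east) corner of the last tile, hence the + 1.
    let (max_lng, max_lat) = webmercator_to_wgs84(
        -ORIGIN_SHIFT + f64::from(max_x + 1) * tile_size,
        -ORIGIN_SHIFT + f64::from(max_y + 1) * tile_size,
    );
    (min_lng, min_lat, max_lng, max_lat)
}

fn main() {
    // The single zoom-0 tile covers the whole Mercator world.
    println!("{:?}", xyz_to_bbox(0, 0, 0, 0, 0));
}
```

For zoom 0 this returns roughly (-180, -85.0511, 180, 85.0511), matching the first row of the example table. The tiny offsets there (such as 180.00000015) come from the rounded constants.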


@ -3,8 +3,8 @@ A utility to work with .mbtiles file content
Usage: mbtiles <COMMAND>
Commands:
summary Show MBTiels file summary statistics
meta-all Prints all values in the metadata table in a free-style, unstable YAML format
stats Gets tile statistics from MBTiels file
meta-get Gets a single value from the MBTiles metadata table
meta-set Sets a single value in the MBTiles' file metadata table or deletes it if no value
copy Copy tiles from one mbtiles file to another


@ -1,7 +1,8 @@
File: ./tests/fixtures/mbtiles/world_cities.mbtiles
FileSize: 48.00KiB
Schema: flat
File size: 48.00KiB
Page size: 4.00KiB
Page count: 12
| Zoom | Count |Smallest | Largest | Average | BBox |
| 0| 1| 1.08KiB| 1.08KiB| 1.08KiB|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
| 1| 4| 160B| 650B| 366B|-179.99999997494382,-85.05112877764508,180.00000015460688,85.05112879314403|
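The sizes in this output (`48.00KiB`, `4.00KiB`, `1.08KiB`, `160B`) come from the newly added `size_format` dependency's `SizeFormatterBinary`. The function below is a hypothetical stand-in, not that crate's implementation. It only shows the 1024-based arithmetic behind the two-decimal values, such as 1107 bytes / 1024 ≈ 1.08 KiB.

```rust
/// Hypothetical binary (1024-based) size formatter with two decimals.
fn format_binary(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["", "Ki", "Mi", "Gi", "Ti"];
    if bytes < 1024 {
        // Plain byte counts are printed without decimals.
        return format!("{bytes}B");
    }
    let mut value = bytes as f64;
    let mut unit = 0;
    // Divide by 1024 until the value fits the current unit.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    format!("{value:.2}{}B", UNITS[unit])
}

fn main() {
    for bytes in [49152, 4096, 1107, 160] {
        println!("{bytes} -> {}", format_binary(bytes));
    }
}
```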


@ -352,9 +352,9 @@ if [[ "$MBTILES_BIN" != "-" ]]; then
set -x
$MBTILES_BIN --help 2>&1 | tee "$TEST_OUT_DIR/help.txt"
$MBTILES_BIN summary ./tests/fixtures/mbtiles/world_cities.mbtiles 2>&1 | tee "$TEST_OUT_DIR/summary.txt"
$MBTILES_BIN meta-all --help 2>&1 | tee "$TEST_OUT_DIR/meta-all_help.txt"
$MBTILES_BIN meta-all ./tests/fixtures/mbtiles/world_cities.mbtiles 2>&1 | tee "$TEST_OUT_DIR/meta-all.txt"
$MBTILES_BIN stats ./tests/fixtures/mbtiles/world_cities.mbtiles 2>&1 | tee "$TEST_OUT_DIR/stats.txt"
$MBTILES_BIN meta-get --help 2>&1 | tee "$TEST_OUT_DIR/meta-get_help.txt"
$MBTILES_BIN meta-get ./tests/fixtures/mbtiles/world_cities.mbtiles name 2>&1 | tee "$TEST_OUT_DIR/meta-get_name.txt"
$MBTILES_BIN meta-get ./tests/fixtures/mbtiles/world_cities.mbtiles missing_value 2>&1 | tee "$TEST_OUT_DIR/meta-get_missing_value.txt"