View Source Plausible.Purge (Plausible v0.0.1)
Deletes data from a site.
Stats are stored on Clickhouse, and unlike other databases data deletion is done asynchronously.
All import tables have MergeTree's deduplication mechanism disabled by setting
replicated_deduplication_window
from default 100 to 0. When enabled, every insert
into a given table is compared against hashes of 100 previous inserts (as complete
parts, not concrete rows) and ignored when match is found. The prupose of that
mechanism is making inserts of exact same batches idempotent when retrying them
shortly after - for instance due to timeout, when the client can't easily tell if
previous insert succeeded or not. Deduplication, however, only considers inserts,
not mutations. Deletions do not affect stored hashes, so further inserts of parts
that were deleted will still be treated as duplicates. That's why this feature
is disabled for import tables.
Although deletions are asynchronous, the parts to delete are "remembered", so there's no risk of overlapping deletion causing problems with import following right after it.
IMPORTANT: Deletion requires revision if/when import tables get moved to sharded CH
cluster setup. Mutation queries, which have to be run with ON CLUSTER
in such setup,
dispatch independent queries across shards and those queries can start at different
times. This in turn means risk of deletions corrupting data of follow-up inserts
in some edge cases. Ideally, imported entries should be unique for a given import
an extra
import_id
column can be introduced, holding identifier. Last processed import identifier should be stored with other site data and should be used for scoping imported stats queries. No longer used imports can then be safely removed fully asynchronously.
Summary
Functions
Deletes imported stats from and clears the stats_start_date
field.
Move stats pointers so that no historical stats are available.
Functions
@spec delete_imported_stats!(Plausible.Site.t() | Plausible.Imported.SiteImport.t()) :: :ok
Deletes imported stats from and clears the stats_start_date
field.
The stats_start_date
is expected to get repopulated the next time
Plausible.Sites.stats_start_date/1
is called.
If the input argument is a site, all imported stats are deleted. If it's a site import, only imported stats for that import are deleted.
@spec delete_native_stats!(Plausible.Site.t()) :: :ok
Move stats pointers so that no historical stats are available.