Mirror of https://github.com/a-b-street/abstreet.git, synced 2024-11-24 01:15:12 +03:00
Optimize costs of importing in the cloud. #326
Every time I run an import, 10 GCE workers download about 20 GB of data/input, and the S3 outbound charges are uncomfortably high. Instead, run GCP's S3->GCS transfer tool manually before each import, and make the GCE VMs read from the GCS mirror. I haven't tested these changes yet, but will soon with the next import.
commit 83bc768e28
parent 53430319b1
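The manual sync step this commit depends on can be run from the console (the link appears in the diff below) or scripted. A minimal sketch using the Storage Transfer Service CLI, assuming the bucket names that appear in this diff (abstreet on S3, abstreet-importer on GCS); the flags here are an assumption about gcloud's interface, not a record of what was actually run:

    # Hypothetical one-off S3->GCS transfer job; the commit author used the
    # web console at https://console.cloud.google.com/transfer/cloud/jobs.
    gcloud transfer jobs create s3://abstreet gs://abstreet-importer \
      --source-creds-file=aws-creds.json \
      --include-prefixes=dev/data/input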
@@ -16,6 +16,11 @@ if [ "$EXPERIMENT_TAG" == "" ]; then
 	exit 1;
 fi
 
+if [ "$2" != "gcs_sync_done" ]; then
+	echo First go sync dev/data/input from S3 to GCS. https://console.cloud.google.com/transfer/cloud/jobs
+	exit 1;
+fi
+
 NUM_WORKERS=10
 ZONE=us-east1-b
 # See other options: https://cloud.google.com/compute/docs/machine-types
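With this guard, the launcher refuses to run until the operator confirms the GCS mirror is fresh. A usage sketch, with a hypothetical script name (substitute the repo's actual launcher):

    # 1) Manually run the S3->GCS transfer job and wait for it to finish.
    # 2) Then pass the confirmation token as the second argument:
    ./start_batch_import.sh my-experiment-tag gcs_sync_done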
@@ -25,9 +25,12 @@ cd worker_payload
 mv .aws ~/
 
-# If we import without raw files, we'd wind up downloading fresh OSM data!
-# Reuse what's in S3. We could use the updater, but probably aws sync is
-# faster.
-aws s3 sync s3://abstreet/dev/data/input data/input/
+# Reuse what's in S3. But having a bunch of GCE VMs grab from S3 is expensive,
+# so instead, sync from the GCS mirror that I manually update before each job.
+gsutil -m cp -r gs://abstreet-importer/ .
+mv abstreet-importer/dev/data/input data/input
+rmdir abstreet-importer/dev
+rmdir abstreet-importer
 find data/input -name '*.gz' -print -exec gunzip '{}' ';'
 
 # Set up Docker, for the elevation data
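One note on the gsutil lines above: cp -r gs://abstreet-importer/ . recreates the bucket name as a local directory, which is why the mv and rmdir cleanup follows. A hedged alternative that syncs straight into place (untested, like the rest of this commit; gsutil rsync wants the local destination directory to exist, hence the mkdir):

    mkdir -p data/input
    gsutil -m rsync -r gs://abstreet-importer/dev/data/input data/input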