Every time I run an import, 10 GCE workers download about 20GB of
data/input. The S3 outbound charges are uncomfortably high.
Instead, run GCP's S3->GCS transfer tool manually before each run, and
make the GCE VMs read from the GCS copy.
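The mirror step could look roughly like this with gsutil, which can read S3 directly when AWS credentials are configured in `~/.boto`. The bucket names below are placeholders, not the real ones:

```shell
#!/bin/bash
# Sketch of the manual S3 -> GCS mirror step. Bucket names are
# placeholders; substitute the real input bucket and its GCS mirror.
SRC="s3://my-input-bucket/data/input"
DST="gs://my-input-mirror/data/input"

# -m parallelizes transfers; rsync only copies changed objects, so
# re-running this before each import is cheap after the first sync.
CMD="gsutil -m rsync -r $SRC $DST"
echo "$CMD"   # dry-run: print the command instead of running it here
# $CMD        # uncomment to actually perform the mirror
```

Since rsync skips unchanged objects, the outbound S3 transfer happens once per changed file rather than once per worker per run.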
I haven't tested these changes yet, but will soon with the next import.
- fix self-destruct command
- ship a GDAL-enabled importer and rebuild everything for Seattle, like
the normal local process
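The self-destruct step can be sketched like this; a VM can look up its own name and zone from the metadata server and delete itself. This is an untested sketch, not the exact script:

```shell
#!/bin/bash
# Sketch of a worker VM deleting itself after a successful run, so any
# instance still alive is one that failed. Untested sketch.

MD="http://metadata.google.internal/computeMetadata/v1/instance"

# Zone metadata comes back as "projects/<num>/zones/<zone>"; keep only
# the final path component, which is the form gcloud expects.
zone_from_metadata() {
  echo "${1##*/}"
}

self_destruct() {
  local name zone
  name=$(curl -s -H "Metadata-Flavor: Google" "$MD/name")
  zone=$(zone_from_metadata "$(curl -s -H "Metadata-Flavor: Google" "$MD/zone")")
  # --quiet skips the interactive confirmation prompt.
  gcloud compute instances delete "$name" --zone="$zone" --quiet
}

# Only call this at the very end of the worker script, after results
# have been uploaded, e.g.:
# run_import && upload_results && self_destruct
```

The VM's default service account needs compute instance delete permission for this to work.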
I'm pretty sure the full process should succeed now. Next step is
figuring out a process for finalizing the changed output files in S3.
- Amp up the number of workers (about 100 cities, so 10/worker now)
- Use an SSD, since the setup and upload steps especially are extremely
  IO-bound
- Split the script into pieces that can be easily disabled to iterate
faster
- Use the bulk API to create instances
- Make the overall start_batch_import.sh a bit quieter
- Make successful VMs self-destruct so it's easier to track which are
  done
- Set up Docker on the VMs, so elevation data works
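For the bulk API item, the shape of the call is roughly this; zone, machine type, and name pattern are placeholders rather than the real values from start_batch_import.sh, and the pd-ssd boot disk ties in the SSD item above:

```shell
#!/bin/bash
# Sketch of replacing a per-instance gcloud loop with one bulk call.
# All values are placeholders. The '#' in --name-pattern is expanded by
# gcloud into a sequence number (worker-1, worker-2, ...).
build_bulk_create_cmd() {
  echo "gcloud compute instances bulk create" \
       "--name-pattern=worker-#" \
       "--count=$1" \
       "--zone=us-east1-b" \
       "--machine-type=e2-standard-4" \
       "--boot-disk-type=pd-ssd"
}

# Print the command rather than running it here; on a machine with
# gcloud configured, drop the echo (or eval the output) to launch.
build_bulk_create_cmd 10
```

One bulk request also fails atomically by default, which beats discovering halfway through a loop that quota ran out.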