7 Commits

Author SHA1 Message Date
Dustin Carlino
83bc768e28 Optimize costs of importing in the cloud. #326
Every time I run an import, 10 GCE workers download about 20GB of
data/input. The S3 outbound charges are uncomfortably high.

Instead, use GCP's S3->GCS transfer tool manually before each run, and
make the GCE VMs read from that instead.

I haven't tested these changes yet, but will soon with the next import.
2021-05-27 08:09:03 -07:00
Dustin Carlino
f5dcd9bfff Polish up the cloud importer, based on its first real use. #326
- Give the worker assigned Seattle a much bigger machine type
- Create a script to grab the results from S3 and finalize them
2021-05-19 22:17:14 -07:00
Dustin Carlino
3de821f1b8 Clear day, cloudy imports. #326
- fix self-destruct command
- ship a GDAL-enabled importer and rebuild everything for Seattle, like
  the normal local process

I'm pretty sure the full process should succeed now. Next step is
figuring out a process for finalizing the changed output files in S3.
2021-05-18 14:07:40 -07:00
Dustin Carlino
c99f251766 Confront the horrors of a past life, and give up on GCE's bulk instance
creation API (or at least gcloud). #326
2021-05-18 12:45:32 -07:00
Dustin Carlino
a81d33628f Working on the GCP importer workflow... #326
- Amp up number of workers (about 100 cities, so 10/worker now)
- Use an SSD, since especially the setup and upload steps are extremely
  IO bound
- Split the script into pieces that can be easily disabled to iterate
  faster
- Use the bulk API to create instances
- Make the overall start_batch_import.sh a bit quieter
- Make successful VMs self-destruct so it's easier to track which are
  done
- Set up Docker on the VMs, so elevation data works
2021-05-18 12:28:41 -07:00
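
The static sharding mentioned in the message above (about 100 cities split across 10 workers) can be sketched roughly as follows. This is only an illustration, not the project's actual script: the function name `shard_statically` and the placeholder city names are hypothetical, and round-robin assignment is an assumption about how the split might be done.

```python
def shard_statically(items, num_workers):
    """Split items into num_workers fixed shards, round-robin.

    Worker i receives items[i], items[i + num_workers], and so on.
    The assignment is decided up front and never rebalanced.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    return [items[i::num_workers] for i in range(num_workers)]


# Hypothetical placeholder names; the real city list is not shown in the log.
cities = [f"city_{n:03d}" for n in range(100)]
shards = shard_statically(cities, 10)

for worker, shard in enumerate(shards):
    print(f"worker {worker}: {len(shard)} cities")
```

With 100 items and 10 workers, every shard holds 10 cities, matching the "10/worker" figure. A fixed split like this needs no job queue, but one unusually large city can make its worker finish much later than the rest.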
Dustin Carlino
5fca901e4c Give up on Docker and AWS Batch to bulk import cities. Switch to static
sharding with GCE instead. #326
2021-05-18 09:50:28 -07:00
Dustin Carlino
68f1225f22 Create a Docker image to run the map importer in the almighty cloud. #326
2021-05-06 17:35:58 -07:00