When building Bazel-integrated CI workflows, we first tried to improve performance by minimizing network bottlenecks in connections to our remote cache and executors. We then realized that the real bottleneck was Bazel’s analysis phase. To address it, we started running our CI in Firecracker microVMs. Firecracker has a snapshotting mechanism that lets you serialize and save a running microVM, and the snapshot can later be used to restore that microVM for subsequent CI runs. This lets us reuse the warm Bazel process and analysis cache from the earlier build, which can save several minutes on each subsequent build. We’ve reduced the median duration of our CI runs by nearly 8x, with most runs completing in just a few seconds. Now we’re continuing to improve our CI workflows by supporting remote snapshot sharing across machines. We store our snapshots remotely by using userfaultfd and network block devices to capture memory and disk reads and writes. This lets us keep these performance benefits across machine restarts and failures. This talk will walk through these optimizations and the performance improvements we’ve seen!
Speaker: Maggie Lou