Persistent Storage is Hard
When I first started getting the hang of Kubernetes, I wanted to do everything with it. After years of clicking buttons in the AWS console, the idea of having an open source, code-driven, vendor-agnostic cloud platform was incredibly exciting. So I was frustrated to learn that my coworkers — some of the strongest Kubernetes experts I’ve met — strongly advised against running databases in Kubernetes, or using it for any kind of persistent storage. They told me to stick with S3 and RDS for my storage needs.
Of course, I didn’t listen. While we (thankfully) took their advice when architecting our production application, I had started using Kubernetes to run a handful of personal applications, for things like photo sharing and notetaking. I was storing all my data on PersistentVolumes (PVs), and even worked out a naive way to back up the data to S3. And it worked great!
Until it didn’t. At some point, when updating one of my applications, I accidentally wiped out a PersistentVolume, only to discover that my backups hadn’t been working properly. Weeks’ worth of writing disappeared into the void.
Searching for a Solution
Undeterred, I started looking into more mature backup solutions. Velero seemed to be a community favorite, but was more for heavyweight disaster recovery, rather than the targeted application backup solution I needed. k8s-snapshots was fairly promising — it integrated directly with the AWS, GCP, and DigitalOcean APIs to create backups of the volumes underlying PVs. I had it running well for a while, until I discovered that restoring is considered out-of-scope for the project! The backups were all but useless without the ability to reattach them to a running application.
Finally, I learned about the VolumeSnapshot API, a k8s-native backup solution that entered beta in Kubernetes 1.17. It represents a huge step forward for the Container Storage Interface (CSI), the official way to handle persistent storage in Kubernetes. Each of the major cloud vendors provides a CSI driver, allowing you to manage volumes and backups in their cloud using Kubernetes-native interfaces.
This was exactly what I needed! But one problem: VolumeSnapshots have to be created manually. It’s up to the end-user to figure out how to manage their snapshots — not only creating new snapshots on a schedule, but also retiring old snapshots to prevent them from piling up. Storage may be cheap, but an hourly backup of a 1GB volume gets expensive quickly!
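For a sense of what "manually" means here, a one-off snapshot is a small manifest that you apply yourself. This is a minimal sketch, assuming the beta API version (newer clusters serve `snapshot.storage.k8s.io/v1` instead) and a hypothetical VolumeSnapshotClass called `csi-snapclass`. The snapshot name is made up too; `postgres-data` is the PVC used in the examples further on.

```yaml
# Sketch only: the class name and snapshot name are hypothetical.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: postgres-data-manual
spec:
  # Must match a VolumeSnapshotClass that exists in your cluster.
  volumeSnapshotClassName: csi-snapclass
  source:
    # The PVC to snapshot; it must be backed by a CSI driver.
    persistentVolumeClaimName: postgres-data
```

Every scheduled backup would need another manifest like this one, plus something to clean up the old ones.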
We wanted to give the VolumeSnapshot API a more robust, user-friendly interface. Specifically, we considered the following features necessary for any production-grade backup strategy:
- Automatic backups on a customizable, fine-grained schedule
- Automatic deletion of stale backups
- The ability to easily restore data from a particular backup
So we decided to create a new project, Gemini, in order to automate the backup and restoration of PersistentVolumes. Gemini consists of a new CRD — the SnapshotGroup — as well as a controller that creates, deletes, and restores VolumeSnapshots based on SnapshotGroup specifications. Here’s how it works.
We start with a SnapshotGroup definition, which looks something like this:
- every: 10 minutes
Here we tell Gemini to find the existing postgres-data PVC, and to schedule a backup every 10 minutes (overkill, maybe, but better safe than sorry). In addition to the latest backup, we’ve told Gemini to hold onto the three most recent backups, so we always have at least 30 minutes’ worth of coverage.
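Putting those pieces together, a full definition might look roughly like the sketch below. The `apiVersion` and field names are my recollection of the Gemini README, not something to copy blindly, so check them against the CRD version installed in your cluster. The name `postgres-backups` matches the SnapshotGroup used in the restore commands later on.

```yaml
# Sketch only: verify the apiVersion and field names against your Gemini CRD.
apiVersion: gemini.fairwinds.com/v1beta1
kind: SnapshotGroup
metadata:
  name: postgres-backups
spec:
  persistentVolumeClaim:
    # The existing PVC that Gemini should back up.
    claimName: postgres-data
  schedule:
    # Take a snapshot every 10 minutes and retain the three most recent.
    - every: 10 minutes
      keep: 3
```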
But we can go further! We can also tell Gemini to keep hourly, daily, weekly, monthly, and yearly snapshots:
- every: 10 minutes
- every: hour
- every: day
- every: week
- every: month
- every: year
Gemini will still only run a single backup every 10 minutes, but it will preserve additional backups to fulfill the longer-term backup schedule.
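As a rough sketch, the longer-term schedule could be written as below. The `keep` counts are illustrative assumptions, not recommended values, and the `keep` field name should be checked against your installed Gemini CRD.

```yaml
# Sketch only: the keep counts here are assumptions for illustration.
schedule:
  - every: 10 minutes
    keep: 3
  - every: hour
    keep: 1
  - every: day
    keep: 1
  - every: week
    keep: 1
  - every: month
    keep: 1
  - every: year
    keep: 1
```

Each entry describes how many snapshots to retain at that interval, so the longer intervals add retention rather than extra snapshot runs.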
Restoring data from a VolumeSnapshot can be a bit fraught, as it’s unfortunately impossible to accomplish without some downtime. We need to take the following steps:
- spin down any pods that are using the PVC
- create a one-off backup of the PVC in its current state, just in case
- delete the existing PVC
- create a new PVC from the desired VolumeSnapshot
- restart our pods, pointing them to the new PVC
Because swapping out a PVC necessarily incurs downtime, we made the decision not to hide this process from the user. In particular, the user is responsible for the first and last steps, scaling the application down and back up. Gemini takes care of the middle part, swapping out the PVC.
Here’s the basic restore process. First, we’d check to see what snapshots are available:
$ kubectl get volumesnapshot
Take note of the timestamp, 1585945609: that's our target restore point, 15 minutes ago. Next, we'd scale down the application:
kubectl scale all --all --replicas=0
We’ll want to move quickly now, as our application is offline. To swap out the PVC, we simply annotate our SnapshotGroup with the desired restore point:
kubectl annotate snapshotgroup/postgres-backups --overwrite \
  "gemini.fairwinds.com/restore=1585945609"
Once Gemini sees this annotation, it will trigger a one-off backup, delete the old PVC, and replace it with a new one (with the same name) using data from the specified snapshot. The restoration should only take 30 seconds or so. In the meantime, we can scale back up, and our pods will come online once the new PVC is ready.
kubectl scale all --all --replicas=1
It’s unfortunate that restoring data involves a bit of downtime, and that there doesn’t seem to be any reasonable way around this. If you have ideas on how to improve this process, let us know by opening an issue!
It’s also worth noting that you can spin up a second PVC from one of your backups, and attach it to a separate instance of your application. I’ve used this mechanism to recover old photos without having to revert my entire photo app to a particular point in time.
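A second PVC like that can be created with the standard `dataSource` field on a PersistentVolumeClaim. Here is a minimal sketch, assuming a CSI-backed storage class with the hypothetical name `csi-storage-class`, and a snapshot named `postgres-backups-1585945609`. Run `kubectl get volumesnapshot` to find the real names in your cluster.

```yaml
# Sketch only: the PVC name, storage class, snapshot name, and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
spec:
  storageClassName: csi-storage-class
  dataSource:
    # The VolumeSnapshot to populate the new volume from.
    name: postgres-backups-1585945609
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # Must be at least as large as the snapshotted volume.
      storage: 1Gi
```

Because this PVC has its own name, the original volume and the application using it are left untouched.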
I’m super excited about the VolumeSnapshot API, as it fills a huge gap in the Kubernetes ecosystem. When combined with Gemini, it allows us to safely maintain persistent storage in Kubernetes. While there are still some dragons here (VolumeSnapshots are still in beta, after all), I look forward to the day Fairwinds can confidently recommend using vendor-agnostic, k8s-native storage solutions like MinIO and PostgreSQL-HA over services like S3 and RDS.
In the meantime, I’ll be enjoying life on the edge, using Gemini and VolumeSnapshots to manage my personal data.