GitLab PostgreSQL Data Recovery
Today, shit happened on a larger on-premise GitLab EE instance of one of our Onlime GmbH customers. GitLab's production.log started to fill up with
PG::Error (FATAL: the database system is in recovery mode) errors which were somehow related to LFS operations. That definitely didn't sound cool and smelled like data corruption. The customer noticed it by failed CI jobs with 500 Internal Server Errors, and let me know immediately.
As we have that GitLab server running in a LXC container on a ZFS based system (Proxmox VE), it was easy to pull a clone of the full system and play around with PostgreSQL data recovery before working on live data. I decided to go for a full data restore by dumping and loading it from scratch in a freshly initialized PostgreSQL data dir.
Secure External Backup with ZFS Native Encryption
Let's improve our Simple and Secure External Backup solution I have published back in 2018. Back then, I was using
rsync over SSH to pull backup data, and
LUKS encryption as full disk encryption for the external drives. As we all know, transferring data with rsync can get horribly slow and blow up your I/O if you're transferring millions of small files. Also,
LUKS encryption may be a bit low level and inflexible. What we want to accomplish: A performant and secure backup solution based on ZFS, using
zfs send|recv for efficient data transfer, and ZFS native encryption to secure our external drives. So let's go ahead and built that thing from scratch on a fresh 2021 stack!
Simple and Secure External Backup
What we are going to set up here is a simple and secure offsite and offline backup server. Let's assume you already have an existing backup server that is connected to the internet 24/7 and does daily/weekly/monthly backups. We would now like to set up a second offsite backup server that just cares about storing data to encrypted external drive and after each backup run, you are going to physically detach that drive.
So, we are talking about offline backups in addition to the fact having this server offsite - at a different location than your main backup server.
Preferably, your main backup server would also be offsite. But as it needs to pull data frequently, its storage is always available and not getting detached.
Let's call your main backup server
backup and the one we are going to set up here