Infrastructure engineering
Postgres on-premise
There is no cloud. It’s just someone else’s computer
Although I’ve followed the cloud trend, I’ve always wanted to keep one foot in infrastructure and systems work [1].
I think it’s essential to maintain this kind of knowledge to understand performance challenges.
Today, this lets me build complete infrastructures for customers who choose on-premise for different reasons:
- Independence: large public sector organizations that already have a physical infrastructure and don’t want to depend on a cloud provider.
- Performance: dedicated servers with local storage perform much better than cloud instances.
- Cost: when the infrastructure becomes large, the cloud can be very expensive.
Since 2023, I’ve noticed a change of heart about the cloud, and obviously I’m not the only one:
- CIOs revisit their 100% cloud strategies (1 March 2024): “Moving away from the cloud was a major theme in 2023, and is likely to become a real trend in 2024. The cost savings are simply too great for many companies to ignore.”
- FinOps: mindsets change, budget overspending remains (17 April 2024): “7 out of 10 companies are still unable to meet their cloud budgets, due to a lack of visibility and an inability to integrate cost control into upstream project phases.”
David Heinemeier Hansson, creator of the Ruby on Rails framework, has written many articles on this subject:
- Why we’re leaving the cloud
- Our cloud exit has already yielded $1m/year in savings
- The Big Cloud Exit FAQ
- We have left the cloud
- Cloud exit pays off in performance too
- Five values guiding our cloud exit
- Hardware is fun again
- We stand to save $7m over five years from our cloud exit
- The hardware we need for our cloud exit has arrived
TL;DR:
- They have deployed several physical servers on two separate sites (4000 vCPUs, 7680GB RAM, and 384TB NVMe storage).
- They will save 10 million dollars over 5 years (revised up from 7 million).
- The hardware purchase paid for itself in 6 months.
- Performance is much better.
Architecture example
Here is an example of the architecture I created for a customer:
- 1 primary and 5 replicas: 2 synchronous, 2 asynchronous cascaded
- Postgres server:
  - AMD EPYC 9354 32-Core / 64 threads
  - 512GB DDR5 RAM
  - 5.5TB NVMe storage (several million IOPS)
  - 25 Gb/s network
- “Local” backup:
  - Point-in-Time Recovery
  - AMD EPYC 9124 16-Core
  - 128GB DDR5
  - 15TB of NVMe storage
  - 25 Gb/s network
- External backup on S3 equivalent
- Cost: approx. €4,000/month excl. VAT, versus over $300,000/month on AWS RDS. Performance is also better, thanks to the local NVMe storage.
- Backup time for a 1TB database: less than 5 minutes for a full backup, a few seconds for a differential backup.
- Restore time: 5 minutes.
- Linux tuning: RAID, Kernel…
- Postgres tuning
- Drafting of installation, backup/restore and failover procedures…
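To put the backup figures above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch under my own assumptions, none of which come from the setup itself: 1 TB is taken as 10^12 bytes, the full backup as exactly 5 minutes, and the data as travelling uncompressed over the 25 Gb/s link. The function names are purely illustrative.

```python
# Back-of-the-envelope check of the backup figures quoted above.
# Assumptions (illustrative only): 1 TB = 10**12 bytes, the full backup
# takes exactly 300 s, and no compression is applied on the wire.

TB = 10**12  # bytes in one decimal terabyte


def required_throughput_gbit(size_bytes: int, seconds: float) -> float:
    """Average throughput in Gb/s needed to move size_bytes in the given time."""
    return size_bytes * 8 / seconds / 1e9


def transfer_seconds(size_bytes: int, link_gbit: float) -> float:
    """Time in seconds to push size_bytes through a link of link_gbit Gb/s."""
    return size_bytes * 8 / (link_gbit * 1e9)


needed = required_throughput_gbit(1 * TB, 5 * 60)
wire_time = transfer_seconds(1 * TB, 25)

print(f"Throughput needed for 1 TB in 5 min: {needed:.1f} Gb/s")
print(f"1 TB over a 25 Gb/s link, uncompressed: {wire_time:.0f} s")
```

Under these assumptions the required rate is in the same range as the network link itself, which suggests that a full backup in under 5 minutes depends on the NVMe storage and the 25 Gb/s network being used close to their limits, and probably on compression or parallelism as well. A differential backup moves far less data, which is consistent with it taking only a few seconds.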
Of course, I’ve also built simpler architectures:
- On virtual machines
- With a connection pooler (PgBouncer)
- Monitoring: Datadog, or Nagios-like tools (Icinga, Thruk) with check_pgactivity.
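For readers unfamiliar with Nagios-style monitoring, here is a minimal sketch of how such a check turns a measurement into a status. It is not how check_pgactivity is implemented; the function name `check_lag`, the thresholds, and the message format are hypothetical. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the `|` performance-data separator follow the usual Nagios plugin convention. The lag value is assumed to have been measured elsewhere, for example from the `pg_stat_replication` view.

```python
# Hypothetical Nagios-style threshold check for replication lag.
# Exit codes follow the Nagios plugin convention.
from typing import Optional, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_lag(lag_bytes: Optional[int], warn: int, crit: int) -> Tuple[int, str]:
    """Map a replication lag in bytes to an (exit_code, message) pair."""
    if lag_bytes is None:
        # No measurement available (e.g. the query failed): report UNKNOWN.
        return UNKNOWN, "REPLICATION UNKNOWN - no lag value available"
    if lag_bytes >= crit:
        code = CRITICAL
    elif lag_bytes >= warn:
        code = WARNING
    else:
        code = OK
    # Text after '|' is performance data: label=value[unit];warn;crit
    message = (
        f"REPLICATION {LABELS[code]} - lag {lag_bytes} B "
        f"| lag={lag_bytes}B;{warn};{crit}"
    )
    return code, message


# Example with made-up thresholds: warn at 16 MiB, critical at 128 MiB.
code, message = check_lag(64 * 1024**2, warn=16 * 1024**2, crit=128 * 1024**2)
print(code, message)
```

In a real plugin, the script would print the message and exit with the returned code so that Icinga or Nagios can pick up the status.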
[1] The page you’re reading is hosted on my own little server, which I manage myself (HTTPS, reverse proxy, backups…).