You know what sucks? Amazon’s abusive employment practices. You know what else would suck? Being complicit in that by hosting an app on Amazon Web Services. Oh, and by the way, AWS rips you off so much with its monopoly-level pricing that for many years AWS was responsible for all of Amazon’s profit. What if I told you that there is an easy(ish) way to capture that value for yourself instead, and even resell it to your own customers?
I have been (un)fortunate enough to get first-hand experience of the real-world value proposition of AWS for a generic SaaS startup: a few years ago I became the de facto AWS administrator for a 20-person, ~$1M ARR startup.
To say I was bitterly disappointed would be an understatement.
My prior experience with cloud infrastructure had consisted entirely of working with the superb, but exceptionally expensive, Heroku cloud hosting service. Heroku is amazing. For a single exorbitant monthly charge, you can upload web application code and it will just run immediately and start serving requests from the internet - you don’t need to know anything about the server it runs on. What you do need to know, however, is where on earth you are going to find the money to support a reasonable number of freemium users for long enough to develop a significant base of loyal subscribers to your SaaS. At Heroku prices, that isn’t easy.
Enter AWS. I knew that Heroku used AWS servers, and marked up the price in return for taking away all of your sysadmin responsibilities. I had also heard a lot of hype about how AWS was supposed to streamline the process of setting up your server infrastructure. So when I was unexpectedly handed the responsibility for managing the AWS private cloud for a rapidly growing startup, I expected to encounter something that was going to offer much better pricing than Heroku, in return for doing just a little bit of sysadmin work yourself.
Or, as it turned out, doing almost all of it.
I was horrified at just how little AWS actually does for you. If you are a SaaS startup, all you really get is a bunch of virtual servers, just like you could from any other hosting provider, but at a much higher price and with a charge for every single byte of traffic in or out. Sure, if you were a manager at a large corporation and AWS was going to save you from doing your own PCI DSS or HIPAA certification, then that’s real value - but for a software startup that wants to do something bespoke and interesting using the raw material of servers and a network, it was just a Whole. Lot. Of. Money. For. Nothing.
And forget stability - we still had mysterious server outages - only it was much harder to get in contact with support than it would have been with a smaller provider like Global TeleHost. If you want redundancy or high availability with AWS, you still need to provision backup machines and take care of replication, failover, and so on, all yourself.
Whichever way I turned, it seemed there was another part of AWS that was basically the same as what I would get from any other provider, except that the initial setup was maybe 10% easier through the GUI, and the costs would blow out horrendously as traffic and snapshot storage mounted up.
As is usual in the computer world, the economics of hardware have completely changed. Again.
When AWS first launched its EC2 virtual server offering in 2006, the alternative, if you wanted a reasonable amount of performance, was to buy and rack your own servers. This meant a lot of up-front cost and quite a bit of financial risk, especially when you didn’t know what type of load you were going to be servicing. And in 2006, computing power was expensive. Avoiding this type of risk with such large expenses was a huge benefit.
Today, however, there are myriad vendors who rent out dedicated physical servers that can be set up for you in mere minutes. They are billed monthly or even daily, and they offer roughly an order of magnitude more performance than the equivalently priced offering from AWS. Most of them also provide some kind of admin GUI that gives you all the remote management you would have from AWS, including instant setup with standard Linux ISOs. Alternatively, if you do want to buy and rack your own servers, there are many affordable colocation options available, and some great-value hardware deals around, ranging from used Xeon systems to the latest AMD Epyc processors.
I intend to take advantage of this by creating a bargain price enterprise cloud for Allaiz.com. I’m calling it the Bargain Cloud Project.
The idea is to have two individual bare-metal servers, in different geographic locations, with the absolute minimum configuration to provide me with all the virtual machines I want. This should provide stability through simplicity (fewer moving parts means fewer things to break). I am also going to implement basic High Availability by having each VM able to automatically fail over to the other server in case of problems, and rapid recoverability in case of disaster through the magic of ZFS filesystem replication.
Setting up a bargain cloud means I can afford to offer freemium accounts to my users without going broke. My initial requirement for Allaiz.com is to support up to 1,000 users with a combination of a custom Ruby on Rails app, and some open source servers for email and document storage - perhaps Postfix, Dovecot, and OnlyOffice, although that remains to be determined. That will need a reasonable amount of computing power.
Based on some gut-feel load estimates, I decided that one of these 8-core Ryzen 7 3700X machines with 64 GB of RAM, 2x 1 TB SSDs, and two add-on 10 TB HDDs would be about right. I get the awesome price/performance of the new AMD Zen CPUs along with their hardware memory encryption, 1 TB of mirrored SSD storage for databases, 10 TB of mirrored HDD storage for documents and emails, and 64 GB of RAM to share between the VMs that I am planning to run.
Hetzner provides all of that to me for only €93.84 per month, which is by far the best price I was able to find. Setup was within 24 hours (and they offer ready-to-go servers as well). Just as with AWS, I was given tools that let me immediately install an operating system image (or, in my case, manually install a minimalist Ubuntu 20.04 running on a ZFS mirror root - but that is a story for another time…)
But Hetzner only has datacenters in Germany and Finland. Given that I am in Australia and am looking to serve an Australian market initially, that does satisfy the “different geographic region” requirement for one of my two servers - however, I still need another server closer to home. As I mentioned, other dedicated server providers charge a lot more than Hetzner does, and my requirement of two SSDs and two HDDs in the same server is also a major stumbling block for most of them.
After much investigation, I came to realise that the most economical way to have something local would be to buy and configure a modest Epyc server from someone like these guys, and then colocate it with someone like these guys. The monthly cost of colocating the server and a router for secure remote access is actually about the same as Hetzner charges for a whole server, but for my requirements it’s still far cheaper than any of the alternatives.
All up, that plan would cost me around USD$6k in up-front hardware and about USD$3,200 a year in server rental and colocation. Let’s call that USD$5,200/year including hardware depreciation, or, conveniently, USD$100/week. That’s not trivial, but it’s still well within my personal budget.
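For the sceptical, the back-of-envelope arithmetic behind those figures is easy to check (the three-year hardware write-off period is my assumption):

```shell
# Rough check of the yearly and weekly cost figures. Writing the
# hardware off over three years is an assumption.
hardware=6000   # up-front hardware cost, USD
rental=3200     # yearly server rental + colocation, USD
years=3         # assumed depreciation period
yearly=$(( hardware / years + rental ))   # 2000 + 3200 = 5200
weekly=$(( yearly / 52 ))                 # 5200 / 52 = 100
echo "USD\$$yearly/year = USD\$$weekly/week"
```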
So how does that compare to the Heroku and Amazon alternatives?
I made a rough estimate of the cost of something equivalent from either AWS or Heroku, which you can read here if you want all the boring details. It came to over USD$40k/year for AWS, and over USD$54k/year for Heroku, which is more than USD$1,000/week! Those costs would be absolutely prohibitive for a bootstrapped business that I am funding out of savings.
So it seems the Bargain Cloud Project may live up to its name.
But anyway, it’s not about the money. It’s about not being complicit in Amazon’s abusive employment practices - something that Allaiz.com, as a company dedicated to helping people escape Wage Slavery, simply cannot be.
There are some extra skills one needs to learn in order to free oneself from dependence on overpriced cloud VMs. As well as having a basic understanding of networking and Linux system administration, I have found that building the Bargain Cloud Project requires me to learn the following additional skills:
ZFS: While putting everything on ZFS filesystems does make a system administrator’s life so much easier that one would have to be insane not to do that with every production server, in my case it’s also essential to the Disaster Recovery functionality of the Bargain Cloud Project. ZFS allows each server to back up its VMs to the other server in near-real-time, meaning that if one of the servers suffers a catastrophic failure, it would be quick and easy to restore its entire environment to a new machine.
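To give a flavour of what that replication looks like in practice, here is roughly the shape of one incremental send/receive cycle - the pool name, snapshot names, and hostname are all placeholders, and tools like sanoid/syncoid exist to automate this whole dance:

```shell
# Snapshot the VM dataset, then ship only the changes since the
# previously replicated snapshot to the standby host over SSH.
# "tank/vms", the snapshot names, and the hostname are placeholders.
zfs snapshot -r tank/vms@repl-new
zfs send -R -i tank/vms@repl-prev tank/vms@repl-new | \
    ssh standby.example.net zfs receive -F tank/vms
```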
Libvirt: Since I am running my own VMs instead of renting them, I need to know how to use some kind of hypervisor to run them. While Proxmox seems to have the most mindshare in this kind of space, I found that it had too many bells and whistles for my needs, and introduced a lot of brittleness and extra failure modes as a result - for example, renaming a host can be enough to completely break your setup in a seemingly unrecoverable way. Libvirt, included with most Linux distributions, is a basic abstraction over KVM and LXC that provides the ability to create, start, stop, and destroy virtual machines from the command line on Linux, with minimal overhead and not too many things that can break.
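The day-to-day libvirt workflow really is that minimal - a handful of virsh commands cover the whole VM lifecycle (the VM name and XML file here are placeholders):

```shell
virsh define webapp.xml     # register a VM from an XML definition
virsh start webapp          # boot it
virsh list --all            # show all VMs and their current states
virsh shutdown webapp       # ask the guest to shut down cleanly
virsh autostart webapp      # start it automatically on host boot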
Postgres Administration: Admittedly one of the big timesavers provided by Heroku Postgres and Amazon RDS is that you don’t need to know how to set up and administer a database. That being said, the database is most likely the heart of your SaaS, and whether it’s hosted for you or not, there are many things you are going to want to know about how to tune and configure it to get the most out of it in terms of performance and availability. For example, I need to know how to set up live replication to a hot-standby Postgres instance on my other server, since it’s an essential part of setting up automated failover.
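For the record, modern PostgreSQL makes the basic standby setup surprisingly painless. A sketch, with hostnames, role name, and password as obvious placeholders:

```shell
# On the primary: create a role the standby can connect as
# (it also needs a matching entry in pg_hba.conf).
sudo -u postgres psql -c \
    "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'changeme'"

# On the standby: clone the primary and write the streaming
# replication config in one step (-R creates standby.signal on
# PostgreSQL 12+).
sudo -u postgres pg_basebackup -h primary.example.net -U replicator \
    -D /var/lib/postgresql/12/main -R -X stream -P
```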
Firewalls/nftables: With numerous VMs running on a virtual network inside my server, I need to control access to them with a firewall, and it also makes sense to have my server act as a NAT gateway to distribute traffic to the various VMs. Nftables is the latest iteration of Linux packet-filtering admin tools, and replaces iptables. It’s much more performant and has a cleaner syntax, and for the purpose of managing access to VMs, using it directly results in much simpler, easier-to-understand rulesets than one tends to get from firewall tools like UFW or firewalld - and when it comes to security, being able to easily understand the contents of my firewall ruleset is important!
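To illustrate just how readable raw nftables can be, a minimal ruleset that NATs a virtual network and forwards web traffic to one VM might look like this - the interface name and addresses are assumptions based on libvirt’s default network:

```
#!/usr/sbin/nft -f
# "eth0", 192.168.122.0/24, and the VM address are placeholders.
flush ruleset

table ip nat {
    chain prerouting {
        type nat hook prerouting priority dstnat;
        # forward inbound web traffic to the web VM
        iif "eth0" tcp dport { 80, 443 } dnat to 192.168.122.10
    }
    chain postrouting {
        type nat hook postrouting priority srcnat;
        # masquerade outbound traffic from the VMs
        ip saddr 192.168.122.0/24 oif "eth0" masquerade
    }
}
```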
DRBD: I am planning to run services such as email (Dovecot) and document storage (OnlyOffice) that store data on a local filesystem. To be able to fail over automatically to the other server in case of failure, any writes to the filesystem need to be synchronised in real time to the other server. This is the job of Linux’s Distributed Replicated Block Device layer. It’s not particularly hard to set up, but there is a lot to learn about how to use it correctly.
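A DRBD resource definition is compact enough to show in full. Something along these lines, where the resource name, hostnames, addresses, and backing devices are all placeholder examples:

```
# /etc/drbd.d/mailstore.res - all names and addresses are examples
resource mailstore {
    device    /dev/drbd0;
    meta-disk internal;
    net {
        protocol C;   # fully synchronous replication
    }
    on alpha {
        disk    /dev/zvol/tank/mailstore;  # a ZFS zvol as backing store
        address 10.0.0.1:7789;
    }
    on beta {
        disk    /dev/zvol/tank/mailstore;
        address 10.0.0.2:7789;
    }
}
```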
OpenResty and Lua: Automated failover will be triggered when Cloudflare’s Intelligent Failover system is unable to contact the primary server and reroutes the request to the secondary server. At this point the secondary server needs to do a number of things before serving the request, like starting the services it needs and making sure that the primary server is not going to wake up and start serving requests also. For this to happen, all requests on the secondary server will be proxied through OpenResty, which is a version of NGINX that has the Lua scripting language embedded in it. This should allow a custom Lua script to pause the request, start services, remove the primary server from the Cloudflare load balancer, fork a process to try and take down any service still running on the primary server, and then release the request for servicing by the backup server. This promises to be one of the more fun parts of the project, and might end up as a GitHub repository too.
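In OpenResty terms, the skeleton of that failover hook might look something like this - the shared-dict name and the promote script are hypothetical, and the real version will need proper locking and error handling rather than a blocking os.execute:

```
# nginx.conf fragment - requires "lua_shared_dict failover 1m;"
# in the http block. All names here are placeholders.
server {
    listen 80;

    location / {
        access_by_lua_block {
            -- Runs before each request is proxied. If this server
            -- has not yet been promoted to primary, do that first.
            local state = ngx.shared.failover
            if not state:get("promoted") then
                -- start services, fence the old primary, and update
                -- the Cloudflare pool (details live in the script)
                os.execute("/usr/local/bin/promote-standby.sh")
                state:set("promoted", true)
            end
        }
        proxy_pass http://127.0.0.1:8080;
    }
}
```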
I’ll be covering all of these in more detail in the rest of the articles in this series, as I go through the process of configuring my Bargain Cloud environment… To be continued!