Sharing some lessons I learned from 10 years/millions of users in production. I’ll be in the comments if anyone has any questions!
Fully agree with the sentiment, but the blog post itself is kinda crap. All it really says is: hey, we’re overcomplicating things, but subscribe to my RSS feed for when I actually start talking about it!
Fair criticism. I wanted to lay the groundwork, as I intend for this to be a pretty large resource for people over time. Like starting with chapter one before I write the whole book. I hope you find some value in what’s to come.
Perhaps some kind of teaser that hints at what your strategy is before going into full detail?
learned from 10 years/millions of users in production
10 years per millions of users is an interesting metric :P
I’ve been doing this for 30+ years and it seems like the push lately has been towards oversimplification on the user side, but at the cost of resources and hidden complexity on the backend.
As an assembly-language programmer I’m used to programming with resource consumption in mind. Did using that extra register just cause a couple of extra PUSH and POP instructions in the loop? What’s the overhead of that?
But now some people just throw in a JavaScript framework for a single feature and don’t even worry about how it works or the overhead as long as the frontend looks right.
The same is true with computing. We’re abstracting containers inside VMs on top of base operating systems, which adds a lot more resource utilization to the mix (what’s the carbon footprint of that?) along with an extremely complex but hidden backend. Everything’s great until you have to figure out why you’re suddenly losing packets that pass through a virtualized router to linuxbridge or OVS to a Kubernetes pod inside a virtual machine. And if one of those processes fails along the way, BOOM! It’s all gone. But that’s OK; we’ll just tear it down and rebuild it.
I get it. I understand the draw, and I see the benefits. IaC is awesome, and the speed with which things can be done is amazing. My concern is that I’ve seen a lot of people using these things who don’t know what’s going on under the hood, so they often make assumptions or mistakes that lead to surprises later.
I’m not sure what the answer is other than to understand what you’re doing at every step of the way, and always try to choose the simplest route (but future-proofed).
You make some great points, but I’m concerned that your preferred solutions may ignore the needs of working with peers. When I’ve worked with similar solutions before, we had a lot of on call, and it all went to the same person, regardless of who actually answered the phone.
There’s nuance to be had in the middle ground:
- The CI/CD pipeline should deliver releases, but it shouldn’t be the only way to put them into production. People often get this wrong and have to invoke CI/CD just to do a rollback. Putting something into production without CI/CD should be possible, but it should be loud, to avoid nasty surprises later (see the sketch after this list).
- Infrastructure as code is great. It just happens to be backwards today. We could all go back to point-and-click changes if the infrastructure had full journaling of changes (including who made each one) and full rollback capability. Ironically, I expect K8s (or a fork of it) to finally deliver this.
- The only nice thing I have to say about K8s today is that it doesn’t lock me into a proprietary cloud, or worse, VMware. Today, K8s is towering mere centimeters above even worse solutions, all of them objectively awful. I’m not mad at anyone using any of these. We all know we want the freedom of K8s, but no one wants the bullshit interfaces it currently comes with. It will get better.
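To make the “loud” part of the first bullet concrete, here’s a rough sketch of what an out-of-band rollback could look like: it still works when the pipeline is down, but it announces itself first. The webhook URL, host, paths, and unit name are placeholders I made up for the sketch, not a real setup.

```python
#!/usr/bin/env python3
"""Sketch of a 'loud' out-of-band rollback (see the first bullet above).

Works even when the CI/CD pipeline is down, but announces itself so nobody
gets surprised later. Webhook URL, host, paths, and unit name are placeholders.
"""
import getpass
import json
import subprocess
import urllib.request

ALERT_WEBHOOK = "https://chat.example.com/hooks/deploys"  # placeholder chat webhook
HOST = "app@prod-1.example.com"                           # placeholder SSH target


def announce(message: str) -> None:
    # Tell the whole team that someone is touching production by hand.
    body = json.dumps({"text": message}).encode()
    req = urllib.request.Request(
        ALERT_WEBHOOK, data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req)


def main() -> None:
    who = getpass.getuser()
    announce(f"{who} is rolling back prod-1 WITHOUT the pipeline")
    # The previous release is kept on the box, so rollback is a swap + restart.
    subprocess.run(
        ["ssh", HOST,
         "cp /srv/myapp/releases/previous /srv/myapp/myapp"
         " && sudo systemctl restart myapp"],
        check=True,
    )
    announce(f"{who} finished the manual rollback on prod-1")


if __name__ == "__main__":
    main()
```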
Anyway, interesting read. Thank you. The only way the current awful state of hosting is going to improve is by having this conversation.
I keep hearing “most people aren’t ready for K8S”. But there’s no such thing. There’s just whether K8S (or whatever replaces it) is ready for most people’s use cases.
I’m concerned that your preferred solutions may ignore the needs of working with peers. When I’ve worked with similar solutions before, we had a lot of on call, and it all went to the same person, regardless of who actually answered the phone.
Totally hear you, and I’ve had the same experience myself. The approach I’m advocating for is simply running a binary on a server with rsync to deploy, and architecting your product around that limitation. Teaching a team the basics of Linux sysadmin will be incredibly useful for their careers, and it’s something the whole team can easily learn. Then you don’t need to hire a k8s team – any engineer can do some basic debugging when things go sideways.
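To make that concrete, here’s a rough sketch of what the deploy step could look like; the host, paths, and systemd unit below are made-up placeholders, not a prescribed layout:

```python
#!/usr/bin/env python3
"""Rough sketch of the deploy step: rsync a freshly built binary, restart the service.

The host, paths, and systemd unit are placeholders.
"""
import subprocess
import sys

HOST = "app@example-server"            # placeholder SSH target
BINARY = "./build/myapp"               # locally built binary (placeholder path)
REMOTE_TMP = "/srv/myapp/myapp.new"    # upload next to the live binary
UNIT = "myapp.service"                 # placeholder systemd unit


def run(*cmd: str) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)


def main() -> None:
    # 1. Copy the new binary over; rsync only transfers what changed.
    run("rsync", "-avz", BINARY, f"{HOST}:{REMOTE_TMP}")
    # 2. Swap it into place, then restart the service.
    run("ssh", HOST, f"mv {REMOTE_TMP} /srv/myapp/myapp && sudo systemctl restart {UNIT}")


if __name__ == "__main__":
    try:
        main()
    except subprocess.CalledProcessError as exc:
        sys.exit(f"deploy failed: {exc}")
```

Anything fancier – health checks, keeping the previous release around for rollback – could be layered on top as equally plain scripts.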
The approach I’m advocating for is simply running a binary on a server with rsync to deploy, and architecting your product around that limitation.
Intriguing!
I’m looking forward to your blog series on this.
This is a teaser for the promised future posts. Don’t ghost us.
Writing the second post now :-)
How are you liking bearblog.dev?
Only just started using it, but I love it. Simple, basic blogging without the enshittification of Medium.
I’m a big advocate of the KISS strategy. Looking forward to seeing more of what you’ve got.
In addition to this, I like YAGNI: you ain’t gonna need it.
Don’t implement features you don’t need just because you think they might/could/should be useful in the future. YAGNI
DRY and YAGNI are awesome iff you also practice YNIRN (You Need It Right Now)! Otherwise you just end up with boilerplate spaghetti.