>>> We no longer provision load balancers, or configure DNS; we simply describe the resources that we need, and Kubernetes makes it happen.
This is (part) of what keeps me in the stone age. You are provisioning load balancers and DNS - but just one step removed through k8s
And my prior is that we need to understand that, be aware of it, and have a model of what is going on to help develop and debug.
And so it feels a bit like "magic abstraction". And then to peek through the abstraction you suddenly need to know not only about DNS and which machine is running bind, but also how Kubernetes internally stores its DNS config, how it spits that out, and which version changed it.
In other words, you have to become expert in two things to debug it.
And maybe it's worth it - but I struggle to see why it's not simpler to keep my install scripts going.
(OK I guess I am writing my own answer - but surely the point is what is the simplest level of thin install scripts needed to deploy containers?)
I’m seeing things as you are seeing them currently.
I love the idea of using Kubernetes, it sounds amazing initially, but then every single article I read about it turns into some epic blog post that leaves me worried that the whole house of cards could easily come crashing down.
Maybe in the future the abstraction will become rock solid, easy to install and manage and ‘just work’, but it doesn’t feel like that to me now. There’s too many ‘we had to come up with / use hack X to integrate it with software Y’.
Until then, if you are on a small team with a small budget, I reckon keeping it simple is the better approach. Standard OSs with some bash scripts for provisioning, build and deploy. Even if it’s more manual work, and takes a bit longer, having an understanding of the platform you are building on is crucial.
BTW, if you are looking for a description of a way to do this sort of thing ‘the boring way’:
K8S is an API that the majority agrees on, which is rare. There is a lot of amazing tooling, a staggering amount of ongoing innovation, all built on solid concepts: declarative models, emitted metrics (the /proc equivalent, but with larger scope) and versioned infrastructure as data (a.k.a. GitOps).
For someone who is known as the King of Bash (self-proclaimed) - https://speakerdeck.com/gerhardlazu/how-to-write-good-bash-c... - and after a decade of Puppet, Chef, Ansible and oh wow that sweet bash https://github.com/gerhard/deliver - even if all my workstations and work servers (yup, all running k3s) are provisioned with Make (bash++), I still think that K8S is the better approach to running production infrastructure. The advantage of using simple and well-defined components (e.g. external-dns, ingress-nginx, prometheus-operator etc.) that adhere to a universal API, and are maintained by many smart people all around the world, is a better proposition than scripting, in my opinion.
At the end of the day, I'm in it for the shared mindset, great conversations and a genuine desire to do better, which I have not seen before K8S & the wider CNCF. I will go out on a limb here and assume that I love scripting just as much as you do, but go beyond this aspect and you will discover that there's more to it than "thin install scripts that deploy containers" (which are not just glorified jails or unikernels).
I think you've hit the nail on the head - the point is not just the Kubernetes, it's that you can build standard infrastructure on top. Any software can (in theory) be set up with a Helm chart, configured in a standard way through YAML ConfigMaps rather than some esoteric config files or scripts which are different for every piece of software.
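To make that standardization point concrete, here is a minimal sketch of what "configured through YAML ConfigMaps" looks like. The names (`app-config`, `my-app`, the image tag, the keys) are all hypothetical, invented for illustration:

```yaml
# A ConfigMap holds plain key/value configuration, the same way for any software.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  DATABASE_URL: "postgres://db.internal:5432/app"
---
# Any Deployment consumes it through the same mechanism, regardless of what
# runs inside the container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
          envFrom:
            - configMapRef:
                name: app-config
```

Whether the container runs nginx, Postgres, or a bespoke app, the consumption mechanism is identical; that uniformity is what the parent comment is describing.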
By using K8s and similar technologies, you're buying standardization with underlying complexity and reduced efficiency.
In many cases, it's a good tradeoff, because you can now use standard tooling on everything.
Just like it's cheaper to ship an entire (physical) shipping container that's half-full than to ship the same stuff loosely. Or why companies will send you two separate letters on the same day with a small note that this is more efficient for them than collating them.
I assume that k8s also makes it much easier to move to a different cloud provider if you're unhappy with one (or the new one offers better pricing). Instead of rewriting your bespoke scripts that only you understand, anyone familiar with the technology will know which modules to swap to make it work with the new provider.
> And my prior is that we need to understand that, be aaare of it, have a model of what is going on to help develop and debug.
If it's a sufficiently robust abstraction, you don't, you just learn the abstraction. Kubernetes has reached that point for many folks.
I no longer have a detailed mental model of how my compiler or LLVM works, I just trust that it does. When was the last time you needed to (or were capable of) debugging a bug in your compiler? A couple of human generations of work went into making that happen.
Note that it turns out compiling code well, or making a reliable orchestration system, is an enormously complex problem. At some point, the complexity outstrips the ability of even generalists in the field to keep up, yet the systems keep getting more reliable.
So in these types of cases, you can either do it yourself poorly (you're an amateur), do it yourself well (congrats, you've become an expert), or delegate.
This isn't really limited to computing. I delegate maintenance on my car to a mechanic, while I'm pretty sure a generation ago, everybody (in the US) changed their own oil and understood how the carb worked. Times change.
The car issue is tempting as an argument clincher. My problem is the army - a lot of their trucks and so forth are not computer-controlled, this-generation-of-Toyota abstractions, but have stayed inefficient-but-repairable trucks. Because they need trucks that are repairable, and all the computer understands is nice paved roads. Which is what armies don't drive on.
The computer being tuned for nice paved roads is not really the problem. If the army needed it, they have the money to put into developing a car computer specialized for whatever terrain they required.
Their real problem is ease of repair.
Most regular people can take their cars to a shop that will have the real basics in stock (i.e. oils and filters) and can order most of the rest from a distributor to have delivered same or next day for most cars. You will only wait longer for specialized parts or parts for unusual cars. Also, most people can work around not having a car for a few days, even if it is a hassle.
While an army base does have specialized mechanics to properly fix the trucks, if one breaks in the middle of a mission after an engagement, the team in that truck needs to be able to patch it up and get it going in the field.
So the army needs trucks that people who are primarily soldiers and not mechanics can patch up "easily" in the field, without tons of specialized training, without access to special or weird tools, and with minimal to no access to parts.
I think the idea is that k8s does away with having to glue all those pieces of infra together, not that you lose understanding of how it all works. Part of the headache with managing infra is that it rots over time... things come and go (sysv to systemd, apt/snap/whatever, config files change, things break). It's easier to keep up to date on k8s than on all the disparate parts of the OS and provider-specific APIs and whatnot.
> Does this imply there is a cloud abstract layer that should come
crossplane.io comes closest afaik
> And is k8s the simplest possible abstraction? And if not - what is?
If you are asking about the simplest possible abstraction for container scheduling and orchestration, then I believe Nomad from HashiCorp or Docker Swarm are simpler. As for managed solutions with wide adoption in all types of environments and the largest investment to date, I am not aware of anything on par with K8S.
In the happy path, your developers no longer need to worry about this stuff. It’s possible for a team to stand up a new service and plumb it through all the way to the external LB just using k8s yaml templates.
In the unhappy path, sure, you need someone who knows how to debug networking issues, and in some cases it’s going to be harder to debug because of the layers of indirection. But the total amount of toil is significantly reduced.
A bad abstraction doesn’t carry its weight in complexity. A good abstraction allows you to ignore the lower levels most of the time without missing something important; I’d put k8s firmly in the latter category.
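For a sense of what "plumbing a service through to the external LB just using k8s yaml templates" can look like, here is a minimal hedged sketch. The service name and hostname are hypothetical, and the annotation assumes the external-dns controller (mentioned elsewhere in this thread) is installed in the cluster:

```yaml
# Declaring a Service of type LoadBalancer is the whole "provisioning" step:
# the cloud controller creates the actual load balancer, and external-dns
# (if installed) publishes the DNS record.
apiVersion: v1
kind: Service
metadata:
  name: web                        # hypothetical name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: web.example.com
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
```

This is the "one step removed" the earlier comment describes: the load balancer and DNS record still exist, but you declare the desired result rather than scripting their creation.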
Let's turn the knobs on the scenario and see how it appears:
>>> >>> with the advent of the first programming languages you no longer had to think in terms of registers, loading operands from memory, storing the results back, spilling registers to the stack.
>>> You _are_ handling registers and memory spills, but just one step removed through the use of C.
The analogy may not be perfect, but I think it makes obvious some of the things also mentioned in sibling comments: it's all about habit, maturity and thus trust.
If you trust that your tools are working correctly, if you know how to deal with their well-known quirks, you'll just rebase on top of a new layer and hopefully boost your productivity and tackle more complex problems more easily.
Maturity is important because if today you're more likely to blame yourself than the compiler when your stuff doesn't work, it's just because you're lucky to work with mature and popular toolchains. (And I'm not only talking about the past, when compilers were new and unproven; it still happens today on some niche embedded toolchains.)
So, yes, it's indeed rational to be wary of new unproven abstraction layers as they could bring more pain than help.
It's hard to judge when that line is crossed though.
I personally like to know how stuff works under the hood anyways. I find it useful in practice and it gives me confidence in using the higher layers, when they make sense, or stay with the lower level layers, when they make sense.
Occasionally I still write some assembly. But most of the time, for most of the stuff, it just makes more sense to use a higher level programming language.
I see k8s in a similar way. We have operating systems, programming languages, etc; all sorts of abstractions that help us separate concerns and have specialists dealing with the nitty gritty details of some stuff, so that everybody else can be specialized in something else (just like in real life).
But I think this is apples and oranges - C maps pointers and memory allocation in a very direct, robust manner; even Python allows you to dis back to see the stack.
But this hard line to underlying reality is unlikely to exist, looking at how this month's k8s will configure last year's AWS Route53.
"It just works 80% of the time" is a disaster; 98% of the time might be bearable. Is it above 99%?
Indeed, my point was that it's a quantitative question not a qualitative one.
When compilers created buggy code every other day, when memory allocators were unreliable because memory fragmentation would make it likely for new allocations to fail in the lifetime of a normal program execution, etc., it would be, as you said, "a disaster if it works <95%" of the time.
Did k8s reach 99%? The jury is still out. Probably not yet, but in principle I don't see anything wrong with pursuing that path just because it's "another layer". We use abstraction layers all the time; they have allowed progress. (Yes, often we went too far.)
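The quantitative framing can be made concrete: if each layer in the stack independently "just works" some fraction of the time, the whole stack's reliability is roughly the product. A small illustration (independence of failures is a simplifying assumption, and the layer count is made up):

```python
def stack_reliability(per_layer: float, layers: int) -> float:
    """Probability that every layer works, assuming independent failures."""
    return per_layer ** layers

# Five hypothetical layers (cloud API, k8s control plane, CNI, ingress, DNS)
# each at 98% reliability:
print(round(stack_reliability(0.98, 5), 3))   # 0.904
# The same five layers at 99.9% each:
print(round(stack_reliability(0.999, 5), 3))  # 0.995
```

This is why the "80% vs 98% vs 99%" distinction above matters so much: per-layer reliability compounds, so layers only pay off once each one is very close to 100%.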
Isn’t this true of all operating systems? Debugging is going to require some knowledge of the OS and the syscalls it makes, but you don’t want to write directly to the machine as an OS would. Same goes for internet-connected clustered machines.
I'm disappointed that blogs and podcasts keep promoting Linode. I currently maintain about 30 Linodes and I have been doing so for the past 2 years.
Some things I noticed:
* The internal network is not private. But people don't realise it. You share a /16 with other Linodes. So many open databases, file shares and other services in there.
* Block storage performance is really poor, around 100 iops. Same as a SATA disk from 10 years ago.
* No proper snapshot / image functionality.
* Linode Kubernetes Engine was based on Debian Oldstable when it launched.
* Excessive CPU steal, even on dedicated cores. 25% CPU steal is considered normal. Over 50% happens a lot.
* Problems with their hosts. I can only guess what the reason is, but 4 to 8 hours of unannounced downtime of a VM happened to me 6 times in the past 2 years.
Yes, support is friendly. But my international phone bill is huge because the fastest way to get them to do something is to call.
In a previous job a couple of years ago, I experienced regular issues with Linode (hypervisor errors, storage errors, performance issues, networking issues, etc.).
Despite all that, management decided to stay with Linode for the following reasons:
* Change is hard, and "better the devil you know" mentality.
* The instance pricing looks cheap compared to AWS. e.g. c5.xlarge ($124) vs 8GB-Balanced ($40) that Linode charges. In reality it isn't so cheap because it's poor oversold technology.
* AWS/GCP have exorbitant bandwidth pricing. Linode bandwidth is very generous, as it's pooled across all servers in the account.
* Having someone to pick up the phone 24/7 when there's a problem is a big plus in theory. However, it's much better not to need to call in the first place because things just work.
* Migrating providers can be an expensive and time-consuming endeavour.
* Technical debt, interdependencies, manually configured snowflake servers and infrastructure, no documentation, etc. makes changes risky.
* Not enough DevOps on the team, and too many fires to put out, and shiny features to ship means cloud provider migration is low on the priority list.
The owner of the company thinks they are great because you can call them when there is a problem. I'm having a hard time convincing him that these are issues I've never had at other providers. Certainly not so many.
We are moving everything away. Most of our servers are with another provider already. And we haven't had any similar issues there. I've never called them!
And I forgot to mention the connectivity issues at Linode. When the whole London datacenter was unreachable for 2 hours we lost some customers.
I use Digital Ocean, Scaleway, Linode, Google, Cloudflare & Amazon on a daily basis, and I have experienced networking issues on all providers this year. It's all public, some even wrote lengthy post-mortems, most have been posted on HN.
When failures happen, it's always a series of unfortunate incidents. When we've hit issues with Linode, we reached out, worked on what we could improve in our changelog.com setup, and discussed the improvements that we can expect on the Linode end. Our common interest is a more resilient system, which requires a healthy collaboration, and Linode has been a great technology partner for us. Expect to see these write-ups on changelog.com as soon as these improvements have shipped, and we have hard data to support the claims ; )
I'm sorry to hear that things have not been as smooth for you on Linode. I hope that you will find an infra provider that you will be able to rely on and work with as we do. Not all collaborations will work out, and that's OK. It's also OK to be annoyed, fed up with the way things are and look for something different, something more suitable for you. My only ask is that you share your migration story with the changelog.com community. That is something that I would want to hear about.
This is something more companies need to realize: Yes, you should be easily and quickly reachable by phone so that when (not if) things go wrong, I can get good service and resolve it quickly.
But if I'm calling you, you have most likely already failed. If I'm calling for information, your documentation has failed to make that information accessible (it either wasn't documented, or not easy enough to find). If I'm calling to resolve an issue (technical or billing), it would have been a lot better if it didn't happen in the first place.
I ditched Linode years ago because it just wasn't that great. AWS if I need ephemeral boxes (I used to be an enterprise on-prem AWS consultant), but I run most everything on a home 96 EPYC core, 512 GB, SSD, HDD (ZFS) box running KVM, Docker, and open vswitch. It just isn't worth it to rent slow, expensive servers when I need lots of them and to be fast. I don't have any problems remoting into them with ddns and wireguard.
My first Supermicro just turned 9 and it's still running strong, with a fresh install of Ubuntu 20.04 & k3s over the holidays. The second Supermicro turned 5, and has been running FreeBSD all this time like a champ. They are both loft guardians.
A bunch of bare metal hosts run on Scaleway / Online, and different VMs & managed services run in Digital Ocean, Linode, AWS & GCP. I sometimes spin the odd bare metal instance on Equinix Metal (former Packet).
A diverse fleet means that there's always something new to learn and try out. A single large host would make me anxious, as no internet provider or power grid is 100% reliable and available. Also, software upgrades sometimes fail, and things get messed up all the time, which is when I find it most efficient to just start from scratch. A single host makes that less convenient.
Every approach has its pros and cons, which is why my main workstation is a 20 Xeon W with 64GB RAM & 1TB NVME : ). Yes, there is a backup workstation which doubles up as a mobile one meaning that it can work without power or hard internet for almost a day. Options are good ; )
I was baffled by this:
"The worst part is that serving the same files from disks local to the VMs is 44x faster than from persistent volumes (267MB/s vs 6MB/s)."
Is it a configuration issue on their side, or are the LKE volumes really limited to 6MB/s on Linode?
Block storage is an area that we are working with Linode to improve. That's the random read/write performance, as measured by fio.
We have mostly sequential reads & writes (mp3 files) that peak at 50MB/s, then rely on CDN caching (Fastly makes us happy in this respect).
CDN caching is something that we are currently improving, which will make things quicker and more reliable.
The focus is on reality vs the ideal, and the path that we are taking to improving not just changelog.com, but also our sponsors' products. No managed K8S or IaaS is perfect, but we enjoy the Linode partnership & collaboration ;)
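Since the fio numbers above come up a few times in this thread, here is a rough stand-in sketch for readers who want to get an intuition for random-read rates. fio with direct=1 is the proper tool; plain Python cannot bypass the page cache, so this will report wildly inflated (cached) figures. The file path and sizes are arbitrary choices for illustration:

```python
import os
import random
import tempfile
import time

def rough_random_read_iops(path: str, duration: float = 0.5, block: int = 4096) -> float:
    """Very rough random-read rate in reads/sec; fio with direct=1 is the real tool."""
    blocks = max(os.path.getsize(path) // block, 1)
    reads = 0
    with open(path, "rb") as f:
        deadline = time.monotonic() + duration
        while time.monotonic() < deadline:
            f.seek(random.randrange(blocks) * block)  # jump to a random 4 KiB block
            f.read(block)
            reads += 1
    return reads / duration

# Create a small scratch file and measure. The OS page cache will inflate this
# number massively compared to fio's direct=1 figures on a real block device.
scratch = os.path.join(tempfile.gettempdir(), "iops-scratch.bin")
with open(scratch, "wb") as f:
    f.write(os.urandom(4 * 1024 * 1024))  # 4 MiB of random data
print(f"~{rough_random_read_iops(scratch):.0f} reads/s (page-cache assisted)")
```

On an actual persistent volume, running fio with direct I/O against the mounted path is what produces numbers comparable to the ~100 IOPS figure quoted earlier in the thread.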
I think they're sponsored by linode, and they're developer-themed -- there may be team / content reasons to use a lot of unnecessary tools in order to review them
This quote from Adam on our episode about the setup explains some of our motivations here:
> It’s worth noting that we don’t really need what we have around Kubernetes. This is for fun, to some degree. One, we love Linode, they’re a great partner… Two, we love you, Gerhard, and all the work you’ve done here… We don’t really need this setup. One, it’s about learning ourselves, but then also sharing that. Obviously, Changelog.com is open source, so if you’re curious how this is implemented, you can look in our codebase. But beyond that, I think it’s important to remind our audience that we don’t really need this; it’s fun to have, and actually a worthwhile investment for us, because this does cost us money (Gerhard does not work for free), and it’s part of this desire to learn for ourselves, and then also to share it with everyone else… So that’s fun. It’s fun to do.
The static files should definitely be on some kind of object storage like S3, that's what it's built for. Much faster, more reliable, more scalable, and likely much cheaper too.
As for persistent volumes, might be better to just offload Postgres to a managed DB service and downsize the K8S instances, or use something like CockroachDB which is natively distributed and can make use of local volumes instead.
Yes, it does make sense to move static files to object storage, especially the mp3s. There is some ffmpeg-related refactoring that we need to do before we can do this though, and it's not a quick & easy task, so we have been deferring it since it's not that high priority, and there are simpler solutions to this particular problem (i.e. improved CDN caching).
Other static files such as css, js, txt make sense to remain bundled with the app image, which is stateless and a prime candidate for horizontal scaling. Also, CDN caching makes small static files that change infrequently a non-issue, regardless of their origin.
The managed Postgres service from Linode's 2021 roadmap is definitely something that we are looking forward to, but the simplest thing might be to provision Postgres with local volumes instead. We are already using a replicated Postgres via the Crunchy PostgreSQL Operator, so I'm looking forward to trying this approach out first.
CockroachDB is on my list of cool tech to experiment with, but that will use an innovation token, and we only have a few left for 2021, so I don't want to spend them all at once.
Yea the small static files that are part of your webapp can stay with it, but media files are best on S3. If you need a block interface though, I recommend something like ObjectiveFS: https://objectivefs.com/
If you're using an operator then local volumes is a good middleground if it automates the replication already. CockroachDB also has a kubernetes operator although it's only for GKE currently. There are also other options like YugabyteDB which is another cloud-native postgres-compatible DB.
It seemed interesting and I tried to subscribe, but the process doesn't work without disabling ublock/adblock. I am okay with that, so I disabled it, but then I failed two captcha rounds (find bikes in thumbnails; some thumbnails are upside down?!) :(.
Sorry for the hassle! We’ve been hit by lots of spammers lately so I battened down the hatches. Unfortunately this has the side effect of also blocking some legit humans as well. :(
Yes, we could have mitigated that entirely with CDN stale caching, but it was good to see what happens today, and then iterate towards better Fastly integration.
I'd be interested in learning more about the move from Concourse to Circle (I'm a notorious Concourse fanboy). What went well, what didn't, what you miss, what prompted it -- that sort of thing.
The primary reason behind the move was not wanting to manage CI. Since there were no options for a managed Concourse in 2018, we migrated to Circle, one of the Changelog sponsors at the time.
Concourse worked well for us; we didn't have any issues big enough to remember. You may be interested in this screenshot that captured the changelog.com pipeline from 2017: https://pipeline.gerhard.io/images/small-oss.png
I missed the simple Concourse pipeline view at first, but CircleCI improved by leaps and bounds in 2020, and the new Circle pipeline view equivalent is even better (compared to Concourse, clicking on jobs always works): https://app.circleci.com/pipelines/github/thechangelog/chang...
The Circle feature which I didn't expect to like as much as I do today, is the dashboard view (list of all pipeline/workflow runs). This is something that Concourse is still missing: https://app.circleci.com/pipelines/github/thechangelog
In 2021, I expect us to spend one migration credit on GitHub Actions, as a Circle replacement. Argo comes a close second, but that requires an innovation credit, which is more precious to us. Because we are already using GitHub Actions for some automation, it would make sense to consolidate, and also leverage the GitHub Container Registry as a migration from Docker Hub. Watch https://github.com/thechangelog/changelog.com to see what happens : )
I browsed the site swiftly. Isn't it a bunch of content items, like a WordPress site? So, here we are, deploying WordPress in the cloud via Kubernetes, for real, not as a funny meme?
changelog.com used to be WordPress, then became a Phoenix app because it needed features that were hacky to implement & then manage in WP. It's more of a podcasting platform these days rather than a CMS.
Is this really “simpler” as they claim? It reads a bit like the honeymoon phase and I’m a lot more interested in how they feel about the new stack 1 or 2 years down the line.
Not knowing what changelog.com is, I was wondering the same thing... from the header of the page, I could tell that there's a blog, a podcast, and a newsletter.
Without some additional insight that I don't have, it does seem that this is an enormously over-engineered "solution" for a website -- I've NFI why it requires 99.99% uptime!
Perhaps they look it as a goal or challenge, an opportunity to showcase their knowledge and skills to potential customers, or, hell, maybe they just enjoy that kind of thing? If that's the case, I completely understand and can even relate (my home network is a textbook example of an "over-engineered solution": close to a dozen "enterprise-class" servers in the basement, ~35 various subnets, VMware Enterprise Plus clusters, BGP for anycast, and so on).
AFAICT, though, this is just some developers running a blog and podcasts aimed at other developers? I mean, we're not exactly talking about a "mission critical" web site that's going to result in death and destruction the next time it goes down or Linode shits itself, right?
Or am I missing something?
--
EDIT: I've read through the rest of the comments now ...
> This is for fun, to some degree ... We don’t really need this setup. One, it’s about learning ourselves, but then also sharing that ... It’s fun to do.
1. We're great engineers because we can set up and maintain such an impressive set-up.
2. We're terrible engineers because all of this could probably be done with one server for dynamic content + S3 for static content. Or not even S3, maybe just some Cloudflare or Akamai caching.
Of course, like the posters above, I could be missing something due to my outsider/consumer view of changelog.
We are both! I would also add lazy to that paradox. My surname is a letter off, and that's as close as it gets : )
The devil is in the details, there is more to it than dynamic & static content, we are using Fastly, otherwise we couldn't serve all the traffic that we do.
The best part is that it's all public - https://github.com/thechangelog/changelog.com - and we welcome contributions, especially those that simplify our setup without compromising on resiliency and availability. I'm looking forward to yours ; )
Yea that is what I was thinking. And the next thing was what is the price difference. A bit of me thinks this is just some people doing some really cool tech work for the sake of it, which I can't fault them for.