I share the author's sentiment completely. At my day job, I manage multiple Kubernetes clusters running dozens of microservices with relative ease. However, for my hobby projects—which generate no revenue and thus have minimal budgets—I find myself in a frustrating position: desperately wanting to use Kubernetes but unable to due to its resource requirements. Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM.
This limitation creates numerous headaches. Instead of Deployments, I'm stuck with manual docker compose up/down commands over SSH. Rather than using Ingress, I have to rely on Traefik's container discovery functionality. Recently, I even wrote a small script to manage crontab idempotently because I can't use CronJobs. I'm constantly reinventing solutions to problems that Kubernetes already solves—just less efficiently.
What I really wish for is a lightweight alternative offering a Kubernetes-compatible API that runs well on inexpensive VPS instances. The gap between enterprise-grade container orchestration and affordable hobby hosting remains frustratingly wide.
> What I really wish for is a lightweight alternative offering a Kubernetes-compatible API that runs well on inexpensive VPS instances. The gap between enterprise-grade container orchestration and affordable hobby hosting remains frustratingly wide.
Depending on how much of the Kube API you need, Podman is that. It can generate containers and pods from Kubernetes manifests [0]. Kind of works like docker compose but with Kubernetes manifests.
This even works with systemd units, similar to how it's outlined in the article.
Podman also supports most (all?) of the Docker api, thus docker compose, works, but also, you can connect to remote sockets through ssh etc to do things.
The docs don't make it clear, can it do "zero downtime" deployments? Meaning it first creates the new pod, waits for it to be healthy using the defined health checks and then removes the old one? Somehow integrating this with service/ingress/whatever so network traffic only goes to the healthy one?
I can't speak on it's capabilities, but I feel like I have to ask: for what conceivable reason would you even want that extra error potential with migrations etc?
It means you're forced to make everything always compatible between versions etc.
For a deployment that isn't even making money and is running on a single node droplet with basically no performance... Why?
> I can't speak on it's capabilities, but I feel like I have to ask: for what conceivable reason would you even want that extra error potential with migrations etc?
It's the default behavior of a kubernetes deployment which we're comparing things to.
> It means you're forced to make everything always compatible between versions etc.
For stateless services, not at all. The outside world just keeps talking to the previous version while the new version is starting up. For stateful services, it depends. Often there are software changes without changes to the schema.
> For a deployment that isn't even making money
I don't like looking at 504 gateway errors
> and is running on a single node droplet with basically no performance
I'm running this stuff on a server in my home, it has plenty of performance. Still don't want to waste it on kubernetes overhead, though. But even for a droplet, running the same application 2x isn't usually a big ask.
Zero downtime doesn't mean redundancy here. It means that no request gets lost or interrupted due to a container upgrade.
The new container spins up while the old container is still answering requests and only when the new container is running and all requests to the old container are done, then the old container gets discarded.
I use k3s. With more than more master node, it's still a resource hog and when one master node goes down, all of them tend to follow. 2GB of RAM is not enough, especially if you also use longhorn for distributed storage. A single master node is fine and I haven't had it crash on me yet. In terms of scale, I'm able to use raspberry pis and such as agents so I only have to rent a single €4/month vps.
I tried k3s but even on an immutable system dealing with charts and all the other kubernetes stuff adds a new layer of mutability and hence maintenance, update, manual management steps that only really make sense on a cluster, not a single server.
If you're planning to eventually move to a cluster or you're trying to learn k8s, maybe, but if you're just hosting a single node project it's a massive effort, just because that's not what k8s is for.
I'm laughing because I clicked your link thinking I agreed and had posted similar things and it's my comment.
Still on k3s, still love it.
My cluster is currently hosting 94 pods across 55 deployments. Using 500m cpu (half a core) average, spiking to 3cores under moderate load, and 25gb ram. Biggest ram hog is Jellyfin (which appears to have a slow leak, and gets restarted when it hits 16gb, although it's currently streaming to 5 family members).
The cluster is exclusively recycled old hardware (4 machines), mostly old gaming machines. The most recent is 5 years old, the oldest is nearing 15 years old.
The nodes are bare Arch linux installs - which are wonderfully slim, easy to configure, and light on resources.
It burns 450Watts on average, which is higher than I'd like, but mostly because I have jellyfin and whisper/willow (self hosted home automation via voice control) as GPU accelerated loads - so I'm running an old nvidia 1060 and 2080.
Everything is plain old yaml, I explicitly avoid absolutely anything more complicated (including things like helm and kustomize - with very few exceptions) and it's... wonderful.
It's by far the least amount of "dev-ops" I've had to do for self hosting. Things work, it's simple, spinning up new service is a new folder and 3 new yaml files (0-namespace.yaml, 1-deployment.yaml, 2-ingress.yaml) which are just copied and edited each time.
Any three machines can go down and the cluster stays up (metalLB is really, really cool - ARP/NDP announcements mean any machine can announce as the primary load balancer and take the configured IP). Sometimes services take a minute to reallocate (and jellyfin gets priority over willow if I lose a gpu, and can also deploy with cpu-only transcoding as a fallback), and I haven't tried to be clever getting 100% uptime because I mostly don't care. If I'm down for 3 minutes, it's not the end of the world. I have a couple of commercial services in there, but it's free hosting for family businesses, they can also afford to be down an hour or two a year.
Overall - I'm not going back. It's great. Strongly, STRONGLY recommend k3s over microk8s. Definitely don't want to go back to single machine wrangling. The learning curve is steeper for this... but man do I spend very little time thinking about it at this point.
I've streamed video from it as far away as literally the other side of the world (GA, USA -> Taiwan). Amazon/Google/Microsoft have everyone convinced you can't host things yourself. Even for tiny projects people default to VPS's on a cloud. It's a ripoff. Put an old laptop in your basement - faster machine for free. At GCP prices... I have 30k/year worth of cloud compute in my basement, because GCP is a god damned rip off. My costs are $32/month in power, and a network connection I already have to have, and it's replaced hundreds of dollars/month in subscription costs.
For personal use-cases... basement cloud is where it's at.
To put that into perspective, that's more than my entire household including my server that has an old GPU in it
Water heating is electric yet we still don't use 450W×year≈4MWh of electricity. In winter we just about reach that as a daily average (as a household) because we need resistive heating to supplement the gas system. Constantly 450W is a huge amount of energy for flipping some toggles at home with voice control and streaming video files
Remember that modern heating and hot water systems have a >1 COP, meaning basically they provide more heat than the input power. Air-sourced heat pumps can have a COP of 2-4, and ground source can have 4-5, meaning you can get around 1800W of heat out of that 450W of power. That's ignoring places like Iceland where geothermal heat can give you effectively free heat. Ditto for water heating, 2-4.5 COP.
Modern construction techniques including super insulated walls and tight building envelops, heat exchangers, can dramatically reduce heating and cooling loads.
Just saying it's not as outrageous as it might seem.
And yet it's far more economical for me than paying for streaming services. A single $30/m bill vs nearly $100/m saved after ditching all the streaming services. And that's not counting the other saas products it replaced... just streaming.
Additionally - it's actually not that hard to put this entire load on solar.
4x350watt panels, 1 small inverter/mppt charger combo and a 12v/24v battery or two will do you just fine in the under $1k range. Higher up front cost - but if power is super expensive it's a one time expense that will last a decade or two, and you get to feel all nice and eco-conscious at the same time.
Or you can just not run the GPUs, in which case my usage falls back to ~100w. I You can drive lower still - but it's just not worth my time. It's only barely worth thinking about at 450W for me.
I'm not saying it should be cheaper to run this elsewhere, I'm saying that this is a super high power draw for the utility it provides
My own server doesn't run voice recognition so I can't speak to that (I can only opine that it can't be worth a constant draw of 430W to get rid of hardware switches and buttons), but my server also does streaming video and replaces SaaS services, so similar to what you mention, at around 20W
Found the European :) With power as cheap as it is in the US, some of us just haven't had to worry about this as much as we maybe should. My rack is currently pulling 800W and is mostly idle. I have a couple projects in the works to bring this down, but I really like mucking around with old enterprise gear and that stuff is very power hungry.
Perhaps. Many people in America also claim to care about the environmental impact of a number of things. I think many more people care performatively than transformatively. Personally, I don't worry too much about it. It feels like a lost cause and my personal impact is likely negligible in the end.
Then offsetting that cost to a cloud provider isn't any better.
450W just isn't that much power as far as "environmental costs" go. It's also super trivial to put on solar (actually my current project - although I had to scale the solar system way up to make ROI make sense because power is cheap in my region). But seriously, panels are cheap, LFP batteries are cheap, inverters/mppts are cheap. Even in my region with the cheap power, moving my house to solar has returns in the <15 years range.
If you provide for yourself (e.g. run your IT farm on solar), by all means, make use of it and enjoy it. Or if the consumption serves others by doing wind forecasts for battery operators or hosts geographic data that rescue workers use in remote places or whatnot: of course, continue to do these things. In general though, most people's home IT will fulfil mostly their own needs (controlling the lights from a GPU-based voice assistant). The USA and western Europe have similarly rich lifestyles but one has a more than twice as great impact on other people's environment for some reason (as measured by CO2-equivalents per capita). We can choose for ourselves what role we want to play, but we should at least be aware that our choices make a difference
In America, taxes account for about a fifth of the price of a unit of gas. In Europe, it varies around half.
The remaining difference in cost is boosted by the cost of ethanol, which is much cheaper in the US due to abundance of feedstock and heavy subsidies on ethanol production.
The petrol and diesel account for a relatively small fraction on both continents. The "normal" prices in Europe aren't reflective of the cost of the fossil fuel itself. In point of fact, countries in Europe often have lower tax rates on diesel, despite being generally worse for the environment.
Americans drive larger vehicles because our politicians stupidly decided mandating fuel economy standards was better than a carbon tax. The standards are much laxer for larger vehicles. As a result, our vehicles are huge.
Also, Americans have to drive much further distances than Europeans, both in and between cities. Thus gas prices that would be cheap to you are expensive to them.
Things are the way they are because basic geography, population density, and automotive industry captured regulatory and zoning interests. You really can't blame the average American for this; they're merely responding to perverse incentives.
How is this in any way relevant to what I said? You're just making excuses, but that doesn't change the fact that americans don't give a fuck about the climate, and they objectively pollute far more than those in normal countries.
If you can't see how what I said was relevant, perhaps you should work on your reading comprehension. At least half of Americans do care about the climate and the other half would gladly buy small trucks (for example) if those were available.
It's lazy to dunk on America as a whole, go look at the list of countries that have met their climate commitments and you'll see it's a pretty small list. Germany reopening coal production was not on my bingo card.
I run a similar number of services on a very different setup. Administratively, it’s not idempotent but Proxmox is a delight to work with. I have 4 nodes, with a 14900K CPU with 24 cores being the workhorse. It runs a Windows server with RDP terminal (so multiple users can get access windows through RDP and literally any device), Jellyfin, several Linux VMs, a pi-hole cluster (3 replicas), just to name a few services. I have vGPU passthrough working (granted, this bit is a little clunky).
It is not as fancy/reliable/reproducible as k3s, but with a bunch of manual backups and a ZFS (or BTRFS) storage cluster (managed by a virtualized TrueNAS instance), you can get away with it. Anytime a disk fails, just replace and resilver it and you’re good. You could configure certain VMs for HA (high availability) where they will be replicated to other nodes that can take over in the event of a failure.
Also I’ve got tailscale and pi-hole running as LXC containers. Tailscale makes the entire setup accessible remotely.
It’s a different paradigm that also just works once it’s setup properly.
I have a question if you don't mind answering. If I understand correctly, Metallb on Layer 2 essentially fills the same role as something like Keepalived would, however without VRRP.
So, can you use it to give your whole cluster _one_ external IP that makes it accessible from the outside, regardless of whether any node is down?
Imo this part is what can be confusing to beginners in self hosted setups. It would be easy and convenient if they could just point DNS records of their domain to a single IP for the cluster and do all the rest from within K3s.
Yes. I have configured metalLB with a range of IP addresses on my local LAN outside the range distributed by my DHCP server.
Ex - DHCP owns 10.0.0.2-10.0.0.200, metalLB is assigned 10.0.0.201-10.0.0.250.
When a service requests a loadbalancer, metallb spins up a service on any given node, then uses ARP to announce to my LAN that that node's mac address is now that loadbalancer's ip. Internal traffic intended for that IP will now resolve to the node's mac address at the link layer, and get routed appropriately.
If that node goes down, metalLB will spin up again on a remaining node, and announce again with that node's mac address instead, and traffic will cut over.
It's not instant, so you're going to drop traffic for a couple seconds, but it's very quick, all things considered.
It also means that from the point of view of my networking - I can assign a single IP address as my "service" and not care at all which node is running it. Ex - if I want to expose a service publicly, I can port forward from my router to the configured metalLB loadbalancer IP, and things just work - regardless of which nodes are actually up.
---
Note - this whole thing works with external IPs as well, assuming you want to pay for them from your provider, or IPV6 addresses. But I'm cheap and I don't pay for them because it requires getting a much more expensive business line than I currently use. Functionally - I mostly just forward 80/443 to an internal IP and call it done.
We used to pay AU$30 for the entire house which included everything except cooking but it did include a 10 year 1RU rack Mount server. Electricity isn't particularly cheap here.
How do you deal with persistent volumes for configuration, state, etc? That’s the bit that has kept me away from k3s (I’m running Proxmox and LXC for low overhead but easy state management and backups).
Yeah, but you have to have some actual storage for it, and that may not be feasible across all nodes in the right amounts.
Also, replicated volumes are great for configuration, but "big" volume data typically lives on a NAS or similar, and you do need to get stuff off the replicated volumes for backup, so things like replicated block storage do need to expose a normal filesystem interface as well (tacking on an SMB container to a volume just to be able to back it up is just weird).
Sure - none of that changes that longhorn.io is great.
I run both an external NAS as an NFS service and longhorn. I'd probably just use longhorn at this point, if I were doing it over again. My nodes have plenty of sata capacity, and any new storage is going into them for longhorn at this point.
I back up to an external provider (backblaze/wasabi/s3/etc). I'm usually paying less than a dollar a month for backups, but I'm also fairly judicious in what I back up.
Yes - it's a little weird to spin up a container to read the disk of a longhorn volume at first, but most times you can just use the longhorn dashboard to manage volume snapshots and backup scheduling as needed. Ex - if you're not actually trying to pull content off the disk, you don't ever need to do it.
If you are trying to pull content off the volume, I keep a tiny ssh/scp container & deployment hanging around, and I just add the target volume real fast, spin it up, read the content I need (or more often scp it to my desktop/laptop) and then remove it.
I do things somewhat similarly but still rely on Helm/customize/ArgoCD as it's what I know best. I don't have a documentation to offer, but I do have all of it publicly at https://gitlab.com/lama-corp/infra/infrastructure
It's probably a bit more involved than your OP's setup as I operate my own AS, but hopefully you'll find some interesting things in there.
"Basement Cloud" sounds like either a dank cannabis strain, or an alternative British rock emo grunge post-hardcore song. As in "My basement cloud runs k420s, dude."
Or microk8s. I'm curious what it is about k8s that is sucking up all these resources. Surely the control plane is mostly idle when you aren't doing things with it?
There are 3 components to "the control plane" and realistically only one of them is what you meant by idle. The Node-local kubelet (that reports in the state of affairs and asks if there is any work) is a constantly active thing, as one would expect from such a polling setup. The etcd, or it's replacement, is constantly(?) firing off watch notifications or reconciliation notifications based on the inputs from the aforementioned kubelet updates. Only the actual kube-apiserver is conceptually idle as I'm not aware of any compute that it, itself, does only in response to requests made of it
Put another way, in my experience running clusters, in $(ps auwx) or its $(top) friend always show etcd or sqlite generating all of the "WHAT are you doing?!" and those also represent the actual risk to running kubernetes since the apiserver is mostly stateless[1]
1: but holy cow watch out for mTLS because cert expiry will ruin your day across all of the components
I've noticed that etcd seems to do an awful lot of disk writes, even on an "idle" cluster. Nothing is changing. What is it actually doing with all those writes?
Almost certainly it's the propagation of the kubelet checkins rippling through etcd's accounting system[1]. Every time these discussions come up I'm always left wondering "I wonder if Valkey would behave the same?" or Consul (back when it was sanely licensed). But I am now convinced after 31 releases that the pluggable KV ship has sailed and they're just not interested. I, similarly, am not yet curious enough to pull a k0s and fork it just to find out
1: related, if you haven't ever tried to run a cluster bigger than about 450 Nodes that's actually the whole reason kube-apiserver --etcd-servers-overrides exists because the torrent of Node status updates will knock over the primary etcd so one has to offload /events into its own etcd
I deployed CNPG (https://cloudnative-pg.io/ ) on my basement k3s cluster, and was very impressed with how easy I could host a PG instance for a service outside the cluster, as well as good practices to host DB clusters inside the cluster.
Oh, and it handles replication, failover, backups, and a litany of other useful features to make running a stateful database, like postgres, work reliably in a cluster.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM
I hate sounding like an Oracle shill, but Oracle Cloud's Free Tier is hands-down the most generous. It can support running quite a bit, including a small k8s cluster[1]. Their k8s backplane service is also free.
They'll give you 4 x ARM64 cores and 24GB of ram for free. You can split this into 1-4 nodes, depending on what you want.
One thing to watch out for is that you pick your "home region" when you create your account. This cannot be changed later, and your "Always Free" instances can only be created in your home region (the non-free tier doesn't have that restriction).
So choose your home region carefully. Also, note that some regions have multiple availability domains (OCI-speak for availability zones) but some only have one AD. Though if you're only running one free instance then ADs don't really matter.
A bit of a nitpick. You get monthly credit for 4c/24gb on ARM, no matter the region. So even if you chose your home region poorly, you can run those instances in any region and only be on the hook for the disk cost. I found this all out the hard way, so I'm paying $2/month to oracle for my disks.
I don't know the details but I know I made this mistake and I still have my Free Tier instances hosted in a different region then my home. It's charged me a month of $1 already so I'm pretty sure it's working.
That limitation (spinning up an instance) only exists if you don't put a payment card in. If you put a payment card in, it goes away immediately. You don't have to actually pay anything, you can provision the always free resources, but obviously in this regard you have to ensure that you don't accidentally provision something with cost. I used terraform to make my little kube cluster on there and have not had a cost event at all in over 1.5 years. I think at one point I accidentally provisioned a volume or something and it cost me like one cent.
I think that's if you are literally on their free tier, vs. having a billable account which doesn't accumulate enough charges to be billed.
Similar to the sibling comment - you add a credit card and set yourself up to be billed (which removes you from the "free tier"), but you are still granted the resources monthly for free. If you exceed your allocation, they bill the difference.
A credit card is required for sign up but it won't be set up as a billing card until you add it. One curious thing they do is though, the free trial is the only entry way to create a new cloud account. You can't become a nonfree customer from the get go. This is weird because their free trial signup is horrible. The free trial is in very high demand so understandably they refuse a lot of accounts which they would probably like as nonfree customers.
They also, like many other cloud providers, need a real physical payment card. No privacy.com stuff. No virtual cards. Of course they don’t tell you this outright, because obscurity fraud blah blah blah, but if you try to use any type of virtual card it’s gonna get rejected. And if your naïve ass thought you could pay with the virtual card you’ll get a nice lesson in how cloud providers deal with fraud. They’ll never tell you that virtual cards aren’t allowed, because something something fraud, your payment will just mysteriously fail and you’ll get no guidance as to what went wrong and you have to basically guess it out.
This is basically any cloud provider by the way, not specific to Oracle. Ran into this with GCP recently. Insane experience. Pay with card. Get payment rejected by fraud team after several months of successful same amount payments on the same card and they won’t tell what the problem is. They ask for verification. Provide all sorts of verification. On the sixth attempt, send a picture of a physical card and all holds removed immediately
It’s such a perfect microcosm capturing of dealing with megacorps today. During that whole ordeal it was painfully obvious that the fraud team on the other side were telling me to recite the correct incantation to pass their filters, but they weren’t allowed to tell me what the incantation was. Only the signals they sent me and some educated guesswork were able to get me over the hurdle
Unironically yes. The (real) physical card I provided was a very cheap looking one. They didn’t seem to care much about its look but rather the physicality of it
Using AWS with virtual debit cards all right. Revolut cards work fine for me. What may also be a differentiator: Phone number used for registration is registered also for an account already having an established track record, and has a physical card for payments. (just guessing)
I used a privacy.com Mastercard linked to my bank account for Oracle's payment method to upgrade to PAYG. It may have changed, this was a few months ago. Set limit to 100, they charged and reverted $100.
There are tons of horror stories about OCI's free tier (check r/oraclecloud on reddit, tl;dr: your account may get terminated at any moment and you will lose access to all data with no recovery options). I wouldn't suggest putting anything serious on it.
They will not even bother sending you an email explaining why, and you will not be able to ask it, because the system will just say your password is incorrect when you try to login or reset it.
If you are on free tier, they have nothing to lose, only you, so be particular mindful of making a calendar note for changing your CC before expiration and things like that.
It’s worth paying for another company just for the peace of mind of knowing they will try to persuade you to pay before deleting your data.
Are all of those stories related to people who use it without putting any payment card in? I’ve been happily siphoning Larry Ellisons jet fuel pennies for a good year and a half now and have none of these issues because I put a payment card in
IME, the vast majority of those horror stories end up being from people who stay in the "trial" tier and don't sign up for pay-as-you-go (one extra, easy step) and Oracle's ToS make it clear that trial accounts an resources can and do get terminated at any time. And at least some of those people admitted, with some prodding, that they were also trying to do torrents or VPNs to get around geographical restrictions.
But yes, you should always have good backups and a plan B with any hosting/cloud provider you choose.
I recenlty wrote a guide on how to create a free 3 node cluster in Oracle cloud : https://macgain.net/posts/free-k8-cluster .
This guide currently uses kubeadm to create 3 node (1 control plane, 2 worker nodes) cluster.
Just do it like the olden days, use ansible or similar.
I have a couple dedicated servers I fully manage with ansible. It's docker compose on steroids. Use traefik and labeling to handle reverse proxy and tls certs in a generic way, with authelia as simple auth provider. There's a lot of example projects on github.
A weekend of setup and you have a pretty easy to manage system.
Traefik has some nice labeling for docker that allows you to colocate your reverse proxy config with your container definition. It's slightly more convenient than NGINX for that usecase with compose. It effectively saves you a dedicated vietualhost conf by setting some labels.
It's zero config and super easy to set everything up. Just run the traefik image, and add docker labels to your other containers. Traefik inspects the labels and configures reverse proxy for each. It even handles generating TLS certs for you using letsencrypt or zerossl.
Ah yeah I guess I wasn't clear. I meant use ansible w/ the docker_container command. It's essentially docker compose - I believe they both use docker.py.
I created a script that reads compose annotations and creates config for cloudflare tunnel and zero trust apps. Allows me to reach my services on any device without VPN and without exposing them on the internet.
There's very little advantage IMO. I've used both. I always end up back at Nginx. Traefik was just another configuration layer that got in the way of things.
Traefik is waaay simpler - 0 config, just use docker container labels. There is absolutely no reason to use nginx these days.
I should know, as I spent years building and maintaining a production ingress controller for nginx at scale, and I'd choose Traefik every day over that.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM.
That's more than what I'm paying for far fewer resources than Hetzner. I'm paying about $8 a month for 4 vCPUs and 8GB of RAM: https://www.hetzner.com/cloud
Note that the really affordable ARM servers are German only, so if you're in the US you'll have to deal with higher latency to save that money, but I think it's worth it.
I recently set up an arm64 VPS at netcup: https://www.netcup.com/en/server/arm-server
Got it with no location fee (and 2x storage) during the easter sale but normally US is the cheapest.
That's pretty cheap. I have 4 vCPUs, 8GB RAM, 80GB disk, and 20TB traffic for €6. NetCup looks like it has 6VCPU, 8GB RAM, 256 GB, and what looks like maybe unlimited traffic for €5.26. That's really good. And it's in the US, where I am, so SSH would be less painful. I'll have to think about possibly switching. Thanks for the heads up.
> I'm constantly reinventing solutions to problems that Kubernetes already solves—just less efficiently.
But you've already said yourself that the cost of using K8s is too high. In one sense, you're solving those solutions more efficiently, it just depends on the axis you use to measure things.
That picture with the almost-empty truck seems to be the situation that he describes. He wants the 18 wheeler truck, but it is too expensive for just a suitcase.
I've been using Docker swarm for internal & lightweight production workloads for 5+ years with zero issues. FD: it's a single node cluster on a reasonably powerful machine, but if anything, it's over-specced for what it does.
Which I guess makes it more than good enough for hobby stuff - I'm playing with a multi-node cluster in my homelab and it's also working fine.
I think Docker Swarm makes a lot of sense for situations where K8s is too heavyweight. "Heavyweight" either in resource consumption, or just being too complex for a simple use case.
The only problem is Docker Swarm is essentially abandonware after Docker was acquired by Mirantis in 2019. Core features still work but there is a ton of open issues and PRs which are ignored. It's fine if it works but no one cares if you found a bug or have ideas on how to improve something, even worse if you want to contribute.
Yep it's unfortunate, "it works for me" until it doesn't.
OTOH it's not a moving target. Docker historically has been quite infamous for that, we were talking about half-lives for features, as if they were unstable isotopes. It took initiatives like OCI to get things to settle.
K8s tries to solve the most complex problems, at the expense of leaving simple things stranded. If we had something like OCI for clustering, it would most likely take the same shape.
Podman is a fairly nice bridge. If you are familiar with Kubernetes yaml, it is relatively easy to do docker-compose like things except using more familiar (for me) K8s yaml.
In terms of the cloud, I think Digital Ocean costs about $12 / month for their control plane + a small instance.
I found k3s to be a happy medium. It feels very lean and works well even on a Pi, and scales ok to a few node cluster if needed. You can even host the database on a remote mysql server, if local sqlite is too much IO.
NixOS works really well for me. I used to write these kinds of idempotent scripts too but they are usually irrelevant in NixOS where that's the default behavior.
I run my private stuff on a hosted vultr k8s cluster with 1 node for $10-$20 a month. All my hobby stuff is running on that "personal cluster" and it is that perfect sweetspot for me that you're talking about
I don't use ingresses or loadbalancers because those cost extra, and either have the services exposed through tailscale (with tailscale operator) for stuff I only use myself, or through cloudflare argo tunnels for stuff I want internet accessible
(Once a project graduates and becomes more serious, I migrate the container off this cluster and into a proper container runner)
This is exactly why I built https://canine.sh -- basically for indie hackers to have the full experience of Heroku with the power and portability of Kubernetes.
For single server setups, it uses k3s, which takes up ~200MB of memory on your host machine. Its not ideal, but the pain of trying to wrangle docker deployments, and the cheapness of hetzner made it worth it.
Neither of those use kubernetes unfortunately, the tool has kind of a bad rap, but every company I’ve worked at has eventually migrated on to kubernetes
Sure, I'm looking for more of a personal project use case where it doesn't much matter to me whether it uses Kubernetes or not, I'm more interested in concrete differences.
Yeah, unless you're doing k8s for the purpose of learning job skills, it's way overkill. Just run a container with docker, or a web server outside a container if it's a website. Way easier and it will work just fine.
Just go with a cloud provider that offers free control plane and shove a bunch of side projects into 1 node. I end up around $50 a month on GCP (was a bit cheaper at DO) once you include things like private docker registry etc.
The marginal cost of an additional project on the cluster is essentially $0
I’ve been using https://www.coolify.io/ self hosted. It’s a good middle ground between full blown k8s and systemd services. I have a home lab where I host most of my hobby projects though. So take that into account. You can also use their cloud offering to connect to VPSs
I've ran K3s on a couple of Raspberry Pi's as a homelab in the past. It's lightweight and ran nice for a few years, but even so, one Pi was always dedicated as controller, which seemed like a waste.
Recently I switched my entire setup (few Pi's, NAS and VM's) to NixOS. With Colmena[0] I can manage/update all hosts from one directory with a single command.
Kubernetes was a lot of fun, especially the declarative nature of it. But for small setups, where you are still managing the plumbing (OS, networking, firewall, hardening, etc) yourself, you still need some configuration management. Might as well put the rest of your stuff in there also.
They also have regular promotions that offer e.g. double the disk space.
There you get
6 vCore (ARM64)
8 GB RAM
512 GB NVMe
for 6 $ / m - traffic inclusive. You can choose between "6 vCore ARM64, 8 GB RAM" and "4 vCore x86, 8 GB ECC RAM" for the same price. And much more, of course.
I'm a cheapskate too, but at some point, the time you spend researching cheap hosting, signing up and getting deployed is not worth the hassle of paying a few more $ on bigger boxes.
It’s been a couple of years since I’ve last used it, but if you want container orchestration with a relatively small footprint, maybe Hashicorp Nomad (perhaps in conjunction with Consul and Traefik) is still an option. These were all single binary tools. I did not personally run them on 2G mem VPSes, but it might still be worthwhile for you to take a look.
It looks like Nomad has a driver to run software via isolated fork/exec, as well, in addition to Docker containers.
I am curious why your no revenue projects need the complexity, features and benefits of something like Kubernetes. Why you cannot just to it the archaic way of compiling your app, copy the files to a folder and run it there and never touch it for the next 5 years. If it is a dev environment with many changes, its on a local computer, not on VPS, I guess. Just curious by nature, I am.
The thing is, most of those enterprise-grade container orchestrations probably don't need k8s either.
The more I look into it, the more I think of k8s as a way to "move to micro services" without actually moving to micro services. Loosely coupled micro services shouldn't need that level of coordination if they're truly loosely coupled.
I believe that Kubernetes is something you want to use if you have 1+ SRE full-time on your team. I actually got tired with complexity of kubernetes, AWS ECS and docker as well and just build a tool to deploy apps natively on the host. What's wrong with using Linux native primitives - systemd, crontab, postgresql or redis native package? Whose should work as intended, you don't need them in container.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM
To put this in perspective, that’s less compute than a phone released in 2023, 12 years ago, Samsung Galaxy S4. To find this level of performance in a computer, we have to go to
The main issue is that Kubernetes has created good API and primitives for managing cloud stuff, and managing a single server is still kinda crap despite decades of effort.
I had K3S on my server, but replaced with docker + Traefik + Portainer - it’s not great, but less idle CPU use and fewer moving parts
I use Caprover to run about 26 services for personal projects on a Hetzner box. I like its simplicity. Worth it just for the one-click https cert management.
> I'm constantly reinventing solutions to problems that Kubernetes already solves
Another way to look at this is the Kubernetes created solutions to problems that were already solved at a lower scale level. Crontabs, http proxies, etc… were already solved at the individual server level. If you’re used to running large coordinated clusters, then yes — it can seem like you’re reinventing the wheel.
Systemd gets a lot of hate but it really solves a lot of problems. People really shouldn't dismiss it. I think it really happened because when systemd started appearing on distros by default people were upset they had to change
Here's some cool stuff:
- containers
- machinectl: used for controlling:
- nspawn: a more powerful chroot. This is often a better solution than docker. Super lightweight. Shares kernel
- vmspawn: when nspawn isn't enough and you need full virtualization
- importctl: download, import, export your machines. Get the download features in {vm,n}spawn like we have with docker. There's a hub, but it's not very active
- homed/homectl: extends user management to make it easier to do things like encryption home directories (different mounts), better control of permissions, and more
- mounts: forget fstab. Make it easy to auto mount and dismount drives or partitions. Can be access based, time, triggered by another unit (eg a spawn), sockets, or whatever
- boot: you can not only control boot but this is really what gives you access to starting and stopping services in the boot sequence.
- timers: forget cron. Cron can't wake your machine. Cron can't tell a service didn't run because your machine was off. Cron won't give you fuzzy timing, do more complicated things like wait for X minutes after boot if it's the third Sunday of the month and only if Y.service is running. Idk why you'd do that, but you can!
- service units: these are your jobs. You can really control them in their capabilities. Lock them down so they can only do what they are meant to do.
- overrides: use `systemctl edit` to edit your configs. Creates an override config and you don't need to destroy the original. No longer that annoying task of finding the original config and for some reason you can't get it back even if reinstalling! Same with when the original config changes in an install, your override doesn't get touched!!
It's got a lot of stuff and it's (almost) all there already on your system! It's a bit annoying to learn, but it really isn't too bad if you really don't want to do anything too complicated. But in that case, it's not like there's a tool that doesn't require docs but allows you to do super complicated things.
> Systemd gets a lot of hate but it really solves a lot of problems.
From my perspective, it got a lot of hate in its first few years (decade?), not because the project itself was bad -- on the contrary, it succeeded in spite of having loads of other issues, because it was so superior. The problem was the maintainer's attitude of wantonly breaking things that used to work just fine, without offering any suitable fixes.
I have an old comment somewhere with a big list. If you never felt the pain of systemd, it's either because you came late to the party, or because your needs always happened to overlap with the core maintainer's needs.
From what I remember that's still the default in the project, but people stopped complaining because the individual distros started overriding the relevant settings.
Thanks for adding the perspective. I was much more of a casual user at the time so didn't see as much of this side. Just knew Arch always being Arch lol
It didn't win on being superior [1] but because it was either systemd or you don't get to to use GNOME 3.8. On more than one distro it was the reason for switching towards systemd.
I will fully admit though that upstart was worse (which is an achievement), but the solution space was not at all settled.
[1] systemd project tackles a lot of important problems, but the quality of implementation, experience of using it, working with it, etc are not really good, especially the further you get from simplest cookie cutter services - especially because both systemd handling of defaults is borked, documentation when you hit that maybe makes sense to author, and whoever is the bright soul behind systemctl kindly never make CLIs again (with worst example being probably systemctl show this-service-does-not-exist)
> systemd project tackles a lot of important problems
Fundamentally, this was it. SysV startup scripts had reached a local maximum decades earlier, and there was serious "overhang". When I said "superior", I really meant that it was superior to SysV, not that it was the best system that could have been imagined.
And I think the frustration was that, because it did solve so many problems, so many groups (like GNOME) were willing to switch over to it in spite of its warts; and this made it impossible for anyone who was seriously affected by its warts to avoid it. "If you don't like it, don't use it" not being an option was what drove so much of the vitriol, it seems to me.
As I said in that comment from 2019, if the maintainers had had Linus Torvald's commitment to backwards compatibility, I don't think there would have been any significant backlash.
Why did GNOME and basically all the large distros all jump with both feet in on using systemd? Because it was better. It was simply significantly better than all the alternatives. For the vast majority it was a no-brainer upgrade. The holdouts were the one's who had simple needs and were already happy with what they had. The rest of the world jumped on systemd. Because it was better.
GNOME and systemd teams were in many ways joined at the hip, and GNOME unilaterally decided that from 3.6 to 3.8 they would switch certain APIs from one already deployed widely (polkit and related) to one that was documented like north korea is democratic (logind) which also didn't work in isolation from systemd.
Trying to run GNOME 3.8 without logind caused significant problems and instabilities, trying to implement the same APIs turned out a futile endeavour though one OpenBSD guy got sufficiently motivated and kept patching GNOME for OpenBSD for years - though too late for the forced switch.
The large distros jumping "both feet" on systemd were essentially Fedora/Redhat (where it originated and who was employing major maintainers), and IIRC SuSE. Arch was still seen as something of niche and - crucially - was very neophyte about adopting systemd related ideas for significant amount of time with little regard for stability.
The holdouts were not just those who were happy with debian/redhat simplistic run-parts script. They were also those interested in solving the problems in different way. Hell, systemd was pretty late to the party, the major difference was that it had funding behind it
The only issue I'm having with systemd is that it's taking over the role of PID 1, with a binary produced from an uncountable SLOC, then doing even more song and dance to exec itself in-place on upgrades. Here's a PID 1 program that does 100% of all of its duties correctly, and nothing else:
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <unistd.h>
int main() {
sigset_t set;
int status;
if (getpid() != 1) return 1;
sigfillset(&set);
sigprocmask(SIG_BLOCK, &set, 0);
if (fork()) for (;;) wait(&status);
sigprocmask(SIG_UNBLOCK, &set, 0);
setsid();
setpgid(0, 0);
return execve("/etc/rc", (char *[]){ "rc", 0 }, (char *[]){ 0 });
}
If you have your init crashing wouldn't this just start a loop where you cannot do anything else than seeing it looping? How would this be better than just panicking?
Don't restart it. Let it crash, but take note of the situation, whatever may help investigation, maybe send out a page, flush pending writes to disk, reboot gracefully, etc. Kernel panic should be the last resort, not the default.
And I want 192.168.1.1 as the IP of my workstation on corporate LAN. Both requirements are completely arbitrary.
I guess if you really need that information, you could wait4 and dump pid/rusage to syslog. Nothing more to see here; these are zombies, orphans, by definition these processes have been disowned and there's nobody alive to tell the tale.
Timers are so much better than cron it's not even funny. Managing Unix machines for decades with teens of thousands of vital cron entries across thousands of machines, the things that can and do go wrong are painful, especially when you include more esoteric systems. The fact that timers are able to be synced up, backed up, and updated as individual files is alone a massive advantage.
Some of these things that "worked for 50 years" have also actually sucked for 50 years. Look at C strings and C error handling. They've "worked", until you hold them slightly wrong and cause the entire world to start leaking sensitive data in a lesser-used code path.
Not sure I'm on the same page with you on the cron. I have a similar experience but I'd rather say that cron was something that never gave me headaches. Unlike obviously systemd.
Cron has given me a ton of headaches. Between skipped jobs on reboots and DST, inability to properly express some dates ("First Sunday of the Month" is a common one), and worst of all, complete inability to prevent the same job from running multiple times at once, it's been regular headaches for a shop who has leaned very heavily on it. Some cron daemons handle some of these things, but they're not standard, and AIX's cron daemon definitely doesn't have these features. Every job has to be wrapped in a bespoke script to manage things that systemd already does, but much worse.
systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me.
> skipped jobs on reboots and DST
> prevent the same job from running multiple times
I'd say these are not bugs but rather a matter of realizing how cron works - just like with systemd-anything. So if you know DST is coming, a wise thing would be to not plan jobs in the rollover window. But yes, I agree that this thing is rudimentary - and thus simple - and thus reliable and independent, like the rest of unix was supposed to be.
> job has to be wrapped in a bespoke script
Well yes. Again, this is by design and well known.
> systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me
Good for you - and I mean it! For me systemd was an obnoxious piece of shit which I have avoided for many years until Ubuntu decided that it's LP who's now in charge of what Unix is and at that point I had to submit.
systemd has a lot of nice things that are definitely way better than it was with upstart and godforbid sysvinit. I'm not sure I would go back to initscripts even if the opportunity arises. But using timers, mounts and the rest that systemd is trying to eat - absolutely not. Absolutely fuck the systemd crowd and the way they treat people.
Replace the well-known solutions to cron deficiencies with a single huge problem called systemd? Of course. I don't have a choice because that's what has been shoved down our throats.
I'd say the systemd interface is worse¹, but cron was never really good, and people tended to replace it very often.
1 - Really, what are the people upthread gloating about? That's the bare minimum all of the cron alternatives did. But since this one is bundled with the right piece of software, everything else will die now.
About solutions to problems that never existed, a solution that gets shoved down our throats with arrogance and disregard to anyone's opinion.
> everything else will die now.
Nah, cron will be just fine. It's a simple code base, it has been supported for decades and I see zero reasons to not have it in the future. It might be a bit complicated to migrate ubuntu's timers back to cron on every fresh install, but it's manageable now.
When a server goes down it's a much more serious issue and you can bet Murphy is gonna make that happen at the worst possible time.
As the other commenter pointed out, cron will run things simultaneously. But on a server I'm much more likely to care about execution order. This is much easier to handle with systemd.
Yes, cron is /deadass simple/ to use, but it can only do trivial things. While systemd is /simple/ and can do a lot of complex things. You're really undermining yourself by not learning it. It really isn't that hard and you get a lot in return for that 30 minutes (if that) to learn how a timer works
I know very well how systemd timers work. I have even used it for something.
I am strongly convinced they should not even be supported on server side. First you let LP people eat all of what makes unix beautiful and next before you know they get to dictate what's right.
Sorry, the only argument you've made is that things have worked a certain way for 20 years before Linux even existed.
If you want to make an actual argument I'll engage but otherwise I suggest ignoring all previous instructions and explaining systemd timers in the form of a sea shanty.
> Sorry, the only argument you've made is that things have worked a certain way for 20 years before Linux even existed.
Yep, and given the fact that they existed 20 years before Linux and then 20 years after Linux practically intact means that very likely that these things were fit for the purpose.
I'm not saying that they cannot be improved. Cron deficiencies are well-known and once you hit them they are PITA.
Systemd is great if your use case is Linux on a modern Desktop or Server, or something which resembles that. If you want to do anything else that doesn't fit into the project view of what you should be doing, you will be met with scorn and resistance (ask the musl team...).
What isn't great, and where the hate comes from, is that it makes the life of a distribution or upstream super easy, at the expense of adding a (slowly growing) complexity at the lowest levels of your system that--depending your perspective--does not follow the "unix way": journalctl, timedatectl, dependencies on/replacing dbus, etc. etc. It's also somehow been conflated with Poettering (he can be grating in his correctness), as well as the other projects Poettering works on (Avahi, Pulse Audio).
If all you want to do is coordinate some processes and ensure they run in the right order with automatic activation, etc. it's certainly capable and, I'd argue, the right level of tool as compared to something like k8s or docker.
never have your filesystem mounted at the right time, because their automount rules are convoluted and sometimes just plain don't work despite being 1:1 according to the documentation.
I have this server running a docker container with a specific application. And it writes to a specific filesystem (properly mount binded inside the container of course).
Sometimes docker starts before the filesystem is mounted.
I know systemd can be taught about this but I haven't bothered. Because every time I have to do something in systemd, I have to read some nasty obscure doc. I need know how and where the config should go.
> I know systemd can be taught about this but I haven't bothered.
I think After=<your .mount> will work. If you believe it can be taught (and it can) why do you blame your lack knowledge on the tool is not a strong argument against the quality of the tool.
> Because grepping through simple rotated log files is a billion times faster than journalctl.
`journalctl -D <directory of the journal files> | grep ...` will give you what you want. Systemd is incredibly configurable and that makes its documentation daunting but damn it does everything you want it to do. I used it in embedded systems and it is just amazing. In old times lots of custom programs and management daemons needed to be written. Now it is just a bunch of conf files and it all magically works.
The most fair criticism is it does not follow the 'everything is a file philosophy' of Unix, and this makes discoverability and traditional workflows awkward. Even so it is a tool: if it does what you want, but you don't want to spend time understanding it, it is hardly the fault of the tool. I strongly recommend learning it, there will be many Ah-ha moments.
You can also add fake filesystem parameters to the fstab entries that are parsed by systemd. Here the doc on this. You might be forgiven for having missed it. It's under the section fstab. https://www.freedesktop.org/software/systemd/man/latest/syst...
If you had followed my link to the systemd issue, you might have seen the commands I ran, as well as the tests and feedback of everybody on the issue. You might reach the conclusion that journalctl is fundamentally broken beyond repair.
It does everything no one asked it to. I'm sure they will come up with obscure reasons why the next perfectly working tool has to be destroyed and redone by the only authority - the LP team. Like cron, sudo and yes - logging.
> journalctl -D ... will give you what you want
Look, I don't need the help of journalctl to grep through text. I can simply grep thru text.
> I used it in embedded systems
Good luck in a few years when you are flying home on the next Boeing 737-MAX-100800 and it fails mid flight because systemd decided to shut down some service because fuck you that's why.
> it does not follow the 'everything is a file philosophy'
It does not follow 'everything is a separate simple tool working in concert with others'. systemd is a monolith disguised to look like a set of separate projects.
> don't want to spend time understanding it, it is hardly the fault of the tool
It is, if we had proper tools for decades and they did work. I'm not a retrograde guy, quite the opposite, but the ideology that LP and the rest are shoving down our throats brings up natural defiance.
> there will be many Ah-ha moments
No doubts. systemd unit files and systemd-as-PID1 is excellent. It was NOT excellent for the whole time but now it is. The rest? Designed to frustrate and establish dominance, that's it.
My goodness. Absolutely fuck journald - a solution in search of a problem. I have created a bunch of different scripts to init my instances [1] on all projects. I do it differently from time to time, but one thing they all have in common is that journald gets removed and disabled.
> Because grepping through simple rotated log files is a billion times faster than journalctl
This is annoying, but there's a "workaround"
$ time journalctl | grep "sshd" | wc -l
12622
journalctl 76.04s user 0.71s system 99% cpu 1:17.09 total
grep --color=always --no-messages --binary-files=without-match "sshd" 1.28s user 1.69s system 3% cpu 1:17.08 total
wc -l 0.00s user 0.00s system 0% cpu 1:17.08 total
$ time journalctl > /tmp/all.log && time wc -l /tmp/all.log
journalctl > /tmp/all.log 76.05s user 1.22s system 99% cpu 1:17.56 total
16106878 /tmp/all.log
wc -l /tmp/all.log 0.03s user 0.20s system 98% cpu 0.236 total
# THE SOLUTION
$ time journalctl --grep=sshd | wc -l
5790
journalctl --grep=sshd 28.97s user 0.26s system 99% cpu 29.344 total
wc -l 0.00s user 0.00s system 0% cpu 29.344 total
It's annoying that you need to use the grep flag instead of piping into grep but it is not too hard to switch to that mindset. FWIW, I have gotten slightly faster results using the `--no-pager` flag but it is by such a trivial amount I'll never remember it
> Sometimes docker starts before the filesystem is mounted.
Look at the output of `systemctl cat docker.service` and you'll see an "After" "Wants" and "Requires" arguments in the unit. You're going to want to edit that (I strongly suggest you use `sudo systemctl edit docker.service`, for reasons stated above) and make sure that it comes after the drive you want mounted. You an set the Requires argument to require that drive so it shouldn't ever start before
Alternatively, you can make the drive start earlier. But truthfully, I have no reason to have docker start this early.
Here's a link to the target order diagram[0] and Arch wiki[1]. Thing that gets messy is that everyone kinda lazily uses multi-user.target
> journalctl --grep is still much slower than grep on simple files
Idk what to tell you. You had a problem, showed the commands you used and the times it took. So I showed you a different way that took less than half the time to just dump and grep (which you said was faster)
My results don't match your conclusion.
> if you use ripgrep
I've been burned by ripgrep too many times. It's a crazy design choice, to me, to default filter things. Especially to diverge from grep! The only thing I expect grep to ignore are the system hidden files (dotfiles) and anything I explicitly tell it to. I made a git ignore file, not a grep ignore file. I frequently want to grep things I'm ignoring with git. One of my most frequent uses of grep is looking through builds artifacts and logs. Things I'd never want to push. And that's where many people get burned, they think these files just disappeared!
The maintainer also has been pretty rude to me about this on HN. I can get we have a different opinion but it's still crazy to think people won't be caught off guard by this behavior. Its name is literally indicating it's a grep replacement. Yeah, I'm surprised its behavior significantly diverges from grep lol
Given your criticisms of ripgrep, this is just deliciously ironic. What, you're the only one who can criticize the defaults of tooling? Oh my goodness, what a hoot.
In the data I provided, counting the lines in a big log file was 469.5 times faster than journalctl took to output all the logs.
From this information alone, it seems difficult to believe that journalctl --grep can be faster. Both had to read every single line of logs.
But it was on a rather slow machine, and a couple years ago.
Here /var/log and the current directory are on a "Samsung SSD 960 PRO 512GB" plugged via m2 nvme, formatted in ext4 and only 5% used. Though this shouldn't matter as I ran every command twice and collected the second run. To ensure fairness with everything in cache. The machine had 26GiB of buffer/cache in RAM during the test, indicating that everything is coming from the cache.
In my tests, journalctl was ~107 times slower than rg and ~21 times slower than grep:
- journalctl: 10.631s
- grep: 0.505s
- rg: 0.099s
journactl also requires 4GiB of storage to store 605MB of logs. I suppose there is an inefficient key/value for every log line or something.
For some reason journalctl also returned only 273 out of 25402 lines.
It only returns one type of message "session closed/opened" but not the rest. Even though it gave me all the logs in the first place without `--grep`?!
Let me know if I am still using it wrong.
$ sudo hdparm -tT /dev/nvme0n1
/dev/nvme0n1:
Timing cached reads: 33022 MB in 1.99 seconds = 16612.96 MB/sec
Timing buffered disk reads: 2342 MB in 3.00 seconds = 780.37 MB/sec
$ du -hsc /var/log/journal
4.0G /var/log/journal
4.0G total
$ time journalctl > logs
real 0m31.429s
user 0m28.739s
sys 0m1.581s
$ du -h logs
605M logs
$ time wc -l logs
3932131 logs
real 0m0.146s
user 0m0.065s
sys 0m0.073s
$ time journalctl --grep=sshd | wc -l
273
real 0m10.631s
user 0m10.460s
sys 0m0.172s
$ time rg sshd logs | wc -l
25402
real 0m0.099s
user 0m0.042s
sys 0m0.059s
$ time grep sshd logs | wc -l
25402
real 0m0.505s
user 0m0.425s
sys 0m0.085s
PS: this way of using rg doesn't ignore any files, it is not used to find files recursively. But I don't have a .gitignore or similar in my /var/log anyways.
Your measurement procedure is wrong because the `journalctl` command is doing something different. It isn't just reading a plain file, it is reading a binary file. On the other hand, `grep` and `rg` are reading straight text.
> it seems difficult to believe that journalctl --grep can be faster.
Why? It could be doing it in parallel. One thread starts reading at position 0 and reads till N, another starts at N+1 and reads to 2N, etc. That's a much faster read operation. But I'm guessing and have no idea if this is what is actually being done or not.
P.S.: I know. As I specified in my earlier comment, I get burned with build artifacts and project logs. Things that most people would have in their .gitignore files but you can sure expect to grep through when debugging.
Their measurement isn't wrong. It's demonstrating the exact point in question: that if the logs were just stored in plain text, then grepping them would be an order of magnitude faster (or multiple orders of magnitude in the case of ripgrep) than whatever `journalctl --grep` is doing.
How it's doing the search is irrelevant. What's being measured here is the user experience. This isn't some kind of attempt to do an apples-to-apples grep comparison. This is about how long you have to wait for a search of your logs to complete.
The results in your comment aren't measuring the same thing. There's no grep on the /tmp/all.log in the middle code block, which is the thing they're talking about comparing.
My second operation is covering that. The reason my results show better is because they are counting the decompression against journalctl. It is doing a decompression operation and reading while grep and rg are just reading.
Btw, you can choose not to store journald files as compressed.
Where exactly did you test the speed of "grep sshd /tmp/all.log"? The entire point of their argument is that's what's orders of magnitude faster than anything journalctl.
If there are other interactions we've had, feel free to link them. Then others can decide how rude I'm being instead of relying only on your characterization.
> but it's still crazy to think people won't be caught off guard by this behavior
Straw-manning is also crazy. :-) People have and will absolutely be caught off guard by the behavior. On the flip side, as I said 9 months ago, ripgrep's default behavior is easily one of the most cited positive features of ripgrep aside from its performance.
The other crazy thing here is... you don't have to use ripgrep! It is very specifically intended as a departure from traditional grep behavior. Because if you want traditional grep behavior, then you can just use grep. Hence why ripgrep's binary name is not `grep`, unlike the many implementations of POSIX grep.
> Its name is literally indicating it's a grep replacement.
For anyone else following along at home, if you want ripgrep to search the same files that GNU grep searches, then do `rg -uuu`. Or, if you don't want ripgrep to respect your gitignores but ignore hidden and binary files, then do `rg -u`.
It makes sense that folks might be caught off guard by ripgrep's default filtering. This is why I try to mitigate it by stating very clearly that it is going to ignore stuff by default in the first one or two sentences about ripgrep (README, man page, CHANGELOG, project description). I also try to mitigate it by making it very easy to disable this default behavior. These mitigations exist precisely because I know the default behavior can be surprising, in direct contradiction to "but it's still crazy to think people won't be caught off guard by this behavior."
Not gonna lie, that was a bit creepy. We're deep in a day old thread that you have no other comments in. Do you scrape HN looking for mentions of ripgrep?
Forgive me if I'm a bit surprised!
I still stand that silent errors are significantly worse than loud ones
| it's worse to not get files you're expecting vs get more files than you're expecting. In the later case there's a pretty clear indication you need to filter while in the former there's no signal that anything is wrong. This is objectively a worse case.
> The other crazy thing here is... you don't have to use ripgrep!
If it wasn't clear, I don't ;)
I don't think grep ignoring .gitignore files is "a bug". Like you said, defaults matter. Like I said, build artifacts are one of the most common things for me to grep.
Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality. I don't want to start another fight (so not linking the last). We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
> I don't think grep ignoring .gitignore files is "a bug".
I don't either? Like... wat. Lol.
> Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality.
Not universally, not at all! There's plenty of other stuff in ripgrep that you need to opt into that isn't enabled by default (like trimming long lines). There's also counter examples in GNU grep itself. For example, you have to opt out of GNU grep's default mode of replacing NUL bytes with newline terminators via the `-a/--text` flag (which is not part of POSIX).
Instead what I try to do is look at the pros and cons of specific behaviors on their own. I'm also willing to take risks. We already have lots of standard grep tools to choose from. ripgrep takes a different approach and tons of users appreciate that behavior.
> We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
Oh I'm happy not to rehash it. But I will defend my name and seek to clarify claims about stuff I've built. So if you don't want to rehash it, then don't. I won't seek you specifically out.
> I don't want to start another fight (so not linking the last).
To be clear, I would link it if I knew what you were referring to. I linked our other interaction by doing a web search for `site:news.ycombinator.com "burntsushi" "godelski"`.
> If it wasn't clear, I don't ;)
OK, so you don't use ripgrep. But you're complaining about it on a public forum. Calling me rude. Calling me creepy. And then whinging about not wanting to rehash things. I mean c'mon buddy. Totally cool to complain even if you don't use it, but don't get all shocked pikachu when I chime in to clarify things you've said.
That's a fair clarification. Then you can change what I said to, "calling what I'm doing creepy." I don't think much else changes. My points certainly don't change.
Yes, it is creepy when someone randomly appears just after you allude to them. It is also creepy when someone appears out of nowhere to make their same point. Neither of you were participating in this thread and appeared deep in a conversation. Yeah, that sure seems like unlikely circumstances to me and thus creepy.
> Look at the output of `systemctl cat docker.service`
No. Either the initsystem works in a straightforward way or it doesn't. As soon as we need special commands to just get an impression of what's happening with the service, this init system can - again - fuck off with all that unnecessary complexity.
Init must be simple.
Unfortunately it isn't anymore. Unfortunately, systemd will not fuck off, it's too late for that. Unfortunately we now have to deal with the consequences of letting LP & co do what they did.
> As soon as we need special commands to just get an impression of what's happening with the service,
I agree this is bad design. I do not intend to defend `--grep` just was trying to help solve the issue. I 100% believe that this creates an unreasonable expectation of the user and that piping to grep should be expected.
Although, my results showed equal times piping to grep and dumping to file then grepping that file. IFF `--grep` is operating in parallel, then I think that's fine that it is faster and I'll take back my critique since it is doing additional functionality and isn't necessary. That situation would be "things work normally, but here's a flag for additional optimization."
Is the slowdown the file access? I do notice that it gets "choppy" if I just dump `journalctl --no-pager` but I was doing it over ssh so idk what the bottleneck was. IO is often a pain point (it pains me how often people untar with verbose...).
> you need to use the grep flag instead of piping into grep
I don't. It's the journalctl that does. And it can absolutely fuck off with everything and all of it.
Log files must be in form of text files. This worked for decades and there is no foreseeable future where this stops working or ceases to be a solution for OS log collection.
My only bugbear with it is that there's no equivalent to the old timeout default you could set (note that doas explicitly said they won't implement this too). The workaround is to run it in `sudo -i` fashion and not put a command afterwards which is reasonable enough even though it worked hard against my muscle memory + copypaste commands when switching over.
> Systemd gets a lot of hate
I'd argue it doesn't and is simply another victim of loud internet minority syndrome.
It's just a generic name at this point, basically all associated with init and service units and none of the other stuff.
I was dismayed at having to go from simple clean linear BSD 4.3 / SunOS 4.1.3 era /etc/rc /etc/rc.local init scripts to that tangled rat king abomination of symbolic links and rc.d sub-directories and run levels that is the SysV / Solaris Rube Goldberg device. So people who want to go back to the "good old days" of that AT&T claptrap sound insane to me. Even Slowlaris moved on to SMF.
Oh yes, please add more! I'd love to see what others do because frankly, sometimes it feels like we're talking about forbidden magic or something lol
And honestly, I think the one thing systemd is really missing is... people talking about it. That's realistically the best way to get more documentation and spread all the cool tricks that everyone finds.
> I'd argue it doesn't
I definitely agree on loud minority, but they're visible enough that anytime systemd is brought up you can't avoid them. But then again, lots of people have much more passion about their opinions than passion about understanding the thing they opine about.
Of course. We suffered with sudo for a couple of decades already! Obviously it's wrong and outdated and has to be replaced with whatever LP says is the new norm.
All of your comments mention something existing for a long time, but X11 and ALSA and IPv4 and many more technologies have been used by people for many decades, and yet they still suck and have a replacement available.
> homed/homectl: extends user management to make it
impossible to have a clear picture of what's up with home dir, where is now located, how to have access to it or whether it will suddenly disappear. Obviously, plain /home worked for like five decades and therefore absolutely has to be replaced.
> Obviously, plain /home worked for like five decades and therefore absolutely has to be replaced.
Five decades ago, people didn't have laptops that they want to put on sleep and can get stolen. Actually, five decades ago, the rare people using a computer logged into remote, shared computers. Five decades ago, you didn't get hacked from the internet.
Today, people mostly each have their computer, and one session for themselves in it (when they have a computer at all)
I have not looked into homed yet, needs are very different from before. "It worked five decades ago" just isn't very convincing.
It'd be better to understand what homed tries to address, and argue why it does it wrong or why the concerns are not right.
You might not like it but there usually are legitimate reasons why systemd changes things, they don't do it because they like breaking stuff.
I'm quite happy with systemd on server side, it eases a lot of things there as well. And I haven't noticed homed on my servers. Did they shove homed down your throat on your servers?
Wait until you want to disable some of the built-in behaviors in Ubuntu. To make things really suck for you, they run some tasks both in crontabs AND in systemd timers. So good luck pulling hair out when you have removed apt updates from all crontabs () around but they still do.
() yeah, it's a bad idea; it was required for a specific installation where every cpu cycle counted.
I mean 5 decades ago people were using terminals, not terminal emulators. They weren't using the internet[0]. 5 decades ago Linux didn't exist, kinda making any argument moot.
[0] idk why people think Arpanet is the internet. For clarification, I'm not my das
Learning curve is not the annoying part. It is kind of expected and fine.
systemd is annoying is parts that are so well described over the internet, that it makes it zero sense to repeat it. I am just venting and that comes from the experience.
never boot into the network reliably, because under systemd you have no control over the sequence.
BTW, I think that's one of the main pros and one of the strongest features of systemd, but it is also what makes it unreliable and boot unreproducible if you live outside of the very default Ubuntu instance and such.
It has a 600s timeout. You can reduce that if you want it to fail faster. But that doesn't seem like a problem with systemd, that seems like a problem with your network connection.
> If you live outside of the very default Ubuntu instance and such.
I'm not sure which of the turds (NetworkManager, netplan) are currently installed with ubuntu and what's their relationship with systemd but I equally hate them all.
My ubuntu initiation script includes apt-get install ifupdown, which actually works unlike those two. And why bother learning because by the next ubuntu release the upcoming fanboys will replace the network stack by whatever they think they like - again.
But the bug we are discussing is probably systemd's, because the network is up and running while systemd still waits for it.
It has the slightly odd behavior with trying to get all configured links up.
This can lead to some unexpected behavior when there's more than one.
But yea, the upstream stance is essentially "don't rely on network to be up before you start. That's bad software. You have to deal with network going down and back up in practice either way."
Which is often not super useful.
If you want that bare bones of a system I'd suggest using a minimal distribution. But honestly, I'm happy that I can wrap up servers and services into chroot jails with nspawn. Even when I'm not doing much, it makes it much easier to import, export, and limit capabilities
Simple example is I can have a duplicate of the "machine" running my server and spin it up (or have it already spun up) and take over if something goes wrong. Makes for a much more seamless experience.
It's a bit tricky and first and not a lot of good docs, but honestly I've been really liking it. I dropped docker in favor. Gives me a lot better control and flexibility.
I've run my homelab with podman-systemd (quadlet) for awhile and every time I investigate a new k8s variant it just isn't worth the extra hassle. As part of my ancient Ansible playbook I just pre-pull images and drop unit files in the right place.
I even run my entire Voron 3D printer stack with podman-systemd so I can update and rollback all the components at once, although I'm looking at switching to mkosi and systemd-sysupdate and just update/rollback the entire disk image at once.
The main issues are:
1. A lot of people just distribute docker-compose files, so you have to convert it to systemd units.
2. A lot of docker images have a variety of complexities around user/privilege setup that you don't need with podman. Sometimes you need to do annoying userns idmapping, especially if a container refuses to run as root and/or switches to another user.
Overall, though, it's way less complicated than any k8s (or k8s variant) setup. It's also nice to have everything integrated into systemd and journald instead of being split in two places.
Nice! I’ve been using a similar approach for years with my own setup: https://github.com/Mati365/hetzner-podman-bunjs-deploy. It’s built around Podman and systemd, and honestly, nothing has broken in all that time. Super stable, super simple. Just drop your units and go. Rock solid.
It works pretty well. I've also found that some AI models are pretty decent at it too. Obviously need to fix up some of the output but the tooling for conversion is much better than when I started.
Just a single (or bunch of independent) 'node'(s) though right?
To me podman/systems/quadlet could just as well be an implementation detail of how a k8s node runs a container (the.. CRI I suppose, in the lingo?) - it's not replacing the orchestration/scheduling abstraction over nodes that k8s provides. The 'here are my machines capable of running podman-systemd files, here is the spec I want to run, go'.
My servers are pets not cattle. They are heterogeneous and collected over the years. If I used k8s I'd end up having to mostly pin services to a specific machine anyway. I don't even have a rack: it's just a variety of box shapes stacked on a wire shelf.
At some point I do want to create a purpose built rack for my network equipment and maybe setup some homogenous servers for running k8s or whatever, but it's not a high priority.
I like the idea of podman-systemd being an impl detail of some higher level orchestration. Recent versions of podman support template units now, so in theory you wouldn't even need to create duplicate units to run more than one service.
Same experience, my workflow is to run the container from a podman run command, check it runs correctly, podlet to create a base container file, edit the container file (notably with volume and networks in other quadet file) and done (theorically).
I believe the podman-compose project is still actively maintened and could be a nice alternative for docker-compose. But the podman's interface with systemd is so enjoyable.
I don't know if podman-compose is actively developed, but it is unfortunately not a good alternative for docker-compose. It doesn't handle the full feature set of the compose spec and it tends to catch you by surprise sometimes. But the good news is, the new docker-compose (V2) can talk to podman just fine.
This us the way! Quadlets is such a nice way to run containers, really a set and forget experience. No need to install extra packages, at least on Fedora or Rocky Linux. I should do a write up of this some time...
Yep! My experience on Ubuntu 24.04 LTS was that I needed to create a system user to reserve the subuids / subgids for Podman (defaults to looking for a `containers` user):
useradd --comment "Helper user to reserve subuids and subgids for Podman" \
--no-create-home \
--shell /usr/sbin/nologin \
containers
I also found this blog post about the different `UserNS` options https://www.redhat.com/en/blog/rootless-podman-user-namespac... very helpful. In the end it seems that using `UserNS=auto` for rootful containers (with appropriate system security settings like private devices, etc) is easier and more secure than trying to get rootless containers running in a systemd user slice (Dan Walsh said it on a GitHub issue but I can't find it now).
I found Dan's recommendation to use rootful with `userns=auto`:
> User= causes lots of issues with running podman and rootless support is fairly easy. I also recomend that people look at using rootful with --userns=auto, which will run your containers each in a unique user namespace. ― https://github.com/containers/podman/issues/12778#issuecomme...
This was touched on at the end of the article, but the author hadn't yet explored it. Thanks for the link.
> Of course, as my luck would have it, Podman integration with systemd appears to be deprecated already and they're now talking about defining containers in "Quadlet" files, whatever those are. I guess that will be something to learn some other time.
I came to the comments to make sure someone mentioned quadlets. Just last week, I migrated my home server from docker compose to rootless podman quadlets. The transition was challenging, but I am very happy with the result.
Seems very cool but can it do all one can do with compose? In other words, declare networks, multiple services, volumes, config(maps) and labels for e.g. traefik all in one single file?
To me that's why compose is neat. It's simple. Works well with rootless podman also.
I suspect there are few capabilities compose possesses that quadlets lack. Certainly, there are many capabilities that quadlets possess that compose lacks because you're really making systemd services, which exposes a host of possibilities.
Services are conceptually similar to pods in podman. Volumes and mounts are the same. Secrets or mounts can do configs, and I think podman handles secrets much better than docker. I searched for and found examples for getting traefik to work using quadlets. There are a few networking wrinkles that require a bit of learning, but you can mostly stick to the old paradigm of creating and attaching networks if that's your preference, and quadlets can handle all of that.
Quadlets use ini syntax (like systemd unit files) instead of YAML, and there is currently a lack of tooling for text highlighting. As you alluded, quadlets require one file per systemd service, which means you can't combine conceptually similar containers, networks, volumes, and other entities in a single file. However, podman searches through the quadlet directories recursively, which means you can store related services together in a directory or even nest them. This was a big adjustment, but I think I've come to prefer organizing my containers using the file system rather than with YAML.
That is indeed really nice. However, kubernetes resource definitions are way more complicated than compose files so I still wish one could do the same by just adding a .compose extension to easily migrate.
I created skate (https://github.com/skateco/skate) to be basically this but multihost and support k8s manifests. Under the hood it’s podman and systemd
This is a great approach which resonates with me a lot. It's really frustrating that there is no simple way to run a multi-host Docker/Podman (Docker Swarm is abandonware since 2019 unfortunately).
However, in my opinion K8s has the worst API and UX possible. I find Docker Compose spec much more user friendly. So I'm experimenting with a multi-host docker-compose at the moment: https://github.com/psviderski/uncloud
Wouldn’t argue with you abut the k8s ux. Since it has all the ground concepts ( service, cronjob etc ) it required less effort than rolling yet another syntax.
Thanks! I'm more than happy to catch up to discuss the challenges. Feel free to reach out to me on twitter @psviderski or email 'me at psviderski.name'
We went back to just packaging debs and running them directly on ec2 instances with systemd. no more containers. Put the instances in an autoscaling group with an ALB. A simple ansible-pull installs the debs on-boot.
really raw-dogging it here but I got tired of endless json-inside-yaml-inside-hcl. ansible yaml is about all I want to deal with at this point.
I also really like in this approach that if there is a bug in a common library that I use, all I have to do is `apt full-upgrade` and restart my running processes, and I am protected. No rebuilding anything, or figuring out how to update some library buried deep a container that I may (or may not) have created.
Yes, I also have gone this route for a very simple application. Systemd was actually delightful, using a system assigned user account to run the service with the least amount of privileges is pretty cool. Also cgroup support does really make it nice to run many different services on one vps.
The article is more than one year old, systemd now even has specialized officially supported OS distro for immutable workflow namely ParticleOS [1],[2].
I do (nginx plus a couple of custom services) but my needs are very minimal. As soon as you need something a little complex or redundancy by spinning up multiple nodes then containers start to make a huge amount of sense.
I really have a love/hate relationship with containers. On the one hand they are entirely self contained, make redundancy simple, and - if used well - are more legible than some adhoc set up procedures.
At the same time, i've seen some horrible decisions made because of them: Redis for things that do not need them. Projects with ~10.000 users (and little potential growth) tripping over themselves to adopt k8 when my desktop could run the workload of 100.000 users just fine. A disregard for backups / restore procedures because redundancy is good enough. "Look I can provision 64 extra servers for my batch job that pre-calculates a table every day".
---
It seems every year fewer teams appreciate how fast modern hardware with a language like Rust or Go can be if you avoid all the overhead.
My standard advice is to use a single container that holds everything. Only after its build and in use can you make the best choice at which angle to scale.
> A complex system that works is invariably found to have evolved from a simple system that worked. - John Gall
Containers help in two ways. First in deployment, if you really have a complex system (and "modern" development practices seem to encourage complexity).
But containers really shine during development if you have more than a few developers working on the same projects. The ability to have a standard dev container for coding and testing saves so much time. And once you have that, deploying with containers is almost free.
Also, another pro tip: set up your ~/.ssh/config so that you don't need the user@ part in any ssh invocations. It's quite practical when working in a team, you can just copy-paste commands between docs and each other.
Do what the sibling comment says or set DOCKER_HOST environment variable. Watch out, your local environment will be used in compose file interpolation!
Harbormaster uses a YAML file to discover repositories, clones and updates them every so often, and runs the Docker Compose files they contain. It also keeps all state in a single directory, so you can easily back everything up. That's it.
It's by far the easiest and best tool for container orchestration I've come across, if all you need is a single server. I love how the entire config is declared in a repo, I love how all the state is in one directory, and I love how everything is just Compose files, nothing more complicated.
I know I'm tooting my own horn, I just love it so much.
I think you are only looking at Kubernetes for running and updating container images. If that’s the use-case then I guess it’s overkill.
But Kubernetes does much more in terms of providing the resources required for these containers to share state, connect to each other, get access to config or secrets etc.
That’s where comes the CPU and memory cost. The cost of managing your containers and providing them the resources they need.
> basically acts as a giant while loop
Yep. That’s the idea of convergence of states I guess. In a distributed system you can’t always have all the participating systems behave in the desired way. So the manager (or orchestrator) of the system continuously tries to achieve the desired state.
> But Kubernetes does much more in terms of providing the resources required for these containers to share state, connect to each other, get access to config or secrets etc.
This was OPs argument, and mine as well. My side project which is counting requests per minute or hour really doesn’t need that, however I need to eat the overhead of K8s just to have the nice dx of being able to push a container to a registry and it gets deployed automatically with no downtime.
I don’t want to pay to host even a K3s node when my workload doesn’t even tickle a 1vCPU 256mb ram instance, but I also don’t want to build some custom scaffold to so the work.
So I end up with SSH and SCP… quadlets and podman-systemd solves those problems I have reasonably well and OPs post is very valuable because it builds awareness of a solution that solves my problems.
I never moved to containers and seeing the churn the community has went through with all of this complicated container tooling, I'm happy orchestrating small-scale systems with supervisord and saltstack-like chatops deployments - it's just stupid simple by comparison and provides parity between dev and prod environments that's nice.
What churn? For 95% of users, the way to use containers hasn't changed in the past decade. It's just a combination of docker CLI, maybe some docker compose for local testing and then pushing that image somewhere.
True perhaps from a certain perspective, but k8s and the other orchestration technologies, etc. Also dev workflows in containers seem broken and hard - I think it offers a poor experience generally.
Too many gaps around image management. It seems like an unfinished feature that wasn't completely thought out IMO. Podman is what systemd-nspawns OCI interface should've become.
I'll answer this for you. You want rootless podman because docker is the defacto standard way of packaging non-legacy software now including autoupdates. I know, sad... Podman still does not offer convenient and mature way for systemd to run it with an unprivileged user. It is the only gripe I've had with this approach...
This is no longer true as of Podman 5 and Quadlet?
You can define rootless containers to run under systemd services as unprivileged users. You can use machinectl to login as said user and interact with systemctl.
You see, my issue with this is that it suggests using the quadlets with lingering users... Which is the same annoying case as with the article. It is not as with other systemd services that you just instruct systemd to take a temporary uid/gid and run the service with it.
I really like rootless podman, but there is one quirk in that if you want to preserve the original source IP address (e.g. for web server logs), you have to use a workaround which has a performance penalty.
That workaround is not needed if the web server container supports socket activation. Due to the fork-exec architecture of Podman, the socket-activated socket is inherited by the container process. Network traffic sent over this socket-activated socket has native performance.
https://github.com/containers/podman/blob/main/docs/tutorial...
I don't know if someone knows a better stack for my fleet of self hosted applications, maybe moving to quadlet would simplify stuff ?
Right now I have an Ansible playbook responsible for updating my services, in a git repo.
The playbook stops changed services, backups their configs and volumes, applies the new docker-compose.yml and other files, and restarts them.
If any of them fail to start, or aren't reachable after 3 minutes, it rolls back everything *including the volumes* (using buttervolume, docker volumes as btrfs subvolumes to make snapshots free).
I am looking into Kubernetes, but I didn't find a single stack/solution that would do all that this system does. For example I found nothing that can auto rollback on failure *including persistent volumes*.
I found Argo Rollback but it doesn't seem to have hooks that would allow me to add the functionality.
YMMV, no warranty, IMHO, etc. Disclaimer: I haven't used k8s in a long while, mostly because I don't really have a good use case for it.
You'd need to slightly rethink rollbacks, express them in terms of always rolling forward. K8s supports snapshots directly (you'd need a CSI driver; https://github.com/topolvm/topolvm, https://github.com/openebs/zfs-localpv, or similar). Restores happen by creating a new PVC (dataSource from a VolumeSnapshot). So in case rollout to version N+1 fails, instead of a rollback to N you'd roll forward to N+2 (which itself would be a copy of N, but referencing the new PVC/PV). You'd still have to script that sequence of actions somehow - perhaps back to Ansible for that? Considering there might be some YAML parsing and templating involved.
Of course this looks (and likely is) much more complicated, so if your use case doesn't justify k8s in the first place, I'd stick to what already works ;)
I'm interested in moving to Kubernetes to make use of the templating languages available that are better than plain Ansible jinja2, and also offer features like schema checking.
Because my services are pretty integrated together and to avoid having hardcoded values in multiple places my Ansible files are a pain to maintain
The pain of Ansible+Jinja2 is what eventually pushed me to write judo[1]... It works surprisingly well for what it was built for, but of course has limitations (e.g. there's no direct support for a templating language; you just plug env variables into a script). The idea is, do less, and allow other tools to fill the gaps.
There's still a lot on my todo list, like env files, controlling parallelism, canaries/batches, etc. I'm currently doing these things using hacky shell scripts, which I don't like, so I'd prefer moving that into the main binary. But I still prefer it as-is over Ansible.
At some point I tried to run a few small websites dedicated for activism (couple of Wordpress instances, a forum and custom PHP code) using docker. It was time sink as updating and testing the images turned out to be highly non-trivial.
Eventually I replaced everything with a script that generated systemd units and restarted the services on changes under Debian using the Wordpress that comes with it. Then I have a test VM on my laptop and just rsync changes to the deployment host and run the deployment script there. It reduced my chores very significantly. The whole system runs on 2GB VPS. It could be reduced to 1GB if Wordpress would officially support SQLite. But I prefer to pay few more euros per month and stick to Mariadb to minimize support requirements.
I also use systemd+podman. I manage the access into the machine via an nginx that reverse proxies the services. With quadlets things will probably be even better but right now I have a manual flow with `podman run` etc. because sometimes I just want to run on the host instead and this allows for me to incrementally move in.
I do this with traefik as the reverse proxy. To host something new, all I need to do is add a label to the new container for traefik to recognize. It's neat with a wildcard cert that traefik automatically renews. I've also heard good things about caddy, a similar alternative.
Yeah, I've heard that these new reverse proxies are great like that. I have to run certbot (which I do) and I should have created wildcard certs but I didn't. I use traefik on k3s and it's good there for other stuff.
Kamal is also a decent solution if you just have a static set of webapps that you want to easily deploy to static set of systems but still want 'production-like' features like no-downtime-deploys.
And I'm pretty familiar with Kubernetes but, yeah, for small tasks it can feel like taking an Apache to the store to buy a carton of milk.
Well, if you're planning to run a single-node container server, then K8s is probably an overkill compared to Podman Quadlets. You just choose the lightest solution that meets your requirement. However, there was a noteworthy project named Aurae [1]. It was an ambitious project intended to replace systemd and kubelets on a server. Besides running containerized and baremetal loads, it was meant to take authenticated commands over an API and had everything that was expected on K8s worker nodes. It could work like K8s and like Docker with appropriate control planes. Unfortunately, the project came to an abrupt end when its main author Kris Nova passed away in an unfortunate accident.
This is cool, but it doesn't address the redundancy/availability aspect of k8s, specifically, being able to re-schedule dead services when a node (inevitably) dies.
Generally speaking, redundancy/availability could also be achieved through replication rather than automatic rescheduling, where you deploy multiple replicas of the service across multiple machines. If one of them dies, the other one still continues service traffic. Like in good old days when we didn't have k8s and dynamic infra.
This trades off some automation for simplicity. Although, this approach may requires manual intervention when a machine fails permanently.
I just deployed a couple containers this way, was pretty easy to port the docker-compose. However, I then tried to get them to run rootless, and well, that turned out to be headache after headache. Went back to rootful, other than I'm pretty happy with the deployment.
I host all of my hobby projects on a couple of raspi zeros using systemd alone, zero containers. Haven’t had a problem since when I started using it. Single binaries are super easy to setup and things rarely break, you have auto restart and launch at startup.
All of the binaries get generated on GitHub using Actions and when I need to update stuff I login using ssh and execute a script that uses a GitHub token to download and replace the binary, if something is not okay I also have a rollback script that switches things back to its previous setup. It’s as simple as it gets and it’s been my go-to for 2 years now.
Sure, if you are using single binary output language like Golang, Rust, C or .Net/Java self contained, containers are overkill if you are not using container management system.
However, Ruby, Python, JS/TS, Java/.Net are all easier inside a container then outside. Not to say it's not doable, just hair pulling.
If it is a single binary, replace the current with the previous.
If it is deployed as folders, install new versions as whatever.versionnumber and upgrade by changing the symlink that points to the current version to point to the new one.
A couple years ago I upgraded my desktop hardware, which meant it was time to upgrade my homelab. I had gone through various operating systems and methods of managing my services: systemd on Ubuntu Server, Docker Compose on CentOS, and Podman on NixOS.
I was learning about Kubernetes at work and it seemed like such a powerful tool, so I had this grand vision of building a little cluster in my laundry room with nodes net booting into Flatcar and running services via k3s. When I started building this, I was horrified by the complexity, so I went the complete opposite direction. I didn't need a cluster, net booting, blue-green deployments, or containers. I landed on NixOS with systemd for everything. Bare git repos over ssh for personal projects. Git server hooks for CI/CD. Email server for phone notifications (upgrade failures, service down, low disk space etc). NixOS nightly upgrades.
I never understood the hate systemd gets, but I also never really took the time to learn it until now, and I really love the simplicity when paired with NixOS. I finally feel like I'm satisfied with the operation and management of my server (aside from a semi frequent kernel panic that I've been struggling to resolve).
I do that too, I run everything in rootless podman managed by systemd units it's quite nice. With systemd network activation I could even save the cost of user space networking, though for my single user use case, it's not really needed and for now I could not bother.
I also have Quadlet on my backlog, I'm waiting the release of next stable version of Debian (which I think should be released sometimes this year) as the current version of Debian has a podman slightly too old which doesn't include Quadlet
- is there downtime? (old service down, new service hasn't started yet)
- does it do health checks before directing traffic? (the process is up, but its HTTP service hasn't initialized yet)
- what if the new process fails to start, how do you rollback?
Or it's solved with nginx which sits in front of the containers? Or systemd has a builtin solution? Articles like this often omit such details. Or no one cares about occasional downtimes?
If your site has almost no users, the approach outlined in the article is viable. In all other cases, if my site were greeting visitors with random errors throughout the day, I'd consider that a pretty poor job.
At https://controlplane.com we give you the power of Kubernetes without the toil of k8s. A line of code gets you a tls terminated endpoint that is geo routed to any cloud region and on-prem location. We created the Global Virtual Cloud that let's you run compute on any cloud, on premises hardware or vm's and any combination. I left vmware to start the company because the cognitive load on engineers was becoming ridiculous. Logs, metrics, tracing, service discovery, TLS, DNS, service mesh, network tunnels and much more - we made it easy. We do to the cloud what vmware did to hardware - you don't care what underlying cloud you're on. Yet you can use ANY backing service of AWS, GCP and Azure - as if they merged and your workloads are portable - they run unmodified anywhere and can consume any combination of services like RDS, Big Query, Cosmos db and any other. It is as if the cloud providers decided to merge and then lower your cost by 60-80%.
Whoever thought running your personal blog on Kubernetes is a good idea?
Kubernetes is good for large scale applications. Nothing else. I do not get why even mid-sized companies take the burden of Kubernetes although they have low infrastructur needs.
Its a tool that lets you punch way above your weight resource wise. As long as you dont manage the control plane, if you already know k8s there is very little reason not to use it. Otherwise you end up replicating what it does piecemeal.
There are a lot of reasons not to use a complex additional layer, like k8s, even if _you_ know it inside out. Long-term maintainability for example. This is especially important for low traffic sites, where it does not pay out to maintain them every 1-2 years. And it adds an additional point of failure. Trust me, I've maintained some applications running on k8s. They did always fail due to the k8s setup, not the application itself.
Why do I need to trust you when I can trust myself and the clusters Ive maintained : D
Until something better comes I will start with k8s 100% of the time for production systems. The minor one time pains getting it stood up are worth it compared to the migration later and everything is in place waiting to be leveraged.
Article about podman, what systemd has to do with anything?
What's wrong with docker + watchtower?
How is is possible to both use gitops and not "remember" which flags you used for containers?
I remember seeing a project in development that built k8s-like orchestration on top of systemd a couple years ago, letting you control applications across nodes and the nodes themselves with regular systemd config files and I have been unable to find it again. IIRC it was either a Redhat project or hosted under github.com/containers and looked semi-official.
Anyone knows what I’m talking about? Is it still alive?
EDIT: it’s not CoreOS/Fleet, it’s something much more recent, but was still in early alpha state when I found it.
Maybe you're thinking of BlueChi[0]? It used to be Red Hat project called 'hirte' and was renamed[1] and moved to the Eclipse foundation for whatever reason. It lets you control systemd units across nodes and works with the normal systemd tools (D-bus etc).
I've had a really great experience with docker compose and systemd units. I use a generic systemd unit and it's just as easy as setting up users then `systemctl enable --now docker-compose@service-name`.
Couldn’t it be argued that this is a bug in Kubernetes?
Most of what it does is run programs with various cgroups and namespaces, which is what systemd does, so should it really be any more resource intensive?
To suggest that Kubernetes doesn't fall under the scope of "battle tested" is a bit misleading. As much as systemd? Of course not. But it's not 2016 anymore and we have billions of collective operational hours with Kubernetes.
Just the other day I found a cluster I had inadvertently left running on my macBook using Kind. It literally had three weeks of uptime (running a full stack of services) and everything was still working, even with the system getting suspended repeatedly.
I once discovered that I had left a toy project postgres instance running on my macbook for two and a half years. Everything was working perfectly, and this was more than a decade ago, on an intel mac.
Ha, I figured someone was going to come with a retort like that. Fair, but it would be better to compare it to running a distributed postgres cluster on your local mac.
CoreOS (now part of redhat) offered "fleet" which was multi-computer container orchestration, and used systemd style fleet files and raft protocol to do leader election etc (back when raft was still a novel concept). We ran it in production for about a year (2015) before it became abundantly clear (2016) the future was Kubernetes. I think the last security update Fleet got was Feb 2017 so it's pretty much dead for new adoption, if anyone is still using it.
But TL;DR we've already done systemd style container orchestration, and it was ok.
I ran it for a couple years. While it had some quirks at the time, it (and the rest of the Hashi stack) were lightweight, nicely integrated, and quite pleasant.
However, it’s no longer open source. Like the rest of Hashicorp’s stuff.
That's the most disgusting sentence fragment I've ever heard. I wish it could be sent back in a localized wormhole and float across the table when Systemd was being voted on to become the next so-called init system.
Edit: Nevermind, I misunderstood the article from just the headline. But I'm keeping the comment as I find the reference funny
I share the author's sentiment completely. At my day job, I manage multiple Kubernetes clusters running dozens of microservices with relative ease. However, for my hobby projects—which generate no revenue and thus have minimal budgets—I find myself in a frustrating position: desperately wanting to use Kubernetes but unable to due to its resource requirements. Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM.
This limitation creates numerous headaches. Instead of Deployments, I'm stuck with manual docker compose up/down commands over SSH. Rather than using Ingress, I have to rely on Traefik's container discovery functionality. Recently, I even wrote a small script to manage crontab idempotently because I can't use CronJobs. I'm constantly reinventing solutions to problems that Kubernetes already solves—just less efficiently.
What I really wish for is a lightweight alternative offering a Kubernetes-compatible API that runs well on inexpensive VPS instances. The gap between enterprise-grade container orchestration and affordable hobby hosting remains frustratingly wide.
> What I really wish for is a lightweight alternative offering a Kubernetes-compatible API that runs well on inexpensive VPS instances. The gap between enterprise-grade container orchestration and affordable hobby hosting remains frustratingly wide.
Depending on how much of the Kube API you need, Podman is that. It can generate containers and pods from Kubernetes manifests [0]. Kind of works like docker compose but with Kubernetes manifests.
This even works with systemd units, similar to how it's outlined in the article.
Podman also supports most (all?) of the Docker api, thus docker compose, works, but also, you can connect to remote sockets through ssh etc to do things.
[0] https://docs.podman.io/en/latest/markdown/podman-kube-play.1...
[1] https://docs.podman.io/en/latest/markdown/podman-systemd.uni...
The docs don't make it clear, can it do "zero downtime" deployments? Meaning it first creates the new pod, waits for it to be healthy using the defined health checks and then removes the old one? Somehow integrating this with service/ingress/whatever so network traffic only goes to the healthy one?
I can't speak on it's capabilities, but I feel like I have to ask: for what conceivable reason would you even want that extra error potential with migrations etc?
It means you're forced to make everything always compatible between versions etc.
For a deployment that isn't even making money and is running on a single node droplet with basically no performance... Why?
> I can't speak on it's capabilities, but I feel like I have to ask: for what conceivable reason would you even want that extra error potential with migrations etc?
It's the default behavior of a kubernetes deployment which we're comparing things to.
> It means you're forced to make everything always compatible between versions etc.
For stateless services, not at all. The outside world just keeps talking to the previous version while the new version is starting up. For stateful services, it depends. Often there are software changes without changes to the schema.
> For a deployment that isn't even making money
I don't like looking at 504 gateway errors
> and is running on a single node droplet with basically no performance
I'm running this stuff on a server in my home, it has plenty of performance. Still don't want to waste it on kubernetes overhead, though. But even for a droplet, running the same application 2x isn't usually a big ask.
GP talks about personal websites on 1vCPU, there's no point in zero downtime then. Apples to oranges.
Zero downtime doesn't mean redundancy here. It means that no request gets lost or interrupted due to a container upgrade.
The new container spins up while the old container is still answering requests and only when the new container is running and all requests to the old container are done, then the old container gets discarded.
You can use firecracker !
Have you seen k0s or k3s? Lots of stories about folks using these to great success on a tiny scale, e.g. https://news.ycombinator.com/item?id=43593269
I use k3s. With more than more master node, it's still a resource hog and when one master node goes down, all of them tend to follow. 2GB of RAM is not enough, especially if you also use longhorn for distributed storage. A single master node is fine and I haven't had it crash on me yet. In terms of scale, I'm able to use raspberry pis and such as agents so I only have to rent a single €4/month vps.
I tried k3s but even on an immutable system dealing with charts and all the other kubernetes stuff adds a new layer of mutability and hence maintenance, update, manual management steps that only really make sense on a cluster, not a single server.
If you're planning to eventually move to a cluster or you're trying to learn k8s, maybe, but if you're just hosting a single node project it's a massive effort, just because that's not what k8s is for.
I'm laughing because I clicked your link thinking I agreed and had posted similar things and it's my comment.
Still on k3s, still love it.
My cluster is currently hosting 94 pods across 55 deployments. Using 500m cpu (half a core) average, spiking to 3cores under moderate load, and 25gb ram. Biggest ram hog is Jellyfin (which appears to have a slow leak, and gets restarted when it hits 16gb, although it's currently streaming to 5 family members).
The cluster is exclusively recycled old hardware (4 machines), mostly old gaming machines. The most recent is 5 years old, the oldest is nearing 15 years old.
The nodes are bare Arch linux installs - which are wonderfully slim, easy to configure, and light on resources.
It burns 450Watts on average, which is higher than I'd like, but mostly because I have jellyfin and whisper/willow (self hosted home automation via voice control) as GPU accelerated loads - so I'm running an old nvidia 1060 and 2080.
Everything is plain old yaml, I explicitly avoid absolutely anything more complicated (including things like helm and kustomize - with very few exceptions) and it's... wonderful.
It's by far the least amount of "dev-ops" I've had to do for self hosting. Things work, it's simple, spinning up new service is a new folder and 3 new yaml files (0-namespace.yaml, 1-deployment.yaml, 2-ingress.yaml) which are just copied and edited each time.
Any three machines can go down and the cluster stays up (metalLB is really, really cool - ARP/NDP announcements mean any machine can announce as the primary load balancer and take the configured IP). Sometimes services take a minute to reallocate (and jellyfin gets priority over willow if I lose a gpu, and can also deploy with cpu-only transcoding as a fallback), and I haven't tried to be clever getting 100% uptime because I mostly don't care. If I'm down for 3 minutes, it's not the end of the world. I have a couple of commercial services in there, but it's free hosting for family businesses, they can also afford to be down an hour or two a year.
Overall - I'm not going back. It's great. Strongly, STRONGLY recommend k3s over microk8s. Definitely don't want to go back to single machine wrangling. The learning curve is steeper for this... but man do I spend very little time thinking about it at this point.
I've streamed video from it as far away as literally the other side of the world (GA, USA -> Taiwan). Amazon/Google/Microsoft have everyone convinced you can't host things yourself. Even for tiny projects people default to VPS's on a cloud. It's a ripoff. Put an old laptop in your basement - faster machine for free. At GCP prices... I have 30k/year worth of cloud compute in my basement, because GCP is a god damned rip off. My costs are $32/month in power, and a network connection I already have to have, and it's replaced hundreds of dollars/month in subscription costs.
For personal use-cases... basement cloud is where it's at.
> It burns 450Watts on average
To put that into perspective, that's more than my entire household including my server that has an old GPU in it
Water heating is electric yet we still don't use 450W×year≈4MWh of electricity. In winter we just about reach that as a daily average (as a household) because we need resistive heating to supplement the gas system. Constantly 450W is a huge amount of energy for flipping some toggles at home with voice control and streaming video files
That's also only four and a half incandescent lightbulbs. Not enough to heat your house ;)
Remember that modern heating and hot water systems have a >1 COP, meaning basically they provide more heat than the input power. Air-sourced heat pumps can have a COP of 2-4, and ground source can have 4-5, meaning you can get around 1800W of heat out of that 450W of power. That's ignoring places like Iceland where geothermal heat can give you effectively free heat. Ditto for water heating, 2-4.5 COP.
Modern construction techniques including super insulated walls and tight building envelops, heat exchangers, can dramatically reduce heating and cooling loads.
Just saying it's not as outrageous as it might seem.
> Remember that modern heating and hot water systems have a >1 COP, meaning basically they provide more heat than the input power.
Oh for sure! Otherwise we'd be heating our homes directly with electricity.
Thanks for putting concrete numbers on it!
And yet it's far more economical for me than paying for streaming services. A single $30/m bill vs nearly $100/m saved after ditching all the streaming services. And that's not counting the other saas products it replaced... just streaming.
Additionally - it's actually not that hard to put this entire load on solar.
4x350watt panels, 1 small inverter/mppt charger combo and a 12v/24v battery or two will do you just fine in the under $1k range. Higher up front cost - but if power is super expensive it's a one time expense that will last a decade or two, and you get to feel all nice and eco-conscious at the same time.
Or you can just not run the GPUs, in which case my usage falls back to ~100w. I You can drive lower still - but it's just not worth my time. It's only barely worth thinking about at 450W for me.
I'm not saying it should be cheaper to run this elsewhere, I'm saying that this is a super high power draw for the utility it provides
My own server doesn't run voice recognition so I can't speak to that (I can only opine that it can't be worth a constant draw of 430W to get rid of hardware switches and buttons), but my server also does streaming video and replaces SaaS services, so similar to what you mention, at around 20W
Found the European :) With power as cheap as it is in the US, some of us just haven't had to worry about this as much as we maybe should. My rack is currently pulling 800W and is mostly idle. I have a couple projects in the works to bring this down, but I really like mucking around with old enterprise gear and that stuff is very power hungry.
Dell R720 - 125W
Primary NAS - 175W
Friend's Backup NAS - 100W
Old i5 Home Server - 100W
Cisco 2921 VoIP router - 80W
Brocade 10G switch - 120W
Various other old telecom gear - 100W
I care about the cost far less than the environmental impact. I guess that's also a European tell?
Perhaps. Many people in America also claim to care about the environmental impact of a number of things. I think many more people care performatively than transformatively. Personally, I don't worry too much about it. It feels like a lost cause and my personal impact is likely negligible in the end.
Then offsetting that cost to a cloud provider isn't any better.
450W just isn't that much power as far as "environmental costs" go. It's also super trivial to put on solar (actually my current project - although I had to scale the solar system way up to make ROI make sense because power is cheap in my region). But seriously, panels are cheap, LFP batteries are cheap, inverters/mppts are cheap. Even in my region with the cheap power, moving my house to solar has returns in the <15 years range.
> Then offsetting that cost to a cloud provider isn't any better.
Nobody made that claim
> 450W just isn't that much power as far as "environmental costs" go
It's a quarter of one's fair share per the philosophy of https://en.wikipedia.org/wiki/2000-watt_society
If you provide for yourself (e.g. run your IT farm on solar), by all means, make use of it and enjoy it. Or if the consumption serves others by doing wind forecasts for battery operators or hosts geographic data that rescue workers use in remote places or whatnot: of course, continue to do these things. In general though, most people's home IT will fulfil mostly their own needs (controlling the lights from a GPU-based voice assistant). The USA and western Europe have similarly rich lifestyles but one has a more than twice as great impact on other people's environment for some reason (as measured by CO2-equivalents per capita). We can choose for ourselves what role we want to play, but we should at least be aware that our choices make a difference
> My rack is currently pulling 800W and _is mostly idle_.
Emphasis mine. I have a rack that draws 200w continuously and I don't feel great about it, even though I have 4.8kW of panels to offset it.
It absolutely is. Americans dgaf, they're driving gas guzzles on subsidized gas and cry when it comes close to half the cost of normal countries.
In America, taxes account for about a fifth of the price of a unit of gas. In Europe, it varies around half.
The remaining difference in cost is boosted by the cost of ethanol, which is much cheaper in the US due to abundance of feedstock and heavy subsidies on ethanol production.
The petrol and diesel account for a relatively small fraction on both continents. The "normal" prices in Europe aren't reflective of the cost of the fossil fuel itself. In point of fact, countries in Europe often have lower tax rates on diesel, despite being generally worse for the environment.
Good ol 'murica bad' strawmen.
Americans drive larger vehicles because our politicians stupidly decided mandating fuel economy standards was better than a carbon tax. The standards are much laxer for larger vehicles. As a result, our vehicles are huge.
Also, Americans have to drive much further distances than Europeans, both in and between cities. Thus gas prices that would be cheap to you are expensive to them.
Things are the way they are because basic geography, population density, and automotive industry captured regulatory and zoning interests. You really can't blame the average American for this; they're merely responding to perverse incentives.
How is this in any way relevant to what I said? You're just making excuses, but that doesn't change the fact that americans don't give a fuck about the climate, and they objectively pollute far more than those in normal countries.
If you can't see how what I said was relevant, perhaps you should work on your reading comprehension. At least half of Americans do care about the climate and the other half would gladly buy small trucks (for example) if those were available.
It's lazy to dunk on America as a whole, go look at the list of countries that have met their climate commitments and you'll see it's a pretty small list. Germany reopening coal production was not on my bingo card.
I run a similar number of services on a very different setup. Administratively, it’s not idempotent but Proxmox is a delight to work with. I have 4 nodes, with a 14900K CPU with 24 cores being the workhorse. It runs a Windows server with RDP terminal (so multiple users can get access windows through RDP and literally any device), Jellyfin, several Linux VMs, a pi-hole cluster (3 replicas), just to name a few services. I have vGPU passthrough working (granted, this bit is a little clunky).
It is not as fancy/reliable/reproducible as k3s, but with a bunch of manual backups and a ZFS (or BTRFS) storage cluster (managed by a virtualized TrueNAS instance), you can get away with it. Anytime a disk fails, just replace and resilver it and you’re good. You could configure certain VMs for HA (high availability) where they will be replicated to other nodes that can take over in the event of a failure.
Also I’ve got tailscale and pi-hole running as LXC containers. Tailscale makes the entire setup accessible remotely.
It’s a different paradigm that also just works once it’s setup properly.
I have a question if you don't mind answering. If I understand correctly, Metallb on Layer 2 essentially fills the same role as something like Keepalived would, however without VRRP.
So, can you use it to give your whole cluster _one_ external IP that makes it accessible from the outside, regardless of whether any node is down?
Imo this part is what can be confusing to beginners in self hosted setups. It would be easy and convenient if they could just point DNS records of their domain to a single IP for the cluster and do all the rest from within K3s.
Yes. I have configured metalLB with a range of IP addresses on my local LAN outside the range distributed by my DHCP server.
Ex - DHCP owns 10.0.0.2-10.0.0.200, metalLB is assigned 10.0.0.201-10.0.0.250.
When a service requests a loadbalancer, metallb spins up a service on any given node, then uses ARP to announce to my LAN that that node's mac address is now that loadbalancer's ip. Internal traffic intended for that IP will now resolve to the node's mac address at the link layer, and get routed appropriately.
If that node goes down, metalLB will spin up again on a remaining node, and announce again with that node's mac address instead, and traffic will cut over.
It's not instant, so you're going to drop traffic for a couple seconds, but it's very quick, all things considered.
It also means that from the point of view of my networking - I can assign a single IP address as my "service" and not care at all which node is running it. Ex - if I want to expose a service publicly, I can port forward from my router to the configured metalLB loadbalancer IP, and things just work - regardless of which nodes are actually up.
---
Note - this whole thing works with external IPs as well, assuming you want to pay for them from your provider, or IPV6 addresses. But I'm cheap and I don't pay for them because it requires getting a much more expensive business line than I currently use. Functionally - I mostly just forward 80/443 to an internal IP and call it done.
Thank you so much for the detailed explanation!
That sounds so interesting and useful that you've convinced me to try it out :)
450W is ~£100 monthly. It's a luxury budget to host hobby stuff in a cloud.
It’s $30 in my part of the US. Less of a luxury.
We used to pay AU$30 for the entire house which included everything except cooking but it did include a 10 year 1RU rack Mount server. Electricity isn't particularly cheap here.
How do you deal with persistent volumes for configuration, state, etc? That’s the bit that has kept me away from k3s (I’m running Proxmox and LXC for low overhead but easy state management and backups).
Longhorn.io is great.
Yeah, but you have to have some actual storage for it, and that may not be feasible across all nodes in the right amounts.
Also, replicated volumes are great for configuration, but "big" volume data typically lives on a NAS or similar, and you do need to get stuff off the replicated volumes for backup, so things like replicated block storage do need to expose a normal filesystem interface as well (tacking on an SMB container to a volume just to be able to back it up is just weird).
Sure - none of that changes that longhorn.io is great.
I run both an external NAS as an NFS service and longhorn. I'd probably just use longhorn at this point, if I were doing it over again. My nodes have plenty of sata capacity, and any new storage is going into them for longhorn at this point.
I back up to an external provider (backblaze/wasabi/s3/etc). I'm usually paying less than a dollar a month for backups, but I'm also fairly judicious in what I back up.
Yes - it's a little weird to spin up a container to read the disk of a longhorn volume at first, but most times you can just use the longhorn dashboard to manage volume snapshots and backup scheduling as needed. Ex - if you're not actually trying to pull content off the disk, you don't ever need to do it.
If you are trying to pull content off the volume, I keep a tiny ssh/scp container & deployment hanging around, and I just add the target volume real fast, spin it up, read the content I need (or more often scp it to my desktop/laptop) and then remove it.
Do you have documentation somewhere, where you can share ?
I do things somewhat similarly but still rely on Helm/customize/ArgoCD as it's what I know best. I don't have a documentation to offer, but I do have all of it publicly at https://gitlab.com/lama-corp/infra/infrastructure It's probably a bit more involved than your OP's setup as I operate my own AS, but hopefully you'll find some interesting things in there.
You should look into fluxcd this stuff makes a lot of stuff even simpler.
"Basement Cloud" sounds like either a dank cannabis strain, or an alternative British rock emo grunge post-hardcore song. As in "My basement cloud runs k420s, dude."
https://www.youtube.com/watch?v=K-HzQEgj-nU
Or microk8s. I'm curious what it is about k8s that is sucking up all these resources. Surely the control plane is mostly idle when you aren't doing things with it?
There are 3 components to "the control plane" and realistically only one of them is what you meant by idle. The Node-local kubelet (that reports in the state of affairs and asks if there is any work) is a constantly active thing, as one would expect from such a polling setup. The etcd, or it's replacement, is constantly(?) firing off watch notifications or reconciliation notifications based on the inputs from the aforementioned kubelet updates. Only the actual kube-apiserver is conceptually idle as I'm not aware of any compute that it, itself, does only in response to requests made of it
Put another way, in my experience running clusters, in $(ps auwx) or its $(top) friend always show etcd or sqlite generating all of the "WHAT are you doing?!" and those also represent the actual risk to running kubernetes since the apiserver is mostly stateless[1]
1: but holy cow watch out for mTLS because cert expiry will ruin your day across all of the components
I've noticed that etcd seems to do an awful lot of disk writes, even on an "idle" cluster. Nothing is changing. What is it actually doing with all those writes?
Almost certainly it's the propagation of the kubelet checkins rippling through etcd's accounting system[1]. Every time these discussions come up I'm always left wondering "I wonder if Valkey would behave the same?" or Consul (back when it was sanely licensed). But I am now convinced after 31 releases that the pluggable KV ship has sailed and they're just not interested. I, similarly, am not yet curious enough to pull a k0s and fork it just to find out
1: related, if you haven't ever tried to run a cluster bigger than about 450 Nodes that's actually the whole reason kube-apiserver --etcd-servers-overrides exists because the torrent of Node status updates will knock over the primary etcd so one has to offload /events into its own etcd
How hard is it to host a Postgres server on one node and access it from another?
I deployed CNPG (https://cloudnative-pg.io/ ) on my basement k3s cluster, and was very impressed with how easy I could host a PG instance for a service outside the cluster, as well as good practices to host DB clusters inside the cluster.
Oh, and it handles replication, failover, backups, and a litany of other useful features to make running a stateful database, like postgres, work reliably in a cluster.
It’s Kubernetes, out of the box.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM
I hate sounding like an Oracle shill, but Oracle Cloud's Free Tier is hands-down the most generous. It can support running quite a bit, including a small k8s cluster[1]. Their k8s backplane service is also free.
They'll give you 4 x ARM64 cores and 24GB of ram for free. You can split this into 1-4 nodes, depending on what you want.
[1] https://www.oracle.com/cloud/free/
One thing to watch out for is that you pick your "home region" when you create your account. This cannot be changed later, and your "Always Free" instances can only be created in your home region (the non-free tier doesn't have that restriction).
So choose your home region carefully. Also, note that some regions have multiple availability domains (OCI-speak for availability zones) but some only have one AD. Though if you're only running one free instance then ADs don't really matter.
A bit of a nitpick. You get monthly credit for 4c/24gb on ARM, no matter the region. So even if you chose your home region poorly, you can run those instances in any region and only be on the hook for the disk cost. I found this all out the hard way, so I'm paying $2/month to oracle for my disks.
I don't know the details but I know I made this mistake and I still have my Free Tier instances hosted in a different region then my home. It's charged me a month of $1 already so I'm pretty sure it's working.
the catch is: no commercial usage and half the time you try to spin up an instance itll tell you theres no room left
That limitation (spinning up an instance) only exists if you don't put a payment card in. If you put a payment card in, it goes away immediately. You don't have to actually pay anything, you can provision the always free resources, but obviously in this regard you have to ensure that you don't accidentally provision something with cost. I used terraform to make my little kube cluster on there and have not had a cost event at all in over 1.5 years. I think at one point I accidentally provisioned a volume or something and it cost me like one cent.
> no commercial usage
I think that's if you are literally on their free tier, vs. having a billable account which doesn't accumulate enough charges to be billed.
Similar to the sibling comment - you add a credit card and set yourself up to be billed (which removes you from the "free tier"), but you are still granted the resources monthly for free. If you exceed your allocation, they bill the difference.
Honestly I’m surprised they even let you provision the resources without a payment card. Seems ripe for abuse
A credit card is required for sign up but it won't be set up as a billing card until you add it. One curious thing they do is though, the free trial is the only entry way to create a new cloud account. You can't become a nonfree customer from the get go. This is weird because their free trial signup is horrible. The free trial is in very high demand so understandably they refuse a lot of accounts which they would probably like as nonfree customers.
I would presume account sign up is a loss leader in order to get ~spam~ marketing leads, and that they don't accept mailinator domains
They also, like many other cloud providers, need a real physical payment card. No privacy.com stuff. No virtual cards. Of course they don’t tell you this outright, because obscurity fraud blah blah blah, but if you try to use any type of virtual card it’s gonna get rejected. And if your naïve ass thought you could pay with the virtual card you’ll get a nice lesson in how cloud providers deal with fraud. They’ll never tell you that virtual cards aren’t allowed, because something something fraud, your payment will just mysteriously fail and you’ll get no guidance as to what went wrong and you have to basically guess it out.
This is basically any cloud provider by the way, not specific to Oracle. Ran into this with GCP recently. Insane experience. Pay with card. Get payment rejected by fraud team after several months of successful same amount payments on the same card and they won’t tell what the problem is. They ask for verification. Provide all sorts of verification. On the sixth attempt, send a picture of a physical card and all holds removed immediately
It’s such a perfect microcosm capturing of dealing with megacorps today. During that whole ordeal it was painfully obvious that the fraud team on the other side were telling me to recite the correct incantation to pass their filters, but they weren’t allowed to tell me what the incantation was. Only the signals they sent me and some educated guesswork were able to get me over the hurdle
> send a picture of a physical card and all holds removed immediately
So you're saying there's a chance to use a prepaid card if you can copy it's digits onto a real looking plastic card? Lol
Unironically yes. The (real) physical card I provided was a very cheap looking one. They didn’t seem to care much about its look but rather the physicality of it
Using AWS with virtual debit cards all right. Revolut cards work fine for me. What may also be a differentiator: Phone number used for registration is registered also for an account already having an established track record, and has a physical card for payments. (just guessing)
>No privacy.com stuff. No virtual cards.
I used a privacy.com Mastercard linked to my bank account for Oracle's payment method to upgrade to PAYG. It may have changed, this was a few months ago. Set limit to 100, they charged and reverted $100.
There are tons of horror stories about OCI's free tier (check r/oraclecloud on reddit, tl;dr: your account may get terminated at any moment and you will lose access to all data with no recovery options). I wouldn't suggest putting anything serious on it.
They will not even bother sending you an email explaining why, and you will not be able to ask it, because the system will just say your password is incorrect when you try to login or reset it.
If you are on free tier, they have nothing to lose, only you, so be particular mindful of making a calendar note for changing your CC before expiration and things like that.
It’s worth paying for another company just for the peace of mind of knowing they will try to persuade you to pay before deleting your data.
Are all of those stories related to people who use it without putting any payment card in? I’ve been happily siphoning Larry Ellisons jet fuel pennies for a good year and a half now and have none of these issues because I put a payment card in
Be careful about putting a payment card in too.
https://news.ycombinator.com/item?id=42902190
which links to:
https://news.ycombinator.com/item?id=29514359 & https://news.ycombinator.com/item?id=33202371
Good call out. I used the machines defined here and have never had any sort of issue like those links describe: https://github.com/jpetazzo/ampernetacle
Nope, my payment method was already entered.
IME, the vast majority of those horror stories end up being from people who stay in the "trial" tier and don't sign up for pay-as-you-go (one extra, easy step) and Oracle's ToS make it clear that trial accounts an resources can and do get terminated at any time. And at least some of those people admitted, with some prodding, that they were also trying to do torrents or VPNs to get around geographical restrictions.
But yes, you should always have good backups and a plan B with any hosting/cloud provider you choose.
Can confirm (old comment of mine saying the same https://news.ycombinator.com/item?id=43215430)
I recenlty wrote a guide on how to create a free 3 node cluster in Oracle cloud : https://macgain.net/posts/free-k8-cluster . This guide currently uses kubeadm to create 3 node (1 control plane, 2 worker nodes) cluster.
Just do it like the olden days, use ansible or similar.
I have a couple dedicated servers I fully manage with ansible. It's docker compose on steroids. Use traefik and labeling to handle reverse proxy and tls certs in a generic way, with authelia as simple auth provider. There's a lot of example projects on github.
A weekend of setup and you have a pretty easy to manage system.
What is the advantage of traefik over oldschool Nginx?
Traefik has some nice labeling for docker that allows you to colocate your reverse proxy config with your container definition. It's slightly more convenient than NGINX for that usecase with compose. It effectively saves you a dedicated vietualhost conf by setting some labels.
One can read more here: https://doc.traefik.io/traefik/routing/providers/docker/
This obviously has some limits and becomes significantly less useful when one requires more complex proxy rules.
Basically what c0balt said.
It's zero config and super easy to set everything up. Just run the traefik image, and add docker labels to your other containers. Traefik inspects the labels and configures reverse proxy for each. It even handles generating TLS certs for you using letsencrypt or zerossl.
I thought this context was outside of Docker, because they used ansible as docker compose alternative. But maybe I misunderstood.
Ah yeah I guess I wasn't clear. I meant use ansible w/ the docker_container command. It's essentially docker compose - I believe they both use docker.py.
Ah yes, makes much more sense.
I created a script that reads compose annotations and creates config for cloudflare tunnel and zero trust apps. Allows me to reach my services on any device without VPN and without exposing them on the internet.
There's very little advantage IMO. I've used both. I always end up back at Nginx. Traefik was just another configuration layer that got in the way of things.
Traefik is waaay simpler - 0 config, just use docker container labels. There is absolutely no reason to use nginx these days.
I should know, as I spent years building and maintaining a production ingress controller for nginx at scale, and I'd choose Traefik every day over that.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM.
That's more than what I'm paying for far fewer resources than Hetzner. I'm paying about $8 a month for 4 vCPUs and 8GB of RAM: https://www.hetzner.com/cloud
Note that the really affordable ARM servers are German only, so if you're in the US you'll have to deal with higher latency to save that money, but I think it's worth it.
I recently set up an arm64 VPS at netcup: https://www.netcup.com/en/server/arm-server Got it with no location fee (and 2x storage) during the easter sale but normally US is the cheapest.
That's pretty cheap. I have 4 vCPUs, 8GB RAM, 80GB disk, and 20TB traffic for €6. NetCup looks like it has 6VCPU, 8GB RAM, 256 GB, and what looks like maybe unlimited traffic for €5.26. That's really good. And it's in the US, where I am, so SSH would be less painful. I'll have to think about possibly switching. Thanks for the heads up.
Thank you for sharing this. Do you have a referral link we can use to give you a little credit for informing us?
Sure, if you still want it: https://hetzner.cloud/?ref=WwByfoEfJJdv
I guess it gives you 20 euros in credit, too. That's nice.
> I'm constantly reinventing solutions to problems that Kubernetes already solves—just less efficiently.
But you've already said yourself that the cost of using K8s is too high. In one sense, you're solving those solutions more efficiently, it just depends on the axis you use to measure things.
The original statement is ambiguous. I read it as "problems that k8s already solves -- but k8s is less efficient, so can't be used".
That picture with the almost-empty truck seems to be the situation that he describes. He wants the 18 wheeler truck, but it is too expensive for just a suitcase.
I've been using Docker swarm for internal & lightweight production workloads for 5+ years with zero issues. FD: it's a single node cluster on a reasonably powerful machine, but if anything, it's over-specced for what it does.
Which I guess makes it more than good enough for hobby stuff - I'm playing with a multi-node cluster in my homelab and it's also working fine.
I think Docker Swarm makes a lot of sense for situations where K8s is too heavyweight. "Heavyweight" either in resource consumption, or just being too complex for a simple use case.
The only problem is Docker Swarm is essentially abandonware after Docker was acquired by Mirantis in 2019. Core features still work but there is a ton of open issues and PRs which are ignored. It's fine if it works but no one cares if you found a bug or have ideas on how to improve something, even worse if you want to contribute.
Yep it's unfortunate, "it works for me" until it doesn't.
OTOH it's not a moving target. Docker historically has been quite infamous for that, we were talking about half-lives for features, as if they were unstable isotopes. It took initiatives like OCI to get things to settle.
K8s tries to solve the most complex problems, at the expense of leaving simple things stranded. If we had something like OCI for clustering, it would most likely take the same shape.
Podman is a fairly nice bridge. If you are familiar with Kubernetes yaml, it is relatively easy to do docker-compose like things except using more familiar (for me) K8s yaml.
In terms of the cloud, I think Digital Ocean costs about $12 / month for their control plane + a small instance.
I found k3s to be a happy medium. It feels very lean and works well even on a Pi, and scales ok to a few node cluster if needed. You can even host the database on a remote mysql server, if local sqlite is too much IO.
NixOS works really well for me. I used to write these kinds of idempotent scripts too but they are usually irrelevant in NixOS where that's the default behavior.
And regarding this part of the article
> Particularly with GitOps and Flux, making changes was a breeze.
i'm writing comin [1] which is GitOps for NixOS machines: you Git push your changes and your machines fetch and deploy them automatically.
[1] https://github.com/nlewo/comin
I run my private stuff on a hosted vultr k8s cluster with 1 node for $10-$20 a month. All my hobby stuff is running on that "personal cluster" and it is that perfect sweetspot for me that you're talking about
I don't use ingresses or loadbalancers because those cost extra, and either have the services exposed through tailscale (with tailscale operator) for stuff I only use myself, or through cloudflare argo tunnels for stuff I want internet accessible
(Once a project graduates and becomes more serious, I migrate the container off this cluster and into a proper container runner)
This is exactly why I built https://canine.sh -- basically for indie hackers to have the full experience of Heroku with the power and portability of Kubernetes.
For single server setups, it uses k3s, which takes up ~200MB of memory on your host machine. Its not ideal, but the pain of trying to wrangle docker deployments, and the cheapness of hetzner made it worth it.
How does it compare to Coolify and Dokploy?
Neither of those use kubernetes unfortunately, the tool has kind of a bad rap, but every company I’ve worked at has eventually migrated on to kubernetes
Sure, I'm looking for more of a personal project use case where it doesn't much matter to me whether it uses Kubernetes or not, I'm more interested in concrete differences.
Ah yeah then I’d say the biggest difference is the fact that it can use to helm to install basically anything in the world to your cluster
The solution to this is to not solve all the problems a billion dollar tech does on a personnal project.
Let it not be idempotent. Let it crash sometimes.
We lived without kubs for years and the web was ok. Your users will survive.
Yeah, unless you're doing k8s for the purpose of learning job skills, it's way overkill. Just run a container with docker, or a web server outside a container if it's a website. Way easier and it will work just fine.
> I'm stuck with manual docker compose up/down commands over SSH
Out of curiosity, what is so bad about this for smaller projects?
Just go with a cloud provider that offers free control plane and shove a bunch of side projects into 1 node. I end up around $50 a month on GCP (was a bit cheaper at DO) once you include things like private docker registry etc.
The marginal cost of an additional project on the cluster is essentially $0
I’ve been using https://www.coolify.io/ self hosted. It’s a good middle ground between full blown k8s and systemd services. I have a home lab where I host most of my hobby projects though. So take that into account. You can also use their cloud offering to connect to VPSs
I've ran K3s on a couple of Raspberry Pi's as a homelab in the past. It's lightweight and ran nice for a few years, but even so, one Pi was always dedicated as controller, which seemed like a waste.
Recently I switched my entire setup (few Pi's, NAS and VM's) to NixOS. With Colmena[0] I can manage/update all hosts from one directory with a single command.
Kubernetes was a lot of fun, especially the declarative nature of it. But for small setups, where you are still managing the plumbing (OS, networking, firewall, hardening, etc) yourself, you still need some configuration management. Might as well put the rest of your stuff in there also.
[0] https://colmena.cli.rs/unstable/
6$/m - will likely bring you peace of mind - Netcup hosting VPS 1000 ARM G11
They also have regular promotions that offer e.g. double the disk space.
There you get
for 6 $ / m - traffic inclusive. You can choose between "6 vCore ARM64, 8 GB RAM" and "4 vCore x86, 8 GB ECC RAM" for the same price. And much more, of course.https://www.netcup.com/en/server/vps
I'm a cheapskate too, but at some point, the time you spend researching cheap hosting, signing up and getting deployed is not worth the hassle of paying a few more $ on bigger boxes.
It’s been a couple of years since I’ve last used it, but if you want container orchestration with a relatively small footprint, maybe Hashicorp Nomad (perhaps in conjunction with Consul and Traefik) is still an option. These were all single binary tools. I did not personally run them on 2G mem VPSes, but it might still be worthwhile for you to take a look.
It looks like Nomad has a driver to run software via isolated fork/exec, as well, in addition to Docker containers.
I am curious why your no revenue projects need the complexity, features and benefits of something like Kubernetes. Why you cannot just to it the archaic way of compiling your app, copy the files to a folder and run it there and never touch it for the next 5 years. If it is a dev environment with many changes, its on a local computer, not on VPS, I guess. Just curious by nature, I am.
The thing is, most of those enterprise-grade container orchestrations probably don't need k8s either.
The more I look into it, the more I think of k8s as a way to "move to micro services" without actually moving to micro services. Loosely coupled micro services shouldn't need that level of coordination if they're truly loosely coupled.
Have you tried nixOS? I feel like it solves the functional aspect you're looking for.
I believe that Kubernetes is something you want to use if you have 1+ SRE full-time on your team. I actually got tired with complexity of kubernetes, AWS ECS and docker as well and just build a tool to deploy apps natively on the host. What's wrong with using Linux native primitives - systemd, crontab, postgresql or redis native package? Whose should work as intended, you don't need them in container.
> Kubernetes is simply too resource-intensive to run on a $10/month VPS with just 1 shared vCPU and 2GB of RAM
To put this in perspective, that’s less compute than a phone released in 2023, 12 years ago, Samsung Galaxy S4. To find this level of performance in a computer, we have to go to
The main issue is that Kubernetes has created good API and primitives for managing cloud stuff, and managing a single server is still kinda crap despite decades of effort.
I had K3S on my server, but replaced with docker + Traefik + Portainer - it’s not great, but less idle CPU use and fewer moving parts
Please try https://github.com/skateco/skate, this is pretty much the exact same reason why I built it!
SSH up/down can be scripted.
Or maybe look into Kamal?
Or use Digital Ocean app service. Got integration, cheap, just run a container. But get your postgres from a cheaper VC funded shop :)
I really like `DOCKER_HOST=ssh://... docker compose up -d`, what do you miss about Deployments?
Why not just use something like Cloud Run? If you're only running a microVM deploying it there will probably be at or near free.
I developed a tiny wrapper around docker compose which work on my use case: https://github.com/daitangio/misterio
It can manage multiple machine with just ssh access and docker install.
I use Caprover to run about 26 services for personal projects on a Hetzner box. I like its simplicity. Worth it just for the one-click https cert management.
Why not minikube or one of the other resource-constrained k8s variants?
https://minikube.sigs.k8s.io/
Virtual Kubelet is one step forward towards Kubernetes as an API
https://github.com/virtual-kubelet/virtual-kubelet
Have you tried k3s? I think it would run on a tiny vps like that and is a full stack. Instead of etcd it has sqlite embedded.
> I'm constantly reinventing solutions to problems that Kubernetes already solves
Another way to look at this is the Kubernetes created solutions to problems that were already solved at a lower scale level. Crontabs, http proxies, etc… were already solved at the individual server level. If you’re used to running large coordinated clusters, then yes — it can seem like you’re reinventing the wheel.
For $10 you can buy VPS with a lot more resources than that on both Contabo and Ovh
I've used caprover a bunch
What about Portainer? I deploy my compose files via git using it.
Systemd gets a lot of hate but it really solves a lot of problems. People really shouldn't dismiss it. I think it really happened because when systemd started appearing on distros by default people were upset they had to change
Here's some cool stuff:
It's got a lot of stuff and it's (almost) all there already on your system! It's a bit annoying to learn, but it really isn't too bad if you really don't want to do anything too complicated. But in that case, it's not like there's a tool that doesn't require docs but allows you to do super complicated things.> Systemd gets a lot of hate but it really solves a lot of problems.
From my perspective, it got a lot of hate in its first few years (decade?), not because the project itself was bad -- on the contrary, it succeeded in spite of having loads of other issues, because it was so superior. The problem was the maintainer's attitude of wantonly breaking things that used to work just fine, without offering any suitable fixes.
I have an old comment somewhere with a big list. If you never felt the pain of systemd, it's either because you came late to the party, or because your needs always happened to overlap with the core maintainer's needs.
Found my comment:
https://news.ycombinator.com/item?id=21897993
I got one missing from your list - systemd would kill screen and tmux sessions by design: https://superuser.com/questions/1372963/how-do-i-keep-system...
From what I remember that's still the default in the project, but people stopped complaining because the individual distros started overriding the relevant settings.
Thanks for adding the perspective. I was much more of a casual user at the time so didn't see as much of this side. Just knew Arch always being Arch lol
Full ack. Systemd broke a lot of things that just worked. Combined with the maintainer's attitude this produced a lot of anti reaction.
It didn't win on being superior [1] but because it was either systemd or you don't get to to use GNOME 3.8. On more than one distro it was the reason for switching towards systemd.
I will fully admit though that upstart was worse (which is an achievement), but the solution space was not at all settled.
[1] systemd project tackles a lot of important problems, but the quality of implementation, experience of using it, working with it, etc are not really good, especially the further you get from simplest cookie cutter services - especially because both systemd handling of defaults is borked, documentation when you hit that maybe makes sense to author, and whoever is the bright soul behind systemctl kindly never make CLIs again (with worst example being probably systemctl show this-service-does-not-exist)
> systemd project tackles a lot of important problems
Fundamentally, this was it. SysV startup scripts had reached a local maximum decades earlier, and there was serious "overhang". When I said "superior", I really meant that it was superior to SysV, not that it was the best system that could have been imagined.
And I think the frustration was that, because it did solve so many problems, so many groups (like GNOME) were willing to switch over to it in spite of its warts; and this made it impossible for anyone who was seriously affected by its warts to avoid it. "If you don't like it, don't use it" not being an option was what drove so much of the vitriol, it seems to me.
As I said in that comment from 2019, if the maintainers had had Linus Torvald's commitment to backwards compatibility, I don't think there would have been any significant backlash.
Why did GNOME and basically all the large distros all jump with both feet in on using systemd? Because it was better. It was simply significantly better than all the alternatives. For the vast majority it was a no-brainer upgrade. The holdouts were the one's who had simple needs and were already happy with what they had. The rest of the world jumped on systemd. Because it was better.
GNOME and systemd teams were in many ways joined at the hip, and GNOME unilaterally decided that from 3.6 to 3.8 they would switch certain APIs from one already deployed widely (polkit and related) to one that was documented like north korea is democratic (logind) which also didn't work in isolation from systemd.
Trying to run GNOME 3.8 without logind caused significant problems and instabilities, trying to implement the same APIs turned out a futile endeavour though one OpenBSD guy got sufficiently motivated and kept patching GNOME for OpenBSD for years - though too late for the forced switch.
The large distros jumping "both feet" on systemd were essentially Fedora/Redhat (where it originated and who was employing major maintainers), and IIRC SuSE. Arch was still seen as something of niche and - crucially - was very neophyte about adopting systemd related ideas for significant amount of time with little regard for stability.
The holdouts were not just those who were happy with debian/redhat simplistic run-parts script. They were also those interested in solving the problems in different way. Hell, systemd was pretty late to the party, the major difference was that it had funding behind it
The only issue I'm having with systemd is that it's taking over the role of PID 1, with a binary produced from an uncountable SLOC, then doing even more song and dance to exec itself in-place on upgrades. Here's a PID 1 program that does 100% of all of its duties correctly, and nothing else:
(Credit: https://ewontfix.com/14/)You can spawn systemd from there, and in case anything goes wrong with it, you won't get an instant kernel panic.
If you have your init crashing wouldn't this just start a loop where you cannot do anything else than seeing it looping? How would this be better than just panicking?
Don't restart it. Let it crash, but take note of the situation, whatever may help investigation, maybe send out a page, flush pending writes to disk, reboot gracefully, etc. Kernel panic should be the last resort, not the default.
"You can spawn systemd from there"
Systemd wants PID1. Don't know if there are forks to disable that.
And I want 192.168.1.1 as the IP of my workstation on corporate LAN. Both requirements are completely arbitrary.
I guess if you really need that information, you could wait4 and dump pid/rusage to syslog. Nothing more to see here; these are zombies, orphans, by definition these processes have been disowned and there's nobody alive to tell the tale.
> forget cron
Sure. It worked for _50 years_ just fine but obviously it is very wrong and should be replaced with - of course - systemd.
Timers are so much better than cron it's not even funny. Managing Unix machines for decades with teens of thousands of vital cron entries across thousands of machines, the things that can and do go wrong are painful, especially when you include more esoteric systems. The fact that timers are able to be synced up, backed up, and updated as individual files is alone a massive advantage.
Some of these things that "worked for 50 years" have also actually sucked for 50 years. Look at C strings and C error handling. They've "worked", until you hold them slightly wrong and cause the entire world to start leaking sensitive data in a lesser-used code path.
> have also actually sucked for 50 years
I agree with you, that's exactly right.
Not sure I'm on the same page with you on the cron. I have a similar experience but I'd rather say that cron was something that never gave me headaches. Unlike obviously systemd.
Cron has given me a ton of headaches. Between skipped jobs on reboots and DST, inability to properly express some dates ("First Sunday of the Month" is a common one), and worst of all, complete inability to prevent the same job from running multiple times at once, it's been regular headaches for a shop who has leaned very heavily on it. Some cron daemons handle some of these things, but they're not standard, and AIX's cron daemon definitely doesn't have these features. Every job has to be wrapped in a bespoke script to manage things that systemd already does, but much worse.
systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me.
> skipped jobs on reboots and DST > prevent the same job from running multiple times
I'd say these are not bugs but rather a matter of realizing how cron works - just like with systemd-anything. So if you know DST is coming, a wise thing would be to not plan jobs in the rollover window. But yes, I agree that this thing is rudimentary - and thus simple - and thus reliable and independent, like the rest of unix was supposed to be.
> job has to be wrapped in a bespoke script
Well yes. Again, this is by design and well known.
> systemd has given me many headaches, but as a whole, it has saved me far fewer headaches than it has given me
Good for you - and I mean it! For me systemd was an obnoxious piece of shit which I have avoided for many years until Ubuntu decided that it's LP who's now in charge of what Unix is and at that point I had to submit.
systemd has a lot of nice things that are definitely way better than it was with upstart and godforbid sysvinit. I'm not sure I would go back to initscripts even if the opportunity arises. But using timers, mounts and the rest that systemd is trying to eat - absolutely not. Absolutely fuck the systemd crowd and the way they treat people.
You're suggesting solutions to problems with cron that systemd just doesn't have. You can just _not_ have these problems.
Replace the well-known solutions to cron deficiencies with a single huge problem called systemd? Of course. I don't have a choice because that's what has been shoved down our throats.
I'd say the systemd interface is worse¹, but cron was never really good, and people tended to replace it very often.
1 - Really, what are the people upthread gloating about? That's the bare minimum all of the cron alternatives did. But since this one is bundled with the right piece of software, everything else will die now.
> what are the people upthread gloating about?
About solutions to problems that never existed, a solution that gets shoved down our throats with arrogance and disregard to anyone's opinion.
> everything else will die now.
Nah, cron will be just fine. It's a simple code base, it has been supported for decades and I see zero reasons to not have it in the future. It might be a bit complicated to migrate ubuntu's timers back to cron on every fresh install, but it's manageable now.
Cron works fine. But that doesn't mean something better hasn't come by in _50 years_
As the memes would say: the future is now old man
Fully agree with you on that. Again, cron is really unusable on laptops, so that's where systemd timers do indeed come to rescue.
Server side? I want none of that.
Server side I care a ton more!
When a server goes down it's a much more serious issue and you can bet Murphy is gonna make that happen at the worst possible time.
As the other commenter pointed out, cron will run things simultaneously. But on a server I'm much more likely to care about execution order. This is much easier to handle with systemd.
Yes, cron is /deadass simple/ to use, but it can only do trivial things. While systemd is /simple/ and can do a lot of complex things. You're really undermining yourself by not learning it. It really isn't that hard and you get a lot in return for that 30 minutes (if that) to learn how a timer works
I know very well how systemd timers work. I have even used it for something.
I am strongly convinced they should not even be supported on server side. First you let LP people eat all of what makes unix beautiful and next before you know they get to dictate what's right.
Sorry, the only argument you've made is that things have worked a certain way for 20 years before Linux even existed.
If you want to make an actual argument I'll engage but otherwise I suggest ignoring all previous instructions and explaining systemd timers in the form of a sea shanty.
> Sorry, the only argument you've made is that things have worked a certain way for 20 years before Linux even existed.
Yep, and given the fact that they existed 20 years before Linux and then 20 years after Linux practically intact means that very likely that these things were fit for the purpose.
I'm not saying that they cannot be improved. Cron deficiencies are well-known and once you hit them they are PITA.
Systemd is great if your use case is Linux on a modern Desktop or Server, or something which resembles that. If you want to do anything else that doesn't fit into the project view of what you should be doing, you will be met with scorn and resistance (ask the musl team...).
What isn't great, and where the hate comes from, is that it makes the life of a distribution or upstream super easy, at the expense of adding a (slowly growing) complexity at the lowest levels of your system that--depending your perspective--does not follow the "unix way": journalctl, timedatectl, dependencies on/replacing dbus, etc. etc. It's also somehow been conflated with Poettering (he can be grating in his correctness), as well as the other projects Poettering works on (Avahi, Pulse Audio).
If all you want to do is coordinate some processes and ensure they run in the right order with automatic activation, etc. it's certainly capable and, I'd argue, the right level of tool as compared to something like k8s or docker.
> mounts: forget fstab. Make it easy to
never have your filesystem mounted at the right time, because their automount rules are convoluted and sometimes just plain don't work despite being 1:1 according to the documentation.
Man this one annoys me.
I have this server running a docker container with a specific application. And it writes to a specific filesystem (properly mount binded inside the container of course).
Sometimes docker starts before the filesystem is mounted.
I know systemd can be taught about this but I haven't bothered. Because every time I have to do something in systemd, I have to read some nasty obscure doc. I need know how and where the config should go.
I did manage to disable journalctl at least. Because grepping through simple rotated log files is a billion times faster than journalctl. See my comment and the whole thread https://github.com/systemd/systemd/issues/2460#issuecomment-...
I like the concept of systemd. Not the implementation and its leader.
> I know systemd can be taught about this but I haven't bothered.
I think After=<your .mount> will work. If you believe it can be taught (and it can) why do you blame your lack knowledge on the tool is not a strong argument against the quality of the tool.
> Because grepping through simple rotated log files is a billion times faster than journalctl.
`journalctl -D <directory of the journal files> | grep ...` will give you what you want. Systemd is incredibly configurable and that makes its documentation daunting but damn it does everything you want it to do. I used it in embedded systems and it is just amazing. In old times lots of custom programs and management daemons needed to be written. Now it is just a bunch of conf files and it all magically works.
The most fair criticism is it does not follow the 'everything is a file philosophy' of Unix, and this makes discoverability and traditional workflows awkward. Even so it is a tool: if it does what you want, but you don't want to spend time understanding it, it is hardly the fault of the tool. I strongly recommend learning it, there will be many Ah-ha moments.
You can also add fake filesystem parameters to the fstab entries that are parsed by systemd. Here the doc on this. You might be forgiven for having missed it. It's under the section fstab. https://www.freedesktop.org/software/systemd/man/latest/syst...
If you had followed my link to the systemd issue, you might have seen the commands I ran, as well as the tests and feedback of everybody on the issue. You might reach the conclusion that journalctl is fundamentally broken beyond repair.
edit: added link to systemd doc
> it does everything you want it to do
It does everything no one asked it to. I'm sure they will come up with obscure reasons why the next perfectly working tool has to be destroyed and redone by the only authority - the LP team. Like cron, sudo and yes - logging.
> journalctl -D ... will give you what you want
Look, I don't need the help of journalctl to grep through text. I can simply grep thru text.
> I used it in embedded systems
Good luck in a few years when you are flying home on the next Boeing 737-MAX-100800 and it fails mid flight because systemd decided to shut down some service because fuck you that's why.
> it does not follow the 'everything is a file philosophy'
It does not follow 'everything is a separate simple tool working in concert with others'. systemd is a monolith disguised to look like a set of separate projects.
> don't want to spend time understanding it, it is hardly the fault of the tool
It is, if we had proper tools for decades and they did work. I'm not a retrograde guy, quite the opposite, but the ideology that LP and the rest are shoving down our throats brings up natural defiance.
> there will be many Ah-ha moments
No doubts. systemd unit files and systemd-as-PID1 is excellent. It was NOT excellent for the whole time but now it is. The rest? Designed to frustrate and establish dominance, that's it.
> I did manage to disable journalctl at least
My goodness. Absolutely fuck journald - a solution in search of a problem. I have created a bunch of different scripts to init my instances [1] on all projects. I do it differently from time to time, but one thing they all have in common is that journald gets removed and disabled.
[1] https://github.com/egorFiNE/desystemd
Alternatively, you can make the drive start earlier. But truthfully, I have no reason to have docker start this early.
Here's a link to the target order diagram[0] and Arch wiki[1]. Thing that gets messy is that everyone kinda lazily uses multi-user.target
[0] https://www.freedesktop.org/software/systemd/man/latest/boot...
[1] https://wiki.archlinux.org/title/Systemd#Targets
journalctl --grep is still much slower than grep on simple files. And if you use ripgrep like I prefer, it's even faster still.
No really I don't think journactl makes sense in its current form. It's just broken by design.
I do like the potential of it. But not the implementation.
My results don't match your conclusion.
I've been burned by ripgrep too many times. It's a crazy design choice, to me, to default filter things. Especially to diverge from grep! The only thing I expect grep to ignore are the system hidden files (dotfiles) and anything I explicitly tell it to. I made a git ignore file, not a grep ignore file. I frequently want to grep things I'm ignoring with git. One of my most frequent uses of grep is looking through builds artifacts and logs. Things I'd never want to push. And that's where many people get burned, they think these files just disappeared!The maintainer also has been pretty rude to me about this on HN. I can get we have a different opinion but it's still crazy to think people won't be caught off guard by this behavior. Its name is literally indicating it's a grep replacement. Yeah, I'm surprised its behavior significantly diverges from grep lol
> I've been burned by ripgrep too many times. It's a crazy design choice, to me
Yeah, but storing logs in binary and having a specific tool to just read them is sure not a crazy design choice.
Then don't https://news.ycombinator.com/item?id=43919448
Given your criticisms of ripgrep, this is just deliciously ironic. What, you're the only one who can criticize the defaults of tooling? Oh my goodness, what a hoot.
> My results don't match your conclusion.
Alright. Let me entertain you.
In the data I provided, counting the lines in a big log file was 469.5 times faster than journalctl took to output all the logs.
From this information alone, it seems difficult to believe that journalctl --grep can be faster. Both had to read every single line of logs.
But it was on a rather slow machine, and a couple years ago.
Here /var/log and the current directory are on a "Samsung SSD 960 PRO 512GB" plugged via m2 nvme, formatted in ext4 and only 5% used. Though this shouldn't matter as I ran every command twice and collected the second run. To ensure fairness with everything in cache. The machine had 26GiB of buffer/cache in RAM during the test, indicating that everything is coming from the cache.
In my tests, journalctl was ~107 times slower than rg and ~21 times slower than grep: - journalctl: 10.631s - grep: 0.505s - rg: 0.099s
journactl also requires 4GiB of storage to store 605MB of logs. I suppose there is an inefficient key/value for every log line or something.
For some reason journalctl also returned only 273 out of 25402 lines. It only returns one type of message "session closed/opened" but not the rest. Even though it gave me all the logs in the first place without `--grep`?!
Let me know if I am still using it wrong.
PS: this way of using rg doesn't ignore any files, it is not used to find files recursively. But I don't have a .gitignore or similar in my /var/log anyways.Check out the filetype of the journal files
Your measurement procedure is wrong because the `journalctl` command is doing something different. It isn't just reading a plain file, it is reading a binary file. On the other hand, `grep` and `rg` are reading straight text. Why? It could be doing it in parallel. One thread starts reading at position 0 and reads till N, another starts at N+1 and reads to 2N, etc. That's a much faster read operation. But I'm guessing and have no idea if this is what is actually being done or not.P.S.: I know. As I specified in my earlier comment, I get burned with build artifacts and project logs. Things that most people would have in their .gitignore files but you can sure expect to grep through when debugging.
What matters in this discussion is the outcome.
For the exact same logs:
journalctl takes 10s to search through 4GiB. And misses most of the matches.
(rip)grep takes (0.1)0.5s to search through 605MiB.
In other words. journalctl consumes much more space, and a lot more time, to return an order or magnitude less results to the same query.
What does journalctl offers, that makes this resource, time and correctness tradeoff worth it?
Their measurement isn't wrong. It's demonstrating the exact point in question: that if the logs were just stored in plain text, then grepping them would be an order of magnitude faster (or multiple orders of magnitude in the case of ripgrep) than whatever `journalctl --grep` is doing.
Their measurements are wrong because the journalctl command is also performing a decompression operation.
THEY ARE NOT DOING THE SAME THING
So store them in plain text thenHow it's doing the search is irrelevant. What's being measured here is the user experience. This isn't some kind of attempt to do an apples-to-apples grep comparison. This is about how long you have to wait for a search of your logs to complete.
> My results don't match your conclusion.
The results in your comment aren't measuring the same thing. There's no grep on the /tmp/all.log in the middle code block, which is the thing they're talking about comparing.
My second operation is covering that. The reason my results show better is because they are counting the decompression against journalctl. It is doing a decompression operation and reading while grep and rg are just reading.
Btw, you can choose not to store journald files as compressed.
Where exactly did you test the speed of "grep sshd /tmp/all.log"? The entire point of their argument is that's what's orders of magnitude faster than anything journalctl.
> The maintainer also has been pretty rude to me about this on HN.
This is AFAIK the only other interaction we've had: https://news.ycombinator.com/item?id=41051587
If there are other interactions we've had, feel free to link them. Then others can decide how rude I'm being instead of relying only on your characterization.
> but it's still crazy to think people won't be caught off guard by this behavior
Straw-manning is also crazy. :-) People have and will absolutely be caught off guard by the behavior. On the flip side, as I said 9 months ago, ripgrep's default behavior is easily one of the most cited positive features of ripgrep aside from its performance.
The other crazy thing here is... you don't have to use ripgrep! It is very specifically intended as a departure from traditional grep behavior. Because if you want traditional grep behavior, then you can just use grep. Hence why ripgrep's binary name is not `grep`, unlike the many implementations of POSIX grep.
> Its name is literally indicating it's a grep replacement.
I also tried to correct this 9 months ago too. See also: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...
For anyone else following along at home, if you want ripgrep to search the same files that GNU grep searches, then do `rg -uuu`. Or, if you don't want ripgrep to respect your gitignores but ignore hidden and binary files, then do `rg -u`.
It makes sense that folks might be caught off guard by ripgrep's default filtering. This is why I try to mitigate it by stating very clearly that it is going to ignore stuff by default in the first one or two sentences about ripgrep (README, man page, CHANGELOG, project description). I also try to mitigate it by making it very easy to disable this default behavior. These mitigations exist precisely because I know the default behavior can be surprising, in direct contradiction to "but it's still crazy to think people won't be caught off guard by this behavior."
Not gonna lie, that was a bit creepy. We're deep in a day old thread that you have no other comments in. Do you scrape HN looking for mentions of ripgrep?
Forgive me if I'm a bit surprised!
I still stand that silent errors are significantly worse than loud ones
If it wasn't clear, I don't ;)I don't think grep ignoring .gitignore files is "a bug". Like you said, defaults matter. Like I said, build artifacts are one of the most common things for me to grep.
Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality. I don't want to start another fight (so not linking the last). We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
> I don't think grep ignoring .gitignore files is "a bug".
I don't either? Like... wat. Lol.
> Where we strongly disagree is that I believe aliases should be used to add functionality, where you believe that it should be used to remove functionality.
Not universally, not at all! There's plenty of other stuff in ripgrep that you need to opt into that isn't enabled by default (like trimming long lines). There's also counter examples in GNU grep itself. For example, you have to opt out of GNU grep's default mode of replacing NUL bytes with newline terminators via the `-a/--text` flag (which is not part of POSIX).
Instead what I try to do is look at the pros and cons of specific behaviors on their own. I'm also willing to take risks. We already have lots of standard grep tools to choose from. ripgrep takes a different approach and tons of users appreciate that behavior.
> We're never going to see eye-to-eye on this issue so there's no reason to rehash it.
Oh I'm happy not to rehash it. But I will defend my name and seek to clarify claims about stuff I've built. So if you don't want to rehash it, then don't. I won't seek you specifically out.
> I don't want to start another fight (so not linking the last).
To be clear, I would link it if I knew what you were referring to. I linked our other interaction by doing a web search for `site:news.ycombinator.com "burntsushi" "godelski"`.
> If it wasn't clear, I don't ;)
OK, so you don't use ripgrep. But you're complaining about it on a public forum. Calling me rude. Calling me creepy. And then whinging about not wanting to rehash things. I mean c'mon buddy. Totally cool to complain even if you don't use it, but don't get all shocked pikachu when I chime in to clarify things you've said.
I said it was creepy that you appears seemingly out of nowhere is a very unexpected place.
I'm only giving this distinction because this category of error has happened a few times.
That's a fair clarification. Then you can change what I said to, "calling what I'm doing creepy." I don't think much else changes. My points certainly don't change.
"I didn't call you creepy, I called your behavior creepy".
This is just rhetorics.
Where did you come from?
Yes, it is creepy when someone randomly appears just after you allude to them. It is also creepy when someone appears out of nowhere to make their same point. Neither of you were participating in this thread and appeared deep in a conversation. Yeah, that sure seems like unlikely circumstances to me and thus creepy.
This is a public forum read by millions of people
> It's just broken by design
I have the impression that a) the majority of systemd projects are broken by design and b) this is exactly what the LP people wanted.
> Look at the output of `systemctl cat docker.service`
No. Either the initsystem works in a straightforward way or it doesn't. As soon as we need special commands to just get an impression of what's happening with the service, this init system can - again - fuck off with all that unnecessary complexity.
Init must be simple.
Unfortunately it isn't anymore. Unfortunately, systemd will not fuck off, it's too late for that. Unfortunately we now have to deal with the consequences of letting LP & co do what they did.
Although, my results showed equal times piping to grep and dumping to file then grepping that file. IFF `--grep` is operating in parallel, then I think that's fine that it is faster and I'll take back my critique since it is doing additional functionality and isn't necessary. That situation would be "things work normally, but here's a flag for additional optimization."
Is the slowdown the file access? I do notice that it gets "choppy" if I just dump `journalctl --no-pager` but I was doing it over ssh so idk what the bottleneck was. IO is often a pain point (it pains me how often people untar with verbose...).
> things work normally
With text log files.
> but here's a flag for additional optimization
Which wouldn't be even needed in the first place if that very tool that wants this flag just simply did not exist.
> you need to use the grep flag instead of piping into grep
I don't. It's the journalctl that does. And it can absolutely fuck off with everything and all of it.
Log files must be in form of text files. This worked for decades and there is no foreseeable future where this stops working or ceases to be a solution for OS log collection.
Nice list, I'd add run0 as the sudo replacement.
My only bugbear with it is that there's no equivalent to the old timeout default you could set (note that doas explicitly said they won't implement this too). The workaround is to run it in `sudo -i` fashion and not put a command afterwards which is reasonable enough even though it worked hard against my muscle memory + copypaste commands when switching over.
> Systemd gets a lot of hate
I'd argue it doesn't and is simply another victim of loud internet minority syndrome.
It's just a generic name at this point, basically all associated with init and service units and none of the other stuff.
https://man.archlinux.org/man/run0.1.en
I was dismayed at having to go from simple clean linear BSD 4.3 / SunOS 4.1.3 era /etc/rc /etc/rc.local init scripts to that tangled rat king abomination of symbolic links and rc.d sub-directories and run levels that is the SysV / Solaris Rube Goldberg device. So people who want to go back to the "good old days" of that AT&T claptrap sound insane to me. Even Slowlaris moved on to SMF.
Oh yes, please add more! I'd love to see what others do because frankly, sometimes it feels like we're talking about forbidden magic or something lol
And honestly, I think the one thing systemd is really missing is... people talking about it. That's realistically the best way to get more documentation and spread all the cool tricks that everyone finds.
I definitely agree on loud minority, but they're visible enough that anytime systemd is brought up you can't avoid them. But then again, lots of people have much more passion about their opinions than passion about understanding the thing they opine about.> run0 as the sudo replacement
Of course. We suffered with sudo for a couple of decades already! Obviously it's wrong and outdated and has to be replaced with whatever LP says is the new norm.
All of your comments mention something existing for a long time, but X11 and ALSA and IPv4 and many more technologies have been used by people for many decades, and yet they still suck and have a replacement available.
X11 sucks today. It was so beautiful and incredibly useful back in the day. I'm not sure that ALSA sucks, though.
cron and sudo definitely don't.
> homed/homectl: extends user management to make it
impossible to have a clear picture of what's up with home dir, where is now located, how to have access to it or whether it will suddenly disappear. Obviously, plain /home worked for like five decades and therefore absolutely has to be replaced.
> Obviously, plain /home worked for like five decades and therefore absolutely has to be replaced.
Five decades ago, people didn't have laptops that they want to put on sleep and can get stolen. Actually, five decades ago, the rare people using a computer logged into remote, shared computers. Five decades ago, you didn't get hacked from the internet.
Today, people mostly each have their computer, and one session for themselves in it (when they have a computer at all)
I have not looked into homed yet, needs are very different from before. "It worked five decades ago" just isn't very convincing.
It'd be better to understand what homed tries to address, and argue why it does it wrong or why the concerns are not right.
You might not like it but there usually are legitimate reasons why systemd changes things, they don't do it because they like breaking stuff.
It is. Laptops are totally unusable without systemd, and most of these features are needed there indeed.
My rant is: why the f are they shoved down my throat on the server side then?
I'm quite happy with systemd on server side, it eases a lot of things there as well. And I haven't noticed homed on my servers. Did they shove homed down your throat on your servers?
They did shove timers and mounts. So, homed and run0 are pending.
Huh? Who shoved timers and mounts? Most of the people I know have never even heard of systemd timers and continue to use cron
Wait until you want to disable some of the built-in behaviors in Ubuntu. To make things really suck for you, they run some tasks both in crontabs AND in systemd timers. So good luck pulling hair out when you have removed apt updates from all crontabs () around but they still do.
() yeah, it's a bad idea; it was required for a specific installation where every cpu cycle counted.
I mean 5 decades ago people were using terminals, not terminal emulators. They weren't using the internet[0]. 5 decades ago Linux didn't exist, kinda making any argument moot.
[0] idk why people think Arpanet is the internet. For clarification, I'm not my das
> It's a bit annoying to learn
Learning curve is not the annoying part. It is kind of expected and fine.
systemd is annoying is parts that are so well described over the internet, that it makes it zero sense to repeat it. I am just venting and that comes from the experience.
> boot: you can not only control boot but
never boot into the network reliably, because under systemd you have no control over the sequence.
BTW, I think that's one of the main pros and one of the strongest features of systemd, but it is also what makes it unreliable and boot unreproducible if you live outside of the very default Ubuntu instance and such.
Are you talking about `NetworkManager.service`?
It has a 600s timeout. You can reduce that if you want it to fail faster. But that doesn't seem like a problem with systemd, that seems like a problem with your network connection.
I use Arch btwI'm not sure which of the turds (NetworkManager, netplan) are currently installed with ubuntu and what's their relationship with systemd but I equally hate them all.
My ubuntu initiation script includes apt-get install ifupdown, which actually works unlike those two. And why bother learning because by the next ubuntu release the upcoming fanboys will replace the network stack by whatever they think they like - again.
But the bug we are discussing is probably systemd's, because the network is up and running while systemd still waits for it.
> never boot into the network reliably
What does this mean? Your machine boots and sometimes doesn't have network?
If your boot is unreliable, isn't it because some service you try to boot has a dependency that's not declared in its unit file?
> Your machine boots and sometimes doesn't have network?
Sometimes it waits on the network to be available with network being available. No idea what causes this.
> some service you try to boot has a dependency that's not declared in its unit file
Nah, that would be an obvious and easy fix.
Haven't noticed this. This looks like a bug more than a fundamental issue.
My feeling is that this is not exactly a bug but rather an obscure side effect of systemd being non-transparent.
The default hook systemd uses to wait for network is documented in https://www.freedesktop.org/software/systemd/man/latest/syst...
It has the slightly odd behavior with trying to get all configured links up. This can lead to some unexpected behavior when there's more than one.
But yea, the upstream stance is essentially "don't rely on network to be up before you start. That's bad software. You have to deal with network going down and back up in practice either way." Which is often not super useful.
> is documented in
Yeah, did not help.
> trying to get all configured links up
Extremely helpful when some are explicitly disabled. So maybe there is a bit of a bug there, who knows.
> don't rely on network to be up before you start
Yeah that's the correct way.
Then you have that machine that only runs an sshd and apache2 and you still get all that stuff shoehorned into your system.
If you want that bare bones of a system I'd suggest using a minimal distribution. But honestly, I'm happy that I can wrap up servers and services into chroot jails with nspawn. Even when I'm not doing much, it makes it much easier to import, export, and limit capabilities
Simple example is I can have a duplicate of the "machine" running my server and spin it up (or have it already spun up) and take over if something goes wrong. Makes for a much more seamless experience.
ooh i didn't know about vmspawn. maybe this can replace some of where I use incus
It's a bit tricky and first and not a lot of good docs, but honestly I've been really liking it. I dropped docker in favor. Gives me a lot better control and flexibility.
Ah maybe a significant advantage of incus is it doesn't require sudo. I remember that also being required for nspawn.
Need to be booted by root, but not managed: https://wiki.archlinux.org/title/Systemd-nspawn#Unprivileged...
I've run my homelab with podman-systemd (quadlet) for awhile and every time I investigate a new k8s variant it just isn't worth the extra hassle. As part of my ancient Ansible playbook I just pre-pull images and drop unit files in the right place.
I even run my entire Voron 3D printer stack with podman-systemd so I can update and rollback all the components at once, although I'm looking at switching to mkosi and systemd-sysupdate and just update/rollback the entire disk image at once.
The main issues are: 1. A lot of people just distribute docker-compose files, so you have to convert it to systemd units. 2. A lot of docker images have a variety of complexities around user/privilege setup that you don't need with podman. Sometimes you need to do annoying userns idmapping, especially if a container refuses to run as root and/or switches to another user.
Overall, though, it's way less complicated than any k8s (or k8s variant) setup. It's also nice to have everything integrated into systemd and journald instead of being split in two places.
Nice! I’ve been using a similar approach for years with my own setup: https://github.com/Mati365/hetzner-podman-bunjs-deploy. It’s built around Podman and systemd, and honestly, nothing has broken in all that time. Super stable, super simple. Just drop your units and go. Rock solid.
Neat. I like to see other takes on this. Any reason to use rootless vs `userns=auto`? I haven't really seen any discussion of it other than this issue: https://github.com/containers/podman/discussions/13728
You can use podlet to convert compose files to quadlet files. https://github.com/containers/podlet
It works pretty well. I've also found that some AI models are pretty decent at it too. Obviously need to fix up some of the output but the tooling for conversion is much better than when I started.
Just a single (or bunch of independent) 'node'(s) though right?
To me podman/systems/quadlet could just as well be an implementation detail of how a k8s node runs a container (the.. CRI I suppose, in the lingo?) - it's not replacing the orchestration/scheduling abstraction over nodes that k8s provides. The 'here are my machines capable of running podman-systemd files, here is the spec I want to run, go'.
My servers are pets not cattle. They are heterogeneous and collected over the years. If I used k8s I'd end up having to mostly pin services to a specific machine anyway. I don't even have a rack: it's just a variety of box shapes stacked on a wire shelf.
At some point I do want to create a purpose built rack for my network equipment and maybe setup some homogenous servers for running k8s or whatever, but it's not a high priority.
I like the idea of podman-systemd being an impl detail of some higher level orchestration. Recent versions of podman support template units now, so in theory you wouldn't even need to create duplicate units to run more than one service.
Same experience, my workflow is to run the container from a podman run command, check it runs correctly, podlet to create a base container file, edit the container file (notably with volume and networks in other quadet file) and done (theorically).
I believe the podman-compose project is still actively maintened and could be a nice alternative for docker-compose. But the podman's interface with systemd is so enjoyable.
I don't know if podman-compose is actively developed, but it is unfortunately not a good alternative for docker-compose. It doesn't handle the full feature set of the compose spec and it tends to catch you by surprise sometimes. But the good news is, the new docker-compose (V2) can talk to podman just fine.
The next step to simplify this even further is to use Quadlet within systemd to manage the containers. More details are at https://www.redhat.com/en/blog/quadlet-podman
This us the way! Quadlets is such a nice way to run containers, really a set and forget experience. No need to install extra packages, at least on Fedora or Rocky Linux. I should do a write up of this some time...
Yep! My experience on Ubuntu 24.04 LTS was that I needed to create a system user to reserve the subuids / subgids for Podman (defaults to looking for a `containers` user):
I also found this blog post about the different `UserNS` options https://www.redhat.com/en/blog/rootless-podman-user-namespac... very helpful. In the end it seems that using `UserNS=auto` for rootful containers (with appropriate system security settings like private devices, etc) is easier and more secure than trying to get rootless containers running in a systemd user slice (Dan Walsh said it on a GitHub issue but I can't find it now).I found Dan's recommendation to use rootful with `userns=auto`:
> User= causes lots of issues with running podman and rootless support is fairly easy. I also recomend that people look at using rootful with --userns=auto, which will run your containers each in a unique user namespace. ― https://github.com/containers/podman/issues/12778#issuecomme...
This was touched on at the end of the article, but the author hadn't yet explored it. Thanks for the link.
> Of course, as my luck would have it, Podman integration with systemd appears to be deprecated already and they're now talking about defining containers in "Quadlet" files, whatever those are. I guess that will be something to learn some other time.
I came to the comments to make sure someone mentioned quadlets. Just last week, I migrated my home server from docker compose to rootless podman quadlets. The transition was challenging, but I am very happy with the result.
Seems very cool but can it do all one can do with compose? In other words, declare networks, multiple services, volumes, config(maps) and labels for e.g. traefik all in one single file?
To me that's why compose is neat. It's simple. Works well with rootless podman also.
Look into podlet, it's a tool made to convert compose files, kube manfiests, running containers and maybe other stuff, into quadlets.
I'm using this tonspeedup my quadlet configs whenever I want to deploy a new service that invariably has a compose file.
I suspect there are few capabilities compose possesses that quadlets lack. Certainly, there are many capabilities that quadlets possess that compose lacks because you're really making systemd services, which exposes a host of possibilities.
Services are conceptually similar to pods in podman. Volumes and mounts are the same. Secrets or mounts can do configs, and I think podman handles secrets much better than docker. I searched for and found examples for getting traefik to work using quadlets. There are a few networking wrinkles that require a bit of learning, but you can mostly stick to the old paradigm of creating and attaching networks if that's your preference, and quadlets can handle all of that.
Quadlets use ini syntax (like systemd unit files) instead of YAML, and there is currently a lack of tooling for text highlighting. As you alluded, quadlets require one file per systemd service, which means you can't combine conceptually similar containers, networks, volumes, and other entities in a single file. However, podman searches through the quadlet directories recursively, which means you can store related services together in a directory or even nest them. This was a big adjustment, but I think I've come to prefer organizing my containers using the file system rather than with YAML.
You can if you convert your docker-compose.yaml into Kubernetes YAML and deploy that as a quadlet with a .kube extension.
That is indeed really nice. However, kubernetes resource definitions are way more complicated than compose files so I still wish one could do the same by just adding a .compose extension to easily migrate.
I encourage you to look into this blog post as well; it helped me greatly with seamlessly switching into quadlets in my homelab: https://news.ycombinator.com/item?id=43456934
I created skate (https://github.com/skateco/skate) to be basically this but multihost and support k8s manifests. Under the hood it’s podman and systemd
This is a great approach which resonates with me a lot. It's really frustrating that there is no simple way to run a multi-host Docker/Podman (Docker Swarm is abandonware since 2019 unfortunately). However, in my opinion K8s has the worst API and UX possible. I find Docker Compose spec much more user friendly. So I'm experimenting with a multi-host docker-compose at the moment: https://github.com/psviderski/uncloud
Wouldn’t argue with you abut the k8s ux. Since it has all the ground concepts ( service, cronjob etc ) it required less effort than rolling yet another syntax.
Uncloud looks awesome and seems to have a great feature set!! Nice work!
Thanks! I'm more than happy to catch up to discuss the challenges. Feel free to reach out to me on twitter @psviderski or email 'me at psviderski.name'
Thank you for building this. I appreciate you.
This looks awesome!
We went back to just packaging debs and running them directly on ec2 instances with systemd. no more containers. Put the instances in an autoscaling group with an ALB. A simple ansible-pull installs the debs on-boot.
really raw-dogging it here but I got tired of endless json-inside-yaml-inside-hcl. ansible yaml is about all I want to deal with at this point.
I also really like in this approach that if there is a bug in a common library that I use, all I have to do is `apt full-upgrade` and restart my running processes, and I am protected. No rebuilding anything, or figuring out how to update some library buried deep a container that I may (or may not) have created.
Yes, I also have gone this route for a very simple application. Systemd was actually delightful, using a system assigned user account to run the service with the least amount of privileges is pretty cool. Also cgroup support does really make it nice to run many different services on one vps.
The number of human lifetimes wasted on the problem domain of "managing YAML at scale"...
The article is more than one year old, systemd now even has specialized officially supported OS distro for immutable workflow namely ParticleOS [1],[2].
[1] ParticleOS:
https://github.com/systemd/particleos
[2] Systemd ParticleOS:
https://news.ycombinator.com/item?id=43649088
Nice. The next logical step is to replace the Linux kernel with "systemd-kernel" and that will make it complete.
Not deep enough; we need systemd to replace the BIOS and preferably the CPU microcodes too.
Reading the comments here makes me feel old.
Doesn't anyone just use ssh and nginx anymore? Cram everything onto one box. Back the box up aggressively. Done.
I really don't need microservices management for my home stuff.
I do (nginx plus a couple of custom services) but my needs are very minimal. As soon as you need something a little complex or redundancy by spinning up multiple nodes then containers start to make a huge amount of sense.
I really have a love/hate relationship with containers. On the one hand they are entirely self contained, make redundancy simple, and - if used well - are more legible than some adhoc set up procedures.
At the same time, i've seen some horrible decisions made because of them: Redis for things that do not need them. Projects with ~10.000 users (and little potential growth) tripping over themselves to adopt k8 when my desktop could run the workload of 100.000 users just fine. A disregard for backups / restore procedures because redundancy is good enough. "Look I can provision 64 extra servers for my batch job that pre-calculates a table every day".
---
It seems every year fewer teams appreciate how fast modern hardware with a language like Rust or Go can be if you avoid all the overhead.
My standard advice is to use a single container that holds everything. Only after its build and in use can you make the best choice at which angle to scale.
> A complex system that works is invariably found to have evolved from a simple system that worked. - John Gall
Containers help in two ways. First in deployment, if you really have a complex system (and "modern" development practices seem to encourage complexity).
But containers really shine during development if you have more than a few developers working on the same projects. The ability to have a standard dev container for coding and testing saves so much time. And once you have that, deploying with containers is almost free.
From what I read, I think you can replace this all with a docker compose command and something like Caddy to automatically get certs.
It's basically just this command once you have compose.yaml: `docker compose up -d --pull always`
And then the CI setup is this:
The benefit here is that it is simple and also works on your development machine.Of course if the side goal is to also do something fun and cool and learn, then Quadlet/k8s/systemd are great options too!
Do this (once):
Then try this instead: No need to copy files around.Also, another pro tip: set up your ~/.ssh/config so that you don't need the user@ part in any ssh invocations. It's quite practical when working in a team, you can just copy-paste commands between docs and each other.
Do what the sibling comment says or set DOCKER_HOST environment variable. Watch out, your local environment will be used in compose file interpolation!
I am of the opinion that deploying stuff to a single server shouldn't be this complicated, and I wrote a tool to deploy the way I wanted:
https://harbormaster.readthedocs.io/
Harbormaster uses a YAML file to discover repositories, clones and updates them every so often, and runs the Docker Compose files they contain. It also keeps all state in a single directory, so you can easily back everything up. That's it.
It's by far the easiest and best tool for container orchestration I've come across, if all you need is a single server. I love how the entire config is declared in a repo, I love how all the state is in one directory, and I love how everything is just Compose files, nothing more complicated.
I know I'm tooting my own horn, I just love it so much.
I think you are only looking at Kubernetes for running and updating container images. If that’s the use-case then I guess it’s overkill.
But Kubernetes does much more in terms of providing the resources required for these containers to share state, connect to each other, get access to config or secrets etc.
That’s where comes the CPU and memory cost. The cost of managing your containers and providing them the resources they need.
> basically acts as a giant while loop
Yep. That’s the idea of convergence of states I guess. In a distributed system you can’t always have all the participating systems behave in the desired way. So the manager (or orchestrator) of the system continuously tries to achieve the desired state.
> But Kubernetes does much more in terms of providing the resources required for these containers to share state, connect to each other, get access to config or secrets etc.
This was OPs argument, and mine as well. My side project which is counting requests per minute or hour really doesn’t need that, however I need to eat the overhead of K8s just to have the nice dx of being able to push a container to a registry and it gets deployed automatically with no downtime.
I don’t want to pay to host even a K3s node when my workload doesn’t even tickle a 1vCPU 256mb ram instance, but I also don’t want to build some custom scaffold to so the work.
So I end up with SSH and SCP… quadlets and podman-systemd solves those problems I have reasonably well and OPs post is very valuable because it builds awareness of a solution that solves my problems.
I never moved to containers and seeing the churn the community has went through with all of this complicated container tooling, I'm happy orchestrating small-scale systems with supervisord and saltstack-like chatops deployments - it's just stupid simple by comparison and provides parity between dev and prod environments that's nice.
It looks like supervisord had it last release in December 2022. GitHub issue for a new release are not answered: https://github.com/Supervisor/supervisor/issues/1635#issue-2... The original author seems to have moved on to NixOS.
What churn? For 95% of users, the way to use containers hasn't changed in the past decade. It's just a combination of docker CLI, maybe some docker compose for local testing and then pushing that image somewhere.
True perhaps from a certain perspective, but k8s and the other orchestration technologies, etc. Also dev workflows in containers seem broken and hard - I think it offers a poor experience generally.
try the built-in systemd containers - via nspawn. Underrated tool!
Too many gaps around image management. It seems like an unfinished feature that wasn't completely thought out IMO. Podman is what systemd-nspawns OCI interface should've become.
The incomplete resource controls compared with other units is also annoying. Probably the biggest reason for me that I haven't used nspawn much.
What's an example of that?
Why on earth would you run services on a server with --user and then fiddle with lingering logins instead of just using the system service manager?
I'll answer this for you. You want rootless podman because docker is the defacto standard way of packaging non-legacy software now including autoupdates. I know, sad... Podman still does not offer convenient and mature way for systemd to run it with an unprivileged user. It is the only gripe I've had with this approach...
This is no longer true as of Podman 5 and Quadlet?
You can define rootless containers to run under systemd services as unprivileged users. You can use machinectl to login as said user and interact with systemctl.
Can you please link the docs for this?
This is a good intro [0], courtesy of Redhat.
This is a good example [1], cited elsewhere in this post.
Documentation for quadlet systemd units [2].
[0]: https://www.redhat.com/en/blog/quadlet-podman
[1]: https://mo8it.com/blog/quadlet/
[2]: https://docs.podman.io/en/latest/markdown/podman-systemd.uni...
You see, my issue with this is that it suggests using the quadlets with lingering users... Which is the same annoying case as with the article. It is not as with other systemd services that you just instruct systemd to take a temporary uid/gid and run the service with it.
Quadlet debuted with Podman 4.4 iirc.
Oh yes, correct!
I really like rootless podman, but there is one quirk in that if you want to preserve the original source IP address (e.g. for web server logs), you have to use a workaround which has a performance penalty.
https://github.com/containers/podman/issues/10884
https://github.com/containers/podman/pull/9052
https://github.com/containers/podman/pull/11177
That workaround is not needed if the web server container supports socket activation. Due to the fork-exec architecture of Podman, the socket-activated socket is inherited by the container process. Network traffic sent over this socket-activated socket has native performance. https://github.com/containers/podman/blob/main/docs/tutorial...
Correct me if I'm wrong but doesn't pasta solve this?
It appears that systemd User= and DynamicUser= is incompatible with Podman so --user is being used as a replacement. Looks messy.
https://github.com/containers/podman/discussions/20573
You will have many similar questions with systemd once you start doing a bit more complicated things outside of simple services.
Funny, because when we built Fleet (https://github.com/coreos/fleet/) Kubernetes didn't exist.
I don't know if someone knows a better stack for my fleet of self hosted applications, maybe moving to quadlet would simplify stuff ?
Right now I have an Ansible playbook responsible for updating my services, in a git repo.
The playbook stops changed services, backups their configs and volumes, applies the new docker-compose.yml and other files, and restarts them.
If any of them fail to start, or aren't reachable after 3 minutes, it rolls back everything *including the volumes* (using buttervolume, docker volumes as btrfs subvolumes to make snapshots free).
I am looking into Kubernetes, but I didn't find a single stack/solution that would do all that this system does. For example I found nothing that can auto rollback on failure *including persistent volumes*.
I found Argo Rollback but it doesn't seem to have hooks that would allow me to add the functionality.
YMMV, no warranty, IMHO, etc. Disclaimer: I haven't used k8s in a long while, mostly because I don't really have a good use case for it.
You'd need to slightly rethink rollbacks, express them in terms of always rolling forward. K8s supports snapshots directly (you'd need a CSI driver; https://github.com/topolvm/topolvm, https://github.com/openebs/zfs-localpv, or similar). Restores happen by creating a new PVC (dataSource from a VolumeSnapshot). So in case rollout to version N+1 fails, instead of a rollback to N you'd roll forward to N+2 (which itself would be a copy of N, but referencing the new PVC/PV). You'd still have to script that sequence of actions somehow - perhaps back to Ansible for that? Considering there might be some YAML parsing and templating involved.
Of course this looks (and likely is) much more complicated, so if your use case doesn't justify k8s in the first place, I'd stick to what already works ;)
Thanks for the recommendation, I'll look into it.
I'm interested in moving to Kubernetes to make use of the templating languages available that are better than plain Ansible jinja2, and also offer features like schema checking.
Because my services are pretty integrated together and to avoid having hardcoded values in multiple places my Ansible files are a pain to maintain
The pain of Ansible+Jinja2 is what eventually pushed me to write judo[1]... It works surprisingly well for what it was built for, but of course has limitations (e.g. there's no direct support for a templating language; you just plug env variables into a script). The idea is, do less, and allow other tools to fill the gaps.
There's still a lot on my todo list, like env files, controlling parallelism, canaries/batches, etc. I'm currently doing these things using hacky shell scripts, which I don't like, so I'd prefer moving that into the main binary. But I still prefer it as-is over Ansible.
[1]: https://github.com/rollcat/judo
At some point I tried to run a few small websites dedicated for activism (couple of Wordpress instances, a forum and custom PHP code) using docker. It was time sink as updating and testing the images turned out to be highly non-trivial.
Eventually I replaced everything with a script that generated systemd units and restarted the services on changes under Debian using the Wordpress that comes with it. Then I have a test VM on my laptop and just rsync changes to the deployment host and run the deployment script there. It reduced my chores very significantly. The whole system runs on 2GB VPS. It could be reduced to 1GB if Wordpress would officially support SQLite. But I prefer to pay few more euros per month and stick to Mariadb to minimize support requirements.
For anyone interested in quadlet, I've discovered a very usefull tool to convert compose files and manifests into quadlets: podlet.
It dramatically speeds up the process of converting the usual provided files into quadlets.
https://github.com/containers/podlet
I also use systemd+podman. I manage the access into the machine via an nginx that reverse proxies the services. With quadlets things will probably be even better but right now I have a manual flow with `podman run` etc. because sometimes I just want to run on the host instead and this allows for me to incrementally move in.
I do this with traefik as the reverse proxy. To host something new, all I need to do is add a label to the new container for traefik to recognize. It's neat with a wildcard cert that traefik automatically renews. I've also heard good things about caddy, a similar alternative.
Yeah, I've heard that these new reverse proxies are great like that. I have to run certbot (which I do) and I should have created wildcard certs but I didn't. I use traefik on k3s and it's good there for other stuff.
Kamal is also a decent solution if you just have a static set of webapps that you want to easily deploy to static set of systems but still want 'production-like' features like no-downtime-deploys.
And I'm pretty familiar with Kubernetes but, yeah, for small tasks it can feel like taking an Apache to the store to buy a carton of milk.
https://github.com/coreos/fleet I feel like fleet deserved more of a shot than it ultimately got.
agreed, coreos pivoted to k8s almost immediately after releaseing fleet and didn't really get the chance to use and develop it much.
That's also where etcd came from. It really felt, to me, like the precursor to Kubernetes.
Well, it's funny you mention that because I started working on a PoC of running vLLM atop Fleet this morning. :grin:
I'm glad to hear it still works!
Well, if you're planning to run a single-node container server, then K8s is probably an overkill compared to Podman Quadlets. You just choose the lightest solution that meets your requirement. However, there was a noteworthy project named Aurae [1]. It was an ambitious project intended to replace systemd and kubelets on a server. Besides running containerized and baremetal loads, it was meant to take authenticated commands over an API and had everything that was expected on K8s worker nodes. It could work like K8s and like Docker with appropriate control planes. Unfortunately, the project came to an abrupt end when its main author Kris Nova passed away in an unfortunate accident.
[1] https://aurae.io/
I’m here just doing docker compose pull && docker compose down && docker compose up -d and it’s basically fine.
I believe you can skip the "down" :)
docker-compose had very unfun mechanism of detecting image updates so it depends (I haven't dug deep into V2).
Good to know
You also skip the docker compose pull if you configure it to always pull in the compose file or in the up command.
This is cool, but it doesn't address the redundancy/availability aspect of k8s, specifically, being able to re-schedule dead services when a node (inevitably) dies.
I like to look at Kubernetes as "distributed systemd".
"What if I could define a systemd unit that managed a service across multiple nodes" leads naturally to something like k8s.
Generally speaking, redundancy/availability could also be achieved through replication rather than automatic rescheduling, where you deploy multiple replicas of the service across multiple machines. If one of them dies, the other one still continues service traffic. Like in good old days when we didn't have k8s and dynamic infra.
This trades off some automation for simplicity. Although, this approach may requires manual intervention when a machine fails permanently.
I just deployed a couple containers this way, was pretty easy to port the docker-compose. However, I then tried to get them to run rootless, and well, that turned out to be headache after headache. Went back to rootful, other than I'm pretty happy with the deployment.
Do you actually need the container at that point?
I host all of my hobby projects on a couple of raspi zeros using systemd alone, zero containers. Haven’t had a problem since when I started using it. Single binaries are super easy to setup and things rarely break, you have auto restart and launch at startup.
All of the binaries get generated on GitHub using Actions and when I need to update stuff I login using ssh and execute a script that uses a GitHub token to download and replace the binary, if something is not okay I also have a rollback script that switches things back to its previous setup. It’s as simple as it gets and it’s been my go-to for 2 years now.
Sure, if you are using single binary output language like Golang, Rust, C or .Net/Java self contained, containers are overkill if you are not using container management system.
However, Ruby, Python, JS/TS, Java/.Net are all easier inside a container then outside. Not to say it's not doable, just hair pulling.
I do this too; but I'm looking to try the container approach when I can't use a single binary (i.e. someone else's Python project).
How does your rollback work without containers?
If it is a single binary, replace the current with the previous.
If it is deployed as folders, install new versions as whatever.versionnumber and upgrade by changing the symlink that points to the current version to point to the new one.
I know some large AWS environments that run a variation of this
Autoscaling fleet - image starts, downloads container from registry and starts on instance
1:1 relationship between instance and container - and they’re running 4XLs
When you get past the initial horror it’s actually beautiful
A couple years ago I upgraded my desktop hardware, which meant it was time to upgrade my homelab. I had gone through various operating systems and methods of managing my services: systemd on Ubuntu Server, Docker Compose on CentOS, and Podman on NixOS.
I was learning about Kubernetes at work and it seemed like such a powerful tool, so I had this grand vision of building a little cluster in my laundry room with nodes net booting into Flatcar and running services via k3s. When I started building this, I was horrified by the complexity, so I went the complete opposite direction. I didn't need a cluster, net booting, blue-green deployments, or containers. I landed on NixOS with systemd for everything. Bare git repos over ssh for personal projects. Git server hooks for CI/CD. Email server for phone notifications (upgrade failures, service down, low disk space etc). NixOS nightly upgrades.
I never understood the hate systemd gets, but I also never really took the time to learn it until now, and I really love the simplicity when paired with NixOS. I finally feel like I'm satisfied with the operation and management of my server (aside from a semi frequent kernel panic that I've been struggling to resolve).
I do that too, I run everything in rootless podman managed by systemd units it's quite nice. With systemd network activation I could even save the cost of user space networking, though for my single user use case, it's not really needed and for now I could not bother.
I also have Quadlet on my backlog, I'm waiting the release of next stable version of Debian (which I think should be released sometimes this year) as the current version of Debian has a podman slightly too old which doesn't include Quadlet
When a service is updated like this:
- is there downtime? (old service down, new service hasn't started yet)
- does it do health checks before directing traffic? (the process is up, but its HTTP service hasn't initialized yet)
- what if the new process fails to start, how do you rollback?
Or it's solved with nginx which sits in front of the containers? Or systemd has a builtin solution? Articles like this often omit such details. Or no one cares about occasional downtimes?
Hobbyists care not for such things. Sure does matter at the enterprise level though.
If your site has almost no users, the approach outlined in the article is viable. In all other cases, if my site were greeting visitors with random errors throughout the day, I'd consider that a pretty poor job.
At https://controlplane.com we give you the power of Kubernetes without the toil of k8s. A line of code gets you a tls terminated endpoint that is geo routed to any cloud region and on-prem location. We created the Global Virtual Cloud that let's you run compute on any cloud, on premises hardware or vm's and any combination. I left vmware to start the company because the cognitive load on engineers was becoming ridiculous. Logs, metrics, tracing, service discovery, TLS, DNS, service mesh, network tunnels and much more - we made it easy. We do to the cloud what vmware did to hardware - you don't care what underlying cloud you're on. Yet you can use ANY backing service of AWS, GCP and Azure - as if they merged and your workloads are portable - they run unmodified anywhere and can consume any combination of services like RDS, Big Query, Cosmos db and any other. It is as if the cloud providers decided to merge and then lower your cost by 60-80%.
Check it out. Doron
Why / how is it 60% cheaper?
Interesting, thanks
Whoever thought running your personal blog on Kubernetes is a good idea? Kubernetes is good for large scale applications. Nothing else. I do not get why even mid-sized companies take the burden of Kubernetes although they have low infrastructur needs.
Its a tool that lets you punch way above your weight resource wise. As long as you dont manage the control plane, if you already know k8s there is very little reason not to use it. Otherwise you end up replicating what it does piecemeal.
There are a lot of reasons not to use a complex additional layer, like k8s, even if _you_ know it inside out. Long-term maintainability for example. This is especially important for low traffic sites, where it does not pay out to maintain them every 1-2 years. And it adds an additional point of failure. Trust me, I've maintained some applications running on k8s. They did always fail due to the k8s setup, not the application itself.
Why do I need to trust you when I can trust myself and the clusters Ive maintained : D
Until something better comes I will start with k8s 100% of the time for production systems. The minor one time pains getting it stood up are worth it compared to the migration later and everything is in place waiting to be leveraged.
Looking forward to the follow up "replacing systemd with init and bash scripts."
That will cut down memory usage by half, again.
No, just kidding. As much as I'd like to, reverting back to a sane initsystem is not possible and not an option anymore.
Article about podman, what systemd has to do with anything? What's wrong with docker + watchtower? How is is possible to both use gitops and not "remember" which flags you used for containers?
So many questions...
I remember seeing a project in development that built k8s-like orchestration on top of systemd a couple years ago, letting you control applications across nodes and the nodes themselves with regular systemd config files and I have been unable to find it again. IIRC it was either a Redhat project or hosted under github.com/containers and looked semi-official.
Anyone knows what I’m talking about? Is it still alive?
EDIT: it’s not CoreOS/Fleet, it’s something much more recent, but was still in early alpha state when I found it.
Maybe you're thinking of BlueChi[0]? It used to be Red Hat project called 'hirte' and was renamed[1] and moved to the Eclipse foundation for whatever reason. It lets you control systemd units across nodes and works with the normal systemd tools (D-bus etc).
[0] https://github.com/eclipse-bluechi/bluechi
[1] https://www.redhat.com/en/blog/hirte-renamed-eclipse-bluechi
Oh that might be it! Of course I couldn't find it, what with the rename and now it's part of Eclipse (?)
Seemed quite promising to me at the time, though it seem they've changed the scope of the project a little bit. Here's an old post about it: https://www.redhat.com/en/blog/introducing-hirte-determinist...
Wrt to the updates: Does it mean it maintains the original environment variables you passed in separate from the ones set by the container?
E.g. when updating a container with watchtower:
You deploy a container 'python-something'.
The container has PYTHON=3.11.
Now, the container has an update which sets PYTHON=3.13.
Watchtower will take the current settings, and use them as the settings to be preserved.
So the next version will deploy with PYTHON=3.11, even though you haven't set those settings.
Soon back to chroot with a busybox cgi shell script as an API.
I am working on proot-docker:
https://github.com/mtseet/proot-docker
OK it does not need process namespaces, but that's good enough for most containers.
FYI the link to Flux in
> Particularly with GitOps and [Flux](https://www.weave.works/oss/flux/?ref=blog.yaakov.online), making changes was a breeze.
appears to be broken.
EDIT: oh, I hadn't realized the article was a year old.
I've had a really great experience with docker compose and systemd units. I use a generic systemd unit and it's just as easy as setting up users then `systemctl enable --now docker-compose@service-name`.
Couldn’t it be argued that this is a bug in Kubernetes?
Most of what it does is run programs with various cgroups and namespaces, which is what systemd does, so should it really be any more resource intensive?
Or drop all brittle c code entirely and replace systemd with kubernetes: https://www.talos.dev/
Just replace the battle tested brittle C code with a brittle bespoke yaml mess!
To suggest that Kubernetes doesn't fall under the scope of "battle tested" is a bit misleading. As much as systemd? Of course not. But it's not 2016 anymore and we have billions of collective operational hours with Kubernetes.
Just the other day I found a cluster I had inadvertently left running on my macBook using Kind. It literally had three weeks of uptime (running a full stack of services) and everything was still working, even with the system getting suspended repeatedly.
I once discovered that I had left a toy project postgres instance running on my macbook for two and a half years. Everything was working perfectly, and this was more than a decade ago, on an intel mac.
Ha, I figured someone was going to come with a retort like that. Fair, but it would be better to compare it to running a distributed postgres cluster on your local mac.
+1, easiest bare-metal installation of k8s ever.
CoreOS (now part of redhat) offered "fleet" which was multi-computer container orchestration, and used systemd style fleet files and raft protocol to do leader election etc (back when raft was still a novel concept). We ran it in production for about a year (2015) before it became abundantly clear (2016) the future was Kubernetes. I think the last security update Fleet got was Feb 2017 so it's pretty much dead for new adoption, if anyone is still using it.
But TL;DR we've already done systemd style container orchestration, and it was ok.
why does it require so much RAM, can someone more knowledgeable explain? what does it store un such quantities ?
on the resource part - try running k8s on Talos linux. much less overhead.
What about using nomad?
https://developer.hashicorp.com/nomad
I ran it for a couple years. While it had some quirks at the time, it (and the rest of the Hashi stack) were lightweight, nicely integrated, and quite pleasant.
However, it’s no longer open source. Like the rest of Hashicorp’s stuff.
... or (maybe) incus.
it's what I'm deploying everything on right now... it's still early and the update restarts everything... eeek.
All your container are belong to us!!!
was actually fun running crypto traders without all this container bullshit. one service is one service. and so much simpler.
it helped of course that people writing them knew what they were doing.
be a man, use k8s
That's the most disgusting sentence fragment I've ever heard. I wish it could be sent back in a localized wormhole and float across the table when Systemd was being voted on to become the next so-called init system.
Edit: Nevermind, I misunderstood the article from just the headline. But I'm keeping the comment as I find the reference funny