Kubernetes CSI drivers are surprisingly easy to write. You basically just have to implement a number of gRPC procedures that manipulate your system's storage as the Kubernetes control plane calls them. I wrote one that uses file-level syncing between hosts using Syncthing to "fake" network volumes.
https://kubernetes-csi.github.io/docs/developing.html
There are 4 gRPCs listed in the overview; that literally is all you need.
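To make that concrete, here's a minimal, hypothetical skeleton in Go, assuming the generated bindings from the container-storage-interface/spec repo. The driver name, socket path, and the omission of the Controller service are illustrative only; the actual storage logic goes into the Node/Controller RPCs (NodePublishVolume, CreateVolume, and friends) that a real driver layers on top.

```go
// Hypothetical sketch: serves only the CSI Identity service on a unix socket.
// A real driver also implements the Node (and usually Controller) services and
// is registered with kubelet via the node-driver-registrar sidecar.
package main

import (
	"context"
	"net"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// identityServer answers the three Identity RPCs every plugin must serve.
// (Newer versions of the generated code may also want
// csi.UnimplementedIdentityServer embedded here.)
type identityServer struct{}

func (identityServer) GetPluginInfo(ctx context.Context, _ *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
	// Placeholder name/version for an imaginary driver.
	return &csi.GetPluginInfoResponse{Name: "example.csi.homelab.dev", VendorVersion: "0.1.0"}, nil
}

func (identityServer) GetPluginCapabilities(ctx context.Context, _ *csi.GetPluginCapabilitiesRequest) (*csi.GetPluginCapabilitiesResponse, error) {
	// Empty capabilities advertise a node-only plugin with no controller service.
	return &csi.GetPluginCapabilitiesResponse{}, nil
}

func (identityServer) Probe(ctx context.Context, _ *csi.ProbeRequest) (*csi.ProbeResponse, error) {
	// Kubelet and the CSI sidecars poll this to check the driver is alive.
	return &csi.ProbeResponse{}, nil
}

func main() {
	// Kubelet reaches the driver over this socket, mounted into the driver pod.
	lis, err := net.Listen("unix", "/csi/csi.sock")
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, identityServer{})
	// csi.RegisterNodeServer / csi.RegisterControllerServer would go here.
	if err := srv.Serve(lis); err != nil {
		panic(err)
	}
}
```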
I tried longhorn on my homelab cluster. I'll admit it's possible that I did something wrong, but I managed to somehow get it into a state where it seemed my volumes got permanently corrupted. At the very least I couldn't figure out how to get my volumes working again.
When restoring from backup I went with Rook (which is a wrapper on ceph) instead and it's been much more stable, even able to recover (albeit with some manual intervention needed) from a total node hardware failure.
It is interesting seeing this article come up, since just yesterday I set up Longhorn in my homelab cluster. I needed better performance for some tasks than NFS was providing, so I set up a RAID on my R630 and tried it out.
So far things are running well, but I can't shake the fear that I'm in for a rude awakening and I'll lose everything. I have backups, but the recovery will be painful if I ever have to do it.
I will have to take a look at Rook, since I'm not too committed yet (only moved over 2 things) and could still switch.
I have a small 4 node home cluster, and longhorn works great... on smaller volumes.
I have a 15TB volume for video storage, and it can't complete any replica rebuilds. It always fails at some point and then tries to restart.
If the information is truly important, push it off to a database or NAS. I use Rook at home, but really only for long-lived app data (config files, etc.). Anything truly important (media, files, etc.) is served from an NFS share attached to the cluster.
Longhorn is a poorly implemented distributed storage layer. You are better off with Ceph.
I have not used Longhorn, but we are currently in the process of migrating off of Ceph after an extremely painful relationship with it. Ceph has fundamental design flaws (like the way it handles subtree pinning) that, IMO, make the more modern distributed filesystems look very attractive. SeaweedFS is also cool, and for high-performance use cases, Weka is expensive but good.
That sounds more like a CephFS issue than a Ceph issue.
(a lot of us distrust distributed 'POSIX-like' filesystems for good reasons)
Are there any distributed POSIX filesystems which don't suck? I think part of the issue is that a POSIX-compliant filesystem just doesn't scale, and you are just seeing that?
I think Lustre works fairly well. At the very least, it's used in a lot of HPC centers to handle large filesystems that get hammered by lots of nodes concurrently. It's open source, so nominally free, although getting a support contract from a specialized consulting firm might be pricey.
https://www.reddit.com/r/AMD_Stock/comments/1nd078i/scaleup_...
You're going to have to open the link and then go to the third image. I thought it was interesting that OCI pegs Lustre at 8 Gb/s and their high-performance FS at much higher than that... 20-80.
Basically, we are building this at Archil (https://archil.com). The reason these things are generally super expensive is that they're incredibly hard to build.
weka seems to Just Work from our tests so far, even under pretty extreme load with hundreds of mounts on different machines, lots of small files, etc... Unfortunately it's ungodly expensive.
I've heard Ceph is expensive to run. But maybe that's not true?
Ceph overheads aren't that large for a small cluster, but they grow as you add more hosts, drives, and storage. Probably the main gotcha is that you're (ideally) writing your data three times on different machines, which is going to lead to a large overhead compared with local storage.
Most resource requirements for Ceph assume you're going for a decently sized cluster, not something homelab sized.
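As a rough, hypothetical illustration of what that three-way replication costs in raw capacity (assuming the default replicated pool size of 3 and no erasure coding), usable space is roughly a third of raw space, before leaving any headroom for rebalancing:

$$ \text{usable} \approx \frac{\text{raw}}{3}, \qquad \text{e.g. } \frac{3 \text{ hosts} \times 4\,\text{TB}}{3} = 4\,\text{TB usable from 12 TB raw.} $$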
It’s going to do a good job of saturating your LAN maintaining quorum on the data.
I'm only just wading in, after years of intent. I don't feel like Ceph is particularly demanding. It does want a decent amount of RAM: 1GB each for the monitor, manager, and metadata daemons, up to 16GB total for larger clusters, according to the docs. But then each disk's OSD defaults to 4GB, which can add up fast!! And some workloads can use more. 10GbE is recommended, and more is better here, but that seems not unique to Ceph: syncing storage will want bandwidth. https://docs.ceph.com/en/octopus/start/hardware-recommendati...
For me it was the RAM for the OSDs: 1GB per 1TB, but ideally more for SSDs...
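A back-of-the-envelope RAM budget using the defaults quoted above, for a hypothetical homelab node running four OSDs plus one monitor, manager, and metadata daemon each:

$$ 4 \times 4\,\text{GB (OSD)} + 3 \times 1\,\text{GB (mon, mgr, mds)} \approx 19\,\text{GB of RAM.} $$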
This post from 2023 (https://www.redhat.com/en/blog/ceph-cluster-single-machine) says:
> All you need is a machine, virtual or physical, with two CPU cores, 4GB RAM, and at least two or three disks (plus one disk for the operating system).
Longhorn was the codename for Windows Vista... so not a great choice of a name (IMO).
Longhorn is a fine name, and it doesn't matter if somebody else used it 20+ years ago
By that logic Titanic would be a fine name too.
https://www.titanic-magazin.de
Hmm, maybe just shorten to Titan?
Just don't use it to name a database.
I mean, I think it would be. Superstition about naming is silly.
Even complaining about Vista raises eyebrows. It had two huge issues: overactive UAC, and Microsoft handing "Vista Certified" to basically anybody who asked. (Frequently to machines that would barely run XP pre-SP1.)
Most of the complaints can be reduced to one of those.
Yes- I hand wave away a lot of other things: because they were required for a huge step towards a decently secure and stable OS.
>a huge step towards a decently secure and stable OS
It absolutely was an important (and required) step towards a more secure and stable OS. What it was not, though, was a secure and stable OS.
Windows ME was the same. A required step on the path towards something better, and ALSO something that had the "Windows XX-ready" badge slapped on anything that asked. But no one is lining up to try Vista again, except as a technical challenge.
ME is... not comparable? There were no security boundaries ME could implement: it was still DOS and FAT32.
The list of changes Vista made was never going to go off without a hitch. When you put new boundaries in place in the kernel, and a driver violates them because it was merely recompiled rather than updated to handle the separation and handle errors from it, there's no choice but to kernel panic.
Compatibility Shims were introduced for userland changes.
Despite the hate, DWM handled the most frequent crashes: graphics.
Microsoft is STILL working on pulling graphics code out of the kernel and into userland.
I thought this was going to be about Vista and how some of the FS stuff that got cut was prescient. "This old thing that didn't work was ahead of its time" is a whole genre of post (e.g. Itanium).
Indeed, does it use .NET in its implementation, or are they already rewriting it in COM?
Could've been worse, e.g. Cairo or Blackcomb.
I remembered the Windows Vista reference as soon as I saw the name. That said, I don't think it's a big deal.
As an enterprise user of Rancher, we had long discussions with SUSE about Longhorn. And we are not using it.
You need a separate storage LAN, and a seriously beefy one at that, to use Longhorn. But even 25Gbit was not enough to keep volumes from being corrupted.
When rebuilds take too long, Longhorn fails, crashes, hangs, etc., etc.
We will never make the mistake of using Longhorn again.
Go with Ceph… a little more of a learning curve but overall better.
Be aware of its security flaws -- https://github.com/longhorn/longhorn/issues/1983
Allowing anyone to delete all your data is not great. When I found this I gave up on Longhorn and installed Ceph.
Anyone know what the story is with NVMe-oF/SPDK support these days? A couple of years ago Mayastor/OpenEBS was running laps around Longhorn on every performance metric, big time; not sure if anything has changed there...
(Copied from [0] when this was posted to lobste.rs) Longhorn was nothing but trouble for me. Issues with mount paths, uneven allocation of volumes, orphaned undeletable data taking up space. It’s entirely possible that this was a skill issue, but still - never touching it again. Democratic-csi[1] has been a breath of fresh air by comparison.
[0] https://lobste.rs/s/vmardk/longhorn_kubernetes_native_filesy... [1] https://github.com/democratic-csi/democratic-csi
Where I work, we primarily use Ceph as a K8s-native filesystem, though we still use OpenEBS for block storage and are actively watching OpenEBS Mayastor.
I looked into Mayastor and the NVMe-oF stuff is interesting, but it is so, so far behind Ceph when it comes to stability and features. Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of the performance gap.
> Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of the performance gap.
It's only been in development for, what, like 5 years at this point? =) I have no horse in this race, but it seems to me OpenEBS will close the gap sooner.
I am using NFS and I think it's pretty simple and just works.