Longhorn – A Kubernetes-Native Filesystem

(vegard.blog.engen.priv.no)

54 points | by jandeboevrie 4 days ago

45 comments

dpedu 8 hours ago

Kubernetes CSI drivers are surprisingly easy to write. You basically just have to implement a number of gRPC procedures that manipulate your system's storage as the Kubernetes control plane calls them. I wrote one that uses file-level syncing between hosts using Syncthing to "fake" network volumes.
https://kubernetes-csi.github.io/docs/developing.html
There are 4 gRPCs listed in the overview; that is literally all you need.
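To give a feel for the shape of such a driver, here is a minimal sketch in Python, with plain method calls standing in for gRPC. The method names mirror real RPC names from the CSI spec (GetPluginInfo, Probe, NodePublishVolume, NodeUnpublishVolume), but the `ToyNodeDriver` class, its arguments, the plugin name, and the symlink used in place of a real mount are all assumptions made for illustration. This is not the CSI API and not how any particular driver does it.

```python
import os
import tempfile


class ToyNodeDriver:
    """Toy stand-in for a CSI node plugin (hypothetical, no gRPC involved)."""

    def __init__(self, backing_root):
        # Directory where this toy driver keeps each volume's data.
        self.backing_root = backing_root

    def get_plugin_info(self):
        # Mirrors the Identity RPC GetPluginInfo: report a name and version.
        return {"name": "toy.csi.example.com", "vendor_version": "0.0.1"}

    def probe(self):
        # Mirrors the Identity RPC Probe: report whether the plugin is healthy.
        return {"ready": os.path.isdir(self.backing_root)}

    def node_publish_volume(self, volume_id, target_path):
        # Mirrors NodePublishVolume: make the volume visible at target_path.
        # A real driver would mount here; a symlink stands in for the mount.
        source = os.path.join(self.backing_root, volume_id)
        os.makedirs(source, exist_ok=True)
        if not os.path.islink(target_path):  # repeated calls are harmless
            os.symlink(source, target_path)
        return {}

    def node_unpublish_volume(self, volume_id, target_path):
        # Mirrors NodeUnpublishVolume: undo the publish; data stays in place.
        if os.path.islink(target_path):
            os.unlink(target_path)
        return {}


def demo():
    # Publish a volume, write through the "mount", then unpublish it.
    with tempfile.TemporaryDirectory() as root:
        backing = os.path.join(root, "backing")
        os.makedirs(backing)
        driver = ToyNodeDriver(backing)
        target = os.path.join(root, "pod-mount")
        driver.node_publish_volume("vol-1", target)
        with open(os.path.join(target, "hello.txt"), "w") as f:
            f.write("hi")
        driver.node_unpublish_volume("vol-1", target)
        # The file should survive in the backing directory after unpublish.
        survived = os.path.exists(os.path.join(backing, "vol-1", "hello.txt"))
        return driver.probe()["ready"], os.path.lexists(target), survived
```

Calling `demo()` publishes a volume, writes a file through the published path, and unpublishes it again. A real driver would expose these handlers over a gRPC socket and perform an actual mount, but the division of work is roughly the same.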
cmeacham98 11 hours ago

I tried Longhorn on my homelab cluster. I'll admit it's possible that I did something wrong, but I somehow managed to get it into a state where my volumes seemed to be permanently corrupted. At the very least, I couldn't figure out how to get them working again.
When restoring from backup I went with Rook (which is a wrapper around Ceph) instead, and it's been much more stable, even able to recover (albeit with some manual intervention) from a total node hardware failure.

- nerdjon 9 hours ago
  
  It is interesting seeing this article come up, since just yesterday I set up Longhorn in my homelab cluster. I needed better performance for some tasks than NFS was providing, so I set up a RAID on my R630 and tried it out.
  So far things are running well, but I can't shake the fear that I am in for a rude awakening and will lose everything. I have backups, but the recovery will be painful if I have to do it.
  I will have to take a look at Rook, since I am not yet too committed (I've only moved over 2 things) to switch.
  
  - cortesoft 6 hours ago
    
    I have a small 4 node home cluster, and longhorn works great... on smaller volumes.
    I have a 15TB volume for video storage, and it can't complete any replica rebuilds. It always fails at some point and then tries to restart.
  - master_crab 7 hours ago
    
    If the information is truly important, push it off to a database or NAS. I use Rook at home, but really only for long-lived app data (config files, etc). Anything truly important (media, files, etc) is served from an NFS share attached to the cluster.
positisop 10 hours ago

Longhorn is a poorly implemented distributed storage layer. You are better off with Ceph.

- willbeddow 9 hours ago
  
  Have not used Longhorn, but we are currently in the process of migrating off of Ceph after an extremely painful relationship with it. Ceph has fundamental design flaws (like the way it handles subtree pinning) that, IMO, make more modern distributed filesystems very useful. SeaweedFS is also cool, and for high-performance use cases, Weka is expensive but good.
  
  - q3k 9 hours ago
    
    That sounds more like a CephFS issue than a Ceph issue.
    (a lot of us distrust distributed 'POSIX-like' filesystems for good reasons)
  - __turbobrew__ 9 hours ago
    
    Are there any distributed POSIX filesystems which don’t suck? I think part of the issue is that a POSIX-compliant filesystem just doesn’t scale, and you are just seeing that?
    
    - scheme271 7 hours ago
      
      I think Lustre works fairly well. At the very least, it's used in a lot of HPC centers to handle large filesystems that get hammered by lots of nodes concurrently. It's open source, so nominally free, although getting a support contract from a specialized consulting firm might be pricey.
      
      - latchkey 5 hours ago
        
        https://www.reddit.com/r/AMD_Stock/comments/1nd078i/scaleup_...
        You're going to have to open the image and then go to the third image. I thought it was interesting that OCI pegs Lustre at 8Gb/s and their high performance FS at much higher than that... 20-80.
    - huntaub 5 hours ago
      
      Basically, we are building this at Archil (https://archil.com). The reason these things are generally super expensive is that they’re incredibly hard to build.
    - willbeddow 9 hours ago
      
      Weka seems to Just Work from our tests so far, even under pretty extreme load with hundreds of mounts on different machines, lots of small files, etc... Unfortunately it's ungodly expensive.
- yupyupyups 10 hours ago
  
  I've heard Ceph is expensive to run. But maybe that's not true?
  
  - keeperofdakeys 10 hours ago
    
    Ceph overheads aren't that large for a small cluster, but they grow as you add more hosts, drives, and more storage. Probably the main gotcha is that you're (ideally) writing your data three times on different machines, which is going to lead to a large overhead compared with local storage.
    Most resource requirements for Ceph assume you're going for a decently sized cluster, not something homelab sized.
  - master_crab 7 hours ago
    
    It’s going to do a good job of saturating your LAN while maintaining quorum on the data.
  - jauntywundrkind 10 hours ago
    
    I'm only just wading in, after years of intent. I don't feel like Ceph is particularly demanding. It does want a decent amount of RAM: 1GB each for monitor, manager, and metadata, up to 16GB total for larger clusters, according to the docs. But then each disk's OSD defaults to 4GB, which can add up fast! And some users can use more. 10GbE is recommended, and more is better here, but that doesn't seem unique to Ceph: syncing storage will want bandwidth. https://docs.ceph.com/en/octopus/start/hardware-recommendati...
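    A back-of-the-envelope version of that arithmetic, as a sketch only. The figures (1GB per monitor, manager, and metadata daemon, 4GB per OSD) are the ones quoted above; the function name, its parameters, and the assumption of exactly one daemon of each kind per node are made up for illustration.

    ```python
    def ceph_ram_estimate_gb(num_osds, daemon_gb=1, osd_gb=4):
        """Rough RAM estimate for one node running all Ceph daemons.

        Assumes one monitor, one manager and one metadata server at
        daemon_gb each, plus osd_gb for every OSD (one OSD per disk).
        """
        control_daemons = 3  # monitor + manager + metadata server
        return control_daemons * daemon_gb + num_osds * osd_gb


    # Example: a single homelab node with 6 disks.
    estimate = ceph_ram_estimate_gb(6)  # 3 * 1 + 6 * 4 = 27
    ```

    So six disks already land at 27GB by these numbers, before the OS or any workloads get a share.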
    
    - xyzzy123 10 hours ago
      
      For me it was the RAM for the OSDs: 1GB per 1TB, but ideally more for SSDs...
    - westurner 10 hours ago
      
      This post from 2023 (https://www.redhat.com/en/blog/ceph-cluster-single-machine) says:
      > All you need is a machine, virtual or physical, with two CPU cores, 4GB RAM, and at least two or three disks (plus one disk for the operating system).
d3Xt3r 4 days ago

Longhorn was the codename for Windows Vista... so not a great choice of a name (IMO).

- onionisafruit 11 hours ago
  
  Longhorn is a fine name, and it doesn't matter if somebody else used it 20+ years ago.
  
  - weinzierl 10 hours ago
    
    By that logic Titanic would be a fine name too.
    
    - ofrzeta 5 hours ago
      
      https://www.titanic-magazin.de
    - NewJazz 10 hours ago
      
      Hmm, maybe just shorten to Titan?
      
      - esafak 7 hours ago
        
        Just don't use it to name a database.
    - bigstrat2003 9 hours ago
      
      I mean, I think it would be. Superstition about naming is silly.
  - fineallaround 11 hours ago
    
    [flagged]
    
    - privatelypublic 11 hours ago
      
      Even complaining about Vista raises eyebrows. It had two huge issues: overactive UAC, and Microsoft handing "Vista Certified" to basically anybody who asked. (Frequently to machines that would barely run XP pre-SP1.)
      Most of the complaints can be reduced to one of those.
      Yes, I hand-wave away a lot of other things, because they were required for a huge step towards a decently secure and stable OS.
      
      - samplatt 6 hours ago
        
        >a huge step towards a decently secure and stable OS
        It absolutely was an important (and required) step towards a more secure and stable OS. What it was not, though, was a secure and stable OS.
        Windows ME was the same. A required step on the path towards something better, and ALSO something that had the "Windows XX-ready" badge slapped on anything that asked. But no one is lining up to try Vista again apart from technical challenges.
        
        - privatelypublic 6 hours ago
          
          ME is... not comparable? There are no security boundaries ME could implement: it was still DOS and FAT32.
          The list of changes Vista made was never going to go off without a hitch. When you put new boundaries in place in the kernel, and a driver violates them because it was merely recompiled rather than updated to handle the separation and the errors that come from it, there's no choice but to kernel panic.
          Compatibility shims were introduced for userland changes.
          Despite the hate, DWM handled the most frequent crashes: graphics.
          Microsoft is STILL working on pulling graphics code out of the kernel and into userland.
    - undefined 11 hours ago
      
      [deleted]
- gdbsjjdn 10 hours ago
  
  I thought this was going to be about Vista and how some of the FS stuff that got cut was prescient. "This old thing that didn't work was ahead of its time" is a whole genre of post (e.g. Itanium).
- pjmlp 6 hours ago
  
  Indeed, does it use .NET in its implementation, or are they already rewriting it into COM?
- antod 8 hours ago
  
  Could've been worse, e.g. Cairo or Blackcomb.
- tracker1 11 hours ago
  
  I remembered the Windows Vista reference as soon as I saw the name. That said, I don't think it's a big deal.
devn0ll 4 hours ago

As an enterprise user of Rancher, we had long discussions with SUSE about Longhorn. And we are not using it.
You need a separate storage LAN, and a seriously beefy one at that, to use Longhorn. But even 25Gbit was not enough to keep volumes from being corrupted.
When rebuilds take too long, Longhorn fails, crashes, hangs, etc.
We will never make the mistake of using Longhorn again.
coopreme 11 hours ago

Go with Ceph… a little more of a learning curve but overall better.
remram 5 hours ago

Be aware of its security flaws -- https://github.com/longhorn/longhorn/issues/1983
Allowing anyone to delete all your data is not great. When I found this I gave up on Longhorn and installed Ceph.
dilyevsky 11 hours ago

Anyone know what the story is with NVMe-oF/SPDK support these days? A couple of years ago Mayastor/OpenEBS was running laps around Longhorn on every performance metric, big time. Not sure if anything has changed there...
scubbo 8 hours ago

(Copied from [0] when this was posted to lobste.rs) Longhorn was nothing but trouble for me. Issues with mount paths, uneven allocation of volumes, orphaned undeletable data taking up space. It’s entirely possible that this was a skill issue, but still: never touching it again. Democratic-csi [1] has been a breath of fresh air by comparison.
[0] https://lobste.rs/s/vmardk/longhorn_kubernetes_native_filesy... [1] https://github.com/democratic-csi/democratic-csi
studmuffin650 9 hours ago

Where I work, we primarily use Ceph as the K8s-native filesystem. We still use OpenEBS for block storage, though, and are actively watching OpenEBS Mayastor.

- __turbobrew__ 9 hours ago
  
  I looked into Mayastor and the NVMe-oF stuff is interesting, but it is so, so, so far behind Ceph when it comes to stability and features. Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of Ceph's performance gaps.
  
  - dilyevsky 7 hours ago
    
    > Once Ceph has the next-generation Crimson OSD with SeaStore, I believe it should close a lot of Ceph's performance gaps.
    Only been in development for what, like 5 years at this point? =) I have no horse in this race, but it seems to me OpenEBS will close the gap sooner.
yamapikarya 5 hours ago

I am using NFS and I think it's pretty simple and just works.
samlevy0515 7 hours ago

[dead]