#OpenEBS

2025-10-21

A quick follow-up: Given that caching an #LVM volume requires setting up a cache volume and a cache metadata volume for every data volume you want cached, this gets needlessly complicated with a storage provisioner that constantly creates new volumes. Caching seems absolutely necessary with the hardware I have available, and LVM seems like the only way to go. There are alternatives, like #flashcache by #Synology, but they're not part of the mainline #kernel.
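
For context, this is roughly the per-volume ceremony dm-cache asks for; a minimal sketch, assuming a volume group vg_data that already contains both the HDD and an NVMe partition, and a data LV called datavol (all names and sizes are made up for illustration):

```python
import subprocess

def run(cmd):
    """Tiny helper: print and run a command, failing loudly on errors."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

VG = "vg_data"            # hypothetical VG containing both the HDD and the NVMe PV
NVME = "/dev/nvme0n1p1"   # hypothetical NVMe partition already added to the VG
ORIGIN = "datavol"        # the data LV that should get the cache

# One cache-data LV and one cache-metadata LV, per data volume you want cached...
run(["lvcreate", "-L", "100G", "-n", f"{ORIGIN}_cache", VG, NVME])
run(["lvcreate", "-L", "1G", "-n", f"{ORIGIN}_cachemeta", VG, NVME])

# ...combined into a cache pool...
run(["lvconvert", "-y", "--type", "cache-pool",
     "--poolmetadata", f"{VG}/{ORIGIN}_cachemeta", f"{VG}/{ORIGIN}_cache"])

# ...and finally attached to the origin LV. Repeat all of this for every new volume.
run(["lvconvert", "-y", "--type", "cache",
     "--cachepool", f"{VG}/{ORIGIN}_cache", f"{VG}/{ORIGIN}"])
```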

The solution? It ain't pretty, and I have yet to test its performance implications. I created a 4TB file on a filesystem cached by LVM, created a #loopback device from it, initialized that as an LVM physical volume, then added a volume group on top. The #OpenEBS Local LVM provisioner now uses this group to provision persistent volumes in #Kubernetes.
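
Roughly what that looks like, as a sketch; the file path, the size and the VG name are made up, and something (a small boot-time unit, say) would still be needed to re-attach the loop device after a reboot:

```python
import subprocess

def run(cmd, capture=False):
    print("+", " ".join(cmd))
    result = subprocess.run(cmd, check=True, capture_output=capture, text=True)
    return result.stdout.strip() if capture else None

BACKING_FILE = "/mnt/cached/openebs-backing.img"  # hypothetical path on the LVM-cached filesystem
VG_NAME = "vg_openebs"                            # hypothetical VG for the OpenEBS LVM provisioner

# 1. Create a sparse 4TB backing file on the cached filesystem.
run(["truncate", "-s", "4T", BACKING_FILE])

# 2. Attach it to the first free loop device and capture the device name (e.g. /dev/loop7).
loop_dev = run(["losetup", "--find", "--show", BACKING_FILE], capture=True)

# 3. Turn the loop device into an LVM physical volume and build a volume group on top.
run(["pvcreate", loop_dev])
run(["vgcreate", VG_NAME, loop_dev])

# The provisioner's StorageClass then just needs to point at VG_NAME
# (the volgroup parameter, if memory serves).
```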

Not sure about the implications of nested LVM'ing, but it solves the #caching issue.

#linux #cache #hdd #nvme #ssd

2025-10-19

I'm running my own #Kubernetes cluster on bare metal at #Hetzner. I'm going through different iterations of node configurations, trying to figure out how to squeeze the most out of a hardware configuration with a 10TB+ HDD and two 512GB #NVME drives. #bcache has apparently been discontinued, which leaves me with #dmcache, nicely built into #LVM.

One catch though: you need to create NVME cache volumes for each of the data volumes you create. This is all well and good if you're using a single volume for everything, and tools like #Longhorn work well in that kind of setup. Others, like the #OpenEBS LVM provisioner, don't. There is a version in the making that just provisions raw disk images as files on whatever filesystem you have, but it's still lacking some features that are vital to me - primarily the ability to offer StorageClasses that provision on different paths of the node, so I can pick a fast or slow filesystem to store things on.

Tips for other solutions?

2025-10-16

Having recently experienced a rather horrible #Kubernetes crash, I'm looking for #backup solutions. We're good with PostgreSQL, since we're using #CNPG with remote transaction logs shipped to an offsite #S3 bucket. I need something for volumes and maybe Kubernetes resources. #Longhorn offers S3 backups for its own volumes, but for other #CSI drivers like local #OpenEBS, maybe #Velero? Thoughts?

velero.io/

2025-10-03

I've been doing things I shouldn't with #Kubernetes. We're using a replicated #MinIO cluster as the storage backend on #mstdndk, which requires a boatload of storage, especially if you forget to specify any kind of retention. So far, the quick workaround for a full disk was just to expand the filesystem. Since we're replicating across nodes, we're using #OpenEBS #LVM for local storage. Poor partitioning means we're running out of space on the volume group, but even worse - PVC sizes were increased before checking whether we had space for them. Kubernetes is now stuck in a most unfortunate situation: it can't grow the local filesystem, as the volume group is full, and you're not allowed to decrease the size request. What then? Cue github.com/etcd-io/auger - a tool that lets you edit #K8s resources directly in #etcd. Obviously you should never do this, but with steady hands and clinical precision you can get yourself out of a pickle like mine. The size was reverted and the PVCs were unstuck.
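
For the curious, the workflow is roughly the shape below; a sketch from memory, not the exact commands - the etcd key is hypothetical, the edit itself is left out, and the write-back step in particular deserves a dry run before you trust it against a live cluster:

```python
import subprocess

# Hypothetical key of the stuck PVC in etcd; adjust namespace/name to your situation.
KEY = "/registry/persistentvolumeclaims/minio/data-minio-0"

def pipe(cmd, data=None):
    """Run a command, feeding it raw bytes and returning raw bytes."""
    return subprocess.run(cmd, input=data, capture_output=True, check=True).stdout

# 1. Read the raw protobuf value from etcd (etcdctl endpoints/certs assumed to be
#    configured via the environment) and decode it into readable YAML with auger.
raw = pipe(["etcdctl", "get", KEY, "--print-value-only"])
as_yaml = pipe(["auger", "decode"], data=raw)
print(as_yaml.decode())

# 2. Revert spec.resources.requests.storage in the YAML (done by hand here),
#    re-encode it with auger and write it back. This is the "steady hands" part.
edited = as_yaml  # ... your edit goes here ...
encoded = pipe(["auger", "encode"], data=edited)
pipe(["etcdctl", "put", KEY], data=encoded)  # etcdctl reads the value from stdin when omitted
```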

2025-08-12

An unfortunate side effect of Longhorn experiencing I/O latency and saturation is that volumes attached to pods get remounted read-only. This has very unfortunate consequences for running databases, caches, etc. Any tips on making #Longhorn behave would be greatly appreciated. I've looked briefly into OpenEBS' distributed, replicated storage, but the requirements currently rule it out - specifically, replicated #OpenEBS needs an entire physical disk to itself.

2025-02-01

@fwaggle I tried setting up #longhorn a couple of years ago and got frustrated, but maybe it's worth another look? Or maybe #OpenEBS?

2024-04-25

In what I can only describe as a bad case of "proof of concept making it to production", my #homelab #kubernetes cluster cannot run replicated #openebs because of a lack of hugepages support...

That's because I rolled forward with paravirtualized (PV) VMs, and then built everything on top of that without realizing that the workers are PV.

I'm assuming swapping to #HVM is going to be painful, so I'm contemplating wiping everything and starting over.

#xen #selfhosting

simplyblock
2024-03-18

got archived by the , and some people expressed concerns and asked whether they need to think about alternatives. Are you concerned about its future, and whether you can still use it?

2024-02-23

#kubernetes question: our deployment stack is #skaffold plus #helm. We have #OpenEBS deployed, and I'm trying to add their monitoring mixin to the deployment. But they want to deploy the whole kube-prometheus stack, which we already have. I have a script that generates their manifests and grabs just the 4 files we're interested in, and helm picks those up. But here's the question: how do I tie the generation step into the rest of the deployment?
[1/2]

几乇丨爪丨 🤓 :mastodon: (neimi@voi.social)
2023-12-08

🖥️  I'd like to move the persistent storage of my #MicroK8s #Cluster over to #OpenEBS #MayaStor...

How can I then mount that storage on my workstation to edit configurations?

So far everything is stored on my NAS, and of course I can simply mount that share on my desktop... but how does that work with #Clusterstorage?

So if anyone knows how to do this... let me know 🙏

#Retoot #Boost appreciated 🔃

#selfhosting #K8sathome #homelab #k8s #BareMetal

2023-11-20

man, my #openebs jiva experience has been a nightmare. it requires manual intervention like every day on my cluster, and the community is just a sea of ignored issues and slack messages. i don't know why the talos team recommends it.

am i the only one who feels this way?

#kubernetes #jiva #talos #sidero

Tero Keski-Valkama (tero@rukii.net)
2023-07-12

#RukiiNet had a 12-hour outage again, because Jiva volumes somehow got tangled up.

Took the opportunity to upgrade #OpenEBS to 3.7.0 and Jiva to 3.4.0.

Will continue upgrading the NFS provisioner from 4.1.0 to 4.4.0 as well.

Edit: Updated NFS provisioner and CSI driver as well.

Tero Keski-Valkama (tero@rukii.net)
2023-07-08

#RukiiNet had a 13-hour outage today because of some #OpenEBS #Jiva #NFS race condition thing again. It took me a long time to understand what exactly the problem was. I deleted some NFS pods that were stuck, and they recreated themselves correctly after some reboots.

2023-03-01

@Gargron

#kubernetes

Very nice. There are many features of #Kubernetes that are helpful in allowing #Mastodon to scale.

#OpenEBS

2023-01-31

I'm not posting "days" anymore; my days aren't consecutive and I don't always have stuff to say.
I've managed to get my production #Kubernetes cluster running with #Longhorn after a period of struggles with #Openebs. I've also been working with my friend to build fresh #alpinelinux base images with #Jenkins and ship them off to #harbor.
#selfhosted #100DaysOfHomeLab

Tero Keski-Valkama (tero@rukii.net)
2023-01-24

#RukiiNet #SelfHosting update:
Just after writing this, #Curie went down again, and it didn't help that the #NFS pods were all on a different node. It all went down regardless.

Even got some data corruption again; it's always a huge manual hassle to bring everything back up. I read somewhere that #MicroK8S tends to handle hard reboots badly if specific singleton cluster pods like coredns, calico, the NFS controller or the hostpath provisioner are on the node that goes down. I wonder if it's possible to just add replicas for those...

I found a new (old and known) bug with #OpenEBS, and a mitigation. In some cases #Jiva has replicas in a read-only state for a moment while it syncs them, and if the moon phase is right, there's an apparent race condition where the iSCSI mounts stay read-only even though the underlying volume has already become read-write again.

The fix is to go to the node that mounted these, run "mount | grep ro,", and ABSOLUTELY UNDER NO CIRCUMSTANCES UNMOUNT (learned the hard way). Instead, I think it's possible to just remount them read-write.
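
Something like the following, as a sketch; it parses the output of mount, narrows it down to kubelet pod mounts (that filter is my assumption - eyeball the list before acting on it), and remounts them read-write in place without ever unmounting:

```python
import re
import subprocess

# Find filesystems whose mount options start with "ro".
mount_output = subprocess.run(["mount"], capture_output=True, text=True, check=True).stdout
readonly = []
for line in mount_output.splitlines():
    m = re.search(r" on (\S+) type \S+ \((ro|ro,[^)]*)\)", line)
    if m and "kubelet/pods" in m.group(1):  # only pod volume mounts, not e.g. squashfs snaps
        readonly.append(m.group(1))

print("read-only pod mounts:", readonly)

# Remount them read-write IN PLACE - never unmount them.
for mountpoint in readonly:
    subprocess.run(["mount", "-o", "remount,rw", mountpoint], check=True)
```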

There's also an irritating thing where different pods run their apps with different UIDs, and the Dynamic NFS Provisioner StorageClass needs to be configured to mount the storage with the same UID. I originally worked around this by just setting chmod 0777, but the apps insist on creating files with their own permission set, so when their files get remounted the permissions stay but the UID changes, and after a remount they no longer have write access to their own files.

This compounds with the fact that each container runs under its own UID, so each needs its own special StorageClass for that UID... Gods.

I got the new #IntelNUC for the fourth node in the cluster, to replace the unstable Curie node, but the memory for it arrives on Thursday.

Tero Keski-Valkama (tero@rukii.net)
2023-01-23

#RukiiNet #SelfHosting update:
After fighting with an unstable host (due to bad memory, I believe) and the whole cluster going down whenever one node went down, I took a deep dive into what actually happens.

It turned out that the way I had installed #OpenEBS #Jiva for replicated volumes on my #MicroK8S, using their official Helm charts, didn't work at all. It created all the replicas correctly, went through all the motions, and then stored all the data in a single pod's ephemeral store! I had to take the cluster down to investigate; that took more or less a weekend.

I found out that if OpenEBS Jiva is installed as a MicroK8S plug-in, pointed specifically at their Git main and not at a tagged release (which doesn't work), then it works. I tried to find out the difference between the Helm chart this installs and the one I had installed, with no luck. I think I installed the OpenEBS Jiva Helm chart before, which didn't work, while the MicroK8S plug-in installs the OpenEBS chart with Jiva enabled as a setting.

Anyhow, I ordered a new #IntelNUC again, partly to reduce the maintenance work caused by one flaky node. But since I recreated basically the whole cluster with a functioning OpenEBS and restored all the (daily) backups once again, it seems everything works, and a single node going down probably shouldn't take the whole Mastodon instance down anymore.

During all this I have also filed a lot of issues against the relevant projects on GitHub and documented my findings there, so that people hitting the same errors can find solutions.

Tero Keski-Valkama (tero@rukii.net)
2023-01-19

#RukiiNet #SelfHosting update:
I think there is an issue with #OpenEBS #Jiva replication on the #Kubernetes cluster.

It seems all the volume data goes to the Jiva controller pod for that PVC, and it stores all the data in /var/snap/microk8s/common/var/lib/kubelet/pods/PODID/volumes/kubernetes.io~csi/PVCID/mount.

That directory should presumably be a mount point for something, but it isn't. The plain files are just stored there, on a single node.

The Jiva replicas, three per volume claim, are set up correctly, but the file data doesn't seem to reach them...
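
A quick way to confirm that suspicion from the node itself; a small sketch that walks the kubelet CSI volume directories (the base path is the MicroK8s one from above) and checks whether each is a real mount point or just a plain directory:

```python
import glob
import os

# Kubelet CSI volume mount directories on a MicroK8s node (path as seen above).
BASE = "/var/snap/microk8s/common/var/lib/kubelet/pods"

for path in glob.glob(os.path.join(BASE, "*", "volumes", "kubernetes.io~csi", "*", "mount")):
    # A healthy Jiva volume should show up as an iSCSI-backed mount point here;
    # a plain directory means the data is just sitting on this node's own disk.
    state = "mount point" if os.path.ismount(path) else "PLAIN DIRECTORY (data stays on this node!)"
    print(f"{path}: {state}")
```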

Tero Keski-Valkama (tero@rukii.net)
2023-01-16

Ok, many of my #Microk8s #Kubernetes issues are apparently distro-related. #Ubuntu wants to install MicroK8S as a snap package, which would be all well and fine - it's a relatively recent version and uses Kubelite as a snap service - but of course there are some version incompatibilities.

I had previous issues with MicroK8S plug-ins, which in some cases installed very old and buggy versions of the charts in question. I learned to avoid those and install manually where appropriate. In most cases that works; it's just #OpenEBS that is problematic.

Now it seems that whenever I've had issues with the cluster, I've followed general instructions and run microk8s start. Apparently that's a huge mistake. MicroK8S has opinions about cluster tooling versions which apparently differ from what snap wants to run, and all this leads to really weird incompatibility issues where multiple things conflict over the same Unix sockets.

Hopefully the cluster stays up a bit better now. I can't recommend trying to run #Mastodon on MicroK8S; it's a constant pain with documentation that is missing or, often, misleading.

#SelfHosting

Nicholas-ITSulu (nicholas@verze.social)
2023-01-09

@mlink

We are currently using #NFS for storage on our #k8s cluster. We are testing #OpenEBS now.

Because you mentioned many issues with #OpenEBS and none with #Longhorn, I think we need to take another look at #Longhorn and #Rancher.

Currently using #RHEL8

@tero
