Kubernetes Persistent Volumes and RBD

Since the number of things I’m deploying on my small Kubernetes cluster keeps increasing and manually managing the volumes is becoming a pain, I decided to start learning about Storage Classes, Persistent Volumes and Volume Claims.

Even though it seemed intimidating at first, it was really easy to integrate them with the small Ceph cluster I also play with.

Ceph

On the Ceph side, the configuration consists of creating a new pool and user that will be used by our Kubernetes cluster.

  • First, create a new pool: ceph osd pool create kubernetes 64 64
  • Then, to reduce compatibility problems, I decided to reduce the enabled features to the bare minimum: rbd feature disable --pool kubernetes exclusive-lock object-map fast-diff deep-flatten
  • Once the pool was created, I created a new client key that will be used to provision and claim the volumes stored in this pool: ceph auth get-or-create-key client.kubernetes
  • We need to add the correct capabilities to this new client so that it can create new images, handle the locks and retrieve the images. The rbd profile automatically allows these operations: ceph auth caps client.kubernetes mon "profile rbd" osd "profile rbd pool=kubernetes"
  • Then, we export the key in base64 so it can be inserted shortly into the Kubernetes storage class configuration: ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64
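The key-extraction pipeline from the last step can be sketched against sample output (the keyring content and key value below are made-up placeholders, not a real key). One precaution worth taking: encode with printf rather than echo so a trailing newline doesn’t end up inside the base64 value that goes into the secret.

```shell
# Hypothetical `ceph auth get client.kubernetes` output (placeholder key)
keyring='[client.kubernetes]
  key = AQBexamplekey0123456789abcdef==
  caps mon = "profile rbd"
  caps osd = "profile rbd pool=kubernetes"'

# Grab the third field of the "key = ..." line, as in the pipeline above
key=$(printf '%s\n' "$keyring" | grep 'key = ' | awk '{print $3}')

# Encode without a trailing newline so the Kubernetes secret decodes cleanly
encoded=$(printf '%s' "$key" | base64)
echo "$encoded"
```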

That’s all for the Ceph part of the storage configuration. Easy so far, right?

Storage class

In Kubernetes, a Storage Class is a way to describe the storage that is available and can be used by Persistent Volumes. It’s an easy way to abstract the storage details so you don’t have to worry about them when creating new pods.

I created a new file that contains everything needed for the configuration of a new rbd storage class in my cluster. I will describe it part by part, but you can merge everything into one file to apply it with kubectl.

kind: ServiceAccount
apiVersion: v1
metadata:
  name: rbd-provisioner
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
- kind: ServiceAccount
  name: rbd-provisioner
  namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:controller:persistent-volume-binder
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-provisioner
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:v0.1.0"
      serviceAccountName: rbd-provisioner

An rbd provisioner pod and its related service account, based on the RBD Volume Provisioner for Kubernetes 1.5+ incubator project.

Not much to add for now on this part. Let’s look into the storage class configuration.

---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 10.42.100.1:6789,10.42.100.2:6789,10.42.100.3:6789
  adminId: kubernetes
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kubernetes
  userId: kubernetes
  userSecretName: ceph-secret-user
reclaimPolicy: Retain
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
  namespace: default
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=

This is the part where the storage is described. Update the monitors to match your Ceph configuration, and the secrets to hold the base64 key you got from the earlier ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64 command.

Here, I cheated a little and used the same client for both the administration and the user part of the storage, in part because I didn’t want to bother with the capabilities needed for each.
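As an alternative to embedding the base64 key in the YAML, the same secrets can be created directly with kubectl, which handles the base64 encoding itself. In that case pass the plain key (as printed by ceph auth get-key client.kubernetes), not the already-encoded version. A sketch, with a placeholder key value:

```shell
# Admin secret in kube-system (kubectl base64-encodes the value for you)
kubectl create secret generic ceph-secret \
  --namespace=kube-system \
  --type=kubernetes.io/rbd \
  --from-literal=key='AQBexamplekey0123456789abcdef=='

# Same key reused as the user secret in the default namespace
kubectl create secret generic ceph-secret-user \
  --namespace=default \
  --type=kubernetes.io/rbd \
  --from-literal=key='AQBexamplekey0123456789abcdef=='
```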

Once everything seems correct, you can save the file or files and apply the configuration on the Kubernetes cluster with kubectl apply -f ceph-rbd.yaml (or the name of your file).

And that’s all for the configuration … We can check that everything is working with kubectl get sc,deploy,po -n kube-system

NAME                 PROVISIONER
storageclasses/rbd   ceph.com/rbd

NAME                                      DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
[...]
deploy/rbd-provisioner                    1         1         1            1           5m

NAME                                                   READY     STATUS    RESTARTS   AGE
[...]
po/rbd-provisioner-5cc5947c77-xdcn5                    1/1       Running   0          5m

There should be a rbd-provisioner deployment with everything as desired, a rbd-provisioner-...-... pod running and a storageclasses/rbd storage class with the correct provisioner.

PersistentVolumeClaim and Volumes

Now, on to using them in the deployments:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myservice-data-claim
spec:
  storageClassName: rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data

If the volume does not exist yet, a new image will automatically be created on Ceph, formatted (ext4 by default) and mounted. If it already exists, it will simply be mounted.
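To watch the provisioning happen, you can cross-check both sides after applying the claim (a sketch; the pool and claim names match the examples above, and the image name is whatever the provisioner generated):

```shell
# Kubernetes side: the claim should be Bound to a dynamically created PV
kubectl get pvc myservice-data-claim
kubectl get pv

# Ceph side: a new image should appear in the kubernetes pool
rbd ls --pool kubernetes
```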

All in all, two hours were enough to migrate from manually created and managed volumes to a storage class and volume claims. I learnt that even though Kubernetes can really look hard and scary at first, everything is there to help you with your stuff.

Possible errors

Filesystem error - Access denied

By default, the pods will have access to the newly generated filesystems. However, if you start them with securityContext parameters, you can end up in a state where the user the container runs as does not have read or write access to the filesystem content.
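One way to avoid this, assuming the container runs as a non-root user, is to set fsGroup in the pod securityContext so Kubernetes makes the mounted volume group-owned and group-writable for that GID. A sketch reusing the pod from earlier (the 1000 values are example IDs, adjust them to your image):

```yaml
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  securityContext:
    runAsUser: 1000   # example non-root UID
    fsGroup: 1000     # the mounted volume is chgrp'ed to this GID
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
```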

Image is locked by other nodes

If, like me, you battle with rbd: image is locked by other nodes errors when a pod is migrated between nodes, it usually means that the client you created doesn’t have the capabilities to remove locks after detaching. I fixed that simply by setting the caps to the rbd profile instead of manually configuring the rwx operations: ceph auth caps client.myclient mon "profile rbd" osd "profile rbd pool=mypool"