Kubernetes Persistent Volumes and RBD
Since the number of things I’m deploying on my small Kubernetes cluster keeps increasing and manually managing the volumes is becoming a pain, I decided to start learning about Storage Classes, Persistent Volumes and Persistent Volume Claims.
Even though it looked intimidating at first, it was really easy to integrate them with the small Ceph cluster I also play with.
Ceph
On the Ceph side, the configuration consists of creating a new pool and user that will be used by our Kubernetes cluster.
- First create a new pool
ceph osd pool create kubernetes 64 64
- Then, to reduce compatibility problems, I decided to reduce the features to the bare minimum
rbd feature disable --pool kubernetes exclusive-lock object-map fast-diff deep-flatten
- Once the pool was created, I created a new client whose key will be used to provision and claim the volumes stored in this pool
ceph auth get-or-create-key client.kubernetes
- We need to add the correct capabilities to this new client so that it can create new images, handle the locks and retrieve the images. The rbd profile automatically allows these operations.
ceph auth caps client.kubernetes mon "profile rbd" osd "profile rbd pool=kubernetes"
- Then, we export the key in base64 to be inserted shortly in the Kubernetes storage class configuration.
ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64
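As a quick sanity check (entirely optional, and assuming the names used above), you can list the pools and display the client we just configured:
# The "kubernetes" pool should appear in the list
ceph osd pool ls
# The client should show the mon and osd rbd profiles set above
ceph auth get client.kubernetes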
That’s all for the Ceph part of the storage configuration. Easy so far, no?
Storage class
In Kubernetes, a Storage Class is a way to configure the storage that is available and can be used by the Persistent Volumes. It’s really an easy way to describe the storage so that you don’t have to worry about it when creating new pods.
I created a new file that contains everything needed for the configuration of a new rbd storage class in my cluster. I will describe it part by part, but you can merge everything into one file to apply it with kubectl.
kind: ServiceAccount
apiVersion: v1
metadata:
  name: rbd-provisioner
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:controller:persistent-volume-binder
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: rbd-provisioner
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:v0.1.0"
      serviceAccountName: rbd-provisioner
An rbd provisioner pod and its related service account, based on the [RBD Volume Provisioner for Kubernetes 1.5+ incubator project](https://github.com/kubernetes-incubator/external-storage/tree/master/ceph/rbd/deploy/rbac).
Not much to add for now on this part. Let’s look into the storage class configuration.
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: 10.42.100.1:6789,10.42.100.2:6789,10.42.100.3:6789
  adminId: kubernetes
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kubernetes
  userId: kubernetes
  userSecretName: ceph-secret-user
reclaimPolicy: Retain
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret
  namespace: kube-system
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
---
apiVersion: v1
kind: Secret
metadata:
  name: ceph-secret-user
  namespace: default
type: kubernetes.io/rbd
data:
  key: QV[...]QPo=
This is the part where the storage is described. Update the monitors to match your Ceph configuration and the secrets to match the key you got from the earlier ceph auth get client.kubernetes | grep key | awk '{print $3}' | base64 command.
Here, I cheated a little and used the same client for both the administration and the user side of the storage, partly because I didn’t want to bother with the capabilities needed for each.
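As a side note, if you would rather not paste the base64 key into the YAML file at all, the same secrets can also be created directly with kubectl (a sketch, assuming the client.kubernetes key created earlier):
# kubectl base64-encodes the literal value itself, so we pass the raw key
kubectl create secret generic ceph-secret --namespace=kube-system \
  --type="kubernetes.io/rbd" \
  --from-literal=key="$(ceph auth get-key client.kubernetes)"
kubectl create secret generic ceph-secret-user --namespace=default \
  --type="kubernetes.io/rbd" \
  --from-literal=key="$(ceph auth get-key client.kubernetes)"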
Once everything seems correct, you can save the file or files and apply the configuration on the Kubernetes cluster with kubectl apply -f ceph-rbd.yaml (or the name of your file).
And that’s all for the configuration… We can check that everything is working with kubectl get sc,deploy,po -n kube-system:
NAME                 PROVISIONER
storageclasses/rbd   ceph.com/rbd

NAME                     DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
[...]
deploy/rbd-provisioner   1         1         1            1           5m

NAME                                  READY   STATUS    RESTARTS   AGE
[...]
po/rbd-provisioner-5cc5947c77-xdcn5   1/1     Running   0          5m
There should be a rbd-provisioner deployment with everything as desired, a rbd-provisioner-...-... pod running and a storageclasses/rbd storage class with the correct provisioner.
PersistentVolumeClaim and Volumes
Now on to the usage in the deployments:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myservice-data-claim
spec:
  storageClassName: rbd
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data
If the volume does not yet exist, a new image will automatically be created on Ceph; this image will be formatted (in ext4 by default) and mounted. If it already exists, it will simply be mounted.
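As a quick check (the exact names will of course differ on your cluster), you can watch the claim get bound on the Kubernetes side and the image appear on the Ceph side:
# The claim should reach the Bound status once the image is provisioned
kubectl get pvc myservice-data-claim
# A new image should show up in the pool used by the storage class
rbd ls --pool kubernetes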
All in all, two hours were sufficient to migrate from manually created and managed volumes to a storage class and volume claims. I learnt that even though Kubernetes can really look hard and scary at first, everything is there to help you with your stuff.
Possible errors
Filesystem error - Access denied
By default, the pods will have access to the newly generated filesystems. If you start them with securityContext parameters, you can put them in a state where the user the container is running as does not have read or write access to the filesystem content.
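One way around it, sketched below with made-up UID/GID values, is to set an fsGroup in the pod securityContext so that the mounted filesystem ends up group-owned (and writable) by the group the container runs with:
kind: Pod
apiVersion: v1
metadata:
  name: myservice-pod
spec:
  securityContext:
    runAsUser: 1000   # example non-root user
    fsGroup: 1000     # the volume is chown'ed to this group when mounted
  volumes:
    - name: myservice-data
      persistentVolumeClaim:
        claimName: myservice-data-claim
  containers:
    - name: myservice-cont
      image: nginx
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: myservice-data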
Image is locked by other nodes
If, like me, you battle with rbd: image is locked by other nodes errors when a pod is migrated between nodes, it usually means that the client you created doesn’t have the capabilities to remove the locks after detaching. I fixed that simply by setting the caps to the rbd profile instead of manually configuring the rwx operations:
ceph auth caps client.myclient mon "profile rbd" osd "profile rbd pool=mypool"
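You can then verify the resulting capabilities; the output should look roughly like this (key trimmed, shown only as an illustration):
ceph auth get client.myclient
# [client.myclient]
#     key = [...]
#     caps mon = "profile rbd"
#     caps osd = "profile rbd pool=mypool"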