Deploy Rook
This section describes how to deploy a Rook Ceph cluster on K8s. The deployment assumes the K8s cluster member nodes have attached, unprovisioned raw storage devices. If you want to use host storage from an existing mounted filesystem, review the Rook docs before proceeding.
For single-server Thorium deployments, it's best to skip deploying Rook and instead use a host path StorageClass provisioner and MinIO for better performance.
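As a minimal sketch of that alternative, one commonly used host path provisioner is Rancher's local-path-provisioner; the URL below points at its upstream manifest on the master branch and is not part of this guide, so check the project for the current release before using it. It typically creates a StorageClass named local-path that can stand in for the rook-ceph-block storage class used later in this section.
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml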
1) Create Rook CRDs
Apply the Rook CRDs and common resources.
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.16.4/deploy/examples/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.16.4/deploy/examples/common.yaml
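If you want to confirm the custom resource definitions registered before moving on, an optional quick check is:
kubectl get crds | grep ceph.rook.io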
2) Create the Rook operator
You can deploy Rook Ceph with the default operator options. However, you may choose to disable certain drivers, such as CephFS, that are not needed for Thorium. To do that, download the operator YAML resource definition and modify it before applying it.
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.16.4/deploy/examples/operator.yaml
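You can optionally confirm the operator pod reaches the Running state before continuing:
kubectl -n rook-ceph get pods -l app=rook-ceph-operator
The object store and block pool resources created in the following steps also require a running CephCluster. If your deployment process does not create one elsewhere, the upstream example manifest (which consumes the attached raw devices assumed at the top of this section) can serve as a starting point; review and modify it before applying:
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.16.4/deploy/examples/cluster.yaml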
3) Create Ceph/S3 Object Store
Create the Ceph pools and RADOS Object Gateway (RGW) instance(s). You may want to modify the redundancy factors and the number of gateway instances depending on the size of your K8s cluster. Some fields you may want to modify are:
Note: the totals of dataChunks + codingChunks, and separately size, must both be <= the number of k8s cluster servers with attached storage that Rook can utilize (a small sizing sketch follows this list). If this condition is not met, the Ceph cluster Rook deploys will not be in a healthy state after deployment and the Rook operator may fail to complete the deployment process.
spec.metadataPool.replicated.size - Set to less than 3 for small k8s clusters
spec.dataPool.erasureCoded.dataChunks - More erasure coding data chunks for better storage efficiency, but lower write performance
spec.dataPool.erasureCoded.codingChunks - More erasure coding chunks for extra data redundancy
spec.gateway.instances - Increase the number of RGW pods for larger K8s clusters and better performance
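For example, on a hypothetical 3-node cluster, the defaults used in the manifest below (3 data chunks + 2 coding chunks = 5) would not fit under the rule above. A sizing sketch that would fit, assuming all 3 nodes contribute storage to Rook (values shown for illustration only, not a complete manifest):
# Hypothetical sizing for a 3-node cluster:
# 2 data chunks + 1 coding chunk = 3 <= 3 nodes, and a metadata replica size of 3 also fits.
metadataPool:
  replicated:
    size: 3
dataPool:
  erasureCoded:
    dataChunks: 2
    codingChunks: 1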
cat <<EOF | kubectl apply -f -
#################################################################################################################
# Create an object store with settings for erasure coding for the data pool. A minimum of 3 nodes with OSDs are
# required in this example since failureDomain is host.
# kubectl create -f object-ec.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: thorium-s3-store
  namespace: rook-ceph # namespace:cluster
spec:
  # The pool spec used to create the metadata pools. Must use replication.
  metadataPool:
    failureDomain: osd # host
    replicated:
      size: 3
      # Disallow setting pool with replica 1, this could lead to data loss without recovery.
      # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
      # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The pool spec used to create the data pool. Can use replication or erasure coding.
  dataPool:
    failureDomain: osd # host
    erasureCoded:
      dataChunks: 3
      codingChunks: 2
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
      # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # Whether to preserve metadata and data pools on object store deletion
  preservePoolsOnDelete: true
  # The gateway service configuration
  gateway:
    # A reference to the secret in the rook namespace where the ssl certificate is stored
    sslCertificateRef:
    # The port that RGW pods will listen on (http)
    port: 80
    # The port that RGW pods will listen on (https). An ssl certificate is required.
    # securePort: 443
    # The number of pods in the rgw deployment
    instances: 1 # 3
    # The affinity rules to apply to the rgw deployment or daemonset.
    placement:
    #  nodeAffinity:
    #    requiredDuringSchedulingIgnoredDuringExecution:
    #      nodeSelectorTerms:
    #      - matchExpressions:
    #        - key: role
    #          operator: In
    #          values:
    #          - rgw-node
    #  tolerations:
    #  - key: rgw-node
    #    operator: Exists
    #  podAffinity:
    #  podAntiAffinity:
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the object store gateway Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class
  #zone:
  #  name: zone-a
  # service endpoint healthcheck
  healthCheck:
    # Configure the pod probes for the rgw daemon
    startupProbe:
      disabled: false
    readinessProbe:
      disabled: false
EOF
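Once the object store is applied, you can optionally confirm that the CephObjectStore reaches a ready phase and that the RGW pod(s) start; the resource name and label below follow the manifest above:
kubectl -n rook-ceph get cephobjectstore thorium-s3-store
kubectl -n rook-ceph get pods -l app=rook-ceph-rgw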
4) Create block storage class
Use the following storage class to create a Rook Ceph data pool that stores RADOS block devices (RBDs) which map to Kubernetes persistent volumes. The following command creates a block device pool and a storageClass called rook-ceph-block. You will use this storage class name when creating PVCs in the sections that follow. You may want to update the replication and erasure coding factors depending on the size of your k8s cluster.
spec.replicated.size - Set to less than 3 for small k8s clusters
spec.erasureCoded.dataChunks - More erasure coding data chunks for better storage efficiency, but lower write performance
spec.erasureCoded.codingChunks - More erasure coding chunks for extra data redundancy
cat <<EOF | kubectl apply -f -
#################################################################################################################
# Create a storage class with a data pool that uses erasure coding for a production environment.
# A metadata pool is created with replication enabled. A minimum of 3 nodes with OSDs are required in this
# example since the default failureDomain is host.
# kubectl create -f storageclass-ec.yaml
#################################################################################################################
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-metadata-pool
  namespace: rook-ceph # namespace:cluster
spec:
  failureDomain: osd # host
  replicated:
    size: 3
---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: ec-data-pool
  namespace: rook-ceph # namespace:cluster
spec:
  failureDomain: osd # host
  # Make sure you have enough nodes and OSDs running bluestore to support the replica size or erasure code chunks.
  # For the below settings, you need at least 3 OSDs on different nodes (because the `failureDomain` is `host` by default).
  erasureCoded:
    dataChunks: 3
    codingChunks: 2
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
# Change "rook-ceph" provisioner prefix to match the operator namespace if needed
provisioner: rook-ceph.rbd.csi.ceph.com # driver:namespace:operator
parameters:
  # clusterID is the namespace where the rook cluster is running
  # If you change this namespace, also change the namespace below where the secret namespaces are defined
  clusterID: rook-ceph # namespace:cluster
  # If you want to use erasure coded pool with RBD, you need to create
  # two pools. one erasure coded and one replicated.
  # You need to specify the replicated pool here in the `pool` parameter, it is
  # used for the metadata of the images.
  # The erasure coded pool must be set as the `dataPool` parameter below.
  dataPool: ec-data-pool
  pool: replicated-metadata-pool
  # (optional) mapOptions is a comma-separated list of map options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # mapOptions: lock_on_read,queue_depth=1024
  # (optional) unmapOptions is a comma-separated list of unmap options.
  # For krbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd/#kernel-rbd-krbd-options
  # For nbd options refer
  # https://docs.ceph.com/docs/master/man/8/rbd-nbd/#options
  # unmapOptions: force
  # RBD image format. Defaults to "2".
  imageFormat: "2"
  # RBD image features. Available for imageFormat: "2". CSI RBD currently supports only `layering` feature.
  imageFeatures: layering
  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph # namespace:cluster
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph # namespace:cluster
  # Specify the filesystem type of the volume. If not specified, csi-provisioner
  # will set default as `ext4`.
  csi.storage.k8s.io/fstype: xfs
  # uncomment the following to use rbd-nbd as mounter on supported nodes
  # **IMPORTANT**: CephCSI v3.4.0 onwards a volume healer functionality is added to reattach
  # the PVC to application pod if nodeplugin pod restart.
  # Its still in Alpha support. Therefore, this option is not recommended for production use.
  #mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete
EOF
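After the resources are created, you can optionally verify the pools and storage class and, as a minimal sketch, request a test volume from the new storage class. The PVC name and size below are placeholders for illustration only; the PVCs Thorium actually needs are defined in the sections that follow.
kubectl -n rook-ceph get cephblockpool
kubectl get storageclass rook-ceph-block
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Placeholder name used only to illustrate binding against the new storage class
  name: example-rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
EOF
If you create the test PVC, remember to remove it afterwards (kubectl delete pvc example-rbd-pvc).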
5) Deploy Rook Ceph Toolbox pod
The toolbox pod provides a shell with the Ceph client tools (including ceph and radosgw-admin) used in the remaining steps.
kubectl apply -f https://raw.githubusercontent.com/rook/rook/refs/tags/v1.16.4/deploy/examples/toolbox.yaml
6) Create a Thorium S3 User
Create a Thorium S3 user and save the access and secret keys that are generated by the following command.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- radosgw-admin user create --uid=thorium-s3-user --display-name="Thorium S3 User"
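If you need to view the generated keys again later, they can be printed from the toolbox pod:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- radosgw-admin user info --uid=thorium-s3-user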
7) Verify Rook pods are all running
kubectl get pods -n rook-ceph
For a 5 node k8s cluster with 2 raw storage devices per node, the output might look like this:
csi-rbdplugin-provisioner-HASH 5/5 Running 0 1h
csi-rbdplugin-provisioner-HASH 5/5 Running 0 1h
csi-rbdplugin-HASH 3/3 Running 0 1h
csi-rbdplugin-HASH 3/3 Running 0 1h
csi-rbdplugin-HASH 3/3 Running 0 1h
csi-rbdplugin-HASH 3/3 Running 0 1h
csi-rbdplugin-HASH 3/3 Running 0 1h
rook-ceph-crashcollector-NODE1-HASH 1/1 Running 0 1h
rook-ceph-crashcollector-NODE2-HASH 1/1 Running 0 1h
rook-ceph-crashcollector-NODE3-HASH 1/1 Running 0 1h
rook-ceph-crashcollector-NODE4-HASH 1/1 Running 0 1h
rook-ceph-crashcollector-NODE5-HASH 1/1 Running 0 1h
rook-ceph-exporter-NODE1-HASH 1/1 Running 0 1h
rook-ceph-exporter-NODE2-HASH 1/1 Running 0 1h
rook-ceph-exporter-NODE3-HASH 1/1 Running 0 1h
rook-ceph-exporter-NODE4-HASH 1/1 Running 0 1h
rook-ceph-exporter-NODE5-HASH 1/1 Running 0 1h
rook-ceph-mgr-a-HASH 3/3 Running 0 1h
rook-ceph-mgr-b-HASH 3/3 Running 0 1h
rook-ceph-mon-a-HASH 2/2 Running 0 1h
rook-ceph-mon-b-HASH 2/2 Running 0 1h
rook-ceph-mon-c-HASH 2/2 Running 0 1h
rook-ceph-operator-HASH 1/1 Running 0 1h
rook-ceph-osd-0-HASH 2/2 Running 0 1h
rook-ceph-osd-1-HASH 2/2 Running 0 1h
rook-ceph-osd-2-HASH 2/2 Running 0 1h
rook-ceph-osd-3-HASH 2/2 Running 0 1h
rook-ceph-osd-4-HASH 2/2 Running 0 1h
rook-ceph-osd-5-HASH 2/2 Running 0 1h
rook-ceph-osd-6-HASH 2/2 Running 0 1h
rook-ceph-osd-7-HASH 2/2 Running 0 1h
rook-ceph-osd-8-HASH 2/2 Running 0 1h
rook-ceph-osd-9-HASH 2/2 Running 0 1h
rook-ceph-osd-prepare-NODE1-HASH 0/1 Completed 0 1h
rook-ceph-osd-prepare-NODE2-HASH 0/1 Completed 0 1h
rook-ceph-osd-prepare-NODE3-HASH 0/1 Completed 0 1h
rook-ceph-osd-prepare-NODE4-HASH 0/1 Completed 0 1h
rook-ceph-osd-prepare-NODE5-HASH 0/1 Completed 0 1h
rook-ceph-rgw-thorium-s3-store-a-HASH 2/2 Running 0 1h
rook-ceph-tools-HASH 1/1 Running 0 1h
8) Verify the Ceph cluster is healthy
If the Rook Ceph cluster is healthy, you should be able to run a status command from the Rook toolbox. The health section of the cluster status will show HEALTH_OK. If you see HEALTH_WARN, look at the reasons listed at the bottom of the cluster status to troubleshoot the cause.
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
cluster:
id: 20ea7cb0-5cab-4565-bc1c-360b6cd1282b
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 1h)
mgr: b(active, since 1h), standbys: a
osd: 10 osds: 10 up (since 1h), 10 in (since 1h)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
...
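If the status instead reports HEALTH_WARN, the following toolbox commands can help narrow down the cause alongside the warning messages in the status output:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree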