Balkrishna Pandey


failed to write ceph configuration file (open /etc/ceph/ceph.conf: permission denied)

We recently noticed strange issues with one of the OpenShift storage components. Several of its pods were failing to start or become ready.

oc get pods

Output:

...
csi-rbdplugin-provisioner-5466484bd5-47mpr                        5/6     CrashLoopBackOff   6          8m31s
csi-rbdplugin-provisioner-5466484bd5-vvlds                        5/6     CrashLoopBackOff   10         30m
noobaa-db-pg-0                                                    0/1     Init:0/2           0          21m
ocs-operator-7b8dd5c85f-krf8r                                     0/1     Running            0          15m
...
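In a busy namespace it can help to narrow the listing to pods that are not fully ready. The following is a minimal sketch, assuming the default column layout of `oc get pods` shown above (NAME, READY, STATUS, ...). The function name `unready_pods` is our own and not part of `oc`.

```shell
# Print pods whose READY count is below the total, or whose STATUS is
# not Running. Completed pods are ignored.
# Reads `oc get pods` output on stdin; unready_pods is a hypothetical helper.
unready_pods() {
  awk '
    $1 == "NAME" { next }        # skip the header row
    NF < 3       { next }        # skip blank or elided ("...") lines
    {
      split($2, ready, "/")      # the READY column looks like "5/6"
      if ($3 != "Completed" && (ready[1] != ready[2] || $3 != "Running"))
        print $1, $2, $3
    }
  '
}
```

It could be used as `oc get pods -n openshift-storage | unready_pods`, where the namespace is taken from the deployment command later in this post.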

So we began checking the logs of each pod one by one. The ocs-operator health check was failing, but its log offered little to go on.

oc logs -f ocs-operator-7b8dd5c85f-krf8r

Output:

...
{"level":"info","ts":1663854793.1919212,"logger":"controller-runtime.healthz","msg":"healthz check failed","statuses":[{}]}
{"level":"info","ts":1663854803.1925268,"logger":"controller-runtime.healthz","msg":"healthz check failed","statuses":[{}]}

Next we looked at one of the pods in the CrashLoopBackOff state. This pod runs several containers, so oc logs requires a container name. In the csi-provisioner container, the log showed repeated warnings while it waited to connect to the CSI socket.

oc logs csi-rbdplugin-provisioner-5466484bd5-47mpr 

Output:

error: a container name must be specified for pod csi-rbdplugin-provisioner-5466484bd5-47mpr, choose one of: [csi-provisioner csi-resizer csi-attacher csi-snapshotter csi-rbdplugin liveness-prometheus]

oc logs csi-rbdplugin-provisioner-5466484bd5-47mpr -c csi-provisioner

Output:

I0922 13:57:20.393426       1 feature_gate.go:243] feature gates: &{map[]}
I0922 13:57:20.393600       1 csi-provisioner.go:138] Version: v4.8.0-202206281335.p0.g3ea7e68.assembly.stream-0-ga5337a9-dirty
I0922 13:57:20.393622       1 csi-provisioner.go:161] Building kube configs for running in cluster...
I0922 13:57:20.403527       1 connection.go:153] Connecting to unix:///csi/csi-provisioner.sock
W0922 13:57:30.404071       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0922 13:57:40.404390       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0922 13:57:50.404020       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
W0922 13:58:00.404168       1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
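Rather than querying the containers one at a time, the container names can be read from the pod spec and looped over. This is only a sketch: `dump_container_logs` is a hypothetical helper, and the default namespace `openshift-storage` and the 20-line tail are assumptions.

```shell
# Print the last lines of every container's log for one pod.
# dump_container_logs is a hypothetical helper; the namespace default
# and the --tail value are assumptions.
dump_container_logs() {
  pod=$1
  ns=${2:-openshift-storage}
  # Container names come from the pod spec, separated by spaces.
  for c in $(oc get pod "$pod" -n "$ns" -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $c ==="
    # --previous shows the last terminated instance, which is usually the
    # interesting one for CrashLoopBackOff. Fall back to the current
    # instance when the container has not restarted.
    oc logs "$pod" -n "$ns" -c "$c" --previous --tail=20 2>/dev/null ||
      oc logs "$pod" -n "$ns" -c "$c" --tail=20
  done
}
```

For example, `dump_container_logs csi-rbdplugin-provisioner-5466484bd5-47mpr` would print a short section per container, which makes it easier to spot which one is actually failing.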

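Then we examined the log of another container, csi-rbdplugin. It could not create its configuration file and exited with the fatal error failed to write ceph configuration file (open /etc/ceph/ceph.conf: permission denied). Because csi-rbdplugin never finished starting, this would also explain why csi-provisioner kept waiting on the socket.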

oc logs csi-rbdplugin-provisioner-5466484bd5-47mpr -c csi-rbdplugin

Output:

I0922 14:00:31.206741       1 cephcsi.go:131] Driver version: release-4.8 and Git version: ad563f5bebb2efd5f64dee472e441bbe918fa101
I0922 14:00:31.206911       1 cephcsi.go:149] Initial PID limit is set to 1024
E0922 14:00:31.206946       1 cephcsi.go:153] Failed to set new PID limit to -1: open /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod1c017f07_348a_493c_bd88_184670c3c35c.slice/crio-cd12a9e76956fc3e3ae7054d781f09b51dc5872047ba4f7b761cedcc174d91a3.scope/pids.max: permission denied
I0922 14:00:31.206963       1 cephcsi.go:176] Starting driver type: rbd with name: openshift-storage.rbd.csi.ceph.com
F0922 14:00:31.206993       1 driver.go:107] failed to write ceph configuration file (open /etc/ceph/ceph.conf: permission denied)
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc0001c0001, 0xc00035c680, 0x83, 0xc7)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
k8s.io/klog/v2.(*loggingT).output(0x2b3c140, 0xc000000003, 0x0, 0x0, 0xc000415730, 0x2170e40, 0x9, 0x6b, 0x41a900)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:975 +0x191
k8s.io/klog/v2.(*loggingT).printDepth(0x2b3c140, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc000531350, 0x1, 0x1)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:732 +0x16f
k8s.io/klog/v2.FatalDepth(...)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1488
github.com/ceph/ceph-csi/internal/util.FatalLogMsg(0x1b5df58, 0x2c, 0xc0005b1d10, 0x1, 0x1)
    /remote-source/app/internal/util/log.go:58 +0x118
github.com/ceph/ceph-csi/internal/rbd.(*Driver).Run(0xc0005b1f18, 0x2b3c040)
    /remote-source/app/internal/rbd/driver.go:107 +0xa5
main.main()
    /remote-source/app/cmd/cephcsi.go:182 +0x345

goroutine 19 [chan receive]:
k8s.io/klog/v2.(*loggingT).flushDaemon(0x2b3c140)
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
created by k8s.io/klog/v2.init.0
    /remote-source/app/vendor/k8s.io/klog/v2/klog.go:417 +0xdf

goroutine 131 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x2b3bf60)
    /remote-source/app/vendor/k8s.io/klog/klog.go:1010 +0x8b
created by k8s.io/klog.init.0
    /remote-source/app/vendor/k8s.io/klog/klog.go:411 +0xd8
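Most of this output is a goroutine dump. The lines that matter start with a klog severity letter (I, W, E or F) followed by the date as MMDD, so filtering on that prefix surfaces the error and fatal lines quickly. A minimal sketch, where `klog_errors` is a hypothetical helper name:

```shell
# Keep only klog lines at error (E) or fatal (F) severity.
# klog lines start with the severity letter followed by the date as MMDD.
# klog_errors is a hypothetical helper that reads log text on stdin.
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}
```

It could be used as `oc logs csi-rbdplugin-provisioner-5466484bd5-47mpr -c csi-rbdplugin | klog_errors`.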

We also found a similar issue reported in the Red Hat forum, where someone suggested setting privileged: true on the csi-rbdplugin container as a workaround. This fixed the issue immediately.

oc edit deployment csi-rbdplugin-provisioner -n openshift-storage

In the editor, set the following on the csi-rbdplugin container:

        name: csi-rbdplugin
        resources: {}
        securityContext:
          privileged: true

However, the underlying issue appears to be the pods' SCC (security context constraints) profile. It is supposed to be rook-ceph-csi, but it has somehow been changed to a different profile, ncom-common, as the pod annotations show. Multiple people currently have access to this cluster, so I could not determine who made the change.

      openshift.io/scc: ncom-common
    creationTimestamp: "2022-09-22T14:09:26Z"
    generateName: csi-rbdplugin-provisioner-6b4f9497b8-
    labels:
      app: csi-rbdplugin-provisioner
      contains: csi-rbdplugin-metrics
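To see how many pods are affected, the `openshift.io/scc` annotation can be listed for each pod and compared with the expected profile. The following is a sketch under assumptions: `scc_mismatches` is a hypothetical helper, and the label selector and namespace in the usage comment are taken from the pod metadata and the deployment command shown in this post.

```shell
# Read "pod-name scc-name" pairs on stdin and print the pods whose SCC
# differs from the expected one. scc_mismatches is a hypothetical helper.
scc_mismatches() {
  expected=$1
  awk -v want="$expected" \
    'NF >= 2 && $2 != want { print $1 " runs under " $2 ", expected " want }'
}

# Possible usage (not run here; the selector and namespace are assumptions):
#   oc get pods -n openshift-storage -l app=csi-rbdplugin-provisioner \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.openshift\.io/scc}{"\n"}{end}' |
#     scc_mismatches rook-ceph-csi
```

The selector is kept narrow on purpose, since other pods in the namespace may legitimately run under different SCCs.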
