Goglides Dev 🌱

Balkrishna Pandey
quay: Could not connect to storage local_us net/http: timeout awaiting response headers

I encountered a bizarre error when using Quay to pull a Docker image from a private repository. The complete log is below:

   __   __
  /  \ /  \     ______   _    _     __   __   __
 / /\ / /\ \   /  __  \ | |  | |   /  \  \ \ / /
/ /  / /  \ \  | |  | | | |  | |  / /\ \  \   /
\ \  \ \  / /  | |__| | | |__| | / ____ \  | |
 \ \/ \ \/ /   \_  ___/  \____/ /_/    \_\ |_|
  \__/ \__/      \ \__
                  \___\ by Red Hat
 Build, Store, and Distribute your Containers

Startup timestamp: 
Fri Oct  7 21:08:49 UTC 2022

Running all default registry services without migration
Running init script '/quay-registry/conf/init/certs_create.sh'
Generating a RSA private key
............................................................................................................................................++++
.................................................................++++
writing new private key to 'mitm-key.pem'
-----
Running init script '/quay-registry/conf/init/certs_install.sh'
Installing extra certificates found in /quay-registry/conf/stack/extra_ca_certs directory
Running init script '/quay-registry/conf/init/copy_config_files.sh'
Running init script '/quay-registry/conf/init/d_validate_config_bundle.sh'
Validating Configuration
plpgsql
pg_trgm
...
...
| DistributedStorage   | Could not connect to storage local_us. Error: Get "https://s3.openshift-storage.svc.cluster.local/quay-datastore-b84f9a69-e025-4a53-950e-75077ee64430/?location=": net/http: timeout awaiting response headers

For some reason, Quay is unable to connect to the NooBaa data storage, which lives in the openshift-storage namespace. Let's check whether any pods in the openshift-storage project are unhealthy:

oc get pods -o custom-columns="POD:metadata.name,STATE:status.containerStatuses[*].state.waiting.reason" -n openshift-storage | grep -v "<none>"

Output:

POD                                STATE
csi-rbdplugin-provisioner-5cdf488d6f-h84bq            CrashLoopBackOff
csi-rbdplugin-provisioner-5cdf488d6f-mlxrw            CrashLoopBackOff
noobaa-db-pg-0                          0/1   Init:0/2      0     24m

Some of the pods are in an unhealthy state. In particular, the NooBaa pod appears to be down, and that is the component Quay is trying to use as object storage. When I described the pod, this is what I found:
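For reference, the describe command looks like this (pod name taken from the listing above):

```shell
# Describe the stuck NooBaa database pod to see its recent events
oc describe pod noobaa-db-pg-0 -n openshift-storage
```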

Warning FailedMount     17m (x2 over 21m)  kubelet         Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db]: timed out waiting for the condition

Warning FailedMount     7m17s (x5 over 19m) kubelet         Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[db kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume]: timed out waiting for the condition

Warning FailedAttachVolume 3m37s (x9 over 21m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-bd18086f-de8f-469f-96e3-39c3566cb811" : Attach timeout for volume 0001-0011-openshift-storage-0000000000000001-74044298-fed1-11ec-86e3-0a580a830015

Warning FailedMount     3m10s        kubelet         Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-initdb-sh-volume noobaa-postgres-config-volume db kube-api-access-jvkcl]: timed out waiting for the condition

Warning FailedMount     67s (x3 over 15m)  kubelet         Unable to attach or mount volumes: unmounted volumes=[db], unattached volumes=[noobaa-postgres-config-volume db kube-api-access-jvkcl noobaa-postgres-initdb-sh-volume]: timed out waiting for the condition

We can see that there is a problem with our storage controller, specifically with the csi-rbdplugin-provisioner. It is responsible for provisioning block-device persistent volumes, which NooBaa uses.

So I checked the logs of the csi-rbdplugin-provisioner pod.

Note: I am using kubetail here; see its repo for details on this tool.

kubetail csi-rbdplugin-provisioner-5cdf488d6f-h84bq
Will tail 6 logs...
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-provisioner
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-resizer
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-attacher
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-snapshotter
csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-rbdplugin
csi-rbdplugin-provisioner-5cdf488d6f-h84bq liveness-prometheus
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq liveness-prometheus] W1010 15:14:43.023744    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-snapshotter] W1010 15:14:42.278799    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-provisioner] W1010 15:14:41.173525    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-attacher] W1010 15:14:41.853546    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-5cdf488d6f-h84bq csi-resizer] W1010 15:14:41.496197    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock

All of the sidecar containers report that they are still trying to connect to unix:///csi/csi-provisioner.sock. For some reason, the csi-rbdplugin container is not able to create this socket.

Is it possible that the container cannot create this socket because of an SCC restriction? To test this, I granted the privileged SCC to the service account rook-csi-rbd-provisioner-sa:
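Before changing anything, you can check which SCC the pod was actually admitted under; OpenShift records it in an annotation. A quick check (the pod name is the one from the earlier output):

```shell
# OpenShift stores the SCC a pod was admitted under
# in the openshift.io/scc annotation
oc get pod csi-rbdplugin-provisioner-5cdf488d6f-h84bq -n openshift-storage \
  -o jsonpath='{.metadata.annotations.openshift\.io/scc}'
```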


oc adm policy add-scc-to-user privileged system:serviceaccount:openshift-storage:rook-csi-rbd-provisioner-sa


Afterward, I added a security context with privileged: true at the pod level so that all containers would have enough privileges. Instead of fixing things, this surfaced a new error:


Warning FailedScheduling 3m23s (x1 over 4m24s) default-scheduler 0/13 nodes are available: 10 node(s) didn't match pod affinity/anti-affinity rules, 10 node(s) didn't match pod anti-affinity rules, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.


Not sure why, but instead of debugging this new error, I decided to delete the csi-rbdplugin-provisioner deployment entirely and reinstall it with the privileged flag disabled. This time I saw the following error message:
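Deleting the deployment is a reasonable reset here because the rook-ceph operator recreates it with its default settings; a sketch of the delete command, assuming the operator-managed deployment name:

```shell
# Delete the provisioner deployment; the rook-ceph operator
# running in openshift-storage recreates it automatically
oc delete deployment csi-rbdplugin-provisioner -n openshift-storage
```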

kubetail csi-rbdplugin-provisioner-67f9478588-t7g82 
Will tail 6 logs...
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-provisioner
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-resizer
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-attacher
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-snapshotter
csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin
csi-rbdplugin-provisioner-67f9478588-t7g82 liveness-prometheus
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.368844    1 cephcsi.go:131] Driver version: release-4.8 and Git version: ad563f5bebb2efd5f64dee472e441bbe918fa101
[csi-rbdplugin-provisioner-67f9478588-t7g82 liveness-prometheus] W1010 16:01:54.557645    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-attacher] W1010 16:01:53.606806    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.369091    1 cephcsi.go:149] Initial PID limit is set to 1024
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] E1010 16:01:51.369149    1 cephcsi.go:153] Failed to set new PID limit to -1: open /sys/fs/cgroup/pids/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod09c3db5b_d5e6_4d85_aea0_298e59a91356.slice/crio-b62add795b28df7c9d62b93b65362c51c0d2b873d3d26631f29435d2c7f0b458.scope/pids.max: permission denied
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-resizer] W1010 16:01:53.341694    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] I1010 16:01:51.369165    1 cephcsi.go:176] Starting driver type: rbd with name: openshift-storage.rbd.csi.ceph.com
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] F1010 16:01:51.369210    1 driver.go:107] failed to write ceph configuration file (open /etc/ceph/ceph.conf: permission denied)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 1 [running]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.stacks(0xc000010001, 0xc000416270, 0x83, 0xc7)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1026 +0xb9
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).output(0x2b3c140, 0xc000000003, 0x0, 0x0, 0xc0002d5f10, 0x2170e00, 0x9, 0x6b, 0x41a900)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:975 +0x191
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).printDepth(0x2b3c140, 0xc000000003, 0x0, 0x0, 0x0, 0x0, 0x1, 0xc000712c60, 0x1, 0x1)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:732 +0x16f
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.FatalDepth(...)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1488
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] github.com/ceph/ceph-csi/internal/util.FatalLogMsg(0x1b5df18, 0x2c, 0xc000615d10, 0x1, 0x1)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-snapshotter] W1010 16:01:53.883701    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/internal/util/log.go:58 +0x118
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] github.com/ceph/ceph-csi/internal/rbd.(*Driver).Run(0xc000615f18, 0x2b3c040)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/internal/rbd/driver.go:107 +0xa5
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] main.main()
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/cmd/cephcsi.go:182 +0x345
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-provisioner] W1010 16:01:53.086761    1 connection.go:172] Still connecting to unix:///csi/csi-provisioner.sock
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] 
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 6 [chan receive]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog/v2.(*loggingT).flushDaemon(0x2b3c140)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:1169 +0x8b
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] created by k8s.io/klog/v2.init.0
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/v2/klog.go:417 +0xdf
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] 
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] goroutine 99 [chan receive]:
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] k8s.io/klog.(*loggingT).flushDaemon(0x2b3bf60)
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/klog.go:1010 +0x8b
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin] created by k8s.io/klog.init.0
[csi-rbdplugin-provisioner-67f9478588-t7g82 csi-rbdplugin]  /remote-source/app/vendor/k8s.io/klog/klog.go:411 +0xd8

The log shows that the csi-rbdplugin container cannot write its configuration file (open /etc/ceph/ceph.conf: permission denied), which also explains why the CSI socket was never created. I resolved the problem by setting the privileged flag on this one container only:

        securityContext:
          privileged: true
        image: registry.redhat.io/ocs4/cephcsi-rhel8@sha256:502b5da53fae7dd22081717dc317e4978f93866b3c297bac36823571835320f3
        imagePullPolicy: IfNotPresent
        name: csi-rbdplugin
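Rather than editing the manifest by hand, the same change can be applied with a strategic-merge patch, which matches the container by name instead of by index; this is a sketch under the assumption that the deployment name is unchanged:

```shell
# Set privileged: true on the csi-rbdplugin container only,
# matched by container name via a strategic-merge patch
oc -n openshift-storage patch deployment csi-rbdplugin-provisioner \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"csi-rbdplugin","securityContext":{"privileged":true}}]}}}}'
```

Note that if the rook-ceph operator manages this deployment, it may eventually reconcile the change away, so verify the setting persists after the operator's next sync.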
