Recently, I encountered a problem where one of the nodes in my OpenShift cluster could not pull container images. In this blog post, I will explain the error messages observed in the pod logs and share my steps to resolve the issue.
On the affected node, the pod logs consistently showed the following error message:
5gc/amf:v1" already present on machine
To troubleshoot and resolve this issue, I followed these steps:
I also tried to pull the image manually with the podman pull command, but I hit the same error. This suggested that the problem was not limited to the automated image-pulling process in the OpenShift cluster; it was related to the image itself or to the container storage on the node where the pulls were happening.
Preventing pod scheduling on the affected node:
I used the command
oc adm drain NODENAME --ignore-daemonsets --delete-local-data --disable-eviction --force to stop scheduling pods on the problematic node.
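The drain step can be wrapped in a small helper so the node name is always supplied explicitly. This is only a sketch; drain_node is a hypothetical function name, and NODENAME is a placeholder for whatever oc get nodes reports for the affected node:

```shell
# Sketch: wrapper around the drain command used above.
# drain_node is a hypothetical helper, not part of oc itself.
drain_node() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: drain_node NODENAME" >&2
    return 1
  fi
  # Evict everything except daemonsets and mark the node unschedulable
  oc adm drain "$node" --ignore-daemonsets --delete-local-data --disable-eviction --force
}
```

The guard makes it harder to accidentally run the drain with an empty node name, which would otherwise produce a confusing client-side error.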
Stopping services on the affected node:
I SSHed into the node and issued the following commands to stop the necessary services:
systemctl stop crio to stop the CRI-O service.
systemctl stop kubelet to stop the kubelet service.
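The two service stops above can be sketched as a short loop. The DRY_RUN guard is my own addition, not something the node requires; with DRY_RUN=1 the commands are only printed, so the sequence can be previewed before running it as root over SSH:

```shell
# Sketch of the service stops above; stop_node_services is a hypothetical helper.
# DRY_RUN=1 prints the commands instead of executing them.
stop_node_services() {
  for svc in crio kubelet; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: systemctl stop $svc"
    else
      systemctl stop "$svc"
    fi
  done
}

DRY_RUN=1 stop_node_services
```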
Resetting the Podman configuration:
I attempted to reset the Podman configuration using the command
podman system reset -f. However, I encountered an error indicating that certain images were still in use by containers.
Manually deleting the container data:
I resolved this issue by manually deleting the
/var/lib/containers/ directory, which contained the container data. In my case, I had to remove the entire directory, including the "overlay" subdirectory.
When attempting to delete the "overlay" folder, I encountered a
"Device or resource busy" error. To address this, I used the command
umount overlay to unmount the overlay filesystem. Then, I executed
crio wipe -f to wipe CRI-O's container storage state.
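The cleanup steps above can be sketched as a single guarded script, ordered so the unmount happens before the removal (which is what avoids the "Device or resource busy" error). The path /var/lib/containers/storage/overlay is an assumption here: it is the default overlay storage location for CRI-O and Podman, so verify your actual mount with mount | grep overlay first. The run/DRY_RUN pattern and the wipe_container_storage name are my own additions for safe previewing:

```shell
# run: execute a command, or just print it when DRY_RUN=1 (hypothetical helper)
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Sketch of the manual cleanup; the overlay path is the default storage
# location and should be verified on the node before running for real.
wipe_container_storage() {
  run umount /var/lib/containers/storage/overlay
  run rm -rf /var/lib/containers
  run crio wipe -f
}

DRY_RUN=1 wipe_container_storage
```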
After restarting the crio and kubelet services and uncordoning the node with oc adm uncordon NODENAME, the node could pull images again. This fixed the issue.