I've recently noticed problems with my OpenShift cluster. I can't even create a simple pod like this:
cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - name: testcontainer
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 1000; done;" ]
EOF
When I describe the pod, I see the following error message.
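For reference, the events below come from a plain describe, run in the same project where the pod was created:
oc describe pod testpod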
Warning FailedCreatePodSandBox 16s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_testpod_openshift-ovn-kubernetes_e9d3a164-5126-424f-8783-7eda60f82db3_0(938e2842599776af1a1efae17043e261c4278219db7a2a2cd049d2ac3d6014ec): error adding pod openshift-ovn-kubernetes_testpod to CNI network "multus-cni-network": [openshift-ovn-kubernetes/testpod:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-ovn-kubernetes/testpod 938e2842599776af1a1efae17043e261c4278219db7a2a2cd049d2ac3d6014ec] [openshift-ovn-kubernetes/testpod 938e2842599776af1a1efae17043e261c4278219db7a2a2cd049d2ac3d6014ec] failed to get pod annotation: timed out waiting for annotations
If you look at the log closely, you will notice the following error:
error adding container to network "ovn-kubernetes"
It appears that ovn-kubernetes is unable to assign an IP address to this pod.
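One way to confirm this is to look for the annotation the sandbox error says it timed out waiting for. In OVN-Kubernetes the IP assignment is normally written to the k8s.ovn.org/pod-networks annotation on the pod, so an empty result here lines up with the failure (a sketch; the escaped-dot jsonpath syntax is standard oc/kubectl):
oc get pod testpod -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/pod-networks}'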
The next obvious step is to determine whether the ovn-kubernetes component is operational. It runs in the openshift-ovn-kubernetes namespace.
oc get pods -n openshift-ovn-kubernetes | grep ovnkube-master
Output:
NAME READY STATUS RESTARTS AGE
ovnkube-master-pbg8h 6/6 Running 40 111d
ovnkube-master-rz6dg 6/6 Running 46 111d
ovnkube-master-vjvk8 6/6 Running 93 111d
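Those RESTARTS counts (40 to 93 in 111 days) are already suspicious. One thing worth trying, sketched here, is to pull the logs of the last terminated container to see why it restarted; the --previous flag does exactly that:
oc logs -n openshift-ovn-kubernetes ovnkube-master-pbg8h -c ovnkube-master --previous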
Check logical_switch_port in the nbdb database container
There are a lot of restarts going on right now. You can do the following to check for "failed" pod entries in the nbdb containers:
oc exec -n openshift-ovn-kubernetes ovnkube-master-pbg8h -c nbdb -- ovn-nbctl --no-leader-only list logical_switch_port openshift-monitoring_alertmanager-main-1
Output:
_uuid : 0f1a1aa2-8403-42d2-b366-3ec17184db1f
addresses : ["0a:45:0a:83:04:1a 10.131.4.27"]
dhcpv4_options : []
dhcpv6_options : []
dynamic_addresses : []
enabled : []
external_ids : {namespace=openshift-monitoring, pod="true"}
ha_chassis_group : []
name : openshift-monitoring_alertmanager-main-1
options : {requested-chassis=worker-6.core-pre.goglides.dev}
parent_name : []
port_security : ["0a:45:0a:83:04:1a 10.131.4.27"]
tag : []
tag_request : []
type : ""
up : true
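Given all those restarts, it is also worth confirming that the nbdb Raft cluster itself is healthy. A sketch, assuming the usual control socket path inside the nbdb container (adjust if yours differs):
oc exec -n openshift-ovn-kubernetes ovnkube-master-pbg8h -c nbdb -- ovn-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound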
The result indicates that the component is up. I have no idea what is happening. I did the following to restart every ovnkube-master pod:
oc delete pods -n openshift-ovn-kubernetes -l ovn-db-pod=true
After some time, all ovnkube-master pods are up and running:
oc get pods -n openshift-ovn-kubernetes -l ovn-db-pod=true
NAME READY STATUS RESTARTS AGE
ovnkube-master-djw2j 6/6 Running 0 16m
ovnkube-master-fc87t 6/6 Running 0 10m
ovnkube-master-jc2ln 6/6 Running 0 13m
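If you would rather block until the new pods are ready instead of polling, oc wait works with the same label selector:
oc wait pods -n openshift-ovn-kubernetes -l ovn-db-pod=true --for=condition=Ready --timeout=300s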
Somehow this also fixed the issue. Now my testpod is running:
oc get pods testpod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
testpod 1/1 Running 0 42m 10.129.4.16 worker-8.core-pre.goglides.dev <none> <none>
The original solution was posted here in the Red Hat forum.