Balkrishna Pandey

OpenShift UPI: Keepalived to the Rescue for Failover

What is Keepalived, and how does it work?

Before we look at how we use Keepalived in our Hyperconverged scenario, let's first understand what Keepalived is and how it works.

Keepalived is free, open-source high-availability software based on the VRRP protocol that joins two or more nodes in a failover configuration. It is mainly used to implement load balancer and router failover. Keepalived uses the LVS (Linux Virtual Server) kernel module and the ipvsadm user-space utility for its load-balancing functionality, and it is released under the GNU GPL license.

High availability means having an infrastructure or application up and running with no downtime. Downtime can be caused by many things such as power failure, network failure, hardware failure, etc. To achieve high availability, we need to have some redundancy in place so that if one component of the cluster fails, another member can take over, and there is no downtime.

VRRP is a protocol that elects a controller node from a group of nodes (called a VRRP cluster), and traffic is routed through that node. If the controller node goes down, another node in the cluster is elected, and traffic is routed through it instead. The protocol establishes virtual routers, which are abstract representations of multiple physical routers working as a group, such as primary/active and secondary/standby routers. The virtual router, rather than any single physical router, is assigned as the default gateway for participating hosts. If the physical router that is currently forwarding packets on behalf of the virtual router fails, another physical router promptly takes over. This failover happens automatically, without any manual intervention.

VRRP has the concept of a Virtual IP, also called a VIP. This address acts as a floating IP: if the node currently holding the VIP goes down, the VRRP protocol makes sure the address is reallocated to a healthy node. So, based on the health checks around the Virtual IP, VRRP determines node health and elects a new IP owner.

At any given time, only one host can own the IP address.

Keepalived uses a priority system to elect the new controller node. Every node has a priority defined, which you can adjust in the Keepalived configuration. The priority can range from 1 to 254, and the node with the highest priority becomes the master. A priority of 0 means the node will never become the master, and a priority of 255 means the node will always become the master (unless another node also has priority 255). If two nodes have the same priority, the node with the highest IP address is elected master.
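To make the priority mechanism concrete, here is a minimal keepalived.conf sketch for a single VIP; the interface name, router ID, VIP, and priority values are placeholders, not the configuration we use later in this post:

vrrp_instance VI_1 {
    state BACKUP              # start as backup and let the priority election decide the master
    interface eth0            # placeholder: the interface that should carry the VIP
    virtual_router_id 51      # must be identical on every node in the VRRP cluster
    priority 100              # higher value wins the election; use e.g. 90 on the peer node
    advert_int 1              # VRRP advertisement interval in seconds
    virtual_ipaddress {
        192.168.1.100/24      # the floating VIP
    }
}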

Using Keepalived in a Hyperconverged OCP Cluster

In a Hyperconverged OCP cluster, we use Keepalived to provide high availability for:

  • Kubernetes API (API VIP)
  • OpenShift Router (Ingress VIP)

HA for Kubernetes API (API VIP)

In a Kubernetes cluster, the API server is the central point for all operations. All other components, such as the kubelet and kube-proxy, communicate with the API server to perform their functions. So, it is essential that the API server be highly available.

In a Hyperconverged OCP cluster, we use Keepalived to provide high availability for the API server by creating a virtual IP (VIP) address. This VIP is assigned to one of the nodes in the cluster, and if that node goes down, another node takes over the VIP and starts serving API traffic. This failover is transparent to users and happens automatically, without any manual intervention.

To use Keepalived for the API server, we need to run it on all nodes in the cluster, and each node needs a Keepalived configuration file containing the virtual IP address, the node's priority, and so on.
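Once Keepalived is running everywhere, you can see which node currently owns the API VIP (172.21.104.25 in the examples below) by looking for the address on the node's interfaces; a rough check, run on each node or through oc debug node/&lt;name&gt;, could be:

# The node that currently owns the API VIP lists it as an extra address
# on the VRRP interface; the other nodes show no match.
ip -4 addr show | grep 172.21.104.25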

HA for OpenShift Router (Ingress VIP)

OpenShift Router is a component that provides load balancing, SSL termination, and name-based virtual hosting. It routes the traffic from outside the cluster to the services within the cluster. OpenShift Router can be deployed as a pod or as a container on any node in the cluster.
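The wildcard application DNS record normally points at the Ingress VIP (172.21.104.26 in the examples below), so a quick sanity check from outside the cluster might look like this; the route hostname here is a placeholder:

# The application wildcard should resolve to the Ingress VIP ...
dig +short test.apps.example.com

# ... and the router should answer on the VIP (often a 503 for an unmatched host).
curl -k -o /dev/null -w '%{http_code}\n' https://172.21.104.26/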

Here is the Keepalived static pod manifest and configuration file that we are using for the API server and the OpenShift Router:

kind: Pod
apiVersion: v1
metadata:
  name: keepalived
  namespace: openshift-openstack-infra
  creationTimestamp:
  deletionGracePeriodSeconds: 65
  labels:
    app: openstack-infra-vrrp
spec:
  volumes:
  - name: resource-dir
    hostPath:
      path: "/etc/kubernetes/static-pod-resources/keepalived"
  - name: script-dir
    hostPath:
      path: "/etc/kubernetes/static-pod-resources/keepalived/scripts"
  - name: kubeconfig
    hostPath:
      path: "/etc/kubernetes"
  - name: kubeconfigvarlib
    hostPath:
      path: "/var/lib/kubelet"
  - name: conf-dir
    hostPath:
      path: "/etc/keepalived"
  - name: run-dir
    emptyDir: {}
  - name: chroot-host
    hostPath:
      path: "/"
  initContainers:
  - name: render-config-keepalived
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c1f0f7c54f0f2ecd38fdf2667651f95153a589bd7fe4605f0f96a97899576a08
    command:
    - runtimecfg
    - render
    - "/etc/kubernetes/kubeconfig"
    - "--api-vip"
    - "172.21.104.25"
    - "--ingress-vip"
    - "172.21.104.26"
    - "/config"
    - "--out-dir"
    - "/etc/keepalived"
    resources: {}
    volumeMounts:
    - name: kubeconfig
      mountPath: "/etc/kubernetes"
    - name: script-dir
      mountPath: "/config"
    - name: conf-dir
      mountPath: "/etc/keepalived"
    imagePullPolicy: IfNotPresent
  containers:
  - name: keepalived
    securityContext:
      privileged: true
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8c872154e89c7c361c882bcd6acff3ff97464e7f1a0311c631e9a939085e9934
    env:
      - name: NSS_SDB_USE_CACHE
        value: "no"
    command:
    - /bin/bash
    - -c
    - |
      #/bin/bash
      reload_keepalived()
      {
        if pid=$(pgrep -o keepalived); then
            kill -s SIGHUP "$pid"
        else
            /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp --log-detail --log-console &
        fi
      }

      msg_handler()
      {
        while read -r line; do
          echo "The client sent: $line" >&2
          # currently only 'reload' msg is supported
          if [ "$line" = reload ]; then
              reload_keepalived
          fi
        done
      }

      set -ex
      declare -r keepalived_sock="/var/run/keepalived/keepalived.sock"
      export -f msg_handler
      export -f reload_keepalived
      if [ -s "/etc/keepalived/keepalived.conf" ]; then
          /usr/sbin/keepalived -f /etc/keepalived/keepalived.conf --dont-fork --vrrp --log-detail --log-console &
      fi

      rm -f "$keepalived_sock"
      socat UNIX-LISTEN:${keepalived_sock},fork system:'bash -c msg_handler'
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
    volumeMounts:
    - name: conf-dir
      mountPath: "/etc/keepalived"
    - name: run-dir
      mountPath: "/var/run/keepalived"
    livenessProbe:
      exec:
        command:
        - /bin/bash
        - -c
        - |
          kill -s SIGUSR1 "$(pgrep -o keepalived)" && ! grep -q "State = FAULT" /tmp/keepalived.data
      initialDelaySeconds: 20
    terminationMessagePolicy: FallbackToLogsOnError
    imagePullPolicy: IfNotPresent
  - name: keepalived-monitor
    securityContext:
      privileged: true
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c1f0f7c54f0f2ecd38fdf2667651f95153a589bd7fe4605f0f96a97899576a08
    env:
      - name: ENABLE_UNICAST
        value: "no"
      - name: IS_BOOTSTRAP
        value: "no"
    command:
    - dynkeepalived
    - "/var/lib/kubelet/kubeconfig"
    - "/config/keepalived.conf.tmpl"
    - "/etc/keepalived/keepalived.conf"
    - "--api-vip"
    - "172.21.104.25"
    - "--ingress-vip"
    - "172.21.104.26"
    resources:
      requests:
        cpu: 100m
        memory: 200Mi
    volumeMounts:
    - name: resource-dir
      mountPath: "/config"
    - name: kubeconfigvarlib
      mountPath: "/var/lib/kubelet"
    - name: conf-dir
      mountPath: "/etc/keepalived"
    - name: run-dir
      mountPath: "/var/run/keepalived"
    - name: chroot-host
      mountPath: "/host"
    imagePullPolicy: IfNotPresent
  hostNetwork: true
  tolerations:
  - operator: Exists
  priorityClassName: system-node-critical
status: {}
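Because this is a static pod, kubelet runs it directly from the manifest on the node; a quick way to confirm it is up (names taken from the manifest above) might be:

# The mirror pod shows up in the API with the node name appended to the pod name.
oc -n openshift-openstack-infra get pods -o wide | grep keepalived

# Or check directly on the node, bypassing the API server entirely.
crictl ps --name keepalived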

Configuration:

global_defs {
    enable_script_security
    script_user root
}

# These are separate checks to provide the following behavior:
# If the loadbalanced endpoint is responding then all is well regardless
# of what the local api status is. Both checks will return success and
# we'll have the maximum priority. This means as long as there is a node
# with a functional loadbalancer it will get the VIP.
# If all of the loadbalancers go down but the local api is still running,
# the _both check will still succeed and allow any node with a functional
# api to take the VIP. This isn't preferred because it means all api
# traffic will go through one node, but at least it keeps the api available.
vrrp_script chk_ocp_lb {
    script "/usr/bin/timeout 1.9 /etc/keepalived/chk_ocp_script.sh"
    interval 2
    weight 20
    rise 3
    fall 2
}

vrrp_script chk_ocp_both {
    script "/usr/bin/timeout 1.9 /etc/keepalived/chk_ocp_script_both.sh"
    interval 2
    # Use a smaller weight for this check so it won't trigger the move from
    # bootstrap to master by itself.
    weight 5
    rise 3
    fall 2
}

# TODO: Improve this check. The port is assumed to be alive.
# Need to assess what is the ramification if the port is not there.
vrrp_script chk_ingress {
    script "/usr/bin/timeout 0.9 /usr/bin/curl -o /dev/null -Lfs http://localhost:1936/healthz/ready"
    interval 1
    weight 50
}

{{$nonVirtualIP := .NonVirtualIP}}

vrrp_instance {{ .Cluster.Name }}_API {
    state BACKUP
    interface {{ .VRRPInterface }}
    virtual_router_id {{ .Cluster.APIVirtualRouterID }}
    priority 40
    advert_int 1
    {{if .EnableUnicast}}
    unicast_src_ip {{.NonVirtualIP}}
    unicast_peer {
        {{ .BootstrapIP }}
        {{range .LBConfig.Backends}}
        {{if ne $nonVirtualIP .Address}}{{.Address}}{{end}}
        {{end}}
    }
    {{end}}
    authentication {
        auth_type PASS
        auth_pass {{ .Cluster.Name }}_api_vip
    }
    virtual_ipaddress {
        {{ .Cluster.APIVIP }}/{{ .Cluster.VIPNetmask }}
    }
    track_script {
        chk_ocp_lb
        chk_ocp_both
    }
}


vrrp_instance {{ .Cluster.Name }}_INGRESS {
    state BACKUP
    interface {{ .VRRPInterface }}
    virtual_router_id {{ .Cluster.IngressVirtualRouterID }}
    priority 40
    advert_int 1
    {{if .EnableUnicast}}
    unicast_src_ip {{.NonVirtualIP}}
    unicast_peer {
        {{range .IngressConfig.Peers}}
        {{if ne $nonVirtualIP .}}{{.}}{{end}}
        {{end}}
    }
    {{end}}
    authentication {
        auth_type PASS
        auth_pass {{ .Cluster.Name }}_ingress_vip
    }
    virtual_ipaddress {
        {{ .Cluster.IngressVIP }}/{{ .Cluster.VIPNetmask }}
    }
    track_script {
        chk_ingress
    }
}
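The {{ ... }} placeholders are Go template variables that the keepalived-monitor (dynkeepalived) container in the pod above renders on each node. Purely as an illustration, with a hypothetical cluster name ocp4, the VIPs from this post, and made-up interface, router ID, and netmask values, the rendered API instance might look roughly like this:

vrrp_instance ocp4_API {
    state BACKUP
    interface br-ex            # placeholder VRRP interface
    virtual_router_id 147      # placeholder router ID
    priority 40
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass ocp4_api_vip
    }
    virtual_ipaddress {
        172.21.104.25/24       # API VIP with a placeholder netmask
    }
    track_script {
        chk_ocp_lb
        chk_ocp_both
    }
}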

Keepalived Request Flow (During the OpenShift Bootstrapping Process)

[Diagram: Keepalived request flow during the bootstrap process]

During initialization, we have a four-node cluster. The first node is the bootstrap node. The second, third, and fourth nodes are the controller nodes, which run the control plane pods. All four nodes have Keepalived installed and are configured with the same virtual IP address (VIP).

The bootstrap node is assigned the VIP and acts as the "controller" node when the cluster is first created. The other three nodes are in a "backup" state.

All three controller nodes pull the machine state from the bootstrap node during the bootstrap process. If the bootstrap node becomes unresponsive, those nodes will be unable to continue.

Once the bootstrap process is complete, one of the controller nodes will automatically take over the VIP and become the new "controller" node. After that, we can safely delete the bootstrap node.
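One way to watch this handover (a rough sketch, using the API VIP from this post) is to ping the VIP from a machine outside the cluster while the bootstrap node is shut down, and to follow the keepalived container logs on the surviving masters:

# From outside the cluster: the VIP should keep answering across the failover,
# apart from a brief gap while VRRP re-elects a master.
ping 172.21.104.25

# On a master node: watch keepalived state transitions (BACKUP -> MASTER).
crictl logs -f $(crictl ps -q --name '^keepalived$')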

Keepalived Request Flow (After bootstrapping process is complete)

[Diagram: Keepalived request flow after the bootstrap process is complete]

After the initialization is complete, we have a three-node cluster. All three nodes have Keepalived installed and are configured with the same virtual IP address (VIP).

Based on the configuration, Keepalived uses the VRRP protocol to evaluate node health at the configured intervals. Based on these checks, it elects a VIP owner, assigns the VIP to that node, and marks it as the active router. Traffic is always sent to the active endpoint. If something happens to the active node, Keepalived elects a new leader, which takes over the VIP address and sends out an Address Resolution Protocol (ARP) message so that clients learn the new location of the VIP.
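If you want to see this election traffic on the wire, VRRP advertisements are normally multicast packets (IP protocol 112), so a capture on the VRRP interface shows the current master advertising at every interval, including its priority; the interface name here is a placeholder:

# Show VRRP advertisements (vrid and priority) from the current master.
tcpdump -i eth0 -nn vrrp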

You have to configure separate checks for each service, so in our POC we configure one health check for the Kubernetes API and one for the Ingress controller.
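The actual chk_ocp_script.sh and chk_ocp_script_both.sh files are rendered into /etc/keepalived by the runtimecfg init container shown above, so we don't reproduce them here. Purely as an illustrative sketch, an API check along these lines would satisfy the vrrp_script contract (exit 0 means healthy, anything else means failed):

#!/bin/bash
# Illustrative only: report healthy if the local API server answers its
# readiness endpoint; keepalived adds the script's weight when it succeeds.
curl -o /dev/null -kLfs https://localhost:6443/readyz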

I think this is a good stopping point for Keepalived; let's move on to the next topic.

Top comments (4)

Yash Singh

Hi Balkrishna, good article with a deep dive on how to use static pods for setting up and running Keepalived for the HAProxy and API endpoints. I have a quick question: is it possible to configure and manage these static pods using the MCO, just by adding them to the manifests?

Balkrishna Pandey

Yes, it is possible to use a MachineConfig to create the static pod file. Simply create the file in the directory that kubelet watches for static pod manifests, and kubelet will start managing the pod.
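For example, a MachineConfig along these lines drops a static pod manifest into the directory kubelet watches; the name, role label, and file contents here are placeholders:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-master-my-static-pod
  labels:
    machineconfiguration.openshift.io/role: master
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: /etc/kubernetes/manifests/my-static-pod.yaml
          mode: 0644
          overwrite: true
          contents:
            # base64-encoded pod manifest (placeholder)
            source: data:text/plain;charset=utf-8;base64,<BASE64_POD_MANIFEST>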

Yash Singh

Thanks for that. I was looking at the static pod resources, and there are some variables in the haproxy config template. I was wondering how these variables are passed on to the haproxy config file. Does the installer pass these variables?

Balkrishna Pandey

There are two containers in that pod; haproxy-monitor resolves the variables and produces the rendered config so that the haproxy container can consume it.

  - name: haproxy-monitor
    securityContext:
      privileged: true
    image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c1f0f7c54f0f2ecd38fdf2667651f95153a589bd7fe4605f0f96a97899576a08
    command:
      - "/bin/bash"
      - "-c"
      - |            
        cp /host/etc/resolv.conf /etc/resolv.conf
        monitor /var/lib/kubelet/kubeconfig  /config/haproxy.cfg.tmpl  /etc/haproxy/haproxy.cfg  --api-vip 172.21.104.25
    resources: