Fixed issue #7112. Created new API Server vars that replace the defunct Controller Manager one (#7114)

Signed-off-by: Brendan Holmes <5072156+holmesb@users.noreply.github.com>
holmesb 2021-01-08 15:20:53 +00:00 committed by GitHub
parent ab2bfd7f8c
commit b0ad8ec023
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 25 additions and 13 deletions


@@ -43,8 +43,10 @@ attempts to set a status of node.

 At the same time, the Kubernetes controller manager will try to check
 `nodeStatusUpdateRetry` times every `--node-monitor-period` of time. After
-`--node-monitor-grace-period` it will consider node unhealthy. It will remove
-its pods based on `--pod-eviction-timeout`
+`--node-monitor-grace-period` it will consider the node unhealthy. Pods will then be rescheduled based on the
+[Taint Based Eviction](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/#taint-based-evictions)
+timers that you set on them individually, or the API Server's global timers: `--default-not-ready-toleration-seconds` &
+`--default-unreachable-toleration-seconds`.

 Kube proxy has a watcher over the API. Once pods are evicted, Kube proxy will
 notice and will update the iptables of the node. It will remove endpoints from
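As a rough sanity check, the expected time from a node failing to its pods being evicted is the node monitor grace period plus the applicable toleration. A minimal sketch (the function name is illustrative, not a real Kubernetes API):

```python
def expected_eviction_seconds(node_monitor_grace_period: int,
                              toleration_seconds: int) -> int:
    # The node is marked unhealthy after the grace period; taint-based
    # eviction then waits out the pod's own toleration, or the API
    # Server's default toleration if the pod sets none.
    return node_monitor_grace_period + toleration_seconds

# With upstream defaults: 40s grace period + 300s default tolerations
print(expected_eviction_seconds(40, 300))  # 340 seconds
```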
@@ -57,12 +59,14 @@ services so pods from failed node won't be accessible anymore.

 If `--node-status-update-frequency` is set to **4s** (10s is default),
 `--node-monitor-period` to **2s** (5s is default),
 `--node-monitor-grace-period` to **20s** (40s is default), and
-`--pod-eviction-timeout` is set to **30s** (5m is default)
+`--default-not-ready-toleration-seconds` and `--default-unreachable-toleration-seconds` are set to **30**
+(300 seconds is the default). Note that these two values must be plain integers representing seconds (no
+"s" or "m" suffix for seconds/minutes is given).

 In such a scenario, pods will be evicted in **50s** because the node will be
-considered as down after **20s**, and `--pod-eviction-timeout` occurs after
-**30s** more. However, this scenario creates an overhead on etcd as every node
-will try to update its status every 2 seconds.
+considered down after **20s**, and `--default-not-ready-toleration-seconds` or
+`--default-unreachable-toleration-seconds` takes effect after **30s** more. However, this scenario creates
+an overhead on etcd as every node will try to update its status every 2 seconds.

 If the environment has 1000 nodes, there will be 15000 node updates per
 minute, which may require large etcd containers or even dedicated nodes for etcd.
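The arithmetic behind both numbers above can be sketched as follows (the helper name is illustrative; one status update per node per update period is an approximation, since failed writes are retried):

```python
def etcd_status_updates_per_minute(nodes: int, update_frequency_s: int) -> int:
    # Each node PATCHes its status roughly once per update period.
    return nodes * (60 // update_frequency_s)

# Fast scenario: 20s to detect the failure + 30s toleration = 50s to eviction
print(20 + 30)                                  # 50
# 1000 nodes reporting every 4s:
print(etcd_status_updates_per_minute(1000, 4))  # 15000
```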
@@ -75,7 +79,8 @@ minute which may require large etcd containers or even dedicated nodes for etcd.

 ## Medium Update and Average Reaction

 Let's set `--node-status-update-frequency` to **20s**,
-`--node-monitor-grace-period` to **2m** and `--pod-eviction-timeout` to **1m**.
+`--node-monitor-grace-period` to **2m**, and `--default-not-ready-toleration-seconds` and
+`--default-unreachable-toleration-seconds` to **60**.

 In that case, Kubelet will try to update the status every 20s. So, there will be 6 * 5
 = 30 attempts before the Kubernetes controller manager considers the node status
 unhealthy. After 1m more it will evict all pods. The total time will be 3m.
@@ -90,9 +95,9 @@ etcd updates per minute.

 ## Low Update and Slow Reaction

 Let's set `--node-status-update-frequency` to **1m**,
-`--node-monitor-grace-period` will set to **5m** and `--pod-eviction-timeout`
-to **1m**. In this scenario, every kubelet will try to update the status every
-minute. There will be 5 * 5 = 25 attempts before unhealthy status. After 5m,
+`--node-monitor-grace-period` to **5m**, and `--default-not-ready-toleration-seconds` and
+`--default-unreachable-toleration-seconds` to **60**. In this scenario, every kubelet will try to update the
+status every minute. There will be 5 * 5 = 25 attempts before unhealthy status. After 5m,
 the Kubernetes controller manager will set the unhealthy status. Pods will then be
 evicted 1m after being marked unhealthy (6m in total).
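The scenarios above can be checked with a small sketch. The document's arithmetic (6 * 5 and 5 * 5) implies `nodeStatusUpdateRetry` = 5; the function name is illustrative:

```python
NODE_STATUS_UPDATE_RETRY = 5  # implied by the 6 * 5 and 5 * 5 arithmetic above

def scenario(update_frequency_s: int, grace_period_s: int,
             toleration_s: int) -> tuple:
    """Return (status-check attempts before unhealthy, total seconds to eviction)."""
    attempts = (grace_period_s // update_frequency_s) * NODE_STATUS_UPDATE_RETRY
    return attempts, grace_period_s + toleration_s

print(scenario(20, 120, 60))  # medium: (30, 180) -> 3m total
print(scenario(60, 300, 60))  # slow:   (25, 360) -> 6m total
```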


@@ -30,7 +30,8 @@ For a large scaled deployments, consider the following configuration changes:

 * Tune ``kubelet_status_update_frequency`` to increase reliability of kubelet.
   ``kube_controller_node_monitor_grace_period``,
   ``kube_controller_node_monitor_period``,
-  ``kube_controller_pod_eviction_timeout`` for better Kubernetes reliability.
+  ``kube_apiserver_pod_eviction_not_ready_timeout_seconds`` &
+  ``kube_apiserver_pod_eviction_unreachable_timeout_seconds`` for better Kubernetes reliability.
   Check out [Kubernetes Reliability](kubernetes-reliability.md)
 * Tune network prefix sizes. Those are ``kube_network_node_prefix``,


@@ -86,9 +86,10 @@ audit_webhook_batch_max_wait: 1s
 kube_controller_node_monitor_grace_period: 40s
 kube_controller_node_monitor_period: 5s
-kube_controller_pod_eviction_timeout: 5m0s
 kube_controller_terminated_pod_gc_threshold: 12500
 kube_apiserver_request_timeout: "1m0s"
+kube_apiserver_pod_eviction_not_ready_timeout_seconds: "300"
+kube_apiserver_pod_eviction_unreachable_timeout_seconds: "300"

 # 1.10+ admission plugins
 kube_apiserver_enable_admission_plugins: []
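To shorten failover cluster-wide, these defaults can be overridden in the inventory's group vars. A sketch, assuming the usual Kubespray inventory layout (the exact file path and the value **60** are illustrative; note the values are quoted integer seconds):

```yaml
# e.g. inventory/mycluster/group_vars/k8s-cluster/k8s-cluster.yml (path is an assumption)
kube_apiserver_pod_eviction_not_ready_timeout_seconds: "60"
kube_apiserver_pod_eviction_unreachable_timeout_seconds: "60"
```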


@@ -100,6 +100,12 @@ certificatesDir: {{ kube_cert_dir }}
 imageRepository: {{ kube_image_repo }}
 apiServer:
   extraArgs:
+{% if kube_apiserver_pod_eviction_not_ready_timeout_seconds is defined %}
+    default-not-ready-toleration-seconds: "{{ kube_apiserver_pod_eviction_not_ready_timeout_seconds }}"
+{% endif %}
+{% if kube_apiserver_pod_eviction_unreachable_timeout_seconds is defined %}
+    default-unreachable-toleration-seconds: "{{ kube_apiserver_pod_eviction_unreachable_timeout_seconds }}"
+{% endif %}
 {% if kube_api_anonymous_auth is defined %}
     anonymous-auth: "{{ kube_api_anonymous_auth }}"
 {% endif %}
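With both variables left at their default of "300", the template above should render a kubeadm config fragment along these lines (a sketch of the expected output, not captured from a real run):

```yaml
apiServer:
  extraArgs:
    default-not-ready-toleration-seconds: "300"
    default-unreachable-toleration-seconds: "300"
```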
@@ -256,7 +262,6 @@ controllerManager:
   extraArgs:
     node-monitor-grace-period: {{ kube_controller_node_monitor_grace_period }}
     node-monitor-period: {{ kube_controller_node_monitor_period }}
-    pod-eviction-timeout: {{ kube_controller_pod_eviction_timeout }}
     node-cidr-mask-size: "{{ kube_network_node_prefix }}"
     profiling: "{{ kube_profiling }}"
     terminated-pod-gc-threshold: "{{ kube_controller_terminated_pod_gc_threshold }}"