Compare commits

...

14 commits

Author SHA1 Message Date
Kenichi Omichi eeeca4a1d0
[2.17] Update kubernetes version to 1.21.6 (#8142) 2021-11-02 01:32:58 -07:00
Sébastien Masset 7e296b1523
Fixed default DNS min replica for single node clusters (#8109) 2021-10-26 23:59:25 -07:00
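The fix caps the default replica count at the cluster size via the Jinja `min` filter, as the dns_min_replicas diff below shows; for illustration:
```yml
# single-node cluster: [2, 1] | min -> 1 replica
# five-node cluster:   [2, 5] | min -> 2 replicas
dns_min_replicas: "{{ [ 2, groups['k8s_cluster'] | length ] | min }}"
```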
Utku Özdemir 488fbd8a37
Implement drain fallback with --disable-eviction to ignore PDBs (#8102)
Signed-off-by: Utku Ozdemir <uoz@protonmail.com>
2021-10-21 06:14:09 -07:00
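The fallback is opt-in; a minimal inventory sketch using the defaults introduced in the drain diff at the end of this comparison (values illustrative):
```yml
drain_fallback_enabled: true        # retry a failed drain with --disable-eviction, bypassing PDBs
drain_fallback_grace_period: 300
drain_fallback_timeout: 360s
drain_fallback_retries: 0
drain_fallback_retry_delay_seconds: 10
```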
Cristian Calin f7242d39b9
Calico: increase calico node probe timeouts and allow tuning (#7981) (#8103) 2021-10-21 05:06:10 -07:00
Mathieu Parent 87fee0cccf
[2.17] Fix containerd failed to start if apparmor is not installed (#8042)
* Ensure apparmor is installed (#8011)

Kubespray deployment failed when using the containerd backend on nodes where apparmor was not installed or had been removed. This PR ensures apparmor is installed by adding it to the required_pkgs var.

(cherry picked from commit 4bace2491d)

* Ensure apparmor is installed (#8036)

Kubespray deployment failed when using the containerd backend on nodes where apparmor was not installed or had been removed. This PR ensures apparmor is installed by adding it to the required_pkgs var.

(cherry picked from commit af04906b51)

Co-authored-by: rtsp <git@rtsp.us>
2021-10-01 10:00:24 -07:00
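After the fix, the Debian-family package lists end with apparmor; a sketch based on the required_pkgs diffs below (leading entries elided in the hunks):
```yml
required_pkgs:
  - apt-transport-https
  - software-properties-common
  - conntrack
  - apparmor
```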
Kenichi Omichi 45018ac077
Check if openstack application credentials are empty since they always exist (#8021) (#8038)
Co-authored-by: Hugo Blom <bl0m1@users.noreply.github.com>
2021-09-30 08:02:08 -07:00
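The template guard switched from a definedness check to an emptiness check, presumably because the credential variables default to empty strings and so are always defined; a sketch under that assumption:
```yml
# assumed inventory defaults: defined but empty
external_openstack_application_credential_id: ""
external_openstack_application_credential_name: ""
# the template below now emits username/password only when both are ""
```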
Kenichi Omichi 9fafe9849b
Add proxy for subscription-manager (#8012) (#8039)
If using a proxy, it is necessary to configure it before running the
"subscription-manager status" command.
This adds that step.
2021-09-30 02:20:08 -07:00
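The added task (see the diff below) derives the proxy host and port from `http_proxy` with two `regex_replace` filters; a hedged, self-contained illustration:
```yml
- name: Show how the proxy host and port are derived  # illustration only
  debug:
    msg: >-
      host={{ http_proxy | regex_replace(':\\d+$') }}
      port={{ http_proxy | regex_replace('^.*:') }}
  vars:
    http_proxy: "proxy.example.com:8080"  # assumed example value
```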
Kenichi Omichi 3b2b618cd2
check if 'plugins' key exists in calico_cni_config object (#7717) (#8040)
* check if 'plugins' key exists in calico_cni_config object

* fix whitespace linting error

* fixed when list indentation

Co-authored-by: David Louks <2402775+dlouks@users.noreply.github.com>
2021-09-30 02:12:07 -07:00
Kenichi Omichi bf1bb5984b
Use kube_config_dir for kubeconfig (#7996) (#8037)
The path of the kubeconfig should be configurable, and its default value
is /etc/kubernetes/admin.conf. Most references to the file were configurable
but some were not. This makes all of them configurable.
2021-09-30 02:08:08 -07:00
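With this change, overriding the directory once in the inventory propagates to the token, drain, and uncordon tasks shown later in this comparison; a sketch:
```yml
kube_config_dir: /etc/kubernetes   # the default; tasks now build "{{ kube_config_dir }}/admin.conf"
```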
Kenichi Omichi 04a8a19ce6
Issue 8004: Fix typha prometheus (#8005) (#8035)
The typha prometheus settings were in the `volumeMounts` section of the
spec and not in the `env` section. This was causing the deployment to
fail because the fields were validated as a volumeMount.

```
failed: [controller-001.a2.da.dev.logdna.net] (item=calico-typha.yml) => {"ansible_loop_var": "item", "changed": false, "item": {"ansible_loop_var": "item", "changed": true, "checksum": "598ac79530749e8e2110793b53fc49ac208e7130", "dest": "/etc/kubernetes/calico-typha.yml", "diff": [], "failed": false, "gid": 0, "group": "root", "invocation": {"module_args": {"_original_basename": "calico-typha.yml.j2", "attributes": null, "backup": false, "checksum": "598ac79530749e8e2110793b53fc49ac208e7130", "content": null, "delimiter": null, "dest": "/etc/kubernetes/calico-typha.yml", "directory_mode": null, "follow": false, "force": true, "group": null, "local_follow": null, "mode": null, "owner": null, "regexp": null, "remote_src": null, "selevel": null, "serole": null, "setype": null, "seuser": null, "src": "/home/core/.ansible/tmp/ansible-tmp-1632349768.56-75434-32452975679246/source", "unsafe_writes": null, "validate": null}}, "item": {"file": "calico-typha.yml", "name": "calico", "type": "typha"}, "md5sum": "53c00ac7f562cf9ecbbfd27899ea066d", "mode": "0644", "owner": "root", "size": 5378, "src": "/home/core/.ansible/tmp/ansible-tmp-1632349768.56-75434-32452975679246/source", "state": "file", "uid": 0}, "msg": "error running kubectl (/opt/bin/kubectl --namespace=kube-system apply --force --filename=/etc/kubernetes/calico-typha.yml) command (rc=1), out='service/calico-typha unchanged\n', err='error: error validating \"/etc/kubernetes/calico-typha.yml\": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].volumeMounts[2]): unknown field \"value\" in io.k8s.api.core.v1.VolumeMount, ValidationError(Deployment.spec.template.spec.containers[0].volumeMounts[2]): missing required field \"mountPath\" in io.k8s.api.core.v1.VolumeMount, ValidationError(Deployment.spec.template.spec.containers[0].volumeMounts[3]): unknown field \"value\" in io.k8s.api.core.v1.VolumeMount, ValidationError(Deployment.spec.template.spec.containers[0].volumeMounts[3]): missing required field \"mountPath\" in io.k8s.api.core.v1.VolumeMount]; if you choose to ignore these errors, turn validation off with --validate=false\n'"}
```

Co-authored-by: Eric Lake <ericlake@gmail.com>
2021-09-29 10:22:49 -07:00
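For reference, the corrected manifest renders the two Typha settings under `env` rather than `volumeMounts`; a sketch of the fragment (port value assumed, templated from `typha_prometheusmetricsport`):
```yml
env:
  - name: TYPHA_PROMETHEUSMETRICSENABLED
    value: "true"
  - name: TYPHA_PROMETHEUSMETRICSPORT
    value: "9093"  # assumed example port
```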
Kenichi Omichi ae1fb69382
Fix cilium operator metrics activation (#8000) (#8033)
This is a cherry-pick of 598f178054

Co-authored-by: Léopold Jacquot <leopold.jacquot@infomaniak.com>
2021-09-29 01:32:49 -07:00
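Per the ConfigMap diff below, the operator metrics now piggyback on the existing Prometheus flag:
```yml
cilium_enable_prometheus: true
# renders into the cilium ConfigMap:
#   prometheus-serve-addr: ":9090"
#   operator-prometheus-serve-addr: ":6942"
#   enable-metrics: "true"
```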
Kenichi Omichi dfee7a8ec5
Fix k8s-certs-renew cp path (#7992) (#8032)
This is a cherry-pick of 2211504790

Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>

Co-authored-by: Wang Zhen <lazybetrayer@gmail.com>
2021-09-29 01:28:48 -07:00
Kenichi Omichi bd4407199c
Add metrics_server_resizer option (#8018) (#8031)
The addon-resizer container can reduce the cpu and memory resource
limits of the metrics-server container in the pod, and that caused
OOMKills.
In addition, the original metrics-server manifest doesn't contain
the addon-resizer container, as shown in [1].
So this adds a metrics_server_resizer option to control the addon-resizer
container deployment; the default value is false to keep it stable
for most environments.

This is a cherry-pick of 8d3961edbe

[1]: 527679e5e8/manifests/base/deployment.yaml
2021-09-28 11:15:16 -07:00
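A sketch of opting back in to the sidecar with the new flag (names from the addons diff below):
```yml
metrics_server_enabled: true
metrics_server_resizer: true   # default false; when false the nanny container is not rendered
```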
Kenichi Omichi 6cfa3bbb22
Remove allowPrivilegeEscalation from metrics-server (#8014) (#8025)
"allowPrivilegeEscalation: false" blocks deploying metrics-server
on CentOS7. In addition, the original metrics-server manifest doesn't
contain it as [1]. This removes it.

[1]: 527679e5e8/manifests/base/deployment.yaml
2021-09-27 23:54:43 -07:00
26 changed files with 107 additions and 26 deletions

View file

@@ -130,7 +130,7 @@ Note: Upstart/SysV init based OS types are not supported.
## Supported Components
- Core
- [kubernetes](https://github.com/kubernetes/kubernetes) v1.21.5
- [kubernetes](https://github.com/kubernetes/kubernetes) v1.21.6
- [etcd](https://github.com/coreos/etcd) v3.4.13
- [docker](https://www.docker.com/) v20.10 (see note)
- [containerd](https://containerd.io/) v1.4.9

View file

@@ -189,7 +189,7 @@ To re-define default action please set the following variable in your inventory:
calico_endpoint_to_host_action: "ACCEPT"
```
## Optional : Define address on which Felix will respond to health requests
### Optional : Define address on which Felix will respond to health requests
Since Calico 3.2.0, HealthCheck default behavior changed from listening on all interfaces to just listening on localhost.
@@ -199,6 +199,15 @@ To re-define health host please set the following variable in your inventory:
calico_healthhost: "0.0.0.0"
```
### Optional : Configure Calico Node probe timeouts
Under certain conditions a deployer may need to tune the Calico liveness and readiness probe timeout settings. These can be configured like this:
```yml
calico_node_livenessprobe_timeout: 10
calico_node_readinessprobe_timeout: 10
```
## Config encapsulation for cross server traffic
Calico supports two types of encapsulation: [VXLAN and IP in IP](https://docs.projectcalico.org/v3.11/networking/vxlan-ipip). VXLAN is supported in some environments where IP in IP is not (for example, Azure).

View file

@@ -14,6 +14,7 @@ registry_enabled: false
# Metrics Server deployment
metrics_server_enabled: false
# metrics_server_resizer: false
# metrics_server_kubelet_insecure_tls: true
# metrics_server_metric_resolution: 15s
# metrics_server_kubelet_preferred_address_types: "InternalIP"

View file

@@ -17,7 +17,7 @@ kube_token_dir: "{{ kube_config_dir }}/tokens"
kube_api_anonymous_auth: true
## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.21.5
kube_version: v1.21.6
# Where the binaries will be downloaded.
# Note: ensure that you've enough disk space (about 1G)

View file

@@ -103,3 +103,7 @@
# Enable calico traffic encryption with wireguard
# calico_wireguard_enabled: false
# Under certain situations liveness and readiness probes may need tuning
# calico_node_livenessprobe_timeout: 10
# calico_node_readinessprobe_timeout: 10

View file

@@ -16,6 +16,13 @@
become: true
when: not skip_http_proxy_on_os_packages
- name: Add proxy to RHEL subscription-manager if http_proxy is defined
command: /sbin/subscription-manager config --server.proxy_hostname={{ http_proxy | regex_replace(':\\d+$') }} --server.proxy_port={{ http_proxy | regex_replace('^.*:') }}
become: true
when:
- not skip_http_proxy_on_os_packages
- http_proxy is defined
- name: Check RHEL subscription-manager status
command: /sbin/subscription-manager status
register: rh_subscription_status

View file

@@ -143,6 +143,7 @@ kubelet_checksums:
v1.22.2: 941e639b0f859eba65df0c66be82808ea6be697ed5dbf4df8e602dcbfa683aa3
v1.22.1: f42bc00f274be7ce0578b359cbccc48ead03894b599f5bf4d10e44c305fbab65
v1.22.0: 4354dc8db1d8ca336eb940dd73adcd3cf17cbdefbf11889602420f6ee9c6c4bb
v1.21.6: 20571caa4edcab5c17c448099cff74f0c0c54087c91888a23fc59407b8836127
v1.21.5: 9130b8b5677fc82b8292f115996370311021ebec404b9be01ff572b187efd45d
v1.21.4: b3ca234719d75df246f5f3ae2426cb2a2659fcb2f42bae15ed2017f29b911e4d
v1.21.3: 7375096bf6985ca3df94285bc69216b827ccabbc459b738984318df904679958
@@ -181,6 +182,7 @@ kubelet_checksums:
v1.22.2: f5fe3d6f4b2df5a794ebf325dc17fcdfe905a188e25f7c7e47d9cd15f14f8c2d
v1.22.1: d5ffd67d8285fb224a1c49622fd739131f7b941e3d68f233dec96e72c9ebee63
v1.22.0: cea637a7da4f1097b16b0195005351c07032a820a3d64c3ff326b9097cfac930
v1.21.6: 041441623c31bc6b0295342b8a2a5930d87545473e7c761ea79f3ff186c0ff52
v1.21.5: 746a535956db55807ef71772d2a4afec5cc438233da23952167ec0aec6fe937b
v1.21.4: 12c849ccc627e9404187adf432a922b895c8bdecfd7ca901e1928396558eb043
v1.21.3: 5d21da1145c25181605b9ad0810401545262fc421bbaae683bdb599632e834c1
@@ -219,6 +221,7 @@ kubelet_checksums:
v1.22.2: 0fd6572e24e3bebbfd6b2a7cb7adced41dad4a828ef324a83f04b46378a8cb24
v1.22.1: 2079780ad2ff993affc9b8e1a378bf5ee759bf87fdc446e6a892a0bbd7353683
v1.22.0: fec5c596f7f815f17f5d7d955e9707df1ef02a2ca5e788b223651f83376feb7f
v1.21.6: 422c29a1ba3bfeb2fc26ebd1c3596847fbbeeeef0ce2694515504513dc907813
v1.21.5: 600f70fe0e69151b9d8ac65ec195bcc840687f86ba397fce27be1faae3538a6f
v1.21.4: cdd46617d1a501531c62421de3754d65f30ad24d75beae2693688993a12bb557
v1.21.3: 5bd542d656caabd75e59757a3adbae3e13d63c7c7c113d2a72475574c3c640fe
@@ -258,6 +261,7 @@ kubectl_checksums:
v1.22.2: a16f7d70e65589d2dbd5d4f2115f6ccd4f089fe17a2961c286b809ad94eb052a
v1.22.1: 50991ec4313ee42da03d60e21b90bc15e3252c97db189d1b66aad5bbb555997b
v1.22.0: 6d7c787416a148acffd49746837df4cebb1311c652483dc3d2c8d24ce1cc897e
v1.21.6: 9100bc13498f770a5a1524665a9dc2470d3a15518e53aba68c700f10f3def978
v1.21.5: 51955c2fec47b83c904004fedde970b6c8f37a7a5f3c2910b6dd63b99fa697e5
v1.21.4: bb741dae49b17b7784dc2460467c876e9f961c14f628de7553d023cdef85b1ac
v1.21.3: 603b6e57c5546c079faee6b606014e83b95ea076146fbf73329f3069968f83bf
@@ -296,6 +300,7 @@ kubectl_checksums:
v1.22.2: c5bcc7e5321d34ac42c4635ad4f6fe8bd4698e9c879dc3367be542a0b301297b
v1.22.1: 5c7ef1e505c35a8dc0b708f6b6ecdad6723875bb85554e9f9c3fe591e030ae5c
v1.22.0: 8d9cc92dcc942f5ea2b2fc93c4934875d9e0e8ddecbde24c7d4c4e092cfc7afc
v1.21.6: a193997181cdfa00be0420ac6e7f4cfbf6cedd6967259c5fda1d558fa9f4efe0
v1.21.5: fca8de7e55b55cceab9902aae03837fb2f1e72b97aa09b2ac9626bdbfd0466e4
v1.21.4: 8ac78de847118c94e2d87844e9b974556dfb30aff0e0d15fd03b82681df3ac98
v1.21.3: 2be58b5266faeeb93f38fa72d36add13a950643d2ae16a131f48f5a21c66ef23
@@ -334,6 +339,7 @@ kubectl_checksums:
v1.22.2: aeca0018958c1cae0bf2f36f566315e52f87bdab38b440df349cd091e9f13f36
v1.22.1: 78178a8337fc6c76780f60541fca7199f0f1a2e9c41806bded280a4a5ef665c9
v1.22.0: 703e70d49b82271535bc66bc7bd469a58c11d47f188889bd37101c9772f14fa1
v1.21.6: 810eadc2673e0fab7044f88904853e8f3f58a4134867370bf0ccd62c19889eaa
v1.21.5: 060ede75550c63bdc84e14fcc4c8ab3017f7ffc032fc4cac3bf20d274fab1be4
v1.21.4: 9410572396fb31e49d088f9816beaebad7420c7686697578691be1651d3bf85a
v1.21.3: 631246194fc1931cb897d61e1d542ef2321ec97adcb859a405d3b285ad9dd3d6
@@ -373,6 +379,7 @@ kubeadm_checksums:
v1.22.2: 6ccc26494160e19468b0cb55d56b2d5c62d21424fac79cb66402224c2bf73a0d
v1.22.1: cc08281c5261e860df9a0b5040b8aa2e6d202a243daf25556f5f6d3fd8f2e1e9
v1.22.0: 6a002deb0ee191001d5c0e0435e9a995d70aa376d55075c5f61e70ce198433b8
v1.21.6: 02951dae946dd5588ccda71b6e28f0d91adf7a94b57792b412635fcce7099d74
v1.21.5: 39c98582b0a2444e7d6bc85dc5eac5217aee5dd18c2de7e1d5aed09415023201
v1.21.4: f1ff5765439624c162489e4f037d12d9f8adf96c04cb298c06aeb7217d620349
v1.21.3: 25eac1922276a0b4aabda92df67882be25a2462e84245f4231f5a888a8ab8bae
@@ -411,6 +418,7 @@ kubeadm_checksums:
v1.22.2: 77b4c6a56ae0ec142f54a6f5044a7167cdd7193612b04b77bf433ffe1d1918ef
v1.22.1: 85df7978b2e5bb78064ed0bcce14a39d105a1a3968bb92ee5d2f96a1fa09ed12
v1.22.0: 9fc14b993de2c275b54445255d7770bd1d6cdb49f4cf9c227c5b035f658a2351
v1.21.6: 498325da2521ce67b27902967daf4087153c5797070e03bf0bdd7c846f4d61a8
v1.21.5: 5a273b023eaa60d7820436b0f0062c4bd467274d6f2b86a9e13270c91d663618
v1.21.4: 30645f57296281d214a9dd787a90bd16207df4b1fca7ac320913c616818a92cd
v1.21.3: 5bff1c6cd1d683ce191d271b968d7b776ae5ed7403bdab5fa88446100e74972c
@@ -449,6 +457,7 @@ kubeadm_checksums:
v1.22.2: 4ff09d3cd2118ee2670bc96ed034620a9a1ea6a69ef38804363d4710a2f90d8c
v1.22.1: 50a5f0d186d7aefae309539e9cc7d530ef1a9b45ce690801655c2bee722d978c
v1.22.0: 90a48b92a57ff6aef63ff409e2feda0713ca926b2cd243fe7e88a84c483456cc
v1.21.6: fef4b40acd982da99294be07932eabedd476113ce5dc38bb9149522e32dada6d
v1.21.5: e384171fcb3c0de924904007bfd7babb0f970997b93223ed7ffee14d29019353
v1.21.4: 286794aed41148e82a77087d79111052ea894796c6ae81fc463275dcd848f98d
v1.21.3: 82fff4fc0cdb1110150596ab14a3ddcd3dbe53f40c404917d2e9703f8f04787a

View file

@@ -3,7 +3,7 @@
dns_memory_limit: 170Mi
dns_cpu_requests: 100m
dns_memory_requests: 70Mi
dns_min_replicas: 2
dns_min_replicas: "{{ [ 2, groups['k8s_cluster'] | length ] | min }}"
dns_nodes_per_replica: 16
dns_cores_per_replica: 256
dns_prevent_single_point_failure: "{{ 'true' if dns_min_replicas|int > 1 else 'false' }}"

View file

@@ -1,6 +1,6 @@
[Global]
auth-url="{{ external_openstack_auth_url }}"
{% if external_openstack_application_credential_id is not defined and external_openstack_application_credential_name is not defined %}
{% if external_openstack_application_credential_id == "" and external_openstack_application_credential_name == "" %}
username="{{ external_openstack_username }}"
password="{{ external_openstack_password }}"
{% endif %}

View file

@@ -1,4 +1,5 @@
---
metrics_server_resizer: false
metrics_server_kubelet_insecure_tls: true
metrics_server_kubelet_preferred_address_types: "InternalIP"
metrics_server_metric_resolution: 15s

View file

@@ -67,7 +67,6 @@ spec:
failureThreshold: 3
initialDelaySeconds: 40
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["all"]
add: ["NET_BIND_SERVICE"]
@@ -82,6 +81,7 @@ spec:
requests:
cpu: {{ metrics_server_requests_cpu }}
memory: {{ metrics_server_requests_memory }}
{% if metrics_server_resizer %}
- name: metrics-server-nanny
image: {{ addon_resizer_image_repo }}:{{ addon_resizer_image_tag }}
imagePullPolicy: {{ k8s_image_pull_policy }}
@@ -119,6 +119,7 @@ spec:
# Specifies the smallest cluster (defined in number of nodes)
# resources will be scaled to.
- --minClusterSize={{ metrics_server_min_cluster_size }}
{% endif %}
volumes:
- name: metrics-server-config-volume
configMap:

View file

@@ -150,8 +150,8 @@
- name: Create hardcoded kubeadm token for joining nodes with 24h expiration (if defined)
shell: >-
{{ bin_dir }}/kubeadm --kubeconfig /etc/kubernetes/admin.conf token delete {{ kubeadm_token }} || :;
{{ bin_dir }}/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create {{ kubeadm_token }}
{{ bin_dir }}/kubeadm --kubeconfig {{ kube_config_dir }}/admin.conf token delete {{ kubeadm_token }} || :;
{{ bin_dir }}/kubeadm --kubeconfig {{ kube_config_dir }}/admin.conf token create {{ kubeadm_token }}
changed_when: false
when:
- inventory_hostname == groups['kube_control_plane']|first
@@ -161,7 +161,7 @@
- kubeadm_token
- name: Create kubeadm token for joining nodes with 24h expiration (default)
command: "{{ bin_dir }}/kubeadm --kubeconfig /etc/kubernetes/admin.conf token create"
command: "{{ bin_dir }}/kubeadm --kubeconfig {{ kube_config_dir }}/admin.conf token create"
changed_when: false
register: temp_token
retries: 5

View file

@@ -62,7 +62,7 @@
- name: kubeadm | scale down coredns replicas to 0 if not using coredns dns_mode
command: >-
{{ bin_dir }}/kubectl
--kubeconfig /etc/kubernetes/admin.conf
--kubeconfig {{ kube_config_dir }}/admin.conf
-n kube-system
scale deployment/coredns --replicas 0
register: scale_down_coredns

View file

@@ -14,7 +14,7 @@ echo "## Restarting control plane pods managed by kubeadm ##"
{% endif %}
echo "## Updating /root/.kube/config ##"
/usr/bin/cp {{ kube_config_dir }}/admin.conf /root/.kube/config
cp {{ kube_config_dir }}/admin.conf /root/.kube/config
echo "## Waiting for apiserver to be up again ##"
until printf "" 2>>/dev/null >>/dev/tcp/127.0.0.1/6443; do sleep 1; done

View file

@@ -6,3 +6,4 @@ required_pkgs:
- software-properties-common
- conntrack
- iptables
- apparmor

View file

@@ -5,3 +5,4 @@ required_pkgs:
- apt-transport-https
- software-properties-common
- conntrack
- apparmor

View file

@@ -5,3 +5,4 @@ required_pkgs:
- apt-transport-https
- software-properties-common
- conntrack
- apparmor

View file

@@ -15,7 +15,7 @@ is_fedora_coreos: false
disable_swap: true
## Change this to use another Kubernetes version, e.g. a current beta release
kube_version: v1.21.3
kube_version: v1.21.6
## The minimum version working
kube_version_min_required: v1.19.0

View file

@@ -12,7 +12,9 @@
- name: Set fact calico_datastore to etcd if needed
set_fact:
calico_datastore: etcd
when: "'etcd_endpoints' in calico_cni_config.plugins.0"
when:
- "'plugins' in calico_cni_config"
- "'etcd_endpoints' in calico_cni_config.plugins.0"
when: calico_cni_config_slurp.content is defined
- name: Calico | Get kubelet hostname

View file

@@ -305,6 +305,7 @@ spec:
{% endif %}
periodSeconds: 10
initialDelaySeconds: 10
timeoutSeconds: {{ calico_node_livenessprobe_timeout | default(10) }}
failureThreshold: 6
readinessProbe:
exec:
@@ -315,6 +316,7 @@ spec:
{% endif %}
- -felix-ready
periodSeconds: 10
timeoutSeconds: {{ calico_node_readinessprobe_timeout | default(10) }}
failureThreshold: 6
volumeMounts:
- mountPath: /lib/modules

View file

@@ -108,14 +108,6 @@ spec:
value: /etc/typha/server_certificate.pem
- name: TYPHA_SERVERKEYFILE
value: /etc/typha/server_key.pem
volumeMounts:
- mountPath: /etc/typha
name: typha-server
readOnly: true
- mountPath: /etc/ca/ca.crt
subPath: ca.crt
name: cacert
readOnly: true
{% endif %}
{% if typha_prometheusmetricsenabled %}
# Since Typha is host-networked,
@@ -124,6 +116,16 @@ spec:
value: "true"
- name: TYPHA_PROMETHEUSMETRICSPORT
value: "{{ typha_prometheusmetricsport }}"
{% endif %}
{% if typha_secure %}
volumeMounts:
- mountPath: /etc/typha
name: typha-server
readOnly: true
- mountPath: /etc/ca/ca.crt
subPath: ca.crt
name: cacert
readOnly: true
{% endif %}
# Needed for version >=3.7 when the 'host-local' ipam is used
# Should never happen given templates/cni-calico.conflist.j2

View file

@@ -38,6 +38,8 @@ data:
# scheduled.
{% if cilium_enable_prometheus %}
prometheus-serve-addr: ":9090"
operator-prometheus-serve-addr: ":6942"
enable-metrics: "true"
{% endif %}
# If you want to run cilium in debug mode change this value to true

View file

@@ -9,7 +9,7 @@
- name: remove-node | Drain node except daemonsets resource # noqa 301
command: >-
{{ bin_dir }}/kubectl --kubeconfig /etc/kubernetes/admin.conf drain
{{ bin_dir }}/kubectl --kubeconfig {{ kube_config_dir }}/admin.conf drain
--force
--ignore-daemonsets
--grace-period {{ drain_grace_period }}

View file

@@ -1,6 +1,6 @@
---
- name: Uncordon node
command: "{{ bin_dir }}/kubectl --kubeconfig /etc/kubernetes/admin.conf uncordon {{ kube_override_hostname|default(inventory_hostname) }}"
command: "{{ bin_dir }}/kubectl --kubeconfig {{ kube_config_dir }}/admin.conf uncordon {{ kube_override_hostname|default(inventory_hostname) }}"
delegate_to: "{{ groups['kube_control_plane'][0] }}"
when:
- needs_cordoning|default(false)

View file

@@ -6,6 +6,12 @@ drain_nodes: true
drain_retries: 3
drain_retry_delay_seconds: 10
drain_fallback_enabled: false
drain_fallback_grace_period: 300
drain_fallback_timeout: 360s
drain_fallback_retries: 0
drain_fallback_retry_delay_seconds: 10
upgrade_node_always_cordon: false
upgrade_node_uncordon_after_drain_failure: true
upgrade_node_fail_if_drain_fails: true

View file

@@ -73,18 +73,50 @@
{{ bin_dir }}/kubectl drain
--force
--ignore-daemonsets
--grace-period {{ drain_grace_period }}
--timeout {{ drain_timeout }}
--grace-period {{ hostvars['localhost']['drain_grace_period_after_failure'] | default(drain_grace_period) }}
--timeout {{ hostvars['localhost']['drain_timeout_after_failure'] | default(drain_timeout) }}
--delete-local-data {{ kube_override_hostname|default(inventory_hostname) }}
{% if drain_pod_selector %}--pod-selector '{{ drain_pod_selector }}'{% endif %}
when: drain_nodes
register: result
failed_when:
- result.rc != 0
- not drain_fallback_enabled
until: result.rc == 0
retries: "{{ drain_retries }}"
delay: "{{ drain_retry_delay_seconds }}"
- name: Drain fallback
block:
- name: Set facts after regular drain has failed
set_fact:
drain_grace_period_after_failure: "{{ drain_fallback_grace_period }}"
drain_timeout_after_failure: "{{ drain_fallback_timeout }}"
delegate_to: localhost
delegate_facts: yes
run_once: yes
- name: Drain node - fallback with disabled eviction
command: >-
{{ bin_dir }}/kubectl drain
--force
--ignore-daemonsets
--grace-period {{ drain_fallback_grace_period }}
--timeout {{ drain_fallback_timeout }}
--delete-local-data {{ kube_override_hostname|default(inventory_hostname) }}
{% if drain_pod_selector %}--pod-selector '{{ drain_pod_selector }}'{% endif %}
--disable-eviction
register: drain_fallback_result
until: drain_fallback_result.rc == 0
retries: "{{ drain_fallback_retries }}"
delay: "{{ drain_fallback_retry_delay_seconds }}"
when:
- drain_nodes
- drain_fallback_enabled
- result.rc != 0
rescue:
- name: Set node back to schedulable
command: "{{ bin_dir }}/kubectl --kubeconfig /etc/kubernetes/admin.conf uncordon {{ inventory_hostname }}"
command: "{{ bin_dir }}/kubectl --kubeconfig {{ kube_config_dir }}/admin.conf uncordon {{ inventory_hostname }}"
when: upgrade_node_uncordon_after_drain_failure
- name: Fail after rescue
fail: