* Wait for kube-apiserver availability before starting upgrade
I am experiencing a timing issue when upgrading from kubespray
2.11.0(k8s 1.15.3) to kubespray 2.12.6(k8s 1.16.9). The certificates get
replaced in `kubeadm-secondary-legacy.yml` and kube-apiserver notices a mismatch
(for a fraction of a second) between `apiserver.crt` and `apiserver.key`
which causes it to restart. And sometimes ( ~ 1 out of 5 upgrades) the
kube-apiserver isn't back on time for the start of the upgrade task. It
fails when kubeadm checks with the kube-apiserver to start the upgrade. The
kube-apiserver returns a `connect: connection refused`. I have created this
small task to check the availability of the kube-apiserver before starting the
upgrade, so that the upgrade will run without an issue.
Signed-off-by: Rick Haan <rickhaan94@gmail.com>
* Fix markdownlint
* Remove old CI
Co-authored-by: Maxime Guyot <maxime@root314.com>
* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
(cherry picked from commit a35b6dc1af)
* Remove dockerproject org (#5548)
* Change dockerproject.org to download.docker.com
dockerproject.org was deprecated in 2017 and has gone down.
* Restore yum repo for containerd
Change-Id: I883bb512a2164a85865b1bd4fb569af0358c8c2b
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
* remove legacy docker repo in kubernetes/preinstall before any packages installed (#5640)
* Remove dockerproject_.+_repo_.+ variables (#5662)
This 38688a4486 change replaces the
value for dockerproject_.+_repo_.+ docker variables but their new
value was previously defined in other variables. This change removes
the dockerproject_.+_repo_.+ docker variables in favor of the older
ones.
* Remove stale legacy yum docker repo /etc/yum.repos.d/docker.repo (#5569)
* Remove stale legacy yum docker repo /etc/yum.repos.d/docker.repo
* move task 'Remove legacy docker repo file' to pre-upgrade.yml
* fix upgrade procedure when in playbook (#5695)
exists role kubernetes/preinstall and not exists role container-engine
error 'yum_repo_dir' is undefined
Co-authored-by: Matthew Mosesohn <matthew.mosesohn@gmail.com>
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
Co-authored-by: Victor Morales <chipahuac@hotmail.com>
When running with serial != 100%, like upgrade_cluster.yml, we need to apply this fixup each time
Problem was introduced in 05dc2b3a09
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
(cherry picked from commit 5e9479cded)
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined an there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
This fixes the scenario where masters are upgraded one at a time
and coredns gets improperly scaled back up to 2 replicas.
Change-Id: I7cc9283f40efcfd61b5813c89a5805c95d901567
Kubespray Pull Request #5084 (https://github.com/kubernetes-sigs/kubespray/pull/5084) caused more problems than it solved due to limitations with the synchronize module. See comments on Kubespray Issues #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059) and #5116 (https://github.com/kubernetes-sigs/kubespray/issues/5116). Details from Ansible documentation: "Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in. ... Currently there are only a few connection types which support synchronize (ssh, paramiko, local, and docker) because a sync strategy has been determined for those connection types. Note that the connection for these must not need a password as rsync itself is making the connection and rsync does not provide us a way to pass a password to the connection. ..." Thus, reverting Pull Request #5084.
When using cluster.yml or scale.yml to add/scale nodes in the existing
k8s cluster, the `kubeadm init` wouldn't run. As a result, kube-proxy
wouldn't be created, and therefore the kube-proxy deletion task would
fail, e.g. in the case where kube-router is used and "kube_proxy_remove"
is set to true. As a workaround, add ignore_errors to the kube-proxy
deletion task.
The script is not usable unless you are in the '.vagrant/provisioners/ansible/inventory/artifacts' folder.
This update makes this usable from anywhere.
Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds,deploy,rs
Add workaround for deploying helm using incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible version.
Default it to 0 in when not present and use libselinux-python as package to get current
default behaviour.