* Add docker-ce 19.03 packages for Debian & Ubuntu
K8s has updated the recommended Docker version to 19.03. More
specifically it should be 19.03.4, but since we used 18.06.7 instead of
.2, I'm assuming the latest patch version should be used here as well.
* Add docker 19.03 for redhat
The 'regexp' parameter matches last occurrence of a line starting with 'proxy=' and replaces it with the one defined in 'line' parameter. If no match - it works same way as before. This fixes resuming cluster deployments failed after that task (if there was no more than one line starting with 'proxy' in the yum.conf file - this condition should also be reassured with the change introduced here) eg. if they were initiated with Terraform.
refs #5277
As the issue describes, when no external or local load-balanced is used,
kube-proxy won't be able to contact apiserver at 127.0.0.1. So the
config map should be left as is.
* Upgrade etcd to 3.3.18
* Try with etcd 3.3.15 (kubeadm 1.16.7 default)
* Back to square one
* Try with 3.3.11
* Upgrade etcd to 3.3.18 (take 2)
* Try with 3.3.12
* download file
* download containers
* fix push image to nodes
* pull if none image on host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load abilty for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporary disable download_run_once for containerd and crio
due https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Run 'container-engine' after drain.
Move possibly disruptive role 'container-engine' to run after the node
is drained.
As that role have to be run on non-cluster nodes as well (etcd and
calico-rr), and those nodes are not drained, add play for that case.
* Check if api is up before upgrade.
If container engine is restarted in previous role, api controller can
take some time to start. This check ensures api is up before upgrade.
When kube-router is used as cni, rules might be added to the mangle table
to support external IPs. Therefore, mangle table should be flushed during
reset as well.
This 38688a4486 change replaces the
value for dockerproject_.+_repo_.+ docker variables but their new
value was previously defined in other variables. This change removes
the dockerproject_.+_repo_.+ docker variables in favor of the older
ones.
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* External OpenStack Cloud Controller Manager implementation
* Adding controller image tag
* Minor fixes
* Restructuring the external cloud controller to work with KubeADM
* Added in code to allow control over pull policy for local path provisioner
* change to imagePullPolicy to use globally used variable k8s_image_pull_policy
* removed unusued variable from defaults
* updated contiv-etcd and cinder-csi-controllerplugin to use k8s_image_pull_policy variable
* Introduce kubelet_config_extra_args and kubelet_node_config_extra_args to pass params to kubelet via YAML config
* kubelet_config_extra_args is not the alternative
* Fix recover-control-plane to work with etcd 3.3.x and add CI
* Set default values for testcase
* Add actual test jobs
* Attempt to satisty gitlab ci linter
* Fix ansible targets
* Set etcd_member_name as stated in the docs...
* Recovering from 0 masters is not supported yet
* Add other master to broken_kube-master group as well
* Increase number of retries to see if etcd needs more time to heal
* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
* containerd: add proxy support
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubespray-defaults: add kube_service_addresses / kube_pods_subnet to no_proxy
CIDR notation in no_proxy is supported by a lot of programs/languages,
including go: https://github.com/golang/go/issues/16704
Without that containerd cannot talk the the API server (kube_apiserver_ip),
but it should not go through an external proxy for the nodes/pods/services
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Change dockerproject.org to download.docker.com
dockerproject.org was deprecated in 2017 and has gone down.
* Restore yum repo for containerd
Change-Id: I883bb512a2164a85865b1bd4fb569af0358c8c2b
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
When running with serial != 100%, like upgrade_cluster.yml, we need to apply this fixup each time
Problem was introduced in 05dc2b3a09
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Raises limit from 100 to 300 because the default is far too low
and the pod can handle 300 with the given resources.
Change-Id: Ib1eec10da3d09d198933fcfe87291587e58d7cdb
I've tested this update by deploying a containerd / etcd cluster on top CentOS7,
MetalLB + NGINX Ingress. Upgrade using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Resolves issue where kubectl cache of <v1.16 api schema
interferes with interacting with daemonsets and deployments.
Change-Id: I63b7046958f2008eb144b6da0004c598f945e0ae
* Fix crictl
* Reload systemd daemon before enabling service
* Typo
* Add crictl template
* Remove seccomp.json for ubuntu
* Set runtime path of runc for ubuntu
* Change path to conmon
There is no cri-tools package in CentOS/EPEL/Red Hat.
Additionally, cri-tools is provided into the installation via
roles/download/defaults/main.yml:104:crictl_download_url.
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined an there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Initially this was to fix a mis-indented approvers key. However, it turns
out that 'oilbeater' is not a member of kubernetes-sigs nor
kubernetes-incubator (the org this repo was migrated from). Thus this
OWNERS file is failing prow's validation check.
As a workaround I've opted to move them to emeritus_approver, which
isn't valiated and can be used as a hint for other approvers in this
repo
This fixes the scenario where masters are upgraded one at a time
and coredns gets improperly scaled back up to 2 replicas.
Change-Id: I7cc9283f40efcfd61b5813c89a5805c95d901567
Kubespray Pull Request #5084 (https://github.com/kubernetes-sigs/kubespray/pull/5084) caused more problems than it solved due to limitations with the synchronize module. See comments on Kubespray Issues #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059) and #5116 (https://github.com/kubernetes-sigs/kubespray/issues/5116). Details from Ansible documentation: "Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in. ... Currently there are only a few connection types which support synchronize (ssh, paramiko, local, and docker) because a sync strategy has been determined for those connection types. Note that the connection for these must not need a password as rsync itself is making the connection and rsync does not provide us a way to pass a password to the connection. ..." Thus, reverting Pull Request #5084.
* Add support for Kubernetes 1.16.1
* Defaults to 1.16.1
* add 1.16.2 checksums and set new version as default
* correct 1.16.2 checksums and add 1.15.5 checksums
When using cluster.yml or scale.yml to add/scale nodes in the existing
k8s cluster, the `kubeadm init` wouldn't run. As a result, kube-proxy
wouldn't be created, and therefore the kube-proxy deletion task would
fail, e.g. in the case where kube-router is used and "kube_proxy_remove"
is set to true. As a workaround, add ignore_errors to the kube-proxy
deletion task.
The script is not usable unless you are in the '.vagrant/provisioners/ansible/inventory/artifacts' folder.
This update makes this usable from anywhere.
- do not run etcd role when etcd_kubeadm_enabled == true
- remove default value 'systemd' for cgroup driver in containerd role.
this value override autodetect in kubelet_cgroup_driver_detected from docker info
This allows to easily override the gcr, quay, and docker repos with the
mirror repos in countries like China, where the default accesses are
blocked or unstable.
mydict.keys() should be converted to list,
otherwise it causes errors in loop iteration.
Remove extra space after class name, which broke configmap.
Also allow set reclaimPolicy property.
Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds,deploy,rs
Add workaround for deploying helm using incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible version.
Default it to 0 in when not present and use libselinux-python as package to get current
default behaviour.
Fix for Kubespray Issue #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059). There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See ansible/ansible#11702 (https://github.com/ansible/ansible/issues/11702). I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above). Replacing 'fetch' with 'synchronize' fixes this issue.
Updated Openstack to terraform 0.12 (#5062)
* update openstack to terraform 0.12(.5)
* replace cluter.tf with cluster.tfvars
* update README.md to terraform 0.12
* update Openstack CI tests to use terraform 0.12
* specify terraform version in openstack README
* gitlab CI to copy cluster.tfvars in case of openstack provider
* The terraform/openstack dynamic inventory can read
tfstate v4 (generated by terraform 0.12) and convert them internally
ro v3 (as generated by terraform 0.11.x).
Additionally the script has been updated to Python 3.
* run 'task download_container | Copy image to ansible host cache' with synchronize on download_delegate host
* try to run task copy file to ansible host on all inventory, not only on first random host
Fixes situation when using manual mode because it
tries to download coredns v1.3.1 from the same
image repository where kubernetes images are
downloaded from.
Change-Id: Ibbec8a72c8162ce8befa74e2013a268737ea5f8a
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* Enable nodes to run calicoctl
per-node tasks require waiting for calico-node to be applied
Change-Id: Ibe1076b7334a2da0332f2dd766fde0c3f172d1f2
* cleanup tasks that should run on master
Change-Id: I43a837879ef41596f14657ecd7f813899b6865ae
* Switch run_once calico logic to just run on first master
Change-Id: I6893711e354f63c5e1eaf6ac2e23d9a6347a555d
* Enable containerd to deploy vanilla containerd package
Fixes kubeadm references to CRI socket for containerd
Fixes download role cache feature to work with containerd
Change-Id: I2ab8f0031107e2f0d1a85c39b4beb66f08509a01
* use containerd for flannel-addons job
Change-Id: Ied375c7d65e64a625ffbd995ff16f2374067dee6
* add containerd vars
Change-Id: Ib9a8a04e501c481a86235413cbec63f3672baf91
* fixup vars
Change-Id: Ibea64e4b18405a578b52a13da100384582aa24c2
* more fixes
* fix rh repo
Change-Id: I00575a77cfb7b81d6095db5d918a52023c8f13ba
* Adjust helm host install for containerd
* Add calico 3.7.3 support
* add calico_datastore variable to policy controller role
* add missing clusterrole rules for calico policy controller
* disable calico kube controller when kdd mode is used for versions < 3.6
8080 is a pretty common port, using nodelocaldns_ip:8080 still
prevents node processes or hostNetwork=true processes to bind to *:8080
so switch to 9254 by default (prometheus port is 9253)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>