crio refuses to delete pods when cni is unavailable which is the
case e.g. using calico with kdd datastore. See:
https://github.com/cri-o/cri-o/issues/4084
Fix by deleting storage associated with containers. Stop and disable
crio service so switching container runtime can be done.
* Added option to force apiserver and respective client certificate to be regenerated without necessarily needing to bump the K8S cluster version
* Removed extra blank line
Handlers with the same name (Kubeadm | restart kubelet) leads to incorrect playbook execution. As a result, after completing the tasks, kubelet does not restart. This PR fix this behavior
After upgrading to newer Kubernetes(v1.17 at least), kubectl command
shows the following warning message:
WARNING: Kubernetes configuration file is group-readable.
This is insecure. Location: /home/foo/.kube/config
The kubeconfig was copied from {{ artifacts_dir }}/admin.conf with
kubeconfig_localhost feature. It is better to set valid file mode
at getting it on Kubespray.
The 0d0cc8cf9c change creates several
DaemonSets to cover the Flannel CNI installation for different CPU
architectures. This change removes the unnecessary architecture value
from the docker tag value.
Signed-off-by: Victor Morales <v.morales@samsung.com>
In case multiple nodeselectors are specified in ingress_nginx_nodeselector, the generated daemonset yaml template for nginx is invalid due to missing indentation starting with the second nodeselector
When stopping at the check of "Stop if ip var does not match local ips"
the error message is like:
fatal: [single-k8s]: FAILED! => {
"assertion": "ip in ansible_all_ipv4_addresses",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
That doesn't contain actual IP addresses and it is difficult to understand
what was wrong. This adds the error message which contain actual IP addresses
to investigate the issue if happens.
* calico: add constant calico_min_version_required
and verify current deployed version against it.
* calico: remove upgrade support with data migration
The tool was used pre v3.0.0 and is no longer needed.
* calico: remove old version support from tasks
* calico: remove old ver support from policy ctrl
* calico: remove old ver support from node
* canal: remove old ver support
* remove unused calicoctl download checksums
calico_min_version_required is the oldest version that can be installed
Older versions can be removed.
* Add retries to update calico-rr data in etcd through calicoctl
* Update update-node yaml syntax
* Add comment to clarify ansible block loop
* Remove trailing space
* Fix reserved memory unit in kubelet configuration
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Move systemReserved default values from template
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Added ability to set calico vxlan vni and port. defaults to calico's documented defaults.
* Check if calico_network_backend is defined prior to checking value
* Removed calico hidden defaults for vxlan port and vni
* Fixed FELIX_VXLANVNI typo
* Added support for setting tiller_service_account and tiller_replicas
* Specify helm 2 version to ensure we have a test path that still hits helm 2 code
* Moved tiller_service_account to defaults.yml. Fixed is tiller_replicas defined check.
* Make metallb image repos configurable
* Moved metallb image repo definitions to download role defaults
* Removed comment. These are set in download defaults
After host reboot kubelet and crio goes into a loop and no container is started.
storage_driver in crio.conf overrides system defaults in etc/containers/storage.conf
/etc/containers/storage.conf is installed by package containers-common dependency
installed from cri-o (centos7) and contains "overlay".
Hosts already configured with overlay2 should be reconfigured and the
/var/lib/containers content removed.
* Add comment from roles/kubespray-defaults/defaults/main.yaml clarifying network allocation and sizes
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* Rewrite of the comment and added new examples
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* remove podman cni plugin
* configure networkamanger global dns
* allow installation of python3-libselinux by disabling update repo temporary
* remove ipv4 section because it is not a valid configuration
Removes these startup warnings:
Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead
Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
* add snapshot-controller and v1beta1 snapshot api
* fix typo
* udpate manifest to v1beta1
* update
* update manifests
* fix spelling
* wait until crd is applied
* fix missing info in kube module
* revert snapshotclass
* add snapshot crds before applying the csi driver
* add crds, missed them in last commit
* use pull policy from kubespray
* Update CustomResourceDefinition for kubecontrollersconfigurations.crd.projectcalico.org to v1
* Align ClusterRole for kube-controllers with upstream (calico)
* Add support for openstack application credentials
* Add some lines for readability
* Update external_openstack_tenant_id check
Do not check external_openstack_tenant_id when application credentials are defined
* Add check for external_openstack_domain_id
* Fix typo
By default do not allow "unqualified" (without a registry) images
because it is considered unsecure and subject to mitm attacks.
To enable insecure pull configure for example:
crio_registries:
- "docker.io"
- "quay.io"
* Use proper openssl command to differentiate between host and ip in current certificate check
* fixup! Use proper openssl command to differentiate between host and ip in current certificate check
The bootstrap-os role uses a bootstrap script to provision a
python interpreter on flatcar and container os hosts. As the
pypy project switched to another hoster, the download url changed.
If applied this will use the new proper pypy download url in bootstrap script
* Update the cilium svc proxy test to HA mode
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix cilium strict kube-proxy in HA
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add a single global endpoint variable
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add cilium docs about kube-proxy replacement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix issues in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Option for MetalLB to talk BGP
* Check for BGP peers when metallb_protocol is bgp
* README clarification
* Commented values as documentation only in the sample inventory
* layer 2 or BGP, not both
* log level by default increased to 'info'
* cgroup manager by default set to 'systemd'
* stream port (used by kubelet) bound to 127.0.0.1 for security reasons
* metrics can be enabled and port specified
Trying to layer this package on Fedora 32 causes the install to crash
and furthermore it looks like the original bug linked to in the comment
has been resolved for Fedora 31
* Update calico_veth_mtu to FELIX_IPINIP variable
calico_veth_mtu is specified in the configuration, but since it only works for wireguard, modify it to work for IP-in-IP users.
* Update template with more cleaner expression
CI job 624031102 failed with:
fatal: [ubuntu1804]: FAILED! => {"changed": false, "msg": "Failed to download key at https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/xUbuntu_18.04/Release.key: Request failed: <urlopen error [Errno -3] Temporary failure in name resolution>"}
Assuming its a temporary problem it should get more robust with a
couple of retries like in other roles.
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
* Add additional metadata configuration option to external Openstack CCM (kubernetes-sigs#6338)
* Set the variable external_openstack_metadata_search_order undefined by default
* Fix kubelet cgroup driver detection for crio
Remove fact standalone_kubelet since it is not used
* Fix yamllint complaints of roles/kubernetes/node/tasks/facts.yml
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
This changes MetalLB contrib to one of addons for deploying MetalLB with
Kubernetes cluster deployment. By the default, Kubespray doesn't deploy
MetalLB addon.
Support for Ambassador OSS as an Ingress Controller when
settings `ingress_ambassador_enabled: true`.
Signed-off-by: Alvaro Saurin <alvaro.saurin@gmail.com>
* Install Kata Containers as additional container runtime
* Create RuntimeClasses for Kata Containers
* Updated Vagrant to optionally run without Docker as container manager
* Updated Vagrant to optionally use Libvirt nested virtualization
* Add Kata Containers documentation
* Fix lint errors
* Add kata_containers_enabled to kubespray-defaults
* Fixed typo error
* Fixed typo error
We need to specify either external_openstack_tenant_name or
external_openstack_tenant_id. Those values were checked by seeing they
are defined or they have actual values separately.
However those values are always defined because of the following code
of openstack/defaults/main.yml:
external_openstack_tenant_id: "{{ lookup('env','OS_TENANT_ID')| default(lookup('env','OS_PROJECT_ID'),true) }}"
external_openstack_tenant_name: "{{ lookup('env','OS_TENANT_NAME')| default(lookup('env','OS_PROJECT_NAME'),true) }}"
So even if not specifying both values, those checks could not detect
the misconfiguration. This fixes this to detect the misconfiguration.
* MINOR: Check kernel version before enable modprobe nf_conntrack
* CLEANUP: no more need to ignore error of this task
* MINOR: Fixing yaml and ansible lint error - remove trailling-space
If the special parameter "$@" is not quoted, the following command will not work:
./kubectl.sh patch storageclass my-storage-class -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
* Add oraclelinux8 and disable firewalld
Add oraclelinux8 image and disable firewalld on oraclelinux VMs
* Fix Oracle Linux repositories
As documented in: http://yum.oracle.com/getting-started.html#installing-software-from-oracle-linux-yum-server
public-yum-ol7.repo was deprecated on release 7.6. Some repos were integrated into oracle-linux-ol7.repo (i.e.: ol7_latest, ol7_addons) and other are available as packages (epel). This also adds support for oraclelinux8
* Fix to use ansible_distribution_version
Instead of ansible_distribution_major_version
* Update README.md
On OpenStack history, we used to call "tenant" for separeted namespace.
However we use "project" now instead.
Then we have replaced "tenant" with "project". Then all "TENANT" variables
also are renamed to "PROJECT".
This makes Kubespray search "PROJECT" variable also for newer OpenStack
clouds.
with the Python ruamel.yml library
- Change True/False to true/false in a few places so file can
be more easily round-tripped with the Python ruamel.yml library
flannel, ovn and multus network plugins did not support all taint keys. This
update changes the tolerations to support them all.
According to the documentation:
```
There are two special cases: An empty key with operator Exists matches all keys,
values and effects which means this will tolerate everything. An empty effect matches
all effects with key key.
```
Usage of the empty `key` and `effect` ensures the network plugin daemonset will
be deployed on every nodes (ex: in case of custom taints, or NoExecute effect)
Since weave 2.5.1, `NoExecute` taint effect is no more supported,
this changes the daemonset tolerations to change this behavior.
Also remove the toleration key `CriticalAddonsOnly` not required anymore.
* fix(kubelet): exec notify restart kubelet service when kube-config.yml changed
* Revert "refactor(kubelet handler): change task name("reload kubelet") this is misleading"
This reverts commit 8f5d29560802c7c997293adb1ce9f84d3b20b6cb.
* fix(handlers,kubelet): setting right notify task name
* Add additional network configuration options to external Openstack CCM (#6083)
* Change the default version of external openstack cloud controller image to v1.18.1 since there was an issue in v1.18.0 where some IPs of the private network were ignored
* Change Network section in external-openstack-cloud-config.j2 to Networking
* Add networking customization information in the openstack documentation
The 98e7a07fba commit udpates the
dashboard version to 2.0.0 but it enable skip login flag wasn't
updated. This change updates its identation to avoid issues when
dashboard_skip_login is enabled.
* bump to dashboard 2.0 rc6 with metrics scrapper
* fix missing yaml seperator making Replicaset complaining about missing ServiceAccount
* unwanted legay gross hack forgot to remove before
* no need namespace on CrBinding
* bump to 2.0.0 release
* remove dashboard_metrics_scrapper_enabled
* replace removed repo with kubic repository for centos 7
* add crio configuration for centos8
* add crio configurations for debian
* use correct crio version for fedora
* simplify calulation of required crio version
- gives possibility to overwrite
* change default path for runc
* change default for seccomp path
* change default for conmon
* declare kubic repo for ubuntu
* do not install crictl twice
* move fedora repo modular tasks to crio_repo file
* move centos repo tasks to crio_repo
* declare crio version matrix for ubuntu
* update documentation crio support for ubuntu
* Add proxy support to CRI-O service
The crio.service requires proxy environment variables when it's
deployed behind a corporated network. This change creates a systemd
configuration file when the proxy variables are defined.
* Remove unnecesary crio's tasks
The playbook that bootstrap openSUSE servers assumes that the
/etc/sysconfig/proxy file exists but the execution fails when
these file is not present. This change guarantees its existence.
* assembly fallback_ips and no_proxy var only one time on localhost and populate result on all hosts
* add tag always, fix ansible lint errors
* workaround to mitogen issue dw/mitogen#663
* do not gather fact before install python on coreos like distros
* try to pass docker molecule test
* fix upgrade of crio on fcos
- update documents
* install conntrack required by kube-proxy
- like commit 48c41bcbe7
* enable fedora modular repo for crio
* allow to override crio configuration
- set cgroup manager same to kubelet_cgroup_driver if defined
- path of seccomp_profile depends on distribution
* allow to override crio configuration
- fix path for ubuntu
* allow to override crio configuration
- fix cni path for fcos
* added required permissions for querying endpointslice resources
* copy-pasted role permissions from cilium install manifests
* bumped cilium version to v1.7.2
* Fix proxy and module_hotfixes
On CentOS 8 with proxy ansible render inline `proxy` and `module_hotfixes` options.
For example:
`proxy=http://127.0.0.1:3128module_hotfixes=True`
But expected result:
```
proxy=http://127.0.0.1:3128
module_hotfixes=True
```
* Use ini_file module for work with ini files
* Prevent duplicates proxy= option in /etc/yum.conf
Module `lineinfile` is weak, use most powerful module `ini_file` and add or remove `proxy=` when `http_proxy` is defined or not.
* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fix chicken and egg problem with proxy_env not defined on the first envinronment usage.
* Disable fact gathering for the first proxy_env evaluation.
* Move proxy_env var set up from the role defaults to the root playbooks as fact.
* kubernetes-sigs-kubespray #5824
Added support nodes which are part of Virtual Machine Scale Sets(VMSS)
* kubernetes-sigs-kubespray #5824
* kubernetes-sigs-kubespray #5824
Added comments and updatetd azure docs.
* kubernetes-sigs-kubespray #5824
Added supported values comments for "azure_vmtype" in azure.yml
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
- This solves issue #5721 & #5713 (dupes)
- Provide a cleaner default usage pattern for the download role
around etcd that supports 'host' and 'docker' properly
- Extract the 'etcdctl' as a separate task install piece and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
* fedora coreos support
- bootstrap and new fact for
* fedora coreos support
- fix bootstrap condition
* fedora coreos support
- allow customize packages for fedora coreos bootstrap
* fedora coreos support
- prevent install ptyhon3 and epel via dnf for fedora coreos
* fedora coreos support
- handle all ostree like os in same way
* fedora coreos support
- handle all ostree like os in same way for crio
* fedora coreos support
- add fcos documentations
* Support configuring the insert mode
Defaults to the upstream default https://docs.projectcalico.org/v3.9/reference/felix/configuration
so nothing should change for existing deployments.
This allows coexistence with other firewall management technologies.
* Add a note to the sample config
* Add docker-ce 19.03 packages for Debian & Ubuntu
K8s has updated the recommended Docker version to 19.03. More
specifically it should be 19.03.4, but since we used 18.06.7 instead of
.2, I'm assuming the latest patch version should be used here as well.
* Add docker 19.03 for redhat
The 'regexp' parameter matches last occurrence of a line starting with 'proxy=' and replaces it with the one defined in 'line' parameter. If no match - it works same way as before. This fixes resuming cluster deployments failed after that task (if there was no more than one line starting with 'proxy' in the yum.conf file - this condition should also be reassured with the change introduced here) eg. if they were initiated with Terraform.
refs #5277
As the issue describes, when no external or local load-balanced is used,
kube-proxy won't be able to contact apiserver at 127.0.0.1. So the
config map should be left as is.
* Upgrade etcd to 3.3.18
* Try with etcd 3.3.15 (kubeadm 1.16.7 default)
* Back to square one
* Try with 3.3.11
* Upgrade etcd to 3.3.18 (take 2)
* Try with 3.3.12
* download file
* download containers
* fix push image to nodes
* pull if none image on host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load abilty for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporary disable download_run_once for containerd and crio
due https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Run 'container-engine' after drain.
Move possibly disruptive role 'container-engine' to run after the node
is drained.
As that role have to be run on non-cluster nodes as well (etcd and
calico-rr), and those nodes are not drained, add play for that case.
* Check if api is up before upgrade.
If container engine is restarted in previous role, api controller can
take some time to start. This check ensures api is up before upgrade.
When kube-router is used as cni, rules might be added to the mangle table
to support external IPs. Therefore, mangle table should be flushed during
reset as well.
This 38688a4486 change replaces the
value for dockerproject_.+_repo_.+ docker variables but their new
value was previously defined in other variables. This change removes
the dockerproject_.+_repo_.+ docker variables in favor of the older
ones.
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* External OpenStack Cloud Controller Manager implementation
* Adding controller image tag
* Minor fixes
* Restructuring the external cloud controller to work with KubeADM
* Added in code to allow control over pull policy for local path provisioner
* change to imagePullPolicy to use globally used variable k8s_image_pull_policy
* removed unusued variable from defaults
* updated contiv-etcd and cinder-csi-controllerplugin to use k8s_image_pull_policy variable
* Introduce kubelet_config_extra_args and kubelet_node_config_extra_args to pass params to kubelet via YAML config
* kubelet_config_extra_args is not the alternative
* Fix recover-control-plane to work with etcd 3.3.x and add CI
* Set default values for testcase
* Add actual test jobs
* Attempt to satisty gitlab ci linter
* Fix ansible targets
* Set etcd_member_name as stated in the docs...
* Recovering from 0 masters is not supported yet
* Add other master to broken_kube-master group as well
* Increase number of retries to see if etcd needs more time to heal
* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
* containerd: add proxy support
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubespray-defaults: add kube_service_addresses / kube_pods_subnet to no_proxy
CIDR notation in no_proxy is supported by a lot of programs/languages,
including go: https://github.com/golang/go/issues/16704
Without that containerd cannot talk the the API server (kube_apiserver_ip),
but it should not go through an external proxy for the nodes/pods/services
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Change dockerproject.org to download.docker.com
dockerproject.org was deprecated in 2017 and has gone down.
* Restore yum repo for containerd
Change-Id: I883bb512a2164a85865b1bd4fb569af0358c8c2b
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
When running with serial != 100%, like upgrade_cluster.yml, we need to apply this fixup each time
Problem was introduced in 05dc2b3a09
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Raises limit from 100 to 300 because the default is far too low
and the pod can handle 300 with the given resources.
Change-Id: Ib1eec10da3d09d198933fcfe87291587e58d7cdb
I've tested this update by deploying a containerd / etcd cluster on top CentOS7,
MetalLB + NGINX Ingress. Upgrade using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Resolves issue where kubectl cache of <v1.16 api schema
interferes with interacting with daemonsets and deployments.
Change-Id: I63b7046958f2008eb144b6da0004c598f945e0ae
* Fix crictl
* Reload systemd daemon before enabling service
* Typo
* Add crictl template
* Remove seccomp.json for ubuntu
* Set runtime path of runc for ubuntu
* Change path to conmon
There is no cri-tools package in CentOS/EPEL/Red Hat.
Additionally, cri-tools is provided into the installation via
roles/download/defaults/main.yml:104:crictl_download_url.
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined an there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Initially this was to fix a mis-indented approvers key. However, it turns
out that 'oilbeater' is not a member of kubernetes-sigs nor
kubernetes-incubator (the org this repo was migrated from). Thus this
OWNERS file is failing prow's validation check.
As a workaround I've opted to move them to emeritus_approver, which
isn't valiated and can be used as a hint for other approvers in this
repo
This fixes the scenario where masters are upgraded one at a time
and coredns gets improperly scaled back up to 2 replicas.
Change-Id: I7cc9283f40efcfd61b5813c89a5805c95d901567
Kubespray Pull Request #5084 (https://github.com/kubernetes-sigs/kubespray/pull/5084) caused more problems than it solved due to limitations with the synchronize module. See comments on Kubespray Issues #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059) and #5116 (https://github.com/kubernetes-sigs/kubespray/issues/5116). Details from Ansible documentation: "Currently, synchronize is limited to elevating permissions via passwordless sudo. This is because rsync itself is connecting to the remote machine and rsync doesn’t give us a way to pass sudo credentials in. ... Currently there are only a few connection types which support synchronize (ssh, paramiko, local, and docker) because a sync strategy has been determined for those connection types. Note that the connection for these must not need a password as rsync itself is making the connection and rsync does not provide us a way to pass a password to the connection. ..." Thus, reverting Pull Request #5084.
* Add support for Kubernetes 1.16.1
* Defaults to 1.16.1
* add 1.16.2 checksums and set new version as default
* correct 1.16.2 checksums and add 1.15.5 checksums
When using cluster.yml or scale.yml to add/scale nodes in the existing
k8s cluster, the `kubeadm init` wouldn't run. As a result, kube-proxy
wouldn't be created, and therefore the kube-proxy deletion task would
fail, e.g. in the case where kube-router is used and "kube_proxy_remove"
is set to true. As a workaround, add ignore_errors to the kube-proxy
deletion task.
The script is not usable unless you are in the '.vagrant/provisioners/ansible/inventory/artifacts' folder.
This update makes this usable from anywhere.
- do not run etcd role when etcd_kubeadm_enabled == true
- remove default value 'systemd' for cgroup driver in containerd role.
this value override autodetect in kubelet_cgroup_driver_detected from docker info
This allows to easily override the gcr, quay, and docker repos with the
mirror repos in countries like China, where the default accesses are
blocked or unstable.
mydict.keys() should be converted to list,
otherwise it causes errors in loop iteration.
Remove extra space after class name, which broke configmap.
Also allow set reclaimPolicy property.
Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds,deploy,rs
Add workaround for deploying helm using incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible version.
Default it to 0 in when not present and use libselinux-python as package to get current
default behaviour.
Fix for Kubespray Issue #5059 (https://github.com/kubernetes-sigs/kubespray/issues/5059). There is a known issue with the 'fetch' module that will sometimes lead to it failing with a memory error. See ansible/ansible#11702 (https://github.com/ansible/ansible/issues/11702). I encountered this issue with the "Copy kubectl binary to ansible host" task in kubespray/roles/kubernetes/client/tasks/main.yml, and it caused my entire deployment to error out (see "Output of ansible run" above). Replacing 'fetch' with 'synchronize' fixes this issue.
Updated Openstack to terraform 0.12 (#5062)
* update openstack to terraform 0.12(.5)
* replace cluter.tf with cluster.tfvars
* update README.md to terraform 0.12
* update Openstack CI tests to use terraform 0.12
* specify terraform version in openstack README
* gitlab CI to copy cluster.tfvars in case of openstack provider
* The terraform/openstack dynamic inventory can read
tfstate v4 (generated by terraform 0.12) and convert them internally
ro v3 (as generated by terraform 0.11.x).
Additionally the script has been updated to Python 3.
* run 'task download_container | Copy image to ansible host cache' with synchronize on download_delegate host
* try to run task copy file to ansible host on all inventory, not only on first random host
Fixes situation when using manual mode because it
tries to download coredns v1.3.1 from the same
image repository where kubernetes images are
downloaded from.
Change-Id: Ibbec8a72c8162ce8befa74e2013a268737ea5f8a
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* Enable nodes to run calicoctl
per-node tasks require waiting for calico-node to be applied
Change-Id: Ibe1076b7334a2da0332f2dd766fde0c3f172d1f2
* cleanup tasks that should run on master
Change-Id: I43a837879ef41596f14657ecd7f813899b6865ae
* Switch run_once calico logic to just run on first master
Change-Id: I6893711e354f63c5e1eaf6ac2e23d9a6347a555d
* Enable containerd to deploy vanilla containerd package
Fixes kubeadm references to CRI socket for containerd
Fixes download role cache feature to work with containerd
Change-Id: I2ab8f0031107e2f0d1a85c39b4beb66f08509a01
* use containerd for flannel-addons job
Change-Id: Ied375c7d65e64a625ffbd995ff16f2374067dee6
* add containerd vars
Change-Id: Ib9a8a04e501c481a86235413cbec63f3672baf91
* fixup vars
Change-Id: Ibea64e4b18405a578b52a13da100384582aa24c2
* more fixes
* fix rh repo
Change-Id: I00575a77cfb7b81d6095db5d918a52023c8f13ba
* Adjust helm host install for containerd
* Add calico 3.7.3 support
* add calico_datastore variable to policy controller role
* add missing clusterrole rules for calico policy controller
* disable calico kube controller when kdd mode is used for versions < 3.6
8080 is a pretty common port, using nodelocaldns_ip:8080 still
prevents node processes or hostNetwork=true processes to bind to *:8080
so switch to 9254 by default (prometheus port is 9253)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Use K8s 1.15
* Use Kubernetes 1.15 and use kubeadm.k8s.io/v1beta2 for
InitConfiguration.
* bump to v1.15.0
* Remove k8s 1.13 checksums.
* Update README kubernetes version 1.15.0.
* Update metrics server 0.3.3 for k8s 1.15
* Remove less than k8s 1.14 related code
* Use kubeadm with --upload-certs instead of --experimental-upload-certs due to depricate
* Update dnsautoscaler 1.6.0
* Skip certificateKey if it's not defined
* Add kubeadm-conftolplane.v2beta2 for k8s 1.15 or later
* Support kubeadm control plane for k8s 1.15
* Update sonobuoy version 0.15.0 for k8s 1.15
* Add limited containerd support
Containerd support for Ubuntu + Calico
* Added CRI-O support for ubuntu
* containerd support.
* Reset containerd support.
* fix lint.
* implemented feedback
* Change task name cri xx instead of cri-o in reset task and timeout condition.
* set crictl to fixed version
* Use docker-ce's container.io package for containerd.
* Add check containerd is installable or not.
* Avoid stop docker when use containerd and optimize retry for reset.
* Add config.toml.
* Fixed containerd for kubelet.env.
* Merge PR #4629
* Remove unused ubuntu variable for containerd
* Polish code for containerd and cri-o
* Refactoring cri socket configuration.
* Configurable conmon.
* Remove unused crictl/runc download
* Now crictl and runc is downloaded by common crictl.yml.
* fixed yamllint error
* Fixed brokenfiles by conflict.
* Remove commented line in config.toml
* Remove readded v1.12.x version
* Fixed broken set_docker_image_facts
* Fix yamllint errors.
* Remove unused apt source
* Fix crictl could not be installed
* Add containerd config from skolekonov's PR #4601
* add macvlan cni to kubespray
* macvlan: lint yaml files and fix sample config file
* macvlan: add OWNERS file
* add macvlan to README
* macvlan : CI first shoot
* macvlan : CI add full masquerade
* delegate retrive pod cidr to master only
* macvlan: add config for CI
* macvlan: add netchecker deployment
kubernetes/master role defines this value as an empty string
when using a cloud provider, not undefined. The check was updated
accordingly.
Change-Id: I58dc31ef4fd568a717a6753eb89ca687933018ae
* Require minimum version of Kubernetes
* Remove checksums for kubernetes version 1.12
* Add kube_version to precheck output and add min required version to README
* Fix merge
* Fix defaults
* Fix typo in precheck
* File and container image downloads are now cached localy, so that repeated vagrant up/down runs do not trigger downloading of those files. This is especially useful on laptops with kubernetes runnig locally on vm's. The total size of the cache, after an ansible run, is currently around 800MB, so bandwidth (=time) savings can be quite significant.
* When download_run_once is false, the default is still not to cache, but setting download_force_cache will still enable caching.
* The local cache location can be set with download_cache_dir and defaults to /tmp/kubernetes_cache
* A local docker instance is no longer required to cache docker images; Images are cached to file. A local docker instance is still required, though, if you wish to download images on localhost.
* Fixed a FIXME, wher the argument was that delegate_to doesn't play nice with omit. That is a correct observation and the fix is to use default(inventory_host) instead of default(omit). See ansible/ansible#26009
* Removed "Register docker images info" task from download_container and set_docker_image_facts because it was faulty and unused.
* Removed redundant when:download.{container,enabled,run_once} conditions from {sync,download}_container.yml
* All features of commit d6fd0d2aca by Timoses <timosesu@gmail.com>, merged May 1st 2019, are included in this patch. Not all code was included verbatim, but each feature of that commit was checked to be working in this patch. One notable change: The actual downloading of the kubeadm images was moved to {download,sync)_container, to enable caching.
Note 1: I considered splitting this patch, but most changes that are not directly related to caching, are a pleasant by-product of implementing the caching code, so splitting would be impractical.
Note 2: I have my doubts about the usefulness of the upload, download and upgrade tags in the download role. Must they remain or can they be removed? If anybody knows, then please speak up.
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
Task "kube-roter | Set cni directory permissions"
sets ownership of /opt/cni/bin to "kube"
Task "kube-router | Copy cni plugins"
copies the binaries from the archive setting the ownership
back to "root"
Fix "kube-roter" typo
Signed-off-by: Alberto Murillo <albertomurillosilva@gmail.com>
* Add support for arm images for hyperkube, kubeadm and cni_binary
* Add dummy etcd checksum for arm
This commit adds dummy etcd checksum for arm to avoid "no attribute" error
during setup.
* Add etcd host assert check
* Add 1.13.4 checksums of kubeadm and hyperkube for arm
* Update checksums of kubeadm and hyperkube for arm
* Add dummy checksums for calicoctl_binary_checksums dict
* disable gather_facts because it causes tests to fail
* Remove architecture check for etcd, due to unable to run tests
* Added pod psp in Rancher Local Path Provisioner
Added pod security policy (psp) in Rancher Local Path Provisioner.
Signed-off-by: André R. de Miranda <andre@miranda.work>
* Apply psp for Rancher Local Path Provisioner only when local_path_provisioner_namespace is not kube-system and also reorganized the templates
Kubespray waits exit of every drain before run other one.
Running drain every after each other seems better than parallel, because we should check resources availability every time.
But, this way, we have one additional problem: possible restart pods on the nodes that are killed little bit later.
Fast cordon before heavy drain seems like an easy solution.
Error starting nginx because in requiredDropCapabilities is dropped all capabilities.
The nginx requires the following capabilities:
- CHOWN
- SETGID
- SETUID
Signed-off-by: André R. de Miranda <andre@miranda.work>
Without this, pulls are considered for all
hosts groups, even if not targetted by the downloads
`groups` list. Hence, a download/sync is triggered
even though the host does not require the image.
* Disable kube_api_anonymous_auth by default to secure the setup
* Disable metrics-server in addons. Health endpoint is slow and unstable
* Fix anonymous-auth missing in configuration
* Cleanup a bit
* Fix kube anon auth
* Download to delegate and sync files when download_run_once
* Fail on error after saving container image
* Do not set changed status when downloaded container was up to date
* Only sync containers when they are actually required
Previously, non-required images (pull_required=false as
image existed on target host) were synced to the target
hosts. This failed as the image was not downloaded to
the download_delegate and hence was not available for
syncing.
* Sync containers when only missing on some hosts
* Consider images with multiple repo tags
* Enable kubeadm images pull/syncing with download_delegate
* Use kubeadm images list to pull/sync
'kubeadm config images pull' is replaced by collecting the images
list with 'kubeadm config images list' and using the commonly
used method of pull/syncing the images.
* Ensure containers are downloaded and synced for all hosts
* Fix download/syncing when download_delegate is a kubernetes host
When docker_container_storage_setup is false,
docker service should not depend on docker-storage-setup service,
because it's not installed.
For example, when using overlay2 on recent RHEL 7/Centos 7 kernels,
you most likely don't need it.
* Fix nodeselectors for contiv and nginx-ingress
Change-Id: Ib3eb6bd87193c69a90ee944c9164a0b6792c79ba
* Set kube proxy mode to iptables for addons task
Change-Id: Iff71a71f672405c74b4708c71db15ddc4391a53a
add the support of the folling property in azure-credential-check.yml
- azure_loadbalancer_sku: Sku of Load Balancer and Public IP. Candidate values are: basic and standard.
- azure_exclude_master_from_standard_lb: excludes master nodes from standard load balancer.
- azure_disable_outbound_snat: disables the outbound SNAT for public load balancer rules
- useInstanceMetadata: Use instance metadata service where possible
- azure_primary_availability_set: (Optional) The name of the availability set that should be used as the load balancer backend
We don't need to support upgrades from 2 year old installs,
just from the last major version.
Also changed most retried tasks to 1s delay instead of longer.
* Add README to bootstrap-os role
* Rework bootstrap-os once more
* Document workarounds for bugs/deficiencies in Ansible modules
* Unify and document role variables
* Remove installation of additional packages and repositories
* Merge Ubuntu and Debian tasks
* Remove pipelining setting from default playbooks
* Fix OpenSUSE not running its required tasks
The docker service provided by the containers-basic bundle is masked
in ClearLinux distribution. This is causing errors in the following
steps. This commit ensures that the unit is not masked.
* Use K8s 1.14 and add kubeadm experimental control plane mode
This reverts commit d39c273d96.
* Cleanup kubeadm setup run on first master
* pin kubeadm_certificate_key in test
* Remove kubelet autolabel of kube-node, add symlink for pki dir
Change-Id: Id5e74dd667c60675dbfe4193b0bc9fb44380e1ca
The BINDIR variable defined on the runc's Makefile[1] defines
installation path is on $(PREFIX)/sbin which used for most of the
Linux distributions. This change fixes the absolute path used for
non-ClearLinux distributions (CentOS, Ubuntu).
[1] https://github.com/opencontainers/runc/blob/master/Makefile#L10
The Stateless ClearLinux feature[1] requires the creation of folders
in /etc folder. This change ensure the existence of the
/etc/bash_completion.d/ folder for ClearLinux Distribution.
[1] https://clearlinux.org/features/stateless
* Enable nodelocaldns by default
* Enable nodelocaldns by default
* nodelocaldns is now default
* Disable enable_nodelocaldns for the addons CI jobs
Disable enable_nodelocaldns for the addons CI jobs to make sure things still work without nodelocaldns
* Add ansible-lint as gitlab-ci step
* Fix jinja2 syntax in include_tasks that breaks ansible-lint
* Use a block scalar to get around gitlab quoting/escaping rules
* Run ansible-lint in verbose mode in CI
This will fix error: error converting YAML to JSON: yaml: line 36: mapping values are not allowed in this context
Signed-off-by: Abdulaziz AlMalki <almalki.a@gmail.com>
* Vagrantfile: Bump openSUSE to Leap 15.0
* roles: container-engine: Add 'containerd' package for openSUSE
The 'containerd' package contains the docker-containerd and
docker-containerd-shim binaries. We also need to ensure that the latest
version is installed since an older version may already be present (eg GCE
images)
* Remove docker log-opts for opensuse
* roles: bootstrap-os: Use lowercase 'o' for openSUSE
OpenSUSE is not a valid family name. The correct one is openSUSE
* roles: bootstrap-os: Update zypper cache before first installation
The zypper cache may be outdated so ensure that it's fully updated
before we try and install the bootstrap packages.
Both the `yum` and `apt` modules support a list as input, this allows us avoid the slower `with_items` approach, which can take a long time with a large count of cluster nodes.
Both kubedns and dnsmasq modes are long not maintained.
We should run dns_late steps at the end because sshd
makes DNS lookups during Ansible run and has 2s timeouts
for each failed lookup trying to connect to coredns before
it is ready.
values from inventory in roles/kubespray-defaults/defaults/main.yml
hardcoded values in roles/container-engine/defaults/main.yml
dns_servers set empty in roles/container-engine/defaults/main.yml and skydns_server not set in docker_dns_servers variables
also set default value for manual_dns_serve
another variables in roles/container-engine/defaults not need to set
* feat(external-provisioner/local-path-provisioner): adds support for local path provisioner
Helpful for local development but also in production workloads (once the
permission model is worked out) where you have redundancy built into the
software uses the PVCs (e.g. database cluster with synchronous
replication)
* feat(local-path-provisioner): adds debug flag, image tag group var
* fix(local-path-provisioner): moves image repo/tag to download role
* test(gce_centos7-flannel): enables local-path-provisioner in test case
* fix(addons): add image repo/tag to commented default values
* fix(local-path-provisioner): typo in jinja template for local path provisioner
* style(local-path-provisioner): debug flag condition re-formatted
* fix(local-path-provisioner): adds missing default value for debug flag
* fix(local-path-provisioner): syntax fix for debug if condition end
* fix(local-path-provisioner): jinja template syntax: if condition white space
This was already approved in #4106 but there are CI issues
with that PR due to references to kubernetes incubator.
After upgrading to Kubespray 2.8.1 with Kubeadm enabled Rook
Ceph volume provision failed due to the flexvolume plugin dir not
being correct. Adding the var fixed the issue
* Adding ability to maintain existing Encryption Secrets at Rest.
If secrets_encryption.yaml is present it will not be overriten with a new kube_encrypt_token.
This should allow for it to be set ahead of a playbook running or maintain it if cluster.yml is ran on the same cluster and the ansible host does not have access to the secrets.
* Setting existing kube_encrypt_token across all master nodes in case it was missing in one or more nodes.
This fixes an issue where the `nodename` in calico's cni config json can fall out of sync with the k8s node name used by the calico pod if `kube_override_hostname` is set
Currently, the task `container_download | download images for kubeadm config images` fetches etcd image even though it's not required (etcd is bootstrapped by kubespray, not kubeadm).
`kubeadm-images.yaml` is only a subset of `kubeadm-config.yaml`, therefore ``kubeadm config images pull` will try to get all this list (including etcd)
```
# kubeadm config images list --config /etc/kubernetes/kubeadm-images.yaml
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.6
```
When using the `kubeadm-config.yaml` though, it doesn't list etcd image:
```
# kubeadm config images list --config /etc/kubernetes/kubeadm-config.yaml
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/coredns:1.2.6
```
This change just adds the etcd endpoints in the `kubeadm-images.yaml` to give a hint to kubeadm it doesn't need etcd image for its boostrapping as etcd is "external".
I confess it is a ugly hack, a better way would be to use a single `kubeadm-config.yaml` for both tasks, but they are triggered by different roles (`kubeadm-images.yaml` is used by download, `kubeadm-config.yaml` by kubernetes/master) at different steps and I didn't want to refactor too many things to prevent breakage.
This is specially useful for offline installation where a whitelist of container images is mirrored on a local private container registry. `k8s.gcr.io/etcd` and `quay.io/coreos/etcd` are two different repositories hosting the same images but using *different tags*!
* coreos/etcd:v3.2.24
* k8s.gcr.io/etcd:3.2.24 (note the missing 'v' in the tag name)
Fix issue where `kubeadm join` could wait forever for joining.
Fix issue where `kubeadm join` were not reaching the user, making
impossible to find the cause of the failure.
New behaviour is to first attempt to join without bypassing the
verifications checks and to display them if needed.
If this fails it still attempts to join by ignoring the check in
order to make previous behavior.
A timeout of 60 seconds is allocated for a joining.
Related-bug: #3973
* bootstrap: rework role
* support being called from a non-root user
* run some commands in check mode
* unify spelling/task names
* bootstrap: fix wording of comments for check_mode: false
* bootstrap: remove setup-pipelining task
* OCI subnet AD 2 is not required for CCM >= 0.7.0
Reorganize OCI provider to generate configuration, rather than pull
Add pull secret option to OCI cloud provider
* Updated oci example to document new parameters
This PR ensures that the e2fsprogs and xfsprogs packages are
installed on all Kubernetes nodes and that the packages are
the latest versions. It also ensures that the nodes can
create XFS filesystems when necessary, since not all distros
install xfsprogs by default.
e2fsprogs - ext2/ext3/ext4 file system utilities
xfsprogs - Utilities for managing the XFS filesystem
* Calico: Ability to define the default IPPool CIDR (instead of kube_pods_subnet)
* Documentation for calico_pool_cidr (and calico_advertise_cluster_ips which has been forgotten...)
* Set cluster DNS correctly in case of nodelocal dns cache
* Pass in cluster_ip based on dns mode
* Disable nodelocaldns by default
* Fix syntax error
* Fix syntax issue
* Add nodelocadns ip to vars of node installation
* Change location of nodelocaldns_ip
* Try to remove newlines from jinja template
* Add debug for config file
* Move parameter logic outside of template
* Adapt templates after feedback
* Remove debugging
Addressing the discussion started in #4064, this PR moves kubeadm and
hyperkube binaries to /usr/local/bin before running them on the master
nodes.
It is to address the case where local_release_dir points to /tmp
(kubespray default) and /tmp is mounted with noexec mode, preventing
any binaries to be run in that partition.
In role "node", we still move kubeadm to bin_dir only on the worker
nodes.
I know this is a bit hack.
If you use cloud LB, you can use kubeadm's controlPlaneEndpoint to configure kube-proxy's server field.
But for nginx-proxy, it didn't start when kubeadm init.
Looks like `epel_enabled` was not configured for the epel install in `bootstrap-centos.yml`. Also, there were no conditionals that would trigger bootstrap for RHEL.