TASK [Generate a list of information about the images on a node]
registers list of container images to docker_images.
Then the next TASK [Set pull_required if the desired image is not
yet loaded] does based on expecting images are registered.
However sometimes the first TASK was failed as [1] but the failure
is ignored due to failed_when:false and it makes another issue.
This removes this unnecessary failed_when to detect the failure
at the point.
In addition, this removes no_log:true also because the output doesn't
contain any sensitive data and now it just makes debugging difficult.
[1]: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/934714534#L2953
no_proxy is a pain to get right, and having proxy variables present causes issues
(k8s components get proxy configuration after upgrade, see #7100)
It's better to only configure what require proxy:
- the runtime (containerd/docker/crio)
- the package manager + apt_key
- the download tasks
Tested with the following clusters
- 4 CentOS 8 nodes
- 1 Ubuntu 20.04 node
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
In some environments, it might not be possible to ping the IP address
of the nodes, e.g., because ICMP echo is blocked.
This commit allows kubespray to be configured to disable the ping
check, while performing all other checks.
TASK [network_plugin/calico : Calico | Configure calico network pool] **********
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/network_plugin/calico/tasks/install.yml:138
Friday 08 January 2021 17:10:12 +0000 (0:00:01.521) 0:11:36.885 ********
[WARNING]: The value {'kind': 'IPPool', 'apiVersion': 'projectcalico.org/v3',
'metadata': {'name': 'default-pool'}, 'spec': {'blockSize': 24, 'cidr':
'10.233.64.0/18', 'ipipMode': 'Always', 'vxlanMode': 'Never', 'natOutgoing':
True}} (type dict) in a string field was converted to "{'kind': 'IPPool',
'apiVersion': 'projectcalico.org/v3', 'metadata': {'name': 'default-pool'},
'spec': {'blockSize': 24, 'cidr': '10.233.64.0/18', 'ipipMode': 'Always',
'vxlanMode': 'Never', 'natOutgoing': True}}" (type string). If this does not
look like what you expect, quote the entire value to ensure it does not change.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Improve how we set 'proxy=' in yum.conf or dnf.conf
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fixup spaces in no_proxy
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add svc,svc.{{ dns_domain }} to no_proxy
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Remove because of empty need_http_proxy.rc if http/https_proxy and skip_http_proxy_on_os_packages=true is set
* Modify sample for debian and centos skip_http_proxy
* Modify sample for debian and centos skip_http_proxy
If some settings were changed from the default but not commited into an inventory repo,
we risk breaking the cluster / cause downtime, so add some extra checks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Upgrading docker / containerd without adapting the configuration might break the node,
so disable docker-ce repo by default.
We are already using dpkg hold for Debian.
All containerd.io packages provide /usr/bin/runc, so no need to check
yum_conf was never used for containerd
module_hotfixes should not be needed with the EL8 repo
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fedora CoreOS: Fix for ethtool pre-installed
Fix error in rpm-ostree when ethtool is already insatlled (FCOS >= 32.20201104.3.0)
* Fedora CoreOS: Fix connection lost
Fedora CoreOS: Ignore connection lost due to reboot and continues the playbook
We are currently setting the IP variable to hostIP,
Before https://github.com/projectcalico/node/pull/593 (not yet released)
Calico interpret that as hostIP/32
Using 'can-reach' we get the future behavior
This fixes vxlan and IPIP CrossSubnet modes
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* update files to handle multi-asn bgp peering conditions.
* put back in the serviceClusterIPs. Bad merge.
* remove extraneous environment var.
* update files as discussed with mirwan
* update titles.
* add not in.
* add a conditional for using bgp to advertise cluster ips.
Co-authored-by: marlow-h <mweston@habana.ai>
If cluster-name is not set, the default value "kubernetes" is used.
The loadbalancees created by Kubernetes follow the format:
kube_service_clusterName_serviceNamespace_serviceName
If 2 clusters create a loadbalancer for the same service in the same
namespace, they will share the same non-working loadbalancer.
Signed-off-by: Cedric Hnyda <cedric.hnyda@itera.io>
* Update hashes and set default version to 1.19.5
Signed-off-by: anthr76 <hello@anthonyrabbito.com>
* Reorder hashes
1.19.5 hashes should be near 1.19.x
* Added back blank line
This fixes the following warning:
[kubernetes/client : Generate admin kubeconfig with external api endpoint]
[WARNING]: Consider using the file module with state=directory rather than
running 'mkdir'. If you need to use command because file is insufficient
you can
RedHat 8.3 merged nf_conntrack_ipv4 in nf_conntrack but still advertise 4.18
so just try to modprobe and decide depending on the success
Also nf_conntrack is a dependency of ip_vs, so no need to care about it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Ensure libseccomp is installed before starting containerd on CentOS 8
* Simplify libseccomp install on CentOS 8
- Uses `package` module
- Replaces complex version check with 'state: latest'. The version must
be > 2.3 when using with cri-o.
- Removes unnecessary `not is_ostree` condition as CentOS 8 does not use
ostree
* copying ssh key no longer required, works with password auth
* use copy module instead of synchronize (which requires sshpass)
* less tasks and always changed tasks
* containerd docker hub registry mirror support
* add docs
* fix typo
* fix yamllint
* fix indent in sample
and ansible-playbook param in testcases_run
* fix md
* mv common vars to tests/common/_docker_hub_registry_mirror.yml
* checkout vars to upgrade tests
If crictl (and docker) binaries are deployed to the directories
that are not in standard PATH (e.g. /usr/local/bin), it is required
to specify full path to the binaries.
The task outputs the following warning:
TASK [kubernetes/preinstall : Enable ip forwarding]
[WARNING]: The value 1 (type int) in a string field was converted
to u'1' (type string). If this does not look like what you expect,
quote the entire value to ensure it does not change.
This new version uses the same base image as kube-proxy
(k8s.gcr.io/build-image/debian-iptables)
This allow to automatically pick iptables-legacy or iptables-nft,
and be compatible with RHEL/CentOS 8
https://github.com/kubernetes/dns/pull/367
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* fix flake8 errors in Kubespray CI - tox-inventory-builder
* Invalidate CRI-O kubic repo's cache
Signed-off-by: Victor Morales <v.morales@samsung.com>
* add support to configure pkg install retries
and use in CI job tf-ovh_ubuntu18-calico (due to it failing often)
* Switch Calico, Cilium and MetalLB image repos to Quay.io
Co-authored-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Barry Melbourne <9964974+bmelbourne@users.noreply.github.com>
calico PODs are first started and then in a handler killed and
restarted for no reason, nothing has changed.
By using the existing variable 'calico_cni_config' (only defined when
calico has already started) the restart can be skipped.
* create a wrapper script with pki options
* supports all kubespray managed container engines
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
* Allow the eventRecordQPS setting to be set.
The eventRecordQPS parameter controls rate limiting for event recording. When zero, unlimited events can cause denial-of-service situations. For my situation, I don't need more than a setting of "5". This change allows me to configure the setting before creating the cluster.
* Allow the eventRecordQPS setting to be set.
The default settings (see types.go) is five. So, this change does not affect the cluster provisioning. However, it does allow for the setting to be changed.
Fedora 31 uses Cgroups v2 by default. This change by passes the kernel
parameter systemd.unified_cgroup_hierarchy=0.
Signed-off-by: Victor Morales <v.morales@samsung.com>
* update version of ingress-nginx controller.
Change tag from controller-v0.34.0 to controller-v0.40.2 to use newest tag.
* Update docs about aws deploy templates.
In the yaml templates, there is no mention of idle timeouts. This is why I removed the documentation about it. This might be a mistake. Please verify this. I don't know enough to verify it myself.
* Change label when checking version.
When checking for `app.kubernetes.io/name=ingress-nginx`, a completed pod was selected which is not helpful when trying to `exec`. Changing the label selects the running controller pod.
* put back the information about ELB Idle Timeouts.
When I removed the information, I had overlooked that it was mentioned in the L7 yaml file. Thanks.
and thereby support upgrade from e.g. 1.18.x to 1.19.y
Included OSes:
- Centos7/8
- Ubuntu18/20
New variables for overriding by default installed packages:
- centos_crio_packages
- ubuntu_crio_packages
* Enable Kata Containers for CRI-O runtime
Kata Containers is an OCI runtime where containers are run inside
lightweight VMs. This runtime has been enabled for containerd runtime
thru the kata_containers_enabled variable. This change enables Kata
Containers to CRI-O container runtime.
Signed-off-by: Victor Morales <v.morales@samsung.com>
* Set appropiate conmon_cgroup when crio_cgroup_manager is 'cgroupfs'
* Set manage_ns_lifecycle=true when KataContainers is enabed
* Add preinstall check for katacontainers
Signed-off-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Pasquale Toscano <pasqualetoscano90@gmail.com>
Command line flags aren't added to kube-proxy which results in missing
feature gates set in this component. Add appropriate setting to
ConfigMap instead.
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
'ansible.vars.hostvars.HostVarsVars object' has no attribute 'kubeadm_upload_cert'
kubeadm_upload_cert will never be found as a hostvar for the first
master since the task is executed for a worker.
Fix by executing the upload task for the first master and register
the needed key. After that, workers can read hostvars for the master
Var kubeadm_etcd_refresh_cert_key removed since it no longer has
any use.
* Adding option to disable gloablly applying a proxy to etc/yum.conf
* Change made to proxy_yum_globaly basedon reviewer feedback
* fix trailing spaces in ymllint
This fixes the Containerd + EL8 case that was missed in 7d1ab3374e
On CentOS 8 with proxy ansible render inline `proxy` and `module_hotfixes` options.
For example:
```
proxy=http://127.0.0.1:3128module_hotfixes=True
```
But expected result:
```
proxy=http://127.0.0.1:3128
module_hotfixes=True
```
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
crio refuses to delete pods when cni is unavailable which is the
case e.g. using calico with kdd datastore. See:
https://github.com/cri-o/cri-o/issues/4084
Fix by deleting storage associated with containers. Stop and disable
crio service so switching container runtime can be done.
* Added option to force apiserver and respective client certificate to be regenerated without necessarily needing to bump the K8S cluster version
* Removed extra blank line
Handlers with the same name (Kubeadm | restart kubelet) leads to incorrect playbook execution. As a result, after completing the tasks, kubelet does not restart. This PR fix this behavior
After upgrading to newer Kubernetes(v1.17 at least), kubectl command
shows the following warning message:
WARNING: Kubernetes configuration file is group-readable.
This is insecure. Location: /home/foo/.kube/config
The kubeconfig was copied from {{ artifacts_dir }}/admin.conf with
kubeconfig_localhost feature. It is better to set valid file mode
at getting it on Kubespray.
The 0d0cc8cf9c change creates several
DaemonSets to cover the Flannel CNI installation for different CPU
architectures. This change removes the unnecessary architecture value
from the docker tag value.
Signed-off-by: Victor Morales <v.morales@samsung.com>
In case multiple nodeselectors are specified in ingress_nginx_nodeselector, the generated daemonset yaml template for nginx is invalid due to missing indentation starting with the second nodeselector
When stopping at the check of "Stop if ip var does not match local ips"
the error message is like:
fatal: [single-k8s]: FAILED! => {
"assertion": "ip in ansible_all_ipv4_addresses",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
That doesn't contain actual IP addresses and it is difficult to understand
what was wrong. This adds the error message which contain actual IP addresses
to investigate the issue if happens.
* calico: add constant calico_min_version_required
and verify current deployed version against it.
* calico: remove upgrade support with data migration
The tool was used pre v3.0.0 and is no longer needed.
* calico: remove old version support from tasks
* calico: remove old ver support from policy ctrl
* calico: remove old ver support from node
* canal: remove old ver support
* remove unused calicoctl download checksums
calico_min_version_required is the oldest version that can be installed
Older versions can be removed.
* Add retries to update calico-rr data in etcd through calicoctl
* Update update-node yaml syntax
* Add comment to clarify ansible block loop
* Remove trailing space
* Fix reserved memory unit in kubelet configuration
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Move systemReserved default values from template
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Added ability to set calico vxlan vni and port. defaults to calico's documented defaults.
* Check if calico_network_backend is defined prior to checking value
* Removed calico hidden defaults for vxlan port and vni
* Fixed FELIX_VXLANVNI typo
* Added support for setting tiller_service_account and tiller_replicas
* Specify helm 2 version to ensure we have a test path that still hits helm 2 code
* Moved tiller_service_account to defaults.yml. Fixed is tiller_replicas defined check.
* Make metallb image repos configurable
* Moved metallb image repo definitions to download role defaults
* Removed comment. These are set in download defaults
After host reboot kubelet and crio goes into a loop and no container is started.
storage_driver in crio.conf overrides system defaults in etc/containers/storage.conf
/etc/containers/storage.conf is installed by package containers-common dependency
installed from cri-o (centos7) and contains "overlay".
Hosts already configured with overlay2 should be reconfigured and the
/var/lib/containers content removed.
* Add comment from roles/kubespray-defaults/defaults/main.yaml clarifying network allocation and sizes
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* Rewrite of the comment and added new examples
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* remove podman cni plugin
* configure networkamanger global dns
* allow installation of python3-libselinux by disabling update repo temporary
* remove ipv4 section because it is not a valid configuration
Removes these startup warnings:
Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead
Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
* add snapshot-controller and v1beta1 snapshot api
* fix typo
* udpate manifest to v1beta1
* update
* update manifests
* fix spelling
* wait until crd is applied
* fix missing info in kube module
* revert snapshotclass
* add snapshot crds before applying the csi driver
* add crds, missed them in last commit
* use pull policy from kubespray
* Update CustomResourceDefinition for kubecontrollersconfigurations.crd.projectcalico.org to v1
* Align ClusterRole for kube-controllers with upstream (calico)
* Add support for openstack application credentials
* Add some lines for readability
* Update external_openstack_tenant_id check
Do not check external_openstack_tenant_id when application credentials are defined
* Add check for external_openstack_domain_id
* Fix typo
By default do not allow "unqualified" (without a registry) images
because it is considered unsecure and subject to mitm attacks.
To enable insecure pull configure for example:
crio_registries:
- "docker.io"
- "quay.io"
* Use proper openssl command to differentiate between host and ip in current certificate check
* fixup! Use proper openssl command to differentiate between host and ip in current certificate check
The bootstrap-os role uses a bootstrap script to provision a
python interpreter on flatcar and container os hosts. As the
pypy project switched to another hoster, the download url changed.
If applied this will use the new proper pypy download url in bootstrap script
* Update the cilium svc proxy test to HA mode
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix cilium strict kube-proxy in HA
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add a single global endpoint variable
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add cilium docs about kube-proxy replacement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix issues in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Option for MetalLB to talk BGP
* Check for BGP peers when metallb_protocol is bgp
* README clarification
* Commented values as documentation only in the sample inventory
* layer 2 or BGP, not both
* log level by default increased to 'info'
* cgroup manager by default set to 'systemd'
* stream port (used by kubelet) bound to 127.0.0.1 for security reasons
* metrics can be enabled and port specified
Trying to layer this package on Fedora 32 causes the install to crash
and furthermore it looks like the original bug linked to in the comment
has been resolved for Fedora 31
* Update calico_veth_mtu to FELIX_IPINIP variable
calico_veth_mtu is specified in the configuration, but since it only works for wireguard, modify it to work for IP-in-IP users.
* Update template with more cleaner expression
CI job 624031102 failed with:
fatal: [ubuntu1804]: FAILED! => {"changed": false, "msg": "Failed to download key at https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/xUbuntu_18.04/Release.key: Request failed: <urlopen error [Errno -3] Temporary failure in name resolution>"}
Assuming its a temporary problem it should get more robust with a
couple of retries like in other roles.
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
* Add additional metadata configuration option to external Openstack CCM (kubernetes-sigs#6338)
* Set the variable external_openstack_metadata_search_order undefined by default
* Fix kubelet cgroup driver detection for crio
Remove fact standalone_kubelet since it is not used
* Fix yamllint complaints of roles/kubernetes/node/tasks/facts.yml
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
This changes MetalLB contrib to one of addons for deploying MetalLB with
Kubernetes cluster deployment. By the default, Kubespray doesn't deploy
MetalLB addon.
Support for Ambassador OSS as an Ingress Controller when
settings `ingress_ambassador_enabled: true`.
Signed-off-by: Alvaro Saurin <alvaro.saurin@gmail.com>
* Install Kata Containers as additional container runtime
* Create RuntimeClasses for Kata Containers
* Updated Vagrant to optionally run without Docker as container manager
* Updated Vagrant to optionally use Libvirt nested virtualization
* Add Kata Containers documentation
* Fix lint errors
* Add kata_containers_enabled to kubespray-defaults
* Fixed typo error
* Fixed typo error
We need to specify either external_openstack_tenant_name or
external_openstack_tenant_id. Those values were checked by seeing they
are defined or they have actual values separately.
However those values are always defined because of the following code
of openstack/defaults/main.yml:
external_openstack_tenant_id: "{{ lookup('env','OS_TENANT_ID')| default(lookup('env','OS_PROJECT_ID'),true) }}"
external_openstack_tenant_name: "{{ lookup('env','OS_TENANT_NAME')| default(lookup('env','OS_PROJECT_NAME'),true) }}"
So even if not specifying both values, those checks could not detect
the misconfiguration. This fixes this to detect the misconfiguration.
* MINOR: Check kernel version before enable modprobe nf_conntrack
* CLEANUP: no more need to ignore error of this task
* MINOR: Fixing yaml and ansible lint error - remove trailling-space
If the special parameter "$@" is not quoted, the following command will not work:
./kubectl.sh patch storageclass my-storage-class -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
* Add oraclelinux8 and disable firewalld
Add oraclelinux8 image and disable firewalld on oraclelinux VMs
* Fix Oracle Linux repositories
As documented in: http://yum.oracle.com/getting-started.html#installing-software-from-oracle-linux-yum-server
public-yum-ol7.repo was deprecated on release 7.6. Some repos were integrated into oracle-linux-ol7.repo (i.e.: ol7_latest, ol7_addons) and other are available as packages (epel). This also adds support for oraclelinux8
* Fix to use ansible_distribution_version
Instead of ansible_distribution_major_version
* Update README.md
On OpenStack history, we used to call "tenant" for separeted namespace.
However we use "project" now instead.
Then we have replaced "tenant" with "project". Then all "TENANT" variables
also are renamed to "PROJECT".
This makes Kubespray search "PROJECT" variable also for newer OpenStack
clouds.
with the Python ruamel.yml library
- Change True/False to true/false in a few places so file can
be more easily round-tripped with the Python ruamel.yml library
flannel, ovn and multus network plugins did not support all taint keys. This
update changes the tolerations to support them all.
According to the documentation:
```
There are two special cases: An empty key with operator Exists matches all keys,
values and effects which means this will tolerate everything. An empty effect matches
all effects with key key.
```
Usage of the empty `key` and `effect` ensures the network plugin daemonset will
be deployed on every nodes (ex: in case of custom taints, or NoExecute effect)