follow new naming conventions for gcr's coredns image.
starting from 1.21 kubeadm assumes it to be `coredns/coredns`:
this causes the kubeadm deployment being unable to pull image, beacuse `v`
was also added in image tag, until the role `kubernetes-apps` ovverides
it with the old name, which is only compatible with <=1.7.
Backward comptability with kubeadm <=1.20 is mantained checking
kubernetes version and falling back to old names (`coredns:1.xx`) when
the version is less than 1.21
* rename ansible groups to use _ instead of -
k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating
Note: kube-node,k8s-cluster groups in upgrade CI
need clean-up after v2.16 is tagged
* ensure old groups are mapped to the new ones
* crio: add supported versions 1.20 and 1.21 and align default with k8s version
* cri-o: drop versions 1.17 and 1.18 from version matrix
* update note on cri-o version alignment
* calico: drop support for version 3.15
* drop check for calico version >= 3.3, we are at 3.16 minimum now
* we moved to calico 3.16+ so we can default to /opt/cni/bin/install
* AlmaLinux: ansible>2.9.19 is needed to know about AlmaLinux
* AlmaLinux: identify as a centos derrivative
* AlmaLinux: add AlmaLinux to checks for CentOS
* Use ansible_os_family to compare family and not distribution
As the official document[1], the parameter keepcache should be
'0' or '1' as string. To avoid the following warning message,
this fixes the parameter value:
[WARNING]: The value False (type bool) in a string field was
converted to u'False' (type string). If this does not look
like what you expect, quote the entire value to ensure it
does not change.
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/yum_repository_module.html
* Add containerd_extra_args
This is useful for custom containerd config, e.g. auth
Signed-off-by: Zhong Jianxin <azuwis@gmail.com>
* Make containerd config.toml mode 0640
It may contain sensitive information like password
Signed-off-by: Zhong Jianxin <azuwis@gmail.com>
This PR is to move the cilium kvstore options to the configmap
rather than specifying them in the deployment as args. This
is not technically necessary but keeping all the options in
one place is probably not a bad idea.
Tested with cilium 1.9.5.
When attempting a fresh install without cilium_ipsec_enabled I ran
into the following error:
failed: [k8m01] (item={'name': 'cilium', 'file': 'cilium-secret.yml', 'type': 'secret', 'when': 'cilium_ipsec_enabled'}) =>
{"ansible_loop_var": "item", "changed": false, "item": {"file": "cilium-secret.yml", "name": "cilium", "type": "secret",
"when": "cilium_ipsec_enabled"},"msg": "AnsibleUndefinedVariable: 'cilium_ipsec_key' is undefined"}
Moving the when condition from the item level to the task level solved
the issue.
* Add KubeSchedulerConfiguration for k8s 1.19 and up
With release of version 1.19.0 of kubernetes KubeSchedulerConfiguration
was graduated to beta. It allows to extend different stages of
scheduling with profiles. Such effect is achieved by using plugins and
extensions.
This patch adds KubeSchedulerConfiguration for versions 1.19 and later.
Configuration is set to k8s defaults or to kubespray vars. Moving those
defaults to new vars will be done in following patch.
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
* KubeSchedulerConfiguration: add defaults
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
Starting with Cilium v1.9 the default ipam mode has changed to "Cluster
Scope". See:
https://docs.cilium.io/en/v1.9/concepts/networking/ipam/
With this ipam mode Cilium handles assigning subnets to nodes to use
for pod ip addresses. The default Kubespray deploy uses the Kube
Controller Manager for this (the --allocate-node-cidrs
kube-controller-manager flag is set). This makes the proper ipam mode
for kubespray using cilium v1.9+ "kubernetes".
Tested with Cilium 1.9.5.
This PR also mounts the cilium-config ConfigMap for this variable
to be read properly.
In the future we can probably remove the kvstore and kvstore-opt
Cilium Operator args since they can be in the ConfigMap. I will tackle
that after this merges.
When upgrading cilium from 1.8.8 to 1.9.5 I ran into the following
error:
level=error msg="Unable to update CRD" error="customresourcedefinitions.apiextensions.k8s.io
\"ciliumnodes.cilium.io\" is forbidden: User \"system:serviceaccount:kube-system:cilium-operator\"
cannot update resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the
cluster scope" name=CiliumNode/v2 subsys=k8s
The fix was to add the update verb to the clusterrole. I also added
create to match the clusterrole created by the cilium helm chart.
DNSSEC is off by default on ubuntu/bionic64 (18.04) as per resolved.conf(5).
These tasks are artefacts of obsolete infra configuration, and no longer needed.
Further removing these tasks resolves the issue that the tasks always reports
'changed' and bounces systemd-resolved unneccesarily, even if there was no
actual modification of /etc/systemd/resolved.conf.
* Remove contrib/vault
This is marked as broken since 2018 / 3dcb914607
This still reference apiserver.pem, not used since ddffdb63bf
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Finish nuking vault from the codebase
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
This replaces kube-master with kube_control_plane because of [1]:
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
NOTE: The reason why this changes it to kube_control_plane not
kube-control-plane is for valid group names on ansible.
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
While at it remove force_certificate_regeneration
This boolean only forced the renewal of the apiserver certs
Either manually use k8s-certs-renew.sh or set auto_renew_certificates
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Add crun download_url and checksum
* Change versioning format to crun native versioning
* Download crun using download_file.yml
* Get crun version from download defaults
* Delegate crun binary copy task to crun role
* Download Calico KDD CRDs
* Replace kustomize with lineinfile and use ansible assemble module
* Replace find+lineinfile by sed in shell module to avoid nested loop
* add condition on sed
* use block for kdd tasks + remove supernumerary kdd manifest apply in start "Start Calico resources"
* add nodeselector and tolerations for metallb
* remove unnecessary commented lines in metallb template
* set default speaker toleration to match original manifest
When privileged is enabled for a container, all the `/dev/*` block
devices from the host are mounted into the guest. The
`privileged_without_host_devices` flag prevents host devices from
being passed to privileged containers.
More information:
* https://github.com/containerd/cri/pull/1225
* 1d0f68156b
The important action in kubeadm-version.yml is the templating of the configuration,
not finding / setting the version
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
kubeadm is the default for a long time now,
and admin.conf is created by it, so let kubeadm handle it
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
Using `kubeadm init phase kubeconfig all` breaks kubelet client certificate rotation
as we are missing `kubeadm init phase kubelet-finalize all` to point to `kubelet-client-current.pem`
kubeconfig format is stable so let's just use lineinfile,
this will avoid other future breakage
This revert to the logic before 6fe2248314
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
On CentOS 8 they seem to be ignored by default, but better be extra safe
This also make it easy to exclude other network plugin interfaces
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* use external_openstack_lbaas_use_octavia for template openstack-cloud-config
* Delete external_openstack_lbaas_use_octavia from default values. Added description and default values of variables to docs
* markdown fix
* make this simple
* set external_openstack_lbaas_use_octavia in default values
* duplicated variable in doc
Since a790935d02 all proxy users
should be properly configured
Now when you have *_PROXY vars in your environment it can leads to failure
if NO_PROXY is not correct, or to persistent configuration changes
as seen with kubeadm in 1c5391dda7
Instead of playing constant whack-a-bug, inject empty *_PROXY vars everywhere
at the play level, and override at the task level when needed
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Move proxy_env to kubespray-defaults/defaults
There is no reasons to use set_facts here
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Ensure kubeadm doesn't use proxy
*_proxy variables might be present in the environment (/etc/environment, bash profile, ...)
When this is the case we end up with those proxy configuration in /etc/kubernetes/manifests/kube-*.yaml manifests
We cannot unset env variables, but kubeadm is nice enough to ignore empty vars
93d288e2a4/cmd/kubeadm/app/util/env.go (L27)
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
Ubuntu 18.04 crio package ships with 'mountopt = "nodev,metacopy=on"'
even if GA kernel is 4.15 (HWE Kernel can be more recent)
Fedora package ships without metacopy=on
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
By default Ansible stat module compute checksum, list extended attributes and find mime type
To find all stat invocations that really use one of those:
git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
`containerd.io` is the companion package of `docker-ce` and is the
proper package name. This is needed to avoid apt upgrade/dist-upgrade
from breaking kubernetes.
Running remove-node.yml tasks for clean up cluster on Fedora CoreOS.
The task failed to restart network daemon (task name: "reset | Restart network").
Fedora CoreOS is essentially using NetworkManager, but this task returns network.
Signed-off-by: Takashi IIGUNI <iiguni.tks@gmail.com>
* Add unique annotation on coredns deployment and only remove existing deployment if annotation is missing.
* Ignore errors when gathering coredns deployment details to handle case where it doesn't exist yet
* Remove run_once, deletegate_to and add to when statement
* Added force_etcd_cert_refresh var to maintain existing functionality. Broke out etcd node cert syncing from member and admin cert sync logic. Now first etcd will sync node certs to other etcd members on every run to keep all etcds up to date after adding additional worker nodes to the cluster
* Updated etcd cert check tasks to better detect when new certificates need to be generated
* Move usage of force_etcd_cert_refresh var to gen_certs fact set
* Force etcd cert generation per server if force_etcd_cert_refresh is set to true
* Include gathering of node certs even if k8s-cluster member and in etcd group.
* Removed run_once due to when statement
Helm v3.5.2 is a security (patch) release. Users are strongly
recommended to update to this release. It fixes two security issues in
upstream dependencies and one security issue in the Helm codebase.
See https://github.com/helm/helm/releases/tag/v3.5.2
This makes the docker role work the same as the containerd role.
Being able to override this is needed when you have your own debian
repository. E.g. when performing an airgapped installation
* update local-path-storage config template to version v0.0.19
* changes local_path_provisioner image tag to v0.0.19
* removes copy paste example from rancher local-path-provisioner repo
According to the following recommendation, this moves the directory
to control-plane:
The Kubernetes project is moving away from wording that is considered
offensive. A new working group WG Naming was created to track this work,
and the word "master" was declared as offensive. A proposal was formalized
for replacing the word "master" with "control plane".
Previous check for presence of NM assumed "systemctl show
NetworkManager" would exit with a nonzero status code, which seems not
the case anymore with recent Flatcar Container Linux.
This new check also checks the activeness of network manager, as
`is-active` implies presence.
Signed-off-by Jorik Jonker <jorik@kippendief.biz>
This was introduced in 143e2272ff
Extra repo is enabled by default in CentOS, and is not the right repo for EL8
Instead of adding a CentOS repo to RHEL, enable the needed RHEL repos with rhsm_repository
For RHEL 7, we need the "extras" repo for container-selinux
For RHEL 8, we need the "appstream" repo for container-selinux, ipvsadm and socat
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Only checking the kubernetes api on the first master when upgrading is not enough.
Each master needs to be checked before it's upgrade.
Signed-off-by: Rick Haan <rickhaan94@gmail.com>
yum_repository expect really different params, so nothing to factor here
Ubuntu is not an ansible_os_family, the OS family for Ubuntu is Debian
Check for ansible_pkg_mgr == apt
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
we don't need rpm_key, so nothing to factor here
Ubuntu is not an ansible_os_family, the OS family for Ubuntu is Debian
Check for ansible_pkg_mgr == apt
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Here the desciption from Ansible docs
Corresponds to the --force-yes to apt-get and implies allow_unauthenticated: yes
This option will disable checking both the packages' signatures and the certificates of the web servers they are downloaded from.
This option *is not* the equivalent of passing the -f flag to apt-get on the command line
**This is a destructive operation with the potential to destroy your system, and it should almost never be used.** Please also see man apt-get for more information.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
TASK [Generate a list of information about the images on a node]
registers list of container images to docker_images.
Then the next TASK [Set pull_required if the desired image is not
yet loaded] does based on expecting images are registered.
However sometimes the first TASK was failed as [1] but the failure
is ignored due to failed_when:false and it makes another issue.
This removes this unnecessary failed_when to detect the failure
at the point.
In addition, this removes no_log:true also because the output doesn't
contain any sensitive data and now it just makes debugging difficult.
[1]: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/934714534#L2953
no_proxy is a pain to get right, and having proxy variables present causes issues
(k8s components get proxy configuration after upgrade, see #7100)
It's better to only configure what require proxy:
- the runtime (containerd/docker/crio)
- the package manager + apt_key
- the download tasks
Tested with the following clusters
- 4 CentOS 8 nodes
- 1 Ubuntu 20.04 node
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
In some environments, it might not be possible to ping the IP address
of the nodes, e.g., because ICMP echo is blocked.
This commit allows kubespray to be configured to disable the ping
check, while performing all other checks.
TASK [network_plugin/calico : Calico | Configure calico network pool] **********
task path: /builds/kargo-ci/kubernetes-sigs-kubespray/roles/network_plugin/calico/tasks/install.yml:138
Friday 08 January 2021 17:10:12 +0000 (0:00:01.521) 0:11:36.885 ********
[WARNING]: The value {'kind': 'IPPool', 'apiVersion': 'projectcalico.org/v3',
'metadata': {'name': 'default-pool'}, 'spec': {'blockSize': 24, 'cidr':
'10.233.64.0/18', 'ipipMode': 'Always', 'vxlanMode': 'Never', 'natOutgoing':
True}} (type dict) in a string field was converted to "{'kind': 'IPPool',
'apiVersion': 'projectcalico.org/v3', 'metadata': {'name': 'default-pool'},
'spec': {'blockSize': 24, 'cidr': '10.233.64.0/18', 'ipipMode': 'Always',
'vxlanMode': 'Never', 'natOutgoing': True}}" (type string). If this does not
look like what you expect, quote the entire value to ensure it does not change.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Improve how we set 'proxy=' in yum.conf or dnf.conf
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fixup spaces in no_proxy
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add svc,svc.{{ dns_domain }} to no_proxy
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Remove because of empty need_http_proxy.rc if http/https_proxy and skip_http_proxy_on_os_packages=true is set
* Modify sample for debian and centos skip_http_proxy
* Modify sample for debian and centos skip_http_proxy
If some settings were changed from the default but not commited into an inventory repo,
we risk breaking the cluster / cause downtime, so add some extra checks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Upgrading docker / containerd without adapting the configuration might break the node,
so disable docker-ce repo by default.
We are already using dpkg hold for Debian.
All containerd.io packages provide /usr/bin/runc, so no need to check
yum_conf was never used for containerd
module_hotfixes should not be needed with the EL8 repo
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fedora CoreOS: Fix for ethtool pre-installed
Fix error in rpm-ostree when ethtool is already insatlled (FCOS >= 32.20201104.3.0)
* Fedora CoreOS: Fix connection lost
Fedora CoreOS: Ignore connection lost due to reboot and continues the playbook
We are currently setting the IP variable to hostIP,
Before https://github.com/projectcalico/node/pull/593 (not yet released)
Calico interpret that as hostIP/32
Using 'can-reach' we get the future behavior
This fixes vxlan and IPIP CrossSubnet modes
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* update files to handle multi-asn bgp peering conditions.
* put back in the serviceClusterIPs. Bad merge.
* remove extraneous environment var.
* update files as discussed with mirwan
* update titles.
* add not in.
* add a conditional for using bgp to advertise cluster ips.
Co-authored-by: marlow-h <mweston@habana.ai>
If cluster-name is not set, the default value "kubernetes" is used.
The loadbalancees created by Kubernetes follow the format:
kube_service_clusterName_serviceNamespace_serviceName
If 2 clusters create a loadbalancer for the same service in the same
namespace, they will share the same non-working loadbalancer.
Signed-off-by: Cedric Hnyda <cedric.hnyda@itera.io>
* Update hashes and set default version to 1.19.5
Signed-off-by: anthr76 <hello@anthonyrabbito.com>
* Reorder hashes
1.19.5 hashes should be near 1.19.x
* Added back blank line
This fixes the following warning:
[kubernetes/client : Generate admin kubeconfig with external api endpoint]
[WARNING]: Consider using the file module with state=directory rather than
running 'mkdir'. If you need to use command because file is insufficient
you can
RedHat 8.3 merged nf_conntrack_ipv4 in nf_conntrack but still advertise 4.18
so just try to modprobe and decide depending on the success
Also nf_conntrack is a dependency of ip_vs, so no need to care about it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Ensure libseccomp is installed before starting containerd on CentOS 8
* Simplify libseccomp install on CentOS 8
- Uses `package` module
- Replaces complex version check with 'state: latest'. The version must
be > 2.3 when using with cri-o.
- Removes unnecessary `not is_ostree` condition as CentOS 8 does not use
ostree
* copying ssh key no longer required, works with password auth
* use copy module instead of synchronize (which requires sshpass)
* less tasks and always changed tasks
* containerd docker hub registry mirror support
* add docs
* fix typo
* fix yamllint
* fix indent in sample
and ansible-playbook param in testcases_run
* fix md
* mv common vars to tests/common/_docker_hub_registry_mirror.yml
* checkout vars to upgrade tests
If crictl (and docker) binaries are deployed to the directories
that are not in standard PATH (e.g. /usr/local/bin), it is required
to specify full path to the binaries.
The task outputs the following warning:
TASK [kubernetes/preinstall : Enable ip forwarding]
[WARNING]: The value 1 (type int) in a string field was converted
to u'1' (type string). If this does not look like what you expect,
quote the entire value to ensure it does not change.
This new version uses the same base image as kube-proxy
(k8s.gcr.io/build-image/debian-iptables)
This allow to automatically pick iptables-legacy or iptables-nft,
and be compatible with RHEL/CentOS 8
https://github.com/kubernetes/dns/pull/367
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* fix flake8 errors in Kubespray CI - tox-inventory-builder
* Invalidate CRI-O kubic repo's cache
Signed-off-by: Victor Morales <v.morales@samsung.com>
* add support to configure pkg install retries
and use in CI job tf-ovh_ubuntu18-calico (due to it failing often)
* Switch Calico, Cilium and MetalLB image repos to Quay.io
Co-authored-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Barry Melbourne <9964974+bmelbourne@users.noreply.github.com>
calico PODs are first started and then in a handler killed and
restarted for no reason, nothing has changed.
By using the existing variable 'calico_cni_config' (only defined when
calico has already started) the restart can be skipped.
* create a wrapper script with pki options
* supports all kubespray managed container engines
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
* Allow the eventRecordQPS setting to be set.
The eventRecordQPS parameter controls rate limiting for event recording. When zero, unlimited events can cause denial-of-service situations. For my situation, I don't need more than a setting of "5". This change allows me to configure the setting before creating the cluster.
* Allow the eventRecordQPS setting to be set.
The default settings (see types.go) is five. So, this change does not affect the cluster provisioning. However, it does allow for the setting to be changed.
Fedora 31 uses Cgroups v2 by default. This change by passes the kernel
parameter systemd.unified_cgroup_hierarchy=0.
Signed-off-by: Victor Morales <v.morales@samsung.com>
* update version of ingress-nginx controller.
Change tag from controller-v0.34.0 to controller-v0.40.2 to use newest tag.
* Update docs about aws deploy templates.
In the yaml templates, there is no mention of idle timeouts. This is why I removed the documentation about it. This might be a mistake. Please verify this. I don't know enough to verify it myself.
* Change label when checking version.
When checking for `app.kubernetes.io/name=ingress-nginx`, a completed pod was selected which is not helpful when trying to `exec`. Changing the label selects the running controller pod.
* put back the information about ELB Idle Timeouts.
When I removed the information, I had overlooked that it was mentioned in the L7 yaml file. Thanks.
and thereby support upgrade from e.g. 1.18.x to 1.19.y
Included OSes:
- Centos7/8
- Ubuntu18/20
New variables for overriding by default installed packages:
- centos_crio_packages
- ubuntu_crio_packages
* Enable Kata Containers for CRI-O runtime
Kata Containers is an OCI runtime where containers are run inside
lightweight VMs. This runtime has been enabled for containerd runtime
thru the kata_containers_enabled variable. This change enables Kata
Containers to CRI-O container runtime.
Signed-off-by: Victor Morales <v.morales@samsung.com>
* Set appropiate conmon_cgroup when crio_cgroup_manager is 'cgroupfs'
* Set manage_ns_lifecycle=true when KataContainers is enabed
* Add preinstall check for katacontainers
Signed-off-by: Victor Morales <v.morales@samsung.com>
Co-authored-by: Pasquale Toscano <pasqualetoscano90@gmail.com>
Command line flags aren't added to kube-proxy which results in missing
feature gates set in this component. Add appropriate setting to
ConfigMap instead.
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
'ansible.vars.hostvars.HostVarsVars object' has no attribute 'kubeadm_upload_cert'
kubeadm_upload_cert will never be found as a hostvar for the first
master since the task is executed for a worker.
Fix by executing the upload task for the first master and register
the needed key. After that, workers can read hostvars for the master
Var kubeadm_etcd_refresh_cert_key removed since it no longer has
any use.
* Adding option to disable gloablly applying a proxy to etc/yum.conf
* Change made to proxy_yum_globaly basedon reviewer feedback
* fix trailing spaces in ymllint
This fixes the Containerd + EL8 case that was missed in 7d1ab3374e
On CentOS 8 with proxy ansible render inline `proxy` and `module_hotfixes` options.
For example:
```
proxy=http://127.0.0.1:3128module_hotfixes=True
```
But expected result:
```
proxy=http://127.0.0.1:3128
module_hotfixes=True
```
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
crio refuses to delete pods when cni is unavailable which is the
case e.g. using calico with kdd datastore. See:
https://github.com/cri-o/cri-o/issues/4084
Fix by deleting storage associated with containers. Stop and disable
crio service so switching container runtime can be done.
* Added option to force apiserver and respective client certificate to be regenerated without necessarily needing to bump the K8S cluster version
* Removed extra blank line
Handlers with the same name (Kubeadm | restart kubelet) leads to incorrect playbook execution. As a result, after completing the tasks, kubelet does not restart. This PR fix this behavior
After upgrading to newer Kubernetes(v1.17 at least), kubectl command
shows the following warning message:
WARNING: Kubernetes configuration file is group-readable.
This is insecure. Location: /home/foo/.kube/config
The kubeconfig was copied from {{ artifacts_dir }}/admin.conf with
kubeconfig_localhost feature. It is better to set valid file mode
at getting it on Kubespray.
The 0d0cc8cf9c change creates several
DaemonSets to cover the Flannel CNI installation for different CPU
architectures. This change removes the unnecessary architecture value
from the docker tag value.
Signed-off-by: Victor Morales <v.morales@samsung.com>
In case multiple nodeselectors are specified in ingress_nginx_nodeselector, the generated daemonset yaml template for nginx is invalid due to missing indentation starting with the second nodeselector
When stopping at the check of "Stop if ip var does not match local ips"
the error message is like:
fatal: [single-k8s]: FAILED! => {
"assertion": "ip in ansible_all_ipv4_addresses",
"changed": false,
"evaluated_to": false,
"msg": "Assertion failed"
}
That doesn't contain actual IP addresses and it is difficult to understand
what was wrong. This adds the error message which contain actual IP addresses
to investigate the issue if happens.
* calico: add constant calico_min_version_required
and verify current deployed version against it.
* calico: remove upgrade support with data migration
The tool was used pre v3.0.0 and is no longer needed.
* calico: remove old version support from tasks
* calico: remove old ver support from policy ctrl
* calico: remove old ver support from node
* canal: remove old ver support
* remove unused calicoctl download checksums
calico_min_version_required is the oldest version that can be installed
Older versions can be removed.
* Add retries to update calico-rr data in etcd through calicoctl
* Update update-node yaml syntax
* Add comment to clarify ansible block loop
* Remove trailing space
* Fix reserved memory unit in kubelet configuration
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Move systemReserved default values from template
Signed-off-by: Wang Zhen <lazybetrayer@gmail.com>
* Added ability to set calico vxlan vni and port. defaults to calico's documented defaults.
* Check if calico_network_backend is defined prior to checking value
* Removed calico hidden defaults for vxlan port and vni
* Fixed FELIX_VXLANVNI typo
* Added support for setting tiller_service_account and tiller_replicas
* Specify helm 2 version to ensure we have a test path that still hits helm 2 code
* Moved tiller_service_account to defaults.yml. Fixed is tiller_replicas defined check.
* Make metallb image repos configurable
* Moved metallb image repo definitions to download role defaults
* Removed comment. These are set in download defaults
After host reboot kubelet and crio goes into a loop and no container is started.
storage_driver in crio.conf overrides system defaults in etc/containers/storage.conf
/etc/containers/storage.conf is installed by package containers-common dependency
installed from cri-o (centos7) and contains "overlay".
Hosts already configured with overlay2 should be reconfigured and the
/var/lib/containers content removed.
* Add comment from roles/kubespray-defaults/defaults/main.yaml clarifying network allocation and sizes
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* Rewrite of the comment and added new examples
Signed-off-by: Mikael Johansson <mik.json@gmail.com>
* remove podman cni plugin
* configure networkamanger global dns
* allow installation of python3-libselinux by disabling update repo temporary
* remove ipv4 section because it is not a valid configuration
Removes these startup warnings:
Warning: For remote container runtime, --pod-infra-container-image is ignored in kubelet, which should be set in that remote runtime instead
Using "/var/run/crio/crio.sock" as endpoint is deprecated, please consider using full url format "unix:///var/run/crio/crio.sock".
* add snapshot-controller and v1beta1 snapshot api
* fix typo
* udpate manifest to v1beta1
* update
* update manifests
* fix spelling
* wait until crd is applied
* fix missing info in kube module
* revert snapshotclass
* add snapshot crds before applying the csi driver
* add crds, missed them in last commit
* use pull policy from kubespray
* Update CustomResourceDefinition for kubecontrollersconfigurations.crd.projectcalico.org to v1
* Align ClusterRole for kube-controllers with upstream (calico)
* Add support for openstack application credentials
* Add some lines for readability
* Update external_openstack_tenant_id check
Do not check external_openstack_tenant_id when application credentials are defined
* Add check for external_openstack_domain_id
* Fix typo
By default do not allow "unqualified" (without a registry) images
because it is considered unsecure and subject to mitm attacks.
To enable insecure pull configure for example:
crio_registries:
- "docker.io"
- "quay.io"
* Use proper openssl command to differentiate between host and ip in current certificate check
* fixup! Use proper openssl command to differentiate between host and ip in current certificate check
The bootstrap-os role uses a bootstrap script to provision a
python interpreter on flatcar and container os hosts. As the
pypy project switched to another hoster, the download url changed.
If applied this will use the new proper pypy download url in bootstrap script
* Update the cilium svc proxy test to HA mode
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix cilium strict kube-proxy in HA
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add a single global endpoint variable
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Add cilium docs about kube-proxy replacement
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Fix issues in docs
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
* Option for MetalLB to talk BGP
* Check for BGP peers when metallb_protocol is bgp
* README clarification
* Commented values as documentation only in the sample inventory
* layer 2 or BGP, not both
* log level by default increased to 'info'
* cgroup manager by default set to 'systemd'
* stream port (used by kubelet) bound to 127.0.0.1 for security reasons
* metrics can be enabled and port specified
Trying to layer this package on Fedora 32 causes the install to crash
and furthermore it looks like the original bug linked to in the comment
has been resolved for Fedora 31
* Update calico_veth_mtu to FELIX_IPINIP variable
calico_veth_mtu is specified in the configuration, but since it only works for wireguard, modify it to work for IP-in-IP users.
* Update template with more cleaner expression
CI job 624031102 failed with:
fatal: [ubuntu1804]: FAILED! => {"changed": false, "msg": "Failed to download key at https://download.opensuse.org/repositories/devel:kubic:libcontainers:stable/xUbuntu_18.04/Release.key: Request failed: <urlopen error [Errno -3] Temporary failure in name resolution>"}
Assuming its a temporary problem it should get more robust with a
couple of retries like in other roles.
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
* Add additional metadata configuration option to external Openstack CCM (kubernetes-sigs#6338)
* Set the variable external_openstack_metadata_search_order undefined by default
* Fix kubelet cgroup driver detection for crio
Remove fact standalone_kubelet since it is not used
* Fix yamllint complaints of roles/kubernetes/node/tasks/facts.yml
Co-authored-by: Hans Feldt <hafe@users.noreply.github.com>
This changes MetalLB contrib to one of addons for deploying MetalLB with
Kubernetes cluster deployment. By the default, Kubespray doesn't deploy
MetalLB addon.
Support for Ambassador OSS as an Ingress Controller when
settings `ingress_ambassador_enabled: true`.
Signed-off-by: Alvaro Saurin <alvaro.saurin@gmail.com>
* Install Kata Containers as additional container runtime
* Create RuntimeClasses for Kata Containers
* Updated Vagrant to optionally run without Docker as container manager
* Updated Vagrant to optionally use Libvirt nested virtualization
* Add Kata Containers documentation
* Fix lint errors
* Add kata_containers_enabled to kubespray-defaults
* Fixed typo error
* Fixed typo error
We need to specify either external_openstack_tenant_name or
external_openstack_tenant_id. Those values were checked by seeing they
are defined or they have actual values separately.
However those values are always defined because of the following code
of openstack/defaults/main.yml:
external_openstack_tenant_id: "{{ lookup('env','OS_TENANT_ID')| default(lookup('env','OS_PROJECT_ID'),true) }}"
external_openstack_tenant_name: "{{ lookup('env','OS_TENANT_NAME')| default(lookup('env','OS_PROJECT_NAME'),true) }}"
So even if not specifying both values, those checks could not detect
the misconfiguration. This fixes this to detect the misconfiguration.
* MINOR: Check kernel version before enable modprobe nf_conntrack
* CLEANUP: no more need to ignore error of this task
* MINOR: Fixing yaml and ansible lint error - remove trailling-space
If the special parameter "$@" is not quoted, the following command will not work:
./kubectl.sh patch storageclass my-storage-class -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
* Add oraclelinux8 and disable firewalld
Add oraclelinux8 image and disable firewalld on oraclelinux VMs
* Fix Oracle Linux repositories
As documented in: http://yum.oracle.com/getting-started.html#installing-software-from-oracle-linux-yum-server
public-yum-ol7.repo was deprecated on release 7.6. Some repos were integrated into oracle-linux-ol7.repo (i.e.: ol7_latest, ol7_addons) and other are available as packages (epel). This also adds support for oraclelinux8
* Fix to use ansible_distribution_version
Instead of ansible_distribution_major_version
* Update README.md
On OpenStack history, we used to call "tenant" for separeted namespace.
However we use "project" now instead.
Then we have replaced "tenant" with "project". Then all "TENANT" variables
also are renamed to "PROJECT".
This makes Kubespray search "PROJECT" variable also for newer OpenStack
clouds.
with the Python ruamel.yml library
- Change True/False to true/false in a few places so file can
be more easily round-tripped with the Python ruamel.yml library
flannel, ovn and multus network plugins did not support all taint keys. This
update changes the tolerations to support them all.
According to the documentation:
```
There are two special cases: An empty key with operator Exists matches all keys,
values and effects which means this will tolerate everything. An empty effect matches
all effects with key key.
```
Usage of the empty `key` and `effect` ensures the network plugin daemonset will
be deployed on every nodes (ex: in case of custom taints, or NoExecute effect)
Since weave 2.5.1, `NoExecute` taint effect is no more supported,
this changes the daemonset tolerations to change this behavior.
Also remove the toleration key `CriticalAddonsOnly` not required anymore.
* fix(kubelet): exec notify restart kubelet service when kube-config.yml changed
* Revert "refactor(kubelet handler): change task name("reload kubelet") this is misleading"
This reverts commit 8f5d29560802c7c997293adb1ce9f84d3b20b6cb.
* fix(handlers,kubelet): setting right notify task name
* Add additional network configuration options to external Openstack CCM (#6083)
* Change the default version of external openstack cloud controller image to v1.18.1 since there was an issue in v1.18.0 where some IPs of the private network were ignored
* Change Network section in external-openstack-cloud-config.j2 to Networking
* Add networking customization information in the openstack documentation
The 98e7a07fba commit udpates the
dashboard version to 2.0.0 but it enable skip login flag wasn't
updated. This change updates its identation to avoid issues when
dashboard_skip_login is enabled.
* bump to dashboard 2.0 rc6 with metrics scrapper
* fix missing yaml seperator making Replicaset complaining about missing ServiceAccount
* unwanted legay gross hack forgot to remove before
* no need namespace on CrBinding
* bump to 2.0.0 release
* remove dashboard_metrics_scrapper_enabled
* replace removed repo with kubic repository for centos 7
* add crio configuration for centos8
* add crio configurations for debian
* use correct crio version for fedora
* simplify calulation of required crio version
- gives possibility to overwrite
* change default path for runc
* change default for seccomp path
* change default for conmon
* declare kubic repo for ubuntu
* do not install crictl twice
* move fedora repo modular tasks to crio_repo file
* move centos repo tasks to crio_repo
* declare crio version matrix for ubuntu
* update documentation crio support for ubuntu
* Add proxy support to CRI-O service
The crio.service requires proxy environment variables when it's
deployed behind a corporated network. This change creates a systemd
configuration file when the proxy variables are defined.
* Remove unnecesary crio's tasks
The playbook that bootstrap openSUSE servers assumes that the
/etc/sysconfig/proxy file exists but the execution fails when
these file is not present. This change guarantees its existence.
* assembly fallback_ips and no_proxy var only one time on localhost and populate result on all hosts
* add tag always, fix ansible lint errors
* workaround to mitogen issue dw/mitogen#663
* do not gather fact before install python on coreos like distros
* try to pass docker molecule test
* fix upgrade of crio on fcos
- update documents
* install conntrack required by kube-proxy
- like commit 48c41bcbe7
* enable fedora modular repo for crio
* allow to override crio configuration
- set cgroup manager same to kubelet_cgroup_driver if defined
- path of seccomp_profile depends on distribution
* allow to override crio configuration
- fix path for ubuntu
* allow to override crio configuration
- fix cni path for fcos
* added required permissions for querying endpointslice resources
* copy-pasted role permissions from cilium install manifests
* bumped cilium version to v1.7.2
* Fix proxy and module_hotfixes
On CentOS 8 with proxy ansible render inline `proxy` and `module_hotfixes` options.
For example:
`proxy=http://127.0.0.1:3128module_hotfixes=True`
But expected result:
```
proxy=http://127.0.0.1:3128
module_hotfixes=True
```
* Use ini_file module for work with ini files
* Prevent duplicates proxy= option in /etc/yum.conf
Module `lineinfile` is weak, use most powerful module `ini_file` and add or remove `proxy=` when `http_proxy` is defined or not.
* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Fix chicken and egg problem with proxy_env not defined on the first envinronment usage.
* Disable fact gathering for the first proxy_env evaluation.
* Move proxy_env var set up from the role defaults to the root playbooks as fact.
* kubernetes-sigs-kubespray #5824
Added support nodes which are part of Virtual Machine Scale Sets(VMSS)
* kubernetes-sigs-kubespray #5824
* kubernetes-sigs-kubespray #5824
Added comments and updatetd azure docs.
* kubernetes-sigs-kubespray #5824
Added supported values comments for "azure_vmtype" in azure.yml
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
- This solves issue #5721 & #5713 (dupes)
- Provide a cleaner default usage pattern for the download role
around etcd that supports 'host' and 'docker' properly
- Extract the 'etcdctl' as a separate task install piece and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
* fedora coreos support
- bootstrap and new fact for
* fedora coreos support
- fix bootstrap condition
* fedora coreos support
- allow customize packages for fedora coreos bootstrap
* fedora coreos support
- prevent install ptyhon3 and epel via dnf for fedora coreos
* fedora coreos support
- handle all ostree like os in same way
* fedora coreos support
- handle all ostree like os in same way for crio
* fedora coreos support
- add fcos documentations
* Support configuring the insert mode
Defaults to the upstream default https://docs.projectcalico.org/v3.9/reference/felix/configuration
so nothing should change for existing deployments.
This allows coexistence with other firewall management technologies.
* Add a note to the sample config
* Add docker-ce 19.03 packages for Debian & Ubuntu
K8s has updated the recommended Docker version to 19.03. More
specifically it should be 19.03.4, but since we used 18.06.7 instead of
.2, I'm assuming the latest patch version should be used here as well.
* Add docker 19.03 for redhat
The 'regexp' parameter matches last occurrence of a line starting with 'proxy=' and replaces it with the one defined in 'line' parameter. If no match - it works same way as before. This fixes resuming cluster deployments failed after that task (if there was no more than one line starting with 'proxy' in the yum.conf file - this condition should also be reassured with the change introduced here) eg. if they were initiated with Terraform.
refs #5277
As the issue describes, when no external or local load-balanced is used,
kube-proxy won't be able to contact apiserver at 127.0.0.1. So the
config map should be left as is.
* Upgrade etcd to 3.3.18
* Try with etcd 3.3.15 (kubeadm 1.16.7 default)
* Back to square one
* Try with 3.3.11
* Upgrade etcd to 3.3.18 (take 2)
* Try with 3.3.12
* download file
* download containers
* fix push image to nodes
* pull if none image on host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load abilty for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporary disable download_run_once for containerd and crio
due https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Run 'container-engine' after drain.
Move possibly disruptive role 'container-engine' to run after the node
is drained.
As that role have to be run on non-cluster nodes as well (etcd and
calico-rr), and those nodes are not drained, add play for that case.
* Check if api is up before upgrade.
If container engine is restarted in previous role, api controller can
take some time to start. This check ensures api is up before upgrade.
When kube-router is used as cni, rules might be added to the mangle table
to support external IPs. Therefore, mangle table should be flushed during
reset as well.
This 38688a4486 change replaces the
value for dockerproject_.+_repo_.+ docker variables but their new
value was previously defined in other variables. This change removes
the dockerproject_.+_repo_.+ docker variables in favor of the older
ones.
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* External OpenStack Cloud Controller Manager implementation
* Adding controller image tag
* Minor fixes
* Restructuring the external cloud controller to work with KubeADM
* Added in code to allow control over pull policy for local path provisioner
* change to imagePullPolicy to use globally used variable k8s_image_pull_policy
* removed unusued variable from defaults
* updated contiv-etcd and cinder-csi-controllerplugin to use k8s_image_pull_policy variable
* Introduce kubelet_config_extra_args and kubelet_node_config_extra_args to pass params to kubelet via YAML config
* kubelet_config_extra_args is not the alternative
* Fix recover-control-plane to work with etcd 3.3.x and add CI
* Set default values for testcase
* Add actual test jobs
* Attempt to satisty gitlab ci linter
* Fix ansible targets
* Set etcd_member_name as stated in the docs...
* Recovering from 0 masters is not supported yet
* Add other master to broken_kube-master group as well
* Increase number of retries to see if etcd needs more time to heal
* Make number of retries for ETCD loops configurable, increase it for recovery CI and document it
* containerd: add proxy support
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubespray-defaults: add kube_service_addresses / kube_pods_subnet to no_proxy
CIDR notation in no_proxy is supported by a lot of programs/languages,
including go: https://github.com/golang/go/issues/16704
Without that containerd cannot talk the the API server (kube_apiserver_ip),
but it should not go through an external proxy for the nodes/pods/services
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Change dockerproject.org to download.docker.com
dockerproject.org was deprecated in 2017 and has gone down.
* Restore yum repo for containerd
Change-Id: I883bb512a2164a85865b1bd4fb569af0358c8c2b
Co-authored-by: Craig Rodrigues <rodrigc@crodrigues.org>
When running with serial != 100%, like upgrade_cluster.yml, we need to apply this fixup each time
Problem was introduced in 05dc2b3a09
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Raises limit from 100 to 300 because the default is far too low
and the pod can handle 300 with the given resources.
Change-Id: Ib1eec10da3d09d198933fcfe87291587e58d7cdb
I've tested this update by deploying a containerd / etcd cluster on top CentOS7,
MetalLB + NGINX Ingress. Upgrade using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
Resolves issue where kubectl cache of <v1.16 api schema
interferes with interacting with daemonsets and deployments.
Change-Id: I63b7046958f2008eb144b6da0004c598f945e0ae
* Fix crictl
* Reload systemd daemon before enabling service
* Typo
* Add crictl template
* Remove seccomp.json for ubuntu
* Set runtime path of runc for ubuntu
* Change path to conmon
There is no cri-tools package in CentOS/EPEL/Red Hat.
Additionally, cri-tools is provided into the installation via
roles/download/defaults/main.yml:104:crictl_download_url.