* feat(): Add wireguard backend to flannel cni
As described in the flannel docs:
https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#wireguard
This does not support optional configuration methods (see the sketch after this entry) like:
- setting a PSK (will be autogenerated by default)
- changing the listening port
- changing the mode (defaults to 'separate')
- changing PersistentKeepaliveInterval (defaults to 0)
* Add supported backends to flannel docs
* Fix markdown in docs
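A minimal sketch of enabling this backend via Kubespray group vars, assuming the `flannel_backend_type` variable and file placement follow Kubespray conventions:
```yaml
# group_vars/k8s_cluster/k8s-net-flannel.yml (illustrative placement)
kube_network_plugin: flannel
flannel_backend_type: wireguard  # PSK, port, mode and keepalive use the defaults above
```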
This fixes a task failure in the rescue block that uncordons nodes after an unsuccessful drain. The issue occurs when `kube_override_hostname` is set and does not match `inventory_hostname`.
The `containerd-common` role is responsible for gathering OS specific variables from the vars directory of the roles that include or import it. `containerd-common` is imported via role dependency by a total of two roles, `container-engine/docker`, and `container-engine/containerd`.
containerd-common is needed by both the docker and containerd roles as a dependency when:
- containerd is selected as the container engine
- a docker install is detected and needs to be removed
- apt is the package manager
However, by default, roles cannot be invoked more than once in the same play unless `allow_duplicates: true` is set for that role. This results in the failure of the `containerd | Remove containerd repository` task, since only the docker vars will be loaded in the play, and `containerd_repo_info.repos`, normally populated by containerd/vars, is left empty.
This change sets `allow_duplicates: true` for `containerd-common` which fixes the currently failing containerd tasks if docker was detected and removed in the same play.
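For reference, `allow_duplicates` lives in the role's metadata; a minimal sketch of the change:
```yaml
# roles/container-engine/containerd-common/meta/main.yml (sketch)
---
allow_duplicates: true
```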
* [metrics_server]: Enabled HA mode by adding the 'metrics_server_replicas' variable and a podAntiAffinity rule
Signed-off-by: Ugur Can Ozturk <57688057+ugur99@users.noreply.github.com>
* [metrics_server]: added namespaces selector
Signed-off-by: Ugur Can Ozturk <57688057+ugur99@users.noreply.github.com>
During reset, restarting the network did not complete on distros
like RHEL/CentOS/AlmaLinux with a major version higher than 8.
Example:
kubespray> ansible-playbook -i inventory/mydomain/hosts.yml reset.yml -b -v
fatal: [mynode]: FAILED! => {"changed": false, "msg": "Could not find the requested service network: host"}
Signed-off-by: Douglas Schilling Landgraf <dlandgra@redhat.com>
There are wrong directory paths to all.yml and vsphere.yml in the docs: `inventory/sample/group_vars/all.yml` and `inventory/sample/group_vars/all/vsphere.yml` should be `inventory/sample/group_vars/all/all.yml` and `inventory/sample/group_vars/all/vsphere.yml`.
When operating Kubespray from the kubespray image with docker run,
we need to check out the specific Kubespray version matching the
image, because the sample inventory contains a Kubernetes version,
and the version on the master branch may not be supported by the
released Kubespray, for example.
The certified-conformance mode took 2+ hours, which was too long
compared to the Quick mode specified previously.
So this switches the mode back to Quick.
The latest version of sonobuoy is v0.56.11.
This updates the version to the latest.
As the file name implies, this makes it clearly use certified-conformance
mode for the latest version of sonobuoy.
by setting a default runtime spec with a patch for RLIMIT_NOFILE.
- Introduces containerd_base_runtime_spec_rlimit_nofile.
- Generates base_runtime_spec on-the-fly, to use the containerd version
of the node.
- Update and re-work the documentation:
- Update links
- Fix formatting (especially for lists)
- Remove documentation about `useAlphaApi`,
a flag only for k8s versions < v1.10
- Attempt to clarify the doc
- Update to version 1.5.0
- Remove PodSecurityPolicy (deprecated in k8s v1.21+)
- Update ClusterRole following upstream
(cf https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner/pull/292)
- Add nodeSelector to DaemonSet (following upstream)
We made all vagrant jobs non-voting because those jobs were not stable.
However, that setting allowed a pull request which completely broke
vagrant jobs to be merged into the master branch.
To avoid such situations, this makes one of the vagrant jobs voting.
Let's see the stability of the job.
The install-cni-plugin image was not updated to the corresponding
arch when building the different DaemonSets.
Fixes issue #9460
Signed-off-by: Fred Rolland <frolland@nvidia.com>
`test` is an internal Python package (see [doc]), and as such should not be
used here. It makes tests fail in some environments.
[doc]: https://docs.python.org/3/library/test.html
* Fix inconsistent handling of admission plugin list
* Adjust hardening doc with the normalized admission plugin list
* Add pre-check for admission plugins format change
* Ignore checking admission plugins value when variable is not defined
On hardening environments, cert-manager pods could not be created
from the corresponding deployments. This adds the securityContext
to solve the issue.
To verify the hardening method works always.
The configuration comes from docs/hardening.md
Fix yaml format of hardening.yml
Add condition to skip 040 test for hardening
The busybox container requires root permission for ping.
For testing the hardening method in CI, we need to switch to another image
which doesn't require root permission for network testing.
On the kubernetes/kubernetes repo, we are using agnhost, which doesn't
require it. So this makes the test use the agnhost image.
In addition, this updates the test manifest to specify securityContext
without any privilege.
* Avoid MetalLB speaker image download when metallb_speaker_enabled is set to false
* Move metallb_speaker_enabled var to allow outside metalLB role references
* Move metallb_speaker_enabled var to allow outside metalLB role references
* Improve metallb_speaker_enabled default values
When trying to add a hardening CI job by copying configuration from
hardening.md, the yamllint CI job detected an invalid format.
This fixes it so the CI job can be maintained.
* Fix: install policy controller on kdd too
* Remove the calico_policy_version condition altogether
* Install policy controller both on canal and calico under same condition
* Support kubeadm patches in v1beta3
* Update kubeadm patches sample files in inventory
* Fix pre-commit syntax
* Set kubeadm_patches enabled to false in sample inventory
* Drop support for Cilium < 1.10
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* Synchronize Cilium templates for 1.11.7
Signed-off-by: necatican <contact@necatican.com>
* Set Cilium v1.12.1 as the default version
Signed-off-by: necatican <contact@necatican.com>
Signed-off-by: necatican <necaticanyildirim@gmail.com>
Signed-off-by: necatican <contact@necatican.com>
* Add optional NAT support in calico router mode
* Add a blank line in front of lists
* Remove mutual exclusivity: NAT and router mode
* Ignore router mode from NAT
* Update calico doc
Since commit fad296616c, cri_dockerd_enabled
has not been used. However, packet_ubuntu22-aio-docker.yml still contains
the configuration and causes confusion.
This removes the configuration as cleanup.
It seems that PR #8839 broke `calico_datastore: etcd` when it removed ipamconfig support for etcd mode.
This PR fixes some failing tasks when `calico_datastore == etcd`, but it does not restore ipamconfig support for calico in etcd mode. If someone wants to restore ipamconfig support for `calico_datastore: etcd` please submit a follow up PR for that.
For the following configuration
```
containerd_insecure_registries:
docker.io:
- dockerhubcache.example.com
```
the rendered /etc/containerd/config.toml contains
```
[plugins."io.containerd.grpc.v1.cri".registry.configs."docker.io".tls]
insecure_skip_verify = true
```
but it needs to be
```
[plugins."io.containerd.grpc.v1.cri".registry.configs."dockerhubcache.example.com".tls]
insecure_skip_verify = true
```
* Add the option to enable default Pod Security Configuration
Enable Pod Security in all namespaces by default with the option to
exempt some namespaces. Without the change only namespaces explicitly
configured will receive the admission plugin treatment.
* Fix the PR according to code review comments
* Revert the latest changes
- leave the empty file when kube_pod_security_use_default, but add comment explaining the empty file
- don't attempt magic at conditionally adding PodSecurity to kube_apiserver_admission_plugins_needs_configuration
* Add 'flush ip6tables' task in reset role
If enable_dual_stack_networks is set to true and ip6 is defined, ip6tables will be created. But when resetting the Kubernetes cluster, kubespray doesn't flush ip6tables (a sketch of such a task follows this entry).
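A sketch of such a reset task using the stock `ansible.builtin.iptables` module (task name, tables and condition are illustrative):
```yaml
- name: reset | Flush ip6tables
  ansible.builtin.iptables:
    table: "{{ item }}"
    flush: yes
    ip_version: ipv6
  loop:
    - filter
    - nat
    - mangle
  when: enable_dual_stack_networks | default(false) | bool
```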
* [CI] fix molecule tests on opensuse by upgrading to 15.4 (#9175)
* [CI] fix molecule tests on opensuse by upgrading to 15.4
* [opensuse] use correct python crytography package name depending on distribution version
Co-authored-by: Cristian Calin <6627509+cristicalin@users.noreply.github.com>
This condition blocks the creation of the `etcd` user in certain conditions,
specifically when you have `etcd_deployment_type: kubeadm` and `kube_owner: root`.
Since the `root` user is already present on the system, this will not be a problem (due to the idempotency of Ansible).
Today we have many contributions to contrib/offline/, and some PRs
contained invalid coding style for those scripts.
This enables shellcheck to catch such invalid coding style easily.
* Update main.yaml
* remove version in dpkg_selection name
* make lint happy
* Fix typo
* add comment / remove useless condition
* remove dpkg hold in reset tasks
* This release removes support for Kubernetes v1.19.0
* This release adds support for Kubernetes v1.24.0
* Starting with this release, we will need permissions on the coordination.k8s.io/leases resource for leaderelection lock
The commit 1ce2f04 tried to merge multiple SUSE OS checks, including
"openSUSE Leap" and "openSUSE Tumbleweed", into a single SUSE check, but
that change was incomplete.
Then the commit c16efc9 tried to fix it for "openSUSE Leap", but it
didn't take care of "openSUSE Tumbleweed".
This adds "openSUSE Tumbleweed" to the OS check.
* Fix vcloud-csi bug related to #9046
Signed-off-by: yasintahaerol <yasintahaerol@gmail.com>
* add supervisor-fss-namespace=kube-system flag to vsphere-csi-controller-deployment
Signed-off-by: yasintahaerol <yasintahaerol@gmail.com>
This adds target components on check_readme_versions.sh after
merging https://github.com/kubernetes-sigs/kubespray/pull/9044
In addition, this fixes a typo in check_readme_versions.sh
This adds `foo_version` variables for some components because
check_readme_versions.sh verifies the corresponding version for
`<component name>_version` from main.yml. This change also improves
consistency in main.yml. In the long term, we will be able to
remove the existing `foo_image_tag` variables, but not now, to keep
backwards compatibility for users.
During code review, reviewers needed to check that README.md was also
updated when the pull request updated component versions.
This adds the corresponding check to reduce reviewers' burden.
* Added new configuration item for extra tolerations in policy controllers
Signed-off-by: Sébastien Masset <smt.masset@gmail.com>
* Added new configuration item for extra tolerations in DNS autoscaler
Signed-off-by: Sébastien Masset <smt.masset@gmail.com>
* Aligned existing handling of extra DNS tolerations
Signed-off-by: Sébastien Masset <smt.masset@gmail.com>
* feat: make kubernetes owner parametrized
* docs: update hardening guide with configuration for CIS 1.1.19
* fix: set etcd data directory permissions to be compliant to CIS 1.1.12
* extra admission controls now don't have a version in their file names
eventratelimit.v1beta2.yaml.j2 -> eventratelimit.yaml.j2
* cri_socket variable includes the unix:// prefix to be conformant with
upstream
When running molecule jobs, we saw the following warning message:
[DEPRECATION WARNING]: [defaults]callback_whitelist option, normalizing names
to new standard, use callbacks_enabled instead. This feature will be removed
from ansible-core in version 2.15. Deprecation warnings can be disabled by
setting deprecation_warnings=False in ansible.cfg.
callbacks_enabled has been available since Ansible 2.11, and Kubespray uses
Ansible 2.12 on the master branch. So we can use callbacks_enabled safely to
avoid the warning message.
* Allow disabling calico CNI logs with calico_cni_log_file_path
Calico CNI can log up to 1 GB if it logs a lot with the current default settings:
- log_file_max_size: 100 (max file size in MB log files can reach before they are rotated)
- log_file_max_age: 30 (max age in days that old log files will be kept on the host before they are removed)
- log_file_max_count: 10 (max number of rotated log files allowed on the host before they are cleaned up)
See https://projectcalico.docs.tigera.io/reference/cni-plugin/configuration#logging
To save disk space, make the path configurable and allow disabling this log by setting
`calico_cni_log_file_path: false` (see the sketch after this entry)
* Fix markdown
* Update roles/network_plugin/canal/templates/cni-canal.conflist.j2
Co-authored-by: Kenichi Omichi <ken1ohmichi@gmail.com>
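A sketch of the resulting knob in group vars (file placement illustrative):
```yaml
# group_vars/k8s_cluster/k8s-net-calico.yml
calico_cni_log_file_path: false  # disable Calico CNI file logging entirely
```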
* Fix: set fallback value of kubelet ip6 (#8858)
* Prune the spurious comma at the end of kubelet_address
- Update `roles/kubernetes/node/defaults/main.yml`
Co-authored-by: Cristian Calin <6627509+cristicalin@users.noreply.github.com>
* Fix: set fallback value of kubelet ip6 (#8858)
- Apply the lint: 132606368e
Co-authored-by: Cristian Calin <6627509+cristicalin@users.noreply.github.com>
This reverts commit e375678674.
The workaround of explicitly specifying root for the kubelet unit was
for pulling images from a private registry. Kubernetes now has a
dedicated mechanism for this with imagePullSecrets.
Current Kubespray supports Kubernetes version 1.21 or later with
`kube_version_min_required: v1.21.0`, so kube_version v1.20-related
code is not used at all.
This deletes that code as cleanup.
* [etcd] Add extra documentation for `etcd_memory_limit` and `etcd_quota_backend_bytes`
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [etcd] Add support for setting ETCD_MAX_REQUEST_BYTES
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [upcloud] add option to use preconfigured cpu/mem plan
* [upcloud] add option to use firewall rules for API server/SSH access
* [upcloud] add option to use managed load balancer
* [cilium] Separate templates for cilium, cilium-operator, and hubble installations
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Update cilium-operator templates
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Allow using custom args and mounting extra volumes for the Cilium Operator
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Update the cilium configmap to filter out the deprecated variables, and add the new variables
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Add an option to use Wireguard encryption on Cilium 1.10 and up
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Update cilium-agent templates
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* [cilium] Bump Cilium version to 1.11.3
Signed-off-by: necatican <necaticanyildirim@gmail.com>
* Assert that IP range is enough for the nodes
Co-authored-by: Necatican Yıldırım <necaticanyildirim@gmail.com>
* Fixed whitespace
* Fixed errors
* Fixed errors
Co-authored-by: Necatican Yıldırım <necaticanyildirim@gmail.com>
Version 1.4 of containerd has been end-of-life since March 3, 2022,
as per https://containerd.io/releases/#support-horizon.
It is nice to drop the support from Kubespray also, following containerd.
tests/requirements.txt links to tests/requirements-2.12.txt, so
Kubespray uses ansible 2.12 by default for testing. However we
forgot to update testcases_prepare.sh to use ansible 2.12.
This updates testcases_prepare to use ansible 2.12.
* Force containerd service unmasking
Force systemd to unmask and start service when adding containerd service
* Eliminate restart and move unmasking step
Switch to start instead of restart
Move unmasking to restart handler
* Add unmasking to similar container runtimes
* Add missing service names
aufs-tools was required for docker.io package originally,
but Kubespray installs docker-ce package instead today.
In addition, Ubuntu 20.04 doesn't provide aufs-tools, as per [1].
This removes aufs-tools from the Ubuntu requirements.
[1]: https://bugs.launchpad.net/ubuntu/+source/aufs-tools/+bug/1947004
* Update verbs for volumeattachments resource
Update verbs for volumeattachments resource so that the kubelet can create volumeattachments and mount volumes when deploying Kubernetes on VMware vSphere.
* Update verbs for volumeattachments resource
Update verbs for volumeattachments resource to match upstream
* Update vsphere-csi-controller-rbac.yml.j2
* - add ability to specify the network_zone in hetzner terraform
- Export the network id from hetzner terraform to the generated inventory.ini
* - Add with_networks variable to allow different deployments of hcloud controller manager
- Add network id to hcloud controller secret (added via the inventory)
- Don't include extra_args if it's not set
The current ansible.tags 'facts' is for skipping the actual Kubespray deployment
in vagrant CI because the deployment takes a long time. However, the static
'facts' tag also skips the deployment for normal vagrant usage,
which causes confusion.
This adds VAGRANT_ANSIBLE_TAGS to skip the deployment for vagrant CI.
The quotations in the variable nerdctl_extra_flags are not required for `nerdctl_image_pull_command` and throw the following error when executing the cluster playbook with `container_insecure_registries` set:
unknown flag: --insecure-registry\\\"
This happens because the complete nerdctl_image_pull_command string variable gets split into an array of strings for the cmd task. The escaped quotation doesn't get escaped properly and is added to the cmd-string array as part of the command. This leads to a malformed insecure-registry flag, which throws the error.
Due to missing quotation of nerdctl_extra_flags, ansible-playbook failed:
Using module file /usr/local/lib/python3.6/dist-packages/ansible/modules/command.py
Pipelining is enabled.
[..]
File "/usr/lib/python3.8/shlex.py", line 191, in read_token
raise ValueError("No closing quotation")
This fixes the issue.
T-Eberle investigated the issue and found the solution.
Thank you T-Eberle!
* [ansible] make ansible 5.x the new default version and move different versions tested to nightly jobs
* [CI] jobs were missing proper ansible cleanup
If running Kubespray in static IP environments, a task failed like:
TASK [kubernetes/preinstall : Configure dhclient hooks for resolv.conf (RH-only)]
fatal: [ak8s2]: FAILED! => {
"changed": false, "checksum": "..",
"msg": "Destination directory /etc/dhcp/dhclient.d does not exist"}
This adds a check on dhclientconffile in 0100-dhclient-hooks to
run the task only if dhclient is enabled.
* terraform/openstack: Use path.module for ansible_bastion_template.txt
This extends on #7643 by not using path.root, but switching to path.module
to allow use of the terraform code as a module itself. This change then keeps
all calls to the template file stable even for that use-case.
* terraform/openstack: Make sed calls fail on errors
By using a single call with two replacements to use of sed will create proper exit codes
and allowing for errors to be recognized by terraform.
When running cluster.yml on new machines where containerd is already
installed but a Kubernetes cluster was not installed before, the task
"remove-node | List nodes" fails like:
"changed": false,
"cmd": [
"/usr/local/bin/kubectl", "--kubeconfig",
"/etc/kubernetes/admin.conf", "get", "nodes", "-o",
"go-template={{ range .items }}{{ .metadata.name }}
{{ "\n" }}{{ end }}"
],
..
"stderr": "error: stat /etc/kubernetes/admin.conf: no such file or directory",
That was due to a missing check for whether an existing Kubernetes cluster
exists before running the "kubectl drain" command.
This adds the check to avoid the issue (a sketch follows).
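A hedged sketch of the guard described above (task names and the registered variable are illustrative):
```yaml
- name: remove-node | Check if admin.conf exists
  ansible.builtin.stat:
    path: /etc/kubernetes/admin.conf
  register: admin_conf_stat

- name: remove-node | List nodes
  command: "{{ bin_dir }}/kubectl --kubeconfig /etc/kubernetes/admin.conf get nodes"
  register: nodes_output
  when: admin_conf_stat.stat.exists  # skip on machines without an existing cluster
```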
* [calico] make vxlan encapsulation the default
* don't enable ipip encapsulation by default
* set calico_network_backend by default to vxlan
* update sample inventory and documentation
* [CI] pin default calico parameters for upgrade tests to ensure proper upgrade
* [CI] improve netchecker connectivity testing
* [CI] show logs for tests
* [calico] tweak task name
* [CI] Don't run the provisioner from vagrant since we run it in testcases_run.sh
* [CI] move kube-router tests to vagrant to avoid network connectivity issues during netchecker check
* service proxy mode still fails connectivity tests so keeping it in manual mode
* [kube-router] account for containerd use-case
* Sketch of helm-apps role interface
* helm-apps: Early implementation and settings
* helm-apps: Fix README.md example playbook
* fixup! Sketch of helm-apps role interface
* Make the argument specs more explicit
* Remove exposed options from hardcoded default
* Simplify example playbook in README.md
- Define directly the roles parameters
- Add an example of option override for one chart only
* Use release instead of charts
Make explicit that the role is managing releases, not charts.
Simplify parameters naming
* Add epoch to docker-ce and docker-ce-cli packages to ensure docker upgrade
* Split container-engine redhat vars to support legacy RHEL 7 version management
* Support ansible_distribution_major_version when discovering vars with ansible_os_family
* Update ansible-lint to 5.4.0 (#8607)
It seems that Rich version 11.0.0 has a breaking change,
so we need to update ansible-lint to 5.3.2 or later.
* Fix for ansible-lint no-changed-when rule (#8607)
* There is an issue with etcd v3.5.0 where it resurrects ancient members; see: https://github.com/etcd-io/etcd/issues/13196
This issue is clearly fixed in etcd v3.5.2
* Just keep the checksums
* [containerd] add hashes for 1.6.1
* [containerd] make 1.6.1 the default
* [containerd] add hashes for 1.5.10
* [containerd] add hashes for 1.4.13
* [nerdctl] bump to 0.17.1
* [containerd] add checksums for 1.6.0
* [containerd] promote 1.6.0 as the new default
* [runc] promote 1.1.0 as the new default to allow arm deployments out of the box
* [nerdctl] bump to 0.17.0 to align with containerd 1.6.0
* [reset] allow crictl stopp and rmp commands to fail
As far as I can tell this is simply a typo that has existed from the beginning. Having it this way around (`etcd` group as a child and thus subset of `k8s_cluster`) mirrors what is written in the preceding sentence.
When trying to run print_hostnames of inventory.py, it outputs the following
error:
$ CONFIG_FILE=./test-hosts.yaml python3 ./inventory.py print_hostnames
Traceback (most recent call last):
File "./inventory.py", line 472, in <module>
sys.exit(main())
File "./inventory.py", line 467, in main
KubesprayInventory(argv, CONFIG_FILE)
File "./inventory.py", line 92, in __init__
self.parse_command(changed_hosts[0], changed_hosts[1:])
File "./inventory.py", line 415, in parse_command
self.print_hostnames()
File "./inventory.py", line 455, in print_hostnames
print(' '.join(self.yaml_config['all']['hosts'].keys()))
KeyError: 'all'
because it fails to load the hosts config file before printing hostnames.
This fixes the issue.
* [calico] upgrade 3.19.x to 3.19.4
* [calico] upgrade 3.20.x to 3.20.4
* [calico] upgrade 3.21.x to 3.21.4 and make it the default
* [calico] add 3.22.0 checksums
* [calico] account for path changes in calico 3.21.4 crd archive and above
Fixed the problem when calling ansible-playbook contrib/mitogen/mitogen.yml:
"The error was: 'dict object' has no attribute 'section'"
Use openstack_networking_port_v2 and openstack_networking_floatingip_associate_v2
to attach floating ips. This gives us more flexibility on disabling port security
when binding instances directly on provider networks in private cloud scenario.
If kubelet is run with systemd (as it always is when using kubespray),
it starts in systemd's /system.slice/kubelet.service cgroup.
This commit prevents the creation and usage of a second unrelated cgroup.
* terraform/gcp: Do not create unused subnetworks
By default terraform creates a subnetwork in each of 39 regions
* terraform/gcp: Upgrade to latest google provider
... where "one of source_tags, source_ranges, or source_service_accounts must be defined"
* Use sysctl_file_path variable for all sysctl_file locations
* Add sysctl_file_path variable to kubespray-defaults
* Remove previously used sysctl file locations if present
* Use explicit filename in roles/kubernetes/node/defaults/main.yml
* Defaults: use explicit value
targetRef on endpoints surfaces as
__meta_kubernetes_endpoint_address_target_kind/__meta_kubernetes_endpoint_address_target_name
in prometheus and gets converted to the label `node` by
prometheus-operator
* Fix terraform Warning
Version constraints inside provider configuration blocks are deprecated
Terraform 0.13 and earlier allowed provider version constraints inside the
provider configuration block, but that is now deprecated and will be removed
in a future version of Terraform. To silence this warning, move the provider
version constraint into the required_providers block.
* Fix terraform Warning: Quoted references are deprecated
* terraform: Update GCP Ubuntu to latest LTS
The tf-elastx_cleanup test job failed with the error message:
Port xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx cannot be deleted
directly via the port API: has device owner network:router_interface.
That means it is necessary to remove the subnet from the router before
deleting the port.
This adds a step that removes the subnet from the router automatically.
This allows working around #8375 by using image_command_tool=crictl
when containerd_registries is used for containerd.
Also changes image_info_command_on_localhost for docker to return digests.
This fixes the following types of failures:
- empty-string-compare
- literal-compare
- risky-file-permissions
- risky-shell-pipe
- var-spacing
In addition, this changes .gitlab-ci/lint.yml to block the same issue
by using the same method at Kubespray CI.
When running ansible-lint directly, we can see a lot of warning
messages like
risky-file-permissions File permissions unset or incorrect
This fixes the warning messages.
All container image versions were defined in download/defaults/main.yml
except containerd's.
The inconsistency meant the offline script (generate_list.sh) could not
output the URL of the containerd image.
This moves the definition into the valid file.
In addition, this adds host_os to generate_list.sh for downloading
krew from a valid URL.
* Update config.toml.j2
I think this code does not work completely.
For example, with registry address a.com:5000,
the insecure registry must be http://a.com:5000,
but this code adds the insecure registry as a.com:5000 (without http://).
If there is no http://, containerd accesses it with https even if insecure_skip_verify = true.
The solution is a code edit.
* Update config.toml.j2
* Update containerd.yml
* Update containerd.yml
* Update containerd.yml
* Update config.toml.j2
I deployed my cluster with a separate etcd cluster that does not intersect with kube_control_plane or kube_node. I wanted to run the etcd cluster in docker but still use containerd as the container runtime for all other nodes. Therefore, I added a note to this doc for everyone.
Thanks!
- Use builtin task scheduling of ansible (same task on each host)
instead of manual looping on master
Benefits:
- One less play in remove-node.yml playbook
- Parallel node drain
- Drain parameters (timeout, grace period, retries,
allow_ungraceful_removal) can be adjusted separately for each node
with ansible variables (see the sketch after this list)
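For example, per-node host vars of the kind described might look like this (variable names follow the entry above; the exact names in the codebase may differ):
```yaml
# host_vars/node1.yml (illustrative)
drain_timeout: 360s
drain_grace_period: 90
drain_retries: 3
allow_ungraceful_removal: false
```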
* Ensure entries for 1.23 are added for supported_versions vars
* cri-o: add support for kubernetes 1.23 but still use cri-o 1.22
* kubescheduler-config: differentiate config versions based on kube_version
* registry: service add clusterIP, nodePort, loadBalancer support
* modify camelcase name to underscore
* Add registry service type compatibility check
* containerd: change default resolvconf_mode to host_resolvconf
* Wait for kube-apiserver to come back after pod refresh
* Handle resolv.conf gracefully
* Retain currently configured DNS entries to ensure we don't break the resolvers
* Suse uses wickedd for network management so no dhcp hooks
* Molecule: increase ansible timeout
* CI: Increase ansible timeout to 120s for Packet jobs
* Improve control plane scale flow (#13)
* Added version 1.20.10 of K8s
* Setting first_kube_control_plane to an existing one
* change first_kube_master to first_kube_control_plane
* Ansible-lint changes
If trying to pull the k8scsi/csi-resizer image from gcr.io, we face an error
like:
$ docker pull gcr.io/k8scsi/csi-resizer:v1.0.0
Error response from daemon: Head https://gcr.io/v2/k8scsi/csi-resizer/
manifests/v1.0.0: unknown: Project 'project:k8scsi' not found or deleted.
$
We can pull the image from quay.io instead.
This fixes the issue.
* containerd: add hashes for 1.5.8 and 1.4.12 and make 1.5.8 the new default
* containerd: make nerdctl mandatory for container_manager = containerd
* nerdctl: bump to version 0.14.0
* containerd: use nerdctl for image manipulation
* OpenSuSE: install basic nerdctl dependencies
* set ingress-nginx default terminationGracePeriodSeconds to 5 min for the drain of connection
* Add ingress_nginx_termination_grace_period_seconds to the sample inventory (illustrated below)
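The new knob in the sample inventory might look like this (300 seconds matching the 5-minute default named above):
```yaml
# inventory/sample/group_vars/k8s_cluster/addons.yml (illustrative placement)
ingress_nginx_termination_grace_period_seconds: 300
```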
Installing containerd on CentOS requires downloading its binary,
but offline.yml has no value for that.
The default binary download URLs are in:
roles/download/defaults/main.yml:runc_download_url: "https://github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}"
roles/download/defaults/main.yml:containerd_download_url: "https://github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
If I use the default offline.yml, the 'download files' task errors out
because the runc and containerd download URLs are not offline.
I want to fix this by just adding 2 new lines (see the sketch after this entry).
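The two added lines would mirror the defaults quoted above, pointed at a local mirror; a sketch assuming a `files_repo` variable as used elsewhere in offline setups:
```yaml
# offline.yml (sketch)
runc_download_url: "{{ files_repo }}/runc.{{ image_arch }}"
containerd_download_url: "{{ files_repo }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
```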
* Docs: update CONTRIBUTING.md
* Docs: clean up outdated roadmap and point to github issues instead
* Docs: update note on kubelet_cgroup_driver
* Docs: update kata containers docs with note about cgroup driver
* Docs: note about CI specific overrides
* Defaults: replace docker with containerd as our default container_manager
* CI: Use docker for download_localhost test
* Defaults: with container_manager=containerd we need etcd_deployment_type=host
* CI: Run weave jobs with docker
* CI: Vagrant don't download_force_cache
* CI: Fix upgrade tests
* should run compatible with old settings, this means docker
* we need to run with a distro that has at least modern containerd,
this means move from debian9 to debian10 to allow `containerd_version`
to match between 2.17 and master
* add metallb auto-assign property for main IP range & update addons.yml for sample inventory
* add new line at the end of file roles/kubernetes-apps/metallb/defaults/main.yml
* set default value for matallb_auto_assign = true
* Fixes various issues in vSphere Terraform code
Provided to address various shortcomings and to fix the following
issue in upstream Kubespray:
https://github.com/kubernetes-sigs/kubespray/issues/8176
* Resolves Terraform formatting issues
* Sets default prefix to human-readable name
* Documents new default prefix in README
* Ansible: separate requirements files for supported ansible versions
* Ansible: allow using ansible 2.11
* CI: Exercise Ansible 2.9 and Ansible 2.11 in a basic AIO CI job
* CI: Allow running a reset test outside of idempotency tests and running it in stage1
* CI: move ubuntu18-calico-aio job to stage2 and relay only on ubuntu20 with the variously supported ansible versions for stage1
* CI: add capability to install collections or roles from ansible-galaxy to mitigate missing behavior in older ansible versions
* Kata-containers: Fix for ubuntu and centos: sometimes kata containers fail to start because of access errors to /dev/vhost-vsock and /dev/vhost-net
* Kata-containers: use similar testing strategy as gvisor
* Kata-Containers: adjust values for 2.2.0 defaults
Make CI tests actually pass
* Kata-Containers: bump to 2.2.2 to fix sandbox_cgroup_only issue
* Limit kubectl delete node to k8s nodes
This avoids the use of `kubectl delete node` when removing etcd nodes
which are not part of the cluster (separate etcd)
* Take errors into account when deleting node
There should not be errors now that we're limiting the deletion to nodes
actually in the cluster
* Retrying on error
* Ensure addon-resizer 1.8.11 is only effective on arch amd64.
k8s.gcr.io/addon-resizer:1.8.11 returns the amd64 image, which is not executable on arm64.
Disable addon-resizer when the platform is not amd64.
When metrics-server upgrades and uses addon-resizer:2.3, revert this
commit and `image_arch` will determine the `addon_resizer_image_tag`.
* Add metrics_server_resizer architectures check
* nginx-ingress: bump to 1.0.4
* Disable builtin ssl_session_cache solving the problem with OpenSSL consuming memory.
* Print warning only instead of error if no IngressClass permission is available.
* nginx-ingress: bump to 1.0.4 in the README
* Containerd: download containerd from upstream instead of using distro specific packages
split runc download to separate role
make bootstrap-os role deploy container-selinux and seccomp libraries
clean up package manager provided containerd
move variables to docker role that are no longer common with containerd
* Containerd: make molecule testing more relevant
* replace ubuntu18 with ubuntu20
* add centos8 and debian11 to molecule tests
* run kubernetes/preinstall role to ensure relevancy
of test including dependency packages
* CI: adjust test scenarios for downloaded containerd
kube-bench scan outputs warning related to Calico like:
* text: "Ensure that the Container Network Interface file
permissions are set to 644 or more restrictive (Manual)"
* text: "Ensure that the Container Network Interface file
ownership is set to root:root (Manual)"
This fixes these warnings.
* netchecker: update images to 1.2.2 from Mirantis, which is slightly less ancient than the l23networks images
* Netchecker: use local etcd instead of kubernetes v1beta1 CRDs which are no longer supported by kube 1.22+
* Add Rocky as a known OS
* Make sure Rocky includes bootstrap-centos.yml
* Update docs with Rocky Linux
* Rocky Linux wireguard and EPEL
* Rocky Linux in the list of supported distributions
If the etcd cluster is separate and the etcd_deployment_type is "host",
there is no need for a container engine on the etcd nodes
Do not rely on a 'default(true)' filter, but define a proper default in
kubespray-defaults depending on etcd deployment method and if internal
or external etcd is used
to remove deprecation warning:
> Flag --feature-gates has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag.
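Instead of the deprecated flag, gates go into the KubeletConfiguration file passed via --config; a minimal sketch (the gate name is a placeholder):
```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SomeFeatureGate: true  # placeholder; real gate names depend on the k8s version
```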
Kubespray deployment failed when using the containerd backend on nodes where apparmor was not installed or was previously removed. This PR ensures apparmor is installed by adding it to the required_pkgs var.
* Fix invalid link to Ansible documentation
* Fix invalid link to mitogen doc page
* Fix invalid link to calico doc page
* Fix all invalid links to doc pages
The addon-resizer container can reduce the cpu and memory resource
limits of the metrics-server container in the pod, and that caused
OOMKills.
In addition, the original metrics-server manifest doesn't contain
the addon-resizer container as [1].
So this adds metrics_server_resizer option to control the addon-resizer
container deployment and the default value is false to make it stable
for most environments.
[1]: 527679e5e8/manifests/base/deployment.yaml
"allowPrivilegeEscalation: false" blocks deploying metrics-server
on CentOS7. In addition, the original metrics-server manifest doesn't
contain it as [1]. This removes it.
[1]: 527679e5e8/manifests/base/deployment.yaml
* Kata-Containers: add 2.2.0 hashes and make default
* Kata-Containers: replace 2.1.0 with bugfix version 2.1.1
* Kata-Containers: move to q35 a more modern VM architecture as 'pc' is removed in 2.2.0
The path of the kubeconfig should be configurable, and its default value
is /etc/kubernetes/admin.conf. Most references to the file are configurable,
but some were not. This makes those configurable.
* Calico: make calico_min_version check relevant
* Calico: only check currently installed version against the oldest supported version by the previous release
Modify connection_strings_etcd to only return etcd nodes - not master nodes - since this results in duplicate hosts in the generated Ansible inventory and is unnecessary.
On Debian 11, `ipset` only recommends `iptables`, so on systems where apt is configured with `APT::Install-Recommends "0";`, iptables will not be installed automatically.
* Fix: adding new ips with inventory builder (#7577)
* moved config loading logic
to after checking whether the config
should be loaded, and added a check for
whether the config should be loaded
* added check for removing nodes from config
if the user wants to remove a node, we
need to load the config
* Fix tox errors
* Fix missing file mode (risky-file-permissions)
Found this using ansible-lint.
Signed-off-by: Bryan Hundven <bryanhundven@gmail.com>
* Fix another missing file mode (risky-file-permissions)
This one fixes `/etc/crio/config.json`
Signed-off-by: Bryan Hundven <bryanhundven@gmail.com>
* CSI: update CSI snapshot CRDs
* CSI: update snapshot controller tag version with kubernetes specific versions
* CSI: allow enabling csi_snapshot_controller independent of Cinder CSI
* CSI: Align csi-snapshot-controller with upstream and use a Deployment instead of a StatefulSet
When using Calico with:
- `calico_network_backend: vxlan`,
- `calico_ipip_mode: "Never"`,
- `calico_vxlan_mode: "Always"`,
the `FelixConfiguration` object has `ipipEnabled: true`, when it should be false:
This is caused by an error in the `| bool` conversion in the install task:
when `calico_ipip_mode` is `Never`,
`{{ calico_ipip_mode != 'Never' | bool }}` evaluates to `true`:
* Fedora and RHEL use etc_t and the convention is <type_name>_t
* Docs: specify all values for preinstall_selinux_state
* CI: Add Fedora 34 with SELinux in enforcing mode
Using the --no-cache-dir flag in pip install makes sure packages downloaded
by pip are not cached on the system. This is a best practice which ensures
fetching from the repo instead of using a local cached copy. Further, in the
case of Docker containers, by restricting caching we can reduce image size.
In terms of stats, it depends upon the number of python packages
multiplied by their respective sizes, e.g. for heavy packages with a lot
of dependencies it saves a lot by not caching pip packages.
Further, more detailed information can be found at
https://medium.com/sciforce/strategies-of-docker-images-optimization-2ca9cc5719b6
Signed-off-by: Pratik Raj <rajpratik71@gmail.com>
Fix task 'Cert Manager | Wait for Webhook pods become ready' failing because webhook pods don't exist yet, by using the `retries..until` trick like kubernetes-sigs/kubespray#7842 (sketched below).
This fix should be removed in the future if kubernetes/kubernetes#83242 is resolved.
Signed-off-by: rtsp <git@rtsp.us>
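The `retries..until` trick referenced above, in sketch form (the command, label and limits are illustrative):
```yaml
- name: Cert Manager | Wait for Webhook pods become ready
  command: "{{ bin_dir }}/kubectl get pods -n cert-manager -l app=webhook -o jsonpath='{.items[*].status.phase}'"
  register: webhook_pods
  until: webhook_pods.stdout == "Running"  # keep polling until the pod exists and is Running
  retries: 30
  delay: 10
```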
The main functions are wrapped by a sys.exit function which expects an
argument. The current implementation doesn't return values in all cases.
This change ensures main functions return a value in all cases.
Fix task 'Cert Manager | Apply ClusterIssuer manifest' failing because the service/endpoints update is delayed even though the webhook pod status is ready.
Signed-off-by: rtsp <git@rtsp.us>
Changes:
* ClusterRole updated according to the latest manifests from
https://github.com/kubernetes/cloud-provider-vsphere
* vSphere CPI/CSI default versions bumped and
tested successfully on K8S 1.21.1
* vSphere documentation updated
Signed-off-by: Vitaliy D <vi7alya@gmail.com>
non-kubeadm mode was removed in ddffdb63bf
2.5 years ago. The non-kubeadm references cause unnecessary confusion today,
so this updates the documentation.
Previously, IDs of container images were obtained from the tar files of container
images, but that approach was wrong. If multiple json files are contained in a
tar file, the script got multiple IDs and tried to pass all of them to the
`docker tag` command, which then failed.
This updates the script to get image IDs from the `docker image inspect` command
to fix the issue.
In addition, this adds a check whether a registry container already exists
before deploying the registry container, to avoid a container conflict failure.
* CRI-O: Install libseccomp2 from backports on Debian 10
libseccomp2 is a required dependency of cri-o-runc package
The one provided in Debian 10 repositories is outdated
* 7816: Remove useless when condition
As this condition is handled by block
* fix(misc): terraform/aws
- handles deployment with a single availability zone
- handles deployment with more than two availability zone
- handles etcd collocation with control-plane nodes (`aws_etcd_num=0`)
- allows to set a bastion instances count (`aws_bastion_num`)
- allows to set bastion/etcd/control-plane/workers rootfs volume size
- removes variables from terraform.tfvars that were not re-used
- adds .terraform.lock.hcl to .gitignore
- changes/updates base image from ubuntu-18.03 to debian-10
tested by a few coworkers of mine, and myself: thanks for the outstanding
work, on both those terraform samples and kubespray playbooks.
I did not test ubuntu deployments, I could still swap from buster to
focal. LMK.
* fix(gitlab-ci)
AFAIU, terraform.tfvars indentation should be fixed so that no diff
is returned when running `terraform fmt -check -diff`
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/1445622114
To download necessary files in advance for offline deployment,
we can see all file URLs with contrib/offline/generate_list.sh.
Most URLs are downloadable, but gvisor's are not, because the listed
URL is only part of the full URLs for gvisor.
To make gvisor's files downloadable directly from the URLs, this separates
them into two URLs, for runsc and the shim.
tf-elastx_ubuntu18-calico is flaky today. The test job fails
due to an SSH connectivity check error after deploying the virtual machines
which are used as Kubernetes nodes.
This allows failure on the job to observe the test situation without
failing pull request merges.
When running the script, I faced the following error, but it was
difficult to know the root problem due to a lack of error handling.
"docker tag" requires exactly 2 arguments.
See 'docker tag --help'.
Usage: docker tag SOURCE_IMAGE[:TAG] TARGET_IMAGE[:TAG]
Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
To investigate such errors easily, this adds error handling.
* csi-driver: Added possibility to use application credentials for cinder
* external-cloud-controller: Added env vars for openstack application credentials
* set selinux type t_etc if selinux state is enforcing
* workaround with update repo is no longer needed
remove comments about failing playbook
* grubby is not available in distros using ostree
* remove docker support because removed in fcos
update install script example with live rootfs
* do not call grubby on ostree based distro
* update docs enabling containerd on fedora coreos
* Ansible: move to Ansible 3.4.0 which uses ansible-base 2.10.10
* Docs: add a note about ansible upgrade post 2.9.x
* CI: ensure ansible is removed before ansible 3.x is installed to avoid pip failures
* Ansible: use newer ansible-lint
* Fix ansible-lint 5.0.11 found issues
* syntax issues
* risky-file-permissions
* var-naming
* role-name
* molecule tests
* Mitogen: use 0.3.0rc1 which adds support for ansible 2.10+
* Pin ansible-base to 2.10.11 to get package fix on RHEL8
* terraform/openstack: Use path.root for ansible_bastion_template.txt
The path.root variable points to the root module path. Using this
instead of a relative path makes less assumptions about the current
working directory.
* terraform/openstack: Add group_vars_path variable
Previously, the group_vars path was assumed to be in CWD. The
default value for the group_vars_path variable is still relative
to CWD and thus should be backwards compatible if unset.
* Calico: align manifests with upstream
* allow enabling typha prometheus metrics
* Calico: enable eBPF support
* manage the kubernetes-services-endpoint configmap
* Calico: document the use of eBPF dataplane
* Calico: improve checks before deployment
* enforce disabling kube-proxy when using eBPF dataplane
* ensure calico_version is supported
* Kata: add Kata 2.x checksums and adjust download urls for 2.x
* Kata: drop 1.x version which is no longer supported
* Kata: set default version 2.1.0
* Packet->Equinix Metal rename #6901
Updates throughout to reflect #6901 renaming for Packet to Equinix Metal.
* Rename Packet to Equinix Metal throughout the project #6901
Packet is renamed to Equinix Metal in more contexts including
documentation links. The Terraform provider used is still the Packet
provider. The environment variables and configuration options still
refer to the Packet name.
Signed-off-by: Marques Johansson <mjohansson@equinix.com>
Co-authored-by: Edward Vielmetti <ed@packet.net>
* Calico: add v3.19.1 hashes
* enable liveness probe for calico-kube-controllers 3.19.1
* Calico: drop support for v3.16.x
* Calico: promote v3.18.3 as default
* Override the default value of containerd's root, state, and oom_score configurations
* Add tests data for containerd_storage_dir, containerd_state_dir and containerd_oom_score variables
Basically we need to open the necessary TCP/UDP ports.
However, there are many such ports, and when facing deployment issues
it is sometimes difficult to figure out whether they are due to
firewall issues or not.
To distinguish the root problem in such situations, this adds a contrib
playbook to disable the service firewall for Kubespray development
and testing.
* add support for using ansible 2.10.x for deploying kubespray
* move dns-autoscaler-clusterrole{binding}.yml to files/ folder
* note that ansible 2.10 is now experimentally supported
* coredns: move files to templates like before #4341
* add initial MetalLB docs
* metallb allow disabling the deployment of the metallb speaker
* calico>=3.18 allow using calico to advertise service loadbalancer IPs
* Document the use of MetalLB and Calico
* clean MetalLB docs
Since K8S 1.21, BoundServiceAccountTokenVolume feature gate is in beta stage, thus activated by default (anyone who follows CSI guidelines has enabled AllAlpha and faced the issue before 1.21).
With this feature, SA tokens are regenerated every hour.
As a consequence for Calico CNI, the token in /etc/cni/net.d/calico-kubeconfig copied from /var/run/secrets/kubernetes.io/serviceaccount in the install-cni initContainer expires after one hour, and any pod creation fails as unauthorized.
Calico pods need to be restarted so that /etc/cni/net.d/calico-kubeconfig is updated with the new SA token.
follow new naming conventions for gcr's coredns image.
starting from 1.21 kubeadm assumes it to be `coredns/coredns`:
this causes the kubeadm deployment being unable to pull the image, because `v`
was also added in the image tag, until the role `kubernetes-apps` overrides
it with the old name, which is only compatible with <=1.7.
Backward compatibility with kubeadm <=1.20 is maintained by checking the
kubernetes version and falling back to old names (`coredns:1.xx`) when
the version is less than 1.21
* rename ansible groups to use _ instead of -
k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating
Note: kube-node,k8s-cluster groups in upgrade CI
need clean-up after v2.16 is tagged
* ensure old groups are mapped to the new ones
* crio: add supported versions 1.20 and 1.21 and align default with k8s version
* cri-o: drop versions 1.17 and 1.18 from version matrix
* update note on cri-o version alignment
* calico: drop support for version 3.15
* drop check for calico version >= 3.3, we are at 3.16 minimum now
* we moved to calico 3.16+ so we can default to /opt/cni/bin/install
* AlmaLinux: ansible>2.9.19 is needed to know about AlmaLinux
* AlmaLinux: identify as a centos derivative
* AlmaLinux: add AlmaLinux to checks for CentOS
* Use ansible_os_family to compare family and not distribution
As the official document [1] says, the parameter keepcache should be
'0' or '1' as a string. To avoid the following warning message,
this fixes the parameter value (a sketch follows this entry):
[WARNING]: The value False (type bool) in a string field was
converted to u'False' (type string). If this does not look
like what you expect, quote the entire value to ensure it
does not change.
https://docs.ansible.com/ansible/latest/collections/ansible/builtin/yum_repository_module.html
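A sketch of the fixed parameter, quoted so it stays a string (repository details are placeholders):
```yaml
- name: Add example repository
  ansible.builtin.yum_repository:
    name: example
    description: Example repository
    baseurl: https://example.com/repo
    keepcache: "0"  # quoted string, not a bool, to avoid the conversion warning
```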
Context: Load-balancing in Exoscale is performed by associating many
workers with the same EIP. This works, however, the workers cannot access
themselves via the EIP, which is needed at least for cert-managers
"self-test".
Problem: The old iptables based workaround felt fragile and disappointed
me at least once.
New solution: Add the EIP to a loopback interface on each worker.
* Add containerd_extra_args
This is useful for custom containerd config, e.g. auth (see the example after this entry)
Signed-off-by: Zhong Jianxin <azuwis@gmail.com>
* Make containerd config.toml mode 0640
It may contain sensitive information like password
Signed-off-by: Zhong Jianxin <azuwis@gmail.com>
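An illustrative use of `containerd_extra_args` for registry auth (registry name and credentials are placeholders):
```yaml
containerd_extra_args: |
  [plugins."io.containerd.grpc.v1.cri".registry.configs."registry.example.com".auth]
    username = "user"
    password = "s3cret"
```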
This PR is to move the cilium kvstore options to the configmap
rather than specifying them in the deployment as args. This
is not technically necessary but keeping all the options in
one place is probably not a bad idea.
Tested with cilium 1.9.5.
When attempting a fresh install without cilium_ipsec_enabled I ran
into the following error:
failed: [k8m01] (item={'name': 'cilium', 'file': 'cilium-secret.yml', 'type': 'secret', 'when': 'cilium_ipsec_enabled'}) =>
{"ansible_loop_var": "item", "changed": false, "item": {"file": "cilium-secret.yml", "name": "cilium", "type": "secret",
"when": "cilium_ipsec_enabled"},"msg": "AnsibleUndefinedVariable: 'cilium_ipsec_key' is undefined"}
Moving the when condition from the item level to the task level solved
the issue.
* Add KubeSchedulerConfiguration for k8s 1.19 and up
With release of version 1.19.0 of kubernetes KubeSchedulerConfiguration
was graduated to beta. It allows to extend different stages of
scheduling with profiles. Such effect is achieved by using plugins and
extensions.
This patch adds KubeSchedulerConfiguration for versions 1.19 and later.
Configuration is set to k8s defaults or to kubespray vars. Moving those
defaults to new vars will be done in following patch.
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
* KubeSchedulerConfiguration: add defaults
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
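A minimal example of the kind of configuration introduced (v1beta1 is the KubeSchedulerConfiguration API version for k8s 1.19; the values shown are the upstream defaults):
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
profiles:
  - schedulerName: default-scheduler
```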
* Add documentation for audit webhook variables
* Enclose the value of audit_webhook_server_url in a codeblock
* Add default value for audit_webhook_batch_max_wait
Starting with Cilium v1.9 the default ipam mode has changed to "Cluster
Scope". See:
https://docs.cilium.io/en/v1.9/concepts/networking/ipam/
With this ipam mode Cilium handles assigning subnets to nodes to use
for pod ip addresses. The default Kubespray deploy uses the Kube
Controller Manager for this (the --allocate-node-cidrs
kube-controller-manager flag is set). This makes the proper ipam mode
for kubespray using cilium v1.9+ "kubernetes".
Tested with Cilium 1.9.5.
This PR also mounts the cilium-config ConfigMap for this variable
to be read properly.
In the future we can probably remove the kvstore and kvstore-opt
Cilium Operator args since they can be in the ConfigMap. I will tackle
that after this merges.
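The relevant cilium-config ConfigMap key per the description above (a fragment, not the full ConfigMap):
```yaml
# data section of the cilium-config ConfigMap
ipam: kubernetes  # let the Kube Controller Manager allocate per-node pod CIDRs
```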
When upgrading cilium from 1.8.8 to 1.9.5 I ran into the following
error:
level=error msg="Unable to update CRD" error="customresourcedefinitions.apiextensions.k8s.io
\"ciliumnodes.cilium.io\" is forbidden: User \"system:serviceaccount:kube-system:cilium-operator\"
cannot update resource \"customresourcedefinitions\" in API group \"apiextensions.k8s.io\" at the
cluster scope" name=CiliumNode/v2 subsys=k8s
The fix was to add the update verb to the clusterrole. I also added
create to match the clusterrole created by the cilium helm chart.
DNSSEC is off by default on ubuntu/bionic64 (18.04) as per resolved.conf(5).
These tasks are artefacts of obsolete infra configuration, and no longer needed.
Furthermore, removing these tasks resolves the issue that the tasks always report
'changed' and bounce systemd-resolved unnecessarily, even if there was no
actual modification of /etc/systemd/resolved.conf.
* Remove contrib/vault
This is marked as broken since 2018 / 3dcb914607
It still references apiserver.pem, not used since ddffdb63bf
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Finish nuking vault from the codebase
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
This replaces kube-master with kube_control_plane because of [1]:
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
NOTE: The reason why this changes it to kube_control_plane not
kube-control-plane is for valid group names on ansible.
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
While at it remove force_certificate_regeneration
This boolean only forced the renewal of the apiserver certs
Either manually use k8s-certs-renew.sh or set auto_renew_certificates
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Add crun download_url and checksum
* Change versioning format to crun native versioning
* Download crun using download_file.yml
* Get crun version from download defaults
* Delegate crun binary copy task to crun role
* Download Calico KDD CRDs
* Replace kustomize with lineinfile and use ansible assemble module
* Replace find+lineinfile by sed in shell module to avoid nested loop
* add condition on sed
* use block for kdd tasks + remove supernumerary kdd manifest apply in start "Start Calico resources"
"The error was: 'proxy_disable_env' is undefined\n\nThe error appears to
be in '<censored>scale.yml': line 72, column 7"
Fixes 067db686f6
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
upgrades.md explains how to upgrade from v1.4.3 to v1.4.6 as an
example. The versions are a little old, and readers of the doc could
be concerned whether the upgrade still works fine or not.
This updates the versions after verifying by hand that the procedure works fine.
* terraform support for UpCloud
* Updates to README.md and main.tf files
* formatting and updating readme
* added a .terraform_validate CI job
* fixed format issue
* added sample inventory
* added symbolic link to group_vars
* added missing tf variables and minor fixes
* added text formatting
* minor formatting fixes
* Update ansible to v2.9.18
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
* Update jinja2 to v2.11.3
Signed-off-by: Maciej Wereski <m.wereski@partner.samsung.com>
* add nodeselector and tolerations for metallb
* remove unnecessary commented lines in metallb template
* set default speaker toleration to match original manifest
When privileged is enabled for a container, all the `/dev/*` block
devices from the host are mounted into the guest. The
`privileged_without_host_devices` flag prevents host devices from
being passed to privileged containers.
More information:
* https://github.com/containerd/cri/pull/1225
* 1d0f68156b
The important action in kubeadm-version.yml is the templating of the configuration,
not finding / setting the version
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
kubeadm has been the default for a long time now,
and admin.conf is created by it, so let kubeadm handle it
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
Using `kubeadm init phase kubeconfig all` breaks kubelet client certificate rotation
as we are missing `kubeadm init phase kubelet-finalize all` to point to `kubelet-client-current.pem`
The kubeconfig format is stable, so let's just use lineinfile;
this will avoid further breakage in the future.
This reverts to the logic before 6fe2248314
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
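A minimal sketch of the lineinfile approach (paths and regexps are illustrative, not the exact Kubespray tasks):
```yaml
# Point the kubelet kubeconfig at the auto-rotated client certificate/key
- name: Use kubelet-client-current.pem for the client certificate
  lineinfile:
    path: /etc/kubernetes/kubelet.conf
    regexp: '^    client-certificate(-data)?:'
    line: '    client-certificate: /var/lib/kubelet/pki/kubelet-client-current.pem'

- name: Use kubelet-client-current.pem for the client key
  lineinfile:
    path: /etc/kubernetes/kubelet.conf
    regexp: '^    client-key(-data)?:'
    line: '    client-key: /var/lib/kubelet/pki/kubelet-client-current.pem'
```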
On CentOS 8 they seem to be ignored by default, but better be extra safe.
This also makes it easy to exclude other network plugin interfaces.
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Add terraform scripts for vSphere
* Fixup: Add terraform scripts for vSphere
* Add inventory generation
* Use machines var to provide IPs
* Add README file
* Add default.tfvars file
* Fix newlines at the end of files
* Remove master.count and worker.count variables
* Fixup cloud-init formatting
* Fixes after initial review
* Add warning about disabled DHCP
* Fixes after second review
* Add sample-inventory
* use external_openstack_lbaas_use_octavia for template openstack-cloud-config
* Delete external_openstack_lbaas_use_octavia from default values. Added description and default values of variables to docs
* markdown fix
* make this simple
* set external_openstack_lbaas_use_octavia in default values
* duplicated variable in doc
This replaces KUBE_MASTERS with KUBE_CONTROL_HOSTS because of [1]:
```
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
```
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
Since a790935d02 all proxy users
should be properly configured
Now when you have *_PROXY vars in your environment it can lead to failures
if NO_PROXY is not correct, or to persistent configuration changes
as seen with kubeadm in 1c5391dda7
Instead of playing constant whack-a-bug, inject empty *_PROXY vars everywhere
at the play level, and override at the task level when needed
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
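A sketch of the play-level injection, presumably close to what the `proxy_disable_env` dict mentioned earlier expands to (host group and role names illustrative):
```yaml
# Blank out inherited proxy settings for every task in the play;
# tasks that genuinely need a proxy override `environment` locally.
- hosts: k8s_cluster
  environment:
    http_proxy: ""
    https_proxy: ""
    no_proxy: ""
    HTTP_PROXY: ""
    HTTPS_PROXY: ""
    NO_PROXY: ""
  roles:
    - kubespray-defaults
```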
Before this commit, we were gathering:
1 !all
7 network
7 hardware
After this change we are gathering:
1 !all
1 network
1 hardware
ansible_distribution_major_version is gathered by '!all'
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Move proxy_env to kubespray-defaults/defaults
There is no reason to use set_fact here
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
* Ensure kubeadm doesn't use proxy
*_proxy variables might be present in the environment (/etc/environment, bash profile, ...)
When this is the case we end up with that proxy configuration in the /etc/kubernetes/manifests/kube-*.yaml manifests
We cannot unset env variables, but kubeadm is nice enough to ignore empty vars
93d288e2a4/cmd/kubeadm/app/util/env.go (L27)
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
The Ubuntu 18.04 crio package ships with 'mountopt = "nodev,metacopy=on"'
even though the GA kernel is 4.15, which lacks overlayfs metacopy support
(metacopy needs kernel 4.19+; the HWE kernel can be more recent).
The Fedora package ships without metacopy=on.
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
By default the Ansible stat module computes a checksum, lists extended attributes, and finds the MIME type.
To find all stat invocations that really use one of those:
git grep -F stat. | grep -vE 'stat.(islnk|exists|lnk_source|writeable)'
Signed-off-by: Etienne Champetier <e.champetier@ateme.com>
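For invocations that only test existence, the expensive fields can be switched off; a sketch (path and variable names illustrative):
```yaml
# Only stat.exists is consumed, so skip checksum/xattr/mime computation
- name: Check if the kubeadm binary is present
  stat:
    path: "{{ bin_dir }}/kubeadm"
    get_checksum: false
    get_attributes: false
    get_mime: false
  register: kubeadm_binary
```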
`containerd.io` is the companion package of `docker-ce` and is the
proper package name. This is needed to prevent apt upgrade/dist-upgrade
from breaking Kubernetes.
When running remove-node.yml to clean up a cluster on Fedora CoreOS,
the "reset | Restart network" task failed to restart the network daemon.
Fedora CoreOS uses NetworkManager, but this task tried to restart a service named network.
Signed-off-by: Takashi IIGUNI <iiguni.tks@gmail.com>
* Add unique annotation on coredns deployment and only remove existing deployment if annotation is missing.
* Ignore errors when gathering coredns deployment details to handle case where it doesn't exist yet
* Remove run_once, delegate_to and add to when statement
* Added force_etcd_cert_refresh var to maintain existing functionality. Broke etcd node cert syncing out of the member and admin cert sync logic; now the first etcd node syncs node certs to the other etcd members on every run, keeping all etcd nodes up to date after worker nodes are added to the cluster
* Updated etcd cert check tasks to better detect when new certificates need to be generated
* Move usage of force_etcd_cert_refresh var to gen_certs fact set
* Force etcd cert generation per server if force_etcd_cert_refresh is set to true
* Include gathering of node certs even if the node is a k8s-cluster member and also in the etcd group.
* Removed run_once due to when statement
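With this change an operator can still force a full refresh when needed; a minimal sketch using the variable introduced above:
```yaml
# Regenerate etcd certificates on the next run, e.g.:
#   ansible-playbook -i inventory/mycluster/hosts.yml cluster.yml -e force_etcd_cert_refresh=true
force_etcd_cert_refresh: true
```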
Helm v3.5.2 is a security (patch) release. Users are strongly
recommended to update to this release. It fixes two security issues in
upstream dependencies and one security issue in the Helm codebase.
See https://github.com/helm/helm/releases/tag/v3.5.2
This makes the docker role work the same as the containerd role.
Being able to override this is needed when you have your own Debian
repository, e.g. when performing an airgapped installation.
* contrib/terraform/exoscale: Rework SSH public keys
Exoscale has a few limitations with `exoscale_ssh_keypair` resources.
Creating several clusters with these scripts may lead to an error like:
```
Error: API error ParamError 431 (InvalidParameterValueException 4350): The key pair "lj-sc-ssh-key" already has this fingerprint
```
This patch reworks handling of SSH public keys. Specifically, we rely on
the more cloud-agnostic way of configuring SSH public keys via
`cloud-init`.
* contrib/terraform/exoscale: terraform fmt
* contrib/terraform/exoscale: Add terraform validate
* contrib/terraform/exoscale: Inline public SSH keys
The Terraform scripts need to install some SSH key, so that Kubespray
(i.e., the "Ansible part") can take over. Initially, we pointed the
Terraform scripts to `~/.ssh/id_rsa.pub`. This proved to be suboptimal:
Operators sharing responsibility for a cluster risk unnecessarily replacing resources.
Therefore, it has been determined that it's best to inline the public
SSH keys. The chosen variable `ssh_public_keys` provides some uniformity
with `contrib/azurerm`.
* Fix Terraform Exoscale test
* Fix Terraform 0.14 test
* update local-path-storage config template to version v0.0.19
* changes local_path_provisioner image tag to v0.0.19
* removes copy paste example from rancher local-path-provisioner repo
According to the following recommendation, this moves the directory
to control-plane:
The Kubernetes project is moving away from wording that is considered
offensive. A new working group WG Naming was created to track this work,
and the word "master" was declared as offensive. A proposal was formalized
for replacing the word "master" with "control plane".
Fixes the following error when using Bastion Node with the sample config.
```
fatal: [bastion]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'bastion'\n\nThe error appears to be in '/home/felix/inovex/kubespray/roles/bastion-ssh-config/tasks/main.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: set bastion host IP\n ^ here\n"}
```
The previous check for the presence of NetworkManager assumed "systemctl show
NetworkManager" would exit with a nonzero status code when the unit is absent,
which no longer seems to be the case on recent Flatcar Container Linux.
The new check instead tests whether NetworkManager is active, since
`is-active` implies presence.
Signed-off-by: Jorik Jonker <jorik@kippendief.biz>
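A sketch of such an activeness check (illustrative, not necessarily the exact task):
```yaml
# `systemctl is-active` exits nonzero when the unit is inactive or
# missing, so activeness implies presence
- name: Check whether NetworkManager is active
  command: systemctl is-active --quiet NetworkManager
  register: nm_active
  failed_when: false
  changed_when: false
```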
This was introduced in 143e2272ff
The "extras" repo is enabled by default on CentOS, and is not the right repo for EL8
Instead of adding a CentOS repo to RHEL, enable the needed RHEL repos with rhsm_repository
For RHEL 7, we need the "extras" repo for container-selinux
For RHEL 8, we need the "appstream" repo for container-selinux, ipvsadm and socat
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
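A sketch with the `rhsm_repository` module; the repo id below is illustrative and depends on the RHEL version and architecture:
```yaml
# RHEL 8: container-selinux, ipvsadm and socat live in AppStream
- name: Enable the RHEL 8 AppStream repo
  rhsm_repository:
    name: rhel-8-for-x86_64-appstream-rpms
    state: enabled
  when: ansible_distribution_major_version == "8"
```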
Only checking the kubernetes api on the first master when upgrading is not enough.
Each master needs to be checked before its upgrade.
Signed-off-by: Rick Haan <rickhaan94@gmail.com>
yum_repository expects quite different params, so there is nothing to factor here
Ubuntu is not an ansible_os_family, the OS family for Ubuntu is Debian
Check for ansible_pkg_mgr == apt
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
we don't need rpm_key, so nothing to factor here
Ubuntu is not an ansible_os_family, the OS family for Ubuntu is Debian
Check for ansible_pkg_mgr == apt
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
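The resulting condition pattern, sketched here with an illustrative task (the key URL is an assumption, not a quote from the role):
```yaml
# Ubuntu reports ansible_os_family == "Debian", so key on the
# package manager instead of the OS family
- name: Add the Docker apt key
  apt_key:
    url: https://download.docker.com/linux/ubuntu/gpg
  when: ansible_pkg_mgr == "apt"
```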
Here is the description from the Ansible docs:
Corresponds to the --force-yes to apt-get and implies allow_unauthenticated: yes
This option will disable checking both the packages' signatures and the certificates of the web servers they are downloaded from.
This option *is not* the equivalent of passing the -f flag to apt-get on the command line
**This is a destructive operation with the potential to destroy your system, and it should almost never be used.** Please also see man apt-get for more information.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
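Given that warning, a safer pattern is to leave `force` unset and, if unauthenticated packages really must be tolerated, enable only the narrower option explicitly; a sketch:
```yaml
# Avoid `force: yes`; allow_unauthenticated only relaxes package
# signature checking and leaves TLS certificate validation intact
- name: Install containerd
  apt:
    name: containerd.io
    state: present
    allow_unauthenticated: false
```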
It is recommended to use filter to manage the GitHub email notification, see [examples for setting filters to Kubernetes Github notifications](https://github.com/kubernetes/community/blob/master/communication/best-practices.md#examples-for-setting-filters-to-kubernetes-github-notifications)
To install development dependencies you can set up a python virtual env with the necessary dependencies:
```ShellSession
virtualenv venv
source venv/bin/activate
pip install -r tests/requirements.txt
```
#### Linting
Kubespray uses `yamllint` and `ansible-lint`; to run them locally use `yamllint .` and `ansible-lint`.
Kubespray also ships a [pre-commit](https://pre-commit.com) hook configuration that runs several linters; please install this tool and use it to run validation tests before submitting a PR.
```ShellSession
pre-commit install
pre-commit run -a # To run pre-commit hook on all files in the repository, even if they were not modified
```
#### Molecule
@@ -27,5 +38,9 @@ Vagrant with VirtualBox or libvirt driver helps you to quickly spin test cluster
1. Submit an issue describing your proposed change to the repo in question.
2. The [repo owners](OWNERS) will respond to your issue promptly.
3. Fork the desired repo, develop and test your code changes.
4. Install [pre-commit](https://pre-commit.com) and set it up in your development repo.
5. Address any pre-commit validation failures.
6. Sign the CNCF CLA (<https://git.k8s.io/community/CLA.md#the-contributor-license-agreement>)
7. Submit a pull request.
8. Work with the reviewers on their suggestions.
9. Ensure to rebase to the HEAD of your target branch and squash unnecessary commits (<https://blog.carbonfive.com/always-squash-and-rebase-your-git-commits/>) before final merge of your contribution.
If you have questions, check the documentation at [kubespray.io](https://kubespray.io) and join us on the [kubernetes slack](https://kubernetes.slack.com), channel **\#kubespray**.
You can get your invite [here](http://slack.k8s.io/)
- Can be deployed on **[AWS](docs/aws.md), GCE, [Azure](docs/azure.md), [OpenStack](docs/openstack.md), [vSphere](docs/vsphere.md), [Equinix Metal](docs/equinix-metal.md) (bare metal), Oracle Cloud Infrastructure (Experimental), or Baremetal**
- **Highly available** cluster
- **Composable** (Choice of the network plugin for instance)
- Supports most popular **Linux distributions**
@@ -19,10 +19,10 @@ To deploy the cluster you can use :
#### Usage
Install Ansible according to [Ansible installation guide](/docs/ansible.md#installing-ansible)
then run the following steps:
```ShellSession
# Copy ``inventory/sample`` as ``inventory/mycluster``
# Deploy Kubespray with Ansible Playbook - run the playbook as root
# The option `--become` is required, as for example writing SSL keys in /etc/,
```
@@ -48,11 +48,24 @@ As a consequence, `ansible-playbook` command will fail with:
```
ERROR! no action detected in task. This often indicates a misspelled module name, or incorrect module path.
```
probably pointing at a task depending on a module present in requirements.txt.
One way of solving this is to uninstall the Ansible package and then install it via pip, but this is not always possible.
A workaround consists of setting the `ANSIBLE_LIBRARY` and `ANSIBLE_MODULE_UTILS` environment variables respectively to the `ansible/modules` and `ansible/module_utils` subdirectories of the pip packages installation location (found in the Location field of the output of `pip show [package]`) before executing `ansible-playbook`.
A simple way to ensure you get the correct version of Ansible is to use the [pre-built docker image from Quay](https://quay.io/repository/kubespray/kubespray?tab=tags).
You will then need to use [bind mounts](https://docs.docker.com/storage/bind-mounts/) to get the inventory and ssh key into the container, like this:
```ShellSession
git checkout v2.20.0
docker pull quay.io/kubespray/kubespray:v2.20.0
docker run --rm -it --mount type=bind,source="$(pwd)"/inventory/sample,dst=/inventory \
```
For Vagrant we need to install python dependencies for provisioning tasks.
@@ -63,10 +76,11 @@
```ShellSession
python -V && pip -V
```
If this returns the version of the software, you're good to go. If not, download and install Python from here <https://www.python.org/downloads/source/>
Install Ansible according to [Ansible installation guide](/docs/ansible.md#installing-ansible)
## Container Runtime Notes
- The available docker versions are 18.09, 19.03 and 20.10. The recommended docker version is 20.10. The kubelet might break on docker's non-standard version numbering (it no longer uses semantic versioning). To ensure auto-updates don't break your cluster, look into e.g. the yum versionlock plugin or apt pinning (one approach for apt is sketched after this list).
- The cri-o version should be aligned with the respective kubernetes version (i.e. kube_version=1.20.x, crio_version=1.20)
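On Debian-family hosts, one way to guard against such auto-upgrades is to hold the package; a sketch:
```yaml
# Equivalent in spirit to the yum versionlock plugin: keep apt
# from upgrading docker during unattended upgrades
- name: Hold the docker-ce package at its installed version
  dpkg_selections:
    name: docker-ce
    selection: hold
```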
## Requirements
- **Minimum required version of Kubernetes is v1.23**
- **Ansible v2.11+, Jinja 2.11+ and python-netaddr are installed on the machine that will run Ansible commands**
- The target servers must have **access to the Internet** in order to pull docker images. Otherwise, additional configuration is required (See [Offline Environment](docs/offline-environment.md))
- The target servers are configured to allow **IPv4 forwarding**.
- If using IPv6 for pods and services, the target servers are configured to allow **IPv6 forwarding**.
- The **firewalls are not managed**: you'll need to implement your own rules as you used to.
In order to avoid any issues during deployment, you should disable your firewall.
- If kubespray is run from a non-root user account, the correct privilege escalation method
@@ -177,8 +215,6 @@ You can choose between 10 network plugins. (default: `calico`, except Vagrant us
- [cilium](http://docs.cilium.io/en/latest/): layer 3/4 networking (as well as layer 7 to protect and secure application protocols), supports dynamic insertion of BPF bytecode into the Linux kernel to implement security services, networking and visibility logic.
- [ovn4nfv](docs/ovn4nfv.md): [ovn4nfv-k8s-plugins](https://github.com/opnfv/ovn4nfv-k8s-plugin) is the network controller, OVS agent and CNI server to offer basic SFC and OVN overlay networking.
- [weave](docs/weave.md): Weave is a lightweight container overlay network that doesn't require an external K/V database cluster.
(Please refer to `weave` [troubleshooting documentation](https://www.weave.works/docs/net/latest/troubleshooting/)).
@@ -199,10 +235,10 @@ See also [Network checker](docs/netcheck.md).
## Ingress Plugins
- [ambassador](docs/ambassador.md): the Ambassador Ingress Controller and API gateway.
- [nginx](https://kubernetes.github.io/ingress-nginx): the NGINX Ingress Controller.
- [metallb](docs/metallb.md): the MetalLB bare-metal service LoadBalancer provider.
The Kubespray Project is released on an as-needed basis. The process is as follows:
1. An issue is opened proposing a new release with a changelog since the last release. Please see [a good sample issue](https://github.com/kubernetes-sigs/kubespray/issues/8325)
2. At least one of the [approvers](OWNERS_ALIASES) must approve this release
3. The `kube_version_min_required` variable is set to `n-1`
4. Remove hashes for [EOL versions](https://github.com/kubernetes/website/blob/main/content/en/releases/patch-releases.md) of kubernetes from `*_checksums` variables.
5. Create the release note with [Kubernetes Release Notes Generator](https://github.com/kubernetes/release/blob/master/cmd/release-notes/README.md). See the following `Release note creation` section for the details.
6. An approver creates [new release in GitHub](https://github.com/kubernetes-sigs/kubespray/releases/new) using a version and tag name like `vX.Y.Z` and attaching the release notes
7. An approver creates a release branch in the form `release-X.Y`
8. The corresponding version of [quay.io/kubespray/kubespray:vX.Y.Z](https://quay.io/repository/kubespray/kubespray) and [quay.io/kubespray/vagrant:vX.Y.Z](https://quay.io/repository/kubespray/vagrant) container images are built and tagged. See the following `Container image creation` section for the details.
9. The `KUBESPRAY_VERSION` variable is updated in `.gitlab-ci.yml`
10. The release issue is closed
11. An announcement email is sent to `dev@kubernetes.io` with the subject `[ANNOUNCE] Kubespray $VERSION is released`
12. The topic of the #kubespray channel is updated with `vX.Y.Z is released! | ...`
## Major/minor releases and milestones
@@ -46,3 +47,37 @@ The Kubespray Project is released on an as-needed basis. The process is as follo
then Kubespray v2.1.0 may be bound to only minor changes to `kube_version`, like v1.5.1
and *any* changes to other components, like etcd v4, or calico 1.2.3.
And Kubespray v3.x.x shall be bound to `kube_version: 2.x.x` respectively.
If the release note file (/tmp/kubespray-release-note) contains "### Uncategorized" pull requests, those pull requests don't have a valid kind label (`kind/feature`, etc.).
It is necessary to put a valid label on each such pull request and run the above release-notes command again to get a better release note.
## Container image creation
The container image `quay.io/kubespray/kubespray:vX.Y.Z` can be created from the Dockerfile in the kubespray root directory:
@@ -8,18 +8,22 @@ Installs and configures GlusterFS on Linux.
For GlusterFS to connect between servers, TCP ports `24007`, `24008`, and `24009`/`49152`+ (that port, plus an additional incremented port for each additional server in the cluster; the latter if GlusterFS is version 3.4+), and TCP/UDP port `111` must be open. You can open these using whatever firewall you wish (this can easily be configured using the `geerlingguy.firewall` role).
This role performs basic installation and setup of Gluster, but it does not configure or mount bricks (volumes), since that step is easier to do in a series of plays in your own playbook. Ansible 1.9+ includes the [`gluster_volume`](https://docs.ansible.com/ansible/latest/collections/gluster/gluster/gluster_volume_module.html) module to ease the management of Gluster volumes.
## Role Variables
Available variables are listed below, along with default values (see `defaults/main.yml`):
```yaml
glusterfs_default_release: ""
```
You can specify a `default_release` for apt on Debian/Ubuntu by overriding this variable. This is helpful if you need a different package or version for the main GlusterFS packages (e.g. GlusterFS 3.5.x instead of 3.2.x with the `wheezy-backports` default release on Debian Wheezy).
```yaml
glusterfs_ppa_use: yes
glusterfs_ppa_version: "3.5"
```
For Ubuntu, specify whether to use the official Gluster PPA, and which version of the PPA to use. See Gluster's [Getting Started Guide](https://docs.gluster.org/en/latest/Quick-Start-Guide/Quickstart/) for more info.
@@ -29,9 +33,11 @@ None.
## Example Playbook
```yaml
- hosts: server
roles:
- geerlingguy.glusterfs
```
For a real-world use example, read through [Simple GlusterFS Setup with Ansible](http://www.jeffgeerling.com/blog/simple-glusterfs-setup-ansible), a blog post by this role's author, which is included in Chapter 8 of [Ansible for DevOps](https://www.ansiblefordevops.com/).
when: inventory_hostname == groups['kube_control_plane'][0] and groups['gfs-cluster'] is defined and hostvars[groups['gfs-cluster'][0]].gluster_disk_size_gb is defined
- name: Kubernetes Apps | Set GlusterFS endpoint and PV
- Terraform automatically creates an Ansible Inventory file called `hosts` with the created infrastructure in the directory `inventory`
- Ansible will automatically generate an ssh config file for your bastion hosts. To connect to hosts with ssh using bastion host use generated `ssh-bastion.conf`. Ansible automatically detects bastion and changes `ssh_args`
```commandline
ssh -F ./ssh-bastion.conf user@$ip
```
@@ -122,7 +121,7 @@ You can use the following set of commands to get the kubeconfig file from your n
* `ssh_public_keys`: List of public SSH keys to install on all machines
* `zone`: The zone where to run the cluster
* `machines`: Machines to provision. Key of this object will be used as the name of the machine
* `node_type`: The role of this node *(master|worker)*
* `size`: The size to use
* `boot_disk`: The boot disk to use
* `image_name`: Name of the image
* `root_partition_size`: Size *(in GB)* for the root partition
* `ceph_partition_size`: Size *(in GB)* for the partition for rook to use as ceph storage. *(Set to 0 to disable)*
* `node_local_partition_size`: Size *(in GB)* for the partition for node-local-storage. *(Set to 0 to disable)*
* `ssh_whitelist`: List of IP ranges (CIDR) that will be allowed to ssh to the nodes
* `api_server_whitelist`: List of IP ranges (CIDR) that will be allowed to connect to the API server
* `nodeport_whitelist`: List of IP ranges (CIDR) that will be allowed to connect to the kubernetes nodes on ports 30000-32767 (kubernetes nodeports)
### Optional
* `prefix`: Prefix to use for all resources, required to be unique for all clusters in the same project *(Defaults to `default`)*
An example variables file can be found in `default.tfvars`
## Known limitations
### Only single disk
Since Exoscale doesn't support attaching additional disks to an instance, this script can instead create partitions for [Rook](https://rook.io/) and [node-local-storage](https://kubernetes.io/docs/concepts/storage/volumes/#local).
### No Kubernetes API
The current solution doesn't use the [Exoscale Kubernetes cloud controller](https://github.com/exoscale/exoscale-cloud-controller-manager).
This means that we need to set up an HTTP(S) load balancer in front of all workers and set the Ingress controller to DaemonSet mode.