* etcd: etcd-events doesn't depend on etcd_cluster_setup
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: remove condition already present on include_tasks
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: fix scaling up
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use *access_addresses, do not delegate to etcd[0]
We want to wait for the full cluster to be healthy,
so use all the cluster addresses
Also we should be able to run the playbook when etcd[0] is down
(not tested), so do not delegate to etcd[0]
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* etcd: use failed_when for health check
unhealthy cluster is expected on first run, so use failed_when
instead of ignore_errors to remove scary red messages
Also use run_once
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/preinstall: ensure ansible_fqdn is up to date after changing /etc/hosts
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* kubernetes/master: regenerate apiserver cert if needed
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
The variable is defined in `kubernetes/preinstall` role and used in several roles. Since `kubernetes/preinstall` is not always included when `ansible-playbook` is run with tag selectors (see #5734 for reason), they will fail, or individual roles must copy the same fact definitions (as in #3846). Moving the definition to the always-included `kubespray-defaults` role will resolve the dependency problem.
- This solves issue #5721 & #5713 (dupes)
- Provide a cleaner default usage pattern for the download role
around etcd that supports 'host' and 'docker' properly
- Extract the 'etcdctl' as a separate task install piece and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
* fedora coreos support
- bootstrap and new fact for
* fedora coreos support
- fix bootstrap condition
* fedora coreos support
- allow customize packages for fedora coreos bootstrap
* fedora coreos support
- prevent install ptyhon3 and epel via dnf for fedora coreos
* fedora coreos support
- handle all ostree like os in same way
* fedora coreos support
- handle all ostree like os in same way for crio
* fedora coreos support
- add fcos documentations
* download file
* download containers
* fix push image to nodes
* pull if none image on host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase fix merge conflict
* add support download_run_once when upgrade and scale cluster
add some test with download_run_once
* set default values to temp flag for every download cycle
* add save,load abilty for containerd and crio when download_run_once=true
* return redefine image save/load command to set_docker_image_facts.yml
* move set command to set_container_facts
* ctr in containerd_bin_dir
* fix order of ctr image export arguments
* temporary disable download_run_once for containerd and crio
due https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded to docker on remote sites
* remove XXX from debug output
* Fix incorrect assertion comparison for kube_network_node_prefix
* Ignore assertion comparison for kube_network_node_prefix when using calico
* Adding more var docs description for kube_network_node_prefix
* Fixing trailing whitespaces
* Fix python3-libselinux installation for RHEL/CentOS 8
In bootstrap-centos.yml we haven't gathered the facts,
so #5127 couldn't work
Minimum ansible version to run kubespray is 2.7.8,
so ansible_distribution_major_version is defined an there is no need to default it
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Restart NetworkManager for RHEL/CentOS 8
network.service doesn't exist anymore
# systemctl status network
Unit network.service could not be found.
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add module_hotfixes=True to docker / containerd yum repo config
https://bugzilla.redhat.com/show_bug.cgi?id=1734081https://bugzilla.redhat.com/show_bug.cgi?id=1756473
Without this setting you end up with the following error:
# yum install docker-ce
Failed to set locale, defaulting to C
Last metadata expiration check: 0:03:21 ago on Thu Sep 26 22:00:05 2019.
Error:
Problem: package docker-ce-3:19.03.2-3.el7.x86_64 requires containerd.io >= 1.2.2-3, but none of the providers can be installed
- cannot install the best candidate for the job
- package containerd.io-1.2.2-3.3.el7.x86_64 is excluded
- package containerd.io-1.2.2-3.el7.x86_64 is excluded
- package containerd.io-1.2.4-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.5-3.1.el7.x86_64 is excluded
- package containerd.io-1.2.6-3.3.el7.x86_64 is excluded
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Use python3-libselinux on RHEL8/Centos8
* The fact ansible_facts.distribution_major_version is not present on older Ansible version.
Default it to 0 in when not present and use libselinux-python as package to get current
default behaviour.
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* add macvlan cni to kubespray
* macvlan: lint yaml files and fix sample config file
* macvlan: add OWNERS file
* add macvlan to README
* macvlan : CI first shoot
* macvlan : CI add full masquerade
* delegate retrive pod cidr to master only
* macvlan: add config for CI
* macvlan: add netchecker deployment
* Require minimum version of Kubernetes
* Remove checksums for kubernetes version 1.12
* Add kube_version to precheck output and add min required version to README
* Fix merge
* Fix defaults
* Fix typo in precheck
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Add support for arm images for hyperkube, kubeadm and cni_binary
* Add dummy etcd checksum for arm
This commit adds dummy etcd checksum for arm to avoid "no attribute" error
during setup.
* Add etcd host assert check
* Add 1.13.4 checksums of kubeadm and hyperkube for arm
* Update checksums of kubeadm and hyperkube for arm
* Add dummy checksums for calicoctl_binary_checksums dict
* disable gather_facts because it causes tests to fail
* Remove architecture check for etcd, due to unable to run tests
* Use K8s 1.14 and add kubeadm experimental control plane mode
This reverts commit d39c273d96.
* Cleanup kubeadm setup run on first master
* pin kubeadm_certificate_key in test
* Remove kubelet autolabel of kube-node, add symlink for pki dir
Change-Id: Id5e74dd667c60675dbfe4193b0bc9fb44380e1ca
Both kubedns and dnsmasq modes are long not maintained.
We should run dns_late steps at the end because sshd
makes DNS lookups during Ansible run and has 2s timeouts
for each failed lookup trying to connect to coredns before
it is ready.
This PR ensures that the e2fsprogs and xfsprogs packages are
installed on all Kubernetes nodes and that the packages are
the latest versions. It also ensures that the nodes can
create XFS filesystems when necessary, since not all distros
install xfsprogs by default.
e2fsprogs - ext2/ext3/ext4 file system utilities
xfsprogs - Utilities for managing the XFS filesystem
- Fixed an issue where storage class host directories were looped
through excessive target hosts
- Fixes examples in the LVP `README.md` to use nested dicts instead of a
list of dicts
* Makes local volume provisioner more dynamic
* Correct variable name in local storage provisioner defaults
* Updates external-provisioner readme
* Updates variable naming to be more clear, more documentation, fixes sample inventory
* Variable refactor, untangled some jinja2 loops
* Corrects variable name
* No variable substitution in dict keys, replaced with anchor
* Fixes default storage_classes dict, inline docs
* Fixes spelling in inline docs
* Addresses comments in review
* Updates all the defaults
* Fix failing CI task
* Fixes external provisioner daemonset
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try stop gce first before absent to make the delete process work
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout untill kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Readd secret checking
* Do gitlab checks from v2.7.0 test upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
When using resolvconf_mode host_resolvconf, there is an early DNS
config stage where Kubernetes cluster DNS is not injected for host
DNS intially. Later, the cluster DNS is enabled, but we do not
need to run every task from the kubernetes/preinstall role.
* warning on meta flush_handlers
* avoid rm
* avoid "Module remote_tmp /root/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually" warning on subsequent tasks using blockinfile
* is match
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since ansible 2.5
* ansible minimal version updated in doc and spec
* last version_compare
* [jjo] add kube-router support
Fixescloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support most used kube-router flags via
`kube_router_foo` vars as other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install, should also work for recent kubeadm releases via
`skipKubeProxyInstall: true` config
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it working ok w/the kubeadm_enabled and
kube_router_run_service_proxy true or false
- introduce `kube_proxy_remove` fact, to decouple logic
from kube_proxy_mode (which affects kubeadm configmap
settings, thus no-good to ab-use it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (w/ |bool), add some needed kube-proxy tags on my and existing changes
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
The hosts(5) manpage clearly states that the first entry is the
"canonical name", or FQDN (Fully-Qualified Domain Name):
IP_address canonical_hostname [aliases...]
By using the alias as a first entry, `hostname -f` does not return the
correct domain which breaks all sorts of unrelated functionality (it
has impact over email server configuration, for example).
* [jjo] add DIND support to contrib/
- add contrib/dind with ansible playbook to
create "node" containers, and setup them to mimic
host nodes as much as possible (using Ubuntu images),
see contrib/dind/README.md
- nodes' /etc/hosts editing via `blockinfile` and
`lineinfile` need `unsafe_writes: yes` because /etc/hosts
are mounted by docker, and thus can't be handled atomically
(modify copy + rename)
* dind-host role: set node container hostname on creation
* add "Resulting deployment" section with some CLI outputs
* typo
* selectable node_distro: debian, ubuntu
* some fixes for node_distro: ubuntu
* cpu optimization: add early `pkill -STOP agetty`
* typo
* add centos dind support ;)
* add kubespray-dind.yaml, support fedora
- add kubespray-dind.yaml (former custom.yaml at README.md)
- rework README.md as per above
- use some YAML power to share distros' commonality
- add fedora support
* create unique /etc/machine-id and other updates
- create unique /etc/machine-id in each docker node,
used as seed for e.g. weave mac addresses
- with above, now netchecker 100% passes WoHooOO!
🎉🎉🎉
- updated README.md output from (1.12.1, verified
netcheck)
* minor typos
* fix centos node creation, needs earlier udevadm removal to avoid flaky facts, also verified netcheck Ok \o/
* add Q&D test-distros.sh, back to manual /etc/machine-id hack
* run-test-distros.sh cosmetics and minor fixes
* run-test-distros.sh: $rc fix and minor formatting changes
* run-test-distros.sh output cosmetics
- Local Volume StorageClass configuration is now manged by `local_volume_provisioner_storage_classes`, a list of maps that specifies local storage classes with `name` `host_dir` and `mount_dir` keys per entry
- Tasks and templates updated to loop through local volume storage classes
- Previous defaults for path/class names were not changed
- Fixed an issue where a `kubernetes/preinstall` was creating directories inconsistently with the `kubernetes-apps/external_provisioner/local_volume_provisioner` task
* calico upgrade to v3
* update calico_rr version
* add missing file
* change contents of main.yml as it was left old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml
The number of pods on a given node is determined by the --max-pods=k
directive. When the address space is exhausted, no more pods can be
scheduled even if from the --max-pods-perspective, the node still has
capacity.
The special case that a pod is scheduled and uses the node IP in the
host network namespace is too "soft" to derive a guarantee.
Comparing kubelet_max_pods with kube_network_node_prefix when given
allows to assert that pod limits match the CIDR address space.
* Move front-proxy-client certs back to kube mount
We want the same CA for all k8s certs
* Refactor vault to use a third party module
The module adds idempotency and reduces some of the repetitive
logic in the vault role
Requires ansible-modules-hashivault on ansible node and hvac
on the vault hosts themselves
Add upgrade test scenario
Remove bootstrap-os tags from tasks
* fix upgrade issues
* improve unseal logic
* specify ca and fix etcd check
* Fix initialization check
bump machine size
* sysctl file should be in defaults so that it can be overriden
* Change sysctl_file_path to be consistent with roles/kubernetes/preinstall/defaults/main.yml