Fix an issue where `kubeadm join` could wait forever for the join to complete.
Fix an issue where errors from `kubeadm join` were not reaching the user,
making it impossible to find the cause of the failure.
The new behaviour is to first attempt to join without bypassing the
verification checks, and to display them if needed.
If this fails, it still attempts to join while ignoring the checks, in
order to preserve the previous behaviour.
A timeout of 60 seconds is allocated for joining.
Related-bug: #3973
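The retry logic described above can be sketched in Python. This is a minimal, hypothetical model of the control flow only, because the role itself is written in Ansible. `join_with_fallback` and `subprocess_runner` are illustrative names. Using kubeadm's `--ignore-preflight-errors=all` flag as the way the checks are bypassed is an assumption.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical runner signature: (argv, timeout in seconds) -> exit code.
Runner = Callable[[Sequence[str], float], int]

JOIN_TIMEOUT_S = 60  # the 60-second budget from the commit message


def subprocess_runner(argv: Sequence[str], timeout: float) -> int:
    """Run argv with inherited stdout/stderr so errors reach the user."""
    try:
        return subprocess.run(list(argv), timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child on timeout; report it as a failure
        return 124  # same exit code coreutils `timeout` uses


def join_with_fallback(base_argv: Sequence[str],
                       run: Runner = subprocess_runner) -> int:
    # 1st attempt: keep the checks so their output is displayed.
    rc = run(base_argv, JOIN_TIMEOUT_S)
    if rc == 0:
        return rc
    # 2nd attempt: previous behaviour, ignoring the checks (assumed flag).
    retry: List[str] = [*base_argv, "--ignore-preflight-errors=all"]
    return run(retry, JOIN_TIMEOUT_S)
```

Injecting `run` lets you exercise the fallback path without a cluster. Each attempt gets its own 60-second budget in this sketch.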
* bootstrap: rework role
* support being called from a non-root user
* run some commands in check mode
* unify spelling/task names
* bootstrap: fix wording of comments for check_mode: false
* bootstrap: remove setup-pipelining task
* OCI subnet AD 2 is not required for CCM >= 0.7.0
Reorganize OCI provider to generate configuration, rather than pull
Add pull secret option to OCI cloud provider
* Updated oci example to document new parameters
This PR ensures that the e2fsprogs and xfsprogs packages are
installed on all Kubernetes nodes and that the packages are
the latest versions. It also ensures that the nodes can
create XFS filesystems when necessary, since not all distros
install xfsprogs by default.
e2fsprogs - ext2/ext3/ext4 file system utilities
xfsprogs - Utilities for managing the XFS filesystem
* Calico: Ability to define the default IPPool CIDR (instead of kube_pods_subnet)
* Documentation for calico_pool_cidr (and calico_advertise_cluster_ips, which had been forgotten)
* Set cluster DNS correctly in case of nodelocal dns cache
* Pass in cluster_ip based on dns mode
* Disable nodelocaldns by default
* Fix syntax error
* Fix syntax issue
* Add nodelocaldns ip to vars of node installation
* Change location of nodelocaldns_ip
* Try to remove newlines from jinja template
* Add debug for config file
* Move parameter logic outside of template
* Adapt templates after feedback
* Remove debugging
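The commits above decide which DNS address the kubelet passes to pods, depending on whether the node-local cache is in use. Here is a minimal Python sketch of that decision. It assumes the only inputs are the cluster DNS service IP, `nodelocaldns_ip`, and an on/off switch that is disabled by default, as in the commit list. The function names and example addresses are hypothetical.

```python
import ipaddress


def kubelet_cluster_dns(cluster_dns_service_ip: str,
                        nodelocaldns_ip: str,
                        enable_nodelocaldns: bool = False) -> str:
    """Pick the resolver address the kubelet hands to pods."""
    # With the node-local cache enabled, pods query the per-node cache
    # address; otherwise they query the cluster DNS service directly.
    chosen = nodelocaldns_ip if enable_nodelocaldns else cluster_dns_service_ip
    ipaddress.ip_address(chosen)  # fail early on a malformed value
    return chosen


def cluster_dns_flag(ip: str) -> str:
    # --cluster-dns is the kubelet command-line flag for this setting
    return f"--cluster-dns={ip}"
```

Computing the value once, outside the template, is in the spirit of "Move parameter logic outside of template". The template then only prints a variable.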
Addressing the discussion started in #4064, this PR moves kubeadm and
hyperkube binaries to /usr/local/bin before running them on the master
nodes.
This addresses the case where local_release_dir points to /tmp
(the kubespray default) and /tmp is mounted with the noexec option,
which prevents any binary in that partition from being executed.
In the "node" role, we still move kubeadm to bin_dir only on the worker
nodes.
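To check whether a given local_release_dir is affected, look up the mount options of the filesystem that contains it. The sketch below is a hypothetical diagnostic helper and is not part of this change. It parses text in `/proc/mounts` format (device, mount point, fstype, options, ...) and reports whether the longest matching mount point carries `noexec`. It ignores the octal escaping that `/proc/mounts` applies to spaces in paths.

```python
from typing import List, Tuple


def parse_mounts(text: str) -> List[Tuple[str, List[str]]]:
    """Return (mount point, options) pairs from /proc/mounts-style text."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            mounts.append((fields[1], fields[3].split(",")))
    return mounts


def is_noexec(path: str, mounts_text: str) -> bool:
    """True if the mount that contains `path` has the noexec option."""
    best: str = ""
    best_opts: List[str] = []
    for mountpoint, opts in parse_mounts(mounts_text):
        prefix = mountpoint.rstrip("/") + "/"
        covers = path == mountpoint or path.startswith(prefix)
        # Longest mount point wins; `>=` lets later (stacked) mounts override.
        if covers and len(mountpoint) >= len(best):
            best, best_opts = mountpoint, opts
    return "noexec" in best_opts
```

On a Linux node you could call, for example, `is_noexec("/tmp/releases", open("/proc/mounts").read())`.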
I know this is a bit of a hack.
If you use a cloud LB, you can use kubeadm's controlPlaneEndpoint to configure kube-proxy's server field.
But nginx-proxy is not yet running at the time of `kubeadm init`.
It looks like `epel_enabled` was not taken into account for the EPEL install in `bootstrap-centos.yml`. Also, there were no conditionals that would trigger the bootstrap for RHEL.
* Use external LB IP for external api endpoint
Use loadbalancer_apiserver.address instead of apiserver_loadbalancer_domain_name for the kubeadm init --apiserver-advertise-address argument
https://kubernetes.io/docs/reference/setup-tools/kubeadm/kubeadm-init/#options states that apiserver-advertise-address needs to be an IPv4 or IPv6 address
* only use loadbalancer IP if it is defined
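The kubeadm documentation requires a literal IP for `--apiserver-advertise-address`, not a DNS name. Here is a minimal sketch of the selection. It assumes `loadbalancer_apiserver` is a dict with an `address` key, as in the inventory variable named above. Falling back to the node's own IP when no load balancer is defined is a hypothetical choice that the commit does not state.

```python
import ipaddress
from typing import Optional


def advertise_address(node_ip: str,
                      loadbalancer_apiserver: Optional[dict] = None) -> str:
    """Return the value for kubeadm's --apiserver-advertise-address."""
    # Only use the load balancer IP if it is actually defined.
    if loadbalancer_apiserver and loadbalancer_apiserver.get("address"):
        candidate = loadbalancer_apiserver["address"]
    else:
        candidate = node_ip  # assumed fallback, not stated in the commit
    # ip_address() raises ValueError for hostnames such as "lb.example.com".
    return str(ipaddress.ip_address(candidate))
```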
I found a potential case where `writable` could be null and therefore
not be treated as a boolean, so this adds an extra default statement to
avoid negating a non-boolean value, which would lead to an undefined result. Refs #4020
It looks like the template is stripping the trailing whitespace (newline) between storage
class entries, and since CI only has one storage class we never hit this
issue. This change prevents the YAML from being printed on a single line
when multiple storage classes are defined.
In v1beta1 of `ClusterConfiguration` the extraVolumes `writable` field was changed to `readOnly` and its boolean value must be negated.
Also, the json field for `useHyperKubeImage` was incorrectly capitalized.
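The `writable` to `readOnly` translation, including the null case from the earlier commit, amounts to a negation with a default. Here is a Python sketch. It assumes each extra volume is a dict in the older format. `to_v1beta1_volume` is a hypothetical name, and the real change lives in a Jinja template.

```python
from typing import Any, Dict


def to_v1beta1_volume(volume: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a `writable`-style extra volume to the v1beta1 `readOnly` form."""
    converted = {k: v for k, v in volume.items() if k != "writable"}
    writable = volume.get("writable")
    if writable is None:  # missing or null: default to False before negating
        writable = False
    converted["readOnly"] = not bool(writable)
    return converted
```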
Right now we're consistently getting warnings about kubelet not found in
path during `kubeadm init`. We fixed this for `kubeadm join` in #3342, and this brings the change to init
as well.
- Fixed an issue where storage class host directories were looped
through more target hosts than necessary
- Fixes examples in the LVP `README.md` to use nested dicts instead of a
list of dicts
* Makes local volume provisioner more dynamic
* Correct variable name in local storage provisioner defaults
* Updates external-provisioner readme
* Updates variable naming to be more clear, more documentation, fixes sample inventory
* Variable refactor, untangled some jinja2 loops
* Corrects variable name
* No variable substitution in dict keys, replaced with anchor
* Fixes default storage_classes dict, inline docs
* Fixes spelling in inline docs
* Addresses comments in review
* Updates all the defaults
* Fix failing CI task
* Fixes external provisioner daemonset
* allow overriding the bind addresses for controller-manager and scheduler
Useful for Prometheus metrics monitoring
* Add bind addr override support in kubeadm/v1beta1
Adds support for override of bind addresses for controller-manager
and scheduler in kubeadm/v1beta1
* Move location of bind address vars
* Remove double declaration of schedulerExtraArgs
The change implemented in #3908 removed line breaks between supplementary
addresses in kubeadm SANs, causing errors in the config file and
failure to bring the cluster up. This commit reintroduces line breaks
between supplementary addresses.
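The bug was a YAML list losing its line breaks. Here is a sketch that renders the SAN list with exactly one entry per line. The `certSANs:` key placement and the two-space indent are illustrative assumptions, not the exact kubeadm config layout. `render_cert_sans` is a hypothetical helper.

```python
from typing import Iterable


def render_cert_sans(sans: Iterable[str], indent: int = 2) -> str:
    """Render SANs as a YAML block sequence, one quoted entry per line."""
    pad = " " * indent
    lines = ["certSANs:"]
    # Quoting sidesteps YAML plain-scalar edge cases (e.g. IPv6 literals).
    lines.extend(f'{pad}- "{san}"' for san in sans)
    return "\n".join(lines) + "\n"
```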
- Creates and defaults an ansible variable for every configuration option in the `kubeproxy.config.k8s.io/v1alpha1` type spec
- Fixes vars that were orphaned by removing the non-kubeadm deployment
- Fixes previously hardcoded kubeadm values
- Introduces a `main` directory for role default files per component (requires ansible 2.6.0+)
- Split out just `kube-proxy.yml` in this first effort
- Removes the kube-proxy server field patch task
We should continue to pull out other components from `main.yml` into their own defaults files as I did here for `defaults/main/kube-proxy.yml`. I hope for and will need others to join me in this refactoring across the project until each component config template has a matching role defaults file, with shared defaults in `kubespray-defaults` or `downloads`
The containerd service and socket files have been dropped from the
openSUSE docker package so we should not require them in the docker
service anymore. This makes the docker service file look similar to
the one shipped by the openSUSE package.
Signed-off-by: Markos Chandras <mchandras@suse.de>
* controlPlaneEndpoint set up through load balancer should be possible even in single master setups
Enable load balancer for single-master setups
Fixes an issue where single-master setups are not reachable using the usual admin.conf from outside the cluster.
* add fix to other api versions
* remove obsolete check completely
* remove check, pass 2
* removes checks in client configuration
* delete 'and'
* Add support for running a nodelocal dns cache
After encountering DNS issues in a cluster I was recently working on, I
noticed that Kubernetes 1.13 introduced support for running a nodelocal DNS
cache.
I believe this can be useful to more people.
73b548db06 https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md
* Add requested changes
* Add additional requested changes + documentation
* Add requested changes after review
* Replace incorrect variable
Set host_architecture so that upgrading etcd works with: `ansible-playbook -b -i inventory/sample/hosts.ini cluster.yml --tags=etcd` (otherwise host_architecture is missing)
* Upgrade kubernetes to v1.13.0
* Remove all occurrences of scheduler.alpha.kubernetes.io/critical-pod in templates
* Fix cert dir
* Use kubespray v2.8 as baseline for gitlab
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try stop gce first before absent to make the delete process work
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout until kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Re-add secret checking
* Run gitlab checks from v2.7.0 to test the upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
* Remove variables defined in download role. Fixes #3799
* Cleanup some more variables
* Fix bad templating
* Minor fix
* Add dashboard to download role. Fixes #3736
* Set configure-cloud-routes=false as default if no network plugin is used
As the default value of configure-cloud-routes is `true`, it needs to be set to `false` when not required, to avoid error messages like:
"Couldn't reconcile node routes: error listing routes: unable to find route table for AWS cluster"
on, for example, AWS installations that don't use cloud-native routing.
* Update kube-controller-manager.manifest.j2
remove extra spaces
Introduced variable node_taints which can be set in inventory for
specific hosts or in group_vars, which generates --register-with-taints
command line argument for kubelet.
Introduced variable `ingress_nginx_tolerations` to set custom
tolerations for Ingress nginx daemonset, to be able to schedule
ingress-nginx on dedicated nodes with taints.
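The kubelet's `--register-with-taints` flag takes a comma-separated list of `key=value:effect` entries. Here is a sketch that builds that flag from `node_taints`. It also derives a matching toleration for a daemonset such as ingress-nginx. The helper names are hypothetical, and the validation is a simplification.

```python
from typing import Dict, List

VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def register_with_taints_flag(node_taints: List[str]) -> str:
    """Build the kubelet flag, or "" when the host has no taints."""
    for taint in node_taints:
        _, sep, effect = taint.rpartition(":")
        if not sep or effect not in VALID_EFFECTS:
            raise ValueError(f"invalid taint {taint!r}")
    if not node_taints:
        return ""
    return "--register-with-taints=" + ",".join(node_taints)


def toleration_for(taint: str) -> Dict[str, str]:
    """Return a pod toleration that matches one `key=value:effect` taint."""
    key_value, _, effect = taint.rpartition(":")
    key, _, value = key_value.partition("=")
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}
```

Deriving the toleration from the same string keeps the node taint and the daemonset toleration from drifting apart.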
* Update defaults to match k8s 1.12 suggestions
* Test if Netchecker works with node ip instead of localhost
* Update defaults to ipvs and coredns
* Update defaults for kube_apiserver_insecure_port
* Update main.yaml
When `ansible_user` is not root and the `-b` option is used,
with `download_run_once` and `download_localhost` set to `true`,
Ansible executes the `container_download | upload container images to nodes` task.
It uses rsync to upload images to `/tmp/release/container/`, but the
`container` directory is owned by `root`.
Currently the `kubespray-aws-inventory.py` script always sets a node_labels key
alongside ansible_host.
When an AWS instance has no labels property set, the value is an empty
string.
The TASK `Write kubelet config file (kubeadm or non-kubeadm)` then
fails with the message:
`AnsibleUndefinedVariable: 'unicode object' has no attribute 'items'`.
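The failure comes from calling `.items()` on a string. Here is a defensive normalization step, sketched in Python. It assumes labels arrive as a dict, an empty string, or null. `normalize_node_labels` is a hypothetical name, and the actual fix may differ.

```python
from typing import Any, Dict


def normalize_node_labels(raw: Any) -> Dict[str, str]:
    """Coerce an inventory node_labels value into a dict of strings."""
    if raw is None or raw == "":
        return {}  # instance without labels: nothing to render
    if not isinstance(raw, dict):
        raise TypeError(f"node_labels must be a dict, got {type(raw).__name__}")
    return {str(k): str(v) for k, v in raw.items()}
```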
* Support Metrics Server as addon (#3560).
* Update metrics server v0.3.1.
* Add metrics server test.
* Replace metrics server manifests with kubernetes/cluster/addons's.
* Modify metrics server manifests for kubespray.
* Follow PR#3558 node label node-role.kubernetes.io/master change
* Fix metrics server parameters base_metrics_server_... to metrics_server_...
* Fix hardcoded metrics_server_memory_per_node
* Add configurable insecure tls for metrics-apiservice
* Downloadable addon-resizer and extract parameter as variables
* Remove metrics server version from deployment name
* Make Metrics Server work when all masters have the node role
* Download metrics-server and add-resizer container only on master
* Separate ServiceAccount and ConfigMap, and fix application name
* Remove old metrics server clusterrole template
* Fix addon-resizer image specification
* Make InternalIP default for metrics_server_kubelet_preferred_address_types
Make InternalIP the default because multiple preferred address types do not work.
A comparison that happens during `TASK [kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS Template]`, where the `dns-autoscaler` template is deployed, causes the CoreDNS deployment to fail. The error is caused by the variable `dns_prevent_single_point_failure`, where an integer is being compared with a string. The resulting error:
```text
'>' not supported between instances of 'int' and 'str'
```
prevents successful deployment of CoreDNS.
The change makes the comparison happen between integers and allows CoreDNS to succeed.
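The error is easy to reproduce in plain Python, whose comparison operators Jinja2 uses at render time. The sketch shows the failing comparison. It then applies the fix of casting both sides to integers, which is the Python analogue of Jinja's `| int` filter. The helper names and values are hypothetical.

```python
from typing import Any


def comparison_error(left: Any, right: Any) -> str:
    """Return the TypeError message a mixed-type '>' raises, or ""."""
    try:
        _ = left > right
    except TypeError as exc:
        return str(exc)
    return ""


def safe_gt(left: Any, right: Any) -> bool:
    # Casting both operands mirrors `{{ a | int > b | int }}` in Jinja.
    return int(left) > int(right)
```

`comparison_error(1, "2")` returns the same message as in the log above.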
* Enable AutoScaler for CoreDNS
* Only use one template for dns autoscaler
* Rename a few variables for replicas and minimum pods
* Rename a few variables for replicas and minimum pods
* Remove replicas to make autoscale work
* Cleanup kubedns-autoscaler as it has been renamed
Add Prometheus annotations to calico-node if
calico_felix_prometheusmetricsenabled is enabled.
This allows kubernetes_sd to automatically find the pods and start
scraping.
* Fix "Failure talking to yum: Cannot find a valid baseurl for repo: base/7/x86_64" when installing packages on CentOS through a proxy
* Add proxy to /etc/yum.conf if http_proxy is defined
* Added changes to clean up orphan containers and reload docker & kubelet directories.
* Added new files for cleaning up orphans and docker & kubelet directories
* Added new lines at the end of these files
* removed the trailing whitespaces from main.yml and clean-up.yml
* Updated as per the review comments
* Updated as per the review comments
* Removed service_facts and package_facts because they are not supported in ansible 2.4.0
* Corrected yaml syntax errors
* Removed the use of json_query filter and utilized selectattr
* Removed trailing spaces
* Changed the default value of docker_clean_up to false
* Added Changes to only include cleanup-docker-orphans.sh
* Reverted changes made inside the handler.
* Removed trailing spaces and set the default value of docker_orphan_clean_up to true
* Reverted the default value of docker_orphan_clean_up to false
* Made the docker clean up as drop in
* Made the docker clean up as drop in
* Reverted the value of boolean docker_orphan_clean_up to false
* Converted ExecStop to ExecStartPost. Removed the live restore check from the orphan script
* Adds support for Multus (multiple interfaces) CNI plugin
Multus is a Latin word for "Multi". As the name suggests, it acts as a
Multi plugin in Kubernetes and provides multiple network interface
support in a pod. Multus uses the concept of invoking delegates by
grouping multiple plugins into delegates and invoking them in the
sequential order of the CNI configuration file provided in json format.
* Change CNI version (0.1.0->0.3.1) of Contiv to be compatible with Multus
When using resolvconf_mode host_resolvconf, there is an early DNS
config stage where Kubernetes cluster DNS is not injected for host
DNS initially. Later, the cluster DNS is enabled, but we do not
need to run every task from the kubernetes/preinstall role.
kube-router v0.2.1 highlights from changelog:
- IPv6 WIP but pretty close to full working functionality
- fully support network policy semantics with addition of support for
ipblock and except
* warning on meta flush_handlers
* avoid rm
* avoid "Module remote_tmp /root/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually" warning on subsequent tasks using blockinfile
* is match
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since ansible 2.5
* ansible minimal version updated in doc and spec
* last version_compare
* [jjo] add kube-router support
Fixes cloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support most used kube-router flags via
`kube_router_foo` vars as other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install, should also work for recent kubeadm releases via
`skipKubeProxyInstall: true` config
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it works ok with kubeadm_enabled and
kube_router_run_service_proxy set to true or false
- introduce `kube_proxy_remove` fact, to decouple logic
from kube_proxy_mode (which affects kubeadm configmap
settings, so it is no good to abuse it by setting it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (w/ |bool), add some needed kube-proxy tags on my and existing changes
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
The hosts(5) manpage clearly states that the first entry is the
"canonical name", or FQDN (Fully-Qualified Domain Name):
IP_address canonical_hostname [aliases...]
By using the alias as the first entry, `hostname -f` does not return the
correct domain, which breaks all sorts of unrelated functionality (it
affects email server configuration, for example).
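Per hosts(5), the first name after the address is the canonical hostname, and the remaining names are aliases. Here is a sketch that writes an entry in that order. `hosts_entry` and `canonical_name` are hypothetical helpers, and the example names are placeholders.

```python
def hosts_entry(ip: str, fqdn: str, *aliases: str) -> str:
    """Format an /etc/hosts line with the FQDN as the canonical name."""
    names = [fqdn]
    for alias in aliases:
        if alias not in names:  # drop duplicates, keep order
            names.append(alias)
    return " ".join([ip, *names])


def canonical_name(line: str) -> str:
    """Return the canonical hostname field of an /etc/hosts line."""
    return line.split()[1]
```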
* [jjo] add DIND support to contrib/
- add contrib/dind with an ansible playbook to
create "node" containers and set them up to mimic
host nodes as much as possible (using Ubuntu images),
see contrib/dind/README.md
- nodes' /etc/hosts editing via `blockinfile` and
`lineinfile` needs `unsafe_writes: yes` because /etc/hosts
is mounted by docker, and thus can't be handled atomically
(modify copy + rename)
* dind-host role: set node container hostname on creation
* add "Resulting deployment" section with some CLI outputs
* typo
* selectable node_distro: debian, ubuntu
* some fixes for node_distro: ubuntu
* cpu optimization: add early `pkill -STOP agetty`
* typo
* add centos dind support ;)
* add kubespray-dind.yaml, support fedora
- add kubespray-dind.yaml (former custom.yaml at README.md)
- rework README.md as per above
- use some YAML power to share distros' commonality
- add fedora support
* create unique /etc/machine-id and other updates
- create unique /etc/machine-id in each docker node,
used as seed for e.g. weave mac addresses
- with above, now netchecker 100% passes WoHooOO!
🎉🎉🎉
- updated README.md output from (1.12.1, verified
netcheck)
* minor typos
* fix centos node creation, needs earlier udevadm removal to avoid flaky facts, also verified netcheck Ok \o/
* add Q&D test-distros.sh, back to manual /etc/machine-id hack
* run-test-distros.sh cosmetics and minor fixes
* run-test-distros.sh: $rc fix and minor formatting changes
* run-test-distros.sh output cosmetics
* Added Priority class to tiller installation and also fixed tiller override implementation.
* Added changes to handle priority classes separately in tiller, instead of using the variable tiller_override
* Added changes to clean up orphan containers and reload docker & kubelet directories.
* Added new files for cleaning up orphans and docker & kubelet directories
* Added new lines at the end of these files
* removed the trailing whitespaces from main.yml and clean-up.yml
* Updated as per the review comments
* Updated as per the review comments
* Removed service_facts and package_facts because they are not supported in ansible 2.4.0
* Corrected yaml syntax errors
* Removed the use of json_query filter and utilized selectattr
* Removed trailing spaces
* Changed the default value of docker_clean_up to false
* Added Changes to only include cleanup-docker-orphans.sh
* Reverted changes made inside the handler.
* Removed trailing spaces and set the default value of docker_orphan_clean_up to true
* Reverted the default value of docker_orphan_clean_up to false
* Made the docker clean up as drop in
* Made the docker clean up as drop in
* Reverted the value of boolean docker_orphan_clean_up to false
* #3475 - make dnsmasq send queries to all upstream servers. Make the dnsmasq config file customizable.
* Code style fixes. Return current behaviour for dnsmasq strict-order flag.
* Fix DNS loop when resolvconf_mode is set to host_resolvconf
* Make sure upstream_dns_servers is defined when using resolvconf_mode == 'host_resolvconf'
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
- Local Volume StorageClass configuration is now managed by `local_volume_provisioner_storage_classes`, a list of maps that specifies local storage classes with `name`, `host_dir` and `mount_dir` keys per entry
- Tasks and templates updated to loop through local volume storage classes
- Previous defaults for path/class names were not changed
- Fixed an issue where `kubernetes/preinstall` was creating directories inconsistently with the `kubernetes-apps/external_provisioner/local_volume_provisioner` task
* Fix the jinja expression for openstack_tenant_id
OS_PROJECT_ID is obsolete in keystone v3, and the jinja expression
did not set openstack_tenant_id as expected because of an
undefined env var. Fixed the expression.
* Fix the dict iteration method in the kubelet template
Kubelet template rendering fails when additional node labels are
added and Python 3 is used. Update the method to be compatible with both
Python 2 and 3
Node labels did not work
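`dict.iteritems()` exists only in Python 2, while `dict.items()` works on both. That is the usual way to make a template's dict loop portable. Here is a sketch that renders the kubelet `--node-labels` value that way. The function name is hypothetical, sorting is an added choice for deterministic output, and the sketch itself is Python 3.

```python
from typing import Dict


def node_labels_arg(labels: Dict[str, str]) -> str:
    """Render labels as the kubelet --node-labels flag, or "" if empty."""
    # .items() is available on Python 2 and 3; .iteritems() is Python 2 only.
    pairs = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    return f"--node-labels={pairs}" if pairs else ""
```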
According to the documentation, container images are described
by vars like `foo_image_repo` and `foo_image_tag`.
The variables netcheck_{agent,server}_{img_repo,tag} do not
follow that convention.
Previously, nodes tainted with a NoExecute effect did not run the calico/weave pod.
The network pod should run on all nodes, whatever happens on a specific node.
Also always set the Pods to be critical.
Also remove deprecated scheduler.alpha.kubernetes.io/tolerations annotations.
* Changes to assign pod priority to kube components.
* Removed the boolean flag pod_priority_assignment
* Created new priorityclass k8s-cluster-critical
* Created new priorityclass k8s-cluster-critical
* Fixed the trailing spaces
* Fixed the trailing spaces
* Added kube version check while creating Priority Class k8s-cluster-critical
* Moved k8s-cluster-critical.yml
* Moved k8s-cluster-critical.yml to kube_config_dir
When enable_network_policy is set to True with Calico 3 kubectl
apply fails with the error:
The Deployment "calico-kube-controllers" is invalid:
spec.strategy.rollingUpdate: Forbidden: may not be specified when
strategy type is 'Recreate'
See
https://github.com/kubernetes-incubator/kubespray/issues/3267
Changing the update strategy to RollingUpdate avoids this error.
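The API server rule behind that error can be expressed as a small check on the Deployment spec. Here is a sketch that operates on the manifest as a plain dict. `strategy_errors` is a hypothetical name. `RollingUpdate` is used as the default type because that is the Kubernetes Deployment default.

```python
from typing import Any, Dict, List


def strategy_errors(spec: Dict[str, Any]) -> List[str]:
    """Flag a Deployment spec that pairs Recreate with rollingUpdate."""
    strategy = spec.get("strategy") or {}
    strategy_type = strategy.get("type", "RollingUpdate")
    if strategy_type == "Recreate" and "rollingUpdate" in strategy:
        return ["spec.strategy.rollingUpdate: Forbidden: may not be "
                "specified when strategy type is 'Recreate'"]
    return []
```

Switching the type to RollingUpdate, as this commit does, makes the same spec pass the check.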