* bump to dashboard 2.0 rc6 with metrics scraper
* fix missing YAML separator that made the ReplicaSet complain about a missing ServiceAccount (see the sketch below)
* remove an unwanted legacy hack that should have been removed earlier
* no namespace needed on the ClusterRoleBinding
* bump to 2.0.0 release
* remove dashboard_metrics_scrapper_enabled
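For context, the missing separator meant two manifests in one file were parsed as a single document; a minimal sketch of the corrected layout (resource names are illustrative, not the actual dashboard manifests):
```
# Two objects in one file must be separated by '---'; without it the second
# object's fields get merged into the first and the ServiceAccount reference
# appears to be missing.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: example-dashboard          # illustrative name
  namespace: kube-system
---                                # the separator that was missing
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-dashboard
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-dashboard
  template:
    metadata:
      labels:
        app: example-dashboard
    spec:
      serviceAccountName: example-dashboard
      containers:
        - name: dashboard
          image: kubernetesui/dashboard:v2.0.0
```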
* added required permissions for querying endpointslice resources
* copy-pasted role permissions from cilium install manifests
* bumped cilium version to v1.7.2
- This solves issues #5721 & #5713 (duplicates)
- Provide a cleaner default usage pattern for the download role
around etcd that properly supports 'host' and 'docker' deployments (see the sketch after this list)
- Extract 'etcdctl' installation into a separate task and reuse it where
appropriate
- Update the kubeadm-etcd task to reflect the above change
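As an illustration of the 'host' vs 'docker' pattern (the variable name below follows the usual kubespray convention and is an assumption, not quoted from this change):
```
# inventory group_vars sketch -- assumed variable name, shown for illustration
etcd_deployment_type: host      # run etcd as a binary managed by systemd on the host
# etcd_deployment_type: docker  # or run etcd inside a docker container
```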
* Upgrade etcd to 3.3.18
* Try with etcd 3.3.15 (kubeadm 1.16.7 default)
* Back to square one
* Try with 3.3.11
* Upgrade etcd to 3.3.18 (take 2)
* Try with 3.3.12
* download file
* download containers
* fix pushing images to nodes
* pull the image only if it is not present on the host
* fix
* improve docker image tag checks.
do not pull already cached images
* rebase and fix merge conflict
* add support for download_run_once when upgrading and scaling the cluster
add some tests with download_run_once
* set default values for the temp flag on every download cycle
* add save/load ability for containerd and CRI-O when download_run_once=true
* move the redefined image save/load commands back to set_docker_image_facts.yml
* move set command to set_container_facts
* use ctr from containerd_bin_dir
* fix order of ctr image export arguments
* temporarily disable download_run_once for containerd and CRI-O
due to https://github.com/containerd/containerd/issues/4075
* remove unused files
* fix strict yaml linter warning and errors
* refactor logical conditions to pull and cache container images
* remove comment due to lint check
* document role
* remove image_load_on_localhost, because cached images are always loaded into docker on the remote hosts
* remove XXX from debug output
I've tested this update by deploying a containerd/etcd cluster on top of CentOS 7
with MetalLB and NGINX Ingress, then upgrading it using upgrade-cluster.yml
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Add support for Kubernetes 1.16.1
* Defaults to 1.16.1
* add 1.16.2 checksums and set new version as default
* correct 1.16.2 checksums and add 1.15.5 checksums
This allows easily overriding the gcr, quay, and docker repos with the
mirror repos in countries like China, where access to the defaults is
blocked or unstable.
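A minimal sketch of such an override (repo variable names are assumed to follow the `*_image_repo` convention; mirror hostnames are examples only):
```
# group_vars override sketch -- variable names and mirrors are illustrative
gcr_image_repo: "gcr.azk8s.cn"                        # mirror for gcr.io
kube_image_repo: "gcr.azk8s.cn/google-containers"     # mirror for k8s images
quay_image_repo: "quay.mirrors.ustc.edu.cn"           # mirror for quay.io
docker_image_repo: "dockerhub.azk8s.cn"               # mirror for docker.io
```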
Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds, deploy, rs
Add a workaround for deploying helm with an incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* run the 'download_container | Copy image to ansible host cache' task with synchronize on the download_delegate host
* run the 'copy file to ansible host' task on the whole inventory, not only on the first random host
Fixes a situation in manual mode where it
tries to download coredns v1.3.1 from the same
image repository that the kubernetes images are
downloaded from.
Change-Id: Ibbec8a72c8162ce8befa74e2013a268737ea5f8a
* Enable containerd to deploy vanilla containerd package
Fixes kubeadm references to CRI socket for containerd
Fixes download role cache feature to work with containerd
Change-Id: I2ab8f0031107e2f0d1a85c39b4beb66f08509a01
* use containerd for flannel-addons job
Change-Id: Ied375c7d65e64a625ffbd995ff16f2374067dee6
* add containerd vars
Change-Id: Ib9a8a04e501c481a86235413cbec63f3672baf91
* fixup vars
Change-Id: Ibea64e4b18405a578b52a13da100384582aa24c2
* more fixes
* fix rh repo
Change-Id: I00575a77cfb7b81d6095db5d918a52023c8f13ba
* Adjust helm host install for containerd
* Add calico 3.7.3 support
* add calico_datastore variable to policy controller role
* add missing clusterrole rules for calico policy controller
* disable calico kube controller when kdd mode is used for versions < 3.6
* Use K8s 1.15
* Use Kubernetes 1.15 and use kubeadm.k8s.io/v1beta2 for
InitConfiguration.
* bump to v1.15.0
* Remove k8s 1.13 checksums.
* Update README kubernetes version 1.15.0.
* Update metrics server 0.3.3 for k8s 1.15
* Remove less than k8s 1.14 related code
* Use kubeadm with --upload-certs instead of --experimental-upload-certs due to deprecation
* Update dnsautoscaler 1.6.0
* Skip certificateKey if it's not defined
* Add kubeadm-controlplane.v2beta2 for k8s 1.15 or later
* Support kubeadm control plane for k8s 1.15
* Update sonobuoy version 0.15.0 for k8s 1.15
* Add limited containerd support
Containerd support for Ubuntu + Calico
* Added CRI-O support for ubuntu
* containerd support.
* Reset containerd support.
* fix lint.
* implemented feedback
* Change task names to 'cri xx' instead of 'cri-o' in the reset task and timeout condition.
* set crictl to fixed version
* Use docker-ce's containerd.io package for containerd.
* Add a check for whether containerd is installable.
* Avoid stopping docker when containerd is used, and optimize retries for reset.
* Add config.toml.
* Fixed containerd for kubelet.env.
* Merge PR #4629
* Remove unused ubuntu variable for containerd
* Polish code for containerd and cri-o
* Refactoring cri socket configuration.
* Configurable conmon.
* Remove unused crictl/runc download
* Now crictl and runc are downloaded by the common crictl.yml.
* fixed yamllint error
* Fixed files broken by the merge conflict.
* Remove commented line in config.toml
* Remove re-added v1.12.x version
* Fixed broken set_docker_image_facts
* Fix yamllint errors.
* Remove unused apt source
* Fix crictl not being installable
* Add containerd config from skolekonov's PR #4601
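For reference, switching kubespray to containerd or CRI-O is driven by a single runtime variable; a hedged sketch (the exact defaults and socket path may differ in this version):
```
# group_vars/k8s-cluster.yml sketch -- values are illustrative
container_manager: containerd   # alternatives: docker (default), crio
# with containerd selected, kubelet is pointed at the CRI socket instead of dockershim,
# e.g. /var/run/containerd/containerd.sock (path assumed, set by the role)
```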
* Require minimum version of Kubernetes
* Remove checksums for kubernetes version 1.12
* Add kube_version to precheck output and add min required version to README
* Fix merge
* Fix defaults
* Fix typo in precheck
* File and container image downloads are now cached locally, so that repeated vagrant up/down runs do not trigger downloading of those files. This is especially useful on laptops with Kubernetes running locally on VMs. The total size of the cache, after an Ansible run, is currently around 800MB, so bandwidth (=time) savings can be quite significant.
* When download_run_once is false, the default is still not to cache, but setting download_force_cache will still enable caching.
* The local cache location can be set with download_cache_dir and defaults to /tmp/kubernetes_cache
* A local docker instance is no longer required to cache docker images; images are cached to file. A local docker instance is still required, though, if you wish to download images on localhost.
* Fixed a FIXME, where the argument was that delegate_to doesn't play nice with omit. That is a correct observation and the fix is to use default(inventory_host) instead of default(omit). See ansible/ansible#26009
* Removed "Register docker images info" task from download_container and set_docker_image_facts because it was faulty and unused.
* Removed redundant when:download.{container,enabled,run_once} conditions from {sync,download}_container.yml
* All features of commit d6fd0d2aca by Timoses <timosesu@gmail.com>, merged May 1st 2019, are included in this patch. Not all code was included verbatim, but each feature of that commit was checked to be working in this patch. One notable change: The actual downloading of the kubeadm images was moved to {download,sync}_container, to enable caching.
Note 1: I considered splitting this patch, but most changes that are not directly related to caching, are a pleasant by-product of implementing the caching code, so splitting would be impractical.
Note 2: I have my doubts about the usefulness of the upload, download and upgrade tags in the download role. Must they remain or can they be removed? If anybody knows, then please speak up.
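To summarize the knobs described above in one place, a sketch of the cache-related settings (names taken from the description; values as stated there):
```
# download caching sketch, per the description above
download_run_once: false                     # caching can still be forced when this is false
download_force_cache: true                   # enable caching even without run_once
download_cache_dir: /tmp/kubernetes_cache    # default local cache location
download_localhost: false                    # local docker only needed to download on localhost
```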
* Add support for arm images for hyperkube, kubeadm and cni_binary
* Add dummy etcd checksum for arm
This commit adds a dummy etcd checksum for arm to avoid a "no attribute" error
during setup.
* Add etcd host assert check
* Add 1.13.4 checksums of kubeadm and hyperkube for arm
* Update checksums of kubeadm and hyperkube for arm
* Add dummy checksums for calicoctl_binary_checksums dict
* disable gather_facts because it causes tests to fail
* Remove architecture check for etcd, since we are unable to run tests
Without this, pulls are considered for all
host groups, even those not targeted by the download's
`groups` list. Hence, a download/sync is triggered
even though the host does not require the image.
* Download to delegate and sync files when download_run_once
* Fail on error after saving container image
* Do not set changed status when downloaded container was up to date
* Only sync containers when they are actually required
Previously, non-required images (pull_required=false because the
image already existed on the target host) were synced to the target
hosts. This failed as the image was not downloaded to
the download_delegate and hence was not available for
syncing.
* Sync containers when only missing on some hosts
* Consider images with multiple repo tags
* Enable kubeadm images pull/syncing with download_delegate
* Use kubeadm images list to pull/sync
'kubeadm config images pull' is replaced by collecting the images
list with 'kubeadm config images list' and using the commonly
used method of pull/syncing the images.
* Ensure containers are downloaded and synced for all hosts
* Fix download/syncing when download_delegate is a kubernetes host
* Use K8s 1.14 and add kubeadm experimental control plane mode
This reverts commit d39c273d96.
* Cleanup kubeadm setup run on first master
* pin kubeadm_certificate_key in test
* Remove kubelet autolabel of kube-node, add symlink for pki dir
Change-Id: Id5e74dd667c60675dbfe4193b0bc9fb44380e1ca
* Add ansible-lint as gitlab-ci step
* Fix jinja2 syntax in include_tasks that breaks ansible-lint
* Use a block scalar to get around gitlab quoting/escaping rules
* Run ansible-lint in verbose mode in CI
Both kubedns and dnsmasq modes have long been unmaintained.
We should run the dns_late steps at the end because sshd
makes DNS lookups during the Ansible run and has 2s timeouts
for each failed lookup while trying to connect to coredns before
it is ready.
* feat(external-provisioner/local-path-provisioner): adds support for local path provisioner
Helpful for local development, but also for production workloads (once the
permission model is worked out) where redundancy is built into the
software that uses the PVCs (e.g. a database cluster with synchronous
replication)
* feat(local-path-provisioner): adds debug flag, image tag group var
* fix(local-path-provisioner): moves image repo/tag to download role
* test(gce_centos7-flannel): enables local-path-provisioner in test case
* fix(addons): add image repo/tag to commented default values
* fix(local-path-provisioner): typo in jinja template for local path provisioner
* style(local-path-provisioner): debug flag condition re-formatted
* fix(local-path-provisioner): adds missing default value for debug flag
* fix(local-path-provisioner): syntax fix for debug if condition end
* fix(local-path-provisioner): jinja template syntax: if condition white space
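A minimal sketch of enabling the addon (variable names are assumptions following the kubespray addon convention; tag and repo are illustrative):
```
# addons group_vars sketch -- names and values are illustrative
local_path_provisioner_enabled: true
local_path_provisioner_debug: false                                   # the debug flag added above
local_path_provisioner_image_repo: "rancher/local-path-provisioner"   # moved to the download role
local_path_provisioner_image_tag: "v0.0.2"
```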
Currently, the task `container_download | download images for kubeadm config images` fetches the etcd image even though it's not required (etcd is bootstrapped by kubespray, not kubeadm).
`kubeadm-images.yaml` is only a subset of `kubeadm-config.yaml`, therefore `kubeadm config images pull` will try to fetch everything on this list (including etcd):
```
# kubeadm config images list --config /etc/kubernetes/kubeadm-images.yaml
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/etcd:3.2.24
k8s.gcr.io/coredns:1.2.6
```
When using `kubeadm-config.yaml` though, the etcd image is not listed:
```
# kubeadm config images list --config /etc/kubernetes/kubeadm-config.yaml
k8s.gcr.io/kube-apiserver:v1.13.2
k8s.gcr.io/kube-controller-manager:v1.13.2
k8s.gcr.io/kube-scheduler:v1.13.2
k8s.gcr.io/kube-proxy:v1.13.2
k8s.gcr.io/pause:3.1
k8s.gcr.io/coredns:1.2.6
```
This change just adds the etcd endpoints to `kubeadm-images.yaml` to give kubeadm a hint that it doesn't need the etcd image for its bootstrapping, as etcd is "external".
I confess it is an ugly hack; a better way would be to use a single `kubeadm-config.yaml` for both tasks, but they are triggered by different roles (`kubeadm-images.yaml` is used by download, `kubeadm-config.yaml` by kubernetes/master) at different steps, and I didn't want to refactor too many things and risk breakage.
This is especially useful for offline installation, where a whitelist of container images is mirrored on a local private container registry. `k8s.gcr.io/etcd` and `quay.io/coreos/etcd` are two different repositories hosting the same images but using *different tags*!
* coreos/etcd:v3.2.24
* k8s.gcr.io/etcd:3.2.24 (note the missing 'v' in the tag name)
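The hint added to `kubeadm-images.yaml` is essentially the external etcd stanza of a kubeadm ClusterConfiguration; a sketch, with endpoints and certificate paths being illustrative:
```
# kubeadm-images.yaml sketch -- endpoints and paths are illustrative
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: v1.13.2
etcd:
  external:                       # tells kubeadm the etcd image is not needed
    endpoints:
      - https://10.0.0.1:2379
    caFile: /etc/ssl/etcd/ssl/ca.pem
    certFile: /etc/ssl/etcd/ssl/node.pem
    keyFile: /etc/ssl/etcd/ssl/node-key.pem
```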
Addressing the discussion started in #4064, this PR moves kubeadm and
hyperkube binaries to /usr/local/bin before running them on the master
nodes.
This addresses the case where local_release_dir points to /tmp
(the kubespray default) and /tmp is mounted with noexec, preventing
any binaries from being run on that partition.
In role "node", we still move kubeadm to bin_dir only on the worker
nodes.
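A hedged sketch of what such a move looks like as an Ansible task (task name and variables are illustrative, not the exact ones from the role):
```
# Ansible task sketch -- illustrative only
- name: Copy kubeadm to a directory that is not mounted noexec
  copy:
    src: "{{ local_release_dir }}/kubeadm"   # may live under /tmp, which can be noexec
    dest: /usr/local/bin/kubeadm
    mode: "0755"
    remote_src: yes
```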
* Add support for running a nodelocal dns cache
After encountering dns issues in a cluster I was recently working on I
noticed Kubernetes 1.13 introduced support for running a nodelocal dns
cache.
I believe this can be useful for more people.
73b548db06 https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md
* Add requested changes
* Add additional requested changes + documentation
* Add requested changes after review
* Replace incorrect variable
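A sketch of the resulting toggle (variable names and the listen address are assumptions based on the kubespray naming style):
```
# group_vars sketch -- names and values are assumptions
enable_nodelocaldns: true
nodelocaldns_ip: 169.254.25.10   # link-local address the per-node DNS cache listens on
```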
* Upgrade kubernetes to v1.13.0
* Remove all presence of scheduler.alpha.kubernetes.io/critical-pod in templates
* Fix cert dir
* Use kubespray v2.8 as baseline for gitlab
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try stopping GCE instances before making them absent so the delete process works
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout until kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Readd secret checking
* Do gitlab checks from v2.7.0 test upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
* Remove variables defined in download role. Fixes #3799
* Cleanup some more variables
* Fix bad templating
* Minor fix
* Add dashboard to download role. Fixes #3736
When `ansible_user` is not root and the `-b` option is used,
with `download_run_once` and `download_localhost` set to `true`,
Ansible executes the `container_download | upload container images to nodes` task.
It uses rsync to upload images to `/tmp/release/container/`, but the
`container` directory is owned by `root`.
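One way to avoid the permission clash is to make the cache directory writable by the connecting user before the rsync upload; a hedged sketch (the path comes from the description above, the ownership handling is illustrative):
```
# Ansible task sketch -- illustrative fix, not the exact change
- name: Ensure the container image cache directory is writable by the upload user
  file:
    path: /tmp/release/container/
    state: directory
    owner: "{{ ansible_user }}"
    mode: "0755"
  become: true
```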
* Support Metrics Server as addon (#3560).
* Update metrics server v0.3.1.
* Add metrics server test.
* Replace metrics server manifests with kubernetes/cluster/addons's.
* Modify metrics server manifests for kubespray.
* Follow PR#3558 node label node-role.kubernetes.io/master change
* Fix metrics server parameters base_metrics_server_... to metrics_server_...
* Fix hard-coded metrics_server_memory_per_node
* Add configurable insecure tls for metrics-apiservice
* Downloadable addon-resizer and extract parameter as variables
* Remove metrics server version from deployment name
* Make Metrics Server work when all masters have the node role
* Download the metrics-server and addon-resizer containers only on masters
* Separate ServiceAccount and ConfigMap and fix the application name
* Remove old metrics server clusterrole template
* Fix addon-resizer image specification
* Make InternalIP the default for metrics_server_kubelet_preferred_address_types
Make InternalIP the default because multiple preferred address types do not work.
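A sketch of the addon settings discussed above (variable names follow the `metrics_server_...` prefix introduced here; values are illustrative):
```
# metrics server addon sketch -- values are illustrative
metrics_server_enabled: true
metrics_server_kubelet_insecure_tls: true    # the configurable insecure TLS for the APIService
metrics_server_kubelet_preferred_address_types: "InternalIP"   # default, multiple types did not work
metrics_server_memory_per_node: 4            # no longer hard-coded
```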
* Enable AutoScaler for CoreDNS
* Only use one template for dns autoscaler
* Rename a few variables for replicas and minimum pods
* Rename a few variables for replicas and minimum pods
* Remove replicas to make autoscale work
* Cleanup kubedns-autoscaler as it has been renamed
* Adds support for Multus (multiple interfaces) CNI plugin
Multus is the Latin word for "multi". As the name suggests, it acts as a
multi-plugin in Kubernetes and provides multiple network interface
support in a pod. Multus uses the concept of invoking delegates by
grouping multiple plugins into delegates and invoking them in the
sequential order of the CNI configuration file provided in JSON format.
* Change CNI version (0.1.0->0.3.1) of Contiv to be compatible with Multus
kube-router v0.2.1 highlights from changelog:
- IPv6 WIP but pretty close to full working functionality
- fully supports network policy semantics with the addition of support for
ipBlock and except
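For illustration, the ipBlock/except semantics mentioned above correspond to standard NetworkPolicy fields, as in this sketch (names and CIDRs are examples):
```
# NetworkPolicy sketch showing ipBlock with except -- names/CIDRs are examples
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-subnet-except-gateway
spec:
  podSelector: {}            # apply to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16
            except:
              - 10.0.0.1/32  # carve a single address out of the allowed block
```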
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since Ansible 2.5
* Ansible minimum version updated in doc and spec
* last version_compare
* [jjo] add kube-router support
Fixes cloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support the most used kube-router flags via
`kube_router_foo` vars, as with other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install, should also work for recent kubeadm releases via
`skipKubeProxyInstall: true` config
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it working ok w/the kubeadm_enabled and
kube_router_run_service_proxy true or false
- introduce `kube_proxy_remove` fact, to decouple logic
from kube_proxy_mode (which affects kubeadm configmap
settings, so it is no good to abuse it by setting it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (with |bool), add some needed kube-proxy tags to my changes and existing ones
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
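A hedged sketch of the sample group_vars file mentioned above (keys follow the `kube_router_*` convention; exact names may differ):
```
# inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml sketch -- keys are illustrative
kube_network_plugin: kube-router
kube_router_run_service_proxy: true   # let kube-router replace kube-proxy
# the play then derives the kube_proxy_remove fact from this, instead of
# abusing kube_proxy_mode, which also feeds the kubeadm ConfigMap
```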
According to the documentation, container images are described
by vars like `foo_image_repo` and `foo_image_tag`.
The variables netcheck_{agent,server}_{img_repo,tag} do not
follow that convention.
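A small sketch of the rename implied above (repos and tags are illustrative):
```
# before: netcheck_agent_img_repo / netcheck_agent_tag (non-conforming)
# after, following the documented convention -- repos/tags illustrative
netcheck_agent_image_repo: "quay.io/l23network/k8s-netchecker-agent"
netcheck_agent_image_tag: "v1.0"
netcheck_server_image_repo: "quay.io/l23network/k8s-netchecker-server"
netcheck_server_image_tag: "v1.0"
```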
* calico upgrade to v3
* update calico_rr version
* add missing file
* change contents of main.yml as it was left at the old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml
Upgrade Kubernetes to v1.11.2
The kubeadm configuration file version has been upgraded from v1alpha1 to v1alpha2
Add bootstrap kubeadm-config.yaml with external etcd
* Update local-volume-provisioner-ds.yml.j2
After v1.10.2, the default mountPropagation is "None"
* local_volume_provisioner version bump
v2.1.0 uses the beta nodeAffinity API by default which is available starting 1.10
* Update local-volume-provisioner-ds.yml.j2
MY_NAMESPACE env
* Update README.md
Raw block devices docs.
ingress-nginx 0.16.2 (https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.16.2)
This patch simplifies the ingress-nginx deployment by deploying it on the
master by default, with customizable options; it also removes the
additional Ansible group "kube-ingress" and its k8s node label
injection.
Reference to https://kubernetes.io/docs/concepts/services-networking/ingress/#prerequisites:
GCE/Google Kubernetes Engine deploys an ingress controller on the master.
By changing `ingress_nginx_nodeselector` plus a custom k8s node
label, users can customize the DaemonSet deployment target.
If `ingress_nginx_nodeselector` is empty, the DaemonSet will be deployed on
every k8s node.
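A sketch of the customization described above (the label key/value and the enable flag are illustrative):
```
# group_vars sketch -- label key/value are illustrative
ingress_nginx_enabled: true
ingress_nginx_nodeselector:
  node-role.kubernetes.io/master: ""   # default: run the DaemonSet on masters
# set ingress_nginx_nodeselector: {} to deploy the DaemonSet on every node
```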
- cephfs-provisioner 06fddbe2 (https://github.com/kubernetes-incubator/external-storage/tree/06fddbe2/ceph/cephfs)
Notable changes from upstream:
- Added storage class parameters to specify a root path within the backing cephfs and, optionally, use deterministic directory and user names (https://github.com/kubernetes-incubator/external-storage/pull/696)
- Support capacity (https://github.com/kubernetes-incubator/external-storage/pull/770)
- Enable metrics server (https://github.com/kubernetes-incubator/external-storage/pull/797)
Other notable changes:
- Clean up legacy manifests file naming
- Remove legacy manifests, namespace and storageclass before upgrade
- `cephfs_provisioner_monitors` simplified as string
- Default to new deterministic naming
- Add `reclaimPolicy` support in StorageClass
With the legacy non-deterministic naming style (where $UUID is generated randomly):
- cephfs_provisioner_claim_root: /volumes/kubernetes
- cephfs_provisioner_deterministic_names: false
- Generated CephFS volume: /volumes/kubernetes/kubernetes-dynamic-pvc-$UUID
- Generated CephFS user: kubernetes-dynamic-user-$UUID
With new default deterministic naming style (where $NAMESPACE and $PVC are predictable):
- cephfs_provisioner_claim_root: /volumes
- cephfs_provisioner_deterministic_names: true
- Generated CephFS volume: /volumes/$NAMESPACE/$PVC
- Generated CephFS user: k8s.$NAMESPACE.$PVC
Currently all the gcr.io images used in kubespray can only run on x86.
Also, gcr.io does not fully support multi-arch docker images.
Add an extra var "image_arch" (default: amd64) to support running on other
platforms, like arm64.
Change-Id: I8e1c9af533c021cb96ade291a1ce58773b40e271
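A sketch of how the new variable is intended to be used (the image-name templating is illustrative):
```
# group_vars sketch -- templating is illustrative
image_arch: arm64   # defaults to amd64
# downloads can then compose arch-specific image names, for example:
# hyperkube_image_repo: "{{ kube_image_repo }}/hyperkube-{{ image_arch }}"
```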
* Move front-proxy-client certs back to kube mount
We want the same CA for all k8s certs
* Refactor vault to use a third party module
The module adds idempotency and reduces some of the repetitive
logic in the vault role
Requires ansible-modules-hashivault on the ansible node and hvac
on the vault hosts themselves
Add upgrade test scenario
Remove bootstrap-os tags from tasks
* fix upgrade issues
* improve unseal logic
* specify ca and fix etcd check
* Fix initialization check
bump machine size
Added CoreDNS to downloads
Updated with labels. Should now work without RBAC too
Fix DNS settings on hosts
Rename CoreDNS service from kube-dns to coredns
Add rotate based on http://edgeofsanity.net/rant/2017/12/20/systemd-resolved-is-broken.html
Updated docs with CoreDNS info
Added labels and fixed minor settings from official yaml file: https://github.com/kubernetes/kubernetes/blob/release-1.9/cluster/addons/dns/coredns.yaml.sed
Added a secondary deployment and a secondary service IP. This is to mitigate DNS timeouts and provide high resiliency against failures. See the discussion at 'https://github.com/coreos/coreos-kubernetes/issues/641#issuecomment-281174806'
Set the DNS list correctly. Thanks to @whereismyjetpack
Only download KubeDNS or CoreDNS if selected
Move dns cleanup to its own file and import tasks based on dns mode
Fix install of KubeDNS when the dnsmasq_kubedns mode is selected
Add new dns option coredns_dual for dual stack deployment. Added variable to configure replicas deployed. Updated docs for dual stack deployment. Removed rotate option in resolv.conf.
Run DNS manifests for CoreDNS and KubeDNS
Set skydns servers on dual stack deployment
Use only one template for CoreDNS dual deployment
Set correct cluster ip for the dns server
* Added cilium support
* Fix typo in debian test config
* Remove empty lines
* Changed cilium version from <latest> to <v1.0.0-rc3>
* Add missing changes for cilium
* Add cilium to CI pipeline
* Fix wrong file name
* Check kernel version for cilium
* fixed ci error
* fixed cilium-ds.j2 template
* added waiting for cilium pods to run
* Fixed missing EOF
* Fixed trailing spaces
* Fixed trailing spaces
* Fixed trailing spaces
* Fixed too many blank lines
* Updated tolerations,annotations in cilium DS template
* Set cilium_version to iptables-1.9 to see if bug is fixed in CI
* Update cilium image tag to v1.0.0-rc4
* Update Cilium test case CI vars filenames
* Add optional prometheus flag, adjust initial readiness delay
* Update README.md with cilium info