Cleaned up deprecated APIs:
apps/v1beta1
apps/v1beta2
extensions/v1beta1 for ds,deploy,rs
Add workaround for deploying helm using incompatible
deployment manifest.
Change-Id: I78b36741348f47a999df3841ee63cf4e6f377830
* Enable nodes to run calicoctl
per-node tasks require waiting for calico-node to be applied
Change-Id: Ibe1076b7334a2da0332f2dd766fde0c3f172d1f2
* cleanup tasks that should run on master
Change-Id: I43a837879ef41596f14657ecd7f813899b6865ae
* Switch run_once calico logic to just run on first master
Change-Id: I6893711e354f63c5e1eaf6ac2e23d9a6347a555d
* Enable containerd to deploy vanilla containerd package
Fixes kubeadm references to CRI socket for containerd
Fixes download role cache feature to work with containerd
Change-Id: I2ab8f0031107e2f0d1a85c39b4beb66f08509a01
* use containerd for flannel-addons job
Change-Id: Ied375c7d65e64a625ffbd995ff16f2374067dee6
* add containerd vars
Change-Id: Ib9a8a04e501c481a86235413cbec63f3672baf91
* fixup vars
Change-Id: Ibea64e4b18405a578b52a13da100384582aa24c2
* more fixes
* fix rh repo
Change-Id: I00575a77cfb7b81d6095db5d918a52023c8f13ba
* Adjust helm host install for containerd
* Add calico 3.7.3 support
* add calico_datastore variable to policy controller role
* add missing clusterrole rules for calico policy controller
* disable calico kube controller when kdd mode is used for versions < 3.6
8080 is a pretty common port, using nodelocaldns_ip:8080 still
prevents node processes or hostNetwork=true processes to bind to *:8080
so switch to 9254 by default (prometheus port is 9253)
Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
* Use K8s 1.15
* Use Kubernetes 1.15 and use kubeadm.k8s.io/v1beta2 for
InitConfiguration.
* bump to v1.15.0
* Remove k8s 1.13 checksums.
* Update README kubernetes version 1.15.0.
* Update metrics server 0.3.3 for k8s 1.15
* Remove less than k8s 1.14 related code
* Use kubeadm with --upload-certs instead of --experimental-upload-certs due to depricate
* Update dnsautoscaler 1.6.0
* Skip certificateKey if it's not defined
* Add kubeadm-conftolplane.v2beta2 for k8s 1.15 or later
* Support kubeadm control plane for k8s 1.15
* Update sonobuoy version 0.15.0 for k8s 1.15
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Make local volume provisioner dir mode a variable
I need to change this for Nagios monitoring. Others may
need to as well. Had to close previous commits, sorry for
the spam.
* Added pod psp in Rancher Local Path Provisioner
Added pod security policy (psp) in Rancher Local Path Provisioner.
Signed-off-by: André R. de Miranda <andre@miranda.work>
* Apply psp for Rancher Local Path Provisioner only when local_path_provisioner_namespace is not kube-system and also reorganized the templates
Error starting nginx because in requiredDropCapabilities is dropped all capabilities.
The nginx requires the following capabilities:
- CHOWN
- SETGID
- SETUID
Signed-off-by: André R. de Miranda <andre@miranda.work>
* Fix nodeselectors for contiv and nginx-ingress
Change-Id: Ib3eb6bd87193c69a90ee944c9164a0b6792c79ba
* Set kube proxy mode to iptables for addons task
Change-Id: Iff71a71f672405c74b4708c71db15ddc4391a53a
We don't need to support upgrades from 2 year old installs,
just from the last major version.
Also changed most retried tasks to 1s delay instead of longer.
The Stateless ClearLinux feature[1] requires the creation of folders
in /etc folder. This change ensure the existence of the
/etc/bash_completion.d/ folder for ClearLinux Distribution.
[1] https://clearlinux.org/features/stateless
Both kubedns and dnsmasq modes are long not maintained.
We should run dns_late steps at the end because sshd
makes DNS lookups during Ansible run and has 2s timeouts
for each failed lookup trying to connect to coredns before
it is ready.
* feat(external-provisioner/local-path-provisioner): adds support for local path provisioner
Helpful for local development but also in production workloads (once the
permission model is worked out) where you have redundancy built into the
software uses the PVCs (e.g. database cluster with synchronous
replication)
* feat(local-path-provisioner): adds debug flag, image tag group var
* fix(local-path-provisioner): moves image repo/tag to download role
* test(gce_centos7-flannel): enables local-path-provisioner in test case
* fix(addons): add image repo/tag to commented default values
* fix(local-path-provisioner): typo in jinja template for local path provisioner
* style(local-path-provisioner): debug flag condition re-formatted
* fix(local-path-provisioner): adds missing default value for debug flag
* fix(local-path-provisioner): syntax fix for debug if condition end
* fix(local-path-provisioner): jinja template syntax: if condition white space
* OCI subnet AD 2 is not required for CCM >= 0.7.0
Reorganize OCI provider to generate configuration, rather than pull
Add pull secret option to OCI cloud provider
* Updated oci example to document new parameters
* Set cluster DNS correctly in case of nodelocal dns cache
* Pass in cluster_ip based on dns mode
* Disable nodelocaldns by default
* Fix syntax error
* Fix syntax issue
* Add nodelocadns ip to vars of node installation
* Change location of nodelocaldns_ip
* Try to remove newlines from jinja template
* Add debug for config file
* Move parameter logic outside of template
* Adapt templates after feedback
* Remove debugging
Looks like the template is removing the trailing space between storage
class entries, and since CI only has one storage class we never hit this
issue. This change will prevent the yaml from printing on a single line
when multiple storage classes are defined.
- Fixed an issue where storage class host directories were looped
through excessive target hosts
- Fixes examples in the LVP `README.md` to use nested dicts instead of a
list of dicts
* Makes local volume provisioner more dynamic
* Correct variable name in local storage provisioner defaults
* Updates external-provisioner readme
* Updates variable naming to be more clear, more documentation, fixes sample inventory
* Variable refactor, untangled some jinja2 loops
* Corrects variable name
* No variable substitution in dict keys, replaced with anchor
* Fixes default storage_classes dict, inline docs
* Fixes spelling in inline docs
* Addresses comments in review
* Updates all the defaults
* Fix failing CI task
* Fixes external provisioner daemonset
* Add support for running a nodelocal dns cache
After encountering dns issues in a cluster I was recently working on I
noticed Kubernetes 1.13 introduced support for running a nodelocal dns
cache.
I believe this can usefull for more people.
73b548db06https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md
* Add requested changes
* Add additional requested changes + documentation
* Add requested changes after review
* Replace incorrect variable
* Upgrade kubernetes to v1.13.0
* Remove all precense of scheduler.alpha.kubernetes.io/critical-pod in templates
* Fix cert dir
* Use kubespray v2.8 as baseline for gitlab
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try stop gce first before absent to make the delete process work
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout untill kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Readd secret checking
* Do gitlab checks from v2.7.0 test upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
* Remove variables defined in download role. Fixes#3799
* Cleanup some more variables
* Fix bad templating
* Minor fix
* Add dashboard to download role. Fixes#3736
Introduced variable `ingress_nginx_tolerations` to set custom
tolerations for Ingress nginx daemonset, to be able to schedule
ingress-nginx on dedicated nodes with taints.
* Support Metrics Server as addon (#3560).
* Update metrics server v0.3.1.
* Add metrics server test.
* Replace metrics server manifests with kubernetes/cluster/addons's.
* Modify metrics server manifests for kubespray.
* Follow PR#3558 node label node-role.kubernetes.io/master change
* Fix metrics server parameters base_metrics_server_... to metrics_server_...
* Fix too hard corded metrics_server_memory_per_node
* Add configurable insecure tls for metrics-apiservice
* Downloadable addon-resizer and extract parameter as variables
* Remove metrics server version from deployment name
* Metrics Server work when all masters has node role
* Download metrics-server and add-resizer container only on master
* ServiceAccount and ConfigMap is separated and fix application name
* Remove old metrics server clusterrole template
* Fix addon-resizer image specify
* Make InternalIP default for metrics_server_kubelet_preferred_address_types
Make InternalIP default because multiple preferrred address types does not work.
comparison that happens during `TASK [kubernetes-apps/ansible : Kubernetes Apps | Lay Down CoreDNS Template]` where the `dns-autoscaler` template is deployed causes coredns to fail deployment. The error is caused by the variable `dns_prevent_single_point_failure` where an integer is being compared with a string. The resulting error:
```bash
'>' not supported between instances of 'int' and 'str'
```
prevents successful deployment of CoreDNS.
The change makes the comparison happen between integers and allows CoreDNS to succeed.
* Enable AutoScaler for CoreDNS
* Only use one template for dns autoscaler
* Rename a few variables for replicas and minimum pods
* Rename a few variables for replicas and minimum pods
* Remove replicas to make autoscale work
* Cleanup kubedns-autoscaler as it has been renamed
* Adds support for Multus (multiple interfaces) CNI plugin
Multus is a latin word for "Multi". As the name suggests, it acts as a
Multi plugin in Kubernetes and provides multiple network interface
support in a pod. Multus uses the concept of invoking delegates by
grouping multiple plugins into delegates and invoking them in the
sequential order of the CNI configuration file provided in json format.
* Change CNI version (0.1.0->0.3.1) of Contiv to be compatible with Multus
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since ansible 2.5
* ansible minimal version updated in doc and spec
* last version_compare
* [jjo] add kube-router support
Fixescloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support most used kube-router flags via
`kube_router_foo` vars as other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install, should also work for recent kubeadm releases via
`skipKubeProxyInstall: true` config
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it working ok w/the kubeadm_enabled and
kube_router_run_service_proxy true or false
- introduce `kube_proxy_remove` fact, to decouple logic
from kube_proxy_mode (which affects kubeadm configmap
settings, thus no-good to ab-use it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (w/ |bool), add some needed kube-proxy tags on my and existing changes
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
* Added Priority class to tiller installation and also fixed tiller override implementation.
* Added changes to handle priority classes separately in tiller, instead of using the variable tiller_override
* Fix DNS loop when resolvconf_mode is set to host_resolvconf
* Make sure upstream_dns_servers is defined when using resolvconf_mode == 'host_resolvconf'
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
- Local Volume StorageClass configuration is now manged by `local_volume_provisioner_storage_classes`, a list of maps that specifies local storage classes with `name` `host_dir` and `mount_dir` keys per entry
- Tasks and templates updated to loop through local volume storage classes
- Previous defaults for path/class names were not changed
- Fixed an issue where a `kubernetes/preinstall` was creating directories inconsistently with the `kubernetes-apps/external_provisioner/local_volume_provisioner` task
According to the documentation, container images are described
by vars like `foo_image_repo` and `foo_image_tag`.
The variables netcheck_{agent,server}_{img_repo,tag} do not
follow that convention.
* Changes to assign pod priority to kube components.
* Removed the boolean flag pod_priority_assignment
* Created new priorityclass k8s-cluster-critical
* Created new priorityclass k8s-cluster-critical
* Fixed the trailing spaces
* Fixed the trailing spaces
* Added kube version check while creating Priority Class k8s-cluster-critical
* Moved k8s-cluster-critical.yml
* Moved k8s-cluster-critical.yml to kube_config_dir
When enable_network_policy is set to True with Calico 3 kubectl
apply fails with the error:
The Deployment "calico-kube-controllers" is invalid:
spec.strategy.rollingUpdate: Forbidden: may not be specified when
strategy type is 'Recreate'
See
https://github.com/kubernetes-incubator/kubespray/issues/3267
Changing the update strategy to RollingUpdate avoids this error.
ensure there is pin priority for docker package to avoid upgrade of docker to incompatible version
remove empty when line
ensure there is pin priority for docker package to avoid upgrade of docker to incompatible version
force kubeadm upgrade due to failure without --force flag
ensure there is pin priority for docker package to avoid upgrade of docker to incompatible version
added nodeSelector to have compatibility with hybrid cluster with win nodes, also fix for download with missing container type
fixes in syntax and LF for newline in files
fix on yamllint check
ensure there is pin priority for docker package to avoid upgrade of docker to incompatible version
some cleanup for innecesary lines
remove conditions for nodeselector
* calico upgrade to v3
* update calico_rr version
* add missing file
* change contents of main.yml as it was left old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml
* Update local-volume-provisioner-ds.yml.j2
After v1.10.2 default mountPropagation is "None"
* local_volume_provisioner version bump
v2.1.0 uses the beta nodeAffinity API by default which is available starting 1.10
* Update local-volume-provisioner-ds.yml.j2
MY_NAMESPACE env
* Update README.md
Raw block devices docs.
* kubedns & kubedns-autoscaler: Stick to master nodes.
- Tolerate only master nodes and not any NoSchedule taint
- Pods are on different nodes
- Pods are required to be on a master node.
* kubedns: use soft nodeAffinity.
Prefer to be on a master node, don't require.
* coredns: Stick to (different) master nodes.
- Pods are on different nodes
- Pods are preferred to be on a master node.
ingress-nginx 0.16.2 (https://github.com/kubernetes/ingress-nginx/releases/tag/nginx-0.16.2)
This patch simplify ingress-nginx deployment by default deploy on
master, with customizable options; on the other hand, remove the
additional Ansible group "kube-ingress" and its k8s node label
injection.
Reference to https://kubernetes.io/docs/concepts/services-networking/ingress/#prerequisites:
GCE/Google Kubernetes Engine deploys an ingress controller on the master.
By changing `ingress_nginx_nodeselector` plus custom k8s node
label, user could customize the DaemonSet deployment target.
If `ingress_nginx_nodeselector` is empty, will deploy DaemonSet on
every k8s node.
- cephfs-provisioner 06fddbe2 (https://github.com/kubernetes-incubator/external-storage/tree/06fddbe2/ceph/cephfs)
Noteable changes from upstream:
- Added storage class parameters to specify a root path within the backing cephfs and, optionally, use deterministic directory and user names (https://github.com/kubernetes-incubator/external-storage/pull/696)
- Support capacity (https://github.com/kubernetes-incubator/external-storage/pull/770)
- Enable metrics server (https://github.com/kubernetes-incubator/external-storage/pull/797)
Other noteable changes:
- Clean up legacy manifests file naming
- Remove legacy manifests, namespace and storageclass before upgrade
- `cephfs_provisioner_monitors` simplified as string
- Default to new deterministic naming
- Add `reclaimPolicy` support in StorageClass
With legacy non-deterministic naming style (where $UUID are generated ramdonly):
- cephfs_provisioner_claim_root: /volumes/kubernetes
- cephfs_provisioner_deterministic_names: false
- Generated CephFS volume: /volumes/kubernetes/kubernetes-dynamic-pvc-$UUID
- Generated CephFS user: kubernetes-dynamic-user-$UUID
With new default deterministic naming style (where $NAMESPACE and $PVC are predictable):
- cephfs_provisioner_claim_root: /volumes
- cephfs_provisioner_deterministic_names: true
- Generated CephFS volume: /volumes/$NAMESPACE/$PVC
- Generated CephFS user: k8s.$NAMESPACE.$PVC
Currently all the gcr.io images used in kubespray can only run on x86.
Also gcr.io has not fully support multi-arch docker images.
Add extra var "image_arch" (default is amd64) to support running other
platforms, like arm64.
Change-Id: I8e1c9af533c021cb96ade291a1ce58773b40e271
Kubespray should not install any helm charts. This is a task
that a user should do on his/her own through ansible or another
tool. It opens the door to wrapping installation of any helm
chart.
The default for kibana_base_url does not make sense an makes kibana unusable. The default path forces a 404 when you try to open kibana in the browser. Not setting kibana_base_url works just fine.
Added CoreDNS to downloads
Updated with labels. Should now work without RBAC too
Fix DNS settings on hosts
Rename CoreDNS service from kube-dns to coredns
Add rotate based on http://edgeofsanity.net/rant/2017/12/20/systemd-resolved-is-broken.html
Updated docs with CoreDNS info
Added labels and fixed minor settings from official yaml file: https://github.com/kubernetes/kubernetes/blob/release-1.9/cluster/addons/dns/coredns.yaml.sed
Added a secondary deployment and secondary service ip. This is to mitigate dns timeouts and create high resitency for failures. See discussion at 'https://github.com/coreos/coreos-kubernetes/issues/641#issuecomment-281174806'
Set dns list correct. Thanks to @whereismyjetpack
Only download KubeDNS or CoreDNS if selected
Move dns cleanup to its own file and import tasks based on dns mode
Fix install of KubeDNS when dnsmask_kubedns mode is selected
Add new dns option coredns_dual for dual stack deployment. Added variable to configure replicas deployed. Updated docs for dual stack deployment. Removed rotate option in resolv.conf.
Run DNS manifests for CoreDNS and KubeDNS
Set skydns servers on dual stack deployment
Use only one template for CoreDNS dual deployment
Set correct cluster ip for the dns server
* Added cilium support
* Fix typo in debian test config
* Remove empty lines
* Changed cilium version from <latest> to <v1.0.0-rc3>
* Add missing changes for cilium
* Add cilium to CI pipeline
* Fix wrong file name
* Check kernel version for cilium
* fixed ci error
* fixed cilium-ds.j2 template
* added waiting for cilium pods to run
* Fixed missing EOF
* Fixed trailing spaces
* Fixed trailing spaces
* Fixed trailing spaces
* Fixed too many blank lines
* Updated tolerations,annotations in cilium DS template
* Set cilium_version to iptables-1.9 to see if bug is fixed in CI
* Update cilium image tag to v1.0.0-rc4
* Update Cilium test case CI vars filenames
* Add optional prometheus flag, adjust initial readiness delay
* Update README.md with cilium info
In some installation, it can take up to 3sec to get the value. Retrying
for 5 sec will ensure the command won't return 1.
Signed-off-by: Sébastien Han <seb@redhat.com>
Update checksum for kubeadm
Use v1.9.0 kubeadm params
Include hash of ca.crt for kubeadm join
Update tag for testing upgrades
Add workaround for testing upgrades
Remove scale CI scenarios because of slow inventory parsing
in ansible 2.4.x.
Change region for tests to us-central1 to
improve ansible performance
This allows `kube_apiserver_insecure_port` to be set to 0 (disabled).
Rework of #1937 with kubeadm support
Also, fixed an issue in `kubeadm-migrate-certs` where the old apiserver cert was copied as the kubeadm key
* Add Contiv support
Contiv is a network plugin for Kubernetes and Docker. It supports
vlan/vxlan/BGP/Cisco ACI technologies. It support firewall policies,
multiple networks and bridging pods onto physical networks.
* Update contiv version to 1.1.4
Update contiv version to 1.1.4 and added SVC_SUBNET in contiv-config.
* Load openvswitch module to workaround on CentOS7.4
* Set contiv cni version to 0.1.0
Correct contiv CNI version to 0.1.0.
* Use kube_apiserver_endpoint for K8S_API_SERVER
Use kube_apiserver_endpoint as K8S_API_SERVER to make contiv talks
to a available endpoint no matter if there's a loadbalancer or not.
* Make contiv use its own etcd
Before this commit, contiv is using a etcd proxy mode to k8s etcd,
this work fine when the etcd hosts are co-located with contiv etcd
proxy, however the k8s peering certs are only in etcd group, as a
result the etcd-proxy is not able to peering with the k8s etcd on
etcd group, plus the netplugin is always trying to find the etcd
endpoint on localhost, this will cause problem for all netplugins
not runnign on etcd group nodes.
This commit make contiv uses its own etcd, separate from k8s one.
on kube-master nodes (where net-master runs), it will run as leader
mode and on all rest nodes it will run as proxy mode.
* Use cp instead of rsync to copy cni binaries
Since rsync has been removed from hyperkube, this commit changes it
to use cp instead.
* Make contiv-etcd able to run on master nodes
* Add rbac_enabled flag for contiv pods
* Add contiv into CNI network plugin lists
* migrate contiv test to tests/files
Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com>
* Add required rules for contiv netplugin
* Better handling json return of fwdMode
* Make contiv etcd port configurable
* Use default var instead of templating
* roles/download/defaults/main.yml: use contiv 1.1.7
Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com>
Move RS to deployment so no need to take care of the revision history
limits :
- Delete the old RS
- Make Calico manifest a deployment
- move deployments to apps/v1beta2 API since Kubernetes 1.8
This allows `kube_apiserver_insecure_port` to be set to 0 (disabled). It's working, but so far I have had to:
1. Make the `uri` module "Wait for apiserver up" checks use `kube_apiserver_port` (HTTPS)
2. Add apiserver client cert/key to the "Wait for apiserver up" checks
3. Update apiserver liveness probe to use HTTPS ports
4. Set `kube_api_anonymous_auth` to true to allow liveness probe to hit apiserver's /healthz over HTTPS (livenessProbes can't use client cert/key unfortunately)
5. RBAC has to be enabled. Anonymous requests are in the `system:unauthenticated` group which is granted access to /healthz by one of RBAC's default ClusterRoleBindings. An equivalent ABAC rule could allow this as well.
Changes 1 and 2 should work for everyone, but 3, 4, and 5 require new coupling of currently independent configuration settings. So I also added a new settings check.
Options:
1. The problem goes away if you have both anonymous-auth and RBAC enabled. This is how kubeadm does it. This may be the best way to go since RBAC is already on by default but anonymous auth is not.
2. Include conditional templates to set a different liveness probe for possible combinations of `kube_apiserver_insecure_port = 0`, RBAC, and `kube_api_anonymous_auth` (won't be possible to cover every case without a guaranteed authorizer for the secure port)
3. Use basic auth headers for the liveness probe (I really don't like this, it adds a new dependency on basic auth which I'd also like to leave independently configurable, and it requires encoded passwords in the apiserver manifest)
Option 1 seems like the clear winner to me, but is there a reason we wouldn't want anonymous-auth on by default? The apiserver binary defaults anonymous-auth to true, but kubespray's default was false.
* Refactor downloads to use download role directly
Also disable fact delegation so download delegate works acros OSes.
* clean up bools and ansible_os_family conditionals