Since skipping upgrades is unsupported, I've detailed the steps for upgrading one release at a time and removed language that implied skipping should work
* Refactor calico-rr to run in k8s cluster with taint
Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa
* add preinstall checks
* rework calico/rr role
Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8
* add empty calico-rr group
Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad
* add macvlan cni to kubespray
* macvlan: lint yaml files and fix sample config file
* macvlan: add OWNERS file
* add macvlan to README
* macvlan: CI first shot
* macvlan: CI add full masquerade
* delegate retrieving the pod CIDR to the master only
* macvlan: add config for CI
* macvlan: add netchecker deployment
* File and container image downloads are now cached locally, so that repeated vagrant up/down runs do not trigger re-downloading of those files. This is especially useful on laptops with Kubernetes running locally on VMs. The total size of the cache after an ansible run is currently around 800MB, so bandwidth (=time) savings can be quite significant. (A sketch of the cache variables follows the notes below.)
* When download_run_once is false, the default is still not to cache, but setting download_force_cache will still enable caching.
* The local cache location can be set with download_cache_dir and defaults to /tmp/kubernetes_cache
* A local docker instance is no longer required to cache docker images; Images are cached to file. A local docker instance is still required, though, if you wish to download images on localhost.
* Fixed a FIXME, where the argument was that delegate_to doesn't play nice with omit. That is a correct observation and the fix is to use default(inventory_host) instead of default(omit). See ansible/ansible#26009
* Removed "Register docker images info" task from download_container and set_docker_image_facts because it was faulty and unused.
* Removed redundant when:download.{container,enabled,run_once} conditions from {sync,download}_container.yml
* All features of commit d6fd0d2aca by Timoses <timosesu@gmail.com>, merged May 1st 2019, are included in this patch. Not all code was included verbatim, but each feature of that commit was checked to be working in this patch. One notable change: the actual downloading of the kubeadm images was moved to {download,sync}_container, to enable caching.
Note 1: I considered splitting this patch, but most changes that are not directly related to caching, are a pleasant by-product of implementing the caching code, so splitting would be impractical.
Note 2: I have my doubts about the usefulness of the upload, download and upgrade tags in the download role. Must they remain or can they be removed? If anybody knows, then please speak up.
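A minimal group_vars sketch of the caching variables described above (values illustrative; download_localhost is an assumption for the "download images on localhost" case):

  download_run_once: true                    # download once, then push to the other hosts
  download_force_cache: true                 # cache even when download_run_once is false
  download_cache_dir: /tmp/kubernetes_cache  # default local cache location
  download_localhost: false                  # true requires a local docker instance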
* Download to delegate and sync files when download_run_once
* Fail on error after saving container image
* Do not set changed status when downloaded container was up to date
* Only sync containers when they are actually required
Previously, non-required images (pull_required=false as
image existed on target host) were synced to the target
hosts. This failed as the image was not downloaded to
the download_delegate and hence was not available for
syncing.
* Sync containers when only missing on some hosts
* Consider images with multiple repo tags
* Enable kubeadm images pull/syncing with download_delegate
* Use kubeadm images list to pull/sync
'kubeadm config images pull' is replaced by collecting the images
list with 'kubeadm config images list' and using the commonly
used method of pulling/syncing the images (a sketch follows this list).
* Ensure containers are downloaded and synced for all hosts
* Fix download/syncing when download_delegate is a kubernetes host
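As an illustrative sketch only (the task shape is an assumption; 'kubeadm config images list' and the kube_version/download_delegate variables come from the text), collecting the image list might look like:

  - name: Collect the kubeadm images list   # hypothetical task name
    command: "kubeadm config images list --kubernetes-version={{ kube_version }}"
    register: kubeadm_images_list
    run_once: true
    delegate_to: "{{ download_delegate }}"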
add support for the following properties in azure-credential-check.yml (sketched below):
- azure_loadbalancer_sku: Sku of Load Balancer and Public IP. Candidate values are: basic and standard.
- azure_exclude_master_from_standard_lb: excludes master nodes from standard load balancer.
- azure_disable_outbound_snat: disables the outbound SNAT for public load balancer rules
- useInstanceMetadata: Use instance metadata service where possible
- azure_primary_availability_set: (Optional) The name of the availability set that should be used as the load balancer backend
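A hedged group_vars sketch of these settings (values illustrative, availability set name hypothetical):

  azure_loadbalancer_sku: standard
  azure_exclude_master_from_standard_lb: true
  azure_disable_outbound_snat: false
  azure_primary_availability_set: my-avset   # optional
  # useInstanceMetadata is the corresponding azure.json key; its Ansible toggle is not named in this list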
Ansible 2.0 has deprecated the “ssh” in ansible_ssh_host in favor of ansible_host.
Updating the docs to be more aligned with the Ansible version used in the sample/inventory.ini file as well.
Also adding `[bastion]` group in the docs to avoid confusion.
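For example, in the inventory.ini style the docs reference (hosts and addresses hypothetical):

  [all]
  node1 ansible_host=10.10.1.3   # ansible_host replaces the deprecated ansible_ssh_host

  [bastion]
  bastion ansible_host=x.x.x.x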
* Vagrantfile: Bump openSUSE to Leap 15.0
* roles: container-engine: Add 'containerd' package for openSUSE
The 'containerd' package contains the docker-containerd and
docker-containerd-shim binaries. We also need to ensure that the latest
version is installed since an older version may already be present (eg GCE
images)
* Remove docker log-opts for opensuse
* roles: bootstrap-os: Use lowercase 'o' for openSUSE
OpenSUSE is not a valid family name. The correct one is openSUSE
* roles: bootstrap-os: Update zypper cache before first installation
The zypper cache may be outdated so ensure that it's fully updated
before we try and install the bootstrap packages.
Both kubedns and dnsmasq modes have long been unmaintained.
We should run the dns_late steps at the end, because sshd
makes DNS lookups during the Ansible run and has a 2s timeout
for each failed lookup while trying to connect to coredns before
it is ready.
* Lint everything in the repository with yamllint
* yamllint fixes: syntax fixes only
* yamllint fixes: move comments to play names
* yamllint fixes: indent comments in .gitlab-ci.yml file
* Calico: Ability to define the default IPPool CIDR (instead of kube_pods_subnet)
* Documentation for calico_pool_cidr (and calico_advertise_cluster_ips, which had been forgotten...)
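A sketch of both variables, assuming the usual 10.233.64.0/18 pods subnet:

  calico_pool_cidr: 10.233.64.0/18     # falls back to kube_pods_subnet when unset
  calico_advertise_cluster_ips: true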
--limit doesn't work when using remove-node.yml, as the playbook contains a group listing with "hosts: kube-master". Thus, the remove-node/pre-remove/post-remove tasks are skipped, because they are filtered by the "hosts: kube-master" group.
Added a line documenting where to find acceptable values for the
`docker_version` setting. If you use a value that is not used as
a key value by `docker_versioned_pkg`, the container-engine/docker
playbook will throw an "Unexpected templating type error". (e.g.
if you use '18.06.1' or '18.06.1-ce', neither of which is used
as a key value of `docker_versioned_pkg`, rather than '18.06',
you'll get an error when installing on Ubuntu 18.04.)
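For example, using the keys from the text above:

  docker_version: '18.06'   # a docker_versioned_pkg key; '18.06.1' or '18.06.1-ce' would fail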
* Add support for running a nodelocal dns cache
After encountering dns issues in a cluster I was recently working on, I
noticed Kubernetes 1.13 introduced support for running a nodelocal dns
cache.
I believe this can be useful for more people; a configuration sketch
follows the change list below.
See 73b548db06 and
https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md
* Add requested changes
* Add additional requested changes + documentation
* Add requested changes after review
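A minimal enablement sketch (variable names as used by Kubespray for this feature; the IP is the usual link-local default and is shown as an assumption):

  enable_nodelocaldns: true
  nodelocaldns_ip: 169.254.25.10   # address the per-node cache listens on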
* Replace incorrect variable
* Remove non-kubeadm deployment
* More cleanup
* More cleanup
* More cleanup
* More cleanup
* Fix gitlab
* Try to stop gce first, before absent, to make the delete process work
* More cleanup
* Fix bug with checking if kubeadm has already run
* Fix bug with checking if kubeadm has already run
* More fixes
* Fix test
* fix
* Fix gitlab checkout until kubespray 2.8 is on quay
* Fixed
* Add upgrade path from non-kubeadm to kubeadm. Revert ssl path
* Readd secret checking
* Do gitlab checks from v2.7.0 test upgrade path to 2.8.0
* fix typo
* Fix CI jobs to kubeadm again. Fix broken hyperkube path
* Fix gitlab
* Fix rotate tokens
* More fixes
* More fixes
* Fix tokens
* Remove variables defined in download role. Fixes #3799
* Cleanup some more variables
* Fix bad templating
* Minor fix
* Add dashboard to download role. Fixes #3736
Introduced the node_taints variable, which can be set in the inventory for
specific hosts or in group_vars; it generates the --register-with-taints
command line argument for kubelet.
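For example, in host_vars for a hypothetical node (taint value illustrative):

  node_taints:
    - "node-role.kubernetes.io/ingress=:NoSchedule"   # rendered into --register-with-taints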
* Adds support for Multus (multiple interfaces) CNI plugin
Multus is a Latin word for "multi". As the name suggests, it acts as a
Multi plugin in Kubernetes and provides multiple network interface
support in a pod. Multus uses the concept of invoking delegates by
grouping multiple plugins into delegates and invoking them in the
sequential order of the CNI configuration file provided in json format.
* Change CNI version (0.1.0->0.3.1) of Contiv to be compatible with Multus
* [jjo] add kube-router support
Fixes cloudnativelabs/kube-router#147.
* add kube-router as another network_plugin choice
* support most used kube-router flags via
`kube_router_foo` vars as other plugins
* implement replacing kube-proxy (--run-service-proxy=true) via
`kube_proxy_mode: none`, verified in a _non kubeadm_enabled_
install; should also work for recent kubeadm releases via the
`skipKubeProxyInstall: true` config (a configuration sketch
follows this list)
* [jjo] address PR#3339 review from @woopstar
* add busybox image used by kube-router to downloads
* fix busybox download groups key
* rework kubeadm_enabled + kube_router_run_service_proxy
- verify it works OK with kubeadm_enabled and
kube_router_run_service_proxy set to true or false
- introduce the `kube_proxy_remove` fact, to decouple the logic
from kube_proxy_mode (which affects kubeadm configmap
settings, so it's no good to abuse it by setting it to 'none')
* improve kube-router.md re: kubeadm_enabled and kube_router_run_service_proxy
* address @woopstar's latest review
* add inventory/sample/group_vars/k8s-cluster/k8s-net-kube-router.yml
* fix kube_router_run_service_proxy conditional for kube-proxy removal
* fix kube_proxy_remove fact (w/ |bool), add some needed kube-proxy tags to my changes and the existing ones
* update kube-router tolerations for 1.12 compatibility
* add PriorityClass to kube-router DaemonSet
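A group_vars sketch of the options above (a sketch under the flags named in this list, not a verified config):

  kube_network_plugin: kube-router
  kube_router_run_service_proxy: true   # let kube-router replace kube-proxy
  # in a non-kubeadm install this pairs with kube_proxy_mode: none, per the notes above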
* Fix DNS loop when resolvconf_mode is set to host_resolvconf
* Make sure upstream_dns_servers is defined when using resolvconf_mode == 'host_resolvconf'
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
* Only set upstream dns servers on KubeDNS and CoreDNS if they are defined
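A sketch of the variables involved (server addresses illustrative):

  resolvconf_mode: host_resolvconf
  upstream_dns_servers:   # must be defined in this mode, per the fix above
    - 8.8.8.8
    - 8.8.4.4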
* fix openstack cli syntax
* 'allowed-address' also uses a dash, not an underscore
* multiple allowed-address
multiple allowed-address values must be passed as separate parameters
* calico upgrade to v3
* update calico_rr version
* add missing file
* update the contents of main.yml, as it still contained the old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml
Added CoreDNS to downloads
Updated with labels. Should now work without RBAC too
Fix DNS settings on hosts
Rename CoreDNS service from kube-dns to coredns
Add rotate based on http://edgeofsanity.net/rant/2017/12/20/systemd-resolved-is-broken.html
Updated docs with CoreDNS info
Added labels and fixed minor settings from official yaml file: https://github.com/kubernetes/kubernetes/blob/release-1.9/cluster/addons/dns/coredns.yaml.sed
Added a secondary deployment and secondary service ip. This is to mitigate dns timeouts and create high resiliency against failures. See the discussion at 'https://github.com/coreos/coreos-kubernetes/issues/641#issuecomment-281174806'
Set the dns list correctly. Thanks to @whereismyjetpack
Only download KubeDNS or CoreDNS if selected
Move dns cleanup to its own file and import tasks based on dns mode
Fix install of KubeDNS when the dnsmasq_kubedns mode is selected
Add new dns option coredns_dual for dual stack deployment. Added variable to configure replicas deployed. Updated docs for dual stack deployment. Removed rotate option in resolv.conf.
Run DNS manifests for CoreDNS and KubeDNS
Set skydns servers on dual stack deployment
Use only one template for CoreDNS dual deployment
Set correct cluster ip for the dns server
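A hedged sketch of selecting the new mode:

  dns_mode: coredns_dual   # or 'coredns' for a single deployment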
* Fix HA docs: API access endpoints explained
Follow up to commit 81347298a3
and fix the endpoint value provided in the HA docs.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Clarify internal LB with external LB use case
* Clarify how to use both internal and external LB solutions that are
non-cluster-aware and not managed by Kubespray.
* Clarify the requirements, like TLS/SSL termination, for such an external LB.
Unlike the 'cluster-aware' external LB config, endpoint security must be
managed by that non-cluster-aware external LB.
* Note that masters always contact their local apiservers via https://bip:sp.
That path is highly unlikely to go down, and it avoids the latency that might be
introduced when going host->lb->host. Only compute nodes take the LB path.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Add a note for supplementary_addresses_in_ssl_keys
Explain how to benefit from supplementary_addresses_in_ssl_keys
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
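A sketch, assuming a hypothetical external LB VIP:

  supplementary_addresses_in_ssl_keys:
    - 192.168.0.10   # extra SAN baked into the apiserver certificate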
Auto configure API access endpoint with a custom bind IP, if provided.
Fix the HA docs' http URLs, which are in fact https; clarify the insecure vs
secure API access modes as well.
Closes: #2051
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Allow setting --bind-address for apiserver hyperkube
This is required if you wish to configure a loadbalancer (e.g. haproxy)
running on the master nodes without choosing a different port for the
VIP from that used by the API; in this case you need the API to bind to
a specific interface, then haproxy can bind the same port on the VIP:
[root@overcloud-controller-0 ~]# netstat -taupen | grep 6443
tcp 0 0 192.168.24.6:6443 0.0.0.0:* LISTEN 0 680613 134504/haproxy
tcp 0 0 192.168.24.16:6443 0.0.0.0:* LISTEN 0 653329 131423/hyperkube
tcp 0 0 192.168.24.16:6443 192.168.24.16:58404 ESTABLISHED 0 652991 131423/hyperkube
tcp 0 0 192.168.24.16:58404 192.168.24.16:6443 ESTABLISHED 0 652986 131423/hyperkube
This can be achieved e.g via:
kube_apiserver_bind_address: 192.168.24.16
* Address code review feedback
* Update kube-apiserver.manifest.j2
* Add Contiv support
Contiv is a network plugin for Kubernetes and Docker. It supports
vlan/vxlan/BGP/Cisco ACI technologies. It supports firewall policies,
multiple networks, and bridging pods onto physical networks.
* Update contiv version to 1.1.4
Update contiv version to 1.1.4 and added SVC_SUBNET in contiv-config.
* Load the openvswitch module to work around an issue on CentOS 7.4
* Set contiv cni version to 0.1.0
Correct contiv CNI version to 0.1.0.
* Use kube_apiserver_endpoint for K8S_API_SERVER
Use kube_apiserver_endpoint as K8S_API_SERVER so that contiv talks
to an available endpoint, whether or not there is a loadbalancer.
* Make contiv use its own etcd
Before this commit, contiv used an etcd proxy mode to the k8s etcd.
This works fine when the etcd hosts are co-located with the contiv etcd
proxy; however, the k8s peering certs exist only on the etcd group, so
the etcd-proxy is not able to peer with the k8s etcd on the etcd group.
In addition, the netplugin always tries to find the etcd endpoint on
localhost, which causes problems for all netplugins not running on etcd
group nodes.
This commit makes contiv use its own etcd, separate from the k8s one.
On kube-master nodes (where net-master runs), it runs in leader mode,
and on all other nodes it runs in proxy mode.
* Use cp instead of rsync to copy cni binaries
Since rsync has been removed from hyperkube, this commit changes it
to use cp instead.
* Make contiv-etcd able to run on master nodes
* Add rbac_enabled flag for contiv pods
* Add contiv into CNI network plugin lists
* migrate contiv test to tests/files
Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com>
* Add required rules for contiv netplugin
* Better handling json return of fwdMode
* Make contiv etcd port configurable
* Use default var instead of templating
* roles/download/defaults/main.yml: use contiv 1.1.7
Signed-off-by: Cristian Staretu <cristian.staretu@gmail.com>
* Defaults for apiserver_loadbalancer_domain_name
When loadbalancer_apiserver is defined, use the
apiserver_loadbalancer_domain_name with a given default value.
Fix inconsistencies in checking whether apiserver_loadbalancer_domain_name
is defined while simultaneously using it with a default value.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
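A sketch of the default (the domain shown is Kubespray's usual default and is stated here as an assumption):

  apiserver_loadbalancer_domain_name: lb-apiserver.kubernetes.local   # used when loadbalancer_apiserver is defined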
* Define defaults for LB modes in common defaults
Adjust the defaults for apiserver_loadbalancer_domain_name and
loadbalancer_apiserver_localhost to come from a single source, which is
kubespray-defaults. Removes some confusion and simplifies the code.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Change deprecated vagrant ansible flag 'sudo' to 'become'
* Emphasize that the pip_pyton_modules variable is only considered on CoreOS
* Remove useless unused variable
* Fix warning when jinja2 template-delimiters used in when statement
There is no need for jinja2 template delimiters like {{ }} or {% %}
anymore. They can simply be omitted, as described in https://github.com/ansible/ansible/issues/22397
* Fix broken link in getting-started guide
In 1.8, the Node authorization mode should be listed first to
allow kubelet to access secrets. This seems to only impact
environments with cloudprovider enabled.
* Rename dns_server to dnsmasq_dns_server so that it includes the role prefix,
as the generic var name conflicts when integrating with existing ansible automation.
* Enable selinux state to be configurable with new var preinstall_selinux_state
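For example:

  preinstall_selinux_state: permissive   # or 'enforcing' / 'disabled'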
This follows pull request #1677, adding the cgroup-driver
autodetection also for the kubeadm way of deploying.
Info about this, and the possibility to override it, is added to the docs.
New files: /etc/kubernetes/admin.conf
/root/.kube/config
$GITDIR/artifacts/{kubectl,admin.conf}
Optional method to download kubectl and admin.conf if
kubeconfig_localhost is set to true (default false)
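A sketch of the optional download (the artifacts directory default is an assumption):

  kubeconfig_localhost: true                       # default false
  artifacts_dir: "{{ inventory_dir }}/artifacts"   # where kubectl and admin.conf land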
* Added update CA trust step for etcd and kube/secrets roles
* Added load_balancer_domain_name to certificate alt names if defined. Reset CA's in RedHat os.
* Rename kube-cluster-ca.crt to vault-ca.crt, we need separated CA`s for vault, etcd and kube.
* Vault role refactoring; removed the optional cert vault auth because it was not used and did not work. Create separate CAs for vault and etcd.
* Fixed the different certificate sets for vault cert_management
* Update doc/vault.md
* Fixed the condition for creating the vault CA (wrong group)
* Fixed missing etcd_cert_path mount for rkt deployment type. Distribute vault roles for all vault hosts
* Removed wrong when condition in create etcd role vault tasks.
* Updates Controller Manager/Kubelet with Flannel's required configuration for CNI
* Removes old Flannel installation
* Install CNI enabled Flannel DaemonSet/ConfigMap/CNI bins and config (with portmap plugin) on host
* Uses RBAC if enabled
* Fixed an issue that could occur if br_netfilter is not a module and net.bridge.bridge-nf-call-iptables sysctl was not set
$IPS only expands to the first ip address in the array:
justin@box:~$ declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
justin@box:~$ echo $IPS
10.10.1.3
justin@box:~$ echo ${IPS[@]}
10.10.1.3 10.10.1.4 10.10.1.5
Clarify that the `kube_version` environment variable is needed for the CLI "graceful upgrade". Also add an example to check that the upgrade was successful.
A non-breaking space is the 0xc2 0xa0 byte sequence in UTF-8.
To find one:
$ git grep -I -P '\xc2\xa0'
To replace with regular space:
$ git grep -l -I -P '\xc2\xa0' | xargs sed -i 's/\xc2\xa0/ /g'
This commit doesn't include changes that will overlap with commit f1c59a91a1.
The AWS IAM profiles and policies required to run Kargo on AWS
are no longer hosted in the kubernetes main repo since kube-up got
deprecated. Hence we have to move the files into the kargo repository.
Updates based on feedback
Simplify checks for file exists
remove invalid char
Review feedback. Use regular systemd file.
Add template for docker systemd atomic
By default Calico blocks traffic from endpoints
to the host itself by using an iptables DROP
action. It could lead to a situation when service
has one alive endpoint, but pods which run on
the same node can not access it. Changed the action
to RETURN.
The operator can specify any port for kube-api (6443 by default). This helps
in cases where some pods, such as Ingress, require 443 exclusively.
Closes: 820
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
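For example (port value illustrative):

  kube_apiserver_port: 6443   # any port works; keeping it off 443 leaves 443 free for Ingress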
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
New deploy modes: scale, ha-scale, separate-scale
Creates 200 fake hosts for deployment with fake hostvars.
Useful for testing certificate generation and propagation to other
master nodes.
Updated test cases descriptions.
Based on #718 introduced by rsmitty.
Includes all roles and all options to support deployment of
new hosts in case they were added to inventory.
Main difference here is that master role is evaluated first
so that master components get upgraded first.
Fixes #694
- Exclude kubelet CPU/RAM (kube-reserved) from the cgroup. This decreases the
chance of overcommitment
- Add the possibility to modify the Kubelet node-status-update-frequency
- Add the possibility to configure node-monitor-grace-period,
node-monitor-period, and pod-eviction-timeout for the Kubernetes controller
manager
- Add Kubernetes Reliability Documentation with recommendations for
various scenarios (a variables sketch follows the sign-off below)
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
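A hedged sketch of the related variables (names and values as recommended in the reliability docs added here; treat them as illustrative):

  kubelet_status_update_frequency: 10s
  kube_controller_node_monitor_grace_period: 40s
  kube_controller_node_monitor_period: 5s
  kube_controller_pod_eviction_timeout: 5m0s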
kubelet lost the ability to load kernel modules. This
puts that back by adding the lib/modules mount to kubelet.
The new variable kubelet_load_modules can be set to true
to enable this item. It is OFF by default.
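For example:

  kubelet_load_modules: true   # mounts /lib/modules into kubelet; OFF by default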
Netchecker is rewritten in Go with some new args instead of
env variables. Also netchecker-server no longer requires kubectl
container. Updating playbooks accordingly.
* Drop linux capabilities for unprivileged containerized
workloads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove privileged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Also adds calico-rr group if there are standalone etcd nodes.
Now if there are 50 or more nodes, 3 etcd nodes will be standalone.
If there are 200 or more nodes, 2 kube-masters will be standalone.
If thresholds are exceeded, kube-node group cannot add nodes that
belong to etcd or kube-master groups (according to above statements).
* Add restart for weave service unit
* Reuse docker_bin_dir everywhere
* Limit systemd-managed docker containers by CPU/RAM. Do not configure native
systemd limits: due to the lack of consensus in the kernel community, that
would require out-of-tree kernel patches.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>