In order to mitigate sporadic data races in etcd
(publish error: etcdserver: request timed out"):
- Add etcd_start_delay and kubelet_start_delay (defaults to a 5 sec.)
- Increase default start sleep times to foo_start_delay from a 1 sec.
- Add restart sleeping as well.
- Add missing start sleep commands as appropriate.
Closes: https://github.com/kubespray/kargo/issues/342
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
The requirements for network policy feature are described here [1]. In
order to enable it, appropriate configuration must be provided to the CNI
plug in and Calico policy controller must be set up. Beside that
corresponding extensions needed to be enabled in k8s API.
Now to turn on the feature user can define `enable_network_policy`
customization variable for Ansible.
[1] http://kubernetes.io/docs/user-guide/networkpolicies/
Also adds all masters by hostname and localhost/127.0.0.1 to
apiserver SSL certificate.
Includes documentation update on how localhost loadbalancer works.
Change additional dnsmasq opts:
- Adjust caching size and TTL
- Disable resolve conf to not create loops
- Change dnsPolicy to default (similarly to kubedns's dnsmasq). The
ClusterFirst should not be used to not create loops
- Disable negative NXDOMAIN replies to be cached
- Make its very installation as optional step (enabled by default).
If you don't want more than 3 DNS servers, including 1 for K8s, disable
it.
- Add docs and a drawing to clarify DNS setup.
- Fix stdout logs for dnsmasq/kubedns app configs
- Add missed notifies to resolvconf -u handler
- Fix idempotency of resolvconf head file changes
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Creating the unit using default settings early on
and then changing it during network_plugin section
leads to too many docker restarts and duplicated code.
Reversed Wants= dependence on docker.service so it does not
restart docker when reloading systemd
Consolidated all docker restart handlers.
* Add for docker system units:
ExecReload=/bin/kill -s HUP $MAINPID
Delegate=yes
KillMode=process.
* Add missed DOCKER_OPTIONS for calico/weave docker systemd unit.
* Change Requires= to a less strict and non-faily Wants=, add missing
Wants= for After=.
* Align wants/after in a wat if Wants=foo, After= has foo as well.
* Make wants/after docker.service to ask for the docker.socket as well.
* Move "docker rm -f" commands from ExecStartPre= to ExecStopPost=.
hooks to ensure non-destructive start attempts issued by Wants=.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Add HA docs for API server.
* Add auto-evaluated internal endpoints and clarify the loadbalancer_apiserver
vars and usecases.
* Use facts for kube_apiserver to not repeat code and enable LB endpoints use.
* Use /healthz check for the wait-for apiserver.
* Use the single endpoint for kubelet instead of the list of apiservers
* Specify kube_apiserver_count to for HA layout
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
kubelet via docker
kube-apiserver as a static pod
Fixed etcd service start to be more tolerant of slow start.
Workaround for kube_version to stay in download role, but not
download an files by creating a new "nothing" download entry.
* Add auto-evaluated internal endpoints and clarify the loadbalancer_apiserver
vars and usecases.
* Add loadbalancer_apiserver_localhost (default false). If enabled, override
the external LB and expect localhost:443/8080 to be new internal only frontends.
* Add kube_apiserver_multiaccess to ignore loadbalancers, and make clients
to access the apiservers as a comma-separated list of access_ip/ip/ansible ip
(a default mode). When disabled, allow clients to use the given loadbalancers.
* Define connections security mode for kube controllers, schedulers, proxies.
It is insecure be default, which is the current deployment choice.
* Rework the groups['kube-master'][0] hardcode defining the apiserver
endpoints.
* Improve grouping of vars and add facts for kube_apiserver.
* Define kube_apiserver_insecure_bind_address as a fact, add more
facts for ease of use.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Enforce a etcd-proxy role to a k8s-cluster group members. This
provides an HA layout for all of the k8s cluster internal clients.
* Proxies to be run on each node in the group as a separate etcd
instances with a readwrite proxy mode and listen the given endpoint,
which is either the access_ip:2379 or the localhost:2379.
* A notion for the 'kube_etcd_multiaccess' is: ignore endpoints and
loadbalancers and use the etcd members IPs as a comma-separated
list. Otherwise, clients shall use the local endpoint provided by a
etcd-proxy instances on each etcd node. A Netwroking plugins always
use that access mode.
* Fix apiserver's etcd servers args to use the etcd_access_endpoint.
* Fix networking plugins flannel/calico to use the etcd_endpoint.
* Fix name env var for non masters to be set as well.
* Fix etcd_client_url was not used anywhere and other etcd_* facts
evaluation was duplicated in a few places.
* Define proxy modes only in the env file, if not a master. Del
an automatic proxy mode decisions for etcd nodes in init/unit scripts.
* Use Wants= instead of Requires= as "This is the recommended way to
hook start-up of one unit to the start-up of another unit"
* Make apiserver/calico Wants= etcd-proxy to keep it always up
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Matthew Mosesohn <mmosesohn@mirantis.com>
Currently kubespray does not install kubernetes in a way that allows cinder volumes to be used. This commit provides the necessary cloud configuration file and configures kubelet and kube-apiserver to use it.
Each node can have 3 IPs.
1. ansible_default_ip4 - whatever ansible things is the first IPv4 address
usually with the default gw.
2. ip - An address to use on the local node to bind listeners and do local
communication. For example, Vagrant boxes have a first address that is the
NAT bridge and is common for all nodes. The second address/interface should
be used.
3. access_ip - An address to use for node-to-node access. This is assumed to
be used by other nodes to access the node and may not be actually assigned
on the node. For example, AWS public ip that is not assigned to node.
This updates the places addresses are used to use either ip or access_ip and walk
up the list to find an address.