Currently all the gcr.io images used in kubespray can only run on x86.
Also gcr.io has not fully support multi-arch docker images.
Add extra var "image_arch" (default is amd64) to support running other
platforms, like arm64.
Change-Id: I8e1c9af533c021cb96ade291a1ce58773b40e271
Added CoreDNS to downloads
Updated with labels. Should now work without RBAC too
Fix DNS settings on hosts
Rename CoreDNS service from kube-dns to coredns
Add rotate based on http://edgeofsanity.net/rant/2017/12/20/systemd-resolved-is-broken.html
Updated docs with CoreDNS info
Added labels and fixed minor settings from official yaml file: https://github.com/kubernetes/kubernetes/blob/release-1.9/cluster/addons/dns/coredns.yaml.sed
Added a secondary deployment and secondary service ip. This is to mitigate dns timeouts and create high resitency for failures. See discussion at 'https://github.com/coreos/coreos-kubernetes/issues/641#issuecomment-281174806'
Set dns list correct. Thanks to @whereismyjetpack
Only download KubeDNS or CoreDNS if selected
Move dns cleanup to its own file and import tasks based on dns mode
Fix install of KubeDNS when dnsmask_kubedns mode is selected
Add new dns option coredns_dual for dual stack deployment. Added variable to configure replicas deployed. Updated docs for dual stack deployment. Removed rotate option in resolv.conf.
Run DNS manifests for CoreDNS and KubeDNS
Set skydns servers on dual stack deployment
Use only one template for CoreDNS dual deployment
Set correct cluster ip for the dns server
This allows `kube_apiserver_insecure_port` to be set to 0 (disabled).
Rework of #1937 with kubeadm support
Also, fixed an issue in `kubeadm-migrate-certs` where the old apiserver cert was copied as the kubeadm key
This allows `kube_apiserver_insecure_port` to be set to 0 (disabled). It's working, but so far I have had to:
1. Make the `uri` module "Wait for apiserver up" checks use `kube_apiserver_port` (HTTPS)
2. Add apiserver client cert/key to the "Wait for apiserver up" checks
3. Update apiserver liveness probe to use HTTPS ports
4. Set `kube_api_anonymous_auth` to true to allow liveness probe to hit apiserver's /healthz over HTTPS (livenessProbes can't use client cert/key unfortunately)
5. RBAC has to be enabled. Anonymous requests are in the `system:unauthenticated` group which is granted access to /healthz by one of RBAC's default ClusterRoleBindings. An equivalent ABAC rule could allow this as well.
Changes 1 and 2 should work for everyone, but 3, 4, and 5 require new coupling of currently independent configuration settings. So I also added a new settings check.
Options:
1. The problem goes away if you have both anonymous-auth and RBAC enabled. This is how kubeadm does it. This may be the best way to go since RBAC is already on by default but anonymous auth is not.
2. Include conditional templates to set a different liveness probe for possible combinations of `kube_apiserver_insecure_port = 0`, RBAC, and `kube_api_anonymous_auth` (won't be possible to cover every case without a guaranteed authorizer for the secure port)
3. Use basic auth headers for the liveness probe (I really don't like this, it adds a new dependency on basic auth which I'd also like to leave independently configurable, and it requires encoded passwords in the apiserver manifest)
Option 1 seems like the clear winner to me, but is there a reason we wouldn't want anonymous-auth on by default? The apiserver binary defaults anonymous-auth to true, but kubespray's default was false.
* kubeadm support
* move k8s master to a subtask
* disable k8s secrets when using kubeadm
* fix etcd cert serial var
* move simple auth users to master role
* make a kubeadm-specific env file for kubelet
* add non-ha CI job
* change ci boolean vars to json format
* fixup
* Update create-gce.yml
* Update create-gce.yml
* Update create-gce.yml
* Fix netchecker update side effect
kubectl apply should only be used on resources created
with kubectl apply. To workaround this, we should apply
the old manifest before upgrading it.
* Update 030_check-network.yml
* Use kubectl apply instead of create/replace
Disable checks for existing resources to speed up execution.
* Fix non-rbac deployment of resources as a list
* Fix autoscaler tolerations field
* set all kube resources to state=latest
* Update netchecker and weave
* Adding yaml linter to ci check
* Minor linting fixes from yamllint
* Changing CI to install python pkgs from requirements.txt
- adding in a secondary requirements.txt for tests
- moving yamllint to tests requirements
* Bump tag for upgrade CI, fix netchecker upgrade
netchecker-server was changed from pod to deployment, so
we need an upgrade hook for it.
CI now uses v2.1.1 as a basis for upgrade.
* Fix upgrades for certs from non-rbac to rbac
Replace 'netcheck_tag' with 'netcheck_version' and add additional
'netcheck_server_tag' and 'netcheck_agent_tag' config options to
provide ability to use different tags for server and agent
containers.
Pod opbject is not reschedulable by kubernetes. It means that if node
with netchecker-server goes down, netchecker-server won't be scheduled
somewhere. This commit changes the type of netchecker-server to
Deployment, so netchecker-server will be scheduled on other nodes in
case of failures.
In kubernetes 1.6 ClusterFirstWithHostNet was added as an option. In
accordance to it kubelet will generate resolv.conf based on own
resolv.conf. However, this doesn't create 'options', thus the proper
solution requires some investigation.
This patch sets the same resolv.conf for kubelet as host
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
By default Calico CNI does not create any network access policies
or profiles if 'policy' is enabled in CNI config. And without any
policies/profiles network access to/from PODs is blocked.
K8s related policies are created by calico-policy-controller in
such case. So we need to start it as soon as possible, before any
real workloads.
This patch also fixes kube-api port in calico-policy-controller
yaml template.
Closes#1132
By default kubedns and dnsmasq scale when installed.
Dnsmasq is no longer a daemonset. It is now a deployment.
Kubedns is no longer a replicationcluster. It is now a deployment.
Minimum replicas is two (to enable rolling updates).
Reduced memory erquirements for dnsmasq and kubedns
Operator can specify any port for kube-api (6443 default) This helps in
case where some pods such as Ingress require 443 exclusively.
Closes: 820
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
Daemonsets cannot be simply upgraded through a single API call,
regardless of any kubectl documentation. The resource must be
purged and then recreated in order to make any changes.
Netchecker is rewritten in Go lang with some new args instead of
env variables. Also netchecker-server no longer requires kubectl
container. Updating playbooks accordingly.
* Drop linux capabilities for unprivileged containerized
worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Change version for calico images to v1.0.0. Also bump versions for
CNI and policy controller.
Also removing images repo and tag duplication from netchecker role
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
systemd limits due to the lack of consensus in the kernel community
requires out-of-tree kernel patches.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Also place in global vars and do not repeat the kube_*_config_dir
and kube_namespace vars for better code maintainability and UX.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Add an option to deploy K8s app to test e2e network connectivity
and cluster DNS resolve via Kubedns for nethost/simple pods
(defaults to false).
* Parametrize existing k8s apps templates with kube_namespace and
kube_config_dir instead of hardcode.
* For CoreOS, ensure nameservers from inventory to be put in the
first place to allow hostnet pods connectivity via short names
or FQDN and hostnet agents to pass as well, if netchecker
deployed.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Add dns_replicas, dns_memory/cpu_limit/requests vars for
dns related apps.
* When kube_log_level=4, log dnsmasq queries as well.
* Add log level control for skydns (part of kubedns app).
* Add limits/requests vars for dnsmasq (part of kubedns app) and
dnsmasq daemon set.
* Drop string defaults for kube_log_level as it is int and
is defined in the global vars as well.
* Add docs
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
According to http://kubernetes.io/docs/user-guide/images/ :
By default, the kubelet will try to pull each image from the
specified registry. However, if the imagePullPolicy property
of the container is set to IfNotPresent or Never, then a local\
image is used (preferentially or exclusively, respectively).
Use IfNotPresent value to allow images prepared by the download
role dependencies to be effectively used by kubelet without pull
errors resulting apps to stay blocked in PullBackOff/Error state
even when there are images on the localhost exist.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Pre download all required container images as roles' deps.
Drop unused flannel-server-helper images pre download.
Improve pods creation post-install test pre downloaded busybox.
Improve logs collection script with kubectl describe, fix sudo/etcd/weave
commands.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
'etcd_cert_dir' variable is missing from 'kubernetes-apps/ansible'
role which breaks Calico policy controller deployment.
Also fixing calico-policy-controller.yml.