Admin certs are only available on kube-master nodes.
When etcd nodes are separate, calico fails to access them because
of the missing admin certs, and etcd fails to configure the
ETCD_PEER_* env vars because of the missing member certs.
Fix this by switching the curl calls to the first etcd node
and delegating them to the first master. This assumes that only
admin certs, not member/node certs, allow getting calico keys
from etcd.
Also move the member certs from the master_certs list to the
node_certs list, as the ETCD(_PEER)_CERT/KEY env vars expect.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
When download_run_once is used with download_localhost, docker is
expected to be running on the delegate localhost. That may not be
the case for a non-localhost delegate, which otherwise is the
kube-master. The dnsmasq role, if invoked early before the
deployment starts, would then fail because of the missing docker
dependency.
* Fix that dependency on docker and do not pre-download the dnsmasq
image for the dnsmasq role if download_localhost is disabled.
* Remove become: false for the docker CLI invocation, because
allowing users to access the docker CLI w/o sudo is not the
common pattern.
* Fix the opt bin path hack for the localhost delegate to ignore
errors; otherwise it fails with "sudo password required".
* Describe the download_run_once with download_localhost use case
in the docs as well.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Use a cloud-init config to replace /etc/resolv.conf with the
content kubelet needs to properly configure hostnet pods.
Do not use systemd-resolved yet, see
https://coreos.com/os/docs/latest/configuring-dns.html
"Only nss-aware applications can take advantage of the
systemd-resolved cache. Notably, this means that statically
linked Go programs and programs running within Docker/rkt
will use /etc/resolv.conf only, and will not use the
systemd-resolve cache."
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Without this patch, the "Download containers" task may be skipped
when running on the delegate node due to a wrong "when" condition.
Uploading the nginx image to the nodes then fails as well.
Fix the nginx download dependency so the image can always be
pushed to the nodes when download_run_once is enabled.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
When setting permissions for the containers download/upload dir, we
use `ansible_ssh_user`. But if the playbook is executed without the
user being explicitly set, `ansible_ssh_user` may be undefined.
In such situations, dir ownership will default to `ansible_user_id`.
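A minimal Python sketch of that fallback, assuming it is expressed
with the Jinja2 `default` filter as
`ansible_ssh_user | default(ansible_user_id)`. The
`download_dir_owner` helper and the host variables below are
hypothetical; they only model how the owner gets resolved.

```python
def download_dir_owner(hostvars: dict) -> str:
    """Return the owner for the containers download/upload dir.

    Hypothetical helper mirroring the Jinja2 expression
    `{{ ansible_ssh_user | default(ansible_user_id) }}`: the fallback
    applies only when ansible_ssh_user is undefined (absent), not when
    it is set to an empty string.
    """
    if "ansible_ssh_user" in hostvars:
        return hostvars["ansible_ssh_user"]
    # ansible_user_id is a gathered fact: the user Ansible runs as on the host.
    return hostvars["ansible_user_id"]


# Usage with made-up host variables:
print(download_dir_owner({"ansible_ssh_user": "core", "ansible_user_id": "root"}))  # core
print(download_dir_owner({"ansible_user_id": "root"}))  # root
```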
Closes: #644
According to http://kubernetes.io/docs/user-guide/images/ :
By default, the kubelet will try to pull each image from the
specified registry. However, if the imagePullPolicy property
of the container is set to IfNotPresent or Never, then a local
image is used (preferentially or exclusively, respectively).
Use the IfNotPresent value so that kubelet can use the images
prepared by the download role dependencies, instead of hitting
pull errors that leave apps stuck in the PullBackOff/Error state
even when the images already exist on the local host.
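The quoted pull policy rules can be modelled with a short Python
sketch. It is a simplified model of the documented behaviour, not
kubelet code; the `resolve_image` helper and the image tag used
below are made up for illustration.

```python
from enum import Enum


class PullPolicy(Enum):
    ALWAYS = "Always"
    IF_NOT_PRESENT = "IfNotPresent"
    NEVER = "Never"


def resolve_image(policy: PullPolicy, image: str, local_images: set) -> str:
    """Return "pull" or "local" for `image` under `policy`.

    Simplified model of the documented behaviour, not kubelet code.
    """
    if policy is PullPolicy.ALWAYS:
        return "pull"  # always contacts the registry
    if image in local_images:
        return "local"  # IfNotPresent and Never both use the local copy
    if policy is PullPolicy.IF_NOT_PRESENT:
        return "pull"  # not present locally, so fall back to pulling
    # Never: the image must already be on the node
    raise LookupError(f"{image} is not present locally and pull policy is Never")


local = {"busybox:1.24"}
print(resolve_image(PullPolicy.IF_NOT_PRESENT, "busybox:1.24", local))  # local
print(resolve_image(PullPolicy.ALWAYS, "busybox:1.24", local))          # pull
```

With images pre-downloaded by the download role, IfNotPresent takes
the "local" branch and no registry access is needed.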
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Pre-download all required container images as role dependencies.
Drop the unused flannel-server-helper image pre-download.
Improve the pods creation post-install test with a pre-downloaded
busybox image.
Improve the logs collection script with kubectl describe; fix the
sudo/etcd/weave commands.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Fix unreliable waiting for the apiserver to become ready.
Remove the logfile mount to align with the rest of the static pods,
and because containers should write logs to stdout only.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
In the etcd handler, the reload etcd action was called after
Ansible waited for etcd to be up. This means that the health
checks, which are called immediately afterwards, fail (resulting
in the etcd role always failing and never finishing).
This patch moves the 'wait for etcd up' resource after the
'reload etcd' resource, ensuring that the service is up before
the health check is called.
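The corrected ordering can be modelled with a small Python sketch.
The function names, the fake etcd and the polling interval are
hypothetical; the sketch only models the reload, wait, check order,
not how Ansible schedules handlers.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 30.0,
               interval: float = 0.5) -> None:
    """Poll `check` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while not check():
        if time.monotonic() >= deadline:
            raise TimeoutError("etcd did not come up in time")
        time.sleep(interval)


def run_etcd_handlers(reload_etcd: Callable[[], None],
                      etcd_is_up: Callable[[], bool],
                      health_check: Callable[[], None],
                      interval: float = 0.5) -> None:
    """Run the handlers in the fixed order: reload, wait, then check."""
    reload_etcd()                              # may take etcd down briefly
    wait_until(etcd_is_up, interval=interval)  # block until it answers again
    health_check()                             # only now is the check safe


# Usage with a fake etcd that stays down for 3 polls after a reload.
state = {"down_polls": 0}

def fake_reload() -> None:
    state["down_polls"] = 3

def fake_is_up() -> bool:
    if state["down_polls"] > 0:
        state["down_polls"] -= 1
        return False
    return True

def fake_health_check() -> None:
    assert fake_is_up(), "health check ran while etcd was down"

run_etcd_handlers(fake_reload, fake_is_up, fake_health_check, interval=0.01)
```

With the old order (wait, then reload), the health check would run
right after the reload, while the fake etcd still reports down.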
* Add download_localhost for the download_run_once mode, which
uses the Ansible host (a Travis node in the CI case) to store and
distribute containers across the cluster nodes in the inventory.
Defaults to false.
* Rework the download_run_once logic to fix idempotency of
uploading containers.
* For Travis CI, enable docker image caching and run Travis
workers with sudo enabled, as a dependency.
* For Travis CI, deploy with download_localhost and
download_run_once enabled to shorten the dev path drastically.
* Add compression for saved container images. Defaults to 'best'.
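A rough Python sketch of two ideas from the list above: compressing
the saved images and uploading only when a node does not already
hold an identical archive. It assumes 'best' maps to gzip level 9
and that idempotency is decided by comparing SHA-256 digests; both
are illustrative choices, and the helper names are hypothetical.

```python
import gzip
import hashlib
from typing import Optional

GZIP_BEST = 9  # `gzip --best` is equivalent to compression level 9


def compress_image_tar(tar_bytes: bytes, level: int = GZIP_BEST) -> bytes:
    """Compress a saved image tarball (e.g. `docker save` output)."""
    # mtime=0 keeps the gzip header stable, so identical input gives
    # byte-identical archives and checksums stay comparable across runs.
    return gzip.compress(tar_bytes, compresslevel=level, mtime=0)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def needs_upload(local_archive: bytes, remote_digest: Optional[str]) -> bool:
    """Upload only if the node lacks the archive or holds a different one."""
    return remote_digest is None or remote_digest != sha256_hex(local_archive)


archive = compress_image_tar(b"pretend this is a docker save tarball")
print(needs_upload(archive, None))                 # True: nothing on the node yet
print(needs_upload(archive, sha256_hex(archive)))  # False: already up to date
```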
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Aleksandr Didenko <adidenko@mirantis.com>
Add one more step (task) to the containers download/upload
sequence: copy the saved .tar containers to the Ansible host
(delegate_to: localhost), then upload the images to the target
nodes. It uses the synchronize module, so if the Ansible host
(localhost) is the same host as kube-master[0], the new task
causes no issues and the copy to localhost is basically skipped.
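The resulting sequence can be sketched in Python as a plan of steps.
The `plan_image_distribution` helper and host names are
hypothetical, and comparing host names as plain strings stands in
for whatever the synchronize module actually detects.

```python
from typing import List


def plan_image_distribution(delegate: str, ansible_host: str,
                            nodes: List[str]) -> List[str]:
    """List the download/upload steps for one download_run_once pass."""
    steps = [f"save images on {delegate}"]
    # The extra step: copy the saved .tar files to the Ansible host.
    # When the delegate *is* the Ansible host there is nothing to copy.
    if delegate != ansible_host:
        steps.append(f"copy archives from {delegate} to {ansible_host}")
    # Upload from the Ansible host to every node that lacks the archives.
    for node in nodes:
        if node not in (delegate, ansible_host):
            steps.append(f"upload archives from {ansible_host} to {node}")
    return steps


for step in plan_image_distribution("kube-master-0", "localhost",
                                    ["kube-master-0", "node-1"]):
    print(step)
```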
- Move CNI configuration from `kubernetes/node` role to
`network_plugin/canal`
- Create SSL dir for Canal and symlink etcd SSL files
- Add needed options to `canal-config` configmap
- Run flannel and calico-node containers with proper configuration
Since version 'v1.0.0-beta', calicoctl is written in Go, and its
API differs from the old Python-based utility. Add support for
both the old and the new versions of the utility.
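Supporting both versions means telling them apart first. A minimal
Python sketch, assuming the version string contains a `vX.Y.Z` token
(e.g. 'v0.22.0' or 'Version: v1.0.0-beta') and that the old
Python-based releases are all 0.x; the helper names are
hypothetical.

```python
import re
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from e.g. 'v0.22.0' or 'Version: v1.0.0-beta'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_go_calicoctl(version_text: str) -> bool:
    """True for the Go rewrite (v1.0.0-beta and later), False for 0.x Python builds."""
    return parse_version(version_text)[0] >= 1


print(is_go_calicoctl("v0.22.0"))               # False
print(is_go_calicoctl("Version: v1.0.0-beta"))  # True
```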
- Drop debug tasks from the collect-info playbook
- Drop sudo from the collect-info step and add a target dir var (required for Travis jobs)
- Label all k8s apps, including static manifests
- Collect logs for K8s apps as well
- Fix upload to GCS as a public-read tarball
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
The 'etcd_cert_dir' variable is missing from the 'kubernetes-apps/ansible'
role, which breaks the Calico policy controller deployment.
Also fix calico-policy-controller.yml.