c12s-kubespray/docs/large-deployments.md
Bogdan Dobrelya aa447585c4 Fix download dnsmasq image dependency on docker
When download_run_once with download_localhost is used, docker is
expected to be running on the delegate localhost. That may be not
the case for a non localhost delegate, which is the kube-master
otherwise. Then the dnsmasq role, had it been invoked early before
deployment starts, would fail because of the missing docker dependency.

* Fix that dependency on docker and do not pre download dnsmasq image
  for the dnsmasq role, if download_localhost is disabled.
* Remove become: false for docker CLI invocation because that's not
  the common pattern to allow users access docker CLI w/o sudo.
* Fix opt bin path hack for localhost delegate to ignore errors when
  it fails with "sudo password required" otherwise.
* Describe download_run_once with download_localhost use case in docs
  as well.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-24 18:31:26 +01:00

Large deployments of K8s
========================
For large-scale deployments, consider the following configuration changes:
* Tune the [ansible settings](http://docs.ansible.com/ansible/intro_configuration.html)
for the `forks` and `timeout` vars to suit the large number of nodes being deployed.
* Override the containers' `foo_image_repo` vars to point to an intranet registry.
* Set ``download_run_once: true`` to download container images only once
and then push them to cluster nodes in batches. The default delegate node
for pushing images is the first kube-master. Note that if you have passwordless sudo
and docker enabled on a separate admin node, you may want to set
``download_localhost: true``, which makes that node the delegate for pushing images
while running the deployment with ansible. This may be the case if cluster nodes
cannot access each other via ssh, or if you want to use local docker images as a cache
for multiple clusters.
* Adjust the `retry_stagger` global var as appropriate. It should keep the
load on the delegate (by default, the first K8s master node) reasonable when
failed push or download operations are retried.

For example, when deploying 200 nodes, you may want to run ansible with
``--forks=50`` and ``--timeout=600``, and set ``retry_stagger: 60``.
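
The variable overrides described above could be collected in a vars file for the inventory. The following is a minimal sketch that assumes the values from the 200-node example. The registry host `registry.example.internal` is hypothetical, and `foo_image_repo` is the placeholder name used above, standing in for whichever image repo vars you actually override:

```yaml
# Sketch of variable overrides for a large deployment (values are illustrative).

# Placeholder var name from the text above; the registry host is hypothetical.
foo_image_repo: "registry.example.internal/foo"

# Download container images only once, then push them to cluster nodes in batches.
download_run_once: true

# Optional: make the admin node that runs ansible the delegate for pushing images.
# Assumes passwordless sudo and docker are available on that node.
download_localhost: true

# Stagger applied when failed push or download operations are retried.
retry_stagger: 60
```

Note that `forks` and `timeout` are ansible settings rather than playbook vars, so they are passed on the command line (``--forks=50 --timeout=600``) or set in the ansible configuration file instead.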