
Large deployments of K8s
========================

For large-scale deployments, consider the following configuration changes:

* Tune the `forks` and `timeout`
  [ansible settings](http://docs.ansible.com/ansible/intro_configuration.html)
  to fit the large number of nodes being deployed.

* Override the containers' `foo_image_repo` vars to point to an intranet registry.

* Override the defaults by setting ``download_run_once: true`` and/or
  ``download_localhost: true``. See download modes for details.

* Adjust the `retry_stagger` global var as appropriate. It should keep the load
  on the delegate (the first K8s master node) reasonable when failed push or
  download operations are retried.

* Tune parameters for DNS-related applications (dnsmasq daemon set, kubedns
  replication controller). These are ``dns_replicas``, ``dns_cpu_limit``,
  ``dns_cpu_requests``, ``dns_memory_limit``, ``dns_memory_requests``.
  Please note that limits must always be greater than or equal to requests.

* Tune CPU/memory limits and requests. These are defined in the roles' defaults
  and named like ``foo_memory_limit``, ``foo_memory_requests`` and
  ``foo_cpu_limit``, ``foo_cpu_requests``. Note that when these values are
  applied to ``docker run``, K8s 'Mi' memory units are submitted as 'M' and the
  'm' suffix is dropped from K8s CPU units. This is required because docker
  does not understand K8s units well.
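
As a rough sketch of the DNS tuning item, the variables could be overridden in
an inventory group vars file like this. The variable names are the ones listed
above. The file location and every value are illustrative assumptions, not
recommended settings. Each limit is kept greater than or equal to its matching
request.

```yaml
# Hypothetical group vars override (for example inventory/group_vars/all.yml).
# All values below are placeholders chosen only for illustration.
dns_replicas: 3
dns_cpu_limit: 200m        # must be >= dns_cpu_requests
dns_cpu_requests: 100m
dns_memory_limit: 256Mi    # must be >= dns_memory_requests
dns_memory_requests: 128Mi
```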

For example, when deploying 200 nodes, you may want to run ansible with
``--forks=50`` and ``--timeout=600``, and define ``retry_stagger: 60``.
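
The variable part of this example could be recorded as follows. This is only a
sketch: ``retry_stagger`` and ``download_run_once`` come from the list above,
while the file location is an assumption. The ``forks`` and ``timeout``
settings belong to ansible itself, so they are passed on the command line or
set in the ansible configuration rather than in this file.

```yaml
# Hypothetical group vars override for a deployment of about 200 nodes.
retry_stagger: 60
# Optional: check the download modes documentation before enabling.
download_run_once: true
```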