Commit graph

212 commits

Author SHA1 Message Date
Matthew Mosesohn
acb63a57fa Only limit etcd memory on small hosts (#1860)
Also disable oom killer on etcd
2017-10-25 10:25:15 +01:00
Matthew Mosesohn
0b4fcc83bd Fix up warnings and deprecations (#1848) 2017-10-20 08:25:57 +01:00
Matthew Mosesohn
514359e556 Improve etcd scale up (#1846)
Now adding unjoined members to existing etcd cluster
occurs one at a time so that the cluster does not
lose quorum.
2017-10-20 08:02:31 +01:00
Matthew Mosesohn
fc9a65be2b Refactor downloads to use download role directly (#1824)
* Refactor downloads to use download role directly

Also disable fact delegation so download delegate works acros OSes.

* clean up bools and ansible_os_family conditionals
2017-10-19 09:17:11 +01:00
Matthew Mosesohn
10dd049912 Revert "Security fixes for etcd (#1778)" (#1786)
This reverts commit 4209f1cbfd.
2017-10-12 14:02:51 +01:00
Matthew Mosesohn
4209f1cbfd Security fixes for etcd (#1778)
* Security fixes for etcd

* Use certs when querying etcd
2017-10-12 13:32:54 +01:00
Matthew Mosesohn
83be0735cd Fix setting etcd client cert serial (#1775) 2017-10-11 19:47:11 +01:00
Spencer Smith
f2db15873d Merge pull request #1754 from ArchiFleKs/rkt-kubelet-fix
add hosts to rkt kubelet
2017-10-10 10:37:36 -04:00
ArchiFleKs
7c663de6c9 add /etc/hosts volume to rkt templates 2017-10-09 16:41:51 +02:00
Aivars Sterns
9c86da1403 Normalize tags in all places to prepare for tag fixing in future (#1739) 2017-10-05 08:43:04 +01:00
Matthew Mosesohn
a56738324a Move set_facts to kubespray-defaults defaults
These facts can be generated in defaults with a performance
boost.

Also cleaned up duplicate etcd var names.
2017-10-04 14:02:47 +01:00
Brad Beam
14c232e3c4 Merge pull request #1663 from foxyriver/fix-shell
use command module instead of shell module
2017-09-25 13:24:45 -05:00
Bogdan Dobrelya
bcddfb786d Merge pull request #1692 from mattymo/old-etcd-logic
drop unused etcd logic
2017-09-25 17:44:33 +02:00
Hassan Zamani
b23d81f825 Add etcd_blkio_weight var (#1690) 2017-09-25 12:20:24 +01:00
Matthew Mosesohn
126f42de06 drop unused etcd logic
Fixes #1660
2017-09-25 07:52:55 +01:00
foxyriver
30b5493fd6 use command module instead of shell module 2017-09-22 15:47:03 +08:00
Brad Beam
ac281476c8 Prune unnecessary certs from vault setup (#1652)
* Cleaning up cert checks for vault

* Removing all unnecessary etcd certs from each node

* Removing all unnecessary kube certs from each node
2017-09-14 12:28:11 +01:00
Matthew Mosesohn
6744726089 kubeadm support (#1631)
* kubeadm support

* move k8s master to a subtask
* disable k8s secrets when using kubeadm
* fix etcd cert serial var
* move simple auth users to master role
* make a kubeadm-specific env file for kubelet
* add non-ha CI job

* change ci boolean vars to json format

* fixup

* Update create-gce.yml

* Update create-gce.yml

* Update create-gce.yml
2017-09-13 19:00:51 +01:00
Matthew Mosesohn
5d99fa0940 Purge old upgrade hooks and unused tasks (#1641) 2017-09-09 23:41:20 +03:00
mkrasilnikov
bf0af1cd3d Vault role updates:
* using separated vault roles for generate certs with different `O` (Organization) subject field;
  * configure vault roles for issuing certificates with different `CN` (Common name) subject field;
  * set `CN` and `O` to `kubernetes` and `etcd` certificates;
  * vault/defaults vars definition was simplified;
  * vault dirs variables defined in kubernetes-defaults foles for using
  shared tasks in etcd and kubernetes/secrets roles;
  * upgrade vault to 0.8.1;
  * generate random vault user password for each role by default;
  * fix `serial` file name for vault certs;
  * move vault auth request to issue_cert tasks;
  * enable `RBAC` in vault CI;
2017-09-05 09:07:35 +03:00
Brad Beam
8ae77e955e Adding in certificate serial numbers to manifests (#1392) 2017-09-01 09:02:23 +03:00
sgmitchell
783924e671 Change backup handler to only run v2 data backup if snap directory exists (#1594) 2017-08-31 18:23:24 +03:00
Maxim Krasilnikov
6eb22c5db2 Change single Vault pki mount to multi pki mounts paths for etcd and kube CA`s (#1552)
* Added update CA trust step for etcd and kube/secrets roles

* Added load_balancer_domain_name to certificate alt names if defined. Reset CA's in RedHat os.

* Rename kube-cluster-ca.crt to vault-ca.crt, we need separated CA`s for vault, etcd and kube.

* Vault role refactoring, remove optional cert vault auth because not not used and worked. Create separate CA`s fro vault and etcd.

* Fixed different certificates set for vault cert_managment

* Update doc/vault.md

* Fixed condition create vault CA, wrong group

* Fixed missing etcd_cert_path mount for rkt deployment type. Distribute vault roles for all vault hosts

* Removed wrong when condition in create etcd role vault tasks.
2017-08-30 16:03:22 +03:00
Brad Beam
8b151d12b9 Adding yamllinter to ci steps (#1556)
* Adding yaml linter to ci check

* Minor linting fixes from yamllint

* Changing CI to install python pkgs from requirements.txt

- adding in a secondary requirements.txt for tests
- moving yamllint to tests requirements
2017-08-24 12:09:52 +03:00
Anton
1e07ee6cc4 etcd_compaction_retention every 8 hour (#1527) 2017-08-20 13:55:48 +03:00
Maxim Krasilnikov
2ba285a544 Fixed deploy cluster with vault cert manager (#1548)
* Added custom ips to etcd vault distributed certificates

* Added custom ips to kube-master vault distributed certificates

* Added comment about issue_cert_copy_ca var in vault/issue_cert role file

* Generate kube-proxy, controller-manager and scheduler certificates by vault

* Revert "Disable vault from CI (#1546)"

This reverts commit 781f31d2b8.

* Fixed upgrade cluster with vault cert manager

* Remove vault dir in reset playbook
2017-08-20 13:53:58 +03:00
Matthew Mosesohn
2645e88b0c Fix vault setup partially (#1531)
This does not address per-node certs and scheduler/proxy/controller-manager
component certs which are now required. This should be handled in a
follow-up patch.
2017-08-18 15:09:45 +03:00
Spencer Smith
e55f8a61cd Merge pull request #1482 from bradbeam/fix1393
Removing run_once in these tasks so that etcd ca certs get propogated…
2017-07-31 13:47:18 -04:00
Spencer Smith
cb6892d2ed Merge pull request #1469 from hzamani/etcd_metrics
Add etcd metrics flag
2017-07-31 09:04:07 -04:00
Brad Beam
d09222c900 Removing run_once in these tasks so that etcd ca certs get propogated properly to worker nodes
without this etcd ca certs dont exist on worker nodes causing calico to fail
2017-07-28 14:34:47 -05:00
Anton
e0960f6288 FIX: Unneded (extra) cycles in some tasks (#1393) 2017-07-27 20:46:21 +03:00
Hassan Zamani
3fb0383df4 Add etcd metrics flag 2017-07-25 20:00:30 +04:30
Spencer Smith
955c5549ae Merge pull request #1402 from Lendico/fix_failed_when
"failed_when: false" and "|succeeded" checks for registered vars
2017-07-25 09:33:43 -04:00
Brad Beam
20f29327e9 Merge pull request #1379 from gdmello/etcd_data_dir_fix
Custom `etcd_data_dir` saves etcd data to host, not container
2017-07-20 09:30:18 -05:00
Anton Nerozya
1fedbded62 ignore_errors instead of failed_when: false 2017-06-29 20:15:14 +02:00
gdmelloatpoints
649654207f mount the etcd data directory in the container with the same path as on the host. 2017-06-27 09:29:47 -04:00
gdmelloatpoints
3123502f4c move etcd_backup_prefix to new home. 2017-06-27 09:12:34 -04:00
gdmelloatpoints
4ba237c5d8 Make etcd_backup_prefix configurable. Ensures that backups can be stored on a different location other than ${HOST}/var/backups, say an EBS volume on AWS. 2017-06-26 09:42:30 -04:00
gdmelloatpoints
5c1891ec9f In the etcd container, the etcd data directory is always /var/lib/etcd. Reverting to this value, since etcd_data_dir on the host maps to /var/lib/etcd in the container. 2017-06-23 13:49:31 -04:00
Gregory Storme
fff0aec720 add configurable parameter for etcd_auto_compaction_retention 2017-06-14 10:39:38 +02:00
Brad Beam
db3e8edacd Fixing up vault variables 2017-06-08 16:15:33 -05:00
Matthew Mosesohn
ae7f59e249 Skip vault cert task evaluation completely when using script cert generation 2017-04-13 19:29:07 +03:00
Matthew Mosesohn
798f90c4d5 Merge pull request #1153 from mattymo/graceful_drain
Move graceful upgrade test to Ubuntu canal HA, adjust drain
2017-04-04 17:33:53 +03:00
Aleksandr Didenko
58acbe7caf Fix multiline when condition in sync_certs task
Folded style in multiline 'when' condition causes error with
unexpected ident. Changing it to literal style should fix
the issue.

Closes #1190
2017-03-30 22:21:04 +02:00
Matthew Mosesohn
fb467df47c fix etcd restart 2017-03-29 23:22:49 +04:00
Sergii Golovatiuk
f144fd1ed3 Refactor etcd role
- Run docker run from script rather than directly from systemd target
- Refactoring styling/templates

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-03-24 12:34:15 +01:00
Sergii Golovatiuk
c04a6254b9 Backup etcd data before restarting etcd
etcd is crucial part of kubernetes cluster. Ansible restarts etcd on
reconfiguration. Backup helps operator to restore cluster manually in
case of any issues.

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-03-20 14:50:52 +01:00
Matthew Mosesohn
8195957461 Merge branch 'master' into idempotency2 2017-03-16 09:29:43 +03:00
Matthew Mosesohn
a422ad0d50 More idempotency fixes
Fixed sync_tokens fact
Fixed sync_certs for k8s tokens fact
Disabled register docker images changability
Fixed CNI dir permission
Fix idempotency for etcd pre upgrade checks
2017-03-15 19:06:39 +03:00
Matthew Mosesohn
4c6829513c Fix etcd idempotency 2017-03-14 17:23:29 +03:00
Matthew Mosesohn
02a8e78902 Remove standalone etcd specific play, cleanup host mode
Now etcd role can optionally disable etcd cluster setup for faster
deployment when it is combined with etcd role.
2017-03-04 00:34:26 +04:00
Matthew Mosesohn
8f3d9e93ce Merge pull request #1111 from mattymo/use_find_for_certs
Use find module for checking for certificates
2017-03-03 20:08:33 +03:00
Matthew Mosesohn
d176818c44 Use find module for checking for certificates
Also generate certs only when absent on master (rather than
when absent on target node)
2017-03-03 16:21:01 +03:00
Vijay Katam
a0b1eda1d0 Add support for atomic host
Updates based on feedback

Simplify checks for file exists

remove invalid char

Review feedback. Use regular systemd file.

Add template for docker systemd atomic
2017-03-01 09:38:19 -08:00
Antoine Legrand
77e5171679 Merge pull request #1076 from VincentS/etcd_openssl_count_fix
Fixed counter in ETCD Openssl.conf
2017-03-01 14:17:27 +01:00
Sergii Golovatiuk
f9ff93c606 Make etcd data dir configurable.
Closes: #1073
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-02-27 21:35:51 +01:00
Vincent Schwarzer
0cbc3d8df6 Fixed counter in ETCD Openssl.conf
When a apiserver_loadbalancer_domain_name is added to the Openssl.conf
the counter gets not increased correctly. This didnt seem to have an
effect at the current kargo version.
2017-02-27 12:01:09 +01:00
Sergii Golovatiuk
00cfead9bb Increase SSL TTL to 3650 days
In real scenarios 365 days is short period of time. 3650 days is good
enough for long running k8s environments
2017-02-24 15:38:13 +01:00
Matthew Mosesohn
d821448e2f Merge branch 'master' into synthscale 2017-02-21 22:17:43 +03:00
Matthew Mosesohn
0afadb9149 Merge pull request #1046 from skyscooby/pedantic-syntax-cleanup
Cleanup legacy syntax, spacing, files all to yml
2017-02-21 17:03:16 +03:00
Matthew Mosesohn
d19e6dec7a speed up etcd preupgrade check 2017-02-20 20:18:10 +03:00
Matthew Mosesohn
a21eb036ee Add no_log to cert tar tasks
This works around 4MB limit for gitlab CI runner.
2017-02-18 14:09:57 +04:00
Matthew Mosesohn
9c1701f2aa Add synthetic scale deployment mode
New deploy modes: scale, ha-scale, separate-scale
Creates 200 fake hosts for deployment with fake hostvars.

Useful for testing certificate generation and propagation to other
master nodes.

Updated test cases descriptions.
2017-02-18 14:09:55 +04:00
Andrew Greenwood
ca9ea097df Cleanup legacy syntax, spacing, files all to yml
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
2017-02-17 16:22:34 -05:00
Antoine Legrand
e16ebcad6e Merge pull request #1042 from holser/fix_facts
Fix fact tags
2017-02-17 17:56:29 +01:00
Sergii Golovatiuk
e91e58aec9 Fix fact tags
Ansible playbook fails when tags are limited to "facts,etcd" or to
"facts". This patch allows to run ansible-playbook to gather facts only
that don't require calico/flannel/weave components to be verified. This
allows to run ansible with 'facts,bootstrap-os' or just 'facts' to
gether facts that don't require specific components.

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-02-17 12:32:33 +01:00
Matthew Mosesohn
80c0e747a7 Fix references to CoreOS and Container Linux by CoreOS
Fixes #967
2017-02-16 19:25:17 +03:00
Vladimir Rutsky
09847567ae set "check_mode: no" for read-only "shell" steps that registers result
"shell" step doesn't support check mode, which currently leads to failures,
when Ansible is being run in check mode (because Ansible doesn't run command,
assuming that command might have effect, and no "rc" or "output" is registered).

Setting "check_mode: no" allows to run those "shell" commands in check mode
(which is safe, because those shell commands doesn't have side effects).
2017-02-13 18:53:41 +03:00
Josh Conant
245e05ce61 Vault security hardening and role isolation 2017-02-08 21:41:36 +00:00
Josh Conant
f4ec2d18e5 Adding the Vault role 2017-02-08 21:31:28 +00:00
Matthew Mosesohn
0180ad7f38 Merge pull request #990 from mattymo/fix_cert_upgrade
Fix check for node-NODEID certs existence
2017-02-08 14:44:09 +03:00
Matthew Mosesohn
e5779ab786 Fix check for node-NODEID certs existence
Fixes upgrade from pre-individual node cert envs.
2017-02-07 21:06:48 +03:00
Matthew Mosesohn
71e14a13b4 Re-tune ETCD performance params
Reduce election timeout to 5000ms (was 10000ms)
Raise heartbeat interval to 250ms (was 100ms)
Remove etcd cpu share (was 300)
Make etcd_cpu_limit and etcd_memory_limit optional.
2017-02-07 20:15:14 +03:00
Matthew Mosesohn
fd30131dc2 Revert "Drop linux capabilities and rework users/groups" 2017-02-06 15:58:54 +03:00
Bogdan Dobrelya
cb2e5ac776 Drop linux capabilities and rework users/groups
* Drop linux capabilities for unprivileged containerized
  worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
  components' static manifests, etcd, calico-rr and k8s apps,
  like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
  supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
  networking containers though).
* Grant the kube and netplug users read access for etcd certs via
  the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
  and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
  for control plane.

Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-20 08:50:42 +01:00
Matthew Mosesohn
8ce32eb3e1 Merge pull request #905 from galthaus/async-runs
Add tasks to ensure that the first nodes have their directories for cert gen
2017-01-19 18:32:27 +03:00
Greg Althaus
0d44599a63 Add explicit name printing in task names for deletgated task during
cert creation
2017-01-18 14:06:50 -06:00
Sergii Golovatiuk
43fa72b7b7 Flush handlers before etcd restart
systemctl daemon-reload should be run before when task modifies/creates
union for etcd. Otherwise etcd won't be able to start

Closes #892

Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
2017-01-17 15:04:25 +01:00
Greg Althaus
6c69da1573 This PR adds/or modifies a few tasks to allow for the playbook to
be run by limit on each node without regard for order.

The changes make sure that all of the directories needed to do
certificate management are on the master[0] or etcd[0] node regardless
of when the playbook gets run on each node.  This allows for separate
ansible playbook runs in parallel that don't have to be synchronized.
2017-01-14 23:24:34 -06:00
Greg Althaus
95bf380d07 If the inventory name of the host exceeds 63 characters,
the openssl tools will fail to create signing requests because
the CN is too long.  This is mainly a problem when FQDNs are used
in the inventory file.

THis will truncate the hostname for the CN field only at the
first dot.  This should handle the issue for most cases.
2017-01-13 10:02:23 -06:00
Aleksandr Didenko
d9539e0f27 Fix etcd cert generation for calico-rr role
"etcd_node_cert_data" variable is undefinded for "calico-rr" role.
This patch adds "calico-rr" nodes to task where "etcd_node_cert_data"
variable is registered.
2017-01-09 12:06:25 +01:00
Bogdan Dobrelya
5af2c42bde Better fix for different CoreOS os family facts
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-05 16:32:08 +01:00
Bogdan Dobrelya
f7447837c5 Rename CoreOS fact
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-05 14:02:29 +01:00
Brad Beam
4b6f29d5e1 Adding kubelet in rkt 2017-01-03 14:49:48 -06:00
Brad Beam
8dc19374cc Allowing etcd to run via rkt 2017-01-03 10:10:38 -06:00
Bogdan Dobrelya
58062be2a3 Drop non systemd OS types support
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2017-01-02 12:14:03 +01:00
Matthew Mosesohn
1f9f885379 Fix etcd cert generation to support large deployments
Due to bash max args limits, we should pass all node filenames and
base64-encoded tar data through stdin/stdout instead.

Fixes #832
2016-12-30 12:55:26 +03:00
Bogdan Dobrelya
a56d9de502 Systemd units, limits, and bin path fixes
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
  systemd limits due to the lack of consensus in the kernel community
  requires out-of-tree kernel patches.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-28 15:49:42 +01:00
Matthew Mosesohn
f0c0390646 Fix creation and sync of etcd certs
Admin certs only go to etcd nodes
Only generate cert-data for nodes that need sync
2016-12-28 14:21:17 +04:00
Matthew Mosesohn
e7a1949d85 Merge pull request #818 from mattymo/calico-rr-certs
Fix calico-rr to use etcd certs instead of kube certs
2016-12-28 08:47:16 +03:00
Matthew Mosesohn
6d9cd2d720 Fix calico-rr to use etcd certs instead of kube certs 2016-12-27 17:04:50 +03:00
Bogdan Dobrelya
79996b557b Rework ignore_errors to report no reds
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
2016-12-27 13:00:50 +01:00
Matthew Mosesohn
385f7f6e75 Update etcd.j2 2016-12-22 22:29:24 +03:00
Matthew Mosesohn
9f1e3db906 Adjust etcd server certificates
ETCD doesn't need cert/key options set. It only requires peer
cert options.
2016-12-22 23:05:17 +04:00
Spencer Smith
b63d900625 Workaround etcdctl not yet being installed (#797)
workaround case for etcdctl not yet being installed, only allow for return code of 0 (no error)
2016-12-22 12:41:38 -05:00
Matthew Mosesohn
ad796d188d Individual etcd ssl certs
Includes hooks for triggering calico, kubelet, and kube-apiserver restarts
if etcd certs changed.
2016-12-22 13:31:11 +03:00
Matthew Mosesohn
348fc5b109 Fix etcd to-SSL upgrade and task register vars 2016-12-19 15:05:49 +03:00
Matthew Mosesohn
9cc73bdf08 Fix etcd member list when upgrading ETCD from an old version 2016-12-15 12:00:45 +04:00
Bogdan Dobrelya
a15d626771 Preconfigure DNS stack and docker early
In order to enable offline/intranet installation cases:
* Move DNS/resolvconf configuration to preinstall role. Remove
  skip_dnsmasq_k8s var as not needed anymore.

* Preconfigure DNS stack early, which may be the case when downloading
  artifacts from intranet repositories. Do not configure
  K8s DNS resolvers for hosts /etc/resolv.conf yet early (as they may be
  not existing).

* Reconfigure K8s DNS resolvers for hosts only after kubedns/dnsmasq
  was set up and before K8s apps to be created.

* Move docker install task to early stage as well and unbind it from the
  etcd role's specific install path. Fix external flannel dependency on
  docker role handlers. Also fix the docker restart handlers' steps
  ordering to match the expected sequence (the socket then the service).

* Add default resolver fact, which is
  the cloud provider specific and remove hardcoded GCE resolver.

* Reduce default ndots for hosts /etc/resolv.conf to 2. Multiple search
  domains combined with high ndots values lead to poor performance of
  DNS stack and make ansible workers to fail very often with the
  "Timeout (12s) waiting for privilege escalation prompt:" error.

* Update docs.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-09 17:30:55 +01:00
Bogdan Dobrelya
8cc84e132a Add tags
Add tags to allow more granular tasks filtering.
Add generator script for MD formatted tags found.
Add docs for tags how-to.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-09 12:14:28 +01:00