Attempting to clarify the language surrounding the etcd node deployment script failure mechanism. I had this error when doing a new cluster deployment last night and, though it should have been, it wasn't immediately apparent to me what was causing the issue (since my default master node hostnames do not specify whether they are also acting as etcd replicas).
* Add supplementary node groups
To add additional ansible groups to the k8s nodes, such as
`kube-ingress` for running ingress controller pods. Empty by default.
* [terraform/openstack] Restores ability to use existing public nodes and masters as bastion.
* [terraform/openstack] Uses network_id as output
* [terraform/openstack] Fixes link to inventory/local/group_vars
* [terraform/openstack] Adds supplementary master groups
* [terraform/openstack] Updates documentation avoiding manual setups for bastion (as they are not needed now).
* [terraform/openstack] Supplementary master groups in docs.
* [terraform/openstack] Fixes repeated usage of master fips instead of bastion fips
* [terraform/openstack] Missing change for network_id to subnet_id
* [terraform/openstack] Changes conditional to element( concat ) form to avoid type issues with empty lists.
* Update rpm spec and pbr setup configs
* Rename package to kubespray
* Do not break Fedora's FHS and install to /usr/share instead
* Remove the vendor tag
* Update source0 for better artifacts' names
* Fix missing files build errors
* Make version/release to auto match from git and fit PEP 440
Co-authored-by: Matthias Runge <mrunge@redhat.com>
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Add package paths to roles search in ansible conf
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
* Poke jinja2 requirements in rpm spec file
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Hardcoded variables are removed from variables.tf file because it might
not be suitable for all OpenStack Cloud depending on Identity API
version available (between v2 or v3) and preferred authentication
method.
* Adding bastion and private network provisioning for openstack terraform
* Remove usage of floating-ip property
* Combine openstack instances + floating ips
* Fix relating floating IPs to hosts for openstack builds
* Tighten up security groups
Allow ssh into all instances with floating IP
* Add the gluster hosts to the no-floating group
* Break terraform into modules
* Update README and var descriptions to match current config
* Remove volume property in gluster compute def
* Include cluster name in internal network and router names
* Make dns_nameservers a variable
* Properly tag instances and subnets with `kubernetes.io/cluster/$cluster_name`
This is required by kubernetes to support multiple clusters in a single vpc/az
* Get rid of loadbalancer_apiserver_address as it is no longer needed
* Dynamically retrieve aws_bastion_ami latest reference by querying AWS rather than hard coded
* Dynamically retrieve the list of availability_zones instead of needing to have them hard coded
* Limit availability zones to first 2, using slice extrapolation function
* Replace the need for hardcoded variable "aws_cluster_ami" by the data provided by Terraform
* Move ami choosing to vars, so people don't need to edit create infrastructure if they want another vendor image (as suggested by @atoms)
* Make name of the data block agnostic of distribution, given there are more than one distribution supported
* Add documentation about other distros being supported and what to change in which location to make these changes
* Add comment line and documentation for bastion host usage
* Take out unneeded sudo parm
* Remove blank lines
* revert changes
* take out disabling of strict host checking
This trigger ensures the inventory file is kept up-to-date. Otherwise, if the file exists and you've made changes to your terraform-managed infra without having deleted the file, it would never get updated.
For example, consider the case where you've destroyed and re-applied the terraform resources, none of the IPs would get updated, so ansible would be trying to connect to the old ones.
This does not address per-node certs and scheduler/proxy/controller-manager
component certs which are now required. This should be handled in a
follow-up patch.
Install roles under /usr/local/share/kubespray/roles,
playbooks - /usr/local/share/kubespray/playbooks/,
ansible.cfg and inventory group vars - into /etc/kubespray.
Ship README and an example inventory as the package docs.
Update the ansible.cfg to consume the roles from the given path,
including virtualenvs prefix, if defined.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Optional Ansible playbook for preparing a host for running Kargo.
This includes creation of a user account, some basic packages,
and sysctl values required to allow CNI networking on a libvirt network.
Rewrote AWS Terraform deployment for AWS Kargo. It supports now
multiple Availability Zones, AWS Loadbalancer for Kubernetes API,
Bastion Host, ...
For more information see README
The AWS IAM profiles and policies required to run Kargo on AWS
are no longer hosted in the kubernetes main repo since kube-up got
deprecated. Hence we have to move the files into the kargo repository.
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Also adds calico-rr group if there are standalone etcd nodes.
Now if there are 50 or more nodes, 3 etcd nodes will be standalone.
If there are 200 or more nodes, 2 kube-masters will be standalone.
If thresholds are exceeded, kube-node group cannot add nodes that
belong to etcd or kube-master groups (according to above statements).
Also place in global vars and do not repeat the kube_*_config_dir
and kube_namespace vars for better code maintainability and UX.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Currently, the terraform script in contrib
adds etcd role as a child of k8s-cluster in
its generated inventory file.
This is problematic when the etcd role is
deployed on separate nodes from the k8s master
and nodes. In this case, this leads to failures
of the k8s node since the PKI certs required for
that role have not been propogated.
* Add download_localhost for the download_run_once mode, which is
use the ansible host (a travis node for CI case) to store and
distribute containers across cluster nodes in inventory.
Defaults to false.
* Rework download_run_once logic to fix idempotency of uploading
containers.
* For Travis CI, enable docker images caching and run Travis
workers with sudo enabled as a dependency
* For Travis CI, deploy with download_localhost and download_run_once
enabled to shourten dev path drastically.
* Add compression for saved container images. Defaults to 'best'.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Aleksandr Didenko <adidenko@mirantis.com>
Squashed commits:
[f9355ea] Swap order in which we reload docker/socket
[2ca6819] Reload docker.socket after installing flannel on coreos
Workaround for #569
[9f976e5] Vagrantfile: setup proxy inside virtual machines
In corporate networks, it is good to pre-configure proxy variables.
[9d7142f] Vagrantfile: use Ubuntu 16.04 LTS
Use recent supported version of Ubuntu for local development setup
with Vagrant.
[50f77cc] Add CI test layouts
* Drop Wily from test matrix
* Replace the Wily cases dropped with extra cases to test separate
roles deployment
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
[03e162b] Update OWNERS
[c7b00ca] Use tar+register instead of copy/slurp for distributing tokens and certs
Related bug: https://github.com/ansible/ansible/issues/15405
Uses tar and register because synchronize module cannot sudo on the
remote side correctly and copy is too slow.
This patch dramatically cuts down the number of tasks to process
for cert synchronization.
[2778ac6] Add new var skip_dnsmasq_k8s
If skip_dnsmasq is set, it will still not set up dnsmasq
k8s pod. This enables independent setup of resolvconf section
before kubelet is up.