Modify connection_strings_etcd to only return etcd nodes - not master nodes - since this results in duplicate hosts in the generated Ansible inventory and is unnecessary.
* fix(misc): terraform/aws
- handles deployment with a single availability zone
- handles deployment with more than two availability zone
- handles etcd collocation with control-plane nodes (`aws_etcd_num=0`)
- allows to set a bastion instances count (`aws_bastion_num`)
- allows to set bastion/etcd/control-plane/workers rootfs volume size
- removes variables from terraform.tfvars that were not re-used
- adds .terraform.lock.hcl to .gitignore
- changes/updates base image from ubuntu-18.03 to debian-10
tested by a few coworkers of mine, and myself: thanks for the outstanding
work, on both those terraform samples and kubespray playbooks.
I did not test ubuntu deployments, I could still swap from buster to
focal. LMK.
* fix(gitlab-ci)
AFAIU, terraform.tfvars indentation should be fixed for / no diff
returned running `terraform fmt -check -diff`
https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/1445622114
* rename ansible groups to use _ instead of -
k8s-cluster -> k8s_cluster
k8s-node -> k8s_node
calico-rr -> calico_rr
no-floating -> no_floating
Note: kube-node,k8s-cluster groups in upgrade CI
need clean-up after v2.16 is tagged
* ensure old groups are mapped to the new ones
This replaces kube-master with kube_control_plane because of [1]:
The Kubernetes project is moving away from wording that is
considered offensive. A new working group WG Naming was created
to track this work, and the word "master" was declared as offensive.
A proposal was formalized for replacing the word "master" with
"control plane". This means it should be removed from source code,
documentation, and user-facing configuration from Kubernetes and
its sub-projects.
NOTE: The reason why this changes it to kube_control_plane not
kube-control-plane is for valid group names on ansible.
[1]: https://github.com/kubernetes/enhancements/blob/master/keps/sig-cluster-lifecycle/kubeadm/2067-rename-master-label-taint/README.md#motivation
* [terraform/aws] Fix Terraform >=0.13 warnings
Terraform >=0.13 gives the following warning:
```
Warning: Interpolation-only expressions are deprecated
```
The fix was tested as follows:
```
rm -rf .terraform && terraform0.12.26 init && terraform0.12.26 validate
rm -rf .terraform && terraform0.13.5 init && terraform0.13.5 validate
rm -rf .terraform && terraform0.14.3 init && terraform0.14.3 validate
```
which gave no errors nor warnings.
* [terraform/openstack] Fixes for Terraform >=0.13
Terraform >=0.13 gives the following error:
```
Error: Failed to install providers
Could not find required providers, but found possible alternatives:
hashicorp/openstack -> terraform-provider-openstack/openstack
```
This patch fixes these errors.
This fix was tested as follows:
```
rm -rf .terraform && terraform0.12.26 init && terraform0.12.26 validate
rm -rf .terraform && terraform0.13.5 init && terraform0.13.5 validate
rm -rf .terraform && terraform0.14.3 init && terraform0.14.3 validate
```
which gave no errors nor warnings for Terraform 0.13.5 and Terraform
0.14.3. Unfortunately, 0.12.x gives a harmless warning, but
with 0.14.3 out the door, I guess we need to move on.
* [terraform/packet] Fixes for Terraform >=0.13
This fix was tested as follows:
```
export PACKET_AUTH_TOKEN=blah-blah
rm -rf .terraform && terraform0.12.26 init && terraform0.12.26 validate
rm -rf .terraform && terraform0.13.5 init && terraform0.13.5 validate
rm -rf .terraform && terraform0.14.3 init && terraform0.14.3 validate
```
Errors are gone, but warnings still remain. It is impossible to please
all three versions of Terraform.
* Add tests for Terraform >=0.13
* Add note about changing private IP in admin.conf.
When I run kubespray, a load balancer is created which should be used instead of the ip of the controller node.
* Procedure to find load balancer and update admin.conf
When I run kubespray, a load balancer is used instead of the private ip of the controller.
I kept seeing `TLS handshake error from 10.250.250.158:63770: EOF` from two IP addresses that correlate to my ELB. Changing the health check from TCP to HTTPS stopped the errors from being generated.
* Run terraform fmt
* Add terraform fmt to .terraform-validate CI step
* Add tf-validate-aws CI step
* Revert "Add tf-validate-aws CI step"
This reverts commit e007225fac.
* Properly tag instances and subnets with `kubernetes.io/cluster/$cluster_name`
This is required by kubernetes to support multiple clusters in a single vpc/az
* Get rid of loadbalancer_apiserver_address as it is no longer needed
* Dynamically retrieve aws_bastion_ami latest reference by querying AWS rather than hard coded
* Dynamically retrieve the list of availability_zones instead of needing to have them hard coded
* Limit availability zones to first 2, using slice extrapolation function
* Replace the need for hardcoded variable "aws_cluster_ami" by the data provided by Terraform
* Move ami choosing to vars, so people don't need to edit create infrastructure if they want another vendor image (as suggested by @atoms)
* Make name of the data block agnostic of distribution, given there are more than one distribution supported
* Add documentation about other distros being supported and what to change in which location to make these changes
* Add comment line and documentation for bastion host usage
* Take out unneeded sudo parm
* Remove blank lines
* revert changes
* take out disabling of strict host checking
This trigger ensures the inventory file is kept up-to-date. Otherwise, if the file exists and you've made changes to your terraform-managed infra without having deleted the file, it would never get updated.
For example, consider the case where you've destroyed and re-applied the terraform resources, none of the IPs would get updated, so ansible would be trying to connect to the old ones.
Rewrote AWS Terraform deployment for AWS Kargo. It supports now
multiple Availability Zones, AWS Loadbalancer for Kubernetes API,
Bastion Host, ...
For more information see README
Currently, the terraform script in contrib
adds etcd role as a child of k8s-cluster in
its generated inventory file.
This is problematic when the etcd role is
deployed on separate nodes from the k8s master
and nodes. In this case, this leads to failures
of the k8s node since the PKI certs required for
that role have not been propogated.