* Rename dns_server to dnsmasq_dns_server so that it includes role prefix
as the var name is generic and conflicts when integrating with existing ansible automation.
* Enable selinux state to be configurable with new var preinstall_selinux_state
This follows pull request #1677, adding the cgroup-driver
autodetection also for kubeadm way of deploying.
Info about this and the possibility to override is added to the docs.
New files: /etc/kubernetes/admin.conf
/root/.kube/config
$GITDIR/artifacts/{kubectl,admin.conf}
Optional method to download kubectl and admin.conf if
kubeconfig_lcoalhost is set to true (default false)
* Added update CA trust step for etcd and kube/secrets roles
* Added load_balancer_domain_name to certificate alt names if defined. Reset CA's in RedHat os.
* Rename kube-cluster-ca.crt to vault-ca.crt, we need separated CA`s for vault, etcd and kube.
* Vault role refactoring, remove optional cert vault auth because not not used and worked. Create separate CA`s fro vault and etcd.
* Fixed different certificates set for vault cert_managment
* Update doc/vault.md
* Fixed condition create vault CA, wrong group
* Fixed missing etcd_cert_path mount for rkt deployment type. Distribute vault roles for all vault hosts
* Removed wrong when condition in create etcd role vault tasks.
* Updates Controller Manager/Kubelet with Flannel's required configuration for CNI
* Removes old Flannel installation
* Install CNI enabled Flannel DaemonSet/ConfigMap/CNI bins and config (with portmap plugin) on host
* Uses RBAC if enabled
* Fixed an issue that could occur if br_netfilter is not a module and net.bridge.bridge-nf-call-iptables sysctl was not set
$IPS only expands to the first ip address in the array:
justin@box:~$ declare -a IPS=(10.10.1.3 10.10.1.4 10.10.1.5)
justin@box:~$ echo $IPS
10.10.1.3
justin@box:~$ echo ${IPS[@]}
10.10.1.3 10.10.1.4 10.10.1.5
Clarify that the `kube_version` environment variable is needed for the CLI "graceful upgrade". Also add and example to check that the upgrade was successful.
Non-brekable space is 0xc2 0xa0 byte sequence in UTF-8.
To find one:
$ git grep -I -P '\xc2\xa0'
To replace with regular space:
$ git grep -l -I -P '\xc2\xa0' | xargs sed -i 's/\xc2\xa0/ /g'
This commit doesn't include changes that will overlap with commit f1c59a91a1.
The AWS IAM profiles and policies required to run Kargo on AWS
are no longer hosted in the kubernetes main repo since kube-up got
deprecated. Hence we have to move the files into the kargo repository.
Updates based on feedback
Simplify checks for file exists
remove invalid char
Review feedback. Use regular systemd file.
Add template for docker systemd atomic
By default Calico blocks traffic from endpoints
to the host itself by using an iptables DROP
action. It could lead to a situation when service
has one alive endpoint, but pods which run on
the same node can not access it. Changed the action
to RETURN.
Operator can specify any port for kube-api (6443 default) This helps in
case where some pods such as Ingress require 443 exclusively.
Closes: 820
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
New deploy modes: scale, ha-scale, separate-scale
Creates 200 fake hosts for deployment with fake hostvars.
Useful for testing certificate generation and propagation to other
master nodes.
Updated test cases descriptions.
Based on #718 introduced by rsmitty.
Includes all roles and all options to support deployment of
new hosts in case they were added to inventory.
Main difference here is that master role is evaluated first
so that master components get upgraded first.
Fixes#694
- Exclude kubelet CPU/RAM (kube-reserved) from cgroup. It decreases a
chance of overcommitment
- Add a possibility to modify Kubelet node-status-update-frequency
- Add a posibility to configure node-monitor-grace-period,
node-monitor-period, pod-eviction-timeout for Kubernetes controller
manager
- Add Kubernetes Relaibility Documentation with recomendations for
various scenarios.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
kubelet lost the ability to load kernel modules. This
puts that back by adding the lib/modules mount to kubelet.
The new variable kubelet_load_modules can be set to true
to enable this item. It is OFF by default.
Netchecker is rewritten in Go lang with some new args instead of
env variables. Also netchecker-server no longer requires kubectl
container. Updating playbooks accordingly.
* Drop linux capabilities for unprivileged containerized
worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Also adds calico-rr group if there are standalone etcd nodes.
Now if there are 50 or more nodes, 3 etcd nodes will be standalone.
If there are 200 or more nodes, 2 kube-masters will be standalone.
If thresholds are exceeded, kube-node group cannot add nodes that
belong to etcd or kube-master groups (according to above statements).
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
systemd limits due to the lack of consensus in the kernel community
requires out-of-tree kernel patches.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Run 1 of each part1/2/special for Gitlab CI only for PRs
Remaining 2/3 of each stage are manual steps for master only (merges)
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Reduce default testcase to 2 nodes, add HA case.
* Adjust gen_matrix script for Travis/Gitlab CIs.
* Enable netchecker deploy foro gitlab CI.
* Sync other things from travis matrix and reorder them as build steps
for pull requests, master branch, auto/manual.
* Do auto-step1 from part1 and manual step2,3 for branches/PRs.
* Do manual steps from part2, special for master merges.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Add BGP route reflectors support in order to optimize BGP topology
for deployments with Calico network plugin.
Also bump version of calico/ctl for some bug fixes.
Do not repeat options and nameservers in the dhclient hooks.
Do not prepend nameservers for dhclient but supersede and fail back
to the upstream_dns_resolvers then default_resolver. Fixes order of
nameservers placement, which is cluster DNS ip goes always first.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* For Debian/RedHat OS families (with NetworkManager/dhclient/resolvconf
optionally enabled) prepend /etc/resolv.conf with required nameservers,
options, and supersede domain and search domains via the dhclient/resolvconf
hooks.
* Drop (z)nodnsupdate dhclient hook and re-implement it to complement the
resolvconf -u command, which is distro/cloud provider specific.
Update docs as well.
* Enable network restart to apply and persist changes and simplify handlers
to rely on network restart only. This fixes DNS resolve for hostnet K8s
pods for Red Hat OS family. Skip network restart for canal/calico plugins,
unless https://github.com/projectcalico/felix/issues/1185 fixed.
* Replace linefiles line plus with_items to block mode as it's faster.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Matthew Mosesohn <mmosesohn@mirantis.com>
In order to enable offline/intranet installation cases:
* Move DNS/resolvconf configuration to preinstall role. Remove
skip_dnsmasq_k8s var as not needed anymore.
* Preconfigure DNS stack early, which may be the case when downloading
artifacts from intranet repositories. Do not configure
K8s DNS resolvers for hosts /etc/resolv.conf yet early (as they may be
not existing).
* Reconfigure K8s DNS resolvers for hosts only after kubedns/dnsmasq
was set up and before K8s apps to be created.
* Move docker install task to early stage as well and unbind it from the
etcd role's specific install path. Fix external flannel dependency on
docker role handlers. Also fix the docker restart handlers' steps
ordering to match the expected sequence (the socket then the service).
* Add default resolver fact, which is
the cloud provider specific and remove hardcoded GCE resolver.
* Reduce default ndots for hosts /etc/resolv.conf to 2. Multiple search
domains combined with high ndots values lead to poor performance of
DNS stack and make ansible workers to fail very often with the
"Timeout (12s) waiting for privilege escalation prompt:" error.
* Update docs.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Add upload tag allow users to exclude distributing images across nodes
when running with the download tag set.
Add related tags and update docs as well.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>