add prometheus annotations to calico-node if
calico_felix_prometheusmetricsenabled is enabled.
This will allow a kubernetes_sd to automaticly find the pods and start
scraping.
* warning on meta flush_handlers
* avoid rm
* avoid "Module remote_tmp /root/.ansible/tmp did not exist and was created with a mode of 0700, this may cause issues when running as another user. To avoid this, create the remote_tmp dir with the correct permissions manually" warning on subsequent tasks using blockinfile
* is match
* failed
* version_compare
* succeeded
* skipped
* success
* version_compare becomes version since ansible 2.5
* ansible minimal version updated in doc and spec
* last version_compare
Before, Nodes tainted with NoExecute policy did not have calico/weave Pod.
Network pod should run on all nodes whatever happens on a specific node.
Also always set the Pods to be critical.
Also remove deprecated scheduler.alpha.kubernetes.io/tolerations annotations.
* Changes to assign pod priority to kube components.
* Removed the boolean flag pod_priority_assignment
* Created new priorityclass k8s-cluster-critical
* Created new priorityclass k8s-cluster-critical
* Fixed the trailing spaces
* Fixed the trailing spaces
* Added kube version check while creating Priority Class k8s-cluster-critical
* Moved k8s-cluster-critical.yml
* Moved k8s-cluster-critical.yml to kube_config_dir
* calico upgrade to v3
* update calico_rr version
* add missing file
* change contents of main.yml as it was left old version
* enable network policy by default
* remove unneeded task
* Fix kubelet calico settings
* fix when statement
* switch back to node-kubeconfig.yaml
* allow installs to not have hostname overriden with fqdn from inventory
* calico-config no longer requires local as and will default to global
* when cloudprovider is not defined, use the inventory_hostname for cni-calico
* allow reset to not restart network (buggy nodes die with this cmd)
* default kube_override_hostname to inventory_hostname instead of ansible_hostname
* Refactor downloads to use download role directly
Also disable fact delegation so download delegate works acros OSes.
* clean up bools and ansible_os_family conditionals
The value cannot be determined properly via local facts, so
checking k8s api is the most reliable way to look up what hostname
is used when using a cloudprovider.
* kubeadm support
* move k8s master to a subtask
* disable k8s secrets when using kubeadm
* fix etcd cert serial var
* move simple auth users to master role
* make a kubeadm-specific env file for kubelet
* add non-ha CI job
* change ci boolean vars to json format
* fixup
* Update create-gce.yml
* Update create-gce.yml
* Update create-gce.yml
* Adding yaml linter to ci check
* Minor linting fixes from yamllint
* Changing CI to install python pkgs from requirements.txt
- adding in a secondary requirements.txt for tests
- moving yamllint to tests requirements
By default Calico CNI does not create any network access policies
or profiles if 'policy' is enabled in CNI config. And without any
policies/profiles network access to/from PODs is blocked.
K8s related policies are created by calico-policy-controller in
such case. So we need to start it as soon as possible, before any
real workloads.
This patch also fixes kube-api port in calico-policy-controller
yaml template.
Closes#1132
By default Calico blocks traffic from endpoints
to the host itself by using an iptables DROP
action. It could lead to a situation when service
has one alive endpoint, but pods which run on
the same node can not access it. Changed the action
to RETURN.
* Leave all.yml to keep only optional vars
* Store groups' specific vars by existing group names
* Fix optional vars casted as mandatory (add default())
* Fix missing defaults for an optional IP var
* Relink group_vars for terraform to reflect changes
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
Migrate older inline= syntax to pure yml syntax for module args as to be consistant with most of the rest of the tasks
Cleanup some spacing in various files
Rename some files named yaml to yml for consistancy
Ansible playbook fails when tags are limited to "facts,etcd" or to
"facts". This patch allows to run ansible-playbook to gather facts only
that don't require calico/flannel/weave components to be verified. This
allows to run ansible with 'facts,bootstrap-os' or just 'facts' to
gether facts that don't require specific components.
Signed-off-by: Sergii Golovatiuk <sgolovatiuk@mirantis.com>
For consistancy with kubernetes services we should use the same
hostname for nodes, which is 'ansible_hostname'.
Also fixing missed 'kube-node' in templates, Calico is installed
on 'k8s-cluster' roles, not only 'kube-node'.
Calico-rr is broken for deployments with separate k8s-master and
k8s-node roles. In order to fix it we should peer k8s-cluster
nodes with calico-rr, not just k8s-node. The same for peering
with routers.
Closes#925
* Drop linux capabilities for unprivileged containerized
worlkoads Kargo configures for deployments.
* Configure required securityContext/user/group/groups for kube
components' static manifests, etcd, calico-rr and k8s apps,
like dnsmasq daemonset.
* Rework cloud-init (etcd) users creation for CoreOS.
* Fix nologin paths, adjust defaults for addusers role and ensure
supplementary groups membership added for users.
* Add netplug user for network plugins (yet unused by privileged
networking containers though).
* Grant the kube and netplug users read access for etcd certs via
the etcd certs group.
* Grant group read access to kube certs via the kube cert group.
* Remove priveleged mode for calico-rr and run it under its uid/gid
and supplementary etcd_cert group.
* Adjust docs.
* Align cpu/memory limits and dropped caps with added rkt support
for control plane.
Signed-off-by: Bogdan Dobrelya <bogdando@mail.ru>
"etcd_node_cert_data" variable is undefinded for "calico-rr" role.
This patch adds "calico-rr" nodes to task where "etcd_node_cert_data"
variable is registered.
* Add restart for weave service unit
* Reuse docker_bin_dir everythere
* Limit systemd managed docker containers by CPU/RAM. Do not configure native
systemd limits due to the lack of consensus in the kernel community
requires out-of-tree kernel patches.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Add BGP route reflectors support in order to optimize BGP topology
for deployments with Calico network plugin.
Also bump version of calico/ctl for some bug fixes.
When running legacy calicoctl we do not specify calico hostname in
calico-node container thus we should not specify it in CNI config.
Also move 'legacy_calicoctl' set_fact task to the top.
In new `calicoctl` version nodes peering with routers is broken.
We need to use predictable node names for calico-node and the
same names in calico `bgpPeer` resources and CNI.
Since version 'v1.0.0-beta' calicoctl is written
in Go and its API differs from old Python based
utility. Added support of both old and new version
of the utility.
test to change the machine type
Revert "test to change the machine type"
This reverts commit 7a91f1b5405a39bee6cb91940b09a0b0f9d3aee1.
use google dns server when no upstream dns are defined
comment upstream_dns_servers
update documentation
remove deprecated kubelet flags
Revert "remove deprecated kubelet flags"
This reverts commit 21e3b893c896d0291c36a07d0414f4cb88b8d8ac.
- Allow to overwrite calico cni binaries copied from hyperkube
by the custom ones.
- Fix calico-ipam deployment (it had wrong source in rsync)
- Make copy from hyperkube idempotent (use rsync instead of cp)
- Remove some orphaned comments
* Add the retry_stagger var to tweak push and retry time strategies.
* Add large deployments related docs.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Move version/repo vars to download role.
Add container to download params, which overrides url/source_url,
if enabled.
Fix networking plugins download depending on kube_network_plugin.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Hyperkube from CoreOS now ships with all binaries required for
calico and flannel (but not weave). It simplifies deployment for
some network plugin scenarios to not download CNI images.
TODO: Optionally disable downloading calico to /opt/cni/bin
Creating the unit using default settings early on
and then changing it during network_plugin section
leads to too many docker restarts and duplicated code.
Reversed Wants= dependence on docker.service so it does not
restart docker when reloading systemd
Consolidated all docker restart handlers.
* Add for docker system units:
ExecReload=/bin/kill -s HUP $MAINPID
Delegate=yes
KillMode=process.
* Add missed DOCKER_OPTIONS for calico/weave docker systemd unit.
* Change Requires= to a less strict and non-faily Wants=, add missing
Wants= for After=.
* Align wants/after in a wat if Wants=foo, After= has foo as well.
* Make wants/after docker.service to ask for the docker.socket as well.
* Move "docker rm -f" commands from ExecStartPre= to ExecStopPost=.
hooks to ensure non-destructive start attempts issued by Wants=.
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
* Add HA docs for API server.
* Add auto-evaluated internal endpoints and clarify the loadbalancer_apiserver
vars and usecases.
* Use facts for kube_apiserver to not repeat code and enable LB endpoints use.
* Use /healthz check for the wait-for apiserver.
* Use the single endpoint for kubelet instead of the list of apiservers
* Specify kube_apiserver_count to for HA layout
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
kubelet via docker
kube-apiserver as a static pod
Fixed etcd service start to be more tolerant of slow start.
Workaround for kube_version to stay in download role, but not
download an files by creating a new "nothing" download entry.
Adds new boolean configuration variable for calico network plugin
`ipip`. When it's enabled calico pool is created with '--ipip'
option (IP-over-IP encapsulation across hosts).
Also refactor pool creation tasks to simplify logic and make tasks
more readable.
Improved docker reload command to wait for etcd to be
up before proceeding. Switched reload to run restart
because it can't reload if it is not guaranteed to be
in running state.
* Enforce a etcd-proxy role to a k8s-cluster group members. This
provides an HA layout for all of the k8s cluster internal clients.
* Proxies to be run on each node in the group as a separate etcd
instances with a readwrite proxy mode and listen the given endpoint,
which is either the access_ip:2379 or the localhost:2379.
* A notion for the 'kube_etcd_multiaccess' is: ignore endpoints and
loadbalancers and use the etcd members IPs as a comma-separated
list. Otherwise, clients shall use the local endpoint provided by a
etcd-proxy instances on each etcd node. A Netwroking plugins always
use that access mode.
* Fix apiserver's etcd servers args to use the etcd_access_endpoint.
* Fix networking plugins flannel/calico to use the etcd_endpoint.
* Fix name env var for non masters to be set as well.
* Fix etcd_client_url was not used anywhere and other etcd_* facts
evaluation was duplicated in a few places.
* Define proxy modes only in the env file, if not a master. Del
an automatic proxy mode decisions for etcd nodes in init/unit scripts.
* Use Wants= instead of Requires= as "This is the recommended way to
hook start-up of one unit to the start-up of another unit"
* Make apiserver/calico Wants= etcd-proxy to keep it always up
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Matthew Mosesohn <mmosesohn@mirantis.com>