Matthew Mosesohn 023108a733 Refactor calico route reflector to run in k8s cluster (#4975 )

* Refactor calico-rr to run in k8s cluster with taint

Change-Id: I75a3169ff5b36ce8302fc7ef1c32d3eb697b5afa

* add preinstall checks

* rework calico/rr role

Change-Id: I2f0a7e6cb77cf91ad4a615923680760d2e5d9ca8

* add empty calico-rr group

Change-Id: I006c0a60db9b72d02245bf8fdfabcf982144a5ad

2019-08-08 07:37:22 -07:00

6.9 KiB


Calico

N.B. Upgrading from version 2.6.5 to 3.1.1 migrates the etcd store to etcdv3. If you create automated backups of etcdv2, please switch to creating etcdv3 backups, as Kubernetes and Calico now use etcdv3. After the migration, you can check the `/tmp/calico_upgrade/` directory for the items converted to etcdv3. PLEASE TEST the upgrade before upgrading a production cluster.
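A minimal sketch of an etcdv3 snapshot backup you could schedule in place of an etcdv2 one. It assumes etcdctl with v3 API support is installed on an etcd host. The function name backup_etcdv3 and the default directory /var/backups/etcd are hypothetical. Endpoints and TLS files are assumed to come from the standard ETCDCTL_* environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save an etcd v3 snapshot and print its path.
# Assumption: endpoints/certificates are provided via ETCDCTL_ENDPOINTS,
# ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY in the environment.

backup_etcdv3() {
  local backup_dir="${1:-/var/backups/etcd}"   # hypothetical default location
  local snapshot
  snapshot="${backup_dir}/etcd-snapshot-$(date +%Y%m%d-%H%M%S).db"
  mkdir -p "$backup_dir" || return 1
  # ETCDCTL_API=3 selects the v3 API, which Kubernetes and Calico now use.
  # etcdctl's own output goes to stderr so stdout carries only the path.
  ETCDCTL_API=3 etcdctl snapshot save "$snapshot" >&2 || return 1
  echo "$snapshot"
}
```

For example, `backup_etcdv3 /srv/etcd-backups` prints the path of the new snapshot file, which a cron job could then copy off the host.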

Check if the calico-node container is running

docker ps | grep calico

The calicoctl command lets you check the status of the network workloads.

Check the status of Calico nodes

calicoctl node status

or for versions prior to v1.0.0:

calicoctl status

Show the configured network subnet for containers

calicoctl get ippool -o wide

or for versions prior to v1.0.0:

calicoctl pool show

Show the workloads (IP addresses of containers and where they are located)

calicoctl get workloadEndpoint -o wide

and

calicoctl get hostEndpoint -o wide

or for versions prior to v1.0.0:

calicoctl endpoint show --detail
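The checks above differ between calicoctl releases before and after v1.0.0. The sketch below picks the matching set of commands for a given version. The function names are hypothetical, and the version string is supplied by the caller rather than parsed from calicoctl itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: choose and run the calicoctl status commands listed
# above, based on a calicoctl version string passed by the caller.

calico_status_cmds() {
  local version="${1#v}"          # "v3.8.0" -> "3.8.0"
  local major="${version%%.*}"    # "3.8.0"  -> "3"
  if (( major < 1 )); then
    # calicoctl releases prior to v1.0.0
    printf '%s\n' \
      "calicoctl status" \
      "calicoctl pool show" \
      "calicoctl endpoint show --detail"
  else
    printf '%s\n' \
      "calicoctl node status" \
      "calicoctl get ippool -o wide" \
      "calicoctl get workloadEndpoint -o wide" \
      "calicoctl get hostEndpoint -o wide"
  fi
}

run_calico_checks() {
  docker ps | grep calico         # is the calico-node container running?
  local cmd
  while IFS= read -r cmd; do
    echo "== ${cmd}"
    ${cmd}                        # unquoted on purpose: cmd holds a whole command line
  done < <(calico_status_cmds "$1")
}
```

For example, `run_calico_checks v3.8.0` runs the v1.0.0+ commands one after another, printing each command before its output.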

Optional : Define network backend

In some cases you may want to define the Calico network backend. Allowed values are 'bird', 'gobgp' or 'none'. The default value is 'bird'.

To redefine it, edit the inventory and add the group variable calico_network_backend:

calico_network_backend: none
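As an illustration of the allowed values, here is a hypothetical pre-flight check (not part of Kubespray) that reads calico_network_backend from a group_vars file and rejects anything other than 'bird', 'gobgp' or 'none'. It assumes a simple `key: value` line. A missing variable is treated as the default 'bird'.

```shell
#!/usr/bin/env bash
# Hypothetical check: print the effective calico_network_backend from a
# group_vars file, or fail if the value is not one of the allowed ones.

check_calico_backend() {
  local vars_file="$1" backend
  [ -r "$vars_file" ] || { echo "cannot read $vars_file" >&2; return 1; }
  # Keep the text after the key up to any YAML comment, then drop quotes/spaces.
  backend=$(sed -n 's/^calico_network_backend:[[:space:]]*\([^#]*\).*/\1/p' "$vars_file" \
            | tr -d "\"' " | head -n1)
  backend="${backend:-bird}"      # unset -> default backend
  case "$backend" in
    bird|gobgp|none) echo "$backend" ;;
    *) echo "invalid calico_network_backend: $backend" >&2; return 1 ;;
  esac
}
```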

Optional : Define the default pool CIDR

By default, kube_pods_subnet is used as the IP range CIDR for the default IP pool. In some cases you may want to add several pools without Kubernetes considering them external, which means they must lie within, or be equal to, the range defined in kube_pods_subnet. This starts with the default IP pool, whose IP range CIDR can be defined in group_vars (k8s-cluster/k8s-net-calico.yml):

calico_pool_cidr: 10.233.64.0/20
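The "within or equal to kube_pods_subnet" rule can be checked mechanically: a pool fits if its prefix length is at least that of kube_pods_subnet and both addresses agree under the kube_pods_subnet mask. This is a minimal IPv4-only bash sketch with hypothetical function names. The value 10.233.64.0/18 used for kube_pods_subnet below is an assumption; check your own group_vars.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: test whether one IPv4 CIDR lies inside another.

ip_to_int() {
  local IFS=.
  local -a o=($1)                 # "10.233.64.0" -> (10 233 64 0)
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# cidr_within INNER OUTER: exit status 0 if INNER is inside (or equal to) OUTER
cidr_within() {
  local inner_ip="${1%/*}" inner_len="${1#*/}"
  local outer_ip="${2%/*}" outer_len="${2#*/}"
  # A shorter prefix covers more addresses, so it cannot fit inside OUTER.
  (( inner_len >= outer_len )) || return 1
  # Network mask of OUTER, truncated to 32 bits.
  local mask=$(( (0xFFFFFFFF << (32 - outer_len)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$inner_ip") & mask) == ($(ip_to_int "$outer_ip") & mask) ))
}
```

With these definitions, `cidr_within 10.233.64.0/20 10.233.64.0/18` succeeds, while `cidr_within 10.233.128.0/20 10.233.64.0/18` fails because 10.233.128.0 falls outside 10.233.64.0-10.233.127.255.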

Optional : BGP Peering with border routers

In some cases you may want to route the pods subnet so that NAT is not needed on the nodes, for instance if your cluster is spread across different locations and you want your pods to talk to each other no matter where they are located. Set peer_with_router to enable peering with the datacenter's border router (default value: false). You'll also need to edit the inventory and add a local_as hostvar for each node:

node1 ansible_ssh_host=95.54.0.12 local_as=xxxxxx
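Because every node needs its own local_as when peer_with_router is enabled, a quick inventory check can catch a forgotten host. This is a hypothetical helper, not part of Kubespray. It assumes an INI inventory with one host per line in the [all] section.

```shell
#!/usr/bin/env bash
# Hypothetical check: print hosts from the [all] section of an INI inventory
# that have no local_as hostvar.

missing_local_as() {
  awk '
    /^\[/ { in_all = ($0 == "[all]"); next }        # track the current section
    in_all && NF && $0 !~ /^[#;]/ && $0 !~ /(^|[ \t])local_as=/ { print $1 }
  ' "$1"
}
```

An empty output means every host in [all] defines local_as.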

Optional : Defining BGP peers

Peers can be defined using the peers variable (see the examples in docs/calico_peer_example). To define global peers, define the peers variable in group_vars and set the "scope" attribute of each global peer to "global". To define peers on a per-node basis, the peers variable must be defined in hostvars. NB: Ansible's hash_behaviour is set to "replace" by default, so defining both global and per-node peers would leave only the per-node peers. If you need both global and per-node peers, the global peers have to be defined in the hostvars of each host (alongside the per-node peers).

Since Calico 3.4, Calico supports advertising Kubernetes service cluster IPs over BGP, just as it advertises pod IPs. This can be enabled by setting the following variable in group_vars (k8s-cluster/k8s-net-calico.yml):

calico_advertise_cluster_ips: true

Optional : Define global AS number

The optional parameter global_as_num defines the Calico global AS number (/calico/bgp/v1/global/as_num etcd key). It defaults to "64512".

Optional : BGP Peering with route reflectors

At large scale you may want to disable the full node-to-node mesh in order to optimize your BGP topology and improve the start times of the calico-node containers.

To do so, you can deploy BGP route reflectors and peer calico-node with them, as recommended here:

You need to edit your inventory and add:

a calico-rr group with nodes in it. calico-rr can be combined with kube-node and/or kube-master. The calico-rr group must also be a child group of the k8s-cluster group.
a cluster_id per route reflector node/group (see details here)
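The two group requirements listed above can be verified before running the playbook. This is a hypothetical inventory check (not part of Kubespray) for an INI inventory. It confirms that calico-rr has at least one host and that calico-rr is listed under [k8s-cluster:children]. It does not check cluster_id.

```shell
#!/usr/bin/env bash
# Hypothetical check: validate calico-rr group membership in an INI inventory.

check_calico_rr_inventory() {
  awk '
    /^[ \t]*([#;]|$)/ { next }                     # skip comments and blank lines
    /^\[/ { section = $0; next }                   # remember the current section
    section == "[calico-rr]" { rr_hosts++ }        # a host in the calico-rr group
    section == "[k8s-cluster:children]" && $1 == "calico-rr" { is_child = 1 }
    END {
      if (rr_hosts == 0) { print "calico-rr group has no hosts" > "/dev/stderr"; exit 1 }
      if (!is_child)     { print "calico-rr is not a child of k8s-cluster" > "/dev/stderr"; exit 1 }
    }
  ' "$1"
}
```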

Here's an example of Kubespray inventory with standalone route reflectors:

[all]
rr0 ansible_ssh_host=10.210.1.10 ip=10.210.1.10
rr1 ansible_ssh_host=10.210.1.11 ip=10.210.1.11
node2 ansible_ssh_host=10.210.1.12 ip=10.210.1.12
node3 ansible_ssh_host=10.210.1.13 ip=10.210.1.13
node4 ansible_ssh_host=10.210.1.14 ip=10.210.1.14
node5 ansible_ssh_host=10.210.1.15 ip=10.210.1.15

[kube-master]
node2
node3

[etcd]
node2
node3
node4

[kube-node]
node2
node3
node4
node5

[k8s-cluster:children]
kube-node
kube-master
calico-rr

[calico-rr]
rr0
rr1

[rack0]
rr0
rr1
node2
node3
node4
node5

[rack0:vars]
cluster_id="1.0.0.1"

The inventory above will deploy the following topology assuming that calico's global_as_num is set to 65400:

Optional : Define default endpoint to host action

By default, Calico blocks traffic from endpoints to the host itself by using an iptables DROP action. When using it in Kubernetes, the action has to be changed to RETURN (the default in Kubespray) or ACCEPT (see https://github.com/projectcalico/felix/issues/660 and https://github.com/projectcalico/calicoctl/issues/1389). Otherwise, all network packets from pods (with hostNetwork=False) to service endpoints (with hostNetwork=True) on the same node are dropped.

To redefine the default action, set the following variable in your inventory:

calico_endpoint_to_host_action: "ACCEPT"

Optional : Define address on which Felix will respond to health requests

Since Calico 3.2.0, the default health-check behavior has changed from listening on all interfaces to listening only on localhost.

To redefine the health host, set the following variable in your inventory:

calico_healthhost: "0.0.0.0"
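To confirm where Felix answers health requests, you can probe its liveness and readiness endpoints. This sketch assumes Felix's default health port 9099 and its /liveness and /readiness paths; adjust them if your deployment overrides them. The function name is hypothetical. With the default localhost binding, run it on the node itself. With calico_healthhost set to "0.0.0.0", you can also pass the node's IP from another machine.

```shell
#!/usr/bin/env bash
# Hypothetical helper: probe Felix health endpoints (default port 9099 assumed).

felix_health() {
  local host="${1:-localhost}" port="${2:-9099}" endpoint
  for endpoint in liveness readiness; do
    # -f: treat HTTP errors as failures; -sS: quiet, but still report errors.
    if curl -fsS -o /dev/null "http://${host}:${port}/${endpoint}"; then
      echo "${endpoint}: ok"
    else
      echo "${endpoint}: FAILED"
    fi
  done
}
```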

Cloud providers configuration

Please refer to the official documentation; for example, the GCE configuration requires a security rule for Calico IP-in-IP tunnels. Note that Calico is always configured with ipip: true if a cloud provider is defined.

Optional : Ignore kernel's RPF check setting

By default, the Felix agent (calico-node) aborts if the kernel RPF setting is not 'strict'. If you want Calico to ignore the kernel setting:

calico_node_ignorelooserpf: true
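Before setting calico_node_ignorelooserpf, it helps to see what the kernel is actually configured with. Per the Linux ip-sysctl documentation, rp_filter 0 means no source validation, 1 means strict mode (RFC 3704) and 2 means loose mode. The kernel uses the maximum of conf/all and the per-interface value. This sketch reads net.ipv4.conf.all.rp_filter by default, on the assumption that this is the setting that matters for your nodes. The function name is hypothetical, and the optional path argument exists only so the function can be tested.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the kernel reverse-path-filter mode.

rpf_mode() {
  local file="${1:-/proc/sys/net/ipv4/conf/all/rp_filter}" value=""
  read -r value < "$file" || [ -n "$value" ] || return 1
  case "$value" in
    0) echo "off" ;;      # no source validation
    1) echo "strict" ;;   # strict mode (RFC 3704)
    2) echo "loose" ;;    # loose mode
    *) echo "unknown ($value)"; return 1 ;;
  esac
}
```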

Note that in OpenStack you must allow IP-in-IP (ipip) traffic in your security groups, otherwise you will experience timeouts. To do this, add rules that allow it, for example:

neutron security-group-rule-create --protocol 4 --direction egress k8s-a0tp4t
neutron security-group-rule-create --protocol 4 --direction ingress k8s-a0tp4t
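IP protocol number 4 is IP-in-IP, which is why both rules use `--protocol 4`. Where the legacy neutron client is unavailable, the same two rules can be created with the unified openstack client, as in this sketch. The function name allow_ipip is hypothetical, and k8s-a0tp4t is just the example group name from above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: allow IP-in-IP (IP protocol 4) in both directions
# for a security group, using the unified openstack CLI.

allow_ipip() {
  local group="$1" direction
  for direction in ingress egress; do
    openstack security group rule create --protocol 4 "--${direction}" "$group" || return 1
  done
}
```

For example: `allow_ipip k8s-a0tp4t`.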
