# Adding/replacing a node
Modified from [comments in #3471](https://github.com/kubernetes-sigs/kubespray/issues/3471#issuecomment-530036084)
## Limitation: Removal of first kube-master and etcd-master
Currently you can't remove the first node in your kube-master and etcd-master list. If you still want to remove this node, you have to:
### 1) Change order of current masters
Modify the order of your master list by pushing your first entry to any other position, e.g. if you want to remove `node-1` from the following example:
```yaml
  children:
    kube-master:
      hosts:
        node-1:
        node-2:
        node-3:
    kube-node:
      hosts:
        node-1:
        node-2:
        node-3:
    etcd:
      hosts:
        node-1:
        node-2:
        node-3:
```
change your inventory to:
```yaml
  children:
    kube-master:
      hosts:
        node-2:
        node-3:
        node-1:
    kube-node:
      hosts:
        node-2:
        node-3:
        node-1:
    etcd:
      hosts:
        node-2:
        node-3:
        node-1:
```
### 2) Upgrade the cluster
Run `upgrade-cluster.yml` or `cluster.yml`. Now you are good to go on with the removal.
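A minimal sketch of such a run, assuming a hypothetical inventory at `inventory/mycluster/hosts.yaml`:

```sh
# re-run the upgrade so the reordered master list takes effect
ansible-playbook -i inventory/mycluster/hosts.yaml upgrade-cluster.yml -b
```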
## Adding/replacing a worker node
This should be the easiest.
### 1) Add new node to the inventory
### 2) Run `scale.yml`
You can use `--limit=NODE_NAME` to limit Kubespray to avoid disturbing other nodes in the cluster.
Before using `--limit` run playbook `facts.yml` without the limit to refresh facts cache for all nodes.
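A hedged sketch of both runs, assuming a hypothetical inventory at `inventory/mycluster/hosts.yaml` and a new node named `node-4`:

```sh
# refresh the facts cache for all nodes first
ansible-playbook -i inventory/mycluster/hosts.yaml facts.yml

# then deploy only the new worker
ansible-playbook -i inventory/mycluster/hosts.yaml scale.yml -b --limit=node-4
```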
### 3) Remove an old node with remove-node.yml
With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=NODE_NAME` to the playbook to limit the execution to the node being removed.
If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars: `-e "node=NODE_NAME reset_nodes=false"`.
Use this flag even when you remove other types of nodes, such as master or etcd nodes.
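A hedged sketch of both invocations (inventory path and node name are placeholders):

```sh
# remove node-4 from the cluster; the node is still listed in the inventory
ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -b -e node=node-4

# if node-4 is unreachable, skip the reset tasks on it
ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -b -e "node=node-4 reset_nodes=false"
```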
### 4) Remove the node from the inventory
That's it.
## Adding/replacing a master node
### 1) Run `cluster.yml`
Append the new host to the inventory and run `cluster.yml`. You can NOT use `scale.yml` for that.
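For instance, a minimal sketch with an assumed inventory path:

```sh
# deploy the new master with the full cluster playbook
ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -b
```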
### 2) Restart kube-system/nginx-proxy
On all hosts, restart the nginx-proxy pod. This pod is a local proxy for the apiserver. Kubespray will update its static config, but it needs to be restarted in order to reload.
```sh
# run on every host
docker ps | grep k8s_nginx-proxy_nginx-proxy | awk '{print $1}' | xargs docker restart
```
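The command above assumes a Docker-based runtime. On a containerd-based cluster, an equivalent (an assumption not taken from this document, using `crictl`) might look like:

```sh
# run on every host; kubelet recreates the static pod after its container stops
crictl ps | grep nginx-proxy | awk '{print $1}' | xargs crictl stop
```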
### 3) Remove old master nodes
With the old node still in the inventory, run `remove-node.yml`. You need to pass `-e node=NODE_NAME` to the playbook to limit the execution to the node being removed.
If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars.
## Adding an etcd node
You need to make sure there are always an odd number of etcd nodes in the cluster. Because of this, this is always a replace or scale-up operation: either add two new nodes or remove an old one.
### 1) Add the new node running cluster.yml
Update the inventory and run `cluster.yml` passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`.
If the node you want to add as an etcd node is already a worker or master node in your cluster, you have to remove it first using `remove-node.yml`.
Run `upgrade-cluster.yml`, also passing `--limit=etcd,kube-master -e ignore_assert_errors=yes`. This is necessary to update all etcd configuration in the cluster.
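A hedged sketch of both runs, with a placeholder inventory path:

```sh
# add the new etcd node
ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -b \
  --limit=etcd,kube-master -e ignore_assert_errors=yes

# propagate the new etcd member to all etcd configuration in the cluster
ansible-playbook -i inventory/mycluster/hosts.yaml upgrade-cluster.yml -b \
  --limit=etcd,kube-master -e ignore_assert_errors=yes
```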
At this point, you will have an even number of nodes.
Everything should still be working, and you should only have problems if the cluster decides to elect a new etcd leader before you remove a node.
Even so, running applications should continue to be available.
If you add multiple etcd nodes in one run, you might want to append `-e etcd_retries=10` to increase the number of retries between each etcd node join.
Otherwise the etcd cluster might still be processing the first join and fail on subsequent nodes. `etcd_retries=10` might work to join 3 new nodes.
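For example (placeholder inventory path, flags as above):

```sh
# join several new etcd nodes in one run, with extra retries between joins
ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml -b \
  --limit=etcd,kube-master -e ignore_assert_errors=yes -e etcd_retries=10
```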
## Removing an etcd node
### 1) Remove old etcd members from the cluster runtime
Acquire a shell prompt in one of the etcd containers and use etcdctl to remove the old member. Use an etcd master that will not be removed for that.
```sh
# list all members
etcdctl member list

# run remove for each member you want to pass to remove-node.yml in step 2
etcdctl member remove MEMBER_ID
# careful!!! if you remove a wrong member you will be in trouble

# wait until you do not get a 'Failed' output from
etcdctl member list

# note: these command lines are actually much longer if you are not inside an etcd container, since you need to pass all certificates to etcdctl
```
You can get into an etcd container by running `docker exec -it $(docker ps --filter "name=etcd" --format "{{.ID}}") sh` on one of the etcd masters.
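If you run `etcdctl` from the host instead, you must pass the endpoint and certificates explicitly. A hedged sketch, assuming the certificate layout Kubespray typically places under `/etc/ssl/etcd/ssl/` (paths and file names may differ in your cluster):

```sh
# hypothetical certificate paths; adjust to your cluster
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/admin-node-2.pem \
  --key=/etc/ssl/etcd/ssl/admin-node-2-key.pem \
  member list
```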
### 2) Remove an old etcd node
With the node still in the inventory, run `remove-node.yml` passing `-e node=NODE_NAME`, where `NODE_NAME` is the name of the node that should be removed.
If the node you want to remove is not online, you should add `reset_nodes=false` to your extra-vars.
### 3) Make sure only remaining nodes are in your inventory
Remove `NODE_NAME` from your inventory file.
### 4) Update Kubernetes and network configuration files with the valid list of etcd members
Run `cluster.yml` to regenerate the configuration files on all remaining nodes.
### 5) Shutdown the old instance
That's it.