Commit graph

1594 commits

Author SHA1 Message Date
Bogdan Dobrelya
16a4b4f336 Merge pull request #672 from kubernetes-incubator/fail_all_on_error
Fail all nodes on error
2016-12-02 17:08:10 +01:00
Bogdan Dobrelya
c1d3a5b7af Merge pull request #671 from bogdando/disable_logs_upload
Disable logs upload and verbose logging
2016-12-02 16:02:52 +01:00
Bogdan Dobrelya
ba0d496242 Disable logs upload and verbose logging
In order to speed up CI jobs, do not produce -v logs.
Also, disable collecting and uploading logs to GS until
the bucket creation issue is resolved.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-12-02 16:02:33 +01:00
Bogdan Dobrelya
220a375cb9 Merge pull request #656 from YorikSar/nginx-proxy-timeout
Set proxy_timeout to 10m in nginx.conf
2016-12-02 12:48:18 +01:00
ant31
e8e2c84ca4 Fail all nodes on error
2016-12-02 12:37:22 +01:00
Bogdan Dobrelya
3a73310c15 Merge pull request #663 from bogdando/reduce_matrix
Reduce CI test matrix
2016-11-30 10:43:43 +01:00
Bogdan Dobrelya
aafdbebd48 Reduce CI test matrix
Reduce the test cases from 15 to 9, bearing in mind that:
* Disable weave/coreos gate until its deployment is fixed
* If debian/centos7 fails with net plugin X, ubuntu-xenial/rhel-7 will
  likely fail as well
* Canal also covers the flannel plugin deployment, but keep at least one
  flannel plugin deployment until it is superseded and removed.
* Keep at least one of each OS/plugin family to be tested in the separate
  nodes layout
* Keep at least one of each OS family to be tested against each of the
  plugin types in default nodes layout
* Rebalance GCE regions for instances, replacing asia with eu/us as they
  are the longest running jobs.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-29 18:53:43 +01:00
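The coverage rules listed in aafdbebd48 ("keep at least one of each OS family ... against each of the plugin types") could be checked mechanically. The following is only a minimal Python sketch under stated assumptions: the function names and the example matrix are hypothetical and do not reflect the project's real CI matrix.

```python
from itertools import product


def uncovered(matrix, os_families, plugins):
    """Return (OS families, plugins) that no job in the matrix exercises."""
    seen_os = {os_family for os_family, _ in matrix}
    seen_plugins = {plugin for _, plugin in matrix}
    return (sorted(set(os_families) - seen_os),
            sorted(set(plugins) - seen_plugins))


def missing_pairs(matrix, os_families, plugins):
    """Return the (OS family, plugin) combinations absent from the matrix."""
    return sorted(set(product(os_families, plugins)) - set(matrix))


# Hypothetical example data, NOT the real test matrix.
OS_FAMILIES = ["debian", "redhat"]
PLUGINS = ["calico", "canal"]
MATRIX = [("debian", "calico"), ("redhat", "canal"), ("debian", "canal")]
```

With this illustrative data every OS family and every plugin appears at least once, while `missing_pairs` reports the one combination that a reduced matrix left out.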
Antoine Legrand
6630348164 Merge pull request #657 from smelchior/master
add azure support for kargo
2016-11-29 12:20:49 +01:00
Sebastian Melchior
bc465fb6c0 add azure to readme
2016-11-29 12:16:30 +01:00
Bogdan Dobrelya
c644c22583 Merge pull request #658 from bogdando/gce_images
Switch to standard debian/centos/rhel for CI
2016-11-29 11:35:50 +01:00
Bogdan Dobrelya
d4305d1a64 Switch to standard debian/centos/rhel for CI
Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-29 10:25:07 +01:00
Sebastian Melchior
254e02c69e add basic azure support for kargo
2016-11-29 10:20:28 +01:00
Yuriy Taraday
d92124561d Set proxy_timeout to 10m in nginx.conf
Fixes #655.

This is a temporary solution for long-polling idle connections to
apiserver. It will make Nginx not cut them for the duration of the
expected timeout. It will also make Nginx extremely slow in realizing
that there is some issue with connectivity to apiserver, so it might
not be a perfect permanent solution.
2016-11-28 20:27:47 +03:00
Antoine Legrand
f75e2c5119 Merge pull request #529 from bogdando/netcheck
Add a k8s app for advanced e2e netcheck for DNS
2016-11-28 15:26:30 +01:00
Bogdan Dobrelya
d5b21b34c2 Add advanced net check for DNS K8s app
* Add an option to deploy a K8s app to test e2e network connectivity
  and cluster DNS resolution via Kubedns for nethost/simple pods
  (defaults to false).
* Parametrize existing k8s apps templates with kube_namespace and
  kube_config_dir instead of hardcoded values.
* For CoreOS, ensure nameservers from the inventory are listed
  first, so that hostnet pods can connect via short names or FQDN
  and hostnet agents pass as well when netchecker is deployed.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-28 13:23:25 +01:00
Bogdan Dobrelya
779d676414 Merge pull request #652 from kubernetes-incubator/debug_mode
Tune dnsmasq/kubedns limits, replicas, logging
2016-11-25 16:57:15 +01:00
Bogdan Dobrelya
3d0912cf35 Merge pull request #640 from bodepd/terraform_aws_decouple_k8s_cluster_etcd_roles
Decouple etcd/k8s-cluster roles in ec2 terraform
2016-11-25 15:11:51 +01:00
Bogdan Dobrelya
8f194d3bd0 Merge pull request #650 from adidenko/remove-calico-ctl-tag-override
Update default calico/ctl image tag
2016-11-25 14:55:59 +01:00
Bogdan Dobrelya
c34c49d4d9 Tune dnsmasq/kubedns limits, replicas, logging
* Add dns_replicas, dns_memory/cpu_limit/requests vars for
  dns related apps.
* When kube_log_level=4, log dnsmasq queries as well.
* Add log level control for skydns (part of kubedns app).
* Add limits/requests vars for dnsmasq (part of kubedns app) and
  dnsmasq daemon set.
* Drop string defaults for kube_log_level as it is int and
  is defined in the global vars as well.
* Add docs

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-25 12:49:17 +01:00
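The "log dnsmasq queries when kube_log_level=4" rule from c34c49d4d9 can be illustrated as follows. This is a minimal Python sketch, not the project's template: the function name and the base argument are assumptions, while `--log-queries` and `--keep-in-foreground` are standard dnsmasq options.

```python
def dnsmasq_args(kube_log_level, base_args=("--keep-in-foreground",)):
    """Build a dnsmasq argument list (hypothetical helper).

    kube_log_level is treated as an int, matching the commit's note that
    string defaults were dropped. Note that the string "4" would not
    compare equal to the int 4 here.
    """
    args = list(base_args)
    if kube_log_level == 4:
        # Query logging is only switched on at the level named in the commit.
        args.append("--log-queries")
    return args
```

Keeping the variable an int everywhere avoids the silent mismatch where `"4" == 4` evaluates to false and query logging never turns on.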
Aleksandr Didenko
0e49c5f240 Update calico/ctl image tag
We no longer need to use v0.22.0 for calicoctl since Kargo has
support for new calicoctl CLI format.

Also fixing condition logic for calico pool task.
2016-11-25 11:23:27 +01:00
Bogdan Dobrelya
09a1a1a963 Merge pull request #651 from bogdando/fix_docker_install
Fix download dnsmasq image dependency on docker
2016-11-24 18:44:12 +01:00
Bogdan Dobrelya
91cc141662 Merge pull request #648 from artem-panchenko/fix_calicoctl_node_run
Fix Calico jinja template (systemd)
2016-11-24 18:33:34 +01:00
Bogdan Dobrelya
417a931f78 Fix download dnsmasq image dependency on docker
When download_run_once with download_localhost is used, docker is
expected to be running on the delegate localhost. That may not be
the case for a non-localhost delegate, which is the kube-master
otherwise. Then the dnsmasq role, had it been invoked early before
deployment starts, would fail because of the missing docker dependency.

* Fix that dependency on docker and do not pre-download the dnsmasq
  image for the dnsmasq role if download_localhost is disabled.
* Remove become: false for docker CLI invocation, because letting
  users access the docker CLI without sudo is not the common pattern.
* Fix opt bin path hack for localhost delegate to ignore errors when
  it would otherwise fail with "sudo password required".
* Describe the download_run_once with download_localhost use case in
  docs as well.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-24 18:31:26 +01:00
Smaine Kahlouch
5bc9b9e349 Merge pull request #649 from bogdando/coreos_resolvconf
Ensure /etc/resolv.conf content for CoreOS
2016-11-24 10:42:38 +01:00
Bogdan Dobrelya
bbd57d5f5e Ensure /etc/resolv.conf content for CoreOS
Use cloud-init config to replace /etc/resolv.conf with the
content for kubelet to properly configure hostnet pods.

Do not use systemd-resolved yet, see
https://coreos.com/os/docs/latest/configuring-dns.html
"Only nss-aware applications can take advantage of the
systemd-resolved cache. Notably, this means that statically
linked Go programs and programs running within Docker/rkt
will use /etc/resolv.conf only, and will not use the
systemd-resolve cache."

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-23 16:51:49 +01:00
Bogdan Dobrelya
0e6435686f Merge pull request #646 from kubernetes-incubator/fix_nginx_download
Fix nginx container download for download_run_once mode
2016-11-23 11:46:53 +01:00
Artem Panchenko
0437f9584d Fix Calico jinja template (systemd)
2016-11-23 11:43:53 +02:00
Bogdan Dobrelya
a4d5a14791 Fix nginx container download for download_run_once mode
Without this patch, the "Download containers" task may be skipped
when running on the delegate node due to a wrong "when" condition.
It then also fails to upload the nginx image to the nodes.

Fix the nginx download dependency so the image can always be pushed
to nodes when download_run_once is enabled.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-23 10:37:08 +01:00
Bogdan Dobrelya
bfa15cd8ea Merge pull request #642 from kubernetes-incubator/k8s_imgpull
Allow pre-downloaded images to be used effectively
2016-11-22 18:09:38 +01:00
Bogdan Dobrelya
57b514aa66 Merge pull request #645 from adidenko/fix-ansible_ssh_user
Set defaults for ansible_ssh_user
2016-11-22 18:07:16 +01:00
Aleksandr Didenko
f0b1884104 Set defaults for ansible_ssh_user
When setting permissions for the containers download/upload dir we're
using `ansible_ssh_user`. But if the playbook is executed without the
user being explicitly set, `ansible_ssh_user` may be undefined.
In such situations dir ownership will default to `ansible_user_id`.

Closes: #644
2016-11-22 18:00:56 +01:00
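The fallback described in f0b1884104 amounts to "use `ansible_ssh_user` if defined, otherwise `ansible_user_id`". A minimal Python sketch of that rule, assuming host variables are available as a plain dict (the function name is hypothetical):

```python
def download_dir_owner(hostvars):
    """Pick the owner of the download/upload dir (hypothetical helper).

    Prefers ansible_ssh_user when it is defined and falls back to
    ansible_user_id otherwise, as described in the commit message.
    """
    if "ansible_ssh_user" in hostvars:
        return hostvars["ansible_ssh_user"]
    return hostvars["ansible_user_id"]
```

For example, `download_dir_owner({"ansible_user_id": "ci"})` falls back to the connecting user's id because no SSH user was set explicitly.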
Bogdan Dobrelya
1bd3d3a080 Allow pre-downloaded images to be used effectively
According to http://kubernetes.io/docs/user-guide/images/ :
By default, the kubelet will try to pull each image from the
specified registry. However, if the imagePullPolicy property
of the container is set to IfNotPresent or Never, then a local
image is used (preferentially or exclusively, respectively).

Use the IfNotPresent value so that images prepared by the download
role dependencies are actually used by kubelet, avoiding pull
errors that leave apps blocked in the PullBackOff/Error state
even when the images exist on the local host.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-22 16:16:04 +01:00
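The pull-policy behavior quoted in 1bd3d3a080 can be summarized as a small decision function. This is a simplified Python model for illustration, not kubelet code; the function name is an assumption, while `Always`, `IfNotPresent` and `Never` are the Kubernetes `imagePullPolicy` values.

```python
def should_pull(image_pull_policy, image_present_locally):
    """Decide whether a registry pull is attempted (simplified model)."""
    if image_pull_policy == "Always":
        return True
    if image_pull_policy == "IfNotPresent":
        # A local image is used preferentially; pull only when it is absent.
        return not image_present_locally
    if image_pull_policy == "Never":
        # A local image is used exclusively; no pull is ever attempted.
        return False
    raise ValueError("unknown imagePullPolicy: %r" % (image_pull_policy,))
```

With `IfNotPresent`, an image that the download role has already placed on the node is used directly, so an unreachable registry no longer blocks the pod.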
Antoine Legrand
bca996bf0b Merge pull request #638 from pskrzyns/fix_setting_loadbalancer_apiserver_localhost
Fix conditional when setting loadbalancer_apiserver_localhost
2016-11-22 15:15:38 +01:00
Bogdan Dobrelya
539a47b0fa Merge pull request #621 from xenolog/calico_network_backend
Add ability to define network backend for Calico.
2016-11-22 14:55:47 +01:00
Antoine Legrand
0016ba1759 Merge pull request #635 from kubernetes-incubator/download_images
Download images as dependencies of roles
2016-11-22 14:53:12 +01:00
Antoine Legrand
5c66f1bb8b Merge pull request #637 from bogdando/wait_pods
Increase wait for pods post-install test
2016-11-22 12:25:47 +01:00
Bogdan Dobrelya
793cedc522 Download images as dependencies of roles
Pre-download all required container images as roles' deps.
Drop unused flannel-server-helper images pre-download.
Improve the pods creation post-install test with a pre-downloaded busybox.
Improve logs collection script with kubectl describe, fix sudo/etcd/weave
commands.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-22 11:13:57 +01:00
Dan Bode
ece09765b9 Decouple etcd/k8s-cluster roles in ec2 terraform
Currently, the terraform script in contrib
adds etcd role as a child of k8s-cluster in
its generated inventory file.

This is problematic when the etcd role is
deployed on separate nodes from the k8s master
and nodes. In this case, this leads to failures
of the k8s node since the PKI certs required for
that role have not been propagated.
2016-11-21 10:44:13 -08:00
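The inventory change in ece09765b9 can be pictured with a small generator. This is a hedged Python sketch, not the terraform script itself: the function, its parameters and the `kube-node` group name are assumptions for illustration; `etcd`, `k8s-cluster` and `kube-master` are the names used in this log.

```python
def render_inventory(masters, nodes, etcd_hosts, etcd_in_cluster=False):
    """Render an INI-style Ansible inventory (hypothetical helper).

    With etcd_in_cluster=False the etcd group is kept out of the
    k8s-cluster children, which is the decoupled layout.
    """
    children = ["kube-node", "kube-master"]
    if etcd_in_cluster:
        children.append("etcd")  # the coupled layout the commit moves away from
    sections = [
        ("kube-master", masters),
        ("kube-node", nodes),
        ("etcd", etcd_hosts),
        ("k8s-cluster:children", children),
    ]
    return "\n\n".join(
        "[%s]\n%s" % (name, "\n".join(members)) for name, members in sections
    ) + "\n"
```

In the decoupled output the `[etcd]` group still exists, but hosts in it no longer inherit everything applied to `k8s-cluster`.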
Paweł Skrzyński
67b61c5c42 Fix conditional when setting loadbalancer_apiserver_localhost
2016-11-21 19:36:05 +01:00
Bogdan Dobrelya
326d98353f Increase wait for pods post-install test
The test deployment/rc/pods creation time
is near 2m on slow CI instances with 1 CPU/1.7G RAM.
Increase the wait time so the post-install test fails less often.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-21 18:50:05 +01:00
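The kind of wait that 326d98353f lengthens is a poll-until-ready loop with a deadline. A generic Python sketch under stated assumptions: the function name, parameters and defaults are hypothetical and are not taken from the project's test playbook.

```python
import time


def wait_for(check, timeout_s, interval_s=5.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s has elapsed.

    Returns True on success and False on timeout. sleep and clock are
    injectable so the loop can be exercised without real delays.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

If pod creation takes close to 2 minutes on slow instances, the timeout passed to such a loop has to sit comfortably above that, which is the adjustment the commit describes.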
Antoine Legrand
1204eaccb1 Merge pull request #636 from kubernetes-incubator/apiserver_liveness
Add missing liveness probe for apiserver static pod
2016-11-21 18:27:20 +01:00
Bogdan Dobrelya
523c9d77df Add missing liveness probe for apiserver static pod
Fix unreliable waiting for the apiserver to become ready.
Remove logfile mount to align with the rest of static pods
and because containers shall write logs to stdout only.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-21 13:15:51 +01:00
Bogdan Dobrelya
b44479d911 Merge pull request #629 from kubernetes-incubator/fix-download-once
Fix download once
2016-11-21 10:55:54 +01:00
Bogdan Dobrelya
d2f9c11299 Merge pull request #633 from bodepd/etcd_fix
Ensure that etcd health checks always pass
2016-11-21 10:29:35 +01:00
Bogdan Dobrelya
47d7277a7d Merge pull request #631 from kubernetes-incubator/allow_failures
Allow failures for coreos/weave
2016-11-21 10:21:57 +01:00
Bogdan Dobrelya
167de7d76b Merge pull request #630 from suside/node_port
Add service-node-port-range parameter for kube-apiserver
2016-11-21 10:17:34 +01:00
Dan Bode
aad73ea90e Ensure that etcd health checks always pass
In the etcd handler, the reload etcd action
was called after ansible waits for etcd to be
up. This means that the health checks, which are
called immediately afterwards, fail (resulting in
the etcd role always failing and never finishing).

This patch changes the order to move the 'wait for etcd
up' resource after the 'reload etcd' resource, ensuring that
the service is up before the health check is called.
2016-11-18 14:15:00 -08:00
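The ordering problem fixed in aad73ea90e can be reproduced with a toy model. This Python sketch is a deliberate simplification (a single readiness flag and a hypothetical function name); it only shows why the wait has to come after the reload.

```python
def ready_after(handlers):
    """Return whether etcd would pass an immediate health check.

    Toy model: a reload makes the service briefly unavailable, and the
    wait step blocks until it answers again.
    """
    ready = True
    for action in handlers:
        if action == "reload etcd":
            ready = False
        elif action == "wait for etcd up":
            ready = True
        else:
            raise ValueError("unknown handler: %r" % (action,))
    return ready
```

Running the handlers as `["wait for etcd up", "reload etcd"]` leaves the service not ready when the health check fires, while the swapped order leaves it ready.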
Spencer Smith
106dcc3898 updated all instances of restart always to restart on-failure with a max of 5 times
2016-11-18 14:33:22 -05:00
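The difference between "restart always" and "restart on-failure with a max of 5" in 106dcc3898 can be emulated in a few lines. This is a Python sketch of the policy semantics only, in the style of Docker's `on-failure:5` restart policy; which mechanism the commit actually touched is not stated in this log, and the function name is hypothetical.

```python
def run_with_restarts(start, max_restarts=5):
    """Emulate an on-failure restart policy capped at max_restarts.

    start() runs the service once and returns its exit code. The service
    is restarted only after a non-zero exit, and at most max_restarts
    times. Returns (last_exit_code, restarts_used).
    """
    restarts = 0
    while True:
        exit_code = start()
        if exit_code == 0 or restarts >= max_restarts:
            return exit_code, restarts
        restarts += 1
```

Unlike an unconditional restart, a service that keeps failing is left stopped after the cap is reached, and a clean exit is never restarted.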
Bogdan Dobrelya
6d900c4a31 Allow failures for coreos/weave
Until https://github.com/kubernetes-incubator/kargo/issues/613
is fixed.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
2016-11-18 17:41:50 +01:00
Bogdan Dobrelya
cf7c6ae859 Add download localhost and enable for CI
* Add download_localhost for the download_run_once mode, which
  uses the ansible host (a travis node for CI case) to store and
  distribute containers across cluster nodes in inventory.
  Defaults to false.
* Rework download_run_once logic to fix idempotency of uploading
  containers.
* For Travis CI, enable docker images caching and run Travis
  workers with sudo enabled as a dependency.
* For Travis CI, deploy with download_localhost and download_run_once
  enabled to shorten dev path drastically.
* Add compression for saved container images. Defaults to 'best'.

Signed-off-by: Bogdan Dobrelya <bdobrelia@mirantis.com>
Co-authored-by: Aleksandr Didenko <adidenko@mirantis.com>
2016-11-18 16:00:07 +01:00
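How download_run_once and download_localhost combine in cf7c6ae859 can be sketched as a host-selection rule. This is a hedged Python illustration: the function and the `first_master` parameter are hypothetical, and the "kube-master otherwise" branch follows the description given in 417a931f78 in this log.

```python
def download_delegate(download_run_once, download_localhost,
                      inventory_hostname, first_master):
    """Choose the host that downloads container images (hypothetical helper)."""
    if not download_run_once:
        # Without run-once mode every node fetches its own images.
        return inventory_hostname
    # In run-once mode a single host downloads and distributes the images.
    return "localhost" if download_localhost else first_master
```

For a CI run with both options enabled, the ansible host itself becomes the delegate, so images cached there can be reused across cluster nodes.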