Kubernetes

Rationale

Kubernetes is the system we use for hosting, deploying, and managing our applications. It provides infrastructure features such as RBAC authorization, distributed persistent storage, resource quotas, DNS record management, load balancer management, autoscaling, blue-green deployments, and rollbacks, among many others. It allows us to serve and scale our applications in an easy, secure, and automated way.

The main reasons why we chose it over other alternatives are:

It is capable of deploying complex applications, including related servers, DNS records, and load balancers, in an automated way, allowing us to focus more on application development and less on the infrastructure supporting it.
It can be fully managed using Terraform.
It supports Blue-Green deployments, allowing us to deploy applications many times a day without service interruptions.
It supports Rollbacks, allowing us to revert applications to previous versions in case the need arises.
It supports Horizontal autoscaling, allowing us to easily adapt our applications to the loads they're getting.
It supports Service accounts, RBAC Authorization, and IRSA, allowing us to give applications permissions to external resources using a least privilege approach.
It supports resource quotas, allowing us to easily distribute containers among physical machines using a granular CPU/memory per container approach.
It has its own package manager, which makes deploying services very easy.
It has its own local reproducibility tool for simulating clusters in local environments.
It is Open source.
It is not platform-bound.
Azure AKS, AWS EKS, and GCP GKE support it.
It can be IaaS when implemented under a cloud provider.
Migrating it from one cloud provider to another is, although not a simple task, at least possible.
It is widely used by the community.
It has many open source extensions.

Alternatives

We considered the following alternatives but did not choose them, for the reasons given:

AWS ECS: It is a serverless service for running containers. It is expensive as only one container exists within an entire physical machine. It does not support extensions. It is platform-bound. It is not Open source.
AWS Fargate: It is a serverless service for running containers without administering the infrastructure they run upon. It is expensive as only one container exists within an entire physical machine. It does not support extensions. It is platform-bound. It is not Open source.
AWS EC2: It is a service for cloud computing. AWS EKS actually uses it for setting up cluster workers. It does not support extensions. It is platform-bound. It is not Open source.
HashiCorp Nomad: Currently, no cloud provider supports it, which means we would have to manage both managers and workers ourselves. It takes a simpler approach to orchestrating applications, at the cost of flexibility.
Docker Swarm: Currently, no cloud provider supports it, which means we would have to manage both managers and workers ourselves. It takes a simpler approach to orchestrating applications, at the cost of flexibility.

Usage

We use Kubernetes for:

Hosting our Platform.
Automatically deploying ephemeral environments on CI/CD workflows.
Automatically deploying DNS records for applications.
Automatically deploying load balancers for applications.
Automatically scaling worker nodes based on application load.
Running application performance monitoring using New Relic.

We do not use Kubernetes for:

Rollbacks: We should version production artifacts in order to be able to automatically return to a previous working version of our applications.
GitLab Runner: It was slow, unreliable and added too much overhead to workers. We decided to go back to Autoscaling Runner.
Chaos Engineering: In order to harden ourselves against errors, we should create a little chaos in our infrastructure.

Guidelines

General

Any changes to the cluster infrastructure and configuration must be done via Merge Requests.
Any changes related to the Platform (deployments, autoscaling, ingress...) for both development and production must be done via Merge Requests.
To learn how to test and apply infrastructure via Terraform, visit the Terraform Guidelines.

Components

Our cluster implements:

AWS EKS Terraform module for declaring the cluster as code using Terraform.
AWS Load Balancer Controller for automatically initializing AWS load balancers when declaring ingress resources.
AWS Kubernetes Autoscaler for automatically scaling the cluster size based on resource allocation.
ExternalDNS for automatically setting DNS records when declaring ingress resources.
Kubernetes Metrics Server for automatically scaling deployments like production Platform based on application load (CPU, Memory, custom metrics).
New Relic for monitoring both production Platform and general infrastructure.

Debugging

Connect to cluster

In order to connect to the Kubernetes Cluster, you must:

Log in as an Integrates developer using this guide.
Install kubectl and aws-cli with nix-env -i awscli kubectl.
Select the cluster by running aws eks update-kubeconfig --name common-k8s --region us-east-1.
Run kubectl get node.

Your output should be similar to this:

kubectl get node
NAME                            STATUS   ROLES    AGE   VERSION
ip-192-168-5-112.ec2.internal   Ready    <none>   58d   v1.17.9-eks-4c6976
ip-192-168-5-144.ec2.internal   Ready    <none>   39d   v1.17.11-eks-cfdc40
ip-192-168-5-170.ec2.internal   Ready    <none>   20d   v1.17.11-eks-cfdc40
ip-192-168-5-35.ec2.internal    Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-5-51.ec2.internal    Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-6-109.ec2.internal   Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-6-127.ec2.internal   Ready    <none>   18d   v1.17.11-eks-cfdc40
ip-192-168-6-135.ec2.internal   Ready    <none>   31d   v1.17.11-eks-cfdc40
ip-192-168-6-151.ec2.internal   Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-6-221.ec2.internal   Ready    <none>   13d   v1.17.11-eks-cfdc40
ip-192-168-7-151.ec2.internal   Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-7-161.ec2.internal   Ready    <none>   33d   v1.17.11-eks-cfdc40
ip-192-168-7-214.ec2.internal   Ready    <none>   61d   v1.17.9-eks-4c6976
ip-192-168-7-48.ec2.internal    Ready    <none>   30d   v1.17.11-eks-cfdc40
ip-192-168-7-54.ec2.internal    Ready    <none>   39d   v1.17.11-eks-cfdc40
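
As a quick sanity check on a listing like the one above, the following is a minimal sketch that counts nodes whose STATUS column is anything other than Ready. The function name count_not_ready is hypothetical (it is not part of kubectl), and the sketch assumes the default column layout shown above, with STATUS as the second column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get node` output on stdin and
# prints how many nodes report a STATUS other than exactly "Ready".
# Assumes the default columns: NAME, STATUS, ROLES, AGE, VERSION.
count_not_ready() {
  # NR > 1 skips the header row; n + 0 prints 0 when nothing matched.
  awk 'NR > 1 && $2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage against the live cluster (requires a configured kubeconfig):
#   kubectl get node | count_not_ready
```

Note that cordoned nodes report Ready,SchedulingDisabled, so this sketch counts them as not ready. Whether that is desirable depends on what you are checking for.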

Common commands

Most commands have the following syntax: kubectl <action> <resource> -n <namespace>

Common actions are: get, describe, logs, exec and edit.
Common resources are: pod, node, deployment, ingress, hpa.
Common namespaces are: development, production and kube-system. Additionally, the -A flag executes <action> for all namespaces.
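
To make the syntax above concrete, here is a small sketch that assembles a command from an action, a resource, and a namespace. The function name kubectl_cmd and the convention of passing "all" to select the -A flag are assumptions made for illustration. It only prints the command, so it can be tried without cluster access.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the kubectl command for an action, a
# resource, and a namespace, following the syntax described above.
# Passing "all" as the namespace selects every namespace via -A.
kubectl_cmd() {
  local action="$1" resource="$2" namespace="$3"
  if [ "$namespace" = "all" ]; then
    echo "kubectl $action $resource -A"
  else
    echo "kubectl $action $resource -n $namespace"
  fi
}

kubectl_cmd get pod development   # prints: kubectl get pod -n development
kubectl_cmd get hpa all           # prints: kubectl get hpa -A
```

Actions such as logs and exec take a pod name rather than a resource type, so they do not fit this simple pattern.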

Some basic examples are:

Command	Example	Description
`kubectl get pod -A`	`N/A`	Get all pods across all namespaces
`kubectl get node`	`N/A`	Get all cluster nodes (nodes are not namespaced, so no -A is needed)
`kubectl get deployment -A`	`N/A`	Get all cluster deployments
`kubectl get hpa -A`	`N/A`	Get all autoscaling policies
`kubectl get namespace`	`N/A`	Get all cluster namespaces

Some more complex examples are:

Command	Example	Description
`kubectl describe pod -n <namespace> <pod>`	`kubectl describe pod -n development app-dsalazaratfluid-7c485cf565-w9gwg`	Describe pod configurations
`kubectl logs -n <namespace> <pod> -c <container>`	`kubectl logs -n development app-dsalazaratfluid-7c485cf565-w9gwg -c app`	Get container logs from a pod
`kubectl exec -it -n <namespace> <pod> -c <container> -- <command>`	`kubectl exec -it -n development app-dsalazaratfluid-7c485cf565-w9gwg -c app -- bash`	Access a container within pod
`kubectl edit deployment -n <namespace> <deployment>`	`kubectl edit deployment -n development integrates-dsalazaratfluid`	Edit a specific deployment
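
Pod names such as the one in the examples above end in generated suffixes, so the full name usually has to be looked up before running describe, logs, or exec. The following is a minimal sketch of that lookup. The function name pods_with_prefix is hypothetical, and it assumes the default `kubectl get pod` output, where the pod name is the first column and the first row is a header.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pod` output on stdin and
# prints the names of pods whose name starts with the given prefix.
# Assumes the default layout: header row first, NAME as first column.
pods_with_prefix() {
  local prefix="$1"
  # index(...) == 1 means the name begins with the prefix.
  awk -v p="$prefix" 'NR > 1 && index($1, p) == 1 { print $1 }'
}

# Usage against the live cluster (requires a configured kubeconfig):
#   kubectl get pod -n development | pods_with_prefix app-dsalazaratfluid
```

The printed names can then be passed to the describe, logs, and exec commands from the table above.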
