Kubernetes
Rationale
Kubernetes is the system we use for hosting, deploying and managing our applications. It provides infrastructure capabilities such as RBAC authorization, distributed persistent storage, resource quotas, DNS record management, load balancer management, autoscaling, Blue-Green deployments and rollbacks, among many others. It allows us to serve and scale our applications in an easy, secure and automated way.
The main reasons why we chose it over other alternatives are:
- It is capable of deploying complex applications, including related servers, DNS records and load balancers, in an automated way, allowing us to focus more on application development and less on the infrastructure supporting it.
- It can be fully managed using Terraform.
- It supports Blue-Green deployments, allowing us to deploy applications many times a day without service interruptions.
- It supports Rollbacks, allowing us to revert applications to previous versions in case the need arises.
- It supports Horizontal autoscaling, allowing us to easily adapt our applications to the loads they're getting.
- It supports Service accounts, RBAC Authorization, and IRSA, allowing us to grant applications permissions to external resources following a least-privilege approach.
- It supports resource quotas, allowing us to easily distribute containers among physical machines using a granular CPU/memory-per-container approach.
- It has its own package manager (Helm), which makes deploying services very easy.
- It has its own local reproducibility tool (Minikube) for simulating clusters in local environments.
- It is Open source.
- It is not platform-bound.
- Azure AKS, AWS EKS and GCP GKE all support it.
- It can be consumed as a managed service when implemented under a cloud provider.
- Migrating it from one cloud provider to another is, although not a simple task, at least possible.
- It is widely used by the community.
- It has many open source extensions.
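To make the resource quota and Horizontal autoscaling points above concrete, here is a minimal sketch of a Deployment with per-container CPU/memory requests and limits, plus a HorizontalPodAutoscaler that scales it on CPU load. All names, images, namespaces and thresholds are hypothetical, not taken from our cluster:

```yaml
# Hypothetical example: names, image and thresholds are illustrative only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app
          image: example/app:latest
          resources:
            requests:        # used by the scheduler to place pods on nodes
              cpu: 250m
              memory: 256Mi
            limits:          # hard cap per container
              cpu: 500m
              memory: 512Mi
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The scheduler uses the requests to bin-pack containers onto worker nodes, while the HorizontalPodAutoscaler adds or removes replicas as average CPU utilization crosses the target.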
Alternatives
The following alternatives were considered but not chosen for the following reasons:
- AWS ECS: It is a managed service for running containers. It is expensive, as only one container exists within an entire physical machine. It does not support extensions. It is platform-bound. It is not Open source.
- AWS Fargate: It is a serverless service for running containers without administering the infrastructure they run upon. It is expensive, as only one container exists within an entire physical machine. It does not support extensions. It is platform-bound. It is not Open source.
- AWS EC2: It is a cloud computing service. AWS EKS actually uses it for setting up cluster workers. It does not support extensions. It is platform-bound. It is not Open source.
- HashiCorp Nomad: Currently, no cloud provider offers it as a managed service, which means we would have to manage both managers and workers ourselves. It takes a simpler approach to orchestrating applications, at the cost of flexibility.
- Docker Swarm: Currently, no cloud provider offers it as a managed service, which means we would have to manage both managers and workers ourselves. It takes a simpler approach to orchestrating applications, at the cost of flexibility.
Usage
We use Kubernetes for:
- Hosting our Platform.
- Automatically deploying ephemeral environments on CI/CD workflows.
- Automatically deploying DNS records for applications.
- Automatically deploying load balancers for applications.
- Automatically scaling worker nodes based on application load.
- Running application performance monitoring using New Relic.
We do not use Kubernetes for:
- Rollbacks: We should version production artifacts in order to be able to automatically return to a previous working version of our applications.
- GitLab Runner: It was slow, unreliable and added too much overhead to workers. We decided to go back to Autoscaling Runner.
- Chaos Engineering: In order to harden ourselves against errors, we should create a little chaos in our infrastructure.
Guidelines
General
- Any changes to the cluster infrastructure and configuration must be done via Merge Requests.
- Any changes related to the Platform (deployments, autoscaling, ingress...) for both development and production must be done via Merge Requests.
- To learn how to test and apply infrastructure via Terraform, visit the Terraform Guidelines.
Components
Our cluster implements:
- AWS EKS Terraform module for declaring the cluster as code using Terraform.
- AWS Load Balancer Controller for automatically initializing AWS load balancers when declaring ingress resources.
- AWS Kubernetes Autoscaler for automatically scaling the cluster size based on resource allocation.
- ExternalDNS for automatically setting DNS records when declaring ingress resources.
- Kubernetes Metrics Server for automatically scaling deployments like production Platform based on application load (CPU, Memory, custom metrics).
- New Relic for monitoring both production Platform and general infrastructure.
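To illustrate how the AWS Load Balancer Controller and ExternalDNS components above work together, here is a hedged sketch of an Ingress whose annotations trigger both a load balancer and a DNS record. The host, service name and annotation values are hypothetical, not taken from our cluster:

```yaml
# Hypothetical example: host, service and annotation values are illustrative only.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: production
  annotations:
    # Read by the AWS Load Balancer Controller to provision an ALB
    alb.ingress.kubernetes.io/scheme: internet-facing
    # Read by ExternalDNS to create the matching DNS record
    external-dns.alpha.kubernetes.io/hostname: app.example.com
spec:
  ingressClassName: alb
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```

Declaring the Ingress is enough: both controllers watch Ingress resources and reconcile the external load balancer and DNS record automatically, with no manual AWS console work.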
Debugging
Connect to cluster
In order to connect to the Kubernetes cluster, you must:
- Log in as an Integrates developer using this guide.
- Install kubectl and aws-cli with `nix-env -i awscli kubectl`.
- Select the cluster by running `aws eks update-kubeconfig --name common-k8s --region us-east-1`.
- Run `kubectl get node`.
Your output should be similar to this:
kubectl get node
NAME STATUS ROLES AGE VERSION
ip-192-168-5-112.ec2.internal Ready <none> 58d v1.17.9-eks-4c6976
ip-192-168-5-144.ec2.internal Ready <none> 39d v1.17.11-eks-cfdc40
ip-192-168-5-170.ec2.internal Ready <none> 20d v1.17.11-eks-cfdc40
ip-192-168-5-35.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-5-51.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-6-109.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-6-127.ec2.internal Ready <none> 18d v1.17.11-eks-cfdc40
ip-192-168-6-135.ec2.internal Ready <none> 31d v1.17.11-eks-cfdc40
ip-192-168-6-151.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-6-221.ec2.internal Ready <none> 13d v1.17.11-eks-cfdc40
ip-192-168-7-151.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-7-161.ec2.internal Ready <none> 33d v1.17.11-eks-cfdc40
ip-192-168-7-214.ec2.internal Ready <none> 61d v1.17.9-eks-4c6976
ip-192-168-7-48.ec2.internal Ready <none> 30d v1.17.11-eks-cfdc40
ip-192-168-7-54.ec2.internal Ready <none> 39d v1.17.11-eks-cfdc40
Common commands
Most commands have the following syntax: `kubectl <action> <resource> -n <namespace>`
- Common actions are: `get`, `describe`, `logs`, `exec` and `edit`.
- Common resources are: `pod`, `node`, `deployment`, `ingress` and `hpa`.
- Common namespaces are: `development`, `production` and `kube-system`.
Additionally, the `-A` flag executes `<action>` for all namespaces.
Some basic examples are:
Command | Example | Description |
---|---|---|
kubectl get pod -A | N/A | Get all running pods |
kubectl get node -A | N/A | Get all cluster nodes |
kubectl get deployment -A | N/A | Get all cluster deployments |
kubectl get hpa -A | N/A | Get all autoscaling policies |
kubectl get namespace | N/A | Get all cluster namespaces |
Some more complex examples are:
Command | Example | Description |
---|---|---|
kubectl describe pod -n <namespace> <pod> | kubectl describe pod -n development app-dsalazaratfluid-7c485cf565-w9gwg | Describe pod configurations |
kubectl logs -n <namespace> <pod> -c <container> | kubectl logs -n development app-dsalazaratfluid-7c485cf565-w9gwg -c app | Get container logs from a pod |
kubectl exec -it -n <namespace> <pod> -c <container> -- <command> | kubectl exec -it -n development app-dsalazaratfluid-7c485cf565-w9gwg -c app -- bash | Access a container within pod |
kubectl edit deployment -n <namespace> <deployment> | kubectl edit deployment -n development integrates-dsalazaratfluid | Edit a specific deployment |