Jim Cheung

Kubernetes

(notes from Getting Started with Kubernetes by Jonathan Baier, ISBN 9781784394035)

(note, this book is from 2015, should validate content with the latest documentation)

Control groups (cgroups) - allow the host to share and limit the resources each process or container can consume.

Namespaces - processes are limited to seeing only the process IDs in the same namespace.

Core Concepts

Master - brain of cluster, core API server

master includes scheduler, which works with API server to schedule workloads

the replication controller works with the API server to ensure the correct number of pod replicas is running at any given time

etcd acts as a distributed configuration store; kubernetes state is stored here

Node (formerly minions)

in each node, we have a kubelet and a kube-proxy

and some default pods, which run alongside our scheduled pods on every node

Pods

a pod is a logical group of one or more containers

pods keep related containers close in terms of network and hardware infrastructure
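
a minimal pod sketch with two containers (names and images are just examples, not from the book):

apiVersion: v1
kind: Pod
metadata:
  name: web-pod            # example name
  labels:
    app: web
spec:
  containers:
  - name: web-server
    image: nginx:1.25      # example image
    ports:
    - containerPort: 80
  - name: db
    image: mysql:8.0       # example image; both containers share the pod's IP
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: example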

Labels

labels give us another level of categorization

labels are just simple key-value pairs
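
for example, in a pod's metadata (keys and values are arbitrary; these are just examples):

metadata:
  labels:
    app: web
    tier: frontend
    environment: production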

Services

services give us a reliable endpoint to access pods running on the cluster seamlessly

users -> URI -> kube-proxy (virtual IP and port) -> pod

membership in the service load balancing pool is determined by the use of selectors and labels.

updates to service definitions are monitored and coordinated from the master and propagated to the kube-proxy daemons running on each node.
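
a sketch of a service definition, assuming pods labeled app: web as in the pod example above:

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web          # pods matching this label join the load balancing pool
  ports:
  - protocol: TCP
    port: 80          # port exposed on the service's virtual IP
    targetPort: 80    # port on the backing pods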

(note, there is a plan to containerize kube-proxy and kubelet by default in the future)

Replication controllers (RCs)

manage the number of pod replicas running on the cluster's nodes. they ensure that an instance of an image is being run with the specified number of copies
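
a minimal replication controller sketch asking for three copies of the example pod:

apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc
spec:
  replicas: 3             # desired number of pod copies
  selector:
    app: web              # pods counted toward the replica count
  template:               # pod template used when new replicas are needed
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web-server
        image: nginx:1.25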

Health checks

two layers of health checking

first, k8s attempts to connect to a particular endpoint (http or tcp) and gives a status of healthy on a successful connection

second, application-specific health checks can be performed using command line scripts

livenessProbe is the core health check element, we can specify httpGet, tcpSocket or exec from there

a livenessProbe failure only results in the container being restarted.

a readinessProbe failure will remove the pod from the pool of pods answering service endpoints.
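
a sketch of both probes inside a pod's container spec (paths, ports, and timings are assumptions):

containers:
- name: web-server
  image: nginx:1.25
  livenessProbe:          # failure restarts the container
    httpGet:              # could also be tcpSocket or exec
      path: /healthz
      port: 80
    initialDelaySeconds: 15
    timeoutSeconds: 1
  readinessProbe:         # failure removes the pod from service endpoints
    httpGet:
      path: /ready
      port: 80
    initialDelaySeconds: 5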

Life cycle hooks

hooks like postStart and preStop

hook calls are delivered at least once, so any logic in the action should gracefully handle multiple calls.

postStart runs before a pod enters its ready state; if the hook fails, the pod will be considered unhealthy.
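
a sketch of both hooks using exec handlers inside a container spec (commands are illustrative only):

containers:
- name: web-server
  image: nginx:1.25
  lifecycle:
    postStart:            # runs right after the container is created
      exec:
        command: ["/bin/sh", "-c", "echo started > /tmp/started"]
    preStop:              # runs before the container is terminated
      exec:
        command: ["/bin/sh", "-c", "nginx -s quit"]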

Application scheduling

default behavior is to spread container replicas across the nodes in the cluster.

it also provides the ability to add constraints based on the resources available on the node (CPU and memory).

when additional constraints are defined, kubernetes will check a node for available resources; if no nodes can be found that meet the criteria, then we will see a scheduling error in the logs.
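
a sketch of resource constraints on a container (the values here are arbitrary examples):

containers:
- name: web-server
  image: nginx:1.25
  resources:
    requests:             # the scheduler looks for a node with at least this much free
      cpu: 100m
      memory: 128Mi
    limits:               # the container is not allowed to exceed these
      cpu: 500m
      memory: 256Mi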

Networking

Networking in kubernetes requires that each pod have its own IP address.

kubernetes does not allow the use of NAT for container-to-container or container-to-node traffic.

compared to docker:

docker by default uses bridged networking mode - container has its own networking namespace and is then bridged via virtual interfaces to the host (node in case of k8s) network

in bridged mode, containers on two different hosts can use the same IP range because they are completely isolated from each other.

docker also supports host mode and container mode, but in all these scenarios, the container IP space is not available outside that machine, connecting containers across two machines then requires NAT and port mapping for communication.

libnetwork, Weave, Flannel, and Calico are projects that aim to solve this problem

(note, it's an IP per pod, not per container. you can have a web service pod with web server and database server containers sharing the same IP but communicating on different ports)

Kubernetes uses kube-proxy to determine the proper pod IP address and port serving each request.

behind the scenes, kube-proxy is actually using virtual IPs and iptables

kube-proxy runs on every node and monitors the API server on the master. any updates to services will trigger an update to iptables from kube-proxy.

a proxy port on the node is randomly selected during service creation.

it's also possible to always forward traffic from the same client IP to the same backend pod/container using the sessionAffinity element in the service definition.
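
for example, adding it to the service sketch above (ClientIP is the value that pins a client to one backend):

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP   # traffic from the same client IP goes to the same pod
  ports:
  - port: 80
    targetPort: 80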