(note, this book is from 2015, should validate content with the latest documentation)
Control groups (cgroups) - allow the host to share and limit the resources each process or container can consume.
Namespaces - processes are limited to seeing only the process IDs in the same namespace.
Master - brain of the cluster, runs the core API server
the master includes the scheduler, which works with the API server to schedule workloads
the replication controller works with the API server to ensure the correct number of pod replicas is running at any given time
etcd is a distributed configuration store; kubernetes state is stored here
Nodes (formerly minions)
in each node, we have:
- kubelet interacts with the API server to update state and start new workloads that have been invoked by the scheduler
- kube-proxy provides basic load balancing and directs traffic to specific services
and some default pods:
- health checks
default pods will run alongside our scheduled pods on every node
Pods
keep related containers close in terms of network and hardware infrastructure
a pod is a logical group of containers
a pod may run one or more containers inside
labels give us another level of categorization
labels are just simple key-value pairs
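as a sketch, a pod definition carrying a couple of labels might look like this (names and images are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:            # simple key-value pairs for categorization
    app: web
    tier: frontend
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
```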
Services
using a reliable endpoint, users can access pods running on the cluster seamlessly
users -> URI -> Kube-proxy -> Pod (Virtual IP and Port)
membership in the service load balancing pool is determined by the use of selectors and labels.
updates to service definitions are monitored and coordinated from the master and propagated to the kube-proxy daemons running on each node.
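a minimal service sketch; the selector below pulls any pod labeled app: web into the load balancing pool (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web         # pods matching this label join the pool
  ports:
  - port: 80         # port exposed by the service
    targetPort: 80   # port on the backing pods
```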
(note, there is a plan to containerize kube-proxy and kubelet by default in the future)
Replication controllers (RCs)
manage the number of pod replicas running in the cluster. they ensure that the specified number of copies of an image is running at any given time
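a sketch of a replication controller asking for three replicas of a pod template (names and images are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: web-rc
spec:
  replicas: 3        # desired number of pod copies
  selector:
    app: web         # pods counted toward the replica total
  template:          # pod template used to create new replicas
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx
```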
Health checks
two layers of health checking:
first, k8s attempts to connect to a particular endpoint (HTTP or TCP) and gives a status of healthy on a successful connection
second, application-specific health checks can be performed using command-line scripts
livenessProbe is the core health check element; within it we can specify an exec, httpGet, or tcpSocket check
on a failed health check, a livenessProbe only restarts the container.
a failed readinessProbe removes the pod from the pool of endpoints answering service requests.
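a sketch of both probe types on a container spec, assuming an illustrative /healthz endpoint and check script:

```yaml
spec:
  containers:
  - name: web
    image: nginx
    livenessProbe:           # fail -> container is restarted
      httpGet:               # first layer: connect to an endpoint
        path: /healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
    readinessProbe:          # fail -> pod removed from service endpoints
      exec:                  # second layer: application-specific script
        command:
        - /usr/bin/check-ready.sh
```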
Life cycle hooks
hook calls are delivered at least once, so any logic in the action should gracefully handle multiple calls.
postStart runs before a pod enters its ready state; if the hook fails, the pod will be considered unhealthy.
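a sketch of a postStart hook; touch is used here because it stays safe if the hook fires more than once (paths are illustrative):

```yaml
spec:
  containers:
  - name: web
    image: nginx
    lifecycle:
      postStart:
        exec:
          # idempotent action: at-least-once delivery is harmless
          command: ["/bin/sh", "-c", "touch /tmp/started"]
```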
Scheduling
the default behavior is to spread container replicas across the nodes in the cluster.
it also provides the ability to add constraints based on resources available to the node. (cpu and memory)
when additional constraints are defined, kubernetes will check a node for available resources; if no nodes can be found that meet the criteria, then we will see a scheduling error in the logs.
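a sketch of cpu and memory constraints on a container; if no node has this much capacity free, the pod fails to schedule (values are illustrative):

```yaml
spec:
  containers:
  - name: web
    image: nginx
    resources:
      limits:
        cpu: 500m        # half a core
        memory: 256Mi
```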
Networking in kubernetes requires that each pod have its own IP address.
kubernetes does not allow the use of NAT for container-to-container or container-to-node traffic.
compared to docker:
docker by default uses bridged networking mode - container has its own networking namespace and is then bridged via virtual interfaces to the host (node in case of k8s) network
in bridged mode, two containers can use the same IP range because they are completely isolated.
docker also supports host mode and container mode, but in all these scenarios, the container IP space is not available outside that machine, connecting containers across two machines then requires NAT and port mapping for communication.
libnetwork, weave, flannel and calico are projects to solve this problem
(note, it's IP per pod not per container. you can have a webservice pod with web server and database server containers sharing the same IP but communicating with ports)
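a sketch of that pattern: both containers share the pod's IP, so the web server can reach the database on localhost (images are illustrative; a real mysql container needs more configuration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webservice
spec:
  containers:
  - name: web
    image: nginx       # reaches the db at localhost:3306
  - name: db
    image: mysql
    ports:
    - containerPort: 3306
```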
kubernetes uses kube-proxy to determine the proper pod IP address and port serving each request.
behind the scenes, kube-proxy is actually using virtual IPs and iptables.
kube-proxy runs on every node and monitors the API from the master; any updates to services will trigger an iptables update from kube-proxy.
port is randomly selected during service creation.
it's also possible to always forward traffic from the same client IP to the same backend pod/container using the sessionAffinity element in the service definition.
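a sketch of sessionAffinity in a service definition (names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  sessionAffinity: ClientIP   # same client IP -> same backend pod
  ports:
  - port: 80
```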