sample app
app.js:
const http = require('http');
const os = require('os');
console.log("Kubia server starting...");
var handler = function(request, response) {
console.log("Received request from " + request.connection.remoteAddress);
response.writeHead(200);
response.end("You've hit " + os.hostname() + "\n");
};
var www = http.createServer(handler);
www.listen(8080);
Dockerfile:
FROM node:7
ADD app.js /app.js
ENTRYPOINT ["node", "app.js"]
introduction
$ kubectl cluster-info
bash completion
$ source <(kubectl completion bash)
create alias
alias k=kubectl
$ source <(kubectl completion bash | sed s/kubectl/k/g)
The simplest way to deploy your app is to use the kubectl run
command, which will create all the necessary components without having to deal with JSON or YAML.
$ kubectl run kubia --image=luksa/kubia --port=8080 --generator=run/v1
To make the pod accessible from the outside, you’ll expose it through a Service object of type LoadBalancer. Kubernetes then provisions an external load balancer, and you can connect to the pod through the load balancer’s public IP.
$ kubectl expose rc kubia --type=LoadBalancer --name kubia-http
$ kubectl get svc
We’re using the abbreviation rc instead of replicationcontroller (po for pods, svc for services).
scale up
$ kubectl scale rc kubia --replicas=3
$ kubectl get rc
$ kubectl get pods
$ kubectl get pods -o wide
$ kubectl describe pod kubia-hczji
dashboard
$ kubectl cluster-info | grep dashboard
$ gcloud container clusters describe kubia | grep -E "(username|password):"
$ minikube dashboard
pods
All pods in a Kubernetes cluster reside in a single flat, shared network-address space, which means every pod can access every other pod at that pod’s IP address. No NAT (Network Address Translation) gateways exist between them.
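For example (a sketch; the pod names and IP are placeholders for whatever your cluster shows), you can look up one pod’s IP and curl it directly from another pod:
$ kubectl get po -o wide                                      # the IP column shows each pod's IP
$ kubectl exec kubia-abcde -- curl -s http://10.1.0.2:8080    # prints "You've hit <hostname of the target pod>"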
Deciding when to use multiple containers in a pod
- Do they need to be run together or can they run on different hosts?
- Do they represent a single whole or are they independent components?
- Must they be scaled together or individually?
pod definition
Three important sections are found in almost all Kubernetes resources:
- metadata includes the name, namespace, labels, and other information about the pod.
- spec contains the actual description of the pod’s contents, such as the pod’s containers, volumes, and other data.
- status contains the current information about the running pod, such as what condition the pod is in, the description and status of each container, and the pod’s internal IP and other basic info.
more info:
$ kubectl explain pods
$ kubectl explain pod.spec
A basic pod manifest, kubia-manual.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: kubia-manual
spec:
  containers:
  - image: luksa/kubia
    name: kubia
    ports:
    - containerPort: 8080
      protocol: TCP
$ kubectl create -f kubia-manual.yaml
$ kubectl get po kubia-manual -o yaml
$ kubectl get pods
$ docker logs <container id>
$ kubectl logs kubia-manual
$ kubectl logs kubia-manual -c kubia
$ kubectl port-forward kubia-manual 8888:8080
$ curl localhost:8888
labels
metadata:
  name: kubia-manual-v2
  labels:
    creation_method: manual
    env: prod
$ kubectl create -f kubia-manual-with-labels.yaml
$ kubectl get po --show-labels
$ kubectl get po -L creation_method,env
add new label:
$ kubectl label po kubia-manual creation_method=manual
update label:
$ kubectl label po kubia-manual-v2 env=debug --overwrite
listing pods using a label selector
$ kubectl get po -l creation_method=manual
$ kubectl get po -l env
$ kubectl get po -l '!env'
label nodes
$ kubectl label node gke-kubia-85f6-node-0rrx gpu=true
$ kubectl get nodes -l gpu=true
schedule pods to specific node
apiVersion: v1
kind: Pod
metadata:
  name: kubia-gpu
spec:
  nodeSelector:
    gpu: "true"
  containers:
  - image: luksa/kubia
    name: kubia
namespaces
$ kubectl get ns
$ kubectl get po --namespace kube-system
create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: custom-namespace
$ kubectl create -f custom-namespace.yaml
# or
$ kubectl create namespace custom-namespace
create pod under namespace:
$ kubectl create -f kubia-manual.yaml -n custom-namespace
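To avoid adding -n to every command, you can also switch the namespace your current kubectl context points to:
$ kubectl config set-context $(kubectl config current-context) --namespace custom-namespace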
deleting pod
$ kubectl delete po kubia-gpu
# delete by label
$ kubectl delete po -l creation_method=manual
$ kubectl delete po -l rel=canary
# delete by namespace
$ kubectl delete ns custom-namespace
# delete all under current namespace
$ kubectl delete po --all
Replication and other controllers
liveness probes
Kubernetes can probe a container using one of three mechanisms:
- An HTTP GET probe performs an HTTP GET request on the container’s IP address, on a port and path you specify.
- A TCP Socket probe tries to open a TCP connection to the specified port of the container.
- An Exec probe executes an arbitrary command inside the container and checks the command’s exit status code.
http get probe:
apiVersion: v1
kind: Pod
metadata:
  name: kubia-liveness
spec:
  containers:
  - image: luksa/kubia-unhealthy
    name: kubia
    livenessProbe:
      httpGet:
        path: /
        port: 8080
$ kubectl get po kubia-liveness
$ kubectl logs mypod --previous
$ kubectl describe po kubia-liveness
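The other two probe mechanisms are configured the same way; minimal sketches (the port and file path are assumptions, not taken from the example above):
livenessProbe:
  tcpSocket:                 # healthy if a TCP connection to this port can be opened
    port: 8080

livenessProbe:
  exec:                      # healthy if the command exits with status code 0
    command:
    - cat
    - /tmp/healthy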
delay probe
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 15
If you don’t set the initial delay, the prober will start probing the container as soon as it starts.
Even if you set the failure threshold to 1, Kubernetes will retry the probe several times before considering it a single failed attempt. Therefore, implementing your own retry loop in the probe is wasted effort.
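Beyond initialDelaySeconds, a probe’s timing and thresholds can be tuned with a few more fields; a sketch (the values shown are arbitrary; the Kubernetes defaults are periodSeconds=10, timeoutSeconds=1, failureThreshold=3):
livenessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 15    # wait before the first probe
  periodSeconds: 10          # how often to probe
  timeoutSeconds: 1          # how long a probe may take before it counts as failed
  failureThreshold: 3        # consecutive failures before the container is restarted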
If you’re running a Java app in your container, be sure to use an HTTP GET
liveness probe instead of an Exec
probe, where you spin up a whole new JVM to get the liveness information. The same goes for any JVM-based or similar applications, whose start-up procedure requires considerable computational resources.
ReplicationController
A ReplicationController has three essential parts:
- A label selector, which determines what pods are in the ReplicationController’s scope
- A replica count, which specifies the desired number of pods that should be running
- A pod template, which is used when creating new pod replicas
apiVersion: v1
kind: ReplicationController
metadata:
  name: kubia
spec:
  replicas: 3
  selector:
    app: kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: luksa/kubia
        ports:
        - containerPort: 8080
$ kubectl create -f kubia-rc.yaml
$ kubectl get pods
$ kubectl get rc
$ kubectl describe rc kubia
Moving pods in and out of the scope of a ReplicationController
If you change a pod’s labels so they no longer match a ReplicationController’s label selector, the pod becomes like any other manually created pod. It’s no longer managed by anything.
$ kubectl label pod kubia-dmdck type=special
$ kubectl label pod kubia-dmdck app=foo --overwrite
Changing the pod template
$ kubectl edit rc kubia
You can tell kubectl
to use a text editor of your choice by setting the KUBE_EDITOR
environment variable.
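For example (the editor path is just an illustration):
$ export KUBE_EDITOR="/usr/bin/nano"
$ kubectl edit rc kubia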
Horizontally scaling pods
$ kubectl scale rc kubia --replicas=10
# or
$ kubectl edit rc kubia
spec:
  replicas: 3
  selector:
    app: kubia
delete rc but keep its pods running:
$ kubectl delete rc kubia --cascade=false
ReplicaSet
ReplicaSet is a new generation of ReplicationController and replaces it completely.
A ReplicaSet behaves exactly like a ReplicationController, but it has more expressive pod selectors.
Whereas a ReplicationController’s label selector only allows matching pods that include a certain label, a ReplicaSet’s selector also allows matching pods that lack a certain label or pods that include a certain label key, regardless of its value.
apiVersion: apps/v1beta2
kind: ReplicaSet
metadata:
  name: kubia
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubia
  template:
    metadata:
      labels:
        app: kubia
    spec:
      containers:
      - name: kubia
        image: luksa/kubia
ReplicaSets aren't part of the v1
API, but belong to the apps API group and version v1beta2
.
$ kubectl get rs
$ kubectl describe rs
matchExpressions:
selector:
  matchExpressions:
  - key: app
    operator: In
    values:
    - kubia
Each expression must contain a key, an operator, and possibly (depending on the operator) a list of values.
You’ll see four valid operators:
- In—Label’s value must match one of the specified values.
- NotIn—Label’s value must not match any of the specified values.
- Exists—Pod must include a label with the specified key (the value isn’t important). When using this operator, you shouldn’t specify the values field.
- DoesNotExist—Pod must not include a label with the specified key. The values property must not be specified.
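As a sketch, a selector combining In and Exists could look like this (the env key is only an illustration):
selector:
  matchExpressions:
  - key: app
    operator: In
    values:
    - kubia
  - key: env                 # hypothetical second expression
    operator: Exists         # matches pods that have an env label, whatever its value
All listed expressions must be true for the selector to match a pod, and matchLabels and matchExpressions can be combined in the same selector.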
Running exactly one pod on each node with DaemonSets
A DaemonSet makes sure it creates as many pods as there are nodes and deploys each one on its own node.
If a node goes down, the DaemonSet doesn’t cause the pod to be created elsewhere. But when a new node is added to the cluster, the DaemonSet immediately deploys a new pod instance to it.
apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: ssd-monitor
spec:
  selector:
    matchLabels:
      app: ssd-monitor
  template:
    metadata:
      labels:
        app: ssd-monitor
    spec:
      nodeSelector:
        disk: ssd
      containers:
      - name: main
        image: luksa/ssd-monitor
$ kubectl create -f ssd-monitor-daemonset.yaml
$ kubectl get ds
$ kubectl label node minikube disk=ssd
Running pods that perform a single completable task
Job resource
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-job
spec:
  template:
    metadata:
      labels:
        app: batch-job
    spec:
      restartPolicy: OnFailure
      containers:
      - name: main
        image: luksa/batch-job
Job pods can’t use the default restart policy (Always), because they’re not meant to run indefinitely. Therefore, you need to explicitly set the restart policy to either OnFailure or Never.
$ kubectl get jobs
# after job is done
$ kubectl get po -a
$ kubectl logs batch-job-28qf4
Running multiple pod instances in a Job
run five pods sequentially
apiVersion: batch/v1
kind: Job
metadata:
  name: multi-completion-batch-job
spec:
  completions: 5
  template:
    <template is the same as above>
running job pods in parallel:
apiVersion: batch/v1
kind: Job
metadata:
  name: multi-completion-batch-job
spec:
  completions: 5
  parallelism: 2
  template:
    <same as above>
Scaling a Job
You can even change a Job’s parallelism property while the Job is running.
$ kubectl scale job multi-completion-batch-job --replicas 3
limiting the time allowed for a Job pod to complete
A pod’s time can be limited by setting the activeDeadlineSeconds property in the pod spec. You can configure how many times a Job can be retried before it is marked as failed by specifying the spec.backoffLimit field in the Job manifest. If you don’t explicitly specify it, it defaults to 6.
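A sketch showing both fields in a single Job manifest (the name and values are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: time-limited-batch-job       # hypothetical name
spec:
  backoffLimit: 6                    # retries before the Job is marked as failed (6 is the default)
  template:
    metadata:
      labels:
        app: time-limited-batch-job
    spec:
      activeDeadlineSeconds: 300     # fail the pod if it runs longer than five minutes
      restartPolicy: OnFailure
      containers:
      - name: main
        image: luksa/batch-job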
scheduling Jobs to run periodically or once in the future
Creating a CronJob:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: batch-job-every-fifteen-minutes
spec:
  schedule: "0,15,30,45 * * * *"
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: periodic-batch-job
        spec:
          restartPolicy: OnFailure
          containers:
          - name: main
            image: luksa/batch-job
It may happen that the Job or pod is created and run relatively late.
You may have a hard requirement for the job not to start too far past its scheduled time. In that case, you can set a deadline in the startingDeadlineSeconds field.
apiVersion: batch/v1beta1
kind: CronJob
spec:
  schedule: "0,15,30,45 * * * *"
  startingDeadlineSeconds: 15
  ...
Services
apiVersion: v1
kind: Service
metadata:
  name: kubia
spec:
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: kubia
$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 10.111.240.1 <none> 443/TCP 30d
kubia 10.111.249.153 <none> 80/TCP 6m
$ kubectl exec kubia-7nog1 -- curl -s http://10.111.249.153
If you want all requests made by a certain client to be redirected to the same pod every time, you can set the service’s sessionAffinity property to ClientIP (instead of None, which is the default).
apiVersion: v1
kind: Service
spec:
  sessionAffinity: ClientIP
  ...
Exposing multiple ports in the same service
apiVersion: v1
kind: Service
metadata:
  name: kubia
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443
  selector:
    app: kubia
When creating a service with multiple ports, you must specify a name for each port.
Discovering services through environment variables
When a pod is started, Kubernetes initializes a set of environment variables pointing to each service that exists at that moment.
$ kubectl exec kubia-3inly env
...
KUBIA_SERVICE_HOST=10.111.249.153
KUBIA_SERVICE_PORT=80
...
Discovering services through DNS
Whether a pod uses the internal DNS server or not is configurable through the dnsPolicy
property in each pod’s spec.
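A sketch of that property in a pod manifest (the pod name is hypothetical; ClusterFirst is the default and points the pod at the cluster’s internal DNS server):
apiVersion: v1
kind: Pod
metadata:
  name: dns-example            # hypothetical pod
spec:
  dnsPolicy: ClusterFirst      # the default; other valid values include Default and ClusterFirstWithHostNet
  containers:
  - name: main
    image: luksa/kubia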
root@kubia-3inly:/# curl http://kubia.default.svc.cluster.local
You've hit kubia-5asi2
root@kubia-3inly:/# curl http://kubia.default
You've hit kubia-3inly
root@kubia-3inly:/# curl http://kubia
You've hit kubia-8awf3
You can omit the namespace and the svc.cluster.local
suffix because of how the DNS resolver inside each pod’s container is configured:
root@kubia-3inly:/# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ...
Endpoints resource
$ kubectl get endpoints kubia
Manually configuring service endpoints
apiVersion: v1
kind: Service
metadata:
  name: external-service
spec:
  ports:
  - port: 80
Creating an Endpoints resource for a service without a selector
apiVersion: v1
kind: Endpoints
metadata:
  name: external-service
subsets:
- addresses:
  - ip: 11.11.11.11
  - ip: 22.22.22.22
  ports:
  - port: 80
Creating an ExternalName service
apiVersion: v1
kind: Service
metadata:
  name: external-service
spec:
  type: ExternalName
  externalName: someapi.somecompany.com
  ports:
  - port: 80
After the service is created, pods can connect to the external service through the external-service.default.svc.cluster.local domain name (or even external-service) instead of using the service’s actual FQDN.
ExternalName services are implemented solely at the DNS level—a simple CNAME DNS record is created for the service. Therefore, clients connecting to the service will connect to the external service directly, bypassing the service proxy completely. For this reason, these types of services don’t even get a cluster IP.
Exposing services to external clients
You have a few ways to make a service accessible externally:
- Setting the service type to NodePort—each cluster node opens a port on the node itself (hence the name) and redirects traffic received on that port to the underlying service.
- Setting the service type to LoadBalancer, an extension of the NodePort type—this makes the service accessible through a dedicated load balancer, provisioned from the cloud infrastructure Kubernetes is running on. The load balancer redirects traffic to the node port across all the nodes. Clients connect to the service through the load balancer’s IP.
- Creating an Ingress resource, a radically different mechanism for exposing multiple services through a single IP address—it operates at the HTTP level (network layer 7) and can thus offer more features than layer 4 services can.
NodePort service
apiVersion: v1
kind: Service
metadata:
  name: kubia-nodeport
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30123
  selector:
    app: kubia
$ kubectl get svc kubia-nodeport
Using JSONPath to get the IPs of all your nodes
$ kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="ExternalIP")].address}'
Exposing a service through an external load balancer
If Kubernetes is running in an environment that doesn’t support LoadBalancer
services, the load balancer will not be provisioned, but the service will still behave like a NodePort
service.
apiVersion: v1
kind: Service
metadata:
  name: kubia-loadbalancer
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: kubia
$ kubectl get svc kubia-loadbalancer
The browser uses keep-alive connections and sends all its requests through a single connection, whereas curl opens a new connection every time.
Services work at the connection level, so when a connection to a service is first opened, a random pod is selected and then all network packets belonging to that connection are all sent to that single pod.
Even if session affinity is set to None, users will always hit the same pod (until the connection is closed).
preventing unnecessary network hops
configuring the service to redirect external traffic only to pods running on the node that received the connection
spec:
  externalTrafficPolicy: Local
  ...
If a service definition includes this setting and an external connection is opened through the service’s node port, the service proxy will choose a locally running pod.
If no local pods exist, the connection will hang. You therefore need to ensure the load balancer forwards connections only to nodes that have at least one such pod.
Using this setting also has other drawbacks. Normally, connections are spread evenly across all the pods, but when using it, that’s no longer the case. On the other hand, the client’s IP is preserved, because there’s no additional hop between the node receiving the connection and the node hosting the target pod (SNAT isn’t performed).
Exposing services externally through an Ingress resource
One important reason for using an Ingress is that each LoadBalancer service requires its own load balancer with its own public IP address, whereas an Ingress only requires one, even when providing access to dozens of services.
To make Ingress resources work, an Ingress controller needs to be running in the cluster.
Enabling the Ingress add-on in Minikube
$ minikube addons list
$ minikube addons enable ingress
$ kubectl get po --all-namespaces
creating an Ingress resource
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubia
spec:
  rules:
  - host: kubia.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kubia-nodeport
          servicePort: 80
Ingress controllers on cloud providers (in GKE, for example) require the Ingress to point to a NodePort
service. But that’s not a requirement of Kubernetes itself.
$ kubectl get ingresses
Exposing multiple services through the same Ingress
You can map multiple paths on the same host to different services
...
- host: kubia.example.com
  http:
    paths:
    - path: /kubia
      backend:
        serviceName: kubia
        servicePort: 80
    - path: /foo
      backend:
        serviceName: bar
        servicePort: 80
Similarly, you can use an Ingress to map to different services based on the host in the HTTP request instead of (only) the path
spec:
  rules:
  - host: foo.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: foo
          servicePort: 80
  - host: bar.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: bar
          servicePort: 80
Configuring Ingress to handle TLS traffic
When a client opens a TLS connection to an Ingress controller, the controller terminates the TLS connection.
The application running in the pod doesn’t need to support TLS.
create the private key and certificate:
$ openssl genrsa -out tls.key 2048
$ openssl req -new -x509 -key tls.key -out tls.cert -days 360 -subj /CN=kubia.example.com
$ kubectl create secret tls tls-secret --cert=tls.cert --key=tls.key
Instead of signing the certificate yourself, you can have it signed by creating a CertificateSigningRequest (CSR) resource and having it approved.
$ kubectl certificate approve <name of the CSR>
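A sketch of what such a CSR resource might look like (all field values are placeholders; spec.request holds the base64-encoded PEM certificate signing request):
apiVersion: certificates.k8s.io/v1beta1
kind: CertificateSigningRequest
metadata:
  name: kubia-csr                    # hypothetical name
spec:
  request: <base64-encoded PEM CSR>  # e.g. the output of: cat kubia.csr | base64 | tr -d '\n'
  usages:
  - digital signature
  - key encipherment
  - server auth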
The private key and the certificate are now stored in the Secret called tls-secret.
ingress manifest:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: kubia
spec:
  tls:
  - hosts:
    - kubia.example.com
    secretName: tls-secret
  rules:
  - host: kubia.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: kubia-nodeport
          servicePort: 80
Instead of deleting the Ingress and re-creating it from the new file, you can invoke kubectl apply -f kubia-ingress-tls.yaml
, which updates the Ingress resource with what’s specified in the file.
$ curl -k -v https://kubia.example.com/kubia
Although Ingresses currently support only L7 (HTTP/HTTPS) load balancing, support for L4 load balancing is also planned.
Signaling when a pod is ready to accept connections
readiness probe is invoked periodically and determines whether the specific pod should receive client requests or not.
Types of readiness probes
Like liveness probes, three types of readiness probes exist:
- An Exec probe, where a process is executed; the container’s status is determined by the process’s exit status code.
- An HTTP GET probe, which sends an HTTP GET request to the container; the HTTP status code of the response determines whether the container is ready or not.
- A TCP Socket probe, which opens a TCP connection to a specified port of the container; if the connection is established, the container is considered ready.
When a container is started, Kubernetes can be configured to wait for a configurable amount of time to pass before performing the first readiness check. After that, it invokes the probe periodically and acts based on the result of the readiness probe. If a pod reports that it’s not ready, it’s removed from the service.
Unlike liveness probes, if a container fails the readiness check, it won’t be killed or restarted. This is an important distinction between liveness and readiness probes.
Liveness probes keep pods healthy by killing off unhealthy containers and replacing them with new, healthy ones,
whereas readiness probes make sure that only pods that are ready to serve requests receive them.
Adding a readiness probe to the pod template
$ kubectl edit rc kubia
apiVersion: v1
kind: ReplicationController
...
spec:
  ...
  template:
    ...
    spec:
      containers:
      - name: kubia
        image: luksa/kubia
        readinessProbe:
          exec:
            command:
            - ls
            - /var/ready
...
$ kubectl get po
$ kubectl exec kubia-2r1qb -- touch /var/ready
The readiness probe is checked periodically—every 10 seconds by default.
Understanding what real-world readiness probes should do
Manually removing pods from services should be performed by either deleting the pod or changing the pod’s labels instead of manually flipping a switch in the probe.
If you want to add or remove a pod from a service manually, add enabled=true
as a label to your pod and to the label selector of your service. Remove the label when you want to remove the pod from the service.
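A sketch of that approach (the enabled label and the pod name are only illustrative conventions, not anything Kubernetes prescribes):
spec:
  selector:                  # service selector; only pods carrying both labels receive traffic
    app: kubia
    enabled: "true"

$ kubectl label pod kubia-abcde enabled=true    # add the pod to the service
$ kubectl label pod kubia-abcde enabled-        # the trailing dash removes the label, and with it the pod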
You should always define a readiness probe, even if it’s as simple as sending an HTTP request to the base URL.
Don’t include pod shutdown logic into your readiness probes, because Kubernetes removes the pod from all services as soon as you delete the pod.
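A minimal readiness probe of that kind might look like this (the path and port are assumptions about your app):
readinessProbe:
  httpGet:
    path: /                  # base URL of the app
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10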
Creating a headless service
Setting the clusterIP field in a service spec to None makes the service headless, as Kubernetes won’t assign it a cluster IP through which clients could connect to the pods backing it.
apiVersion: v1
kind: Service
metadata:
  name: kubia-headless
spec:
  clusterIP: None
  ports:
  - port: 80
    targetPort: 8080
  selector:
    app: kubia
$ kubectl exec <pod name> -- touch /var/ready
$ kubectl run dnsutils --image=tutum/dnsutils --generator=run-pod/v1 --command -- sleep infinity
$ kubectl exec dnsutils nslookup kubia-headless