A comprehensive review of the main concepts of K8S!


This article is reprinted from the WeChat public account "Little Sister Taste", and the author is the dog raised by the little sister, No. 02. Please contact the Little Sister Taste public account to reprint this article.

K8s has become an absolutely popular technology. If a company of a certain scale does not use K8s, it will be embarrassed to go out and meet people. Installing K8s requires overcoming various network obstacles, but even greater obstacles are still to come...

I found that many articles about k8s do not speak human language, including that damn official website.

To understand the details of k8s, you first need to know what k8s is for. Its main job is to schedule containers, that is, to place your deployed instances anywhere in the cluster based on the overall resource usage. Don't worry about anything else yet; it will only distract you and add complexity.

Note the word "anywhere": it means you cannot reach a deployed instance through the conventional fixed IP-and-port approach. This is where the complexity comes from.

When we learn k8s, we need to see what resources it needs to schedule. In the traditional sense, it is nothing more than cpu, memory, network, io, etc. Before understanding how to schedule these resources, we must first understand what Pod is, which is one of the core concepts of k8s.

If you don’t understand Pod, you can’t use k8s.

The mind map of this article can be viewed online here: http://mind.xjjdog.cn/mind/cloud-k8s

1. Pod

Pod is the smallest unit of k8s scheduling. It contains one or more containers (for now, you can think of a container here as a Docker container).

A Pod has a unique IP address. Even when it contains multiple containers, it still has just one IP address. How does it do that?

xjjdog has previously written two articles on the principles of Docker, pointing out that two of its underlying technologies are namespace and cgroup. When a Pod holds multiple containers, k8s puts them into a shared namespace, so the containers in the Pod can talk to each other over localhost, just like two processes on the same machine. Similarly, a Pod can mount multiple shared storage volumes, and each container inside can read and write data on those shared volumes.

Sidecars and probes also live in the Pod in the form of containers. So a Pod is a hodgepodge that takes over part of the work of plain Docker containers; creating things like the shared net namespace is the job of the Pause container.
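As a concrete illustration, here is a minimal sketch of a Pod with two containers sharing an emptyDir volume (the names and images below are made up); because they also share the network namespace, they could reach each other over localhost as well.

  apiVersion: v1
  kind: Pod
  metadata:
    name: two-in-one                # hypothetical Pod name
  spec:
    volumes:
      - name: shared-data           # a volume both containers mount
        emptyDir: {}
    containers:
      - name: writer
        image: busybox
        command: ["/bin/sh", "-c", "while true; do date > /data/now; sleep 5; done"]
        volumeMounts:
          - name: shared-data
            mountPath: /data
      - name: reader
        image: busybox
        command: ["/bin/sh", "-c", "while true; do cat /data/now; sleep 5; done"]
        volumeMounts:
          - name: shared-data
            mountPath: /data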

So how do you represent and declare a Pod? And how do you specify the relationship between these containers and the Pod? K8s uses the yaml configuration method with the original intention of avoiding excessive API design.

Well, this brings up another problem: the explosion of yml files. Everyone who operates k8s has experienced the fear of being dominated by yml files.

There is no silver bullet, it's just shifting the problem to another scenario.

Declaring a Pod means writing a yml file, and every other resource is declared the same way. The sample below actually declares a Service (a kind we will meet again later) that exposes the bcmall Pods, but the shape is identical.

  apiVersion: v1                    # version number
  kind: Service                     # the type of resource being created
  metadata:                         # metadata is required
    namespace: bcmall               # bind to a namespace
    name: bcmall-srv                # Service resource name
  spec:                             # detailed definition
    type: NodePort                  # Service type
    selector:                       # label selector
      app: container-bcmall-pod
    ports:                          # define ports
      - port: 8080                  # the port used for access inside the cluster
        targetPort: 80              # bind to the Pod port
        nodePort: 14000             # map to a port on the Node for access from outside
        protocol: TCP               # port protocol

Pay attention to the kind field: the explosion of kinds is exactly where the conceptual nightmare of k8s comes from! The various configurations of k8s basically revolve around it. Oh, by the way, to make these yml files take effect, you need the kubectl command, like this.

  kubectl create -f ./bcmall.yaml

A Pod can be accessed through its IP address, or through an internal domain name (which requires CoreDNS). Used this way, a Pod behaves much like an ordinary machine, and the containers inside are just a bunch of processes.
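For example (the IP, namespace and cluster domain below are assumptions, and whether this works depends on how the kubernetes plugin of CoreDNS is configured), a Pod with IP 172.17.0.3 in the bcmall namespace may be resolvable by its dashed-IP record from another Pod:

  kubectl exec -it some-client-pod -n bcmall -- nslookup 172-17-0-3.bcmall.pod.cluster.local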

2. Probe and Hook

After a Pod is scheduled, it has to be initialized, and initialization must give some feedback, otherwise nobody knows whether it started successfully. These health-check mechanisms are called probes, a rather odd-sounding term.

There are three common probes: livenessProbe, readinessProbe, and startupProbe.

livenessProbe is a bit like a heartbeat: if it keeps failing, the container is judged dead and killed (then restarted); readinessProbe reports whether the Pod is ready, that is, whether it should receive traffic; startupProbe is used to judge whether the container has finished starting, to avoid premature timeouts - for example, your JVM must finish starting before it can serve requests.

A common pattern is to allow startupProbe up to 120 seconds for startup, check livenessProbe every 5 seconds, and check readinessProbe every 10 seconds.

This information is also configured in yml. I will not go into details about the specific configuration levels here. Please check the documentation.
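Still, as a rough sketch of those timings (the container name, image and HTTP paths are made up), the three probes might be wired up like this inside a container spec:

  containers:
    - name: bcmall
      image: bcmall:v1                # hypothetical image
      startupProbe:                   # allow up to 24 x 5s = 120s for a slow start (e.g. a JVM)
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 24
        periodSeconds: 5
      livenessProbe:                  # if this keeps failing, the container is killed and restarted
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 5
      readinessProbe:                 # if this fails, the Pod stops receiving Service traffic
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 10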

Now let's talk about hooks. There are two: PostStart and PreStop. PostStart runs right after the container starts, while PreStop runs before the container is terminated. There is nothing magical here - they just execute some commands or shell scripts - but they are used often enough to have been promoted to keywords.
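A minimal sketch of the two hooks inside a container spec (the commands are only placeholders):

  containers:
    - name: bcmall
      image: bcmall:v1                # hypothetical image
      lifecycle:
        postStart:                    # runs right after the container starts
          exec:
            command: ["/bin/sh", "-c", "echo started >> /tmp/hook.log"]
        preStop:                      # runs before the container is terminated
          exec:
            command: ["/bin/sh", "-c", "sleep 5"]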

Below is a complete Pod example, this one with a livenessProbe. Since these configuration files all look similar, I won't post more code of this kind later.

  apiVersion: v1
  kind: Pod
  metadata:
    labels:
      test: liveness
    name: liveness-exec
  spec:
    containers:
      - name: liveness
        image: k8s.gcr.io/busybox
        args:
          - /bin/sh
          - -c
          - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
        livenessProbe:
          exec:
            command:
              - cat
              - /tmp/healthy
          initialDelaySeconds: 5
          periodSeconds: 5
3. Noun explosion caused by high availability

As mentioned above, the complexity of YAML is due to the large number of kinds. The first one we will touch is ReplicaSet.

We need multiple replicas to achieve high availability.

The reason is very simple. A Pod is equivalent to a machine. Once it crashes, it cannot provide services. What kind of high availability is this? Therefore, there must be multiple copies of equivalent Pods.

ReplicaSet, abbreviated RS, keeps the number of your Pods at a given level. But it is a bit clumsy to operate directly, so the higher-level Deployment is generally used instead. Deployment supports rolling upgrades, but it requires that you set matching key-value pairs in spec.template.metadata.labels.
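As a hedged sketch (the names and image are made up), note how spec.selector.matchLabels and spec.template.metadata.labels carry the same key-value pair; without that match the Deployment would not recognize its own Pods:

  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: bcmall-deployment
    namespace: bcmall
  spec:
    replicas: 3                       # keep three equivalent Pods alive
    selector:
      matchLabels:
        app: container-bcmall-pod     # must match the template labels below
    template:
      metadata:
        labels:
          app: container-bcmall-pod
      spec:
        containers:
          - name: bcmall
            image: bcmall:v1          # hypothetical image
            ports:
              - containerPort: 80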

Much of k8s's filtering is done through labels. This is really a compromise: there is no SQL-like query language, so all it can do is match on the key-value pairs of this map. For example:

  kubectl get pod -n demo -l app=nginx,version=v1

Does this look like a strange way to write a query? Never mind, you'll get used to it.

These yml configurations are usually linked together with cross-references, so there is an ordering: a higher-order kind creates lower-order kinds, in a cascading relationship.

Everything is just a layer of yml file.

Okay, let's touch the next kind: Service.

Why do we need a Service? Because the Pods we created above - even those created by a Deployment - take some effort to reach. A Pod has an IP, but if it is restarted or destroyed the IP changes dynamically, because a Pod gets scheduled and nobody knows in advance which machine it will land on.

The purpose of Service is to provide a non-IP access method so that we can access it no matter where the Pod is.

As shown in the figure, by filtering with Labels, multiple Pods can be grouped into one category, and then services can be exposed as a certain type. To put it simply, a Service is also a combination of things.

The Service types need some explanation here, because they matter. There are four main ones:

  • ClusterIP creates a virtual IP that is unique and cannot be modified. All requests to this IP are forwarded to the backend by iptables. This is the default type; inside the cluster, the service name is resolved to this IP by CoreDNS.
  • NodePort exposes the service on a static port (the NodePort) of each Node. The main technique behind it is NAT.
  • LoadBalancer is mainly used for external service discovery, that is, exposing the service to access from outside the cluster.
  • ExternalName is rarely used. If you are interested, you can learn more about it yourself.
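For comparison with the NodePort example in section 1, here is a sketch of a ClusterIP version of the same Service; only the type changes and the nodePort line disappears (the name is made up):

  apiVersion: v1
  kind: Service
  metadata:
    namespace: bcmall
    name: bcmall-srv-internal         # hypothetical name
  spec:
    type: ClusterIP                   # the default type; reachable only inside the cluster
    selector:
      app: container-bcmall-pod
    ports:
      - port: 8080                    # the virtual ClusterIP listens here
        targetPort: 80                # traffic is forwarded to the Pod's port 80
        protocol: TCP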

But wait. How does k8s enable Pods across hosts to access each other?

It is easy to understand how Pods on a single Node reach each other: directly through the IPs allocated by the docker0 bridge.

So how is the underlying network of k8s actually set up? The answer may be frustrating: k8s itself is not responsible for the network and does not provide a concrete network implementation for containers. That is delegated to CNI (Container Network Interface) plugins. Making Pods on different nodes reachable from one another is exactly the CNI's job. Commonly used CNI plugins include Flannel, Calico, Canal and Weave.

That’s right, another bunch of nouns, and each one is difficult to deal with.

The network aspect is the most complex knowledge point of k8s, and there are many frameworks. The following articles will specifically introduce it.

4. Internal components

Before opening more Kind, let's take a look at the internal components of k8s.

The following picture, taken from the official website, illustrates the necessary components of k8s. Among them, etcd is strictly speaking not part of k8s itself, but k8s needs to persist some state somewhere, so this component is introduced to store configuration information.

The left part is the components of k8s itself, and the right part is the daemon process on each Node (that is, the physical machine). Their functions are as follows:

  • kube-apiserver exposes the REST interface and is the soul of k8s. All authentication, authorization, access control, service discovery and similar functions go through it.
  • kube-scheduler is the scheduling component, and that is all it does. It watches for unscheduled Pods and binds each of them to a suitable node according to your goals.
  • kube-controller-manager is responsible for keeping the whole k8s cluster in its desired state. Note: the state of the k8s cluster, not of individual Pods.
  • kubelet is the daemon process on each node. It talks to the apiserver, reports the status of its own node, and receives and executes scheduling commands.
  • kube-proxy manages the access entry for Services, covering both access to Services from Pods inside the cluster and access from outside the cluster. The four types mentioned above are forwarded through it.

The responsibilities of these components are very clear. The difficulty lies in the concept of multiple Kinds.
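If you want to see these components with your own eyes, most of them run as Pods in the kube-system namespace (the exact output depends on how the cluster was built):

  kubectl get pods -n kube-system
  kubectl get nodes -o wide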

5. More concepts

These concepts in the figure are essentially another layer above the Pod. The higher the level, the more abstract the function and the more dependent configurations. The following will introduce the main ones.

  • StatefulSet. A Deployment produces instances with random names such as bcmall-deployment-5d45f98bd9, which is fine for stateless workloads. A StatefulSet, in contrast, produces ordered instance names like bcmall-deployment-1; it gives each instance a fixed network identity (host name, domain name, etc.) and deploys and scales them in sequence, which makes it well suited to instances like MySQL.
  • DaemonSet ensures that every node in the cluster runs exactly one copy of a specific Pod; it is usually used for system-level background tasks.
  • ConfigMap and Secret, as the names imply, are used for configuration, because containers almost always need some environment variables passed in from outside. They give you unified management of business configuration and keep configuration files separate from image files, making containerized applications portable (a small sketch follows this list).
  • PV and PVC. Business workloads need storage, which can be defined through a PV. The life cycle of a PV is independent of the Pod's life cycle; it typically represents network storage. A PVC is the user's request for storage: a Pod consumes node resources, a PVC consumes PV resources, and PVC and PV correspond one to one. Yes, they too are declared through yml files.
  • StorageClass enables dynamic PV provisioning; it is a further layer of encapsulation.
  • Job runs to completion and then exits, without restarting or being rebuilt.
  • CronJob controls periodic tasks that do not need to run continuously in the background.
  • CRD (Custom Resource Definition) lets you define kinds of your own.
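Here is the ConfigMap sketch promised above (all names and values are made up). First the ConfigMap itself:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: bcmall-config               # hypothetical name
    namespace: bcmall
  data:
    DB_HOST: "10.0.0.5"
    LOG_LEVEL: "info"

and then, inside the container spec of a Pod or Deployment, the keys can be injected as environment variables:

  containers:
    - name: bcmall
      image: bcmall:v1
      envFrom:                        # every key in the ConfigMap becomes an environment variable
        - configMapRef:
            name: bcmall-config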

6. Resource Limitations

Very good, we are finally going to talk about resource limits. The resource limits of k8s are still implemented through cgroups.

K8s provides two types of parameters: requests and limits to pre-allocate resources and limit usage.

Don't be confused by these two words: requests is roughly the -Xms of JVM parameters, and limits is roughly -Xmx. Setting the two to the same value is a common best practice.

It's just that the setup is a bit weird:

  resources:                          # under the container spec
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"

The unit of memory is Mi, and the unit of CPU is m. It is as awkward as it can be, but there is a reason for it.

m stands for milli-core: one core is 1000m. For example, if the machine has 4 cores, multiply by 1000 and the total CPU resource is 4000 milli-cores. If you want your application to occupy at most a quarter of a core, set it to 250m.

Now for memory. Mi stands for MiB. Why Mi rather than MB? Perhaps to leave a deep impression on you - MB and MiB really are different things.
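For example, 128Mi is 128 × 1024 × 1024 = 134,217,728 bytes, roughly 134 MB, whereas 128MB would be exactly 128,000,000 bytes.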

If memory usage exceeds the limit, the system's OOM mechanism is triggered and the container is killed; exceeding the CPU limit does not kill anything - at most the application is throttled and runs slower.

K8s also provides kinds called LimitRange and ResourceQuota, which constrain the range of CPU and memory that can be requested and offer more advanced quota functions.
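As a small sketch of the idea (the numbers are arbitrary), a ResourceQuota caps the total resources a namespace may consume:

  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: bcmall-quota                # hypothetical name
    namespace: bcmall
  spec:
    hard:
      requests.cpu: "4"               # total CPU requests in the namespace may not exceed 4 cores
      requests.memory: 8Gi
      limits.cpu: "8"
      limits.memory: 16Gi
      pods: "20"                      # at most 20 Pods in this namespace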

7. Cluster building tools

There are many common ways to build a k8s cluster, such as kind, minikube, kubeadm and rancher2. Rancher2 is a relatively easy one: it takes care of some of the network issues and offers sensible proxy options, so it is a good way for a novice to get a first taste.

For normal use, however, kubeadm is recommended; it is the officially maintained and recommended build tool and is backward compatible. Following the official guide, a kubeadm init is basically all it takes to bring the cluster up. Monitoring, logging and the like are then easy to add.
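The rough flow looks like the sketch below; versions and the network add-on differ between clusters, so treat it as an outline rather than a copy-paste recipe (the CIDR and the join parameters are placeholders):

  # on the control-plane node
  kubeadm init --pod-network-cidr=10.244.0.0/16

  # make kubectl usable for the current user (kubeadm init prints these lines itself)
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

  # install a CNI plugin (Flannel, Calico, ...), then on each worker node
  # run the join command that kubeadm init printed:
  kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash <hash>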

There are three most troublesome points about k8s:

  • yml file concept explosion
  • Network solutions are diverse and complex
  • Complicated configuration of permissions and certificates

If you understand these three aspects, you can say that you will have no problem using k8s.

About the author: Little Sister Taste (xjjdog), a public account that keeps programmers from taking detours, focusing on infrastructure and Linux. Ten years of architecture, hundreds of billions of daily traffic; come and discuss the high-concurrency world, and get a different taste. My personal WeChat is xjjdog0; you are welcome to add me as a friend for further exchange.
