k8s

k8s问题解决

Posted by Epoch Blog on December 3, 2021

最近遇到了个redis性能的问题,导致对外提供接口在并发大量的时候超时(超时30秒),因此讲述一下redis实际应用中的性能压力分析。

安装K8S的时候遇到如下问题

在进行初始化的时候

问题1:

kubeadm init –kubernetes-version=1.22.4 –apiserver-advertise-address=192.168.19.131 –image-reposito ry registry.aliyuncs.com/google_containers –service-cidr=10.10.0.0/16 –pod-network-cidr=10.122.0.0/16 –ignore-preflight-errors=Swap

执行命令后,进入到如下

1
2
3
4
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/m
anifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

等待若干分钟后报错

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.

	Here is one example how you may list all Kubernetes containers running in docker:
		- 'docker ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'docker logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

查看信息日志

tail /var/log/messages

1
2
3
Dec  4 11:00:10 master kubelet[36152]: F1204 11:00:10.753392   36152 server.go:274] failed to run Kubelet:
 misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"Dec  4 11:00:10 master systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Dec  4 11:00:10 master systemd[1]: kubelet.service: Failed with result 'exit-code'.

其中说明,kubeletd的cgroup driver是cgroupfs,但是docker的cgroup driver是systemd

因此修改docker的变为cgroupfs

1
2
3
4
5
[root@master ~]# cat /etc/docker/daemon.json 
{
  "registry-mirrors": ["https://cfaccnwy.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

然后重启

1
systemctl daemon-reload && systemctl restart docker

修改完成后,又发现一个问题

1
2
3
4
5
6
7
8
9
10
11
Dec  4 11:53:28 master kubelet[56181]: I1204 11:53:28.349642   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:53:38 master kubelet[56181]: I1204 11:53:38.373012   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:53:48 master kubelet[56181]: I1204 11:53:48.396199   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:53:58 master kubelet[56181]: I1204 11:53:58.417475   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:08 master kubelet[56181]: I1204 11:54:08.442535   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:18 master kubelet[56181]: I1204 11:54:18.466599   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:28 master kubelet[56181]: I1204 11:54:28.490899   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:38 master kubelet[56181]: I1204 11:54:38.515567   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:48 master kubelet[56181]: I1204 11:54:48.539856   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detachDec  4 11:54:58 master kubelet[56181]: I1204 11:54:58.565186   56181 kubelet_node_status.go:294] Setting n
ode annotation to enable volume controller attach/detach

代码中大量报 attach/detach失败

参考

https://blog.csdn.net/weixin_39850920/article/details/110801818?spm=1001.2101.3001.6650.1&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.no_search_link

目前问题还未解决。这是官方的一个BUG,继续跟踪中,主要考虑因为之前安装过别的版本,卸载后重新安装后出错导致的。