Building a Highly Available Kubernetes Cluster on Rocky Linux 9.5: The Containerd Runtime and Calico Networking in Practice

张开发
2026/4/16 1:31:19 · 15 min read


1. Environment Preparation and System Tuning

The first step in building a highly available Kubernetes cluster on Rocky Linux 9.5 is getting the base environment right. I have run into plenty of cluster anomalies caused by untuned system parameters, so here are the key settings.

First, the firewall and SELinux. In production I usually recommend leaving SELinux in permissive mode: it does not interfere with workloads, yet it still records security audit events. The change is simple:

```bash
sudo setenforce 0
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
```

Kernel network parameters are the next priority. Kubernetes relies on the br_netfilter module for bridged packet filtering, so make sure the following settings take effect:

```bash
cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
modprobe br_netfilter
sysctl -p /etc/sysctl.d/k8s.conf
```

On the memory side, swap must be disabled completely. One production incident I dealt with came down to swap being left on, which made kubelet restart repeatedly; a hard-earned lesson:

```bash
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
```

Hostname resolution also matters. On my three-node cluster I add the same hosts entries to every machine:

```bash
echo -e "192.168.31.106 master\n192.168.31.107 node1\n192.168.31.108 node2" | sudo tee -a /etc/hosts
```

2. In-Depth Containerd Runtime Configuration

Compared with Docker, containerd is a lighter and more efficient Kubernetes runtime, but installing it on Rocky Linux 9.5 has a few pitfalls. Install the latest containerd from the Aliyun mirror:

```bash
dnf config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
dnf install containerd.io -y
```

After generating the default configuration, two key parameters must be changed:

```bash
containerd config default | tee /etc/containerd/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#' /etc/containerd/config.toml
sed -i 's#registry.k8s.io/pause:3.8#registry.aliyuncs.com/google_containers/pause:3.9#' /etc/containerd/config.toml
```

Registry mirror configuration directly affects image pull speed. I usually configure several domestic mirrors:

```bash
mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml << EOF
server = "https://registry-1.docker.io"

[host."https://qa9ktbtj.mirror.aliyuncs.com"]
  capabilities = ["pull", "resolve"]

[host."https://mirror.ccs.tencentyun.com"]
  capabilities = ["pull", "resolve"]
EOF
```

After restarting containerd, I habitually verify its state with crictl:

```bash
systemctl restart containerd
crictl images
```
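Before touching the real /etc/containerd/config.toml, the two sed substitutions above can be dry-run against a stand-in file. This is a minimal sketch; the /tmp path and the two-line sample are illustrative stand-ins, not the full generated config:

```shell
#!/bin/sh
# Build a tiny stand-in for the relevant lines of containerd's config
# (hypothetical sample; the real file comes from `containerd config default`).
cat > /tmp/config-sample.toml << 'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
sandbox_image = "registry.k8s.io/pause:3.8"
EOF

# Apply the same substitutions the article uses on the real config.
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#' /tmp/config-sample.toml
sed -i 's#registry.k8s.io/pause:3.8#registry.aliyuncs.com/google_containers/pause:3.9#' /tmp/config-sample.toml

# Confirm both edits landed before running them for real.
grep 'SystemdCgroup = true' /tmp/config-sample.toml
grep 'pause:3.9' /tmp/config-sample.toml
```

If either grep prints nothing, the pattern did not match, which is exactly the failure mode to catch before kubelet starts crash-looping over a cgroup driver mismatch.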
3. Kubernetes Cluster Initialization in Practice

When initializing the cluster with kubeadm, a well-tuned configuration file matters. This is my kubeadm-config.yaml template:

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.31.106
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
imageRepository: registry.aliyuncs.com/google_containers
controlPlaneEndpoint: master:6443
networking:
  podSubnet: 192.168.0.0/16
```

I recommend running the init command with --upload-certs so additional control-plane nodes can be added later:

```bash
kubeadm init --config kubeadm-config.yaml --upload-certs
```

A common issue after nodes join is that they show NotReady. This is usually because no network plugin has been installed yet:

```bash
kubectl get nodes
NAME     STATUS     ROLES           AGE   VERSION
master   NotReady   control-plane   5m    v1.28.2
```

4. Calico Network Plugin Deployment in Detail

For Calico 3.28.4 I recommend two deployment methods, each with its own use case.

Method 1: apply the official YAML directly:

```bash
wget https://raw.githubusercontent.com/projectcalico/calico/v3.28.4/manifests/calico.yaml
kubectl apply -f calico.yaml
```

Method 2: for offline environments, import the required images first:

```bash
ctr -n k8s.io images import calico-cni.tar
ctr -n k8s.io images import calico-node.tar
ctr -n k8s.io images import calico-kube-controllers.tar
```

When verifying Calico, I pay the most attention to calico-node and calico-kube-controllers:

```bash
kubectl get pods -n kube-system -l k8s-app=calico-node
kubectl logs -n kube-system <calico-pod-name>
```

Network policy is Calico's strong suit. Here is an example policy allowing the frontend to reach the backend:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

5. Cluster Operations and Troubleshooting

Certificate management is a key part of long-term operations. Check certificate expiry with:

```bash
kubeadm certs check-expiration
```

When certificates are about to expire, renewal is straightforward:

```bash
kubeadm certs renew all
systemctl restart kubelet
```

My usual collection of diagnostic commands:

```bash
# Check node resource usage
kubectl top nodes
# Inspect recent events
kubectl get events --sort-by=.metadata.creationTimestamp
# Follow component logs
journalctl -u kubelet -f
```

For network connectivity tests I usually reach for busybox:

```bash
kubectl run busybox --image=busybox:1.28 --rm -it --restart=Never -- ping 192.168.0.1
```
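One way to act on the expiry check is to filter for short residual times. The capture below is a simplified, hypothetical rendition of kubeadm's table, and the awk filter assumes the residual time is the last column, expressed in days; adjust to whatever your kubeadm version actually prints:

```shell
#!/bin/sh
# Flag certificates expiring within 30 days, given check-expiration-style
# output. The sample table is a hypothetical stand-in, not real output.
cat > /tmp/cert-expiry.txt << 'EOF'
CERTIFICATE                EXPIRES                  RESIDUAL TIME
admin.conf                 Apr 16, 2027 01:31 UTC   364d
apiserver                  May 10, 2026 01:31 UTC   24d
apiserver-kubelet-client   Apr 16, 2027 01:31 UTC   364d
EOF

# Skip the header, strip the trailing "d", compare numerically.
awk 'NR > 1 { days = $NF; sub(/d$/, "", days); if (days + 0 < 30) print $1 }' \
  /tmp/cert-expiry.txt
```

Against the sample table this prints only `apiserver`, the one certificate with fewer than 30 days remaining, making it easy to wire into a cron-driven alert.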
6. Dashboard Visualization and Monitoring

When deploying Dashboard 2.7.0, swapping the image for a domestic mirror is important:

```bash
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
sed -i 's#kubernetesui/dashboard#registry.cn-hangzhou.aliyuncs.com/google_containers/dashboard#' recommended.yaml
kubectl apply -f recommended.yaml
```

Expose the service via NodePort:

```bash
kubectl edit svc kubernetes-dashboard -n kubernetes-dashboard
# Change spec.type to NodePort
```

Create an admin user and grant it cluster-admin:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin-user
  namespace: kubernetes-dashboard
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: admin-user
  namespace: kubernetes-dashboard
```

Fetch the login token:

```bash
kubectl -n kubernetes-dashboard create token admin-user
```

7. A Hands-On Application Deployment

Using Nginx as the example, here is the full deployment flow. First, prepare a Deployment that references a private-registry image pull secret:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: registry.example.com/nginx:1.0
        ports:
        - containerPort: 80
      imagePullSecrets:
      - name: regcred
```

Create the image pull secret:

```bash
kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=admin \
  --docker-password=yourpassword \
  --docker-email=user@example.com
```

Expose a NodePort service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 30001
```

After deployment, verify the service is reachable:

```bash
curl http://<node-ip>:30001
kubectl get pods -o wide
kubectl logs <pod-name>
```
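For anyone curious what the regcred secret actually stores: kubectl builds a .dockerconfigjson document whose auth field is simply base64 of username:password. A minimal sketch using the placeholder credentials and registry host from the command above (not real values):

```shell
#!/bin/sh
# Reconstruct by hand the .dockerconfigjson that
# `kubectl create secret docker-registry regcred ...` would store.
# Credentials and registry host are the article's placeholders.
AUTH=$(printf 'admin:yourpassword' | base64)
cat > /tmp/dockerconfig.json << EOF
{"auths":{"registry.example.com":{"username":"admin","password":"yourpassword","email":"user@example.com","auth":"$AUTH"}}}
EOF

# Decoding the auth field recovers the original credential pair.
printf '%s' "$AUTH" | base64 -d   # → admin:yourpassword
```

This also explains why imagePullSecrets must never be committed to a repository: the "encryption" is plain base64, readable by anyone who can view the Secret.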
