
"Failed create pod sandbox" pod errors in an AWS Kubernetes cluster

Summary of the issue

We have repeatedly observed a problem in our cluster where one or more pods on one or more nodes fail to start (the pod, or the containers inside it, never come up). The pod shows a "Failed create pod sandbox" error. Restarting docker or the kubelet on the "affected" node does not resolve the problem. Terminating and recreating the affected EC2 instance does not help either. If pods (both those failing to start and previously "healthy" ones) are rescheduled onto other nodes, they fail to start there with the same error. So it appears that once at least one pod fails with the "Failed create pod sandbox" error, the entire cluster is broken. We have not found a way to repair the cluster; only recreating it from scratch got our applications up and running again.

Environment

  • Kubernetes 1.9.3

  • Cluster created with kops (v1.8), running on AWS (private network topology)

  • Networking: weave-net

  • Cluster: 1 master, 3 nodes

Incident instance timeline

  • We performed a rolling cluster update with kops to bring up the nodes and master with a new AMI we had built (based on the kops AMI k8s-1.8-debian-jessie-amd64-hvm-ebs-2017-11-27). Customizing the kops AMI has never been a problem before: we apply Linux updates and a few minor changes unrelated to Kubernetes.

  • We noticed in the Kubernetes dashboard that one of the pods (a broker pod) failed to start with the error: 0/4 nodes are available: 1 NodeUnschedulable, 1 PodToleratesNodeTaints, 3 NoVolumeZoneConflict. Failed create pod sandbox.

  • Another pod, created by a daemon set (a fluentd pod) and scheduled on the same node as the pod above, showed a different error: network is not ready: [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

  • According to Kubernetes, the affected node is Healthy. Per kubectl describe node, the affected node has sufficient resources to run the pods.

  • Looking at the kubelet log, we observed the following for the broker pod:

Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.364224    1346 kubelet.go:1835] SyncLoop (ADD, "api"): "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)"
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.512797    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "authservice-ca" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-authservice-ca") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.512862    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "broker-internal-cert" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-internal-cert") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.512934    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "ssl-certs" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-ssl-certs") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.512958    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.512980    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "broker-prometheus-config" (UniqueName: "kubernetes.io/configmap/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-prometheus-config") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.513002    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "default-token-vrhqr" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-default-token-vrhqr") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: E0316 08:29:54.513117    1346 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\"" failed. No retries permitted until 2018-03-16 08:29:55.013051677 +0000 UTC m=+33.865177055 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa\" (UniqueName: \"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\") pod \"broker-0\" (UID: \"8ead64a3-28f3-11e8-b520-025c267c6ea8\") "
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.613329    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "broker-prometheus-config" (UniqueName: "kubernetes.io/configmap/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-prometheus-config") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.613544    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "default-token-vrhqr" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-default-token-vrhqr") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.613614    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "authservice-ca" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-authservice-ca") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.613660    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "broker-internal-cert" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-internal-cert") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.613698    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "ssl-certs" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-ssl-certs") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.616720    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "broker-prometheus-config" (UniqueName: "kubernetes.io/configmap/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-prometheus-config") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.621948    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "ssl-certs" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-ssl-certs") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.624467    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "authservice-ca" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-authservice-ca") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.626039    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "broker-internal-cert" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-broker-internal-cert") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:54 ip-172-20-85-48 kubelet[1346]: I0316 08:29:54.626604    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "default-token-vrhqr" (UniqueName: "kubernetes.io/secret/8ead64a3-28f3-11e8-b520-025c267c6ea8-default-token-vrhqr") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:55 ip-172-20-85-48 kubelet[1346]: I0316 08:29:55.014972    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:55 ip-172-20-85-48 kubelet[1346]: E0316 08:29:55.015086    1346 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\"" failed. No retries permitted until 2018-03-16 08:29:56.015048945 +0000 UTC m=+34.867174315 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa\" (UniqueName: \"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\") pod \"broker-0\" (UID: \"8ead64a3-28f3-11e8-b520-025c267c6ea8\") "
Mar 16 08:29:56 ip-172-20-85-48 kubelet[1346]: I0316 08:29:56.017902    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:56 ip-172-20-85-48 kubelet[1346]: E0316 08:29:56.018024    1346 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\"" failed. No retries permitted until 2018-03-16 08:29:58.017982038 +0000 UTC m=+36.870107444 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa\" (UniqueName: \"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\") pod \"broker-0\" (UID: \"8ead64a3-28f3-11e8-b520-025c267c6ea8\") "
Mar 16 08:29:58 ip-172-20-85-48 kubelet[1346]: I0316 08:29:58.023714    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:29:58 ip-172-20-85-48 kubelet[1346]: E0316 08:29:58.023871    1346 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\"" failed. No retries permitted until 2018-03-16 08:30:02.023814124 +0000 UTC m=+40.875939520 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa\" (UniqueName: \"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\") pod \"broker-0\" (UID: \"8ead64a3-28f3-11e8-b520-025c267c6ea8\") "
Mar 16 08:30:02 ip-172-20-85-48 kubelet[1346]: I0316 08:30:02.033951    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:30:02 ip-172-20-85-48 kubelet[1346]: E0316 08:30:02.034045    1346 nestedpendingoperations.go:263] Operation for "\"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\"" failed. No retries permitted until 2018-03-16 08:30:10.034017896 +0000 UTC m=+48.886143256 (durationBeforeRetry 8s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa\" (UniqueName: \"kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280\") pod \"broker-0\" (UID: \"8ead64a3-28f3-11e8-b520-025c267c6ea8\") "
Mar 16 08:30:10 ip-172-20-85-48 kubelet[1346]: I0316 08:30:10.055813    1346 reconciler.go:217] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:30:10 ip-172-20-85-48 kubelet[1346]: I0316 08:30:10.060992    1346 operation_generator.go:1111] Controller attach succeeded for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8") device path: "/dev/xvdcr"
Mar 16 08:30:10 ip-172-20-85-48 kubelet[1346]: I0316 08:30:10.156111    1346 reconciler.go:262] operationExecutor.MountVolume started for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:30:10 ip-172-20-85-48 kubelet[1346]: I0316 08:30:10.156188    1346 operation_generator.go:446] MountVolume.WaitForAttach entering for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8") DevicePath "/dev/xvdcr"
Mar 16 08:30:11 ip-172-20-85-48 kubelet[1346]: I0316 08:30:11.156560    1346 operation_generator.go:455] MountVolume.WaitForAttach succeeded for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8") DevicePath "/dev/xvdcr"
Mar 16 08:30:12 ip-172-20-85-48 kubelet[1346]: I0316 08:30:12.128286    1346 operation_generator.go:484] MountVolume.MountDevice succeeded for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8") device mount path "/var/lib/kubelet/plugins/kubernetes.io/aws-ebs/mounts/aws/eu-central-1b/vol-04145a1c9d1a26280"
Mar 16 08:30:12 ip-172-20-85-48 kubelet[1346]: I0316 08:30:12.370928    1346 operation_generator.go:522] MountVolume.SetUp succeeded for volume "pvc-b673d6da-26e3-11e8-aa99-02cd3728faaa" (UniqueName: "kubernetes.io/aws-ebs/aws://eu-central-1b/vol-04145a1c9d1a26280") pod "broker-0" (UID: "8ead64a3-28f3-11e8-b520-025c267c6ea8")
Mar 16 08:30:12 ip-172-20-85-48 kubelet[1346]: I0316 08:30:12.672408    1346 kuberuntime_manager.go:385] No sandbox for pod "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" can be found. Need to start a new one
Mar 16 08:30:15 ip-172-20-85-48 kubelet[1346]: I0316 08:30:15.337311    1346 kubelet.go:1880] SyncLoop (PLEG): "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)", event: &pleg.PodLifecycleEvent{ID:"8ead64a3-28f3-11e8-b520-025c267c6ea8", Type:"ContainerStarted", Data:"b08ea5b45ce3ba467856952ad6cc095f4b796673d7dfbf3b9c4029b6b1a75a1b"}
Mar 16 08:34:12 ip-172-20-85-48 kubelet[1346]: E0316 08:34:12.672940    1346 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 16 08:34:12 ip-172-20-85-48 kubelet[1346]: E0316 08:34:12.672956    1346 kuberuntime_manager.go:647] createPodSandbox for pod "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
Mar 16 08:34:12 ip-172-20-85-48 kubelet[1346]: E0316 08:34:12.673020    1346 pod_workers.go:186] Error syncing pod 8ead64a3-28f3-11e8-b520-025c267c6ea8 ("broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)"), skipping: failed to "CreatePodSandbox" for "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" with CreatePodSandboxError: "CreatePodSandbox for pod \"broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 16 08:34:12 ip-172-20-85-48 kubelet[1346]: E0316 08:34:12.673020    1346 pod_workers.go:186] Error syncing pod 8ead64a3-28f3-11e8-b520-025c267c6ea8 ("broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)"), skipping: failed to "CreatePodSandbox" for "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" with CreatePodSandboxError: "CreatePodSandbox for pod \"broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 16 08:34:12 ip-172-20-85-48 kubelet[1346]: E0316 08:34:12.673020    1346 pod_workers.go:186] Error syncing pod 8ead64a3-28f3-11e8-b520-025c267c6ea8 ("broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)"), skipping: failed to "CreatePodSandbox" for "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" with CreatePodSandboxError: "CreatePodSandbox for pod \"broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)\" failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Mar 16 08:34:13 ip-172-20-85-48 kubelet[1346]: I0316 08:34:13.301580    1346 kuberuntime_manager.go:416] Sandbox for pod "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)" has no IP address.  Need to start a new one
Mar 16 08:34:14 ip-172-20-85-48 kubelet[1346]: I0316 08:34:14.005589    1346 kubelet.go:1880] SyncLoop (PLEG): "broker-0(8ead64a3-28f3-11e8-b520-025c267c6ea8)", event: &pleg.PodLifecycleEvent{ID:"8ead64a3-28f3-11e8-b520-025c267c6ea8", Type:"ContainerDied", Data:"b08ea5b45ce3ba467856952ad6cc095f4b796673d7dfbf3b9c4029b6b1a75a1b"}
  • Forcing another pod (adapter-mqtt-vertx) to be rescheduled onto the "affected node" after restarting the Docker daemon and the kubelet produced similar results.
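The kubelet symptoms quoted above can be collected from a node with something like the following (a hedged sketch: it assumes a systemd-based image where the kubelet runs as the `kubelet` unit, as on the kops Debian AMIs; the grep patterns simply match the errors shown in the log excerpt):

```shell
# Pull recent kubelet messages and keep only the sandbox/CNI failures
# described above. Assumes systemd journald; the unit name may differ
# on custom AMIs.
journalctl -u kubelet --since "1 hour ago" --no-pager 2>/dev/null \
  | grep -E "CreatePodSandbox|NetworkPluginNotReady|cni config" \
  || echo "no matching kubelet errors in the last hour"
```

On a node exhibiting the problem this should surface the same `CreatePodSandbox ... DeadlineExceeded` and `NetworkPluginNotReady` lines as in the log dump.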

Does anyone know what the problem is here, and what the remedy might be?

1 Answer

    [runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized]

    There is a lot of information in this post, but as far as I have seen, pod sandbox problems are almost always an SDN and/or CNI misconfiguration. I would make sure /etc/cni/net.d, /opt/cni/bin, and the associated machinery are all in good shape. Sadly, I have little experience actually debugging CNI beyond that, because usually my problem has been a simple oversight: forgetting one of those directories, or having the wrong permissions on it. I certainly don't have anything constructive for debugging weave :-(
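The directory check suggested here can be scripted roughly as follows (a minimal sketch: /etc/cni/net.d and /opt/cni/bin are the conventional default CNI paths, and the CNI_* environment overrides are a convenience of this script, not kubelet flags):

```shell
#!/bin/sh
# Sanity-check the standard CNI locations on a node. The defaults below
# are the conventional paths; override them via the environment if your
# kubelet is configured with different directories.
CNI_CONF_DIR="${CNI_CONF_DIR:-/etc/cni/net.d}"
CNI_BIN_DIR="${CNI_BIN_DIR:-/opt/cni/bin}"

for d in "$CNI_CONF_DIR" "$CNI_BIN_DIR"; do
  if [ -d "$d" ]; then
    echo "OK: $d exists:"
    ls -l "$d"
  else
    echo "MISSING: $d (kubelet will report 'cni config uninitialized')"
  fi
done
```

An empty or missing /etc/cni/net.d is exactly the condition that produces the "cni config uninitialized" message quoted above.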

    Hopefully your situation is also a "simple" problem like that, and otherwise I hope someone with more CNI or weave debugging skill than I have can weigh in.
