Notes on analyzing and fixing kubelet's leftover orphaned pods (Orphaned pod) that cannot be deleted
Problem
Checking the kubelet log shows errors like the following:
E0823 10:31:01.847946 1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.
E0823 10:31:03.840552 1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.
These messages indicate that orphaned pods have been found. A new record is printed roughly every two seconds, so the system log fills up with kubelet output and other log entries become hard to read.
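To gauge how noisy the log is and to collect the affected pod UIDs before investigating, something along these lines can be run on the node (a minimal sketch, assuming kubelet runs as a systemd unit and the message wording matches the log shown above):

#!/bin/bash
# Count how many orphaned-pod errors kubelet has logged since the last boot.
journalctl -u kubelet -b | grep -ci "orphaned pod"
# Extract the unique pod UIDs mentioned in those errors; the UID is the
# quoted string right after the word "pod" in the message.
journalctl -u kubelet -b | grep -i "orphaned pod" \
  | sed -n 's/.*[Oo]rphaned pod "\([^"]*\)".*/\1/p' | sort -u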
Analysis
The cause is as follows: Kubernetes has already deleted the pod object (kubectl get no longer shows it), so when kubelet fetches the list of all pods from its pod manager, the orphaned pod is not among them. However, when kubelet's cleanupOrphanedPodDirs routine tries to clean the pod up, it finds that the pod's volume directories still exist on disk (or are still mounted). Because those volume directories exist and cannot be removed, the pod's directory cannot be deleted from the node, and kubelet keeps logging the orphaned pod error.
// cleanupOrphanedPodDirs removes the volumes of pods that should not be
// running and that have no containers running. Note that we roll up logs here since it runs in the main loop.
func (kl *Kubelet) cleanupOrphanedPodDirs(pods []*v1.Pod, runningPods []*kubecontainer.Pod) error {
	// ... (build the set of known pods and list the pod directories found on disk)
	for _, uid := range found {
		// If there are still volume directories, do not delete directory
		volumePaths, err := kl.getPodVolumePathListFromDisk(uid)
		if err != nil {
			orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but error %v occurred during reading volume dir from disk", uid, err))
			continue
		}
		if len(volumePaths) > 0 {
			orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but volume paths are still present on disk", uid))
			continue
		}
		// If there are any volume-subpaths, do not cleanup directories
		volumeSubpathExists, err := kl.podVolumeSubpathsDirExists(uid)
		if err != nil {
			orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but error %v occurred during reading of volume-subpaths dir from disk", uid, err))
			continue
		}
		if volumeSubpathExists {
			orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but volume subpaths are still present on disk", uid))
			continue
		}
		// ... (only when no volumes are left does kubelet remove the pod directory)
	}
	// ...
}
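Given that logic, the state kubelet is complaining about can be confirmed directly on the node: the pod's volumes/ (or volume-subpaths/) directory still has entries, and there may still be live mounts underneath it. A quick check might look like this (a sketch; the UID is taken from the log line above):

#!/bin/bash
# UID reported in the "orphaned pod" log line (example value from the log above).
podid=19a4e3e6-a562-11e8-9a25-309c23027882
KUBELET_HOME=/var/lib
# Roughly the paths that getPodVolumePathListFromDisk and podVolumeSubpathsDirExists inspect.
ls -lA ${KUBELET_HOME}/kubelet/pods/$podid/volumes/ 2>/dev/null
ls -lA ${KUBELET_HOME}/kubelet/pods/$podid/volume-subpaths/ 2>/dev/null
# Any mount still held under the pod directory will keep the cleanup from succeeding.
mount | grep ${KUBELET_HOME}/kubelet/pods/$podid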
Solution
Usually we simply delete the orphaned pod's directory. Be careful with this operation.
# If kubectl can no longer find a pod with this $podid, it is safe to clean it up
# Delete the orphaned pod's data directory
KUBELET_HOME=/var/lib
rm -rf ${KUBELET_HOME}/kubelet/pods/$podid
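Before running that rm -rf, it is worth double-checking that the API server really no longer knows about the pod and that nothing under the pod directory is still mounted; deleting a directory that is still mounted can corrupt data. A more cautious version of the same step could look like this (a sketch; pass the UID from the log as the first argument):

#!/bin/bash
podid=$1
KUBELET_HOME=/var/lib
# Refuse to delete if any pod in the cluster still carries this UID.
if kubectl get pods -A -o jsonpath='{.items[*].metadata.uid}' | grep -qw "$podid"; then
    echo "pod $podid still exists in the API server, aborting"
    exit 1
fi
# Refuse to delete if anything under the pod directory is still mounted.
if mount | grep -q "${KUBELET_HOME}/kubelet/pods/$podid"; then
    echo "mounts still present under pods/$podid, unmount them first"
    exit 1
fi
rm -rf ${KUBELET_HOME}/kubelet/pods/$podid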
Batch cleanup of orphaned pods
If there are many orphaned pods in the environment that need cleaning up, the following script can handle them in batch.
#!/bin/bash
# This script unmounts the pods' leftover volume mounts and removes the empty volume
# directories in a graceful way; it does not delete the pod directories themselves,
# which may still contain pod data.
KUBELET_HOME=/var/lib
# Collect the UIDs of all orphaned pods from the system log
# (on Debian/Ubuntu read /var/log/syslog instead of /var/log/messages):
# for podid in $(grep -i "orphaned pod" /var/log/syslog | awk '{print $12}' | sed 's/"//g' | sort -u);
for podid in $(grep -i "orphaned pod" /var/log/messages | awk '{print $12}' | sed 's/"//g' | sort -u);
do
    if [ ! -d ${KUBELET_HOME}/kubelet/pods/$podid ]; then
        continue
    fi
    # Unmount and remove any leftover volume-subpaths mounts.
    if [ -d ${KUBELET_HOME}/kubelet/pods/$podid/volume-subpaths/ ]; then
        mountpath=$(mount | grep ${KUBELET_HOME}/kubelet/pods/$podid/volume-subpaths/ | awk '{print $3}')
        for mntPath in $mountpath;
        do
            umount $mntPath
        done
        rm -rf ${KUBELET_HOME}/kubelet/pods/$podid/volume-subpaths
    fi
    # If a CSI volume is still mounted, stop and let the operator handle it manually.
    csiMounts=$(mount | grep "${KUBELET_HOME}/kubelet/pods/$podid/volumes/kubernetes.io~csi")
    if [ "$csiMounts" != "" ]; then
        echo "csi is mounted at: $csiMounts"
        exit 1
    else
        rm -rf ${KUBELET_HOME}/kubelet/pods/$podid/volumes/kubernetes.io~csi
    fi
    # Remove each per-plugin volume directory only if it is already empty.
    volumeTypes=$(ls ${KUBELET_HOME}/kubelet/pods/$podid/volumes/)
    for volumeType in $volumeTypes;
    do
        subVolumes=$(ls -A ${KUBELET_HOME}/kubelet/pods/$podid/volumes/$volumeType)
        if [ "$subVolumes" != "" ]; then
            echo "${KUBELET_HOME}/kubelet/pods/$podid/volumes/$volumeType still contains volumes: $subVolumes"
            exit 1
        else
            rmdir ${KUBELET_HOME}/kubelet/pods/$podid/volumes/$volumeType
        fi
    done
done
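Once the leftover mounts and empty volume directories are gone, kubelet's next housekeeping pass should be able to remove the remaining pod directories itself, and the error messages should stop. A quick way to confirm (assuming kubelet runs as a systemd unit):

#!/bin/bash
# Watch the kubelet log for a while; no new "orphaned pod ... volume paths are
# still present on disk" errors should appear for the cleaned-up UIDs.
journalctl -u kubelet -f | grep -i "orphaned pod"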
References