
Kubelet leaves behind orphaned pods (Orphaned pod) that cannot be deleted

Notes on analyzing and fixing orphaned pods that kubelet leaves behind on a node and cannot delete.

Problem

The kubelet log shows the following errors:


E0823 10:31:01.847946    1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.
E0823 10:31:03.840552    1303 kubelet_volumes.go:140] Orphaned pod "19a4e3e6-a562-11e8-9a25-309c23027882" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.

These errors report an orphaned pod. One entry is printed every 2 seconds, so kubelet floods the system log and makes the rest of the log hard to read.
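Note that kubelet rolls these errors up: each line names only the first orphaned pod and states how many similar errors there were in total, so the UIDs of other orphaned pods may only appear at higher verbosity. To pull the distinct UIDs that do appear out of a log, a sketch like the following can help. It assumes a shell with grep -o support (GNU or BSD grep); the function name orphan_pod_uids is hypothetical.

```shell
#!/bin/bash
# orphan_pod_uids (hypothetical helper): read kubelet log lines on stdin and print
# each distinct pod UID found in an "Orphaned pod" error, one per line.
#   grep -i : the message reads "Orphaned pod" or "orphaned pod" depending on the kubelet version
#   grep -o : print only the matched text, e.g. Orphaned pod "19a4e3e6-..."
#   awk     : take the text between the double quotes (the UID)
#   sort -u : the same UID repeats every 2 seconds, keep one copy
orphan_pod_uids() {
    grep -io 'orphaned pod "[0-9a-f-]*"' | awk -F'"' '{print $2}' | sort -u
}
```

For example, orphan_pod_uids < /var/log/messages, or journalctl -u kubelet | orphan_pod_uids if kubelet runs as a systemd unit named kubelet (an assumption about your setup).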

Analysis

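The cause: Kubernetes has already deleted the pod's record (kubectl get no longer returns it), so when kubelet fetches all pods from its pod manager, this orphaned pod is not among them. However, when kubelet's cleanupOrphanedPodDirs tries to clean up the pod, it finds that the pod's volume directories still exist on disk (or are still mounted). As long as those volume directories are present and cannot be removed, kubelet will not delete the pod directory on the node, and it keeps logging the Orphaned pod error instead.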
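The comparison described above, pod directories on disk versus the pods Kubernetes still knows about, can be approximated from the node. The sketch below assumes the default kubelet root /var/lib/kubelet and a file with one known pod UID per line, for example produced by kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}'. The function name unknown_pod_dirs is hypothetical, and the result is only an approximation of kubelet's own check: static pods, for instance, may have on-disk UIDs that differ from the UIDs of their mirror pods in the API server, so treat the output as candidates to inspect, not as a deletion list.

```shell
#!/bin/bash
# unknown_pod_dirs (hypothetical helper): print the UID of every pod directory
# under PODS_DIR whose name does not appear in KNOWN_UIDS_FILE (one UID per line).
# Usage: unknown_pod_dirs /var/lib/kubelet/pods /tmp/known-uids
unknown_pod_dirs() {
    local pods_dir=$1 known=$2 dir uid
    for dir in "$pods_dir"/*/; do
        [ -d "$dir" ] || continue      # the glob matched nothing
        uid=$(basename "$dir")         # basename drops the trailing slash
        # -q quiet, -x whole-line match, -F fixed string (no regex)
        if ! grep -qxF "$uid" "$known"; then
            echo "$uid"
        fi
    done
}
```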

The check that produces this error is in cleanupOrphanedPodDirs in kubelet_volumes.go (excerpt):
// cleanupOrphanedPodDirs removes the volumes of pods that should not be
// running and that have no containers running.  Note that we roll up logs here since it runs in the main loop.
func (kl *Kubelet) cleanupOrphanedPodDirs(pods []*v1.Pod, runningPods []*kubecontainer.Pod) error {
    // ... (omitted) allPods: UIDs of the pods kubelet knows about (pods plus runningPods);
    // found: UIDs of the pod directories that exist on disk
    for _, uid := range found {
        if allPods.Has(string(uid)) {
            // kubelet still knows this pod, so it is not an orphan
            continue
        }
        // ... (omitted)

        // If there are still volume directories, do not delete directory
        volumePaths, err := kl.getPodVolumePathListFromDisk(uid)
        if err != nil {
            orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but error %v occurred during reading volume dir from disk", uid, err))
            continue
        }
        if len(volumePaths) > 0 {
            orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but volume paths are still present on disk", uid))
            continue
        }

        // If there are any volume-subpaths, do not cleanup directories
        volumeSubpathExists, err := kl.podVolumeSubpathsDirExists(uid)
        if err != nil {
            orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but error %v occurred during reading of volume-subpaths dir from disk", uid, err))
            continue
        }
        if volumeSubpathExists {
            orphanVolumeErrors = append(orphanVolumeErrors, fmt.Errorf("orphaned pod %q found, but volume subpaths are still present on disk", uid))
            continue
        }

        // ... (omitted) no volumes left: remove the orphaned pod directory
    }
    // ... (omitted) log only the first error of each kind, followed by "There were a total of N
    // errors similar to this. Turn up verbosity to see them.", and return the aggregated errors
}
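On a node, the two conditions checked above can be reproduced roughly in shell to see which one keeps a particular pod directory around. This is a sketch only: orphan_blockers is a hypothetical name, and it approximates getPodVolumePathListFromDisk and podVolumeSubpathsDirExists rather than reimplementing them, since their details vary between kubelet versions.

```shell
#!/bin/bash
# orphan_blockers (hypothetical helper): report what would make kubelet skip the
# orphan cleanup of POD_DIR, roughly mirroring the two checks in the excerpt:
#   - any entry under POD_DIR/volumes/<plugin>/   (cf. getPodVolumePathListFromDisk)
#   - an existing POD_DIR/volume-subpaths dir     (cf. podVolumeSubpathsDirExists)
# Returns 0 if nothing blocks cleanup, 1 otherwise.
orphan_blockers() {
    local pod_dir=${1%/} plugin_dir entries blocked=0
    for plugin_dir in "$pod_dir"/volumes/*/; do
        [ -d "$plugin_dir" ] || continue   # no plugin directories at all
        entries=$(ls -A "$plugin_dir")
        if [ -n "$entries" ]; then
            echo "volume paths still present under ${plugin_dir%/}: $entries"
            blocked=1
        fi
    done
    if [ -d "$pod_dir/volume-subpaths" ]; then
        echo "volume-subpaths still present: $pod_dir/volume-subpaths"
        blocked=1
    fi
    return $blocked
}
```

For example, orphan_blockers /var/lib/kubelet/pods/19a4e3e6-a562-11e8-9a25-309c23027882, assuming the default kubelet root.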

Solution

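The usual fix is to delete the orphaned pod's directory directly. Do this with care: the directory may still contain pod data, so first confirm that the pod no longer exists in the cluster and that nothing under its directory is still mounted.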

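# Only clean up if kubectl can no longer find a pod with UID $podid
# Delete the orphaned pod's data directory
KUBELET_HOME=/var/lib
podid=19a4e3e6-a562-11e8-9a25-309c23027882   # pod UID taken from the kubelet error log
rm -rf "${KUBELET_HOME}/kubelet/pods/${podid}"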
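Deleting the directory is only safe when both preconditions hold: the pod is really gone from the cluster, and nothing is still mounted under its directory. The second point matters because rm -rf descends into mounted filesystems, so it would also delete the data on any volume still mounted there. The following sketch (safe_to_remove is a hypothetical helper) checks both, given a file of UIDs still known to the cluster, for example generated with kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.uid}{"\n"}{end}', and a mount table that defaults to /proc/mounts.

```shell
#!/bin/bash
# safe_to_remove (hypothetical helper): succeed only if POD_DIR may be deleted:
#   1. its UID (the directory name) is not listed in KNOWN_UIDS_FILE, and
#   2. nothing is mounted at or below POD_DIR according to MOUNTS_FILE
#      (default /proc/mounts, whose second field is the mount point).
# Usage: safe_to_remove POD_DIR KNOWN_UIDS_FILE [MOUNTS_FILE]
safe_to_remove() {
    local pod_dir=${1%/} known=$2 mounts=${3:-/proc/mounts} uid
    uid=$(basename "$pod_dir")
    if grep -qxF "$uid" "$known"; then
        echo "pod $uid still exists in the cluster" >&2
        return 1
    fi
    # awk exits 0 if some mount point equals POD_DIR or starts with "POD_DIR/"
    if awk -v d="$pod_dir" '$2 == d || index($2, d "/") == 1 { found = 1 } END { exit !found }' "$mounts"; then
        echo "something is still mounted under $pod_dir" >&2
        return 1
    fi
    return 0
}
```

For example, combined with GNU rm's --one-file-system option as a further guard: safe_to_remove "/var/lib/kubelet/pods/$podid" /tmp/known-uids && rm -rf --one-file-system "/var/lib/kubelet/pods/$podid".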

Batch cleanup of orphaned pods

If the system has many orphaned pods to clean up, the following script processes them in batch:

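#!/bin/bash

# This script takes the graceful route: it unmounts the pod's leftover mounts and removes the
# empty volume directories, but never deletes the pod directory itself by hand (it may still
# contain pod data). Once the volume directories are gone, kubelet removes the pod directory
# during its next orphan cleanup pass.

KUBELET_HOME=/var/lib
PODS_DIR="${KUBELET_HOME}/kubelet/pods"
# System log to scan; on Debian/Ubuntu use /var/log/syslog instead
LOGFILE=/var/log/messages

# Collect the UIDs of all orphaned pods reported in the system log.
# grep -i: the message reads "Orphaned pod" or "orphaned pod" depending on the kubelet version.
for podid in $(grep -io 'orphaned pod "[0-9a-f-]*"' "$LOGFILE" | awk -F'"' '{print $2}' | sort -u);
do
    # The pod directory is already gone: nothing to do for this pod
    if [ ! -d "${PODS_DIR}/${podid}" ]; then
        continue
    fi

    # Unmount leftover volume-subpaths bind mounts, then remove the volume-subpaths directory.
    # If any unmount fails, skip this pod so that rm -rf never runs over a mounted volume.
    if [ -d "${PODS_DIR}/${podid}/volume-subpaths/" ]; then
        mountpath=$(mount | grep "${PODS_DIR}/${podid}/volume-subpaths/" | awk '{print $3}')
        for mntPath in $mountpath;
        do
            if ! umount "$mntPath"; then
                echo "failed to unmount $mntPath, skipping pod $podid" >&2
                continue 2
            fi
        done
        rm -rf "${PODS_DIR}/${podid}/volume-subpaths"
    fi

    # A CSI volume that is still mounted needs manual handling: report it and skip this pod
    csiMounts=$(mount | grep "${PODS_DIR}/${podid}/volumes/kubernetes.io~csi")
    if [ -n "$csiMounts" ]; then
        echo "csi is mounted at: $csiMounts, skipping pod $podid" >&2
        continue
    fi
    rm -rf "${PODS_DIR}/${podid}/volumes/kubernetes.io~csi"

    # Remove each volume-type directory (e.g. kubernetes.io~secret), but only if it is empty
    [ -d "${PODS_DIR}/${podid}/volumes" ] || continue
    for volumeType in $(ls "${PODS_DIR}/${podid}/volumes/");
    do
        subVolumes=$(ls -A "${PODS_DIR}/${podid}/volumes/${volumeType}")
        if [ -n "$subVolumes" ]; then
            echo "${PODS_DIR}/${podid}/volumes/${volumeType} still contains volumes: $subVolumes, skipping pod $podid" >&2
            continue 2
        fi
        rmdir "${PODS_DIR}/${podid}/volumes/${volumeType}"
    done
done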
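Because the batch script unmounts and deletes on its own, it can be worth previewing what it would touch first. The read-only sketch below (pod_mounts is a hypothetical helper) lists the mount points under a pod directory, children before their parents, which is the order in which they would have to be unmounted; the mount table defaults to /proc/mounts.

```shell
#!/bin/bash
# pod_mounts (hypothetical helper): read-only preview that prints every mount point
# at or below POD_DIR from MOUNTS_FILE (default /proc/mounts, field 2 = mount point).
# Reverse byte-wise sorting (LC_ALL=C keeps it locale-independent) puts a child path
# such as P/x before its parent P.
# Usage: pod_mounts POD_DIR [MOUNTS_FILE]
pod_mounts() {
    local pod_dir=${1%/} mounts=${2:-/proc/mounts}
    awk -v d="$pod_dir" '$2 == d || index($2, d "/") == 1 { print $2 }' "$mounts" | LC_ALL=C sort -r
}
```

For example: for podid in $(grep -io 'orphaned pod "[0-9a-f-]*"' /var/log/messages | awk -F'"' '{print $2}' | sort -u); do echo "== $podid"; pod_mounts "/var/lib/kubelet/pods/$podid"; done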

References