When building Docker containers, you should be aware of the PID 1 zombie reaping problem. That problem can cause unexpected and obscure-looking issues when you least expect it. This article explains the PID 1 problem, explains how you can solve it, and presents a pre-built solution that you can use: Baseimage-docker.

When done, you may want to read part 2: Baseimage-docker, fat containers and “treating containers as VMs”.

Introduction

About a year ago — back in the Docker 0.6 days — we first introduced Baseimage-docker. This is a minimal Ubuntu base image that is modified for Docker-friendliness. Other people can pull Baseimage-docker from the Docker Registry and use it as a base image for their own images.

We were early adopters of Docker, using Docker for continuous integration and for building development environments way before Docker hit 1.0. We developed Baseimage-docker in order to solve some problems with the way Docker works. For example, Docker does not run processes under a special init process that properly reaps child processes, so that it is possible for the container to end up with zombie processes that cause all sorts of trouble. Docker also does not do anything with syslog so that it’s possible for important messages to get silently swallowed, etcetera.

However, we’ve found that a lot of people have trouble understanding the problems that we’re solving. Granted, these are low-level Unix operating system mechanisms that few people know about or understand. So in this blog article we will describe in detail the most important problem that we’re solving: the PID 1 zombie reaping problem.

Zombies

We figured that:

  1. The problems that we solved are applicable to a lot of people.

  2. Most people are not even aware of these problems, so things can break in unexpected ways (Murphy’s law).

  3. It’s inefficient if everybody has to solve these problems over and over.

So in our spare time we extracted our solution into a reusable base image that everyone can use: Baseimage-docker. This image also adds a bunch of useful tools that we believe most Docker image developers would need. We use Baseimage-docker as a base image for all our Docker images.

The community seemed to like what we did: we are the most popular third-party image on the Docker Registry, only ranking below the official Ubuntu and CentOS images.

The PID 1 problem: reaping zombies

Recall that Unix processes are ordered in a tree. Each process can spawn child processes, and each process has a parent except for the top-most process.

This top-most process is the init process. It is started by the kernel when you boot your system. This init process is responsible for starting the rest of the system, such as starting the SSH daemon, starting the Docker daemon, starting Apache/Nginx, starting your GUI desktop environment, etc. Each of them may in turn spawn further child processes.

Unix process hierarchy

Nothing special so far. But consider what happens if a process terminates. Let’s say that the bash (PID 5) process terminates. It turns into a so-called “defunct process”, also known as a “zombie process”.

Zombie process

Why does this happen? It’s because Unix is designed in such a way that parent processes must explicitly “wait” for child process termination, in order to collect its exit status. The zombie process exists until the parent process has performed this action, using the waitpid() family of system calls. I quote from the man page:

“A child that terminates, but has not been waited for becomes a “zombie”. The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child.”

In everyday language, people consider “zombie processes” to be simply runaway processes that cause havoc. But formally speaking, from a Unix operating system point of view, zombie processes have a very specific definition. They are processes that have terminated but have not (yet) been waited for by their parent processes.
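
To make that definition concrete, here is a minimal Python sketch. It is Linux-only because it reads /proc, and the helper names proc_state and make_zombie_then_reap are hypothetical, chosen only for illustration. The parent forks a child that exits right away, uses WNOWAIT to let the child terminate without collecting it, observes the zombie state, and only then reaps it with waitpid().

```python
import os


def proc_state(pid):
    """Return the one-letter process state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' and take the first field after it.
    return data.rsplit(")", 1)[1].split()[0]


def make_zombie_then_reap():
    """Hypothetical demo: create a zombie, look at it, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(7)  # child: terminate immediately with exit status 7

    # Block until the child has terminated, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)
    state_before = proc_state(pid)  # the child is now a zombie

    # Reap: collecting the exit status removes the process table entry.
    _, status = os.waitpid(pid, 0)
    return state_before, os.WEXITSTATUS(status)


if __name__ == "__main__":
    state, code = make_zombie_then_reap()
    print(f"state before reaping: {state}, exit status: {code}")
```

Between the waitid() call and the waitpid() call the child is exactly what the man page describes: terminated, but not yet waited for.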

Most of the time this is not a problem. Calling waitpid() on a child process in order to eliminate its zombie is called “reaping”. Many applications reap their child processes correctly. In the above example with sshd, if bash terminates then the operating system will send a SIGCHLD signal to sshd to wake it up. Sshd notices this and reaps the child process.

Zombie process reaping
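
The SIGCHLD pattern described above can be sketched in a few lines of Python. This is not how sshd is implemented; it is only an illustration of the general technique, and spawn_and_reap is a hypothetical name. The handler loops with WNOHANG because several children may exit before the handler gets a chance to run.

```python
import os
import signal
import time


def spawn_and_reap(count=3, timeout=5.0):
    """Fork `count` children and reap them from a SIGCHLD handler."""
    reaped = {}  # pid -> exit status

    def on_sigchld(signum, frame):
        # Pending SIGCHLD signals are not queued one per child,
        # so keep reaping until nothing is left.
        while True:
            try:
                pid, status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return  # no children left at all
            if pid == 0:
                return  # children exist, but none has exited yet
            if os.WIFEXITED(status):
                reaped[pid] = os.WEXITSTATUS(status)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        for code in range(count):
            if os.fork() == 0:
                os._exit(code)  # child: exit with a distinct status
        deadline = time.monotonic() + timeout
        while len(reaped) < count and time.monotonic() < deadline:
            time.sleep(0.01)  # the handler runs while we wait here
    finally:
        signal.signal(signal.SIGCHLD, previous)  # restore the old handler
    return reaped
```

Calling spawn_and_reap(3) returns a dictionary that maps each child PID to its exit status once all three children have been reaped.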

But there is a special case. Suppose the parent process terminates, either intentionally (because the program logic has determined that it should exit), or caused by a user action (e.g. the user killed the process). What happens then to its children? They no longer have a parent process, so they become “orphaned” (this is the actual technical term).

And this is where the init process kicks in. The init process — PID 1 — has a special task. Its task is to “adopt” orphaned child processes (again, this is the actual technical term). This means that the init process becomes the parent of such processes, even though those processes were never created directly by the init process.

Consider Nginx as an example, which daemonizes into the background by default. This works as follows. First, Nginx creates a child process. Second, the original Nginx process exits. Third, the Nginx child process is adopted by the init process.

Orphaned process adoption
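
The three daemonization steps above can be observed with a small Python sketch. The function name demo_adoption is hypothetical. One assumption worth stating: on modern Linux the adopter is not always PID 1, because a process can register itself as a “subreaper”, so the sketch only checks that the parent changed rather than that it became 1.

```python
import os
import time


def demo_adoption():
    """Return (intermediate_pid, new_parent_pid_seen_by_the_orphan)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Step 1: the intermediate process creates a child.
        try:
            os.close(read_fd)
            me = os.getpid()
            if os.fork() == 0:
                # Orphan-to-be: wait until the original parent is gone.
                while os.getppid() == me:
                    time.sleep(0.01)
                # Step 3: report who adopted us.
                os.write(write_fd, str(os.getppid()).encode())
        finally:
            # Step 2: the intermediate process exits (so does the orphan,
            # once it has reported its new parent).
            os._exit(0)

    os.close(write_fd)
    os.waitpid(middle, 0)  # reap the intermediate process ourselves
    with os.fdopen(read_fd) as pipe:
        new_parent = int(pipe.read())  # returns once the orphan has exited
    return middle, new_parent


if __name__ == "__main__":
    middle, new_parent = demo_adoption()
    print(f"original parent: {middle}, adopted by: {new_parent}")
```

After the orphan exits, it is the adopting process, not our script, that has to reap it.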

You may see where I am going. The operating system kernel automatically handles adoption, so this means that the kernel expects the init process to have a special responsibility: the operating system expects the init process to reap adopted children too.
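
What that responsibility looks like in code can be sketched as follows. This is a deliberately minimal illustration, not Baseimage-docker's implementation: run_as_init is a hypothetical name, and the sketch covers only reaping, leaving out everything else a real init does. It starts one main command and then reaps every child that exits, whether it spawned that child or merely adopted it.

```python
import os


def run_as_init(argv):
    """Start argv as the main child, reap every child, return its exit code."""
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)  # replace the child with the command
        finally:
            os._exit(127)  # only reached if exec failed

    while True:
        # Blocks until any child exits. When this process really is PID 1,
        # that includes orphans the kernel handed over to it.
        pid, status = os.wait()
        if pid != main_pid:
            continue  # an adopted child was reaped; keep going
        if os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        return 128 + os.WTERMSIG(status)  # shell-style code for a signal death
```

For example, run_as_init(["sh", "-c", "exit 3"]) returns 3. The important part is the loop: it calls wait() for any child, not only for the one it started.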

This is a very important responsibility in Unix systems. It is so fundamental that a great deal of software is written to rely on it. Pretty much all daemon software expects that daemonized child processes are adopted and reaped by init.

Although I used daemons as an example, this is in no way limited to just daemons. Every time a process exits even though it has child processes, it’s expecting the init process to perform the cleanup later on. This is described in detail in two very good books: Operating System Concepts by Silberschatz et al, and Advanced Programming in the UNIX Environment by Stevens et al.

Why zombie processes are harmful

Why are zombie processes a bad thing, even though they’re terminated processes? Surely the original application memory has already been freed, right? Is it anything more than just an entry that you see in ps?

You’re right, the original application memory has been freed. But the fact that you still see it in ps means that it’s still taking up some kernel resources. I quote the Linux waitpid man page:

“As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.”
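
If you want to check whether zombies are piling up, you can look for the Z state in ps, or scan /proc directly. The following is a minimal Linux-only sketch; find_zombies is a hypothetical helper name, and the parsing assumes the usual "pid (command) state ppid ..." layout of /proc/<pid>/stat.

```python
import os


def find_zombies():
    """Return [(pid, ppid, command)] for every zombie visible in /proc."""
    zombies = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(f"/proc/{entry}/stat", errors="replace") as f:
                data = f.read()
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process vanished or is inaccessible; skip it
        # The command may contain spaces or parentheses,
        # so split on the last ')' before reading the other fields.
        head, tail = data.rsplit(")", 1)
        command = head.split("(", 1)[1]
        state, ppid = tail.split()[:2]
        if state == "Z":
            zombies.append((int(entry), int(ppid), command))
    return zombies


if __name__ == "__main__":
    for pid, ppid, command in find_zombies():
        print(f"zombie {pid} ({command}) is waiting for parent {ppid}")
```

The parent PID is the useful part of the output: it tells you which process is failing to reap.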

Relationship with Docker

So how does this relate to Docker? Well, we see that a lot of people run only one process in their container, and they think that when they run this single process, they’re done. But most likely, this process is not written to behave like a proper init process. That is, instead of properly reaping adopted processes, it’s probably expecting another init process to do that job, and rightly so.

Let’s look at a concrete example. Suppose that your container contains a web server that runs a CGI script that’s written in bash. The CGI script calls grep. Then the web server decides that the CGI script is taking too long and kills the script, but grep is not affected and keeps running. Because its parent is gone, grep is adopted by PID 1 (the web server). When grep finishes, it becomes a zombie. The web server doesn’t know about grep, so it doesn’t reap it, and the grep zombie stays in the system.
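
This scenario can be reproduced outside a container with a rough Python simulation. Several things here are assumptions made for illustration: the “web server”, “CGI script” and “grep” are plain forked Python processes, all function names are hypothetical, and because the script is not really PID 1 it uses the Linux prctl option PR_SET_CHILD_SUBREAPER (called through ctypes) to make the kernel hand orphans to it the way it would to PID 1.

```python
import ctypes
import os
import signal
import time

PR_SET_CHILD_SUBREAPER = 36  # constant from <linux/prctl.h>


def set_subreaper(enabled):
    """Ask the kernel to reparent orphaned descendants to this process."""
    libc = ctypes.CDLL(None, use_errno=True)
    flag = ctypes.c_ulong(1 if enabled else 0)
    zero = ctypes.c_ulong(0)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, flag, zero, zero, zero) != 0:
        raise OSError(ctypes.get_errno(), "prctl failed")


def state_and_ppid(pid):
    """Read the state and parent PID fields from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        state, ppid = f.read().rsplit(")", 1)[1].split()[:2]
    return state, int(ppid)


def simulate_cgi_zombie():
    set_subreaper(True)  # stand-in for being PID 1 in a container
    read_fd, write_fd = os.pipe()
    script = os.fork()
    if script == 0:  # the "CGI script"
        try:
            worker = os.fork()
            if worker == 0:  # the "grep" that the script calls
                time.sleep(0.3)
                os._exit(0)
            os.write(write_fd, str(worker).encode())
            signal.pause()  # the script hangs until it is killed
        finally:
            os._exit(0)
    try:
        os.close(write_fd)
        worker = int(os.read(read_fd, 32))
        os.close(read_fd)
        os.kill(script, signal.SIGKILL)  # the "web server" kills the script
        os.waitpid(script, 0)  # and reaps the child it knows about
        # The worker now belongs to us. Wait for it to finish
        # without reaping it, which is what an unaware server does.
        os.waitid(os.P_PID, worker, os.WEXITED | os.WNOWAIT)
        state, ppid = state_and_ppid(worker)
        os.waitpid(worker, 0)  # clean up after the demonstration
        return state, ppid == os.getpid()
    finally:
        set_subreaper(False)
```

simulate_cgi_zombie() returns the worker's state as seen just before cleanup, together with a flag that says whether the simulated server had become its parent. A real web server would never make that final waitpid() call, so the zombie would remain.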
