Translated on 2015/01/23 22:01
When building Docker containers, you should be aware of the PID 1 zombie reaping problem. That problem can cause unexpected and obscure-looking issues when you least expect it. This article explains the PID 1 problem, explains how you can solve it, and presents a pre-built solution that you can use: Baseimage-docker.
When done, you may want to read part 2: Baseimage-docker, fat containers and “treating containers as VMs”.
About a year ago — back in the Docker 0.6 days — we first introduced Baseimage-docker. This is a minimal Ubuntu base image that is modified for Docker-friendliness. Other people can pull Baseimage-docker from the Docker Registry and use it as a base image for their own images.
We were early adopters of Docker, using Docker for continuous integration and for building development environments way before Docker hit 1.0. We developed Baseimage-docker in order to solve some problems with the way Docker works. For example, Docker does not run processes under a special init process that properly reaps child processes, so that it is possible for the container to end up with zombie processes that cause all sorts of trouble. Docker also does not do anything with syslog so that it’s possible for important messages to get silently swallowed, etcetera.
However, we’ve found that a lot of people have trouble understanding the problems that we’re solving. Granted, these are low-level Unix operating system mechanisms that few people know about or understand. So in this blog article we will describe the most important problem that we’re solving, the PID 1 zombie reaping problem, in detail.
We figured that:
The problems that we solved are applicable to a lot of people.
Most people are not even aware of these problems, so things can break in unexpected ways (Murphy’s law).
It’s inefficient if everybody has to solve these problems over and over.
So in our spare time we extracted our solution into a reusable base image that everyone can use: Baseimage-docker. This image also adds a bunch of useful tools that we believe most Docker image developers would need. We use Baseimage-docker as a base image for all our Docker images.
The community seemed to like what we did: we are the most popular third-party image on the Docker Registry, only ranking below the official Ubuntu and CentOS images.
Recall that Unix processes are ordered in a tree. Each process can spawn child processes, and each process has a parent except for the top-most process.
This top-most process is the init process. It is started by the kernel when you boot your system. This init process is responsible for starting the rest of the system, such as starting the SSH daemon, starting the Docker daemon, starting Apache/Nginx, starting your GUI desktop environment, etc. Each of them may in turn spawn further child processes.
Nothing special so far. But consider what happens if a process terminates. Let’s say that the bash (PID 5) process terminates. It turns into a so-called “defunct process”, also known as a “zombie process”.
Why does this happen? It’s because Unix is designed in such a way that parent processes must explicitly “wait” for child process termination, in order to collect its exit status. The zombie process exists until the parent process has performed this action, using the waitpid() family of system calls. I quote from the man page:
“A child that terminates, but has not been waited for becomes a “zombie”. The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child.”
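To make the man page's definition concrete, here is a minimal Python sketch, assuming a Linux system where /proc is mounted. The helper names child_state and demo_zombie are invented for this illustration. The child exits right away, the parent observes it in state Z (zombie), and only the waitpid() call collects the exit status and removes the entry.

```python
import os
import time


def child_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat", errors="replace") as f:
        stat = f.read()
    # Layout is "pid (command) state ..."; split after the last ")" because
    # the command name itself may contain spaces or parentheses.
    return stat.rpartition(")")[2].split()[0]


def demo_zombie(exit_code=7, timeout=5.0):
    """Fork a child that exits at once; return (state before reaping, exit code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: terminate immediately

    # Parent: poll until the kernel reports the terminated child as a zombie.
    deadline = time.monotonic() + timeout
    state = child_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = child_state(pid)

    # waitpid() collects the exit status; this is what removes the zombie.
    _, status = os.waitpid(pid, 0)
    return state, os.WEXITSTATUS(status)
```

Calling demo_zombie() returns the state seen before reaping together with the child's exit code. If the parent never called waitpid(), the entry would stay in state Z for as long as the parent kept running.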
In everyday language, people consider “zombie processes” to be simply runaway processes that cause havoc. But formally speaking, from a Unix operating system point of view, zombie processes have a very specific definition. They are processes that have terminated but have not (yet) been waited for by their parent processes.
Most of the time this is not a problem. The action of calling waitpid() on a child process in order to eliminate its zombie is called “reaping”. Many applications reap their child processes correctly. In the above example with sshd, if bash terminates then the operating system will send a SIGCHLD signal to sshd to wake it up. Sshd notices this and reaps the child process.
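The SIGCHLD pattern described here can be sketched in Python as follows. This is an illustration only, not how sshd is implemented; the names reaped, on_sigchld and spawn_and_reap are made up for the example, and it assumes the code runs in the main thread, since Python only allows signal handlers to be installed there. The handler loops with WNOHANG because several children may exit before the handler runs once.

```python
import os
import signal
import time

reaped = {}  # pid -> raw wait status, filled in by the SIGCHLD handler


def on_sigchld(signum, frame):
    """Reap every child that has already terminated, without blocking."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # this process has no children at all
        if pid == 0:
            return  # children exist, but none of them has exited yet
        reaped[pid] = status


def spawn_and_reap(exit_code=3, timeout=5.0):
    """Fork a child and let the SIGCHLD handler reap it; return its exit code."""
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(exit_code)  # child: terminate immediately
        deadline = time.monotonic() + timeout
        while pid not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)  # stand-in for the parent's normal work
        status = reaped.get(pid)
        return None if status is None else os.WEXITSTATUS(status)
    finally:
        # Restore whatever handler was installed before the demo.
        if previous is None:
            previous = signal.SIG_DFL
        signal.signal(signal.SIGCHLD, previous)
```

The parent never calls waitpid() in its main loop. The kernel's SIGCHLD notification triggers the reaping, which is the same division of labour the sshd example describes.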
But there is a special case. Suppose the parent process terminates, either intentionally (because the program logic has determined that it should exit), or caused by a user action (e.g. the user killed the process). What happens then to its children? They no longer have a parent process, so they become “orphaned” (this is the actual technical term).
And this is where the init process kicks in. The init process — PID 1 — has a special task. Its task is to “adopt” orphaned child processes (again, this is the actual technical term). This means that the init process becomes the parent of such processes, even though those processes were never created directly by the init process.
Consider Nginx as an example, which daemonizes into the background by default. This works as follows. First, Nginx creates a child process. Second, the original Nginx process exits. Third, the Nginx child process is adopted by the init process.
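The three steps can be reproduced with a short Python sketch. It is a simplified stand-in for what a daemonizing program does, not Nginx's actual code, and the function name orphan_new_parent is invented here. A process creates a child and exits, and the orphaned child then reports the PID of whoever adopted it. On many systems that is PID 1, but Linux can also hand orphans to a process that registered itself as a "child subreaper", so the sketch makes no assumption about the exact number.

```python
import os
import time


def orphan_new_parent(timeout=5.0):
    """Return (pid of the exited parent, pid of the process that adopted the orphan)."""
    read_fd, write_fd = os.pipe()
    middle = os.fork()
    if middle == 0:
        # Step 1: the "original" process creates a child ...
        os.close(read_fd)
        original = os.getpid()
        if os.fork() == 0:
            # The child waits until its parent is gone, then reports which
            # process the kernel assigned as its new parent (step 3).
            deadline = time.monotonic() + timeout
            while os.getppid() == original and time.monotonic() < deadline:
                time.sleep(0.01)
            os.write(write_fd, f"{original} {os.getppid()}".encode())
            os._exit(0)
        # Step 2: ... and then exits, orphaning that child.
        os._exit(0)

    os.close(write_fd)
    os.waitpid(middle, 0)        # reap the exited "original" process
    data = os.read(read_fd, 64)  # blocks until the orphan has reported
    os.close(read_fd)
    old_parent, new_parent = (int(x) for x in data.split())
    return old_parent, new_parent
```

After the orphan itself exits, it becomes a zombie that only its adopter can reap.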
You may see where I am going. The operating system kernel handles adoption automatically, which means it also gives the init process a special responsibility: the operating system expects the init process to reap adopted children too.
This is a very important responsibility in Unix systems. It is such a fundamental responsibility that many pieces of software are written to make use of this. Pretty much all daemon software expects that daemonized child processes are adopted and reaped by init.
Although I used daemons as an example, this is in no way limited to just daemons. Every time a process exits even though it has child processes, it’s expecting the init process to perform the cleanup later on. This is described in detail in two very good books: Operating System Concepts by Silberschatz et al., and Advanced Programming in the UNIX Environment by Stevens et al.
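As a rough idea of what the reaping responsibility amounts to in code, here is a deliberately minimal Python sketch. It is not taken from any real init system, and the names exit_code_from and run_and_reap are hypothetical. It starts one main command and then keeps waiting for any child to terminate, so adopted children get reaped along with the direct one. A production init does considerably more, for example forwarding signals to its children; that is left out here.

```python
import os
import sys


def exit_code_from(status):
    """Translate a raw wait status into a shell-style exit code."""
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 128 + os.WTERMSIG(status)  # child was killed by a signal


def run_and_reap(argv):
    """Start argv as the main child, reap every child that exits, return its exit code."""
    if not argv:
        raise ValueError("no command given")
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)  # command could not be started

    while True:
        # os.wait() blocks until any child terminates. When this process is
        # PID 1, "any child" includes the orphans it has adopted.
        pid, status = os.wait()
        if pid == main_pid:
            return exit_code_from(status)
        # Otherwise an adopted process was reaped; keep waiting.
```

A wrapper script could end with sys.exit(run_and_reap(sys.argv[1:])) so that the main command's exit code becomes the wrapper's own. The design choice worth noting is the unconditional os.wait(): the loop does not need to know which processes it adopted in order to reap them.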
Why are zombie processes a bad thing, even though they’re terminated processes? Surely the original application memory has already been freed, right? Is it anything more than just an entry that you see in ps?
You’re right, the original application memory has been freed. But the fact that you still see it in ps means that it’s still taking up some kernel resources. I quote the Linux waitpid man page:
“As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes.”
So how does this relate to Docker? Well, we see that a lot of people run only one process in their container, and they think that when they run this single process, they’re done. But most likely, this process is not written to behave like a proper init process. That is, instead of properly reaping adopted processes, it’s probably expecting another init process to do that job, and rightly so.
Let’s look at a concrete example. Suppose that your container contains a web server that runs a CGI script that’s written in bash. The CGI script calls grep. Then the web server decides that the CGI script is taking too long and kills the script, but grep is not affected and keeps running. Because its parent is gone, grep is adopted by PID 1 (the web server). When grep finishes, it becomes a zombie. The web server doesn’t know about grep, so it doesn’t reap it, and the grep zombie stays in the system.