Implementing Graceful Server Restarts in Go

Submitted by oschina on 2014/12/20 08:59 (7 paragraphs, translation completed on 12-24)

Go was designed as a backend language and is mostly used as such. Servers are the most common type of software produced with it. The question I’m going to answer here is: how do you cleanly upgrade a running server?


Goals:

  • Do not close any of the existing connections: for instance, we don’t want to cut off a running deployment. However, we want to be able to upgrade our services whenever we want, without any constraint.

  • The socket should always be available to users: if the socket is unavailable at any moment, some users may get a ‘connection refused’ message, which is not acceptable.

  • The new version of the process should be started and should replace the old one.


Principle

In UNIX-based operating systems, the common way to interact with long-running processes is through signals.

  • SIGTERM: Request a process to stop gracefully

  • SIGHUP: Process restart/reload (example: nginx, sshd, apache)

Once a SIGHUP signal is received, there are several steps to restart the process gracefully:

  1. The server stops accepting new connections, but the socket is kept open.

  2. The new version of the process is started.

  3. The socket is ‘given’ to the new process which will start accepting new connections.

  4. Once the old process has finished serving its clients, it has to stop.


Stop accepting connections

Servers have this in common: they contain an infinite loop accepting connections:

for {
  conn, err := listener.Accept()
  // Handle connection
}

The easiest way to break this loop is to set a deadline on the listener: once listener.SetDeadline(time.Now()) is called, listener.Accept() immediately returns a timeout error that you can catch and handle.

for {
  conn, err := listener.Accept()
  if err != nil {
    // A timeout means the deadline has been reached: leave the loop
    if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
      fmt.Println("Stop accepting connections")
      return
    }
  }
  // Otherwise, handle conn
}

It is important to understand that this operation is different from closing the listener. Here the process is still listening on its port, but new connections are queued by the operating system’s network stack, waiting for a process to accept them.


Start the new version of the process

Go provides a ForkExec primitive to spawn a new process. (By the way, it does not let you fork without exec; see ‘Is it safe to fork() a Golang process?’) You can share some pieces of information with this new process, such as file descriptors or your environment.

execSpec := &syscall.ProcAttr{
  Env:   os.Environ(),
  Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()},
}
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)
[…]

You can see that the process starts a new version of itself with exactly the same arguments, os.Args.


Send socket to child process and recover it

As you’ve just seen, you can pass file descriptors to your new process. With a bit of UNIX magic (everything is a file), we can send the socket to the new process, which can then use it to accept both the waiting and the future connections.

But the fork-exec’ed process needs to know that it has to get its socket from a file instead of creating a new one (the address would already be in use anyway, as we haven’t closed the existing listener). You can tell it any way you want; the most common ways are an environment variable or a command-line flag.

listenerFile, err := listener.File()
if err != nil {
  log.Fatalln("Fail to get socket file descriptor:", err)
}
listenerFd := listenerFile.Fd()

// Set a flag so the new process knows it was started by a restart
os.Setenv("_GRACEFUL_RESTART", "true")

execSpec := &syscall.ProcAttr{
  Env:   os.Environ(),
  Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), listenerFd},
}
// Fork exec the new version of your server
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)

Then at the beginning of the program:

var listener *net.TCPListener
var err error
if os.Getenv("_GRACEFUL_RESTART") == "true" {
  // File descriptor 3 is the listener passed by the parent process
  file := os.NewFile(3, "/tmp/sock-go-graceful-restart")
  fileListener, err := net.FileListener(file)
  if err != nil {
    // handle
  }
  // Assign with "=" so the outer listener is not shadowed
  var ok bool
  listener, ok = fileListener.(*net.TCPListener)
  if !ok {
    // handle
  }
} else {
  listener, err = newListenerWithPort(12345)
}

File descriptor 3 was not chosen at random: in the slice of uintptr passed to the forked process, the listener is at index 3. Be careful with variable shadowing mistakes when assigning the listener.


Last step, wait for the old server connections to stop

At this point we have passed the buck to another process, which is now running correctly. The last operation for the old server is to wait until its connections are closed. There is a simple way to implement this in Go, thanks to the sync.WaitGroup structure provided by the standard library.

Each time a connection is accepted, 1 is added to the WaitGroup, then, we decrease the counter when it’s done:

for {
  conn, err := listener.Accept()

  wg.Add(1)
  go func() {
    handle(conn)
    wg.Done()
  }()
}

As a result, to wait for the end of the connections, you just have to call wg.Wait(): as there are no new connections, it blocks until wg.Done() has been called by every running handler.
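
Here is a self-contained sketch of that drain step, under a few assumptions: serve and drainDemo are hypothetical names, the handler only sleeps to simulate work, and a short listener deadline stands in for the moment the server stops accepting.

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"sync/atomic"
	"time"
)

// serve accepts connections until Accept fails (for example on a deadline),
// registering every handler in wg so the caller can wait for them.
func serve(l net.Listener, wg *sync.WaitGroup, handle func(net.Conn)) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			return err
		}
		wg.Add(1) // register before the goroutine starts so Wait cannot miss it
		go func() {
			defer wg.Done()
			defer conn.Close()
			handle(conn)
		}()
	}
}

// drainDemo connects one client, stops accepting after a short deadline,
// waits for the handlers and returns how many of them completed.
func drainDemo() (int32, error) {
	l, err := net.ListenTCP("tcp", &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return 0, err
	}
	defer l.Close()

	client, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return 0, err
	}
	defer client.Close()

	// Stop accepting shortly after the pending client has been picked up.
	if err := l.SetDeadline(time.Now().Add(200 * time.Millisecond)); err != nil {
		return 0, err
	}

	var wg sync.WaitGroup
	var handled int32
	// serve returns the timeout error once the deadline expires; it is
	// expected here, so it is ignored.
	_ = serve(l, &wg, func(net.Conn) {
		time.Sleep(50 * time.Millisecond) // pretend to do some work
		atomic.AddInt32(&handled, 1)
	})

	wg.Wait() // returns once every running handler has called Done
	return atomic.LoadInt32(&handled), nil
}

func main() {
	n, err := drainDemo()
	fmt.Println("handlers completed:", n, "error:", err)
}
```

Calling wg.Add(1) in the accept loop rather than inside the goroutine matters: otherwise wg.Wait() could run before a freshly accepted connection has been counted.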


Bonus: don’t wait indefinitely, but for a given amount of time

With a time.Timer, it’s really straightforward to implement this:

timeout := time.NewTimer(time.Minute)
wait := make(chan struct{})
go func() {
  wg.Wait()
  wait <- struct{}{}
}()

select {
case <-timeout.C:
  return WaitTimeoutError
case <-wait:
  return nil
}

Complete example

Most of the code snippets in this article have been extracted from the complete example I’ve developed to illustrate this blog post: https://github.com/Scalingo/go-graceful-restart-example

Conclusion

Using ForkExec with socket passing is a really efficient way to upgrade a process without disturbing connections. At most, new clients will wait a few milliseconds, the time it takes the new server to boot up and get the socket back, and this amount of time is really short.

This article is part of our #FridayTechnical series. There won’t be any article next week: merry Christmas, everybody.


— Léo Unbekandt CTO @ Appsdeck


Comments (30)

小莫0: Thanks for sharing the translation.

Klain: A ready-made open-source implementation: https://github.com/facebookgo/grace

zoujiaqing: What does this have to do with the language...

独孤影: mark

looksgood: mark

_西门吹泡泡: mark

我的ID是jmjoy: Very graceful Go!

yibei: Testing SIGTERM on CentOS: why does the parent process fail to exit?

20010 1 0 13:25 pts/1 00:00:00 ./ping
21215 20010 0 13:36 pts/1 00:00:00 ./ping
21601 19241 0 13:39 pts/4 00:00:00 grep ping

Doer: mark

kevinxu001: This applies to UNIX systems.