Go has been designed as a backend language and is mostly used as such. Servers are the most common type of software produced with it. The question I’m going to answer here is: how to cleanly upgrade a running server?

Goals:

  • Do not close any existing connection: for instance, we don’t want to cut off a running deployment. At the same time, we want to be able to upgrade our services whenever we like, without constraint.

  • The socket should always be available to users: if it is unavailable at any moment, some users may get a ‘connection refused’ message, which is not acceptable.

  • The new version of the process should be started and should replace the old one.

Principle

In UNIX-based operating systems, signals are the common way to interact with long-running processes.

  • SIGTERM: Request a process to stop gracefully

  • SIGHUP: Process restart/reload (example: nginx, sshd, apache)

Once a SIGHUP signal is received, there are several steps to restart the process gracefully:

  1. The server stops accepting new connections, but the socket is kept open.

  2. The new version of the process is started.

  3. The socket is ‘given’ to the new process which will start accepting new connections.

  4. Once the old process has finished serving its clients, it stops.

Stop accepting connections

Servers have this in common: they contain an infinite loop accepting connections:

for {
  conn, err := listener.Accept()
  // Handle connection
}

To break this loop, the easiest way is to set a deadline on the listener: when listener.SetDeadline(time.Now()) is called, listener.Accept() instantly returns a timeout error that you can catch and handle.

for {
  conn, err := listener.Accept()
  if err != nil {
    // A timeout error means the deadline set on the listener has expired
    if nerr, ok := err.(net.Error); ok && nerr.Timeout() {
      fmt.Println("Stop accepting connections")
      return
    }
  }
  // Handle connection
}

It is important to understand that this operation is different from closing the listener. Here the process still listens on its port, but new connections are queued by the operating system’s network stack until a process accepts them.

Start the new version of the process

Go provides a ForkExec primitive to spawn a new process. (It does not let you fork without exec, by the way; see “Is it safe to fork() a Golang process?”) You can share some pieces of information with this new process, such as file descriptors or your environment.

execSpec := &syscall.ProcAttr{
  Env:   os.Environ(),
  Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd()},
}
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)
[…]

You can see that the process starts a new version of itself with exactly the same arguments, os.Args.

Send socket to child process and recover it

As you have just seen, you can pass file descriptors to the new process. With a bit of UNIX magic (everything is a file), we can send the socket to the new process, which can then use it to accept both the waiting and the future connections.

But the fork-exec’ed process has to know that it must get its socket from a file instead of building a new one (which would fail anyway because the address is already in use, as we haven’t closed the existing listener). You can signal this any way you want; the most common options are an environment variable or a command-line flag.

listenerFile, err := listener.File()
if err != nil {
  log.Fatalln("Fail to get socket file descriptor:", err)
}
listenerFd := listenerFile.Fd()

// Set a flag telling the new process that it is started by a graceful restart
os.Setenv("_GRACEFUL_RESTART", "true")

execSpec := &syscall.ProcAttr{
  Env:   os.Environ(),
  Files: []uintptr{os.Stdin.Fd(), os.Stdout.Fd(), os.Stderr.Fd(), listenerFd},
}
// Fork exec the new version of your server
fork, err := syscall.ForkExec(os.Args[0], os.Args, execSpec)

Then at the beginning of the program:

var listener *net.TCPListener
var err error
if os.Getenv("_GRACEFUL_RESTART") == "true" {
  // File descriptor 3 is the listener passed by the parent process
  file := os.NewFile(3, "/tmp/sock-go-graceful-restart")
  // Use a distinct name so the outer listener is not shadowed
  fileListener, err := net.FileListener(file)
  if err != nil {
    // handle
  }
  var ok bool
  listener, ok = fileListener.(*net.TCPListener)
  if !ok {
    // handle
  }
} else {
  listener, err = newListenerWithPort(12345)
}

File descriptor 3 has not been chosen at random: in the slice of uintptr passed to the forked process, the listener is at index 3. Be careful with variable shadowing mistakes when you assign the listener.

Last step, wait for the old server connections to stop

At this point we have passed the buck to another process, which is now running correctly. The last operation for the old server is to wait until its connections are closed. There is a simple way to implement this in Go, thanks to the sync.WaitGroup structure provided by the standard library.

Each time a connection is accepted, 1 is added to the WaitGroup; the counter is decreased when the handler is done:

for {
  conn, err := listener.Accept()
  wg.Add(1)
  go func() {
    handle(conn)
    wg.Done()
  }()
}

As a result, to wait for the end of the connections you just have to call wg.Wait(). As no new connection is accepted, it returns once wg.Done() has been called by all the running handlers.

Bonus: don’t wait forever, but for a given amount of time

With a time.Timer, it’s really straightforward to implement this:

timeout := time.NewTimer(time.Minute)
wait := make(chan struct{})
go func() {
  wg.Wait()
  wait <- struct{}{}
}()

select {
case <-timeout.C:
  return WaitTimeoutError
case <-wait:
  return nil
}

Complete example

Most of the code snippets in this article have been extracted from the complete example I’ve developed to illustrate this blog post: https://github.com/Scalingo/go-graceful-restart-example

Conclusion

Using ForkExec with socket passing is a really efficient way to upgrade a process without disturbing connections. At worst, new clients wait a few milliseconds, the time for the new server to boot up and get the socket back.

This article is part of our #FridayTechnical series. There won’t be any article next week: merry Christmas, everybody.

— Léo Unbekandt CTO @ Appsdeck
