It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.
And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and a 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.06 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck.
In 1999 one of the busiest ftp sites, cdrom.com, actually handled 10000 clients simultaneously through a Gigabit Ethernet pipe. As of 2001, that same speed is now being offered by several ISPs, who expect it to become increasingly popular with large business customers.
And the thin client model of computing appears to be coming back in style -- this time with the server out on the Internet, serving thousands of clients.
With that in mind, here are a few notes on how to configure operating systems and write code to support thousands of clients. The discussion centers around Unix-like operating systems, as that's my personal area of interest, but Windows is also covered a bit.
In October 2003, Felix von Leitner put together an excellent web page and presentation about network scalability, complete with benchmarks comparing various networking system calls and operating systems. One of his observations is that the 2.6 Linux kernel really does beat the 2.4 kernel, but there are many, many good graphs that will give the OS developers food for thought for some time. (See also the Slashdot comments; it'll be interesting to see whether anyone does followup benchmarks improving on Felix's results.)
If you haven't read it already, go out and get a copy of Unix Network Programming : Networking Apis: Sockets and Xti (Volume 1) by the late W. Richard Stevens. It describes many of the I/O strategies and pitfalls related to writing high-performance servers. It even talks about the 'thundering herd' problem. And while you're at it, go read Jeff Darcy's notes on high-performance server design.
(Another book which might be more helpful for those who are *using* rather than *writing* a web server is Building Scalable Web Sites by Cal Henderson.)
Read Nick Black's excellent Fast UNIX Servers article.
Prepackaged libraries are available that abstract some of the techniques presented below, insulating your code from the operating system and making it more portable.
The following five combinations seem to be popular:
... set nonblocking mode on all network handles, and use select() or poll() to tell which network handle has data waiting. This is the traditional favorite. With this scheme, the kernel tells you whether a file descriptor is ready, whether or not you've done anything with that file descriptor since the last time the kernel told you about it. (The name 'level triggered' comes from computer hardware design; it's the opposite of 'edge triggered'. Jonathon Lemon introduced the terms in his BSDCON 2000 paper on kqueue().)
Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
An important bottleneck in this method is that read() or sendfile() from disk blocks if the page is not in core at the moment; setting nonblocking mode on a disk file handle has no effect. Same thing goes for memory-mapped disk files. The first time a server needs disk I/O, its process blocks, all clients must wait, and that raw nonthreaded performance goes to waste.
This is what asynchronous I/O is for, but on systems that lack AIO, worker threads or processes that do the disk I/O can also get around this bottleneck. One approach is to use memory-mapped files, and if mincore() indicates I/O is needed, ask a worker to do the I/O, and continue handling network traffic. Jef Poskanzer mentions that Pai, Druschel, and Zwaenepoel's 1999 Flash web server uses this trick; they gave a talk at Usenix '99 on it. It looks like mincore() is available in BSD-derived Unixes like FreeBSD and Solaris, but is not part of the Single Unix Specification. It's available as part of Linux as of kernel 2.3.51, thanks to Chuck Lever.
But in November 2003 on the freebsd-hackers list, Vivek Pai et al reported very good results using system-wide profiling of their Flash web server to attack bottlenecks. One bottleneck they found was mincore (guess that wasn't such a good idea after all). Another was the fact that sendfile blocks on disk access; they improved performance by introducing a modified sendfile() that returns something like EWOULDBLOCK when the disk page it's fetching is not yet in core. (Not sure how you tell the user the page is now resident... seems to me what's really needed here is aio_sendfile().) The end result of their optimizations is a SpecWeb99 score of about 800 on a 1GHz/1GB FreeBSD box, which is better than anything on file at spec.org.
There are several ways for a single thread to tell which of a set of nonblocking sockets are ready for I/O:
Some OS's (e.g. Solaris 8) speed up poll() et al by use of techniques like poll hinting, which was implemented and benchmarked by Niels Provos for Linux in 1999.
The idea behind /dev/poll is to take advantage of the fact that often poll() is called many times with the same arguments. With /dev/poll, you get an open handle to /dev/poll, and tell the OS just once what files you're interested in by writing to that handle; from then on, you just read the set of currently ready file descriptors from that handle.
Various implementations of /dev/poll were tried on Linux, but none of them performed as well as epoll, and none was ever really completed. Using /dev/poll on Linux is not recommended.
See Poller_devpoll (cc, h, benchmarks) for an example of how to use /dev/poll interchangeably with many other readiness notification schemes. (Caution - the example is for Linux /dev/poll, might not work right on Solaris.)
See below. kqueue() can specify either edge triggering or level triggering.
When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.
This is the opposite of "level-triggered" readiness notification. It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.
[Banga, Mogul, Druschel '99] described this kind of scheme in 1999.