Most web-based services begin as a collection of front-end application servers paired with databases used to manage data storage. As they grow, the databases are augmented with caches to store frequently read pieces of data and improve site performance. Often, the ability to quickly access data moves from being an optimization to a requirement for a site. This evolution of cache from neat optimization to necessity is a common path that has been followed by many large web-scale companies, including Facebook, Twitter[1], Instagram, Reddit, and many others.

Last year, at the Data@Scale event and at the USENIX Networked Systems Design and Implementation conference, we spoke about turning caches into distributed systems using software we developed called mcrouter (pronounced “mick-router”). Mcrouter is a memcached protocol router that is used at Facebook to handle all traffic to, from, and between thousands of cache servers across dozens of clusters distributed in our data centers around the world. It is proven at massive scale — at peak, mcrouter handles close to 5 billion requests per second. Mcrouter was also proven to work as a standalone binary in an Amazon Web Services setup when Instagram used it last year before fully transitioning to Facebook's infrastructure.

Today, we are excited to announce that we are releasing mcrouter’s code under an open-source BSD license. We believe it will help many sites scale more easily by leveraging Facebook’s knowledge about large-scale systems in an easy-to-understand and easy-to-deploy package.

Features

Since any client that wants to talk to memcached can already speak the standard ASCII memcached protocol, we use that as the common API and enter the picture silently. To a client, mcrouter looks like a memcached server. To a server, mcrouter looks like a normal memcached client. But mcrouter's feature-rich configurability makes it more than a simple proxy.
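
To make that interposition concrete, here is what a typical memcached ASCII exchange looks like; the commands are identical whether the client is connected to a memcached server or to mcrouter (the key and value below are only illustrative):

    set user:42 0 900 5
    hello
    STORED
    get user:42
    VALUE user:42 0 5
    hello
    END

The first command stores the 5-byte value "hello" under the key user:42 with a 900-second expiration, and the get retrieves it. Because mcrouter speaks exactly this protocol on both sides, it can be dropped in without any client or server changes.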

Some features of mcrouter are listed below. In the following, a “destination” is a memcached host (or some other cache service that understands the memcached protocol) and “pool” is a set of destinations configured for some workload — e.g., a sharded pool with a specified hashing function, or a replicated pool with multiple copies of the data on separate hosts. Finally, pools can be organized into multiple clusters.

  • Standard open source memcached ASCII protocol support: Any client that can talk the memcached protocol can already talk to mcrouter — no changes are needed. Mcrouter can simply be dropped in between clients and memcached boxes to take advantage of its functionality.

  • Connection pooling: Multiple clients can connect to a single mcrouter instance and share the outgoing connections, reducing the number of open connections to memcached instances.

  • Multiple hashing schemes: Mcrouter provides a proven consistent hashing algorithm (furc_hash) that allows distribution of keys across many memcached instances. Hostname hashing is useful for selecting a unique replica per client. There are a number of other hashes useful in specialized applications.

  • Prefix routing: Mcrouter can route keys according to common key prefixes. For example, you can send all keys starting with “foo” to one pool, “bar” prefix to another pool, and everything else to a “wildcard” pool. This is a simple way to separate different workloads.

  • Replicated pools: A replicated pool has the same data on multiple hosts. Writes are replicated to all hosts in the pool, while reads are routed to a single replica chosen separately for each client. This could be done either due to per-host packet limitations where a sharded pool would not be able to handle the read rate; or for increased availability of the data (one replica going down doesn't affect availability due to automatic failover).

  • Production traffic shadowing: When testing new cache hardware, we found it extremely useful to be able to route a complete copy of production traffic from clients. Mcrouter supports flexible shadowing configuration. It's possible to shadow test a different pool size (re-hashing the key space), shadow only a fraction of the key space, or vary shadowing settings dynamically at runtime.

  • Online reconfiguration: Mcrouter monitors its configuration files and automatically reloads them on any file change; this loading and parsing is done on a background thread and new requests are routed according to the new configuration as soon as it's ready. There's no extra latency from the client's point of view.

  • Flexible routing: Configuration is specified as a graph of small routing modules called “route handles,” which share a common interface (route a request and return a reply) and which can be composed freely. Route handles are easy to understand, create, and test individually, allowing for arbitrarily complex logic when used together. For example: An “all-sync” route handle will be set up with multiple child route handles (which themselves could be arbitrary route handles). It will pass a request to all of its children and wait for all of the replies to come back before returning one of these replies. Other examples include, among many others, “all-async” (send to all but don't wait for replies), “all-majority” (for consensus polling), and “failover” (send to every child in order until a non-error reply is returned); a configuration sketch using route handles appears after this list. Expanding a pool can be done quickly by using a “cold cache warmup” route handle on the pool (with the old set of servers as the warm pool). Moving this route handle up the stack will allow for an entire cluster to be warmed up from a warm cluster.

  • Destination health monitoring and automatic failover: Mcrouter keeps track of the health status of each destination. If mcrouter marks a destination as unresponsive, it will fail over incoming requests to an alternate destination automatically (fast failover) without attempting to send them to the original destination. At the same time health check requests will be sent in the background, and as soon as a health check is successful, mcrouter will revert to using the original destination. We distinguish between “soft errors” (e.g., data timeouts) that are allowed to happen a few times in a row and “hard errors” (e.g., connection refused) that cause a host to be marked unresponsive immediately. Needless to say, all of this is completely transparent to the client.

  • Cold cache warm up: Mcrouter can smooth the performance impact of starting a brand new empty cache host or set of hosts (as large as an entire cluster) by automatically refilling it from a designated “warm” cache.

  • Broadcast operations: By adding a special prefix to a key in a request, it's easy to replicate the same request into multiple pools and/or clusters.

  • Reliable delete stream: In a demand-filled look-aside cache, it's important to ensure all deletes are eventually delivered to guarantee consistency. Mcrouter supports logging delete commands to disk in cases when the destination is not accessible (due to a network outage or other failure). A separate process then replays those deletes asynchronously. This is done transparently to the client — the original delete command is always reported as successful.

  • Multi-cluster support: Configuration management for large multi-cluster setups is easy. A single config can be distributed to all clusters and, depending on command line options, mcrouter will interpret the config based on its location.

  • Rich stats and debug commands: Mcrouter exports many internal counters (via a “stats” command; also to a JSON file on disk). Introspection debug commands are also available, which can answer questions like “Which host would a particular request go to?” at runtime.

  • Quality of service: Mcrouter allows throttling the rate of any type of request (e.g., get/set/delete) at any level (per-host, per-pool, per-cluster), rejecting requests over a specified limit. We also support rate-limiting requests to slow delivery.

  • Large values: Mcrouter can automatically split/re-stitch large values that would not normally fit in a memcached slab.

  • Multi-level caches: Mcrouter supports local/remote cache setup, where values would be looked up locally first and automatically set in a local cache from remote after fetching.

  • IPv6 support: We have strong support internally for IPv6 at Facebook, so mcrouter is IPv6 compatible out of the box.

  • SSL support: Mcrouter supports SSL connections (incoming or outgoing), as long as the client or the destination hosts support it as well. It is also possible to set up multiple mcrouters in series, in which case the middle connection between mcrouters can be over SSL out of the box.

  • Multi-threaded architecture: Mcrouter can take full advantage of multicore systems by starting one thread per core.
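
As a rough illustration of how route handles compose (referenced from the flexible-routing item above), the sketch below wraps two pools in a failover handle: requests go to the primary pool first and fall over to the backup pool on error. The pool names and server addresses are placeholders, and the exact route handle names and JSON schema should be checked against the mcrouter wiki:

    {
      "pools": {
        "primary": { "servers": ["10.0.0.1:11211", "10.0.0.2:11211"] },
        "backup":  { "servers": ["10.0.1.1:11211", "10.0.1.2:11211"] }
      },
      "route": {
        "type": "FailoverRoute",
        "children": [ "PoolRoute|primary", "PoolRoute|backup" ]
      }
    }

Swapping the failover handle for one such as “all-sync” would instead replicate every request to both pools, which is how replicated pools and shadowing setups can be expressed with the same building blocks.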

Implementation

Mcrouter is written mostly in C++ (with heavy use of C++11 features), with some library code written in C and protocol parsing code written in Ragel. It uses Facebook's open source libraries Folly and fbthrift (for async networking code).

A mcrouter process starts up multiple independent threads, each running an event loop that processes all network events asynchronously (using libevent). Once each request or reply is parsed, it's processed inside its own lightweight thread or “fiber”; we have a custom fiber library implementation built on top of boost::context.

Mcrouter configuration is written in JSON format and allows specifying an arbitrary route handle scheme to easily adapt to any routing task. We have presented some common use cases in depth on our wiki.
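
For example, a minimal configuration along the lines of the getting-started example in the repository defines a single pool and routes every request to it (the addresses are placeholders):

    {
      "pools": {
        "A": { "servers": ["127.0.0.1:11211", "127.0.0.1:11212"] }
      },
      "route": "PoolRoute|A"
    }

Replacing the "route" string with a nested route handle object, as in the failover sketch earlier, is how the more elaborate behaviors described above (shadowing, warmup, failover) are expressed.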

What's next

We invite software engineers using memcached everywhere to evaluate mcrouter and see if it helps to simplify the site administration while providing the new capabilities listed above (shadow testing, cold cache warmup, and so on). Instagram used mcrouter for the last year, before transitioning to Facebook's infrastructure, so mcrouter is proven in an Amazon Web Services setup. Prior to open sourcing, we partnered with Reddit for a limited beta test, and they are currently running mcrouter in production for some of their caches.

We would also love to see patches come back that will make mcrouter more helpful to you and to others in the memcached community.

Mcrouter source code has been open sourced at https://github.com/facebook/mcrouter. We're always looking for ways to improve mcrouter's performance, fix bugs, and add new features. We will continuously update the external GitHub repo with our internal changes, so you can benefit from this work as well. We maintain mcrouter documentation on the GitHub wiki. We have also set up a Facebook discussion group.
