Distributed Tracing at Uber

Distributed tracing is quickly becoming a must-have component in the tools that many organizations use to monitor their complex, microservice-based architectures. At Uber Engineering, our open source distributed tracing system Jaeger saw large-scale internal adoption throughout 2016: it is now integrated into hundreds of microservices and records thousands of traces every second. As the new year begins, this article tells the story of how we got here, from investigating off-the-shelf solutions like Zipkin, to why we switched from a pull to a push architecture, and how distributed tracing will continue to evolve in 2017.

From Monolith to Microservices

As Uber's business has grown exponentially, so has the complexity of our software architecture. A little over a year ago, in the fall of 2015, we had roughly 500 microservices; by early 2017, we had more than two thousand. This growth is partly driven by the ever-increasing number of business features, both user-facing ones such as UberEATS and UberRUSH and internal ones such as fraud detection, data mining, and map processing. The other reason for the increased complexity is the move away from large monolithic applications toward a distributed microservices architecture.

As often happens, moving into a microservices ecosystem brings its own challenges. Among them are the loss of visibility into the system and the complex interactions now occurring between services. Engineers at Uber know that our technology has a direct impact on people’s livelihoods. The reliability of the system is paramount, yet it is not possible without observability. Traditional monitoring tools such as metrics and distributed logging still have their place, but they often fail to provide visibility across services. This is where distributed tracing thrives.

Tracing Uber’s Beginnings

The first widely used tracing system at Uber was called Merckx, named after the fastest cyclist in the world during his time. Merckx quickly answered complex queries about Uber’s monolithic Python backend. It made queries like “find me requests where the user was logged in and the request took more than two seconds and only certain databases were used and a transaction was held open for more than 500 ms” possible. The profiling data was organized into a tree of blocks, with each block representing a certain operation or a remote call, similar to the notion of “span” in the OpenTracing API. Users could run ad hoc queries against the data stream in Kafka using command-line tools. They could also use a web UI to view predefined digests that summarized the high-level behavior of API endpoints and Celery tasks.

Merckx modeled the call graph as a tree of blocks, with each block representing an operation within the application, such as a database call, an RPC, or even a library function like parsing JSON.
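The article does not show the Merckx data model itself, but a minimal Go sketch of such a block tree, with hypothetical field names chosen purely for illustration, might look like this:

package main

import (
    "fmt"
    "time"
)

// Block is a hypothetical sketch of one node in a Merckx-style call tree;
// the real Merckx schema is not shown here. Each block describes a single
// operation and nests the operations it triggered.
type Block struct {
    Operation string            // e.g. "http.request", "sql.query", "json.parse"
    Duration  time.Duration
    Metadata  map[string]string // e.g. URL of an HTTP call, text of a SQL query
    Children  []*Block
}

// Walk visits a block and all of its descendants depth-first, which is the
// kind of traversal a query such as "requests that touched a given database
// shard" would need.
func (b *Block) Walk(visit func(*Block)) {
    visit(b)
    for _, c := range b.Children {
        c.Walk(visit)
    }
}

func main() {
    root := &Block{
        Operation: "http.request",
        Duration:  120 * time.Millisecond,
        Metadata:  map[string]string{"url": "/api/v1/trips"},
        Children: []*Block{
            {Operation: "sql.query", Duration: 45 * time.Millisecond,
                Metadata: map[string]string{"shard": "users-3"}},
        },
    }
    root.Walk(func(b *Block) { fmt.Println(b.Operation, b.Duration) })
}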


Merckx instrumentation was automatically applied to a number of infrastructure libraries in Python, including HTTP clients and servers, SQL queries, Redis calls, and even JSON serialization. The instrumentation recorded certain performance metrics and metadata about each operation, such as the URL for an HTTP call, or SQL query for database calls. It also captured information like how long database transactions have remained open, and which database shards and replicas were accessed.

The Merckx architecture was a pull model, reading from a stream of instrumentation data in Kafka.

The major shortcoming with Merckx was its design for the days of a monolithic API at Uber. Merckx lacked any concept of distributed context propagation. It recorded SQL queries, Redis calls, and even calls to other services, but there was no way to go more than one level deep. One other interesting Merckx limitation was that many advanced features like database transaction tracking really only worked under uWSGI, since Merckx data was stored in a global, thread-local storage. Once Uber started adopting Tornado, an asynchronous application framework for Python services, the thread-local storage was unable to represent many concurrent requests running in the same thread on Tornado’s IOLoop. We began to realize how important it was to have a solid story for keeping request state around and propagating it correctly, without relying on global variables or global state.


Next, Tracing in TChannel

At the beginning of 2015, we started the development of TChannel, a network multiplexing and framing protocol for RPC. One of the design goals of the protocol was to have Dapper-style distributed tracing built into the protocol as a first-class citizen. Toward that goal, the TChannel protocol specification defined tracing fields as part of the binary format.

spanid:8 parentid:8 traceid:8 traceflags:1

field        type    description
spanid       int64   that identifies the current span
parentid     int64   of the previous span
traceid      int64   assigned by the original requestor
traceflags   uint8   bit flags field

Tracing fields appear as part of the binary format in TChannel protocol specification.
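As a rough illustration of that 25-byte tracing block, here is a minimal Go sketch that lays the four fields out in the order given above. The big-endian byte order is an assumption made for this sketch; the TChannel protocol specification remains the authoritative definition of the encoding.

package main

import (
    "encoding/binary"
    "fmt"
)

// TracingFields mirrors the tracing block from the spec line above:
// spanid:8 parentid:8 traceid:8 traceflags:1 (25 bytes total).
type TracingFields struct {
    SpanID     uint64
    ParentID   uint64
    TraceID    uint64
    TraceFlags uint8
}

// Marshal writes the fields in the order given by the spec line.
// Big-endian byte order is assumed here for illustration only.
func (t TracingFields) Marshal() []byte {
    buf := make([]byte, 25)
    binary.BigEndian.PutUint64(buf[0:8], t.SpanID)
    binary.BigEndian.PutUint64(buf[8:16], t.ParentID)
    binary.BigEndian.PutUint64(buf[16:24], t.TraceID)
    buf[24] = t.TraceFlags
    return buf
}

func main() {
    f := TracingFields{SpanID: 42, ParentID: 7, TraceID: 12345, TraceFlags: 1}
    fmt.Printf("%x\n", f.Marshal())
}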

In addition to the protocol specification, we released several open-source client libraries that implement the protocol in different languages. One of the design principles for those libraries was to have the notion of a request context that the application was expected to pass through from the server endpoints to the downstream call sites. For example, in tchannel-go, the signature to make an outbound call with JSON encoding required the context as the first argument:

func (c *Client) Call(ctx Context, method string, arg, resp interface{}) error {..}


The TChannel libraries encouraged application developers to write their code with distributed context propagation in mind.
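For illustration, here is a minimal sketch of that pattern using the standard library's context.Context rather than TChannel's own Context type; fetchProfile and handleRequest are hypothetical stand-ins for an outbound RPC and a server endpoint.

package main

import (
    "context"
    "fmt"
    "time"
)

// fetchProfile stands in for an outbound RPC. The key point, mirrored from
// the tchannel-go signature above, is that the request context is the first
// argument, so tracing metadata and deadlines travel with the call.
func fetchProfile(ctx context.Context, userID string) (string, error) {
    select {
    case <-time.After(10 * time.Millisecond): // pretend remote work
        return "profile-of-" + userID, nil
    case <-ctx.Done():
        return "", ctx.Err()
    }
}

// handleRequest plays the role of a server endpoint: it receives a context
// from the framework and passes that same context to every downstream call,
// instead of stashing request state in a global variable.
func handleRequest(ctx context.Context, userID string) error {
    profile, err := fetchProfile(ctx, userID)
    if err != nil {
        return err
    }
    fmt.Println(profile)
    return nil
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()
    if err := handleRequest(ctx, "42"); err != nil {
        fmt.Println("error:", err)
    }
}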

The client libraries had built-in support for distributed tracing by marshalling the tracing context between the wire representation and the in-memory context object, and by creating tracing spans around service handlers and the outbound calls. Internally, the spans were represented in a format nearly identical to the Zipkin tracing system, including the use of Zipkin-specific annotations, such as “cs” (Client Send) and “cr” (Client Receive). TChannel used a tracing reporter interface to send the collected tracing spans out of process to the tracing system’s backend. The libraries came with a default reporter implementation that used TChannel itself and Hyperbahn, the discovery and routing layer, to send the spans in Thrift format to a cluster of collectors.
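A much-simplified sketch of what such a span record and reporter interface could look like follows; the types here are hypothetical and do not match the actual TChannel/Zipkin Thrift schema.

package main

import (
    "fmt"
    "time"
)

// Annotation mirrors the Zipkin-style timestamped annotations mentioned
// above, such as "cs" (Client Send) and "cr" (Client Receive).
type Annotation struct {
    Value     string // "cs", "cr", "sr", "ss"
    Timestamp time.Time
}

// Span is a simplified, hypothetical span record.
type Span struct {
    TraceID     uint64
    SpanID      uint64
    ParentID    uint64
    Operation   string
    Annotations []Annotation
}

// Reporter sketches the tracing-reporter idea: instrumentation hands
// finished spans to a reporter, which ships them out of process.
type Reporter interface {
    Report(span Span) error
}

// logReporter is a stand-in implementation; the default TChannel reporter
// instead sent spans in Thrift format over TChannel/Hyperbahn to collectors.
type logReporter struct{}

func (logReporter) Report(s Span) error {
    fmt.Printf("span %x/%x %s (%d annotations)\n", s.TraceID, s.SpanID, s.Operation, len(s.Annotations))
    return nil
}

func main() {
    var r Reporter = logReporter{}
    now := time.Now()
    r.Report(Span{
        TraceID: 1, SpanID: 2, ParentID: 1, Operation: "getUser",
        Annotations: []Annotation{{"cs", now}, {"cr", now.Add(15 * time.Millisecond)}},
    })
}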

TChannel client libraries got us close to the working distributed tracing system Uber needed, providing the following building blocks:

  • Interprocess propagation of tracing context, in-band with the requests

  • Instrumentation API to record tracing spans

  • In-process propagation of the tracing context

  • Format and mechanism for reporting tracing data out of process to the tracing backend


The only missing piece was the tracing backend itself. Both the wire format of the tracing context and the default Thrift format used by the reporter have been designed to make it very straightforward to integrate TChannel with a Zipkin backend. However, at the time the only way to send spans to Zipkin was via Scribe, and the only performant data store that Zipkin supported was Cassandra. Back then, we had no direct operational experience for either of those technologies, so we built a prototype backend that combined some custom components with the Zipkin UI to form a complete tracing system.

The architecture of the prototype backend for TChannel-generated traces was a push model with custom collectors, custom storage, and the open source Zipkin UI.

The success of distributed tracing systems at other major tech companies such as Google and Twitter was predicated on the availability of RPC frameworks, Stubby and Finagle respectively, widely used at those companies.


Similarly, out-of-the-box tracing capabilities in TChannel were a big step forward. The deployed backend prototype started receiving traces from several dozen services right away. More services were being built using TChannel, but full-scale production rollout and widespread adoption were still problematic. The prototype backend and its Riak/Solr based storage had some issues scaling up to Uber’s traffic, and several query capabilities were missing to properly interoperate with the Zipkin UI. And despite the rapid adoption of TChannel by new services, Uber still had a large number of services not using TChannel for RPC; in fact, most of the services responsible for running the core business functions ran without TChannel. These services were implemented in four major programming languages (Node.js, Python, Go, and Java), using a variety of different frameworks for interprocess communication. This heterogeneity of the technology landscape made deploying distributed tracing at Uber a much more difficult task than at places like Google and Twitter.

Building Jaeger in New York City

The Uber NYC Engineering organization began in early 2015, with two primary teams: Observability on the infrastructure side and Uber Everything on the product side (including UberEATS and UberRUSH). Since distributed tracing is a form of production monitoring, it was a good fit for Observability.


We formed the Distributed Tracing team with two engineers and two objectives: transform the existing prototype into a full-scale production system, and make distributed tracing available to and adopted by all Uber microservices. We also needed a code name for the project. Naming things is one of the two hard problems in computer science, so it took us a couple weeks of brainstorming words with the themes of tracing, detectives, and hunting, until we settled on the name Jaeger (ˈyā-gər), German for hunter or hunting attendant.

The NYC team already had the operational experience of running Cassandra clusters, which was the database directly supported by the Zipkin backend, so we decided to abandon the Riak/Solr based prototype. We reimplemented the collectors in Go to accept TChannel traffic and store it in Cassandra in the binary format compatible with Zipkin. This allowed us to use Zipkin web and query services without any modifications, and also provided the missing functionality of searching traces by custom tags. We also built a dynamically configurable multiplication factor into each collector to multiply the inbound traffic n times for the purpose of stress testing the backend with production data.

The early Jaeger architecture still relied on Zipkin UI and Zipkin storage format.
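As a rough sketch of the multiplication-factor idea mentioned above (the actual collector logic is not shown in this article, and the ID-offsetting scheme below is purely an illustrative assumption):

package main

import "fmt"

// Span is a minimal placeholder for a collected span.
type Span struct {
    TraceID uint64
    SpanID  uint64
}

// multiply writes every inbound span n times, offsetting the trace ID so the
// copies do not collide, letting production traffic stress-test the storage
// backend at n times the real volume.
func multiply(in Span, n int) []Span {
    out := make([]Span, 0, n)
    for i := 0; i < n; i++ {
        copySpan := in
        copySpan.TraceID = in.TraceID + uint64(i)<<56 // keep copies distinct
        out = append(out, copySpan)
    }
    return out
}

func main() {
    for _, s := range multiply(Span{TraceID: 0xabc, SpanID: 1}, 3) {
        fmt.Printf("%x %x\n", s.TraceID, s.SpanID)
    }
}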


The second order of business was to make tracing available to all the existing services that were not using TChannel for RPC. We spent the next few months building client-side libraries in Go, Java, Python, and Node.js to support instrumentation of arbitrary services, including HTTP-based ones. Even though the Zipkin backend was fairly well known and popular, it lacked a good story on the instrumentation side, especially outside of the Java/Scala ecosystem. We considered various open source instrumentation libraries, but they were maintained by different people with no guarantee of interoperability on the wire, often with completely different APIs, and most requiring Scribe or Kafka as the transport for reporting spans. We ultimately decided to write our own libraries that would be integration tested for interoperability, support the transport that we needed, and, most importantly, provide a consistent instrumentation API in different languages. All our client libraries were built to support the OpenTracing API from inception.
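As a usage sketch only (not the Jaeger client code itself), instrumentation against the OpenTracing Go API looks roughly like this; with no tracer registered, the global no-op tracer is used, so the example runs but records nothing. The loadDriver function and its tag are hypothetical.

package main

import (
    "context"
    "fmt"

    opentracing "github.com/opentracing/opentracing-go"
)

// loadDriver shows the instrumentation pattern a consistent API exposes:
// start a child span from the incoming context, tag it, and finish it when
// the operation completes.
func loadDriver(ctx context.Context, driverID string) (string, error) {
    span, ctx := opentracing.StartSpanFromContext(ctx, "loadDriver")
    defer span.Finish()
    span.SetTag("driver.id", driverID)

    // ... call storage or a downstream service with ctx here ...
    _ = ctx
    return "driver-" + driverID, nil
}

func main() {
    d, err := loadDriver(context.Background(), "d-17")
    fmt.Println(d, err)
}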


Another novel feature that we built into the very first versions of the client libraries was the ability to poll the tracing backend for the sampling strategy. When a service receives a request that has no tracing metadata, the tracing instrumentation usually starts a new trace for that request by generating a new random trace ID. However, most production tracing systems, especially those that have to deal with the scale of Uber, do not profile every single trace or record it in storage. Doing so would create a prohibitively large volume of traffic from the services to the tracing backend, possibly orders of magnitude larger than the actual business traffic handled by the services. Instead, most tracing systems sample only a small percentage of traces and only profile and record those sampled traces. The exact algorithm for making a sampling decision is what we call a sampling strategy. Examples of sampling strategies include:

  • Sample everything. This is useful for testing, but expensive in production!

  • A probabilistic approach, where a given trace is sampled randomly with a certain fixed probability.

  • A rate limiting approach, where X number of traces are sampled per time unit. For example, a variant of the leaky bucket algorithm might be used.
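A minimal sketch of the last two strategies follows, assuming a simple token-bucket variant for rate limiting; the Sampler interface and type names here are hypothetical and are not the Jaeger client API.

package main

import (
    "fmt"
    "math/rand"
    "sync"
    "time"
)

// Sampler decides whether a new trace should be recorded.
type Sampler interface {
    IsSampled(traceID uint64) bool
}

// probabilisticSampler keeps a trace with a fixed probability, e.g. 0.001.
type probabilisticSampler struct {
    rate float64
}

func (p probabilisticSampler) IsSampled(uint64) bool {
    return rand.Float64() < p.rate
}

// rateLimitingSampler allows at most maxPerSecond traces per second, using a
// simple token-bucket variant (the exact algorithm is an assumption here).
type rateLimitingSampler struct {
    mu           sync.Mutex
    maxPerSecond float64
    tokens       float64
    lastTick     time.Time
}

func newRateLimitingSampler(maxPerSecond float64) *rateLimitingSampler {
    return &rateLimitingSampler{maxPerSecond: maxPerSecond, tokens: maxPerSecond, lastTick: time.Now()}
}

func (r *rateLimitingSampler) IsSampled(uint64) bool {
    r.mu.Lock()
    defer r.mu.Unlock()
    now := time.Now()
    r.tokens += now.Sub(r.lastTick).Seconds() * r.maxPerSecond
    if r.tokens > r.maxPerSecond {
        r.tokens = r.maxPerSecond // never accumulate more than one second's budget
    }
    r.lastTick = now
    if r.tokens >= 1 {
        r.tokens--
        return true
    }
    return false
}

func main() {
    samplers := []Sampler{probabilisticSampler{rate: 0.001}, newRateLimitingSampler(2)}
    for _, s := range samplers {
        fmt.Println(s.IsSampled(12345))
    }
}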
