9 Fallacies of Java Performance

Submitted by oschina on 2013/04/25 08:37 (11 paragraphs in total; translation completed 04-27)

Java performance has the reputation of being something of a Dark Art. Partly this is due to the sophistication of the platform, which makes it hard to reason about in many cases. However, there has historically also been a trend for Java performance techniques to consist of a body of folk wisdom rather than applied statistics and empirical reasoning. In this article, I hope to address some of the most egregious of these technical fairytales.

1. Java is slow

Of all the most outdated Java Performance fallacies, this is probably the most glaringly obvious.

Sure, back in the 90s and very early 2000s, Java could be slow at times.


However, we have had over 10 years of improvements in virtual machine and JIT technology since then, and Java's overall performance is now screamingly fast.

In six separate web performance benchmarks, Java frameworks took 22 out of the 24 top-four positions.

The JVM's approach of using profiling to optimize only the commonly used code paths, but to optimize those heavily, has paid off. JIT-compiled Java code is now as fast as C++ in a large (and growing) number of cases.
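
As a hedged illustration of this profiling-driven compilation, the sketch below runs a small method often enough that a HotSpot JVM should decide to JIT-compile it. The class name and loop sizes are made up for the example, and the -XX:+PrintCompilation flag is just one way to observe the effect; other JVMs or versions may behave differently.

    // HotJitDemo.java -- run with:  java -XX:+PrintCompilation HotJitDemo
    // On a HotSpot JVM, the compilation log should eventually show hotLoop being JIT-compiled.
    public class HotJitDemo {
        static long hotLoop(int n) {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                sum += i * 31L;                     // cheap work on a frequently executed path
            }
            return sum;
        }
        public static void main(String[] args) {
            long total = 0;
            for (int run = 0; run < 20000; run++) { // call often enough to be treated as "hot"
                total += hotLoop(1000);
            }
            System.out.println(total);              // keep the result live so nothing is optimized away
        }
    }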

Despite this, the perception of Java as a slow platform persists, perhaps due to a negative historical bias from people who had experiences with early versions of the Java platform.

We suggest remaining objective and assessing up-to-date performance results before jumping to conclusions.

2. A single line of Java means anything in isolation

Consider the following short line of code:

MyObject obj = new MyObject();

To a Java developer, it seems obvious that this code must allocate an object and run the appropriate constructor.

From that we might begin to reason about performance boundaries. We know that there is some finite amount of work that must be going on, and so we can attempt to calculate performance impact based on our presumptions.

This is a cognitive bias that can trap us into thinking that we know, a priori, that any work will need to be done at all.

In actuality, both javac and the JIT compiler can optimize away dead code. In the case of the JIT compiler, code can even be optimized away speculatively, based on profiling data. In such cases the line of code won't run at all, and so it will have zero performance impact.

Furthermore, in some JVMs, such as JRockit, the JIT compiler can even decompose object operations so that allocations can be avoided even if the code path is not completely dead.
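
As a minimal, hedged sketch of that point: in the method below, the MyObject instance never escapes, so a JIT compiler with escape analysis may replace it with a plain local value and skip the allocation entirely. Whether that actually happens depends on the JVM, its version, and the profiling data it has gathered.

    // MyObject is just a stand-in domain class for the example.
    class MyObject {
        final int value;
        MyObject(int value) { this.value = value; }
    }

    class AllocationDemo {
        static int compute(int x) {
            MyObject obj = new MyObject(x);  // the object never escapes this method...
            return obj.value + 1;            // ...so escape analysis may eliminate the allocation
        }
    }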

The moral of the story here is that context is significant when dealing with Java performance, and premature optimization can produce counter-intuitive results. For best results, don't attempt to optimize prematurely. Instead, build your code first and then use performance-tuning techniques to locate and correct your actual hot spots.

3. A microbenchmark means what you think it does

As we saw above, reasoning about a small section of code is less accurate than analyzing overall application performance.

Nonetheless developers love to write microbenchmarks. The visceral pleasure that some people derive from tinkering with some low-level aspect of the platform seems to be endless.

Richard Feynman once said: "The first principle is that you must not fool yourself - and you are the easiest person to fool". Nowhere is this truer than when writing Java microbenchmarks.

Writing good microbenchmarks is profoundly difficult. The Java platform is sophisticated and complex, and many microbenchmarks only succeed in measuring transient effects, or other unintended aspects of the platform.

For example, a naively written microbenchmark will frequently end up measuring the timing subsystem or perhaps garbage collection rather than the effect it was trying to capture.
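
As an illustration, here is a deliberately naive benchmark of exactly that kind, with the likely problems noted in the comments. The number it prints says more about warmup, dead-code elimination, and the timer than about the loop body itself.

    // NaiveBenchmark.java -- an example of what NOT to trust.
    public class NaiveBenchmark {
        public static void main(String[] args) {
            long start = System.nanoTime();       // the timer's own overhead and granularity
            for (int i = 0; i < 1000000; i++) {   // become part of what is measured
                Math.sqrt(i);                     // result discarded: the JIT may treat this
            }                                     // as dead code and remove it entirely
            long elapsed = System.nanoTime() - start;
            System.out.println(elapsed + " ns");  // no warmup and a single run: interpreter time,
        }                                         // JIT compilation and GC can dominate the figure
    }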

Only developers and teams that have a real need for them should write microbenchmarks. These benchmarks should be published in their entirety (including source code), and should be reproducible and subject to peer review and deep scrutiny.

The Java platform's many optimizations imply that the statistics of individual runs matter. A single benchmark must be run many times and the results aggregated to get a really reliable answer.

If you feel you must write microbenchmarks, then a good place to start is by reading the paper "Statistically Rigorous Java Performance Evaluation" by Georges, Buytaert, and Eeckhout. Without proper treatment of the statistics, it is very easy to be misled.
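
As a minimal sketch of the kind of aggregation the paper argues for, the fragment below repeats a measurement and reports a mean with an approximate 95% confidence interval instead of a single number. The measureOnce() method is a placeholder for whatever is actually being benchmarked, and the normal approximation is a simplification of the paper's statistics.

    // Aggregate many timed runs instead of trusting one.
    public class RepeatedRuns {
        static double measureOnce() {
            long start = System.nanoTime();
            // ... the code under test goes here (placeholder) ...
            return (System.nanoTime() - start) / 1e6;   // milliseconds
        }

        public static void main(String[] args) {
            int n = 30;
            double[] samples = new double[n];
            double sum = 0;
            for (int i = 0; i < n; i++) {
                samples[i] = measureOnce();
                sum += samples[i];
            }
            double mean = sum / n;
            double sqDiff = 0;
            for (double s : samples) {
                sqDiff += (s - mean) * (s - mean);
            }
            double stdErr = Math.sqrt(sqDiff / (n - 1)) / Math.sqrt(n);
            System.out.printf("mean = %.3f ms, ~95%% CI half-width = %.3f ms%n", mean, 1.96 * stdErr);
        }
    }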

There are well-developed tools and communities around them (for example, Google's Caliper) - if you absolutely must write microbenchmarks, then do not do so by yourself - you need the viewpoints and experience of your peers.

4. Algorithmic slowness is the most common cause of performance problems

A very familiar cognitive fallacy among developers (and humans in general) is to assume that the parts of a system that they control are the important ones.

In Java performance, this manifests itself by Java developers believing that algorithmic quality is the dominant cause of performance problems. Developers think about code, so they have a natural bias towards thinking about their algorithms.

In practice, when dealing with a range of real-world performance problems, algorithm design was found to be the fundamental issue less than 10% of the time.

Instead, garbage collection, database access and misconfiguration were all much more likely to cause application slowness than algorithms.

Most applications deal with relatively small amounts of data, so that even major algorithmic inefficiencies don't often lead to severe performance problems. To be sure, we are acknowledging that the algorithms were suboptimal; nonetheless the amount of inefficiency they added was small relative to other, much more dominant performance effects from other parts of the application stack.

So our best advice is to use empirical, production data to uncover the true causes of performance problems. Measure; don't guess!

5. Caching solves everything

"Every problem in Computer Science can be solved by adding another level of indirection"

This programmer's aphorism, attributed to David Wheeler (and thanks to the Internet, to at least two other Computer Scientists), is surprisingly common, especially among web developers.

Often this fallacy arises due to analysis paralysis when faced with an existing, poorly understood architecture.

Rather than deal with an intimidating extant system, a developer will frequently choose to hide from it by sticking a cache in front and hoping for the best. Of course, this approach just complicates the overall architecture and makes the situation worse for the next developer who seeks to understand the status quo of production.

Large, sprawling architectures are written one line, and one subsystem at a time. However, in many cases simpler, refactored architectures are more performant - and they are almost always easier to understand.

So when you are evaluating whether caching is really necessary, plan to collect basic usage statistics (miss rate, hit rate, etc.) to prove that the caching layer is actually adding value.
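
As a hedged sketch of that kind of instrumentation, the wrapper below counts hits and misses so the cache's value can actually be demonstrated. The Loader interface and the lack of any eviction policy are simplifications for illustration, not a production cache design.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicLong;

    // Supplies a value when the cache misses, e.g. by reading the database.
    interface Loader<K, V> { V load(K key); }

    // A cache wrapper that records hits and misses so its value can be measured.
    class CountingCache<K, V> {
        private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<K, V>();
        private final AtomicLong hits = new AtomicLong();
        private final AtomicLong misses = new AtomicLong();

        V get(K key, Loader<K, V> loader) {
            V value = map.get(key);
            if (value != null) {
                hits.incrementAndGet();
                return value;
            }
            misses.incrementAndGet();
            value = loader.load(key);   // the expensive underlying lookup
            map.put(key, value);        // no eviction here: illustration only
            return value;
        }

        double hitRate() {
            long h = hits.get(), m = misses.get();
            return (h + m) == 0 ? 0.0 : (double) h / (h + m);
        }
    }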

6. All apps need to be concerned about Stop-The-World

A fact of life of the Java platform is that all application threads must periodically stop to allow Garbage Collection to run. This is sometimes brandished as a serious weakness, even in the absence of any real evidence.

Empirical studies have shown that human beings cannot normally perceive changes in numeric data (e.g. price movements) occurring more frequently than once every 200ms. 

Consequently, for applications that have a human as their primary user, a useful rule of thumb is that a Stop-The-World (STW) pause of 200ms or under is usually of no concern. Some applications (e.g. streaming video) need lower GC jitter than this, but many GUI applications will not.

There are a minority of applications (such as low-latency trading, or mechanical control systems) for which a 200ms pause is unacceptable. Unless your application is in that minority it is unlikely your users will perceive any impact from the garbage collector.

It is also worth mentioning that in any system where there are more application threads than physical cores, the operating system scheduler will have to intervene to time-slice access to the CPUs. Stop-The-World sounds scary, but in practice, every application (whether JVM or not) has to deal with contended access to scarce compute resources.

Without measurement, it isn't clear that the JVM's approach has any meaningful additional impact on application performance.

In summary, determine whether pause times are actually affecting your application by turning on GC logs. Analyze the logs (either by hand, or with scripting or a tool) to determine the pause times. Then decide whether these really pose a problem for your application domain. Most importantly, ask yourself a most poignant question: have any users actually complained?
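
As a hedged example, these are the HotSpot logging flags from roughly the era of this article (JDK 6/7/8); newer JDKs replace them with unified logging (-Xlog:gc*), and the application jar name is just a placeholder.

    # Era-appropriate HotSpot GC logging flags; check your JVM's documentation for exact names.
    java -verbose:gc \
         -XX:+PrintGCDetails \
         -XX:+PrintGCTimeStamps \
         -XX:+PrintGCApplicationStoppedTime \
         -Xloggc:gc.log \
         -jar myapp.jar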

7. Hand-rolled Object Pooling is appropriate for a wide range of apps

One common response to the feeling that Stop-The-World pauses are somehow bad is for application groups to invent their own memory management techniques within the Java heap. Often this boils down to implementing an object pooling (or even full-blown reference-counting) approach and requiring any code using the domain objects to participate.

This technique is almost always misguided. It often has its roots in the distant past, where object allocation was expensive and mutability was deemed inconsequential. The world is very different now.

Modern hardware is incredibly efficient at allocation; the bandwidth to memory is at least 2 to 3GB on recent desktop or server hardware. This is a big number; outside of specialist use cases it is not that easy to make real applications saturate that much bandwidth.

Object pooling is generally difficult to implement correctly (especially when there are multiple threads at work) and has several negative requirements that render it a poor choice for general use:

  • All developers who touch the code must be aware of pooling and handle it correctly
  • The boundary between "pool-aware" and "non-pool-aware" code must be known and documented
  • All of this additional complexity must be kept up to date, and regularly reviewed
  • If any of this fails, the risk of silent corruption (similar to pointer re-use in C) is reintroduced

In summary, object pooling should only be used when GC pauses are unacceptable, and intelligent attempts at tuning and refactoring have been unable to reduce pauses to an acceptable level.
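
To make those caveats concrete, here is a deliberately minimal, hedged sketch of a hand-rolled pool. Even this toy version forces every caller to pair acquire() with release() correctly; a forgotten or double release hands the same buffer to two owners, and nothing in the type system will catch it.

    import java.util.concurrent.ConcurrentLinkedQueue;

    // A deliberately minimal object pool, shown only to illustrate the burden it creates.
    class PooledBuffer {
        final byte[] data = new byte[8192];
    }

    class BufferPool {
        private final ConcurrentLinkedQueue<PooledBuffer> free =
                new ConcurrentLinkedQueue<PooledBuffer>();

        PooledBuffer acquire() {
            PooledBuffer b = free.poll();
            return (b != null) ? b : new PooledBuffer();  // grow on demand
        }

        void release(PooledBuffer b) {
            // Caller contract: b must never be used after this call. Nothing enforces it,
            // and a double release silently gives the same buffer to two owners.
            free.offer(b);
        }
    }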

8. CMS is always a better choice of GC than Parallel Old

By default, the Oracle JDK will use a parallel, stop-the-world collector for collecting the old generation.

An alternative choice is Concurrent-Mark-Sweep (CMS). This allows application threads to continue running throughout most of the GC cycle, but it comes at a price, and with quite a few caveats.

Allowing application threads to run alongside GC threads invariably results in application threads mutating the object graph in a way that would affect the liveness of objects. This has to be cleaned up after the fact, and so CMS actually has two (usually very short) STW phases.

This has several consequences:

  1. All application threads have to be brought to safe points and stopped twice per full collection;
  2. Whilst the collection is running concurrently, application throughput is reduced (usually by 50%);
  3. The overall amount of bookkeeping (and CPU cycles) in which the JVM engages to collect garbage via CMS is considerably higher than for parallel collection.

Depending on the application circumstances these prices may be worth paying or they may not. But there’s no such thing as a free lunch. The CMS collector is a remarkable piece of engineering, but it is not a panacea.

So before concluding that CMS is your correct GC strategy, you should first determine that STW pauses from Parallel Old are unacceptable and can't be tuned. And finally, (and I can’t stress this enough), be sure that all metrics are obtained on a production-equivalent system.
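
As a hedged sketch of such an experiment on a HotSpot JVM of this era (CMS has since been deprecated and removed in later JDKs), the two runs below differ only in the collector flag; the jar and log file names are placeholders.

    # Compare the collectors on a production-equivalent system, changing only the GC flag:
    java -XX:+UseParallelOldGC   -verbose:gc -Xloggc:parallel.log -jar myapp.jar
    java -XX:+UseConcMarkSweepGC -verbose:gc -Xloggc:cms.log      -jar myapp.jar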

9. Increasing the heap size will solve your memory problem

When an application is in trouble and GC is suspected, many application groups will respond by just increasing the heap size. Under some circumstances, this can produce quick wins and allow time for a more considered fix. However, without a full understanding of the causes of the performance problem, this strategy can actually make matters worse.

Consider a badly coded application that is producing too many domain objects (with a typical lifespan of say two to three seconds). If the allocation rate is high enough, garbage collections could occur so rapidly that the domain objects are promoted into the tenured (old) generation. Once in tenured, the domain objects die almost immediately, but they would not be collected until the next full collection.

If this application has its heap size increased, then all we're really doing is adding space for relatively short-lived domain objects to be promoted into and die. This can make the length of Stop-The-World pauses worse for no benefit to the application.

Understanding the dynamics of object allocation and lifetime before changing heap size or tuning other parameters is essential. Acting without measuring can make matters worse. The tenuring distribution information from the garbage collector is especially important here.
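
As a hedged example, the HotSpot-era flag below prints the tenuring distribution alongside the GC details; the jar name is a placeholder, and the interpretation in the comments is the general pattern to look for rather than a rule.

    # Watch object ages to spot premature promotion:
    java -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -Xloggc:gc.log -jar myapp.jar
    # If most bytes appear at ages 1-2 and then disappear, objects are dying young. If they
    # are being promoted before they die, look at young-generation and survivor-space sizing
    # before simply enlarging the whole heap.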

Comments (180)

旧心暖暖

Quoting "幸福线": Judging by how slowly Eclipse opens, there is no way I would believe Java is fast.

Quoting "xesam": Time to get a new computer...

Quoting "lplus": Java people always assume hardware will solve the speed problem. In theory it can keep improving indefinitely, as long as hardware keeps improving indefinitely; people have been saying that for years. I have a Core i7 and 16 GB of RAM. Isn't that enough for a personal computer? Yet there is no obvious improvement.

Isn't Visual Studio just as slow to open?

钱吉亮

Java's advantage is not raw performance; it is project management.
1. In project development, Java programmers are easy to train and relatively cheap to hire.
2. Maintenance costs are low (Java code tends to be more readable than other languages).
3. Adding new features or doing follow-on development takes far less time and money than in other languages.
4. On mobile it splits the market with iOS, since Android is built on Java (Windows Phone can only struggle to catch up).
5. As technology advances, hardware limits (CPU and memory) will gradually stop being an issue.
6. Maturing cloud technology will ease performance limits even further.
7. Many large companies push Java's evolution, and there are so many Java programmers that problems are easy to get help with.
8. Java is open source.
No other language can match all of these advantages, so large companies generally build the body of their projects in Java and only use C for the performance-critical parts. The future belongs to Java and C, and nobody can stop that trend. (Only if some computing genius creates a language better than Java on all the points above, and a big company is willing to back it, could Java ever step down from its altar.)
---- LyonQian--2016-06-01

lplus

Quoting "breakerror": Stupid Java has squeezed Chinese programmers' computing skills out of everything low-level and hollowed them out. It is passable for engineering work, but if it is pushed on young people at scale, Chinese computer science will have no future!!!

+1, Java is just terrible.

lplus

Quoting "幸福线": Judging by how slowly Eclipse opens, there is no way I would believe Java is fast.

Quoting "xesam": Time to get a new computer...

Java people always assume hardware will solve the speed problem. In theory it can keep improving indefinitely, as long as hardware keeps improving indefinitely; people have been saying that for years. I have a Core i7 and 16 GB of RAM. Isn't that enough for a personal computer? Yet there is no obvious improvement.

walk273

It is encouraging to hear that Java is not slow... Java emphasizes standardization; the point is to lower the difficulty of programming and improve code maintainability.

chaogao

The guy has switched careers to selling fruit; you all carry on arguing.

Le_Guto

Quoting "鳄鱼先生": Honestly, any programmer still worshipping C++ these days is brain-dead. I did C++ development for ten years and never felt C++ had any obvious advantage over Java, apart from the salary.

Quoting "HooxinFirefoxmmx": Haha, the salary is exactly the point. Well played, 鳄鱼先生, masterful back-handed praise.

Quoting "鳄鱼先生": What I meant was: most companies that hire C++ programmers are brain-dead. A language in decline really isn't worth discussing any more!

Quoting "breakerror": Simple example: if you are so capable, don't write your Java virtual machine in C++, write it in Java! You could say Java is a half-crippled language living as a parasite on C++. The world would keep turning without Java; the only thing missing would be the quiet sorrow of people who don't understand pointers or memory. Don't understand memory management... heh!

Quoting "鳄鱼先生": The first C++ compiler was written in C, and C compilers were implemented in assembly and machine code, so why not just learn machine code? Memory and registers would all be yours to control.

Java still has to have a JVM, and still has to have Native code.

Le_Guto

Quoting "李珍珍": Java's memory footprint is huge, right? That is its fatal flaw.

Quoting "鳄鱼先生": Which is cheaper, memory or a competent C++ programmer?

Quoting "李珍珍": Can you just add more memory to a phone whenever you like?

Quoting "鳄鱼先生": Would you use C++ to develop mobile apps? Honestly, to this day I cannot understand why so many people still worship C++.

I was talking about where the application runs, not what machine it is developed on. The things you build are meant to be used, not just to run smoothly on one monster machine.

撸红薯

Quoting "ichenshy": Attributing "Java is slow" to "bias created by historical reasons" is simply laughable! People who say Java is slow are usually reporting their own impression after moving to Java from another language (some have run comparison tests); that is not bias!! Java's execution speed has indeed improved, but besides "more than 10 years of improvements in virtual machine and JIT technology" there is also the rapid advance of hardware, and that factor accounts for a large share. And why are Java's execution efficiency and its CPU and memory usage discussed together? Because server hardware has improved so much, it is hard to feel subjectively that Java is slow; you need comparison tests and data. But for user-facing desktop applications, the subjective impression is one word: slow! Two words: bloated!

Quoting "阿尔法兽": Feel free to post the comparison data.

Quoting "ichenshy": Is this your first day using Java, or have you only ever used Java? Or do you never visit technical sites, never read other programmers' blogs?

Quoting "阿尔法兽": Feel free to post the data.

Quoting "ichenshy":
http://starry198804265811.iteye.com/blog/1055377
http://developer.51cto.com/art/201007/211089.htm
http://www.keakon.net/2009/12/07/Java%E3%80%81PHP%E3%80%81Python%E4%B8%8EMySQL%E4%BA%A4%E4%BA%92%E7%9A%84%E6%80%A7%E8%83%BD%E6%B5%8B%E8%AF%95
I have lost count of how many Java performance comparisons I have seen over the years, from abroad and at home; some tests are well designed, some less so, and Stack Overflow often has discussions on the topic, but on the whole Java generally does not come out ahead!

Quoting "breakerror": If you started out with Java, you must think memory is infinite!

Quoting "唐海康": Er... I take it you do not understand Java at all, but you do know that Java is slower than C/C++ and uses a lot of memory. Is that about right?

Pretty much; comparing Java's substance with C++ is embarrassingly naive.

撸红薯

(Quoting the same thread as in the previous comment, from "ichenshy" through "唐海康".)

Indeed, I don't understand it! From the first time I saw this language, it struck me as a lifeline thrown to people who don't understand memory!