Valgrind is *not* a leak checker

Posted by oschina on 2014/12/09 07:26 (10 paragraphs; translation completed on 12-11)

Summary:

Valgrind is one of the most misunderstood tools I know of in my community. Valgrind is not a leak checker. It has a leak checking tool. I'd argue that this tool happens to be the least useful component.

Without changing the way you invoke Valgrind, you get so much more useful information than most people realize. Valgrind finds latent bugs even when they don't cause your program to fail/crash; it doesn't just tell you where the bug happened, it tells you why it happened, in English. Valgrind is an undefined behavior checking tool first, a function and memory profiler second, a data-race detection tool third, and a leak checking tool last.

There's a reason why this is the first thing I tell students to do at office hours.


First things first:

To run valgrind, simply go to the directory where your program is and run:

valgrind ./myProgram myProgramsFirstArg myProgramsSecondArg

No special arguments.

You'll see both your program's output as well as the debugging output generated by Valgrind (which is prefixed with ==). The output is most helpful (and includes line numbers) if you compile your program with -g before running valgrind over the executable.

For the purposes of this article, please ignore all Valgrind output after the "HEAP SUMMARY" line. This is the part we don't care about: the memory leak summary.


What can it detect?:

1) Misuse of uninitialized values. At its most basic:

bool condition;
if (condition) {
  //Do the thing
}

This is a fun one. A lot of the time your code is just going to keep going and fail silently if you run this. It might even do exactly what you hoped it would do... most of the time. In a perfect world, when your code is wrong, it fails every time. Hard and fast errors, not silent, latent, and long-running. Knowing that there is a bug is the first step to fixing it. The problem here is that the bool has no value assigned to it. It is NOT automatically initialized to false (or true). Its value is whatever garbage happened to be in memory at that time.

The valgrind output for the example is of the form:

==2364== Conditional jump or move depends on uninitialised value(s)
==2364==    at 0x400916: main (test.cpp:106)

Notice: This tells us why the code exhibits undefined behavior, not just where. What's more, Valgrind catches it even if the undefined behavior wouldn't cause your program to crash.


I doubt something quite so obvious as the above example is written often, but it'd be much harder to see this mistake in code of the form:

bool condition;
if (foo) {
  condition = true;
}
if (bar) {
  condition = false;
}
if (baz) {
  condition = true;
}
if (condition) {
  //Do the thing
}

Here we initialize properly some of the time... but not all of the time. Valgrind still catches it if you have a test that exhibits the undefined behavior.

For what it's worth, you can use defensive coding practices to avoid this type of bug in the first place. Prefer to always initialize your variables with a value. Use the auto keyword to require that you do so (you cannot deduce a type without a value to deduce it from). Take a look at the articles on auto on Herb Sutter's blog to find out more.
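
As a rough sketch of that defensive style (the function name shouldDoTheThing and its three parameters are hypothetical stand-ins for foo, bar, and baz, not code from a real program), the multi-branch example could be written so that the flag has a value from the moment it is declared. With auto, leaving out the initializer is a compile error rather than a latent bug:

```cpp
#include <cassert>

// Hypothetical helper mirroring the foo/bar/baz example.
// `auto condition;` without an initializer would not compile,
// so the flag always holds a defined value before it is read.
bool shouldDoTheThing(bool foo, bool bar, bool baz) {
    auto condition = false;  // type deduced as bool from the initializer
    if (foo) {
        condition = true;
    }
    if (bar) {
        condition = false;
    }
    if (baz) {
        condition = true;
    }
    return condition;
}

// Example usage: every path now yields a well-defined answer.
void shouldDoTheThingExample() {
    assert(!shouldDoTheThing(false, false, false));  // no branch taken
    assert(shouldDoTheThing(true, false, false));    // only foo
    assert(!shouldDoTheThing(true, true, false));    // bar overrides foo
}
```

Whether false is the right default depends on the program; the point is only that the choice is made explicitly instead of being left to whatever was in memory.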


2) Accessing memory you shouldn't. Touching memory that was never allocated, memory that's been freed, memory past the end of an allocation (so, off-by-one errors), and inaccessible parts of the stack.

An example:

  vector<int> v { 1, 2, 3, 4, 5 };
  v[5] = 0; //Oops

Do you see it?

If I run this code normally on my computer, it actually seems to run just fine. No crashes over 20 runs... but it's definitely wrong. Even if I did manage to have it open in GDB (another debugging tool) when it crashed, the best I'd get is a stack trace, and that trace might point not to where the problem was caused but to where it manifested: the symptom, if you will.

Here's the corresponding Valgrind output:

==2710== Invalid write of size 4
==2710==    at 0x400961: foo() (test.cpp:85)
==2710==    by 0x4009A2: main (test.cpp:89)
==2710==  Address 0x5a1d054 is 0 bytes after a block of size 20 alloc'd
==2710==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2710==    by 0x400EDF: __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) (new_allocator.h:104)
==2710==    by 0x400DCE: std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) (in /home/mark/test/a.out)
==2710==    by 0x400C5F: void std::vector<int, std::allocator<int> >::_M_range_initialize<int const*>(int const*, int const*, std::forward_iterator_tag) (stl_vector.h:1201)
==2710==    by 0x400AF4: std::vector<int, std::allocator<int> >::vector(std::initializer_list<int>, std::allocator<int> const&) (stl_vector.h:368)
==2710==    by 0x400943: foo() (test.cpp:84)
==2710==    by 0x4009A2: main (test.cpp:89)

That's a little unwieldy if you're not used to looking at stack traces through the STL. Let's break it down.


The first line tells you why your code exhibited undefined behavior. There was an "Invalid write of size 4". Size 4 means I wrote something 4 bytes big. On my machine, that's probably an int. Invalid write means that I touched memory I shouldn't have. As it happens, this was an off-by-one error: I wrote past the end of my vector.

Now let’s look at the 2nd and 3rd lines. These are Valgrind's best guess at the part of the stack trace that you care about. Indeed, in my case, foo is where the troubled code was, and main is the function that called foo.

The 4th line is more detail on the matter of "you ran off the end of the memory you were using".

And the rest is a more detailed stack trace that includes the STL. For what it's worth, the problem is never in the STL (ok, almost never).
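
For comparison, here is a minimal sketch of how the same off-by-one could be caught in the code itself, assuming you can afford a bounds check on that access. The helper name checkedWrite is hypothetical; the only library behavior relied on is that std::vector::at throws std::out_of_range for an invalid index, whereas operator[] does no checking:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original program): store `value` at
// `index`, returning false instead of writing out of bounds.
bool checkedWrite(std::vector<int>& v, std::size_t index, int value) {
    try {
        v.at(index) = value;  // at() throws std::out_of_range for a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Example usage: the same off-by-one as in the vector example, now rejected.
void checkedWriteExample() {
    std::vector<int> v { 1, 2, 3, 4, 5 };
    assert(checkedWrite(v, 4, 0));   // index 4 is the last valid element
    assert(!checkedWrite(v, 5, 0));  // index 5 is one past the end
}
```

This does not replace Valgrind: it only covers accesses you remembered to check, while Valgrind watches every access in the run.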


3) Misuse of std::memcpy and functions that build on top of it whereby your source and destination arrays overlap (be sure to read my article about why std::memcpy is deprecated, then remember that you'll still invoke it under the hood of a better abstraction)

I'm not including an example of this error type or the next; I don't think they're especially common in modern code, and if you do run into them, running Valgrind normally, without arguments, will expose both types of problems.

4) Invalid freeing of memory (minimal in modern code where you should be using smart pointers anyway)
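
That said, for anyone who hasn't run into the overlap problem from item 3, here is a minimal sketch of the situation and one common way around it. The function name shiftRightByOne is hypothetical, and the caller is assumed to have room for count + 1 elements. The source and destination ranges overlap, which std::memcpy does not permit, while std::memmove is specified to handle it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: shift the first `count` ints of `data` one slot to
// the right. The caller must guarantee that data[count] is valid storage.
// Source [data, data+count) and destination [data+1, data+count+1) overlap,
// so std::memcpy would be undefined behavior here; std::memmove is not.
void shiftRightByOne(int* data, std::size_t count) {
    std::memmove(data + 1, data, count * sizeof(int));
}

// Example usage: the first element is duplicated, the rest move up by one.
void shiftRightByOneExample() {
    int a[5] = { 1, 2, 3, 4, 0 };
    shiftRightByOne(a, 4);
    assert(a[0] == 1 && a[1] == 1 && a[4] == 4);
}
```

Swapping the std::memmove call for std::memcpy is the kind of mistake Valgrind reports under this category.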


5) Data races:

If I run:

  auto x = 0;
  thread([&] {
    ++x;
  }).detach();
  ++x;

with:

valgrind --tool=helgrind ./myProgram

I get some useful information:

==2872== Possible data race during read of size 4 at 0xFFEFFFE8C by thread #1
==2872== Locks held: none
==2872==    at 0x401081: main (test.cpp:96)
==2872== 
==2872== This conflicts with a previous write of size 4 by thread #2
==2872== Locks held: none
==2872==    at 0x40103A: main::{lambda()#1}::operator()() const (test.cpp:94)
==2872==    by 0x401F2D: void std::_Bind_simple<main::{lambda()#1} ()>::_M_invoke<>(std::_Index_tuple<>) (functional:1732)
==2872==    by 0x401E84: std::_Bind_simple<main::{lambda()#1} ()>::operator()() (functional:1720)
==2872==    by 0x401E1D: std::thread::_Impl<std::_Bind_simple<main::{lambda()#1} ()> >::_M_run() (thread:115)
==2872==    by 0x4EEEBEF: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19)
==2872==    by 0x4C30E26: ??? (in /usr/lib/valgrind/vgpreload_helgrind-amd64-linux.so)
==2872==    by 0x535F181: start_thread (pthread_create.c:312)
==2872==    by 0x566FEFC: clone (clone.S:111)

It tells me that I'm not protecting my data properly. I'm sharing data without synchronizing with a mutex. Bam.

I should mention that although this did find the bug in the code, it also included a ton of false positives on the std::shared_ptr used internally by std::thread. It seems they need to do a bit more work on that front. You could probably write a simple D or Python script to scrape helgrind output for only the useful bits.
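
For reference, a minimal sketch of one way to fix the race, assuming a std::mutex is acceptable for the job (a std::atomic<int> would be another option). The function name incrementFromTwoThreads is hypothetical. Note that this version also joins the thread instead of detaching it, so x is guaranteed to outlive the lambda that captures it by reference:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical rewrite of the racy snippet: both increments of x happen
// while holding the same mutex, so they can no longer conflict.
int incrementFromTwoThreads() {
    int x = 0;
    std::mutex m;

    std::thread worker([&] {
        std::lock_guard<std::mutex> lock(m);
        ++x;
    });

    {
        std::lock_guard<std::mutex> lock(m);
        ++x;
    }

    worker.join();  // wait for the worker before x and m go out of scope
    return x;
}

// Example usage: both increments are always observed.
void incrementFromTwoThreadsExample() {
    assert(incrementFromTwoThreads() == 2);
}
```

Depending on your toolchain you may need to compile with -pthread. Run under --tool=helgrind, a version like this should no longer report the conflict on x.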


6) And yeah... it finds leaks, if you're still not using smart pointers.

Run:

valgrind --leak-check=full ./myProgram

(If you forget that flag, just run valgrind normally once; it'll remind you in the text in the summary area)

On:

auto x = new int(5);

And you'll see:

==2881== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==2881==    at 0x4C2B0E0: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==2881==    by 0x400966: main (test.cpp:92)
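
If you want that report to go away, here is a minimal sketch of the smart pointer alternative. The function name makeFive is hypothetical, and the sketch sticks to C++11, so it constructs the std::unique_ptr directly rather than using std::make_unique:

```cpp
#include <cassert>
#include <memory>

// Hypothetical replacement for `auto x = new int(5);`.
// The unique_ptr owns the int and deletes it when the pointer is destroyed.
std::unique_ptr<int> makeFive() {
    return std::unique_ptr<int>(new int(5));
}

// Example usage: no explicit delete is needed anywhere.
void makeFiveExample() {
    auto x = makeFive();
    assert(x && *x == 5);
}  // x is destroyed here, which frees the int
```

With ownership handled this way, there is no allocation left for Valgrind to flag as definitely lost.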

Valgrind as a function and memory profiler:

In addition to being able to tell you where you've introduced bugs into your program, Valgrind can also help you optimize. Too often people assume that they know what's eating up their runtime or what their big memory problems are... and they're wrong. Use your time wisely: measure!

Run your program with:

valgrind --tool=callgrind ./myProgram

And it'll spit out a file in the same directory whose name is something like callgrind.out.2887. Download the program KCachegrind to get a GUI visualization of the flow of your program, what functions are eating up your runtime, and generally, a better understanding of where to focus your efforts.

The simplest KCachegrind view lists, for each function, its cost (callgrind counts executed instructions rather than measuring wall time), the number of times it was called, and its percentage of the total. You can Google for some of the more interesting graphs/flow diagrams it generates.

Similarly, I can evaluate where I'm allocating the most memory by running with --tool=massif. This is often useful for leak checking as well, as larger parts of your memory footprint may be indicative of leaks.


Conclusions:

Valgrind is much more than a leak checking tool. Change your perspective: Valgrind is an undefined behavior killer.

Valgrind should be your tool of first resort. It not only tells you where your bugs are happening, but why, and it'll tell you this even if your program doesn't crash (unlike GDB on both counts). For what it's worth, GDB is still a very useful tool for getting full stack traces on failed assertions and for debugging concurrent code, among other things.

You may also find it useful to always compile with -pedantic -Wall -Wextra. Your compiler is often smart enough to flag undefined behavior as well. What the compiler misses, Valgrind should catch.

If this interests you, you may want to take a look at some other tools that perform similar duties, often with less of a runtime hit:
Address Sanitizer for clang and g++
Undefined Behavior Sanitizer for clang and g++
Memory Sanitizer for clang
Thread Sanitizer for clang


Comments (14)

ydong08
Bookmarking this. Learned something; very nice.
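alzuse

Quoting timxx's comment

For memory problems ASAN is still the better choice; it's fast. Valgrind gets used more for function call analysis.
Thanks for this reply; I've learned another way to detect memory problems.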
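哆啦比猫

Quoting 月影南溪's comment

I'm just curious what those two * characters stand for.

Quoting Miyanaga's comment

Probably just emphasis...

Quoting 月影南溪's comment

0.0 I'm not well read, don't fool me.

Quoting eechen's comment

In Markdown, asterisks on both sides of the text indicate emphasis.

Quoting 月影南溪's comment

How strange.

Quoting Raphael_goh's comment

Probably the title just can't be rendered as Markdown. By Markdown syntax, though, *not* is italic, and only **not** is bold, which indicates emphasis.
Italics are what indicate emphasis, all right?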
timxx
For memory problems ASAN is still the better choice; it's fast. Valgrind gets used more for function call analysis.
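evangelist
Valgrind's approach is a good one, but it's a bit too heavyweight. If I remember correctly, it sets up something like a virtual CPU, and for every byte your original program has, it generates a corresponding piece of data.
So you have to double your hardware before you can run valgrind to check your code. If the program is on the large side and the hardware is mediocre, it basically hangs as soon as you run it.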
Raphael_goh

Quoting 月影南溪's comment

How strange.
Probably the title just can't be rendered as Markdown. By Markdown syntax, though, *not* is italic, and only **not** is bold, which indicates emphasis.
Raphael_goh

Quoting eechen's comment

In Markdown, asterisks on both sides of the text indicate emphasis.
Two asterisks on each side is bold, for emphasis, isn't it? One on each side is italic.
月影南溪

Quoting eechen's comment

In Markdown, asterisks on both sides of the text indicate emphasis.
How strange.
eechen
Qt Creator and Eclipse CDT both integrate front ends for Valgrind and GDB:
http://dragly.org/wp-content/uploads/2013/03/qt-valgrind.png
http://my.oschina.net/eechen/blog/166969
eechen

Quoting 月影南溪's comment

0.0 I'm not well read, don't fool me.
In Markdown, asterisks on both sides of the text indicate emphasis.