Understanding Virtual Memory

Submitted by mqDxGot2 on 2013/01/09 17:42 (27 paragraphs in total; translation completed 01-15)

Introduction

One of the most important aspects of an operating system is the Virtual Memory Management system. Virtual Memory (VM) allows an operating system to perform many of its advanced functions, such as process isolation, file caching, and swapping. As such, it is imperative that an administrator understand the functions and tunable parameters of an operating system's Virtual Memory Manager so that optimal performance for a given workload may be achieved. After reading this article, the reader should have a rudimentary understanding of the data the Red Hat Enterprise Linux 3 (RHEL3) VM controls and the algorithms it uses. Further, the reader should have a fairly good understanding of general Linux VM tuning techniques.

It is important to note that Linux as an operating system has a proud legacy of overhaul. Items which no longer serve useful purposes, or which have better implementations as technology advances, are phased out. This implies that the tuning parameters described in this article may be out of date if you are using a newer or older kernel. Fear not, however! With a well-grounded understanding of the general mechanics of a VM, it is fairly easy to carry knowledge of VM tuning over to another VM. The same general principles apply, and documentation for a given kernel (including its specific tunable parameters) can be found in the corresponding kernel source tree in the file Documentation/sysctl/vm.txt.

Definitions

To properly understand how a Virtual Memory Manager does its job, it helps to understand what components comprise a VM. While the low-level view of a VM is overwhelming for most, a high-level view is necessary to understand how a VM works and how it can be optimized for workloads.

What Comprises a VM

Figure 1. High Level Overview of VM Subsystem

The inner workings of the Linux virtual memory subsystem are quite complex, but the subsystem can be described at a high level in terms of the following components:

MMU

The Memory Management Unit (MMU) is the hardware base that makes a VM system possible. The MMU allows software to reference physical memory by aliased addresses, quite often more than one. It accomplishes this through the use of pages and page tables. The MMU uses a section of memory to translate virtual addresses into physical addresses via a series of table lookups.
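The table lookup described above can be illustrated with a small Python sketch. It is only a toy model: the two-level layout, the 10/10/12 bit split, and the names `translate` and `page_directory` are assumptions chosen for illustration (they resemble 32-bit x86 paging), not details taken from this article.

```python
PAGE_SHIFT = 12                      # assumed 4 KB pages
TABLE_BITS = 10                      # assumed 10 index bits per table level
PAGE_MASK = (1 << PAGE_SHIFT) - 1
INDEX_MASK = (1 << TABLE_BITS) - 1


def translate(page_directory, vaddr):
    """Walk a toy two-level page table and return the physical address."""
    dir_index = (vaddr >> (PAGE_SHIFT + TABLE_BITS)) & INDEX_MASK
    table_index = (vaddr >> PAGE_SHIFT) & INDEX_MASK
    offset = vaddr & PAGE_MASK

    page_table = page_directory.get(dir_index)
    if page_table is None or table_index not in page_table:
        # A real MMU would raise a page fault for the kernel to handle.
        raise LookupError("page fault at 0x%08x" % vaddr)

    frame = page_table[table_index]  # physical frame number
    return (frame << PAGE_SHIFT) | offset


# Two different virtual pages aliased to the same physical frame 0x2A.
page_directory = {0: {1: 0x2A}, 3: {7: 0x2A}}
first = translate(page_directory, 0x00001234)
second = translate(page_directory, (3 << 22) | (7 << 12) | 0x234)
```

Both lookups resolve to the same physical address, which is the aliasing the paragraph refers to: more than one virtual address can name the same physical memory.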

Zoned Buddy Allocator

The Zoned Buddy Allocator is responsible for the management of page allocations for the entire system. This code manages lists of physically contiguous pages and maps them into the MMU page tables, so as to provide other kernel subsystems with valid physical address ranges when the kernel requests them (physical-to-virtual address mapping is handled by a higher layer of the VM). The name Buddy Allocator is derived from the algorithm this subsystem uses to maintain its free page lists. All physical pages in RAM are cataloged by the Buddy Allocator and grouped into lists. Each list represents clusters of 2^n pages, where n is incremented in each list. If no entries exist on the requested list, an entry from the next list up is broken into two separate clusters; one is returned to the caller while the other is added to the next list down. When an allocation is returned to the Buddy Allocator, the reverse process happens. Note that the Buddy Allocator also manages memory zones, which define pools of memory that have different purposes. Currently there are three memory pools which the Buddy Allocator manages accesses for:

  • DMA — This zone consists of the first 16 MB of RAM, from which legacy devices allocate to perform direct memory operations.

  • NORMAL — This zone encompasses memory addresses from 16 MB to 1 GB and is used by the kernel for internal data structures as well as other system and user space allocations.

  • HIGHMEM — This zone includes all memory above 1 GB and is used exclusively for system allocations (file system buffers, user space allocations, etc).
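The split-and-merge behaviour of the free lists can be sketched in Python as follows. This is a simplified illustration, not the kernel's implementation: the class name `BuddyAllocator`, the use of page numbers instead of addresses, and the omission of zones are all assumptions made to keep the example short.

```python
class BuddyAllocator:
    """Toy buddy allocator over 2**max_order pages (zones are ignored)."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[n] holds the first page number of each free 2**n cluster.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)

    def alloc(self, order):
        # Find the smallest non-empty list at or above the requested order.
        current = order
        while current <= self.max_order and not self.free_lists[current]:
            current += 1
        if current > self.max_order:
            raise MemoryError("no free cluster of order %d" % order)
        start = min(self.free_lists[current])
        self.free_lists[current].remove(start)
        # Split: keep the lower half, put the upper half on the next list down.
        while current > order:
            current -= 1
            self.free_lists[current].add(start + (1 << current))
        return start

    def free(self, start, order):
        # Merge with the buddy cluster for as long as the buddy is also free.
        while order < self.max_order:
            buddy = start ^ (1 << order)
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)


allocator = BuddyAllocator(max_order=3)   # 8 pages in total
page_a = allocator.alloc(0)               # splits 8 -> 4 -> 2 -> 1
page_b = allocator.alloc(0)               # served from the order-0 list
allocator.free(page_a, 0)
allocator.free(page_b, 0)                 # buddies merge back to one cluster
```

After both pages are freed, the clusters coalesce step by step until the single 8-page cluster is back on the highest list, which is the "reverse process" mentioned above.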

Slab Allocator

The Slab Allocator provides a more usable front end to the Buddy Allocator for those sections of the kernel which require memory in sizes that are more flexible than the standard 4 KB page. The Slab Allocator allows other kernel components to create caches of memory objects of a given size. The Slab Allocator is responsible for placing as many of the cache's objects on a page as possible and for tracking which objects are free and which are allocated. When an allocation is requested and no free objects are available, the Slab Allocator requests more pages from the Buddy Allocator to satisfy the request. This allows kernel components to use memory in a much simpler way: components which make use of many small portions of memory do not each have to implement their own memory management code to avoid wasting pages. The Slab Allocator may only allocate from the DMA and NORMAL zones.
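A minimal Python sketch of this idea is shown below. It is an illustration under simplifying assumptions: the class `SlabCache` and its methods are hypothetical names, pages are represented by counters rather than real memory, and the per-slab bookkeeping overhead of a real slab allocator is ignored.

```python
PAGE_SIZE = 4096  # the standard 4 KB page mentioned above


class SlabCache:
    """Toy cache of fixed-size objects packed onto whole pages."""

    def __init__(self, object_size):
        if not 0 < object_size <= PAGE_SIZE:
            raise ValueError("object must fit in one page")
        self.object_size = object_size
        # Pack as many objects onto a page as will fit.
        self.objects_per_page = PAGE_SIZE // object_size
        self.pages = 0           # pages obtained from the page allocator
        self.free_slots = []     # (page, slot) pairs that are not in use
        self.allocated = set()   # (page, slot) pairs handed out

    def _grow(self):
        # Stand-in for asking the Buddy Allocator for one more page.
        page = self.pages
        self.pages += 1
        self.free_slots.extend(
            (page, slot) for slot in range(self.objects_per_page)
        )

    def alloc(self):
        if not self.free_slots:
            self._grow()
        slot = self.free_slots.pop()
        self.allocated.add(slot)
        return slot

    def free(self, slot):
        self.allocated.remove(slot)
        self.free_slots.append(slot)


cache = SlabCache(object_size=256)              # 16 objects fit on one page
objects = [cache.alloc() for _ in range(17)]    # the 17th forces a second page
cache.free(objects[0])
reused = cache.alloc()                          # reuses the freed slot
```

The caller only ever asks for "one object"; deciding when a new page is needed stays inside the cache.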

Kernel Threads

The last component in the VM subsystem is the set of kernel threads: kscand, kswapd, kupdated, and bdflush. These tasks are responsible for the recovery and management of in-use memory. All pages of memory have an associated state (for more information on the memory state machine, refer to the section called “The Life of a Page”). In general, the active tasks in the kernel related to VM usage are responsible for attempting to move pages out of RAM. Periodically they examine RAM, trying to identify and free inactive memory so that it can be put to other uses in the system.

The Life of a Page

All of the memory managed by the VM is labeled with a state. These states tell the VM what to do with a given page under various circumstances. Depending on the current needs of the system, the VM may transfer pages from one state to the next, according to the state machine in Figure 2, “VM Page State Machine”. Using these states, the VM can determine what is being done with a page by the system at a given time and what actions the VM may take on the page. The states that have particular meanings are as follows:

  1. FREE — All pages available for allocation begin in this state. This indicates to the VM that the page is not being used for any purpose and is available for allocation.

  2. ACTIVE — Pages which have been allocated from the Buddy Allocator enter this state. It indicates to the VM that the page has been allocated and is actively in use by the kernel or a user process.

  3. INACTIVE DIRTY — This state indicates that the page has fallen into disuse by the entity which allocated it and thus is a candidate for removal from main memory. The kscand task periodically sweeps through all the pages in memory, taking note of the amount of time the page has been in memory since it was last accessed. If kscand finds that a page has been accessed since it last visited the page, it increments the page's age counter; otherwise, it decrements that counter. If kscand finds a page with its age counter at zero, it moves the page to the inactive dirty state. Pages in the inactive dirty state are kept in a list of pages to be laundered.

  4. INACTIVE LAUNDERED — This is an interim state in which those pages which have been selected for removal from main memory enter while their contents are being moved to disk. Only pages which were in the inactive dirty state can enter this state. When the disk I/O operation is complete, the page is moved to the inactive clean state, where it may be deallocated or overwritten for another purpose. If, during the disk operation, the page is accessed, the page is moved back into the active state.

  5. INACTIVE CLEAN — Pages in this state have been laundered. This means that the contents of the page are in sync with the backed up data on disk. Thus, they may be deallocated by the VM or overwritten for other purposes.

Figure 2. VM Page State Machine
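The state transitions listed above can be modelled with a short Python sketch. It is a simplified illustration only: the function names, the `referenced` flag, and the starting age of 2 are assumptions (the article does not give an initial age), and real pages move between kernel lists rather than carrying an enum value.

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"
    ACTIVE = "active"
    INACTIVE_DIRTY = "inactive dirty"
    INACTIVE_LAUNDERED = "inactive laundered"
    INACTIVE_CLEAN = "inactive clean"


class Page:
    def __init__(self):
        self.state = PageState.FREE
        self.age = 0
        self.referenced = False


def allocate(page, initial_age=2):
    # initial_age is an assumed value chosen for the demonstration.
    page.state = PageState.ACTIVE
    page.age = initial_age
    page.referenced = False


def touch(page):
    """Record an access; a page being laundered returns to ACTIVE."""
    page.referenced = True
    if page.state is PageState.INACTIVE_LAUNDERED:
        page.state = PageState.ACTIVE


def kscand_scan(pages):
    """One sweep: age pages up or down, deactivate those that reach zero."""
    for page in pages:
        if page.state is not PageState.ACTIVE:
            continue
        if page.referenced:
            page.age += 1
            page.referenced = False
        elif page.age > 0:
            page.age -= 1
        if page.age == 0:
            page.state = PageState.INACTIVE_DIRTY


def start_laundering(page):
    if page.state is PageState.INACTIVE_DIRTY:
        page.state = PageState.INACTIVE_LAUNDERED


def finish_laundering(page):
    if page.state is PageState.INACTIVE_LAUNDERED:
        page.state = PageState.INACTIVE_CLEAN


idle, busy = Page(), Page()
allocate(idle)
allocate(busy)
for _ in range(2):
    touch(busy)                  # only the busy page is accessed
    kscand_scan([idle, busy])
start_laundering(idle)
finish_laundering(idle)
```

In this run the untouched page ages down to zero and passes through the inactive states, while the page that keeps being accessed stays active with a growing age counter.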

Tuning the VM

Now that the picture of the VM mechanism is sufficiently illustrated, how is it adjusted to fit certain workloads? There are two methods for changing tunable parameters in the Linux VM. The first is the sysctl interface. The sysctl interface is a programming oriented interface, which allows software programs to modify various tunable parameters directly. It is exported to system administrators via the sysctl utility, which allows an administrator to specify a value for any of the tunable VM parameters via the command line. For example:

sysctl -w vm.max_map_count=65535

The sysctl utility also supports the use of a configuration file (/etc/sysctl.conf), in which all the desired changes to a VM can be recorded for a system and restored after a restart of the operating system, making this access method suitable for long-term changes to a system VM. The file is straightforward in its layout, using simple key-value pairs with comments for clarity. For example:

# Adjust the min and max read-ahead for files
vm.max-readahead=64
vm.min-readahead=32
# Turn on memory over-commit
vm.overcommit_memory=2
# Bump up the percentage of memory in use to activate bdflush
vm.bdflush="40 500 0 0 500 3000 60 20 0"
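To make the key-value layout concrete, here is a small Python sketch that reads such a file into a dictionary. The helper name `parse_sysctl_conf` is hypothetical, and the parser is deliberately minimal: it assumes one `key=value` pair per line and treats lines starting with `#` or `;` as comments.

```python
def parse_sysctl_conf(text):
    """Return a dict of tunable name -> value string from sysctl.conf text."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#") or line.startswith(";"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue  # not a key=value line; ignored in this sketch
        # Surrounding double quotes group multi-field values such as bdflush.
        settings[key.strip()] = value.strip().strip('"')
    return settings


example = """\
# Adjust the min and max read-ahead for files
vm.max-readahead=64
vm.min-readahead=32
# Turn on memory over-commit
vm.overcommit_memory=2
vm.bdflush="40 500 0 0 500 3000 60 20 0"
"""
settings = parse_sysctl_conf(example)
```

Values are kept as strings because some tunables, such as bdflush, hold several space-separated fields rather than a single number.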

The second method of modifying VM tunable parameters is via the proc file system. This method exports every group of VM tunables as a virtual file, accessible via all the common Linux utilities used for modifying file contents. The VM tunables are available in the directory /proc/sys/vm/ and are most commonly read and modified using the cat and echo commands. For example, use the command cat /proc/sys/vm/kswapd to view the current value of the kswapd tunable. The output should be similar to:

512 32 8

Then, use the following command to modify the value of the tunable:

echo 511 31 7 > /proc/sys/vm/kswapd

Use the cat /proc/sys/vm/kswapd command again to verify that the value was modified. The output should be:

511 31 7

The proc file system interface is a convenient method for making adjustments to the VM while attempting to isolate the peak performance of a system. For convenience, the following sections list the VM tunable parameters as the filenames they are exported to in the /proc/sys/vm/ directory. Unless otherwise noted, these tunables apply to the RHEL3 2.4.21-4 kernel.
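The same read-modify-verify cycle can be scripted. The Python sketch below uses two hypothetical helpers, `read_tunable` and `write_tunable`, which simply read and write whitespace-separated integers in a file. So that it runs anywhere without root privileges, the demonstration writes to a scratch directory standing in for /proc/sys/vm; on a real system the default `base` would be used, and the kswapd file exists only on kernels that export it.

```python
import os
import tempfile


def read_tunable(name, base="/proc/sys/vm"):
    """Return the tunable's fields as a list of integers."""
    with open(os.path.join(base, name)) as handle:
        return [int(field) for field in handle.read().split()]


def write_tunable(name, values, base="/proc/sys/vm"):
    """Write the fields back as one space-separated line."""
    with open(os.path.join(base, name), "w") as handle:
        handle.write(" ".join(str(value) for value in values) + "\n")


# Scratch directory used in place of /proc/sys/vm for this demonstration.
with tempfile.TemporaryDirectory() as scratch:
    write_tunable("kswapd", [512, 32, 8], base=scratch)
    before = read_tunable("kswapd", base=scratch)
    write_tunable("kswapd", [511, 31, 7], base=scratch)
    after = read_tunable("kswapd", base=scratch)
```

Reading the value back after writing mirrors the cat, echo, cat sequence shown above and is a cheap way to confirm that a change was accepted.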

bdflush

The bdflush file contains 9 parameters, of which 6 are tunable. These parameters affect the rate at which pages in the buffer cache (the subset of pagecache which stores files in memory) are freed and returned to disk. By adjusting the various values in this file, a system can be tuned to achieve better performance in environments where large amounts of file I/O are performed. Table 1, “bdflush Parameters”, defines the parameters for bdflush in the order they appear in the file.

Parameter            Description
nfract               The percentage of dirty pages in the buffer cache required to activate the bdflush task
ndirty               The maximum number of dirty pages in the buffer cache to write to disk in each bdflush execution
reserved1            Reserved for future use
reserved2            Reserved for future use
interval             The number of jiffies (10 ms periods) to delay between bdflush iterations
age_buffer           The time for a normal buffer to age before it is considered for flushing back to disk
nfract_sync          The percentage of dirty pages in the buffer cache required to cause the tasks which are writing pages of memory to begin writing those pages to disk instead
nfract_stop_bdflush  The percentage of dirty pages in the buffer cache required to allow bdflush to return to the idle state
reserved3            Reserved for future use
Table 1. bdflush Parameters
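Because the nine values are positional, it can help to attach the names from Table 1 to them before reasoning about a setting. The Python sketch below does that for the value string used in the earlier sysctl.conf example; the helper name `parse_bdflush` is hypothetical, and the conversion of the interval to milliseconds relies on the 10 ms jiffy stated in the table.

```python
BDFLUSH_FIELDS = (
    "nfract",
    "ndirty",
    "reserved1",
    "reserved2",
    "interval",
    "age_buffer",
    "nfract_sync",
    "nfract_stop_bdflush",
    "reserved3",
)

JIFFY_MS = 10  # per Table 1, one jiffy is a 10 ms period


def parse_bdflush(text):
    """Map the nine positional bdflush values to their Table 1 names."""
    values = [int(field) for field in text.split()]
    if len(values) != len(BDFLUSH_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(BDFLUSH_FIELDS), len(values)))
    return dict(zip(BDFLUSH_FIELDS, values))


bdflush = parse_bdflush("40 500 0 0 500 3000 60 20 0")
interval_ms = bdflush["interval"] * JIFFY_MS   # delay between iterations
```

With these example values, bdflush is activated at 40 percent dirty pages and writes at most 500 pages per execution.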
