Umar DBA: Windows NT Memory Architecture Overview

Subject:	Windows NT Memory Architecture Overview
	Doc ID:	Note:46053.1	Type:	REFERENCE
	Last Revision Date:	17-APR-2001	Status:	PUBLISHED

1) Purpose
==========

This article is intended to help customers understand how the Windows NT
memory architecture works; read in combination with Note 46001.1, it should
also help them better understand how the Oracle database interacts with it.
It is not intended to be a definitive guide to the Windows NT memory
architecture; please refer to Intel's and Microsoft's own documentation for
that.

This note is only relevant to Windows NT 4.0; Windows 2000 includes many
new features that are not addressed here.

2) The Windows NT 32-Bit Memory Model
=====================================

a) Standard Windows NT Memory Model
-----------------------------------

Windows NT 4.0 has a virtual memory system that combines physical memory,
the file system cache and disk into a flexible information storage and
retrieval system. Each process running on Windows NT has a flat, linear
32-bit address space, which means that each process can "see" 32 bits of
address space, or 4 gigabytes (GB) of virtual memory. The upper half
(0x80000000 through 0xFFFFFFFF) is reserved for system code and data that
is visible to the process only when it is running in privileged mode. The
lower half (0x00000000 through 0x7FFFFFFF) is available to the process when
it is running in user mode, and to the user-mode system services called by
the program.
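
As a rough illustration, the default split can be expressed as a single comparison on a 32-bit address. This is a minimal sketch. The function and macro names are hypothetical; only the boundary value comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* First address of the system half under the default 2GB/2GB split. */
#define NT_SYSTEM_FIRST 0x80000000u

/* Returns 1 if va lies in the system (privileged-mode) half,
   0 if it lies in the user-mode half. */
int is_system_address(uint32_t va)
{
    return va >= NT_SYSTEM_FIRST;
}
```

For example, 0x00400000 falls in the user-mode half. 0x7FFFFFFF is the last user-mode address, and 0x80000000 is the first system address.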

Windows NT versions prior to 3.51 included some 16-bit data structures that
limited processes to 256MB (64K pages of 4KB each) of virtual memory. These
have been converted to 32-bit data structures in Windows NT 4.0, so the full
2GB of user-mode virtual memory is now available to every process.

b) Breaking through the Intel Windows NT 2GB limit
--------------------------------------------------

With the cost of physical memory continuing to drop and more I/O-intensive
applications such as database management systems becoming available, the
2GB limit imposed on processes has become a constraining factor. To address
this issue, Microsoft introduced the 4GT RAM Tuning feature in Windows NT
Server, Enterprise Edition version 4.0 (Intel only).

The 4GT tuning feature increases the 2GB user-mode partition of the virtual
address space to 3GB (0x00000000 through 0xBFFFFFFF) by reducing the
kernel-mode partition to 1GB (0xC0000000 through 0xFFFFFFFF). One of the
main benefits of this change is that it requires no changes to application
source code; the executable only needs to be flagged as described below.

This has been achieved by moving the guard page that protects the boundary
between the user address space and the system address space from
0x7FFFEFFF to 0xBFFFEFFF. To enable this feature, the /3GB switch must be
added to the boot.ini startup line, for example:

  multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT 4.0 Enterprise" /3GB

For applications to take advantage of this feature, they need to be linked
with the /LARGEADDRESSAWARE switch, which sets a bit in the executable's
image header (IMAGE_FILE_LARGE_ADDRESS_AWARE). For executables that were
not linked with this switch, the bit can be set by running the imagecfg
tool against the executable, for example:

  imagecfg -l oracle80.exe

This tool is available on the Windows NT 4.0 Enterprise Edition CD 2 in the
\support\DEBUG\i386\ directory.
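
The two requirements can be combined into a small helper that reports the highest user-mode address a process receives. This is a sketch under stated assumptions:

- boot_3gb stands in for the /3GB boot.ini switch.
- The flag value 0x0020 is the IMAGE_FILE_LARGE_ADDRESS_AWARE constant from the Windows SDK header winnt.h. It is redefined here so the code builds anywhere.
- The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020u  /* same value as winnt.h */

/* Highest user-mode virtual address for a process, given whether the
   system was booted with /3GB and the image's Characteristics flags. */
uint32_t user_space_last(int boot_3gb, uint16_t characteristics)
{
    int large_aware = (characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
    /* Both conditions are required; otherwise the 2GB default applies. */
    return (boot_3gb && large_aware) ? 0xBFFFFFFFu : 0x7FFFFFFFu;
}
```

An image that is not large-address aware keeps the 2GB limit even on a /3GB system. That is why imagecfg is needed for executables linked without the switch.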

c) Breaking through the Intel Windows NT 4GB limit
--------------------------------------------------

The introduction of Intel servers that can support more than 4GB of main
memory has presented a challenge to Windows NT, because NT is only capable
of using up to 4GB in total. This is especially important as a growing
number of enterprise-class applications are capable of deriving benefit
from this extra memory.

Intel has introduced servers based on the Pentium II/III Xeon processor
with support for the Intel Extended Server Memory (ESM) architecture, which
breaks through the 4GB (32-bit) memory barrier. ESM includes 36-bit memory
addressing technologies capable of addressing 64GB (2^36 bytes) of main
memory through the Page Size Extension 36-bit (PSE36) driver, which must be
obtained from Intel. The current PSE36 driver is limited to 8GB.

The Intel PSE36 driver is a standard RAM disk device driver (based on the
Windows NT DDK RAM disk driver) that lacks a file system and is backed by
main memory that is not used by the operating system. The PSE36 driver
functions like a raw disk with much lower latency, and allows 4MB pages to
exist at addresses anywhere in the 36-bit address space. Applications must
be rewritten to make use of this feature.

Only one process may open and access the PSE36 driver at a time, and that
process gets exclusive access to all of the additional memory. The RAM disk
is not shared between processes, it is never mapped into the address space
of a process, and it is not backed by the Windows NT page file.
Applications that use this device driver access it via the same Win32 API
function calls used to access standard raw disk partitions:

  - CreateFile      : obtains a file handle to the PSE36 device driver
                      and specifies access modes
  - DeviceIoControl : obtains the size of the PSE36 driver device and
                      provides optimised READ and WRITE device controls

Systems with 4GB of memory or less can still use the PSE36 driver, as long
as the /MAXMEM switch is added to the Windows NT boot.ini file. For example,
on a system with 4GB of memory and a Xeon processor, MAXMEM could be set to
2048MB:

  multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT 4.0 EE" /MAXMEM:2048

Under such a configuration, assuming 256MB of address space at the top of
memory has been reserved for I/O devices, Windows NT would control a 2GB
chunk of memory and 1.75GB (4GB - 2GB - 256MB) would be controlled by the
PSE36 driver. For systems with more than 4GB of physical memory, the MAXMEM
parameter can be used to maximize the amount of memory used by the PSE36
driver, which is useful in systems where processes have only modest kernel
memory requirements. For example, on a machine with 5GB of physical memory,
MAXMEM could be set to 3GB (3072) to increase the memory available to the
PSE36 driver from 1GB to 2GB. However, it is often unnecessary to set MAXMEM
on such systems, because Windows NT is unable to access memory beyond 4GB
in any case.
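
The arithmetic behind these examples can be written as a simplified model, with all sizes in MB. It makes three assumptions:

- As the text says, Windows NT never uses memory above 4GB.
- RAM lying under the reserved I/O region is simply unusable (no remapping).
- The 8GB limit of the current driver is ignored.

The function names are hypothetical. The 4GB example is reproduced with a 256MB I/O reservation. The 5GB figures in the text only work out if no I/O reservation is counted, so those calls pass 0.

```c
#include <assert.h>

#define FOUR_GB_MB 4096u

/* RAM below the 4GB line that is not hidden by the I/O reservation. */
unsigned usable_low_mb(unsigned physical_mb, unsigned io_mb)
{
    assert(io_mb < FOUR_GB_MB);
    unsigned limit = FOUR_GB_MB - io_mb;
    return physical_mb < limit ? physical_mb : limit;
}

/* Memory managed by Windows NT; maxmem_mb == 0 means /MAXMEM is not set. */
unsigned nt_memory_mb(unsigned physical_mb, unsigned maxmem_mb, unsigned io_mb)
{
    unsigned low = usable_low_mb(physical_mb, io_mb);
    return (maxmem_mb != 0 && maxmem_mb < low) ? maxmem_mb : low;
}

/* Memory left for the PSE36 driver: low memory NT was told not to use,
   plus everything above 4GB (which NT cannot use at all). */
unsigned pse36_memory_mb(unsigned physical_mb, unsigned maxmem_mb, unsigned io_mb)
{
    unsigned above_4gb = physical_mb > FOUR_GB_MB ? physical_mb - FOUR_GB_MB : 0;
    return usable_low_mb(physical_mb, io_mb)
         - nt_memory_mb(physical_mb, maxmem_mb, io_mb)
         + above_4gb;
}
```

With these definitions, pse36_memory_mb(4096, 2048, 256) gives the 1792MB (1.75GB) of the first example. pse36_memory_mb(5120, 3072, 0) gives the 2GB of the second.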

3) NT's Virtual Memory Manager (VMM)
====================================

a) The virtual address space and address translation
----------------------------------------------------

As is the case with other virtual memory managers, the Windows NT VMM is
responsible for creating the illusion that each process has exclusive
access to 32 bits (4GB) of memory, when in reality all processes share the
same physical memory (up to a maximum of 4GB). The 32 bits of address space
are known as virtual memory because they do not directly correspond to
physical memory; it is the VMM's responsibility to translate back and forth
between virtual and physical addresses.

Both physical and virtual memory are divided into fixed-size blocks, and
address translation is performed on these blocks by the processor's memory
management unit (MMU) under the control of the VMM. A block of physical
memory is known as a page frame; the processor numbers page frames
consecutively with page frame numbers up to the maximum physical memory
available. A block of virtual memory is known as a page. The size of a page
varies with the processor platform: Intel processors use 4096 bytes per
page, while Alpha processors use 8192 bytes per page.

The Windows NT VMM uses a three-step address resolution mechanism, in which
the virtual address is split into three parts:

   - Page Directory Entry/Offset (PDE) : bits 22 to 31
   - Page Table Entry/Offset (PTE)     : bits 12 to 21
   - Page Offset                       : bits  0 to 11
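
The three fields can be extracted with shifts and masks. This is a minimal sketch, assuming 4KB Intel pages; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field extraction for a 32-bit Intel virtual address (4KB pages). */
uint32_t pde_index(uint32_t va)   { return va >> 22; }            /* bits 22-31 */
uint32_t pte_index(uint32_t va)   { return (va >> 12) & 0x3FFu; } /* bits 12-21 */
uint32_t page_offset(uint32_t va) { return va & 0xFFFu; }         /* bits 0-11  */
```

For the address 0x00403A10, these return a directory index of 1, a page table index of 3 and a page offset of 0xA10. Each 10-bit index selects one of 1024 four-byte entries, so a page directory and a page table each occupy exactly one 4KB page.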

Each process has its own private page directory, and a special hardware
register is used to point to its address. When the scheduler switches
between processes, NT copies the new process's pointer into the register.
The MMU translation mechanism uses the directory offset from the virtual
address to retrieve, from the page directory, the page frame number of the
page table; it then uses the page table offset from the virtual address to
retrieve, from that page table, the page frame number of the code or data
page required:

          +-----------+------------+--------+
          | Directory | Page Table | Page   | Virtual
          |  Offset   |   Offset   | Offset | Address
          +-----------+------------+--------+
         31 |        21 |         11   |    0
    +-------+           |              +-----+
    |                   |   Page Table       |
    |     Page          |      Page          |   Page Frame
    |   Directory       |   +---------+      |  +-----------+
    |  +---------+      |   |         |      |  |           |
    |  |         |      |   +---------+      |  +-----------+
    |  +---------+      +-> | PF Addr |---+  |  |           |
    +->| PT Addr |---+      +---------+   |  |  +-----------+
       +---------+   |      |    .    |   |  +->| Code/Data |
       |         |   |      |    .    |   |     +-----------+
       +---------+   |      |    .    |   |     |     .     |
       |    .    |   |      |    .    |   |     |     .     |
       |    .    |   |      +---------+   |     +-----------+
       |         |   |             ^      |              ^
       +---------+   |             |      |              |
      (per process)  +-------------+      +--------------+
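
The walk in the diagram can be modelled in a few lines. This is a toy, not how NT stores its structures:

- The entry format is reduced to a page frame number plus a present bit.
- The names are hypothetical.
- Page table pages are allocated only when a page in their 4MB range is first mapped.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTE_PRESENT 0x1u

/* Toy per-process page directory: a NULL slot means the page table page
   for that 4MB slice of the address space has not been created yet. */
struct address_space {
    uint32_t *page_tables[1024];
};

/* Map the virtual page containing va to page frame pfn, allocating the
   page table page on first use. Returns 0 on success, -1 if out of memory. */
int map_page(struct address_space *as, uint32_t va, uint32_t pfn)
{
    uint32_t dir = va >> 22, tbl = (va >> 12) & 0x3FFu;
    assert(pfn < (1u << 20));              /* 20-bit frame numbers */
    if (as->page_tables[dir] == NULL) {
        as->page_tables[dir] = calloc(1024, sizeof(uint32_t));
        if (as->page_tables[dir] == NULL)
            return -1;
    }
    as->page_tables[dir][tbl] = (pfn << 12) | PTE_PRESENT;
    return 0;
}

/* Directory -> page table -> frame. Returns 1 and stores the physical
   address in *pa, or 0 where real hardware would raise a page fault. */
int translate(const struct address_space *as, uint32_t va, uint32_t *pa)
{
    const uint32_t *table = as->page_tables[va >> 22];
    if (table == NULL)
        return 0;                          /* no page table page yet */
    uint32_t pte = table[(va >> 12) & 0x3FFu];
    if (!(pte & PTE_PRESENT))
        return 0;                          /* page not present */
    *pa = (pte & 0xFFFFF000u) | (va & 0xFFFu);
    return 1;
}
```

Mapping virtual page 0x00403000 to frame 0x1234 makes 0x00403A10 translate to 0x01234A10. Addresses in unmapped pages report a fault instead.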

When a page frame is shared between processes, the VMM inserts a level of
indirection into its page tables by using a prototype page table entry
(prototype PTE) data structure. This ensures that only the prototype PTE
needs to be updated when the shared page is paged in, rather than the PTE
of each process that maps it.
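
A toy model of the extra level of indirection follows. The structures and names are hypothetical and heavily simplified. In particular, this sketch always resolves through the prototype, which is a simplification of the real mechanism.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One prototype PTE per shared page records where its frame currently is. */
struct prototype_pte {
    int      resident;  /* 1 if the shared page is in physical memory */
    uint32_t pfn;       /* valid only when resident */
};

/* Each process's PTE for the shared page refers to the prototype
   instead of holding a page frame number of its own. */
struct process_pte {
    struct prototype_pte *proto;
};

/* Paging the shared page in updates only the prototype PTE. */
void page_in_shared(struct prototype_pte *proto, uint32_t pfn)
{
    proto->pfn = pfn;
    proto->resident = 1;
}

/* Resolving a process PTE goes through the prototype. Returns 1 and
   stores the frame number in *pfn, or 0 if the page is not resident. */
int resolve(const struct process_pte *pte, uint32_t *pfn)
{
    assert(pte->proto != NULL);
    if (!pte->proto->resident)
        return 0;
    *pfn = pte->proto->pfn;
    return 1;
}
```

Two processes whose PTEs refer to the same prototype both see the new frame after a single page_in_shared call.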

The Windows NT VMM uses a three-step approach to save memory, because it
assumes that most processes leave the majority of their 4GB address space
unallocated. The page directory is always fully defined, but page table
pages are created only as and when they are needed. A two-step translation,
by contrast, would need to fully maintain all of the PTEs, which would
require one million (2^20) entries of four bytes each, or 4MB per process.
A three-step translation would cause poor performance without features
such as the Translation Lookaside Buffer (TLB), an array of associative
memory in the processor that holds direct virtual-to-physical page
mappings for the most frequently used pages.
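
The memory saving is easy to quantify. This is a minimal sketch of the arithmetic, assuming 4KB pages and four-byte entries as on Intel; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NT_PAGE_SIZE 4096u
#define PTE_SIZE     4u            /* bytes per 32-bit entry */
#define NUM_PAGES    (1u << 20)    /* 4GB / 4KB pages */

/* Two-step (single flat table): one entry for every page in 4GB. */
uint32_t flat_table_bytes(void)
{
    return NUM_PAGES * PTE_SIZE;   /* 4MB per process */
}

/* Three-step (directory + tables): one page directory page plus one page
   table page for each 4MB region of the address space actually in use. */
uint32_t two_level_bytes(uint32_t tables_in_use)
{
    assert(tables_in_use <= 1024u);
    return NT_PAGE_SIZE + tables_in_use * NT_PAGE_SIZE;
}
```

Take a process that touches only three 4MB regions. It needs two_level_bytes(3) = 16KB of page-table pages, rather than the 4MB a flat table would require.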

b) Paging
---------

When the number of available page frames runs low, the VMM selects page
frames to free and copies their contents to disk; this process is known as
paging. Paging is essential to a virtual memory system in which multiple
processes compete for the same physical memory, although excessive paging
can monopolize processors and disks. The Windows NT VMM architecture
includes sophisticated strategies for anticipating the code and data
requirements of competing processes in order to minimize disk access
through paging.

A page fault occurs when a program requests a page of code or data that is
not in its working set (the set of its pages that are currently visible to
the program in physical memory):

  - A hard page fault occurs when the requested page must be
    retrieved from disk.
  - A soft page fault occurs when the requested page is found
    elsewhere in physical memory.

Soft page faults can be satisfied quickly and relatively easily by the
Virtual Memory Manager, but hard faults cause paging from disk, which can
degrade performance. There are many causes of soft page faults, including
accessing new PDE/PTE entries and re-accessing pages that were removed
from a working set but are still in physical memory (for example,
unmodified pages on the standby list).
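
The distinction can be summarised as a lookup on where the page currently lives. This sketch covers only the page locations this note describes; the enum and function names are hypothetical.

```c
#include <assert.h>

/* Where a requested page currently is, from the process's point of view. */
enum page_location {
    IN_WORKING_SET,     /* already visible to the process: no fault */
    ON_STANDBY_LIST,    /* removed from the working set, unmodified, still in RAM */
    ON_MODIFIED_LIST,   /* removed from the working set, dirty, still in RAM */
    ON_DISK             /* must be read back from disk */
};

enum fault_kind { NO_FAULT, SOFT_FAULT, HARD_FAULT };

enum fault_kind classify_access(enum page_location loc)
{
    switch (loc) {
    case IN_WORKING_SET:   return NO_FAULT;
    case ON_STANDBY_LIST:
    case ON_MODIFIED_LIST: return SOFT_FAULT;  /* resolved without disk I/O */
    case ON_DISK:          return HARD_FAULT;  /* requires a disk read */
    }
    assert(0 && "unknown page location");
    return HARD_FAULT;
}
```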

Pages that are written to disk are written to the Windows NT page file
(pagefile.sys). The paging file can be split across multiple devices (up to
15 secondary files are allowed), but only one file is allowed per device.
The total size of the paging file plus physical memory limits the amount of
data that can be stored in memory by all processes. It is usually
recommended that the page file be at least twice as large as physical
memory, to accommodate a mix of active and inactive processes, but the
actual size required will depend on the total number of concurrently
committed pages in the system.
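
The two sizing rules in this paragraph are simple arithmetic. This is a minimal sketch with hypothetical names, working in megabytes. The limit of 16 files reflects one primary plus up to 15 secondary paging files.

```c
#include <assert.h>
#include <stdint.h>

/* Rule of thumb from the text: page file at least twice physical memory. */
uint64_t recommended_min_pagefile_mb(uint64_t physical_mb)
{
    return 2 * physical_mb;
}

/* Upper bound on the data all processes together can hold:
   physical memory plus the sizes of all paging files. */
uint64_t storage_limit_mb(uint64_t physical_mb, const uint64_t *pagefile_mb,
                          int n_files)
{
    assert(n_files >= 0 && n_files <= 16);
    uint64_t total = physical_mb;
    for (int i = 0; i < n_files; i++)
        total += pagefile_mb[i];
    return total;
}
```

For example, a server with 512MB of RAM and paging files of 1024MB and 512MB on two devices can hold at most 2048MB.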

c) Working Set Management
-------------------------

Each process has a certain number of pages that reside in physical memory,
known as the process's working set, and may have other pages that are
stored in the page file. Three types of working set exist:

  - system  : there is only one of these, and it belongs to the Windows NT kernel
  - session : one per logged-on session in Windows Terminal Server
  - process : one per user process

Every process has a maximum and a minimum working set size defined for it.
During the lifetime of the process, its working set will vary between these
values, and because the minimum is non-zero by default, the whole process
will never be completely swapped out. If a process needs more pages than
its defined maximum, then for each new page that is faulted in, the VMM
removes one of the process's existing pages, chosen using a FIFO (oldest
first) algorithm. When a page is removed from a process's working set, it
remains in physical memory for a period of time and can be brought back
into the working set if required, avoiding a hard fault. The page frame
goes onto either the modified list (the process wrote to the page and its
contents are not yet on disk) or the standby list (unmodified pages that
are available for reuse).
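
The FIFO replacement described above can be sketched with a ring buffer. This is a toy model with hypothetical names. It tracks only page numbers and a dirty flag, and reports which list (modified or standby) an evicted page would go to.

```c
#include <assert.h>
#include <stdint.h>

#define WS_CAPACITY 64

enum page_list { NOT_EVICTED, STANDBY_LIST, MODIFIED_LIST };

/* Toy working set: page numbers kept in arrival order in a ring buffer. */
struct working_set {
    uint32_t pages[WS_CAPACITY];
    int      dirty[WS_CAPACITY];
    int      head;    /* index of the oldest page */
    int      count;   /* pages currently in the working set */
    int      max;     /* the process's maximum working set size */
};

/* Add a newly faulted-in page. If the working set is already at its
   maximum, the oldest page is removed first (FIFO); its number is stored
   in *evicted and the list it moves to is returned. */
enum page_list ws_add(struct working_set *ws, uint32_t page, int dirty,
                      uint32_t *evicted)
{
    enum page_list dest = NOT_EVICTED;
    assert(ws->max > 0 && ws->max <= WS_CAPACITY);
    if (ws->count == ws->max) {
        *evicted = ws->pages[ws->head];
        /* Written pages still need saving: modified list; else standby. */
        dest = ws->dirty[ws->head] ? MODIFIED_LIST : STANDBY_LIST;
        ws->head = (ws->head + 1) % ws->max;
        ws->count--;
    }
    int slot = (ws->head + ws->count) % ws->max;
    ws->pages[slot] = page;
    ws->dirty[slot] = dirty;
    ws->count++;
    return dest;
}
```

Consider a working set with a maximum of two pages. When a third page is added, the first (oldest) page is evicted. It goes to the modified list if it had been written to, otherwise to the standby list.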

When physical memory runs low, the VMM uses a technique known as automatic
working-set trimming to increase the amount of memory available to the
system. It examines each process, comparing its current working set with
its defined minimum and taking into account the level of page faulting it
incurs, and removes pages from the working set to make them available to
other processes. For processes that haven't released the memory, the page
frame goes onto the modified or standby list, which means the page frame
contents are still in memory. These lists are managed as FIFO queues, so a
system with lots of free physical memory will not immediately reuse a page
that has just gone onto the standby list. If a process needs to re-access
the page, the VMM will revalidate it using the existing image still in
memory, causing only a soft page fault. A disk read (hard page fault)
occurs only when a process re-accesses a page that is no longer on the
modified or standby lists, i.e. no longer in physical memory.
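
The effect of the FIFO standby list can be shown with a small queue. This is a minimal sketch with hypothetical names. Only the standby list is modelled; the modified list would behave the same way, except that its pages must be written out before their frames are reused.

```c
#include <assert.h>
#include <stdint.h>

#define STANDBY_CAPACITY 64

/* Toy standby list: trimmed page numbers, oldest at index 0. */
struct standby_list {
    uint32_t pages[STANDBY_CAPACITY];
    int      count;
};

/* Remove entry i, keeping the remaining entries in FIFO order. */
static void standby_remove_at(struct standby_list *s, int i)
{
    for (int j = i + 1; j < s->count; j++)
        s->pages[j - 1] = s->pages[j];
    s->count--;
}

/* Trimming appends to the tail, so a freshly trimmed page is reused last. */
void standby_push(struct standby_list *s, uint32_t page)
{
    assert(s->count < STANDBY_CAPACITY);
    s->pages[s->count++] = page;
}

/* Reusing a frame for another purpose takes the oldest entry. */
uint32_t standby_reuse_oldest(struct standby_list *s)
{
    assert(s->count > 0);
    uint32_t page = s->pages[0];
    standby_remove_at(s, 0);
    return page;
}

/* Re-access by the owning process: returns 1 (soft fault, page revalidated
   from memory) if still listed, otherwise 0 (hard fault, disk read needed). */
int standby_reaccess(struct standby_list *s, uint32_t page)
{
    for (int i = 0; i < s->count; i++) {
        if (s->pages[i] == page) {
            standby_remove_at(s, i);
            return 1;
        }
    }
    return 0;
}
```

After pages 1, 2 and 3 are trimmed in that order, reusing one frame takes page 1. Re-accessing page 3 is then a soft fault, while re-accessing page 1 needs a disk read.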

d) Sharing Memory / Executable Images
-------------------------------------

Sharing memory is an important feature of any VMM, and one of the
mechanisms that Windows NT uses is memory-mapped files, which allow normal
files, rather than the page file, to back physical memory. Memory-mapped
files allow processes to map files into their virtual address space by
creating a section object and mapping a view of all or part of the file;
this returns the starting address of the mapped view.
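
On Windows NT, this is done with CreateFileMapping, which creates the section object, followed by MapViewOfFile, which returns the view's starting address. Those calls only build on Windows, so the sketch below uses the closest POSIX analogue, mmap, to show the same idea. The file's own pages back the view, and the call returns the view's starting address. The helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Map a read-only view of the whole file behind f and return the view's
   starting address (NULL on failure). The file itself, rather than
   anonymous memory, backs these pages. Unmap with munmap when done. */
const char *map_file_view(FILE *f, size_t *length)
{
    struct stat st;
    int fd = fileno(f);
    fflush(f);                      /* make buffered writes visible */
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return NULL;
    void *view = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (view == MAP_FAILED)
        return NULL;
    *length = (size_t)st.st_size;
    return view;
}
```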

Windows NT uses memory-mapped files to load and execute EXE and DLL files,
which greatly reduces both page file usage and the time required for an
application to begin executing. When subsequent instances of a program are
started, NT simply opens another memory-mapped view of the executable
file's image, which allows multiple instances of the same application to
share the same code and data in physical memory. To prevent one instance
from altering another's global variables, or a debugger that sets
breakpoints from changing a code page seen by other instances, NT uses a
copy-on-write feature. When an attempt is made to write to a memory-mapped
page, the VMM catches the attempt and allocates a new private copy of the
page being written to. The newly allocated page frames are backed by the
page file.
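
Copy-on-write can be demonstrated with the same POSIX analogue. A MAP_PRIVATE mapping gives the writer its own copy of a page on the first write, much as mapping a view with FILE_MAP_COPY does on Windows. This is a minimal sketch; the helper name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the first len bytes of f with a private (copy-on-write) view,
   write 'X' through the view, then return the byte actually stored in
   the file at offset 0, or -1 on error. */
int cow_write_then_read_file(FILE *f, size_t len)
{
    assert(len > 0);
    fflush(f);                                  /* push buffered data to the file */
    char *view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE,
                      fileno(f), 0);
    if (view == MAP_FAILED)
        return -1;
    view[0] = 'X';      /* first write gives this view a private copy of the page */
    munmap(view, len);

    char c;
    if (pread(fileno(f), &c, 1, 0) != 1)        /* read the file, not the view */
        return -1;
    return (unsigned char)c;
}
```

For a file containing "data", the function returns 'd'. The write went to the private copy, and the file (like the other instances' shared pages) is unchanged.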

e) Caching Files
----------------

Windows NT Server is commonly used as a network file server. To provide
better response times to applications accessing common files across the
network, and to programs that are I/O-intensive, NT implements a file
system cache. The size of the Windows NT file system cache is continually
adjusted by the VMM based on the size of physical memory and the demand
for memory space.

The cache is designed to be self-tuning, but it can be influenced by selecting:

  - Control Panel -> Network -> Services -> Server.

For systems that mainly act as a file server, set optimisation to:

  - Maximize Throughput for File sharing

For systems running applications that are accessed via a client/server
architecture and that often perform their own file caching, such as
database servers, set optimisation to:

  - Maximize Throughput for Network Applications.

4) Virtual Memory Terminology
=============================

All virtual memory in Windows NT is either reserved, committed or
available. The following sections describe these states:

a) Reserved Memory :
--------------------

When a process is created and given its address space, the bulk of this
usable address space is free, or unallocated. To use portions of this
address space, the process must allocate regions within it; this is known
as reserving. Reserved regions consist of contiguous pages and are rounded
up to an even multiple of the page size. This address space is set aside
by the VMM for the process, but does not count against the process's
memory quota until it is used. When a process needs to write to memory,
some of the reserved memory is committed to the process. If the process
runs out of reserved memory, available memory can be reserved and
committed simultaneously.
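
The rounding rules can be expressed directly. This is a minimal sketch, assuming Intel's 4KB page size and the 64K boundary mentioned under direct virtual memory allocation below. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NT_PAGE_SIZE         0x1000u   /* 4KB pages on Intel */
#define NT_ALLOC_GRANULARITY 0x10000u  /* 64KB boundary for region bases */

/* Size of a reservation: rounded up to a whole number of pages. */
uint32_t reserve_size(uint32_t requested)
{
    assert(requested <= 0xFFFFF000u);  /* avoid overflow when rounding up */
    return (requested + NT_PAGE_SIZE - 1) & ~(NT_PAGE_SIZE - 1);
}

/* Base of a reservation at an explicit address: rounded down to 64KB. */
uint32_t reserve_base(uint32_t requested_address)
{
    return requested_address & ~(NT_ALLOC_GRANULARITY - 1);
}
```

A request for 0x1001 bytes reserves two pages (0x2000 bytes). A request at address 0x00412345 starts at 0x00410000.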

b) Committed Memory :
---------------------

To use a reserved region of address space, a process must allocate
physical storage and map it to the reserved region. This is known as
committing physical storage. Storage is always committed in whole pages,
but it does not need to be committed for an entire region. The VMM "saves
space" for the committed pages in the pagefile.sys file in case they need
to be written to disk. The amount of committed memory for a process is an
indication of how much memory it is really using. Committed memory is
limited by the size of the paging file.
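
Commitment works in whole pages, so the charge for a commit depends on which pages the byte range touches, not just its length. This is a minimal sketch, assuming 4KB pages; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NT_PAGE_SIZE 4096u

/* Number of whole pages that must be committed so that every byte in
   [offset, offset + length) within a reserved region is backed. */
uint32_t pages_to_commit(uint32_t offset, uint32_t length)
{
    assert(length > 0 && offset <= UINT32_MAX - (length - 1));
    uint32_t first = offset / NT_PAGE_SIZE;
    uint32_t last  = (offset + length - 1) / NT_PAGE_SIZE;
    return last - first + 1;
}
```

Committing just two bytes at offsets 4095 and 4096 straddles a page boundary. So pages_to_commit(4095, 2) is 2, and 8KB is charged against the paging file.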

c) The Commit Limit :
---------------------

The commit limit is the amount of memory that can be committed without
expanding the paging file. If disk space is available, the paging file can
be expanded and the commit limit increased, as long as the paging file
does not exceed its maximum configured size.
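
A toy model of the commit limit follows the description above, working in megabytes. The limit is physical memory plus the current paging file, and the paging file may grow up to its configured maximum. The model assumes disk space is always available for growth, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy commit accounting, all values in MB. */
struct commit_state {
    uint64_t physical;       /* physical memory */
    uint64_t pagefile;       /* current paging file size */
    uint64_t pagefile_max;   /* configured maximum paging file size */
    uint64_t committed;      /* total committed so far */
};

uint64_t commit_limit(const struct commit_state *s)
{
    return s->physical + s->pagefile;
}

/* Try to commit request MB, growing the paging file (up to its maximum)
   if the current limit is not enough. Returns 1 on success, 0 if refused. */
int try_commit(struct commit_state *s, uint64_t request)
{
    assert(s->pagefile <= s->pagefile_max);
    uint64_t needed = s->committed + request;
    if (needed > commit_limit(s)) {
        uint64_t grown = needed - s->physical;   /* paging file size required */
        if (grown > s->pagefile_max)
            return 0;
        s->pagefile = grown;
    }
    s->committed = needed;
    return 1;
}
```

Take 512MB of RAM and a 768MB paging file that may grow to 1024MB. The first 1280MB commit without expansion. Beyond that, the paging file grows, and commits past 1536MB are refused.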

d) Available Memory :
---------------------

Memory that is neither reserved nor committed is available. Available
memory includes free memory, zeroed memory (memory that has been cleared
and filled with zeros) and memory on the standby list, which has been
removed from a process's working set but might be reclaimed.

5) Threads and Memory Allocations
=================================

Windows NT processes consist of one or more threads (commonly known as a
multi-threaded architecture), where a thread describes a path of execution
within the process. Threaded architectures increase the complexity of
memory management, because memory accessed by one thread needs to be
protected from invalid access by other threads.

When a process is created, Windows NT creates a heap in the process's
address space, known as the process's default heap. The default heap is
used by many Win32 API and C runtime calls, such as LocalAlloc and malloc,
although processes can create additional heaps within the virtual address
space if required. The default heap is created as a 1MB region (reserved
and committed); as allocations are made from and released to the heap, the
heap manager commits or decommits parts of the region. Access to the heap
is serialized using critical sections, so that multiple threads cannot
access the heap simultaneously.
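
The heap manager's serialization and on-demand commit can be pictured with a toy heap. This sketch is hypothetical in several ways:

- It hands out memory from a fixed 1MB region.
- It tracks how much of the region is "committed" in whole pages as allocations grow.
- It uses a POSIX mutex in place of a Win32 critical section (EnterCriticalSection / LeaveCriticalSection).
- Unlike a real heap, it never frees individual blocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define HEAP_REGION  (1024u * 1024u)   /* 1MB, as for the default heap */
#define NT_PAGE_SIZE 4096u

struct toy_heap {
    pthread_mutex_t lock;        /* stands in for a CRITICAL_SECTION */
    unsigned char   region[HEAP_REGION];
    size_t          used;        /* bytes handed out so far */
    size_t          committed;   /* bytes committed, always whole pages */
};

void toy_heap_init(struct toy_heap *h)
{
    int rc = pthread_mutex_init(&h->lock, NULL);
    assert(rc == 0);
    (void)rc;
    h->used = 0;
    h->committed = 0;
}

/* Hand out n bytes (sizes rounded up to a multiple of 8), committing more
   whole pages only when the allocation runs past the committed part.
   The mutex ensures only one thread updates the bookkeeping at a time.
   Returns NULL once the region is exhausted. */
void *toy_heap_alloc(struct toy_heap *h, size_t n)
{
    void *p = NULL;
    if (n > HEAP_REGION)
        return NULL;
    size_t size = (n + 7u) & ~(size_t)7u;
    pthread_mutex_lock(&h->lock);
    if (size <= HEAP_REGION - h->used) {
        p = h->region + h->used;
        h->used += size;
        while (h->committed < h->used)
            h->committed += NT_PAGE_SIZE;
    }
    pthread_mutex_unlock(&h->lock);
    return p;
}
```

Because the structure embeds the whole 1MB region, it is best declared static or allocated dynamically rather than placed on a thread's stack.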

When a thread is created in a process, Windows NT reserves a region of the
address space for the thread's stack (each thread gets its own stack) and
also commits some of the reserved region. When a process is linked in the
standard manner, the system reserves a 1MB region of the virtual address
space for the stack and commits two of its pages. Static and global
variables are shared by all threads in the process, so multiple threads can
access such a variable at the same time, potentially corrupting its
contents. Local and automatic variables are created on the thread's own
stack and are therefore less likely to be corrupted. Allocations are made
from the top of the stack down: for example, with a stack from 0x08000000
to 0x080FF000, allocations will cause pages to be committed from
0x080FF000 down through to 0x08001000, and access to the page at
0x08001000 will cause a stack overflow exception. The stack cannot grow any
further, and any additional attempt to access the stack will cause an
access violation, which could cause the process to terminate without
notice.
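
The behaviour at the bottom of the stack can be written as a classification of the page being touched. This is one reading of the example above, sketched with hypothetical names and 4KB pages. It assumes every page above the overflow page can be committed on demand.

```c
#include <assert.h>
#include <stdint.h>

#define NT_PAGE_SIZE 0x1000u

enum stack_result { STACK_OK, STACK_OVERFLOW, ACCESS_VIOLATION };

/* Classify an access to a thread stack whose reserved region starts at
   base (its lowest address). Pages above base + 1 page can be committed
   on demand; touching the page at base + 1 page raises the stack
   overflow exception, and touching the bottom page at base is an
   access violation. */
enum stack_result stack_access(uint32_t addr, uint32_t base)
{
    assert((base & (NT_PAGE_SIZE - 1u)) == 0 && addr >= base);
    uint32_t page = addr & ~(NT_PAGE_SIZE - 1u);
    if (page == base)
        return ACCESS_VIOLATION;
    if (page == base + NT_PAGE_SIZE)
        return STACK_OVERFLOW;
    return STACK_OK;
}
```

With the stack from the example (base 0x08000000), an access at 0x08001800 reports a stack overflow and an access at 0x08000010 an access violation.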

We have now covered two of the three mechanisms for manipulating memory:
heaps (which are best for managing large numbers of small objects) and
memory-mapped files (which are best for managing large streams of data).
The final mechanism is direct virtual memory allocation. The Win32
functions for manipulating virtual memory allow you to directly reserve a
region of the address space and commit physical storage (from the page
file) to the region. The main APIs used to achieve this are VirtualAlloc
and VirtualFree. They allow a contiguous region of the virtual address
space to be defined at either an explicit or an implicit address, with the
starting address rounded down to a 64K boundary. They also allow the
region to be reserved (MEM_RESERVE) or reserved and committed
(MEM_RESERVE | MEM_COMMIT), and allow the region's access permissions to
be set, e.g. PAGE_READWRITE or PAGE_READONLY. This gives the process a
flexible and efficient way of managing limited memory resources,
particularly for large allocations.
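
On Windows NT, the usual sequence is:

1. Reserve with VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS).
2. Commit part of the region with VirtualAlloc(addr, n, MEM_COMMIT, PAGE_READWRITE).
3. Give the region back with VirtualFree(base, 0, MEM_RELEASE).

Those calls only build on Windows, so this sketch uses a common POSIX analogue instead: mmap with PROT_NONE to reserve address space, and mprotect to make part of it usable. The analogy is loose, since Linux commit accounting differs from that of Windows NT. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* "Reserve": claim a range of address space with no access and no usable
   storage yet, roughly VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS). */
void *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* "Commit": make [addr, addr + size) readable and writable, roughly
   VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE). addr must be
   page aligned. Returns 0 on success. */
int commit_range(void *addr, size_t size)
{
    assert(addr != NULL);
    return mprotect(addr, size, PROT_READ | PROT_WRITE);
}

/* "Release": give the whole region back, roughly
   VirtualFree(base, 0, MEM_RELEASE). Returns 0 on success. */
int release_region(void *base, size_t size)
{
    return munmap(base, size);
}
```

As with VirtualAlloc, a large region can be reserved up front and committed piece by piece as it is actually needed.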

A thread can terminate in one of two ways: by calling ExitThread or by
being terminated with TerminateThread. When a thread exits under normal
conditions, ExitThread is called, the thread's stack is destroyed and its
memory is released back to the virtual address space of the process. If a
thread is terminated using TerminateThread, the thread detach code is not
called and Windows NT does not destroy the thread's stack. As a result,
the regions and stack of the terminated thread are not released back to
the process's virtual address space; this memory is only released when
the process that owned the thread terminates.

Umar DBA

Thursday, December 17, 2009
