Thursday, December 17, 2009

Windows NT Memory Architecture Overview


Subject:            Windows NT Memory Architecture Overview
Note:               46053.1
Type:               REFERENCE
Last Revision Date: 17-APR-2001
Status:             PUBLISHED
 
1) Purpose
==========
 
This article is intended to help customers understand how the Windows NT
memory architecture works. Used in combination with Note 46001.1, it should
also help them better understand how the Oracle database interacts with
that architecture. It is not intended to be a definitive guide to the
Windows NT memory architecture; please refer to Intel's and Microsoft's own
documentation for that.
 
This note is only relevant to Windows NT 4.0; Windows 2000 includes many
new features that are not addressed here.
 
 
2) The Windows NT 32-Bit Memory Model
=====================================
 
a) Standard Windows NT Memory Model
-----------------------------------
 
Windows NT 4.0 has a virtual-memory system that combines physical memory,
the file system cache, and disk into a flexible information storage and
retrieval system. Each process running on Windows NT has a flat, linear
32-bit memory address space. This means that each process can "see" 32 bits
of address space, or 4 gigabytes (GB) of virtual memory. The upper half
(0x80000000 through 0xFFFFFFFF) of the virtual memory is reserved for the
system code and data that is visible to the process only when it is running
in privileged mode. The lower half (0x00000000 through 0x7FFFFFFF) is
available to the process when it is running in user-mode and to user-mode
system services called by the program.
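The split can be illustrated with a short C sketch. This is plain arithmetic, not anything NT itself exposes. An address belongs to the user-mode partition if it is at or below the highest user-mode address. The helper names and constants below are hypothetical. Passing 0x7FFFFFFF models the default layout described above, and passing 0xBFFFFFFF models the /3GB (4GT) layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants: highest user-mode address in each layout. */
#define DEFAULT_HIGHEST_USER_ADDRESS 0x7FFFFFFFu  /* standard 2GB/2GB split */
#define FOURGT_HIGHEST_USER_ADDRESS  0xBFFFFFFFu  /* /3GB (4GT) 3GB/1GB split */

/* True if the 32-bit virtual address lies in the user-mode partition;
   false if it lies in the system partition, which is only visible while
   the process runs in privileged (kernel) mode. */
static bool is_user_address(uint32_t va, uint32_t highest_user_address)
{
    return va <= highest_user_address;
}

/* Size in bytes of the user-mode partition for a given layout
   (64-bit result so that the arithmetic cannot overflow). */
static uint64_t user_partition_bytes(uint32_t highest_user_address)
{
    return (uint64_t)highest_user_address + 1;
}
```

For example, 0x80000000 is a system address under the default boundary but a user-mode address under the 4GT boundary. Correspondingly, the user partition grows from 2GB to 3GB.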
 
Windows NT versions prior to 3.51 included some 16-bit data structures that
limited processes to 256MB (64K pages) of virtual memory. These have been
converted to 32-bit data structures in Windows NT 4.0, so the full 2GB of
user-mode virtual memory is now available to every process.
 
 
b) Breaking through the Intel Windows NT 2GB limits
---------------------------------------------------
 
With the cost of physical memory continuing to drop and more I/O-intensive
applications such as database management systems becoming available, the
2GB limit imposed on processes has become a constraining factor. To address
this issue, Microsoft introduced the 4GT RAM Tuning feature in Windows NT
Server, Enterprise Edition version 4.0 (Intel only).
 
The 4GT tuning feature increases the 2GB user-mode partition of the virtual
address space to 3GB (0x00000000 through 0xBFFFFFFF) by reducing the kernel
mode partition to 1GB (0xC0000000 through 0xFFFFFFFF). One of the main
benefits of this change is that it is largely transparent to applications:
no code changes are needed, only a flag in the executable's image header,
as described below.
 
This has been achieved by moving the Guard page which protects the boundary
between the User Address Space and System Address Space from 0x7FFFEFFF to
0xBFFFEFFF. To enable this feature the /3GB switch must be added to the
boot.ini startup line, for example :
 
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT 4.0 Enterprise" /3GB
 
For applications to take advantage of this feature, they need to be linked
with the /LARGEADDRESSAWARE switch, which sets a bit in the executable's
image header (IMAGE_FILE_LARGE_ADDRESS_AWARE). For executables that were
not linked with this switch, the bit can be set by running the imagecfg
tool against the executable, for example :
 
  imagecfg -l oracle80.exe
 
This tool is available on the Windows NT 4.0 Enterprise Edition CD 2, in
the \support\DEBUG\i386\ directory.
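The C sketch below makes the effect of /LARGEADDRESSAWARE and "imagecfg -l" concrete. It checks and sets the IMAGE_FILE_LARGE_ADDRESS_AWARE bit (0x0020) in the Characteristics field of the COFF file header, working on a PE image held in memory. It assumes the layout given in Microsoft's PE/COFF specification: the file offset of the "PE\0\0" signature is stored at 0x3C, and Characteristics sits 18 bytes into the COFF header that follows the 4-byte signature. The function names are hypothetical. This is a sketch for illustration, not a replacement for imagecfg.

```c
#include <stddef.h>
#include <stdint.h>

#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020u

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Offset of the COFF Characteristics field, or -1 if the buffer does not
   look like a PE image. */
static long characteristics_offset(const uint8_t *img, size_t len)
{
    if (len < 0x40 || img[0] != 'M' || img[1] != 'Z')
        return -1;                                  /* no MS-DOS stub header */
    uint32_t pe = rd32(img + 0x3C);                 /* e_lfanew */
    if ((uint64_t)pe + 24 > len)
        return -1;                                  /* header would overrun buffer */
    if (img[pe] != 'P' || img[pe + 1] != 'E' || img[pe + 2] != 0 || img[pe + 3] != 0)
        return -1;
    return (long)pe + 4 + 18;                       /* signature + field offset */
}

/* 1 if the flag is set, 0 if it is clear, -1 if not a PE image. */
static int is_large_address_aware(const uint8_t *img, size_t len)
{
    long off = characteristics_offset(img, len);
    if (off < 0)
        return -1;
    uint16_t ch = (uint16_t)(img[off] | img[off + 1] << 8);
    return (ch & IMAGE_FILE_LARGE_ADDRESS_AWARE) != 0;
}

/* Set the flag in place. This code does not update the optional-header
   CheckSum field. Returns 0 on success, -1 if not a PE image. */
static int set_large_address_aware(uint8_t *img, size_t len)
{
    long off = characteristics_offset(img, len);
    if (off < 0)
        return -1;
    img[off] |= IMAGE_FILE_LARGE_ADDRESS_AWARE;     /* bit 0x20 is in the low byte */
    return 0;
}
```

Reading a real executable into a buffer, calling set_large_address_aware and writing the buffer back would, under these assumptions, mark the image as large-address-aware.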
 
 
c) Breaking through the Intel Windows NT 4GB limits
---------------------------------------------------
 
The introduction of Intel servers that can support more than 4GB of main
memory has presented a challenge to Windows NT, because NT is only capable
of using up to 4GB in total. This is especially important as a growing
number of enterprise-class applications are able to benefit from this
extra memory.
 
Intel has introduced servers based on the Pentium II/III Xeon processor
with support for the Intel Extended Server Memory (ESM) Architecture, which
breaks through the 4GB (32-bit) memory barrier. ESM includes 36-bit memory
addressing technologies that are capable of addressing 64GB of main
memory. Windows NT uses them through the Page Size Extension 36-bit (PSE36)
driver, which must be obtained from Intel. The current PSE36 driver is
limited to 8GB.
 
The Intel PSE36 driver is a standard RAM disk device (based on the Windows
NT DDK RAM disk driver) that lacks a file system and is backed by main
memory that is unused by the operating system. The PSE36 driver functions
like a raw disk with much lower latency and allows 4MB pages to exist at
addresses anywhere in the 36-bit address space. Applications must be
rewritten to make use of this feature.
 
Only one process may open and access the PSE36 driver at a time; that
process gets exclusive access to all of the additional memory. The RAM disk
is not shared between processes, is never mapped into the address space of
a process, and is not backed by the Windows NT page file. Applications that
use this device driver access it via the same Win32 API function calls used
to access standard raw disk partitions :
 
  - CreateFile      : obtains a file handle to the PSE36 device driver
                      and specifies access modes
  - DeviceIoControl : obtains the size of the PSE36 driver device and
                      provides optimised READ and WRITE device controls
 
Systems with 4GB of memory or less can still utilize the PSE36 driver as
long as the /MAXMEM switch is added to the Windows NT boot.ini file. For
example, on a system with 4GB of memory and a Xeon processor, MAXMEM could
be set to 2048 MB :
 
multi(0)disk(0)rdisk(0)partition(2)\WINNT="Windows NT 4.0 EE" /MAXMEM:2048
 
Under such a configuration, assuming 256MB of address space at the top of
memory has been reserved for I/O devices, Windows NT would control a 2GB
chunk of memory and the remaining 1.75GB would be controlled by the PSE36
driver. For systems with more than 4GB of physical memory, the MAXMEM
parameter can be used to maximize the amount of memory used by the PSE36
driver, which is useful in systems where processes have only modest kernel
memory requirements. For example, on a machine with 5GB of physical memory,
MAXMEM could be set to 3GB (3072) to increase the memory available to the
PSE36 driver from 1GB to 2GB. It is often unnecessary to set MAXMEM on such
systems, however, because Windows NT is unable to access memory beyond 4GB.
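The arithmetic in these examples can be captured in a small, hypothetical C model. All values are in MB. The model makes three assumptions:

- NT can only use memory below the 4GB physical boundary.
- The I/O reservation hides that much RAM.
- Everything NT does not use is left to the PSE36 driver.

The 5GB example above is stated without an I/O reservation, so the usage example passes 0 for it.

```c
#include <stdint.h>

/* Hypothetical model (all values in MB).
   physical_mb : installed RAM
   maxmem_mb   : /MAXMEM value, or 0 if the switch is not used
   io_hole_mb  : address space reserved for I/O devices below 4GB
   Returns the memory left for the PSE36 driver. */
static uint32_t pse36_mb(uint32_t physical_mb, uint32_t maxmem_mb, uint32_t io_hole_mb)
{
    const uint32_t four_gb_mb = 4096;
    uint32_t below_4gb = (physical_mb < four_gb_mb ? physical_mb : four_gb_mb) - io_hole_mb;
    uint32_t nt_mb = below_4gb;                 /* NT's share without /MAXMEM */
    if (maxmem_mb != 0 && maxmem_mb < nt_mb)
        nt_mb = maxmem_mb;                      /* /MAXMEM caps NT's share */
    return physical_mb - io_hole_mb - nt_mb;    /* the rest goes to PSE36 */
}
```

With these assumptions, the model reproduces the figures in the text:

- pse36_mb(4096, 2048, 256) gives 1792 MB (1.75GB).
- pse36_mb(5120, 0, 0) gives 1024 MB.
- pse36_mb(5120, 3072, 0) gives 2048 MB.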
 
 
2) NT's Virtual Memory Manager (VMM)
====================================
 
a) The virtual address space and address translation
----------------------------------------------------
 
As is the case with other Virtual Memory Managers, the Windows NT VMM is
responsible for creating the illusion that every process has exclusive
access to 32 bits (4GB) of memory, while in reality all processes share
the same physical memory (up to a maximum of 4GB). The 32-bit address
space is known as virtual memory because it does not correspond directly
to physical memory; it is the VMM's responsibility to translate between
virtual and physical addresses.
 
Both physical and virtual memory are divided into fixed-size blocks, and
the VMM performs address translation, assisted by the processor's Memory
Management Unit (MMU), one block at a time. A block of physical memory is
known as a page frame, which the processor numbers consecutively with page
frame numbers up to the maximum physical memory available, whereas a block
of virtual memory is known as a page. The size of a page varies with the
processor platform: Intel processors use 4096 bytes per page; Alpha
platforms use 8192 bytes per page.
 
The Windows NT VMM uses a three step address resolution mechanism, where
the virtual address is split into three parts :
 
   - Page Directory Entry/Offset (PDE) : bits 22 to 31
   - Page Table Entry/Offset (PTE)     : bits 12 to 21
   - Page Offset                       : bits  0 to 11
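The three-part split can be written as shifts and masks. The C sketch below assumes the Intel layout listed above: 4096-byte pages, with 10 directory bits, 10 table bits and 12 offset bits. On Alpha, with 8192-byte pages, the field widths differ. The names va_parts, split_va and join_va are hypothetical.

```c
#include <stdint.h>

/* Fields of a 32-bit Intel virtual address with 4KB pages. */
typedef struct {
    uint32_t pde_index;  /* bits 22-31: index into the page directory (0..1023) */
    uint32_t pte_index;  /* bits 12-21: index into a page table (0..1023)       */
    uint32_t offset;     /* bits 0-11 : byte offset within the page (0..4095)   */
} va_parts;

static va_parts split_va(uint32_t va)
{
    va_parts p;
    p.pde_index = va >> 22;
    p.pte_index = (va >> 12) & 0x3FFu;
    p.offset    = va & 0xFFFu;
    return p;
}

/* Inverse of split_va: reassemble the virtual address. */
static uint32_t join_va(va_parts p)
{
    return (p.pde_index << 22) | (p.pte_index << 12) | p.offset;
}
```

For example, the address 0x12345678 splits into a directory index of 0x48, a table index of 0x345 and a page offset of 0x678.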
 
Each process has its own private page directory, and a special hardware
register (CR3 on Intel processors) points to it. When the scheduler
switches between processes, NT loads the new process's page directory
address into that register. The MMU translation mechanism uses the PDE
offset from the virtual address to retrieve, from the page directory, the
page frame number of the page table; it then uses the PTE offset to
retrieve, from that page table, the page frame number of the required code
or data page, to which the page offset is added :
 
          +-----------+------------+--------+
          | Directory | Page Table | Page   | Virtual
          |  Offset   |   Offset   | Offset | Address
          +-----------+------------+--------+
         31 |        21 |         11   |    0
    +-------+           |              +-----+
    |                   |   Page Table       |
    |     Page          |      Page          |   Page Frame
    |   Directory       |   +---------+      |  +-----------+
    |  +---------+      |   |         |      |  |           |
    |  |         |      |   +---------+      |  +-----------+
    |  +---------+      +-> | PF Addr |---+  |  |           |
    +->| PT Addr |---+      +---------+   |  |  +-----------+
       +---------+   |      |    .    |   |  +->| Code/Data |
       |         |   |      |    .    |   |     +-----------+
       +---------+   |      |    .    |   |     |     .     |
       |    .    |   |      |    .    |   |     |     .     |
       |    .    |   |      +---------+   |     +-----------+
       |         |   |             ^      |              ^
       +---------+   |             |      |              |
      (per process)  +-------------+      +--------------+
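The lookup in the diagram can be modelled in software. The C sketch below is a toy, not NT or hardware code. The page directory is an array of pointers to page tables, instead of physical page frame numbers, and each table is created only when a page in its 4MB region is first mapped. Each page table entry holds a page frame number in its upper 20 bits, plus a "present" bit. All names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define ENTRIES    1024u
#define PAGE_BYTES 4096u
#define PRESENT    0x1u

/* Toy page directory: NULL means no page table exists for that 4MB region. */
typedef struct {
    uint32_t *tables[ENTRIES];
} page_directory;

/* Map one virtual page to a page frame, creating its page table on demand. */
static int map_page(page_directory *pd, uint32_t va, uint32_t pfn)
{
    uint32_t di = va >> 22, ti = (va >> 12) & 0x3FFu;
    if (pd->tables[di] == NULL) {
        pd->tables[di] = calloc(ENTRIES, sizeof(uint32_t));
        if (pd->tables[di] == NULL)
            return -1;
    }
    pd->tables[di][ti] = (pfn << 12) | PRESENT;
    return 0;
}

/* Translate va into a physical address in *pa. Returns -1 for an unmapped
   address, the point at which real hardware would raise a page fault. */
static int translate(const page_directory *pd, uint32_t va, uint64_t *pa)
{
    const uint32_t *pt = pd->tables[va >> 22];             /* step 1: directory  */
    if (pt == NULL)
        return -1;
    uint32_t pte = pt[(va >> 12) & 0x3FFu];                /* step 2: page table */
    if (!(pte & PRESENT))
        return -1;
    *pa = (uint64_t)(pte >> 12) * PAGE_BYTES + (va & 0xFFFu); /* step 3: offset */
    return 0;
}

static void free_directory(page_directory *pd)
{
    for (uint32_t i = 0; i < ENTRIES; i++) {
        free(pd->tables[i]);
        pd->tables[i] = NULL;
    }
}
```

For example, after mapping the page at 0x00401000 to frame 0x123, the address 0x00401ABC translates to the physical address 0x123ABC. The neighbouring page at 0x00402000 remains unmapped.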
 
When a page frame is shared between two processes, the VMM inserts a level
of indirection into its page tables by using a prototype page table entry
(prototype PTE) data structure. This ensures only the prototype PTE needs
to be updated when a page frame is paged in, rather than the PTE of each
process.
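The indirection can be illustrated with a simplified C model. It does not use NT's real PTE format. In NT, hardware cannot follow a prototype PTE; the memory manager consults the prototype while handling a page fault. This toy instead resolves through the prototype on every lookup, so that the single shared update is easy to see. The types and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy prototype PTE: one per shared page, shared by all processes. */
typedef struct {
    bool     valid;   /* is the shared page resident? */
    uint32_t pfn;     /* its page frame number when resident */
} prototype_pte;

/* Toy per-process PTE: either indirect (shared) or direct (private). */
typedef struct {
    prototype_pte *proto;  /* non-NULL for shared pages */
    bool           valid;  /* used for private pages only */
    uint32_t       pfn;
} process_pte;

/* Resolve a process PTE to a page frame; false if the page is not resident. */
static bool resolve(const process_pte *pte, uint32_t *pfn)
{
    if (pte->proto != NULL) {
        if (!pte->proto->valid)
            return false;
        *pfn = pte->proto->pfn;
        return true;
    }
    if (!pte->valid)
        return false;
    *pfn = pte->pfn;
    return true;
}

/* Paging a shared page in updates only the prototype entry, not the PTE
   of every process that maps the page. */
static void page_in_shared(prototype_pte *proto, uint32_t pfn)
{
    proto->valid = true;
    proto->pfn   = pfn;
}
```

In this model, two processes whose PTEs both point at the same prototype see the page become resident after one call to page_in_shared.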
 
The Windows NT VMM uses a three-step approach to save memory, because it
assumes most processes leave the majority of their 4GB address space
unallocated. It fully defines the page directory, but page table pages are
created only as and when they are needed, whereas a two-step translation
(a single flat page table) would need to maintain every PTE: about one
million entries of four bytes each, or 4MB per process. A three-step
translation would perform poorly without features such as Translation
Lookaside Buffers (TLBs), in which the processor provides an array of
associative memory holding direct virtual-to-physical page mappings for
the most frequently used pages.
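The saving can be worked out in a few lines of C, using the figures above: 4-byte entries, 4096-byte pages, and 1024 entries per page table. Each page table maps one 4MB region. The region counts passed in are hypothetical examples, not measurements.

```c
#include <stdint.h>

#define PAGE_BYTES 4096u
#define PTE_BYTES  4u

/* Two-step translation: one flat table for all 2^20 pages of a 4GB space. */
static uint64_t flat_table_bytes(void)
{
    return (1ull << 20) * PTE_BYTES;                 /* 4MB per process */
}

/* Three-step translation: the page directory (one page) plus one page table
   for each distinct 4MB region that actually contains allocated pages. */
static uint64_t two_level_bytes(uint32_t regions_in_use)
{
    return PAGE_BYTES + (uint64_t)regions_in_use * PAGE_BYTES;
}
```

For example, a process whose allocations touch only three 4MB regions needs 16KB of translation tables under this scheme. A flat table would need 4MB.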
 
b) Paging
---------
 
When the number of available page frames runs low, the VMM selects page
frames to free and copies them to disk; this process is known as paging.
Paging is essential to a virtual memory system where multiple processes
compete for the same physical memory, although excessive paging can
monopolize processors and disks. The Windows NT VMM architecture includes
sophisticated strategies for anticipating the code and data requirements
of competing processes to minimize disk access through paging.
 
A page fault occurs when a program requests a page of code or data that is
not in its working set (the set of pages visible to the program in physical
memory). 
 
  - A hard page fault occurs when the requested page must be
    retrieved from disk.
  - A soft page fault occurs when the requested page is found
    elsewhere in physical memory.
 
Soft page faults can be satisfied quickly and relatively easily by the
Virtual Memory Manager, but hard faults cause paging from disk, which can
degrade performance. There are many causes of soft page faults including
accessing new PDE/PTE entries and re-accessing pages that were removed
from a working set but are still unmodified.
 
Pages that are written to disk are written to the Windows NT Page File
(pagefile.sys). The paging file can be split across multiple devices (up
to 15 secondary files are allowed) but only one file per device. The total
size of the paging file plus physical memory limits the amount of data
that can be stored in memory by all processes. It is usually recommended
that the page file is at least twice as large as the physical memory to
accommodate a mix of active and inactive processes, but the actual size
will depend on the total number of concurrently committed pages required
in the system.
 
c) Working Set Management 
-------------------------
 
The pages of a process that reside in physical memory are known as the
process's working set; the process may also have other pages that are
stored in the pagefile. Three types of working set exist :

  - system  : there is only one of these, and it belongs to the Windows NT kernel
  - session : one per logged-on session in Windows Terminal Server
  - process : one per user process
 
Every process has a minimum and a maximum working set size defined for it.
During the lifetime of the process its working set varies between these
values, and because the minimum is non-zero by default, the whole process
is never completely swapped out. If a process requests more pages than its
defined maximum, the VMM removes one of the process's pages using a FIFO
(oldest first) algorithm, causing a page fault for each new page. When a
page is removed from a process's working set, it remains in physical
memory for a period of time and can be brought back into the working set
if required, avoiding a hard fault. The page frame goes onto either the
modified list (the process wrote to the page and the contents are not yet
on disk) or the standby list (the contents already match the copy on disk,
so the frame is available for reuse).
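The behaviour described above can be simulated with a toy C model. It makes three simplifications:

- A single fixed-capacity FIFO standby list per process stands in for NT's system-wide modified and standby lists.
- Pages are small non-negative integers.
- All names are hypothetical.

A re-access is counted as a soft fault if the page is still on the standby list, and as a hard fault if its frame has already been reused.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PAGES 64            /* capacity limit for the toy lists */

typedef struct {
    int    items[MAX_PAGES];    /* items[0] is the oldest entry */
    size_t len, cap;            /* cap must be between 1 and MAX_PAGES */
} fifo;

static bool fifo_contains(const fifo *f, int page)
{
    for (size_t i = 0; i < f->len; i++)
        if (f->items[i] == page)
            return true;
    return false;
}

static bool fifo_remove(fifo *f, int page)
{
    for (size_t i = 0; i < f->len; i++) {
        if (f->items[i] == page) {
            for (size_t j = i + 1; j < f->len; j++)
                f->items[j - 1] = f->items[j];
            f->len--;
            return true;
        }
    }
    return false;
}

/* Append a page; if the list is full, drop and return the oldest entry
   first. Returns -1 if nothing was dropped. */
static int fifo_push(fifo *f, int page)
{
    int dropped = -1;
    if (f->len == f->cap) {
        dropped = f->items[0];
        fifo_remove(f, dropped);
    }
    f->items[f->len++] = page;
    return dropped;
}

typedef struct {
    fifo ws;                    /* working set, capped at its maximum size */
    fifo standby;               /* frames removed from the working set */
    int  hits, soft, hard;
} process_model;

/* The process touches a page. */
static void touch(process_model *p, int page)
{
    if (fifo_contains(&p->ws, page)) {
        p->hits++;              /* already in the working set */
        return;
    }
    if (fifo_remove(&p->standby, page))
        p->soft++;              /* contents still in memory */
    else
        p->hard++;              /* must be read from disk */
    int removed = fifo_push(&p->ws, page);   /* at maximum: oldest page leaves */
    if (removed >= 0)
        fifo_push(&p->standby, removed);     /* oldest standby frame is reused */
}
```

With a maximum working set of two pages and room for one standby page, the access sequence 1, 2, 1, 3, 1, 4, 2 gives one hit, one soft fault (page 1 is reclaimed from standby) and five hard faults.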
 
When physical memory runs low, the VMM uses a technique known as automatic
working-set trimming to increase the amount of memory available to the
system. It examines each process, comparing its current working set with
its defined minimum and the level of page faulting it incurs, and removes
pages from the working set to make them available to other processes.
Because the process has not released this memory, the trimmed page frames
go onto the modified or standby list, which means their contents are still
in memory. These lists use FIFO ordering, so a system with plenty of free
physical memory will not immediately reuse a page that has just gone onto
the standby list. If a process needs to re-access the page, the VMM
revalidates it using the existing image still in memory, causing a soft
page fault. A disk read (hard page fault) only occurs when a process
re-accesses a page that is no longer on the modified or standby lists,
i.e. no longer in physical memory.
 
d) Sharing Memory / Executable Images
-------------------------------------
 
Sharing memory is an important feature of any VMM, and one of the
mechanisms that Windows NT uses is memory-mapped files, which allow
ordinary files, rather than the pagefile, to back physical memory.
Memory-mapped files allow processes to map files into their virtual address
space by creating a section object and mapping a view of all or part of the
file; the mapping call returns the starting address of the mapped view.
 
Windows NT uses memory-mapped files to load and execute EXE/DLL files, which
greatly reduces both the pagefile space used and the time required for an
application to begin executing. When subsequent instances of a process are
started, NT simply opens another memory-mapped view of the executable file's
image; this allows multiple instances of the same application to share the
same code and data in physical memory. To prevent one instance from
altering another instance's global variables, or a debugger setting
breakpoints from changing a code page used by other instances, NT uses the
copy-on-write feature. When an attempt is made to write to a memory-mapped
file, the VMM catches the attempt and allocates new page frames for the
pages the application is trying to write to. The newly allocated page
frames are backed by the page file.
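On NT, file mapping is exposed through the Win32 calls CreateFileMapping and MapViewOfFile, and a view mapped with FILE_MAP_COPY is copy-on-write. Those calls are Windows-only, so the runnable sketch below swaps in the POSIX mmap call with MAP_PRIVATE, which gives comparable copy-on-write behaviour on Unix-like systems. It is an analogue, not NT code, and cow_demo is a hypothetical helper. The pointer returned by mmap is the starting address of the mapped view. The first write into the private view gives the process its own copy of that page, and the file on disk is left unchanged.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>      /* tmpfile()/fileno(), used in the usage example */
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the open file `fd` as a private (copy-on-write) view,
   write `patch` into the first byte of the view, then read the first byte
   of the file back through the descriptor. Returns 0 on success. */
static int cow_demo(int fd, size_t len, char patch, char *view_after, char *file_after)
{
    char *view = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (view == MAP_FAILED)
        return -1;
    view[0] = patch;                 /* first write: the page is copied */
    *view_after = view[0];
    int rc = pread(fd, file_after, 1, 0) == 1 ? 0 : -1;   /* file untouched */
    munmap(view, len);
    return rc;
}
```

For example, suppose a temporary file containing "hello" is mapped this way and 'J' is written into the view. The view then holds 'J', while the file still starts with 'h'.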
 
e) Caching Files
----------------
 
Windows NT Server is commonly used as a network file server. To provide
better response times to applications accessing common files across the
network, and to I/O-intensive programs, NT implements a file system cache.
The size of the Windows NT file system cache is continually adjusted by
the VMM based upon the size of physical memory and the demand for memory
space.
 
The cache is designed to be self-tuning but can be influenced by selecting:
  - Control Panel -> Network -> Services -> Server.
 
For systems that mainly act as file servers, set the optimisation to :
  - Maximize Throughput for File sharing

For systems that run applications accessed via client / server
architectures, which often perform their own file caching (such as
database servers), set the optimisation to :
  - Maximize Throughput for Network Applications
 
 
3) Virtual Memory Terminology
=============================
 
All virtual memory in Windows NT is either reserved, committed, or
available. The following provides a description of these states :
 
a) Reserved Memory : 
--------------------
 
When a process is created and given its address space, the bulk of this
usable address space is free, or unallocated. In order to use portions of
this address space, the process must allocate regions within it; this is
known as reserving. Reserved regions consist of contiguous pages, and their
size is rounded up to a multiple of the page size. This address space is
set aside by the VMM for the process but does not count against the
process's memory quota until it is used. When a process needs to write to
memory, some of the reserved memory is committed to the process. If the
process runs out of reserved memory, available memory can be reserved and
committed simultaneously.
 
b) Committed Memory :
---------------------
 
To use a reserved region of address space, a process must allocate physical
storage and map it to the reserved region. This is known as committing
physical storage. Storage is always committed in whole pages, but a process
does not need to commit storage to an entire region. The VMM "saves space"
for the committed pages in the pagefile.sys file in case they need to be
written to disk. The amount of committed memory for a process is an
indication of how much memory it is really using. Committed memory is
limited by the size of the paging file.
 
c) The Commit Limit :
---------------------
 
The commit limit is the amount of memory that can be committed without
expanding the paging file. If disk space is available, the paging file can
be expanded, increasing the commit limit, as long as the paging file does
not exceed its maximum size.
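The interaction between committing and the commit limit can be shown with a toy C model. It approximates the commit limit as physical memory plus the current paging file size, following the statement in the Paging subsection that the paging file plus physical memory limits the data that can be stored. NT's real accounting differs in detail. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_BYTES 4096u

/* Toy commit accounting, all sizes in bytes. */
typedef struct {
    uint64_t physical;      /* RAM counted toward the limit */
    uint64_t pagefile;      /* current paging file size */
    uint64_t pagefile_max;  /* maximum paging file size */
    uint64_t committed;     /* total commit charge so far */
} commit_state;

static uint64_t commit_limit(const commit_state *s)
{
    return s->physical + s->pagefile;
}

/* Commit `bytes`, rounded up to whole pages. If the current limit is too
   small, grow the paging file, but never beyond its maximum size. */
static bool try_commit(commit_state *s, uint64_t bytes)
{
    uint64_t pages = (bytes + PAGE_BYTES - 1) / PAGE_BYTES;
    uint64_t need  = s->committed + pages * PAGE_BYTES;
    if (need > commit_limit(s)) {
        uint64_t grow = need - commit_limit(s);
        if (s->pagefile + grow > s->pagefile_max)
            return false;           /* commit fails: limit cannot be raised */
        s->pagefile += grow;        /* expanding the file raises the limit */
    }
    s->committed = need;
    return true;
}
```

For example, take 64 pages of RAM and a 32-page paging file that can grow to 64 pages. Committing 1 byte charges a whole page. Committing another 100 pages then grows the paging file to 37 pages. A further 60-page commit fails, because the paging file would have to grow to 97 pages.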
 
d) Available Memory :
---------------------
 
Memory that is neither reserved nor committed is available. Available
memory includes free memory, zeroed memory (which is cleared and filled
with zeros), and memory on the standby list, which has been removed from a
process's working set but might be reclaimed.
 
 
4) Threads and Memory allocations
=================================
 
Windows NT processes consist of one or more threads (commonly known as a
multi-thread architecture), where a thread describes a path of execution
within the process. Threaded architectures increase the complexity of
memory management because memory accessed by one thread needs to be
protected from invalid access by other threads.
 
When a process is created, Windows NT creates a heap in the process's
address space; this heap is called the process's default heap. The default
heap is used by many of the Win32 API calls and C runtime calls such as
malloc / LocalAlloc, although processes can create additional private
heaps within the virtual address space if required. The default heap is
created as a 1MB reserved region, of which only part is committed at
first; as allocations are made to and released from the heap, the heap
manager commits or decommits pages of the region. Access to the heap is
serialized using critical sections, so that multiple threads cannot
access the heap simultaneously.
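The serialization can be sketched with a toy heap in C. A pthread mutex stands in for the Win32 critical section (EnterCriticalSection / LeaveCriticalSection), and a fixed-arena bump allocator stands in for the real heap manager; commit and decommit are not modelled. The names toy_heap, toy_alloc and run_allocators are hypothetical. Because every allocation takes the lock, concurrent threads never hand out overlapping blocks.

```c
#include <pthread.h>
#include <stddef.h>

/* Toy heap: a bump allocator over a fixed arena, guarded by one lock. */
typedef struct {
    unsigned char  *arena;
    size_t          size, used;
    pthread_mutex_t lock;
} toy_heap;

static void *toy_alloc(toy_heap *h, size_t n)
{
    void *p = NULL;
    n = (n + 7) & ~(size_t)7;              /* 8-byte alignment */
    pthread_mutex_lock(&h->lock);          /* one thread in the heap at a time */
    if (h->size - h->used >= n) {
        p = h->arena + h->used;
        h->used += n;
    }
    pthread_mutex_unlock(&h->lock);
    return p;
}

#define MAX_THREADS 16

struct worker_arg { toy_heap *heap; int count; int failures; };

static void *worker(void *arg)
{
    struct worker_arg *a = arg;
    for (int i = 0; i < a->count; i++)
        if (toy_alloc(a->heap, 16) == NULL)
            a->failures++;
    return NULL;
}

/* Run `nthreads` threads, each making `per_thread` 16-byte allocations.
   Returns the total number of failed allocations, or -1 on error. */
static int run_allocators(toy_heap *h, int nthreads, int per_thread)
{
    pthread_t tid[MAX_THREADS];
    struct worker_arg args[MAX_THREADS];
    int created = 0, failures = 0;
    if (nthreads > MAX_THREADS)
        return -1;
    for (; created < nthreads; created++) {
        args[created] = (struct worker_arg){ h, per_thread, 0 };
        if (pthread_create(&tid[created], NULL, worker, &args[created]) != 0)
            break;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        failures += args[i].failures;
    }
    return created == nthreads ? failures : -1;
}
```

For example, four threads making 1000 allocations each from a 64000-byte arena fill it exactly, with no failed allocations.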
 
When a thread is created in a process, Windows NT reserves a region of
the address space for the thread's stack (each thread gets its own stack)
and also commits some of the reserved region. When a process is linked in
the standard manner, the system reserves a 1MB region of the virtual
address space for the stack and commits two of the pages. Static and
global variables, by contrast, are shared by all threads in the process,
so multiple threads can access such a variable at the same time,
potentially corrupting its contents. Local and automatic variables are
created on each thread's own stack and are therefore less likely to be
corrupted. Stack pages are committed from the top of the stack down: for
example, with a stack from 0x08000000 to 0x080FF000, pages are committed
from 0x080FF000 down through to 0x08001000, and access to the page at
0x08001000 causes a stack overflow exception. The stack cannot grow any
further, and any additional attempt to access the stack below that point
causes an access violation, which could terminate the process without
notice.
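The downward growth can be simulated in C using the addresses from the example above. This is a toy model, and all names are hypothetical. The top page starts committed, and the page just below it acts as the guard page; the model assumes these are the two pages mentioned above. Touching the guard page commits it and moves the guard one page lower. When only the bottom page of the reservation would remain, the touch reports a stack overflow and no guard is left. The model also assumes that touching anything else outside the committed pages, including skipping past the guard page, is an access violation.

```c
#include <stdint.h>

#define PAGE 0x1000u

typedef enum {
    TOUCH_OK,                 /* address already committed */
    TOUCH_GREW,               /* guard page committed, new guard set below */
    TOUCH_STACK_OVERFLOW,     /* last guard page used: stack cannot grow */
    TOUCH_ACCESS_VIOLATION    /* address not committed and not the guard */
} touch_result;

/* [base, top) is reserved; pages at or above committed_low are committed. */
typedef struct {
    uint32_t base, top, committed_low;
    int      has_guard;
} stack_model;

static void stack_init(stack_model *s, uint32_t base, uint32_t top)
{
    s->base = base;
    s->top = top;
    s->committed_low = top - PAGE;   /* top page committed at creation */
    s->has_guard = 1;                /* page below it is the guard page */
}

static touch_result stack_touch(stack_model *s, uint32_t addr)
{
    if (addr >= s->committed_low && addr < s->top)
        return TOUCH_OK;
    uint32_t guard = s->committed_low - PAGE;
    if (s->has_guard && addr >= guard && addr < s->committed_low) {
        s->committed_low = guard;            /* commit the guard page */
        if (guard - PAGE == s->base) {       /* only the bottom page remains */
            s->has_guard = 0;
            return TOUCH_STACK_OVERFLOW;
        }
        return TOUCH_GREW;
    }
    return TOUCH_ACCESS_VIOLATION;
}
```

For a stack initialised with stack_init(&s, 0x08000000, 0x08100000), touching successively lower pages grows the stack one page at a time. The touch of 0x08001000 reports the stack overflow, and any later access below that page is an access violation.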
 
Two of the three mechanisms for manipulating memory have been covered
above: heaps (which are best for managing large numbers of small objects)
and memory-mapped files (which are best for managing large streams of
data). The final mechanism is direct virtual memory allocation. The Win32
functions for manipulating virtual memory allow you to directly reserve a
region of the address space and commit physical storage (from the page
file) to the region. The main APIs used to achieve this are VirtualAlloc /
VirtualFree and they allow a contiguous region of the virtual address space
to be defined at either an explicit or implicit address, rounded to an even
64K boundary. They also allow the region to be reserved (MEM_RESERVE) or
reserved and committed (MEM_RESERVE | MEM_COMMIT) as well as setting the
region's access permissions, e.g. PAGE_READWRITE / PAGE_READONLY. This
gives the process a flexible and efficient mechanism for managing limited
memory resources, particularly for large allocations.
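The reserve-then-commit pattern can be sketched in C. VirtualAlloc and VirtualFree are Windows-only, so the runnable sketch swaps in the POSIX mmap and mprotect calls as a loose analogue, not NT code. Two steps are involved:

- Mapping a region with PROT_NONE claims address space that cannot yet be used, roughly like MEM_RESERVE.
- Changing part of that region to PROT_READ | PROT_WRITE makes it usable, roughly like MEM_COMMIT with PAGE_READWRITE.

Linux commit accounting differs from NT's, and unlike VirtualAlloc this sketch does not round the base to a 64K boundary. The helper names are hypothetical. The Win32 calls in the comments are given only for comparison.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve address space with no access
   (compare VirtualAlloc(NULL, size, MEM_RESERVE, PAGE_NOACCESS)). */
static void *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make a page-aligned part of the reservation read/write
   (compare VirtualAlloc(addr, len, MEM_COMMIT, PAGE_READWRITE)). */
static int commit_range(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ | PROT_WRITE);
}

/* Change a committed range to read-only (compare PAGE_READONLY). */
static int make_readonly(void *addr, size_t len)
{
    return mprotect(addr, len, PROT_READ);
}

/* Release the whole region (compare VirtualFree(addr, 0, MEM_RELEASE)). */
static int release_region(void *addr, size_t size)
{
    return munmap(addr, size);
}
```

As an example, a process can reserve 16 pages, commit and write to only the second and third pages, and later release the whole region in one call. The other 14 pages stay reserved but unusable until committed.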
 
A thread can terminate in one of two ways: by calling ExitThread or by
being terminated with TerminateThread. When a thread exits under normal
conditions, ExitThread is called, the thread's stack is destroyed, and its
memory is released back to the virtual address space of the process. If a
thread is terminated using TerminateThread, the thread-detach code is not
called and Windows NT does not destroy the thread's stack. As a result,
the regions and stack belonging to the terminated thread are not released
back to the process's virtual address space; this memory is only released
when the process that owned the thread terminates.
