docs/vm: add documentation of memory models

Message ID 1556101715-31966-1-git-send-email-rppt@linux.ibm.com
State Superseded

Commit Message

Mike Rapoport April 24, 2019, 10:28 a.m. UTC
Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
maintain pfn <-> struct page correspondence.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
---
 Documentation/vm/index.rst        |   1 +
 Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 172 insertions(+)
 create mode 100644 Documentation/vm/memory-model.rst

Comments

Anshuman Khandual April 24, 2019, 10:50 a.m. UTC | #1
On 04/24/2019 03:58 PM, Mike Rapoport wrote:
> +To use vmemmap, an architecture has to reserve a range of virtual
> +addresses that will map the physical pages containing the memory
> +map. and make sure that `vmemmap` points to that range. In addition,
> +the architecture should implement :c:func:`vmemmap_populate` method
> +that will allocate the physical memory and create page tables for the
> +virtual memory map. If an architecture does not have any special
> +requirements for the vmemmap mappings, it can use default
> +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> +management.

Just to complete it, could you also include struct vmem_altmap and how it
can contribute towards the physical backing for vmemmap virtual mapping.
Otherwise the write up looks complete.
Mike Rapoport April 24, 2019, 11:35 a.m. UTC | #2
On Wed, Apr 24, 2019 at 04:20:02PM +0530, Anshuman Khandual wrote:
> 
> 
> On 04/24/2019 03:58 PM, Mike Rapoport wrote:
> > +To use vmemmap, an architecture has to reserve a range of virtual
> > +addresses that will map the physical pages containing the memory
> > +map. and make sure that `vmemmap` points to that range. In addition,
> > +the architecture should implement :c:func:`vmemmap_populate` method
> > +that will allocate the physical memory and create page tables for the
> > +virtual memory map. If an architecture does not have any special
> > +requirements for the vmemmap mappings, it can use default
> > +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> > +management.
> 
> Just to complete it, could you also include struct vmem_altmap and how it
> can contribute towards the physical backing for vmemmap virtual mapping.
> Otherwise the write up looks complete.

Sure, but I'd prefer having it as a separate patch.
Jonathan Corbet April 24, 2019, 4:14 p.m. UTC | #3
On Wed, 24 Apr 2019 13:28:35 +0300
Mike Rapoport <rppt@linux.ibm.com> wrote:

> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> maintain pfn <-> struct page correspondence.

Quick question: should this document perhaps mention that DISCONTIGMEM
appears to be on its way out?

Thanks,

jon
Mike Rapoport April 24, 2019, 6:02 p.m. UTC | #4
On Wed, Apr 24, 2019 at 10:14:55AM -0600, Jonathan Corbet wrote:
> On Wed, 24 Apr 2019 13:28:35 +0300
> Mike Rapoport <rppt@linux.ibm.com> wrote:
> 
> > Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> > maintain pfn <-> struct page correspondence.
> 
> Quick question: should this document perhaps mention that DISCONTIGMEM
> appears to be on its way out?

I suspect it'll take a while until then, but I'll add a sentence about it
being deprecated.
Which reminds me that mm/Kconfig also begs for the corresponding update.
 
> Thanks,
> 
> jon
>
Randy Dunlap April 25, 2019, 1:08 a.m. UTC | #5
On 4/24/19 3:28 AM, Mike Rapoport wrote:
> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> maintain pfn <-> struct page correspondence.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> ---
>  Documentation/vm/index.rst        |   1 +
>  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
>  2 files changed, 172 insertions(+)
>  create mode 100644 Documentation/vm/memory-model.rst
> 

Hi Mike,
I have a few minor edits below...

> diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
> new file mode 100644
> index 0000000..914c52a
> --- /dev/null
> +++ b/Documentation/vm/memory-model.rst
> @@ -0,0 +1,171 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +.. _physical_memory_model:
> +
> +=====================
> +Physical Memory Model
> +=====================
> +
> +Physical memory in a system may be addressed in different ways. The
> +simplest case is when the physical memory starts at address 0 and
> +spans a contiguous range up to the maximal address. It could be,
> +however, that this range contains small holes that are not accessible
> +for the CPU. Then there could be several contiguous ranges at
> +completely distinct addresses. And, don't forget about NUMA, where
> +different memory banks are attached to different CPUs.
> +
> +Linux abstracts this diversity using one of the three memory models:
> +FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
> +memory models it supports, what is the default memory model and
> +whether it possible to manually override that default.
> +
> +All the memory models track the status of physical page frames using
> +:c:type:`struct page` arranged in one or more arrays.
> +
> +Regardless of the selected memory model, there exists one-to-one
> +mapping between the physical page frame number (PFN) and the
> +corresponding `struct page`.
> +
> +Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
> +helpers that allow the conversion from PFN to `struct page` and vise

                                                                   vice

> +versa.
> +
> +FLATMEM
> +=======
> +
> +The simplest memory model is FLATMEM. This model is suitable for
> +non-NUMA systems with contiguous, or mostly contiguous, physical
> +memory.
> +
> +In the FLATMEM memory model, there is a global `mem_map` array that
> +maps the entire physical memory. For most architectures, the holes
> +have entries in the `mem_map` array. The `struct page` objects
> +corresponding to the holes are never fully initialized.
> +
> +To allocate the `mem_map` array, architecture specific setup code
> +should call :c:func:`free_area_init_node` function or its convenience
> +wrapper :c:func:`free_area_init`. Yet, the mappings array is not
> +usable until the call to :c:func:`memblock_free_all` that hands all
> +the memory to the page allocator.
> +
> +If an architecture enables `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
> +it may free parts of the `mem_map` array that do not cover the
> +actual physical pages. In such case, the architecture specific
> +:c:func:`pfn_valid` implementation should take the holes in the
> +`mem_map` into account.
> +
> +With FLATMEM, the conversion between a PFN and the `struct page` is
> +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
> +`mem_map` array.
> +
> +The `ARCH_PFN_OFFSET` defines the first page frame number for
> +systems that their physical memory does not start at 0.

s/that/when/ ?  Seems awkward as is.

> +
> +DISCONTIGMEM
> +============
> +
> +The DISCONTIGMEM model treats the physical memory as a collection of
> +`nodes` similarly to how Linux NUMA support does. For each node Linux
> +constructs an independent memory management subsystem represented by
> +`struct pglist_data` (or `pg_data_t` for short). Among other
> +things, `pg_data_t` holds the `node_mem_map` array that maps
> +physical pages belonging to that node. The `node_start_pfn` field of
> +`pg_data_t` is the number of the first page frame belonging to that
> +node.
> +
> +The architecture setup code should call :c:func:`free_area_init_node` for
> +each node in the system to initialize the `pg_data_t` object and its
> +`node_mem_map`.
> +
> +Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
> +every physical page frame in a node has a `struct page` entry in the
> +`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
> +`flags` field of the `struct page` encodes the node number of the
> +node hosting that page.
> +
> +The conversion between a PFN and the `struct page` in the
> +DISCONTIGMEM model became slightly more complex as it has to determine
> +which node hosts the physical page and which `pg_data_t` object
> +holds the `struct page`.
> +
> +Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
> +to convert PFN to the node number. The opposite conversion helper
> +:c:func:`page_to_nid` is generic as it uses the node number encoded in
> +page->flags.
> +
> +Once the node number is known, the PFN can be used to index
> +appropriate `node_mem_map` array to access the `struct page` and
> +the offset of the `struct page` from the `node_mem_map` plus
> +`node_start_pfn` is the PFN of that page.
> +
> +SPARSEMEM
> +=========
> +
> +SPARSEMEM is the most versatile memory model available in Linux and it
> +is the only memory model that supports several advanced features such
> +as hot-plug and hot-remove of the physical memory, alternative memory
> +maps for non-volatile memory devices and deferred initialization of
> +the memory map for larger systems.
> +
> +The SPARSEMEM model presents the physical memory as a collection of
> +sections. A section is represented with :c:type:`struct mem_section`
> +that contains `section_mem_map` that is, logically, a pointer to an
> +array of struct pages. However, it is stored with some other magic
> +that aids the sections management. The section size and maximal number
> +of section is specified using `SECTION_SIZE_BITS` and
> +`MAX_PHYSMEM_BITS` constants defined by each architecture that
> +supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is an actual width of a
> +physical address that an architecture supports, the
> +`SECTION_SIZE_BITS` is an arbitrary value.
> +
> +The maximal number of sections is denoted `NR_MEM_SECTIONS` and
> +defined as
> +
> +.. math::
> +
> +   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
> +
> +The `mem_section` objects are arranged in a two dimensional array

                                               two-dimensional

> +called `mem_sections`. The size and placement of this array depend
> +on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
> +sections:
> +
> +* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
> +  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
> +  single `mem_section` object.
> +* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
> +  array is dynamically allocated. Each row contains PAGE_SIZE worth of
> +  `mem_section` objects and the number of rows is calculated to fit
> +  all the memory sections.
> +
> +The architecture setup code should call :c:func:`memory_present` for
> +each active memory range or use :c:func:`memblocks_present` or
> +:c:func:`sparse_memory_present_with_active_regions` wrappers to
> +initialize the memory sections. Next, the actual memory maps should be
> +set up using :c:func:`sparse_init`.
> +
> +With SPARSEMEM there are two possible ways to convert a PFN to the
> +corresponding `struct page` - a "classic sparse" and "sparse
> +vmemmap". The selection is made at build time and it is determined by
> +the value of `CONFIG_SPARSEMEM_VMEMMAP`.
> +
> +The classic sparse encodes the section number of a page in page->flags
> +and uses high bits of a PFN to access the section that maps that page
> +frame. Inside a section, the PFN is the index to the array of pages.
> +
> +The sparse vmemmap uses a virtually mapped memory map to optimize
> +pfn_to_page and page_to_pfn operations. There is a global `struct
> +page *vmemmap` pointer that points to a virtually contiguous array of
> +`struct page` objects. A PFN is an index to that array and the the
> +offset of the `struct page` from `vmemmap` is the PFN of that
> +page.
> +
> +To use vmemmap, an architecture has to reserve a range of virtual
> +addresses that will map the physical pages containing the memory
> +map. and make sure that `vmemmap` points to that range. In addition,

   map and

> +the architecture should implement :c:func:`vmemmap_populate` method
> +that will allocate the physical memory and create page tables for the
> +virtual memory map. If an architecture does not have any special
> +requirements for the vmemmap mappings, it can use default
> +:c:func:`vmemmap_populate_basepages` provided by the generic memory
> +management.
> 

thanks.
Mike Rapoport April 25, 2019, 8:22 a.m. UTC | #6
Hi Randy,

On Wed, Apr 24, 2019 at 06:08:46PM -0700, Randy Dunlap wrote:
> On 4/24/19 3:28 AM, Mike Rapoport wrote:
> > Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
> > maintain pfn <-> struct page correspondence.
> > 
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> >  Documentation/vm/index.rst        |   1 +
> >  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 172 insertions(+)
> >  create mode 100644 Documentation/vm/memory-model.rst
> > 
> 
> Hi Mike,
> I have a few minor edits below...

I kinda expected those ;-)

> > diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
> > new file mode 100644
> > index 0000000..914c52a
> > --- /dev/null
> > +++ b/Documentation/vm/memory-model.rst

...

> > +
> > +With FLATMEM, the conversion between a PFN and the `struct page` is
> > +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
> > +`mem_map` array.
> > +
> > +The `ARCH_PFN_OFFSET` defines the first page frame number for
> > +systems that their physical memory does not start at 0.
> 
> s/that/when/ ?  Seems awkward as is.

Yeah, it is awkward. How about

The `ARCH_PFN_OFFSET` defines the first page frame number for
systems with physical memory starting at address different from 0.

> > +
> > +DISCONTIGMEM
> > +============
> > +
> 
> thanks.
> -- 
> ~Randy
>
Randy Dunlap April 25, 2019, 3:01 p.m. UTC | #7
On 4/25/19 1:22 AM, Mike Rapoport wrote:
> Hi Randy,
> 
> On Wed, Apr 24, 2019 at 06:08:46PM -0700, Randy Dunlap wrote:
>> On 4/24/19 3:28 AM, Mike Rapoport wrote:
>>> Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
>>> maintain pfn <-> struct page correspondence.
>>>
>>> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
>>> ---
>>>  Documentation/vm/index.rst        |   1 +
>>>  Documentation/vm/memory-model.rst | 171 ++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 172 insertions(+)
>>>  create mode 100644 Documentation/vm/memory-model.rst
>>>
>>
>> Hi Mike,
>> I have a few minor edits below...
> 
> I kinda expected those ;-)
> 
>>> diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
>>> new file mode 100644
>>> index 0000000..914c52a
>>> --- /dev/null
>>> +++ b/Documentation/vm/memory-model.rst
> 
> ...
> 
>>> +
>>> +With FLATMEM, the conversion between a PFN and the `struct page` is
>>> +straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
>>> +`mem_map` array.
>>> +
>>> +The `ARCH_PFN_OFFSET` defines the first page frame number for
>>> +systems that their physical memory does not start at 0.
>>
>> s/that/when/ ?  Seems awkward as is.
> 
> Yeah, it is awkward. How about
> 
> The `ARCH_PFN_OFFSET` defines the first page frame number for
> systems with physical memory starting at address different from 0.

OK.  Thanks.

>>> +
>>> +DISCONTIGMEM
>>> +============
>>> +
>>
>> thanks.
>> -- 
>> ~Randy
>>
>

Patch

diff --git a/Documentation/vm/index.rst b/Documentation/vm/index.rst
index b58cc3b..e8d943b 100644
--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -37,6 +37,7 @@  descriptions of data structures and algorithms.
    hwpoison
    hugetlbfs_reserv
    ksm
+   memory-model
    mmu_notifier
    numa
    overcommit-accounting
diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
new file mode 100644
index 0000000..914c52a
--- /dev/null
+++ b/Documentation/vm/memory-model.rst
@@ -0,0 +1,171 @@ 
+.. SPDX-License-Identifier: GPL-2.0
+
+.. _physical_memory_model:
+
+=====================
+Physical Memory Model
+=====================
+
+Physical memory in a system may be addressed in different ways. The
+simplest case is when the physical memory starts at address 0 and
+spans a contiguous range up to the maximal address. It could be,
+however, that this range contains small holes that are not accessible
+to the CPU. Then there could be several contiguous ranges at
+completely distinct addresses. And, don't forget about NUMA, where
+different memory banks are attached to different CPUs.
+
+Linux abstracts this diversity using one of the three memory models:
+FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
+memory models it supports, what the default memory model is and
+whether it is possible to manually override that default.
+
+All the memory models track the status of physical page frames using
+:c:type:`struct page` arranged in one or more arrays.
+
+Regardless of the selected memory model, there exists a one-to-one
+mapping between the physical page frame number (PFN) and the
+corresponding `struct page`.
+
+Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
+helpers that allow the conversion from PFN to `struct page` and vice
+versa.
+
+FLATMEM
+=======
+
+The simplest memory model is FLATMEM. This model is suitable for
+non-NUMA systems with contiguous, or mostly contiguous, physical
+memory.
+
+In the FLATMEM memory model, there is a global `mem_map` array that
+maps the entire physical memory. For most architectures, the holes
+have entries in the `mem_map` array. The `struct page` objects
+corresponding to the holes are never fully initialized.
+
+To allocate the `mem_map` array, architecture-specific setup code
+should call the :c:func:`free_area_init_node` function or its convenience
+wrapper :c:func:`free_area_init`. Yet, the mappings array is not
+usable until the call to :c:func:`memblock_free_all` that hands all
+the memory to the page allocator.
+
+If an architecture enables the `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL`
+option, it may free parts of the `mem_map` array that do not cover the
+actual physical pages. In such a case, the architecture-specific
+:c:func:`pfn_valid` implementation should take the holes in the
+`mem_map` into account.
+
+With FLATMEM, the conversion between a PFN and the `struct page` is
+straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
+`mem_map` array.
+
+The `ARCH_PFN_OFFSET` defines the first page frame number for
+systems with physical memory starting at an address different from 0.
+
+DISCONTIGMEM
+============
+
+The DISCONTIGMEM model treats the physical memory as a collection of
+`nodes` similarly to how Linux NUMA support does. For each node Linux
+constructs an independent memory management subsystem represented by
+`struct pglist_data` (or `pg_data_t` for short). Among other
+things, `pg_data_t` holds the `node_mem_map` array that maps
+physical pages belonging to that node. The `node_start_pfn` field of
+`pg_data_t` is the number of the first page frame belonging to that
+node.
+
+The architecture setup code should call :c:func:`free_area_init_node` for
+each node in the system to initialize the `pg_data_t` object and its
+`node_mem_map`.
+
+Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
+every physical page frame in a node has a `struct page` entry in the
+`node_mem_map` array. When DISCONTIGMEM is enabled, a portion of the
+`flags` field of the `struct page` encodes the node number of the
+node hosting that page.
+
+The conversion between a PFN and the `struct page` in the
+DISCONTIGMEM model is slightly more complex as it has to determine
+which node hosts the physical page and which `pg_data_t` object
+holds the `struct page`.
+
+Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
+to convert PFN to the node number. The opposite conversion helper
+:c:func:`page_to_nid` is generic as it uses the node number encoded in
+page->flags.
+
+Once the node number is known, the PFN minus `node_start_pfn` indexes
+the appropriate `node_mem_map` array to access the `struct page`.
+Conversely, the offset of the `struct page` from the `node_mem_map`
+plus `node_start_pfn` is the PFN of that page.
+
+SPARSEMEM
+=========
+
+SPARSEMEM is the most versatile memory model available in Linux and it
+is the only memory model that supports several advanced features such
+as hot-plug and hot-remove of the physical memory, alternative memory
+maps for non-volatile memory devices and deferred initialization of
+the memory map for larger systems.
+
+The SPARSEMEM model presents the physical memory as a collection of
+sections. A section is represented with :c:type:`struct mem_section`
+that contains `section_mem_map` that is, logically, a pointer to an
+array of struct pages. However, it is stored with some other magic
+that aids section management. The section size and the maximal number
+of sections are specified using `SECTION_SIZE_BITS` and
+`MAX_PHYSMEM_BITS` constants defined by each architecture that
+supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is the actual width of a
+physical address that an architecture supports, the
+`SECTION_SIZE_BITS` is an arbitrary value.
+
+The maximal number of sections is denoted `NR_MEM_SECTIONS` and
+defined as
+
+.. math::
+
+   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}
+
+The `mem_section` objects are arranged in a two-dimensional array
+called `mem_sections`. The size and placement of this array depend
+on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
+sections:
+
+* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
+  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
+  single `mem_section` object.
+* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
+  array is dynamically allocated. Each row contains PAGE_SIZE worth of
+  `mem_section` objects and the number of rows is calculated to fit
+  all the memory sections.
+
+The architecture setup code should call :c:func:`memory_present` for
+each active memory range or use :c:func:`memblocks_present` or
+:c:func:`sparse_memory_present_with_active_regions` wrappers to
+initialize the memory sections. Next, the actual memory maps should be
+set up using :c:func:`sparse_init`.
+
+With SPARSEMEM there are two possible ways to convert a PFN to the
+corresponding `struct page` - a "classic sparse" and "sparse
+vmemmap". The selection is made at build time and it is determined by
+the value of `CONFIG_SPARSEMEM_VMEMMAP`.
+
+The classic sparse encodes the section number of a page in page->flags
+and uses high bits of a PFN to access the section that maps that page
+frame. Inside a section, the PFN is the index to the array of pages.
+
+The sparse vmemmap uses a virtually mapped memory map to optimize
+pfn_to_page and page_to_pfn operations. There is a global `struct
+page *vmemmap` pointer that points to a virtually contiguous array of
+`struct page` objects. A PFN is an index to that array and the
+offset of the `struct page` from `vmemmap` is the PFN of that
+page.
+
+To use vmemmap, an architecture has to reserve a range of virtual
+addresses that will map the physical pages containing the memory
+map and make sure that `vmemmap` points to that range. In addition,
+the architecture should implement the :c:func:`vmemmap_populate` method
+that will allocate the physical memory and create page tables for the
+virtual memory map. If an architecture does not have any special
+requirements for the vmemmap mappings, it can use the default
+:c:func:`vmemmap_populate_basepages` provided by the generic memory
+management.