All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	Mike Rapoport <rppt@linux.ibm.com>,
	linux-kernel@vger.kernel.org, Jonathan Corbet <corbet@lwn.net>
Subject: [PATCH v10 10/13] mm: Document ZONE_DEVICE memory-model implications
Date: Tue, 18 Jun 2019 22:52:29 -0700	[thread overview]
Message-ID: <156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)
In-Reply-To: <156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com>

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

Cc: Jonathan Corbet <corbet@lwn.net>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/vm/memory-model.rst |   39 +++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index 382f72ace1fc..e0af47e02e78 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain
 of function calls. The vmemmap_populate() implementation may use the
 `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
 allocate memory map on the persistent memory device.
+
+ZONE_DEVICE
+===========
+The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
+`struct page` `mem_map` services for device driver identified physical
+address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
+that the page objects for these address ranges are never marked online,
+and that a reference must be taken against the device, not just the page
+to keep the memory pinned for active use. `ZONE_DEVICE`, via
+:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
+turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
+:c:func:`get_user_pages` service for the given range of pfns. Since the
+page reference count never drops below 1 the page is never tracked as
+free memory and the page's `struct list_head lru` space is repurposed
+for back referencing to the host device / driver that mapped the memory.
+
+While `SPARSEMEM` presents memory as a collection of sections,
+optionally collected into memory blocks, `ZONE_DEVICE` users have a need
+for smaller granularity of populating the `mem_map`. Given that
+`ZONE_DEVICE` memory is never marked online it is subsequently never
+subject to its memory ranges being exposed through the sysfs memory
+hotplug api on memory block boundaries. The implementation relies on
+this lack of user-api constraint to allow sub-section sized memory
+ranges to be specified to :c:func:`arch_add_memory`, the top-half of
+memory hotplug. Sub-section support allows for `PMD_SIZE` as the minimum
+alignment granularity for :c:func:`devm_memremap_pages`.
+
+The users of `ZONE_DEVICE` are:
+* pmem: Map platform persistent memory to be used as a direct-I/O target
+  via DAX mappings.
+
+* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
+  event callbacks to allow a device-driver to coordinate memory management
+  events related to device-memory, typically GPU memory. See
+  Documentation/vm/hmm.rst.
+
+* p2pdma: Create `struct page` objects to allow peer devices in a
+  PCI/-E topology to coordinate direct-DMA operations between themselves,
+  i.e. bypass host memory.

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

WARNING: multiple messages have this Message-ID (diff)
From: Dan Williams <dan.j.williams@intel.com>
To: akpm@linux-foundation.org
Cc: Jonathan Corbet <corbet@lwn.net>,
	Mike Rapoport <rppt@linux.ibm.com>,
	linux-mm@kvack.org, linux-nvdimm@lists.01.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v10 10/13] mm: Document ZONE_DEVICE memory-model implications
Date: Tue, 18 Jun 2019 22:52:29 -0700	[thread overview]
Message-ID: <156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com> (raw)
In-Reply-To: <156092349300.979959.17603710711957735135.stgit@dwillia2-desk3.amr.corp.intel.com>

Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.

Cc: Jonathan Corbet <corbet@lwn.net>
Reported-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 Documentation/vm/memory-model.rst |   39 +++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/Documentation/vm/memory-model.rst b/Documentation/vm/memory-model.rst
index 382f72ace1fc..e0af47e02e78 100644
--- a/Documentation/vm/memory-model.rst
+++ b/Documentation/vm/memory-model.rst
@@ -181,3 +181,42 @@ that is eventually passed to vmemmap_populate() through a long chain
 of function calls. The vmemmap_populate() implementation may use the
 `vmem_altmap` along with :c:func:`altmap_alloc_block_buf` helper to
 allocate memory map on the persistent memory device.
+
+ZONE_DEVICE
+===========
+The `ZONE_DEVICE` facility builds upon `SPARSEMEM_VMEMMAP` to offer
+`struct page` `mem_map` services for device driver identified physical
+address ranges. The "device" aspect of `ZONE_DEVICE` relates to the fact
+that the page objects for these address ranges are never marked online,
+and that a reference must be taken against the device, not just the page
+to keep the memory pinned for active use. `ZONE_DEVICE`, via
+:c:func:`devm_memremap_pages`, performs just enough memory hotplug to
+turn on :c:func:`pfn_to_page`, :c:func:`page_to_pfn`, and
+:c:func:`get_user_pages` service for the given range of pfns. Since the
+page reference count never drops below 1 the page is never tracked as
+free memory and the page's `struct list_head lru` space is repurposed
+for back referencing to the host device / driver that mapped the memory.
+
+While `SPARSEMEM` presents memory as a collection of sections,
+optionally collected into memory blocks, `ZONE_DEVICE` users have a need
+for smaller granularity of populating the `mem_map`. Given that
+`ZONE_DEVICE` memory is never marked online it is subsequently never
+subject to its memory ranges being exposed through the sysfs memory
+hotplug api on memory block boundaries. The implementation relies on
+this lack of user-api constraint to allow sub-section sized memory
+ranges to be specified to :c:func:`arch_add_memory`, the top-half of
+memory hotplug. Sub-section support allows for `PMD_SIZE` as the minimum
+alignment granularity for :c:func:`devm_memremap_pages`.
+
+The users of `ZONE_DEVICE` are:
+* pmem: Map platform persistent memory to be used as a direct-I/O target
+  via DAX mappings.
+
+* hmm: Extend `ZONE_DEVICE` with `->page_fault()` and `->page_free()`
+  event callbacks to allow a device-driver to coordinate memory management
+  events related to device-memory, typically GPU memory. See
+  Documentation/vm/hmm.rst.
+
+* p2pdma: Create `struct page` objects to allow peer devices in a
+  PCI/-E topology to coordinate direct-DMA operations between themselves,
+  i.e. bypass host memory.


  parent reply	other threads:[~2019-06-19  6:06 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-19  5:51 [PATCH v10 00/13] mm: Sub-section memory hotplug support Dan Williams
2019-06-19  5:51 ` Dan Williams
2019-06-19  5:51 ` [PATCH v10 01/13] mm/sparsemem: Introduce struct mem_section_usage Dan Williams
2019-06-19  5:51   ` Dan Williams
2019-06-19  5:51 ` [PATCH v10 02/13] mm/sparsemem: Introduce a SECTION_IS_EARLY flag Dan Williams
2019-06-19  5:51   ` Dan Williams
2019-06-24 17:54   ` Oscar Salvador
2019-06-24 17:54     ` Oscar Salvador
2019-06-19  5:51 ` [PATCH v10 03/13] mm/sparsemem: Add helpers track active portions of a section at boot Dan Williams
2019-06-19  5:51   ` Dan Williams
2019-06-24 17:57   ` Oscar Salvador
2019-06-24 17:57     ` Oscar Salvador
2019-06-24 17:57     ` Oscar Salvador
2019-06-19  5:51 ` [PATCH v10 04/13] mm/hotplug: Prepare shrink_{zone, pgdat}_span for sub-section removal Dan Williams
2019-06-19  5:51   ` Dan Williams
2019-06-19  5:52 ` [PATCH v10 05/13] mm/sparsemem: Convert kmalloc_section_memmap() to populate_section_memmap() Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-24 18:00   ` Oscar Salvador
2019-06-24 18:00     ` Oscar Salvador
2019-06-24 18:00     ` Oscar Salvador
2019-06-19  5:52 ` [PATCH v10 06/13] mm/hotplug: Kill is_dev_zone() usage in __remove_pages() Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-19  5:52 ` [PATCH v10 07/13] mm: Kill is_dev_zone() helper Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-19  5:52 ` [PATCH v10 08/13] mm/sparsemem: Prepare for sub-section ranges Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-20 10:31   ` David Hildenbrand
2019-06-20 10:31     ` David Hildenbrand
2019-06-20 16:19     ` Dan Williams
2019-06-20 16:19       ` Dan Williams
2019-06-20 16:35       ` David Hildenbrand
2019-06-20 16:35         ` David Hildenbrand
2019-06-20 16:56         ` Dan Williams
2019-06-20 16:56           ` Dan Williams
2019-06-20 16:56           ` Dan Williams
2019-06-24 18:05   ` Oscar Salvador
2019-06-24 18:05     ` Oscar Salvador
2019-06-24 18:05     ` Oscar Salvador
2019-06-19  5:52 ` [PATCH v10 09/13] mm/sparsemem: Support sub-section hotplug Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-24 20:45   ` Oscar Salvador
2019-06-24 20:45     ` Oscar Salvador
2019-06-24 20:45     ` Oscar Salvador
2019-06-19  5:52 ` Dan Williams [this message]
2019-06-19  5:52   ` [PATCH v10 10/13] mm: Document ZONE_DEVICE memory-model implications Dan Williams
2019-06-20 12:30   ` Mike Rapoport
2019-06-20 12:30     ` Mike Rapoport
2019-06-19  5:52 ` [PATCH v10 11/13] mm/devm_memremap_pages: Enable sub-section remap Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-19  5:52 ` [PATCH v10 12/13] libnvdimm/pfn: Fix fsdax-mode namespace info-block zero-fields Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-19 16:30   ` Aneesh Kumar K.V
2019-06-19 17:06     ` Dan Williams
2019-06-19 17:06       ` Dan Williams
2019-06-19 17:06       ` Dan Williams
2019-06-19  5:52 ` [PATCH v10 13/13] libnvdimm/pfn: Stop padding pmem namespaces to section alignment Dan Williams
2019-06-19  5:52   ` Dan Williams
2019-06-20 12:30 ` [PATCH v10 00/13] mm: Sub-section memory hotplug support Aneesh Kumar K.V
2019-06-20 12:30   ` Aneesh Kumar K.V
2019-06-20 16:30   ` Dan Williams
2019-06-20 16:30     ` Dan Williams
2019-06-20 17:00 ` Oscar Salvador
2019-06-20 17:00   ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com \
    --to=dan.j.williams@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=corbet@lwn.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=rppt@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.