From: Christoph Hellwig <hch@lst.de>
To: Andrew Morton <akpm@linux-foundation.org>,
	Dan Williams <dan.j.williams@intel.com>
Cc: "Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>,
	"Christian König" <christian.koenig@amd.com>,
	"Pan, Xinhui" <Xinhui.Pan@amd.com>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Karol Herbst" <kherbst@redhat.com>,
	"Lyude Paul" <lyude@redhat.com>, "Jason Gunthorpe" <jgg@ziepe.ca>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Logan Gunthorpe" <logang@deltatee.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	linux-kernel@vger.kernel.org, amd-gfx@lists.freedesktop.org,
	dri-devel@lists.freedesktop.org, nouveau@lists.freedesktop.org,
	nvdimm@lists.linux.dev, linux-mm@kvack.org,
	"Alex Sierra" <alex.sierra@amd.com>
Subject: [PATCH 15/27] mm: add zone device coherent type memory support
Date: Thu, 10 Feb 2022 08:28:16 +0100
Message-ID: <20220210072828.2930359-16-hch@lst.de>
In-Reply-To: <20220210072828.2930359-1-hch@lst.de>

From: Alex Sierra <alex.sierra@amd.com>

Add MEMORY_DEVICE_COHERENT, a ZONE_DEVICE memory type for device memory
that is cache coherent from both the device and the CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI
or CXL). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory, so that it can always be
evicted.

Signed-off-by: Alex Sierra <alex.sierra@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
[hch: rebased on top of the refcount changes,
      removed is_dev_private_or_coherent_page]
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
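Not part of the commit: a minimal userspace sketch of how the new page
type is classified and which pages the try_to_migrate() filter in the
mm/rmap.c hunk lets through. The model_* names are hypothetical
stand-ins for struct page and struct dev_pagemap, and a NULL pgmap is
assumed here to mean "ordinary RAM". This is an illustration of the
predicates in this patch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same ordering as enum memory_type after this patch. */
enum memory_type {
	/* 0 is reserved to catch uninitialized type fields */
	MEMORY_DEVICE_PRIVATE = 1,
	MEMORY_DEVICE_COHERENT,
	MEMORY_DEVICE_FS_DAX,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_PCI_P2PDMA,
};

/* Hypothetical stand-ins; the real structures carry far more state. */
struct model_pgmap {
	enum memory_type type;
};

struct model_page {
	const struct model_pgmap *pgmap;	/* NULL: not a ZONE_DEVICE page */
};

static inline bool model_is_zone_device(const struct model_page *page)
{
	return page->pgmap != NULL;
}

static inline bool model_is_device_private(const struct model_page *page)
{
	return model_is_zone_device(page) &&
		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
}

static inline bool model_is_device_coherent(const struct model_page *page)
{
	return model_is_zone_device(page) &&
		page->pgmap->type == MEMORY_DEVICE_COHERENT;
}

/* Models the early return in try_to_migrate() after this patch. */
static inline bool model_may_try_to_migrate(const struct model_page *page)
{
	if (model_is_zone_device(page) &&
	    (!model_is_device_private(page) &&
	     !model_is_device_coherent(page)))
		return false;
	return true;
}

/*
 * Models the branch in try_to_migrate_one(): only device private pages
 * take the device private swap entry path; coherent pages do not.
 */
static inline bool model_takes_private_entry_path(const struct model_page *page)
{
	return model_is_device_private(page);
}

/* Small self-check exercising the model with one page of each kind. */
static inline void model_self_check(void)
{
	const struct model_pgmap coherent = { MEMORY_DEVICE_COHERENT };
	const struct model_pgmap fsdax = { MEMORY_DEVICE_FS_DAX };
	const struct model_page ram = { NULL };
	const struct model_page coh = { &coherent };
	const struct model_page dax = { &fsdax };

	assert(model_may_try_to_migrate(&ram));
	assert(model_may_try_to_migrate(&coh));
	assert(!model_may_try_to_migrate(&dax));
	assert(!model_takes_private_entry_path(&coh));
}
```

In this model a coherent page passes the try_to_migrate() filter like a
private page, but does not take the device private swap entry path.
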
 include/linux/memremap.h | 14 ++++++++++++++
 mm/memcontrol.c          |  7 ++++---
 mm/memory-failure.c      |  8 ++++++--
 mm/memremap.c            | 10 ++++++++++
 mm/migrate_device.c      | 16 +++++++---------
 mm/rmap.c                |  5 +++--
 6 files changed, 44 insertions(+), 16 deletions(-)
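
Also not part of the commit: a hedged userspace sketch of the checks
that the mm/memremap.c hunk adds for MEMORY_DEVICE_COHERENT in
memremap_pages(). The model_* types and the function name are
hypothetical; the real code calls WARN() and returns
ERR_PTR(-EINVAL). The NULL check on ops is an extra guard added in
this sketch only, the hunk itself dereferences pgmap->ops directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_page;

/* Hypothetical stand-in for struct dev_pagemap_ops. */
struct model_pgmap_ops {
	void (*page_free)(struct model_page *page);
};

/* Hypothetical stand-in for the fields of struct dev_pagemap used here. */
struct model_pgmap {
	const struct model_pgmap_ops *ops;
	void *owner;
};

/*
 * Returns 0 if the pagemap would pass the MEMORY_DEVICE_COHERENT checks,
 * or -EINVAL if page_free or owner is missing.
 */
static inline int model_check_coherent_pgmap(const struct model_pgmap *pgmap)
{
	/* Assumption of this sketch: treat missing ops like missing page_free. */
	if (!pgmap->ops || !pgmap->ops->page_free)
		return -EINVAL;		/* "Missing page_free method" */
	if (!pgmap->owner)
		return -EINVAL;		/* "Missing owner" */
	return 0;
}

/* Example callback a driver model might register. */
static inline void model_page_free(struct model_page *page)
{
	(void)page;	/* nothing to release in this model */
}
```

A driver model that sets both ops->page_free and owner passes the
check; leaving either one unset yields -EINVAL.
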

diff --git a/include/linux/memremap.h b/include/linux/memremap.h
index d6a114dd5ea8b7..eb73630a49da39 100644
--- a/include/linux/memremap.h
+++ b/include/linux/memremap.h
@@ -41,6 +41,13 @@ struct vmem_altmap {
  * A more complete discussion of unaddressable memory may be found in
  * include/linux/hmm.h and Documentation/vm/hmm.rst.
  *
+ * MEMORY_DEVICE_COHERENT:
+ * Device memory that is cache coherent from device and CPU point of view. This
+ * is used on platforms that have an advanced system bus (like CAPI or CXL). A
+ * driver can hotplug the device memory using ZONE_DEVICE with this memory
+ * type. Any page of a process can be migrated to such memory. However, no one
+ * should be allowed to pin such memory, so that it can always be evicted.
+ *
  * MEMORY_DEVICE_FS_DAX:
  * Host memory that has similar access semantics as System RAM i.e. DMA
  * coherent and supports page pinning. In support of coordinating page
@@ -61,6 +68,7 @@ struct vmem_altmap {
 enum memory_type {
 	/* 0 is reserved to catch uninitialized type fields */
 	MEMORY_DEVICE_PRIVATE = 1,
+	MEMORY_DEVICE_COHERENT,
 	MEMORY_DEVICE_FS_DAX,
 	MEMORY_DEVICE_GENERIC,
 	MEMORY_DEVICE_PCI_P2PDMA,
@@ -138,6 +146,12 @@ static inline bool is_device_private_page(const struct page *page)
 		page->pgmap->type == MEMORY_DEVICE_PRIVATE;
 }
 
+static inline bool is_device_coherent_page(const struct page *page)
+{
+	return is_zone_device_page(page) &&
+		page->pgmap->type == MEMORY_DEVICE_COHERENT;
+}
+
 static inline bool is_pci_p2pdma_page(const struct page *page)
 {
 	return IS_ENABLED(CONFIG_PCI_P2PDMA) &&
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 510cbfb82bb62a..10259c35fde20d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5687,8 +5687,8 @@ static int mem_cgroup_move_account(struct page *page,
  *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
  *     target for charge migration. if @target is not NULL, the entry is stored
  *     in target->ent.
- *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PRIVATE
- *     (so ZONE_DEVICE page and thus not on the lru).
+ *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but page is device memory and
+ *     thus not on the lru.
  *     For now we such page is charge like a regular page would be as for all
  *     intent and purposes it is just special memory taking the place of a
  *     regular page.
@@ -5722,7 +5722,8 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 		 */
 		if (page_memcg(page) == mc.from) {
 			ret = MC_TARGET_PAGE;
-			if (is_device_private_page(page))
+			if (is_device_private_page(page) ||
+			    is_device_coherent_page(page))
 				ret = MC_TARGET_DEVICE;
 			if (target)
 				target->page = page;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 97a9ed8f87a96a..f498ed3ece79ae 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1617,12 +1617,16 @@ static int memory_failure_dev_pagemap(unsigned long pfn, int flags,
 		goto unlock;
 	}
 
-	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+	switch (pgmap->type) {
+	case MEMORY_DEVICE_PRIVATE:
+	case MEMORY_DEVICE_COHERENT:
 		/*
-		 * TODO: Handle HMM pages which may need coordination
+		 * TODO: Handle device pages which may need coordination
 		 * with device-side memory.
 		 */
 		goto unlock;
+	default:
+		break;
 	}
 
 	/*
diff --git a/mm/memremap.c b/mm/memremap.c
index e00ffcdba7b632..d00bb21a0630cd 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -313,6 +313,16 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
 			return ERR_PTR(-EINVAL);
 		}
 		break;
+	case MEMORY_DEVICE_COHERENT:
+		if (!pgmap->ops->page_free) {
+			WARN(1, "Missing page_free method\n");
+			return ERR_PTR(-EINVAL);
+		}
+		if (!pgmap->owner) {
+			WARN(1, "Missing owner\n");
+			return ERR_PTR(-EINVAL);
+		}
+		break;
 	case MEMORY_DEVICE_FS_DAX:
 		if (IS_ENABLED(CONFIG_FS_DAX_LIMITED)) {
 			WARN(1, "File system DAX not supported\n");
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 749e0bab8e4779..bfd66e7d830b02 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -494,7 +494,7 @@ EXPORT_SYMBOL(migrate_vma_setup);
  *     handle_pte_fault()
  *       do_anonymous_page()
  * to map in an anonymous zero page but the struct page will be a ZONE_DEVICE
- * private page.
+ * private or coherent page.
  */
 static void migrate_vma_insert_page(struct migrate_vma *migrate,
 				    unsigned long addr,
@@ -570,11 +570,8 @@ static void migrate_vma_insert_page(struct migrate_vma *migrate,
 						page_to_pfn(page));
 		entry = swp_entry_to_pte(swp_entry);
 	} else {
-		/*
-		 * For now we only support migrating to un-addressable device
-		 * memory.
-		 */
-		if (is_zone_device_page(page)) {
+		if (is_zone_device_page(page) &&
+		    !is_device_coherent_page(page)) {
 			pr_warn_once("Unsupported ZONE_DEVICE page type.\n");
 			goto abort;
 		}
@@ -677,10 +674,11 @@ void migrate_vma_pages(struct migrate_vma *migrate)
 
 		mapping = page_mapping(page);
 
-		if (is_device_private_page(newpage)) {
+		if (is_device_private_page(newpage) ||
+		    is_device_coherent_page(newpage)) {
 			/*
-			 * For now only support private anonymous when migrating
-			 * to un-addressable device memory.
+			 * For now only support anonymous memory migrating to
+			 * device private or coherent memory.
 			 */
 			if (mapping) {
 				migrate->src[i] &= ~MIGRATE_PFN_MIGRATE;
diff --git a/mm/rmap.c b/mm/rmap.c
index 6a1e8c7f621361..c34de7bd22393e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1835,7 +1835,7 @@ static bool try_to_migrate_one(struct page *page, struct vm_area_struct *vma,
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);
 
-		if (is_zone_device_page(page)) {
+		if (is_device_private_page(page)) {
 			unsigned long pfn = page_to_pfn(page);
 			swp_entry_t entry;
 			pte_t swp_pte;
@@ -1976,7 +1976,8 @@ void try_to_migrate(struct page *page, enum ttu_flags flags)
 					TTU_SYNC)))
 		return;
 
-	if (is_zone_device_page(page) && !is_device_private_page(page))
+	if (is_zone_device_page(page) &&
+	    (!is_device_private_page(page) && !is_device_coherent_page(page)))
 		return;
 
 	/*
-- 
2.30.2


