* [PATCH 0/1] THP_SWAP support for ARM64 SoC with MTE
@ 2024-03-22 11:41 ` Barry Song
  0 siblings, 0 replies; 20+ messages in thread
From: Barry Song @ 2024-03-22 11:41 UTC (permalink / raw)
  To: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel
  Cc: chrisl, mark.rutland, ryan.roberts, steven.price, david, willy,
	Barry Song

From: Barry Song <v-songbaohua@oppo.com>

This patch has been extracted from the large folios swap-in series [1],
with some new modifications.

THP_SWAP support matters for ARM64 SoCs with MTE because MTE-capable
hardware is common in widely used ARM64 products. Without this support,
large folios still have to be split before swap-out on these SoCs, so
Ryan's "mTHP swap-out without splitting" series cannot work effectively
there.

It would therefore be good to get this change in sooner rather than
later.

There are two differences from the code in [1], both per Ryan:
1. minor code cleanup
2. always pass the first swap entry of a folio to arch_swap_restore()

[1] https://lore.kernel.org/linux-mm/20240304081348.197341-2-21cnbao@gmail.com/

Barry Song (1):
  arm64: mm: swap: support THP_SWAP on hardware with MTE

 arch/arm64/include/asm/pgtable.h | 19 ++------------
 arch/arm64/mm/mteswap.c          | 45 ++++++++++++++++++++++++++++++++
 include/linux/huge_mm.h          | 12 ---------
 include/linux/pgtable.h          |  2 +-
 mm/internal.h                    | 14 ++++++++++
 mm/memory.c                      |  2 +-
 mm/page_io.c                     |  2 +-
 mm/shmem.c                       |  2 +-
 mm/swap_slots.c                  |  2 +-
 mm/swapfile.c                    |  2 +-
 10 files changed, 67 insertions(+), 35 deletions(-)

Appendix

Below is a small test program that exercises MTE on a THP, for those
who are interested in this subject.

 /*
  * To be compiled with -march=armv8.5-a+memtag
  */
 #include <errno.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>	/* memset() */
 #include <unistd.h>
 #include <sys/auxv.h>
 #include <sys/mman.h>
 #include <sys/prctl.h>
 
 /*
  * From arch/arm64/include/uapi/asm/hwcap.h
  */
 #define HWCAP2_MTE              (1 << 18)
 
 /*
  * From arch/arm64/include/uapi/asm/mman.h
  */
 #define PROT_MTE                 0x20
 
 /*
  * From include/uapi/linux/prctl.h
  */
 #define PR_SET_TAGGED_ADDR_CTRL 55
 #define PR_GET_TAGGED_ADDR_CTRL 56
 # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
 # define PR_MTE_TCF_SHIFT       1
 # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
 # define PR_MTE_TAG_SHIFT       3
 # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
 
 /*
  * Insert a random logical tag into the given pointer.
  */
 #define insert_random_tag(ptr) ({                       \
 		uint64_t __val;                                 \
 		asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
 		__val;                                          \
 		})
 
 /*
  * Set the allocation tag on the destination address.
  */
 #define set_tag(tagged_addr) do {                                      \
 	asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
 } while (0)
 
 int main(void)
 {
 	unsigned char *a, *p[512];
 	unsigned long page_sz = 4 * 1024UL;
 	unsigned long mem_sz = 2 * 1024 * 1024UL;
 	unsigned long hwcap2 = getauxval(AT_HWCAP2);
 	int i;
 
 	if (!(hwcap2 & HWCAP2_MTE))
 		return EXIT_FAILURE;
 
 	if (prctl(PR_SET_TAGGED_ADDR_CTRL,
 				PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
 				(0xfffe << PR_MTE_TAG_SHIFT),
 				0, 0, 0)) {
 		perror("prctl() failed");
 		return EXIT_FAILURE;
 	}
 
 	a = mmap(0, mem_sz * 2, PROT_READ | PROT_WRITE,
 			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 	if (a == MAP_FAILED) {
 		perror("mmap() failed");
 		return EXIT_FAILURE;
 	}
 
 	/* make sure a is aligned with 2MiB THP */
 	a = (unsigned char *)(((unsigned long)a + mem_sz - 1) & ~(mem_sz - 1));
 	madvise(a, mem_sz, MADV_HUGEPAGE);
 	memset(a, 0x11, mem_sz);
 
 	if (mprotect(a, mem_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
 		perror("mprotect() failed");
 		return EXIT_FAILURE;
 	}
 
 	printf("set tag for each 4KiB page\n");
 	for (i = 0; i < 512; i++) {
 		p[i] = a + i * page_sz;
 		p[i] = (unsigned char *)insert_random_tag(p[i]);
 		set_tag(p[i]);
 		p[i][0] = 0x33;
 	}
 
 	printf("swap-out the whole THP\n");
 	madvise(a, mem_sz, MADV_PAGEOUT);
 
 	printf("swap-in each page of the original THP\n");
 	for (i = 0; i < 512; i++) {
 		if (p[i][0] != 0x33) {
 			printf("test fails, unmatched value after swap-in\n");
 			return EXIT_FAILURE;
 		}
 	}
 	printf("we should get here\n");
 
 	for (i = 0; i < 512; i++) {
 		printf("page :%d val: expect segment fault, is %02x\n", i, p[i][16]);
 	}
 
 	printf("we shouldn't get here\n");
 
 	return EXIT_FAILURE;
 }
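
A note on why the last loop of the program above is expected to fault:
set_tag() issues a single stg, which tags only one 16-byte granule at
offset 0 of each 4KiB page, while the pointer in p[i] carries the
(non-zero) logical tag produced by irg. The access at p[i][16] lands in
the next granule, whose allocation tag was never set to match. The
following is a minimal, portable sketch of that arithmetic only. It runs
without MTE hardware and assumes the usual MTE layout (4-bit logical tag
in address bits 59:56, 16-byte granule). The helper names tag_of(),
with_tag(), granule_of() and access_ok() are made up for illustration and
are not kernel or libc APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed MTE layout: 4-bit logical tag in address bits 59:56. */
#define MTE_TAG_SHIFT    56
#define MTE_TAG_MASK     (0xfULL << MTE_TAG_SHIFT)
/* Assumed MTE granule: one allocation tag covers 16 bytes. */
#define MTE_GRANULE_SIZE 16

/* Hypothetical helper: extract the logical tag from a pointer value. */
static inline unsigned int tag_of(uint64_t addr)
{
	return (unsigned int)((addr & MTE_TAG_MASK) >> MTE_TAG_SHIFT);
}

/* Hypothetical helper: return addr with its logical tag replaced. */
static inline uint64_t with_tag(uint64_t addr, unsigned int tag)
{
	assert(tag <= 0xf);
	return (addr & ~MTE_TAG_MASK) | ((uint64_t)tag << MTE_TAG_SHIFT);
}

/* Hypothetical helper: index of the granule holding byte `offset`. */
static inline unsigned long granule_of(unsigned long offset)
{
	return offset / MTE_GRANULE_SIZE;
}

/*
 * Model of a tag check for one 4KiB page (256 granules): the access is
 * allowed only if the pointer's logical tag equals the allocation tag
 * of the granule being touched.
 */
static inline int access_ok(uint64_t tagged_page, unsigned long offset,
			    const unsigned char alloc_tags[256])
{
	return tag_of(tagged_page) == alloc_tags[granule_of(offset)];
}
```

In this model, a page whose granule 0 carries tag 5 accepts an access
at offset 0 through a pointer tagged 5 and rejects the access at offset
16, which mirrors the "we should get here" / "we shouldn't get here"
split in the test program.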

-- 
2.34.1




* [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-22 11:41 ` Barry Song
@ 2024-03-22 11:41   ` Barry Song
  -1 siblings, 0 replies; 20+ messages in thread
From: Barry Song @ 2024-03-22 11:41 UTC (permalink / raw)
  To: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel
  Cc: chrisl, mark.rutland, ryan.roberts, steven.price, david, willy,
	Barry Song, Kemeng Shi, Anshuman Khandual, Peter Collingbourne,
	Yosry Ahmed, Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

From: Barry Song <v-songbaohua@oppo.com>

Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
MTE, because the MTE code assumes that tag save/restore always handles
a folio with only one page.

This limitation should be removed: more and more ARM64 SoCs have MTE,
so the co-existence of MTE and THP_SWAP is becoming more and more
important.

This patch makes MTE tag saving support large folios, so that we no
longer need to split large folios into base pages before swapping them
out on ARM64 SoCs with MTE.

arch_prepare_to_swap() should take a folio rather than a page as its
parameter, because we support swapping out a THP as a whole. It saves
the tags for all pages in a large folio.

As we are now restoring tags per folio in arch_swap_restore(), we may
add some extra loops and early exits while refaulting a large folio
which is still in the swapcache in do_swap_page(). If a large folio
has nr pages, do_swap_page() only sets the PTE of the particular page
that caused the page fault. Thus do_swap_page() runs nr times, and
each time arch_swap_restore() loops nr times over the subpages of the
folio. So right now the algorithmic complexity becomes O(nr^2).

Once we support mapping large folios in do_swap_page(), the extra
loops and early exits will decrease, but they will not disappear
completely, as a large folio might be partially tagged in corner cases
such as:
1. a large folio in the swapcache can be partially unmapped, so the
   MTE tags for the unmapped pages will be invalidated;
2. users might use mprotect() to set MTE on only a part of a large
   folio.

arch_thp_swp_supported() is dropped since ARM64 MTE was the only user
that needed it.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Yosry Ahmed <yosryahmed@google.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Chris Li <chrisl@kernel.org>
---
 arch/arm64/include/asm/pgtable.h | 19 ++------------
 arch/arm64/mm/mteswap.c          | 45 ++++++++++++++++++++++++++++++++
 include/linux/huge_mm.h          | 12 ---------
 include/linux/pgtable.h          |  2 +-
 mm/internal.h                    | 14 ++++++++++
 mm/memory.c                      |  2 +-
 mm/page_io.c                     |  2 +-
 mm/shmem.c                       |  2 +-
 mm/swap_slots.c                  |  2 +-
 mm/swapfile.c                    |  2 +-
 10 files changed, 67 insertions(+), 35 deletions(-)
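
The O(nr^2) figure in the commit message can be illustrated with a toy
count. This is a user-space sketch, not kernel code: it only counts how
often the per-page restore step would be reached under the two fault
models described above, and the function names are hypothetical. As the
commit message notes, many of these iterations are expected to be early
exits rather than real tag copies.

```c
#include <assert.h>

/*
 * Toy model: a folio of nr pages is refaulted one PTE at a time.
 * Each fault stands for one do_swap_page() call, and each call walks
 * every subpage of the folio as arch_swap_restore() does in this patch.
 */
static unsigned long restore_calls_per_pte_fault(unsigned long nr)
{
	unsigned long calls = 0;

	for (unsigned long fault = 0; fault < nr; fault++)	/* nr faults */
		for (unsigned long i = 0; i < nr; i++)		/* nr subpages */
			calls++;
	return calls;
}

/*
 * Toy model: a single fault maps the whole folio, so the subpages are
 * walked only once.
 */
static unsigned long restore_calls_whole_folio_fault(unsigned long nr)
{
	unsigned long calls = 0;

	for (unsigned long i = 0; i < nr; i++)
		calls++;
	return calls;
}
```

For a 2MiB THP made of 4KiB pages (nr = 512), the first model reaches
the restore step 512 * 512 times and the second only 512 times.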

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index afdd56d26ad7..259325e6b7e8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -49,12 +49,6 @@
 	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
-static inline bool arch_thp_swp_supported(void)
-{
-	return !system_supports_mte();
-}
-#define arch_thp_swp_supported arch_thp_swp_supported
-
 /*
  * Outside of a few very special situations (e.g. hibernation), we always
  * use broadcast TLB invalidation instructions, therefore a spurious page
@@ -1280,12 +1274,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
 #ifdef CONFIG_ARM64_MTE
 
 #define __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
-{
-	if (system_supports_mte())
-		return mte_save_tags(page);
-	return 0;
-}
+extern int arch_prepare_to_swap(struct folio *folio);
 
 #define __HAVE_ARCH_SWAP_INVALIDATE
 static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
@@ -1301,11 +1290,7 @@ static inline void arch_swap_invalidate_area(int type)
 }
 
 #define __HAVE_ARCH_SWAP_RESTORE
-static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
-{
-	if (system_supports_mte())
-		mte_restore_tags(entry, &folio->page);
-}
+extern void arch_swap_restore(swp_entry_t entry, struct folio *folio);
 
 #endif /* CONFIG_ARM64_MTE */
 
diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
index a31833e3ddc5..63e8d72f202a 100644
--- a/arch/arm64/mm/mteswap.c
+++ b/arch/arm64/mm/mteswap.c
@@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset)
 	mte_free_tag_storage(tags);
 }
 
+static inline void __mte_invalidate_tags(struct page *page)
+{
+	swp_entry_t entry = page_swap_entry(page);
+
+	mte_invalidate_tags(swp_type(entry), swp_offset(entry));
+}
+
 void mte_invalidate_tags_area(int type)
 {
 	swp_entry_t entry = swp_entry(type, 0);
@@ -83,3 +90,41 @@ void mte_invalidate_tags_area(int type)
 	}
 	xa_unlock(&mte_pages);
 }
+
+int arch_prepare_to_swap(struct folio *folio)
+{
+	long i, nr;
+	int err;
+
+	if (!system_supports_mte())
+		return 0;
+
+	nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++) {
+		err = mte_save_tags(folio_page(folio, i));
+		if (err)
+			goto out;
+	}
+	return 0;
+
+out:
+	while (i--)
+		__mte_invalidate_tags(folio_page(folio, i));
+	return err;
+}
+
+void arch_swap_restore(swp_entry_t entry, struct folio *folio)
+{
+	long i, nr;
+
+	if (!system_supports_mte())
+		return;
+
+	nr = folio_nr_pages(folio);
+
+	for (i = 0; i < nr; i++) {
+		mte_restore_tags(entry, folio_page(folio, i));
+		entry.val++;
+	}
+}
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de0c89105076..e04b93c43965 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
 #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
 #define split_folio(f) split_folio_to_order(f, 0)
 
-/*
- * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
- * limitations in the implementation like arm64 MTE can override this to
- * false
- */
-#ifndef arch_thp_swp_supported
-static inline bool arch_thp_swp_supported(void)
-{
-	return true;
-}
-#endif
-
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 85fc7554cd52..b10a7dd615bd 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1052,7 +1052,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
  * prototypes must be defined in the arch-specific asm/pgtable.h file.
  */
 #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
-static inline int arch_prepare_to_swap(struct page *page)
+static inline int arch_prepare_to_swap(struct folio *folio)
 {
 	return 0;
 }
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..2551e93dd885 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -76,6 +76,20 @@ static inline int folio_nr_pages_mapped(struct folio *folio)
 	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
 }
 
+/*
+ * Retrieve the first entry of a folio based on a provided entry within the
+ * folio. We cannot rely on folio->swap as there is no guarantee that it has
+ * been initialized. Used for calling arch_swap_restore()
+ */
+static inline swp_entry_t folio_swap(swp_entry_t entry, struct folio *folio)
+{
+	swp_entry_t swap = {
+		.val = ALIGN_DOWN(entry.val, folio_nr_pages(folio)),
+	};
+
+	return swap;
+}
+
 static inline void *folio_raw_mapping(struct folio *folio)
 {
 	unsigned long mapping = (unsigned long)folio->mapping;
diff --git a/mm/memory.c b/mm/memory.c
index f2bc6dd15eb8..b7cab8be8632 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4188,7 +4188,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	 * when reading from swap. This metadata may be indexed by swap entry
 	 * so this must be called before swap_free().
 	 */
-	arch_swap_restore(entry, folio);
+	arch_swap_restore(folio_swap(entry, folio), folio);
 
 	/*
 	 * Remove the swap entry and conditionally try to free up the swapcache.
diff --git a/mm/page_io.c b/mm/page_io.c
index ae2b49055e43..a9a7c236aecc 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 	 * Arch code may have to preserve more data than just the page
 	 * contents, e.g. memory tags.
 	 */
-	ret = arch_prepare_to_swap(&folio->page);
+	ret = arch_prepare_to_swap(folio);
 	if (ret) {
 		folio_mark_dirty(folio);
 		folio_unlock(folio);
diff --git a/mm/shmem.c b/mm/shmem.c
index 0aad0d9a621b..44c1519ba881 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1913,7 +1913,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	 * Some architectures may have to restore extra metadata to the
 	 * folio after reading from swap.
 	 */
-	arch_swap_restore(swap, folio);
+	arch_swap_restore(folio_swap(swap, folio), folio);
 
 	if (shmem_should_replace_folio(folio, gfp)) {
 		error = shmem_replace_folio(&folio, gfp, info, index);
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 90973ce7881d..53abeaf1371d 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
 	entry.val = 0;
 
 	if (folio_test_large(folio)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
+		if (IS_ENABLED(CONFIG_THP_SWAP))
 			get_swap_pages(1, &entry, folio_nr_pages(folio));
 		goto out;
 	}
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 4919423cce76..5e6d2304a2a4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1806,7 +1806,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	 * when reading from swap. This metadata may be indexed by swap entry
 	 * so this must be called before swap_free().
 	 */
-	arch_swap_restore(entry, folio);
+	arch_swap_restore(folio_swap(entry, folio), folio);
 
 	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
 	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-- 
2.34.1




* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-22 11:41   ` Barry Song
@ 2024-03-26 17:37     ` Ryan Roberts
  -1 siblings, 0 replies; 20+ messages in thread
From: Ryan Roberts @ 2024-03-26 17:37 UTC (permalink / raw)
  To: Barry Song, catalin.marinas, will, akpm, hughd, linux-mm,
	linux-arm-kernel
  Cc: chrisl, mark.rutland, steven.price, david, willy, Barry Song,
	Kemeng Shi, Anshuman Khandual, Peter Collingbourne, Yosry Ahmed,
	Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On 22/03/2024 11:41, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
> MTE, because the MTE code assumes that tag save/restore always
> handles a folio with only one page.
> 
> The limitation should be removed as more and more ARM64 SoCs have
> this feature. Co-existence of MTE and THP_SWAP becomes more and
> more important.
> 
> This patch makes MTE tag saving support large folios, so we no longer
> need to split large folios into base pages for swapping out on ARM64
> SoCs with MTE.
> 
> arch_prepare_to_swap() should take a folio rather than a page as its
> parameter because we support THP swap-out as a whole. It saves tags
> for all pages in a large folio.
> 
> As we now restore tags per folio in arch_swap_restore(), we may add
> some extra loops and early exits while refaulting a large folio which
> is still in the swapcache in do_swap_page(). If a large folio has nr
> pages, do_swap_page() will only set the PTE of the particular page
> which is causing the page fault. Thus do_swap_page() runs nr times,
> and each time, arch_swap_restore() will loop nr times for the subpages
> in the folio. So right now the algorithmic complexity becomes O(nr^2).
> 
> Once we support mapping large folios in do_swap_page(), the extra loops
> and early exits will decrease, though not disappear completely, as a
> large folio might get partially tagged in corner cases such as:
> 1. a large folio in the swapcache can be partially unmapped, thus MTE
> tags for the unmapped pages will be invalidated;
> 2. users might use mprotect() to set MTE on a part of a large folio.
> 
> arch_thp_swp_supported() is dropped since ARM64 MTE was the only user
> that needed it.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Peter Collingbourne <pcc@google.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Acked-by: Chris Li <chrisl@kernel.org>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

It would be really great if we can get this in for v6.10!

> ---
>  arch/arm64/include/asm/pgtable.h | 19 ++------------
>  arch/arm64/mm/mteswap.c          | 45 ++++++++++++++++++++++++++++++++
>  include/linux/huge_mm.h          | 12 ---------
>  include/linux/pgtable.h          |  2 +-
>  mm/internal.h                    | 14 ++++++++++
>  mm/memory.c                      |  2 +-
>  mm/page_io.c                     |  2 +-
>  mm/shmem.c                       |  2 +-
>  mm/swap_slots.c                  |  2 +-
>  mm/swapfile.c                    |  2 +-
>  10 files changed, 67 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index afdd56d26ad7..259325e6b7e8 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -49,12 +49,6 @@
>  	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> -static inline bool arch_thp_swp_supported(void)
> -{
> -	return !system_supports_mte();
> -}
> -#define arch_thp_swp_supported arch_thp_swp_supported
> -
>  /*
>   * Outside of a few very special situations (e.g. hibernation), we always
>   * use broadcast TLB invalidation instructions, therefore a spurious page
> @@ -1280,12 +1274,7 @@ static inline pmd_t pmdp_establish(struct vm_area_struct *vma,
>  #ifdef CONFIG_ARM64_MTE
>  
>  #define __HAVE_ARCH_PREPARE_TO_SWAP
> -static inline int arch_prepare_to_swap(struct page *page)
> -{
> -	if (system_supports_mte())
> -		return mte_save_tags(page);
> -	return 0;
> -}
> +extern int arch_prepare_to_swap(struct folio *folio);
>  
>  #define __HAVE_ARCH_SWAP_INVALIDATE
>  static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
> @@ -1301,11 +1290,7 @@ static inline void arch_swap_invalidate_area(int type)
>  }
>  
>  #define __HAVE_ARCH_SWAP_RESTORE
> -static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> -{
> -	if (system_supports_mte())
> -		mte_restore_tags(entry, &folio->page);
> -}
> +extern void arch_swap_restore(swp_entry_t entry, struct folio *folio);
>  
>  #endif /* CONFIG_ARM64_MTE */
>  
> diff --git a/arch/arm64/mm/mteswap.c b/arch/arm64/mm/mteswap.c
> index a31833e3ddc5..63e8d72f202a 100644
> --- a/arch/arm64/mm/mteswap.c
> +++ b/arch/arm64/mm/mteswap.c
> @@ -68,6 +68,13 @@ void mte_invalidate_tags(int type, pgoff_t offset)
>  	mte_free_tag_storage(tags);
>  }
>  
> +static inline void __mte_invalidate_tags(struct page *page)
> +{
> +	swp_entry_t entry = page_swap_entry(page);
> +
> +	mte_invalidate_tags(swp_type(entry), swp_offset(entry));
> +}
> +
>  void mte_invalidate_tags_area(int type)
>  {
>  	swp_entry_t entry = swp_entry(type, 0);
> @@ -83,3 +90,41 @@ void mte_invalidate_tags_area(int type)
>  	}
>  	xa_unlock(&mte_pages);
>  }
> +
> +int arch_prepare_to_swap(struct folio *folio)
> +{
> +	long i, nr;
> +	int err;
> +
> +	if (!system_supports_mte())
> +		return 0;
> +
> +	nr = folio_nr_pages(folio);
> +
> +	for (i = 0; i < nr; i++) {
> +		err = mte_save_tags(folio_page(folio, i));
> +		if (err)
> +			goto out;
> +	}
> +	return 0;
> +
> +out:
> +	while (i--)
> +		__mte_invalidate_tags(folio_page(folio, i));
> +	return err;
> +}
> +
> +void arch_swap_restore(swp_entry_t entry, struct folio *folio)
> +{
> +	long i, nr;
> +
> +	if (!system_supports_mte())
> +		return;
> +
> +	nr = folio_nr_pages(folio);
> +
> +	for (i = 0; i < nr; i++) {
> +		mte_restore_tags(entry, folio_page(folio, i));
> +		entry.val++;
> +	}
> +}
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index de0c89105076..e04b93c43965 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -535,16 +535,4 @@ static inline int split_folio_to_order(struct folio *folio, int new_order)
>  #define split_folio_to_list(f, l) split_folio_to_list_to_order(f, l, 0)
>  #define split_folio(f) split_folio_to_order(f, 0)
>  
> -/*
> - * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> - * limitations in the implementation like arm64 MTE can override this to
> - * false
> - */
> -#ifndef arch_thp_swp_supported
> -static inline bool arch_thp_swp_supported(void)
> -{
> -	return true;
> -}
> -#endif
> -
>  #endif /* _LINUX_HUGE_MM_H */
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 85fc7554cd52..b10a7dd615bd 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1052,7 +1052,7 @@ static inline int arch_unmap_one(struct mm_struct *mm,
>   * prototypes must be defined in the arch-specific asm/pgtable.h file.
>   */
>  #ifndef __HAVE_ARCH_PREPARE_TO_SWAP
> -static inline int arch_prepare_to_swap(struct page *page)
> +static inline int arch_prepare_to_swap(struct folio *folio)
>  {
>  	return 0;
>  }
> diff --git a/mm/internal.h b/mm/internal.h
> index 7e486f2c502c..2551e93dd885 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -76,6 +76,20 @@ static inline int folio_nr_pages_mapped(struct folio *folio)
>  	return atomic_read(&folio->_nr_pages_mapped) & FOLIO_PAGES_MAPPED;
>  }
>  
> +/*
> + * Retrieve the first entry of a folio based on a provided entry within the
> + * folio. We cannot rely on folio->swap as there is no guarantee that it has
> + * been initialized. Used for calling arch_swap_restore()
> + */
> +static inline swp_entry_t folio_swap(swp_entry_t entry, struct folio *folio)
> +{
> +	swp_entry_t swap = {
> +		.val = ALIGN_DOWN(entry.val, folio_nr_pages(folio)),
> +	};
> +
> +	return swap;
> +}
> +
>  static inline void *folio_raw_mapping(struct folio *folio)
>  {
>  	unsigned long mapping = (unsigned long)folio->mapping;
> diff --git a/mm/memory.c b/mm/memory.c
> index f2bc6dd15eb8..b7cab8be8632 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4188,7 +4188,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  	 * when reading from swap. This metadata may be indexed by swap entry
>  	 * so this must be called before swap_free().
>  	 */
> -	arch_swap_restore(entry, folio);
> +	arch_swap_restore(folio_swap(entry, folio), folio);
>  
>  	/*
>  	 * Remove the swap entry and conditionally try to free up the swapcache.
> diff --git a/mm/page_io.c b/mm/page_io.c
> index ae2b49055e43..a9a7c236aecc 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -189,7 +189,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
>  	 * Arch code may have to preserve more data than just the page
>  	 * contents, e.g. memory tags.
>  	 */
> -	ret = arch_prepare_to_swap(&folio->page);
> +	ret = arch_prepare_to_swap(folio);
>  	if (ret) {
>  		folio_mark_dirty(folio);
>  		folio_unlock(folio);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0aad0d9a621b..44c1519ba881 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1913,7 +1913,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	 * Some architectures may have to restore extra metadata to the
>  	 * folio after reading from swap.
>  	 */
> -	arch_swap_restore(swap, folio);
> +	arch_swap_restore(folio_swap(swap, folio), folio);
>  
>  	if (shmem_should_replace_folio(folio, gfp)) {
>  		error = shmem_replace_folio(&folio, gfp, info, index);
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 90973ce7881d..53abeaf1371d 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -310,7 +310,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
>  	entry.val = 0;
>  
>  	if (folio_test_large(folio)) {
> -		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> +		if (IS_ENABLED(CONFIG_THP_SWAP))
>  			get_swap_pages(1, &entry, folio_nr_pages(folio));
>  		goto out;
>  	}
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 4919423cce76..5e6d2304a2a4 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1806,7 +1806,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  	 * when reading from swap. This metadata may be indexed by swap entry
>  	 * so this must be called before swap_free().
>  	 */
> -	arch_swap_restore(entry, folio);
> +	arch_swap_restore(folio_swap(entry, folio), folio);
>  
>  	dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>  	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
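
The folio_swap() helper quoted above derives the first swap entry of a folio
from any entry inside it, and arch_swap_restore() then walks forward one
entry per subpage. The following is a userspace sketch of that arithmetic
only. It assumes, as the patch appears to, that a large folio's swap entries
are contiguous and naturally aligned to the folio size, and that the folio
size is a power of two. align_down(), first_entry() and entry_for_subpage()
are hypothetical names that model entry.val as a plain unsigned long.

```c
#include <assert.h>

/* Userspace equivalent of ALIGN_DOWN() for power-of-two alignments. */
static unsigned long align_down(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Models folio_swap(): first entry value of the folio containing entry_val. */
static unsigned long first_entry(unsigned long entry_val,
				 unsigned long nr_pages)
{
	return align_down(entry_val, nr_pages);
}

/*
 * Models the entry.val++ walk in arch_swap_restore(): the entry value that
 * is used for subpage i of the folio, whichever subpage faulted.
 */
static unsigned long entry_for_subpage(unsigned long faulting_val,
				       unsigned long nr_pages,
				       unsigned long i)
{
	assert(i < nr_pages);
	return first_entry(faulting_val, nr_pages) + i;
}
```

For a 16-page folio and a fault on entry value 0x1237, first_entry() gives
0x1230 and subpage 7 maps back to 0x1237. For an order-0 folio the entry is
returned unchanged.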



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-22 11:41   ` Barry Song
@ 2024-03-27 12:23     ` Catalin Marinas
  -1 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-03-27 12:23 UTC (permalink / raw)
  To: Barry Song
  Cc: will, akpm, hughd, linux-mm, linux-arm-kernel, chrisl,
	mark.rutland, ryan.roberts, steven.price, david, willy,
	Barry Song, Kemeng Shi, Anshuman Khandual, Peter Collingbourne,
	Yosry Ahmed, Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
> MTE, because the MTE code assumes that tag save/restore always
> handles a folio with only one page.
> 
> The limitation should be removed as more and more ARM64 SoCs have
> this feature. Co-existence of MTE and THP_SWAP becomes more and
> more important.
> 
> This patch makes MTE tag saving support large folios, so we no longer
> need to split large folios into base pages for swapping out on ARM64
> SoCs with MTE.
> 
> arch_prepare_to_swap() should take a folio rather than a page as its
> parameter because we support THP swap-out as a whole. It saves tags
> for all pages in a large folio.
> 
> As we now restore tags per folio in arch_swap_restore(), we may add
> some extra loops and early exits while refaulting a large folio which
> is still in the swapcache in do_swap_page(). If a large folio has nr
> pages, do_swap_page() will only set the PTE of the particular page
> which is causing the page fault. Thus do_swap_page() runs nr times,
> and each time, arch_swap_restore() will loop nr times for the subpages
> in the folio. So right now the algorithmic complexity becomes O(nr^2).
> 
> Once we support mapping large folios in do_swap_page(), the extra loops
> and early exits will decrease, though not disappear completely, as a
> large folio might get partially tagged in corner cases such as:
> 1. a large folio in the swapcache can be partially unmapped, thus MTE
> tags for the unmapped pages will be invalidated;
> 2. users might use mprotect() to set MTE on a part of a large folio.
> 
> arch_thp_swp_supported() is dropped since ARM64 MTE was the only user
> that needed it.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Kemeng Shi <shikemeng@huaweicloud.com>
> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Peter Collingbourne <pcc@google.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Yosry Ahmed <yosryahmed@google.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
> Acked-by: Chris Li <chrisl@kernel.org>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
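
The O(nr^2) figure in the quoted commit message follows from nr separate
faults each walking all nr subpages. The counting sketch below makes that
concrete. It only counts loop iterations and does not model the early exits
the commit message mentions. Both function names are hypothetical, and the
single-fault variant models the future case of mapping a whole large folio
in one do_swap_page() call, which this patch does not implement.

```c
#include <assert.h>

/*
 * One fault per subpage: each of the nr faults calls the restore loop,
 * which iterates over all nr subpages of the folio.
 */
static unsigned long restore_iterations_per_page_faults(unsigned long nr)
{
	unsigned long fault, i, iters = 0;

	for (fault = 0; fault < nr; fault++)
		for (i = 0; i < nr; i++)
			iters++;
	return iters;
}

/* One fault maps the whole folio: the restore loop runs once. */
static unsigned long restore_iterations_single_fault(unsigned long nr)
{
	unsigned long i, iters = 0;

	for (i = 0; i < nr; i++)
		iters++;
	return iters;
}
```

For a 16-page folio this gives 256 iterations in the per-page-fault case
against 16 when the folio is mapped by a single fault.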


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-22 11:41   ` Barry Song
@ 2024-03-27 14:53     ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2024-03-27 14:53 UTC (permalink / raw)
  To: Barry Song
  Cc: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel,
	chrisl, mark.rutland, ryan.roberts, steven.price, david,
	Barry Song, Kemeng Shi, Anshuman Khandual, Peter Collingbourne,
	Yosry Ahmed, Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
> MTE, because the MTE code assumes that tag save/restore always
> handles a folio with only one page.
> 
> The limitation should be removed as more and more ARM64 SoCs have
> this feature. Co-existence of MTE and THP_SWAP becomes more and
> more important.
> 
> This patch makes MTE tag saving support large folios, so we no longer
> need to split large folios into base pages for swapping out on ARM64
> SoCs with MTE.

Can we go further than this patch and only support PG_mte_tagged and
PG_mte_lock on folio->flags instead of page->flags?  We're down to using
page->flags for these two MTE bits, a whole lot of s390 junk, PG_hwpoison,
PG_head, PG_anon_exclusive and Zone/Section/Node/KASAN/last_cpupid.

Looking at some of the callers, the call in copy_highpage() would need
to be lifted to its caller so that we only set the tags once per folio
rather than try to set them per page of a folio ... might be a bit of
churn, and I'd hate to try to do it myself without being able to test it.
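
As a rough illustration of the restructuring suggested above, here is a
userspace toy model, not kernel code. The per-page helper copies tags only,
while the folio-level caller tests and sets a single flag once per folio.
All toy_* names, the sizes, and the flag_updates counter are assumptions
made for this sketch; none of them exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_NR_PAGES      4	/* pages per toy folio (assumption) */
#define TOY_TAGS_PER_PAGE 8	/* shrunk stand-in for a page's tag storage */

struct toy_page {
	unsigned char tags[TOY_TAGS_PER_PAGE];
};

struct toy_folio {
	bool mte_tagged;	/* hypothetical single folio-level flag */
	int flag_updates;	/* counts how often the flag was written */
	struct toy_page pages[TOY_NR_PAGES];
};

/* Per-page helper: copies the tags and touches no flags. */
static void toy_copy_page_tags(struct toy_page *dst, const struct toy_page *src)
{
	memcpy(dst->tags, src->tags, sizeof(dst->tags));
}

/* Folio-level caller: one flag test and one flag update per folio. */
static void toy_folio_copy(struct toy_folio *dst, const struct toy_folio *src)
{
	assert(dst != NULL && src != NULL);

	if (!src->mte_tagged)
		return;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		toy_copy_page_tags(&dst->pages[i], &src->pages[i]);

	dst->mte_tagged = true;
	dst->flag_updates++;
}
```

With a tagged source folio, toy_folio_copy() copies every page's tags but
writes the destination flag exactly once.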


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-27 14:53     ` Matthew Wilcox
@ 2024-03-27 14:57       ` David Hildenbrand
  -1 siblings, 0 replies; 20+ messages in thread
From: David Hildenbrand @ 2024-03-27 14:57 UTC (permalink / raw)
  To: Matthew Wilcox, Barry Song
  Cc: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel,
	chrisl, mark.rutland, ryan.roberts, steven.price, Barry Song,
	Kemeng Shi, Anshuman Khandual, Peter Collingbourne, Yosry Ahmed,
	Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On 27.03.24 15:53, Matthew Wilcox wrote:
> On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
>> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
>> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
>> MTE as the MTE code works with the assumption tags save/restore is
>> always handling a folio with only one page.
>>
>> The limitation should be removed as more and more ARM64 SoCs have
>> this feature. Co-existence of MTE and THP_SWAP becomes more and
>> more important.
>>
>> This patch makes MTE tags saving support large folios, then we don't
>> need to split large folios into base pages for swapping out on ARM64
>> SoCs with MTE any more.
> 
> Can we go further than this patch and only support PG_mte_tagged and
> PG_mte_lock on folio->flags instead of page->flags?  We're down to using

I think we discussed that already and what I learned is that it "gets a 
bit complicated". But I'm hoping we can get that discussion started again.

> page->flags for these two MTE bits, a whole lot of s390 junk, PG_hwpoison,
> PG_head, PG_anon_exclusive and Zone/Section/Node/KASAN/last_cpupid.

... just like PG_anon_exclusive "gets a bit complicated". Well, I think 
I might have finally found a way to make it work, I'll only have to 
uglify fork() a bit.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-27 14:57       ` David Hildenbrand
@ 2024-03-27 15:13         ` Ryan Roberts
  -1 siblings, 0 replies; 20+ messages in thread
From: Ryan Roberts @ 2024-03-27 15:13 UTC (permalink / raw)
  To: David Hildenbrand, Matthew Wilcox, Barry Song
  Cc: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel,
	chrisl, mark.rutland, steven.price, Barry Song, Kemeng Shi,
	Anshuman Khandual, Peter Collingbourne, Yosry Ahmed, Peter Xu,
	Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On 27/03/2024 14:57, David Hildenbrand wrote:
> On 27.03.24 15:53, Matthew Wilcox wrote:
>> On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
>>> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
>>> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
>>> MTE as the MTE code works with the assumption tags save/restore is
>>> always handling a folio with only one page.
>>>
>>> The limitation should be removed as more and more ARM64 SoCs have
>>> this feature. Co-existence of MTE and THP_SWAP becomes more and
>>> more important.
>>>
>>> This patch makes MTE tags saving support large folios, then we don't
>>> need to split large folios into base pages for swapping out on ARM64
>>> SoCs with MTE any more.
>>
>> Can we go further than this patch and only support PG_mte_tagged and
>> PG_mte_lock on folio->flags instead of page->flags?  We're down to using
> 
> I think we discussed that already and what I learned is that it "gets a bit
> complicated". But I'm hoping we can get that discussion started again.

The original conversation starts here:
https://lore.kernel.org/linux-mm/fb34d312-1049-4932-8f2b-d7f33cfc297c@arm.com/

The issue is that you can have a large folio mapped to user space, and user
space only wants to activate MTE for a portion of it. So at that point, you
either have to deal with only part of it being tagged (as we do today with the
per-page flag) or you have to split the folio.
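
The trade-off described above can be sketched with a userspace toy model
(not kernel code). It assumes a hypothetical per-page boolean standing in
for PG_mte_tagged and asks whether one folio-wide flag could describe the
same state; the toy_* names and the folio size are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8	/* pages per toy folio (assumption) */

struct toy_folio {
	bool page_tagged[TOY_NR_PAGES];	/* models a per-page tagged flag */
};

/* Mark pages [first, first + count) tagged, as if MTE were enabled there. */
static void toy_tag_range(struct toy_folio *f, size_t first, size_t count)
{
	assert(f != NULL && first + count <= TOY_NR_PAGES);

	for (size_t i = first; i < first + count; i++)
		f->page_tagged[i] = true;
}

/* A single folio-wide flag is enough only if every page agrees. */
static bool toy_folio_flag_suffices(const struct toy_folio *f)
{
	for (size_t i = 1; i < TOY_NR_PAGES; i++)
		if (f->page_tagged[i] != f->page_tagged[0])
			return false;
	return true;
}
```

In this model, tagging only pages 2..4 makes toy_folio_flag_suffices()
return false: the state then needs either the per-page flags or a split
into smaller folios that are each uniformly tagged or untagged.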

I haven't re-read the entire thread - so might be forgetting some important details.

> 
>> page->flags for these two MTE bits, a whole lot of s390 junk, PG_hwpoison,
>> PG_head, PG_anon_exclusive and Zone/Section/Node/KASAN/last_cpupid.
> 
> ... just like PG_anon_exclusive "gets a bit complicated". Well, I think I might
> have finally found a way to make it work, I'll only have to uglify fork() a bit.
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-27 15:13         ` Ryan Roberts
@ 2024-03-27 15:15           ` Ryan Roberts
  -1 siblings, 0 replies; 20+ messages in thread
From: Ryan Roberts @ 2024-03-27 15:15 UTC (permalink / raw)
  To: David Hildenbrand, Matthew Wilcox, Barry Song
  Cc: catalin.marinas, will, akpm, hughd, linux-mm, linux-arm-kernel,
	chrisl, mark.rutland, steven.price, Barry Song, Kemeng Shi,
	Anshuman Khandual, Peter Collingbourne, Yosry Ahmed, Peter Xu,
	Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On 27/03/2024 15:13, Ryan Roberts wrote:
> On 27/03/2024 14:57, David Hildenbrand wrote:
>> On 27.03.24 15:53, Matthew Wilcox wrote:
>>> On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
>>>> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
>>>> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
>>>> MTE as the MTE code works with the assumption tags save/restore is
>>>> always handling a folio with only one page.
>>>>
>>>> The limitation should be removed as more and more ARM64 SoCs have
>>>> this feature. Co-existence of MTE and THP_SWAP becomes more and
>>>> more important.
>>>>
>>>> This patch makes MTE tags saving support large folios, then we don't
>>>> need to split large folios into base pages for swapping out on ARM64
>>>> SoCs with MTE any more.
>>>
>>> Can we go further than this patch and only support PG_mte_tagged and
>>> PG_mte_lock on folio->flags instead of page->flags?  We're down to using

Also, I wouldn't want to hold this patch up in order to do this extra work. I
think the 2 things are orthogonal? (supporting THP swap with MTE vs not using
tail page flags for MTE). Can we discuss and handle it separately?

>>
>> I think we discussed that already and what I learned is that it "gets a bit
>> complicated". But I'm hoping we can get that discussion started again.
> 
> The original conversation starts here:
> https://lore.kernel.org/linux-mm/fb34d312-1049-4932-8f2b-d7f33cfc297c@arm.com/
> 
> The issue is that you can have a large folio mapped to user space, and user
> space only wants to activate MTE for a portion of it. So at that point, you
> either have to deal with only part of it being tagged (as we do today with the
> per-page flag) or you have to split the folio.
> 
> I haven't re-read the entire thread - so might be forgetting some important details.
> 
>>
>>> page->flags for these two MTE bits, a whole lot of s390 junk, PG_hwpoison,
>>> PG_head, PG_anon_exclusive and Zone/Section/Node/KASAN/last_cpupid.
>>
>> ... just like PG_anon_exclusive "gets a bit complicated". Well, I think I might
>> have finally found a way to make it work, I'll only have to uglify fork() a bit.
>>
> 



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-27 15:15           ` Ryan Roberts
@ 2024-03-27 17:34             ` Matthew Wilcox
  -1 siblings, 0 replies; 20+ messages in thread
From: Matthew Wilcox @ 2024-03-27 17:34 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: David Hildenbrand, Barry Song, catalin.marinas, will, akpm,
	hughd, linux-mm, linux-arm-kernel, chrisl, mark.rutland,
	steven.price, Barry Song, Kemeng Shi, Anshuman Khandual,
	Peter Collingbourne, Yosry Ahmed, Peter Xu, Lorenzo Stoakes,
	Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On Wed, Mar 27, 2024 at 03:15:33PM +0000, Ryan Roberts wrote:
> >>> Can we go further than this patch and only support PG_mte_tagged and
> >>> PG_mte_lock on folio->flags instead of page->flags?  We're down to using
> 
> Also, I wouldn't want to hold this patch up in order to do this extra work. I
> think the 2 things are orthogonal? (supporting THP swap with MTE vs not using
> tail page flags for MTE). Can we discuss and handle it separately?

Yes, I'm not trying to hold this patch hostage.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware with MTE
  2024-03-27 15:13         ` Ryan Roberts
@ 2024-03-27 17:58           ` Catalin Marinas
  -1 siblings, 0 replies; 20+ messages in thread
From: Catalin Marinas @ 2024-03-27 17:58 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: David Hildenbrand, Matthew Wilcox, Barry Song, will, akpm, hughd,
	linux-mm, linux-arm-kernel, chrisl, mark.rutland, steven.price,
	Barry Song, Kemeng Shi, Anshuman Khandual, Peter Collingbourne,
	Yosry Ahmed, Peter Xu, Lorenzo Stoakes, Mike Rapoport (IBM),
	Aneesh Kumar K.V, Rick Edgecombe

On Wed, Mar 27, 2024 at 03:13:18PM +0000, Ryan Roberts wrote:
> On 27/03/2024 14:57, David Hildenbrand wrote:
> > On 27.03.24 15:53, Matthew Wilcox wrote:
> >> On Sat, Mar 23, 2024 at 12:41:36AM +1300, Barry Song wrote:
> >>> Commit d0637c505f8a1 ("arm64: enable THP_SWAP for arm64") brings up
> >>> THP_SWAP on ARM64, but it doesn't enable THP_SWAP on hardware with
> >>> MTE as the MTE code works with the assumption tags save/restore is
> >>> always handling a folio with only one page.
> >>>
> >>> The limitation should be removed as more and more ARM64 SoCs have
> >>> this feature. Co-existence of MTE and THP_SWAP becomes more and
> >>> more important.
> >>>
> >>> This patch makes MTE tags saving support large folios, then we don't
> >>> need to split large folios into base pages for swapping out on ARM64
> >>> SoCs with MTE any more.
> >>
> >> Can we go further than this patch and only support PG_mte_tagged and
> >> PG_mte_lock on folio->flags instead of page->flags?  We're down to using
> > 
> > I think we discussed that already and what I learned is that it "gets a bit
> > complicated". But I'm hoping we can get that discussion started again.
> 
> The original conversation starts here:
> https://lore.kernel.org/linux-mm/fb34d312-1049-4932-8f2b-d7f33cfc297c@arm.com/
> 
> The issue is that you can have a large folio mapped to user space, and user
> space only wants to activate MTE for a portion of it. So at that point, you
> either have to deal with only part of it being tagged (as we do today with the
> per-page flag) or you have to split the folio.

It needs splitting since the PROT_MTE property ends up in the pte as a
memory attribute. So we can't have a THP mapping with PROT_MTE but only
specific pages tagged.

I had an attempt last year to only keep the PG_mte_tagged flag in the
head page but I recall folio_copy() got in the way since it was calling
copy_highpage() on individual pages and the arm64 code was not seeing
the head PG_mte_tagged. I think it can be worked around but I got
distracted and forgot about this.
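
A userspace toy model (not the arm64 code) of the problem recalled above:
if the tagged flag lives only on the head page while the folio copy still
goes page by page, a per-page helper that checks only the page it was
handed skips every tail page. The toy_* names, the one-byte tag per page,
and the folio size are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 4	/* pages per toy folio (assumption) */

struct toy_page {
	bool mte_tagged;	/* set only on page 0 (the head) in this model */
	unsigned char tag;	/* one value standing in for a page's tags */
};

/* Per-page copy that consults only the flag of the page it was given. */
static bool toy_copy_page(struct toy_page *dst, const struct toy_page *src)
{
	if (!src->mte_tagged)
		return false;

	dst->tag = src->tag;
	dst->mte_tagged = true;
	return true;
}

/* Page-by-page folio copy; returns how many pages had their tag copied. */
static size_t toy_head_only_folio_copy(struct toy_page *dst,
				       const struct toy_page *src)
{
	size_t copied = 0;

	assert(dst != NULL && src != NULL);

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (toy_copy_page(&dst[i], &src[i]))
			copied++;

	return copied;
}
```

With the flag set on the head page only, toy_head_only_folio_copy() copies
one page's tag and leaves the tail pages untouched, which is the kind of
mismatch a folio-level check in the caller would have to work around.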

-- 
Catalin


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2024-03-27 17:58 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2024-03-22 11:41 [PATCH 0/1] THP_SWAP support for ARM64 SoC with MTE Barry Song
2024-03-22 11:41 ` Barry Song
2024-03-22 11:41 ` [PATCH 1/1] arm64: mm: swap: support THP_SWAP on hardware " Barry Song
2024-03-22 11:41   ` Barry Song
2024-03-26 17:37   ` Ryan Roberts
2024-03-26 17:37     ` Ryan Roberts
2024-03-27 12:23   ` Catalin Marinas
2024-03-27 12:23     ` Catalin Marinas
2024-03-27 14:53   ` Matthew Wilcox
2024-03-27 14:53     ` Matthew Wilcox
2024-03-27 14:57     ` David Hildenbrand
2024-03-27 14:57       ` David Hildenbrand
2024-03-27 15:13       ` Ryan Roberts
2024-03-27 15:13         ` Ryan Roberts
2024-03-27 15:15         ` Ryan Roberts
2024-03-27 15:15           ` Ryan Roberts
2024-03-27 17:34           ` Matthew Wilcox
2024-03-27 17:34             ` Matthew Wilcox
2024-03-27 17:58         ` Catalin Marinas
2024-03-27 17:58           ` Catalin Marinas
