All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Add hugetlb MADV_DONTNEED support
@ 2022-02-15  0:23 Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-02-15  0:23 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Mike Rapoport, Andrew Morton, Mike Kravetz

Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
testing.  However, mremap support was recently added in commit
550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
vma").  While attempting to enable mremap support in the test, it was
discovered that the mremap test indirectly depends on MADV_DONTNEED.

madvise does not allow MADV_DONTNEED for hugetlb mappings.  However,
that is primarily due to the check in can_madv_lru_vma().  By simply
removing the check and adding huge page alignment, MADV_DONTNEED can
be made to work for hugetlb mappings.

Do note that there is no compelling use case for adding this support.
This was discussed in the RFC [1].  However, adding support makes sense
as it is fairly trivial and brings hugetlb functionality more in line
with 'normal' memory.

After enabling support, add selftest for MADV_DONTNEED as well as
MADV_REMOVE.  Then update userfaultfd selftest.

If new functionality is accepted, then madvise man page will be updated
to indicate hugetlb is supported.  It will also be updated to clarify
what happens to the passed length argument.

v2 -> v3
- Force start huge page alignment, extend length to huge page size. David
- Added more selftests to verify alignment.
- Remove double check for hugetlb vma. Peter

v1 -> v2
- Use is_vm_hugetlb_page() instead of open coding vma hugetlb check.
- Add new test to .gitignore and use meaningful symbolic names (#define)
  for constants used in test.  Shuah
- Updated help text in userfaultfd test and modified run_vmtests to not
  pass in a file for userfaultfd hugetlb test.  Axel
- Added Reviewed-by for selftest patches.

RFC -> v1
- Fixed alignment issues when calling zap_page_range.  Naoya
- Added checks for invalid arguments and misalignment to selftest.

[1] https://lore.kernel.org/linux-mm/20220113180308.15610-1-mike.kravetz@oracle.com/

Mike Kravetz (3):
  mm: enable MADV_DONTNEED for hugetlb mappings
  selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  userfaultfd/selftests: enable hugetlb remap and remove event testing

 mm/madvise.c                                 |  33 +-
 tools/testing/selftests/vm/.gitignore        |   1 +
 tools/testing/selftests/vm/Makefile          |   1 +
 tools/testing/selftests/vm/hugetlb-madvise.c | 410 +++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh    |  15 +-
 tools/testing/selftests/vm/userfaultfd.c     |  69 ++--
 6 files changed, 490 insertions(+), 39 deletions(-)
 create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
  2022-02-15  0:23 [PATCH v3 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
@ 2022-02-15  0:23 ` Mike Kravetz
  2022-02-17  8:32   ` Yin Fengwei
  2022-02-15  0:23 ` [PATCH v3 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 3/3] userfaultfd/selftests: enable hugetlb remap and remove event testing Mike Kravetz
  2 siblings, 1 reply; 7+ messages in thread
From: Mike Kravetz @ 2022-02-15  0:23 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Mike Rapoport, Andrew Morton, Mike Kravetz

MADV_DONTNEED is currently disabled for hugetlb mappings.  This
certainly makes sense in shared file mappings as the pagecache maintains
a reference to the page and it will never be freed.  However, it could
be useful to unmap and free pages in private mappings.  In addition,
userfaultfd minor fault users may be able to simplify code by using
MADV_DONTNEED.

The primary thing preventing MADV_DONTNEED from working on hugetlb
mappings is a check in can_madv_lru_vma().  To allow support for hugetlb
mappings create and use a new routine madvise_dontneed_free_valid_vma()
that allows hugetlb mappings in this specific case.

For normal mappings, madvise requires the start address be PAGE aligned
and rounds up length to the next multiple of PAGE_SIZE.  Do similarly
for hugetlb mappings: require start address be huge page size aligned and
round up length to the next multiple of huge page size.  Use the new
madvise_dontneed_free_valid_vma routine to check alignment and round up
length/end.  zap_page_range requires this alignment for hugetlb vmas
otherwise we will hit BUGs.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/madvise.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index bed872a2ad5f..ede6affa1350 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -554,9 +554,14 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 	tlb_end_vma(tlb, vma);
 }
 
+static inline bool can_madv_lru_non_huge_vma(struct vm_area_struct *vma)
+{
+	return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP));
+}
+
 static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
-	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
+	return can_madv_lru_non_huge_vma(vma) && !is_vm_hugetlb_page(vma);
 }
 
 static long madvise_cold(struct vm_area_struct *vma,
@@ -829,6 +834,23 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 	return 0;
 }
 
+static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
+					    unsigned long start,
+					    unsigned long *end,
+					    int behavior)
+{
+	if (!is_vm_hugetlb_page(vma))
+		return can_madv_lru_non_huge_vma(vma);
+
+	if (behavior != MADV_DONTNEED)
+		return false;
+	if (start & ~huge_page_mask(hstate_vma(vma)))
+		return false;
+
+	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
+	return true;
+}
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
@@ -837,7 +859,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = vma;
-	if (!can_madv_lru_vma(vma))
+	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -859,7 +881,12 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_lru_vma(vma))
+		/*
+		 * Potential end adjustment for hugetlb vma is OK as
+		 * the check below keeps end within vma.
+		 */
+		if (!madvise_dontneed_free_valid_vma(vma, start, &end,
+						     behavior))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  2022-02-15  0:23 [PATCH v3 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
@ 2022-02-15  0:23 ` Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 3/3] userfaultfd/selftests: enable hugetlb remap and remove event testing Mike Kravetz
  2 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-02-15  0:23 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Mike Rapoport, Andrew Morton, Mike Kravetz, Shuah Khan

Now that MADV_DONTNEED support for hugetlb is enabled, add corresponding
tests.  MADV_REMOVE has been enabled for some time, but no tests exist
so add them as well.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>
---
 tools/testing/selftests/vm/.gitignore        |   1 +
 tools/testing/selftests/vm/Makefile          |   1 +
 tools/testing/selftests/vm/hugetlb-madvise.c | 410 +++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh    |  12 +
 4 files changed, 424 insertions(+)
 create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c

diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 3b5faec3c04f..d7507f3c7c76 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -3,6 +3,7 @@ hugepage-mmap
 hugepage-mremap
 hugepage-shm
 hugepage-vmemmap
+hugetlb-madvise
 khugepaged
 map_hugetlb
 map_populate
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 96714d2d49dc..5e43f072f5b7 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -28,6 +28,7 @@ LDLIBS = -lrt -lpthread
 TEST_GEN_FILES = compaction_test
 TEST_GEN_FILES += gup_test
 TEST_GEN_FILES += hmm-tests
+TEST_GEN_FILES += hugetlb-madvise
 TEST_GEN_FILES += hugepage-mmap
 TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
diff --git a/tools/testing/selftests/vm/hugetlb-madvise.c b/tools/testing/selftests/vm/hugetlb-madvise.c
new file mode 100644
index 000000000000..6c6af40f5747
--- /dev/null
+++ b/tools/testing/selftests/vm/hugetlb-madvise.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * hugepage-madvise:
+ *
+ * Basic functional testing of madvise MADV_DONTNEED and MADV_REMOVE
+ * on hugetlb mappings.
+ *
+ * Before running this test, make sure the administrator has pre-allocated
+ * at least MIN_FREE_PAGES hugetlb pages and they are free.  In addition,
+ * the test takes an argument that is the path to a file in a hugetlbfs
+ * filesystem.  Therefore, a hugetlbfs filesystem must be mounted on some
+ * directory.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#define __USE_GNU
+#include <fcntl.h>
+
+#define USAGE	"USAGE: %s <hugepagefile_name>\n"
+#define MIN_FREE_PAGES	20
+#define NR_HUGE_PAGES	10	/* common number of pages to map/allocate */
+
+#define validate_free_pages(exp_free)					\
+	do {								\
+		int fhp = get_free_hugepages();				\
+		if (fhp != (exp_free)) {				\
+			printf("Unexpected number of free huge "	\
+				"pages line %d\n", __LINE__);		\
+			exit(1);					\
+		}							\
+	} while (0)
+
+unsigned long huge_page_size;
+unsigned long base_page_size;
+
+/*
+ * default_huge_page_size copied from mlock2-tests.c
+ */
+unsigned long default_huge_page_size(void)
+{
+	unsigned long hps = 0;
+	char *line = NULL;
+	size_t linelen = 0;
+	FILE *f = fopen("/proc/meminfo", "r");
+
+	if (!f)
+		return 0;
+	while (getline(&line, &linelen, f) > 0) {
+		if (sscanf(line, "Hugepagesize:       %lu kB", &hps) == 1) {
+			hps <<= 10;
+			break;
+		}
+	}
+
+	free(line);
+	fclose(f);
+	return hps;
+}
+
+unsigned long get_free_hugepages(void)
+{
+	unsigned long fhp = 0;
+	char *line = NULL;
+	size_t linelen = 0;
+	FILE *f = fopen("/proc/meminfo", "r");
+
+	if (!f)
+		return fhp;
+	while (getline(&line, &linelen, f) > 0) {
+		if (sscanf(line, "HugePages_Free:      %lu", &fhp) == 1)
+			break;
+	}
+
+	free(line);
+	fclose(f);
+	return fhp;
+}
+
+void write_fault_pages(void *addr, unsigned long nr_pages)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++)
+		*((unsigned long *)(addr + (i * huge_page_size))) = i;
+}
+
+void read_fault_pages(void *addr, unsigned long nr_pages)
+{
+	unsigned long i, tmp;
+
+	for (i = 0; i < nr_pages; i++)
+		tmp += *((unsigned long *)(addr + (i * huge_page_size)));
+}
+
+int main(int argc, char **argv)
+{
+	unsigned long free_hugepages;
+	void *addr, *addr2;
+	int fd;
+	int ret;
+
+	if (argc != 2) {
+		printf(USAGE, argv[0]);
+		exit(1);
+	}
+
+	huge_page_size = default_huge_page_size();
+	if (!huge_page_size) {
+		printf("Unable to determine huge page size, exiting!\n");
+		exit(1);
+	}
+	base_page_size = sysconf(_SC_PAGE_SIZE);
+	if (!huge_page_size) {
+		printf("Unable to determine base page size, exiting!\n");
+		exit(1);
+	}
+
+	free_hugepages = get_free_hugepages();
+	if (free_hugepages < MIN_FREE_PAGES) {
+		printf("Not enough free huge pages to test, exiting!\n");
+		exit(1);
+	}
+
+	fd = open(argv[1], O_CREAT | O_RDWR, 0755);
+	if (fd < 0) {
+		perror("Open failed");
+		exit(1);
+	}
+
+	/*
+	 * Test validity of MADV_DONTNEED addr and length arguments.  mmap
+	 * size is NR_HUGE_PAGES + 2.  One page at the beginning and end of
+	 * the mapping will be unmapped so we KNOW there is nothing mapped
+	 * there.
+	 */
+	addr = mmap(NULL, (NR_HUGE_PAGES + 2) * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	if (munmap(addr, huge_page_size) ||
+			munmap(addr + (NR_HUGE_PAGES + 1) * huge_page_size,
+				huge_page_size)) {
+		perror("munmap");
+		exit(1);
+	}
+	addr = addr + huge_page_size;
+
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* addr before mapping should fail */
+	ret = madvise(addr - base_page_size, NR_HUGE_PAGES * huge_page_size,
+		MADV_DONTNEED);
+	if (!ret) {
+		printf("Unexpected success of madvise call with invalid addr line %d\n",
+				__LINE__);
+			exit(1);
+	}
+
+	/* addr + length after mapping should fail */
+	ret = madvise(addr, (NR_HUGE_PAGES * huge_page_size) + base_page_size,
+		MADV_DONTNEED);
+	if (!ret) {
+		printf("Unexpected success of madvise call with invalid length line %d\n",
+				__LINE__);
+			exit(1);
+	}
+
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+
+	/*
+	 * Test alignment of MADV_DONTNEED addr and length arguments
+	 */
+	addr = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* addr is not huge page size aligned and should fail */
+	ret = madvise(addr + base_page_size,
+			NR_HUGE_PAGES * huge_page_size - base_page_size,
+			MADV_DONTNEED);
+	if (!ret) {
+		printf("Unexpected success of madvise call with unaligned start address %d\n",
+				__LINE__);
+			exit(1);
+	}
+
+	/* addr + length should be aligned up to huge page size */
+	if (madvise(addr,
+			((NR_HUGE_PAGES - 1) * huge_page_size) + base_page_size,
+			MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+
+	/* should free all pages in mapping */
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on anonymous private mapping
+	 */
+	addr = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+
+	/* should free all pages in mapping */
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on private mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, NR_HUGE_PAGES * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	addr = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* read should not consume any pages */
+	read_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* madvise should not free any pages */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* writes should allocate private pages */
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - (2 * NR_HUGE_PAGES));
+
+	/* madvise should free private pages */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* writes should allocate private pages */
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - (2 * NR_HUGE_PAGES));
+
+	/*
+	 * The fallocate below certainly should free the pages associated
+	 * with the file.  However, pages in the private mapping are also
+	 * freed.  This is not the 'correct' behavior, but is expected
+	 * because this is how it has worked since the initial hugetlb
+	 * implementation.
+	 */
+	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+					0, NR_HUGE_PAGES * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on shared mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, NR_HUGE_PAGES * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	addr = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* write should not consume any pages */
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* madvise should not free any pages */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/*
+	 * Test MADV_REMOVE on shared mapping of hugetlb file
+	 *
+	 * madvise is same as hole punch and should free all pages.
+	 */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_REMOVE)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+
+	/*
+	 * Test MADV_REMOVE on shared and private mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, NR_HUGE_PAGES * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	addr = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* shared write should not consume any additional pages */
+	write_fault_pages(addr, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	addr2 = mmap(NULL, NR_HUGE_PAGES * huge_page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE, fd, 0);
+	if (addr2 == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* private read should not consume any pages */
+	read_fault_pages(addr2, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* private write should consume additional pages */
+	write_fault_pages(addr2, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - (2 * NR_HUGE_PAGES));
+
+	/* madvise of shared mapping should not free any pages */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - (2 * NR_HUGE_PAGES));
+
+	/* madvise of private mapping should free private pages */
+	if (madvise(addr2, NR_HUGE_PAGES * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - NR_HUGE_PAGES);
+
+	/* private write should consume additional pages again */
+	write_fault_pages(addr2, NR_HUGE_PAGES);
+	validate_free_pages(free_hugepages - (2 * NR_HUGE_PAGES));
+
+	/*
+	 * madvise should free both file and private pages although this is
+	 * not correct.  private pages should not be freed, but this is
+	 * expected.  See comment associated with FALLOC_FL_PUNCH_HOLE call.
+	 */
+	if (madvise(addr, NR_HUGE_PAGES * huge_page_size, MADV_REMOVE)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, NR_HUGE_PAGES * huge_page_size);
+	(void)munmap(addr2, NR_HUGE_PAGES * huge_page_size);
+
+	close(fd);
+	unlink(argv[1]);
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index e10d50e0b8e8..1948098f431d 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -131,6 +131,18 @@ else
 	echo "[PASS]"
 fi
 
+echo "-----------------------"
+echo "running hugetlb-madvise"
+echo "-----------------------"
+./hugetlb-madvise $mnt/madvise-test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+rm -f $mnt/madvise-test
+
 echo "NOTE: The above hugetlb tests provide minimal coverage.  Use"
 echo "      https://github.com/libhugetlbfs/libhugetlbfs.git for"
 echo "      hugetlb regression testing."
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v3 3/3] userfaultfd/selftests: enable hugetlb remap and remove event testing
  2022-02-15  0:23 [PATCH v3 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
  2022-02-15  0:23 ` [PATCH v3 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
@ 2022-02-15  0:23 ` Mike Kravetz
  2 siblings, 0 replies; 7+ messages in thread
From: Mike Kravetz @ 2022-02-15  0:23 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Mike Rapoport, Andrew Morton, Mike Kravetz

With MADV_DONTNEED support added to hugetlb mappings, mremap testing
can also be enabled for hugetlb.

Modify the tests to use madvise MADV_DONTNEED and MADV_REMOVE instead of
fallocate hole puch for releasing hugetlb pages.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
---
 tools/testing/selftests/vm/run_vmtests.sh |  3 +-
 tools/testing/selftests/vm/userfaultfd.c  | 69 ++++++++++++-----------
 2 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 1948098f431d..3b265f140c25 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -208,14 +208,13 @@ echo "running userfaultfd_hugetlb"
 echo "---------------------------"
 # Test requires source and destination huge pages.  Size of source
 # (half_ufd_size_MB) is passed as argument to test.
-./userfaultfd hugetlb $half_ufd_size_MB 32 $mnt/ufd_test_file
+./userfaultfd hugetlb $half_ufd_size_MB 32
 if [ $? -ne 0 ]; then
 	echo "[FAIL]"
 	exitcode=1
 else
 	echo "[PASS]"
 fi
-rm -f $mnt/ufd_test_file
 
 echo "-------------------------"
 echo "running userfaultfd_shmem"
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 50476ec6c070..388bf4fc9dc6 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -88,7 +88,6 @@ static bool test_uffdio_minor = false;
 static bool map_shared;
 static int shm_fd;
 static int huge_fd;
-static char *huge_fd_off0;
 static unsigned long long *count_verify;
 static int uffd = -1;
 static int uffd_flags, finished, *pipefd;
@@ -127,9 +126,9 @@ const char *examples =
     "./userfaultfd anon 100 99999\n\n"
     "# Run share memory test on 1GiB region with 99 bounces:\n"
     "./userfaultfd shmem 1000 99\n\n"
-    "# Run hugetlb memory test on 256MiB region with 50 bounces (using /dev/hugepages/hugefile):\n"
-    "./userfaultfd hugetlb 256 50 /dev/hugepages/hugefile\n\n"
-    "# Run the same hugetlb test but using shmem:\n"
+    "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
+    "./userfaultfd hugetlb 256 50\n\n"
+    "# Run the same hugetlb test but using shared file:\n"
     "./userfaultfd hugetlb_shared 256 50 /dev/hugepages/hugefile\n\n"
     "# 10MiB-~6GiB 999 bounces anonymous test, "
     "continue forever unless an error triggers\n"
@@ -226,10 +225,13 @@ static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
 
 static void hugetlb_release_pages(char *rel_area)
 {
-	if (fallocate(huge_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-		      rel_area == huge_fd_off0 ? 0 : nr_pages * page_size,
-		      nr_pages * page_size))
-		err("fallocate() failed");
+	if (!map_shared) {
+		if (madvise(rel_area, nr_pages * page_size, MADV_DONTNEED))
+			err("madvise(MADV_DONTNEED) failed");
+	} else {
+		if (madvise(rel_area, nr_pages * page_size, MADV_REMOVE))
+			err("madvise(MADV_REMOVE) failed");
+	}
 }
 
 static void hugetlb_allocate_area(void **alloc_area)
@@ -237,26 +239,37 @@ static void hugetlb_allocate_area(void **alloc_area)
 	void *area_alias = NULL;
 	char **alloc_area_alias;
 
-	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-			   (map_shared ? MAP_SHARED : MAP_PRIVATE) |
-			   MAP_HUGETLB |
-			   (*alloc_area == area_src ? 0 : MAP_NORESERVE),
-			   huge_fd, *alloc_area == area_src ? 0 :
-			   nr_pages * page_size);
+	if (!map_shared)
+		*alloc_area = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
+				(*alloc_area == area_src ? 0 : MAP_NORESERVE),
+			-1,
+			0);
+	else
+		*alloc_area = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED |
+				(*alloc_area == area_src ? 0 : MAP_NORESERVE),
+			huge_fd,
+			*alloc_area == area_src ? 0 : nr_pages * page_size);
 	if (*alloc_area == MAP_FAILED)
 		err("mmap of hugetlbfs file failed");
 
 	if (map_shared) {
-		area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_HUGETLB,
-				  huge_fd, *alloc_area == area_src ? 0 :
-				  nr_pages * page_size);
+		area_alias = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED,
+			huge_fd,
+			*alloc_area == area_src ? 0 : nr_pages * page_size);
 		if (area_alias == MAP_FAILED)
 			err("mmap of hugetlb file alias failed");
 	}
 
 	if (*alloc_area == area_src) {
-		huge_fd_off0 = *alloc_area;
 		alloc_area_alias = &area_src_alias;
 	} else {
 		alloc_area_alias = &area_dst_alias;
@@ -269,12 +282,7 @@ static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset
 {
 	if (!map_shared)
 		return;
-	/*
-	 * We can't zap just the pagetable with hugetlbfs because
-	 * MADV_DONTEED won't work. So exercise -EEXIST on a alias
-	 * mapping where the pagetables are not established initially,
-	 * this way we'll exercise the -EEXEC at the fs level.
-	 */
+
 	*start = (unsigned long) area_dst_alias + offset;
 }
 
@@ -427,7 +435,6 @@ static void uffd_test_ctx_clear(void)
 		uffd = -1;
 	}
 
-	huge_fd_off0 = NULL;
 	munmap_area((void **)&area_src);
 	munmap_area((void **)&area_src_alias);
 	munmap_area((void **)&area_dst);
@@ -925,10 +932,7 @@ static int faulting_process(int signal_test)
 	struct sigaction act;
 	unsigned long signalled = 0;
 
-	if (test_type != TEST_HUGETLB)
-		split_nr_pages = (nr_pages + 1) / 2;
-	else
-		split_nr_pages = nr_pages;
+	split_nr_pages = (nr_pages + 1) / 2;
 
 	if (signal_test) {
 		sigbuf = &jbuf;
@@ -985,9 +989,6 @@ static int faulting_process(int signal_test)
 	if (signal_test)
 		return signalled != split_nr_pages;
 
-	if (test_type == TEST_HUGETLB)
-		return 0;
-
 	area_dst = mremap(area_dst, nr_pages * page_size,  nr_pages * page_size,
 			  MREMAP_MAYMOVE | MREMAP_FIXED, area_src);
 	if (area_dst == MAP_FAILED)
@@ -1675,7 +1676,7 @@ int main(int argc, char **argv)
 	}
 	nr_pages = nr_pages_per_cpu * nr_cpus;
 
-	if (test_type == TEST_HUGETLB) {
+	if (test_type == TEST_HUGETLB && map_shared) {
 		if (argc < 5)
 			usage();
 		huge_fd = open(argv[4], O_CREAT | O_RDWR, 0755);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
  2022-02-15  0:23 ` [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
@ 2022-02-17  8:32   ` Yin Fengwei
  2022-02-17  8:58     ` David Hildenbrand
  0 siblings, 1 reply; 7+ messages in thread
From: Yin Fengwei @ 2022-02-17  8:32 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Mike Rapoport, Andrew Morton



On 2022/2/15 08:23, Mike Kravetz wrote:
> MADV_DONTNEED is currently disabled for hugetlb mappings.  This
> certainly makes sense in shared file mappings as the pagecache maintains
> a reference to the page and it will never be freed.  However, it could
> be useful to unmap and free pages in private mappings.  In addition,
> userfaultfd minor fault users may be able to simplify code by using
> MADV_DONTNEED.
> 
> The primary thing preventing MADV_DONTNEED from working on hugetlb
> mappings is a check in can_madv_lru_vma().  To allow support for hugetlb
> mappings create and use a new routine madvise_dontneed_free_valid_vma()
> that allows hugetlb mappings in this specific case.
> 
> For normal mappings, madvise requires the start address be PAGE aligned
> and rounds up length to the next multiple of PAGE_SIZE.  Do similarly

In man page of mmap, NOTE for "Huge page (Huge TLB) mappings":

"For munmap(), addr, and length must both be a multiple of the 
underlying huge page size."

Should we apply same rule to MADV_DONTNEED? Thanks.


Regards
Yin, Fengwei



> for hugetlb mappings: require start address be huge page size aligned and
> round up length to the next multiple of huge page size.  Use the new
> madvise_dontneed_free_valid_vma routine to check alignment and round up
> length/end.  zap_page_range requires this alignment for hugetlb vmas
> otherwise we will hit BUGs.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   mm/madvise.c | 33 ++++++++++++++++++++++++++++++---
>   1 file changed, 30 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index bed872a2ad5f..ede6affa1350 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -554,9 +554,14 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
>   	tlb_end_vma(tlb, vma);
>   }
>   
> +static inline bool can_madv_lru_non_huge_vma(struct vm_area_struct *vma)
> +{
> +	return !(vma->vm_flags & (VM_LOCKED|VM_PFNMAP));
> +}
> +
>   static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
>   {
> -	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
> +	return can_madv_lru_non_huge_vma(vma) && !is_vm_hugetlb_page(vma);
>   }
>   
>   static long madvise_cold(struct vm_area_struct *vma,
> @@ -829,6 +834,23 @@ static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>   	return 0;
>   }
>   
> +static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
> +					    unsigned long start,
> +					    unsigned long *end,
> +					    int behavior)
> +{
> +	if (!is_vm_hugetlb_page(vma))
> +		return can_madv_lru_non_huge_vma(vma);
> +
> +	if (behavior != MADV_DONTNEED)
> +		return false;
> +	if (start & ~huge_page_mask(hstate_vma(vma)))
> +		return false;
> +
> +	*end = ALIGN(*end, huge_page_size(hstate_vma(vma)));
> +	return true;
> +}
> +
>   static long madvise_dontneed_free(struct vm_area_struct *vma,
>   				  struct vm_area_struct **prev,
>   				  unsigned long start, unsigned long end,
> @@ -837,7 +859,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>   	struct mm_struct *mm = vma->vm_mm;
>   
>   	*prev = vma;
> -	if (!can_madv_lru_vma(vma))
> +	if (!madvise_dontneed_free_valid_vma(vma, start, &end, behavior))
>   		return -EINVAL;
>   
>   	if (!userfaultfd_remove(vma, start, end)) {
> @@ -859,7 +881,12 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
>   			 */
>   			return -ENOMEM;
>   		}
> -		if (!can_madv_lru_vma(vma))
> +		/*
> +		 * Potential end adjustment for hugetlb vma is OK as
> +		 * the check below keeps end within vma.
> +		 */
> +		if (!madvise_dontneed_free_valid_vma(vma, start, &end,
> +						     behavior))
>   			return -EINVAL;
>   		if (end > vma->vm_end) {
>   			/*

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
  2022-02-17  8:32   ` Yin Fengwei
@ 2022-02-17  8:58     ` David Hildenbrand
  2022-02-18  0:39       ` Yin Fengwei
  0 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2022-02-17  8:58 UTC (permalink / raw)
  To: Yin Fengwei, Mike Kravetz, linux-mm, linux-kernel
  Cc: Naoya Horiguchi, Axel Rasmussen, Mina Almasry, Michal Hocko,
	Peter Xu, Andrea Arcangeli, Shuah Khan, Mike Rapoport,
	Andrew Morton

On 17.02.22 09:32, Yin Fengwei wrote:
> 
> 
> On 2022/2/15 08:23, Mike Kravetz wrote:
>> MADV_DONTNEED is currently disabled for hugetlb mappings.  This
>> certainly makes sense in shared file mappings as the pagecache maintains
>> a reference to the page and it will never be freed.  However, it could
>> be useful to unmap and free pages in private mappings.  In addition,
>> userfaultfd minor fault users may be able to simplify code by using
>> MADV_DONTNEED.
>>
>> The primary thing preventing MADV_DONTNEED from working on hugetlb
>> mappings is a check in can_madv_lru_vma().  To allow support for hugetlb
>> mappings create and use a new routine madvise_dontneed_free_valid_vma()
>> that allows hugetlb mappings in this specific case.
>>
>> For normal mappings, madvise requires the start address be PAGE aligned
>> and rounds up length to the next multiple of PAGE_SIZE.  Do similarly
> 
> In man page of mmap, NOTE for "Huge page (Huge TLB) mappings":
> 
> "For munmap(), addr, and length must both be a multiple of the 
> underlying huge page size."
> 
> Should we apply same rule to MADV_DONTNEED? Thanks.

madvise() already has different rules than mmap() for ordinary pages, so
we'd much rather try staying consistent with madvise() rules.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
  2022-02-17  8:58     ` David Hildenbrand
@ 2022-02-18  0:39       ` Yin Fengwei
  0 siblings, 0 replies; 7+ messages in thread
From: Yin Fengwei @ 2022-02-18  0:39 UTC (permalink / raw)
  To: David Hildenbrand, Mike Kravetz, linux-mm, linux-kernel
  Cc: Naoya Horiguchi, Axel Rasmussen, Mina Almasry, Michal Hocko,
	Peter Xu, Andrea Arcangeli, Shuah Khan, Mike Rapoport,
	Andrew Morton



On 2/17/2022 4:58 PM, David Hildenbrand wrote:
>> In man page of mmap, NOTE for "Huge page (Huge TLB) mappings":
>>
>> "For munmap(), addr, and length must both be a multiple of the 
>> underlying huge page size."
>>
>> Should we apply same rule to MADV_DONTNEED? Thanks.
> madvise() already has different rules than mmap() for ordinary pages, so
> we'd much rather try staying consistent with madvise() rules.

Thanks for clarification.

Regards
Yin, Fengwei

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-02-18  0:40 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-15  0:23 [PATCH v3 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
2022-02-15  0:23 ` [PATCH v3 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
2022-02-17  8:32   ` Yin Fengwei
2022-02-17  8:58     ` David Hildenbrand
2022-02-18  0:39       ` Yin Fengwei
2022-02-15  0:23 ` [PATCH v3 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
2022-02-15  0:23 ` [PATCH v3 3/3] userfaultfd/selftests: enable hugetlb remap and remove event testing Mike Kravetz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.