All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Add hugetlb MADV_DONTNEED support
@ 2022-01-28 22:26 Mike Kravetz
  2022-01-28 22:26 ` [PATCH 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-01-28 22:26 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton, Mike Kravetz

Userfaultfd selftests for hugetlb does not perform UFFD_EVENT_REMAP
testing.  However, mremap support was recently added in commit
550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed
vma").  While attempting to enable mremap support in the test, it was
discovered that the mremap test indirectly depends on MADV_DONTNEED.

madvise does not allow MADV_DONTNEED for hugetlb mappings.  However,
that is primarily due to the check in can_madv_lru_vma().  By simply
removing the check and adding huge page alignment, MADV_DONTNEED can
be made to work for hugetlb mappings.

Do note that there is no compelling use case for adding this support.
This was discussed in the RFC [1].  However, adding support makes sense
as it is fairly trivial and brings hugetlb functionality more in line
with 'normal' memory.

After enabling support, add selftest for MADV_DONTNEED as well as
MADV_REMOVE.  Then update userfaultfd selftest.

RFC -> v1
- Fixed alignment issues when calling zap_page_range.  Naoya
- Added checks for invalid arguments and misalignment to selftest.

Mike Kravetz (3):
  mm: enable MADV_DONTNEED for hugetlb mappings
  selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  userfaultfd/selftests: enable huegtlb remap and remove event testing

 mm/madvise.c                                 |  24 +-
 tools/testing/selftests/vm/Makefile          |   1 +
 tools/testing/selftests/vm/hugetlb-madvise.c | 401 +++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh    |  12 +
 tools/testing/selftests/vm/userfaultfd.c     |  67 ++--
 5 files changed, 470 insertions(+), 35 deletions(-)
 create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c

-- 
2.34.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
  2022-01-28 22:26 [PATCH 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
@ 2022-01-28 22:26 ` Mike Kravetz
  2022-01-28 22:26 ` [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
  2022-01-28 22:26 ` [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
  2 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-01-28 22:26 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton, Mike Kravetz

MADV_DONTNEED is currently disabled for hugetlb mappings.  This
certainly makes sense in shared file mappings as the pagecache maintains
a reference to the page and it will never be freed.  However, it could
be useful to unmap and free pages in private mappings.

The only thing preventing MADV_DONTNEED from working on hugetlb mappings
is a check in can_madv_lru_vma().  To allow support for hugetlb mappings
create and use a new routine madvise_dontneed_free_valid_vma() that will
allow hugetlb mappings.  Also, before calling zap_page_range in the
DONTNEED case align start and size to huge page size for hugetlb vmas.
madvise only requires PAGE_SIZE alignment, but the hugetlb unmap routine
requires huge page size alignment.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/madvise.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..01b4d145d8f0 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -796,10 +796,30 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 					unsigned long start, unsigned long end)
 {
+	/*
+	 * start and size (end - start) must be huge page size aligned
+	 * for hugetlb vmas.
+	 */
+	if (vma->vm_flags & VM_HUGETLB) {
+		struct hstate *h = hstate_vma(vma);
+
+		start = ALIGN_DOWN(start, huge_page_size(h));
+		end = ALIGN(end, huge_page_size(h));
+	}
+
 	zap_page_range(vma, start, end - start);
 	return 0;
 }
 
+static bool madvise_dontneed_free_valid_vma(struct vm_area_struct *vma,
+						int behavior)
+{
+	if (vma->vm_flags & VM_HUGETLB)
+		return behavior == MADV_DONTNEED;
+	else
+		return can_madv_lru_vma(vma);
+}
+
 static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  struct vm_area_struct **prev,
 				  unsigned long start, unsigned long end,
@@ -808,7 +828,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 
 	*prev = vma;
-	if (!can_madv_lru_vma(vma))
+	if (!madvise_dontneed_free_valid_vma(vma, behavior))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -830,7 +850,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_lru_vma(vma))
+		if (!madvise_dontneed_free_valid_vma(vma, behavior))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  2022-01-28 22:26 [PATCH 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
  2022-01-28 22:26 ` [PATCH 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
@ 2022-01-28 22:26 ` Mike Kravetz
  2022-01-28 23:46   ` Shuah Khan
  2022-01-28 22:26 ` [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
  2 siblings, 1 reply; 8+ messages in thread
From: Mike Kravetz @ 2022-01-28 22:26 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton, Mike Kravetz

Now that MADV_DONTNEED support for hugetlb is enabled, add corresponding
tests.  MADV_REMOVE has been enabled for some time, but no tests exist
so add them as well.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 tools/testing/selftests/vm/Makefile          |   1 +
 tools/testing/selftests/vm/hugetlb-madvise.c | 401 +++++++++++++++++++
 tools/testing/selftests/vm/run_vmtests.sh    |  12 +
 3 files changed, 414 insertions(+)
 create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 1607322a112c..f60cf43bbf61 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -28,6 +28,7 @@ LDLIBS = -lrt -lpthread
 TEST_GEN_FILES = compaction_test
 TEST_GEN_FILES += gup_test
 TEST_GEN_FILES += hmm-tests
+TEST_GEN_FILES += hugetlb-madvise
 TEST_GEN_FILES += hugepage-mmap
 TEST_GEN_FILES += hugepage-mremap
 TEST_GEN_FILES += hugepage-shm
diff --git a/tools/testing/selftests/vm/hugetlb-madvise.c b/tools/testing/selftests/vm/hugetlb-madvise.c
new file mode 100644
index 000000000000..31c302528f2c
--- /dev/null
+++ b/tools/testing/selftests/vm/hugetlb-madvise.c
@@ -0,0 +1,401 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * hugepage-madvise:
+ *
+ * Basic functional testing of madvise MADV_DONTNEED and MADV_REMOVE
+ * on hugetlb mappings.
+ *
+ * Before running this test, make sure the administrator has pre-allocated
+ * at least MIN_FREE_PAGES hugetlb pages and they are free.  In addition,
+ * the test takes an argument that is the path to a file in a hugetlbfs
+ * filesystem.  Therefore, a hugetlbfs filesystem must be mounted on some
+ * directory.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#define __USE_GNU
+#include <fcntl.h>
+
+#define USAGE	"USAGE: %s <hugepagefile_name>\n"
+#define MIN_FREE_PAGES	20
+
+#define validate_free_pages(exp_free)					\
+	do {								\
+		int fhp = get_free_hugepages();				\
+		if (fhp != (exp_free)) {				\
+			printf("Unexpected number of free huge "	\
+				"pages line %d\n", __LINE__);		\
+			exit(1);					\
+		}							\
+	} while (0)
+
+unsigned long huge_page_size;
+unsigned long base_page_size;
+
+/*
+ * default_huge_page_size copied from mlock2-tests.c
+ */
+unsigned long default_huge_page_size(void)
+{
+	unsigned long hps = 0;
+	char *line = NULL;
+	size_t linelen = 0;
+	FILE *f = fopen("/proc/meminfo", "r");
+
+	if (!f)
+		return 0;
+	while (getline(&line, &linelen, f) > 0) {
+		if (sscanf(line, "Hugepagesize:       %lu kB", &hps) == 1) {
+			hps <<= 10;
+			break;
+		}
+	}
+
+	free(line);
+	fclose(f);
+	return hps;
+}
+
+unsigned long get_free_hugepages(void)
+{
+	unsigned long fhp = 0;
+	char *line = NULL;
+	size_t linelen = 0;
+	FILE *f = fopen("/proc/meminfo", "r");
+
+	if (!f)
+		return fhp;
+	while (getline(&line, &linelen, f) > 0) {
+		if (sscanf(line, "HugePages_Free:      %lu", &fhp) == 1)
+			break;
+	}
+
+	free(line);
+	fclose(f);
+	return fhp;
+}
+
+void write_fault_pages(void *addr, unsigned long nr_pages)
+{
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++)
+		*((unsigned long *)(addr + (i * huge_page_size))) = i;
+}
+
+void read_fault_pages(void *addr, unsigned long nr_pages)
+{
+	unsigned long i, tmp;
+
+	for (i = 0; i < nr_pages; i++)
+		tmp += *((unsigned long *)(addr + (i * huge_page_size)));
+}
+
+int main(int argc, char **argv)
+{
+	unsigned long free_hugepages;
+	void *addr, *addr2;
+	int fd;
+	int ret;
+
+	if (argc != 2) {
+		printf(USAGE, argv[0]);
+		exit(1);
+	}
+
+	huge_page_size = default_huge_page_size();
+	if (!huge_page_size) {
+		printf("Unable to determine huge page size, exiting!\n");
+		exit(1);
+	}
+	base_page_size = sysconf(_SC_PAGE_SIZE);
+	if (!huge_page_size) {
+		printf("Unable to determine base page size, exiting!\n");
+		exit(1);
+	}
+
+	free_hugepages = get_free_hugepages();
+	if (free_hugepages < MIN_FREE_PAGES) {
+		printf("Not enough free huge pages to test, exiting!\n");
+		exit(1);
+	}
+
+	fd = open(argv[1], O_CREAT | O_RDWR, 0755);
+	if (fd < 0) {
+		perror("Open failed");
+		exit(1);
+	}
+
+	/*
+	 * Test validity of MADV_DONTNEED addr and length arguments
+	 */
+	addr = mmap(NULL, 12 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	/* unmap first and last page so we know nothing is mapped there */
+	if (munmap(addr, huge_page_size) ||
+			munmap(addr + 11 * huge_page_size, huge_page_size)) {
+		perror("munmap");
+		exit(1);
+	}
+	addr = addr + huge_page_size;
+
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* addr before mapping should fail */
+	ret = madvise(addr - base_page_size, 10 * huge_page_size,
+		MADV_DONTNEED);
+	if (!ret) {
+		printf("Unexpected success of madvise call with invalid addr line %d\n",
+				__LINE__);
+			exit(1);
+	}
+
+	/* addr + length after mapping should fail */
+	ret = madvise(addr, (10 * huge_page_size) + base_page_size,
+		MADV_DONTNEED);
+	if (!ret) {
+		printf("Unexpected success of madvise call with invalid length line %d\n",
+				__LINE__);
+			exit(1);
+	}
+
+	(void)munmap(addr, 10 * huge_page_size);
+
+	/*
+	 * Test alignment of MADV_DONTNEED addr and length arguments
+	 */
+	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* addr should be aligned down to huge page size */
+	if (madvise(addr + base_page_size,
+			10 * huge_page_size - base_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+
+	/* should free all pages in mapping */
+	validate_free_pages(free_hugepages);
+
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* addr + length should be aligned up to huge page size */
+	if (madvise(addr, (10 * huge_page_size) - base_page_size,
+			MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+
+	/* should free all pages in mapping */
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, 10 * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on anonymous private mapping
+	 */
+	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
+			-1, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+
+	/* should free all pages in mapping */
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, 10 * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on private mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* read should not consume any pages */
+	read_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* madvise should not free any pages */
+	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	/* writes should allocate private pages */
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 20);
+
+	/* madvise should free private pages */
+	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	/* writes should allocate private pages */
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 20);
+
+	/*
+	 * The fallocate below certainly should free the pages associated
+	 * with the file.  However, pages in the private mapping are also
+	 * freed.  This is not the 'correct' behavior, but is expected
+	 * because this is how it has worked since the initial hugetlb
+	 * implementation.
+	 */
+	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+						0, 10 * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, 10 * huge_page_size);
+
+	/*
+	 * Test MADV_DONTNEED on shared mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* write should not consume any pages */
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* madvise should not free any pages */
+	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	/*
+	 * Test MADV_REMOVE on shared mapping of hugetlb file
+	 *
+	 * madvise is same as hole punch and should free all pages.
+	 */
+	if (madvise(addr, 10 * huge_page_size, MADV_REMOVE)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+	(void)munmap(addr, 10 * huge_page_size);
+
+	/*
+	 * Test MADV_REMOVE on shared and private mapping of hugetlb file
+	 */
+	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
+		perror("fallocate");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_SHARED, fd, 0);
+	if (addr == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* shared write should not consume any additional pages */
+	write_fault_pages(addr, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	addr2 = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE, fd, 0);
+	if (addr2 == MAP_FAILED) {
+		perror("mmap");
+		exit(1);
+	}
+
+	/* private read should not consume any pages */
+	read_fault_pages(addr2, 10);
+	validate_free_pages(free_hugepages - 10);
+
+	/* private write should consume additional pages */
+	write_fault_pages(addr2, 10);
+	validate_free_pages(free_hugepages - 20);
+
+	/* madvise of shared mapping should not free any pages */
+	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 20);
+
+	/* madvise of private mapping should free private pages */
+	if (madvise(addr2, 10 * huge_page_size, MADV_DONTNEED)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages - 10);
+
+	/* private write should consume additional pages again */
+	write_fault_pages(addr2, 10);
+	validate_free_pages(free_hugepages - 20);
+
+	/*
+	 * madvise should free both file and private pages although this is
+	 * not correct.  private pages should not be freed, but this is
+	 * expected.  See comment associated with FALLOC_FL_PUNCH_HOLE call.
+	 */
+	if (madvise(addr, 10 * huge_page_size, MADV_REMOVE)) {
+		perror("madvise");
+		exit(1);
+	}
+	validate_free_pages(free_hugepages);
+
+	(void)munmap(addr, 10 * huge_page_size);
+	(void)munmap(addr2, 10 * huge_page_size);
+
+	close(fd);
+	unlink(argv[1]);
+	return 0;
+}
diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
index 75d401741394..e0daf9ff0cbe 100755
--- a/tools/testing/selftests/vm/run_vmtests.sh
+++ b/tools/testing/selftests/vm/run_vmtests.sh
@@ -119,6 +119,18 @@ else
 	echo "[PASS]"
 fi
 
+echo "-----------------------"
+echo "running hugetlb-madvise"
+echo "-----------------------"
+./hugetlb-madvise $mnt/madvise-test
+if [ $? -ne 0 ]; then
+	echo "[FAIL]"
+	exitcode=1
+else
+	echo "[PASS]"
+fi
+rm -f $mnt/madvise-test
+
 echo "NOTE: The above hugetlb tests provide minimal coverage.  Use"
 echo "      https://github.com/libhugetlbfs/libhugetlbfs.git for"
 echo "      hugetlb regression testing."
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing
  2022-01-28 22:26 [PATCH 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
  2022-01-28 22:26 ` [PATCH 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
  2022-01-28 22:26 ` [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
@ 2022-01-28 22:26 ` Mike Kravetz
  2022-01-28 23:34   ` Axel Rasmussen
  2 siblings, 1 reply; 8+ messages in thread
From: Mike Kravetz @ 2022-01-28 22:26 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton, Mike Kravetz

With MADV_DONTNEED support added to hugetlb mappings, mremap testing
can also be enabled for hugetlb.

Modify the tests to use madvise MADV_DONTNEED and MADV_REMOVE instead of
fallocate hole puch for releasing hugetlb pages.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 67 ++++++++++++------------
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index d3fd24f9fae8..f5578ef85560 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -88,7 +88,6 @@ static bool test_uffdio_minor = false;
 static bool map_shared;
 static int shm_fd;
 static int huge_fd;
-static char *huge_fd_off0;
 static unsigned long long *count_verify;
 static int uffd = -1;
 static int uffd_flags, finished, *pipefd;
@@ -124,9 +123,9 @@ const char *examples =
     "./userfaultfd anon 100 99999\n\n"
     "# Run share memory test on 1GiB region with 99 bounces:\n"
     "./userfaultfd shmem 1000 99\n\n"
-    "# Run hugetlb memory test on 256MiB region with 50 bounces (using /dev/hugepages/hugefile):\n"
+    "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
     "./userfaultfd hugetlb 256 50 /dev/hugepages/hugefile\n\n"
-    "# Run the same hugetlb test but using shmem:\n"
+    "# Run the same hugetlb test but using shared file:\n"
     "./userfaultfd hugetlb_shared 256 50 /dev/hugepages/hugefile\n\n"
     "# 10MiB-~6GiB 999 bounces anonymous test, "
     "continue forever unless an error triggers\n"
@@ -223,10 +222,13 @@ static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
 
 static void hugetlb_release_pages(char *rel_area)
 {
-	if (fallocate(huge_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-		      rel_area == huge_fd_off0 ? 0 : nr_pages * page_size,
-		      nr_pages * page_size))
-		err("fallocate() failed");
+	if (!map_shared) {
+		if (madvise(rel_area, nr_pages * page_size, MADV_DONTNEED))
+			err("madvise(MADV_DONTNEED) failed");
+	} else {
+		if (madvise(rel_area, nr_pages * page_size, MADV_REMOVE))
+			err("madvise(MADV_REMOVE) failed");
+	}
 }
 
 static void hugetlb_allocate_area(void **alloc_area)
@@ -234,26 +236,37 @@ static void hugetlb_allocate_area(void **alloc_area)
 	void *area_alias = NULL;
 	char **alloc_area_alias;
 
-	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-			   (map_shared ? MAP_SHARED : MAP_PRIVATE) |
-			   MAP_HUGETLB |
-			   (*alloc_area == area_src ? 0 : MAP_NORESERVE),
-			   huge_fd, *alloc_area == area_src ? 0 :
-			   nr_pages * page_size);
+	if (!map_shared)
+		*alloc_area = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
+				(*alloc_area == area_src ? 0 : MAP_NORESERVE),
+			-1,
+			0);
+	else
+		*alloc_area = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED |
+				(*alloc_area == area_src ? 0 : MAP_NORESERVE),
+			huge_fd,
+			*alloc_area == area_src ? 0 : nr_pages * page_size);
 	if (*alloc_area == MAP_FAILED)
 		err("mmap of hugetlbfs file failed");
 
 	if (map_shared) {
-		area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-				  MAP_SHARED | MAP_HUGETLB,
-				  huge_fd, *alloc_area == area_src ? 0 :
-				  nr_pages * page_size);
+		area_alias = mmap(NULL,
+			nr_pages * page_size,
+			PROT_READ | PROT_WRITE,
+			MAP_SHARED,
+			huge_fd,
+			*alloc_area == area_src ? 0 : nr_pages * page_size);
 		if (area_alias == MAP_FAILED)
 			err("mmap of hugetlb file alias failed");
 	}
 
 	if (*alloc_area == area_src) {
-		huge_fd_off0 = *alloc_area;
 		alloc_area_alias = &area_src_alias;
 	} else {
 		alloc_area_alias = &area_dst_alias;
@@ -266,12 +279,7 @@ static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset
 {
 	if (!map_shared)
 		return;
-	/*
-	 * We can't zap just the pagetable with hugetlbfs because
-	 * MADV_DONTEED won't work. So exercise -EEXIST on a alias
-	 * mapping where the pagetables are not established initially,
-	 * this way we'll exercise the -EEXEC at the fs level.
-	 */
+
 	*start = (unsigned long) area_dst_alias + offset;
 }
 
@@ -424,7 +432,6 @@ static void uffd_test_ctx_clear(void)
 		uffd = -1;
 	}
 
-	huge_fd_off0 = NULL;
 	munmap_area((void **)&area_src);
 	munmap_area((void **)&area_src_alias);
 	munmap_area((void **)&area_dst);
@@ -922,10 +929,7 @@ static int faulting_process(int signal_test)
 	struct sigaction act;
 	unsigned long signalled = 0;
 
-	if (test_type != TEST_HUGETLB)
-		split_nr_pages = (nr_pages + 1) / 2;
-	else
-		split_nr_pages = nr_pages;
+	split_nr_pages = (nr_pages + 1) / 2;
 
 	if (signal_test) {
 		sigbuf = &jbuf;
@@ -982,9 +986,6 @@ static int faulting_process(int signal_test)
 	if (signal_test)
 		return signalled != split_nr_pages;
 
-	if (test_type == TEST_HUGETLB)
-		return 0;
-
 	area_dst = mremap(area_dst, nr_pages * page_size,  nr_pages * page_size,
 			  MREMAP_MAYMOVE | MREMAP_FIXED, area_src);
 	if (area_dst == MAP_FAILED)
@@ -1667,7 +1668,7 @@ int main(int argc, char **argv)
 	}
 	nr_pages = nr_pages_per_cpu * nr_cpus;
 
-	if (test_type == TEST_HUGETLB) {
+	if (test_type == TEST_HUGETLB && map_shared) {
 		if (argc < 5)
 			usage();
 		huge_fd = open(argv[4], O_CREAT | O_RDWR, 0755);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing
  2022-01-28 22:26 ` [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
@ 2022-01-28 23:34   ` Axel Rasmussen
  2022-01-28 23:53     ` Mike Kravetz
  0 siblings, 1 reply; 8+ messages in thread
From: Axel Rasmussen @ 2022-01-28 23:34 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Linux MM, LKML, Naoya Horiguchi, David Hildenbrand, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton

Besides the help text, looks correct to me. I applied the patches and
ran the userfaultfd selftests, and everything seems to work properly.

Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>

On Fri, Jan 28, 2022 at 2:26 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> With MADV_DONTNEED support added to hugetlb mappings, mremap testing
> can also be enabled for hugetlb.
>
> Modify the tests to use madvise MADV_DONTNEED and MADV_REMOVE instead of
> fallocate hole puch for releasing hugetlb pages.
>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  tools/testing/selftests/vm/userfaultfd.c | 67 ++++++++++++------------
>  1 file changed, 34 insertions(+), 33 deletions(-)
>
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index d3fd24f9fae8..f5578ef85560 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -88,7 +88,6 @@ static bool test_uffdio_minor = false;
>  static bool map_shared;
>  static int shm_fd;
>  static int huge_fd;
> -static char *huge_fd_off0;
>  static unsigned long long *count_verify;
>  static int uffd = -1;
>  static int uffd_flags, finished, *pipefd;
> @@ -124,9 +123,9 @@ const char *examples =
>      "./userfaultfd anon 100 99999\n\n"
>      "# Run share memory test on 1GiB region with 99 bounces:\n"
>      "./userfaultfd shmem 1000 99\n\n"
> -    "# Run hugetlb memory test on 256MiB region with 50 bounces (using /dev/hugepages/hugefile):\n"
> +    "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
>      "./userfaultfd hugetlb 256 50 /dev/hugepages/hugefile\n\n"

We should remove the path from the line above here as well, right?
Since for the hugetlb test type, we now just MAP_ANONYMOUS |
MAP_HUGETLB, we don't open a file descriptor.

> -    "# Run the same hugetlb test but using shmem:\n"
> +    "# Run the same hugetlb test but using shared file:\n"
>      "./userfaultfd hugetlb_shared 256 50 /dev/hugepages/hugefile\n\n"
>      "# 10MiB-~6GiB 999 bounces anonymous test, "
>      "continue forever unless an error triggers\n"
> @@ -223,10 +222,13 @@ static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
>
>  static void hugetlb_release_pages(char *rel_area)
>  {
> -       if (fallocate(huge_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> -                     rel_area == huge_fd_off0 ? 0 : nr_pages * page_size,
> -                     nr_pages * page_size))
> -               err("fallocate() failed");
> +       if (!map_shared) {
> +               if (madvise(rel_area, nr_pages * page_size, MADV_DONTNEED))
> +                       err("madvise(MADV_DONTNEED) failed");
> +       } else {
> +               if (madvise(rel_area, nr_pages * page_size, MADV_REMOVE))
> +                       err("madvise(MADV_REMOVE) failed");
> +       }
>  }
>
>  static void hugetlb_allocate_area(void **alloc_area)
> @@ -234,26 +236,37 @@ static void hugetlb_allocate_area(void **alloc_area)
>         void *area_alias = NULL;
>         char **alloc_area_alias;
>
> -       *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
> -                          (map_shared ? MAP_SHARED : MAP_PRIVATE) |
> -                          MAP_HUGETLB |
> -                          (*alloc_area == area_src ? 0 : MAP_NORESERVE),
> -                          huge_fd, *alloc_area == area_src ? 0 :
> -                          nr_pages * page_size);
> +       if (!map_shared)
> +               *alloc_area = mmap(NULL,
> +                       nr_pages * page_size,
> +                       PROT_READ | PROT_WRITE,
> +                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
> +                               (*alloc_area == area_src ? 0 : MAP_NORESERVE),
> +                       -1,
> +                       0);
> +       else
> +               *alloc_area = mmap(NULL,
> +                       nr_pages * page_size,
> +                       PROT_READ | PROT_WRITE,
> +                       MAP_SHARED |
> +                               (*alloc_area == area_src ? 0 : MAP_NORESERVE),
> +                       huge_fd,
> +                       *alloc_area == area_src ? 0 : nr_pages * page_size);
>         if (*alloc_area == MAP_FAILED)
>                 err("mmap of hugetlbfs file failed");
>
>         if (map_shared) {
> -               area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
> -                                 MAP_SHARED | MAP_HUGETLB,
> -                                 huge_fd, *alloc_area == area_src ? 0 :
> -                                 nr_pages * page_size);
> +               area_alias = mmap(NULL,
> +                       nr_pages * page_size,
> +                       PROT_READ | PROT_WRITE,
> +                       MAP_SHARED,
> +                       huge_fd,
> +                       *alloc_area == area_src ? 0 : nr_pages * page_size);
>                 if (area_alias == MAP_FAILED)
>                         err("mmap of hugetlb file alias failed");
>         }
>
>         if (*alloc_area == area_src) {
> -               huge_fd_off0 = *alloc_area;
>                 alloc_area_alias = &area_src_alias;
>         } else {
>                 alloc_area_alias = &area_dst_alias;
> @@ -266,12 +279,7 @@ static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset
>  {
>         if (!map_shared)
>                 return;
> -       /*
> -        * We can't zap just the pagetable with hugetlbfs because
> -        * MADV_DONTEED won't work. So exercise -EEXIST on a alias
> -        * mapping where the pagetables are not established initially,
> -        * this way we'll exercise the -EEXEC at the fs level.
> -        */
> +
>         *start = (unsigned long) area_dst_alias + offset;
>  }
>
> @@ -424,7 +432,6 @@ static void uffd_test_ctx_clear(void)
>                 uffd = -1;
>         }
>
> -       huge_fd_off0 = NULL;
>         munmap_area((void **)&area_src);
>         munmap_area((void **)&area_src_alias);
>         munmap_area((void **)&area_dst);
> @@ -922,10 +929,7 @@ static int faulting_process(int signal_test)
>         struct sigaction act;
>         unsigned long signalled = 0;
>
> -       if (test_type != TEST_HUGETLB)
> -               split_nr_pages = (nr_pages + 1) / 2;
> -       else
> -               split_nr_pages = nr_pages;
> +       split_nr_pages = (nr_pages + 1) / 2;
>
>         if (signal_test) {
>                 sigbuf = &jbuf;
> @@ -982,9 +986,6 @@ static int faulting_process(int signal_test)
>         if (signal_test)
>                 return signalled != split_nr_pages;
>
> -       if (test_type == TEST_HUGETLB)
> -               return 0;
> -
>         area_dst = mremap(area_dst, nr_pages * page_size,  nr_pages * page_size,
>                           MREMAP_MAYMOVE | MREMAP_FIXED, area_src);
>         if (area_dst == MAP_FAILED)
> @@ -1667,7 +1668,7 @@ int main(int argc, char **argv)
>         }
>         nr_pages = nr_pages_per_cpu * nr_cpus;
>
> -       if (test_type == TEST_HUGETLB) {
> +       if (test_type == TEST_HUGETLB && map_shared) {
>                 if (argc < 5)
>                         usage();
>                 huge_fd = open(argv[4], O_CREAT | O_RDWR, 0755);
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  2022-01-28 22:26 ` [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
@ 2022-01-28 23:46   ` Shuah Khan
  2022-01-29  0:19     ` Mike Kravetz
  0 siblings, 1 reply; 8+ messages in thread
From: Shuah Khan @ 2022-01-28 23:46 UTC (permalink / raw)
  To: Mike Kravetz, linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton, Shuah Khan

On 1/28/22 3:26 PM, Mike Kravetz wrote:
> Now that MADV_DONTNEED support for hugetlb is enabled, add corresponding
> tests.  MADV_REMOVE has been enabled for some time, but no tests exist
> so add them as well.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>   tools/testing/selftests/vm/Makefile          |   1 +
>   tools/testing/selftests/vm/hugetlb-madvise.c | 401 +++++++++++++++++++
>   tools/testing/selftests/vm/run_vmtests.sh    |  12 +
>   3 files changed, 414 insertions(+)
>   create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c
> 
> diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
> index 1607322a112c..f60cf43bbf61 100644
> --- a/tools/testing/selftests/vm/Makefile
> +++ b/tools/testing/selftests/vm/Makefile
> @@ -28,6 +28,7 @@ LDLIBS = -lrt -lpthread
>   TEST_GEN_FILES = compaction_test
>   TEST_GEN_FILES += gup_test
>   TEST_GEN_FILES += hmm-tests
> +TEST_GEN_FILES += hugetlb-madvise

Please update .gitignore with the new binary.

>   TEST_GEN_FILES += hugepage-mmap
>   TEST_GEN_FILES += hugepage-mremap
>   TEST_GEN_FILES += hugepage-shm
> diff --git a/tools/testing/selftests/vm/hugetlb-madvise.c b/tools/testing/selftests/vm/hugetlb-madvise.c
> new file mode 100644
> index 000000000000..31c302528f2c
> --- /dev/null
> +++ b/tools/testing/selftests/vm/hugetlb-madvise.c
> @@ -0,0 +1,401 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * hugepage-madvise:
> + *
> + * Basic functional testing of madvise MADV_DONTNEED and MADV_REMOVE
> + * on hugetlb mappings.
> + *
> + * Before running this test, make sure the administrator has pre-allocated
> + * at least MIN_FREE_PAGES hugetlb pages and they are free.  In addition,
> + * the test takes an argument that is the path to a file in a hugetlbfs
> + * filesystem.  Therefore, a hugetlbfs filesystem must be mounted on some
> + * directory.
> + */
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#define __USE_GNU
> +#include <fcntl.h>
> +
> +#define USAGE	"USAGE: %s <hugepagefile_name>\n"
> +#define MIN_FREE_PAGES	20
> +
> +#define validate_free_pages(exp_free)					\
> +	do {								\
> +		int fhp = get_free_hugepages();				\
> +		if (fhp != (exp_free)) {				\
> +			printf("Unexpected number of free huge "	\
> +				"pages line %d\n", __LINE__);		\
> +			exit(1);					\
> +		}							\
> +	} while (0)
> +
> +unsigned long huge_page_size;
> +unsigned long base_page_size;
> +
> +/*
> + * default_huge_page_size copied from mlock2-tests.c
> + */
> +unsigned long default_huge_page_size(void)
> +{
> +	unsigned long hps = 0;
> +	char *line = NULL;
> +	size_t linelen = 0;
> +	FILE *f = fopen("/proc/meminfo", "r");
> +
> +	if (!f)
> +		return 0;
> +	while (getline(&line, &linelen, f) > 0) {
> +		if (sscanf(line, "Hugepagesize:       %lu kB", &hps) == 1) {
> +			hps <<= 10;
> +			break;
> +		}
> +	}
> +
> +	free(line);
> +	fclose(f);
> +	return hps;
> +}
> +
> +unsigned long get_free_hugepages(void)
> +{
> +	unsigned long fhp = 0;
> +	char *line = NULL;
> +	size_t linelen = 0;
> +	FILE *f = fopen("/proc/meminfo", "r");
> +
> +	if (!f)
> +		return fhp;
> +	while (getline(&line, &linelen, f) > 0) {
> +		if (sscanf(line, "HugePages_Free:      %lu", &fhp) == 1)
> +			break;
> +	}
> +
> +	free(line);
> +	fclose(f);
> +	return fhp;
> +}
> +
> +void write_fault_pages(void *addr, unsigned long nr_pages)
> +{
> +	unsigned long i;
> +
> +	for (i = 0; i < nr_pages; i++)
> +		*((unsigned long *)(addr + (i * huge_page_size))) = i;
> +}
> +
> +void read_fault_pages(void *addr, unsigned long nr_pages)
> +{
> +	unsigned long i, tmp;
> +
> +	for (i = 0; i < nr_pages; i++)
> +		tmp += *((unsigned long *)(addr + (i * huge_page_size)));
> +}
> +
> +int main(int argc, char **argv)
> +{
> +	unsigned long free_hugepages;
> +	void *addr, *addr2;
> +	int fd;
> +	int ret;
> +
> +	if (argc != 2) {
> +		printf(USAGE, argv[0]);
> +		exit(1);
> +	}
> +
> +	huge_page_size = default_huge_page_size();
> +	if (!huge_page_size) {
> +		printf("Unable to determine huge page size, exiting!\n");
> +		exit(1);
> +	}
> +	base_page_size = sysconf(_SC_PAGE_SIZE);
> +	if (!huge_page_size) {
> +		printf("Unable to determine base page size, exiting!\n");
> +		exit(1);
> +	}
> +
> +	free_hugepages = get_free_hugepages();
> +	if (free_hugepages < MIN_FREE_PAGES) {
> +		printf("Not enough free huge pages to test, exiting!\n");
> +		exit(1);
> +	}
> +
> +	fd = open(argv[1], O_CREAT | O_RDWR, 0755);
> +	if (fd < 0) {
> +		perror("Open failed");
> +		exit(1);
> +	}
> +
> +	/*
> +	 * Test validity of MADV_DONTNEED addr and length arguments
> +	 */
> +	addr = mmap(NULL, 12 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
> +			-1, 0);

What is 12 here? Any significance to this value? Add a define for it
so it it clear why it is used.

> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +	/* unmap first and last page so we know nothing is mapped there */
> +	if (munmap(addr, huge_page_size) ||
> +			munmap(addr + 11 * huge_page_size, huge_page_size)) {
> +		perror("munmap");
> +		exit(1);
> +	}
> +	addr = addr + huge_page_size;
> +
> +	write_fault_pages(addr, 10);

What is 10 here? Any significance to this value? Add a define for it
so it it clear why it is used.

> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* addr before mapping should fail */
> +	ret = madvise(addr - base_page_size, 10 * huge_page_size,
> +		MADV_DONTNEED);
> +	if (!ret) {
> +		printf("Unexpected success of madvise call with invalid addr line %d\n",
> +				__LINE__);
> +			exit(1);
> +	}
> +
> +	/* addr + length after mapping should fail */
> +	ret = madvise(addr, (10 * huge_page_size) + base_page_size,
> +		MADV_DONTNEED);
> +	if (!ret) {
> +		printf("Unexpected success of madvise call with invalid length line %d\n",
> +				__LINE__);
> +			exit(1);
> +	}
> +
> +	(void)munmap(addr, 10 * huge_page_size);

Same comment on use of 10.

> +
> +	/*
> +	 * Test alignment of MADV_DONTNEED addr and length arguments
> +	 */
> +	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
> +			-1, 0);
> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +	write_fault_pages(addr, 10);

Ditto

> +	validate_free_pages(free_hugepages - 10);

Ditto

> +
> +	/* addr should be aligned down to huge page size */
> +	if (madvise(addr + base_page_size,
> +			10 * huge_page_size - base_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +
> +	/* should free all pages in mapping */
> +	validate_free_pages(free_hugepages);
> +
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* addr + length should be aligned up to huge page size */
> +	if (madvise(addr, (10 * huge_page_size) - base_page_size,
> +			MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +
> +	/* should free all pages in mapping */
> +	validate_free_pages(free_hugepages);
> +
> +	(void)munmap(addr, 10 * huge_page_size);
> +
> +	/*
> +	 * Test MADV_DONTNEED on anonymous private mapping
> +	 */
> +	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
> +			-1, 0);
> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +
> +	/* should free all pages in mapping */
> +	validate_free_pages(free_hugepages);
> +
> +	(void)munmap(addr, 10 * huge_page_size);
> +
> +	/*
> +	 * Test MADV_DONTNEED on private mapping of hugetlb file
> +	 */
> +	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
> +		perror("fallocate");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_PRIVATE, fd, 0);
> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +
> +	/* read should not consume any pages */
> +	read_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* madvise should not free any pages */
> +	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* writes should allocate private pages */
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 20);
> +
> +	/* madvise should free private pages */
> +	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* writes should allocate private pages */
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 20);
> +
> +	/*
> +	 * The fallocate below certainly should free the pages associated
> +	 * with the file.  However, pages in the private mapping are also
> +	 * freed.  This is not the 'correct' behavior, but is expected
> +	 * because this is how it has worked since the initial hugetlb
> +	 * implementation.
> +	 */
> +	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> +						0, 10 * huge_page_size)) {
> +		perror("fallocate");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages);
> +
> +	(void)munmap(addr, 10 * huge_page_size);
> +
> +	/*
> +	 * Test MADV_DONTNEED on shared mapping of hugetlb file
> +	 */
> +	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
> +		perror("fallocate");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_SHARED, fd, 0);
> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +
> +	/* write should not consume any pages */
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* madvise should not free any pages */
> +	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/*
> +	 * Test MADV_REMOVE on shared mapping of hugetlb file
> +	 *
> +	 * madvise is same as hole punch and should free all pages.
> +	 */
> +	if (madvise(addr, 10 * huge_page_size, MADV_REMOVE)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages);
> +	(void)munmap(addr, 10 * huge_page_size);
> +
> +	/*
> +	 * Test MADV_REMOVE on shared and private mapping of hugetlb file
> +	 */
> +	if (fallocate(fd, 0, 0, 10 * huge_page_size)) {
> +		perror("fallocate");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	addr = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_SHARED, fd, 0);
> +	if (addr == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +
> +	/* shared write should not consume any additional pages */
> +	write_fault_pages(addr, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	addr2 = mmap(NULL, 10 * huge_page_size, PROT_READ | PROT_WRITE,
> +			MAP_PRIVATE, fd, 0);
> +	if (addr2 == MAP_FAILED) {
> +		perror("mmap");
> +		exit(1);
> +	}
> +
> +	/* private read should not consume any pages */
> +	read_fault_pages(addr2, 10);
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* private write should consume additional pages */
> +	write_fault_pages(addr2, 10);
> +	validate_free_pages(free_hugepages - 20);
> +
> +	/* madvise of shared mapping should not free any pages */
> +	if (madvise(addr, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 20);
> +
> +	/* madvise of private mapping should free private pages */
> +	if (madvise(addr2, 10 * huge_page_size, MADV_DONTNEED)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages - 10);
> +
> +	/* private write should consume additional pages again */
> +	write_fault_pages(addr2, 10);
> +	validate_free_pages(free_hugepages - 20);
> +
> +	/*
> +	 * madvise should free both file and private pages although this is
> +	 * not correct.  private pages should not be freed, but this is
> +	 * expected.  See comment associated with FALLOC_FL_PUNCH_HOLE call.
> +	 */
> +	if (madvise(addr, 10 * huge_page_size, MADV_REMOVE)) {
> +		perror("madvise");
> +		exit(1);
> +	}
> +	validate_free_pages(free_hugepages);
> +
> +	(void)munmap(addr, 10 * huge_page_size);
> +	(void)munmap(addr2, 10 * huge_page_size);
> +
> +	close(fd);
> +	unlink(argv[1]);
> +	return 0;
> +}

Same comment on all usages of value 10

> diff --git a/tools/testing/selftests/vm/run_vmtests.sh b/tools/testing/selftests/vm/run_vmtests.sh
> index 75d401741394..e0daf9ff0cbe 100755
> --- a/tools/testing/selftests/vm/run_vmtests.sh
> +++ b/tools/testing/selftests/vm/run_vmtests.sh
> @@ -119,6 +119,18 @@ else
>   	echo "[PASS]"
>   fi
>   
> +echo "-----------------------"
> +echo "running hugetlb-madvise"
> +echo "-----------------------"
> +./hugetlb-madvise $mnt/madvise-test
> +if [ $? -ne 0 ]; then
> +	echo "[FAIL]"
> +	exitcode=1
> +else
> +	echo "[PASS]"
> +fi
> +rm -f $mnt/madvise-test
> +
>   echo "NOTE: The above hugetlb tests provide minimal coverage.  Use"
>   echo "      https://github.com/libhugetlbfs/libhugetlbfs.git for"
>   echo "      hugetlb regression testing."
> 

With these comments addressed/explained

Reviewed-by: Shuah Khan <skhan@linuxfoundation.org>

thanks,
-- Shuah

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing
  2022-01-28 23:34   ` Axel Rasmussen
@ 2022-01-28 23:53     ` Mike Kravetz
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-01-28 23:53 UTC (permalink / raw)
  To: Axel Rasmussen
  Cc: Linux MM, LKML, Naoya Horiguchi, David Hildenbrand, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton

On 1/28/22 15:34, Axel Rasmussen wrote:
> Besides the help text, looks correct to me. I applied the patches and
> ran the userfaultfd selftests, and everything seems to work properly.
> 
> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
> 
> On Fri, Jan 28, 2022 at 2:26 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> With MADV_DONTNEED support added to hugetlb mappings, mremap testing
>> can also be enabled for hugetlb.
>>
>> Modify the tests to use madvise MADV_DONTNEED and MADV_REMOVE instead of
>> fallocate hole puch for releasing hugetlb pages.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>  tools/testing/selftests/vm/userfaultfd.c | 67 ++++++++++++------------
>>  1 file changed, 34 insertions(+), 33 deletions(-)
>>
>> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
>> index d3fd24f9fae8..f5578ef85560 100644
>> --- a/tools/testing/selftests/vm/userfaultfd.c
>> +++ b/tools/testing/selftests/vm/userfaultfd.c
>> @@ -88,7 +88,6 @@ static bool test_uffdio_minor = false;
>>  static bool map_shared;
>>  static int shm_fd;
>>  static int huge_fd;
>> -static char *huge_fd_off0;
>>  static unsigned long long *count_verify;
>>  static int uffd = -1;
>>  static int uffd_flags, finished, *pipefd;
>> @@ -124,9 +123,9 @@ const char *examples =
>>      "./userfaultfd anon 100 99999\n\n"
>>      "# Run share memory test on 1GiB region with 99 bounces:\n"
>>      "./userfaultfd shmem 1000 99\n\n"
>> -    "# Run hugetlb memory test on 256MiB region with 50 bounces (using /dev/hugepages/hugefile):\n"
>> +    "# Run hugetlb memory test on 256MiB region with 50 bounces:\n"
>>      "./userfaultfd hugetlb 256 50 /dev/hugepages/hugefile\n\n"
> 
> We should remove the path from the line above here as well, right?
> Since for the hugetlb test type, we now just MAP_ANONYMOUS |
> MAP_HUGETLB, we don't open a file descriptor.
> 

Yes, and I should also update run_vmtests.sh to not include file path.
The test just ignores the file path in this case.

Thanks,
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test
  2022-01-28 23:46   ` Shuah Khan
@ 2022-01-29  0:19     ` Mike Kravetz
  0 siblings, 0 replies; 8+ messages in thread
From: Mike Kravetz @ 2022-01-29  0:19 UTC (permalink / raw)
  To: Shuah Khan, linux-mm, linux-kernel
  Cc: Naoya Horiguchi, David Hildenbrand, Axel Rasmussen, Mina Almasry,
	Michal Hocko, Peter Xu, Andrea Arcangeli, Shuah Khan,
	Andrew Morton

On 1/28/22 15:46, Shuah Khan wrote:
> On 1/28/22 3:26 PM, Mike Kravetz wrote:
>> Now that MADV_DONTNEED support for hugetlb is enabled, add corresponding
>> tests.  MADV_REMOVE has been enabled for some time, but no tests exist
>> so add them as well.
>>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>> ---
>>   tools/testing/selftests/vm/Makefile          |   1 +
>>   tools/testing/selftests/vm/hugetlb-madvise.c | 401 +++++++++++++++++++
>>   tools/testing/selftests/vm/run_vmtests.sh    |  12 +
>>   3 files changed, 414 insertions(+)
>>   create mode 100644 tools/testing/selftests/vm/hugetlb-madvise.c
>>
>> diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
>> index 1607322a112c..f60cf43bbf61 100644
>> --- a/tools/testing/selftests/vm/Makefile
>> +++ b/tools/testing/selftests/vm/Makefile
>> @@ -28,6 +28,7 @@ LDLIBS = -lrt -lpthread
>>   TEST_GEN_FILES = compaction_test
>>   TEST_GEN_FILES += gup_test
>>   TEST_GEN_FILES += hmm-tests
>> +TEST_GEN_FILES += hugetlb-madvise
> 
> Please update .gitignore with the new binary.
> 

Will do.

>>   TEST_GEN_FILES += hugepage-mmap
>>   TEST_GEN_FILES += hugepage-mremap
>>   TEST_GEN_FILES += hugepage-shm
>> diff --git a/tools/testing/selftests/vm/hugetlb-madvise.c b/tools/testing/selftests/vm/hugetlb-madvise.c
>> new file mode 100644
>> index 000000000000..31c302528f2c
>> --- /dev/null
>> +++ b/tools/testing/selftests/vm/hugetlb-madvise.c
>> @@ -0,0 +1,401 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * hugepage-madvise:
>> + *
>> + * Basic functional testing of madvise MADV_DONTNEED and MADV_REMOVE
>> + * on hugetlb mappings.
>> + *
>> + * Before running this test, make sure the administrator has pre-allocated
>> + * at least MIN_FREE_PAGES hugetlb pages and they are free.  In addition,
>> + * the test takes an argument that is the path to a file in a hugetlbfs
>> + * filesystem.  Therefore, a hugetlbfs filesystem must be mounted on some
>> + * directory.
>> + */
>> +
>> +#include <stdlib.h>
>> +#include <stdio.h>
>> +#include <unistd.h>
>> +#include <sys/mman.h>
>> +#define __USE_GNU
>> +#include <fcntl.h>
>> +
>> +#define USAGE    "USAGE: %s <hugepagefile_name>\n"
>> +#define MIN_FREE_PAGES    20
>> +
>> +#define validate_free_pages(exp_free)                    \
>> +    do {                                \
>> +        int fhp = get_free_hugepages();                \
>> +        if (fhp != (exp_free)) {                \
>> +            printf("Unexpected number of free huge "    \
>> +                "pages line %d\n", __LINE__);        \
>> +            exit(1);                    \
>> +        }                            \
>> +    } while (0)
>> +
>> +unsigned long huge_page_size;
>> +unsigned long base_page_size;
>> +
>> +/*
>> + * default_huge_page_size copied from mlock2-tests.c
>> + */
>> +unsigned long default_huge_page_size(void)
>> +{
>> +    unsigned long hps = 0;
>> +    char *line = NULL;
>> +    size_t linelen = 0;
>> +    FILE *f = fopen("/proc/meminfo", "r");
>> +
>> +    if (!f)
>> +        return 0;
>> +    while (getline(&line, &linelen, f) > 0) {
>> +        if (sscanf(line, "Hugepagesize:       %lu kB", &hps) == 1) {
>> +            hps <<= 10;
>> +            break;
>> +        }
>> +    }
>> +
>> +    free(line);
>> +    fclose(f);
>> +    return hps;
>> +}
>> +
>> +unsigned long get_free_hugepages(void)
>> +{
>> +    unsigned long fhp = 0;
>> +    char *line = NULL;
>> +    size_t linelen = 0;
>> +    FILE *f = fopen("/proc/meminfo", "r");
>> +
>> +    if (!f)
>> +        return fhp;
>> +    while (getline(&line, &linelen, f) > 0) {
>> +        if (sscanf(line, "HugePages_Free:      %lu", &fhp) == 1)
>> +            break;
>> +    }
>> +
>> +    free(line);
>> +    fclose(f);
>> +    return fhp;
>> +}
>> +
>> +void write_fault_pages(void *addr, unsigned long nr_pages)
>> +{
>> +    unsigned long i;
>> +
>> +    for (i = 0; i < nr_pages; i++)
>> +        *((unsigned long *)(addr + (i * huge_page_size))) = i;
>> +}
>> +
>> +void read_fault_pages(void *addr, unsigned long nr_pages)
>> +{
>> +    unsigned long i, tmp;
>> +
>> +    for (i = 0; i < nr_pages; i++)
>> +        tmp += *((unsigned long *)(addr + (i * huge_page_size)));
>> +}
>> +
>> +int main(int argc, char **argv)
>> +{
>> +    unsigned long free_hugepages;
>> +    void *addr, *addr2;
>> +    int fd;
>> +    int ret;
>> +
>> +    if (argc != 2) {
>> +        printf(USAGE, argv[0]);
>> +        exit(1);
>> +    }
>> +
>> +    huge_page_size = default_huge_page_size();
>> +    if (!huge_page_size) {
>> +        printf("Unable to determine huge page size, exiting!\n");
>> +        exit(1);
>> +    }
>> +    base_page_size = sysconf(_SC_PAGE_SIZE);
>> +    if (!huge_page_size) {
>> +        printf("Unable to determine base page size, exiting!\n");
>> +        exit(1);
>> +    }
>> +
>> +    free_hugepages = get_free_hugepages();
>> +    if (free_hugepages < MIN_FREE_PAGES) {
>> +        printf("Not enough free huge pages to test, exiting!\n");
>> +        exit(1);
>> +    }
>> +
>> +    fd = open(argv[1], O_CREAT | O_RDWR, 0755);
>> +    if (fd < 0) {
>> +        perror("Open failed");
>> +        exit(1);
>> +    }
>> +
>> +    /*
>> +     * Test validity of MADV_DONTNEED addr and length arguments
>> +     */
>> +    addr = mmap(NULL, 12 * huge_page_size, PROT_READ | PROT_WRITE,
>> +            MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
>> +            -1, 0);
> 
> What is 12 here? Any significance to this value? Add a define for it
> so it it clear why it is used.

I'll add a commented #define for the 10 as that is used throughout the
file.  I'll add a comment here about mapping and then upmapping areas to
make sure nothing is mapped at specific addresses.

Thanks for taking a look!
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-01-29  0:20 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-28 22:26 [PATCH 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
2022-01-28 22:26 ` [PATCH 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
2022-01-28 22:26 ` [PATCH 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
2022-01-28 23:46   ` Shuah Khan
2022-01-29  0:19     ` Mike Kravetz
2022-01-28 22:26 ` [PATCH 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
2022-01-28 23:34   ` Axel Rasmussen
2022-01-28 23:53     ` Mike Kravetz

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.