All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] userfaultfd updates for v4.13-rc3
@ 2017-08-02 16:51 Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Andrea Arcangeli
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

Hello,

these are some uffd updates I have pending that looks ready for
merging. vhost-user KVM developement run into a crash so patch 1/6 is
urgent (and simple), the rest is not urgent.

The testcase has been updated to exercise it.

This should apply clean to -mm, and I reviewed in detail all other
userfaultfd patches that are in -mm and they're all great, including
the shmem zeropage addition.

Alexey Perevalov (1):
  userfaultfd: provide pid in userfault msg

Andrea Arcangeli (5):
  userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED
    case
  userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
  userfaultfd: selftest: explicit failure if the SIGBUS test failed
  userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
  userfaultfd: provide pid in userfault msg - add feat union

 fs/userfaultfd.c                         |   8 +-
 include/uapi/linux/userfaultfd.h         |  12 ++-
 mm/hugetlb.c                             |   2 +-
 mm/mmap.c                                |  22 +++--
 tools/testing/selftests/vm/userfaultfd.c | 149 +++++++++++++++++++++++++++++--
 5 files changed, 172 insertions(+), 21 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 17:12   ` Mike Kravetz
  2017-08-02 16:51 ` [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST Andrea Arcangeli
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

huge_add_to_page_cache->add_to_page_cache implicitly unlock the page
before returning in case of errors.

The error returned was -EEXIST by running UFFDIO_COPY on a non-hole
offset of a VM_SHARED hugetlbfs mapping. It was an userland bug that
triggered it and the kernel must cope with it returning -EEXIST from
ioctl(UFFDIO_COPY) as expected.

page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:964!
invalid opcode: 0000 [#1] SMP
CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1
task: ffff973131ab2600 task.stack: ffffacc0cba78000
RIP: 0010:unlock_page+0x4a/0x50
RSP: 0018:ffffacc0cba7bca0 EFLAGS: 00010246
RAX: 0000000000000036 RBX: fffff99d09f38000 RCX: 0000000000000006
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97326148e0c0
RBP: ffffacc0cba7bca0 R08: 00000000000006be R09: 0000000000000004
R10: 00000000000007ae R11: ffffffffb622cbed R12: 0000000000000008
R13: ffff972f9a265240 R14: ffff972de2919740 R15: ffffffffb62da820
FS:  00007f122efff700(0000) GS:ffff973261480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd52f788ea8 CR3: 000000036d022000 CR4: 00000000003426e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 hugetlb_mcopy_atomic_pte+0xc0/0x320
 mcopy_atomic+0x96f/0xbe0
 userfaultfd_ioctl+0x218/0xe90
 ? __schedule+0x23c/0x8d0
 ? hrtimer_start_range_ns+0x1bd/0x330
 do_vfs_ioctl+0xa5/0x600
 ? do_vfs_ioctl+0xa5/0x600
 SyS_ioctl+0x79/0x90
 entry_SYSCALL_64_fastpath+0x1a/0xa9

Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/hugetlb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc48ee783dd9..5a240c72c3b6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4062,9 +4062,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	return ret;
 out_release_unlock:
 	spin_unlock(ptl);
-out_release_nounlock:
 	if (vm_shared)
 		unlock_page(page);
+out_release_nounlock:
 	put_page(page);
 	goto out;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 21:33   ` Mike Kravetz
  2017-08-02 16:51 ` [PATCH 3/6] userfaultfd: selftest: explicit failure if the SIGBUS test failed Andrea Arcangeli
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

This will retry the UFFDIO_COPY/ZEROPAGE to verify it returns -EEXIST
at the first invocation and then later every 10 seconds.

In the filebacked MAP_SHARED case this also verifies the -EEXIST
triggered in the filesystem pagecache insertion, if the offset in the
file was not a hole.

shmem MAP_SHARED tries to index the newly allocated pagecache in the
radix tree before checking the pagetable so it doesn't need any
assistance to exercise that case.

hugetlbfs checks the pmd to be not none before trying to index the
hugetlbfs page in the radix tree, so it requires to run UFFDIO_COPY
into an alias mapping (the alternative would be to use MADV_DONTNEED
to only zap the pagetables, but that doesn't work on hugetlbfs).

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 145 +++++++++++++++++++++++++++++--
 1 file changed, 138 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 7db6299b2f0d..d07156de55e8 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -67,6 +67,7 @@
 #include <pthread.h>
 #include <linux/userfaultfd.h>
 #include <setjmp.h>
+#include <stdbool.h>
 
 #ifdef __NR_userfaultfd
 
@@ -83,11 +84,17 @@ static int bounces;
 #define TEST_SHMEM	3
 static int test_type;
 
+/* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
+#define ALARM_INTERVAL_SECS 10
+static volatile bool test_uffdio_copy_eexist = true;
+static volatile bool test_uffdio_zeropage_eexist = true;
+
+static bool map_shared;
 static int huge_fd;
 static char *huge_fd_off0;
 static unsigned long long *count_verify;
 static int uffd, uffd_flags, finished, *pipefd;
-static char *area_src, *area_dst;
+static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
 static char *zeropage;
 pthread_attr_t attr;
 
@@ -126,6 +133,9 @@ static void anon_allocate_area(void **alloc_area)
 	}
 }
 
+static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+}
 
 /* HugeTLB memory */
 static int hugetlb_release_pages(char *rel_area)
@@ -146,17 +156,51 @@ static int hugetlb_release_pages(char *rel_area)
 
 static void hugetlb_allocate_area(void **alloc_area)
 {
+	void *area_alias = NULL;
+	char **alloc_area_alias;
 	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
-				MAP_PRIVATE | MAP_HUGETLB, huge_fd,
-				*alloc_area == area_src ? 0 :
-				nr_pages * page_size);
+			   (map_shared ? MAP_SHARED : MAP_PRIVATE) |
+			   MAP_HUGETLB,
+			   huge_fd, *alloc_area == area_src ? 0 :
+			   nr_pages * page_size);
 	if (*alloc_area == MAP_FAILED) {
 		fprintf(stderr, "mmap of hugetlbfs file failed\n");
 		*alloc_area = NULL;
 	}
 
-	if (*alloc_area == area_src)
+	if (map_shared) {
+		area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+				  MAP_SHARED | MAP_HUGETLB,
+				  huge_fd, *alloc_area == area_src ? 0 :
+				  nr_pages * page_size);
+		if (area_alias == MAP_FAILED) {
+			if (munmap(*alloc_area, nr_pages * page_size) < 0)
+				perror("hugetlb munmap"), exit(1);
+			*alloc_area = NULL;
+			return;
+		}
+	}
+	if (*alloc_area == area_src) {
 		huge_fd_off0 = *alloc_area;
+		alloc_area_alias = &area_src_alias;
+	} else {
+		alloc_area_alias = &area_dst_alias;
+	}
+	if (area_alias)
+		*alloc_area_alias = area_alias;
+}
+
+static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset)
+{
+	if (!map_shared)
+		return;
+	/*
+	 * We can't zap just the pagetable with hugetlbfs because
+	 * MADV_DONTEED won't work. So exercise -EEXIST on a alias
+	 * mapping where the pagetables are not established initially,
+	 * this way we'll exercise the -EEXEC at the fs level.
+	 */
+	*start = (unsigned long) area_dst_alias + offset;
 }
 
 /* Shared memory */
@@ -186,6 +230,7 @@ struct uffd_test_ops {
 	unsigned long expected_ioctls;
 	void (*allocate_area)(void **alloc_area);
 	int (*release_pages)(char *rel_area);
+	void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset);
 };
 
 #define ANON_EXPECTED_IOCTLS		((1 << _UFFDIO_WAKE) | \
@@ -196,18 +241,21 @@ static struct uffd_test_ops anon_uffd_test_ops = {
 	.expected_ioctls = ANON_EXPECTED_IOCTLS,
 	.allocate_area	= anon_allocate_area,
 	.release_pages	= anon_release_pages,
+	.alias_mapping = noop_alias_mapping,
 };
 
 static struct uffd_test_ops shmem_uffd_test_ops = {
 	.expected_ioctls = ANON_EXPECTED_IOCTLS,
 	.allocate_area	= shmem_allocate_area,
 	.release_pages	= shmem_release_pages,
+	.alias_mapping = noop_alias_mapping,
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
 	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC,
 	.allocate_area	= hugetlb_allocate_area,
 	.release_pages	= hugetlb_release_pages,
+	.alias_mapping = hugetlb_alias_mapping,
 };
 
 static struct uffd_test_ops *uffd_test_ops;
@@ -332,6 +380,23 @@ static void *locking_thread(void *arg)
 	return NULL;
 }
 
+static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
+			    unsigned long offset)
+{
+	uffd_test_ops->alias_mapping(&uffdio_copy->dst,
+				     uffdio_copy->len,
+				     offset);
+	if (ioctl(ufd, UFFDIO_COPY, uffdio_copy)) {
+		/* real retval in ufdio_copy.copy */
+		if (uffdio_copy->copy != -EEXIST)
+			fprintf(stderr, "UFFDIO_COPY retry error %Ld\n",
+				uffdio_copy->copy), exit(1);
+	} else {
+		fprintf(stderr,	"UFFDIO_COPY retry unexpected %Ld\n",
+			uffdio_copy->copy), exit(1);
+	}
+}
+
 static int copy_page(int ufd, unsigned long offset)
 {
 	struct uffdio_copy uffdio_copy;
@@ -352,8 +417,13 @@ static int copy_page(int ufd, unsigned long offset)
 	} else if (uffdio_copy.copy != page_size) {
 		fprintf(stderr, "UFFDIO_COPY unexpected copy %Ld\n",
 			uffdio_copy.copy), exit(1);
-	} else
+	} else {
+		if (test_uffdio_copy_eexist) {
+			test_uffdio_copy_eexist = false;
+			retry_copy_page(ufd, &uffdio_copy, offset);
+		}
 		return 1;
+	}
 	return 0;
 }
 
@@ -692,6 +762,23 @@ static int faulting_process(int signal_test)
 	return 0;
 }
 
+static void retry_uffdio_zeropage(int ufd,
+				  struct uffdio_zeropage *uffdio_zeropage,
+				  unsigned long offset)
+{
+	uffd_test_ops->alias_mapping(&uffdio_zeropage->range.start,
+				     uffdio_zeropage->range.len,
+				     offset);
+	if (ioctl(ufd, UFFDIO_ZEROPAGE, uffdio_zeropage)) {
+		if (uffdio_zeropage->zeropage != -EEXIST)
+			fprintf(stderr, "UFFDIO_ZEROPAGE retry error %Ld\n",
+				uffdio_zeropage->zeropage), exit(1);
+	} else {
+		fprintf(stderr, "UFFDIO_ZEROPAGE retry unexpected %Ld\n",
+			uffdio_zeropage->zeropage), exit(1);
+	}
+}
+
 static int uffdio_zeropage(int ufd, unsigned long offset)
 {
 	struct uffdio_zeropage uffdio_zeropage;
@@ -727,6 +814,11 @@ static int uffdio_zeropage(int ufd, unsigned long offset)
 			fprintf(stderr, "UFFDIO_ZEROPAGE unexpected %Ld\n",
 				uffdio_zeropage.zeropage), exit(1);
 		} else
+			if (test_uffdio_zeropage_eexist) {
+				test_uffdio_zeropage_eexist = false;
+				retry_uffdio_zeropage(ufd, &uffdio_zeropage,
+						      offset);
+			}
 			return 1;
 	} else {
 		fprintf(stderr,
@@ -999,6 +1091,15 @@ static int userfaultfd_stress(void)
 			return 1;
 		}
 
+		if (area_dst_alias) {
+			uffdio_register.range.start = (unsigned long)
+				area_dst_alias;
+			if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
+				fprintf(stderr, "register failure alias\n");
+				return 1;
+			}
+		}
+
 		/*
 		 * The madvise done previously isn't enough: some
 		 * uffd_thread could have read userfaults (one of
@@ -1032,9 +1133,17 @@ static int userfaultfd_stress(void)
 
 		/* unregister */
 		if (ioctl(uffd, UFFDIO_UNREGISTER, &uffdio_register.range)) {
-			fprintf(stderr, "register failure\n");
+			fprintf(stderr, "unregister failure\n");
 			return 1;
 		}
+		if (area_dst_alias) {
+			uffdio_register.range.start = (unsigned long) area_dst;
+			if (ioctl(uffd, UFFDIO_UNREGISTER,
+				  &uffdio_register.range)) {
+				fprintf(stderr, "unregister failure alias\n");
+				return 1;
+			}
+		}
 
 		/* verification */
 		if (bounces & BOUNCE_VERIFY) {
@@ -1056,6 +1165,10 @@ static int userfaultfd_stress(void)
 		area_src = area_dst;
 		area_dst = tmp_area;
 
+		tmp_area = area_src_alias;
+		area_src_alias = area_dst_alias;
+		area_dst_alias = tmp_area;
+
 		printf("userfaults:");
 		for (cpu = 0; cpu < nr_cpus; cpu++)
 			printf(" %lu", userfaults[cpu]);
@@ -1102,7 +1215,12 @@ static void set_test_type(const char *type)
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
+	} else if (!strcmp(type, "hugetlb_shared")) {
+		map_shared = true;
+		test_type = TEST_HUGETLB;
+		uffd_test_ops = &hugetlb_uffd_test_ops;
 	} else if (!strcmp(type, "shmem")) {
+		map_shared = true;
 		test_type = TEST_SHMEM;
 		uffd_test_ops = &shmem_uffd_test_ops;
 	} else {
@@ -1122,12 +1240,25 @@ static void set_test_type(const char *type)
 		fprintf(stderr, "Impossible to run this test\n"), exit(2);
 }
 
+static void sigalrm(int sig)
+{
+	if (sig != SIGALRM)
+		abort();
+	test_uffdio_copy_eexist = true;
+	test_uffdio_zeropage_eexist = true;
+	alarm(ALARM_INTERVAL_SECS);
+}
+
 int main(int argc, char **argv)
 {
 	if (argc < 4)
 		fprintf(stderr, "Usage: <test type> <MiB> <bounces> [hugetlbfs_file]\n"),
 				exit(1);
 
+	if (signal(SIGALRM, sigalrm) == SIG_ERR)
+		fprintf(stderr, "failed to arm SIGALRM"), exit(1);
+	alarm(ALARM_INTERVAL_SECS);
+
 	set_test_type(argv[1]);
 
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/6] userfaultfd: selftest: explicit failure if the SIGBUS test failed
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 4/6] userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds Andrea Arcangeli
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

Showing zero in the output isn't very self explanatory as a successful
result. Show a more explicit error output if the test fails.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index d07156de55e8..34838d5b33f3 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -986,7 +986,9 @@ static int userfaultfd_sig_test(void)
 		return 1;
 
 	printf("done.\n");
-	printf(" Signal test userfaults: %ld\n", userfaults);
+	if (userfaults)
+		fprintf(stderr, "Signal test failed, userfaults: %ld\n",
+			userfaults);
 	close(uffd);
 	return userfaults != 0;
 }

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/6] userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
                   ` (2 preceding siblings ...)
  2017-08-02 16:51 ` [PATCH 3/6] userfaultfd: selftest: explicit failure if the SIGBUS test failed Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 5/6] userfaultfd: provide pid in userfault msg Andrea Arcangeli
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

A __split_vma is not a worthy event to report, and it's definitely not
a unmap so it would be incorrect to report unmap for the whole region
to the userfaultfd manager if a __split_vma fails.

So only call userfaultfd_unmap_prep after the __vma_splitting is over
and do_munmap cannot fail anymore.

Also add unlikely because it's better to optimize for the vast
majority of apps that aren't using userfaultfd in a non cooperative
way. Ideally we should also find a way to eliminate the branch
entirely if CONFIG_USERFAULTFD=n, but it would complicate things so
stick to unlikely for now.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/mmap.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index f19efcf75418..9b0161f1adf3 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2639,13 +2639,6 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	if (vma->vm_start >= end)
 		return 0;
 
-	if (uf) {
-		int error = userfaultfd_unmap_prep(vma, start, end, uf);
-
-		if (error)
-			return error;
-	}
-
 	/*
 	 * If we need to split any vma, do it now to save pain later.
 	 *
@@ -2679,6 +2672,21 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
 	}
 	vma = prev ? prev->vm_next : mm->mmap;
 
+	if (unlikely(uf)) {
+		/*
+		 * If userfaultfd_unmap_prep returns an error the vmas
+		 * will remain splitted, but userland will get a
+		 * highly unexpected error anyway. This is no
+		 * different than the case where the first of the two
+		 * __split_vma fails, but we don't undo the first
+		 * split, despite we could. This is unlikely enough
+		 * failure that it's not worth optimizing it for.
+		 */
+		int error = userfaultfd_unmap_prep(vma, start, end, uf);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * unlock any mlock()ed ranges before detaching vmas
 	 */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 5/6] userfaultfd: provide pid in userfault msg
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
                   ` (3 preceding siblings ...)
  2017-08-02 16:51 ` [PATCH 4/6] userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 16:51 ` [PATCH 6/6] userfaultfd: provide pid in userfault msg - add feat union Andrea Arcangeli
  2017-08-02 21:29 ` [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrew Morton
  6 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

From: Alexey Perevalov <a.perevalov@samsung.com>

It could be useful for calculating downtime during
postcopy live migration per vCPU. Side observer or application itself
will be informed about proper task's sleep during userfaultfd
processing.

Process's thread id is being provided when user requeste it
by setting UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.

Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c                 |  8 ++++++--
 include/uapi/linux/userfaultfd.h | 10 +++++++---
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 03510becb321..ae044650dffa 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -181,7 +181,8 @@ static inline void msg_init(struct uffd_msg *msg)
 
 static inline struct uffd_msg userfault_msg(unsigned long address,
 					    unsigned int flags,
-					    unsigned long reason)
+					    unsigned long reason,
+					    unsigned int features)
 {
 	struct uffd_msg msg;
 	msg_init(&msg);
@@ -205,6 +206,8 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 		 * write protect fault.
 		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
+	if (features & UFFD_FEATURE_THREAD_ID)
+		msg.arg.pagefault.ptid = task_pid_vnr(current);
 	return msg;
 }
 
@@ -425,7 +428,8 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason)
 
 	init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
 	uwq.wq.private = current;
-	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason);
+	uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
+			ctx->features);
 	uwq.ctx = ctx;
 	uwq.waken = false;
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index d39d5db56771..2b24c28d99a7 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -24,7 +24,8 @@
 			   UFFD_FEATURE_EVENT_UNMAP |		\
 			   UFFD_FEATURE_MISSING_HUGETLBFS |	\
 			   UFFD_FEATURE_MISSING_SHMEM |		\
-			   UFFD_FEATURE_SIGBUS)
+			   UFFD_FEATURE_SIGBUS |		\
+			   UFFD_FEATURE_THREAD_ID)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -79,6 +80,7 @@ struct uffd_msg {
 		struct {
 			__u64	flags;
 			__u64	address;
+			__u32   ptid;
 		} pagefault;
 
 		struct {
@@ -158,8 +160,9 @@ struct uffdio_api {
 	 * UFFD_FEATURE_SIGBUS feature means no page-fault
 	 * (UFFD_EVENT_PAGEFAULT) event will be delivered, instead
 	 * a SIGBUS signal will be sent to the faulting process.
-	 * The application process can enable this behavior by adding
-	 * it to uffdio_api.features.
+	 *
+	 * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will
+	 * be returned, if feature is not requested 0 will be returned.
 	 */
 #define UFFD_FEATURE_PAGEFAULT_FLAG_WP		(1<<0)
 #define UFFD_FEATURE_EVENT_FORK			(1<<1)
@@ -169,6 +172,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_MISSING_SHMEM		(1<<5)
 #define UFFD_FEATURE_EVENT_UNMAP		(1<<6)
 #define UFFD_FEATURE_SIGBUS			(1<<7)
+#define UFFD_FEATURE_THREAD_ID			(1<<8)
 	__u64 features;
 
 	__u64 ioctls;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 6/6] userfaultfd: provide pid in userfault msg - add feat union
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
                   ` (4 preceding siblings ...)
  2017-08-02 16:51 ` [PATCH 5/6] userfaultfd: provide pid in userfault msg Andrea Arcangeli
@ 2017-08-02 16:51 ` Andrea Arcangeli
  2017-08-02 21:29 ` [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrew Morton
  6 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 16:51 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

No ABI change, but this will make it more explicit to software that
ptid is only available if requested by passing UFFD_FEATURE_THREAD_ID
to UFFDIO_API. The fact it's a union will also self document it
shouldn't be taken for granted there's a tpid there.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c                 | 2 +-
 include/uapi/linux/userfaultfd.h | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index ae044650dffa..9677d862ed53 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -207,7 +207,7 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
 		 */
 		msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP;
 	if (features & UFFD_FEATURE_THREAD_ID)
-		msg.arg.pagefault.ptid = task_pid_vnr(current);
+		msg.arg.pagefault.feat.ptid = task_pid_vnr(current);
 	return msg;
 }
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 2b24c28d99a7..d6d1f65cb3c3 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -80,7 +80,9 @@ struct uffd_msg {
 		struct {
 			__u64	flags;
 			__u64	address;
-			__u32   ptid;
+			union {
+				__u32 ptid;
+			} feat;
 		} pagefault;
 
 		struct {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case
  2017-08-02 16:51 ` [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Andrea Arcangeli
@ 2017-08-02 17:12   ` Mike Kravetz
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Kravetz @ 2017-08-02 17:12 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport, Alexey Perevalov

On 08/02/2017 09:51 AM, Andrea Arcangeli wrote:
> huge_add_to_page_cache->add_to_page_cache implicitly unlock the page
> before returning in case of errors.
> 
> The error returned was -EEXIST by running UFFDIO_COPY on a non-hole
> offset of a VM_SHARED hugetlbfs mapping. It was an userland bug that
> triggered it and the kernel must cope with it returning -EEXIST from
> ioctl(UFFDIO_COPY) as expected.
> 
> page dumped because: VM_BUG_ON_PAGE(!PageLocked(page))
> ------------[ cut here ]------------
> kernel BUG at mm/filemap.c:964!
> invalid opcode: 0000 [#1] SMP
> CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1
> task: ffff973131ab2600 task.stack: ffffacc0cba78000
> RIP: 0010:unlock_page+0x4a/0x50
> RSP: 0018:ffffacc0cba7bca0 EFLAGS: 00010246
> RAX: 0000000000000036 RBX: fffff99d09f38000 RCX: 0000000000000006
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff97326148e0c0
> RBP: ffffacc0cba7bca0 R08: 00000000000006be R09: 0000000000000004
> R10: 00000000000007ae R11: ffffffffb622cbed R12: 0000000000000008
> R13: ffff972f9a265240 R14: ffff972de2919740 R15: ffffffffb62da820
> FS:  00007f122efff700(0000) GS:ffff973261480000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fd52f788ea8 CR3: 000000036d022000 CR4: 00000000003426e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  hugetlb_mcopy_atomic_pte+0xc0/0x320
>  mcopy_atomic+0x96f/0xbe0
>  userfaultfd_ioctl+0x218/0xe90
>  ? __schedule+0x23c/0x8d0
>  ? hrtimer_start_range_ns+0x1bd/0x330
>  do_vfs_ioctl+0xa5/0x600
>  ? do_vfs_ioctl+0xa5/0x600
>  SyS_ioctl+0x79/0x90
>  entry_SYSCALL_64_fastpath+0x1a/0xa9
> 
> Tested-by: Maxime Coquelin <maxime.coquelin@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks for catching this and fixing it.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bc48ee783dd9..5a240c72c3b6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4062,9 +4062,9 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
>  	return ret;
>  out_release_unlock:
>  	spin_unlock(ptl);
> -out_release_nounlock:
>  	if (vm_shared)
>  		unlock_page(page);
> +out_release_nounlock:
>  	put_page(page);
>  	goto out;
>  }
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] userfaultfd updates for v4.13-rc3
  2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
                   ` (5 preceding siblings ...)
  2017-08-02 16:51 ` [PATCH 6/6] userfaultfd: provide pid in userfault msg - add feat union Andrea Arcangeli
@ 2017-08-02 21:29 ` Andrew Morton
  2017-08-02 22:27   ` Andrea Arcangeli
  6 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2017-08-02 21:29 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

On Wed,  2 Aug 2017 18:51:39 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote:

> Hello,
> 
> these are some uffd updates I have pending that looks ready for
> merging. vhost-user KVM developement run into a crash so patch 1/6 is
> urgent (and simple), the rest is not urgent.
> 
> The testcase has been updated to exercise it.
> 
> This should apply clean to -mm, and I reviewed in detail all other
> userfaultfd patches that are in -mm and they're all great, including
> the shmem zeropage addition.
> 
> Alexey Perevalov (1):
>   userfaultfd: provide pid in userfault msg
> 
> Andrea Arcangeli (5):
>   userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED
>     case
>   userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
>   userfaultfd: selftest: explicit failure if the SIGBUS test failed
>   userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
>   userfaultfd: provide pid in userfault msg - add feat union

I'm thinking "userfaultfd: hugetlbfs: remove superfluous page unlock in
VM_SHARED case" goes into 4.13-rc and the other patches into 4.14-rc1. 
Sound sane?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
  2017-08-02 16:51 ` [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST Andrea Arcangeli
@ 2017-08-02 21:33   ` Mike Kravetz
  0 siblings, 0 replies; 11+ messages in thread
From: Mike Kravetz @ 2017-08-02 21:33 UTC (permalink / raw)
  To: Andrea Arcangeli, Andrew Morton, linux-mm
  Cc: Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport, Alexey Perevalov

On 08/02/2017 09:51 AM, Andrea Arcangeli wrote:
> This will retry the UFFDIO_COPY/ZEROPAGE to verify it returns -EEXIST
> at the first invocation and then later every 10 seconds.
> 
> In the filebacked MAP_SHARED case this also verifies the -EEXIST
> triggered in the filesystem pagecache insertion, if the offset in the
> file was not a hole.
> 
> shmem MAP_SHARED tries to index the newly allocated pagecache in the
> radix tree before checking the pagetable so it doesn't need any
> assistance to exercise that case.
> 
> hugetlbfs checks the pmd to be not none before trying to index the
> hugetlbfs page in the radix tree, so it requires to run UFFDIO_COPY
> into an alias mapping (the alternative would be to use MADV_DONTNEED
> to only zap the pagetables, but that doesn't work on hugetlbfs).
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  tools/testing/selftests/vm/userfaultfd.c | 145 +++++++++++++++++++++++++++++--
>  1 file changed, 138 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index 7db6299b2f0d..d07156de55e8 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -67,6 +67,7 @@
>  #include <pthread.h>
>  #include <linux/userfaultfd.h>
>  #include <setjmp.h>
> +#include <stdbool.h>
>  
>  #ifdef __NR_userfaultfd
>  
> @@ -83,11 +84,17 @@ static int bounces;
>  #define TEST_SHMEM	3
>  static int test_type;
>  
> +/* exercise the test_uffdio_*_eexist every ALARM_INTERVAL_SECS */
> +#define ALARM_INTERVAL_SECS 10
> +static volatile bool test_uffdio_copy_eexist = true;
> +static volatile bool test_uffdio_zeropage_eexist = true;
> +
> +static bool map_shared;
>  static int huge_fd;
>  static char *huge_fd_off0;
>  static unsigned long long *count_verify;
>  static int uffd, uffd_flags, finished, *pipefd;
> -static char *area_src, *area_dst;
> +static char *area_src, *area_src_alias, *area_dst, *area_dst_alias;
>  static char *zeropage;
>  pthread_attr_t attr;
>  
> @@ -126,6 +133,9 @@ static void anon_allocate_area(void **alloc_area)
>  	}
>  }
>  
> +static void noop_alias_mapping(__u64 *start, size_t len, unsigned long offset)
> +{
> +}
>  
>  /* HugeTLB memory */
>  static int hugetlb_release_pages(char *rel_area)
> @@ -146,17 +156,51 @@ static int hugetlb_release_pages(char *rel_area)
>  
>  static void hugetlb_allocate_area(void **alloc_area)
>  {
> +	void *area_alias = NULL;
> +	char **alloc_area_alias;
>  	*alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
> -				MAP_PRIVATE | MAP_HUGETLB, huge_fd,
> -				*alloc_area == area_src ? 0 :
> -				nr_pages * page_size);
> +			   (map_shared ? MAP_SHARED : MAP_PRIVATE) |
> +			   MAP_HUGETLB,
> +			   huge_fd, *alloc_area == area_src ? 0 :
> +			   nr_pages * page_size);
>  	if (*alloc_area == MAP_FAILED) {
>  		fprintf(stderr, "mmap of hugetlbfs file failed\n");
>  		*alloc_area = NULL;
>  	}
>  
> -	if (*alloc_area == area_src)
> +	if (map_shared) {
> +		area_alias = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
> +				  MAP_SHARED | MAP_HUGETLB,
> +				  huge_fd, *alloc_area == area_src ? 0 :
> +				  nr_pages * page_size);
> +		if (area_alias == MAP_FAILED) {
> +			if (munmap(*alloc_area, nr_pages * page_size) < 0)
> +				perror("hugetlb munmap"), exit(1);
> +			*alloc_area = NULL;
> +			return;
> +		}
> +	}
> +	if (*alloc_area == area_src) {
>  		huge_fd_off0 = *alloc_area;
> +		alloc_area_alias = &area_src_alias;
> +	} else {
> +		alloc_area_alias = &area_dst_alias;
> +	}
> +	if (area_alias)
> +		*alloc_area_alias = area_alias;
> +}
> +
> +static void hugetlb_alias_mapping(__u64 *start, size_t len, unsigned long offset)
> +{
> +	if (!map_shared)
> +		return;
> +	/*
> +	 * We can't zap just the pagetable with hugetlbfs because
> +	 * MADV_DONTEED won't work. So exercise -EEXIST on a alias
> +	 * mapping where the pagetables are not established initially,
> +	 * this way we'll exercise the -EEXEC at the fs level.
> +	 */
> +	*start = (unsigned long) area_dst_alias + offset;
>  }
>  
>  /* Shared memory */
> @@ -186,6 +230,7 @@ struct uffd_test_ops {
>  	unsigned long expected_ioctls;
>  	void (*allocate_area)(void **alloc_area);
>  	int (*release_pages)(char *rel_area);
> +	void (*alias_mapping)(__u64 *start, size_t len, unsigned long offset);
>  };
>  
>  #define ANON_EXPECTED_IOCTLS		((1 << _UFFDIO_WAKE) | \
> @@ -196,18 +241,21 @@ static struct uffd_test_ops anon_uffd_test_ops = {
>  	.expected_ioctls = ANON_EXPECTED_IOCTLS,
>  	.allocate_area	= anon_allocate_area,
>  	.release_pages	= anon_release_pages,
> +	.alias_mapping = noop_alias_mapping,
>  };
>  
>  static struct uffd_test_ops shmem_uffd_test_ops = {
>  	.expected_ioctls = ANON_EXPECTED_IOCTLS,
>  	.allocate_area	= shmem_allocate_area,
>  	.release_pages	= shmem_release_pages,
> +	.alias_mapping = noop_alias_mapping,
>  };
>  
>  static struct uffd_test_ops hugetlb_uffd_test_ops = {
>  	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC,
>  	.allocate_area	= hugetlb_allocate_area,
>  	.release_pages	= hugetlb_release_pages,
> +	.alias_mapping = hugetlb_alias_mapping,
>  };
>  
>  static struct uffd_test_ops *uffd_test_ops;
> @@ -332,6 +380,23 @@ static void *locking_thread(void *arg)
>  	return NULL;
>  }
>  
> +static void retry_copy_page(int ufd, struct uffdio_copy *uffdio_copy,
> +			    unsigned long offset)
> +{
> +	uffd_test_ops->alias_mapping(&uffdio_copy->dst,
> +				     uffdio_copy->len,
> +				     offset);
> +	if (ioctl(ufd, UFFDIO_COPY, uffdio_copy)) {
> +		/* real retval in ufdio_copy.copy */
> +		if (uffdio_copy->copy != -EEXIST)
> +			fprintf(stderr, "UFFDIO_COPY retry error %Ld\n",
> +				uffdio_copy->copy), exit(1);
> +	} else {
> +		fprintf(stderr,	"UFFDIO_COPY retry unexpected %Ld\n",
> +			uffdio_copy->copy), exit(1);
> +	}
> +}
> +
>  static int copy_page(int ufd, unsigned long offset)
>  {
>  	struct uffdio_copy uffdio_copy;
> @@ -352,8 +417,13 @@ static int copy_page(int ufd, unsigned long offset)
>  	} else if (uffdio_copy.copy != page_size) {
>  		fprintf(stderr, "UFFDIO_COPY unexpected copy %Ld\n",
>  			uffdio_copy.copy), exit(1);
> -	} else
> +	} else {
> +		if (test_uffdio_copy_eexist) {
> +			test_uffdio_copy_eexist = false;
> +			retry_copy_page(ufd, &uffdio_copy, offset);
> +		}
>  		return 1;
> +	}
>  	return 0;
>  }
>  
> @@ -692,6 +762,23 @@ static int faulting_process(int signal_test)
>  	return 0;
>  }
>  
> +static void retry_uffdio_zeropage(int ufd,
> +				  struct uffdio_zeropage *uffdio_zeropage,
> +				  unsigned long offset)
> +{
> +	uffd_test_ops->alias_mapping(&uffdio_zeropage->range.start,
> +				     uffdio_zeropage->range.len,
> +				     offset);
> +	if (ioctl(ufd, UFFDIO_ZEROPAGE, uffdio_zeropage)) {
> +		if (uffdio_zeropage->zeropage != -EEXIST)
> +			fprintf(stderr, "UFFDIO_ZEROPAGE retry error %Ld\n",
> +				uffdio_zeropage->zeropage), exit(1);
> +	} else {
> +		fprintf(stderr, "UFFDIO_ZEROPAGE retry unexpected %Ld\n",
> +			uffdio_zeropage->zeropage), exit(1);
> +	}
> +}
> +
>  static int uffdio_zeropage(int ufd, unsigned long offset)
>  {
>  	struct uffdio_zeropage uffdio_zeropage;
> @@ -727,6 +814,11 @@ static int uffdio_zeropage(int ufd, unsigned long offset)
>  			fprintf(stderr, "UFFDIO_ZEROPAGE unexpected %Ld\n",
>  				uffdio_zeropage.zeropage), exit(1);
>  		} else
> +			if (test_uffdio_zeropage_eexist) {
> +				test_uffdio_zeropage_eexist = false;
> +				retry_uffdio_zeropage(ufd, &uffdio_zeropage,
> +						      offset);
> +			}
>  			return 1;

I think you want to add {} around those indented lines in the else
clause above.  Caught by the compiler, not me.

-- 
Mike Kravetz

>  	} else {
>  		fprintf(stderr,
> @@ -999,6 +1091,15 @@ static int userfaultfd_stress(void)
>  			return 1;
>  		}
>  
> +		if (area_dst_alias) {
> +			uffdio_register.range.start = (unsigned long)
> +				area_dst_alias;
> +			if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
> +				fprintf(stderr, "register failure alias\n");
> +				return 1;
> +			}
> +		}
> +
>  		/*
>  		 * The madvise done previously isn't enough: some
>  		 * uffd_thread could have read userfaults (one of
> @@ -1032,9 +1133,17 @@ static int userfaultfd_stress(void)
>  
>  		/* unregister */
>  		if (ioctl(uffd, UFFDIO_UNREGISTER, &uffdio_register.range)) {
> -			fprintf(stderr, "register failure\n");
> +			fprintf(stderr, "unregister failure\n");
>  			return 1;
>  		}
> +		if (area_dst_alias) {
> +			uffdio_register.range.start = (unsigned long) area_dst;
> +			if (ioctl(uffd, UFFDIO_UNREGISTER,
> +				  &uffdio_register.range)) {
> +				fprintf(stderr, "unregister failure alias\n");
> +				return 1;
> +			}
> +		}
>  
>  		/* verification */
>  		if (bounces & BOUNCE_VERIFY) {
> @@ -1056,6 +1165,10 @@ static int userfaultfd_stress(void)
>  		area_src = area_dst;
>  		area_dst = tmp_area;
>  
> +		tmp_area = area_src_alias;
> +		area_src_alias = area_dst_alias;
> +		area_dst_alias = tmp_area;
> +
>  		printf("userfaults:");
>  		for (cpu = 0; cpu < nr_cpus; cpu++)
>  			printf(" %lu", userfaults[cpu]);
> @@ -1102,7 +1215,12 @@ static void set_test_type(const char *type)
>  	} else if (!strcmp(type, "hugetlb")) {
>  		test_type = TEST_HUGETLB;
>  		uffd_test_ops = &hugetlb_uffd_test_ops;
> +	} else if (!strcmp(type, "hugetlb_shared")) {
> +		map_shared = true;
> +		test_type = TEST_HUGETLB;
> +		uffd_test_ops = &hugetlb_uffd_test_ops;
>  	} else if (!strcmp(type, "shmem")) {
> +		map_shared = true;
>  		test_type = TEST_SHMEM;
>  		uffd_test_ops = &shmem_uffd_test_ops;
>  	} else {
> @@ -1122,12 +1240,25 @@ static void set_test_type(const char *type)
>  		fprintf(stderr, "Impossible to run this test\n"), exit(2);
>  }
>  
> +static void sigalrm(int sig)
> +{
> +	if (sig != SIGALRM)
> +		abort();
> +	test_uffdio_copy_eexist = true;
> +	test_uffdio_zeropage_eexist = true;
> +	alarm(ALARM_INTERVAL_SECS);
> +}
> +
>  int main(int argc, char **argv)
>  {
>  	if (argc < 4)
>  		fprintf(stderr, "Usage: <test type> <MiB> <bounces> [hugetlbfs_file]\n"),
>  				exit(1);
>  
> +	if (signal(SIGALRM, sigalrm) == SIG_ERR)
> +		fprintf(stderr, "failed to arm SIGALRM"), exit(1);
> +	alarm(ALARM_INTERVAL_SECS);
> +
>  	set_test_type(argv[1]);
>  
>  	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/6] userfaultfd updates for v4.13-rc3
  2017-08-02 21:29 ` [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrew Morton
@ 2017-08-02 22:27   ` Andrea Arcangeli
  0 siblings, 0 replies; 11+ messages in thread
From: Andrea Arcangeli @ 2017-08-02 22:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Dr. David Alan Gilbert, Maxime Coquelin, Mike Rapoport,
	Mike Kravetz, Alexey Perevalov

On Wed, Aug 02, 2017 at 02:29:25PM -0700, Andrew Morton wrote:
> On Wed,  2 Aug 2017 18:51:39 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> > Hello,
> > 
> > these are some uffd updates I have pending that looks ready for
> > merging. vhost-user KVM developement run into a crash so patch 1/6 is
> > urgent (and simple), the rest is not urgent.
> > 
> > The testcase has been updated to exercise it.
> > 
> > This should apply clean to -mm, and I reviewed in detail all other
> > userfaultfd patches that are in -mm and they're all great, including
> > the shmem zeropage addition.
> > 
> > Alexey Perevalov (1):
> >   userfaultfd: provide pid in userfault msg
> > 
> > Andrea Arcangeli (5):
> >   userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED
> >     case
> >   userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST
> >   userfaultfd: selftest: explicit failure if the SIGBUS test failed
> >   userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds
> >   userfaultfd: provide pid in userfault msg - add feat union
> 
> I'm thinking "userfaultfd: hugetlbfs: remove superfluous page unlock in
> VM_SHARED case" goes into 4.13-rc and the other patches into 4.14-rc1. 
> Sound sane?

That would be perfect!

Mike spotted that 2/6 needs the incremental fix below, my compiler
didn't warn about it and the difference would be only noticeable in
case of fatal errors. I can resend 2/6 if you prefer.

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 34838d5b33f3..a2c53a3d223d 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -813,13 +813,14 @@ static int uffdio_zeropage(int ufd, unsigned long offset)
 		if (uffdio_zeropage.zeropage != page_size) {
 			fprintf(stderr, "UFFDIO_ZEROPAGE unexpected %Ld\n",
 				uffdio_zeropage.zeropage), exit(1);
-		} else
+		} else {
 			if (test_uffdio_zeropage_eexist) {
 				test_uffdio_zeropage_eexist = false;
 				retry_uffdio_zeropage(ufd, &uffdio_zeropage,
 						      offset);
 			}
 			return 1;
+		}
 	} else {
 		fprintf(stderr,
 			"UFFDIO_ZEROPAGE succeeded %Ld\n",

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2017-08-02 22:27 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-02 16:51 [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrea Arcangeli
2017-08-02 16:51 ` [PATCH 1/6] userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case Andrea Arcangeli
2017-08-02 17:12   ` Mike Kravetz
2017-08-02 16:51 ` [PATCH 2/6] userfaultfd: selftest: exercise UFFDIO_COPY/ZEROPAGE -EEXIST Andrea Arcangeli
2017-08-02 21:33   ` Mike Kravetz
2017-08-02 16:51 ` [PATCH 3/6] userfaultfd: selftest: explicit failure if the SIGBUS test failed Andrea Arcangeli
2017-08-02 16:51 ` [PATCH 4/6] userfaultfd: call userfaultfd_unmap_prep only if __split_vma succeeds Andrea Arcangeli
2017-08-02 16:51 ` [PATCH 5/6] userfaultfd: provide pid in userfault msg Andrea Arcangeli
2017-08-02 16:51 ` [PATCH 6/6] userfaultfd: provide pid in userfault msg - add feat union Andrea Arcangeli
2017-08-02 21:29 ` [PATCH 0/6] userfaultfd updates for v4.13-rc3 Andrew Morton
2017-08-02 22:27   ` Andrea Arcangeli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.