* [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
@ 2015-09-08 20:43 Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number Andrea Arcangeli
                   ` (12 more replies)
  0 siblings, 13 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Here are some pending updates for userfaultfd, mostly to the selftest;
the rest are cleanups.

I put 1/12 first because it was already submitted separately by
Stephen Rothwell so chances are it's already upstream.

If you had problems with the selftest on non-x86 arches, please try
again with these patches applied (or, if you prefer git clone to
git am, give the aa.git "userfault" branch a spin; it is in sync
with this submission).

Some of these have been floating on the lists, so after this submit we
should be all in sync.

I understand the powerpc parts are to be deferred for upstream merging
(-mm mailer comment said so), so I assume the aarch64 parts too, but
they could still land in -mm and linux-next in the meantime.

NOTE: none of these changes is urgent.

The patchset is actually against upstream; if it doesn't apply cleanly
to -mm, or you prefer it against linux-next, let me know.

Thanks,
Andrea

Andrea Arcangeli (7):
  userfaultfd: selftest: update userfaultfd x86 32bit syscall number
  userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to
    __wake_up_locked_key"
  userfaultfd: selftest: headers fixup
  userfaultfd: selftest: avoid my_bcmp false positives with powerpc
  userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
  userfaultfd: selftest: don't error out if pthread_mutex_t isn't
    identical
  userfaultfd: powerpc: implement syscall

Bharata B Rao (1):
  userfaultfd: powerpc: Bump up __NR_syscalls to account for
    __NR_userfaultfd

Dr. David Alan Gilbert (1):
  userfaultfd: register uapi generic syscall (aarch64)

Geert Uytterhoeven (1):
  userfaultfd: selftest: Fix compiler warnings on 32-bit

Michael Ellerman (1):
  userfaultfd: selftest: only warn if __NR_userfaultfd is undefined

Thierry Reding (1):
  userfaultfd: selftests: vm: pick up sanitized kernel headers

 arch/powerpc/include/asm/systbl.h        |  1 +
 arch/powerpc/include/asm/unistd.h        |  2 +-
 arch/powerpc/include/uapi/asm/unistd.h   |  1 +
 fs/userfaultfd.c                         |  8 ++---
 include/linux/wait.h                     |  5 ++-
 include/uapi/asm-generic/unistd.h        |  4 ++-
 kernel/sched/wait.c                      |  7 ++--
 net/sunrpc/sched.c                       |  2 +-
 tools/testing/selftests/vm/Makefile      |  9 +++--
 tools/testing/selftests/vm/userfaultfd.c | 61 ++++++++++++++++++--------------
 10 files changed, 57 insertions(+), 43 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 02/12] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" Andrea Arcangeli
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

It changed as a result of the linux-next merge of other syscalls.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 0c0b839..76071b1 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -69,7 +69,7 @@
 #ifdef __x86_64__
 #define __NR_userfaultfd 323
 #elif defined(__i386__)
-#define __NR_userfaultfd 359
+#define __NR_userfaultfd 374
 #elif defined(__powewrpc__)
 #define __NR_userfaultfd 364
 #else


* [PATCH 02/12] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers Andrea Arcangeli
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

This reverts commit 51360155eccb907ff8635bd10fc7de876408c2e0 and
adapts fs/userfaultfd.c to use the old version of that function.

It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we have full control of what
we insert in the two waitqueue heads of the blocked userfaults. No
exclusive waitqueue entry can ever be inserted into those two
waitqueue heads, so we may as well stick to the "nr == 1" of the old
code and rely purely on the fact that no waitqueue entry inserted
into either of the two heads that must behave as wakeall has
WQ_FLAG_EXCLUSIVE set in wait->flags.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 fs/userfaultfd.c     | 8 ++++----
 include/linux/wait.h | 5 ++---
 kernel/sched/wait.c  | 7 +++----
 net/sunrpc/sched.c   | 2 +-
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 634e676..15245e5 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -467,8 +467,8 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
 	 * the fault_*wqh.
 	 */
 	spin_lock(&ctx->fault_pending_wqh.lock);
-	__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, 0, &range);
-	__wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, 0, &range);
+	__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, &range);
+	__wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, &range);
 	spin_unlock(&ctx->fault_pending_wqh.lock);
 
 	wake_up_poll(&ctx->fd_wqh, POLLHUP);
@@ -650,10 +650,10 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 	spin_lock(&ctx->fault_pending_wqh.lock);
 	/* wake all in the range and autoremove */
 	if (waitqueue_active(&ctx->fault_pending_wqh))
-		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL, 0,
+		__wake_up_locked_key(&ctx->fault_pending_wqh, TASK_NORMAL,
 				     range);
 	if (waitqueue_active(&ctx->fault_wqh))
-		__wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, 0, range);
+		__wake_up_locked_key(&ctx->fault_wqh, TASK_NORMAL, range);
 	spin_unlock(&ctx->fault_pending_wqh.lock);
 }
 
diff --git a/include/linux/wait.h b/include/linux/wait.h
index d3d0772..1e1bf9f 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -147,8 +147,7 @@ __remove_wait_queue(wait_queue_head_t *head, wait_queue_t *old)
 
 typedef int wait_bit_action_f(struct wait_bit_key *);
 void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
-void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, int nr,
-			  void *key);
+void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key);
 void __wake_up_sync_key(wait_queue_head_t *q, unsigned int mode, int nr, void *key);
 void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr);
 void __wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr);
@@ -180,7 +179,7 @@ wait_queue_head_t *bit_waitqueue(void *, int);
 #define wake_up_poll(x, m)						\
 	__wake_up(x, TASK_NORMAL, 1, (void *) (m))
 #define wake_up_locked_poll(x, m)					\
-	__wake_up_locked_key((x), TASK_NORMAL, 1, (void *) (m))
+	__wake_up_locked_key((x), TASK_NORMAL, (void *) (m))
 #define wake_up_interruptible_poll(x, m)				\
 	__wake_up(x, TASK_INTERRUPTIBLE, 1, (void *) (m))
 #define wake_up_interruptible_sync_poll(x, m)				\
diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index 272d932..052e026 100644
--- a/kernel/sched/wait.c
+++ b/kernel/sched/wait.c
@@ -106,10 +106,9 @@ void __wake_up_locked(wait_queue_head_t *q, unsigned int mode, int nr)
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked);
 
-void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, int nr,
-			  void *key)
+void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key)
 {
-	__wake_up_common(q, mode, nr, 0, key);
+	__wake_up_common(q, mode, 1, 0, key);
 }
 EXPORT_SYMBOL_GPL(__wake_up_locked_key);
 
@@ -284,7 +283,7 @@ void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait,
 	if (!list_empty(&wait->task_list))
 		list_del_init(&wait->task_list);
 	else if (waitqueue_active(q))
-		__wake_up_locked_key(q, mode, 1, key);
+		__wake_up_locked_key(q, mode, key);
 	spin_unlock_irqrestore(&q->lock, flags);
 }
 EXPORT_SYMBOL(abort_exclusive_wait);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index b140c09..337ca85 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -297,7 +297,7 @@ static int rpc_complete_task(struct rpc_task *task)
 	clear_bit(RPC_TASK_ACTIVE, &task->tk_runstate);
 	ret = atomic_dec_and_test(&task->tk_count);
 	if (waitqueue_active(wq))
-		__wake_up_locked_key(wq, TASK_NORMAL, 1, &k);
+		__wake_up_locked_key(wq, TASK_NORMAL, &k);
 	spin_unlock_irqrestore(&wq->lock, flags);
 	return ret;
 }


* [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 02/12] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-09  2:48   ` Michael Ellerman
  2015-09-08 20:43 ` [PATCH 04/12] userfaultfd: selftest: headers fixup Andrea Arcangeli
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

From: Thierry Reding <treding@nvidia.com>

Add the usr/include subdirectory of the top-level tree to the include
path, and include headers without relative paths so that the sanitized
headers get picked up.  Otherwise the compiler will not be able to find
the linux/compiler.h header included by the non-sanitized
include/uapi/linux/userfaultfd.h.

While at it, make sure to only hardcode the syscall numbers on x86 and
PowerPC if they haven't been properly picked up from the headers.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/Makefile      | 2 +-
 tools/testing/selftests/vm/userfaultfd.c | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 0d68547..33f7bbf 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,6 +1,6 @@
 # Makefile for vm selftests
 
-CFLAGS = -Wall
+CFLAGS = -Wall -I ../../../../usr/include $(EXTRA_CFLAGS)
 BINARIES = compaction_test
 BINARIES += hugepage-mmap
 BINARIES += hugepage-shm
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 76071b1..2bf1fc3 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -64,8 +64,9 @@
 #include <sys/syscall.h>
 #include <sys/ioctl.h>
 #include <pthread.h>
-#include "../../../../include/uapi/linux/userfaultfd.h"
+#include <linux/userfaultfd.h>
 
+#ifndef __NR_userfaultfd
 #ifdef __x86_64__
 #define __NR_userfaultfd 323
 #elif defined(__i386__)
@@ -75,6 +76,7 @@
 #else
 #error "missing __NR_userfaultfd definition"
 #endif
+#endif
 
 static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
 


* [PATCH 04/12] userfaultfd: selftest: headers fixup
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (2 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 05/12] userfaultfd: selftest: only warn if __NR_userfaultfd is undefined Andrea Arcangeli
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Depend on "make headers_install" to create proper headers to include
and provide syscall numbers.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/Makefile      | 7 +++++--
 tools/testing/selftests/vm/userfaultfd.c | 8 --------
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 33f7bbf..5b066fc 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -13,8 +13,11 @@ BINARIES += userfaultfd
 all: $(BINARIES)
 %: %.c
 	$(CC) $(CFLAGS) -o $@ $^ -lrt
-userfaultfd: userfaultfd.c
-	$(CC) $(CFLAGS) -O2 -o $@ $^ -lpthread
+userfaultfd: userfaultfd.c ../../../../usr/include/linux/kernel.h
+	$(CC) $(CFLAGS) -O2 -o $@ $< -lpthread
+
+../../../../usr/include/linux/kernel.h:
+	make -C ../../../.. headers_install
 
 TEST_PROGS := run_vmtests
 TEST_FILES := $(BINARIES)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 2bf1fc3..23ba5f2 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -67,16 +67,8 @@
 #include <linux/userfaultfd.h>
 
 #ifndef __NR_userfaultfd
-#ifdef __x86_64__
-#define __NR_userfaultfd 323
-#elif defined(__i386__)
-#define __NR_userfaultfd 374
-#elif defined(__powewrpc__)
-#define __NR_userfaultfd 364
-#else
 #error "missing __NR_userfaultfd definition"
 #endif
-#endif
 
 static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
 


* [PATCH 05/12] userfaultfd: selftest: only warn if __NR_userfaultfd is undefined
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (3 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 04/12] userfaultfd: selftest: headers fixup Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc Andrea Arcangeli
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

From: Michael Ellerman <mpe@ellerman.id.au>

If __NR_userfaultfd is not yet defined by the arch, warn but still
build and run the userfaultfd selftest successfully.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 23ba5f2..0c7d66f 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -66,9 +66,7 @@
 #include <pthread.h>
 #include <linux/userfaultfd.h>
 
-#ifndef __NR_userfaultfd
-#error "missing __NR_userfaultfd definition"
-#endif
+#ifdef __NR_userfaultfd
 
 static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size;
 
@@ -628,3 +626,15 @@ int main(int argc, char **argv)
 	       nr_pages, nr_pages_per_cpu);
 	return userfaultfd_stress();
 }
+
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+	printf("skip: Skipping userfaultfd test (missing __NR_userfaultfd)\n");
+	return 0;
+}
+
+#endif /* __NR_userfaultfd */


* [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (4 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 05/12] userfaultfd: selftest: only warn if __NR_userfaultfd is undefined Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-09  2:50   ` Michael Ellerman
  2015-09-08 20:43 ` [PATCH 07/12] userfaultfd: selftest: Fix compiler warnings on 32-bit Andrea Arcangeli
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Keep a non-zero placeholder after the count, for the my_bcmp
comparison of the page against the zeropage. A lockless increment
from 255 to 256 racing with a lockless my_bcmp could otherwise
produce false positives on ppc32le.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 0c7d66f..c1e4fa1 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -462,6 +462,14 @@ static int userfaultfd_stress(void)
 		*area_mutex(area_src, nr) = (pthread_mutex_t)
 			PTHREAD_MUTEX_INITIALIZER;
 		count_verify[nr] = *area_count(area_src, nr) = 1;
+		/*
+		 * In the transition between 255 to 256, powerpc will
+		 * read out of order in my_bcmp and see both bytes as
+		 * zero, so leave a placeholder below always non-zero
+		 * after the count, to avoid my_bcmp to trigger false
+		 * positives.
+		 */
+		*(area_count(area_src, nr) + 1) = 1;
 	}
 
 	pipefd = malloc(sizeof(int) * nr_cpus * 2);
@@ -607,8 +615,8 @@ int main(int argc, char **argv)
 		fprintf(stderr, "Usage: <MiB> <bounces>\n"), exit(1);
 	nr_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 	page_size = sysconf(_SC_PAGE_SIZE);
-	if ((unsigned long) area_count(NULL, 0) + sizeof(unsigned long long) >
-	    page_size)
+	if ((unsigned long) area_count(NULL, 0) + sizeof(unsigned long long) * 2
+	    > page_size)
 		fprintf(stderr, "Impossible to run this test\n"), exit(2);
 	nr_pages_per_cpu = atol(argv[1]) * 1024*1024 / page_size /
 		nr_cpus;


* [PATCH 07/12] userfaultfd: selftest: Fix compiler warnings on 32-bit
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (5 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 08/12] userfaultfd: selftest: return an error if BOUNCE_VERIFY fails Andrea Arcangeli
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

From: Geert Uytterhoeven <geert@linux-m68k.org>

On 32-bit:

    userfaultfd.c: In function 'locking_thread':
    userfaultfd.c:152: warning: left shift count >= width of type
    userfaultfd.c: In function 'uffd_poll_thread':
    userfaultfd.c:295: warning: cast to pointer from integer of different size
    userfaultfd.c: In function 'uffd_read_thread':
    userfaultfd.c:332: warning: cast to pointer from integer of different size

Fix the shift warning by splitting the shift in two parts, and the
integer/pointer warnings by adding intermediate casts to "unsigned
long".

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index c1e4fa1..1089709 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -139,7 +139,8 @@ static void *locking_thread(void *arg)
 			if (sizeof(page_nr) > sizeof(rand_nr)) {
 				if (random_r(&rand, &rand_nr))
 					fprintf(stderr, "random_r 2 error\n"), exit(1);
-				page_nr |= ((unsigned long) rand_nr) << 32;
+				page_nr |= (((unsigned long) rand_nr) << 16) <<
+					   16;
 			}
 		} else
 			page_nr += 1;
@@ -282,7 +283,8 @@ static void *uffd_poll_thread(void *arg)
 				msg.event), exit(1);
 		if (msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
 			fprintf(stderr, "unexpected write fault\n"), exit(1);
-		offset = (char *)msg.arg.pagefault.address - area_dst;
+		offset = (char *)(unsigned long)msg.arg.pagefault.address -
+			 area_dst;
 		offset &= ~(page_size-1);
 		if (copy_page(offset))
 			userfaults++;
@@ -319,7 +321,8 @@ static void *uffd_read_thread(void *arg)
 		if (bounces & BOUNCE_VERIFY &&
 		    msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE)
 			fprintf(stderr, "unexpected write fault\n"), exit(1);
-		offset = (char *)msg.arg.pagefault.address - area_dst;
+		offset = (char *)(unsigned long)msg.arg.pagefault.address -
+			 area_dst;
 		offset &= ~(page_size-1);
 		if (copy_page(offset))
 			(*this_cpu_userfaults)++;


* [PATCH 08/12] userfaultfd: selftest: return an error if BOUNCE_VERIFY fails
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (6 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 07/12] userfaultfd: selftest: Fix compiler warnings on 32-bit Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 09/12] userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical Andrea Arcangeli
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

This will report the error in the exit code, in addition to the
fprintf.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 1089709..174f2fc 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -422,7 +422,7 @@ static int userfaultfd_stress(void)
 	struct uffdio_register uffdio_register;
 	struct uffdio_api uffdio_api;
 	unsigned long cpu;
-	int uffd_flags;
+	int uffd_flags, err;
 	unsigned long userfaults[nr_cpus];
 
 	if (posix_memalign(&area, page_size, nr_pages * page_size)) {
@@ -499,6 +499,7 @@ static int userfaultfd_stress(void)
 	pthread_attr_init(&attr);
 	pthread_attr_setstacksize(&attr, 16*1024*1024);
 
+	err = 0;
 	while (bounces--) {
 		unsigned long expected_ioctls;
 
@@ -583,8 +584,9 @@ static int userfaultfd_stress(void)
 					    area_dst + nr * page_size,
 					    sizeof(pthread_mutex_t))) {
 					fprintf(stderr,
-						"error mutex 2 %lu\n",
+						"error mutex %lu\n",
 						nr);
+					err = 1;
 					bounces = 0;
 				}
 				if (*area_count(area_dst, nr) != count_verify[nr]) {
@@ -593,6 +595,7 @@ static int userfaultfd_stress(void)
 						*area_count(area_src, nr),
 						count_verify[nr],
 						nr);
+					err = 1;
 					bounces = 0;
 				}
 			}
@@ -609,7 +612,7 @@ static int userfaultfd_stress(void)
 		printf("\n");
 	}
 
-	return 0;
+	return err;
 }
 
 int main(int argc, char **argv)


* [PATCH 09/12] userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (7 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 08/12] userfaultfd: selftest: return an error if BOUNCE_VERIFY fails Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd Andrea Arcangeli
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

On ppc big endian this check fails: the mutex doesn't necessarily
need to be identical for all pages after pthread_mutex_lock/unlock
cycles. The count verification (outside of the pthread_mutex_t
structure) suffices and is retained.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 tools/testing/selftests/vm/userfaultfd.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 174f2fc..d77ed41 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -580,15 +580,6 @@ static int userfaultfd_stress(void)
 		/* verification */
 		if (bounces & BOUNCE_VERIFY) {
 			for (nr = 0; nr < nr_pages; nr++) {
-				if (my_bcmp(area_dst,
-					    area_dst + nr * page_size,
-					    sizeof(pthread_mutex_t))) {
-					fprintf(stderr,
-						"error mutex %lu\n",
-						nr);
-					err = 1;
-					bounces = 0;
-				}
 				if (*area_count(area_dst, nr) != count_verify[nr]) {
 					fprintf(stderr,
 						"error area_count %Lu %Lu %lu\n",


* [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (8 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 09/12] userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-09  2:48   ` Michael Ellerman
  2015-09-08 20:43 ` [PATCH 11/12] userfaultfd: powerpc: implement syscall Andrea Arcangeli
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

With the userfaultfd syscall, the number of syscalls will be 365 on
PowerPC. Reflect the same in __NR_syscalls.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/powerpc/include/asm/unistd.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
index f4f8b66..4a055b6 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include <uapi/asm/unistd.h>
 
 
-#define __NR_syscalls		364
+#define __NR_syscalls		365
 
 #define __NR__exit __NR_exit
 #define NR_syscalls	__NR_syscalls


* [PATCH 11/12] userfaultfd: powerpc: implement syscall
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (9 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-08 20:43 ` [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64) Andrea Arcangeli
  2015-09-30 21:56 ` [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Mike Kravetz
  12 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Add userfaultfd to powerpc.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 arch/powerpc/include/asm/systbl.h      | 1 +
 arch/powerpc/include/uapi/asm/unistd.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/powerpc/include/asm/systbl.h b/arch/powerpc/include/asm/systbl.h
index 71f2b3f..4d65499 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -368,3 +368,4 @@ SYSCALL_SPU(memfd_create)
 SYSCALL_SPU(bpf)
 COMPAT_SYS(execveat)
 PPC64ONLY(switch_endian)
+SYSCALL_SPU(userfaultfd)
diff --git a/arch/powerpc/include/uapi/asm/unistd.h b/arch/powerpc/include/uapi/asm/unistd.h
index e4aa173..6ad58d4 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -386,5 +386,6 @@
 #define __NR_bpf		361
 #define __NR_execveat		362
 #define __NR_switch_endian	363
+#define __NR_userfaultfd	364
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */


* [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64)
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (10 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 11/12] userfaultfd: powerpc: implement syscall Andrea Arcangeli
@ 2015-09-08 20:43 ` Andrea Arcangeli
  2015-09-15 20:02   ` Andrew Morton
  2015-09-30 21:56 ` [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Mike Kravetz
  12 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-08 20:43 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen, Rik van Riel,
	Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add the userfaultfd syscalls to uapi asm-generic, it was tested with
postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 include/uapi/asm-generic/unistd.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index e016bd9..5e97468 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
+#define __NR_userfaultfd 282
+__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
 
 #undef __NR_syscalls
-#define __NR_syscalls 282
+#define __NR_syscalls 283
 
 /*
  * All syscalls below here should go away really,


* Re: [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers
  2015-09-08 20:43 ` [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers Andrea Arcangeli
@ 2015-09-09  2:48   ` Michael Ellerman
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2015-09-09  2:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Bamvor Zhang Jian, Bharata B Rao, Geert Uytterhoeven

On Tue, 2015-09-08 at 22:43 +0200, Andrea Arcangeli wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Add the usr/include subdirectory of the top-level tree to the include
> path, and make sure to include headers without relative paths to make sure
> the sanitized headers get picked up.  Otherwise the compiler will not be
> able to find the linux/compiler.h header included by the non- sanitized
> include/uapi/linux/userfaultfd.h.
> 
> While at it, make sure to only hardcode the syscall numbers on x86 and
> PowerPC if they haven't been properly picked up from the headers.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> Cc: Shuah Khan <shuahkh@osg.samsung.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  tools/testing/selftests/vm/Makefile      | 2 +-
>  tools/testing/selftests/vm/userfaultfd.c | 4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)

This is not perfect, but better than what's there, so:

Acked-by: Michael Ellerman <mpe@ellerman.id.au>

cheers



* Re: [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd
  2015-09-08 20:43 ` [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd Andrea Arcangeli
@ 2015-09-09  2:48   ` Michael Ellerman
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Ellerman @ 2015-09-09  2:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Bamvor Zhang Jian, Bharata B Rao, Geert Uytterhoeven

On Tue, 2015-09-08 at 22:43 +0200, Andrea Arcangeli wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> With userfaultfd syscall, the number of syscalls will be 365 on PowerPC.
> Reflect the same in __NR_syscalls.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  arch/powerpc/include/asm/unistd.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/unistd.h b/arch/powerpc/include/asm/unistd.h
> index f4f8b66..4a055b6 100644
> --- a/arch/powerpc/include/asm/unistd.h
> +++ b/arch/powerpc/include/asm/unistd.h
> @@ -12,7 +12,7 @@
>  #include <uapi/asm/unistd.h>
>  
> 
> -#define __NR_syscalls		364
> +#define __NR_syscalls		365

I guess technically it's OK for this to get bumped first, but we typically do
it in a single patch with the addition of the syscall number.

I'd rather do the syscall addition via my tree.

cheers




* Re: [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc
  2015-09-08 20:43 ` [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc Andrea Arcangeli
@ 2015-09-09  2:50   ` Michael Ellerman
  2015-09-09 17:02     ` Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Ellerman @ 2015-09-09  2:50 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Bamvor Zhang Jian, Bharata B Rao, Geert Uytterhoeven

On Tue, 2015-09-08 at 22:43 +0200, Andrea Arcangeli wrote:
> Keep a non-zero placeholder after the count, for the my_bcmp
> comparison of the page against the zeropage. The lockless increment
> between 255 to 256 against a lockless my_bcmp could otherwise return
> false positives on ppc32le.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)

Without grokking what the code is doing, this fix makes the test pass on my
ppc64le box.

So, if you like, have a:

Tested-by: Michael Ellerman <mpe@ellerman.id.au>

cheers


git:master@linux-next(I)> ssh -t lebuntu sudo ./userfaultfd 128 32
nr_pages: 2048, nr_pages_per_cpu: 128
bounces: 31, mode: rnd racing ver poll, userfaults: 56 390 40 33 32 32 26 127 29 8 7 10 12 4 5 2
bounces: 30, mode: racing ver poll, userfaults: 247 39 29 39 28 17 40 23 21 17 18 15 14 2 2 2
bounces: 29, mode: rnd ver poll, userfaults: 140 120 169 120 90 100 136 106 46 35 14 0 1 0 0 0
bounces: 28, mode: ver poll, userfaults: 61 64 36 74 30 40 24 45 11 18 10 5 12 9 4 0
bounces: 27, mode: rnd racing poll, userfaults: 145 34 84 18 38 23 57 11 13 8 18 15 6 3 5 3
bounces: 26, mode: racing poll, userfaults: 32 23 19 43 22 23 9 21 21 15 5 12 16 5 12 2
bounces: 25, mode: rnd poll, userfaults: 92 118 107 122 95 41 50 49 27 30 61 13 2 3 0 0
bounces: 24, mode: poll, userfaults: 92 64 66 31 40 25 33 52 38 24 33 12 20 4 3 1
bounces: 23, mode: rnd racing ver, userfaults: 43 50 62 42 25 13 52 17 11 27 7 3 2 5 3 0
bounces: 22, mode: racing ver, userfaults: 18 16 24 15 18 19 39 16 15 13 7 6 8 5 9 4
bounces: 21, mode: rnd ver, userfaults: 188 195 141 104 136 99 51 30 40 32 20 23 1 0 0 0
bounces: 20, mode: ver, userfaults: 55 63 79 60 37 35 37 21 24 26 29 18 23 17 4 0
bounces: 19, mode: rnd racing, userfaults: 54 44 36 31 59 36 59 37 25 13 18 15 6 11 2 0
bounces: 18, mode: racing, userfaults: 106 73 121 36 91 62 43 37 36 27 12 12 9 11 6 1
bounces: 17, mode: rnd, userfaults: 157 146 119 121 96 78 78 75 75 25 29 10 0 1 0 0
bounces: 16, mode:, userfaults: 99 105 88 86 100 27 26 40 26 20 36 23 31 15 14 2
bounces: 15, mode: rnd racing ver poll, userfaults: 69 38 27 38 29 53 45 26 30 33 15 16 23 6 3 1
bounces: 14, mode: racing ver poll, userfaults: 40 32 61 58 39 45 12 69 67 11 12 10 4 2 2 1
bounces: 13, mode: rnd ver poll, userfaults: 125 192 161 154 153 63 53 25 53 96 2 0 0 0 0 0
bounces: 12, mode: ver poll, userfaults: 45 48 39 21 107 74 25 27 12 31 14 10 4 4 4 3
bounces: 11, mode: rnd racing poll, userfaults: 175 70 251 29 31 21 25 21 17 25 4 12 10 8 3 1
bounces: 10, mode: racing poll, userfaults: 11 19 9 10 26 10 11 4 11 2 1 1 0 2 1 0
bounces: 9, mode: rnd poll, userfaults: 102 61 96 159 109 71 57 64 34 54 53 23 13 6 0 0
bounces: 8, mode: poll, userfaults: 46 23 24 46 35 43 37 37 28 9 10 23 13 13 5 1
bounces: 7, mode: rnd racing ver, userfaults: 152 51 34 30 39 48 20 26 25 20 12 9 10 8 5 11
bounces: 6, mode: racing ver, userfaults: 19 32 40 33 29 43 23 19 15 15 11 14 4 2 5 3
bounces: 5, mode: rnd ver, userfaults: 124 114 162 132 71 84 58 61 39 47 13 22 23 7 7 0
bounces: 4, mode: ver, userfaults: 123 112 46 19 35 29 17 8 24 10 17 14 18 11 13 9
bounces: 3, mode: rnd racing, userfaults: 61 48 57 54 56 51 19 32 10 5 19 11 4 6 1 1
bounces: 2, mode: racing, userfaults: 21 12 8 14 11 17 7 13 6 10 13 5 1 2 4 2
bounces: 1, mode: rnd, userfaults: 153 121 129 139 105 101 92 83 23 46 24 0 0 0 0 0
bounces: 0, mode:, userfaults: 67 58 59 67 36 55 14 12 14 23 15 9 4 1 0 3



* Re: [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc
  2015-09-09  2:50   ` Michael Ellerman
@ 2015-09-09 17:02     ` Andrea Arcangeli
  0 siblings, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-09 17:02 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Bamvor Zhang Jian, Bharata B Rao, Geert Uytterhoeven

On Wed, Sep 09, 2015 at 12:50:26PM +1000, Michael Ellerman wrote:
> On Tue, 2015-09-08 at 22:43 +0200, Andrea Arcangeli wrote:
> > Keep a non-zero placeholder after the count, for the my_bcmp
> > comparison of the page against the zeropage. The lockless increment
> > between 255 to 256 against a lockless my_bcmp could otherwise return
> > false positives on ppc32le.
> > 
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > ---
> >  tools/testing/selftests/vm/userfaultfd.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> > Without grokking what the code is doing, this fix makes the test pass on my
> ppc64le box.

On ppc, byte reads are "more" out of order, so you can read byte 1
before byte 0. This doesn't seem to happen on x86 (even though
smp_rmb() is not a noop on x86). The code didn't even have an
actual compiler barrier, but gcc is unlikely to unroll the loop in an
unordered way, so that wasn't a problem in practice; this patch takes
care of gcc too.

One side does an inc++ on a long long. The other side reads byte 1
before byte 0 and checks if they're both zero. When the long long
counter transitions from 255 to 256 and you read it in reverse on
little endian, you can see 0 0 (byte 1 read while the value was still
255, byte 0 read after it became 256), and so my_bcmp would think the
page is full of zeros.

my_bcmp checks that we didn't map a zeropage in there by mistake (as
would happen if userfaultfd wasn't registered on the anonymous
holes). So instead of relying on the counter always being > 0, I
added a further word after the counter that never changes and is never
zero, so we don't have to use any memory barrier for those out of
order checks.
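
To make the 255 -> 256 case concrete, here is a minimal userland sketch.
It is not the selftest code: the names store_le, torn_read_sees_zero,
looks_zero_with_placeholder and PLACEHOLDER are made up for illustration,
and the bad interleaving is replayed by hand on explicit little-endian
bytes rather than relying on real CPU reordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 8 counter bytes followed by one constant byte. */
#define COUNTER_BYTES 8
#define PLACEHOLDER   1	/* assumed non-zero, never-changing value */

/* Store v as little-endian bytes, independent of the host byte order. */
static void store_le(uint8_t *buf, uint64_t v)
{
	assert(buf);
	for (int i = 0; i < COUNTER_BYTES; i++)
		buf[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Replay the problematic interleaving: the reader loads byte 1 while
 * the counter is still 255, the writer then increments it to 256, and
 * only afterwards the reader loads byte 0.
 * Returns 1 if the reader saw both bytes as zero.
 */
static int torn_read_sees_zero(void)
{
	uint8_t counter[COUNTER_BYTES];
	uint8_t b0, b1;

	store_le(counter, 255);		/* bytes: ff 00 00 ... */
	b1 = counter[1];		/* reader: byte 1 first, 0x00 */
	store_le(counter, 256);		/* writer: bytes 00 01 00 ... */
	b0 = counter[0];		/* reader: byte 0 last, 0x00 */

	return b0 == 0 && b1 == 0;
}

/*
 * Same interleaving, but a constant non-zero byte follows the counter,
 * so the reader always finds at least one non-zero byte.
 * Returns 1 only if everything the reader saw was zero.
 */
static int looks_zero_with_placeholder(void)
{
	uint8_t page[COUNTER_BYTES + 1];
	uint8_t b0, b1, ph;

	store_le(page, 255);
	page[COUNTER_BYTES] = PLACEHOLDER;
	b1 = page[1];
	store_le(page, 256);
	b0 = page[0];
	ph = page[COUNTER_BYTES];

	return b0 == 0 && b1 == 0 && ph == 0;
}
```

The first helper shows the false "all zero" observation, the second shows
why a constant non-zero word makes the check independent of read ordering.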

On a side note (feel free to skip this part as it's userland): this is
also why I couldn't use bcmp: bcmp and memcmp can return false
positive zeroes if the memory changes under them. In the sse4 unrolled
loop, if it finds a difference, it can't tell whether it should return
> 0 or < 0 because it's a ptest and not a cmp insn, so it has to repeat
the memory comparison (re-reading from memory a potentially different
copy of the memory that could have become equal). The problem is that it
doesn't restart the unrolled loop from where it stopped if this final
comparison returns zero and it's not the last byte to compare. In
short, glibc memcmp/bcmp can very well return 0 before comparing all
the memory it is told to compare, if the memory is changing under
memcmp/bcmp.

To fix memcmp/bcmp in glibc it'd be enough to restart the unrolled loop
if the "length" isn't zero and the final comparison returned zero.
It's not getting fixed because it's not a bug by the C
standard, but it makes the SIMD accelerated bcmp/memcmp in glibc
unusable for lockless fast-path comparisons, as it would lead to false
positives that would degrade performance. For example, if glibc memcmp
was used to build the stable tree in KSM, it would lead to superfluous
write protections. KSM is in kernel of course, so it's not affected by
the glibc memcmp implementation, but similar things can happen in
userland (like I found out the hard way in the userfaultfd program).
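
For reference, the kind of comparison that avoids this is a plain forward
byte loop that reports a difference as soon as it sees one and never
re-reads memory. A minimal sketch along the lines of the selftest's
my_bcmp follows; the name bytes_differ is only illustrative, and the
volatile qualifier is an addition made here, on the assumption that it
keeps the compiler from turning the loop into a library call.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compare n bytes front to back, reading each byte exactly once.
 * Returns 1 at the first difference observed, 0 if every byte matched
 * at the moment it was read.
 */
static int bytes_differ(const volatile unsigned char *a,
			const volatile unsigned char *b, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			return 1;
	return 0;
}
```

A zero result still only means "equal byte by byte at read time", but a
non-zero result is never turned back into zero by a second pass.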

Thanks,
Andrea


* Re: [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64)
  2015-09-08 20:43 ` [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64) Andrea Arcangeli
@ 2015-09-15 20:02   ` Andrew Morton
  2015-09-15 20:20     ` Mathieu Desnoyers
  2015-09-15 20:47     ` Andrea Arcangeli
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2015-09-15 20:02 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: linux-mm, Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen,
	Rik van Riel, Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven, Mathieu Desnoyers

On Tue,  8 Sep 2015 22:43:30 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote:

> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add the userfaultfd syscalls to uapi asm-generic, it was tested with
> postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.
> 
> ...
>
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
>  __SYSCALL(__NR_bpf, sys_bpf)
>  #define __NR_execveat 281
>  __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> +#define __NR_userfaultfd 282
> +__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 282
> +#define __NR_syscalls 283

sys_membarrier got there first.  Does this version look OK?

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: userfaultfd: register uapi generic syscall (aarch64)

Add the userfaultfd syscalls to uapi asm-generic, it was tested with
postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/uapi/asm-generic/unistd.h |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff -puN include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64 include/uapi/asm-generic/unistd.h
--- a/include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64
+++ a/include/uapi/asm-generic/unistd.h
@@ -709,17 +709,19 @@ __SYSCALL(__NR_memfd_create, sys_memfd_c
 __SYSCALL(__NR_bpf, sys_bpf)
 #define __NR_execveat 281
 __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
-#define __NR_membarrier 282
+#define __NR_userfaultfd 282
+__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
+#define __NR_membarrier 283
 __SYSCALL(__NR_membarrier, sys_membarrier)
 
 #undef __NR_syscalls
-#define __NR_syscalls 283
+#define __NR_syscalls 284
 
 /*
  * All syscalls below here should go away really,
  * these are provided for both review and as a porting
  * help for the C library version.
-*
+ *
  * Last chance: are any of these important enough to
  * enable by default?
  */
_


* Re: [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64)
  2015-09-15 20:02   ` Andrew Morton
@ 2015-09-15 20:20     ` Mathieu Desnoyers
  2015-09-15 20:47     ` Andrea Arcangeli
  1 sibling, 0 replies; 25+ messages in thread
From: Mathieu Desnoyers @ 2015-09-15 20:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andrea Arcangeli, linux-mm, Pavel Emelyanov, zhang zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao, geert

----- On Sep 15, 2015, at 4:02 PM, Andrew Morton akpm@linux-foundation.org wrote:

> On Tue,  8 Sep 2015 22:43:30 +0200 Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> 
>> Add the userfaultfd syscalls to uapi asm-generic, it was tested with
>> postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.
>> 
>> ...
>>
>> --- a/include/uapi/asm-generic/unistd.h
>> +++ b/include/uapi/asm-generic/unistd.h
>> @@ -709,9 +709,11 @@ __SYSCALL(__NR_memfd_create, sys_memfd_create)
>>  __SYSCALL(__NR_bpf, sys_bpf)
>>  #define __NR_execveat 281
>>  __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
>> +#define __NR_userfaultfd 282
>> +__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
>>  
>>  #undef __NR_syscalls
>> -#define __NR_syscalls 282
>> +#define __NR_syscalls 283
> 
> sys_membarrier got there first.  Does this version look OK?

Hi Andrew,

Since userfaultfd also made it into 4.3-rc1, bumping the system
call number of sys_membarrier in asm-generic seems to be a good
approach to handle this conflict. We can probably expect conflicts
on other architectures too when architecture maintainers wire up
membarrier and userfaultfd.

Thanks!

Mathieu

> 
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Subject: userfaultfd: register uapi generic syscall (aarch64)
> 
> Add the userfaultfd syscalls to uapi asm-generic, it was tested with
> postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
> include/uapi/asm-generic/unistd.h |    8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff -puN
> include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64
> include/uapi/asm-generic/unistd.h
> ---
> a/include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64
> +++ a/include/uapi/asm-generic/unistd.h
> @@ -709,17 +709,19 @@ __SYSCALL(__NR_memfd_create, sys_memfd_c
> __SYSCALL(__NR_bpf, sys_bpf)
> #define __NR_execveat 281
> __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> -#define __NR_membarrier 282
> +#define __NR_userfaultfd 282
> +__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
> +#define __NR_membarrier 283
> __SYSCALL(__NR_membarrier, sys_membarrier)
> 
> #undef __NR_syscalls
> -#define __NR_syscalls 283
> +#define __NR_syscalls 284
> 
> /*
>  * All syscalls below here should go away really,
>  * these are provided for both review and as a porting
>  * help for the C library version.
> -*
> + *
>  * Last chance: are any of these important enough to
>  * enable by default?
>  */
> _

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


* Re: [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64)
  2015-09-15 20:02   ` Andrew Morton
  2015-09-15 20:20     ` Mathieu Desnoyers
@ 2015-09-15 20:47     ` Andrea Arcangeli
  1 sibling, 0 replies; 25+ messages in thread
From: Andrea Arcangeli @ 2015-09-15 20:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Pavel Emelyanov, zhang.zhanghailiang, Dave Hansen,
	Rik van Riel, Dr. David Alan Gilbert, Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven, Mathieu Desnoyers

On Tue, Sep 15, 2015 at 01:02:53PM -0700, Andrew Morton wrote:
> sys_membarrier got there first.  Does this version look OK?

Yes, but it's up to you.

While rebasing my tree on latest upstream I actually moved userfaultfd
to 283 here, as membarrier was already upstream at 282.

It makes no difference to me if userfaultfd gets 283 if you prefer to
leave 282 to membarrier considering it's already upstream. The
selftest will pick whatever number it gets with "make headers_install"
so it wouldn't require updates.
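
As a rough sketch of what that looks like from userland (assuming the
installed headers are recent enough to define __NR_userfaultfd; the
helper name open_userfaultfd is made up here and is not part of the
selftest):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Open a userfaultfd using whatever syscall number the installed
 * headers provide. Returns the fd on success or -errno on failure
 * (for instance -ENOSYS on a kernel without userfaultfd, or -EPERM
 * where unprivileged use is restricted).
 */
static int open_userfaultfd(void)
{
#ifdef __NR_userfaultfd
	long fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

	return fd < 0 ? -errno : (int)fd;
#else
	/* Headers predate the syscall: there is no number to call. */
	errno = ENOSYS;
	return -ENOSYS;
#endif
}
```

Nothing in the caller depends on whether the number ends up being 282 or
283, which is the point of taking it from the headers.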

Thanks,
Andrea

> 
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Subject: userfaultfd: register uapi generic syscall (aarch64)
> 
> Add the userfaultfd syscalls to uapi asm-generic, it was tested with
> postcopy live migration on aarch64 with both 4k and 64k pagesize kernels.

> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  include/uapi/asm-generic/unistd.h |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff -puN include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64 include/uapi/asm-generic/unistd.h
> --- a/include/uapi/asm-generic/unistd.h~userfaultfd-register-uapi-generic-syscall-aarch64
> +++ a/include/uapi/asm-generic/unistd.h
> @@ -709,17 +709,19 @@ __SYSCALL(__NR_memfd_create, sys_memfd_c
>  __SYSCALL(__NR_bpf, sys_bpf)
>  #define __NR_execveat 281
>  __SC_COMP(__NR_execveat, sys_execveat, compat_sys_execveat)
> -#define __NR_membarrier 282
> +#define __NR_userfaultfd 282
> +__SYSCALL(__NR_userfaultfd, sys_userfaultfd)
> +#define __NR_membarrier 283
>  __SYSCALL(__NR_membarrier, sys_membarrier)
>  
>  #undef __NR_syscalls
> -#define __NR_syscalls 283
> +#define __NR_syscalls 284
>  
>  /*
>   * All syscalls below here should go away really,
>   * these are provided for both review and as a porting
>   * help for the C library version.
> -*
> + *
>   * Last chance: are any of these important enough to
>   * enable by default?
>   */
> _
> 


* Re: [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
  2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
                   ` (11 preceding siblings ...)
  2015-09-08 20:43 ` [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64) Andrea Arcangeli
@ 2015-09-30 21:56 ` Mike Kravetz
  2015-10-01  0:06   ` Andrea Arcangeli
  12 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2015-09-30 21:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

On 09/08/2015 01:43 PM, Andrea Arcangeli wrote:
> Here are some pending updates for userfaultfd mostly to the self test,
> the rest are cleanups.

I have a potential use case for userfaultfd.  So, I started experimenting
with the self test code.  I replaced the posix_memalign() calls to allocate
area_src and area_dst with mmap().  mmap(MAP_PRIVATE | MAP_ANONYMOUS) works
as expected.  However, mmap(MAP_SHARED | MAP_ANONYMOUS) causes the test to
fail without any errors from the userfaultfd APIs.

--------------------
running userfaultfd
--------------------
nr_pages: 32768, nr_pages_per_cpu: 8192
bounces: 31, mode: rnd racing ver poll, page_nr 31523 wrong count 0 1

I would expect some type of error from the ioctl() that registers the
range, or perhaps the poll/copy code?  Just curious about the expected
behavior.

FYI - My use case is for hugetlbfs.  I would like a mechanism to catch all
new huge page allocations as a result of page faults.  I have some very
rough code to extend userfaultfd and add the required functionality to
hugetlbfs.  Still working on it.

-- 
Mike Kravetz


* Re: [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
  2015-09-30 21:56 ` [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Mike Kravetz
@ 2015-10-01  0:06   ` Andrea Arcangeli
  2015-10-01  0:42     ` Mike Kravetz
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-10-01  0:06 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Hello Mike,

On Wed, Sep 30, 2015 at 02:56:19PM -0700, Mike Kravetz wrote:
> On 09/08/2015 01:43 PM, Andrea Arcangeli wrote:
> > Here are some pending updates for userfaultfd mostly to the self test,
> > the rest are cleanups.
> 
> I have a potential use case for userfaultfd.  So, I started experimenting

Glad to hear you may have one more use case.

On a side note, there's also a patch posted to CRIU to lazily page in
anonymous memory during restore using userfaultfd; that's yet another
recent user.

> with the self test code.  I replaced the posix_memalign() calls to allocate
> area_src and area_dst with mmap().  mmap(MAP_PRIVATE | MAP_ANONYMOUS) works
> as expected.  However, mmap(MAP_SHARED | MAP_ANONYMOUS) causes the test to
> fail without any errors from the userfaultfd APIs.
> 
> --------------------
> running userfaultfd
> --------------------
> nr_pages: 32768, nr_pages_per_cpu: 8192
> bounces: 31, mode: rnd racing ver poll, page_nr 31523 wrong count 0 1
> 
> I would expect some type of error from the ioctl() that registers the
> range, or perhaps the poll/copy code?  Just curious about the expected
> behavior.

That should return an error during UFFDIO_REGISTER and the testcase
shouldn't start, not sure what went wrong. Can you send the
modification to the testcase?

UFFDIO_REGISTER is the point where userfaultfd is first told which
kind of memory you want to manage with userfaults. It was planned to
fail there (and it cannot fail any earlier).

This check has to fail and return -EINVAL in the ioctl(UFFDIO_REGISTER).

		/* check not compatible vmas */
		ret = -EINVAL;
		if (cur->vm_ops)
			goto out_unlock;

In the testcase you should get an exit 1 and the fprintf printed:

		if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
			fprintf(stderr, "register failure\n");
			return 1;
		}

Could you double check these two paths to find what's wrong?
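
For a quick standalone check outside the selftest, something along these
lines could be used. It is only a sketch: try_register_shared is a
made-up helper, it assumes <linux/userfaultfd.h> and __NR_userfaultfd are
available from the installed headers, and it only reports what the
running kernel answers. On the kernel discussed here the expectation is
-EINVAL from UFFDIO_REGISTER; a kernel that supports userfaultfd on
shared memory would presumably return 0 instead.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Try to register a MAP_SHARED | MAP_ANONYMOUS range for missing-page
 * tracking. len must be a multiple of the page size.
 * Returns 0 if the kernel accepted the range, or -errno from the first
 * step that failed.
 */
static int try_register_shared(size_t len)
{
	struct uffdio_api api;
	struct uffdio_register reg;
	void *area;
	int uffd, ret = 0;

	uffd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (uffd < 0)
		return -errno;

	/* Handshake: must happen before any other ioctl on the fd. */
	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	if (ioctl(uffd, UFFDIO_API, &api)) {
		ret = -errno;
		goto out_close;
	}

	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (uintptr_t)area;
	reg.range.len = len;
	reg.mode = UFFDIO_REGISTER_MODE_MISSING;
	if (ioctl(uffd, UFFDIO_REGISTER, &reg))
		ret = -errno;	/* expected here: -EINVAL */

	munmap(area, len);
out_close:
	close(uffd);
	return ret;
}
```

Calling it with one page, e.g. try_register_shared(sysconf(_SC_PAGESIZE)),
and printing the result should show which of the two paths above is
taken.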

> FYI - My use case is for hugetlbfs.  I would like a mechanism to catch all
> new huge page allocations as a result of page faults.  I have some very
> rough code to extend userfaultfd and add the required functionality to
> hugetlbfs.  Still working on it.

Adding support for hugetlbfs sounds great to me.

Only anonymous memory has null vm_ops, so once you extend the code to
track hugetlbfs (tracking at least tmpfs and not just anonymous memory
is needed for volatile pages which also work on tmpfs) you should
relax the above check to accept &hugetlb_vm_ops.

You then need to report, in the uffdio_register->ioctls field, which
ioctls the current kernel supports for the kind of memory that was
registered.

		/*
		 * Now that we scanned all vmas we can already tell
		 * userland which ioctls methods are guaranteed to
		 * succeed on this range.
		 */
		if (put_user(UFFD_API_RANGE_IOCTLS,
			     &user_uffdio_register->ioctls))
			ret = -EFAULT;

#define UFFD_API_RANGE_IOCTLS			\
	((__u64)1 << _UFFDIO_WAKE |		\
	 (__u64)1 << _UFFDIO_COPY |		\
	 (__u64)1 << _UFFDIO_ZEROPAGE)

hugetlbfs doesn't seem to support the zeropage. So if vma->vm_ops ==
&hugetlb_vm_ops, it should return only WAKE|COPY in
uffdio_register->ioctls.
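
As an illustration of the userland side, here is a minimal sketch of how
an app could inspect the ioctls mask returned by UFFDIO_REGISTER before
relying on UFFDIO_ZEROPAGE. The SKETCH_* constants are local copies that
are assumed to match the ioctl numbers in linux/userfaultfd.h, and both
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the ioctl numbers, assumed to match linux/userfaultfd.h. */
enum {
	SKETCH_UFFDIO_WAKE	= 0x02,
	SKETCH_UFFDIO_COPY	= 0x03,
	SKETCH_UFFDIO_ZEROPAGE	= 0x04,
};

/* Hypothetical helper: true if the registered range accepts ioctl number nr. */
int range_supports(uint64_t ioctls, unsigned int nr)
{
	assert(nr < 64);
	return (ioctls & ((uint64_t)1 << nr)) != 0;
}

/*
 * Mask a hugetlbfs range would report under the proposal above:
 * WAKE and COPY, but no ZEROPAGE.
 */
uint64_t hugetlb_range_ioctls(void)
{
	return (uint64_t)1 << SKETCH_UFFDIO_WAKE |
	       (uint64_t)1 << SKETCH_UFFDIO_COPY;
}
```

An app would call range_supports(uffdio_register.ioctls, ...) right after
a successful UFFDIO_REGISTER and pick a fallback if a bit is missing.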

hugetlbfs is non standard, there's no sysconf(_SC_PAGE_SIZE) to know
the minimum granularity supported by the UFFDIO_COPY|WAKE of
hugetlbfs. This is a generic issue with hugetlbfs, not really related
to userfaultfd. The same hugetlbfs constraints on minimum granularity
and alignment apply to all other memory management syscalls too.

So the app itself using hugetlbfs will have to know by other means
(i.e. sysfs mangling) that the minimum granularity supported by
UFFDIO_COPY is 2MB (or 1GB). That is again because it registered
userfaultfd on hugetlbfs, and hugetlbfs has non standard
constraints. In turn UFFDIO_COPY of hugetlbfs has to fail if len is
not a multiple of 2MB (never the case for all other kinds of memory
that userfaultfd could ever manage).
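
A minimal sketch of what that could look like in the app, assuming it
learns the default huge page size from the "Hugepagesize:" line of
/proc/meminfo (one possible "other means", a system may offer several
huge page sizes) and then validates a range before issuing UFFDIO_COPY.
Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a "Hugepagesize:    2048 kB" line; returns bytes, or 0 if no match. */
uint64_t parse_hugepagesize(const char *line)
{
	unsigned long long kb;

	if (sscanf(line, "Hugepagesize: %llu kB", &kb) != 1)
		return 0;
	return (uint64_t)kb * 1024;
}

/*
 * True if [start, start + len) is non-empty and both start and len are
 * multiples of gran, which must be a power of two.
 */
int uffd_range_aligned(uint64_t start, uint64_t len, uint64_t gran)
{
	assert(gran != 0 && (gran & (gran - 1)) == 0);
	return len != 0 && ((start | len) & (gran - 1)) == 0;
}
```

With a 2MB huge page size, a 4096 byte copy is rejected by this check
in userland, mirroring the failure the kernel side would have to return.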

There's flexibility in the userfaultfd API to gradually expand the
coverage to a variety of types of virtual memory while at the same
time not risking random behavior from a new app if run on an old
kernel. The new app will be able to reliably tell the user to
upgrade the kernel (or it can fall back to a non-userfaultfd mode with
just a warning to the user).

We need to handle the write protection faults too as soon as possible
(VM_UFFD_WP/UFFD_FEATURE_PAGEFAULT_FLAG_WP). The uffdio_api->features
are already prepared to report to userland the availability of the
UFFD_FEATURE_PAGEFAULT_FLAG_WP. Then the app can set
UFFDIO_REGISTER_MODE_WP in uffdio_register.mode.
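
A minimal sketch of that negotiation from the app's point of view,
assuming the features word comes from a prior UFFDIO_API call. The
SKETCH_* constants are local copies assumed to match
linux/userfaultfd.h, and pick_register_mode() is a hypothetical helper.

```c
#include <stdint.h>

/* Local mirrors, assumed to match linux/userfaultfd.h. */
#define SKETCH_FEATURE_PAGEFAULT_FLAG_WP	((uint64_t)1 << 0)
#define SKETCH_REGISTER_MODE_MISSING		((uint64_t)1 << 0)
#define SKETCH_REGISTER_MODE_WP			((uint64_t)1 << 1)

/*
 * Hypothetical helper: choose uffdio_register.mode from the
 * uffdio_api.features reported by the running kernel. Missing-page
 * tracking is always requested; write protection only if advertised.
 */
uint64_t pick_register_mode(uint64_t features)
{
	uint64_t mode = SKETCH_REGISTER_MODE_MISSING;

	if (features & SKETCH_FEATURE_PAGEFAULT_FLAG_WP)
		mode |= SKETCH_REGISTER_MODE_WP;
	return mode;
}
```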

I mentioned this because while there's flexibility to expand the
coverage gradually, it'd be great if all kinds of memory supporting
UFFDIO_REGISTER_MODE_MISSING would also support
UFFDIO_REGISTER_MODE_WP once that becomes available, as it'd keep
userfaultfd_register() a bit simpler to maintain.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
  2015-10-01  0:06   ` Andrea Arcangeli
@ 2015-10-01  0:42     ` Mike Kravetz
  2015-10-01 16:04       ` Andrea Arcangeli
  0 siblings, 1 reply; 25+ messages in thread
From: Mike Kravetz @ 2015-10-01  0:42 UTC (permalink / raw)
  To: Andrea Arcangeli, Mike Kravetz
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

On 09/30/2015 05:06 PM, Andrea Arcangeli wrote:
> Hello Mike,
> 
> On Wed, Sep 30, 2015 at 02:56:19PM -0700, Mike Kravetz wrote:
>> On 09/08/2015 01:43 PM, Andrea Arcangeli wrote:
>>> Here are some pending updates for userfaultfd mostly to the self test,
>>> the rest are cleanups.
>>
>> I have a potential use case for userfaultfd.  So, I started experimenting
> 
> Glad to hear you may have one more use case.
> 
> On a side note, there's also a patch posted to CRIU to pagein lazily
> anonymous memory during restore using userfaultfd, that's yet another
> recent user.
> 
>> with the self test code.  I replaced the posix_memalign() calls to allocate
>> area_src and area_dst with mmap().  mmap(MAP_PRIVATE | MAP_ANONYMOUS) works
>> as expected.  However, mmap(MAP_SHARED | MAP_ANONYMOUS) causes the test to
>> fail without any errors from the userfaultfd APIs.
>>
>> --------------------
>> running userfaultfd
>> --------------------
>> nr_pages: 32768, nr_pages_per_cpu: 8192
>> bounces: 31, mode: rnd racing ver poll, page_nr 31523 wrong count 0 1
>>
>> I would expect some type of error from the ioctl() that registers the
>> range, or perhaps the poll/copy code?  Just curious about the expected
>> behavior.
> 
> That should return an error during UFFDIO_REGISTER and the testcase
> shouldn't start, not sure what went wrong. Can you send the
> modification to the testcase?
> 
> UFFDIO_REGISTER is the point where userfaultfd is first told which
> kind of memory you want to manage with userfaults. It was planned to
> fail there (and it cannot fail any earlier).
> 
> This check has to fail and return -EINVAL in the ioctl(UFFDIO_REGISTER).
> 
> 		/* check not compatible vmas */
> 		ret = -EINVAL;
> 		if (cur->vm_ops)
> 			goto out_unlock;
> 
> In the testcase you should get an exit 1 and the fprintf printed:
> 
> 		if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
> 			fprintf(stderr, "register failure\n");
> 			return 1;
> 		}
> 
> Could you double check these two paths to find what's wrong?

My apologies!!!!

This was running in my hacked up kernel.  I removed the cur->vm_ops check
as a quick and dirty way for me to register hugetlb vmas.  Sorry, I forgot
I was still running this kernel.

>> FYI - My use case is for hugetlbfs.  I would like a mechanism to catch all
>> new huge page allocations as a result of page faults.  I have some very
>> rough code to extend userfaultfd and add the required functionality to
>> hugetlbfs.  Still working on it.
> 
> Adding support for hugetlbfs sounds great to me.

The use case I have is pretty simple.  Recently, fallocate hole punch
support was added to hugetlbfs.  The reason for this is that the database
people want to 'free up' huge pages they know will no longer be used.
However, these huge pages are part of SGA areas sometimes mapped by tens
of thousands of tasks.  They would like to 'catch' any tasks that
(incorrectly) fault in a page after hole punch.  The thought is that
this can be done with userfaultfd by registering these mappings with
UFFDIO_REGISTER_MODE_MISSING.  No need for UFFDIO_COPY or UFFDIO_ZEROPAGE.
We would just send a signal to the task (such as SIGBUS) and then do
a UFFDIO_WAKE.  The only downside to this approach is having thousands
of threads monitoring userfault fds to catch a database error condition.
I believe the MADV_USERFAULT/NOUSERFAULT code you proposed some time back
would be the ideal solution for this use case.  Unfortunately, I did not
know of this use case or your proposal back then. :(

-- 
Mike Kravetz

> 
> Only anonymous memory has null vm_ops, so once you extend the code to
> track hugetlbfs (tracking at least tmpfs and not just anonymous memory
> is needed for volatile pages which also work on tmpfs) you should
> relax the above check to accept &hugetlb_vm_ops.
> 
> You then need to report which ioctls the current kernel supports for
> the kind of memory that was registered, in the uffdio_register->ioctls
> field.
> 
> 		/*
> 		 * Now that we scanned all vmas we can already tell
> 		 * userland which ioctls methods are guaranteed to
> 		 * succeed on this range.
> 		 */
> 		if (put_user(UFFD_API_RANGE_IOCTLS,
> 			     &user_uffdio_register->ioctls))
> 			ret = -EFAULT;
> 
> #define UFFD_API_RANGE_IOCTLS			\
> 	((__u64)1 << _UFFDIO_WAKE |		\
> 	 (__u64)1 << _UFFDIO_COPY |		\
> 	 (__u64)1 << _UFFDIO_ZEROPAGE)
> 
> hugetlbfs doesn't seem to support the zeropage. So if vma->vm_ops ==
> &hugetlb_vm_ops, it should return only WAKE|COPY in
> uffdio_register->ioctls.
> 
> hugetlbfs is non standard, there's no sysconf(_SC_PAGE_SIZE) to know
> the minimum granularity supported by the UFFDIO_COPY|WAKE of
> hugetlbfs. This is a generic issue with hugetlbfs, not really related
> to userfaultfd. The same hugetlbfs constraints on minimum granularity
> and alignment apply to all other memory management syscalls too.
> 
> So the app itself using hugetlbfs will have to know by other means
> (i.e. sysfs mangling) that the minimum granularity supported by
> UFFDIO_COPY is 2MB (or 1GB). That is again because it registered
> userfaultfd on hugetlbfs, and hugetlbfs has non standard
> constraints. In turn UFFDIO_COPY of hugetlbfs has to fail if len is
> not a multiple of 2MB (never the case for all other kinds of memory
> that userfaultfd could ever manage).
> 
> There's flexibility in the userfaultfd API to gradually expand the
> coverage to a variety of types of virtual memory while at the same
> time not risking random behavior from a new app if run on an old
> kernel. The new app will be able to reliably tell the user to
> upgrade the kernel (or it can fall back to a non-userfaultfd mode with
> just a warning to the user).
> 
> We need to handle the write protection faults too as soon as possible
> (VM_UFFD_WP/UFFD_FEATURE_PAGEFAULT_FLAG_WP). The uffdio_api->features
> are already prepared to report to userland the availability of the
> UFFD_FEATURE_PAGEFAULT_FLAG_WP. Then the app can set
> UFFDIO_REGISTER_MODE_WP in uffdio_register.mode.
> 
> I mentioned this because while there's flexibility to expand the
> coverage gradually, it'd be great if all kinds of memory supporting
> UFFDIO_REGISTER_MODE_MISSING would also support
> UFFDIO_REGISTER_MODE_WP once that becomes available, as it'd keep
> userfaultfd_register() a bit simpler to maintain.
> 
> Thanks,
> Andrea

* Re: [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
  2015-10-01  0:42     ` Mike Kravetz
@ 2015-10-01 16:04       ` Andrea Arcangeli
  2015-10-01 16:45         ` Mike Kravetz
  0 siblings, 1 reply; 25+ messages in thread
From: Andrea Arcangeli @ 2015-10-01 16:04 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Hello Mike,

On Wed, Sep 30, 2015 at 05:42:09PM -0700, Mike Kravetz wrote:
> The use case I have is pretty simple.  Recently, fallocate hole punch
> support was added to hugetlbfs.  The reason for this is that the database
> people want to 'free up' huge pages they know will no longer be used.
> However, these huge pages are part of SGA areas sometimes mapped by tens
> of thousands of tasks.  They would like to 'catch' any tasks that
> (incorrectly) fault in a page after hole punch.  The thought is that
> this can be done with userfaultfd by registering these mappings with
> UFFDIO_REGISTER_MODE_MISSING.  No need for UFFDIO_COPY or UFFDIO_ZEROPAGE.
> We would just send a signal to the task (such as SIGBUS) and then do
> a UFFDIO_WAKE.  The only downside to this approach is having thousands
> of threads monitoring userfault fds to catch a database error condition.
> I believe the MADV_USERFAULT/NOUSERFAULT code you proposed some time back
> would be the ideal solution for this use case.  Unfortunately, I did not
> know of this use case or your proposal back then. :(

I see how MADV_USERFAULT would have been lighter weight, since it
avoids allocating anon file structures and the associated anon
inode, but it's no big deal. A few thousand files are lost in the
noise in terms of memory footprint and there will be no performance
difference.

Note also that adding back MADV_USERFAULT always remains possible but
you can avoid all those threads even with the userfaultfd API. CRIU
and postcopy live migration of containers are also going to use a
similar logic (and for them MADV_USERFAULT API would not be enough).

Even in light of this, I don't think MADV_USERFAULT was worth
saving: it was too flaky when you deal with copy-user or GUP failing
in the context of read/write or other syscalls that just return
-EFAULT and are not restartable by signals if the page fault fails.
Not to mention it requires going back to userland and back into the
kernel in order to run the sigbus handler; userfaultfd optimizes that
away. Last but not least, a communication channel between the sigbus
handler and the userfault handler thread would need to be allocated
manually by userland anyway. With userfaultfd it's the kernel that
talks directly to the userfault handler thread so there's no need to
maintain another communication channel because the userfaultfd
provides for it in a more efficient way.

If you have a parent alive of all those processes waiting for SIGCHLD
to reap the zombies, you can send the userfaultfd of the child to a
thread in the parent using unix domain sockets, then you can release
the fd in the child. Then the uffd will be pollable in the parent and
it'll still work on the child "mm" as if it was a thread per-child
handling it. A single parent thread (or even the main process thread
itself if it's using an epoll driven loop) can poll all children. If
doing it with a separate thread cloned by the parent, there's no need
for epoll in your case, as you only get woken in case of memory
corruption and failure to cleanup and report.
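
A minimal sketch of the fd handoff described above, using SCM_RIGHTS
over an already connected AF_UNIX socket. The function names are
hypothetical and the uffd is treated as an ordinary file descriptor;
error handling is kept to the bare minimum.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_cmsg {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send fd over sock with one dummy payload byte. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
	char dummy = 'F';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from sock. Returns the new fd or -1. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Child side: pass the uffd to the parent, then release the local copy. */
int handoff_uffd(int sock, int uffd)
{
	if (send_fd(sock, uffd))
		return -1;
	return close(uffd);
}
```

The child would call handoff_uffd() once after UFFDIO_REGISTER, and the
parent thread would add each fd returned by recv_fd() to its poll set.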

Once a uffd gets woken you can send any signal to the child to kill
it (note that only SIGKILL is reliable to kill a task stuck in
handle_userfault() because if the userfault happened inside a syscall
all other signals can't run until the child is woken by
UFFDIO_WAKE). SIGKILL always works reliably at killing a task stuck in
userfault no matter if it was originated by userland or not. To
decrease the latency of signals and to allow gdb/strace to work
seamlessly in most cases, we allowed signals to interrupt a blocked
userfault if it originated in userland and in turn it will be retried
immediately after the signal sigreturns. It'll be as if no page
fault had happened yet by the time the signal returns. You don't want
to depend on this as you won't know if the handle_userfault() was
originated by a userland or kernel page fault.

When a SIGCHLD is received by the parent and you call one of the
wait() variants to reap the zombie, you also close the associated uffd
to release the memory of the child.
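
A minimal sketch of the parent side of this flow, assuming one pollfd
entry per child uffd. Both helper names are hypothetical, and a real
monitor would likely read the pending event from the uffd to log the
faulting address before killing the child.

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the index of the first readable fd, or -1 on timeout or error. */
int first_ready(struct pollfd *pfd, nfds_t n, int timeout_ms)
{
	nfds_t i;

	if (poll(pfd, n, timeout_ms) <= 0)
		return -1;
	for (i = 0; i < n; i++)
		if (pfd[i].revents & POLLIN)
			return (int)i;
	return -1;
}

/*
 * SIGKILL the child (the only signal that is reliable for a task
 * blocked in a userfault), reap it, then close its uffd.
 * Returns the terminating signal number, or -1 on failure.
 */
int kill_and_reap(pid_t pid, int uffd)
{
	int status;

	if (pid <= 0 || kill(pid, SIGKILL))
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	close(uffd);
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```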

Alternatively if you are satisfied with just a hang instead of ending
up with memory corruption, you can just register it in the child and
leave the uffd open without ever polling it. If you've a watchdog in
the parent process detecting tasks in S state not responding you can
still detect the corruption case by looking in /proc/pid/stack, you'll
see it hung in handle_userfault(). This won't provide for an accurate
error message though but it'd be the simplest to deploy. It'll still
provide for a fully safe avoidance of memory corruption and it may be
enough considering what would happen if the userfault wasn't armed.
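
A minimal sketch of the check such a watchdog could run, assuming it
has already opened /proc/<pid>/stack (which typically needs privileges)
as a stdio stream. The helper name is hypothetical and the exact layout
of the stack dump is not relied upon, only the symbol name.

```c
#include <stdio.h>
#include <string.h>

/* True if any line of the stack dump mentions handle_userfault. */
int stack_in_userfault(FILE *f)
{
	char line[256];

	while (fgets(line, sizeof(line), f))
		if (strstr(line, "handle_userfault"))
			return 1;
	return 0;
}
```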

Thanks,
Andrea

* Re: [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+
  2015-10-01 16:04       ` Andrea Arcangeli
@ 2015-10-01 16:45         ` Mike Kravetz
  0 siblings, 0 replies; 25+ messages in thread
From: Mike Kravetz @ 2015-10-01 16:45 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andrew Morton, linux-mm, Pavel Emelyanov, zhang.zhanghailiang,
	Dave Hansen, Rik van Riel, Dr. David Alan Gilbert,
	Huangpeng (Peter),
	Michael Ellerman, Bamvor Zhang Jian, Bharata B Rao,
	Geert Uytterhoeven

Thanks for the detailed explanation Andrea

On 10/01/2015 09:04 AM, Andrea Arcangeli wrote:
> Hello Mike,
> 
> On Wed, Sep 30, 2015 at 05:42:09PM -0700, Mike Kravetz wrote:
>> The use case I have is pretty simple.  Recently, fallocate hole punch
>> support was added to hugetlbfs.  The reason for this is that the database
>> people want to 'free up' huge pages they know will no longer be used.
>> However, these huge pages are part of SGA areas sometimes mapped by tens
>> of thousands of tasks.  They would like to 'catch' any tasks that
>> (incorrectly) fault in a page after hole punch.  The thought is that
>> this can be done with userfaultfd by registering these mappings with
>> UFFDIO_REGISTER_MODE_MISSING.  No need for UFFDIO_COPY or UFFDIO_ZEROPAGE.
>> We would just send a signal to the task (such as SIGBUS) and then do
>> a UFFDIO_WAKE.  The only downside to this approach is having thousands
>> of threads monitoring userfault fds to catch a database error condition.
>> I believe the MADV_USERFAULT/NOUSERFAULT code you proposed some time back
>> would be the ideal solution for this use case.  Unfortunately, I did not
>> know of this use case or your proposal back then. :(
> 
> I see how MADV_USERFAULT would have been lighter weight, since it
> avoids allocating anon file structures and the associated anon
> inode, but it's no big deal. A few thousand files are lost in the
> noise in terms of memory footprint and there will be no performance
> difference.
> 
> Note also that adding back MADV_USERFAULT always remains possible but
> you can avoid all those threads even with the userfaultfd API. CRIU
> and postcopy live migration of containers are also going to use a
> similar logic (and for them MADV_USERFAULT API would not be enough).
> 
> Even in light of this, I don't think MADV_USERFAULT was worth
> saving: it was too flaky when you deal with copy-user or GUP failing
> in the context of read/write or other syscalls that just return
> -EFAULT and are not restartable by signals if the page fault fails.
> Not to mention it requires going back to userland and back into the
> kernel in order to run the sigbus handler; userfaultfd optimizes that
> away. Last but not least, a communication channel between the sigbus
> handler and the userfault handler thread would need to be allocated
> manually by userland anyway. With userfaultfd it's the kernel that
> talks directly to the userfault handler thread so there's no need to
> maintain another communication channel because the userfaultfd
> provides for it in a more efficient way.
> 
> If you have a parent alive of all those processes waiting for SIGCHLD
> to reap the zombies, you can send the userfaultfd of the child to a
> thread in the parent using unix domain sockets, then you can release
> the fd in the child. Then the uffd will be pollable in the parent and
> it'll still work on the child "mm" as if it was a thread per-child
> handling it. A single parent thread (or even the main process thread
> itself if it's using an epoll driven loop) can poll all children. If
> doing it with a separate thread cloned by the parent, there's no need
> for epoll in your case, as you only get woken in case of memory
> corruption and failure to cleanup and report.

Yes, it was my intention to try and consolidate userfault fd polling to
several threads using this method.

> Once a uffd gets woken you can send any signal to the child to kill
> it (note that only SIGKILL is reliable to kill a task stuck in
> handle_userfault() because if the userfault happened inside a syscall
> all other signals can't run until the child is woken by
> UFFDIO_WAKE). SIGKILL always works reliably at killing a task stuck in
> userfault no matter if it was originated by userland or not. To
> decrease the latency of signals and to allow gdb/strace to work
> seamlessly in most cases, we allowed signals to interrupt a blocked
> userfault if it originated in userland and in turn it will be retried
> immediately after the signal sigreturns. It'll be as if no page
> fault had happened yet by the time the signal returns. You don't want
> to depend on this as you won't know if the handle_userfault() was
> originated by a userland or kernel page fault.

Thanks.  I was not aware of this issue.

> When a SIGCHLD is received by the parent and you call one of the
> wait() variants to reap the zombie, you also close the associated uffd
> to release the memory of the child.
> 
> Alternatively if you are satisfied with just a hang instead of ending
> up with memory corruption, you can just register it in the child and
> leave the uffd open without ever polling it. If you've a watchdog in
> the parent process detecting tasks in S state not responding you can
> still detect the corruption case by looking in /proc/pid/stack, you'll
> see it hung in handle_userfault(). This won't provide for an accurate
> error message though but it'd be the simplest to deploy. It'll still
> provide for a fully safe avoidance of memory corruption and it may be
> enough considering what would happen if the userfault wasn't armed.

I need to talk with the database folks about this.  Pretty sure they want
to be signaled in this case.  However, it does make me wonder what type
of 'recovery' is possible in the thread accessing data that should no
longer be valid.  I am pretty sure this would be a rare occurrence.  They
only want the ability to catch potential bugs in their code.  Ideally,
this never happens.  This is why there is some concern about the resources
necessary (per-process userfault fd and polling thread) for something that
hopefully never happens.

-- 
Mike Kravetz

> Thanks,
> Andrea

end of thread, other threads:[~2015-10-01 16:45 UTC | newest]

Thread overview: 25+ messages
2015-09-08 20:43 [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 01/12] userfaultfd: selftest: update userfaultfd x86 32bit syscall number Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 02/12] userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 03/12] userfaultfd: selftests: vm: pick up sanitized kernel headers Andrea Arcangeli
2015-09-09  2:48   ` Michael Ellerman
2015-09-08 20:43 ` [PATCH 04/12] userfaultfd: selftest: headers fixup Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 05/12] userfaultfd: selftest: only warn if __NR_userfaultfd is undefined Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 06/12] userfaultfd: selftest: avoid my_bcmp false positives with powerpc Andrea Arcangeli
2015-09-09  2:50   ` Michael Ellerman
2015-09-09 17:02     ` Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 07/12] userfaultfd: selftest: Fix compiler warnings on 32-bit Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 08/12] userfaultfd: selftest: return an error if BOUNCE_VERIFY fails Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 09/12] userfaultfd: selftest: don't error out if pthread_mutex_t isn't identical Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 10/12] userfaultfd: powerpc: Bump up __NR_syscalls to account for __NR_userfaultfd Andrea Arcangeli
2015-09-09  2:48   ` Michael Ellerman
2015-09-08 20:43 ` [PATCH 11/12] userfaultfd: powerpc: implement syscall Andrea Arcangeli
2015-09-08 20:43 ` [PATCH 12/12] userfaultfd: register uapi generic syscall (aarch64) Andrea Arcangeli
2015-09-15 20:02   ` Andrew Morton
2015-09-15 20:20     ` Mathieu Desnoyers
2015-09-15 20:47     ` Andrea Arcangeli
2015-09-30 21:56 ` [PATCH 00/12] userfaultfd non-x86 and selftest updates for 4.2.0+ Mike Kravetz
2015-10-01  0:06   ` Andrea Arcangeli
2015-10-01  0:42     ` Mike Kravetz
2015-10-01 16:04       ` Andrea Arcangeli
2015-10-01 16:45         ` Mike Kravetz
