From: Andrea Arcangeli <aarcange@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	qemu-devel@nongnu.org, kvm@vger.kernel.org
Cc: Pavel Emelyanov <xemul@parallels.com>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	zhang.zhanghailiang@huawei.com,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Hugh Dickins <hughd@google.com>,
	Peter Feiner <pfeiner@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Subject: [PATCH 0/7] userfault21 update
Date: Mon, 15 Jun 2015 19:22:04 +0200
Message-ID: <1434388931-24487-1-git-send-email-aarcange@redhat.com>

This is an incremental update to the userfaultfd code in -mm.

This fixes two bugs that could cause some malfunction (but nothing
that could cause memory corruption or crashes of any sort, either in
the kernel or in userland).

This also introduces some enhancements: gdb now runs fine, signals can
interrupt userfaults (userfaults are retried when the signal handler
returns), blocking reads got wake-one behavior (with benchmark results
in the commit header), the UFFDIO_API invocation is enforced before
any other ioctl can run (to preserve backwards compatibility in case
of future API bumps), and one dependency on a scheduler change has
been reverted.
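
For readers unfamiliar with the handshake, here is a minimal userland
sketch of the ordering that the UFFDIO_API enforcement implies. It
assumes the interface as exposed by <linux/userfaultfd.h> in mainline
(UFFD_API, UFFDIO_API, struct uffdio_api); the helper name
uffd_open_and_handshake is hypothetical and is not taken from the
testsuite.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a userfaultfd and perform the UFFDIO_API
 * handshake before any other ioctl is attempted on it.
 * Returns the file descriptor on success, -1 on any failure (for
 * example if the kernel lacks userfaultfd or the caller is not
 * permitted to use it).
 */
static int uffd_open_and_handshake(void)
{
	struct uffdio_api api = { .api = UFFD_API, .features = 0 };
	int fd;

	fd = (int)syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	if (fd < 0)
		return -1;

	/* The handshake must come first; other ioctls are rejected until it succeeds. */
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		close(fd);
		return -1;
	}

	return fd;
}
```

Only after this call succeeds would a program go on to register
ranges or resolve faults on the returned descriptor.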

Notably this introduces the testsuite as well. A good way to run the
testsuite is:

# each run uses between 10 MiB and ~6 GiB with 999 bounces; the loop
# continues forever unless an error triggers
while ./userfaultfd $((RANDOM % 6000 + 10)) 999; do true; done

What wasted a significant amount of time had nothing to do with
userfaultfd. The testsuite exposed erratic memcmp/bcmp return values
when part of the memory being compared changes under memcmp/bcmp
(while other parts, which are not changing, still differ). I will
provide a separate standalone testcase for this that does not use
userfaultfd (I already created it to be sure it isn't a bug in
userfaultfd; my own my_bcmp works fine even with userfaultfd).

Retrying memcmp/bcmp would eventually return the correct result,
which in kernel-speak I initially (but erroneously) translated to a
missing TLB flush (or cache flush, but that is unlikely on x86), a
pagefault somehow hitting the zeropage, or some other subtle kernel
bug. Eventually I had to consider the possibility that the memcmp or
bcmp core library functions were broken, despite how unlikely this
sounds.

It might be that this only happens if the memory that changes is
inside the "len" range being compared, and that nothing goes wrong if
the data that changes is beyond the end of "len", even if it is in
the same cacheline. So it might be perfectly correct in C standard
terms, but the totally erratic result is unacceptable to me and it
makes memcmp/bcmp very risky to use in multithreaded programs. I will
ensure this gets fixed on my systems, perhaps with slower versions of
memcmp/bcmp. If the two pages are never actually identical at any
given time (no matter whether they're changing), bcmp and memcmp must
not keep returning an erratic, racy 0 here. Even if this is safe by
the C standard, it still wouldn't be safe enough for me. It's unclear
at this point how this erratic result materializes and whether SIMD
instructions have special restrictions on memory that is modified by
other CPUs. CPU bugs in SIMD cannot be ruled out yet either.
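
To illustrate what a "slower version" could look like, here is a
minimal sketch of a byte-at-a-time comparison. It is not the my_bcmp
from the testsuite (which is not shown in this message); the name
my_bcmp_sketch and the use of volatile loads are assumptions made for
illustration only. The idea is that each byte is read exactly once, so
a position where the two buffers always differ is always seen as
different, no matter what changes elsewhere in the range.

```c
#include <stddef.h>

/*
 * Hypothetical byte-at-a-time comparison. Each byte is loaded once
 * through a volatile pointer, so the compiler cannot merge the loads
 * into wider or SIMD accesses.
 * Returns 0 if the first len bytes are equal, 1 otherwise.
 */
static int my_bcmp_sketch(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	size_t i;

	for (i = 0; i < len; i++) {
		if (pa[i] != pb[i])
			return 1;
	}

	return 0;
}
```

This trades throughput for predictability; whether it is an acceptable
replacement depends on how hot the comparison path is.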

Andrea Arcangeli (7):
  userfaultfd: require UFFDIO_API before other ioctls
  userfaultfd: propagate the full address in THP faults
  userfaultfd: allow signals to interrupt a userfault
  userfaultfd: avoid missing wakeups during refile in userfaultfd_read
  userfaultfd: switch to exclusive wakeup for blocking reads
  userfaultfd: Revert "userfaultfd: waitqueue: add nr wake parameter to
    __wake_up_locked_key"
  userfaultfd: selftest

 fs/userfaultfd.c                         |  78 +++-
 include/linux/wait.h                     |   5 +-
 kernel/sched/wait.c                      |   7 +-
 mm/huge_memory.c                         |  10 +-
 net/sunrpc/sched.c                       |   2 +-
 tools/testing/selftests/vm/Makefile      |   4 +-
 tools/testing/selftests/vm/userfaultfd.c | 669 +++++++++++++++++++++++++++++++
 7 files changed, 752 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/vm/userfaultfd.c
