All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	qemu-devel@nongnu.org, KVM list <kvm@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	zhang.zhanghailiang@huawei.com,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Hugh Dickins <hughd@google.com>,
	Peter Feiner <pfeiner@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Date: Fri, 15 May 2015 18:04:26 +0200	[thread overview]
Message-ID: <20150515160426.GD19097@redhat.com> (raw)
In-Reply-To: <CA+55aFwCODeiXUPDR7-Y-=2xE2abmVuCnmVV=ezFqhO+JkaW=A@mail.gmail.com>

On Thu, May 14, 2015 at 10:49:06AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 10:31 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > +static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
> > +                                          struct userfaultfd_wake_range *range)
> > +{
> > +       if (waitqueue_active(&ctx->fault_wqh))
> > +               __wake_userfault(ctx, range);
> > +}
> 
> Pretty much every single time people use this "if
> (waitqueue_active())" model, it tends to be a bug, because it means
> that there is zero serialization with people who are just about to go
> to sleep. It's fundamentally racy against all the "wait_event()" loops
> that carefully do memory barriers between testing conditions and going
> to sleep, because the memory barriers now don't exist on the waking
> side.

As far as 10/23 is concerned, the __wake_userfault taking the locks
would also ignore any "incoming" not yet "read(2)" userfault. "read(2)"
as in syscall read. So there was no race to worry about as "incoming"
userfaults would be ignored anyway by the wake.

The only case that had to be reliable in not missing wakeup events was
the "release" file operation. But that doesn't use waitqueue_active
and it relies on ctx->released combined with mmap_sem taken for writing.

    http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/tree/fs/userfaultfd.c?h=userfault&id=2f73ffa8267e41f04fe6b0d93d23feed45c0980a#n267

However later (after I started documenting what userland should do) I
started to question this old model of leaving the race handling to
userland. It's way too complex to leave the race handling to
userland. Furthermore it's inefficient because there would be lots of
spurious userfaults that we could have been waken up within the kernel
before userland could have a chance to read them.

So then I handled the race in the kernel in patch 17/23. That allowed
to drop hundres of line of locking code from qemu (two bits per page
and a mutex and lots of complexity) and then I didn't need to document
the complex rules described in this commit header:

     http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=5a2b3614e107482da9a1fcfcd120d30eb62d45dc

You can see from the patch it complicated the kernel by adding a
pagetable walk, but it's worth it because it's simpler than solving it
in userland, plus its solved at once for all users.

The only cons is that the old model would allow userland to be way
more consistent in enforcing asserts as it had the control.

With the current model POLLIN may be returned from poll() despite the
later read returns -EAGAIN, so it basically requires a non blocking
open if people uses poll(). (blocking reads without poll are still
fine)

Now back to your question: the waitqueue_active is still there in the
current model as well (since 17/23).

Since 17/23 losing a wakeup (as far as qemu is concerned) would mean
that qemu would read the address of the fault that didn't get waken up
because of the race, then it would send a page request to the source
node (it had no state in the destination node where the userfault runs
to know if it was a "dup"), which would discard the request (not
sending the page twice) noticing in its simple per-page bitmap that it
was already sent. So with the new model after 17/23 qemu, losing a
wakeup is a bug.

The wait_event is like this:

	handle_userfault (wait_event)
	-------
	spin_lock(&ctx->fault_pending_wqh.lock);
	/*
	 * After the __add_wait_queue the uwq is visible to userland
	 * through poll/read().
	 */
	__add_wait_queue(&ctx->fault_pending_wqh, &uwq.wq);
	/*
	 * The smp_mb() after __set_current_state prevents the reads
	 * following the spin_unlock to happen before the list_add in
	 * __add_wait_queue.
	 */
	set_current_state(TASK_KILLABLE);
	spin_unlock(&ctx->fault_pending_wqh.lock);

	must_wait = userfaultfd_must_wait(ctx, address, flags, reason);

	up_read(&mm->mmap_sem);

	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
		   !fatal_signal_pending(current))) {
		wake_up_poll(&ctx->fd_wqh, POLLIN);
		schedule();

The wakeup side is:

	userfaultfd_copy
	-----------
	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
			   uffdio_copy.len);
	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
		return -EFAULT;
	if (ret < 0)
		goto out;
	BUG_ON(!ret);
	/* len == 0 would wake all */
	range.len = ret;
	if (!(uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_copy.dst;
		wake_userfault(ctx, &range);


wake_userfault is the function that is using waitqueue_active so the
problem materializes if waitqueue_active moves before
mcopy_atomic. Precisely it should move before the set_pte_at below:

	mcopy_atomic_pte
	------
	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(dst_vma, dst_addr, dst_pte);

	pte_unmap_unlock(dst_pte, ptl);

unlock would allow the read to be reordered before it even on
x86. There's an up_read as well in between but it has the same
problem. So you're right that theoretically we can miss a wakeup.

Practically it sounds unlikely because of the sheer size of
mcopy_atomic and we never experienced it but it's still a bug.

>  So I'm making a new rule: if you use waitqueue_active(), I want an
> explanation for why it's not racy with the waiter. A big comment
> about the memory ordering, or about higher-level locks that are held
> by the caller, or something.  Linus

To fix it I added this along a comment:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c89e96f..6be316b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -593,6 +593,21 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	/*
+	 * To be sure waitqueue_active() is not reordered by the CPU
+	 * before the pagetable update, use an explicit SMP memory
+	 * barrier here. PT lock release or up_read(mmap_sem) still
+	 * have release semantics that can allow the
+	 * waitqueue_active() to be reordered before the pte update.
+	 */
+	smp_mb();
+
+	/*
+	 * Use waitqueue_active because it's very frequent to
+	 * change the address space atomically even if there are no
+	 * userfaults yet. So we take the spinlock only when we're
+	 * sure we've userfaults to wake.
+	 */
 	if (waitqueue_active(&ctx->fault_pending_wqh) ||
 	    waitqueue_active(&ctx->fault_wqh))
 		__wake_userfault(ctx, range);

The wait_event/handle_userfault side already has a smp_mb() to prevent
the lockless pagetable walk to be reordered before the list_add in
__add_wait_queue (needed as well precisely because of the
waitqueue_active optimization that I'd like to keep).

Thanks,
Andrea

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	qemu-devel@nongnu.org, KVM list <kvm@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	zhang.zhanghailiang@huawei.com,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Hugh Dickins <hughd@google.com>,
	Peter Feiner <pfeiner@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Huangpeng (Pet
Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Date: Fri, 15 May 2015 18:04:26 +0200	[thread overview]
Message-ID: <20150515160426.GD19097@redhat.com> (raw)
In-Reply-To: <CA+55aFwCODeiXUPDR7-Y-=2xE2abmVuCnmVV=ezFqhO+JkaW=A@mail.gmail.com>

On Thu, May 14, 2015 at 10:49:06AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 10:31 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > +static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
> > +                                          struct userfaultfd_wake_range *range)
> > +{
> > +       if (waitqueue_active(&ctx->fault_wqh))
> > +               __wake_userfault(ctx, range);
> > +}
> 
> Pretty much every single time people use this "if
> (waitqueue_active())" model, it tends to be a bug, because it means
> that there is zero serialization with people who are just about to go
> to sleep. It's fundamentally racy against all the "wait_event()" loops
> that carefully do memory barriers between testing conditions and going
> to sleep, because the memory barriers now don't exist on the waking
> side.

As far as 10/23 is concerned, the __wake_userfault taking the locks
would also ignore any "incoming" not yet "read(2)" userfault. "read(2)"
as in syscall read. So there was no race to worry about as "incoming"
userfaults would be ignored anyway by the wake.

The only case that had to be reliable in not missing wakeup events was
the "release" file operation. But that doesn't use waitqueue_active
and it relies on ctx->released combined with mmap_sem taken for writing.

    http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/tree/fs/userfaultfd.c?h=userfault&id=2f73ffa8267e41f04fe6b0d93d23feed45c0980a#n267

However later (after I started documenting what userland should do) I
started to question this old model of leaving the race handling to
userland. It's way too complex to leave the race handling to
userland. Furthermore it's inefficient because there would be lots of
spurious userfaults that we could have been waken up within the kernel
before userland could have a chance to read them.

So then I handled the race in the kernel in patch 17/23. That allowed
to drop hundres of line of locking code from qemu (two bits per page
and a mutex and lots of complexity) and then I didn't need to document
the complex rules described in this commit header:

     http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=5a2b3614e107482da9a1fcfcd120d30eb62d45dc

You can see from the patch it complicated the kernel by adding a
pagetable walk, but it's worth it because it's simpler than solving it
in userland, plus its solved at once for all users.

The only cons is that the old model would allow userland to be way
more consistent in enforcing asserts as it had the control.

With the current model POLLIN may be returned from poll() despite the
later read returns -EAGAIN, so it basically requires a non blocking
open if people uses poll(). (blocking reads without poll are still
fine)

Now back to your question: the waitqueue_active is still there in the
current model as well (since 17/23).

Since 17/23 losing a wakeup (as far as qemu is concerned) would mean
that qemu would read the address of the fault that didn't get waken up
because of the race, then it would send a page request to the source
node (it had no state in the destination node where the userfault runs
to know if it was a "dup"), which would discard the request (not
sending the page twice) noticing in its simple per-page bitmap that it
was already sent. So with the new model after 17/23 qemu, losing a
wakeup is a bug.

The wait_event is like this:

	handle_userfault (wait_event)
	-------
	spin_lock(&ctx->fault_pending_wqh.lock);
	/*
	 * After the __add_wait_queue the uwq is visible to userland
	 * through poll/read().
	 */
	__add_wait_queue(&ctx->fault_pending_wqh, &uwq.wq);
	/*
	 * The smp_mb() after __set_current_state prevents the reads
	 * following the spin_unlock to happen before the list_add in
	 * __add_wait_queue.
	 */
	set_current_state(TASK_KILLABLE);
	spin_unlock(&ctx->fault_pending_wqh.lock);

	must_wait = userfaultfd_must_wait(ctx, address, flags, reason);

	up_read(&mm->mmap_sem);

	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
		   !fatal_signal_pending(current))) {
		wake_up_poll(&ctx->fd_wqh, POLLIN);
		schedule();

The wakeup side is:

	userfaultfd_copy
	-----------
	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
			   uffdio_copy.len);
	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
		return -EFAULT;
	if (ret < 0)
		goto out;
	BUG_ON(!ret);
	/* len == 0 would wake all */
	range.len = ret;
	if (!(uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_copy.dst;
		wake_userfault(ctx, &range);


wake_userfault is the function that is using waitqueue_active so the
problem materializes if waitqueue_active moves before
mcopy_atomic. Precisely it should move before the set_pte_at below:

	mcopy_atomic_pte
	------
	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(dst_vma, dst_addr, dst_pte);

	pte_unmap_unlock(dst_pte, ptl);

unlock would allow the read to be reordered before it even on
x86. There's an up_read as well in between but it has the same
problem. So you're right that theoretically we can miss a wakeup.

Practically it sounds unlikely because of the sheer size of
mcopy_atomic and we never experienced it but it's still a bug.

>  So I'm making a new rule: if you use waitqueue_active(), I want an
> explanation for why it's not racy with the waiter. A big comment
> about the memory ordering, or about higher-level locks that are held
> by the caller, or something.  Linus

To fix it I added this along a comment:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c89e96f..6be316b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -593,6 +593,21 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	/*
+	 * To be sure waitqueue_active() is not reordered by the CPU
+	 * before the pagetable update, use an explicit SMP memory
+	 * barrier here. PT lock release or up_read(mmap_sem) still
+	 * have release semantics that can allow the
+	 * waitqueue_active() to be reordered before the pte update.
+	 */
+	smp_mb();
+
+	/*
+	 * Use waitqueue_active because it's very frequent to
+	 * change the address space atomically even if there are no
+	 * userfaults yet. So we take the spinlock only when we're
+	 * sure we've userfaults to wake.
+	 */
 	if (waitqueue_active(&ctx->fault_pending_wqh) ||
 	    waitqueue_active(&ctx->fault_wqh))
 		__wake_userfault(ctx, range);

The wait_event/handle_userfault side already has a smp_mb() to prevent
the lockless pagetable walk to be reordered before the list_add in
__add_wait_queue (needed as well precisely because of the
waitqueue_active optimization that I'd like to keep).

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	qemu-devel@nongnu.org, KVM list <kvm@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	zhang.zhanghailiang@huawei.com,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andy Lutomirski <luto@amacapital.net>,
	Hugh Dickins <hughd@google.com>,
	Peter Feiner <pfeiner@google.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	"Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Subject: Re: [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Date: Fri, 15 May 2015 18:04:26 +0200	[thread overview]
Message-ID: <20150515160426.GD19097@redhat.com> (raw)
In-Reply-To: <CA+55aFwCODeiXUPDR7-Y-=2xE2abmVuCnmVV=ezFqhO+JkaW=A@mail.gmail.com>

On Thu, May 14, 2015 at 10:49:06AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 10:31 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > +static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
> > +                                          struct userfaultfd_wake_range *range)
> > +{
> > +       if (waitqueue_active(&ctx->fault_wqh))
> > +               __wake_userfault(ctx, range);
> > +}
> 
> Pretty much every single time people use this "if
> (waitqueue_active())" model, it tends to be a bug, because it means
> that there is zero serialization with people who are just about to go
> to sleep. It's fundamentally racy against all the "wait_event()" loops
> that carefully do memory barriers between testing conditions and going
> to sleep, because the memory barriers now don't exist on the waking
> side.

As far as 10/23 is concerned, the __wake_userfault taking the locks
would also ignore any "incoming" not yet "read(2)" userfault. "read(2)"
as in syscall read. So there was no race to worry about as "incoming"
userfaults would be ignored anyway by the wake.

The only case that had to be reliable in not missing wakeup events was
the "release" file operation. But that doesn't use waitqueue_active
and it relies on ctx->released combined with mmap_sem taken for writing.

    http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/tree/fs/userfaultfd.c?h=userfault&id=2f73ffa8267e41f04fe6b0d93d23feed45c0980a#n267

However later (after I started documenting what userland should do) I
started to question this old model of leaving the race handling to
userland. It's way too complex to leave the race handling to
userland. Furthermore it's inefficient because there would be lots of
spurious userfaults that we could have been waken up within the kernel
before userland could have a chance to read them.

So then I handled the race in the kernel in patch 17/23. That allowed
to drop hundres of line of locking code from qemu (two bits per page
and a mutex and lots of complexity) and then I didn't need to document
the complex rules described in this commit header:

     http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=5a2b3614e107482da9a1fcfcd120d30eb62d45dc

You can see from the patch it complicated the kernel by adding a
pagetable walk, but it's worth it because it's simpler than solving it
in userland, plus its solved at once for all users.

The only cons is that the old model would allow userland to be way
more consistent in enforcing asserts as it had the control.

With the current model POLLIN may be returned from poll() despite the
later read returns -EAGAIN, so it basically requires a non blocking
open if people uses poll(). (blocking reads without poll are still
fine)

Now back to your question: the waitqueue_active is still there in the
current model as well (since 17/23).

Since 17/23 losing a wakeup (as far as qemu is concerned) would mean
that qemu would read the address of the fault that didn't get waken up
because of the race, then it would send a page request to the source
node (it had no state in the destination node where the userfault runs
to know if it was a "dup"), which would discard the request (not
sending the page twice) noticing in its simple per-page bitmap that it
was already sent. So with the new model after 17/23 qemu, losing a
wakeup is a bug.

The wait_event is like this:

	handle_userfault (wait_event)
	-------
	spin_lock(&ctx->fault_pending_wqh.lock);
	/*
	 * After the __add_wait_queue the uwq is visible to userland
	 * through poll/read().
	 */
	__add_wait_queue(&ctx->fault_pending_wqh, &uwq.wq);
	/*
	 * The smp_mb() after __set_current_state prevents the reads
	 * following the spin_unlock to happen before the list_add in
	 * __add_wait_queue.
	 */
	set_current_state(TASK_KILLABLE);
	spin_unlock(&ctx->fault_pending_wqh.lock);

	must_wait = userfaultfd_must_wait(ctx, address, flags, reason);

	up_read(&mm->mmap_sem);

	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
		   !fatal_signal_pending(current))) {
		wake_up_poll(&ctx->fd_wqh, POLLIN);
		schedule();

The wakeup side is:

	userfaultfd_copy
	-----------
	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
			   uffdio_copy.len);
	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
		return -EFAULT;
	if (ret < 0)
		goto out;
	BUG_ON(!ret);
	/* len == 0 would wake all */
	range.len = ret;
	if (!(uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_copy.dst;
		wake_userfault(ctx, &range);


wake_userfault is the function that is using waitqueue_active so the
problem materializes if waitqueue_active moves before
mcopy_atomic. Precisely it should move before the set_pte_at below:

	mcopy_atomic_pte
	------
	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(dst_vma, dst_addr, dst_pte);

	pte_unmap_unlock(dst_pte, ptl);

unlock would allow the read to be reordered before it even on
x86. There's an up_read as well in between but it has the same
problem. So you're right that theoretically we can miss a wakeup.

Practically it sounds unlikely because of the sheer size of
mcopy_atomic and we never experienced it but it's still a bug.

>  So I'm making a new rule: if you use waitqueue_active(), I want an
> explanation for why it's not racy with the waiter. A big comment
> about the memory ordering, or about higher-level locks that are held
> by the caller, or something.  Linus

To fix it I added this along a comment:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c89e96f..6be316b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -593,6 +593,21 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	/*
+	 * To be sure waitqueue_active() is not reordered by the CPU
+	 * before the pagetable update, use an explicit SMP memory
+	 * barrier here. PT lock release or up_read(mmap_sem) still
+	 * have release semantics that can allow the
+	 * waitqueue_active() to be reordered before the pte update.
+	 */
+	smp_mb();
+
+	/*
+	 * Use waitqueue_active because it's very frequent to
+	 * change the address space atomically even if there are no
+	 * userfaults yet. So we take the spinlock only when we're
+	 * sure we've userfaults to wake.
+	 */
 	if (waitqueue_active(&ctx->fault_pending_wqh) ||
 	    waitqueue_active(&ctx->fault_wqh))
 		__wake_userfault(ctx, range);

The wait_event/handle_userfault side already has a smp_mb() to prevent
the lockless pagetable walk to be reordered before the list_add in
__add_wait_queue (needed as well precisely because of the
waitqueue_active optimization that I'd like to keep).

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Andrea Arcangeli <aarcange@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>,
	zhang.zhanghailiang@huawei.com, KVM list <kvm@vger.kernel.org>,
	Pavel Emelyanov <xemul@parallels.com>,
	Linux API <linux-api@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	qemu-devel@nongnu.org, linux-mm <linux-mm@kvack.org>,
	Andres Lagar-Cavilla <andreslc@google.com>,
	Mel Gorman <mgorman@suse.de>, Paolo Bonzini <pbonzini@redhat.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Andrew Morton <akpm@linux-foundation.org>,
	Sanidhya Kashyap <sanidhya.gatech@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Andy Lutomirski <luto@amacapital.net>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Peter Feiner <pfeiner@google.com>
Subject: Re: [Qemu-devel] [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization
Date: Fri, 15 May 2015 18:04:26 +0200	[thread overview]
Message-ID: <20150515160426.GD19097@redhat.com> (raw)
In-Reply-To: <CA+55aFwCODeiXUPDR7-Y-=2xE2abmVuCnmVV=ezFqhO+JkaW=A@mail.gmail.com>

On Thu, May 14, 2015 at 10:49:06AM -0700, Linus Torvalds wrote:
> On Thu, May 14, 2015 at 10:31 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> > +static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
> > +                                          struct userfaultfd_wake_range *range)
> > +{
> > +       if (waitqueue_active(&ctx->fault_wqh))
> > +               __wake_userfault(ctx, range);
> > +}
> 
> Pretty much every single time people use this "if
> (waitqueue_active())" model, it tends to be a bug, because it means
> that there is zero serialization with people who are just about to go
> to sleep. It's fundamentally racy against all the "wait_event()" loops
> that carefully do memory barriers between testing conditions and going
> to sleep, because the memory barriers now don't exist on the waking
> side.

As far as 10/23 is concerned, the __wake_userfault taking the locks
would also ignore any "incoming" not yet "read(2)" userfault. "read(2)"
as in syscall read. So there was no race to worry about as "incoming"
userfaults would be ignored anyway by the wake.

The only case that had to be reliable in not missing wakeup events was
the "release" file operation. But that doesn't use waitqueue_active
and it relies on ctx->released combined with mmap_sem taken for writing.

    http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/tree/fs/userfaultfd.c?h=userfault&id=2f73ffa8267e41f04fe6b0d93d23feed45c0980a#n267

However later (after I started documenting what userland should do) I
started to question this old model of leaving the race handling to
userland. It's way too complex to leave the race handling to
userland. Furthermore it's inefficient because there would be lots of
spurious userfaults that we could have been waken up within the kernel
before userland could have a chance to read them.

So then I handled the race in the kernel in patch 17/23. That allowed
to drop hundres of line of locking code from qemu (two bits per page
and a mutex and lots of complexity) and then I didn't need to document
the complex rules described in this commit header:

     http://git.kernel.org/cgit/linux/kernel/git/andrea/aa.git/commit/?h=userfault&id=5a2b3614e107482da9a1fcfcd120d30eb62d45dc

You can see from the patch it complicated the kernel by adding a
pagetable walk, but it's worth it because it's simpler than solving it
in userland, plus its solved at once for all users.

The only cons is that the old model would allow userland to be way
more consistent in enforcing asserts as it had the control.

With the current model POLLIN may be returned from poll() despite the
later read returns -EAGAIN, so it basically requires a non blocking
open if people uses poll(). (blocking reads without poll are still
fine)

Now back to your question: the waitqueue_active is still there in the
current model as well (since 17/23).

Since 17/23 losing a wakeup (as far as qemu is concerned) would mean
that qemu would read the address of the fault that didn't get waken up
because of the race, then it would send a page request to the source
node (it had no state in the destination node where the userfault runs
to know if it was a "dup"), which would discard the request (not
sending the page twice) noticing in its simple per-page bitmap that it
was already sent. So with the new model after 17/23 qemu, losing a
wakeup is a bug.

The wait_event is like this:

	handle_userfault (wait_event)
	-------
	spin_lock(&ctx->fault_pending_wqh.lock);
	/*
	 * After the __add_wait_queue the uwq is visible to userland
	 * through poll/read().
	 */
	__add_wait_queue(&ctx->fault_pending_wqh, &uwq.wq);
	/*
	 * The smp_mb() after __set_current_state prevents the reads
	 * following the spin_unlock to happen before the list_add in
	 * __add_wait_queue.
	 */
	set_current_state(TASK_KILLABLE);
	spin_unlock(&ctx->fault_pending_wqh.lock);

	must_wait = userfaultfd_must_wait(ctx, address, flags, reason);

	up_read(&mm->mmap_sem);

	if (likely(must_wait && !ACCESS_ONCE(ctx->released) &&
		   !fatal_signal_pending(current))) {
		wake_up_poll(&ctx->fd_wqh, POLLIN);
		schedule();

The wakeup side is:

	userfaultfd_copy
	-----------
	ret = mcopy_atomic(ctx->mm, uffdio_copy.dst, uffdio_copy.src,
			   uffdio_copy.len);
	if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
		return -EFAULT;
	if (ret < 0)
		goto out;
	BUG_ON(!ret);
	/* len == 0 would wake all */
	range.len = ret;
	if (!(uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE)) {
		range.start = uffdio_copy.dst;
		wake_userfault(ctx, &range);


wake_userfault is the function that is using waitqueue_active so the
problem materializes if waitqueue_active moves before
mcopy_atomic. Precisely it should move before the set_pte_at below:

	mcopy_atomic_pte
	------
	set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(dst_vma, dst_addr, dst_pte);

	pte_unmap_unlock(dst_pte, ptl);

unlock would allow the read to be reordered before it even on
x86. There's an up_read as well in between but it has the same
problem. So you're right that theoretically we can miss a wakeup.

Practically it sounds unlikely because of the sheer size of
mcopy_atomic and we never experienced it but it's still a bug.

>  So I'm making a new rule: if you use waitqueue_active(), I want an
> explanation for why it's not racy with the waiter. A big comment
> about the memory ordering, or about higher-level locks that are held
> by the caller, or something.  Linus

To fix it I added this along a comment:

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index c89e96f..6be316b 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -593,6 +593,21 @@ static void __wake_userfault(struct userfaultfd_ctx *ctx,
 static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
 					   struct userfaultfd_wake_range *range)
 {
+	/*
+	 * To be sure waitqueue_active() is not reordered by the CPU
+	 * before the pagetable update, use an explicit SMP memory
+	 * barrier here. PT lock release or up_read(mmap_sem) still
+	 * have release semantics that can allow the
+	 * waitqueue_active() to be reordered before the pte update.
+	 */
+	smp_mb();
+
+	/*
+	 * Use waitqueue_active because it's very frequent to
+	 * change the address space atomically even if there are no
+	 * userfaults yet. So we take the spinlock only when we're
+	 * sure we've userfaults to wake.
+	 */
 	if (waitqueue_active(&ctx->fault_pending_wqh) ||
 	    waitqueue_active(&ctx->fault_wqh))
 		__wake_userfault(ctx, range);

The wait_event/handle_userfault side already has a smp_mb() to prevent
the lockless pagetable walk to be reordered before the list_add in
__add_wait_queue (needed as well precisely because of the
waitqueue_active optimization that I'd like to keep).

Thanks,
Andrea

  reply	other threads:[~2015-05-15 16:05 UTC|newest]

Thread overview: 221+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-14 17:30 [PATCH 00/23] userfaultfd v4 Andrea Arcangeli
2015-05-14 17:30 ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:30 ` Andrea Arcangeli
2015-05-14 17:30 ` [PATCH 01/23] userfaultfd: linux/Documentation/vm/userfaultfd.txt Andrea Arcangeli
2015-05-14 17:30   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:30   ` Andrea Arcangeli
2015-09-11  8:47   ` Michael Kerrisk (man-pages)
2015-09-11  8:47     ` [Qemu-devel] " Michael Kerrisk (man-pages)
2015-09-11  8:47     ` Michael Kerrisk (man-pages)
2015-09-11  8:47     ` Michael Kerrisk (man-pages)
2015-12-04 15:50     ` Michael Kerrisk (man-pages)
2015-12-04 15:50       ` [Qemu-devel] " Michael Kerrisk (man-pages)
2015-12-04 15:50       ` Michael Kerrisk (man-pages)
2015-12-04 17:55       ` Andrea Arcangeli
2015-12-04 17:55         ` [Qemu-devel] " Andrea Arcangeli
2015-12-04 17:55         ` Andrea Arcangeli
2015-12-04 17:55         ` Andrea Arcangeli
2015-05-14 17:30 ` [PATCH 02/23] userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key Andrea Arcangeli
2015-05-14 17:30   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:30   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 03/23] userfaultfd: uAPI Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 04/23] userfaultfd: linux/userfaultfd_k.h Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 05/23] userfaultfd: add vm_userfaultfd_ctx to the vm_area_struct Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 06/23] userfaultfd: add VM_UFFD_MISSING and VM_UFFD_WP Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 07/23] userfaultfd: call handle_userfault() for userfaultfd_missing() faults Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 08/23] userfaultfd: teach vma_merge to merge across vma->vm_userfaultfd_ctx Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 09/23] userfaultfd: prevent khugepaged to merge if userfaultfd is armed Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 10/23] userfaultfd: add new syscall to provide memory externalization Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:49   ` Linus Torvalds
2015-05-14 17:49     ` [Qemu-devel] " Linus Torvalds
2015-05-14 17:49     ` Linus Torvalds
2015-05-15 16:04     ` Andrea Arcangeli [this message]
2015-05-15 16:04       ` [Qemu-devel] " Andrea Arcangeli
2015-05-15 16:04       ` Andrea Arcangeli
2015-05-15 16:04       ` Andrea Arcangeli
2015-05-15 18:22       ` Linus Torvalds
2015-05-15 18:22         ` [Qemu-devel] " Linus Torvalds
2015-05-15 18:22         ` Linus Torvalds
2015-05-27 11:41   ` Thomas Martitz
2015-06-23 19:00   ` Dave Hansen
2015-06-23 19:00     ` [Qemu-devel] " Dave Hansen
2015-06-23 19:00     ` Dave Hansen
2015-06-23 19:00     ` Dave Hansen
2015-06-23 21:41     ` Andrea Arcangeli
2015-06-23 21:41       ` [Qemu-devel] " Andrea Arcangeli
2015-06-23 21:41       ` Andrea Arcangeli
2015-06-23 21:41       ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 11/23] userfaultfd: Rename uffd_api.bits into .features Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 12/23] userfaultfd: Rename uffd_api.bits into .features fixup Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 13/23] userfaultfd: change the read API to return a uffd_msg Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 14/23] userfaultfd: wake pending userfaults Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-10-22 12:10   ` Peter Zijlstra
2015-10-22 12:10     ` [Qemu-devel] " Peter Zijlstra
2015-10-22 12:10     ` Peter Zijlstra
2015-10-22 13:20     ` Andrea Arcangeli
2015-10-22 13:20       ` [Qemu-devel] " Andrea Arcangeli
2015-10-22 13:20       ` Andrea Arcangeli
2015-10-22 13:20       ` Andrea Arcangeli
2015-10-22 13:38       ` Peter Zijlstra
2015-10-22 13:38         ` [Qemu-devel] " Peter Zijlstra
2015-10-22 13:38         ` Peter Zijlstra
2015-10-22 14:18         ` Andrea Arcangeli
2015-10-22 14:18           ` [Qemu-devel] " Andrea Arcangeli
2015-10-22 14:18           ` Andrea Arcangeli
2015-10-22 14:18           ` Andrea Arcangeli
2015-10-22 14:18           ` Andrea Arcangeli
2015-10-22 15:15           ` Peter Zijlstra
2015-10-22 15:15             ` [Qemu-devel] " Peter Zijlstra
2015-10-22 15:15             ` Peter Zijlstra
2015-10-22 15:30             ` Andrea Arcangeli
2015-10-22 15:30               ` [Qemu-devel] " Andrea Arcangeli
2015-10-22 15:30               ` Andrea Arcangeli
2015-10-22 15:30               ` Andrea Arcangeli
2015-10-22 15:30               ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 15/23] userfaultfd: optimize read() and poll() to be O(1) Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 16/23] userfaultfd: allocate the userfaultfd_ctx cacheline aligned Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 17/23] userfaultfd: solve the race between UFFDIO_COPY|ZEROPAGE and read Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 18/23] userfaultfd: buildsystem activation Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 19/23] userfaultfd: activate syscall Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-08-11 10:07   ` [Qemu-devel] " Bharata B Rao
2015-08-11 10:07     ` Bharata B Rao
2015-08-11 10:07     ` Bharata B Rao
2015-08-11 13:48     ` Andrea Arcangeli
2015-08-11 13:48       ` Andrea Arcangeli
2015-08-11 13:48       ` Andrea Arcangeli
2015-08-12  5:23       ` Bharata B Rao
2015-08-12  5:23         ` Bharata B Rao
2015-08-12  5:23         ` Bharata B Rao
2015-08-12  5:23         ` Bharata B Rao
2015-09-08  6:08         ` Michael Ellerman
2015-09-08  6:08           ` Michael Ellerman
2015-09-08  6:08           ` Michael Ellerman
2015-09-08  6:08           ` Michael Ellerman
2015-09-08  6:08           ` Michael Ellerman
2015-09-08  6:39           ` Bharata B Rao
2015-09-08  6:39             ` Bharata B Rao
2015-09-08  6:39             ` Bharata B Rao
2015-09-08  6:39             ` Bharata B Rao
2015-09-08  7:14             ` Michael Ellerman
2015-09-08  7:14               ` Michael Ellerman
2015-09-08  7:14               ` Michael Ellerman
2015-09-08  7:14               ` Michael Ellerman
2015-09-08  7:14               ` Michael Ellerman
2015-09-08 10:40               ` Michael Ellerman
2015-09-08 10:40                 ` Michael Ellerman
2015-09-08 12:28                 ` Dr. David Alan Gilbert
2015-09-08 12:28                   ` Dr. David Alan Gilbert
2015-09-08 12:28                   ` Dr. David Alan Gilbert
2015-09-08  8:59             ` Dr. David Alan Gilbert
2015-09-08  8:59               ` Dr. David Alan Gilbert
2015-09-08  8:59               ` Dr. David Alan Gilbert
2015-09-08 10:00               ` Bharata B Rao
2015-09-08 10:00                 ` Bharata B Rao
2015-09-08 10:00                 ` Bharata B Rao
2015-09-08 10:00                 ` Bharata B Rao
2015-09-08 12:46                 ` Dr. David Alan Gilbert
2015-09-08 12:46                   ` Dr. David Alan Gilbert
2015-09-08 12:46                   ` Dr. David Alan Gilbert
2015-09-08 13:37                   ` Bharata B Rao
2015-09-08 13:37                     ` Bharata B Rao
2015-09-08 13:37                     ` Bharata B Rao
2015-09-08 13:37                     ` Bharata B Rao
2015-09-08 13:37                     ` Bharata B Rao
2015-09-08 14:13                     ` Dr. David Alan Gilbert
2015-09-08 14:13                       ` Dr. David Alan Gilbert
2015-09-08 14:13                       ` Dr. David Alan Gilbert
2015-09-08 14:13                       ` Dr. David Alan Gilbert
2015-09-08 14:13                       ` Dr. David Alan Gilbert
2015-09-10 12:24                       ` Bharata B Rao
2015-09-11 19:15                         ` Dr. David Alan Gilbert
2015-09-14 18:53                         ` Dr. David Alan Gilbert
2015-05-14 17:31 ` [PATCH 20/23] userfaultfd: UFFDIO_COPY|UFFDIO_ZEROPAGE uAPI Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 21/23] userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 22/23] userfaultfd: avoid mmap_sem read recursion in mcopy_atomic Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-22 20:18   ` Andrew Morton
2015-05-22 20:18     ` [Qemu-devel] " Andrew Morton
2015-05-22 20:18     ` Andrew Morton
2015-05-22 20:48     ` Andrea Arcangeli
2015-05-22 20:48       ` [Qemu-devel] " Andrea Arcangeli
2015-05-22 20:48       ` Andrea Arcangeli
2015-05-22 21:18       ` Andrew Morton
2015-05-22 21:18         ` [Qemu-devel] " Andrew Morton
2015-05-22 21:18         ` Andrew Morton
2015-05-23  1:04         ` Andrea Arcangeli
2015-05-23  1:04           ` [Qemu-devel] " Andrea Arcangeli
2015-05-23  1:04           ` Andrea Arcangeli
2015-05-23  1:04           ` Andrea Arcangeli
2015-05-23  1:04           ` Andrea Arcangeli
2015-05-14 17:31 ` [PATCH 23/23] userfaultfd: UFFDIO_COPY and UFFDIO_ZEROPAGE Andrea Arcangeli
2015-05-14 17:31   ` [Qemu-devel] " Andrea Arcangeli
2015-05-14 17:31   ` Andrea Arcangeli
2015-05-18 14:24 ` [PATCH 00/23] userfaultfd v4 Pavel Emelyanov
2015-05-18 14:24   ` [Qemu-devel] " Pavel Emelyanov
2015-05-18 14:24   ` Pavel Emelyanov
2015-05-18 14:24   ` Pavel Emelyanov
2015-05-19 21:38 ` Andrew Morton
2015-05-19 21:38   ` [Qemu-devel] " Andrew Morton
2015-05-19 21:38   ` Andrew Morton
2015-05-19 21:59   ` Richard Weinberger
2015-05-19 21:59     ` [Qemu-devel] " Richard Weinberger
2015-05-19 21:59     ` Richard Weinberger
2015-05-20 14:17     ` Andrea Arcangeli
2015-05-20 14:17       ` [Qemu-devel] " Andrea Arcangeli
2015-05-20 14:17       ` Andrea Arcangeli
2015-05-20 14:17       ` Andrea Arcangeli
2015-05-20 13:23   ` Andrea Arcangeli
2015-05-20 13:23     ` [Qemu-devel] " Andrea Arcangeli
2015-05-20 13:23     ` Andrea Arcangeli
2015-05-21 13:11 ` Kirill Smelkov
2015-05-21 13:11   ` [Qemu-devel] " Kirill Smelkov
2015-05-21 13:11   ` Kirill Smelkov
2015-05-21 15:52   ` Andrea Arcangeli
2015-05-21 15:52     ` [Qemu-devel] " Andrea Arcangeli
2015-05-21 15:52     ` Andrea Arcangeli
2015-05-21 15:52     ` Andrea Arcangeli
2015-05-21 15:52     ` Andrea Arcangeli
2015-05-22 16:35     ` Kirill Smelkov
2015-05-22 16:35       ` [Qemu-devel] " Kirill Smelkov
2015-05-22 16:35       ` Kirill Smelkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150515160426.GD19097@redhat.com \
    --to=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andreslc@google.com \
    --cc=dave.hansen@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=kvm@vger.kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@amacapital.net \
    --cc=mgorman@suse.de \
    --cc=pbonzini@redhat.com \
    --cc=peter.huangpeng@huawei.com \
    --cc=pfeiner@google.com \
    --cc=qemu-devel@nongnu.org \
    --cc=riel@redhat.com \
    --cc=sanidhya.gatech@gmail.com \
    --cc=torvalds@linux-foundation.org \
    --cc=xemul@parallels.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.