* [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
@ 2015-10-09  0:35 Kosuke Tatsukawa
  2015-10-09  8:45 ` Paolo Bonzini
  0 siblings, 1 reply; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09  0:35 UTC (permalink / raw)
  To: Gleb Natapov, Paolo Bonzini; +Cc: kvm, linux-kernel

async_pf_execute() seems to be missing a memory barrier, which might
cause the waker to not notice the waiter and miss sending a wake_up,
as in the following figure.

        async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                      return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                        !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                     ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------

The attached patch adds the missing memory barrier.

I found this issue while looking through the Linux source code for
places that call waitqueue_active() before wake_up*() without a
preceding memory barrier, after sending a patch to fix a similar
issue in drivers/tty/n_tty.c.  (Details about the original issue can
be found here: https://lkml.org/lkml/2015/9/28/849).

Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
---
 virt/kvm/async_pf.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..303fc7f 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,11 @@ static void async_pf_execute(struct work_struct *work)
 
 	trace_kvm_async_pf_completed(addr, gva);
 
+	/*
+	 * Memory barrier is required here to make sure change to
+	 * vcpu->async_pf.done is visible from other CPUs
+	 */
+	smp_mb();
 	if (waitqueue_active(&vcpu->wq))
 		wake_up_interruptible(&vcpu->wq);
 
-- 
1.7.1


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09  0:35 [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c Kosuke Tatsukawa
@ 2015-10-09  8:45 ` Paolo Bonzini
  2015-10-09  8:50   ` Peter Zijlstra
  2015-10-09  9:04   ` Kosuke Tatsukawa
  0 siblings, 2 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09  8:45 UTC (permalink / raw)
  To: Kosuke Tatsukawa, Gleb Natapov; +Cc: kvm, linux-kernel



On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>         async_pf_execute                    kvm_vcpu_block
> ------------------------------------------------------------------------
> spin_lock(&vcpu->async_pf.lock);
> if (waitqueue_active(&vcpu->wq))
> /* The CPU might reorder the test for
>    the waitqueue up here, before
>    prior writes complete */
>                                     prepare_to_wait(&vcpu->wq, &wait,
>                                       TASK_INTERRUPTIBLE);
>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                      /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                       ...
>                                       return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                         !vcpu->arch.apf.halted)
>                                         || !list_empty_careful(&vcpu->async_pf.done)
>                                      ...

The new memory barrier isn't "paired" with any other, and in
fact I think that the same issue exists on the other side: 
list_empty_careful(&vcpu->async_pf.done) may be reordered up,
before the prepare_to_wait:

spin_lock(&vcpu->async_pf.lock);
                                    (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                            !vcpu->arch.apf.halted)
                                            || !list_empty_careful(&vcpu->async_pf.done)
                                    ...
                                    prepare_to_wait(&vcpu->wq, &wait,
                                      TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                      ...
                                     return 0;
list_add_tail(&apf->link,
  &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
if (waitqueue_active(&vcpu->wq))

So you need another smp_mb() after prepare_to_wait().  I'm not sure
if it's needed also for your original tty report, but I think it is
for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
without memory barrier in mei drivers").

I wonder if it makes sense to introduce smp_mb__before_spin_lock()
and smp_mb__after_spin_unlock().  On x86 the former could be a
simple compiler barrier, and on s390 both of them could.  But that
should be a separate patch.

Thanks,

Paolo


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09  8:45 ` Paolo Bonzini
@ 2015-10-09  8:50   ` Peter Zijlstra
  2015-10-09 10:24     ` Paolo Bonzini
  2015-10-09  9:04   ` Kosuke Tatsukawa
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2015-10-09  8:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kosuke Tatsukawa, Gleb Natapov, kvm, linux-kernel

On Fri, Oct 09, 2015 at 10:45:32AM +0200, Paolo Bonzini wrote:
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
> 
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

Not having actually read or thought about the issue at hand, it's
perfectly valid to pair an smp_mb() with either spin_lock() or
spin_unlock().

IOW, MB <-> {ACQUIRE, RELEASE} is a valid pairing.


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09  8:45 ` Paolo Bonzini
  2015-10-09  8:50   ` Peter Zijlstra
@ 2015-10-09  9:04   ` Kosuke Tatsukawa
  2015-10-09 10:26     ` Paolo Bonzini
  1 sibling, 1 reply; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09  9:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, kvm, linux-kernel

Paolo Bonzini wrote:
> On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>>         async_pf_execute                    kvm_vcpu_block
>> ------------------------------------------------------------------------
>> spin_lock(&vcpu->async_pf.lock);
>> if (waitqueue_active(&vcpu->wq))
>> /* The CPU might reorder the test for
>>    the waitqueue up here, before
>>    prior writes complete */
>>                                     prepare_to_wait(&vcpu->wq, &wait,
>>                                       TASK_INTERRUPTIBLE);
>>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>>                                      /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>>                                       ...
>>                                       return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>>                                         !vcpu->arch.apf.halted)
>>                                         || !list_empty_careful(&vcpu->async_pf.done)
>>                                      ...
> 
> The new memory barrier isn't "paired" with any other, and in
> fact I think that the same issue exists on the other side: 
> list_empty_careful(&vcpu->async_pf.done) may be reordered up,
> before the prepare_to_wait:

smp_store_mb(), called from set_current_state(), which is in turn
called from prepare_to_wait(), should prevent reordering such as the
one below from happening.  wait_event*() also calls
set_current_state() internally.


> spin_lock(&vcpu->async_pf.lock);
>                                     (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                             !vcpu->arch.apf.halted)
>                                             || !list_empty_careful(&vcpu->async_pf.done)
>                                     ...
>                                     prepare_to_wait(&vcpu->wq, &wait,
>                                       TASK_INTERRUPTIBLE);
>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                      /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                       ...
>                                      return 0;
> list_add_tail(&apf->link,
>   &vcpu->async_pf.done);
> spin_unlock(&vcpu->async_pf.lock);
>                                     waited = true;
>                                     schedule();
> if (waitqueue_active(&vcpu->wq))
> 
> So you need another smp_mb() after prepare_to_wait().  I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
> 
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock().  On x86 the former could be a
> simple compiler barrier, and on s390 both of them could.  But that
> should be a separate patch.

The problem on the waiter side occurs if add_wait_queue() or
__add_wait_queue() is called directly, without memory barriers or
locks protecting it.


> Thanks,
> 
> Paolo
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09  8:50   ` Peter Zijlstra
@ 2015-10-09 10:24     ` Paolo Bonzini
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09 10:24 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Kosuke Tatsukawa, Gleb Natapov, kvm, linux-kernel



On 09/10/2015 10:50, Peter Zijlstra wrote:
> Not having actually read or thought about the issue at hand, its
> perfectly valid to pair an smp_mb() with either spin_lock() or
> spin_unlock().
> 
> IOW. MB <-> {ACQUIRE, RELEASE} is a valid pairing.

In this case it's an smp_mb() (store-load barrier) being paired with
another smp_mb(), so spin_unlock()'s release is not enough.

Paolo


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09  9:04   ` Kosuke Tatsukawa
@ 2015-10-09 10:26     ` Paolo Bonzini
  2015-10-09 12:21       ` Kosuke Tatsukawa
  0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09 10:26 UTC (permalink / raw)
  To: Kosuke Tatsukawa; +Cc: Gleb Natapov, kvm, linux-kernel



On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
> smp_store_mb() called from set_current_state(), which is called from
> prepare_to_wait() should prevent reordering such as below from
> happening.  wait_event*() also calls set_current_state() inside.

Ah, I missed that set_current_state has a memory barrier in it.  The
patch is okay, but please expand the comment to say that this memory
barrier pairs with prepare_to_wait's set_current_state().

Paolo


* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
  2015-10-09 10:26     ` Paolo Bonzini
@ 2015-10-09 12:21       ` Kosuke Tatsukawa
  0 siblings, 0 replies; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09 12:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Gleb Natapov, kvm, linux-kernel

Paolo Bonzini wrote:
> On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
>> smp_store_mb() called from set_current_state(), which is called from
>> prepare_to_wait() should prevent reordering such as below from
>> happening.  wait_event*() also calls set_current_state() inside.
>
> Ah, I missed that set_current_state has a memory barrier in it.  The
> patch is okay, but please expand the comment to say that this memory
> barrier pairs with prepare_to_wait's set_current_state().

Thank you for the comment.  I'll send a new version of the patch with
the modification.
---
Kosuke TATSUKAWA  | 3rd IT Platform Department
                  | IT Platform Division, NEC Corporation
                  | tatsu@ab.jp.nec.com

