* [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
@ 2015-10-09 0:35 Kosuke Tatsukawa
2015-10-09 8:45 ` Paolo Bonzini
0 siblings, 1 reply; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09 0:35 UTC (permalink / raw)
To: Gleb Natapov, Paolo Bonzini; +Cc: kvm, linux-kernel
async_pf_execute() seems to be missing a memory barrier, which might
cause the waker to miss the waiter and skip sending a wake_up, as shown
in the following figure.
async_pf_execute                    kvm_vcpu_block
------------------------------------------------------------------------
spin_lock(&vcpu->async_pf.lock);
if (waitqueue_active(&vcpu->wq))
/* The CPU might reorder the test for
   the waitqueue up here, before
   prior writes complete */
                                    prepare_to_wait(&vcpu->wq, &wait,
                                        TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                    /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                    ...
                                    return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                            !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                    ...
                                    return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
------------------------------------------------------------------------
The attached patch adds the missing memory barrier.
I found this issue while looking through the Linux source code for
places that call waitqueue_active() before wake_up*() without a
preceding memory barrier, after sending a patch to fix a similar issue
in drivers/tty/n_tty.c (details about the original issue can be found
here: https://lkml.org/lkml/2015/9/28/849).
Signed-off-by: Kosuke Tatsukawa <tatsu@ab.jp.nec.com>
---
virt/kvm/async_pf.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 44660ae..303fc7f 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -94,6 +94,11 @@ static void async_pf_execute(struct work_struct *work)
trace_kvm_async_pf_completed(addr, gva);
+ /*
+ * Memory barrier is required here to make sure change to
+ * vcpu->async_pf.done is visible from other CPUs
+ */
+ smp_mb();
if (waitqueue_active(&vcpu->wq))
wake_up_interruptible(&vcpu->wq);
--
1.7.1
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 0:35 [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c Kosuke Tatsukawa
@ 2015-10-09 8:45 ` Paolo Bonzini
2015-10-09 8:50 ` Peter Zijlstra
2015-10-09 9:04 ` Kosuke Tatsukawa
0 siblings, 2 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09 8:45 UTC (permalink / raw)
To: Kosuke Tatsukawa, Gleb Natapov; +Cc: kvm, linux-kernel
On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
> async_pf_execute                    kvm_vcpu_block
> ------------------------------------------------------------------------
> spin_lock(&vcpu->async_pf.lock);
> if (waitqueue_active(&vcpu->wq))
> /* The CPU might reorder the test for
>    the waitqueue up here, before
>    prior writes complete */
>                                     prepare_to_wait(&vcpu->wq, &wait,
>                                         TASK_INTERRUPTIBLE);
>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                     ...
>                                     return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                             !vcpu->arch.apf.halted)
>                                         || !list_empty_careful(&vcpu->async_pf.done)
>                                     ...
The new memory barrier isn't "paired" with any other, and in
fact I think that the same issue exists on the other side:
list_empty_careful(&vcpu->async_pf.done) may be reordered up,
before the prepare_to_wait:
spin_lock(&vcpu->async_pf.lock);
                                    (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
                                            !vcpu->arch.apf.halted)
                                        || !list_empty_careful(&vcpu->async_pf.done)
                                    ...
                                    prepare_to_wait(&vcpu->wq, &wait,
                                        TASK_INTERRUPTIBLE);
                                    /*if (kvm_vcpu_check_block(vcpu) < 0) */
                                    /*if (kvm_arch_vcpu_runnable(vcpu)) { */
                                    ...
                                    return 0;
list_add_tail(&apf->link,
              &vcpu->async_pf.done);
spin_unlock(&vcpu->async_pf.lock);
                                    waited = true;
                                    schedule();
if (waitqueue_active(&vcpu->wq))
So you need another smp_mb() after prepare_to_wait(). I'm not sure
if it's needed also for your original tty report, but I think it is
for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
without memory barrier in mei drivers").
I wonder if it makes sense to introduce smp_mb__before_spin_lock()
and smp_mb__after_spin_unlock(). On x86 the former could be a
simple compiler barrier, and on s390 both of them could. But that
should be a separate patch.
Thanks,
Paolo
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 8:45 ` Paolo Bonzini
@ 2015-10-09 8:50 ` Peter Zijlstra
2015-10-09 10:24 ` Paolo Bonzini
2015-10-09 9:04 ` Kosuke Tatsukawa
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2015-10-09 8:50 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kosuke Tatsukawa, Gleb Natapov, kvm, linux-kernel
On Fri, Oct 09, 2015 at 10:45:32AM +0200, Paolo Bonzini wrote:
> So you need another smp_mb() after prepare_to_wait(). I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
>
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock(). On x86 the former could be a
> simple compiler barrier, and on s390 both of them could. But that
> should be a separate patch.
Not having actually read or thought about the issue at hand, it's
perfectly valid to pair an smp_mb() with either spin_lock() or
spin_unlock().
IOW, MB <-> {ACQUIRE, RELEASE} is a valid pairing.
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 8:45 ` Paolo Bonzini
2015-10-09 8:50 ` Peter Zijlstra
@ 2015-10-09 9:04 ` Kosuke Tatsukawa
2015-10-09 10:26 ` Paolo Bonzini
1 sibling, 1 reply; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09 9:04 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Gleb Natapov, kvm, linux-kernel
Paolo Bonzini wrote:
> On 09/10/2015 02:35, Kosuke Tatsukawa wrote:
>> async_pf_execute                    kvm_vcpu_block
>> ------------------------------------------------------------------------
>> spin_lock(&vcpu->async_pf.lock);
>> if (waitqueue_active(&vcpu->wq))
>> /* The CPU might reorder the test for
>>    the waitqueue up here, before
>>    prior writes complete */
>>                                     prepare_to_wait(&vcpu->wq, &wait,
>>                                         TASK_INTERRUPTIBLE);
>>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>>                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>>                                     ...
>>                                     return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>>                                             !vcpu->arch.apf.halted)
>>                                         || !list_empty_careful(&vcpu->async_pf.done)
>>                                     ...
>
> The new memory barrier isn't "paired" with any other, and in
> fact I think that the same issue exists on the other side:
> list_empty_careful(&vcpu->async_pf.done) may be reordered up,
> before the prepare_to_wait:
smp_store_mb(), called from set_current_state(), which in turn is
called from prepare_to_wait(), should prevent reordering such as the
one below from happening. wait_event*() also calls set_current_state()
internally.
> spin_lock(&vcpu->async_pf.lock);
>                                     (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE &&
>                                             !vcpu->arch.apf.halted)
>                                         || !list_empty_careful(&vcpu->async_pf.done)
>                                     ...
>                                     prepare_to_wait(&vcpu->wq, &wait,
>                                         TASK_INTERRUPTIBLE);
>                                     /*if (kvm_vcpu_check_block(vcpu) < 0) */
>                                     /*if (kvm_arch_vcpu_runnable(vcpu)) { */
>                                     ...
>                                     return 0;
> list_add_tail(&apf->link,
>               &vcpu->async_pf.done);
> spin_unlock(&vcpu->async_pf.lock);
>                                     waited = true;
>                                     schedule();
> if (waitqueue_active(&vcpu->wq))
>
> So you need another smp_mb() after prepare_to_wait(). I'm not sure
> if it's needed also for your original tty report, but I think it is
> for https://lkml.org/lkml/2015/10/8/989 ("mei: fix waitqueue_active
> without memory barrier in mei drivers").
>
> I wonder if it makes sense to introduce smp_mb__before_spin_lock()
> and smp_mb__after_spin_unlock(). On x86 the former could be a
> simple compiler barrier, and on s390 both of them could. But that
> should be a separate patch.
The problem on the waiter side occurs only if add_wait_queue() or
__add_wait_queue() is called directly, without memory barriers or locks
protecting it.
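For reference, the barrier in question comes from set_current_state();
in kernels of that era, prepare_to_wait() and set_current_state() looked
roughly as follows (simplified from kernel/sched/wait.c and
include/linux/sched.h; smp_store_mb() is a store followed by a full
smp_mb()):

```c
#define set_current_state(state_value)			\
	smp_store_mb(current->state, (state_value))

void prepare_to_wait(wait_queue_head_t *q, wait_queue_t *wait, int state)
{
	unsigned long flags;

	wait->flags &= ~WQ_FLAG_EXCLUSIVE;
	spin_lock_irqsave(&q->lock, flags);
	if (list_empty(&wait->task_list))
		__add_wait_queue(q, wait);
	set_current_state(state);	/* store + smp_mb() */
	spin_unlock_irqrestore(&q->lock, flags);
}
```

The key point is that the task is added to the waitqueue before
set_current_state() executes its full barrier, so the waiter's
subsequent condition check cannot be reordered before the queue
addition.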
> Thanks,
>
> Paolo
---
Kosuke TATSUKAWA | 3rd IT Platform Department
| IT Platform Division, NEC Corporation
| tatsu@ab.jp.nec.com
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 8:50 ` Peter Zijlstra
@ 2015-10-09 10:24 ` Paolo Bonzini
0 siblings, 0 replies; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09 10:24 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Kosuke Tatsukawa, Gleb Natapov, kvm, linux-kernel
On 09/10/2015 10:50, Peter Zijlstra wrote:
> Not having actually read or thought about the issue at hand, it's
> perfectly valid to pair an smp_mb() with either spin_lock() or
> spin_unlock().
>
> IOW, MB <-> {ACQUIRE, RELEASE} is a valid pairing.
In this case it's an smp_mb() (store-load barrier) being paired with
another smp_mb(), so spin_unlock()'s release is not enough.
Paolo
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 9:04 ` Kosuke Tatsukawa
@ 2015-10-09 10:26 ` Paolo Bonzini
2015-10-09 12:21 ` Kosuke Tatsukawa
0 siblings, 1 reply; 7+ messages in thread
From: Paolo Bonzini @ 2015-10-09 10:26 UTC (permalink / raw)
To: Kosuke Tatsukawa; +Cc: Gleb Natapov, kvm, linux-kernel
On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
> smp_store_mb(), called from set_current_state(), which in turn is
> called from prepare_to_wait(), should prevent reordering such as the
> one below from happening. wait_event*() also calls set_current_state()
> internally.
Ah, I missed that set_current_state has a memory barrier in it. The
patch is okay, but please expand the comment to say that this memory
barrier pairs with prepare_to_wait's set_current_state().
Paolo
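The expanded comment Paolo asks for might read roughly like this (a
sketch only; the final wording in the author's v2 may differ):

```c
	/*
	 * The write to vcpu->async_pf.done must be ordered before the
	 * waitqueue_active() check below.  This smp_mb() pairs with the
	 * smp_store_mb() in prepare_to_wait()'s set_current_state() on
	 * the kvm_vcpu_block() side.
	 */
	smp_mb();
	if (waitqueue_active(&vcpu->wq))
		wake_up_interruptible(&vcpu->wq);
```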
* Re: [PATCH] kvm: fix waitqueue_active without memory barrier in virt/kvm/async_pf.c
2015-10-09 10:26 ` Paolo Bonzini
@ 2015-10-09 12:21 ` Kosuke Tatsukawa
0 siblings, 0 replies; 7+ messages in thread
From: Kosuke Tatsukawa @ 2015-10-09 12:21 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Gleb Natapov, kvm, linux-kernel
Paolo Bonzini wrote:
> On 09/10/2015 11:04, Kosuke Tatsukawa wrote:
>> smp_store_mb(), called from set_current_state(), which in turn is
>> called from prepare_to_wait(), should prevent reordering such as the
>> one below from happening. wait_event*() also calls set_current_state()
>> internally.
>
> Ah, I missed that set_current_state has a memory barrier in it. The
> patch is okay, but please expand the comment to say that this memory
> barrier pairs with prepare_to_wait's set_current_state().
Thank you for the comment. I'll send a new version of the patch with
the modification.
---
Kosuke TATSUKAWA | 3rd IT Platform Department
| IT Platform Division, NEC Corporation
| tatsu@ab.jp.nec.com