xen-devel.lists.xenproject.org archive mirror
* [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
@ 2021-02-20 19:47 Julien Grall
  2021-02-22 11:11 ` Ian Jackson
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Julien Grall @ 2021-02-20 19:47 UTC (permalink / raw)
  To: xen-devel
  Cc: iwj, jbeulich, sstabellini, ash.j.wilding, Julien Grall,
	George Dunlap, Dario Faggioli

From: Julien Grall <jgrall@amazon.com>

The comment in vcpu_block() states that the events should be checked
/after/ blocking to avoid the wakeup waiting race. However, from a generic
perspective, set_bit() doesn't prevent re-ordering. So the following
could happen:

CPU0  (blocking vCPU A)         |   CPU1 (unblocking vCPU A)
                                |
A <- read local events          |
                                |   set local events
                                |   test_and_clear_bit(_VPF_blocked)
                                |       -> Bail out as the bit is not set
                                |
set_bit(_VPF_blocked)           |
                                |
check A                         |

The variable A will be 0 and therefore the vCPU will be blocked when it
should continue running.

vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
from reading any information about local events before the flag
_VPF_blocked is set.
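
The intended pairing then looks roughly as follows (a simplified sketch
of the two paths, not the exact code):

    /* CPU0: vcpu_block(), simplified */
    set_bit(_VPF_blocked, &v->pause_flags);
    smp_mb__after_atomic();              /* order the store above... */
    if ( local_events_need_delivery() )  /* ...before these loads */
        clear_bit(_VPF_blocked, &v->pause_flags);

    /* CPU1: vcpu_unblock(), simplified */
    /* ... set the local events for the vCPU ... */
    if ( test_and_clear_bit(_VPF_blocked, &v->pause_flags) )
        vcpu_wake(v);                    /* fully ordered RMW */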

Signed-off-by: Julien Grall <jgrall@amazon.com>

---

This is a follow-up of the discussion that started in 2019 (see [1])
regarding a possible race between do_poll()/vcpu_unblock() and the wake
up path.

I haven't yet fully thought about the potential race in do_poll(). If
there is, then this would likely want to be fixed in a separate patch.

On x86, the current code is safe because set_bit() is fully ordered. So
the problem is limited to Arm (and potentially any new architecture).

I couldn't convince myself whether the Arm implementation of
local_events_need_delivery() contains enough barriers to prevent the
re-ordering. However, I don't think we want to play with the devil here,
as the function may be optimized in the future.

This patch is a candidate for 4.15. The risk is really small, as the
memory ordering will be stricter on Arm and therefore less racy.
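
To illustrate the cost: the barrier roughly boils down to the following
per-architecture definitions (abridged from the usual semantics; the
exact Xen macros should be double-checked):

    /* x86: atomic RMWs are fully ordered (LOCK prefix), so at most a
     * compiler barrier is needed and the patch is effectively free. */
    #define smp_mb__after_atomic()  barrier()

    /* Arm: set_bit() is a plain exclusive load/store loop with no
     * implied ordering, so a real barrier (DMB) is required. */
    #define smp_mb__after_atomic()  smp_mb()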

[1] <3ce4998b-a8a8-37bd-bb26-9550571709b6@suse.com>
---
 xen/common/sched/core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 9745a77eee23..2b974fd6f8ba 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -1418,6 +1418,8 @@ void vcpu_block(void)
 
     set_bit(_VPF_blocked, &v->pause_flags);
 
+    smp_mb__after_atomic();
+
     arch_vcpu_block(v);
 
     /* Check for events /after/ blocking: avoids wakeup waiting race. */
-- 
2.17.1




* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-20 19:47 [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block() Julien Grall
@ 2021-02-22 11:11 ` Ian Jackson
  2021-02-22 14:26 ` Jan Beulich
  2021-02-23 13:24 ` Ash Wilding
  2 siblings, 0 replies; 12+ messages in thread
From: Ian Jackson @ 2021-02-22 11:11 UTC (permalink / raw)
  To: Julien Grall
  Cc: xen-devel, jbeulich, sstabellini, ash.j.wilding, Julien Grall,
	George Dunlap, Dario Faggioli

Julien Grall writes ("[PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()"):
> From: Julien Grall <jgrall@amazon.com>
> 
> The comment in vcpu_block() states that the events should be checked
> /after/ blocking to avoid the wakeup waiting race. However, from a generic
> perspective, set_bit() doesn't prevent re-ordering. So the following
> could happen:

Release-Acked-by: Ian Jackson <iwj@xenproject.org>



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-20 19:47 [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block() Julien Grall
  2021-02-22 11:11 ` Ian Jackson
@ 2021-02-22 14:26 ` Jan Beulich
  2021-02-22 20:09   ` Stefano Stabellini
  2021-02-23 13:24 ` Ash Wilding
  2 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2021-02-22 14:26 UTC (permalink / raw)
  To: Julien Grall
  Cc: iwj, sstabellini, ash.j.wilding, Julien Grall, George Dunlap,
	Dario Faggioli, xen-devel

On 20.02.2021 20:47, Julien Grall wrote:
> From: Julien Grall <jgrall@amazon.com>
> 
> The comment in vcpu_block() states that the events should be checked
> /after/ blocking to avoid the wakeup waiting race. However, from a generic
> perspective, set_bit() doesn't prevent re-ordering. So the following
> could happen:
> 
> CPU0  (blocking vCPU A)         |   CPU1 (unblocking vCPU A)
>                                 |
> A <- read local events          |
>                                 |   set local events
>                                 |   test_and_clear_bit(_VPF_blocked)
>                                 |       -> Bail out as the bit is not set
>                                 |
> set_bit(_VPF_blocked)           |
>                                 |
> check A                         |
> 
> The variable A will be 0 and therefore the vCPU will be blocked when it
> should continue running.
> 
> vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
> from reading any information about local events before the flag
> _VPF_blocked is set.
> 
> Signed-off-by: Julien Grall <jgrall@amazon.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

> ---
> 
> This is a follow-up of the discussion that started in 2019 (see [1])
> regarding a possible race between do_poll()/vcpu_unblock() and the wake
> up path.
> 
> I haven't yet fully thought about the potential race in do_poll(). If
> there is, then this would likely want to be fixed in a separate patch.
> 
> On x86, the current code is safe because set_bit() is fully ordered. So
> the problem is limited to Arm (and potentially any new architecture).
> 
> I couldn't convince myself whether the Arm implementation of
> local_events_need_delivery() contains enough barriers to prevent the
> re-ordering. However, I don't think we want to play with the devil here,
> as the function may be optimized in the future.

In fact I think this ...

> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> @@ -1418,6 +1418,8 @@ void vcpu_block(void)
>  
>      set_bit(_VPF_blocked, &v->pause_flags);
>  
> +    smp_mb__after_atomic();
> +

... pattern should be looked for throughout the codebase, and barriers
be added unless it can be proven none is needed.

Jan



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-22 14:26 ` Jan Beulich
@ 2021-02-22 20:09   ` Stefano Stabellini
  2021-02-22 20:12     ` Julien Grall
  2021-02-23  8:45     ` Dario Faggioli
  0 siblings, 2 replies; 12+ messages in thread
From: Stefano Stabellini @ 2021-02-22 20:09 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Julien Grall, iwj, sstabellini, ash.j.wilding, Julien Grall,
	George Dunlap, Dario Faggioli, xen-devel


On Mon, 22 Feb 2021, Jan Beulich wrote:
> On 20.02.2021 20:47, Julien Grall wrote:
> > From: Julien Grall <jgrall@amazon.com>
> > 
> > The comment in vcpu_block() states that the events should be checked
> > /after/ blocking to avoid the wakeup waiting race. However, from a generic
> > perspective, set_bit() doesn't prevent re-ordering. So the following
> > could happen:
> > 
> > CPU0  (blocking vCPU A)         |   CPU1 (unblocking vCPU A)
> >                                 |
> > A <- read local events          |
> >                                 |   set local events
> >                                 |   test_and_clear_bit(_VPF_blocked)
> >                                 |       -> Bail out as the bit is not set
> >                                 |
> > set_bit(_VPF_blocked)           |
> >                                 |
> > check A                         |
> > 
> > The variable A will be 0 and therefore the vCPU will be blocked when it
> > should continue running.
> > 
> > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
> > from reading any information about local events before the flag
> > _VPF_blocked is set.
> > 
> > Signed-off-by: Julien Grall <jgrall@amazon.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Stefano Stabellini <sstabellini@kernel.org>



> > This is a follow-up of the discussion that started in 2019 (see [1])
> > regarding a possible race between do_poll()/vcpu_unblock() and the wake
> > up path.
> > 
> > I haven't yet fully thought about the potential race in do_poll(). If
> > there is, then this would likely want to be fixed in a separate patch.
> > 
> > On x86, the current code is safe because set_bit() is fully ordered. So
> > the problem is limited to Arm (and potentially any new architecture).
> > 
> > I couldn't convince myself whether the Arm implementation of
> > local_events_need_delivery() contains enough barriers to prevent the
> > re-ordering. However, I don't think we want to play with the devil here,
> > as the function may be optimized in the future.
> 
> In fact I think this ...
> 
> > --- a/xen/common/sched/core.c
> > +++ b/xen/common/sched/core.c
> > @@ -1418,6 +1418,8 @@ void vcpu_block(void)
> >  
> >      set_bit(_VPF_blocked, &v->pause_flags);
> >  
> > +    smp_mb__after_atomic();
> > +
> 
> ... pattern should be looked for throughout the codebase, and barriers
> be added unless it can be proven none is needed.

And in that case it would be best to add an in-code comment to explain
why the barrier is not needed.


* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-22 20:09   ` Stefano Stabellini
@ 2021-02-22 20:12     ` Julien Grall
  2021-02-23  1:22       ` Stefano Stabellini
  2021-02-23  7:00       ` Jan Beulich
  2021-02-23  8:45     ` Dario Faggioli
  1 sibling, 2 replies; 12+ messages in thread
From: Julien Grall @ 2021-02-22 20:12 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: iwj, ash.j.wilding, Julien Grall, George Dunlap, Dario Faggioli,
	xen-devel



On 22/02/2021 20:09, Stefano Stabellini wrote:
> On Mon, 22 Feb 2021, Jan Beulich wrote:
>> On 20.02.2021 20:47, Julien Grall wrote:
>>> From: Julien Grall <jgrall@amazon.com>
>>>
>>> The comment in vcpu_block() states that the events should be checked
>>> /after/ blocking to avoid the wakeup waiting race. However, from a generic
>>> perspective, set_bit() doesn't prevent re-ordering. So the following
>>> could happen:
>>>
>>> CPU0  (blocking vCPU A)         |   CPU1 (unblocking vCPU A)
>>>                                  |
>>> A <- read local events          |
>>>                                  |   set local events
>>>                                  |   test_and_clear_bit(_VPF_blocked)
>>>                                  |       -> Bail out as the bit is not set
>>>                                  |
>>> set_bit(_VPF_blocked)           |
>>>                                  |
>>> check A                         |
>>>
>>> The variable A will be 0 and therefore the vCPU will be blocked when it
>>> should continue running.
>>>
>>> vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
>>> from reading any information about local events before the flag
>>> _VPF_blocked is set.
>>>
>>> Signed-off-by: Julien Grall <jgrall@amazon.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> 
> 
> 
>>> This is a follow-up of the discussion that started in 2019 (see [1])
>>> regarding a possible race between do_poll()/vcpu_unblock() and the wake
>>> up path.
>>>
>>> I haven't yet fully thought about the potential race in do_poll(). If
>>> there is, then this would likely want to be fixed in a separate patch.
>>>
>>> On x86, the current code is safe because set_bit() is fully ordered. So
>>> the problem is limited to Arm (and potentially any new architecture).
>>>
>>> I couldn't convince myself whether the Arm implementation of
>>> local_events_need_delivery() contains enough barriers to prevent the
>>> re-ordering. However, I don't think we want to play with the devil here,
>>> as the function may be optimized in the future.
>>
>> In fact I think this ...
>>
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>> @@ -1418,6 +1418,8 @@ void vcpu_block(void)
>>>   
>>>       set_bit(_VPF_blocked, &v->pause_flags);
>>>   
>>> +    smp_mb__after_atomic();
>>> +
>>
>> ... pattern should be looked for throughout the codebase, and barriers
>> be added unless it can be proven none is needed.
> And in that case it would be best to add an in-code comment to explain
> why the barrier is not needed.
I would rather not add a comment for every *_bit() call. It should be
pretty obvious for most of them that the barrier is not necessary.

We should only add comments where the barrier is necessary or it is not 
clear why it is not necessary.

Cheers,

-- 
Julien Grall



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-22 20:12     ` Julien Grall
@ 2021-02-23  1:22       ` Stefano Stabellini
  2021-02-23  7:00       ` Jan Beulich
  1 sibling, 0 replies; 12+ messages in thread
From: Stefano Stabellini @ 2021-02-23  1:22 UTC (permalink / raw)
  To: Julien Grall
  Cc: Stefano Stabellini, Jan Beulich, iwj, ash.j.wilding,
	Julien Grall, George Dunlap, Dario Faggioli, xen-devel


On Mon, 22 Feb 2021, Julien Grall wrote:
> On 22/02/2021 20:09, Stefano Stabellini wrote:
> > On Mon, 22 Feb 2021, Jan Beulich wrote:
> > > On 20.02.2021 20:47, Julien Grall wrote:
> > > > From: Julien Grall <jgrall@amazon.com>
> > > > 
> > > > The comment in vcpu_block() states that the events should be checked
> > > > /after/ blocking to avoid the wakeup waiting race. However, from a generic
> > > > perspective, set_bit() doesn't prevent re-ordering. So the following
> > > > could happen:
> > > > 
> > > > CPU0  (blocking vCPU A)         |   CPU1 (unblocking vCPU A)
> > > >                                  |
> > > > A <- read local events          |
> > > >                                  |   set local events
> > > >                                  |   test_and_clear_bit(_VPF_blocked)
> > > >                                  |       -> Bail out as the bit is not set
> > > >                                  |
> > > > set_bit(_VPF_blocked)           |
> > > >                                  |
> > > > check A                         |
> > > > 
> > > > The variable A will be 0 and therefore the vCPU will be blocked when it
> > > > should continue running.
> > > > 
> > > > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
> > > > from reading any information about local events before the flag
> > > > _VPF_blocked is set.
> > > > 
> > > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > > 
> > > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> > 
> > 
> > 
> > > > This is a follow-up of the discussion that started in 2019 (see [1])
> > > > regarding a possible race between do_poll()/vcpu_unblock() and the wake
> > > > up path.
> > > > 
> > > > I haven't yet fully thought about the potential race in do_poll(). If
> > > > there is, then this would likely want to be fixed in a separate patch.
> > > > 
> > > > On x86, the current code is safe because set_bit() is fully ordered. So
> > > > the problem is limited to Arm (and potentially any new architecture).
> > > > 
> > > > I couldn't convince myself whether the Arm implementation of
> > > > local_events_need_delivery() contains enough barriers to prevent the
> > > > re-ordering. However, I don't think we want to play with the devil here,
> > > > as the function may be optimized in the future.
> > > 
> > > In fact I think this ...
> > > 
> > > > --- a/xen/common/sched/core.c
> > > > +++ b/xen/common/sched/core.c
> > > > @@ -1418,6 +1418,8 @@ void vcpu_block(void)
> > > >
> > > >      set_bit(_VPF_blocked, &v->pause_flags);
> > > >
> > > > +    smp_mb__after_atomic();
> > > > +
> > > 
> > > ... pattern should be looked for throughout the codebase, and barriers
> > > be added unless it can be proven none is needed.
> > And in that case it would be best to add an in-code comment to explain
> > why the barrier is not needed.
> I would rather not add a comment for every *_bit() call. It should be pretty
> obvious for most of them that the barrier is not necessary.
> 
> We should only add comments where the barrier is necessary or it is not clear
> why it is not necessary.

Either way is fine, as long as it is consistent. Yeah, we don't want to
add too many comments everywhere, so maybe adding them only when the
barrier is required could be better.


* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-22 20:12     ` Julien Grall
  2021-02-23  1:22       ` Stefano Stabellini
@ 2021-02-23  7:00       ` Jan Beulich
  2021-02-23  9:04         ` Julien Grall
  1 sibling, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2021-02-23  7:00 UTC (permalink / raw)
  To: Julien Grall, Stefano Stabellini
  Cc: iwj, ash.j.wilding, Julien Grall, George Dunlap, Dario Faggioli,
	xen-devel

On 22.02.2021 21:12, Julien Grall wrote:
> On 22/02/2021 20:09, Stefano Stabellini wrote:
>> On Mon, 22 Feb 2021, Jan Beulich wrote:
>>> On 20.02.2021 20:47, Julien Grall wrote:
>>>> This is a follow-up of the discussion that started in 2019 (see [1])
>>>> regarding a possible race between do_poll()/vcpu_unblock() and the wake
>>>> up path.
>>>>
>>>> I haven't yet fully thought about the potential race in do_poll(). If
>>>> there is, then this would likely want to be fixed in a separate patch.
>>>>
>>>> On x86, the current code is safe because set_bit() is fully ordered. So
>>>> the problem is limited to Arm (and potentially any new architecture).
>>>>
>>>> I couldn't convince myself whether the Arm implementation of
>>>> local_events_need_delivery() contains enough barriers to prevent the
>>>> re-ordering. However, I don't think we want to play with the devil here,
>>>> as the function may be optimized in the future.
>>>
>>> In fact I think this ...
>>>
>>>> --- a/xen/common/sched/core.c
>>>> +++ b/xen/common/sched/core.c
>>>> @@ -1418,6 +1418,8 @@ void vcpu_block(void)
>>>>   
>>>>       set_bit(_VPF_blocked, &v->pause_flags);
>>>>   
>>>> +    smp_mb__after_atomic();
>>>> +
>>>
>>> ... pattern should be looked for throughout the codebase, and barriers
>>> be added unless it can be proven none is needed.
>> And in that case it would be best to add an in-code comment to explain
>> why the barrier is not needed.
> I would rather not add a comment for every *_bit() call. It should be
> pretty obvious for most of them that the barrier is not necessary.
> 
> We should only add comments where the barrier is necessary or it is not 
> clear why it is not necessary.

I guess by "pattern" I didn't necessarily mean _all_ *_bit()
calls - indeed there are many where it's clear that no barrier
would be needed. I was rather meaning modifications like this one
of v->pause_flags (I'm sure there are further such fields).

Jan



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-22 20:09   ` Stefano Stabellini
  2021-02-22 20:12     ` Julien Grall
@ 2021-02-23  8:45     ` Dario Faggioli
  1 sibling, 0 replies; 12+ messages in thread
From: Dario Faggioli @ 2021-02-23  8:45 UTC (permalink / raw)
  To: Stefano Stabellini, Jan Beulich
  Cc: Julien Grall, iwj, ash.j.wilding, Julien Grall, George Dunlap, xen-devel


On Mon, 2021-02-22 at 12:09 -0800, Stefano Stabellini wrote:
> On Mon, 22 Feb 2021, Jan Beulich wrote:
> > On 20.02.2021 20:47, Julien Grall wrote:
> > > 
> > > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent
> > > the CPU from reading any information about local events before
> > > the flag _VPF_blocked is set.
> > > 
> > > Signed-off-by: Julien Grall <jgrall@amazon.com>
> > 
> > Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
> 
Acked-by: Dario Faggioli <dfaggioli@suse.com>

Thanks and Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
-------------------------------------------------------------------
<<This happens because _I_ choose it to happen!>> (Raistlin Majere)



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-23  7:00       ` Jan Beulich
@ 2021-02-23  9:04         ` Julien Grall
  0 siblings, 0 replies; 12+ messages in thread
From: Julien Grall @ 2021-02-23  9:04 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini
  Cc: iwj, ash.j.wilding, Julien Grall, George Dunlap, Dario Faggioli,
	xen-devel

Hi,

On 23/02/2021 07:00, Jan Beulich wrote:
> On 22.02.2021 21:12, Julien Grall wrote:
>> On 22/02/2021 20:09, Stefano Stabellini wrote:
>>> On Mon, 22 Feb 2021, Jan Beulich wrote:
>>>> On 20.02.2021 20:47, Julien Grall wrote:
>>>>> This is a follow-up of the discussion that started in 2019 (see [1])
>>>>> regarding a possible race between do_poll()/vcpu_unblock() and the wake
>>>>> up path.
>>>>>
>>>>> I haven't yet fully thought about the potential race in do_poll(). If
>>>>> there is, then this would likely want to be fixed in a separate patch.
>>>>>
>>>>> On x86, the current code is safe because set_bit() is fully ordered. So
>>>>> the problem is limited to Arm (and potentially any new architecture).
>>>>>
>>>>> I couldn't convince myself whether the Arm implementation of
>>>>> local_events_need_delivery() contains enough barriers to prevent the
>>>>> re-ordering. However, I don't think we want to play with the devil here,
>>>>> as the function may be optimized in the future.
>>>>
>>>> In fact I think this ...
>>>>
>>>>> --- a/xen/common/sched/core.c
>>>>> +++ b/xen/common/sched/core.c
>>>>> @@ -1418,6 +1418,8 @@ void vcpu_block(void)
>>>>>    
>>>>>        set_bit(_VPF_blocked, &v->pause_flags);
>>>>>    
>>>>> +    smp_mb__after_atomic();
>>>>> +
>>>>
>>>> ... pattern should be looked for throughout the codebase, and barriers
>>>> be added unless it can be proven none is needed.
>>> And in that case it would be best to add an in-code comment to explain
>>> why the barrier is not needed.
>> I would rather not add a comment for every *_bit() call. It should be
>> pretty obvious for most of them that the barrier is not necessary.
>>
>> We should only add comments where the barrier is necessary or it is not
>> clear why it is not necessary.
> 
> I guess by "pattern" I didn't necessarily mean _all_ *_bit()
> calls - indeed there are many where it's clear that no barrier
> would be needed. I was rather meaning modifications like this one
> of v->pause_flags (I'm sure there are further such fields).

Agreed, this work is mostly a side project for now, so I will continue
to go through the pattern when I find time.

Cheers,

-- 
Julien Grall



* RE: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-20 19:47 [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block() Julien Grall
  2021-02-22 11:11 ` Ian Jackson
  2021-02-22 14:26 ` Jan Beulich
@ 2021-02-23 13:24 ` Ash Wilding
  2021-02-25 14:01   ` Julien Grall
  2 siblings, 1 reply; 12+ messages in thread
From: Ash Wilding @ 2021-02-23 13:24 UTC (permalink / raw)
  To: julien
  Cc: ash.j.wilding, dfaggioli, george.dunlap, iwj, jbeulich, jgrall,
	sstabellini, xen-devel

Hi Julien,

Thanks for looking at this,

> vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the
> CPU from reading any information about local events before the flag
> _VPF_blocked is set.

Reviewed-by: Ash Wilding <ash.j.wilding@gmail.com>


As an aside,

> I couldn't convince myself whether the Arm implementation of
> local_events_need_delivery() contains enough barriers to prevent the
> re-ordering. However, I don't think we want to play with the devil here,
> as the function may be optimized in the future.

Agreed.

The vgic_vcpu_pending_irq() and vgic_evtchn_irq_pending() in the call
path of local_events_need_delivery() both call spin_lock_irqsave(),
which has an arch_lock_acquire_barrier() in its call path.
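
For reference, this is the chain I am describing (abridged; the exact
call path may vary between Xen versions):

    local_events_need_delivery()
      -> vgic_vcpu_pending_irq() / vgic_evtchn_irq_pending()
           -> spin_lock_irqsave(...)
                -> arch_lock_acquire_barrier()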

That just happens to map to a heavier smp_mb() on Arm right now, but
relying on this behaviour would be shaky; I can imagine a future update
to arch_lock_acquire_barrier() that relaxes it down to just acquire
semantics like its name implies (for example an LSE-based lock_acquire()
using LDUMAXA), in which case any code incorrectly relying on that full
barrier behaviour may break. I'm guessing this is what you meant by the
function possibly being optimized in the future?

Do we know whether there is an expectation for previous loads/stores
to have been observed before local_events_need_delivery()? I'm wondering
whether it would make sense to have an smp_mb() at the start of the
*_nomask() variant in local_events_need_delivery()'s call path.

Doing so would obviate the need for this particular patch, though would
not obviate the need to identify and fix similar set_bit() patterns.
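
For the sake of discussion, a minimal sketch of that alternative (the
body below is an assumption about the Arm implementation, based only
on the two helpers named above):

    /* Hypothetical: make the event check itself enforce ordering. */
    static inline int local_events_need_delivery_nomask(void)
    {
        /* Order prior stores (e.g. _VPF_blocked) before the checks. */
        smp_mb();

        return vgic_vcpu_pending_irq(current) ||
               vgic_evtchn_irq_pending(current);
    }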


Cheers,
Ash.




* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-23 13:24 ` Ash Wilding
@ 2021-02-25 14:01   ` Julien Grall
  2021-02-25 21:35     ` Stefano Stabellini
  0 siblings, 1 reply; 12+ messages in thread
From: Julien Grall @ 2021-02-25 14:01 UTC (permalink / raw)
  To: Ash Wilding
  Cc: dfaggioli, george.dunlap, iwj, jbeulich, jgrall, sstabellini, xen-devel

On 23/02/2021 13:24, Ash Wilding wrote:
> Hi Julien,

Hi Ash,

> Thanks for looking at this,
> 
>> vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the
>> CPU from reading any information about local events before the flag
>> _VPF_blocked is set.
> 
> Reviewed-by: Ash Wilding <ash.j.wilding@gmail.com>

Thanks!

> 
> 
> As an aside,
> 
>> I couldn't convince myself whether the Arm implementation of
>> local_events_need_delivery() contains enough barriers to prevent the
>> re-ordering. However, I don't think we want to play with the devil here,
>> as the function may be optimized in the future.
> 
> Agreed.
> 
> The vgic_vcpu_pending_irq() and vgic_evtchn_irq_pending() in the call
> path of local_events_need_delivery() both call spin_lock_irqsave(),
> which has an arch_lock_acquire_barrier() in its call path.
> 
> That just happens to map to a heavier smp_mb() on Arm right now, but
> relying on this behaviour would be shaky; I can imagine a future update
> to arch_lock_acquire_barrier() that relaxes it down to just acquire
> semantics like its name implies (for example an LSE-based lock_acquire()
> using LDUMAXA), in which case any code incorrectly relying on that full
> barrier behaviour may break. I'm guessing this is what you meant by the
> function possibly being optimized in the future?

That's one of the optimizations I had in mind. The other one is that we
may find a way to remove the spinlocks, so the barriers would disappear
completely.

> 
> Do we know whether there is an expectation for previous loads/stores
> to have been observed before local_events_need_delivery()? I'm wondering
> whether it would make sense to have an smp_mb() at the start of the
> *_nomask() variant in local_events_need_delivery()'s call path.

That's a good question :). For Arm, there are 4 users of 
local_events_need_delivery():
   1) do_poll()
   2) vcpu_block()
   3) hypercall_preempt_check()
   4) general_preempt_check()

3 and 4 are used for breaking down long-running operations. I guess we
would want to have an accurate view of the pending events, and therefore
we would need a memory barrier to prevent the loads from happening too
early.

In this case, I think the smp_mb() would want to be part of the 
hypercall_preempt_check() and general_preempt_check().

Therefore, I think we want to avoid the extra barrier in 
local_events_need_delivery(). Instead, we should require the caller to 
take care of the ordering if needed.

This would benefit any new architecture, as the common code would
already contain the appropriate barriers.
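
Concretely, I am thinking of something along these lines (a sketch
only; the real macro body is abridged and would need adjusting):

    /* Callers needing an up-to-date view of the pending events pay for
     * the barrier; local_events_need_delivery() itself stays cheap. */
    #define hypercall_preempt_check() ({                         \
        smp_mb(); /* order prior stores vs. the event checks */  \
        local_events_need_delivery();                            \
    })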

@Stefano, what do you think?

Cheers,

-- 
Julien Grall



* Re: [PATCH for-4.15] xen/sched: Add missing memory barrier in vcpu_block()
  2021-02-25 14:01   ` Julien Grall
@ 2021-02-25 21:35     ` Stefano Stabellini
  0 siblings, 0 replies; 12+ messages in thread
From: Stefano Stabellini @ 2021-02-25 21:35 UTC (permalink / raw)
  To: Julien Grall
  Cc: Ash Wilding, dfaggioli, george.dunlap, iwj, jbeulich, jgrall,
	sstabellini, xen-devel

On Thu, 25 Feb 2021, Julien Grall wrote:
> On 23/02/2021 13:24, Ash Wilding wrote:
> > Hi Julien,
> 
> Hi Ash,
> 
> > Thanks for looking at this,
> > 
> > > vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the
> > > CPU from reading any information about local events before the flag
> > > _VPF_blocked is set.
> > 
> > Reviewed-by: Ash Wilding <ash.j.wilding@gmail.com>
> 
> Thanks!
> 
> > 
> > 
> > As an aside,
> > 
> > > I couldn't convince myself whether the Arm implementation of
> > > local_events_need_delivery() contains enough barriers to prevent the
> > > re-ordering. However, I don't think we want to play with the devil here,
> > > as the function may be optimized in the future.
> > 
> > Agreed.
> > 
> > The vgic_vcpu_pending_irq() and vgic_evtchn_irq_pending() in the call
> > path of local_events_need_delivery() both call spin_lock_irqsave(),
> > which has an arch_lock_acquire_barrier() in its call path.
> > 
> > That just happens to map to a heavier smp_mb() on Arm right now, but
> > relying on this behaviour would be shaky; I can imagine a future update
> > to arch_lock_acquire_barrier() that relaxes it down to just acquire
> > semantics like its name implies (for example an LSE-based lock_acquire()
> > using LDUMAXA), in which case any code incorrectly relying on that full
> > barrier behaviour may break. I'm guessing this is what you meant by the
> > function possibly being optimized in the future?
> 
> That's one of the optimizations I had in mind. The other one is that we may
> find a way to remove the spinlocks, so the barriers would disappear completely.
> 
> > 
> > Do we know whether there is an expectation for previous loads/stores
> > to have been observed before local_events_need_delivery()? I'm wondering
> > whether it would make sense to have an smp_mb() at the start of the
> > *_nomask() variant in local_events_need_delivery()'s call path.
> 
> That's a good question :). For Arm, there are 4 users of
> local_events_need_delivery():
>   1) do_poll()
>   2) vcpu_block()
>   3) hypercall_preempt_check()
>   4) general_preempt_check()
> 
> 3 and 4 are used for breaking down long-running operations. I guess we would
> want to have an accurate view of the pending events, and therefore we would
> need a memory barrier to prevent the loads from happening too early.
> 
> In this case, I think the smp_mb() would want to be part of the
> hypercall_preempt_check() and general_preempt_check().
> 
> Therefore, I think we want to avoid the extra barrier in
> local_events_need_delivery(). Instead, we should require the caller to take
> care of the ordering if needed.
> 
> This would benefit any new architecture, as the common code would already
> contain the appropriate barriers.
> 
> @Stefano, what do you think?

I am thinking the same way as you, also because it is cleaner if the
one who writes is also the one who takes care of any barriers/flushes
needed.

In this case it is vcpu_block() that writes _VPF_blocked and knows that
the write has to be seen before local_events_need_delivery(). It is
easier to keep track of the requirement if the barrier is in
vcpu_block() together with the set_bit() call.



