linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] timers/migration: Fix ignored event due to missing CPU update
@ 2024-04-01 21:48 Frederic Weisbecker
  2024-04-01 22:18 ` Frederic Weisbecker
  2024-04-02  9:52 ` Anna-Maria Behnsen
  0 siblings, 2 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2024-04-01 21:48 UTC (permalink / raw)
  To: Anna-Maria Behnsen, Thomas Gleixner, Ingo Molnar
  Cc: LKML, Frederic Weisbecker

When a group event is updated with its expiry unchanged but a different
CPU, that target change may go unnoticed and the event may be propagated
up with a stale CPU value. The following depicts a scenario that has
been actually observed:

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
the same expiry value.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
from the top and T0 executed.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
none. But it updates the events from its groups, starting with GRP0:0
which now has T1 as its next event. So far so good.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

3) The migrator in GRP1:1 proceeds upward and updates the events in
GRP1:0. The child event TGRP0:0 is found queued with the same expiry
as before. And therefore it is left unchanged. However the target CPU
is not the same but that fact is ignored so TGRP0:0 still points to
CPU 0 when it should point to CPU 1.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
the top level group.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

5) The migrator in GRP1:1 dequeues the next event in top level pointing
to CPU 0. But since it actually doesn't see any real event in CPU 0, it
early returns.

6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.

Some other bad scenario may involve trees with just two levels.

Fix this with checking the CPU, along with the expiry value before
considering to early return while updating a queued event.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/timer_migration.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index c63a0afdcebe..90786bb9a607 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	 * queue when the expiry time changed only or when it could be ignored.
 	 */
 	if (timerqueue_node_queued(&evt->nextevt)) {
-		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
+		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+			if (evt->cpu != first_childevt->cpu)
+				evt->cpu = first_childevt->cpu;
 			goto check_toplvl;
+		}
 
 		if (!timerqueue_del(&group->events, &evt->nextevt))
 			WRITE_ONCE(group->next_expiry, KTIME_MAX);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] timers/migration: Fix ignored event due to missing CPU update
  2024-04-01 21:48 [PATCH] timers/migration: Fix ignored event due to missing CPU update Frederic Weisbecker
@ 2024-04-01 22:18 ` Frederic Weisbecker
  2024-04-02  9:52 ` Anna-Maria Behnsen
  1 sibling, 0 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2024-04-01 22:18 UTC (permalink / raw)
  To: Anna-Maria Behnsen, Thomas Gleixner, Ingo Molnar; +Cc: LKML

Le Mon, Apr 01, 2024 at 11:48:59PM +0200, Frederic Weisbecker a écrit :
> When a group event is updated with its expiry unchanged but a different
> CPU, that target change may go unnoticed and the event may be propagated
> up with a stale CPU value. The following depicts a scenario that has
> been actually observed:
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = TGRP1:0 (T0)
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T0
>       /         \
>     0 (T0)       1 (T1)
>     idle         idle
> 
> 0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
> including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
> the same expiry value.
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T0
>       /         \
>     0 (T0)       1 (T1)
>     idle         idle
> 
> 1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
> from the top and T0 executed.
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
> 
> 2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
> none. But it updates the events from its groups, starting with GRP0:0
> which now has T1 as its next event. So far so good.
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
> 
> 3) The migrator in GRP1:1 proceeds upward and updates the events in
> GRP1:0. The child event TGRP0:0 is found queued with the same expiry
> as before. And therefore it is left unchanged. However the target CPU
> is not the same but that fact is ignored so TGRP0:0 still points to
> CPU 0 when it should point to CPU 1.
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = TGRP1:0 (T0)
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
> 
> 4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
> as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
> the top level group.
> 
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
> 
> 5) The migrator in GRP1:1 dequeues the next event in top level pointing
> to CPU 0. But since it actually doesn't see any real event in CPU 0, it
> early returns.
> 
> 6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.
> 
> Some other bad scenario may involve trees with just two levels.
> 
> Fix this with checking the CPU, along with the expiry value before
> considering to early return while updating a queued event.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] timers/migration: Fix ignored event due to missing CPU update
  2024-04-01 21:48 [PATCH] timers/migration: Fix ignored event due to missing CPU update Frederic Weisbecker
  2024-04-01 22:18 ` Frederic Weisbecker
@ 2024-04-02  9:52 ` Anna-Maria Behnsen
  2024-04-03 16:24   ` [PATCH v2] " Frederic Weisbecker
  1 sibling, 1 reply; 6+ messages in thread
From: Anna-Maria Behnsen @ 2024-04-02  9:52 UTC (permalink / raw)
  To: Frederic Weisbecker, Thomas Gleixner, Ingo Molnar
  Cc: LKML, Frederic Weisbecker

Frederic Weisbecker <frederic@kernel.org> writes:

> When a group event is updated with its expiry unchanged but a different
> CPU, that target change may go unnoticed and the event may be propagated
> up with a stale CPU value. The following depicts a scenario that has
> been actually observed:

urgh...

>
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> ---
>  kernel/time/timer_migration.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> index c63a0afdcebe..90786bb9a607 100644
> --- a/kernel/time/timer_migration.c
> +++ b/kernel/time/timer_migration.c
> @@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
>  	 * queue when the expiry time changed only or when it could be ignored.
>  	 */
>  	if (timerqueue_node_queued(&evt->nextevt)) {
> -		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
> +		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
> +			if (evt->cpu != first_childevt->cpu)
> +				evt->cpu = first_childevt->cpu;

Why not just unconditionally overwriting the evt->cpu value here?

>  			goto check_toplvl;
> +		}
>  
>  		if (!timerqueue_del(&group->events, &evt->nextevt))
>  			WRITE_ONCE(group->next_expiry, KTIME_MAX);

Thanks,

        Anna-Maria


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2] timers/migration: Fix ignored event due to missing CPU update
  2024-04-02  9:52 ` Anna-Maria Behnsen
@ 2024-04-03 16:24   ` Frederic Weisbecker
  2024-04-04 16:52     ` Anna-Maria Behnsen
  2024-04-05  9:11     ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
  0 siblings, 2 replies; 6+ messages in thread
From: Frederic Weisbecker @ 2024-04-03 16:24 UTC (permalink / raw)
  To: Anna-Maria Behnsen; +Cc: Thomas Gleixner, Ingo Molnar, LKML

Le Tue, Apr 02, 2024 at 11:52:23AM +0200, Anna-Maria Behnsen a écrit :
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > When a group event is updated with its expiry unchanged but a different
> > CPU, that target change may go unnoticed and the event may be propagated
> > up with a stale CPU value. The following depicts a scenario that has
> > been actually observed:
> 
> urgh...
> 
> >
> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > ---
> >  kernel/time/timer_migration.c | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
> > index c63a0afdcebe..90786bb9a607 100644
> > --- a/kernel/time/timer_migration.c
> > +++ b/kernel/time/timer_migration.c
> > @@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
> >  	 * queue when the expiry time changed only or when it could be ignored.
> >  	 */
> >  	if (timerqueue_node_queued(&evt->nextevt)) {
> > -		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
> > +		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
> > +			if (evt->cpu != first_childevt->cpu)
> > +				evt->cpu = first_childevt->cpu;
> 
> Why not just unconditionally overwriting the evt->cpu value here?

Right! See below:

---
From d038dad7345398a2f6671a3cda98a48805f9eba3 Mon Sep 17 00:00:00 2001
From: Frederic Weisbecker <frederic@kernel.org>
Date: Mon, 1 Apr 2024 23:48:59 +0200
Subject: [PATCH v2] timers/migration: Fix ignored event due to missing CPU update

When a group event is updated with its expiry unchanged but a different
CPU, that target change may go unnoticed and the event may be propagated
up with a stale CPU value. The following depicts a scenario that has
been actually observed:

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
the same expiry value.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
from the top and T0 executed.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
none. But it updates the events from its groups, starting with GRP0:0
which now has T1 as its next event. So far so good.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

3) The migrator in GRP1:1 proceeds upward and updates the events in
GRP1:0. The child event TGRP0:0 is found queued with the same expiry
as before. And therefore it is left unchanged. However the target CPU
is not the same but that fact is ignored so TGRP0:0 still points to
CPU 0 when it should point to CPU 1.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
the top level group.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

5) The migrator in GRP1:1 dequeues the next event in top level pointing
to CPU 0. But since it actually doesn't see any real event in CPU 0, it
early returns.

6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.

Some other bad scenario may involve trees with just two levels.

Fix this with unconditionally updating the CPU of the child event before
considering to early return while updating a queued event with an
unchanged expiry value.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/timer_migration.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index c63a0afdcebe..e3075e40cb43 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	 * queue when the expiry time changed only or when it could be ignored.
 	 */
 	if (timerqueue_node_queued(&evt->nextevt)) {
-		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
+		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+			/* Make sure not to miss a new CPU event with the same expiry */
+			evt->cpu = first_childevt->cpu;
 			goto check_toplvl;
+		}
 
 		if (!timerqueue_del(&group->events, &evt->nextevt))
 			WRITE_ONCE(group->next_expiry, KTIME_MAX);
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] timers/migration: Fix ignored event due to missing CPU update
  2024-04-03 16:24   ` [PATCH v2] " Frederic Weisbecker
@ 2024-04-04 16:52     ` Anna-Maria Behnsen
  2024-04-05  9:11     ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker
  1 sibling, 0 replies; 6+ messages in thread
From: Anna-Maria Behnsen @ 2024-04-04 16:52 UTC (permalink / raw)
  To: Frederic Weisbecker; +Cc: Thomas Gleixner, Ingo Molnar, LKML

Frederic Weisbecker <frederic@kernel.org> writes:

> Le Tue, Apr 02, 2024 at 11:52:23AM +0200, Anna-Maria Behnsen a écrit :
>> Frederic Weisbecker <frederic@kernel.org> writes:
>> 
>> > When a group event is updated with its expiry unchanged but a different
>> > CPU, that target change may go unnoticed and the event may be propagated
>> > up with a stale CPU value. The following depicts a scenario that has
>> > been actually observed:
>> 
>> urgh...
>> 
>> >
>> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>> > ---
>> >  kernel/time/timer_migration.c | 5 ++++-
>> >  1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
>> > index c63a0afdcebe..90786bb9a607 100644
>> > --- a/kernel/time/timer_migration.c
>> > +++ b/kernel/time/timer_migration.c
>> > @@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
>> >  	 * queue when the expiry time changed only or when it could be ignored.
>> >  	 */
>> >  	if (timerqueue_node_queued(&evt->nextevt)) {
>> > -		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
>> > +		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
>> > +			if (evt->cpu != first_childevt->cpu)
>> > +				evt->cpu = first_childevt->cpu;
>> 
>> Why not just unconditionally overwriting the evt->cpu value here?
>
> Right! See below:
>
> ---
> From d038dad7345398a2f6671a3cda98a48805f9eba3 Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <frederic@kernel.org>
> Date: Mon, 1 Apr 2024 23:48:59 +0200
> Subject: [PATCH v2] timers/migration: Fix ignored event due to missing CPU update
>
> When a group event is updated with its expiry unchanged but a different
> CPU, that target change may go unnoticed and the event may be propagated
> up with a stale CPU value. The following depicts a scenario that has
> been actually observed:
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = TGRP1:0 (T0)
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T0
>       /         \
>     0 (T0)       1 (T1)
>     idle         idle
>
> 0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
> including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
> the same expiry value.
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T0
>       /         \
>     0 (T0)       1 (T1)
>     idle         idle
>
> 1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
> from the top and T0 executed.
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
>
> 2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
> none. But it updates the events from its groups, starting with GRP0:0
> which now has T1 as its next event. So far so good.
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
>
> 3) The migrator in GRP1:1 proceeds upward and updates the events in
> GRP1:0. The child event TGRP0:0 is found queued with the same expiry
> as before. And therefore it is left unchanged. However the target CPU
> is not the same but that fact is ignored so TGRP0:0 still points to
> CPU 0 when it should point to CPU 1.
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = TGRP1:0 (T0)
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
>
> 4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
> as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
> the top level group.
>
>                        [GRP2:0]
>                    migrator = GRP1:1
>                    active   = GRP1:1
>                    nextevt  = KTIME_MAX
>                     /              \
>                [GRP1:0]           [GRP1:1]
>             migrator = NONE       [...]
>             active   = NONE
>             nextevt  = TGRP0:0 (T0)
>             /           \
>         [GRP0:0]       [...]
>       migrator = NONE
>       active   = NONE
>       nextevt  = T1
>       /         \
>     0            1 (T1)
>     idle         idle
>
> 5) The migrator in GRP1:1 dequeues the next event in top level pointing
> to CPU 0. But since it actually doesn't see any real event in CPU 0, it
> early returns.
>
> 6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.
>
> Some other bad scenario may involve trees with just two levels.
>
> Fix this with unconditionally updating the CPU of the child event before
> considering to early return while updating a queued event with an
> unchanged expiry value.
>
> Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>

Thanks,

	Anna-Maria


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip: timers/urgent] timers/migration: Fix ignored event due to missing CPU update
  2024-04-03 16:24   ` [PATCH v2] " Frederic Weisbecker
  2024-04-04 16:52     ` Anna-Maria Behnsen
@ 2024-04-05  9:11     ` tip-bot2 for Frederic Weisbecker
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot2 for Frederic Weisbecker @ 2024-04-05  9:11 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Frederic Weisbecker, Thomas Gleixner, Anna-Maria Behnsen, x86,
	linux-kernel

The following commit has been merged into the timers/urgent branch of tip:

Commit-ID:     61f7fdf8fd00ce33d30ca3fae8d643c0850ce945
Gitweb:        https://git.kernel.org/tip/61f7fdf8fd00ce33d30ca3fae8d643c0850ce945
Author:        Frederic Weisbecker <frederic@kernel.org>
AuthorDate:    Mon, 01 Apr 2024 23:48:59 +02:00
Committer:     Thomas Gleixner <tglx@linutronix.de>
CommitterDate: Fri, 05 Apr 2024 11:05:16 +02:00

timers/migration: Fix ignored event due to missing CPU update

When a group event is updated with its expiry unchanged but a different
CPU, that target change may go unnoticed and the event may be propagated
up with a stale CPU value. The following depicts a scenario that has
been actually observed:

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

0) The hierarchy has 3 levels. The left part (GRP1:0) is all idle,
including CPU 0 and CPU 1 which have a timer each: T0 and T1. They have
the same expiry value.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T0
      /         \
    0 (T0)       1 (T1)
    idle         idle

1) The migrator in GRP1:1 handles remotely T0. The event is dequeued
from the top and T0 executed.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

2) The migrator in GRP1:1 fetches the next timer for CPU 0 and finds
none. But it updates the events from its groups, starting with GRP0:0
which now has T1 as its next event. So far so good.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

3) The migrator in GRP1:1 proceeds upward and updates the events in
GRP1:0. The child event TGRP0:0 is found queued with the same expiry
as before. And therefore it is left unchanged. However the target CPU
is not the same but that fact is ignored so TGRP0:0 still points to
CPU 0 when it should point to CPU 1.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = TGRP1:0 (T0)
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

4) The propagation has reached the top level and TGRP1:0, having TGRP0:0
as its first event, also wrongly points to CPU 0. TGRP1:0 is added to
the top level group.

                       [GRP2:0]
                   migrator = GRP1:1
                   active   = GRP1:1
                   nextevt  = KTIME_MAX
                    /              \
               [GRP1:0]           [GRP1:1]
            migrator = NONE       [...]
            active   = NONE
            nextevt  = TGRP0:0 (T0)
            /           \
        [GRP0:0]       [...]
      migrator = NONE
      active   = NONE
      nextevt  = T1
      /         \
    0            1 (T1)
    idle         idle

5) The migrator in GRP1:1 dequeues the next event in top level pointing
to CPU 0. But since it actually doesn't see any real event in CPU 0, it
early returns.

6) T1 is left unhandled until either CPU 0 or CPU 1 wake up.

Some other bad scenario may involve trees with just two levels.

Fix this with unconditionally updating the CPU of the child event before
considering to early return while updating a queued event with an
unchanged expiry value.

Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model")
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Link: https://lore.kernel.org/r/Zg2Ct6M2RJAYHgCB@localhost.localdomain
---
 kernel/time/timer_migration.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer_migration.c b/kernel/time/timer_migration.c
index c63a0af..e3075e4 100644
--- a/kernel/time/timer_migration.c
+++ b/kernel/time/timer_migration.c
@@ -762,8 +762,11 @@ bool tmigr_update_events(struct tmigr_group *group, struct tmigr_group *child,
 	 * queue when the expiry time changed only or when it could be ignored.
 	 */
 	if (timerqueue_node_queued(&evt->nextevt)) {
-		if ((evt->nextevt.expires == nextexp) && !evt->ignore)
+		if ((evt->nextevt.expires == nextexp) && !evt->ignore) {
+			/* Make sure not to miss a new CPU event with the same expiry */
+			evt->cpu = first_childevt->cpu;
 			goto check_toplvl;
+		}
 
 		if (!timerqueue_del(&group->events, &evt->nextevt))
 			WRITE_ONCE(group->next_expiry, KTIME_MAX);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-04-05  9:11 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-01 21:48 [PATCH] timers/migration: Fix ignored event due to missing CPU update Frederic Weisbecker
2024-04-01 22:18 ` Frederic Weisbecker
2024-04-02  9:52 ` Anna-Maria Behnsen
2024-04-03 16:24   ` [PATCH v2] " Frederic Weisbecker
2024-04-04 16:52     ` Anna-Maria Behnsen
2024-04-05  9:11     ` [tip: timers/urgent] " tip-bot2 for Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).