linux-rt-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RT 0/2] Linux v4.19.115-rt49-rc1
@ 2020-04-23 20:54 zanussi
  2020-04-23 20:54 ` [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue zanussi
  2020-04-23 20:54 ` [PATCH RT 2/2] Linux 4.19.115-rt49-rc1 zanussi
  0 siblings, 2 replies; 18+ messages in thread
From: zanussi @ 2020-04-23 20:54 UTC (permalink / raw)
  To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Daniel Wagner, Clark Williams, Tom Zanussi

From: Tom Zanussi <zanussi@kernel.org>

Dear RT Folks,

This is the RT stable review cycle of patch 4.19.115-rt49-rc1.

Please scream at me if I messed something up. Please test the patches
too.

The -rc release will be uploaded to kernel.org and will be deleted
when the final release is out. This is just a review release (or
release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main
release on 2020-04-27.

To build 4.19.115-rt49-rc1 directly, the following patches should be applied:

  https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.19.tar.xz

  https://www.kernel.org/pub/linux/kernel/v4.x/patch-4.19.115.xz

  https://www.kernel.org/pub/linux/kernel/projects/rt/4.19/patch-4.19.115-rt49-rc1.patch.xz

You can also build from 4.19.115-rt48 by applying the incremental patch:

  https://www.kernel.org/pub/linux/kernel/projects/rt/4.19/incr/patch-4.19.115-rt48-rt49-rc1.patch.xz


Enjoy,

-- Tom


Tom Zanussi (1):
  Linux 4.19.115-rt49-rc1

Zhang Xiao (1):
  tasklet: Address a race resulting in double-enqueue

 include/linux/interrupt.h | 5 ++++-
 kernel/softirq.c          | 6 +++++-
 localversion-rt           | 2 +-
 3 files changed, 10 insertions(+), 3 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-04-23 20:54 [PATCH RT 0/2] Linux v4.19.115-rt49-rc1 zanussi
@ 2020-04-23 20:54 ` zanussi
  2020-06-04 16:04   ` Ramon Fried
  2020-04-23 20:54 ` [PATCH RT 2/2] Linux 4.19.115-rt49-rc1 zanussi
  1 sibling, 1 reply; 18+ messages in thread
From: zanussi @ 2020-04-23 20:54 UTC (permalink / raw)
  To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Daniel Wagner, Clark Williams, Tom Zanussi
  Cc: Zhang Xiao

From: Zhang Xiao <xiao.zhang@windriver.com>

v4.19.115-rt49-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------


The kernel bugzilla has the following race condition reported:

CPU0                    CPU1            CPU2
------------------------------------------------
test_set SCHED
 test_set RUN
   if SCHED
    add_list
    raise
    clear RUN
<softirq>
test_set RUN
test_clear SCHED
 ->func
                        test_set SCHED
tasklet_try_unlock ->0
test_clear SCHED
                                        test_set SCHED
 ->func
tasklet_try_unlock ->1
                                        test_set RUN
                                        if SCHED
                                        add list
                                        raise
                                        clear RUN
                        test_set RUN
                        if SCHED
                         add list
                         raise
                         clear RUN

As a result the tasklet is enqueued on both CPUs and run on both CPUs. Due
to the nature of the list used here, it is possible that further
(different) tasklets, which are enqueued after this double-enqueued
tasklet, are scheduled on CPU2 but invoked on CPU1. It is also possible
that these tasklets won't be invoked at all, because during the second
enqueue process the t->next pointer is set to NULL - dropping everything
from the list.

This race will trigger one or two of the WARN_ON() in
tasklet_action_common().
The problem is that the tasklet may be invoked multiple times and clear
SCHED bit on each invocation. This makes it possible to enqueue the
very same tasklet on different CPUs.

Current RT-devel is using the upstream implementation which does not
re-run tasklets if they have SCHED set again and so it does not clear
the SCHED bit multiple times on a single invocation.

Introduce the CHAINED flag. The tasklet will only be enqueued if the
CHAINED flag has been set successfully.
If it is possible to exchange the flags (CHAINED | RUN) -> 0 then the
tasklet won't be re-run. Otherwise the possible SCHED flag is removed
and the tasklet is re-run again.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61451
Not-signed-off-by: Zhang Xiao <xiao.zhang@windriver.com>
[bigeasy: patch description]
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
---
 include/linux/interrupt.h | 5 ++++-
 kernel/softirq.c          | 6 +++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 97d9ba26915e..a3b5edb26bc5 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -579,12 +579,15 @@ enum
 {
 	TASKLET_STATE_SCHED,	/* Tasklet is scheduled for execution */
 	TASKLET_STATE_RUN,	/* Tasklet is running (SMP only) */
-	TASKLET_STATE_PENDING	/* Tasklet is pending */
+	TASKLET_STATE_PENDING,	/* Tasklet is pending */
+	TASKLET_STATE_CHAINED	/* Tasklet is chained */
 };
 
 #define TASKLET_STATEF_SCHED	(1 << TASKLET_STATE_SCHED)
 #define TASKLET_STATEF_RUN	(1 << TASKLET_STATE_RUN)
 #define TASKLET_STATEF_PENDING	(1 << TASKLET_STATE_PENDING)
+#define TASKLET_STATEF_CHAINED	(1 << TASKLET_STATE_CHAINED)
+#define TASKLET_STATEF_RC	(TASKLET_STATEF_RUN | TASKLET_STATEF_CHAINED)
 
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 static inline int tasklet_trylock(struct tasklet_struct *t)
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 25bcf2f2714b..73dae64bfc9c 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -947,6 +947,10 @@ static void __tasklet_schedule_common(struct tasklet_struct *t,
 	 * is locked before adding it to the list.
 	 */
 	if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+		if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state)) {
+			tasklet_unlock(t);
+			return;
+		}
 		t->next = NULL;
 		*head->tail = t;
 		head->tail = &(t->next);
@@ -1040,7 +1044,7 @@ static void tasklet_action_common(struct softirq_action *a,
 again:
 		t->func(t->data);
 
-		while (!tasklet_tryunlock(t)) {
+		while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) != TASKLET_STATEF_RC) {
 			/*
 			 * If it got disabled meanwhile, bail out:
 			 */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RT 2/2] Linux 4.19.115-rt49-rc1
  2020-04-23 20:54 [PATCH RT 0/2] Linux v4.19.115-rt49-rc1 zanussi
  2020-04-23 20:54 ` [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue zanussi
@ 2020-04-23 20:54 ` zanussi
  1 sibling, 0 replies; 18+ messages in thread
From: zanussi @ 2020-04-23 20:54 UTC (permalink / raw)
  To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Daniel Wagner, Clark Williams, Tom Zanussi

From: Tom Zanussi <zanussi@kernel.org>

v4.19.115-rt49-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------


Signed-off-by: Tom Zanussi <zanussi@kernel.org>
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 24707986c321..623b4c9cb701 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt48
+-rt49-rc1
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-04-23 20:54 ` [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue zanussi
@ 2020-06-04 16:04   ` Ramon Fried
  2020-06-04 20:51     ` Tom Zanussi
  0 siblings, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-04 16:04 UTC (permalink / raw)
  To: zanussi
  Cc: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Daniel Wagner, Clark Williams, Zhang Xiao

On Thu, Apr 23, 2020 at 11:55 PM <zanussi@kernel.org> wrote:
>
> From: Zhang Xiao <xiao.zhang@windriver.com>
>
> v4.19.115-rt49-rc1 stable review patch.
> If anyone has any objections, please let me know.
>
> -----------
>
>
> The kernel bugzilla has the following race condition reported:
>
> CPU0                    CPU1            CPU2
> ------------------------------------------------
> test_set SCHED
>  test_set RUN
>    if SCHED
>     add_list
>     raise
>     clear RUN
> <softirq>
> test_set RUN
> test_clear SCHED
>  ->func
>                         test_set SCHED
> tasklet_try_unlock ->0
> test_clear SCHED
>                                         test_set SCHED
>  ->func
> tasklet_try_unlock ->1
>                                         test_set RUN
>                                         if SCHED
>                                         add list
>                                         raise
>                                         clear RUN
>                         test_set RUN
>                         if SCHED
>                          add list
>                          raise
>                          clear RUN
>
> As a result the tasklet is enqueued on both CPUs and run on both CPUs. Due
> to the nature of the list used here, it is possible that further
> (different) tasklets, which are enqueued after this double-enqueued
> tasklet, are scheduled on CPU2 but invoked on CPU1. It is also possible
> that these tasklets won't be invoked at all, because during the second
> enqueue process the t->next pointer is set to NULL - dropping everything
> from the list.
>
> This race will trigger one or two of the WARN_ON() in
> tasklet_action_common().
> The problem is that the tasklet may be invoked multiple times and clear
> SCHED bit on each invocation. This makes it possible to enqueue the
> very same tasklet on different CPUs.
>
> Current RT-devel is using the upstream implementation which does not
> re-run tasklets if they have SCHED set again and so it does not clear
> the SCHED bit multiple times on a single invocation.
>
> Introduce the CHAINED flag. The tasklet will only be enqueued if the
> CHAINED flag has been set successfully.
> If it is possible to exchange the flags (CHAINED | RUN) -> 0 then the
> tasklet won't be re-run. Otherwise the possible SCHED flag is removed
> and the tasklet is re-run again.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61451
> Not-signed-off-by: Zhang Xiao <xiao.zhang@windriver.com>
> [bigeasy: patch description]
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
>
> Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> ---
>  include/linux/interrupt.h | 5 ++++-
>  kernel/softirq.c          | 6 +++++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> index 97d9ba26915e..a3b5edb26bc5 100644
> --- a/include/linux/interrupt.h
> +++ b/include/linux/interrupt.h
> @@ -579,12 +579,15 @@ enum
>  {
>         TASKLET_STATE_SCHED,    /* Tasklet is scheduled for execution */
>         TASKLET_STATE_RUN,      /* Tasklet is running (SMP only) */
> -       TASKLET_STATE_PENDING   /* Tasklet is pending */
> +       TASKLET_STATE_PENDING,  /* Tasklet is pending */
> +       TASKLET_STATE_CHAINED   /* Tasklet is chained */
>  };
>
>  #define TASKLET_STATEF_SCHED   (1 << TASKLET_STATE_SCHED)
>  #define TASKLET_STATEF_RUN     (1 << TASKLET_STATE_RUN)
>  #define TASKLET_STATEF_PENDING (1 << TASKLET_STATE_PENDING)
> +#define TASKLET_STATEF_CHAINED (1 << TASKLET_STATE_CHAINED)
> +#define TASKLET_STATEF_RC      (TASKLET_STATEF_RUN | TASKLET_STATEF_CHAINED)
>
>  #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
>  static inline int tasklet_trylock(struct tasklet_struct *t)
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 25bcf2f2714b..73dae64bfc9c 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -947,6 +947,10 @@ static void __tasklet_schedule_common(struct tasklet_struct *t,
>          * is locked before adding it to the list.
>          */
>         if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> +               if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state)) {
> +                       tasklet_unlock(t);
> +                       return;
> +               }
>                 t->next = NULL;
>                 *head->tail = t;
>                 head->tail = &(t->next);
> @@ -1040,7 +1044,7 @@ static void tasklet_action_common(struct softirq_action *a,
>  again:
>                 t->func(t->data);
>
> -               while (!tasklet_tryunlock(t)) {
> +               while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) != TASKLET_STATEF_RC) {
>                         /*
>                          * If it got disabled meanwhile, bail out:
>                          */
> --
> 2.17.1
>

Hi, This patch introduced a regression in our kernel
(v4.19.124-rt53-rebase), It occurs when we're jumping to crush kernel
using kexec, in the initialization of the emmc driver.
I'm still debugging the root cause, but I thought of mentioning this
in the mailing list if you have any idea why this could occur.
The issue doesn't happen on normal boot, only when I specifically
crash the kernel into the crash kernel.
Thanks,
Ramon.

[    0.546142] macb 2b00000.eth1 eth1: Cadence GEM rev 0x00070200 at
0x02b00000 irq 16 (00:28:f8:b7:a0:40)
[    0.548041] sdhci: Secure Digital Host Controller Interface driver
[    0.548042] sdhci: Copyright(c) Pierre Ossman
[    0.548044] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.580588] mmc0: SDHCI controller on 2200000.sdhci [2200000.sdhci]
using ADMA 64-bit
[    0.605753] hm, tasklet state: 00000008
[    0.605757] ------------[ cut here ]------------
[    0.605763] WARNING: CPU: 0 PID: 1 at ../kernel/softirq.c:1061
0xa80000080282649c
[    0.605771] CPU: 0 PID: 1 Comm: swapper Not tainted
4.19.124.eyeq5_ssnt-local-rt53-07819-g3519e8cf4110 #4
[    0.605774] Stack : 0000000000000000 0000000000000018
a80000080340bd40 0000000002d40000
[    0.605780]         a80000080340be70 0000000000000000
0000000000000000 000000000000007e
[    0.605785]         0000000000000000 0000000000000001
0000000000000001 0000000002d30000
[    0.605790]         0000000000000000 a800000802b2a688
0000000002d30000 0000000002d30000
[    0.605795]         0000000000000000 0000000000000000
a80000080282649c a800000802b6cf28
[    0.605799]         0000000000000009 0000000000000425
0000000000000000 0000000000000007
[    0.605804]         0000000002d30000 0000000002d30000
0000000000000000 a800000802d20000
[    0.605809]         a800000803430000 a80000080340bd40
a800000802d284c0 a80000080280a6c4
[    0.605813]         0000000000000000 0000000000000000
0000000000000000 0000000000000000
[    0.605818]         0000000000000000 a80000080280a6c8
0000000000000000 a800000802823b0c
[    0.605823]         ...
[    0.605825] Call Trace:
[    0.605830] [<a800000802b2a688>] 0xa800000802b2a688
[    0.605833] [<a80000080282649c>] 0xa80000080282649c
[    0.605835] [<a80000080280a6c4>] 0xa80000080280a6c4
[    0.605838] [<a80000080280a6c8>] 0xa80000080280a6c8
[    0.605840] [<a800000802823b0c>] 0xa800000802823b0c
[    0.605842] [<a80000080282649c>] 0xa80000080282649c
[    0.605845] [<a800000802b30394>] 0xa800000802b30394
[    0.605847] [<a8000008029cd2ec>] 0xa8000008029cd2ec
[    0.605850] [<a800000802805490>] 0xa800000802805490
[    0.605851]
[    0.605854] ---[ end trace 293afee709e91e0b ]---
[    0.607548] stn8500_serial 800000.uart: detected port #0
[    0.607576] stn8500_serial 800000.uart: uartclk 125000000
[    0.607605] 800000.uart: ttyST0 at MMIO 0x800000 (irq = 46,
base_baud = 7812500) is a stn8500
[    0.610303] stn8500_serial 900000.uart: detected port #1
[    0.610324] stn8500_serial 900000.uart: uartclk 125000000
[    0.611564] 900000.uart: ttyST1 at MMIO 0x900000 (irq = 46,
base_baud = 7812500) is a stn8500
[    0.614645] stn8500_serial a00000.uart: detected port #2
[    0.614891] stn8500_serial a00000.uart: uartclk 125000000
[    0.614919] a00000.uart: ttyST2 at MMIO 0xa00000 (irq = 46,
base_baud = 7812500) is a stn8500
[    0.614930] stn8500_serial a00000.uart: console setup on port #2 :
uartclk 125000000 115207-n-8-n
[    0.615943] console [ttyST2] enabled
[    0.615945] bootconsole [stn8500] disabled
[    0.620959] Freeing unused kernel memory: 1324K
[    0.620965] This architecture does not have kernel memory protection.
[    0.620967] Run /init as init process
[    0.710641] random: dd: uninitialized urandom read (512 bytes read)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-04 16:04   ` Ramon Fried
@ 2020-06-04 20:51     ` Tom Zanussi
  2020-06-09 15:47       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Zanussi @ 2020-06-04 20:51 UTC (permalink / raw)
  To: Ramon Fried
  Cc: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Sebastian Andrzej Siewior,
	Daniel Wagner, Clark Williams, Zhang Xiao

Hi Ramon,

On Thu, 2020-06-04 at 19:04 +0300, Ramon Fried wrote:
> On Thu, Apr 23, 2020 at 11:55 PM <zanussi@kernel.org> wrote:
> > 
> > From: Zhang Xiao <xiao.zhang@windriver.com>
> > 
> > v4.19.115-rt49-rc1 stable review patch.
> > If anyone has any objections, please let me know.
> > 
> > -----------
> > 
> > 
> > The kernel bugzilla has the following race condition reported:
> > 
> > CPU0                    CPU1            CPU2
> > ------------------------------------------------
> > test_set SCHED
> >  test_set RUN
> >    if SCHED
> >     add_list
> >     raise
> >     clear RUN
> > <softirq>
> > test_set RUN
> > test_clear SCHED
> >  ->func
> >                         test_set SCHED
> > tasklet_try_unlock ->0
> > test_clear SCHED
> >                                         test_set SCHED
> >  ->func
> > tasklet_try_unlock ->1
> >                                         test_set RUN
> >                                         if SCHED
> >                                         add list
> >                                         raise
> >                                         clear RUN
> >                         test_set RUN
> >                         if SCHED
> >                          add list
> >                          raise
> >                          clear RUN
> > 
> > As a result the tasklet is enqueued on both CPUs and run on both
> > CPUs. Due
> > to the nature of the list used here, it is possible that further
> > (different) tasklets, which are enqueued after this double-enqueued
> > tasklet, are scheduled on CPU2 but invoked on CPU1. It is also
> > possible
> > that these tasklets won't be invoked at all, because during the
> > second
> > enqueue process the t->next pointer is set to NULL - dropping
> > everything
> > from the list.
> > 
> > This race will trigger one or two of the WARN_ON() in
> > tasklet_action_common().
> > The problem is that the tasklet may be invoked multiple times and
> > clear
> > SCHED bit on each invocation. This makes it possible to enqueue the
> > very same tasklet on different CPUs.
> > 
> > Current RT-devel is using the upstream implementation which does
> > not
> > re-run tasklets if they have SCHED set again and so it does not
> > clear
> > the SCHED bit multiple times on a single invocation.
> > 
> > Introduce the CHAINED flag. The tasklet will only be enqueued if
> > the
> > CHAINED flag has been set successfully.
> > If it is possible to exchange the flags (CHAINED | RUN) -> 0 then
> > the
> > tasklet won't be re-run. Otherwise the possible SCHED flag is
> > removed
> > and the tasklet is re-run again.
> > 
> > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=61451
> > Not-signed-off-by: Zhang Xiao <xiao.zhang@windriver.com>
> > [bigeasy: patch description]
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > 
> > Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> > ---
> >  include/linux/interrupt.h | 5 ++++-
> >  kernel/softirq.c          | 6 +++++-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
> > index 97d9ba26915e..a3b5edb26bc5 100644
> > --- a/include/linux/interrupt.h
> > +++ b/include/linux/interrupt.h
> > @@ -579,12 +579,15 @@ enum
> >  {
> >         TASKLET_STATE_SCHED,    /* Tasklet is scheduled for
> > execution */
> >         TASKLET_STATE_RUN,      /* Tasklet is running (SMP only) */
> > -       TASKLET_STATE_PENDING   /* Tasklet is pending */
> > +       TASKLET_STATE_PENDING,  /* Tasklet is pending */
> > +       TASKLET_STATE_CHAINED   /* Tasklet is chained */
> >  };
> > 
> >  #define TASKLET_STATEF_SCHED   (1 << TASKLET_STATE_SCHED)
> >  #define TASKLET_STATEF_RUN     (1 << TASKLET_STATE_RUN)
> >  #define TASKLET_STATEF_PENDING (1 << TASKLET_STATE_PENDING)
> > +#define TASKLET_STATEF_CHAINED (1 << TASKLET_STATE_CHAINED)
> > +#define TASKLET_STATEF_RC      (TASKLET_STATEF_RUN |
> > TASKLET_STATEF_CHAINED)
> > 
> >  #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
> >  static inline int tasklet_trylock(struct tasklet_struct *t)
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 25bcf2f2714b..73dae64bfc9c 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -947,6 +947,10 @@ static void __tasklet_schedule_common(struct
> > tasklet_struct *t,
> >          * is locked before adding it to the list.
> >          */
> >         if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> > +               if (test_and_set_bit(TASKLET_STATE_CHAINED, &t-
> > >state)) {
> > +                       tasklet_unlock(t);
> > +                       return;
> > +               }
> >                 t->next = NULL;
> >                 *head->tail = t;
> >                 head->tail = &(t->next);
> > @@ -1040,7 +1044,7 @@ static void tasklet_action_common(struct
> > softirq_action *a,
> >  again:
> >                 t->func(t->data);
> > 
> > -               while (!tasklet_tryunlock(t)) {
> > +               while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) !=
> > TASKLET_STATEF_RC) {
> >                         /*
> >                          * If it got disabled meanwhile, bail out:
> >                          */
> > --
> > 2.17.1
> > 
> 
> Hi, This patch introduced a regression in our kernel
> (v4.19.124-rt53-rebase), It occurs when we're jumping to crush kernel
> using kexec, in the initialization of the emmc driver.
> I'm still debugging the root cause, but I thought of mentioning this
> in the mailing list if you have any idea why this could occur.
> The issue doesn't happen on normal boot, only when I specifically
> crash the kernel into the crash kernel.
> Thanks,
> Ramon.

I'm not very familiar with crashing the kernel into the crash kernel. 
Can you explain in enough detail how to set things up to reproduce this
and how to trigger it?  Does it happen every time? 

From looking at the backtrace, it's hitting the WARN_ON() in the
cmpxchg() loop below, because TASKLET_STATE is just
TASKLET_STATE_CHAINED.

It seems that the only way to turn off TASKLET_STATE_CHAINED is via
this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
happens and this loop is hit, you could loop until loops runs out and
hit this warning.

                while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) != TASKLET_STATEF_RC) {
                        /*                                                      
                         * If it got disabled meanwhile, bail out:              
                         */
                        if (atomic_read(&t->count))
                                goto out_disabled;
                        /*                                                      
                         * If it got scheduled meanwhile, re-execute            
                         * the tasklet function:                                
                         */
                        if (test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
                                goto again;
                        if (!--loops) {
                                printk("hm, tasklet state: %08lx\n", t->state);
                                WARN_ON(1);
                                tasklet_unlock(t);
                                break;
                        }
                }

But I don't know why it would happen just in this special case (crash
kernel, and what's different about that?) and not with a normal boot.

Thanks,

Tom

> 
> [    0.546142] macb 2b00000.eth1 eth1: Cadence GEM rev 0x00070200 at
> 0x02b00000 irq 16 (00:28:f8:b7:a0:40)
> [    0.548041] sdhci: Secure Digital Host Controller Interface driver
> [    0.548042] sdhci: Copyright(c) Pierre Ossman
> [    0.548044] sdhci-pltfm: SDHCI platform and OF driver helper
> [    0.580588] mmc0: SDHCI controller on 2200000.sdhci
> [2200000.sdhci]
> using ADMA 64-bit
> [    0.605753] hm, tasklet state: 00000008
> [    0.605757] ------------[ cut here ]------------
> [    0.605763] WARNING: CPU: 0 PID: 1 at ../kernel/softirq.c:1061
> 0xa80000080282649c
> [    0.605771] CPU: 0 PID: 1 Comm: swapper Not tainted
> 4.19.124.eyeq5_ssnt-local-rt53-07819-g3519e8cf4110 #4
> [    0.605774] Stack : 0000000000000000 0000000000000018
> a80000080340bd40 0000000002d40000
> [    0.605780]         a80000080340be70 0000000000000000
> 0000000000000000 000000000000007e
> [    0.605785]         0000000000000000 0000000000000001
> 0000000000000001 0000000002d30000
> [    0.605790]         0000000000000000 a800000802b2a688
> 0000000002d30000 0000000002d30000
> [    0.605795]         0000000000000000 0000000000000000
> a80000080282649c a800000802b6cf28
> [    0.605799]         0000000000000009 0000000000000425
> 0000000000000000 0000000000000007
> [    0.605804]         0000000002d30000 0000000002d30000
> 0000000000000000 a800000802d20000
> [    0.605809]         a800000803430000 a80000080340bd40
> a800000802d284c0 a80000080280a6c4
> [    0.605813]         0000000000000000 0000000000000000
> 0000000000000000 0000000000000000
> [    0.605818]         0000000000000000 a80000080280a6c8
> 0000000000000000 a800000802823b0c
> [    0.605823]         ...
> [    0.605825] Call Trace:
> [    0.605830] [<a800000802b2a688>] 0xa800000802b2a688
> [    0.605833] [<a80000080282649c>] 0xa80000080282649c
> [    0.605835] [<a80000080280a6c4>] 0xa80000080280a6c4
> [    0.605838] [<a80000080280a6c8>] 0xa80000080280a6c8
> [    0.605840] [<a800000802823b0c>] 0xa800000802823b0c
> [    0.605842] [<a80000080282649c>] 0xa80000080282649c
> [    0.605845] [<a800000802b30394>] 0xa800000802b30394
> [    0.605847] [<a8000008029cd2ec>] 0xa8000008029cd2ec
> [    0.605850] [<a800000802805490>] 0xa800000802805490
> [    0.605851]
> [    0.605854] ---[ end trace 293afee709e91e0b ]---
> [    0.607548] stn8500_serial 800000.uart: detected port #0
> [    0.607576] stn8500_serial 800000.uart: uartclk 125000000
> [    0.607605] 800000.uart: ttyST0 at MMIO 0x800000 (irq = 46,
> base_baud = 7812500) is a stn8500
> [    0.610303] stn8500_serial 900000.uart: detected port #1
> [    0.610324] stn8500_serial 900000.uart: uartclk 125000000
> [    0.611564] 900000.uart: ttyST1 at MMIO 0x900000 (irq = 46,
> base_baud = 7812500) is a stn8500
> [    0.614645] stn8500_serial a00000.uart: detected port #2
> [    0.614891] stn8500_serial a00000.uart: uartclk 125000000
> [    0.614919] a00000.uart: ttyST2 at MMIO 0xa00000 (irq = 46,
> base_baud = 7812500) is a stn8500
> [    0.614930] stn8500_serial a00000.uart: console setup on port #2 :
> uartclk 125000000 115207-n-8-n
> [    0.615943] console [ttyST2] enabled
> [    0.615945] bootconsole [stn8500] disabled
> [    0.620959] Freeing unused kernel memory: 1324K
> [    0.620965] This architecture does not have kernel memory
> protection.
> [    0.620967] Run /init as init process
> [    0.710641] random: dd: uninitialized urandom read (512 bytes
> read)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-04 20:51     ` Tom Zanussi
@ 2020-06-09 15:47       ` Sebastian Andrzej Siewior
  2020-06-09 16:17         ` Tom Zanussi
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-06-09 15:47 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Ramon Fried, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > 
> > Hi, This patch introduced a regression in our kernel
> > (v4.19.124-rt53-rebase), It occurs when we're jumping to crush kernel
> > using kexec, in the initialization of the emmc driver.
> > I'm still debugging the root cause, but I thought of mentioning this
> > in the mailing list if you have any idea why this could occur.
> > The issue doesn't happen on normal boot, only when I specifically
> > crash the kernel into the crash kernel.
> > Thanks,
> > Ramon.
> 
> I'm not very familiar with crashing the kernel into the crash kernel. 
> Can you explain in enough detail how to set things up to reproduce this
> and how to trigger it?  Does it happen every time? 
> 
> >From looking at the backtrace, it's hitting the WARN_ON() in the
> cmpxchg() loop below, because TASKLET_STATE is just
> TASKLET_STATE_CHAINED.
> 
> It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> this cmpxchg(), but TASKLET_STATE_RUN can be independently turned off
> elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> happens and this loop is hit, you could loop until loops runs out and
> hit this warning.

But clearing TASKLET_STATE_RUN independently happens by the task, that
set it / part of tasklet_schedule().
tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it won't
work if the additional CHAINED bit is set.

The tasklet itself (which may run on another CPU) sets the RUN bit at the
begin and clears it at the end via cmpxchg() together with the CHAINED
bit. 

I've been staring at it for sometime and I don't see how this can
happen.

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 15:47       ` Sebastian Andrzej Siewior
@ 2020-06-09 16:17         ` Tom Zanussi
  2020-06-09 16:34           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Zanussi @ 2020-06-09 16:17 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ramon Fried, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

Hi Sebastian,

On Tue, 2020-06-09 at 17:47 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-04 15:51:14 [-0500], Tom Zanussi wrote:
> > > 
> > > Hi, This patch introduced a regression in our kernel
> > > (v4.19.124-rt53-rebase), It occurs when we're jumping to crush
> > > kernel
> > > using kexec, in the initialization of the emmc driver.
> > > I'm still debugging the root cause, but I thought of mentioning
> > > this
> > > in the mailing list if you have any idea why this could occur.
> > > The issue doesn't happen on normal boot, only when I specifically
> > > crash the kernel into the crash kernel.
> > > Thanks,
> > > Ramon.
> > 
> > I'm not very familiar with crashing the kernel into the crash
> > kernel. 
> > Can you explain in enough detail how to set things up to reproduce
> > this
> > and how to trigger it?  Does it happen every time? 
> > 
> > > From looking at the backtrace, it's hitting the WARN_ON() in the
> > 
> > cmpxchg() loop below, because TASKLET_STATE is just
> > TASKLET_STATE_CHAINED.
> > 
> > It seems that the only way to turn off TASKLET_STATE_CHAINED is via
> > this cmpxchg(), but TASKLET_STATE_RUN can be independently turned
> > off
> > elsewhere (tasklet_unlock() and tasklet_tryunlock()), so if that
> > happens and this loop is hit, you could loop until loops runs out
> > and
> > hit this warning.
> 
> But clearing TASKLET_STATE_RUN independently happens by the task,
> that
> set it / part of tasklet_schedule().
> tasklet_tryunlock() does a cmpxchg() with only the RUN bit so it
> won't
> work if the additional CHAINED bit is set.
> 
> The tasklet itself (which may run on another CPU) sets the RUN bit at
> the
> begin and clears it at the end via cmpxchg() together with the
> CHAINED
> bit. 
> 
> I've been staring at it for sometime and I don't see how this can
> happen.
> 

I did find a problem with the patch when configured as !SMP since in
that case the RUN flag is never set (will send a patch for that
shortly), but that wouldn't be the case here.

It would help to be able to reproduce it, but I haven't been able to
yet.

Tom

> Sebastian


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:17         ` Tom Zanussi
@ 2020-06-09 16:34           ` Sebastian Andrzej Siewior
  2020-06-09 16:37             ` Ramon Fried
  2020-06-09 16:57             ` Tom Zanussi
  0 siblings, 2 replies; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-06-09 16:34 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Ramon Fried, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On 2020-06-09 11:17:53 [-0500], Tom Zanussi wrote:
> Hi Sebastian,
Hi Tom,

> I did find a problem with the patch when configured as !SMP since in
> that case the RUN flag is never set (will send a patch for that
> shortly), but that wouldn't be the case here.

How?

| #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
| static inline int tasklet_trylock(struct tasklet_struct *t)
| {
|         return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
| }

I can't tell from the backtrace if he runs with RT or without but I
assumed RT. But yes, for !SMP && !RT it would explain it.

> It would help to be able to reproduce it, but I haven't been able to
> yet.
> 
> Tom
> 
Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:34           ` Sebastian Andrzej Siewior
@ 2020-06-09 16:37             ` Ramon Fried
  2020-06-09 16:40               ` Ramon Fried
  2020-06-09 16:57             ` Tom Zanussi
  1 sibling, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-09 16:37 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Tom Zanussi
  Cc: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Daniel Wagner, Clark Williams,
	Zhang Xiao



On June 9, 2020 7:34:46 PM GMT+03:00, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>On 2020-06-09 11:17:53 [-0500], Tom Zanussi wrote:
>> Hi Sebastian,
>Hi Tom,
>
>> I did find a problem with the patch when configured as !SMP since in
>> that case the RUN flag is never set (will send a patch for that
>> shortly), but that wouldn't be the case here.
>
>How?
>
>| #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
>| static inline int tasklet_trylock(struct tasklet_struct *t)
>| {
>|         return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
>| }
>
>I can't tell from the backtrace if he runs with RT or without but I
>assumed RT. But yes, for !SMP && !RT it would explain it.
PREEMT_FULL is enabled. 
I'm working on getting symbols for this trace, this is a crash kernel so everything is stripped naked. 
Thanks, Ramon
>
>> It would help to be able to reproduce it, but I haven't been able to
>> yet.
>> 
>> Tom
>> 
>Sebastian

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:37             ` Ramon Fried
@ 2020-06-09 16:40               ` Ramon Fried
  2020-06-09 16:42                 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-09 16:40 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior, Tom Zanussi
  Cc: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Daniel Wagner, Clark Williams,
	Zhang Xiao



On June 9, 2020 7:37:31 PM GMT+03:00, Ramon Fried <rfried.dev@gmail.com> wrote:
>
>
>On June 9, 2020 7:34:46 PM GMT+03:00, Sebastian Andrzej Siewior
><bigeasy@linutronix.de> wrote:
>>On 2020-06-09 11:17:53 [-0500], Tom Zanussi wrote:
>>> Hi Sebastian,
>>Hi Tom,
>>
>>> I did find a problem with the patch when configured as !SMP since in
>>> that case the RUN flag is never set (will send a patch for that
>>> shortly), but that wouldn't be the case here.
>>
>>How?
>>
>>| #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
>>| static inline int tasklet_trylock(struct tasklet_struct *t)
>>| {
>>|         return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
>>| }
>>
>>I can't tell from the backtrace if he runs with RT or without but I
>>assumed RT. But yes, for !SMP && !RT it would explain it.
>PREEMT_FULL is enabled. 
>I'm working on getting symbols for this trace, this is a crash kernel
>so everything is stripped naked. 
>Thanks, Ramon

Correction. normal kernel is running with RT enabled, crash kernel without. 
>>
>>> It would help to be able to reproduce it, but I haven't been able to
>>> yet.
>>> 
>>> Tom
>>> 
>>Sebastian

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:40               ` Ramon Fried
@ 2020-06-09 16:42                 ` Sebastian Andrzej Siewior
  2020-06-09 17:07                   ` Ramon Fried
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-06-09 16:42 UTC (permalink / raw)
  To: Ramon Fried
  Cc: Tom Zanussi, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On 2020-06-09 19:40:03 [+0300], Ramon Fried wrote:
> Correction. normal kernel is running with RT enabled, crash kernel without. 

no RT and no SMP in your crash kernel? So this information in your first
email would have saved me some time…

Sebastian

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:34           ` Sebastian Andrzej Siewior
  2020-06-09 16:37             ` Ramon Fried
@ 2020-06-09 16:57             ` Tom Zanussi
  1 sibling, 0 replies; 18+ messages in thread
From: Tom Zanussi @ 2020-06-09 16:57 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ramon Fried, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

Hi Sebastian,

On Tue, 2020-06-09 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2020-06-09 11:17:53 [-0500], Tom Zanussi wrote:
> > Hi Sebastian,
> 
> Hi Tom,
> 
> > I did find a problem with the patch when configured as !SMP since
> > in
> > that case the RUN flag is never set (will send a patch for that
> > shortly), but that wouldn't be the case here.
> 
> How?
> 

My test machine with !SMP and !RT doesn't boot, and in that case we
have:

#define tasklet_trylock(t) 1
#define tasklet_tryunlock(t)    1

instead of setting/clearing the RUN flag.

So the cmpxchg with RUN+CHAIN can never work and we hit the loop.

> > #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
> > static inline int tasklet_trylock(struct tasklet_struct *t)
> > {
> >         return !test_and_set_bit(TASKLET_STATE_RUN, &(t)->state);
> > }
> 
> I can't tell from the backtrace if he runs with RT or without but I
> assumed RT. But yes, for !SMP && !RT it would explain it.
> 

Yeah, for me !SMP and RT works, but !SMP and !RT doesn't.

I had assumed he was talking about the samely configured kernel, but
apparently it's not.

Tom

> > It would help to be able to reproduce it, but I haven't been able
> > to
> > yet.
> > 
> > Tom
> > 
> 
> Sebastian


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 16:42                 ` Sebastian Andrzej Siewior
@ 2020-06-09 17:07                   ` Ramon Fried
  2020-06-09 17:10                     ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-09 17:07 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tom Zanussi, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao



On June 9, 2020 7:42:13 PM GMT+03:00, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>On 2020-06-09 19:40:03 [+0300], Ramon Fried wrote:
>> Correction. normal kernel is running with RT enabled, crash kernel
>without. 
>
>no RT and no SMP in your crash kernel? So this information in your
>first
>email would have saved me some time…
Indeed
 I'm truly sorry, I thought our crash kernel is configured as RT as well. 
so, as I understand, if I build the RT kernel without preempt enabled I can hit this bug? 
>
>Sebastian

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 17:07                   ` Ramon Fried
@ 2020-06-09 17:10                     ` Sebastian Andrzej Siewior
  2020-06-09 17:14                       ` Ramon Fried
  0 siblings, 1 reply; 18+ messages in thread
From: Sebastian Andrzej Siewior @ 2020-06-09 17:10 UTC (permalink / raw)
  To: Ramon Fried
  Cc: Tom Zanussi, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On 2020-06-09 20:07:06 [+0300], Ramon Fried wrote:
> Indeed
>  I'm truly sorry, I thought our crash kernel is configured as RT as well. 
> so, as I understand, if I build the RT kernel without preempt enabled I can hit this bug? 

Don't worry, I should have been better with the details in the log.

So you should _always_ hit the warning/bug if you compile a kernel
without SMP and RT. If you enable one of these then everything should be
fine.

Sebastian


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 17:10                     ` Sebastian Andrzej Siewior
@ 2020-06-09 17:14                       ` Ramon Fried
  2020-06-09 17:20                         ` Tom Zanussi
  0 siblings, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-09 17:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Tom Zanussi, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao



On June 9, 2020 8:10:43 PM GMT+03:00, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>On 2020-06-09 20:07:06 [+0300], Ramon Fried wrote:
>> Indeed
>>  I'm truly sorry, I thought our crash kernel is configured as RT as
>well. 
>> so, as I understand, if I build the RT kernel without preempt enabled
>I can hit this bug? 
>
>Don't worry, I should have been better with the details in the log.
>
>So you should _always_ hit the warning/bug if you compile a kernel
>without SMP and RT. If you enable one of these then everything should
>be
>fine.
Would there be a fix for that? 
>
>Sebastian

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 17:14                       ` Ramon Fried
@ 2020-06-09 17:20                         ` Tom Zanussi
  2020-06-09 20:03                           ` Ramon Fried
  0 siblings, 1 reply; 18+ messages in thread
From: Tom Zanussi @ 2020-06-09 17:20 UTC (permalink / raw)
  To: Ramon Fried, Sebastian Andrzej Siewior
  Cc: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner,
	Carsten Emde, John Kacur, Daniel Wagner, Clark Williams,
	Zhang Xiao

Hi Ramon,

On Tue, 2020-06-09 at 20:14 +0300, Ramon Fried wrote:
> 
> On June 9, 2020 8:10:43 PM GMT+03:00, Sebastian Andrzej Siewior <
> bigeasy@linutronix.de> wrote:
> > On 2020-06-09 20:07:06 [+0300], Ramon Fried wrote:
> > > Indeed
> > >  I'm truly sorry, I thought our crash kernel is configured as RT
> > > as
> > 
> > well. 
> > > so, as I understand, if I build the RT kernel without preempt
> > > enabled
> > 
> > I can hit this bug? 
> > 
> > Don't worry, I should have been better with the details in the log.
> > 
> > So you should _always_ hit the warning/bug if you compile a kernel
> > without SMP and RT. If you enable one of these then everything
> > should
> > be
> > fine.
> 
> Would there be a fix for that? 

I haven't tested the fix yet, but can you try the below patch and see
if it fixes your broken case?

[PATCH] tasklet: Fix UP case for tasklet CHAINED state

commit 62d0a2a30cd0 (tasklet: Address a race resulting in
double-enqueue) addresses a problem that can result in a tasklet being
enqueued on two cpus at the same time by combining the RUN flag with a
new CHAINED flag, and relies on the combination to be present in order
to zero it out, which can never happen on (!SMP and !PREEMPT_RT_FULL)
because the RUN flag is SMP/PREEMPT_RT_FULL-only.

So make sure the above commit is only applied for the SMP ||
PREEMPT_RT_FULL case.

Signed-off-by: Tom Zanussi <zanussi@kernel.org>
---
 kernel/softirq.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 73dae64bfc9c..9bad7a16dc61 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -947,10 +947,12 @@ static void __tasklet_schedule_common(struct
tasklet_struct *t,
 	 * is locked before adding it to the list.
 	 */
 	if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 		if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state)) 
{
 			tasklet_unlock(t);
 			return;
 		}
+#endif
 		t->next = NULL;
 		*head->tail = t;
 		head->tail = &(t->next);
@@ -1044,7 +1046,11 @@ static void tasklet_action_common(struct
softirq_action *a,
 again:
 		t->func(t->data);
 
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 		while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) !=
TASKLET_STATEF_RC) {
+#else
+		while (!tasklet_tryunlock(t)) {
+#endif
 			/*
 			 * If it got disabled meanwhile, bail out:
 			 */
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 17:20                         ` Tom Zanussi
@ 2020-06-09 20:03                           ` Ramon Fried
  2020-06-09 20:09                             ` Tom Zanussi
  0 siblings, 1 reply; 18+ messages in thread
From: Ramon Fried @ 2020-06-09 20:03 UTC (permalink / raw)
  To: Tom Zanussi
  Cc: Sebastian Andrzej Siewior, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On Tue, Jun 9, 2020 at 8:20 PM Tom Zanussi <zanussi@kernel.org> wrote:
>
> Hi Ramon,
>
> On Tue, 2020-06-09 at 20:14 +0300, Ramon Fried wrote:
> >
> > On June 9, 2020 8:10:43 PM GMT+03:00, Sebastian Andrzej Siewior <
> > bigeasy@linutronix.de> wrote:
> > > On 2020-06-09 20:07:06 [+0300], Ramon Fried wrote:
> > > > Indeed
> > > >  I'm truly sorry, I thought our crash kernel is configured as RT
> > > > as
> > >
> > > well.
> > > > so, as I understand, if I build the RT kernel without preempt
> > > > enabled
> > >
> > > I can hit this bug?
> > >
> > > Don't worry, I should have been better with the details in the log.
> > >
> > > So you should _always_ hit the warning/bug if you compile a kernel
> > > without SMP and RT. If you enable one of these then everything
> > > should
> > > be
> > > fine.
> >
> > Would there be a fix for that?
>
> I haven't tested the fix yet, but can you try the below patch and see
> if it fixes your broken case?
>
> [PATCH] tasklet: Fix UP case for tasklet CHAINED state
>
> commit 62d0a2a30cd0 (tasklet: Address a race resulting in
> double-enqueue) addresses a problem that can result in a tasklet being
> enqueued on two cpus at the same time by combining the RUN flag with a
> new CHAINED flag, and relies on the combination to be present in order
> to zero it out, which can never happen on (!SMP and !PREEMPT_RT_FULL)
> because the RUN flag is SMP/PREEMPT_RT_FULL-only.
>
> So make sure the above commit is only applied for the SMP ||
> PREEMPT_RT_FULL case.
>
> Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> ---
>  kernel/softirq.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 73dae64bfc9c..9bad7a16dc61 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -947,10 +947,12 @@ static void __tasklet_schedule_common(struct
> tasklet_struct *t,
>          * is locked before adding it to the list.
>          */
>         if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
>                 if (test_and_set_bit(TASKLET_STATE_CHAINED, &t->state))
> {
>                         tasklet_unlock(t);
>                         return;
>                 }
> +#endif
>                 t->next = NULL;
>                 *head->tail = t;
>                 head->tail = &(t->next);
> @@ -1044,7 +1046,11 @@ static void tasklet_action_common(struct
> softirq_action *a,
>  again:
>                 t->func(t->data);
>
> +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
>                 while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) !=
> TASKLET_STATEF_RC) {
> +#else
> +               while (!tasklet_tryunlock(t)) {
> +#endif
>                         /*
>                          * If it got disabled meanwhile, bail out:
>                          */
> --
> 2.17.1
>
>
Tested-By: Ramon Fried <rfried.dev@gmail.com>

Working, thanks a lot.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue
  2020-06-09 20:03                           ` Ramon Fried
@ 2020-06-09 20:09                             ` Tom Zanussi
  0 siblings, 0 replies; 18+ messages in thread
From: Tom Zanussi @ 2020-06-09 20:09 UTC (permalink / raw)
  To: Ramon Fried
  Cc: Sebastian Andrzej Siewior, LKML, linux-rt-users, Steven Rostedt,
	Thomas Gleixner, Carsten Emde, John Kacur, Daniel Wagner,
	Clark Williams, Zhang Xiao

On Tue, 2020-06-09 at 23:03 +0300, Ramon Fried wrote:
> On Tue, Jun 9, 2020 at 8:20 PM Tom Zanussi <zanussi@kernel.org>
> wrote:
> > 
> > Hi Ramon,
> > 
> > On Tue, 2020-06-09 at 20:14 +0300, Ramon Fried wrote:
> > > 
> > > On June 9, 2020 8:10:43 PM GMT+03:00, Sebastian Andrzej Siewior <
> > > bigeasy@linutronix.de> wrote:
> > > > On 2020-06-09 20:07:06 [+0300], Ramon Fried wrote:
> > > > > Indeed
> > > > >  I'm truly sorry, I thought our crash kernel is configured as
> > > > > RT
> > > > > as
> > > > 
> > > > well.
> > > > > so, as I understand, if I build the RT kernel without preempt
> > > > > enabled
> > > > 
> > > > I can hit this bug?
> > > > 
> > > > Don't worry, I should have been better with the details in the
> > > > log.
> > > > 
> > > > So you should _always_ hit the warning/bug if you compile a
> > > > kernel
> > > > without SMP and RT. If you enable one of these then everything
> > > > should
> > > > be
> > > > fine.
> > > 
> > > Would there be a fix for that?
> > 
> > I haven't tested the fix yet, but can you try the below patch and
> > see
> > if it fixes your broken case?
> > 
> > [PATCH] tasklet: Fix UP case for tasklet CHAINED state
> > 
> > commit 62d0a2a30cd0 (tasklet: Address a race resulting in
> > double-enqueue) addresses a problem that can result in a tasklet
> > being
> > enqueued on two cpus at the same time by combining the RUN flag
> > with a
> > new CHAINED flag, and relies on the combination to be present in
> > order
> > to zero it out, which can never happen on (!SMP and
> > !PREEMPT_RT_FULL)
> > because the RUN flag is SMP/PREEMPT_RT_FULL-only.
> > 
> > So make sure the above commit is only applied for the SMP ||
> > PREEMPT_RT_FULL case.
> > 
> > Signed-off-by: Tom Zanussi <zanussi@kernel.org>
> > ---
> >  kernel/softirq.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 73dae64bfc9c..9bad7a16dc61 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -947,10 +947,12 @@ static void __tasklet_schedule_common(struct
> > tasklet_struct *t,
> >          * is locked before adding it to the list.
> >          */
> >         if (test_bit(TASKLET_STATE_SCHED, &t->state)) {
> > +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
> >                 if (test_and_set_bit(TASKLET_STATE_CHAINED, &t-
> > >state))
> > {
> >                         tasklet_unlock(t);
> >                         return;
> >                 }
> > +#endif
> >                 t->next = NULL;
> >                 *head->tail = t;
> >                 head->tail = &(t->next);
> > @@ -1044,7 +1046,11 @@ static void tasklet_action_common(struct
> > softirq_action *a,
> >  again:
> >                 t->func(t->data);
> > 
> > +#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
> >                 while (cmpxchg(&t->state, TASKLET_STATEF_RC, 0) !=
> > TASKLET_STATEF_RC) {
> > +#else
> > +               while (!tasklet_tryunlock(t)) {
> > +#endif
> >                         /*
> >                          * If it got disabled meanwhile, bail out:
> >                          */
> > --
> > 2.17.1
> > 
> > 
> 
> Tested-By: Ramon Fried <rfried.dev@gmail.com>
> 
> Working, thanks a lot.

OK, great, thanks for testing and reporting this.

Tom


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-06-09 20:09 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-23 20:54 [PATCH RT 0/2] Linux v4.19.115-rt49-rc1 zanussi
2020-04-23 20:54 ` [PATCH RT 1/2] tasklet: Address a race resulting in double-enqueue zanussi
2020-06-04 16:04   ` Ramon Fried
2020-06-04 20:51     ` Tom Zanussi
2020-06-09 15:47       ` Sebastian Andrzej Siewior
2020-06-09 16:17         ` Tom Zanussi
2020-06-09 16:34           ` Sebastian Andrzej Siewior
2020-06-09 16:37             ` Ramon Fried
2020-06-09 16:40               ` Ramon Fried
2020-06-09 16:42                 ` Sebastian Andrzej Siewior
2020-06-09 17:07                   ` Ramon Fried
2020-06-09 17:10                     ` Sebastian Andrzej Siewior
2020-06-09 17:14                       ` Ramon Fried
2020-06-09 17:20                         ` Tom Zanussi
2020-06-09 20:03                           ` Ramon Fried
2020-06-09 20:09                             ` Tom Zanussi
2020-06-09 16:57             ` Tom Zanussi
2020-04-23 20:54 ` [PATCH RT 2/2] Linux 4.19.115-rt49-rc1 zanussi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).