All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
@ 2018-02-14 17:45 Elenita Hinds
  2018-02-14 19:32 ` Kurt Van Dijck
  2018-02-14 20:33 ` Marc Kleine-Budde
  0 siblings, 2 replies; 8+ messages in thread
From: Elenita Hinds @ 2018-02-14 17:45 UTC (permalink / raw)
  To: linux-can, dev.kurt, o.rempel

Hi all.

I was seeing the kernel panic when receiving J1939 packets
promiscuously over both can0 and can1
(with multi-frame messages). The panic is due to the following line in
kernel/softirq.c (line 518):

if (!test_and_clear_bit(TASKLET_STATE_SCHED,
        &t->state))
    BUG();  <=== panic here!!

The issue seems similar to a previous patch in can/bcm.c
(see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
wherein a race condition can occur while the tasklets are being or
have terminated and the soft irq is checking the flag at the same
time. I mimicked the bcm fix and the panic went away.

Below is a proposed fix. Please review.


Regards,
Elenita (Hinds)

----

diff --git net/can/j1939/transport.c net/can/j1939/transport.c
index c15412dae565..9114ae7cdf72 100644
--- net/can/j1939/transport.c
+++ net/can/j1939/transport.c
@@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
 static inline void j1939_session_destroy(struct session *session)
 {
     kfree_skb(session->skb);
-    hrtimer_cancel(&session->rxtimer);
-    hrtimer_cancel(&session->txtimer);
-    tasklet_disable(&session->rxtask);
-    tasklet_disable(&session->txtask);
+
+        /* When terminating a session, a hrtimer or a tasklet might run.
+         * When the hrtimer triggers its associated tasklet and vice versa,
+         * we need to make sure they are not running or scheduled to run.
+         * Otherwise, a race condition can occur when the soft irq is
+         * handling the tasklet, resulting in kernel panic.
+         */
+    while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
+        test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
+        hrtimer_active(&session->rxtimer)) {
+        hrtimer_cancel(&session->rxtimer);
+        tasklet_disable(&session->rxtask);
+    }
+
+    while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
+        test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
+        hrtimer_active(&session->txtimer)) {
+        hrtimer_cancel(&session->txtimer);
+        tasklet_disable(&session->txtask);
+    }
+
     kfree(session);
 }

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
@ 2018-02-14 19:32 ` Kurt Van Dijck
  2018-02-14 20:20   ` Oliver Hartkopp
  2018-02-14 20:33 ` Marc Kleine-Budde
  1 sibling, 1 reply; 8+ messages in thread
From: Kurt Van Dijck @ 2018-02-14 19:32 UTC (permalink / raw)
  To: Elenita Hinds; +Cc: linux-can, o.rempel

On wo, 14 feb 2018 11:45:31 -0600, Elenita Hinds wrote:
> Hi all.
> 
> I was seeing the kernel panic when receiving J1939 packets
> promiscuously over both can0 and can1
> (with multi-frame messages). The panic is due to the following line in
> kernel/softirq.c (line 518):
> 
> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>         &t->state))
>     BUG();  <=== panic here!!
> 
> The issue seems similar to a previous patch in can/bcm.c
> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
> wherein a race condition can occur while the tasklets are being or
> have terminated and the soft irq is checking the flag at the same
> time. I mimicked the bcm fix and the panic went away.
> 
> Below is a proposed fix. Please review.
> 
> 
> Regards,
> Elenita (Hinds)
> 

Your patch looks good. Well found!
Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>

> ----
> 
> diff --git net/can/j1939/transport.c net/can/j1939/transport.c
> index c15412dae565..9114ae7cdf72 100644
> --- net/can/j1939/transport.c
> +++ net/can/j1939/transport.c
> @@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
>  static inline void j1939_session_destroy(struct session *session)
>  {
>      kfree_skb(session->skb);
> -    hrtimer_cancel(&session->rxtimer);
> -    hrtimer_cancel(&session->txtimer);
> -    tasklet_disable(&session->rxtask);
> -    tasklet_disable(&session->txtask);
> +
> +        /* When terminating a session, a hrtimer or a tasklet might run.
> +         * When the hrtimer triggers its associated tasklet and vice versa,
> +         * we need to make sure they are not running or scheduled to run.
> +         * Otherwise, a race condition can occur when the soft irq is
> +         * handling the tasklet, resulting in kernel panic.
> +         */
> +    while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
> +        test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
> +        hrtimer_active(&session->rxtimer)) {
> +        hrtimer_cancel(&session->rxtimer);
> +        tasklet_disable(&session->rxtask);
> +    }
> +
> +    while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
> +        test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
> +        hrtimer_active(&session->txtimer)) {
> +        hrtimer_cancel(&session->txtimer);
> +        tasklet_disable(&session->txtask);
> +    }
> +
>      kfree(session);
>  }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 19:32 ` Kurt Van Dijck
@ 2018-02-14 20:20   ` Oliver Hartkopp
  0 siblings, 0 replies; 8+ messages in thread
From: Oliver Hartkopp @ 2018-02-14 20:20 UTC (permalink / raw)
  To: Elenita Hinds, linux-can, o.rempel, Kurt Van Dijck

Hi all,

On 02/14/2018 08:32 PM, Kurt Van Dijck wrote:
> On wo, 14 feb 2018 11:45:31 -0600, Elenita Hinds wrote:

>> I was seeing the kernel panic when receiving J1939 packets
>> promiscuously over both can0 and can1
>> (with multi-frame messages). The panic is due to the following line in
>> kernel/softirq.c (line 518):
>>
>> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>>          &t->state))
>>      BUG();  <=== panic here!!
>>
>> The issue seems similar to a previous patch in can/bcm.c
>> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),

Yeah - that was sticky ;-)

>> wherein a race condition can occur while the tasklets are being or
>> have terminated and the soft irq is checking the flag at the same
>> time. I mimicked the bcm fix and the panic went away.
>>
>> Below is a proposed fix. Please review.
>>
>>
>> Regards,
>> Elenita (Hinds)
>>
> 
> Your patch looks good. Well found!
> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>

Just to keep you up-to-date with latest hrtimer developments:

Anna-Maria Gleixner integrated a rework of hrtimers so that we can use 
them in softirq context without the tasklet trampoline stuff.

The cleanup will also cause changes in net/can/bcm.c
https://lkml.org/lkml/2017/11/29/595

which finally removes all this stuff (and the fix above too!)

I assume the BCM changes will hit 4.17 as the 4.16 only contains the new 
infrastructure without changing the hrtimer users.

Best regards,
Oliver


> 
>> ----
>>
>> diff --git net/can/j1939/transport.c net/can/j1939/transport.c
>> index c15412dae565..9114ae7cdf72 100644
>> --- net/can/j1939/transport.c
>> +++ net/can/j1939/transport.c
>> @@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
>>   static inline void j1939_session_destroy(struct session *session)
>>   {
>>       kfree_skb(session->skb);
>> -    hrtimer_cancel(&session->rxtimer);
>> -    hrtimer_cancel(&session->txtimer);
>> -    tasklet_disable(&session->rxtask);
>> -    tasklet_disable(&session->txtask);
>> +
>> +        /* When terminating a session, a hrtimer or a tasklet might run.
>> +         * When the hrtimer triggers its associated tasklet and vice versa,
>> +         * we need to make sure they are not running or scheduled to run.
>> +         * Otherwise, a race condition can occur when the soft irq is
>> +         * handling the tasklet, resulting in kernel panic.
>> +         */
>> +    while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
>> +        test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
>> +        hrtimer_active(&session->rxtimer)) {
>> +        hrtimer_cancel(&session->rxtimer);
>> +        tasklet_disable(&session->rxtask);
>> +    }
>> +
>> +    while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
>> +        test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
>> +        hrtimer_active(&session->txtimer)) {
>> +        hrtimer_cancel(&session->txtimer);
>> +        tasklet_disable(&session->txtask);
>> +    }
>> +
>>       kfree(session);
>>   }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
  2018-02-14 19:32 ` Kurt Van Dijck
@ 2018-02-14 20:33 ` Marc Kleine-Budde
  2018-02-14 21:20   ` Elenita Hinds
  1 sibling, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 20:33 UTC (permalink / raw)
  To: Elenita Hinds, linux-can, dev.kurt, o.rempel


[-- Attachment #1.1: Type: text/plain, Size: 1136 bytes --]

On 02/14/2018 06:45 PM, Elenita Hinds wrote:
> Hi all.
> 
> I was seeing the kernel panic when receiving J1939 packets
> promiscuously over both can0 and can1
> (with multi-frame messages). The panic is due to the following line in
> kernel/softirq.c (line 518):
> 
> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>         &t->state))
>     BUG();  <=== panic here!!
> 
> The issue seems similar to a previous patch in can/bcm.c
> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
> wherein a race condition can occur while the tasklets are being or
> have terminated and the soft irq is checking the flag at the same
> time. I mimicked the bcm fix and the panic went away.
> 
> Below is a proposed fix. Please review.

Can I have your S-o-b?

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 20:33 ` Marc Kleine-Budde
@ 2018-02-14 21:20   ` Elenita Hinds
  2018-02-14 21:20     ` Marc Kleine-Budde
  2018-02-15  8:45     ` Marc Kleine-Budde
  0 siblings, 2 replies; 8+ messages in thread
From: Elenita Hinds @ 2018-02-14 21:20 UTC (permalink / raw)
  To: Marc Kleine-Budde; +Cc: linux-can, dev.kurt, o.rempel

Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.

Marc:  I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:

Signed-off-by: Elenita Hinds <ecathinds@gmail.com>


On Wed, Feb 14, 2018 at 2:33 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 02/14/2018 06:45 PM, Elenita Hinds wrote:
>> Hi all.
>>
>> I was seeing the kernel panic when receiving J1939 packets
>> promiscuously over both can0 and can1
>> (with multi-frame messages). The panic is due to the following line in
>> kernel/softirq.c (line 518):
>>
>> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>>         &t->state))
>>     BUG();  <=== panic here!!
>>
>> The issue seems similar to a previous patch in can/bcm.c
>> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
>> wherein a race condition can occur while the tasklets are being or
>> have terminated and the soft irq is checking the flag at the same
>> time. I mimicked the bcm fix and the panic went away.
>>
>> Below is a proposed fix. Please review.
>
> Can I have your S-o-b?
>
> Marc
>
> --
> Pengutronix e.K.                  | Marc Kleine-Budde           |
> Industrial Linux Solutions        | Phone: +49-231-2826-924     |
> Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 21:20   ` Elenita Hinds
@ 2018-02-14 21:20     ` Marc Kleine-Budde
  2018-02-14 21:23       ` Marc Kleine-Budde
  2018-02-15  8:45     ` Marc Kleine-Budde
  1 sibling, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 21:20 UTC (permalink / raw)
  To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel


[-- Attachment #1.1: Type: text/plain, Size: 572 bytes --]

On 02/14/2018 10:20 PM, Elenita Hinds wrote:
> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
> 
> Marc:  I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:

Sure.

> Signed-off-by: Elenita Hinds <ecathinds@gmail.com>

Thanks,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 21:20     ` Marc Kleine-Budde
@ 2018-02-14 21:23       ` Marc Kleine-Budde
  0 siblings, 0 replies; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 21:23 UTC (permalink / raw)
  To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel


[-- Attachment #1.1: Type: text/plain, Size: 676 bytes --]

On 02/14/2018 10:20 PM, Marc Kleine-Budde wrote:
> On 02/14/2018 10:20 PM, Elenita Hinds wrote:
>> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
>>
>> Marc:  I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:
> 
> Sure.
> 
>> Signed-off-by: Elenita Hinds <ecathinds@gmail.com>

Added to the j1939 and j1939-v4.15-rc9 branches.

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
  2018-02-14 21:20   ` Elenita Hinds
  2018-02-14 21:20     ` Marc Kleine-Budde
@ 2018-02-15  8:45     ` Marc Kleine-Budde
  1 sibling, 0 replies; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-15  8:45 UTC (permalink / raw)
  To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel


[-- Attachment #1.1: Type: text/plain, Size: 526 bytes --]

On 02/14/2018 10:20 PM, Elenita Hinds wrote:
> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.

Do you or anyone else care porting the new hrtimer-softirq approach to
j1939? Patches are welcome!

regards,
Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-02-15  8:46 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
2018-02-14 19:32 ` Kurt Van Dijck
2018-02-14 20:20   ` Oliver Hartkopp
2018-02-14 20:33 ` Marc Kleine-Budde
2018-02-14 21:20   ` Elenita Hinds
2018-02-14 21:20     ` Marc Kleine-Budde
2018-02-14 21:23       ` Marc Kleine-Budde
2018-02-15  8:45     ` Marc Kleine-Budde

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.