* [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
@ 2018-02-14 17:45 Elenita Hinds
2018-02-14 19:32 ` Kurt Van Dijck
2018-02-14 20:33 ` Marc Kleine-Budde
0 siblings, 2 replies; 8+ messages in thread
From: Elenita Hinds @ 2018-02-14 17:45 UTC (permalink / raw)
To: linux-can, dev.kurt, o.rempel
Hi all.
I was seeing the kernel panic when receiving J1939 packets
promiscuously over both can0 and can1
(with multi-frame messages). The panic is due to the following line in
kernel/softirq.c (line 518):
if (!test_and_clear_bit(TASKLET_STATE_SCHED,
&t->state))
BUG(); <=== panic here!!
The issue seems similar to a previous patch in can/bcm.c
(see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
wherein a race condition can occur while the tasklets are being or
have terminated and the soft irq is checking the flag at the same
time. I mimicked the bcm fix and the panic went away.
Below is a proposed fix. Please review.
Regards,
Elenita (Hinds)
----
diff --git net/can/j1939/transport.c net/can/j1939/transport.c
index c15412dae565..9114ae7cdf72 100644
--- net/can/j1939/transport.c
+++ net/can/j1939/transport.c
@@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
static inline void j1939_session_destroy(struct session *session)
{
kfree_skb(session->skb);
- hrtimer_cancel(&session->rxtimer);
- hrtimer_cancel(&session->txtimer);
- tasklet_disable(&session->rxtask);
- tasklet_disable(&session->txtask);
+
+ /* When terminating a session, a hrtimer or a tasklet might run.
+ * When the hrtimer triggers its associated tasklet and vice versa,
+ * we need to make sure they are not running or scheduled to run.
+ * Otherwise, a race condition can occur when the soft irq is
+ * handling the tasklet, resulting in kernel panic.
+ */
+ while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
+ test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
+ hrtimer_active(&session->rxtimer)) {
+ hrtimer_cancel(&session->rxtimer);
+ tasklet_disable(&session->rxtask);
+ }
+
+ while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
+ test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
+ hrtimer_active(&session->txtimer)) {
+ hrtimer_cancel(&session->txtimer);
+ tasklet_disable(&session->txtask);
+ }
+
kfree(session);
}
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
@ 2018-02-14 19:32 ` Kurt Van Dijck
2018-02-14 20:20 ` Oliver Hartkopp
2018-02-14 20:33 ` Marc Kleine-Budde
1 sibling, 1 reply; 8+ messages in thread
From: Kurt Van Dijck @ 2018-02-14 19:32 UTC (permalink / raw)
To: Elenita Hinds; +Cc: linux-can, o.rempel
On wo, 14 feb 2018 11:45:31 -0600, Elenita Hinds wrote:
> Hi all.
>
> I was seeing the kernel panic when receiving J1939 packets
> promiscuously over both can0 and can1
> (with multi-frame messages). The panic is due to the following line in
> kernel/softirq.c (line 518):
>
> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
> &t->state))
> BUG(); <=== panic here!!
>
> The issue seems similar to a previous patch in can/bcm.c
> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
> wherein a race condition can occur while the tasklets are being or
> have terminated and the soft irq is checking the flag at the same
> time. I mimicked the bcm fix and the panic went away.
>
> Below is a proposed fix. Please review.
>
>
> Regards,
> Elenita (Hinds)
>
Your patch looks good. Well found!
Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
> ----
>
> diff --git net/can/j1939/transport.c net/can/j1939/transport.c
> index c15412dae565..9114ae7cdf72 100644
> --- net/can/j1939/transport.c
> +++ net/can/j1939/transport.c
> @@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
> static inline void j1939_session_destroy(struct session *session)
> {
> kfree_skb(session->skb);
> - hrtimer_cancel(&session->rxtimer);
> - hrtimer_cancel(&session->txtimer);
> - tasklet_disable(&session->rxtask);
> - tasklet_disable(&session->txtask);
> +
> + /* When terminating a session, a hrtimer or a tasklet might run.
> + * When the hrtimer triggers its associated tasklet and vice versa,
> + * we need to make sure they are not running or scheduled to run.
> + * Otherwise, a race condition can occur when the soft irq is
> + * handling the tasklet, resulting in kernel panic.
> + */
> + while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
> + test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
> + hrtimer_active(&session->rxtimer)) {
> + hrtimer_cancel(&session->rxtimer);
> + tasklet_disable(&session->rxtask);
> + }
> +
> + while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
> + test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
> + hrtimer_active(&session->txtimer)) {
> + hrtimer_cancel(&session->txtimer);
> + tasklet_disable(&session->txtask);
> + }
> +
> kfree(session);
> }
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 19:32 ` Kurt Van Dijck
@ 2018-02-14 20:20 ` Oliver Hartkopp
0 siblings, 0 replies; 8+ messages in thread
From: Oliver Hartkopp @ 2018-02-14 20:20 UTC (permalink / raw)
To: Elenita Hinds, linux-can, o.rempel, Kurt Van Dijck
Hi all,
On 02/14/2018 08:32 PM, Kurt Van Dijck wrote:
> On wo, 14 feb 2018 11:45:31 -0600, Elenita Hinds wrote:
>> I was seeing the kernel panic when receiving J1939 packets
>> promiscuously over both can0 and can1
>> (with multi-frame messages). The panic is due to the following line in
>> kernel/softirq.c (line 518):
>>
>> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>> &t->state))
>> BUG(); <=== panic here!!
>>
>> The issue seems similar to a previous patch in can/bcm.c
>> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
Yeah - that was sticky ;-)
>> wherein a race condition can occur while the tasklets are being or
>> have terminated and the soft irq is checking the flag at the same
>> time. I mimicked the bcm fix and the panic went away.
>>
>> Below is a proposed fix. Please review.
>>
>>
>> Regards,
>> Elenita (Hinds)
>>
>
> Your patch looks good. Well found!
> Acked-by: Kurt Van Dijck <dev.kurt@vandijck-laurijssen.be>
Just to keep you up-to-date with latest hrtimer developments:
Anna-Maria Gleixner integrated a rework of hrtimers so that we can use
them in softirq context without the tasklet trampoline stuff.
The cleanup will also cause changes in net/can/bcm.c
https://lkml.org/lkml/2017/11/29/595
which finally removes all this stuff (and the fix above too!)
I assume the BCM changes will hit 4.17 as the 4.16 only contains the new
infrastructure without changing the hrtimer users.
Best regards,
Oliver
>
>> ----
>>
>> diff --git net/can/j1939/transport.c net/can/j1939/transport.c
>> index c15412dae565..9114ae7cdf72 100644
>> --- net/can/j1939/transport.c
>> +++ net/can/j1939/transport.c
>> @@ -121,10 +121,27 @@ static inline struct list_head *sessionq(bool extd)
>> static inline void j1939_session_destroy(struct session *session)
>> {
>> kfree_skb(session->skb);
>> - hrtimer_cancel(&session->rxtimer);
>> - hrtimer_cancel(&session->txtimer);
>> - tasklet_disable(&session->rxtask);
>> - tasklet_disable(&session->txtask);
>> +
>> + /* When terminating a session, a hrtimer or a tasklet might run.
>> + * When the hrtimer triggers its associated tasklet and vice versa,
>> + * we need to make sure they are not running or scheduled to run.
>> + * Otherwise, a race condition can occur when the soft irq is
>> + * handling the tasklet, resulting in kernel panic.
>> + */
>> + while (test_bit(TASKLET_STATE_SCHED, &session->rxtask.state) ||
>> + test_bit(TASKLET_STATE_RUN, &session->rxtask.state) ||
>> + hrtimer_active(&session->rxtimer)) {
>> + hrtimer_cancel(&session->rxtimer);
>> + tasklet_disable(&session->rxtask);
>> + }
>> +
>> + while (test_bit(TASKLET_STATE_SCHED, &session->txtask.state) ||
>> + test_bit(TASKLET_STATE_RUN, &session->txtask.state) ||
>> + hrtimer_active(&session->txtimer)) {
>> + hrtimer_cancel(&session->txtimer);
>> + tasklet_disable(&session->txtask);
>> + }
>> +
>> kfree(session);
>> }
> --
> To unsubscribe from this list: send the line "unsubscribe linux-can" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
2018-02-14 19:32 ` Kurt Van Dijck
@ 2018-02-14 20:33 ` Marc Kleine-Budde
2018-02-14 21:20 ` Elenita Hinds
1 sibling, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 20:33 UTC (permalink / raw)
To: Elenita Hinds, linux-can, dev.kurt, o.rempel
[-- Attachment #1.1: Type: text/plain, Size: 1136 bytes --]
On 02/14/2018 06:45 PM, Elenita Hinds wrote:
> Hi all.
>
> I was seeing the kernel panic when receiving J1939 packets
> promiscuously over both can0 and can1
> (with multi-frame messages). The panic is due to the following line in
> kernel/softirq.c (line 518):
>
> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
> &t->state))
> BUG(); <=== panic here!!
>
> The issue seems similar to a previous patch in can/bcm.c
> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
> wherein a race condition can occur while the tasklets are being or
> have terminated and the soft irq is checking the flag at the same
> time. I mimicked the bcm fix and the panic went away.
>
> Below is a proposed fix. Please review.
Can I have your S-o-b?
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 20:33 ` Marc Kleine-Budde
@ 2018-02-14 21:20 ` Elenita Hinds
2018-02-14 21:20 ` Marc Kleine-Budde
2018-02-15 8:45 ` Marc Kleine-Budde
0 siblings, 2 replies; 8+ messages in thread
From: Elenita Hinds @ 2018-02-14 21:20 UTC (permalink / raw)
To: Marc Kleine-Budde; +Cc: linux-can, dev.kurt, o.rempel
Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
Marc: I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:
Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
On Wed, Feb 14, 2018 at 2:33 PM, Marc Kleine-Budde <mkl@pengutronix.de> wrote:
> On 02/14/2018 06:45 PM, Elenita Hinds wrote:
>> Hi all.
>>
>> I was seeing the kernel panic when receiving J1939 packets
>> promiscuously over both can0 and can1
>> (with multi-frame messages). The panic is due to the following line in
>> kernel/softirq.c (line 518):
>>
>> if (!test_and_clear_bit(TASKLET_STATE_SCHED,
>> &t->state))
>> BUG(); <=== panic here!!
>>
>> The issue seems similar to a previous patch in can/bcm.c
>> (see commit https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/commit/?id=a06393ed03167771246c4c43192d9c264bc48412),
>> wherein a race condition can occur while the tasklets are being or
>> have terminated and the soft irq is checking the flag at the same
>> time. I mimicked the bcm fix and the panic went away.
>>
>> Below is a proposed fix. Please review.
>
> Can I have your S-o-b?
>
> Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Industrial Linux Solutions | Phone: +49-231-2826-924 |
> Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
> Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 21:20 ` Elenita Hinds
@ 2018-02-14 21:20 ` Marc Kleine-Budde
2018-02-14 21:23 ` Marc Kleine-Budde
2018-02-15 8:45 ` Marc Kleine-Budde
1 sibling, 1 reply; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 21:20 UTC (permalink / raw)
To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel
[-- Attachment #1.1: Type: text/plain, Size: 572 bytes --]
On 02/14/2018 10:20 PM, Elenita Hinds wrote:
> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
>
> Marc: I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:
Sure.
> Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
Thanks,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 21:20 ` Marc Kleine-Budde
@ 2018-02-14 21:23 ` Marc Kleine-Budde
0 siblings, 0 replies; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-14 21:23 UTC (permalink / raw)
To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel
[-- Attachment #1.1: Type: text/plain, Size: 676 bytes --]
On 02/14/2018 10:20 PM, Marc Kleine-Budde wrote:
> On 02/14/2018 10:20 PM, Elenita Hinds wrote:
>> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
>>
>> Marc: I'm sorry ... forgot to sign the patch. Hope re-attaching below is ok:
>
> Sure.
>
>> Signed-off-by: Elenita Hinds <ecathinds@gmail.com>
Added to the j1939 and j1939-v4.15-rc9 branches.
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed
2018-02-14 21:20 ` Elenita Hinds
2018-02-14 21:20 ` Marc Kleine-Budde
@ 2018-02-15 8:45 ` Marc Kleine-Budde
1 sibling, 0 replies; 8+ messages in thread
From: Marc Kleine-Budde @ 2018-02-15 8:45 UTC (permalink / raw)
To: Elenita Hinds; +Cc: linux-can, dev.kurt, o.rempel
[-- Attachment #1.1: Type: text/plain, Size: 526 bytes --]
On 02/14/2018 10:20 PM, Elenita Hinds wrote:
> Thanks, Kurt and Oliver, for the feedback. The newest bcm changes look better.
Do you or anyone else care porting the new hrtimer-softirq approach to
j1939? Patches are welcome!
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Industrial Linux Solutions | Phone: +49-231-2826-924 |
Vertretung West/Dortmund | Fax: +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686 | http://www.pengutronix.de |
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2018-02-15 8:46 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-14 17:45 [PATCH] j1939: Fix to softirq tasklet race condition when session is destroyed Elenita Hinds
2018-02-14 19:32 ` Kurt Van Dijck
2018-02-14 20:20 ` Oliver Hartkopp
2018-02-14 20:33 ` Marc Kleine-Budde
2018-02-14 21:20 ` Elenita Hinds
2018-02-14 21:20 ` Marc Kleine-Budde
2018-02-14 21:23 ` Marc Kleine-Budde
2018-02-15 8:45 ` Marc Kleine-Budde
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.