* [PATCH] xnthread_relax: Make sure wakework irq work has a stack
@ 2022-04-05 11:40 Richard Weinberger
  2022-04-05 13:01 ` Bezdeka, Florian
  2022-04-05 16:59 ` Philippe Gerum
  0 siblings, 2 replies; 10+ messages in thread
From: Richard Weinberger @ 2022-04-05 11:40 UTC (permalink / raw)
  To: xenomai; +Cc: Richard Weinberger, Florian Bezdeka, Jan Kiszka

While testing some workload, the following KASAN splat arose.
irq_work_single+0x70/0x80 is the last line of irq_work_single():
(void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);

So, the write to &work->node.a_flags failed. The atomic_read() and
atomic_set() right before work->func(work) didn't trigger KASAN,
which means the memory location behind &work vanished while the work
function was running.
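
For context, irq_work_single() in v5.15 looks roughly like this
(simplified from memory; barriers and lockdep annotations dropped,
the final cmpxchg is the faulting access):

void irq_work_single(void *arg)
{
	struct irq_work *work = arg;
	int flags;

	/* Clear PENDING; from here on the work can be re-posted. */
	flags = atomic_read(&work->node.a_flags);
	flags &= ~IRQ_WORK_PENDING;
	atomic_set(&work->node.a_flags, flags);

	work->func(work);

	/* Clear BUSY; note this still dereferences *work after func() ran. */
	(void)atomic_cmpxchg(&work->node.a_flags, flags,
			     flags & ~IRQ_WORK_BUSY);
}

If work->func() ends up allowing the poster to proceed (here: waking
the relaxing thread), nothing prevents the poster's stack frame from
being gone by the time the cmpxchg runs.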

After taking a brief look at the irq work users in Xenomai, I found
that xnthread_relax() posts irq work via
pipeline_post_inband_work(&wakework), where wakework is on the stack.
To the best of my knowledge, it is not guaranteed that
xnthread_relax()'s stack frame will stay around until the irq work
has finished.
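
From memory, the pattern in question looks roughly like this (heavily
elided, names as in the dovetail pipeline code):

void xnthread_relax(int notify, int reason)
{
	struct task_struct *p = current;
	struct lostage_wakeup wakework = {
		.inband_work = PIPELINE_INBAND_WORK_INITIALIZER(wakework,
					lostage_task_wakeup),
		.task = p,
	};
	...
	pipeline_post_inband_work(&wakework);	/* wakes us up in-band */
	xnthread_suspend(thread, suspension, XN_INFINITE, XN_RELATIVE, NULL);
	splnone();
	/* once we return, the stack slot of wakework is up for reuse */
}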

This can explain the KASAN splat. xnthread_relax() returned while
the irq work was still busy.

To fix the problem, add a new helper, pipeline_sync_inband_work(),
which synchronizes against the posted irq work and ensures that the
work is done before xnthread_relax() returns.

ipipe, on the other hand, does not suffer from this problem because
it has its own irq work mechanism which takes a copy of the work
struct before processing it asynchronously.

BUG: KASAN: stack-out-of-bounds in irq_work_single+0x70/0x80
Write of size 4 at addr ffff88802f067d48 by task cobalt_printf/1421
CPU: 1 PID: 1421 Comm: cobalt_printf Tainted: G           OE     5.15.9xeno3.2-x8664G-rw #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
IRQ stage: Linux
Call Trace:
 <IRQ>
 dump_stack_lvl+0x6e/0x97
 print_address_description.constprop.11.cold.17+0x12/0x32a
 ? irq_work_single+0x70/0x80
 ? irq_work_single+0x70/0x80
 kasan_report.cold.18+0x83/0xdf
 ? irq_work_single+0x70/0x80
 kasan_check_range+0x1c1/0x1e0
 irq_work_single+0x70/0x80
 irq_work_run_list+0x4a/0x60
 irq_work_run+0x14/0x30
 inband_work_interrupt+0xa/0x10
 handle_synthetic_irq+0xbb/0x200
 arch_do_IRQ_pipelined+0xab/0x550
 </IRQ>
 <TASK>
 sync_current_irq_stage+0x28a/0x3d0
 __inband_irq_enable+0x62/0x70
 finish_task_switch+0x14f/0x390
 ? __switch_to+0x31e/0x710
 ? finalize_oob_transition+0x24/0xc0
 __schedule+0x7b4/0xfd0
 ? pci_mmcfg_check_reserved+0xc0/0xc0
 ? hrtimer_run_softirq+0x100/0x100
 ? __debug_object_init+0x327/0x6b0
 schedule+0xbf/0x120
 do_nanosleep+0x166/0x2d0
 ? schedule_timeout_idle+0x30/0x30
 ? __hrtimer_init+0xbb/0xf0
 hrtimer_nanosleep+0x110/0x230
 ? nanosleep_copyout+0x70/0x70
 ? hrtimer_init_sleeper_on_stack+0x60/0x60
 __x64_sys_nanosleep+0x10d/0x160
 ? __ia32_sys_nanosleep_time32+0x160/0x160
 do_syscall_64+0x4c/0xa0
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff22aa620b0
Code: 3d 00 f0 ff ff 77 43 c3 66 90 55 48 89 f5 53 48 89 fb 48 83 ec 18 e8 af f5 ff ff 48 89 ee 48 89 df 89 c2 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 89 d7 89 44 24 0c e8 ed f5 ff ff 8b 44 24
RSP: 002b:00007ff21327ce40 EFLAGS: 00000293 ORIG_RAX: 0000000000000023
RAX: ffffffffffffffda RBX: 00007ff22b097ff0 RCX: 00007ff22aa620b0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ff22b097ff0
RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff21327d700
R10: 00007ff21327d9d0 R11: 0000000000000293 R12: 00007ffe1dc7af9e
R13: 00007ffe1dc7af9f R14: 00007ffe1dc7b020 R15: 00007ff21327cf40
 </TASK>

Cc: "Florian Bezdeka" <florian.bezdeka@siemens.com>
Cc: "Jan Kiszka" <jan.kiszka@siemens.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
---
 include/cobalt/kernel/dovetail/pipeline/inband_work.h | 3 +++
 kernel/cobalt/thread.c                                | 9 +++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/cobalt/kernel/dovetail/pipeline/inband_work.h b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
index af3d70fc6..826785e43 100644
--- a/include/cobalt/kernel/dovetail/pipeline/inband_work.h
+++ b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
@@ -25,4 +25,7 @@ struct pipeline_inband_work {
 #define pipeline_post_inband_work(__work)				\
 			irq_work_queue(&(__work)->inband_work.work)
 
+#define pipeline_sync_inband_work(__work)				\
+			irq_work_sync(&(__work)->inband_work.work)
+
 #endif /* !_COBALT_KERNEL_DOVETAIL_INBAND_WORK_H */
diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index ff12f288a..beda67e18 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -2087,6 +2087,15 @@ void xnthread_relax(int notify, int reason)
 	xnthread_suspend(thread, suspension, XN_INFINITE, XN_RELATIVE, NULL);
 	splnone();
 
+#ifdef CONFIG_DOVETAIL
+	/*
+	 * Make sure wakework has finished before we continue and our
+	 * stack vanishes.
+	 * On ipipe this is not a problem because ipipe copies the work.
+	 */
+	pipeline_sync_inband_work(&wakework);
+#endif
+
 	/*
 	 * Basic sanity check after an expected transition to secondary
 	 * mode.
-- 
2.26.2




* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 11:40 [PATCH] xnthread_relax: Make sure wakework irq work has a stack Richard Weinberger
@ 2022-04-05 13:01 ` Bezdeka, Florian
  2022-04-05 13:10   ` Richard Weinberger
  2022-04-05 16:59 ` Philippe Gerum
  1 sibling, 1 reply; 10+ messages in thread
From: Bezdeka, Florian @ 2022-04-05 13:01 UTC (permalink / raw)
  To: jan.kiszka, richard; +Cc: xenomai

On Tue, 2022-04-05 at 13:40 +0200, Richard Weinberger wrote:
> While testing some workload, the following KASAN splat arose.
> irq_work_single+0x70/0x80 is the last line of irq_work_single():
> (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
> 
> So, the write to &work->node.a_flags failed. The atomic_read() and
> atomic_set() right before work->func(work) didn't trigger KASAN,
> which means the memory location behind &work vanished while the work
> function was running.
> 
> After taking a brief look at the irq work users in Xenomai, I found
> that xnthread_relax() posts irq work via
> pipeline_post_inband_work(&wakework), where wakework is on the stack.
> To the best of my knowledge, it is not guaranteed that
> xnthread_relax()'s stack frame will stay around until the irq work
> has finished.

I had a look at the git history again and identified
https://source.denx.de/Xenomai/xenomai/-/commit/6cacd089663d187d6f0ab1203b8def11148fa89b
as a possible "duplicate".

Jan, can you remember this one?

> 
> This can explain the KASAN splat. xnthread_relax() returned while
> the irq work was still busy.
> 
> To fix the problem, add a new helper, pipeline_sync_inband_work(),
> which synchronizes against the posted irq work and ensures that the
> work is done before xnthread_relax() returns.
> 
> ipipe, on the other hand, does not suffer from this problem because
> it has its own irq work mechanism which takes a copy of the work
> struct before processing it asynchronously.
> 
> BUG: KASAN: stack-out-of-bounds in irq_work_single+0x70/0x80
> Write of size 4 at addr ffff88802f067d48 by task cobalt_printf/1421
> CPU: 1 PID: 1421 Comm: cobalt_printf Tainted: G           OE     5.15.9xeno3.2-x8664G-rw #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> IRQ stage: Linux
> Call Trace:
>  <IRQ>
>  dump_stack_lvl+0x6e/0x97
>  print_address_description.constprop.11.cold.17+0x12/0x32a
>  ? irq_work_single+0x70/0x80
>  ? irq_work_single+0x70/0x80
>  kasan_report.cold.18+0x83/0xdf
>  ? irq_work_single+0x70/0x80
>  kasan_check_range+0x1c1/0x1e0
>  irq_work_single+0x70/0x80
>  irq_work_run_list+0x4a/0x60
>  irq_work_run+0x14/0x30
>  inband_work_interrupt+0xa/0x10
>  handle_synthetic_irq+0xbb/0x200
>  arch_do_IRQ_pipelined+0xab/0x550
>  </IRQ>
>  <TASK>
>  sync_current_irq_stage+0x28a/0x3d0
>  __inband_irq_enable+0x62/0x70
>  finish_task_switch+0x14f/0x390
>  ? __switch_to+0x31e/0x710
>  ? finalize_oob_transition+0x24/0xc0
>  __schedule+0x7b4/0xfd0
>  ? pci_mmcfg_check_reserved+0xc0/0xc0
>  ? hrtimer_run_softirq+0x100/0x100
>  ? __debug_object_init+0x327/0x6b0
>  schedule+0xbf/0x120
>  do_nanosleep+0x166/0x2d0
>  ? schedule_timeout_idle+0x30/0x30
>  ? __hrtimer_init+0xbb/0xf0
>  hrtimer_nanosleep+0x110/0x230
>  ? nanosleep_copyout+0x70/0x70
>  ? hrtimer_init_sleeper_on_stack+0x60/0x60
>  __x64_sys_nanosleep+0x10d/0x160
>  ? __ia32_sys_nanosleep_time32+0x160/0x160
>  do_syscall_64+0x4c/0xa0
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7ff22aa620b0
> Code: 3d 00 f0 ff ff 77 43 c3 66 90 55 48 89 f5 53 48 89 fb 48 83 ec 18 e8 af f5 ff ff 48 89 ee 48 89 df 89 c2 b8 23 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 89 d7 89 44 24 0c e8 ed f5 ff ff 8b 44 24
> RSP: 002b:00007ff21327ce40 EFLAGS: 00000293 ORIG_RAX: 0000000000000023
> RAX: ffffffffffffffda RBX: 00007ff22b097ff0 RCX: 00007ff22aa620b0
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ff22b097ff0
> RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ff21327d700
> R10: 00007ff21327d9d0 R11: 0000000000000293 R12: 00007ffe1dc7af9e
> R13: 00007ffe1dc7af9f R14: 00007ffe1dc7b020 R15: 00007ff21327cf40
>  </TASK>
> 
> Cc: "Florian Bezdeka" <florian.bezdeka@siemens.com>
> Cc: "Jan Kiszka" <jan.kiszka@siemens.com>
> Signed-off-by: Richard Weinberger <richard@nod.at>
> ---
>  include/cobalt/kernel/dovetail/pipeline/inband_work.h | 3 +++
>  kernel/cobalt/thread.c                                | 9 +++++++++
>  2 files changed, 12 insertions(+)
> 
> diff --git a/include/cobalt/kernel/dovetail/pipeline/inband_work.h b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> index af3d70fc6..826785e43 100644
> --- a/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> +++ b/include/cobalt/kernel/dovetail/pipeline/inband_work.h
> @@ -25,4 +25,7 @@ struct pipeline_inband_work {
>  #define pipeline_post_inband_work(__work)				\
>  			irq_work_queue(&(__work)->inband_work.work)
>  
> +#define pipeline_sync_inband_work(__work)				\
> +			irq_work_sync(&(__work)->inband_work.work)
> +
>  #endif /* !_COBALT_KERNEL_DOVETAIL_INBAND_WORK_H */
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index ff12f288a..beda67e18 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -2087,6 +2087,15 @@ void xnthread_relax(int notify, int reason)
>  	xnthread_suspend(thread, suspension, XN_INFINITE, XN_RELATIVE, NULL);
>  	splnone();
>  
> +#ifdef CONFIG_DOVETAIL
> +	/*
> +	 * Make sure wakework has finished before we continue and our
> +	 * stack vanishes.
> +	 * On ipipe this is not a problem because ipipe copies the work.
> +	 */
> +	pipeline_sync_inband_work(&wakework);
> +#endif
> +

I'm not sure if waiting is really what we want. I like the idea of
moving the work into struct xnthread as Jan already suggested
internally.

The ifdef is only necessary for the stable branches; ipipe support
has already been removed from the master/main branch.

>  	/*
>  	 * Basic sanity check after an expected transition to secondary
>  	 * mode.



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 13:01 ` Bezdeka, Florian
@ 2022-04-05 13:10   ` Richard Weinberger
  2022-04-05 15:38     ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2022-04-05 13:10 UTC (permalink / raw)
  To: Bezdeka, Florian; +Cc: jan.kiszka, richard, xenomai

On Tue, Apr 5, 2022 at 3:02 PM Bezdeka, Florian via Xenomai
<xenomai@xenomai.org> wrote:
> I'm not sure if waiting is really what we want. I like the idea of
> moving the work into struct xnthread as Jan already suggested
> internally.

Well, the wait is cheap; it does not involve scheduling.
I'm not sure whether further bloating struct xnthread is wise either.
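
For reference, irq_work_sync() in v5.15 is essentially just a
busy-wait until the BUSY flag clears (quoting from memory):

void irq_work_sync(struct irq_work *work)
{
	lockdep_assert_irqs_enabled();

	while (irq_work_is_busy(work))
		cpu_relax();
}

Since the handler only has the final cmpxchg left to run at that
point, the spin should be very short in practice.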

> The ifdef is only necessary for the stable branches; ipipe support
> has already been removed from the master/main branch.

Ok!

-- 
Thanks,
//richard



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 13:10   ` Richard Weinberger
@ 2022-04-05 15:38     ` Jan Kiszka
  2022-04-05 15:53       ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2022-04-05 15:38 UTC (permalink / raw)
  To: Richard Weinberger, Bezdeka, Florian; +Cc: richard, xenomai

On 05.04.22 15:10, Richard Weinberger wrote:
> On Tue, Apr 5, 2022 at 3:02 PM Bezdeka, Florian via Xenomai
> <xenomai@xenomai.org> wrote:
>> I'm not sure if waiting is really what we want. I like the idea of
>> moving the work into struct xnthread as Jan already suggested
>> internally.
> 
> Well, the wait is cheap; it does not involve scheduling.
> I'm not sure whether further bloating struct xnthread is wise either.
> 

Let's not optimize before we are actually sure that the issue is what we
assume it to be.

I would like to have an explanation or proof points (traces,
assertions) showing that we actually see xnthread_relax() overtaking
the delivery of its own wakework.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 15:38     ` Jan Kiszka
@ 2022-04-05 15:53       ` Richard Weinberger
  2022-04-05 16:01         ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2022-04-05 15:53 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Florian Bezdeka, xenomai

----- Ursprüngliche Mail -----
> Von: "Jan Kiszka" <jan.kiszka@siemens.com>
> I would like to have an explanation or proof points (traces,
> assertions) showing that we actually see xnthread_relax() overtaking
> the delivery of its own wakework.

I can re-test with something like that:

diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
index beda67e18..4c100b645 100644
--- a/kernel/cobalt/thread.c
+++ b/kernel/cobalt/thread.c
@@ -2159,6 +2159,7 @@ void xnthread_relax(int notify, int reason)
        pipeline_clear_mayday();
 
        trace_cobalt_shadow_relaxed(thread);
+       WARN_ON_ONCE(irq_work_is_busy(&wakework.inband_work.work));
 }
 EXPORT_SYMBOL_GPL(xnthread_relax);

But I fear this might take some time. The KASAN splat happened only
once, and only after the test had been running for almost 5 days.

Thanks,
//richard



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 15:53       ` Richard Weinberger
@ 2022-04-05 16:01         ` Jan Kiszka
  2022-04-05 16:14           ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Jan Kiszka @ 2022-04-05 16:01 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Florian Bezdeka, xenomai

On 05.04.22 17:53, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
>> Von: "Jan Kiszka" <jan.kiszka@siemens.com>
>> I would like to have an explanation or proof points (traces,
>> assertions) showing that we actually see xnthread_relax() overtaking
>> the delivery of its own wakework.
> 
> I can re-test with something like that:
> 
> diff --git a/kernel/cobalt/thread.c b/kernel/cobalt/thread.c
> index beda67e18..4c100b645 100644
> --- a/kernel/cobalt/thread.c
> +++ b/kernel/cobalt/thread.c
> @@ -2159,6 +2159,7 @@ void xnthread_relax(int notify, int reason)
>         pipeline_clear_mayday();
>  
>         trace_cobalt_shadow_relaxed(thread);
> +       WARN_ON_ONCE(irq_work_is_busy(&wakework.inband_work.work));
>  }
>  EXPORT_SYMBOL_GPL(xnthread_relax);
> 
> But I fear this might take some time. The KASAN splat happened only
> once, and only after the test had been running for almost 5 days.

How about additionally widening the suspected race window by adding a
delay to lostage_task_wakeup?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 16:01         ` Jan Kiszka
@ 2022-04-05 16:14           ` Richard Weinberger
  2022-04-05 17:23             ` Richard Weinberger
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2022-04-05 16:14 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Florian Bezdeka, xenomai

----- Ursprüngliche Mail -----
> Von: "Jan Kiszka" <jan.kiszka@siemens.com>
>> But I fear this might take some time. The KASAN splat happened only
>> once, and only after the test had been running for almost 5 days.
> 
> How about additionally widening the suspected race window by adding a
> delay to lostage_task_wakeup?

Excellent idea! :-)
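
Something along these lines, I suppose (illustrative only; the
handler's exact signature is from memory and the delay value is
arbitrary):

static void lostage_task_wakeup(struct pipeline_inband_work *inband_work)
{
	struct lostage_wakeup *rq;

	rq = container_of(inband_work, struct lostage_wakeup,
			  inband_work);
	wake_up_process(rq->task);
	mdelay(10);	/* testing only: widen the race window */
}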

Thanks,
//richard



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 11:40 [PATCH] xnthread_relax: Make sure wakework irq work has a stack Richard Weinberger
  2022-04-05 13:01 ` Bezdeka, Florian
@ 2022-04-05 16:59 ` Philippe Gerum
  1 sibling, 0 replies; 10+ messages in thread
From: Philippe Gerum @ 2022-04-05 16:59 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: xenomai


Richard Weinberger via Xenomai <xenomai@xenomai.org> writes:

> While testing some workload, the following KASAN splat arose.
> irq_work_single+0x70/0x80 is the last line of irq_work_single():
> (void)atomic_cmpxchg(&work->node.a_flags, flags, flags & ~IRQ_WORK_BUSY);
>
> So, the write to &work->node.a_flags failed. The atomic_read() and
> atomic_set() right before work->func(work) didn't trigger KASAN,
> which means the memory location behind &work vanished while the work
> function was running.
>
> After taking a brief look at the irq work users in Xenomai, I found
> that xnthread_relax() posts irq work via
> pipeline_post_inband_work(&wakework), where wakework is on the stack.
> To the best of my knowledge, it is not guaranteed that
> xnthread_relax()'s stack frame will stay around until the irq work
> has finished.
>

If the task calling xnthread_relax() stays on the same CPU during the
whole process, I believe this is ok:

- irq_work() queues the request to the local CPU, i.e. where the thread
  calling xnthread_relax() currently runs. The latter suspends from the
  Cobalt scheduler and waits for the irq_work handler to wake it up.

- irq_work() runs in (hard) interrupt context on the in-band stage, so
  it is free from preemption by any in-band task on the same CPU, which
  the caller of xnthread_relax() now is, after a successful oob ->
  in-band stage switch. Also, irq_work() uses a synthetic IRQ, and this
  type of interrupt cannot be threaded, so there is no surprise to
  expect from that department.

> This can explain the KASAN splat. xnthread_relax() returned while
> the irq work was still busy.
>
> To fix the problem, add a new helper, pipeline_sync_inband_work(),
> which synchronizes against the posted irq work and ensures that the
> work is done before xnthread_relax() returns.
>
> ipipe, on the other hand, does not suffer from this problem because
> it has its own irq work mechanism which takes a copy of the work
> struct before processing it asynchronously.
>
> BUG: KASAN: stack-out-of-bounds in irq_work_single+0x70/0x80
> Write of size 4 at addr ffff88802f067d48 by task cobalt_printf/1421

What bothers me is that cobalt_printf is not a Xenomai thread; it is
a plain POSIX thread. So we cannot be observing the tail code of a
stage switch to in-band for that thread. The only explanation I could
come up with is that a relaxed thread is migrated at wake-up to a
different CPU from the one which triggered the irq_work() call, in
which case we might have enough concurrency to see these fireworks.
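
A possible interleaving in that case (hypothetical, assuming the
woken thread migrates):

  CPU0 (irq work)                     CPU1 (relaxing thread)
  irq_work_single(&wakework)
    clear IRQ_WORK_PENDING
    work->func(work)
      -> wake_up_process()            resumes in-band on CPU1
                                      xnthread_relax() returns,
                                      stack slot of wakework reused
    atomic_cmpxchg(&work->node.a_flags, ...)
      -> write hits the reclaimed stack, KASAN fires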

Normally, a Xenomai thread runs on a single CPU, but this rule might
be broken by an explicit user action. Is there any Xenomai thread
with more than one CPU in its affinity set for this test?

PS: I agree that moving the request descriptor to the thread control
block is the safest option (the evl core dropped all stack-based
request descriptors; this is a no-brainer, tends to enable cleanup
work in the code, and simplifies the overall logic). We are talking
about a few bytes, and this would save the irq_work init chores we
currently have to do at each relax call. Besides, a stack-based
descriptor for the relax path is always going to be sensitive to a
CPU migration, which might cause havoc.
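
A minimal sketch of that option, assuming the current lostage_wakeup
layout (the field name relax_work is made up):

	/* in struct xnthread: */
	struct lostage_wakeup relax_work; /* outlives any single relax */

	/* in xnthread_relax(), replacing the on-stack descriptor: */
	thread->relax_work.inband_work = (struct pipeline_inband_work)
		PIPELINE_INBAND_WORK_INITIALIZER(thread->relax_work,
						 lostage_task_wakeup);
	thread->relax_work.task = p;
	pipeline_post_inband_work(&thread->relax_work);

Since a thread relaxes at most once at a time and the descriptor
lives as long as the thread, the trailing cmpxchg in the handler
would always write to valid memory.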

-- 
Philippe.



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 16:14           ` Richard Weinberger
@ 2022-04-05 17:23             ` Richard Weinberger
  2022-04-05 17:39               ` Jan Kiszka
  0 siblings, 1 reply; 10+ messages in thread
From: Richard Weinberger @ 2022-04-05 17:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Florian Bezdeka, xenomai

----- Ursprüngliche Mail -----
>> How about additionally widening the suspected race window by adding a
>> delay to lostage_task_wakeup?
> 
> Excellent idea! :-)

Yeah, with a delay in lostage_task_wakeup() my WARN_ON_ONCE() triggers
very quickly.

[  123.237698] ------------[ cut here ]------------
[  123.238755] WARNING: CPU: 1 PID: 1411 at kernel/xenomai/thread.c:2158 xnthread_relax+0x5d4/0x680
[  123.240698] Modules linked in: loader(OE) tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables xeno_can_peak_pci xeno_can_sja1000 xeno_can xeno_16550A libcrc32c nfnetlink xeno_rtipc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sunrpc pktcdvd rt_e1000 rt_e1000_new crc32_pclmul rtnet i2c_piix4 bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm e1000 serio_raw crc32c_intel ata_generic pata_acpi floppy qemu_fw_cfg fuse
[  123.252790] CPU: 1 PID: 1411 Comm: app Tainted: G           OE     5.15.9xeno3.2-x8664G-rw #6
[  123.255001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[  123.257443] IRQ stage: Linux
[  123.258090] RIP: 0010:xnthread_relax+0x5d4/0x680
[  123.259136] Code: 18 05 00 00 e8 5d 8d 13 00 41 8b 97 18 05 00 00 48 8d b3 6c 02 00 00 48 c7 c7 a0 8f 6d 82 e8 5c 96 cc 00 0f 0b e9 9d fc ff ff <0f> 0b e9 af fd ff ff 65 44 8b 2d a5 2d ca 7e 41 83 fd 03 77 7a 45
[  123.263163] RSP: 0018:ffff88811381fb60 EFLAGS: 00010202
[  123.264330] RAX: 0000000000000022 RBX: ffffc90000bf6408 RCX: ffffffff8137e13b
[  123.265893] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811381fc18
[  123.267474] RBP: 1ffff11022703f6e R08: ffffed1022703f84 R09: ffffed1022703f84
[  123.269095] R10: ffff88811381fc1b R11: ffffed1022703f83 R12: ffff88811381fc10
[  123.270662] R13: ffff888104f03e00 R14: 0000000000000000 R15: ffffc90000bf6428
[  123.272222] FS:  00007fb4d0aca700(0000) GS:ffff88811b080000(0000) knlGS:0000000000000000
[  123.273983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  123.275262] CR2: 0000000000a74008 CR3: 0000000111938003 CR4: 0000000000170ea0
[  123.276823] Call Trace:
[  123.277401]  <TASK>
[  123.277878]  ? xnthread_wait_period+0x4c0/0x4c0
[  123.278899]  ? xnsynch_release+0x690/0x690
[  123.279828]  ? __cobalt_sem_destroy+0x2dd/0x630
[  123.280848]  ? recalibrate_cpu_khz+0x10/0x10
[  123.281812]  ? xnthread_set_periodic+0x3a0/0x3a0
[  123.282855]  ? recalibrate_cpu_khz+0x10/0x10
[  123.283816]  ? ktime_get_mono_fast_ns+0xdb/0x120
[  123.284852]  ? xnlock_dbg_release+0xd9/0x170
[  123.285812]  prepare_for_signal+0x297/0x3a0
[  123.286765]  ? CoBaLt_serialdbg+0x140/0x140
[  123.287709]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
[  123.288906]  handle_head_syscall+0x6e2/0x810
[  123.289867]  ? __cobalt_cond_wait_prologue+0xf60/0xf60
[  123.291017]  ? CoBaLt_trace+0x650/0x650
[  123.291887]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
[  123.293088]  pipeline_syscall+0x8e/0x140
[  123.293979]  syscall_enter_from_user_mode+0x30/0x80
[  123.295076]  do_syscall_64+0x1d/0xa0
[  123.295895]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Thanks,
//richard



* Re: [PATCH] xnthread_relax: Make sure wakework irq work has a stack
  2022-04-05 17:23             ` Richard Weinberger
@ 2022-04-05 17:39               ` Jan Kiszka
  0 siblings, 0 replies; 10+ messages in thread
From: Jan Kiszka @ 2022-04-05 17:39 UTC (permalink / raw)
  To: Richard Weinberger; +Cc: Florian Bezdeka, xenomai

On 05.04.22 19:23, Richard Weinberger wrote:
> ----- Ursprüngliche Mail -----
>>> How about additionally widening the suspected race window by adding a
>>> delay to lostage_task_wakeup?
>>
>> Excellent idea! :-)
> 
> Yeah, with a delay in lostage_task_wakeup() my WARN_ON_ONCE() triggers
> very quickly.
> 
> [  123.237698] ------------[ cut here ]------------
> [  123.238755] WARNING: CPU: 1 PID: 1411 at kernel/xenomai/thread.c:2158 xnthread_relax+0x5d4/0x680
> [  123.240698] Modules linked in: loader(OE) tun bridge stp llc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv6 nft_reject nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables xeno_can_peak_pci xeno_can_sja1000 xeno_can xeno_16550A libcrc32c nfnetlink xeno_rtipc snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device snd_pcm snd_timer snd soundcore sunrpc pktcdvd rt_e1000 rt_e1000_new crc32_pclmul rtnet i2c_piix4 bochs drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec drm_ttm_helper ttm drm e1000 serio_raw crc32c_intel ata_generic pata_acpi floppy qemu_fw_cfg fuse
> [  123.252790] CPU: 1 PID: 1411 Comm: app Tainted: G           OE     5.15.9xeno3.2-x8664G-rw #6
> [  123.255001] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
> [  123.257443] IRQ stage: Linux
> [  123.258090] RIP: 0010:xnthread_relax+0x5d4/0x680
> [  123.259136] Code: 18 05 00 00 e8 5d 8d 13 00 41 8b 97 18 05 00 00 48 8d b3 6c 02 00 00 48 c7 c7 a0 8f 6d 82 e8 5c 96 cc 00 0f 0b e9 9d fc ff ff <0f> 0b e9 af fd ff ff 65 44 8b 2d a5 2d ca 7e 41 83 fd 03 77 7a 45
> [  123.263163] RSP: 0018:ffff88811381fb60 EFLAGS: 00010202
> [  123.264330] RAX: 0000000000000022 RBX: ffffc90000bf6408 RCX: ffffffff8137e13b
> [  123.265893] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff88811381fc18
> [  123.267474] RBP: 1ffff11022703f6e R08: ffffed1022703f84 R09: ffffed1022703f84
> [  123.269095] R10: ffff88811381fc1b R11: ffffed1022703f83 R12: ffff88811381fc10
> [  123.270662] R13: ffff888104f03e00 R14: 0000000000000000 R15: ffffc90000bf6428
> [  123.272222] FS:  00007fb4d0aca700(0000) GS:ffff88811b080000(0000) knlGS:0000000000000000
> [  123.273983] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  123.275262] CR2: 0000000000a74008 CR3: 0000000111938003 CR4: 0000000000170ea0
> [  123.276823] Call Trace:
> [  123.277401]  <TASK>
> [  123.277878]  ? xnthread_wait_period+0x4c0/0x4c0
> [  123.278899]  ? xnsynch_release+0x690/0x690
> [  123.279828]  ? __cobalt_sem_destroy+0x2dd/0x630
> [  123.280848]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.281812]  ? xnthread_set_periodic+0x3a0/0x3a0
> [  123.282855]  ? recalibrate_cpu_khz+0x10/0x10
> [  123.283816]  ? ktime_get_mono_fast_ns+0xdb/0x120
> [  123.284852]  ? xnlock_dbg_release+0xd9/0x170
> [  123.285812]  prepare_for_signal+0x297/0x3a0
> [  123.286765]  ? CoBaLt_serialdbg+0x140/0x140
> [  123.287709]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.288906]  handle_head_syscall+0x6e2/0x810
> [  123.289867]  ? __cobalt_cond_wait_prologue+0xf60/0xf60
> [  123.291017]  ? CoBaLt_trace+0x650/0x650
> [  123.291887]  ? cobalt_thread_setschedparam_ex+0x1a0/0x1a0
> [  123.293088]  pipeline_syscall+0x8e/0x140
> [  123.293979]  syscall_enter_from_user_mode+0x30/0x80
> [  123.295076]  do_syscall_64+0x1d/0xa0
> [  123.295895]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> 

Ok, we are seeing more clearly, but not everything. Could you find
out whether the work and the thread are running on different CPUs?
Or, via tracing, what otherwise led to this case?

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux


