* Process hangs after upgrade
@ 2023-01-12 15:26 Schuman Eelco (DC-AE/ESW5)
  2023-01-13 13:52 ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-12 15:26 UTC
  To: xenomai

Hi,

I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
I'm seeing scheduling problems, e.g. sleep 1 does not finish.

The IRQ pipeline torture tests run OK:
[    0.992675] Starting IRQ pipeline tests...
[    0.992687] IRQ pipeline: high-priority torture stage added.
[    1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
[    1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
[    1.021390] CPU1: proxy tick device registered (325.00MHz)
[    1.026881] CPU0: proxy tick device registered (325.00MHz)
[    1.035773] irq_pipeline-torture: CPU0: irq_work handled
[    1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
[    1.048430] irq_pipeline-torture: CPU0: stage escalation request works
[    1.048436] irq_pipeline-torture: CPU0: irq_work handled
[    1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
[    2.113302] CPU1: proxy tick device unregistered
[    2.117930] CPU0: proxy tick device unregistered
[    2.126014] IRQ pipeline: torture stage removed.
[    2.130654] IRQ pipeline tests OK.


There is a warning in the kernel logs:

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
[    0.000000] Hardware name: Xilinx Zynq Platform
[    0.000000] IRQ stage: Linux
[    0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
[    0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
[    0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
[    0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
[    0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
[    0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
[    0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
[    0.000000] ---[ end trace 58996bc4bee07162 ]---
[    0.000000] IRQ pipeline enabled

The system and applications run OK with only one core in use. 

Is there something missing in the Linux config or anything else I can try?

Thanks,

Eelco Schuman



* Re: Process hangs after upgrade
  2023-01-12 15:26 Process hangs after upgrade Schuman Eelco (DC-AE/ESW5)
@ 2023-01-13 13:52 ` Jan Kiszka
  2023-01-13 16:51   ` Greg Gallagher
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2023-01-13 13:52 UTC
  To: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum

On 12.01.23 16:26, Schuman Eelco (DC-AE/ESW5) wrote:
> Hi,
> 
> I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
> I'm seeing scheduling problems, e.g. sleep 1 does not finish.
> 
> The IRQ pipeline torture tests run OK:
> [    0.992675] Starting IRQ pipeline tests...
> [    0.992687] IRQ pipeline: high-priority torture stage added.
> [    1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
> [    1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
> [    1.021390] CPU1: proxy tick device registered (325.00MHz)
> [    1.026881] CPU0: proxy tick device registered (325.00MHz)
> [    1.035773] irq_pipeline-torture: CPU0: irq_work handled
> [    1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
> [    1.048430] irq_pipeline-torture: CPU0: stage escalation request works
> [    1.048436] irq_pipeline-torture: CPU0: irq_work handled
> [    1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
> [    2.113302] CPU1: proxy tick device unregistered
> [    2.117930] CPU0: proxy tick device unregistered
> [    2.126014] IRQ pipeline: torture stage removed.
> [    2.130654] IRQ pipeline tests OK.
> 
> 
> There is a warning in the kernel logs:
> 
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84

This warning means that hard IRQs are enabled at a point where they must
be off, and it fires in fairly early boot code.

> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
> [    0.000000] Hardware name: Xilinx Zynq Platform
> [    0.000000] IRQ stage: Linux
> [    0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
> [    0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
> [    0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
> [    0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
> [    0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
> [    0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
> [    0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
> [    0.000000] ---[ end trace 58996bc4bee07162 ]---
> [    0.000000] IRQ pipeline enabled
> 
> The system and applications run OK with only one core in use. 
> 
> Is there something missing in the Linux config or anything else I can try?

Not sure. We don't have a Zynq in the test lab, or at least not as a
reference image here. Dovetail might be missing some bits for this SoC,
e.g. a patch analogous to [1] for drivers/gpio/gpio-zynq.c. Adding
Philippe for possibly more insights.

ARM multi-core generally works, see e.g. our QEMU target
(https://source.denx.de/Xenomai/xenomai-images/-/jobs/552798). But
our only physical armhf target in the lab is a single-core BeagleBone.

Jan

[1]
https://source.denx.de/Xenomai/linux-dovetail/-/commit/3d3e5a1659d69c9dc96471efa542dc05fcc58e56

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: Process hangs after upgrade
  2023-01-13 13:52 ` Jan Kiszka
@ 2023-01-13 16:51   ` Greg Gallagher
  2023-01-13 17:15     ` Jan Kiszka
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-13 16:51 UTC
  To: Jan Kiszka; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum

If Philippe doesn't have time for an immediate answer I can try one of
my zynq boards this weekend.  I reserved some time for ipipe things, I
can add this to the list and see if I can help.

-Greg


* Re: Process hangs after upgrade
  2023-01-13 16:51   ` Greg Gallagher
@ 2023-01-13 17:15     ` Jan Kiszka
  2023-01-16 20:26       ` Greg Gallagher
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2023-01-13 17:15 UTC
  To: Greg Gallagher; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum

On 13.01.23 17:51, Greg Gallagher wrote:
> 
> If Philippe doesn't have time for an immediate answer I can try one of
> my zynq boards this weekend.  I reserved some time for ipipe things, I
> can add this to the list and see if I can help.

That would be great! I have a Zynq board literally at my feet as well,
but setting all that up will take the usual time...

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: Process hangs after upgrade
  2023-01-13 17:15     ` Jan Kiszka
@ 2023-01-16 20:26       ` Greg Gallagher
  2023-01-17 17:03         ` Schuman Eelco (DC-AE/ESW5)
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-16 20:26 UTC
  To: Jan Kiszka; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum

I reproduced the issue (pretty sure), I can look at it this week if
that helps.  Just to confirm, Eelco, to reproduce I can create a
thread and call sleep(1) and I should see the process hang?
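
For reference, a minimal sketch of such a reproducer. It is an assumption
that a plain in-band Python thread is a fair stand-in for the failing
application (it is not a Xenomai/Cobalt thread), and the helper name
sleep_returns and its grace parameter are made up for illustration. The
caller polls the monotonic clock instead of using a timed wait, on the
assumption that timed waits depend on the same in-band timer wakeups that
are under suspicion:

```python
import threading
import time


def sleep_returns(seconds=1.0, grace=5.0):
    """Return True if a thread calling time.sleep(seconds) wakes up in time.

    Hypothetical helper: it only exercises an ordinary in-band sleep.
    """
    done = threading.Event()

    def worker():
        time.sleep(seconds)  # needs an in-band timer wakeup to return
        done.set()

    threading.Thread(target=worker, daemon=True).start()

    # Busy-poll against the monotonic clock rather than done.wait(timeout),
    # because a timed wait would rely on the timer being tested.
    deadline = time.monotonic() + seconds + grace
    while time.monotonic() < deadline:
        if done.is_set():
            return True
    return False
```

On a healthy system sleep_returns() should come back True after about one
second. A False result would match the hang described in this thread, but
it does not by itself say which timer path is at fault.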

-Greg


* RE: Process hangs after upgrade
  2023-01-16 20:26       ` Greg Gallagher
@ 2023-01-17 17:03         ` Schuman Eelco (DC-AE/ESW5)
  2023-01-17 19:28           ` Greg Gallagher
  0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-17 17:03 UTC
  To: Greg Gallagher, Jan Kiszka; +Cc: xenomai, Philippe Gerum

>I reproduced the issue (pretty sure), I can look at it this week if that helps.  Just to confirm, Eelco, to reproduce I can create a thread and call sleep(1) and I should see the process hang?
>
>-Greg


Hi Greg, many thanks for looking at this issue. Yes, a 'sleep(1)' in an application or thread hangs the process.  I noticed that the proxy tick does not get any interrupts:

          CPU0       CPU1
 24:          0          5     SIRQC  24 Edge          in-band work
 25:          0          0     GIC-0  27 Edge          gt
 26:      94785      94399     GIC-0  29 Edge     oob  twd
 27:          0          0     GIC-0  37 Level         arm-pmu
 28:          0          0     GIC-0  38 Level         arm-pmu
 29:         43          0     GIC-0  39 Level         f8007100.adc
 32:          0          0     GIC-0  45 Level         f8003000.dmac
 33:          0          0     GIC-0  46 Level         f8003000.dmac
 34:          0          0     GIC-0  47 Level         f8003000.dmac
 35:          0          0     GIC-0  48 Level         f8003000.dmac
 36:          0          0     GIC-0  49 Level         f8003000.dmac
 37:          0          0     GIC-0  72 Level         f8003000.dmac
 38:          0          0     GIC-0  73 Level         f8003000.dmac
 39:          0          0     GIC-0  74 Level         f8003000.dmac
 40:          0          0     GIC-0  75 Level         f8003000.dmac
 41:          0          0     GIC-0  54 Level         eth0
 43:       2840          0     GIC-0  57 Level         cdns-i2c
 45:        463          0     GIC-0  59 Level         xuartps
 46:          0          0     GIC-0  64 Level    oob  [nyce-kmod]
 48:          0          0     SIRQC  48 Edge          Inband sirq
 49:          0          0     SIRQC  49 Edge          Inband sirq
 50:          0          0     SIRQC  50 Edge          Inband sirq
 51:          0          0     SIRQC  51 Edge          proxy tick
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:        321        528  Rescheduling interrupts
IPI3:          5          2  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          0          0  IRQ work interrupts
IPI6:          0          0  completion interrupts

Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler()?
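
The check done by eye on the table above (a "proxy tick" row whose counts
are zero on every CPU) could be automated along these lines. This is only a
sketch: the function names parse_interrupts and silent_irqs are invented for
illustration, and it assumes the /proc/interrupts layout shown in this
message (a header row of CPU names, then "label: counts... description").

```python
def parse_interrupts(text):
    """Map each IRQ label to (per-CPU counts, remaining description)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        # Take leading numeric fields as counters, at most one per CPU.
        for field in fields:
            if len(counts) < ncpus and field.isdigit():
                counts.append(int(field))
            else:
                break
        table[label.strip()] = (counts, " ".join(fields[len(counts):]))
    return table


def silent_irqs(table, name):
    """Labels of IRQs whose description ends with name and that never fired."""
    return [label for label, (counts, desc) in table.items()
            if desc.endswith(name) and sum(counts) == 0]
```

Fed the contents of /proc/interrupts, silent_irqs(table, "proxy tick") lists
the proxy tick IRQs that have not fired on any CPU; an empty list means at
least one tick has been delivered.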


Best regards,

Eelco




* Re: Process hangs after upgrade
  2023-01-17 17:03         ` Schuman Eelco (DC-AE/ESW5)
@ 2023-01-17 19:28           ` Greg Gallagher
  2023-01-18 16:44             ` Schuman Eelco (DC-AE/ESW5)
  0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-17 19:28 UTC
  To: Schuman Eelco (DC-AE/ESW5); +Cc: Jan Kiszka, xenomai, Philippe Gerum

On Tue, Jan 17, 2023 at 12:03 PM Schuman Eelco (DC-AE/ESW5)
<Eelco.Schuman@boschrexroth.nl> wrote:
>
> > On Fri, Jan 13, 2023 at 12:15 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > >
> > > On 13.01.23 17:51, Greg Gallagher wrote:
> > > On Fri, Jan 13, 2023 at 8:52 AM Jan Kiszka <jan.kiszka@siemens.com> wrote:
> > > >
> >> > > On 12.01.23 16:26, Schuman Eelco (DC-AE/ESW5) wrote:
> >> >>> Hi,
> >> >>>
> >> >>> I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
> >> >>> I'm seeing scheduling problems, e.g. sleep 1 does not finish.
> >> >>>
> >> >>> The IRQ pipeline torture tests run OK:
> >> >>> [    0.992675] Starting IRQ pipeline tests...
> >> >>> [    0.992687] IRQ pipeline: high-priority torture stage added.
> >> >>> [    1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
> >> >>> [    1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
> >> >>> [    1.021390] CPU1: proxy tick device registered (325.00MHz)
> >> >>> [    1.026881] CPU0: proxy tick device registered (325.00MHz)
> >> >>> [    1.035773] irq_pipeline-torture: CPU0: irq_work handled
> >> >>> [    1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
> >> >>> [    1.048430] irq_pipeline-torture: CPU0: stage escalation request works
> >> >>> [    1.048436] irq_pipeline-torture: CPU0: irq_work handled
> >> >>> [    1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
> >> >>> [    2.113302] CPU1: proxy tick device unregistered
> >> >>> [    2.117930] CPU0: proxy tick device unregistered
> >> >>> [    2.126014] IRQ pipeline: torture stage removed.
> >> >>> [    2.130654] IRQ pipeline tests OK.
> >> >>>
> >> >>>
> >> >>> There is a warning in the kernel logs:
> >> >>>
> >> >>>      0.000000] ------------[ cut here ]------------
> >> >>> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
> >> >>
> >> >> This complains that hard-IRQs would be on while they must not. All
> >> >> that in fairly early boot code.
> >> >>
> >> >>> [    0.000000] Modules linked in:
> >> >>> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
> >> >>> [    0.000000] Hardware name: Xilinx Zynq Platform
> >> >>> [    0.000000] IRQ stage: Linux
> >> >>> [    0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
> >> >>> [    0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
> >> >>> [    0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
> >> >>> [    0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
> >> >>> [    0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
> >> >>> [    0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
> >> >>> [    0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
> >> >>> [    0.000000] ---[ end trace 58996bc4bee07162 ]---
> >> >>> [    0.000000] IRQ pipeline enabled
> >> >>>
> >> >>> The system and applications run OK with only one core in use.
> >> >>>
> >> >>> Is there something missing in the Linux config or anything else I can try?
> >> >>
> >> >> Not sure. We don't have a Zynq in the test lab or at least as
> >> >> reference image here. It might be that dovetail is missing some
> >> >> bits for this SoC, e.g. a patch analogously to [1] for
> >> >> drivers/gpio/gpio-zynq.c. Adding Philippe for possibly more insights.
> >> >>
> >> >> ARM multi-core is generically fine, see our QEMU target
> >> >> (https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> >> >> source.denx.de%2FXenomai%2Fxenomai-images%2F-%2Fjobs%2F552798&data=05%7C01%7CEelco.Schuman%40boschrexroth.nl%7Cbde2fcb830c7453eb24e08daf7fff75c%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638094975982300406%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=X9HSPXZaFPte2sxFfS5OTd683dFqMhWRpRcGXX%2FtRDg%3D&reserved=0 e.g.). But our only physical armhf target in the lab is a single-core beaglebone.
> >> >>
> >> >> Jan
> >> >>
> >> >> [1]
> >> >> https://source.denx.de/Xenomai/linux-dovetail/-/commit/3d3e5a1659d69c9dc96471efa542dc05fcc58e56
> >> >>
> >> >> --
> >> >> Siemens AG, Technology
> >> >> Competence Center Embedded Linux
> >> >>
> >> >>
> >> >>
> >> > If Philippe doesn't have time for an immediate answer I can try one
> >> > of my zynq boards this weekend.  I reserved some time for ipipe
> >> > things, I can add this to the list and see if I can help.
> >>
> >> That would be great! I have one Zynq board literally at my feet as
> >> well, but setting all that up will take the usual time...
> >>
> >> Jan
> >>
> >> --
> >> Siemens AG, Technology
> >> Competence Center Embedded Linux
> >>
> >
> >I reproduced the issue (pretty sure), I can look at it this week if that helps.  Just to confirm, Eelco, to reproduce I can create a thread and call sleep(1) and I should see the process hang?
> >
> >-Greg
>
>
> Hi Greg, many thanks for looking at this issue. Yes, a 'sleep(1)' in an application or thread hangs the process.  I noticed that the proxy tick does not get any interrupts:
>
>           CPU0       CPU1
>  24:          0          5     SIRQC  24 Edge          in-band work
>  25:          0          0     GIC-0  27 Edge          gt
>  26:      94785      94399     GIC-0  29 Edge     oob  twd
>  27:          0          0     GIC-0  37 Level         arm-pmu
>  28:          0          0     GIC-0  38 Level         arm-pmu
>  29:         43          0     GIC-0  39 Level         f8007100.adc
>  32:          0          0     GIC-0  45 Level         f8003000.dmac
>  33:          0          0     GIC-0  46 Level         f8003000.dmac
>  34:          0          0     GIC-0  47 Level         f8003000.dmac
>  35:          0          0     GIC-0  48 Level         f8003000.dmac
>  36:          0          0     GIC-0  49 Level         f8003000.dmac
>  37:          0          0     GIC-0  72 Level         f8003000.dmac
>  38:          0          0     GIC-0  73 Level         f8003000.dmac
>  39:          0          0     GIC-0  74 Level         f8003000.dmac
>  40:          0          0     GIC-0  75 Level         f8003000.dmac
>  41:          0          0     GIC-0  54 Level         eth0
>  43:       2840          0     GIC-0  57 Level         cdns-i2c
>  45:        463          0     GIC-0  59 Level         xuartps
>  46:          0          0     GIC-0  64 Level    oob  [nyce-kmod]
>  48:          0          0     SIRQC  48 Edge          Inband sirq
>  49:          0          0     SIRQC  49 Edge          Inband sirq
>  50:          0          0     SIRQC  50 Edge          Inband sirq
>  51:          0          0     SIRQC  51 Edge          proxy tick
> IPI0:          0          0  CPU wakeup interrupts
> IPI1:          0          0  Timer broadcast interrupts
> IPI2:        321        528  Rescheduling interrupts
> IPI3:          5          2  Function call interrupts
> IPI4:          0          0  CPU stop interrupts
> IPI5:          0          0  IRQ work interrupts
> IPI6:          0          0  completion interrupts
>
> Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler() ?
>
>
> Best regards,
>
> Eelco
>
>
I would start looking there; that looks suspicious to me. Does the
proxy tick get interrupts prior to the sleep() call?

-Greg


* RE: Process hangs after upgrade
  2023-01-17 19:28           ` Greg Gallagher
@ 2023-01-18 16:44             ` Schuman Eelco (DC-AE/ESW5)
  2023-01-18 17:21               ` Philippe Gerum
  0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-18 16:44 UTC (permalink / raw)
  To: Greg Gallagher; +Cc: Jan Kiszka, xenomai, Philippe Gerum

>On Tue, Jan 17, 2023 at 12:03 PM Schuman Eelco (DC-AE/ESW5) <Eelco.Schuman@boschrexroth.nl> wrote:
>>
>> > On Fri, Jan 13, 2023 at 12:15 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> > >
>> > > On 13.01.23 17:51, Greg Gallagher wrote:
>> > > On Fri, Jan 13, 2023 at 8:52 AM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>> > > >
>> >> > > On 12.01.23 16:26, Schuman Eelco (DC-AE/ESW5) wrote:
>> >> >>> Hi,
>> >> >>>
>> >> >>> I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
>> >> >>> I'm seeing scheduling problems, e.g. sleep 1 does not finish.
>> >> >>>
>> >> >>> The IRQ pipeline torture tests run OK:
>> >> >>> [    0.992675] Starting IRQ pipeline tests...
>> >> >>> [    0.992687] IRQ pipeline: high-priority torture stage added.
>> >> >>> [    1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
>> >> >>> [    1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
>> >> >>> [    1.021390] CPU1: proxy tick device registered (325.00MHz)
>> >> >>> [    1.026881] CPU0: proxy tick device registered (325.00MHz)
>> >> >>> [    1.035773] irq_pipeline-torture: CPU0: irq_work handled
>> >> >>> [    1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
>> >> >>> [    1.048430] irq_pipeline-torture: CPU0: stage escalation request works
>> >> >>> [    1.048436] irq_pipeline-torture: CPU0: irq_work handled
>> >> >>> [    1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
>> >> >>> [    2.113302] CPU1: proxy tick device unregistered
>> >> >>> [    2.117930] CPU0: proxy tick device unregistered
>> >> >>> [    2.126014] IRQ pipeline: torture stage removed.
>> >> >>> [    2.130654] IRQ pipeline tests OK.
>> >> >>>
>> >> >>>
>> >> >>> There is a warning in the kernel logs:
>> >> >>>
>> >> >>> [    0.000000] ------------[ cut here ]------------
>> >> >>> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
>> >> >>
>> >> >> This complains that hard-IRQs are on while they must not be.
>> >> >> All that in fairly early boot code.
>> >> >>
>> >> >>> [    0.000000] Modules linked in:
>> >> >>> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
>> >> >>> [    0.000000] Hardware name: Xilinx Zynq Platform
>> >> >>> [    0.000000] IRQ stage: Linux
>> >> >>> [    0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
>> >> >>> [    0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
>> >> >>> [    0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
>> >> >>> [    0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
>> >> >>> [    0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
>> >> >>> [    0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
>> >> >>> [    0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
>> >> >>> [    0.000000] ---[ end trace 58996bc4bee07162 ]---
>> >> >>> [    0.000000] IRQ pipeline enabled
>> >> >>>
>> >> >>> The system and applications run OK with only one core in use.
>> >> >>>
>> >> >>> Is there something missing in the Linux config or anything else I can try?
>> >> >>
>> >> >> Not sure. We don't have a Zynq in the test lab or at least as 
>> >> >> reference image here. It might be that dovetail is missing some 
>> >> >> bits for this SoC, e.g. a patch analogous to [1] for
>> >> >> drivers/gpio/gpio-zynq.c. Adding Philippe for possibly more insights.
>> >> >>
>> >> >> ARM multi-core is generically fine, see our QEMU target
>> >> >> (https://source.denx.de/Xenomai/xenomai-images/-/jobs/552798
>> >> >> e.g.). But our only physical armhf target in the lab is a
>> >> >> single-core beaglebone.
>> >> >>
>> >> >> Jan
>> >> >>
>> >> >> [1]
>> >> >> https://source.denx.de/Xenomai/linux-dovetail/-/commit/3d3e5a1659d69c9dc96471efa542dc05fcc58e56
>> >> >>
>> >> >> --
>> >> >> Siemens AG, Technology
>> >> >> Competence Center Embedded Linux
>> >> >>
>> >> >>
>> >> >>
>> >> > If Philippe doesn't have time for an immediate answer I can try 
>> >> > one of my zynq boards this weekend.  I reserved some time for 
>> >> > ipipe things, I can add this to the list and see if I can help.
>> >>
>> >> That would be great! I have one Zynq board literally at my feet as
>> >> well, but setting all that up will take the usual time...
>> >>
>> >> Jan
>> >>
>> >> --
>> >> Siemens AG, Technology
>> >> Competence Center Embedded Linux
>> >>
>> >
>> >I reproduced the issue (pretty sure), I can look at it this week if that helps.  Just to confirm, Eelco, to reproduce I can create a thread and call sleep(1) and I should see the process hang?
>> >
>> >-Greg
>>
>>
>> Hi Greg, many thanks for looking at this issue. Yes, a 'sleep(1)' in an application or thread hangs the process.  I noticed that the proxy tick does not get any interrupts:
>>
>>           CPU0       CPU1
>>  24:          0          5     SIRQC  24 Edge          in-band work
>>  25:          0          0     GIC-0  27 Edge          gt
>>  26:      94785      94399     GIC-0  29 Edge     oob  twd
>>  27:          0          0     GIC-0  37 Level         arm-pmu
>>  28:          0          0     GIC-0  38 Level         arm-pmu
>>  29:         43          0     GIC-0  39 Level         f8007100.adc
>>  32:          0          0     GIC-0  45 Level         f8003000.dmac
>>  33:          0          0     GIC-0  46 Level         f8003000.dmac
>>  34:          0          0     GIC-0  47 Level         f8003000.dmac
>>  35:          0          0     GIC-0  48 Level         f8003000.dmac
>>  36:          0          0     GIC-0  49 Level         f8003000.dmac
>>  37:          0          0     GIC-0  72 Level         f8003000.dmac
>>  38:          0          0     GIC-0  73 Level         f8003000.dmac
>>  39:          0          0     GIC-0  74 Level         f8003000.dmac
>>  40:          0          0     GIC-0  75 Level         f8003000.dmac
>>  41:          0          0     GIC-0  54 Level         eth0
>>  43:       2840          0     GIC-0  57 Level         cdns-i2c
>>  45:        463          0     GIC-0  59 Level         xuartps
>>  46:          0          0     GIC-0  64 Level    oob  [nyce-kmod]
>>  48:          0          0     SIRQC  48 Edge          Inband sirq
>>  49:          0          0     SIRQC  49 Edge          Inband sirq
>>  50:          0          0     SIRQC  50 Edge          Inband sirq
>>  51:          0          0     SIRQC  51 Edge          proxy tick
>> IPI0:          0          0  CPU wakeup interrupts
>> IPI1:          0          0  Timer broadcast interrupts
>> IPI2:        321        528  Rescheduling interrupts
>> IPI3:          5          2  Function call interrupts
>> IPI4:          0          0  CPU stop interrupts
>> IPI5:          0          0  IRQ work interrupts
>> IPI6:          0          0  completion interrupts
>>
>> Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler() ?
>>
>>
>> Best regards,
>>
>> Eelco
>>
>>
>I would start looking there, that looks suspicious to me.  Prior to the sleep() call proxy tick does get interrupts?
>
>-Greg

- A process running 'sleep 1' hangs. For example, in bash:
   * run 'sleep 1'
   * wait 2 seconds (sleep has still not returned)
   * suspend it with ^Z                         [1]+  Stopped                    sleep 1
   * the process is then scheduled and finishes [1]+  Done                       sleep 1

- The proxy tick gets no interrupts, neither before nor after the sleep. None of the interrupt counts change, except for the twd count.

- I did a quick experiment to force calls to xnintr_host_tick(). With this change I see proxy tick interrupts and sleep works as expected.

void xnintr_core_clock_handler(void)
{
...
...
	sched = xnsched_current();
	/*
	 * Experiment: the XNHTICK test is disabled to force in-band
	 * interrupts to the proxy tick whenever the root thread is current.
	 */
	// if ((sched->lflags & XNHTICK) &&
	if (xnthread_test_state(sched->curr, XNROOT))
		xnintr_host_tick(sched);
}

Thanks,
--Eelco




* Re: Process hangs after upgrade
  2023-01-18 16:44             ` Schuman Eelco (DC-AE/ESW5)
@ 2023-01-18 17:21               ` Philippe Gerum
  0 siblings, 0 replies; 9+ messages in thread
From: Philippe Gerum @ 2023-01-18 17:21 UTC (permalink / raw)
  To: Schuman Eelco (DC-AE/ESW5); +Cc: Greg Gallagher, Jan Kiszka, xenomai


"Schuman Eelco (DC-AE/ESW5)" <Eelco.Schuman@boschrexroth.nl> writes:

>>On Tue, Jan 17, 2023 at 12:03 PM Schuman Eelco (DC-AE/ESW5) <Eelco.Schuman@boschrexroth.nl> wrote:
>>>
>>> > On Fri, Jan 13, 2023 at 12:15 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> > >
>>> > > On 13.01.23 17:51, Greg Gallagher wrote:
>>> > > On Fri, Jan 13, 2023 at 8:52 AM Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> > > >
>>> >> > > On 12.01.23 16:26, Schuman Eelco (DC-AE/ESW5) wrote:
>>> >> >>> Hi,
>>> >> >>>
>>> >> >>> I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
>>> >> >>> I'm seeing scheduling problems, e.g. sleep 1 does not finish.
>>> >> >>>
>>> >> >>> The IRQ pipeline torture tests run OK:
>>> >> >>> [    0.992675] Starting IRQ pipeline tests...
>>> >> >>> [    0.992687] IRQ pipeline: high-priority torture stage added.
>>> >> >>> [    1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
>>> >> >>> [    1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
>>> >> >>> [    1.021390] CPU1: proxy tick device registered (325.00MHz)
>>> >> >>> [    1.026881] CPU0: proxy tick device registered (325.00MHz)
>>> >> >>> [    1.035773] irq_pipeline-torture: CPU0: irq_work handled
>>> >> >>> [    1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
>>> >> >>> [    1.048430] irq_pipeline-torture: CPU0: stage escalation request works
>>> >> >>> [    1.048436] irq_pipeline-torture: CPU0: irq_work handled
>>> >> >>> [    1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
>>> >> >>> [    2.113302] CPU1: proxy tick device unregistered
>>> >> >>> [    2.117930] CPU0: proxy tick device unregistered
>>> >> >>> [    2.126014] IRQ pipeline: torture stage removed.
>>> >> >>> [    2.130654] IRQ pipeline tests OK.
>>> >> >>>
>>> >> >>>
>>> >> >>> There is a warning in the kernel logs:
>>> >> >>>
>>> >> >>> [    0.000000] ------------[ cut here ]------------
>>> >> >>> [    0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
>>> >> >>
>>> >> >> This complains that hard-IRQs are on while they must not be.
>>> >> >> All that in fairly early boot code.
>>> >> >>
>>> >> >>> [    0.000000] Modules linked in:
>>> >> >>> [    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
>>> >> >>> [    0.000000] Hardware name: Xilinx Zynq Platform
>>> >> >>> [    0.000000] IRQ stage: Linux
>>> >> >>> [    0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
>>> >> >>> [    0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
>>> >> >>> [    0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
>>> >> >>> [    0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
>>> >> >>> [    0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
>>> >> >>> [    0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
>>> >> >>> [    0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
>>> >> >>> [    0.000000] ---[ end trace 58996bc4bee07162 ]---
>>> >> >>> [    0.000000] IRQ pipeline enabled
>>> >> >>>
>>> >> >>> The system and applications run OK with only one core in use.
>>> >> >>>
>>> >> >>> Is there something missing in the Linux config or anything else I can try?
>>> >> >>
>>> >> >> Not sure. We don't have a Zynq in the test lab or at least as 
>>> >> >> reference image here. It might be that dovetail is missing some 
>>> >> >> bits for this SoC, e.g. a patch analogous to [1] for
>>> >> >> drivers/gpio/gpio-zynq.c. Adding Philippe for possibly more insights.
>>> >> >>
>>> >> >> ARM multi-core is generically fine, see our QEMU target
>>> >> >> (https://source.denx.de/Xenomai/xenomai-images/-/jobs/552798
>>> >> >> e.g.). But our only physical armhf target in the lab is a
>>> >> >> single-core beaglebone.
>>> >> >>
>>> >> >> Jan
>>> >> >>
>>> >> >> [1]
>>> >> >> https://source.denx.de/Xenomai/linux-dovetail/-/commit/3d3e5a1659d69c9dc96471efa542dc05fcc58e56
>>> >> >>
>>> >> >> --
>>> >> >> Siemens AG, Technology
>>> >> >> Competence Center Embedded Linux
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> > If Philippe doesn't have time for an immediate answer I can try 
>>> >> > one of my zynq boards this weekend.  I reserved some time for 
>>> >> > ipipe things, I can add this to the list and see if I can help.
>>> >>
>>> >> That would be great! I have one Zynq board literally at my feet as
>>> >> well, but setting all that up will take the usual time...
>>> >>
>>> >> Jan
>>> >>
>>> >> --
>>> >> Siemens AG, Technology
>>> >> Competence Center Embedded Linux
>>> >>
>>> >
>>> >I reproduced the issue (pretty sure), I can look at it this week if that helps.  Just to confirm, Eelco, to reproduce I can create a thread and call sleep(1) and I should see the process hang?
>>> >
>>> >-Greg
>>>
>>>
>>> Hi Greg, many thanks for looking at this issue. Yes, a 'sleep(1)' in an application or thread hangs the process.  I noticed that the proxy tick does not get any interrupts:
>>>
>>>           CPU0       CPU1
>>>  24:          0          5     SIRQC  24 Edge          in-band work
>>>  25:          0          0     GIC-0  27 Edge          gt
>>>  26:      94785      94399     GIC-0  29 Edge     oob  twd
>>>  27:          0          0     GIC-0  37 Level         arm-pmu
>>>  28:          0          0     GIC-0  38 Level         arm-pmu
>>>  29:         43          0     GIC-0  39 Level         f8007100.adc
>>>  32:          0          0     GIC-0  45 Level         f8003000.dmac
>>>  33:          0          0     GIC-0  46 Level         f8003000.dmac
>>>  34:          0          0     GIC-0  47 Level         f8003000.dmac
>>>  35:          0          0     GIC-0  48 Level         f8003000.dmac
>>>  36:          0          0     GIC-0  49 Level         f8003000.dmac
>>>  37:          0          0     GIC-0  72 Level         f8003000.dmac
>>>  38:          0          0     GIC-0  73 Level         f8003000.dmac
>>>  39:          0          0     GIC-0  74 Level         f8003000.dmac
>>>  40:          0          0     GIC-0  75 Level         f8003000.dmac
>>>  41:          0          0     GIC-0  54 Level         eth0
>>>  43:       2840          0     GIC-0  57 Level         cdns-i2c
>>>  45:        463          0     GIC-0  59 Level         xuartps
>>>  46:          0          0     GIC-0  64 Level    oob  [nyce-kmod]
>>>  48:          0          0     SIRQC  48 Edge          Inband sirq
>>>  49:          0          0     SIRQC  49 Edge          Inband sirq
>>>  50:          0          0     SIRQC  50 Edge          Inband sirq
>>>  51:          0          0     SIRQC  51 Edge          proxy tick
>>> IPI0:          0          0  CPU wakeup interrupts
>>> IPI1:          0          0  Timer broadcast interrupts
>>> IPI2:        321        528  Rescheduling interrupts
>>> IPI3:          5          2  Function call interrupts
>>> IPI4:          0          0  CPU stop interrupts
>>> IPI5:          0          0  IRQ work interrupts
>>> IPI6:          0          0  completion interrupts
>>>
>>> Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler() ?
>>>
>>>
>>> Best regards,
>>>
>>> Eelco
>>>
>>>
>>I would start looking there, that looks suspicious to me.  Prior to the sleep() call proxy tick does get interrupts?
>>
>>-Greg
>
> - A process with 'sleep 1' hangs, so if you do (in bash)
>    * sleep 1
>    * wait 2 seconds
>    * put in background ^Z                      [1]+  Stopped                    sleep 1
>    * process is scheduled and finishes         [1]+  Done                       sleep 1
>
> - The proxy tick has no interrupts before and after sleep. I don't see changes in the interrupt counts, except for the twd count.
>
> - I did a quick experiment to force calls to xnintr_host_tick(). With this change I see proxy tick interrupts and sleep works as expected.
>
> void xnintr_core_clock_handler(void)
> {
> ...
> ...
> 	sched = xnsched_current();
> 	// if ((sched->lflags & XNHTICK) &&               -- force inband interrupts to proxy tick
> 	if (xnthread_test_state(sched->curr, XNROOT))
> 		xnintr_host_tick(sched);
> }
>
> Thanks,
> --Eelco


Is proxy_set_next_ktime() steadily called in the failing case? It should
be with sleep 1 running. The fact that your patch works around the issue
means that the system does receive (oob) clock events normally, but if
Xenomai's host timer (sched->htimer) is not queued upon a timing request
received from the inband kernel, then XNHTICK cannot be raised, breaking
the proxy tick propagation downwards from Xenomai to the inband kernel.

The theory of operations for the proxy tick is documented at [1]. Also,
you may want to check whether the ONESHOT_STOPPED mode of clockevents
is involved in the issue, just in case.

Sorry, I cannot do much more ATM for investigating the issue, I could
not reproduce it on any of the arm(64) SoCs I have at hand, and I'm
deeply buried in my todo list.

[1] https://evlproject.org/dovetail/porting/timer/#proxy-tick-logic

-- 
Philippe.


Thread overview: 9+ messages
2023-01-12 15:26 Process hangs after upgrade Schuman Eelco (DC-AE/ESW5)
2023-01-13 13:52 ` Jan Kiszka
2023-01-13 16:51   ` Greg Gallagher
2023-01-13 17:15     ` Jan Kiszka
2023-01-16 20:26       ` Greg Gallagher
2023-01-17 17:03         ` Schuman Eelco (DC-AE/ESW5)
2023-01-17 19:28           ` Greg Gallagher
2023-01-18 16:44             ` Schuman Eelco (DC-AE/ESW5)
2023-01-18 17:21               ` Philippe Gerum
