* Process hangs after upgrade
@ 2023-01-12 15:26 Schuman Eelco (DC-AE/ESW5)
2023-01-13 13:52 ` Jan Kiszka
0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-12 15:26 UTC (permalink / raw)
To: xenomai
Hi,
I'm trying to upgrade a system from Linux 4.4 to v5.10.y-dovetail with Xenomai 3.2.2 on a Zynq 7000 ARMv7 board.
I'm seeing scheduling problems, e.g. sleep 1 does not finish.
The IRQ pipeline torture tests run OK:
[ 0.992675] Starting IRQ pipeline tests...
[ 0.992687] IRQ pipeline: high-priority torture stage added.
[ 1.005789] irq_pipeline-torture: CPU0 initiates stop_machine()
[ 1.011772] irq_pipeline-torture: CPU1 responds to stop_machine()
[ 1.021390] CPU1: proxy tick device registered (325.00MHz)
[ 1.026881] CPU0: proxy tick device registered (325.00MHz)
[ 1.035773] irq_pipeline-torture: CPU0: irq_work handled
[ 1.041099] irq_pipeline-torture: CPU0: in-band->in-band irq_work trigger works
[ 1.048430] irq_pipeline-torture: CPU0: stage escalation request works
[ 1.048436] irq_pipeline-torture: CPU0: irq_work handled
[ 1.060276] irq_pipeline-torture: CPU0: oob->in-band irq_work trigger works
[ 2.113302] CPU1: proxy tick device unregistered
[ 2.117930] CPU0: proxy tick device unregistered
[ 2.126014] IRQ pipeline: torture stage removed.
[ 2.130654] IRQ pipeline tests OK.
There is a warning in the kernel logs:
[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.157-xenomai #2
[ 0.000000] Hardware name: Xilinx Zynq Platform
[ 0.000000] IRQ stage: Linux
[ 0.000000] [<c0014460>] (unwind_backtrace) from [<c0011400>] (show_stack+0x10/0x14)
[ 0.000000] [<c0011400>] (show_stack) from [<c03aac8c>] (dump_stack+0xdc/0xfc)
[ 0.000000] [<c03aac8c>] (dump_stack) from [<c03a7b08>] (__warn+0xa4/0xcc)
[ 0.000000] [<c03a7b08>] (__warn) from [<c03a7ba8>] (warn_slowpath_fmt+0x78/0xb0)
[ 0.000000] [<c03a7ba8>] (warn_slowpath_fmt) from [<c04d1e20>] (irq_pipeline_init+0x20/0x84)
[ 0.000000] [<c04d1e20>] (irq_pipeline_init) from [<c04c6d4c>] (start_kernel+0x28c/0x4b8)
[ 0.000000] [<c04c6d4c>] (start_kernel) from [<00000000>] (0x0)
[ 0.000000] ---[ end trace 58996bc4bee07162 ]---
[ 0.000000] IRQ pipeline enabled
The system and applications run OK with only one core in use.
Is there something missing in the Linux config or anything else I can try?
Thanks,
Eelco Schuman
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: Process hangs after upgrade
2023-01-12 15:26 Process hangs after upgrade Schuman Eelco (DC-AE/ESW5)
@ 2023-01-13 13:52 ` Jan Kiszka
2023-01-13 16:51 ` Greg Gallagher
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2023-01-13 13:52 UTC (permalink / raw)
To: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum
On 12.01.23 16:26, Schuman Eelco (DC-AE/ESW5) wrote:
> There is a warning in the kernel logs:
>
> [ 0.000000] ------------[ cut here ]------------
> [ 0.000000] WARNING: CPU: 0 PID: 0 at kernel/irq/pipeline.c:1742 irq_pipeline_init+0x20/0x84
This complains that hard IRQs are enabled when they must not be, and all
of that in fairly early boot code.
>
> The system and applications run OK with only one core in use.
>
> Is there something missing in the Linux config or anything else I can try?
Not sure. We don't have a Zynq in the test lab, or at least not as a reference
image here. It might be that dovetail is missing some bits for this SoC,
e.g. a patch analogous to [1] for drivers/gpio/gpio-zynq.c. Adding
Philippe for possibly more insights.
ARM multi-core is generally fine; see, e.g., our QEMU target
(https://source.denx.de/Xenomai/xenomai-images/-/jobs/552798). But
our only physical armhf target in the lab is a single-core BeagleBone.
Jan
[1]
https://source.denx.de/Xenomai/linux-dovetail/-/commit/3d3e5a1659d69c9dc96471efa542dc05fcc58e56
--
Siemens AG, Technology
Competence Center Embedded Linux
* Re: Process hangs after upgrade
2023-01-13 13:52 ` Jan Kiszka
@ 2023-01-13 16:51 ` Greg Gallagher
2023-01-13 17:15 ` Jan Kiszka
0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-13 16:51 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum
On Fri, Jan 13, 2023 at 8:52 AM Jan Kiszka <jan.kiszka@siemens.com> wrote:
> Not sure. We don't have a Zynq in the test lab, or at least not as a reference
> image here. It might be that dovetail is missing some bits for this SoC,
> e.g. a patch analogous to [1] for drivers/gpio/gpio-zynq.c. Adding
> Philippe for possibly more insights.
If Philippe doesn't have time for an immediate answer, I can try one of
my Zynq boards this weekend. I reserved some time for ipipe things, so I
can add this to the list and see if I can help.
-Greg
* Re: Process hangs after upgrade
2023-01-13 16:51 ` Greg Gallagher
@ 2023-01-13 17:15 ` Jan Kiszka
2023-01-16 20:26 ` Greg Gallagher
0 siblings, 1 reply; 9+ messages in thread
From: Jan Kiszka @ 2023-01-13 17:15 UTC (permalink / raw)
To: Greg Gallagher; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum
On 13.01.23 17:51, Greg Gallagher wrote:
> If Philippe doesn't have time for an immediate answer, I can try one of
> my Zynq boards this weekend. I reserved some time for ipipe things, so I
> can add this to the list and see if I can help.
That would be great! I have one Zynq board literally at my feet as well,
but setting all that up will take the usual time...
Jan
--
Siemens AG, Technology
Competence Center Embedded Linux
* Re: Process hangs after upgrade
2023-01-13 17:15 ` Jan Kiszka
@ 2023-01-16 20:26 ` Greg Gallagher
2023-01-17 17:03 ` Schuman Eelco (DC-AE/ESW5)
0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-16 20:26 UTC (permalink / raw)
To: Jan Kiszka; +Cc: Schuman Eelco (DC-AE/ESW5), xenomai, Philippe Gerum
On Fri, Jan 13, 2023 at 12:15 PM Jan Kiszka <jan.kiszka@siemens.com> wrote:
> That would be great! I have one Zynq board literally at my feet as well,
> but setting all that up will take the usual time...
I'm pretty sure I reproduced the issue, and I can look at it this week if
that helps. Just to confirm, Eelco: to reproduce, I can create a
thread that calls sleep(1), and I should see the process hang?
-Greg
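The reproduction step discussed above (a thread that calls sleep(1) and never returns) could be scripted roughly as follows. This is only a minimal sketch in Python, not the test anyone in this thread actually ran: the helper name `sleep_completes` and the `grace` parameter are made up for illustration. Note that on a system where the in-band tick is not delivered, the watchdog timeout itself may never fire, in which case the script hanging is the symptom.

```python
import threading
import time


def sleep_completes(seconds, grace=5.0):
    """Return True if a worker thread wakes up from time.sleep() in time.

    `sleep_completes` and `grace` are hypothetical names used only for
    this sketch.
    """
    done = threading.Event()

    def worker():
        time.sleep(seconds)
        done.set()

    # daemon=True so a stuck worker cannot keep the interpreter alive.
    threading.Thread(target=worker, daemon=True).start()
    # Wait for the nominal sleep time plus a grace period before giving up.
    return done.wait(timeout=seconds + grace)


if __name__ == "__main__":
    if sleep_completes(1.0):
        print("sleep(1) completed")
    else:
        print("sleep(1) appears to hang")
```

Running it once with both cores online and once with a single core should show whether the hang follows the SMP configuration, as reported in the original message.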
* RE: Process hangs after upgrade
2023-01-16 20:26 ` Greg Gallagher
@ 2023-01-17 17:03 ` Schuman Eelco (DC-AE/ESW5)
2023-01-17 19:28 ` Greg Gallagher
0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-17 17:03 UTC (permalink / raw)
To: Greg Gallagher, Jan Kiszka; +Cc: xenomai, Philippe Gerum
> I'm pretty sure I reproduced the issue, and I can look at it this week if
> that helps. Just to confirm, Eelco: to reproduce, I can create a
> thread that calls sleep(1), and I should see the process hang?
>
> -Greg
Hi Greg, many thanks for looking at this issue. Yes, a 'sleep(1)' in an application or thread hangs the process. I noticed that the proxy tick does not get any interrupts:
           CPU0       CPU1
 24:          0          5     SIRQC  24 Edge      in-band work
 25:          0          0     GIC-0  27 Edge      gt
 26:      94785      94399     GIC-0  29 Edge      oob twd
 27:          0          0     GIC-0  37 Level     arm-pmu
 28:          0          0     GIC-0  38 Level     arm-pmu
 29:         43          0     GIC-0  39 Level     f8007100.adc
 32:          0          0     GIC-0  45 Level     f8003000.dmac
 33:          0          0     GIC-0  46 Level     f8003000.dmac
 34:          0          0     GIC-0  47 Level     f8003000.dmac
 35:          0          0     GIC-0  48 Level     f8003000.dmac
 36:          0          0     GIC-0  49 Level     f8003000.dmac
 37:          0          0     GIC-0  72 Level     f8003000.dmac
 38:          0          0     GIC-0  73 Level     f8003000.dmac
 39:          0          0     GIC-0  74 Level     f8003000.dmac
 40:          0          0     GIC-0  75 Level     f8003000.dmac
 41:          0          0     GIC-0  54 Level     eth0
 43:       2840          0     GIC-0  57 Level     cdns-i2c
 45:        463          0     GIC-0  59 Level     xuartps
 46:          0          0     GIC-0  64 Level     oob [nyce-kmod]
 48:          0          0     SIRQC  48 Edge      Inband sirq
 49:          0          0     SIRQC  49 Edge      Inband sirq
 50:          0          0     SIRQC  50 Edge      Inband sirq
 51:          0          0     SIRQC  51 Edge      proxy tick
IPI0:          0          0  CPU wakeup interrupts
IPI1:          0          0  Timer broadcast interrupts
IPI2:        321        528  Rescheduling interrupts
IPI3:          5          2  Function call interrupts
IPI4:          0          0  CPU stop interrupts
IPI5:          0          0  IRQ work interrupts
IPI6:          0          0  completion interrupts
Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler()?
Best regards,
Eelco
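The check described above (reading the per-CPU counters of the "proxy tick" line) can be automated when comparing boots or kernel configurations. The following is a hedged sketch in Python; the function name `irq_counts` is invented for illustration, and the sample text is a two-line excerpt of the dump in this message. It assumes the usual /proc/interrupts layout: a header row of CPU names, then one count per CPU after the IRQ id.

```python
SAMPLE = """\
           CPU0       CPU1
 26:      94785      94399     GIC-0  29 Edge      oob twd
 51:          0          0     SIRQC  51 Edge      proxy tick
"""


def irq_counts(text, label):
    """Return {cpu: count} for the first line whose description ends with label.

    `irq_counts` is a hypothetical helper written for this sketch.
    """
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row, e.g. ["CPU0", "CPU1"]
    for line in lines[1:]:
        if line.rstrip().endswith(label):
            fields = line.split()
            # fields[0] is the IRQ id ("51:"), followed by one count per CPU.
            counts = fields[1:1 + len(cpus)]
            return {cpu: int(n) for cpu, n in zip(cpus, counts)}
    return {}


if __name__ == "__main__":
    print("proxy tick:", irq_counts(SAMPLE, "proxy tick"))
    print("twd:", irq_counts(SAMPLE, "twd"))
```

On a target, the same function could be fed the contents of /proc/interrupts, sampled twice a few seconds apart, to see whether the proxy tick counters ever advance while the oob twd counters do.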
* Re: Process hangs after upgrade
2023-01-17 17:03 ` Schuman Eelco (DC-AE/ESW5)
@ 2023-01-17 19:28 ` Greg Gallagher
2023-01-18 16:44 ` Schuman Eelco (DC-AE/ESW5)
0 siblings, 1 reply; 9+ messages in thread
From: Greg Gallagher @ 2023-01-17 19:28 UTC (permalink / raw)
To: Schuman Eelco (DC-AE/ESW5); +Cc: Jan Kiszka, xenomai, Philippe Gerum
On Tue, Jan 17, 2023 at 12:03 PM Schuman Eelco (DC-AE/ESW5)
<Eelco.Schuman@boschrexroth.nl> wrote:
> Is this because xnintr_host_tick() is not called, e.g. from xnintr_core_clock_handler() ?
>
>
> Best regards,
>
> Eelco
>
>
I would start looking there; that looks suspicious to me. Does the proxy
tick get interrupts prior to the sleep() call?
-Greg
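One way to answer this is to compare the proxy tick counters in /proc/interrupts before and after the sleep. The following is a minimal sketch and not part of the original thread: the helper name irq_count_sum() is hypothetical, and it assumes the usual line layout of a label, a colon, one counter per CPU, then a description.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on the first line of `text` (formatted like
 * /proc/interrupts) whose description contains `name`.
 * Returns -1 if no line matches. Hypothetical helper for illustration.
 */
long irq_count_sum(const char *text, const char *name)
{
	assert(text != NULL && name != NULL);

	for (const char *line = text; *line != '\0'; ) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		char buf[512];

		/* Overlong lines are skipped rather than truncated. */
		if (len < sizeof(buf)) {
			memcpy(buf, line, len);
			buf[len] = '\0';

			char *p = strchr(buf, ':');
			if (p != NULL && strstr(p, name) != NULL) {
				long total = 0;

				/* Counters follow the colon; stop at the first non-numeric field. */
				for (p++; ; ) {
					char *end;

					while (*p == ' ' || *p == '\t')
						p++;
					if (!isdigit((unsigned char)*p))
						break;
					total += strtol(p, &end, 10);
					p = end;
				}
				return total;
			}
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}
	return -1;
}
```

Reading /proc/interrupts into a buffer and calling irq_count_sum(buf, "proxy tick") before and after running sleep 1 gives two totals; identical totals mean no proxy tick was delivered in between. The parser stops at the first non-numeric field, so it would miscount if an interrupt chip name began with a digit.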
* RE: Process hangs after upgrade
2023-01-17 19:28 ` Greg Gallagher
@ 2023-01-18 16:44 ` Schuman Eelco (DC-AE/ESW5)
2023-01-18 17:21 ` Philippe Gerum
0 siblings, 1 reply; 9+ messages in thread
From: Schuman Eelco (DC-AE/ESW5) @ 2023-01-18 16:44 UTC (permalink / raw)
To: Greg Gallagher; +Cc: Jan Kiszka, xenomai, Philippe Gerum
>I would start looking there, that looks suspicious to me. Prior to the sleep() call proxy tick does get interrupts?
>
>-Greg
- A process with 'sleep 1' hangs. To reproduce (in bash):
    * run 'sleep 1'
    * wait 2 seconds
    * suspend it with ^Z: [1]+ Stopped sleep 1
    * the process is then scheduled and finishes: [1]+ Done sleep 1
- The proxy tick receives no interrupts, neither before nor after the sleep. None of the interrupt counts change except the twd count.
- I did a quick experiment to force xnintr_host_tick(). With this change I see proxy tick interrupts and sleep works as expected:

void xnintr_core_clock_handler(void)
{
	...
	...
	sched = xnsched_current();
	/* Experiment: XNHTICK check disabled to force in-band interrupts to the proxy tick. */
	/* if ((sched->lflags & XNHTICK) && */
	if (xnthread_test_state(sched->curr, XNROOT))
		xnintr_host_tick(sched);
}

Thanks,
--Eelco
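
For reference, the hang can also be checked from C rather than from the shell. This is a minimal sketch under assumptions, not code from the affected system: timed_sleep_ms() is a hypothetical helper that sleeps on CLOCK_MONOTONIC and reports how long the call really took.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep for `ms` milliseconds on CLOCK_MONOTONIC and return the measured
 * duration in milliseconds, or -1 on error. Hypothetical helper.
 */
long timed_sleep_ms(long ms)
{
	struct timespec start, end, req, rem;
	int err;

	assert(ms >= 0);
	req.tv_sec = ms / 1000;
	req.tv_nsec = (ms % 1000) * 1000000L;

	if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
		return -1;

	/* Relative sleep; if a signal interrupts it, resume with the remaining time. */
	while ((err = clock_nanosleep(CLOCK_MONOTONIC, 0, &req, &rem)) == EINTR)
		req = rem;
	if (err != 0)
		return -1;

	if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
		return -1;

	long long ns = (long long)(end.tv_sec - start.tv_sec) * 1000000000LL +
		       (end.tv_nsec - start.tv_nsec);
	return (long)(ns / 1000000LL);
}
```

On a healthy system timed_sleep_ms(1000) should return a value close to 1000. On the failing setup described above, the call would be expected to block far longer than requested, possibly indefinitely.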
* Re: Process hangs after upgrade
2023-01-18 16:44 ` Schuman Eelco (DC-AE/ESW5)
@ 2023-01-18 17:21 ` Philippe Gerum
0 siblings, 0 replies; 9+ messages in thread
From: Philippe Gerum @ 2023-01-18 17:21 UTC (permalink / raw)
To: Schuman Eelco (DC-AE/ESW5); +Cc: Greg Gallagher, Jan Kiszka, xenomai
"Schuman Eelco (DC-AE/ESW5)" <Eelco.Schuman@boschrexroth.nl> writes:
>>I would start looking there, that looks suspicious to me. Prior to the sleep() call proxy tick does get interrupts?
>>
>>-Greg
>
> - A process with 'sleep 1' hangs, so if you do (in bash)
> * sleep 1
> * wait 2 seconds
> * put in background ^Z [1]+ Stopped sleep 1
> * process is scheduled and finishes [1]+ Done sleep 1
>
> - The proxy tick has no interrupts before and after sleep. I don't see changes in the interrupt counts, except for the twd count.
>
> - I did a quick experiment to force xnintr_host_tick(). With this I see proxy tick interrupts and sleep works as expected.
>
> void xnintr_core_clock_handler(void)
> {
> ...
> ...
> sched = xnsched_current();
> // if ((sched->lflags & XNHTICK) && -- force inband interrupts to proxy tick
> if (xnthread_test_state(sched->curr, XNROOT))
> xnintr_host_tick(sched);
> }
>
> Thanks,
> --Eelco
Is proxy_set_next_ktime() called steadily in the failing case? It should
be while sleep 1 is running. The fact that your patch works around the
issue means that the system does receive (oob) clock events normally.
However, if Xenomai's host timer (sched->htimer) is not queued upon a
timing request received from the inband kernel, then XNHTICK cannot be
raised, which breaks the proxy tick propagation downwards from Xenomai to
the inband kernel. The theory of operations for the proxy tick is
documented at [1]. Also, you may want to check whether the ONESHOT_STOPPED
mode of clockevents is involved in the issue, just in case.
Sorry, I cannot do much more to investigate the issue at the moment: I
could not reproduce it on any of the arm(64) SoCs I have at hand, and I'm
deeply buried in my todo list.
[1] https://evlproject.org/dovetail/porting/timer/#proxy-tick-logic
--
Philippe.
Thread overview: 9+ messages
2023-01-12 15:26 Process hangs after upgrade Schuman Eelco (DC-AE/ESW5)
2023-01-13 13:52 ` Jan Kiszka
2023-01-13 16:51 ` Greg Gallagher
2023-01-13 17:15 ` Jan Kiszka
2023-01-16 20:26 ` Greg Gallagher
2023-01-17 17:03 ` Schuman Eelco (DC-AE/ESW5)
2023-01-17 19:28 ` Greg Gallagher
2023-01-18 16:44 ` Schuman Eelco (DC-AE/ESW5)
2023-01-18 17:21 ` Philippe Gerum