* System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
@ 2022-03-11 10:12 Scott Reed
  2022-03-11 11:38 ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Scott Reed @ 2022-03-11 10:12 UTC (permalink / raw)
  To: xenomai

Hello,

I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
when trying to move to a newer kernel and I-pipe patch.

The issue is that as soon as a PCIe MSI interrupt occurs, the system
hangs with no message output on the serial console or in
/var/log/messages.

The platform I am working on is an "i.MX 6 Quad" and I am upgrading
from a 4.14.62 kernel and I-pipe patch with Xenomai 3.0.7 to a 5.4.151
kernel and I-pipe patch with Xenomai 3.2.1.

Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
interrupts to the CPU from, for example, an Altera Triple-Speed MAC.

I have had a stable system running for some time with Linux 4.14.62 and
Xenomai 3.0.7, although I did need to patch the PCIe driver [1]. Some
time back, I also tried to move to 4.14.110 with I-pipe and saw the
same scenario of my system hanging on the first PCIe MSI interrupt,
so I backed out to 4.14.62. Now I am trying to move to 5.4.151, but
see the same hang.

Before I dive into analyzing the hang, I wanted to ask:

What are other people's experiences with using PCIe MSI interrupts
and I-pipe?

I am thinking of trying 5.10.103 Dovetail to see if I still see
the problem. Would this be recommended?

Thanks,

Scott

[1] https://xenomai.org/pipermail/xenomai/2021-August/046138.html




* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-11 10:12 System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62 Scott Reed
@ 2022-03-11 11:38 ` Jan Kiszka
  2022-03-11 13:13   ` Scott Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-03-11 11:38 UTC (permalink / raw)
  To: Scott Reed, xenomai

On 11.03.22 11:12, Scott Reed via Xenomai wrote:
> Hello,
> 
> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
> when trying to move to a newer kernel and I-pipe patch.
> 
> The issue is as soon as a PCIe MSI interrupt occurs, the system
> hangs with no message output on the serial console or in
> /var/log/messages.
> 
> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
> kernel and I-pipe patch with Xenomai 3.2.1.
> 
> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
> 
> I have stable system running for some time with Linux 4.14.62 with
> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
> some time back, I tried to move to 4.14.110 with I-pipe and also
> saw same scenario of my system hanging on the first PCIe MSI interrupt
> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151, but
> see the same hang.

What about 4.19.y-cip? Specifically because of
https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.

Actually, that commit is also missing from the last tagged 5.4 ipipe
version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.

> 
> Before I dive into analyzing the hang, I wanted to ask:
> 
> What are other people's experiences with using PCIe MSI interrupts
> and I-pipe?
> 
> I am thinking of trying 5.10.103 Dovetail to see if I still see
> the problem. Would this be recommended?

If you can migrate your test with reasonable effort, yes, definitely.

Jan

> 
> Thanks,
> 
> Scott
> 
> [1] https://xenomai.org/pipermail/xenomai/2021-August/046138.html
> 
> 

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-11 11:38 ` Jan Kiszka
@ 2022-03-11 13:13   ` Scott Reed
  2022-03-14 17:45     ` Scott Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Scott Reed @ 2022-03-11 13:13 UTC (permalink / raw)
  To: Jan Kiszka, xenomai


On 3/11/22 12:38 PM, Jan Kiszka wrote:
> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>> Hello,
>>
>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>> when trying to move to a newer kernel and I-pipe patch.
>>
>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>> hangs with no message output on the serial console or in
>> /var/log/messages.
>>
>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>> kernel and I-pipe patch with Xenomai 3.2.1.
>>
>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>
>> I have stable system running for some time with Linux 4.14.62 with
>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>> some time back, I tried to move to 4.14.110 with I-pipe and also
>> saw same scenario of my system hanging on the first PCIe MSI interrupt
>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151, but
>> see the same hang.
> 
> What about 4.19.y-cip? Specifically because of
> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
> 
> Actually, that commit is also missing from the last tagged 5.4 ipipe
> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.

As a quick test, I applied the change from the commit you referenced
above to my 5.4.151 ipipe kernel. Unfortunately it did not help (the
hang still occurs on the first interrupt).

> 
>>
>> Before I dive into analyzing the hang, I wanted to ask:
>>
>> What are other people's experiences with using PCIe MSI interrupts
>> and I-pipe?
>>
>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>> the problem. Would this be recommended?
> 
> If you can migrate your test with reasonable effort, yes, definitely.

I will try to migrate my test to 5.10.103 Dovetail, in the hope that
it will not be too much effort, and report back.

Thanks,

Scott

> 
> Jan
> 
>>
>> Thanks,
>>
>> Scott
>>
>> [1] https://xenomai.org/pipermail/xenomai/2021-August/046138.html
>>
>>
> 



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-11 13:13   ` Scott Reed
@ 2022-03-14 17:45     ` Scott Reed
  2022-03-15  6:32       ` Jan Kiszka
  0 siblings, 1 reply; 15+ messages in thread
From: Scott Reed @ 2022-03-14 17:45 UTC (permalink / raw)
  To: Jan Kiszka, xenomai



On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
> 
> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>> Hello,
>>>
>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>> when trying to move to a newer kernel and I-pipe patch.
>>>
>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>> hangs with no message output on the serial console or in
>>> /var/log/messages.
>>>
>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>
>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>
>>> I have stable system running for some time with Linux 4.14.62 with
>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>> saw same scenario of my system hanging on the first PCIe MSI interrupt
>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151, but
>>> see the same hang.
>>
>> What about 4.19.y-cip? Specifically because of
>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c. 
>>
>>
>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
> 
> To do a quick test, I just applied the change from the commit you
> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
> help (hang still occurs with first interrupt).
> 
>>
>>>
>>> Before I dive into analyzing the hang, I wanted to ask:
>>>
>>> What are other people's experiences with using PCIe MSI interrupts
>>> and I-pipe?
>>>
>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>> the problem. Would this be recommended?
>>
>> If you can migrate your test with reasonable effort, yes, definitely.
> 
> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
> it will not be too much effort and report back.

I tried to migrate my test to 5.10.103 Dovetail and failed on the first
step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
on my platform.

The kernel boots without a problem, but the FEC Ethernet port on the
i.MX 6 is not working (cannot ping in or out).

I looked at the trace with Wireshark, and when pinging out, the ARP
packet appears to be corrupt, which is why the ping fails. The
corruption looks like individual flipped bits. For example, the
source MAC address should be
   00:09:cc:02:c1:b6
but is
   00:01:cc:02:01:36 or
   00:09:cc:02:c1:36
Wireshark also flags the frame check sequence
([FCS Status: Unverified]).
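
One way to make the flipped bits explicit is to XOR the expected
address against each observed one. The following is a minimal Python
sketch; the helper names (mac_to_bytes, flipped_bits) are made up for
illustration, and only the three MAC addresses come from the capture
described above.

```python
def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into a bytes object."""
    return bytes(int(part, 16) for part in mac.split(":"))


def flipped_bits(expected, observed):
    """Return (byte_index, xor_mask) for every byte that differs."""
    exp = mac_to_bytes(expected)
    obs = mac_to_bytes(observed)
    return [(i, e ^ o) for i, (e, o) in enumerate(zip(exp, obs)) if e != o]


# Addresses taken from the Wireshark trace described above.
EXPECTED = "00:09:cc:02:c1:b6"
OBSERVED = ["00:01:cc:02:01:36", "00:09:cc:02:c1:36"]

for mac in OBSERVED:
    diffs = flipped_bits(EXPECTED, mac)
    # Print each differing byte position with its XOR mask in hex.
    print(mac, [(i, f"0x{mask:02x}") for i, mask in diffs])
```

For these two captures, every differing bit is one that is set in the
expected address and cleared in the observed one.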

I can provide Wireshark dumps if someone is interested, but at this
point I do not want to fight with getting a 5.10.x kernel to work, as
I was pretty far along moving to a 5.4.x kernel with ipipe before
running into the problem originally posted (with ipipe, my system
freezes on the first PCIe MSI interrupt; without ipipe, I do not see
any issues).

As mentioned, I first saw this problem a while ago when trying
to move from 4.14.62+ipipe to 4.14.110+ipipe, and at that time
backed down to 4.14.62+ipipe, which works.

I guess my next strategy is to figure out what changed between
4.14.62+ipipe and 4.14.110+ipipe that triggers or causes the hang.
I hope the delta between them is not too large.
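
Narrowing down a range like this is essentially a bisection between a
known-good and a known-bad version. A minimal Python sketch of the
idea follows. The candidate list and the pretend_hangs() predicate are
hypothetical: in practice the list would contain only versions for
which an I-pipe patch is actually available, and the check would be
"build, boot, trigger an MSI, see whether the board hangs".

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], hangs: Callable[[str], bool]) -> str:
    """Return the first version for which hangs() is True.

    Assumes versions[0] is known good, versions[-1] is known bad, and
    the behaviour flips exactly once somewhere in between.
    """
    lo, hi = 0, len(versions) - 1      # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if hangs(versions[mid]):
            hi = mid                   # hang reproduced: culprit is at or before mid
        else:
            lo = mid                   # still good: culprit is after mid
    return versions[hi]


# Hypothetical candidate list: every 4.14.y point release in the range.
candidates = [f"4.14.{n}" for n in range(62, 111)]


# Placeholder predicate for demonstration only. The threshold 78 is
# invented and is NOT a finding from this thread.
def pretend_hangs(version: str) -> bool:
    return int(version.rsplit(".", 1)[1]) >= 78


print(first_bad(candidates, pretend_hangs))
```

With 49 candidates this needs only about six build-and-boot cycles,
provided the hang reproduces reliably on every bad version.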

If anyone has other suggestions or tips, they are more than welcome.

Thanks,

Scott

> 
> Thanks,
> 
> Scott
> 
>>
>> Jan
>>
>>>
>>> Thanks,
>>>
>>> Scott
>>>
>>> [1] https://xenomai.org/pipermail/xenomai/2021-August/046138.html
>>>
>>>
>>
> 



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-14 17:45     ` Scott Reed
@ 2022-03-15  6:32       ` Jan Kiszka
  2022-03-15  8:42         ` Scott Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-03-15  6:32 UTC (permalink / raw)
  To: Scott Reed, xenomai

On 14.03.22 18:45, Scott Reed wrote:
> 
> 
> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>
>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>> Hello,
>>>>
>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>
>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>> hangs with no message output on the serial console or in
>>>> /var/log/messages.
>>>>
>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>
>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>
>>>> I have stable system running for some time with Linux 4.14.62 with
>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>> saw same scenario of my system hanging on the first PCIe MSI interrupt
>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>> but
>>>> see the same hang.
>>>
>>> What about 4.19.y-cip? Specifically because of
>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>
>>>
>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>
>> To do a quick test, I just applied the change from the commit you
>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>> help (hang still occurs with first interrupt).
>>
>>>
>>>>
>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>
>>>> What are other people's experiences with using PCIe MSI interrupts
>>>> and I-pipe?
>>>>
>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>> the problem. Would this be recommended?
>>>
>>> If you can migrate your test with reasonable effort, yes, definitely.
>>
>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>> it will not be too much effort and report back.
> 
> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
> on my platform.
> 
> The kernel boots without a problem, but the FEC Ethernet port on the
> i.MX 6 is not working (cannot ping in or out).

Do you have or did you have any custom patches on top?

> 
> I looked at the trace with Wireshark and it looks like when pinging
> out that the ARP packet is corrupt and therefore failing. The ARP
> packet is corrupt in that it looks like various bits are flipped. For
> example, the source MAC address should be
>   00:09:cc:02:c1:b6
> but is
>   00:01:cc:02:01:36 or
>   00:09:cc:02:c1:36
> Wireshark also complains about the Frame check sequence
> ([FCS Status: Unverified]
> 
> I can provide Wireshark dumps if someone is interested, but for me
> at this point I do not want to fight with getting a 5.10.x kernel
> to work as I was pretty far along moving to a 5.4.x kernel with
> ipipe before running into the original problem posted (with ipipe
> my system freezes on the first PCIe MSI interrupt. Note: without
> ipipe, I do not see any issues).
> 
> As mentioned, I first saw this problem a while ago when trying
> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
> then backed back down to 4.14.62+ipipe which works.
> 
> I guess my next strategy is to try to figure out what changed
> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
> the hang as I hope the delta between them is not too large.
> 
> If anyone has other suggestions or tips, they are more than welcome.

As I wrote before: try the latest 4.19-cip-ipipe first.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-15  6:32       ` Jan Kiszka
@ 2022-03-15  8:42         ` Scott Reed
  2022-03-16  9:58           ` Scott Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Scott Reed @ 2022-03-15  8:42 UTC (permalink / raw)
  To: Jan Kiszka, xenomai



On 3/15/22 7:32 AM, Jan Kiszka wrote:
> On 14.03.22 18:45, Scott Reed wrote:
>>
>>
>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>
>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>> Hello,
>>>>>
>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>
>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>> hangs with no message output on the serial console or in
>>>>> /var/log/messages.
>>>>>
>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>
>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>
>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>> saw same scenario of my system hanging on the first PCIe MSI interrupt
>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>> but
>>>>> see the same hang.
>>>>
>>>> What about 4.19.y-cip? Specifically because of
>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>
>>>>
>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>
>>> To do a quick test, I just applied the change from the commit you
>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did not
>>> help (hang still occurs with first interrupt).
>>>
>>>>
>>>>>
>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>
>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>> and I-pipe?
>>>>>
>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>> the problem. Would this be recommended?
>>>>
>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>
>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>> it will not be too much effort and report back.
>>
>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>> on my platform.
>>
>> The kernel boots without a problem, but the FEC Ethernet port on the
>> i.MX 6 is not working (cannot ping in or out).
> 
> Do you have or did you have any custom patches on top?

Only a patch to add the device tree include (dtsi) for our i.MX 6 SoC:
    μQ7-962 - μQseven standard module with NXP i.MX 6 Processor

> 
>>
>> I looked at the trace with Wireshark and it looks like when pinging
>> out that the ARP packet is corrupt and therefore failing. The ARP
>> packet is corrupt in that it looks like various bits are flipped. For
>> example, the source MAC address should be
>>    00:09:cc:02:c1:b6
>> but is
>>    00:01:cc:02:01:36 or
>>    00:09:cc:02:c1:36
>> Wireshark also complains about the Frame check sequence
>> ([FCS Status: Unverified]
>>
>> I can provide Wireshark dumps if someone is interested, but for me
>> at this point I do not want to fight with getting a 5.10.x kernel
>> to work as I was pretty far along moving to a 5.4.x kernel with
>> ipipe before running into the original problem posted (with ipipe
>> my system freezes on the first PCIe MSI interrupt. Note: without
>> ipipe, I do not see any issues).
>>
>> As mentioned, I first saw this problem a while ago when trying
>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>> then backed back down to 4.14.62+ipipe which works.
>>
>> I guess my next strategy is to try to figure out what changed
>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>> the hang as I hope the delta between them is not too large.
>>
>> If anyone has other suggestions or tips, they are more than welcome.
> 
> As I wrote before: try the latest 4.19-cip-ipipe first.

OK. Will do.

> 
> Jan
> 



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-15  8:42         ` Scott Reed
@ 2022-03-16  9:58           ` Scott Reed
  2022-03-16 10:22             ` Scott Reed
  2022-03-16 10:35             ` Jan Kiszka
  0 siblings, 2 replies; 15+ messages in thread
From: Scott Reed @ 2022-03-16  9:58 UTC (permalink / raw)
  To: Jan Kiszka, xenomai



On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
> 
> 
> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>> On 14.03.22 18:45, Scott Reed wrote:
>>>
>>>
>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>
>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>
>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>> hangs with no message output on the serial console or in
>>>>>> /var/log/messages.
>>>>>>
>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>
>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>
>>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>> saw same scenario of my system hanging on the first PCIe MSI 
>>>>>> interrupt
>>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>>> but
>>>>>> see the same hang.
>>>>>
>>>>> What about 4.19.y-cip? Specifically because of
>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c. 
>>>>>
>>>>>
>>>>>
>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>
>>>> To do a quick test, I just applied the change from the commit you
>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately did 
>>>> not
>>>> help (hang still occurs with first interrupt).
>>>>
>>>>>
>>>>>>
>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>
>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>> and I-pipe?
>>>>>>
>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>> the problem. Would this be recommended?
>>>>>
>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>
>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>> it will not be too much effort and report back.
>>>
>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>> on my platform.
>>>
>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>> i.MX 6 is not working (cannot ping in or out).
>>
>> Do you have or did you have any custom patches on top?
> 
> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>     μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
> 
>>
>>>
>>> I looked at the trace with Wireshark and it looks like when pinging
>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>> packet is corrupt in that it looks like various bits are flipped. For
>>> example, the source MAC address should be
>>>    00:09:cc:02:c1:b6
>>> but is
>>>    00:01:cc:02:01:36 or
>>>    00:09:cc:02:c1:36
>>> Wireshark also complains about the Frame check sequence
>>> ([FCS Status: Unverified]
>>>
>>> I can provide Wireshark dumps if someone is interested, but for me
>>> at this point I do not want to fight with getting a 5.10.x kernel
>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>> ipipe before running into the original problem posted (with ipipe
>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>> ipipe, I do not see any issues).
>>>
>>> As mentioned, I first saw this problem a while ago when trying
>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>> then backed back down to 4.14.62+ipipe which works.
>>>
>>> I guess my next strategy is to try to figure out what changed
>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>> the hang as I hope the delta between them is not too large.
>>>
>>> If anyone has other suggestions or tips, they are more than welcome.
>>
>> As I wrote before: try the latest 4.19-cip-ipipe first.
> 
> OK. Will do.

I was able to run my test on the latest 4.19-cip-ipipe (4.19.229) and
unfortunately see the same behavior: the system hangs on the first
PCIe MSI interrupt.

PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
but when I add ipipe and Xenomai 3.2.1 to the kernel, the system
hangs on the first PCIe MSI interrupt.

As mentioned before, I first observed this behavior when moving from
4.14.62+ipipe to 4.14.110+ipipe, so I think my best bet is to dive
into what changed in this time frame. My goal is still to move to
5.4.x+ipipe, but I first need to understand what change is causing
my problem. I assume it is a kernel change or I-pipe change which
either causes the problem or triggers a problem in our system which
was dormant up until now.

I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
patch (if the patch applies cleanly) to try and determine if the
problematic change is in the kernel or ipipe patch.

A general question: how common is it to use PCIe MSI interrupts
with ipipe? Are other people running systems with PCIe MSI interrupts
and ipipe without issues, or is this simply not a typical use case?

Scott
> 
>>
>> Jan
>>
> 



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-16  9:58           ` Scott Reed
@ 2022-03-16 10:22             ` Scott Reed
  2022-03-16 10:35             ` Jan Kiszka
  1 sibling, 0 replies; 15+ messages in thread
From: Scott Reed @ 2022-03-16 10:22 UTC (permalink / raw)
  To: Jan Kiszka, xenomai



On 3/16/22 10:58 AM, Scott Reed via Xenomai wrote:
> 
> 
> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>
>>
>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>
>>>>
>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>
>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>
>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>> hangs with no message output on the serial console or in
>>>>>>> /var/log/messages.
>>>>>>>
>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>
>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>
>>>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>> saw same scenario of my system hanging on the first PCIe MSI 
>>>>>>> interrupt
>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>>>> but
>>>>>>> see the same hang.
>>>>>>
>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c. 
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>
>>>>> To do a quick test, I just applied the change from the commit you
>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately 
>>>>> did not
>>>>> help (hang still occurs with first interrupt).
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>
>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>> and I-pipe?
>>>>>>>
>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>> the problem. Would this be recommended?
>>>>>>
>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>
>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>>> it will not be too much effort and report back.
>>>>
>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>> on my platform.
>>>>
>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>> i.MX 6 is not working (cannot ping in or out).
>>>
>>> Do you have or did you have any custom patches on top?
>>
>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>     μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>
>>>
>>>>
>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>> example, the source MAC address should be
>>>>    00:09:cc:02:c1:b6
>>>> but is
>>>>    00:01:cc:02:01:36 or
>>>>    00:09:cc:02:c1:36
>>>> Wireshark also complains about the Frame check sequence
>>>> ([FCS Status: Unverified]
>>>>
>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>> ipipe before running into the original problem posted (with ipipe
>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>> ipipe, I do not see any issues).
>>>>
>>>> As mentioned, I first saw this problem a while ago when trying
>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>>> then backed back down to 4.14.62+ipipe which works.
>>>>
>>>> I guess my next strategy is to try to figure out what changed
>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>> the hang as I hope the delta between them is not too large.
>>>>
>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>
>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>
>> OK. Will do.
> 
> I was able to run my test where the system hangs on the first
> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
> unfortunately see the same behavior (system hangs).
> 
> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
> the system hangs on the first PCIe MSI interrupt.
> 
> As mentioned before, I first observed this behavior when moving from
> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
> into what changed in this time frame. My goal is still to move to
> 5.4.x+ipipe, but need to first understand what change is causing
> my problem. I assume it is a kernel change or i-pipe change which
> either causes the problem or triggers a problem in our system which
> was dormant up until now.
> 
> I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
> patch (if the patch applies cleanly) to try and determine if the
> problematic change is in the kernel or ipipe patch.

Sorry, typo above: 4.14.101 should be 4.14.110. I tried applying the
4.14.62 ipipe patch to the 4.14.110 kernel and it did not apply
cleanly (hunks failed). Not surprising, I guess, but it was worth a
quick try.

Scott
> 
> A question in general. How "common" is it to use PCIe MSI interrupts
> and ipipe? Are other people running systems with PCIe MSI interrupts
> and ipipe without issues or is this simply not a typical use-case?
> 
> Scott
>>
>>>
>>> Jan
>>>
>>
> 



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-16  9:58           ` Scott Reed
  2022-03-16 10:22             ` Scott Reed
@ 2022-03-16 10:35             ` Jan Kiszka
  2022-03-17 15:24               ` Scott Reed
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-03-16 10:35 UTC (permalink / raw)
  To: Scott Reed, xenomai

On 16.03.22 10:58, Scott Reed wrote:
> 
> 
> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>
>>
>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>
>>>>
>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>
>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>
>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>> hangs with no message output on the serial console or in
>>>>>>> /var/log/messages.
>>>>>>>
>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>
>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>
>>>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
>>>>>>> interrupt
>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>>>> but
>>>>>>> see the same hang.
>>>>>>
>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>
>>>>> To do a quick test, I just applied the change from the commit you
>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
>>>>> did not
>>>>> help (hang still occurs with first interrupt).
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>
>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>> and I-pipe?
>>>>>>>
>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>> the problem. Would this be recommended?
>>>>>>
>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>
>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>>> it will not be too much effort and report back.
>>>>
>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>> on my platform.
>>>>
>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>> i.MX 6 is not working (cannot ping in or out).
>>>
>>> Do you have or did you have any custom patches on top?
>>
>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>     μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>
>>>
>>>>
>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>> example, the source MAC address should be
>>>>    00:09:cc:02:c1:b6
>>>> but is
>>>>    00:01:cc:02:01:36 or
>>>>    00:09:cc:02:c1:36
>>>> Wireshark also complains about the Frame check sequence
>>>> ([FCS Status: Unverified]
>>>>
>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>> ipipe before running into the original problem posted (with ipipe
>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>> ipipe, I do not see any issues).
>>>>
>>>> As mentioned, I first saw this problem a while ago when trying
>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>>> then backed back down to 4.14.62+ipipe which works.
>>>>
>>>> I guess my next strategy is to try to figure out what changed
>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>> the hang as I hope the delta between them is not too large.
>>>>
>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>
>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>
>> OK. Will do.
> 
> I was able to run my test where the system hangs on the first
> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
> unfortunately see the same behavior (system hangs).
> 
> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
> the system hangs on the first PCIe MSI interrupt.
> 
> As mentioned before, I first observed this behavior when moving from
> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
> into what changed in this time frame. My goal is still to move to

Yes, that might now be a way to find the root cause. Problem: you
can't easily bisect because of the merges with the I-pipe patch.
Therefore, it can be easier to actually debug where the system hangs and
on what. With some traces from there, it can then be simpler to
analyse the differences between the working and non-working 4.14 kernels.

> 5.4.x+ipipe, but need to first understand what change is causing
> my problem. I assume it is a kernel change or i-pipe change which
> either causes the problem or triggers a problem in our system which
> was dormant up until now.
> 
> I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
> patch (if the patch applies cleanly) to try and determine if the
> problematic change is in the kernel or ipipe patch.
> 
> A question in general. How "common" is it to use PCIe MSI interrupts
> and ipipe? Are other people running systems with PCIe MSI interrupts
> and ipipe without issues or is this simply not a typical use-case?
> 

PCIe and MSI are very common and well tested on x86, possibly also on
arm64. They are very likely not that well tested on 32-bit arm, though.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-16 10:35             ` Jan Kiszka
@ 2022-03-17 15:24               ` Scott Reed
  2022-03-17 15:31                 ` Greg Gallagher
  2022-03-17 15:44                 ` Jan Kiszka
  0 siblings, 2 replies; 15+ messages in thread
From: Scott Reed @ 2022-03-17 15:24 UTC (permalink / raw)
  To: Jan Kiszka, xenomai



On 3/16/22 11:35 AM, Jan Kiszka wrote:
> On 16.03.22 10:58, Scott Reed wrote:
>>
>>
>> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>>
>>>
>>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>>
>>>>>
>>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>>
>>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>>
>>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>>> hangs with no message output on the serial console or in
>>>>>>>> /var/log/messages.
>>>>>>>>
>>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to 5.4.151
>>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>>
>>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe MSI
>>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed MAC.
>>>>>>>>
>>>>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1]. Also
>>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
>>>>>>>> interrupt
>>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to 5.4.151,
>>>>>>>> but
>>>>>>>> see the same hang.
>>>>>>>
>>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Actually, that commit is also missing from the last tagged 5.4 ipipe
>>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head instead.
>>>>>>
>>>>>> To do a quick test, I just applied the change from the commit you
>>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
>>>>>> did not
>>>>>> help (hang still occurs with first interrupt).
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>>
>>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>>> and I-pipe?
>>>>>>>>
>>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>>> the problem. Would this be recommended?
>>>>>>>
>>>>>>> If you can migrate your test with reasonable effort, yes, definitely.
>>>>>>
>>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes that
>>>>>> it will not be too much effort and report back.
>>>>>
>>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the first
>>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103 kernel
>>>>> on my platform.
>>>>>
>>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>>> i.MX 6 is not working (cannot ping in or out).
>>>>
>>>> Do you have or did you have any custom patches on top?
>>>
>>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>>      μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>>
>>>>
>>>>>
>>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>>> example, the source MAC address should be
>>>>>     00:09:cc:02:c1:b6
>>>>> but is
>>>>>     00:01:cc:02:01:36 or
>>>>>     00:09:cc:02:c1:36
>>>>> Wireshark also complains about the Frame check sequence
>>>>> ([FCS Status: Unverified]
>>>>>
>>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>>> ipipe before running into the original problem posted (with ipipe
>>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>>> ipipe, I do not see any issues).
>>>>>
>>>>> As mentioned, I first saw this problem a while ago when trying
>>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>>>> then backed back down to 4.14.62+ipipe which works.
>>>>>
>>>>> I guess my next strategy is to try to figure out what changed
>>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>>> the hang as I hope the delta between them is not too large.
>>>>>
>>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>>
>>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>>
>>> OK. Will do.
>>
>> I was able to run my test where the system hangs on the first
>> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>> unfortunately see the same behavior (system hangs).
>>
>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
>> the system hangs on the first PCIe MSI interrupt.
>>
>> As mentioned before, I first observed this behavior when moving from
>> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
>> into what changed in this time frame. My goal is still to move to
> 
> Yes, that might be a way now to try to find the root cause. Problem: you
> can't do bisection easily because of the merges with the I-pipe patch.
> Therefore, it can be easier to actually debug where the system hangs, on
> what. With some traces from there, it can then be simpler again to
> analyse the differences between to working and non-working 4.14 kernels.
> 

I have been able to get my test running on 4.14.110+ipipe without the
system hanging on the first PCIe MSI interrupt. I have attached my
patch (hopefully the attachment shows up correctly, but if not
please let me know).

The fix is to replace the call to generic_handle_irq() in the PCIe MSI
interrupt handler with a call to ipipe_handle_demuxed_irq().
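
To make the change concrete without relying on the attachment, here is a
small user-space sketch in C of the demultiplexing step. It is not the
attached patch and not the driver code: the two delivery functions are
stubs that only count calls, demux_msi_status() and the counters are
names made up for illustration, and the vector number stands in for the
Linux IRQ that a real driver would look up with irq_find_mapping().

```c
#include <stdint.h>

#define MSI_VECTORS 32

/* Hypothetical counters recording which delivery path each vector took. */
static unsigned int via_generic[MSI_VECTORS];
static unsigned int via_ipipe[MSI_VECTORS];

/* Stub standing in for the stock kernel call (the path that hung here). */
static void generic_handle_irq(unsigned int irq)
{
	via_generic[irq]++;
}

/* Stub standing in for the I-pipe call that feeds the interrupt pipeline. */
static void ipipe_handle_demuxed_irq(unsigned int irq)
{
	via_ipipe[irq]++;
}

/*
 * Demultiplex one parent interrupt: every set bit in the status word is
 * one pending MSI vector. Returns the number of vectors dispatched.
 * use_ipipe selects between the original and the patched delivery call.
 */
static unsigned int demux_msi_status(uint32_t status, int use_ipipe)
{
	unsigned int handled = 0;
	unsigned int bit;

	for (bit = 0; bit < MSI_VECTORS; bit++) {
		if (!(status & (UINT32_C(1) << bit)))
			continue;
		/* Assumption: vector number == IRQ number in this sketch. */
		if (use_ipipe)
			ipipe_handle_demuxed_irq(bit);
		else
			generic_handle_irq(bit);
		handled++;
	}
	return handled;
}
```

With use_ipipe set, every pending vector goes through the I-pipe entry
point and none through generic_handle_irq(), which is the whole point of
the change: on an I-pipe kernel, the demultiplexed interrupt is handed
to the pipeline instead of going straight to the Linux flow handler.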

Actually, I had already made this patch on my 4.14.62 system
in combination with a patch to make the PCIe driver an RTDM
driver (see [1]) to address latency issues. As this patch was for
a latency issue on 4.14.62 and not a hang, I did not immediately
think about the ipipe part of the patch being the fix for the
hang I was seeing when moving to 4.14.110+ipipe.

I will now check if the same/similar patch fixes my original
hang on 5.4.151+ipipe.

Would it make sense to integrate this patch into the next ipipe release?

Scott

>> 5.4.x+ipipe, but need to first understand what change is causing
>> my problem. I assume it is a kernel change or i-pipe change which
>> either causes the problem or triggers a problem in our system which
>> was dormant up until now.
>>
>> I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
>> patch (if the patch applies cleanly) to try and determine if the
>> problematic change is in the kernel or ipipe patch.
>>
>> A question in general. How "common" is it to use PCIe MSI interrupts
>> and ipipe? Are other people running systems with PCIe MSI interrupts
>> and ipipe without issues or is this simply not a typical use-case?
>>
> 
> PCIe and MSI are very common and well tested - on x86, possibly also on
> arm64. It is very likely not that well on 32-bit arm, though.
> 
> Jan
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-ipipe-arm-Fix-handling-of-PCIe-MSI-interrupts.patch
Type: text/x-patch
Size: 993 bytes
Desc: not available
URL: <http://xenomai.org/pipermail/xenomai/attachments/20220317/c81fcb0f/attachment.bin>


* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-17 15:24               ` Scott Reed
@ 2022-03-17 15:31                 ` Greg Gallagher
  2022-03-17 15:44                   ` Scott Reed
  2022-03-17 15:44                 ` Jan Kiszka
  1 sibling, 1 reply; 15+ messages in thread
From: Greg Gallagher @ 2022-03-17 15:31 UTC (permalink / raw)
  To: Scott Reed; +Cc: Jan Kiszka, xenomai

On Thu, Mar 17, 2022 at 11:25 AM Scott Reed via Xenomai <xenomai@xenomai.org>
wrote:

>
>
> On 3/16/22 11:35 AM, Jan Kiszka wrote:
> > On 16.03.22 10:58, Scott Reed wrote:
> >>
> >>
> >> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
> >>>
> >>>
> >>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
> >>>> On 14.03.22 18:45, Scott Reed wrote:
> >>>>>
> >>>>>
> >>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
> >>>>>>
> >>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
> >>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
> >>>>>>>> when trying to move to a newer kernel and I-pipe patch.
> >>>>>>>>
> >>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
> >>>>>>>> hangs with no message output on the serial console or in
> >>>>>>>> /var/log/messages.
> >>>>>>>>
> >>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
> >>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to
> 5.4.151
> >>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
> >>>>>>>>
> >>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe
> MSI
> >>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed
> MAC.
> >>>>>>>>
> >>>>>>>> I have stable system running for some time with Linux 4.14.62 with
> >>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1].
> Also
> >>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
> >>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
> >>>>>>>> interrupt
> >>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to
> 5.4.151,
> >>>>>>>> but
> >>>>>>>> see the same hang.
> >>>>>>>
> >>>>>>> What about 4.19.y-cip? Specifically because of
> >>>>>>>
> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c
> .
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Actually, that commit is also missing from the last tagged 5.4
> ipipe
> >>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head
> instead.
> >>>>>>
> >>>>>> To do a quick test, I just applied the change from the commit you
> >>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
> >>>>>> did not
> >>>>>> help (hang still occurs with first interrupt).
> >>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
> >>>>>>>>
> >>>>>>>> What are other people's experiences with using PCIe MSI interrupts
> >>>>>>>> and I-pipe?
> >>>>>>>>
> >>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
> >>>>>>>> the problem. Would this be recommended?
> >>>>>>>
> >>>>>>> If you can migrate your test with reasonable effort, yes,
> definitely.
> >>>>>>
> >>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes
> that
> >>>>>> it will not be too much effort and report back.
> >>>>>
> >>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the
> first
> >>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103
> kernel
> >>>>> on my platform.
> >>>>>
> >>>>> The kernel boots without a problem, but the FEC Ethernet port on the
> >>>>> i.MX 6 is not working (cannot ping in or out).
> >>>>
> >>>> Do you have or did you have any custom patches on top?
> >>>
> >>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
> >>>      μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
> >>>
> >>>>
> >>>>>
> >>>>> I looked at the trace with Wireshark and it looks like when pinging
> >>>>> out that the ARP packet is corrupt and therefore failing. The ARP
> >>>>> packet is corrupt in that it looks like various bits are flipped. For
> >>>>> example, the source MAC address should be
> >>>>>     00:09:cc:02:c1:b6
> >>>>> but is
> >>>>>     00:01:cc:02:01:36 or
> >>>>>     00:09:cc:02:c1:36
> >>>>> Wireshark also complains about the Frame check sequence
> >>>>> ([FCS Status: Unverified]
> >>>>>
> >>>>> I can provide Wireshark dumps if someone is interested, but for me
> >>>>> at this point I do not want to fight with getting a 5.10.x kernel
> >>>>> to work as I was pretty far along moving to a 5.4.x kernel with
> >>>>> ipipe before running into the original problem posted (with ipipe
> >>>>> my system freezes on the first PCIe MSI interrupt. Note: without
> >>>>> ipipe, I do not see any issues).
> >>>>>
> >>>>> As mentioned, I first saw this problem a while ago when trying
> >>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
> >>>>> then backed back down to 4.14.62+ipipe which works.
> >>>>>
> >>>>> I guess my next strategy is to try to figure out what changed
> >>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
> >>>>> the hang as I hope the delta between them is not too large.
> >>>>>
> >>>>> If anyone has other suggestions or tips, they are more than welcome.
> >>>>
> >>>> As I wrote before: try the latest 4.19-cip-ipipe first.
> >>>
> >>> OK. Will do.
> >>
> >> I was able to run my test where the system hangs on the first
> >> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
> >> unfortunately see the same behavior (system hangs).
> >>
> >> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
> >> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
> >> the system hangs on the first PCIe MSI interrupt.
> >>
> >> As mentioned before, I first observed this behavior when moving from
> >> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
> >> into what changed in this time frame. My goal is still to move to
> >
> > Yes, that might be a way now to try to find the root cause. Problem: you
> > can't do bisection easily because of the merges with the I-pipe patch.
> > Therefore, it can be easier to actually debug where the system hangs, on
> > what. With some traces from there, it can then be simpler again to
> > analyse the differences between to working and non-working 4.14 kernels.
> >
>
> I have been able to get my test running on 4.14.110+ipipe without the
> system hanging on the first PCIe MSI interrupt. I have attached my
> patch (hopefully the attachment shows up correctly, but if not
> please let me know).
>
> The fix is to replace in the PCIe MSI interrupt handler the call
> to generic_handle_irq() with ipipe_handle_demuxed_irq.
>
> Actually, I had already made this patch on my 4.14.62 system
> in combination with a patch to make the PCIe driver an RTDM
> driver (see [1]) to address latency issues. As this patch was for
> a latency issue on 4.14.62 and not a hang, I did not immediately
> think about the ipipe part of the patch being the fix for the
> hang I was seeing when moving to 4.14.110+ipipe.
>
> I will now check if the same/similar patch fixes my original
> hang on 5.4.151+ipipe.
>
> Would it make sense to integrate this patch into next ipipe release?
>
> Scott
>
> >> 5.4.x+ipipe, but need to first understand what change is causing
> >> my problem. I assume it is a kernel change or i-pipe change which
> >> either causes the problem or triggers a problem in our system which
> >> was dormant up until now.
> >>
> >> I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
> >> patch (if the patch applies cleanly) to try and determine if the
> >> problematic change is in the kernel or ipipe patch.
> >>
> >> A question in general. How "common" is it to use PCIe MSI interrupts
> >> and ipipe? Are other people running systems with PCIe MSI interrupts
> >> and ipipe without issues or is this simply not a typical use-case?
> >>
> >
> > PCIe and MSI are very common and well tested - on x86, possibly also on
> > arm64. It is very likely not that well on 32-bit arm, though.
> >
> > Jan
> >
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: 0001-ipipe-arm-Fix-handling-of-PCIe-MSI-interrupts.patch
> Type: text/x-patch
> Size: 993 bytes
> Desc: not available
> URL: <
> http://xenomai.org/pipermail/xenomai/attachments/20220317/c81fcb0f/attachment.bin
> >


Yes, if the patch works I can integrate it into the next release. I was just
about to release the latest patch, but I will wait for this to be included.
Does it work for 4.19 as well? I’m thinking we should include it in
4.19-cip as well.

Thanks

Greg

>
>


* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-17 15:24               ` Scott Reed
  2022-03-17 15:31                 ` Greg Gallagher
@ 2022-03-17 15:44                 ` Jan Kiszka
  2022-03-17 16:22                   ` Scott Reed
  1 sibling, 1 reply; 15+ messages in thread
From: Jan Kiszka @ 2022-03-17 15:44 UTC (permalink / raw)
  To: Scott Reed, xenomai, Greg Gallagher

On 17.03.22 16:24, Scott Reed wrote:
> 
> 
> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>> On 16.03.22 10:58, Scott Reed wrote:
>>>
>>>
>>> On 3/15/22 9:42 AM, Scott Reed via Xenomai wrote:
>>>>
>>>>
>>>> On 3/15/22 7:32 AM, Jan Kiszka wrote:
>>>>> On 14.03.22 18:45, Scott Reed wrote:
>>>>>>
>>>>>>
>>>>>> On 3/11/22 2:13 PM, Scott Reed via Xenomai wrote:
>>>>>>>
>>>>>>> On 3/11/22 12:38 PM, Jan Kiszka wrote:
>>>>>>>> On 11.03.22 11:12, Scott Reed via Xenomai wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> I am seeing an apparent issue with PCIe MSI interrupts and I-pipe
>>>>>>>>> when trying to move to a newer kernel and I-pipe patch.
>>>>>>>>>
>>>>>>>>> The issue is as soon as a PCIe MSI interrupt occurs, the system
>>>>>>>>> hangs with no message output on the serial console or in
>>>>>>>>> /var/log/messages.
>>>>>>>>>
>>>>>>>>> The platform I am working on is a "i.MX 6 Quad" and I am upgrading
>>>>>>>>> from a 4.14.62 kernel and I-pipe patch with Xenomai 3.07 to
>>>>>>>>> 5.4.151
>>>>>>>>> kernel and I-pipe patch with Xenomai 3.2.1.
>>>>>>>>>
>>>>>>>>> Our FPGA is connected to the i.MX 6 via PCIe and generates PCIe
>>>>>>>>> MSI
>>>>>>>>> interrupts to the CPU from, for example, an Altera Triple-Speed
>>>>>>>>> MAC.
>>>>>>>>>
>>>>>>>>> I have stable system running for some time with Linux 4.14.62 with
>>>>>>>>> Xenomai 3.07 although I did need to patch the PCIe driver [1].
>>>>>>>>> Also
>>>>>>>>> some time back, I tried to move to 4.14.110 with I-pipe and also
>>>>>>>>> saw same scenario of my system hanging on the first PCIe MSI
>>>>>>>>> interrupt
>>>>>>>>> so I backed out back to 4.14.62. Now I am trying to move to
>>>>>>>>> 5.4.151,
>>>>>>>>> but
>>>>>>>>> see the same hang.
>>>>>>>>
>>>>>>>> What about 4.19.y-cip? Specifically because of
>>>>>>>> https://source.denx.de/Xenomai/ipipe-arm/-/commit/a1aab8ba3098e595f9fa8b23a011ce6d72f8699c.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Actually, that commit is also missing from the last tagged 5.4
>>>>>>>> ipipe
>>>>>>>> version (ipipe-core-5.4.151-arm-4). So try ipipe/5.4.y head
>>>>>>>> instead.
>>>>>>>
>>>>>>> To do a quick test, I just applied the change from the commit you
>>>>>>> referenced above to my 5.4.151 ipipe kernel and it unfortunately
>>>>>>> did not
>>>>>>> help (hang still occurs with first interrupt).
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Before I dive into analyzing the hang, I wanted to ask:
>>>>>>>>>
>>>>>>>>> What are other people's experiences with using PCIe MSI interrupts
>>>>>>>>> and I-pipe?
>>>>>>>>>
>>>>>>>>> I am thinking of trying 5.10.103 Dovetail to see if I still see
>>>>>>>>> the problem. Would this be recommended?
>>>>>>>>
>>>>>>>> If you can migrate your test with reasonable effort, yes,
>>>>>>>> definitely.
>>>>>>>
>>>>>>> I will try to migrate my test to 5.10.103 Dovetail with the hopes
>>>>>>> that
>>>>>>> it will not be too much effort and report back.
>>>>>>
>>>>>> I tried to migrate my test to 5.10.103 Dovetail and failed on the
>>>>>> first
>>>>>> step, namely bringing up a standard (i.e. no Dovetail) 5.10.103
>>>>>> kernel
>>>>>> on my platform.
>>>>>>
>>>>>> The kernel boots without a problem, but the FEC Ethernet port on the
>>>>>> i.MX 6 is not working (cannot ping in or out).
>>>>>
>>>>> Do you have or did you have any custom patches on top?
>>>>
>>>> Only a patch to add the device tree include (dtsi) for our imx6 SOC:
>>>>      μQ7-962 - μQseven standard module with NXP i.MX 6 Processor
>>>>
>>>>>
>>>>>>
>>>>>> I looked at the trace with Wireshark and it looks like when pinging
>>>>>> out that the ARP packet is corrupt and therefore failing. The ARP
>>>>>> packet is corrupt in that it looks like various bits are flipped. For
>>>>>> example, the source MAC address should be
>>>>>>     00:09:cc:02:c1:b6
>>>>>> but is
>>>>>>     00:01:cc:02:01:36 or
>>>>>>     00:09:cc:02:c1:36
>>>>>> Wireshark also complains about the Frame check sequence
>>>>>> ([FCS Status: Unverified]
>>>>>>
>>>>>> I can provide Wireshark dumps if someone is interested, but for me
>>>>>> at this point I do not want to fight with getting a 5.10.x kernel
>>>>>> to work as I was pretty far along moving to a 5.4.x kernel with
>>>>>> ipipe before running into the original problem posted (with ipipe
>>>>>> my system freezes on the first PCIe MSI interrupt. Note: without
>>>>>> ipipe, I do not see any issues).
>>>>>>
>>>>>> As mentioned, I first saw this problem a while ago when trying
>>>>>> to move from 4.14.62+ipipe to 4.14.110+ipipe and at that time
>>>>>> then backed back down to 4.14.62+ipipe which works.
>>>>>>
>>>>>> I guess my next strategy is to try to figure out what changed
>>>>>> between 4.14.62+ipipe and 4.14.110+ipipe which triggers/causes
>>>>>> the hang as I hope the delta between them is not too large.
>>>>>>
>>>>>> If anyone has other suggestions or tips, they are more than welcome.
>>>>>
>>>>> As I wrote before: try the latest 4.19-cip-ipipe first.
>>>>
>>>> OK. Will do.
>>>
>>> I was able to run my test where the system hangs on the first
>>> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>>> unfortunately see the same behavior (system hangs).
>>>
>>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>>> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
>>> the system hangs on the first PCIe MSI interrupt.
>>>
>>> As mentioned before, I first observed this behavior when moving from
>>> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
>>> into what changed in this time frame. My goal is still to move to
>>
>> Yes, that might be a way now to try to find the root cause. Problem: you
>> can't do bisection easily because of the merges with the I-pipe patch.
>> Therefore, it can be easier to actually debug where the system hangs, on
>> what. With some traces from there, it can then be simpler again to
>> analyse the differences between to working and non-working 4.14 kernels.
>>
> 
> I have been able to get my test running on 4.14.110+ipipe without the
> system hanging on the first PCIe MSI interrupt. I have attached my
> patch (hopefully the attachment shows up correctly, but if not
> please let me know).
> 
> The fix is to replace in the PCIe MSI interrupt handler the call
> to generic_handle_irq() with ipipe_handle_demuxed_irq.

Great to hear! Looks a lot like
https://source.denx.de/Xenomai/ipipe-noarch/-/commit/578e2cbf69ce8e22546423d403cda4a438d0751f

> 
> Actually, I had already made this patch on my 4.14.62 system
> in combination with a patch to make the PCIe driver an RTDM
> driver (see [1]) to address latency issues. As this patch was for
> a latency issue on 4.14.62 and not a hang, I did not immediately
> think about the ipipe part of the patch being the fix for the
> hang I was seeing when moving to 4.14.110+ipipe.
> 
> I will now check if the same/similar patch fixes my original
> hang on 5.4.151+ipipe.
> 
> Would it make sense to integrate this patch into next ipipe release?
> 

Yep. Please prepare an official patch once done with testing. I will add
it to ipipe-noarch, and then the architecture trees (relevant for arm &
arm64) can pick it up.

Jan

-- 
Siemens AG, Technology
Competence Center Embedded Linux



* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-17 15:31                 ` Greg Gallagher
@ 2022-03-17 15:44                   ` Scott Reed
  0 siblings, 0 replies; 15+ messages in thread
From: Scott Reed @ 2022-03-17 15:44 UTC (permalink / raw)
  To: Greg Gallagher; +Cc: Jan Kiszka, xenomai



On 3/17/22 4:31 PM, Greg Gallagher wrote:
> 
> 
> On Thu, Mar 17, 2022 at 11:25 AM Scott Reed via Xenomai 
> <xenomai@xenomai.org <mailto:xenomai@xenomai.org>> wrote:
> 
> 
> 
>     On 3/16/22 11:35 AM, Jan Kiszka wrote:
>      > On 16.03.22 10:58, Scott Reed wrote:
>      >>
>      >> I was able to run my test where the system hangs on the first
>      >> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>      >> unfortunately see the same behavior (system hangs).
>      >>
>      >> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>      >> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
>      >> the system hangs on the first PCIe MSI interrupt.
>      >>
>      >> As mentioned before, I first observed this behavior when moving from
>      >> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
>      >> into what changed in this time frame. My goal is still to move to
>      >
>      > Yes, that might be a way now to try to find the root cause. Problem: you
>      > can't do bisection easily because of the merges with the I-pipe patch.
>      > Therefore, it can be easier to actually debug where the system hangs, on
>      > what. With some traces from there, it can then be simpler again to
>      > analyse the differences between the working and non-working 4.14 kernels.
>      >
> 
>     I have been able to get my test running on 4.14.110+ipipe without the
>     system hanging on the first PCIe MSI interrupt. I have attached my
>     patch (hopefully the attachment shows up correctly, but if not
>     please let me know).
> 
>     The fix is to replace in the PCIe MSI interrupt handler the call
>     to generic_handle_irq() with ipipe_handle_demuxed_irq.
> 
>     Actually, I had already made this patch on my 4.14.62 system
>     in combination with a patch to make the PCIe driver an RTDM
>     driver (see [1]) to address latency issues. As this patch was for
>     a latency issue on 4.14.62 and not a hang, I did not immediately
>     think about the ipipe part of the patch being the fix for the
>     hang I was seeing when moving to 4.14.110+ipipe.
> 
>     I will now check if the same/similar patch fixes my original
>     hang on 5.4.151+ipipe.
> 
>     Would it make sense to integrate this patch into next ipipe release?
> 
>     Scott
> 
>      >> 5.4.x+ipipe, but need to first understand what change is causing
>      >> my problem. I assume it is a kernel change or i-pipe change which
>      >> either causes the problem or triggers a problem in our system which
>      >> was dormant up until now.
>      >>
>      >> I suppose I could try the 4.14.101 kernel with the 4.14.62 ipipe
>      >> patch (if the patch applies cleanly) to try and determine if the
>      >> problematic change is in the kernel or ipipe patch.
>      >>
>      >> A question in general. How "common" is it to use PCIe MSI interrupts
>      >> and ipipe? Are other people running systems with PCIe MSI interrupts
>      >> and ipipe without issues or is this simply not a typical use-case?
>      >>
>      >
>      > PCIe and MSI are very common and well tested - on x86, possibly also on
>      > arm64. It is very likely not that well on 32-bit arm, though.
>      >
>      > Jan
>      >
>     -------------- next part --------------
>     A non-text attachment was scrubbed...
>     Name: 0001-ipipe-arm-Fix-handling-of-PCIe-MSI-interrupts.patch
>     Type: text/x-patch
>     Size: 993 bytes
>     Desc: not available
>     URL:
>     <http://xenomai.org/pipermail/xenomai/attachments/20220317/c81fcb0f/attachment.bin>
> 
> 
> Yes, if the patch works I can integrate it into the next release. I was
> just about to release the latest patch but I will wait for this to be
> included. Does it work for 4.19 as well? I’m thinking we should include
> it in 4.19-cip as well.

I will test it on 4.19-cip and let you know.

Scott
> 
> Thanks
> 
> Greg
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-17 15:44                 ` Jan Kiszka
@ 2022-03-17 16:22                   ` Scott Reed
  2022-03-18 16:47                     ` Scott Reed
  0 siblings, 1 reply; 15+ messages in thread
From: Scott Reed @ 2022-03-17 16:22 UTC (permalink / raw)
  To: Jan Kiszka, xenomai, Greg Gallagher



On 3/17/22 4:44 PM, Jan Kiszka wrote:
> On 17.03.22 16:24, Scott Reed wrote:
>>
>>
>> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>>> On 16.03.22 10:58, Scott Reed wrote:
>>>>
>>>> I was able to run my test where the system hangs on the first
>>>> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>>>> unfortunately see the same behavior (system hangs).
>>>>
>>>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>>>> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
>>>> the system hangs on the first PCIe MSI interrupt.
>>>>
>>>> As mentioned before, I first observed this behavior when moving from
>>>> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
>>>> into what changed in this time frame. My goal is still to move to
>>>
>>> Yes, that might be a way now to try to find the root cause. Problem: you
>>> can't do bisection easily because of the merges with the I-pipe patch.
>>> Therefore, it can be easier to actually debug where the system hangs, on
>>> what. With some traces from there, it can then be simpler again to
>>> analyse the differences between the working and non-working 4.14 kernels.
>>>
>>
>> I have been able to get my test running on 4.14.110+ipipe without the
>> system hanging on the first PCIe MSI interrupt. I have attached my
>> patch (hopefully the attachment shows up correctly, but if not
>> please let me know).
>>
>> The fix is to replace in the PCIe MSI interrupt handler the call
>> to generic_handle_irq() with ipipe_handle_demuxed_irq.
> 
> Great to hear! Looks a lot like
> https://source.denx.de/Xenomai/ipipe-noarch/-/commit/578e2cbf69ce8e22546423d403cda4a438d0751f
> 
>>
>> Actually, I had already made this patch on my 4.14.62 system
>> in combination with a patch to make the PCIe driver an RTDM
>> driver (see [1]) to address latency issues. As this patch was for
>> a latency issue on 4.14.62 and not a hang, I did not immediately
>> think about the ipipe part of the patch being the fix for the
>> hang I was seeing when moving to 4.14.110+ipipe.
>>
>> I will now check if the same/similar patch fixes my original
>> hang on 5.4.151+ipipe.
>>
>> Would it make sense to integrate this patch into next ipipe release?
>>
> 
> Yep. Please prepare an official patch once done with testing. I will add
> it to ipipe-noarch, and then the architecture trees (relevant for arm &
> arm64) can pick it up.

Will do. May take me a day or two to be able to submit an official patch
as it will be my first official patch.

I will submit (i.e. send to mailing list) the patch on ipipe-noarch:
ipipe/master which looks like it is currently based on 5.4.179 once
my testing is complete.

If my understanding is not correct, please let me know.

Scott
> 
> Jan
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62
  2022-03-17 16:22                   ` Scott Reed
@ 2022-03-18 16:47                     ` Scott Reed
  0 siblings, 0 replies; 15+ messages in thread
From: Scott Reed @ 2022-03-18 16:47 UTC (permalink / raw)
  To: Jan Kiszka, xenomai, Greg Gallagher



On 3/17/22 5:22 PM, Scott Reed via Xenomai wrote:
> 
> 
> On 3/17/22 4:44 PM, Jan Kiszka wrote:
>> On 17.03.22 16:24, Scott Reed wrote:
>>>
>>>
>>> On 3/16/22 11:35 AM, Jan Kiszka wrote:
>>>> On 16.03.22 10:58, Scott Reed wrote:
>>>>>
>>>>> I was able to run my test where the system hangs on the first
>>>>> PCIe MSI interrupt on the latest 4.19-cip-ipipe (4.19.229) and
>>>>> unfortunately see the same behavior (system hangs).
>>>>>
>>>>> PCIe MSI interrupts work as expected on a "vanilla" 4.19.229 kernel,
>>>>> but when I add  ipipe and Xenomai 3.2.1 to the kernel, then
>>>>> the system hangs on the first PCIe MSI interrupt.
>>>>>
>>>>> As mentioned before, I first observed this behavior when moving from
>>>>> 4.14.62+ipipe and 4.14.110+ipipe so I think my best bet is to dive
>>>>> into what changed in this time frame. My goal is still to move to
>>>>
>>>> Yes, that might be a way now to try to find the root cause. Problem: you
>>>> can't do bisection easily because of the merges with the I-pipe patch.
>>>> Therefore, it can be easier to actually debug where the system hangs, on
>>>> what. With some traces from there, it can then be simpler again to
>>>> analyse the differences between the working and non-working 4.14 kernels.
>>>>
>>>
>>> I have been able to get my test running on 4.14.110+ipipe without the
>>> system hanging on the first PCIe MSI interrupt. I have attached my
>>> patch (hopefully the attachment shows up correctly, but if not
>>> please let me know).
>>>
>>> The fix is to replace in the PCIe MSI interrupt handler the call
>>> to generic_handle_irq() with ipipe_handle_demuxed_irq.
>>
>> Great to hear! Looks a lot like
>> https://source.denx.de/Xenomai/ipipe-noarch/-/commit/578e2cbf69ce8e22546423d403cda4a438d0751f
>>
>>>
>>> Actually, I had already made this patch on my 4.14.62 system
>>> in combination with a patch to make the PCIe driver an RTDM
>>> driver (see [1]) to address latency issues. As this patch was for
>>> a latency issue on 4.14.62 and not a hang, I did not immediately
>>> think about the ipipe part of the patch being the fix for the
>>> hang I was seeing when moving to 4.14.110+ipipe.
>>>
>>> I will now check if the same/similar patch fixes my original
>>> hang on 5.4.151+ipipe.
>>>
>>> Would it make sense to integrate this patch into next ipipe release?
>>>
>>
>> Yep. Please prepare an official patch once done with testing. I will add
>> it to ipipe-noarch, and then the architecture trees (relevant for arm &
>> arm64) can pick it up.
> 
> Will do. May take me a day or two to be able to submit an official patch
> as it will be my first official patch.
> 
> I will submit (i.e. send to mailing list) the patch on ipipe-noarch:
> ipipe/master which looks like it is currently based on 5.4.179 once
> my testing is complete.
> 
> If my understanding is not correct, please let me know.

In addition to testing the patch on 4.14.110 on my system, I have also
successfully tested the patch on 4.19.229 and 5.4.151.

I will then be submitting the patch to ipipe-noarch shortly. As
this is the first time I am submitting an official patch, if I did
something incorrectly, please let me know.

Scott
> 
> Scott
>>
>> Jan
>>
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-03-18 16:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-11 10:12 System hang on first PCIe MSI interrupt with I-pipe kernels newer than 4.14.62 Scott Reed
2022-03-11 11:38 ` Jan Kiszka
2022-03-11 13:13   ` Scott Reed
2022-03-14 17:45     ` Scott Reed
2022-03-15  6:32       ` Jan Kiszka
2022-03-15  8:42         ` Scott Reed
2022-03-16  9:58           ` Scott Reed
2022-03-16 10:22             ` Scott Reed
2022-03-16 10:35             ` Jan Kiszka
2022-03-17 15:24               ` Scott Reed
2022-03-17 15:31                 ` Greg Gallagher
2022-03-17 15:44                   ` Scott Reed
2022-03-17 15:44                 ` Jan Kiszka
2022-03-17 16:22                   ` Scott Reed
2022-03-18 16:47                     ` Scott Reed
