* Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
@ 2022-07-12  1:10 Sri Subramanian
  2022-07-14  8:16 ` Philippe Gerum
  2022-07-14 15:18 ` Philippe Gerum
  0 siblings, 2 replies; 14+ messages in thread
From: Sri Subramanian @ 2022-07-12  1:10 UTC
  To: xenomai

Hello,

I am trying to optimize sending raw Ethernet packets (for an EtherCAT
fieldbus application) on a Raspberry Pi/CM4 environment.

Before implementing the application, I thought I'd try getting an ICMP
echo using the "tidbits" sample program oob-net-icmp.

I am using eth0 on a Raspberry Pi (used to ping) connected to another
Pi device (CM4 on an IO board), also on eth0, which echoes back the
pings using oob-net-icmp. I set up VLAN 42 on both devices, as
documented in the program. (Before this, I verified that the network
connection is good by sending pings and checking the echo responses
through the standard Linux stack on eth0.)

I also set up the ping-receiving device so the Xenomai kernel can
route the packets up the Xenomai mini stack, setting:

echo 1 > /sys/class/net/eth0.42/oob_port
echo 42 > /sys/class/evl/control/net_vlans
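
(For reference, the two settings above can be wrapped in a small helper.
This is only a sketch: the function name and the SYSFS override are made
up for illustration and are not part of EVL or oob-net-icmp; only the two
sysfs paths come from the commands above. Mind the space between the
value and '>': written as "echo 1>", the shell treats "1>" as a
redirection of stdout and writes an empty line instead of 1.)

```sh
#!/bin/sh
# Sketch only: enable_oob_port and SYSFS are hypothetical names.
# SYSFS defaults to /sys; point it at a scratch directory to try this out.
SYSFS="${SYSFS:-/sys}"

enable_oob_port() {
    dev="$1"    # VLAN interface, e.g. eth0.42
    vid="$2"    # VLAN ID handled by the EVL stack, e.g. 42

    # Mark the interface as an out-of-band port.
    echo 1 > "$SYSFS/class/net/$dev/oob_port" || return 1
    # Tell EVL which VLAN carries out-of-band traffic.
    echo "$vid" > "$SYSFS/class/evl/control/net_vlans" || return 1
}
```

Usage would be "enable_oob_port eth0.42 42" as root, once the VLAN
interface exists (for instance after "ip link add link eth0 name eth0.42
type vlan id 42").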

On the first ping, the kernel crashes with the stack as below. I've
tested on both versions 5.15 and 5.17 (both with CONFIG_NET_OOB=y).
Thanks for any help or guidance.

-Sri Subramanian

Crash stack:

[1569.557819] Unable to handle kernel paging request at virtual address 0000000008ab2428
[1569.656294] Mem abort info:
[1569.692521] ESR = 0x96000005
[1569.731885] EC = 0x25: DABT (current EL), IL = 32 bits
[1569.798702] SET = 0, FnV = 0
[1569.838099] EA = 0, S1PTW = 0
[1569.878560] FSC = 0x05: level 1 translation fault
[1569.940162] Data abort info:
[1569.977535] ISV = 0, ISS = 0x00000005
[1570.026534] CM = 0, WnR = 0
[1570.064995] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101a1f000
[1570.145733] [0000000008ab2428] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[1570.253996] Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in: bluetooth ecdh_generic ecc rfkill squashfs dm_mod
snd_soc_hdmi_codec spidev
cdc_mbim cdc_wdm cdc_ncm cdc_ether ax88179_178a vc4 cec
raspberrypi_hwmon drm_cma_helper
i2c_brcmstb i2c_bcm2835 drm_kms_helper spi_bcm2835 snd_soc_core
snd_pcm_dmaengine snd_pcm snd_timer
snd syscopyarea sysfillrect sysimgblt fb_sys_fops nvmem_rmem
uio_pdrv_genirq uio sch_fq_codel
fuse drm drm_panel_orientation_quirks backlight configfs ip_tables x_tables ipv6
[1570.254135] CPU: 0 PID: 1007 Comm: kworker/u8:0 Not tainted 5.17.0-xeno-5.17+cob-net+ #7
[1570.254148] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
[1570.254153] IRQ stage: Linux
[1570.254163] Workqueue: 0x0 (events_unbound)
[1570.254179] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[1570.254189] pc : skb_clone+0x28/0xe0
[1570.254207] lr : dev_queue_xmit_nit+0x130/0x298
[1570.254221] sp : ffffffc00941b9c0
[1570.254225] x29: ffffffc00941b9c0 x28: 0000000000000000 x27: ffffffc008b23000
[1570.254240] x26: ffffff8101800000 x25: ffffffc00ed6af90 x24: 0000000000000004
[1570.254253] x23: 0000000000000001 x22: ffffff8103124e00 x21: ffffff8101a00080
[1570.254265] x20: 0000000000000820 x19: ffffff8103124e00 x18: 0000000000000000
[1570.254277] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000062cc76c8
[1570.254289] x14: 010002002c120000 x13: e2ce38387840bd21 x12: 0acc300000000002
[1570.254301] x11: 0000000000000000 x10: ffffffc1f632e000 x9 : 00000000000001f0
[1570.254314] x8 : 0000000000000000 x7 : ffffff8100b90000 x6 : 0000000000000010
[1570.254326] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc00941bb14
[1570.254337] x2 : 0000000008ab2428 x1 : 0000000000000000 x0 : 0000000008ab2428
[1570.254350] Call trace:
[1570.254353] skb_clone+0x28/0xe0
[1570.254364] dev_queue_xmit_nit+0x130/0x298
[1570.254376] dev_hard_start_xmit+0x1b0/0x1d8
[1570.254388] sch_direct_xmit+0x94/0x1e8
[1570.254400] __qdisc_run+0x120/0x6b0
[1570.254410] net_tx_action+0x138/0x210
[1570.254417] _stext+0x11c/0x260
[1570.254426] irq_exit+0xb0/0xf0
[1570.254436] arch_do_IRQ_pipelined+0x48/0x70
[1570.254446] sync_current_irq_stage+0x1ec/0x280
[1570.254460] __inband_irq_enable+0x78/0x90
[1570.254471] inband_irq_enable+0x10/0x20
[1570.254482] finish_task_switch+0x98/0x228
[1570.254494] __schedule+0x288/0x710
[1570.254507] schedule+0x54/0xf0
[1570.254518] worker_thread+0xbc/0x420
[1570.254526] kthread+0x110/0x120
[1570.254538] ret_from_fork+0x10/0x20
[1570.254553] Code: b4000100 b940b800 19406261 86000022 (36606620)
[1573.224950] ---[ end trace 0000000000000000 ]---
[1573.224958] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[1573.224963] SMP: stopping secondary CPUs
[1573.224982] Kernel Offset: disabled
[1573.224984] CPU features: 0x40,00000342,00000842
[1573.224991] Memory Limit: none
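
As a reading aid for the abort info above, the ESR value can be split
into its fields by hand. The sketch below uses plain shell arithmetic and
assumes the ARMv8 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in
bits 24:0, with the fault status code in the low six ISS bits).
decode_esr is a made-up helper name, not a kernel or EVL tool.

```sh
#!/bin/sh
# Sketch only: decode_esr is a hypothetical helper.
decode_esr() {
    esr=$(( $1 ))                    # accepts hex, e.g. 0x96000005
    ec=$(( (esr >> 26) & 0x3f ))     # exception class
    il=$(( (esr >> 25) & 0x1 ))      # 1 = 32-bit instruction
    iss=$(( esr & 0x1ffffff ))       # instruction specific syndrome
    dfsc=$(( iss & 0x3f ))           # data fault status code
    printf 'EC=0x%02x IL=%d ISS=0x%x DFSC=0x%02x\n' "$ec" "$il" "$iss" "$dfsc"
}

decode_esr 0x96000005
# prints: EC=0x25 IL=1 ISS=0x5 DFSC=0x05
```

This agrees with the "DABT (current EL)" and "level 1 translation fault"
lines of the log: a data abort taken in the kernel on an address with no
valid translation. The faulting address 0000000008ab2428 is also the
value held in x0 and x2.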


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-12  1:10 Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp Sri Subramanian
@ 2022-07-14  8:16 ` Philippe Gerum
  2022-07-14 15:18 ` Philippe Gerum
  1 sibling, 0 replies; 14+ messages in thread
From: Philippe Gerum @ 2022-07-14  8:16 UTC
  To: Sri Subramanian; +Cc: xenomai


Sri Subramanian <sridhar.subramanian@gmail.com> writes:

> Hello,
>
> I am trying to optimize sending ethernet raw packets (for an Ethercat
> fieldbus application) on a RaspberryPi/CM4 environment.
>
> Before implementing the application, I thought I'd try getting an ICMP
> echo using the "tidbits" sample program oob-net-icmp.
>
> I am using eth0 from a RaspberryPI (used to ping) connected to another
> PI device (CM4 on an IO board) also on eth0, which is echoing back the
> pings using oob-net-icmp. I set up vlan 42 as documented in the
> program, for both the devices. (I verify the network connection is
> good  prior to this by sending pings and verifying the echo responses,
> through the standard Linux stack on eth0).
>
> I also set up the ping-receiving device so the Xenomai kernel can
> route the packets up the Xenomai mini stack, setting:
>
> echo 1 > /sys/class/net/eth0.42/oob_port
> echo 42 > /sys/class/evl/control/net_vlans
>
> On the first ping, the kernel crashes with the stack as below. I've
> tested on both versions 5.15 and 5.17 (both with CONFIG_NET_OOB=y).
> Thanks for any help or guidance.
>

Thanks for reporting, I can reproduce this as well. Maybe a problem with
the prep/recycling of socket buffers. Investigating this bug is on my
todo list.

-- 
Philippe.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-12  1:10 Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp Sri Subramanian
  2022-07-14  8:16 ` Philippe Gerum
@ 2022-07-14 15:18 ` Philippe Gerum
  2022-07-15  0:59   ` Sri Subramanian
  1 sibling, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2022-07-14 15:18 UTC
  To: Sri Subramanian; +Cc: xenomai


Sri Subramanian <sridhar.subramanian@gmail.com> writes:

> Hello,
>
> I am trying to optimize sending ethernet raw packets (for an Ethercat
> fieldbus application) on a RaspberryPi/CM4 environment.
>

No NIC driver fully implements the EVL out-of-band protocol yet. So
although the EVL network stack already improves the timings, some work
remains on the device side to get a full real-time path from the
application to the wire.

i.e. today we have:

(app) <---> EVL netstack <--/ non-rt /--> (NIC driver via in-band context)

we aim at a full real-time path next:

(app) <---> EVL netstack <-------> (NIC driver via out-of-band context)


> Before implementing the application, I thought I'd try getting an ICMP
> echo using the "tidbits" sample program oob-net-icmp.
>
> I am using eth0 from a RaspberryPI (used to ping) connected to another
> PI device (CM4 on an IO board) also on eth0, which is echoing back the
> pings using oob-net-icmp. I set up vlan 42 as documented in the
> program, for both the devices. (I verify the network connection is
> good  prior to this by sending pings and verifying the echo responses,
> through the standard Linux stack on eth0).
>
> I also set up the ping-receiving device so the Xenomai kernel can
> route the packets up the Xenomai mini stack, setting:
>
> echo 1 > /sys/class/net/eth0.42/oob_port
> echo 42 > /sys/class/evl/control/net_vlans
>
> On the first ping, the kernel crashes with the stack as below. I've
> tested on both versions 5.15 and 5.17 (both with CONFIG_NET_OOB=y).
> Thanks for any help or guidance.
>
> -Sri Subramanian
>
> Crash stack:
>
> [1569.557819] Unable to handle kernel paging request at virtual address 0000000008ab2428
> [1569.656294] Mem abort info:
> [1569.692521] ESR = 0x96000005
> [1569.731885] EC = 0x25: DABT (current EL), IL = 32 bits
> [1569.798702] SET = 0, FnV = 0
> [1569.838099] EA = 0, S1PTW = 0
> [1569.878560] FSC = 0x05: level 1 translation fault
> [1569.940162] Data abort info:
> [1569.977535] ISV = 0, ISS = 0x00000005
> [1570.026534] CM = 0, WnR = 0
> [1570.064995] user pgtable: 4k pages, 39-bit VAs, pgdp=0000000101a1f000
> [1570.145733] [0000000008ab2428] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> [1570.253996] Internal error: Oops: 96000005 [#1] PREEMPT SMP
> Modules linked in: bluetooth ecdh_generic ecc rfkill squashfs dm_mod
> snd_soc_hdmi_codec spidev
> cdc_mbim cdc_wdm cdc_ncm cdc_ether ax88179_178a vc4 cec
> raspberrypi_hwmon drm_cma_helper
> i2c_brcmstb i2c_bcm2835 drm_kms_helper spi_bcm2835 snd_soc_core
> snd_pcm_dmaengine snd_pcm snd_timer
> snd syscopyarea sysfillrect sysimgblt fb_sys_fops nvmem_rmem
> uio_pdrv_genirq uio sch_fq_codel
> fuse drm drm_panel_orientation_quirks backlight configfs ip_tables x_tables ipv6
> [1570.254135] CPU: 0 PID: 1007 Comm: kworker/u8:0 Not tainted 5.17.0-xeno-5.17+cob-net+ #7
> [1570.254148] Hardware name: Raspberry Pi Compute Module 4 Rev 1.0 (DT)
> [1570.254153] IRQ stage: Linux
> [1570.254163] Workqueue: 0x0 (events_unbound)
> [1570.254179] pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [1570.254189] pc : skb_clone+0x28/0xe0
> [1570.254207] lr : dev_queue_xmit_nit+0x130/0x298
> [1570.254221] sp : ffffffc00941b9c0
> [1570.254225] x29: ffffffc00941b9c0 x28: 0000000000000000 x27: ffffffc008b23000
> [1570.254240] x26: ffffff8101800000 x25: ffffffc00ed6af90 x24: 0000000000000004
> [1570.254253] x23: 0000000000000001 x22: ffffff8103124e00 x21: ffffff8101a00080
> [1570.254265] x20: 0000000000000820 x19: ffffff8103124e00 x18: 0000000000000000
> [1570.254277] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000062cc76c8
> [1570.254289] x14: 010002002c120000 x13: e2ce38387840bd21 x12: 0acc300000000002
> [1570.254301] x11: 0000000000000000 x10: ffffffc1f632e000 x9 : 00000000000001f0
> [1570.254314] x8 : 0000000000000000 x7 : ffffff8100b90000 x6 : 0000000000000010
> [1570.254326] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffffc00941bb14
> [1570.254337] x2 : 0000000008ab2428 x1 : 0000000000000000 x0 : 0000000008ab2428
> [1570.254350] Call trace:
> [1570.254353] skb_clone+0x28/0xe0
> [1570.254364] dev_queue_xmit_nit+0x130/0x298
> [1570.254376] dev_hard_start_xmit+0x1b0/0x1d8
> [1570.254388] sch_direct_xmit+0x94/0x1e8
> [1570.254400] __qdisc_run+0x120/0x6b0
> [1570.254410] net_tx_action+0x138/0x210
> [1570.254417] _stext+0x11c/0x260
> [1570.254426] irq_exit+0xb0/0xf0
> [1570.254436] arch_do_IRQ_pipelined+0x48/0x70
> [1570.254446] sync_current_irq_stage+0x1ec/0x280
> [1570.254460] __inband_irq_enable+0x78/0x90
> [1570.254471] inband_irq_enable+0x10/0x20
> [1570.254482] finish_task_switch+0x98/0x228
> [1570.254494] __schedule+0x288/0x710
> [1570.254507] schedule+0x54/0xf0
> [1570.254518] worker_thread+0xbc/0x420
> [1570.254526] kthread+0x110/0x120
> [1570.254538] ret_from_fork+0x10/0x20
> [1570.254553] Code: b4000100 b940b800 19406261 86000022 (36606620)
> [1573.224950] ---[ end trace 0000000000000000 ]---
> [1573.224958] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [1573.224963] SMP: stopping secondary CPUs
> [1573.224982] Kernel Offset: disabled
> [1573.224984] CPU features: 0x40,00000342,00000842
> [1573.224991] Memory Limit: none


This patch may help:
https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commit/2ad6b2207f9f1669f64f6816428a20e1bbd6357c

Feedback on this fix welcome, so that we may assume the issue is closed.

PS: EVL is maintained for the latest LTS and SLTS releases, and follows
the mainline development tip closely too. This means v5.10-stable,
v5.15-stable and v5.19 at the moment. Other releases published may not
include all available patches: if you want to use them nevertheless, you
will need to cherry-pick fixes from maintained trees.

Thanks,

-- 
Philippe.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-14 15:18 ` Philippe Gerum
@ 2022-07-15  0:59   ` Sri Subramanian
  2022-07-19 15:29     ` Sri Subramanian
  0 siblings, 1 reply; 14+ messages in thread
From: Sri Subramanian @ 2022-07-15  0:59 UTC
  To: Philippe Gerum; +Cc: xenomai

On Thu, Jul 14, 2022 at 8:32 AM Philippe Gerum <rpm@xenomai.org> wrote:
>
>
> Sri Subramanian <sridhar.subramanian@gmail.com> writes:
>
> > Hello,
> >
> > I am trying to optimize sending ethernet raw packets (for an Ethercat
> > fieldbus application) on a RaspberryPi/CM4 environment.
> >
>
> No NIC driver currently implements the EVL out-of-band protocol fully
> yet, so at the moment, although the EVL network stack does improve the
> timings, there is still some work on the device side to get a full
> real-time path from the application to the wire.
>
> i.e. today we have:
>
> (app) <---> EVL netstack <--/ non-rt /--> (NIC driver via in-band context)
>
> we aim at a full real-time path next:
>
> (app) <---> EVL netstack <-------> (NIC driver via out-of-band context)
>

Understood! Thanks for the explanation. Our current application
occasionally gets hit with ~100us latency on the Linux send() call,
so we're hoping this will be an improvement.

>
> > Before implementing the application, I thought I'd try getting an ICMP
> > echo using the "tidbits" sample program oob-net-icmp.
> >
> > I am using eth0 from a RaspberryPI (used to ping) connected to another
> > PI device (CM4 on an IO board) also on eth0, which is echoing back the
> > pings using oob-net-icmp. I set up vlan 42 as documented in the
> > program, for both the devices. (I verify the network connection is
> > good  prior to this by sending pings and verifying the echo responses,
> > through the standard Linux stack on eth0).
> >
> > I also set up the ping-receiving device so the Xenomai kernel can
> > route the packets up the Xenomai mini stack, setting:
> >
> > echo 1 > /sys/class/net/eth0.42/oob_port
> > echo 42 > /sys/class/evl/control/net_vlans
> >
> > On the first ping, the kernel crashes with the stack as below. I've
> > tested on both versions 5.15 and 5.17 (both with CONFIG_NET_OOB=y).
> > Thanks for any help or guidance.
> >
> > -Sri Subramanian
> >
<snip>
>
> This patch may help:
> https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commit/2ad6b2207f9f1669f64f6816428a20e1bbd6357c
>
> Feedback on this fix welcome, so that we may assume the issue is closed.
>

With the fix applied, the kernel no longer crashes. Thanks for the quick response.

oob-net-icmp successfully sends the response and prints stats on the console.
I am puzzled that the sending ping program doesn't ack the echo responses.
oob_port is not enabled for the sending side, so the echo responses should go
up the Linux network stack and 'ping' should see them.

ifconfig stats on eth0.42 also show that the Rx packets were not received.

I've tried this on both NET_OOB-enabled and NET_OOB-disabled kernels, with
the same behavior. I'll try using a hardware sniffer next to make sure.

> PS: EVL is maintained for the latest LTS and SLTS releases, and follows
> the mainline development tip closely too. This means v5.10-stable,
> v5.15-stable and v5.19 at the moment. Other releases published may not
> include all available patches: if you want to use them nevertheless, you
> will need to cherry-pick fixes from maintained trees.
>

Will do. Cheers!
Sri

> Thanks,
>
> --
> Philippe.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-15  0:59   ` Sri Subramanian
@ 2022-07-19 15:29     ` Sri Subramanian
  2022-07-20  8:42       ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Sri Subramanian @ 2022-07-19 15:29 UTC
  To: Philippe Gerum; +Cc: xenomai

On Thu, Jul 14, 2022 at 5:59 PM Sri Subramanian
<sridhar.subramanian@gmail.com> wrote:
>
> On Thu, Jul 14, 2022 at 8:32 AM Philippe Gerum <rpm@xenomai.org> wrote:
> >
<snip>
> >
> > This patch may help:
> > https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commit/2ad6b2207f9f1669f64f6816428a20e1bbd6357c
> >
> > Feedback on this fix welcome, so that we may assume the issue is closed.
> >
>

Hello Philippe,

As noted below, the kernel no longer crashes with the fix applied.

However, I observe this strange behavior with oob_sendmsg:

1. With oob-net-icmp sending back ICMP echo responses, the pinging device
receives exactly 10 responses, then nothing more. oob-net-icmp continues to
print on the console that it is sending responses.

2. If I just use oob_sendmsg to send a packet every second to the network and
then run tshark on the receiving device, I see the same behavior: 10 packets
received and then nothing, while the client continues to indicate it has
successfully sent the packets.

3. Once you get into this condition of "non-sent-packets", the only way to
get out is to turn off eth0.42, reset the values of oob_port and net_vlans,
then turn eth0.42 back on. It is, weirdly, as if the oob links were
restricted to sending only 10 packets.
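
The workaround in item 3 can be scripted roughly as follows. This is a
sketch under assumptions: reset_oob_port, SYSFS and IP are made-up names,
the VLAN setting is taken to be /sys/class/evl/control/net_vlans, and
"reset the values" is taken to mean writing the same values again while
the interface is down.

```sh
#!/bin/sh
# Sketch only: reset_oob_port, SYSFS and IP are hypothetical names.
SYSFS="${SYSFS:-/sys}"   # override to try this against a scratch directory
IP="${IP:-ip}"           # override to stub out the iproute2 command

reset_oob_port() {
    dev="$1"    # VLAN interface, e.g. eth0.42
    vid="$2"    # out-of-band VLAN ID, e.g. 42

    # Take the interface down before touching the settings.
    "$IP" link set dev "$dev" down || return 1
    # Write the out-of-band settings again.
    echo 1 > "$SYSFS/class/net/$dev/oob_port" || return 1
    echo "$vid" > "$SYSFS/class/evl/control/net_vlans" || return 1
    # Bring the interface back up.
    "$IP" link set dev "$dev" up
}
```

Running "reset_oob_port eth0.42 42" as root would then cover one
down/rewrite/up cycle.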

Again, the environment is Raspberry Pi/CM4 and I've tested with both
5.15 and 5.17.

Thanks for any help!
Sri

> The fix doesn't crash the kernel, thanks for the quick response.
>
> oob-net-icmp successfully sends the response and prints stats on the console.
> I am puzzled that the sending ping program doesn't ack the echo responses.
> oob_port is not enabled for the sending side, so the echo responses should go
> up the Linux network stack and 'ping' should see them.
>
> ifconfig stats on eth0.42 also show that the Rx packets were not received.
>
> I've tried this both on NET_OOB -enabled and -disabled kernels with the same
> behavior. I'll try using a hardware sniffer next to make sure.
>
> > PS: EVL is maintained for the latest LTS and SLTS releases, and follows
> > the mainline development tip closely too. This means v5.10-stable,
> > v5.15-stable and v5.19 at the moment. Other releases published may not
> > include all available patches: if you want to use them nevertheless, you
> > will need to cherry-pick fixes from maintained trees.
> >
>


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-19 15:29     ` Sri Subramanian
@ 2022-07-20  8:42       ` Philippe Gerum
  2022-07-20 16:35         ` Sri Subramanian
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2022-07-20  8:42 UTC
  To: Sri Subramanian; +Cc: xenomai


Sri Subramanian <sridhar.subramanian@gmail.com> writes:

> On Thu, Jul 14, 2022 at 5:59 PM Sri Subramanian
> <sridhar.subramanian@gmail.com> wrote:
>>
>> On Thu, Jul 14, 2022 at 8:32 AM Philippe Gerum <rpm@xenomai.org> wrote:
>> >
> <snip>
>> >
>> > This patch may help:
>> > https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commit/2ad6b2207f9f1669f64f6816428a20e1bbd6357c
>> >
>> > Feedback on this fix welcome, so that we may assume the issue is closed.
>> >
>>
>
> Hello Philippe,
>
> As noted below, the fix doesn't crash the kernel.
>
> However, I observe this strange behavior with oob_sendmsg:
>
> 1. With oob-net-icmp sending back ICMP echo responses, the pinging device
> receives exactly 10 responses, then nothing more. oob-net-icmp continues to
> print on the console that it is sending responses.
>
> 2. If I just use oob_sendmsg to send a packet every second to the network and
> then run tshark on the receiving device, I see the same behavior: 10 packets
> received and then nothing, while the client continues to indicate it has
> successfully sent the packets.
>
> 3. Once you get into this condition of "non-sent-packets", the only way to
> get out is to turn off eth0.42, reset the values of oob_port and net_vlans,
> then turn eth0.42 back on. It is, weirdly, as if the oob links were
> restricted to sending only 10 packets.
>
> Again, the environment is Raspberry PI/CM4 and I've tested with both
> 5.15 and 5.17.
>

I cannot reproduce this behavior (x86 <-> i.MX6q on the same switch):
the overnight test was still OK on both sides after 14 hours. Does tshark
see the packets going out when observing from the sending device?

-- 
Philippe.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-20  8:42       ` Philippe Gerum
@ 2022-07-20 16:35         ` Sri Subramanian
  2022-07-20 17:41           ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Sri Subramanian @ 2022-07-20 16:35 UTC
  To: Philippe Gerum; +Cc: xenomai

I just re-ran the test and yes, tshark (on the pinging device) sees
exactly 10 request+reply packets, and then, only request packets.
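
To put numbers on this, the request/reply counts can be pulled out of a
capture. A sketch, assuming tshark's field output (-T fields -e
icmp.type), where ICMP type 8 is an echo request and type 0 an echo
reply. count_icmp is a made-up helper, and the interface name and
capture duration in the usage line are only examples.

```sh
#!/bin/sh
# Sketch only: count_icmp is a hypothetical helper.
# Reads one ICMP type per line on stdin and prints request/reply totals.
count_icmp() {
    awk '$1 == "8" { req++ }
         $1 == "0" { rep++ }
         END { printf "requests=%d replies=%d\n", req, rep }'
}

# Example, on the pinging device:
#   tshark -i eth0.42 -a duration:30 -Y icmp -T fields -e icmp.type | count_icmp
```

In the failing case, the reply count would be expected to stall while the
request count keeps growing.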

Configuration: R-PI4 (pinger) -> CM4+IO (oob-net-icmp) on
same switch, version 5.17+patch on both.

I'll try to scrounge around for a 32-bit ARM board I can test with
to see if that's the differentiator.

Sri



* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-20 16:35         ` Sri Subramanian
@ 2022-07-20 17:41           ` Philippe Gerum
  2022-07-20 19:48             ` Sri Subramanian
  2022-07-21  0:38             ` Sri Subramanian
  0 siblings, 2 replies; 14+ messages in thread
From: Philippe Gerum @ 2022-07-20 17:41 UTC
  To: Sri Subramanian; +Cc: xenomai


Sri Subramanian <sridhar.subramanian@gmail.com> writes:

> I just re-ran the test and yes, tshark (on the pinging device) sees
> exactly 10 request+reply packets, and then, only request packets.
>

Ok, so for some reason, the sender is fine but the ICMP reply on the
receive side does not make it to the wire, although the ministack there
says it's all fine.

> Configuration: R-PI4 (pinger) -> CM4+IO (oob-net-icmp) on
> same switch, version 5.17+patch on both.
>

To be sure, is your 5.17 tree up to date EVL-wise, with all the patches
available so far for 5.15.y and 5.19?

> I'll try to scrounge around for a 32-bit ARM board I can test with
> to see if that's the differentiator.
>

Ok. On my end, I'll connect a pi4 with the i.MX6, to add the GENET MAC
module to the picture.

-- 
Philippe.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-20 17:41           ` Philippe Gerum
@ 2022-07-20 19:48             ` Sri Subramanian
  2022-07-21  0:38             ` Sri Subramanian
  1 sibling, 0 replies; 14+ messages in thread
From: Sri Subramanian @ 2022-07-20 19:48 UTC
  To: Philippe Gerum; +Cc: xenomai

On Wed, Jul 20, 2022 at 10:47 AM Philippe Gerum <rpm@xenomai.org> wrote:
>
>
> Sri Subramanian <sridhar.subramanian@gmail.com> writes:
>
> > I just re-ran the test and yes, tshark (on the pinging device) sees
> > exactly 10 request+reply packets, and then, only request packets.
> >
>
> Ok, so for some reason, the sender is fine but the ICMP reply on the
> receive side does not make it to the wire, although the ministack there
> says it's all fine.
>
> > Configuration: R-PI4 (pinger) -> CM4+IO (oob-net-icmp) on
> > same switch, version 5.17+patch on both.
> >
>
> To be sure, is your 5.17 tree up to date EVL-wise, with all the patches
> available so far for 5.15.y and 5.19?
>

I updated to the top of the 5.19.evl-rebase branch and built rc7 off it and
I got the same results.


* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-20 17:41           ` Philippe Gerum
  2022-07-20 19:48             ` Sri Subramanian
@ 2022-07-21  0:38             ` Sri Subramanian
  2022-07-21  7:39               ` Philippe Gerum
  1 sibling, 1 reply; 14+ messages in thread
From: Sri Subramanian @ 2022-07-21  0:38 UTC
  To: Philippe Gerum; +Cc: xenomai

On Wed, Jul 20, 2022 at 10:47 AM Philippe Gerum <rpm@xenomai.org> wrote:
>
>
> Ok. On my end, I'll connect a pi4 with the i.MX6, to add the GENET MAC
> module to the picture.
>

It looks like the GENET MAC driver (or the interface) is the culprit.
For some reason, it restricts the packets sent to 10 after every netif
down/up toggle.

I attached a network adapter card to the PCIe interface and connected
that to the switch.

Both the oob-net-icmp and my packet generator are able to call oob_sendmsg
continuously!

One anomaly I found in my experimentation:

- When I try the USB-ethernet adapter, the packets are not seen by the
EVL mini-stack and despite the fact that /sys/class/net/eth1.42/oob_port
is set to 1 and control/net_vlans is 42, the packets are read/responded
to by the Linux kernel and oob-net-icmp is just waiting.

Previously I've used the USB-ethernet adapter for all tcp/ip
communication to the device and it works just fine.



* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-21  0:38             ` Sri Subramanian
@ 2022-07-21  7:39               ` Philippe Gerum
  2022-07-21 14:21                 ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2022-07-21  7:39 UTC
  To: Sri Subramanian; +Cc: xenomai


Sri Subramanian <sridhar.subramanian@gmail.com> writes:

> On Wed, Jul 20, 2022 at 10:47 AM Philippe Gerum <rpm@xenomai.org> wrote:
>>
>>
>> Ok. On my end, I'll connect a pi4 with the i.MX6, to add the GENET MAC
>> module to the picture.
>>
>
> It looks like the GENET MAC driver (or the interface) is the culprit.
> For some reason, it restricts the packets sent to 10 after every netif
> down/up toggle.
>

Weird. I'll trace that.

> I attached a network adapter card to the PCIe interface and connected
> that to the switch.
>
> Both the oob-net-icmp and my packet generator are able to call oob_sendmsg
> continuously!
>
> One anomaly I found in my experimentation:
>
> - When I try the USB-ethernet adapter, the packets are not seen by the
> EVL mini-stack: although /sys/class/net/eth1.42/oob_port is set to 1 and
> control/net_vlans is 42, the packets are read and answered by the Linux
> kernel, while oob-net-icmp just keeps waiting.
>
> Previously I've used the USB-ethernet adapter for all tcp/ip
> communication to the device and it works just fine.
>

I suspect something is wrong in the oob traffic detection handler,
namely [1]. For some reason, incoming packets are not seen as belonging
to the oob vlan. Most likely something I overlooked when looking for the
802.1q encapsulation, which the USB adapter triggers.

[1] https://source.denx.de/Xenomai/xenomai4/linux-evl/-/blob/v5.15.y-evl-rebase/kernel/evl/net/ethernet/input.c#L36
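
To illustrate the kind of check involved, here is a minimal userspace
sketch of VLAN-based classification. It is not the code at [1]: the names
frame_vlan_id() and frame_is_oob() and the hw_vid parameter are
hypothetical. It rests on one assumption about the cause: some adapters
leave the 802.1Q tag in-band in the frame, while others strip it and
report the VLAN ID out of band, so a detector which only looks at one of
the two places would miss packets coming from the other kind.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN        6        /* bytes per MAC address */
#define TPID_8021Q     0x8100u  /* IEEE 802.1Q tag protocol identifier */
#define VID_MASK       0x0fffu  /* low 12 bits of the TCI hold the VLAN ID */

/*
 * Return the VLAN ID carried by an in-band 802.1Q tag, or -1 if the
 * frame is too short or untagged.
 */
static int frame_vlan_id(const uint8_t *frame, size_t len)
{
	uint16_t tpid, tci;

	/* dst MAC (6) + src MAC (6) + TPID (2) + TCI (2) */
	if (len < 2 * MAC_LEN + 4)
		return -1;

	/* Both fields are big-endian on the wire. */
	tpid = (uint16_t)((frame[12] << 8) | frame[13]);
	if (tpid != TPID_8021Q)
		return -1;

	tci = (uint16_t)((frame[14] << 8) | frame[15]);
	return (int)(tci & VID_MASK);	/* drop the PCP and DEI bits */
}

/*
 * Hypothetical classifier. hw_vid is the VLAN ID reported out of band
 * by an adapter which stripped the tag, or -1 if nothing was reported.
 * oob_vid is the VLAN reserved for oob traffic (0..4095).
 */
static int frame_is_oob(const uint8_t *frame, size_t len,
			int hw_vid, int oob_vid)
{
	if (hw_vid >= 0)
		return hw_vid == oob_vid;

	return frame_vlan_id(frame, len) == oob_vid;
}
```

With oob_vid set to 42, a frame whose bytes 12..15 are 81 00 00 2a is
classified as oob through the in-band path, and an untagged frame is
classified as oob only if hw_vid is 42.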

-- 
Philippe.

* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-21  7:39               ` Philippe Gerum
@ 2022-07-21 14:21                 ` Philippe Gerum
  2022-07-22 17:06                   ` Philippe Gerum
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2022-07-21 14:21 UTC (permalink / raw)
  To: Sri Subramanian; +Cc: xenomai


Philippe Gerum <rpm@xenomai.org> writes:

> Sri Subramanian <sridhar.subramanian@gmail.com> writes:
>
>> On Wed, Jul 20, 2022 at 10:47 AM Philippe Gerum <rpm@xenomai.org> wrote:
>>>
>>>
>>> Ok. On my end, I'll connect a pi4 with the i.MX6, to add the GENET MAC
>>> module to the picture.
>>>
>>
>> It looks like the GENET MAC driver (or the interface) is the culprit.
>> For some reason, it restricts the packets sent to 10 after every netif
>> down/up toggle.
>>
>
> Weird. I'll trace that.
>

Confirmed, I can observe the exact same behavior with any board pinging
the pi4 here. 10 packets with proper ICMP reply, then the responder goes
south.

-- 
Philippe.

* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-21 14:21                 ` Philippe Gerum
@ 2022-07-22 17:06                   ` Philippe Gerum
  2022-07-22 19:30                     ` Sri Subramanian
  0 siblings, 1 reply; 14+ messages in thread
From: Philippe Gerum @ 2022-07-22 17:06 UTC (permalink / raw)
  To: Sri Subramanian; +Cc: xenomai


Philippe Gerum <rpm@xenomai.org> writes:

> Philippe Gerum <rpm@xenomai.org> writes:
>
>> Sri Subramanian <sridhar.subramanian@gmail.com> writes:
>>
>>> On Wed, Jul 20, 2022 at 10:47 AM Philippe Gerum <rpm@xenomai.org> wrote:
>>>>
>>>>
>>>> Ok. On my end, I'll connect a pi4 with the i.MX6, to add the GENET MAC
>>>> module to the picture.
>>>>
>>>
>>> It looks like the GENET MAC driver (or the interface) is the culprit.
>>> For some reason, it restricts the packets sent to 10 after every netif
>>> down/up toggle.
>>>
>>
>> Weird. I'll trace that.
>>
>
> Confirmed, I can observe the exact same behavior with any board pinging
> the pi4 here. 10 packets with proper ICMP reply, then the responder goes
> south.

This issue should be gone with the latest (5) commits from this branch:
https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commits/v5.19-evl-rebase/

I'll backport them to 5.15 and 5.10 as well.

-- 
Philippe.

* Re: Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp
  2022-07-22 17:06                   ` Philippe Gerum
@ 2022-07-22 19:30                     ` Sri Subramanian
  0 siblings, 0 replies; 14+ messages in thread
From: Sri Subramanian @ 2022-07-22 19:30 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On Fri, Jul 22, 2022 at 10:07 AM Philippe Gerum <rpm@xenomai.org> wrote:
>
> > Confirmed, I can observe the exact same behavior with any board pinging
> > the pi4 here. 10 packets with proper ICMP reply, then the responder goes
> > south.
>
> This issue should be gone with the latest (5) commits from this branch:
> https://source.denx.de/Xenomai/xenomai4/linux-evl/-/commits/v5.19-evl-rebase/
>
> I'll backport them to 5.15 and 5.10 as well.
>

Thanks for your promptness and dedication! I've tested it and the fix works!

Sri

Thread overview: 14+ messages
2022-07-12  1:10 Xenomai 4 / oob_sendmsg crashes kernel from tidbits/oob-net-icmp Sri Subramanian
2022-07-14  8:16 ` Philippe Gerum
2022-07-14 15:18 ` Philippe Gerum
2022-07-15  0:59   ` Sri Subramanian
2022-07-19 15:29     ` Sri Subramanian
2022-07-20  8:42       ` Philippe Gerum
2022-07-20 16:35         ` Sri Subramanian
2022-07-20 17:41           ` Philippe Gerum
2022-07-20 19:48             ` Sri Subramanian
2022-07-21  0:38             ` Sri Subramanian
2022-07-21  7:39               ` Philippe Gerum
2022-07-21 14:21                 ` Philippe Gerum
2022-07-22 17:06                   ` Philippe Gerum
2022-07-22 19:30                     ` Sri Subramanian
