All of lore.kernel.org
 help / color / mirror / Atom feed
* RTnet splash with current next and stable
@ 2018-10-30 10:28 Jan Kiszka
  2018-10-31 17:10 ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-10-30 10:28 UTC (permalink / raw)
  To: Xenomai, Philippe Gerum, Sebastian Smolorz

Hi all,

just testing new kernels with smokey, and one of them had RTnet on. This
triggered a bug with the net_packat_dgram test:

[   50.610997] initializing loopback...
[   50.613799] RTnet: registered rtlo
[   50.692154] ------------[ cut here ]------------
[   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 __list_add+0x62/0xb0
[   50.696832] list_add corruption. next->prev should be prev (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
[   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
[   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
[   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
[   50.707861] I-pipe domain: Linux
[   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 0000000000000000
[   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 ffff88003a8bb700
[   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 ffff88003a8bb640
[   50.710335] Call Trace:
[   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
[   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
[   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
[   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
[   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
[   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
[   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
[   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
[   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
[   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
[   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
[   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
[   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
[   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
[   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
[   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
[   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
[   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
[   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
[   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
[   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
[   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
[   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
[   50.762893] ------------[ cut here ]------------

Anyone who hacked on this recently have a feeling?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RTnet splash with current next and stable
  2018-10-30 10:28 RTnet splash with current next and stable Jan Kiszka
@ 2018-10-31 17:10 ` Philippe Gerum
  2018-12-04 17:00   ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2018-10-31 17:10 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai, Sebastian Smolorz

On 10/30/18 11:28 AM, Jan Kiszka wrote:
> Hi all,
> 
> just testing new kernels with smokey, and one of them had RTnet on. This
> triggered a bug with the net_packat_dgram test:
> 
> [   50.610997] initializing loopback...
> [   50.613799] RTnet: registered rtlo
> [   50.692154] ------------[ cut here ]------------
> [   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 __list_add+0x62/0xb0
> [   50.696832] list_add corruption. next->prev should be prev (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
> [   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
> [   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
> [   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> [   50.707861] I-pipe domain: Linux
> [   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 0000000000000000
> [   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 ffff88003a8bb700
> [   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 ffff88003a8bb640
> [   50.710335] Call Trace:
> [   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
> [   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
> [   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
> [   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
> [   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
> [   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
> [   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
> [   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
> [   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
> [   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
> [   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
> [   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
> [   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
> [   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
> [   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
> [   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
> [   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
> [   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
> [   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
> [   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
> [   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
> [   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
> [   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
> [   50.762893] ------------[ cut here ]------------
> 
> Anyone who hacked on this recently have a feeling?
> 

No idea. I'll have a look.


-- 
Philippe.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RTnet splash with current next and stable
  2018-10-31 17:10 ` Philippe Gerum
@ 2018-12-04 17:00   ` Jan Kiszka
  2018-12-13  9:55     ` Pintu Agarwal
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-12-04 17:00 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai, Sebastian Smolorz

On 31.10.18 18:10, Philippe Gerum wrote:
> On 10/30/18 11:28 AM, Jan Kiszka wrote:
>> Hi all,
>>
>> just testing new kernels with smokey, and one of them had RTnet on. This
>> triggered a bug with the net_packat_dgram test:
>>
>> [   50.610997] initializing loopback...
>> [   50.613799] RTnet: registered rtlo
>> [   50.692154] ------------[ cut here ]------------
>> [   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 __list_add+0x62/0xb0
>> [   50.696832] list_add corruption. next->prev should be prev (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
>> [   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
>> [   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
>> [   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
>> [   50.707861] I-pipe domain: Linux
>> [   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 0000000000000000
>> [   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 ffff88003a8bb700
>> [   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 ffff88003a8bb640
>> [   50.710335] Call Trace:
>> [   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
>> [   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
>> [   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
>> [   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
>> [   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
>> [   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
>> [   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
>> [   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
>> [   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
>> [   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
>> [   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
>> [   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
>> [   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
>> [   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
>> [   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
>> [   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
>> [   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
>> [   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
>> [   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
>> [   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
>> [   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
>> [   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
>> [   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
>> [   50.762893] ------------[ cut here ]------------
>>
>> Anyone who hacked on this recently have a feeling?
>>
> 
> No idea. I'll have a look.
> 
> 

I think this one is still open. At least I do not remember having merged 
anything related to that since then.

As everyone is busy right now, I only wonder if we have a regression here, or if 
it's just a preexisting issue that we can safely ignore for some v3.0.8.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RTnet splash with current next and stable
  2018-12-04 17:00   ` Jan Kiszka
@ 2018-12-13  9:55     ` Pintu Agarwal
  2018-12-13 16:21       ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Pintu Agarwal @ 2018-12-13  9:55 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Philippe Gerum, Xenomai@xenomai.org, Sebastian.Smolorz

On Tue, Dec 4, 2018 at 10:30 PM Jan Kiszka via Xenomai
<xenomai@xenomai.org> wrote:
>
> On 31.10.18 18:10, Philippe Gerum wrote:
> > On 10/30/18 11:28 AM, Jan Kiszka wrote:
> >> Hi all,
> >>
> >> just testing new kernels with smokey, and one of them had RTnet on. This
> >> triggered a bug with the net_packat_dgram test:
> >>
> >> [   50.610997] initializing loopback...
> >> [   50.613799] RTnet: registered rtlo
> >> [   50.692154] ------------[ cut here ]------------
> >> [   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 __list_add+0x62/0xb0
> >> [   50.696832] list_add corruption. next->prev should be prev (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
> >> [   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
> >> [   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
> >> [   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> >> [   50.707861] I-pipe domain: Linux
> >> [   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 0000000000000000
> >> [   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 ffff88003a8bb700
> >> [   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 ffff88003a8bb640
> >> [   50.710335] Call Trace:
> >> [   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
> >> [   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
> >> [   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
> >> [   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
> >> [   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
> >> [   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
> >> [   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
> >> [   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
> >> [   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
> >> [   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
> >> [   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
> >> [   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
> >> [   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
> >> [   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
> >> [   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
> >> [   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
> >> [   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
> >> [   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
> >> [   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
> >> [   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
> >> [   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
> >> [   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
> >> [   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
> >> [   50.762893] ------------[ cut here ]------------
> >>
> >> Anyone who hacked on this recently have a feeling?
> >>
> >
> > No idea. I'll have a look.
> >
> >
>
> I think this one is still open. At least I do not remember having merged
> anything related to that since then.
>
> As everyone is busy right now, I only wonder if we have a regression here, or if
> it's just a preexisting issue that we can safely ignore for some v3.0.8.
>
> Jan
>

I think I already reported similar rtnet loopback issue around 6
months ago with 3.0.6 itself.
This kind of issues are visible on x86, Beagle Bone/Raspberry Pi
(arm), Hikey (arm64) , even with SMAP disabled in Kernel 4.9.x
However, I did not see this issue with Virtual Box (Ubuntu guest).

> --
> Siemens AG, Corporate Technology, CT RDA IOT SES-DE
> Corporate Competence Center Embedded Linux
>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RTnet splash with current next and stable
  2018-12-13  9:55     ` Pintu Agarwal
@ 2018-12-13 16:21       ` Jan Kiszka
  2018-12-13 17:24         ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-12-13 16:21 UTC (permalink / raw)
  To: Pintu Agarwal; +Cc: Philippe Gerum, Xenomai@xenomai.org, Sebastian.Smolorz

On 13.12.18 10:55, Pintu Agarwal wrote:
> On Tue, Dec 4, 2018 at 10:30 PM Jan Kiszka via Xenomai
> <xenomai@xenomai.org> wrote:
>>
>> On 31.10.18 18:10, Philippe Gerum wrote:
>>> On 10/30/18 11:28 AM, Jan Kiszka wrote:
>>>> Hi all,
>>>>
>>>> just testing new kernels with smokey, and one of them had RTnet on. This
>>>> triggered a bug with the net_packat_dgram test:
>>>>
>>>> [   50.610997] initializing loopback...
>>>> [   50.613799] RTnet: registered rtlo
>>>> [   50.692154] ------------[ cut here ]------------
>>>> [   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 __list_add+0x62/0xb0
>>>> [   50.696832] list_add corruption. next->prev should be prev (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
>>>> [   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
>>>> [   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
>>>> [   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
>>>> [   50.707861] I-pipe domain: Linux
>>>> [   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 0000000000000000
>>>> [   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 ffff88003a8bb700
>>>> [   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 ffff88003a8bb640
>>>> [   50.710335] Call Trace:
>>>> [   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
>>>> [   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
>>>> [   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
>>>> [   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
>>>> [   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
>>>> [   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
>>>> [   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
>>>> [   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
>>>> [   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
>>>> [   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
>>>> [   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
>>>> [   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
>>>> [   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
>>>> [   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
>>>> [   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
>>>> [   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
>>>> [   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
>>>> [   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
>>>> [   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
>>>> [   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
>>>> [   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
>>>> [   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
>>>> [   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
>>>> [   50.762893] ------------[ cut here ]------------
>>>>
>>>> Anyone who hacked on this recently have a feeling?
>>>>
>>>
>>> No idea. I'll have a look.
>>>
>>>
>>
>> I think this one is still open. At least I do not remember having merged
>> anything related to that since then.
>>
>> As everyone is busy right now, I only wonder if we have a regression here, or if
>> it's just a preexisting issue that we can safely ignore for some v3.0.8.
>>
>> Jan
>>
> 
> I think I already reported similar rtnet loopback issue around 6
> months ago with 3.0.6 itself.
> This kind of issues are visible on x86, Beagle Bone/Raspberry Pi
> (arm), Hikey (arm64) , even with SMAP disabled in Kernel 4.9.x
> However, I did not see this issue with Virtual Box (Ubuntu guest).
> 

Thanks for the hint - though I failed to find your concrete email in the archive 
so far. I'm planning to try this out soon and then decide in which order to 
proceed (release and fix later, or the other way around).

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: RTnet splash with current next and stable
  2018-12-13 16:21       ` Jan Kiszka
@ 2018-12-13 17:24         ` Jan Kiszka
  2018-12-14 11:22           ` [PATCH] rtnet: Fix lifecycle management of mapped rtskbs Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-12-13 17:24 UTC (permalink / raw)
  To: Pintu Agarwal; +Cc: Philippe Gerum, Xenomai@xenomai.org, Sebastian.Smolorz

On 13.12.18 17:21, Jan Kiszka wrote:
> On 13.12.18 10:55, Pintu Agarwal wrote:
>> On Tue, Dec 4, 2018 at 10:30 PM Jan Kiszka via Xenomai
>> <xenomai@xenomai.org> wrote:
>>>
>>> On 31.10.18 18:10, Philippe Gerum wrote:
>>>> On 10/30/18 11:28 AM, Jan Kiszka wrote:
>>>>> Hi all,
>>>>>
>>>>> just testing new kernels with smokey, and one of them had RTnet on. This
>>>>> triggered a bug with the net_packat_dgram test:
>>>>>
>>>>> [   50.610997] initializing loopback...
>>>>> [   50.613799] RTnet: registered rtlo
>>>>> [   50.692154] ------------[ cut here ]------------
>>>>> [   50.693577] WARNING: CPU: 1 PID: 1119 at ../lib/list_debug.c:29 
>>>>> __list_add+0x62/0xb0
>>>>> [   50.696832] list_add corruption. next->prev should be prev 
>>>>> (ffffffffa0007d20), but was           (null). (next=ffff88003a8bb700).
>>>>> [   50.700751] Modules linked in: rtpacket rtipv4 rt_loopback rt_e1000e rtnet
>>>>> [   50.702518] CPU: 1 PID: 1119 Comm: smokey Not tainted 4.9.135+ #26
>>>>> [   50.704188] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
>>>>> rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
>>>>> [   50.707861] I-pipe domain: Linux
>>>>> [   50.709295]  ffffc90002943c10 ffffffff8146a31a ffffc90002943c60 
>>>>> 0000000000000000
>>>>> [   50.710335]  ffffc90002943c50 ffffffff810e9441 0000001da0007ca8 
>>>>> ffff88003a8bb700
>>>>> [   50.710335]  ffffffffa0007d20 ffff88003a8bb700 000000000001d230 
>>>>> ffff88003a8bb640
>>>>> [   50.710335] Call Trace:
>>>>> [   50.710335]  [<ffffffff8146a31a>] dump_stack+0xa2/0xc8
>>>>> [   50.710335]  [<ffffffff810e9441>] __warn+0xe1/0x100
>>>>> [   50.710335]  [<ffffffff810e94af>] warn_slowpath_fmt+0x4f/0x60
>>>>> [   50.710335]  [<ffffffff812b0b61>] ? __slab_alloc.isra.88+0xa1/0xd0
>>>>> [   50.710335]  [<ffffffffa0003edb>] ? rtskb_pool_extend+0xbb/0x220 [rtnet]
>>>>> [   50.710335]  [<ffffffff81498862>] __list_add+0x62/0xb0
>>>>> [   50.710335]  [<ffffffffa000160a>] rtdev_map_rtskb+0x7a/0xa0 [rtnet]
>>>>> [   50.710335]  [<ffffffffa0003f12>] rtskb_pool_extend+0xf2/0x220 [rtnet]
>>>>> [   50.710335]  [<ffffffffa0004072>] rtskb_pool_init+0x32/0x70 [rtnet]
>>>>> [   50.710335]  [<ffffffffa00043d2>] __rt_bare_socket_init+0x42/0x90 [rtnet]
>>>>> [   50.710335]  [<ffffffffa0004501>] __rt_socket_init+0x81/0xd0 [rtnet]
>>>>> [   50.710335]  [<ffffffffa0054477>] rt_packet_socket+0x27/0xb0 [rtpacket]
>>>>> [   50.710335]  [<ffffffff81219dd7>] ? create_instance+0x67/0x80
>>>>> [   50.710335]  [<ffffffff8121a772>] __rtdm_dev_socket+0x112/0x440
>>>>> [   50.710335]  [<ffffffff8113bb09>] ? lock_release+0x2c9/0x440
>>>>> [   50.710335]  [<ffffffff8122b200>] ? CoBaLt_open+0x40/0x40
>>>>> [   50.710335]  [<ffffffff8122b20e>] CoBaLt_socket+0xe/0x20
>>>>> [   50.710335]  [<ffffffff812412ca>] ipipe_syscall_hook+0x19a/0x460
>>>>> [   50.710335]  [<ffffffff811ad8e9>] __ipipe_notify_syscall+0x79/0x400
>>>>> [   50.710335]  [<ffffffff811adc9a>] ipipe_handle_syscall+0x2a/0xd0
>>>>> [   50.710335]  [<ffffffff81002d6c>] do_syscall_64+0x3c/0x1c0
>>>>> [   50.710335]  [<ffffffff8184f2c3>] entry_SYSCALL_64_after_swapgs+0x5d/0xdb
>>>>> [   50.761499] ---[ end trace 633f1ec8aecb0735 ]---
>>>>> [   50.762893] ------------[ cut here ]------------
>>>>>
>>>>> Anyone who hacked on this recently have a feeling?
>>>>>
>>>>
>>>> No idea. I'll have a look.
>>>>
>>>>
>>>
>>> I think this one is still open. At least I do not remember having merged
>>> anything related to that since then.
>>>
>>> As everyone is busy right now, I only wonder if we have a regression here, or if
>>> it's just a preexisting issue that we can safely ignore for some v3.0.8.
>>>
>>> Jan
>>>
>>
>> I think I already reported similar rtnet loopback issue around 6
>> months ago with 3.0.6 itself.
>> This kind of issues are visible on x86, Beagle Bone/Raspberry Pi
>> (arm), Hikey (arm64) , even with SMAP disabled in Kernel 4.9.x
>> However, I did not see this issue with Virtual Box (Ubuntu guest).
>>
> 
> Thanks for the hint - though I failed to find your concrete email in the archive 
> so far. I'm planning to try this out soon and then decide in which order to 
> proceed (release and fix later, or the other way around).
> 

Confirmed, the problem existed in 3.0.6 already.

I'm having a brief look nevertheless. I feel like it could be simple...

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] rtnet: Fix lifecycle management of mapped rtskbs
  2018-12-13 17:24         ` Jan Kiszka
@ 2018-12-14 11:22           ` Jan Kiszka
  2018-12-18  6:46             ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-12-14 11:22 UTC (permalink / raw)
  To: Xenomai@xenomai.org

Do not add rtskbs to the rtskb_list which are not mappend because
rtdev_unmap_rtskb will not remove such rtskbs again (buf_dma_addr ==
RTSKB_UNMAPPED). In fact, rtskb_list should be called rtskb_mapped_list,
so refactor this while at it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 kernel/drivers/net/stack/rtdev.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/drivers/net/stack/rtdev.c b/kernel/drivers/net/stack/rtdev.c
index 5eb73ce1a4..7b267f2a7c 100644
--- a/kernel/drivers/net/stack/rtdev.c
+++ b/kernel/drivers/net/stack/rtdev.c
@@ -44,9 +44,9 @@ MODULE_PARM_DESC(device_rtskbs, "Number of additional global realtime socket "
 struct rtnet_device         *rtnet_devices[MAX_RT_DEVICES];
 static struct rtnet_device  *loopback_device;
 static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
+static LIST_HEAD(rtskb_mapped_list);
 
 LIST_HEAD(event_hook_list);
-LIST_HEAD(rtskb_list);
 DEFINE_MUTEX(rtnet_devices_nrt_lock);
 
 static int rtdev_locked_xmit(struct rtskb *skb, struct rtnet_device *rtdev);
@@ -412,8 +412,8 @@ int rtdev_map_rtskb(struct rtskb *skb)
 	}
     }
 
-    if (!err)
-	list_add(&skb->entry, &rtskb_list);
+    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
+	list_add(&skb->entry, &rtskb_mapped_list);
 
     mutex_unlock(&rtnet_devices_nrt_lock);
 
@@ -430,7 +430,7 @@ static int rtdev_map_all_rtskbs(struct rtnet_device *rtdev)
     if (!rtdev->map_rtskb)
 	return 0;
 
-    list_for_each_entry(skb, &rtskb_list, entry) {
+    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
 	err = rtskb_map(rtdev, skb);
 	if (err)
 	   break;
@@ -474,7 +474,7 @@ static void rtdev_unmap_all_rtskbs(struct rtnet_device *rtdev)
     if (!rtdev->unmap_rtskb)
 	return;
 
-    list_for_each_entry(skb, &rtskb_list, entry) {
+    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
 	rtdev->unmap_rtskb(rtdev, skb);
     }
 }
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] rtnet: Fix lifecycle management of mapped rtskbs
  2018-12-14 11:22           ` [PATCH] rtnet: Fix lifecycle management of mapped rtskbs Jan Kiszka
@ 2018-12-18  6:46             ` Jan Kiszka
  2019-01-17 12:56               ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Kiszka @ 2018-12-18  6:46 UTC (permalink / raw)
  To: Xenomai@xenomai.org

On 14.12.18 12:22, Jan Kiszka via Xenomai wrote:
> Do not add rtskbs to the rtskb_list which are not mappend because
> rtdev_unmap_rtskb will not remove such rtskbs again (buf_dma_addr ==
> RTSKB_UNMAPPED). In fact, rtskb_list should be called rtskb_mapped_list,
> so refactor this while at it.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>   kernel/drivers/net/stack/rtdev.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/drivers/net/stack/rtdev.c b/kernel/drivers/net/stack/rtdev.c
> index 5eb73ce1a4..7b267f2a7c 100644
> --- a/kernel/drivers/net/stack/rtdev.c
> +++ b/kernel/drivers/net/stack/rtdev.c
> @@ -44,9 +44,9 @@ MODULE_PARM_DESC(device_rtskbs, "Number of additional global realtime socket "
>   struct rtnet_device         *rtnet_devices[MAX_RT_DEVICES];
>   static struct rtnet_device  *loopback_device;
>   static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
> +static LIST_HEAD(rtskb_mapped_list);
>   
>   LIST_HEAD(event_hook_list);
> -LIST_HEAD(rtskb_list);
>   DEFINE_MUTEX(rtnet_devices_nrt_lock);
>   
>   static int rtdev_locked_xmit(struct rtskb *skb, struct rtnet_device *rtdev);
> @@ -412,8 +412,8 @@ int rtdev_map_rtskb(struct rtskb *skb)
>   	}
>       }
>   
> -    if (!err)
> -	list_add(&skb->entry, &rtskb_list);
> +    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
> +	list_add(&skb->entry, &rtskb_mapped_list);
>   
>       mutex_unlock(&rtnet_devices_nrt_lock);
>   
> @@ -430,7 +430,7 @@ static int rtdev_map_all_rtskbs(struct rtnet_device *rtdev)
>       if (!rtdev->map_rtskb)
>   	return 0;
>   
> -    list_for_each_entry(skb, &rtskb_list, entry) {
> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>   	err = rtskb_map(rtdev, skb);
>   	if (err)
>   	   break;
> @@ -474,7 +474,7 @@ static void rtdev_unmap_all_rtskbs(struct rtnet_device *rtdev)
>       if (!rtdev->unmap_rtskb)
>   	return;
>   
> -    list_for_each_entry(skb, &rtskb_list, entry) {
> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>   	rtdev->unmap_rtskb(rtdev, skb);
>       }
>   }
> 

Applied to master and stable.

Jan

-- 
Siemens AG, Corporate Technology, CT RDA IOT SES-DE
Corporate Competence Center Embedded Linux


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rtnet: Fix lifecycle management of mapped rtskbs
  2018-12-18  6:46             ` Jan Kiszka
@ 2019-01-17 12:56               ` Philippe Gerum
  2019-01-18 11:58                 ` Philippe Gerum
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2019-01-17 12:56 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai@xenomai.org

On 12/18/18 7:46 AM, Jan Kiszka via Xenomai wrote:
> On 14.12.18 12:22, Jan Kiszka via Xenomai wrote:
>> Do not add rtskbs to the rtskb_list which are not mappend because
>> rtdev_unmap_rtskb will not remove such rtskbs again (buf_dma_addr ==
>> RTSKB_UNMAPPED). In fact, rtskb_list should be called rtskb_mapped_list,
>> so refactor this while at it.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> ---
>>   kernel/drivers/net/stack/rtdev.c | 10 +++++-----
>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/drivers/net/stack/rtdev.c
>> b/kernel/drivers/net/stack/rtdev.c
>> index 5eb73ce1a4..7b267f2a7c 100644
>> --- a/kernel/drivers/net/stack/rtdev.c
>> +++ b/kernel/drivers/net/stack/rtdev.c
>> @@ -44,9 +44,9 @@ MODULE_PARM_DESC(device_rtskbs, "Number of
>> additional global realtime socket "
>>   struct rtnet_device         *rtnet_devices[MAX_RT_DEVICES];
>>   static struct rtnet_device  *loopback_device;
>>   static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
>> +static LIST_HEAD(rtskb_mapped_list);
>>     LIST_HEAD(event_hook_list);
>> -LIST_HEAD(rtskb_list);
>>   DEFINE_MUTEX(rtnet_devices_nrt_lock);
>>     static int rtdev_locked_xmit(struct rtskb *skb, struct
>> rtnet_device *rtdev);
>> @@ -412,8 +412,8 @@ int rtdev_map_rtskb(struct rtskb *skb)
>>       }
>>       }
>>   -    if (!err)
>> -    list_add(&skb->entry, &rtskb_list);
>> +    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
>> +    list_add(&skb->entry, &rtskb_mapped_list);
>>         mutex_unlock(&rtnet_devices_nrt_lock);
>>   @@ -430,7 +430,7 @@ static int rtdev_map_all_rtskbs(struct
>> rtnet_device *rtdev)
>>       if (!rtdev->map_rtskb)
>>       return 0;
>>   -    list_for_each_entry(skb, &rtskb_list, entry) {
>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>       err = rtskb_map(rtdev, skb);
>>       if (err)
>>          break;
>> @@ -474,7 +474,7 @@ static void rtdev_unmap_all_rtskbs(struct
>> rtnet_device *rtdev)
>>       if (!rtdev->unmap_rtskb)
>>       return;
>>   -    list_for_each_entry(skb, &rtskb_list, entry) {
>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>       rtdev->unmap_rtskb(rtdev, skb);
>>       }
>>   }
>>
> 
> Applied to master and stable.
> 
> Jan
> 

Unfortunately, this breaks RTnet for any driver using DMA, apparently
uncovering a pre-existing issue in the device init->registration logic
wrt mapping skb from per-device pools.

In short, filtering skbs we can't immediately map out of the rtskb list
in rtdev_map_rtskb() prevents rt_register_rtnetdev(dev) from finding
@dev's skbs in rtskb_mapped_list when running rtdev_map_all_skbs().
In turn this prevents the driver from setting up the hw's RX/TX rings
with proper DMA mapping addresses, as none of its skbs has valid bus
addresses for DMA.

There may be a simple fix, but the rtdev init logic is a bit too
convoluted for my after lunch. I'll try to come up with a fix later.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] rtnet: Fix lifecycle management of mapped rtskbs
  2019-01-17 12:56               ` Philippe Gerum
@ 2019-01-18 11:58                 ` Philippe Gerum
  2019-01-18 12:37                   ` Jan Kiszka
  0 siblings, 1 reply; 11+ messages in thread
From: Philippe Gerum @ 2019-01-18 11:58 UTC (permalink / raw)
  To: Jan Kiszka, Xenomai@xenomai.org

On 1/17/19 1:56 PM, Philippe Gerum wrote:
> On 12/18/18 7:46 AM, Jan Kiszka via Xenomai wrote:
>> On 14.12.18 12:22, Jan Kiszka via Xenomai wrote:
>>> Do not add rtskbs to the rtskb_list which are not mappend because
>>> rtdev_unmap_rtskb will not remove such rtskbs again (buf_dma_addr ==
>>> RTSKB_UNMAPPED). In fact, rtskb_list should be called rtskb_mapped_list,
>>> so refactor this while at it.
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>   kernel/drivers/net/stack/rtdev.c | 10 +++++-----
>>>   1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/kernel/drivers/net/stack/rtdev.c
>>> b/kernel/drivers/net/stack/rtdev.c
>>> index 5eb73ce1a4..7b267f2a7c 100644
>>> --- a/kernel/drivers/net/stack/rtdev.c
>>> +++ b/kernel/drivers/net/stack/rtdev.c
>>> @@ -44,9 +44,9 @@ MODULE_PARM_DESC(device_rtskbs, "Number of
>>> additional global realtime socket "
>>>   struct rtnet_device         *rtnet_devices[MAX_RT_DEVICES];
>>>   static struct rtnet_device  *loopback_device;
>>>   static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
>>> +static LIST_HEAD(rtskb_mapped_list);
>>>     LIST_HEAD(event_hook_list);
>>> -LIST_HEAD(rtskb_list);
>>>   DEFINE_MUTEX(rtnet_devices_nrt_lock);
>>>     static int rtdev_locked_xmit(struct rtskb *skb, struct
>>> rtnet_device *rtdev);
>>> @@ -412,8 +412,8 @@ int rtdev_map_rtskb(struct rtskb *skb)
>>>       }
>>>       }
>>>   -    if (!err)
>>> -    list_add(&skb->entry, &rtskb_list);
>>> +    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
>>> +    list_add(&skb->entry, &rtskb_mapped_list);
>>>         mutex_unlock(&rtnet_devices_nrt_lock);
>>>   @@ -430,7 +430,7 @@ static int rtdev_map_all_rtskbs(struct
>>> rtnet_device *rtdev)
>>>       if (!rtdev->map_rtskb)
>>>       return 0;
>>>   -    list_for_each_entry(skb, &rtskb_list, entry) {
>>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>>       err = rtskb_map(rtdev, skb);
>>>       if (err)
>>>          break;
>>> @@ -474,7 +474,7 @@ static void rtdev_unmap_all_rtskbs(struct
>>> rtnet_device *rtdev)
>>>       if (!rtdev->unmap_rtskb)
>>>       return;
>>>   -    list_for_each_entry(skb, &rtskb_list, entry) {
>>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>>       rtdev->unmap_rtskb(rtdev, skb);
>>>       }
>>>   }
>>>
>>
>> Applied to master and stable.
>>
>> Jan
>>
> 
> Unfortunately, this breaks RTnet for any driver using DMA, apparently
> uncovering a pre-existing issue in the device init->registration logic
> wrt mapping skb from per-device pools.
> 
> In short, filtering skbs we can't immediately map out of the rtskb list
> in rtdev_map_rtskb() prevents rt_register_rtnetdev(dev) from finding
> @dev's skbs in rtskb_mapped_list when running rtdev_map_all_skbs().
> In turn this prevents the driver from setting up the hw's RX/TX rings
> with proper DMA mapping addresses, as none of its skbs has valid bus
> addresses for DMA.
> 
> There may be a simple fix, but the rtdev init logic is a bit too
> convoluted for my after lunch. I'll try to come up with a fix later.
> 

So there is a catch 22 situation with skb mapping for DMA operations. It
stems from the fact that creating a netdevice via a call to
rt_alloc_etherdev(dev) can't successfully map the skbs attached to that
new device, because rtdev_map_rtskb() only operates on registered
netdevices, which the new one cannot be just yet as it is only half-baked.

Prior to that change, the code worked almost by accident because
registering that new device would end up calling rtdev_map_all_rtskbs(),
which would eventually find the unmapped skbs left in the global
rtskb_list and map them through the driver's ->map_rtskb handler.

There are only two in-tree drivers using the ->map_rtskb handler, namely
igb and e1000e, I'm working on adding a 3rd one with a refactoring of
the fec for i.MX SoCs, which is why I hit this bug.

I suspect ->map_rtskb was introduced to address yet another issue with
all other DMA-capable drivers: calling dma_{map, unmap}_single() from
I/O handlers in primary mode like they all do is unsafe, and this
handler may have been intended to ensure those operations happen from
regular Linux context instead. As a matter of fact, most existing RTnet
drivers are unsafe regarding this. Generally speaking, it looks like the
process of initializing a new netdevice has evolved over time in RTnet,
leading to a lot of open coding on the driver developer's end, which is
quite error-prone.

My take on this is that the init->registration logic may need a rework
and a better API, solving the issue of skb mapping in the same move,
fixing or simply dropping obsolete driver code too. However, doing so
without breaking more stuff would require significant RTnet background I
don't have, so I'm leaving this to the grown-ups.

In the meantime, this is a workaround addressing the skb mapping bug
introduced by this commit (only lightly tested so far):

commit 31ae7b2154bc0d8df841835d699087876bec3014
Author: Philippe Gerum <rpm@xenomai.org>
Date:   Thu Jan 17 18:17:43 2019 +0100

    net/rtdev: ensure per-device skbs get mapped at registration

    This patch works around a regression introduced by #74464ee37d0,
    causing a new device's skbs not to be passed to its ->map_rtskb()
    handler when registered, breaking further DMA inits in the driver.

diff --git a/kernel/drivers/net/stack/rtdev.c
b/kernel/drivers/net/stack/rtdev.c
index 2982740d0..c5f8c73c6 100644
--- a/kernel/drivers/net/stack/rtdev.c
+++ b/kernel/drivers/net/stack/rtdev.c
@@ -45,6 +45,7 @@ struct rtnet_device
*rtnet_devices[MAX_RT_DEVICES];
 static struct rtnet_device  *loopback_device;
 static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
 static LIST_HEAD(rtskb_mapped_list);
+static LIST_HEAD(rtskb_mapwait_list);
  LIST_HEAD(event_hook_list);
 DEFINE_MUTEX(rtnet_devices_nrt_lock);
@@ -459,8 +460,12 @@ int rtdev_map_rtskb(struct rtskb *skb)
 	}
     }
 -    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
-	list_add(&skb->entry, &rtskb_mapped_list);
+    if (!err) {
+	    if (skb->buf_dma_addr != RTSKB_UNMAPPED)
+		    list_add(&skb->entry, &rtskb_mapped_list);
+	    else
+		    list_add(&skb->entry, &rtskb_mapwait_list);
+    }
      mutex_unlock(&rtnet_devices_nrt_lock);
 @@ -471,19 +476,31 @@ int rtdev_map_rtskb(struct rtskb *skb)
  static int rtdev_map_all_rtskbs(struct rtnet_device *rtdev)
 {
-    struct rtskb *skb;
-    int err = 0;
+    struct rtskb *skb, *n;
+    int err;
      if (!rtdev->map_rtskb)
-	return 0;
+	    return 0;
 -    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
-	err = rtskb_map(rtdev, skb);
-	if (err)
-	   break;
+    if (!list_empty(&rtskb_mapped_list)) {
+	    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
+		    err = rtskb_map(rtdev, skb);
+		    if (err)
+			    return err;
+	    }
     }
 -    return err;
+    if (!list_empty(&rtskb_mapwait_list)) {
+	    list_for_each_entry_safe(skb, n, &rtskb_mapwait_list, entry) {
+		    err = rtskb_map(rtdev, skb);
+		    if (err)
+			    return err;
+		    list_del(&skb->entry);
+		    list_add(&skb->entry, &rtskb_mapped_list);
+	    }
+    }
+
+    return 0;
 }


-- 
Philippe.


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] rtnet: Fix lifecycle management of mapped rtskbs
  2019-01-18 11:58                 ` Philippe Gerum
@ 2019-01-18 12:37                   ` Jan Kiszka
  0 siblings, 0 replies; 11+ messages in thread
From: Jan Kiszka @ 2019-01-18 12:37 UTC (permalink / raw)
  To: Philippe Gerum, Xenomai@xenomai.org

On 18.01.19 12:58, Philippe Gerum via Xenomai wrote:
> On 1/17/19 1:56 PM, Philippe Gerum wrote:
>> On 12/18/18 7:46 AM, Jan Kiszka via Xenomai wrote:
>>> On 14.12.18 12:22, Jan Kiszka via Xenomai wrote:
>>>> Do not add rtskbs to the rtskb_list which are not mappend because
>>>> rtdev_unmap_rtskb will not remove such rtskbs again (buf_dma_addr ==
>>>> RTSKB_UNMAPPED). In fact, rtskb_list should be called rtskb_mapped_list,
>>>> so refactor this while at it.
>>>>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>> ---
>>>>    kernel/drivers/net/stack/rtdev.c | 10 +++++-----
>>>>    1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/kernel/drivers/net/stack/rtdev.c
>>>> b/kernel/drivers/net/stack/rtdev.c
>>>> index 5eb73ce1a4..7b267f2a7c 100644
>>>> --- a/kernel/drivers/net/stack/rtdev.c
>>>> +++ b/kernel/drivers/net/stack/rtdev.c
>>>> @@ -44,9 +44,9 @@ MODULE_PARM_DESC(device_rtskbs, "Number of
>>>> additional global realtime socket "
>>>>    struct rtnet_device         *rtnet_devices[MAX_RT_DEVICES];
>>>>    static struct rtnet_device  *loopback_device;
>>>>    static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
>>>> +static LIST_HEAD(rtskb_mapped_list);
>>>>      LIST_HEAD(event_hook_list);
>>>> -LIST_HEAD(rtskb_list);
>>>>    DEFINE_MUTEX(rtnet_devices_nrt_lock);
>>>>      static int rtdev_locked_xmit(struct rtskb *skb, struct
>>>> rtnet_device *rtdev);
>>>> @@ -412,8 +412,8 @@ int rtdev_map_rtskb(struct rtskb *skb)
>>>>        }
>>>>        }
>>>>    -    if (!err)
>>>> -    list_add(&skb->entry, &rtskb_list);
>>>> +    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
>>>> +    list_add(&skb->entry, &rtskb_mapped_list);
>>>>          mutex_unlock(&rtnet_devices_nrt_lock);
>>>>    @@ -430,7 +430,7 @@ static int rtdev_map_all_rtskbs(struct
>>>> rtnet_device *rtdev)
>>>>        if (!rtdev->map_rtskb)
>>>>        return 0;
>>>>    -    list_for_each_entry(skb, &rtskb_list, entry) {
>>>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>>>        err = rtskb_map(rtdev, skb);
>>>>        if (err)
>>>>           break;
>>>> @@ -474,7 +474,7 @@ static void rtdev_unmap_all_rtskbs(struct
>>>> rtnet_device *rtdev)
>>>>        if (!rtdev->unmap_rtskb)
>>>>        return;
>>>>    -    list_for_each_entry(skb, &rtskb_list, entry) {
>>>> +    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
>>>>        rtdev->unmap_rtskb(rtdev, skb);
>>>>        }
>>>>    }
>>>>
>>>
>>> Applied to master and stable.
>>>
>>> Jan
>>>
>>
>> Unfortunately, this breaks RTnet for any driver using DMA, apparently
>> uncovering a pre-existing issue in the device init->registration logic
>> wrt mapping skb from per-device pools.
>>
>> In short, filtering skbs we can't immediately map out of the rtskb list
>> in rtdev_map_rtskb() prevents rt_register_rtnetdev(dev) from finding
>> @dev's skbs in rtskb_mapped_list when running rtdev_map_all_skbs().
>> In turn this prevents the driver from setting up the hw's RX/TX rings
>> with proper DMA mapping addresses, as none of its skbs has valid bus
>> addresses for DMA.
>>
>> There may be a simple fix, but the rtdev init logic is a bit too
>> convoluted for my after lunch. I'll try to come up with a fix later.
>>
>
> So there is a catch 22 situation with skb mapping for DMA operations. It
> stems from the fact that creating a netdevice via a call to
> rt_alloc_etherdev(dev) can't successfully map the skbs attached to that
> new device, because rtdev_map_rtskb() only operates on registered
> netdevices, which the new one cannot be just yet as it is only half-baked.
>
> Prior to that change, the code worked almost by accident because
> registering that new device would end up calling rtdev_map_all_rtskbs(),
> which would eventually find the unmapped skbs left in the global
> rtskb_list and map them through the driver's ->map_rtskb handler.
>
> There are only two in-tree drivers using the ->map_rtskb handler, namely
> igb and e1000e, I'm working on adding a 3rd one with a refactoring of
> the fec for i.MX SoCs, which is why I hit this bug.
>
> I suspect ->map_rtskb was introduced to address yet another issue with
> all other DMA-capable drivers: calling dma_{map, unmap}_single() from
> I/O handlers in primary mode like they all do is unsafe, and this
> handler may have been intended to ensure those operations happen from
> regular Linux context instead. As a matter of fact, most existing RTnet
> drivers are unsafe regarding this. Generally speaking, it looks like the
> process of initializing a new netdevice has evolved over time in RTnet,
> leading to a lot of open coding on the driver developer's end, which is
> quite error-prone.
>
> My take on this is that the init->registration logic may need a rework
> and a better API, solving the issue of skb mapping in the same move,
> fixing or simply dropping obsolete driver code too. However, doing so
> without breaking more stuff would require significant RTnet background I
> don't have, so I'm leaving this to the grown-ups.
>
> In the meantime, this is a workaround addressing the skb mapping bug
> introduced by this commit (only lightly tested so far):
>
> commit 31ae7b2154bc0d8df841835d699087876bec3014
> Author: Philippe Gerum <rpm@xenomai.org>
> Date:   Thu Jan 17 18:17:43 2019 +0100
>
>      net/rtdev: ensure per-device skbs get mapped at registration
>
>      This patch works around a regression introduced by #74464ee37d0,
>      causing a new device's skbs not to be passed to its ->map_rtskb()
>      handler when registered, breaking further DMA inits in the driver.
>
> diff --git a/kernel/drivers/net/stack/rtdev.c
> b/kernel/drivers/net/stack/rtdev.c
> index 2982740d0..c5f8c73c6 100644
> --- a/kernel/drivers/net/stack/rtdev.c
> +++ b/kernel/drivers/net/stack/rtdev.c
> @@ -45,6 +45,7 @@ struct rtnet_device
> *rtnet_devices[MAX_RT_DEVICES];
>   static struct rtnet_device  *loopback_device;
>   static DEFINE_RTDM_LOCK(rtnet_devices_rt_lock);
>   static LIST_HEAD(rtskb_mapped_list);
> +static LIST_HEAD(rtskb_mapwait_list);
>    LIST_HEAD(event_hook_list);
>   DEFINE_MUTEX(rtnet_devices_nrt_lock);
> @@ -459,8 +460,12 @@ int rtdev_map_rtskb(struct rtskb *skb)
>   	}
>       }
>   -    if (!err && skb->buf_dma_addr != RTSKB_UNMAPPED)
> -	list_add(&skb->entry, &rtskb_mapped_list);
> +    if (!err) {
> +	    if (skb->buf_dma_addr != RTSKB_UNMAPPED)
> +		    list_add(&skb->entry, &rtskb_mapped_list);
> +	    else
> +		    list_add(&skb->entry, &rtskb_mapwait_list);
> +    }
>        mutex_unlock(&rtnet_devices_nrt_lock);
>   @@ -471,19 +476,31 @@ int rtdev_map_rtskb(struct rtskb *skb)
>    static int rtdev_map_all_rtskbs(struct rtnet_device *rtdev)
>   {
> -    struct rtskb *skb;
> -    int err = 0;
> +    struct rtskb *skb, *n;
> +    int err;
>        if (!rtdev->map_rtskb)
> -	return 0;
> +	    return 0;
>   -    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
> -	err = rtskb_map(rtdev, skb);
> -	if (err)
> -	   break;
> +    if (!list_empty(&rtskb_mapped_list)) {
> +	    list_for_each_entry(skb, &rtskb_mapped_list, entry) {
> +		    err = rtskb_map(rtdev, skb);
> +		    if (err)
> +			    return err;
> +	    }
>       }
>   -    return err;
> +    if (!list_empty(&rtskb_mapwait_list)) {
> +	    list_for_each_entry_safe(skb, n, &rtskb_mapwait_list, entry) {
> +		    err = rtskb_map(rtdev, skb);
> +		    if (err)
> +			    return err;
> +		    list_del(&skb->entry);
> +		    list_add(&skb->entry, &rtskb_mapped_list);
> +	    }
> +    }
> +
> +    return 0;
>   }
>
>

OK, thanks for digging into that. Yes, the RTnet code carries a lot of history,
and parts that I completely forgot by now (it's 15 years old, or more).

If that workaround also pleases our smokey test via lookback dev, I can merge.

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-01-18 12:37 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-30 10:28 RTnet splash with current next and stable Jan Kiszka
2018-10-31 17:10 ` Philippe Gerum
2018-12-04 17:00   ` Jan Kiszka
2018-12-13  9:55     ` Pintu Agarwal
2018-12-13 16:21       ` Jan Kiszka
2018-12-13 17:24         ` Jan Kiszka
2018-12-14 11:22           ` [PATCH] rtnet: Fix lifecycle management of mapped rtskbs Jan Kiszka
2018-12-18  6:46             ` Jan Kiszka
2019-01-17 12:56               ` Philippe Gerum
2019-01-18 11:58                 ` Philippe Gerum
2019-01-18 12:37                   ` Jan Kiszka

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.