bpf.vger.kernel.org archive mirror
* [syzbot] BUG: corrupted list in netif_napi_add
@ 2021-10-13 11:40 syzbot
  2021-10-13 13:35 ` Daniel Borkmann
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: syzbot @ 2021-10-13 11:40 UTC (permalink / raw)
  To: andrii, ast, bpf, daniel, davem, hawk, john.fastabend, kafai,
	kpsingh, kuba, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs

Hello,

syzbot found the following issue on:

HEAD commit:    683f29b781ae Add linux-next specific files for 20211008
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=673b3589d970c
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+62e474dd92a35e3060d8@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:29!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __list_add_rcu include/linux/rculist.h:79 [inline]
 list_add_rcu include/linux/rculist.h:106 [inline]
 netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
 veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
 veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
 veth_xdp_set drivers/net/veth.c:1483 [inline]
 veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
 bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
 bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
 dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
 dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
 dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
 do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
 rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
 __rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
 rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
 netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
 netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
 sock_sendmsg_nosec net/socket.c:704 [inline]
 sock_sendmsg+0xcf/0x120 net/socket.c:724
 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
 ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f841f2718d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
 </TASK>
Modules linked in:
---[ end trace 7281cadbc8534f23 ]---
RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
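
The "list_add double add" line in the report above is printed by the kernel's list debugging check when the entry being inserted is already one of its would-be neighbours; in this report `new` and `next` are the same address. The following is a minimal userspace sketch of that kind of check, not the kernel's code: `init_list_head`, `list_add_valid`, `list_add_checked` and `demo_double_add` are hypothetical names chosen for illustration, and the real check lives in lib/list_debug.c and ends in a BUG rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/*
 * Simplified sanity check before linking `entry` between `prev` and `next`.
 * Returns false when the neighbours are inconsistent or when `entry`
 * is already one of them (the "double add" case).
 */
static bool list_add_valid(struct list_head *entry,
			   struct list_head *prev,
			   struct list_head *next)
{
	if (next->prev != prev) {
		fprintf(stderr, "list_add corruption: next->prev != prev\n");
		return false;
	}
	if (prev->next != next) {
		fprintf(stderr, "list_add corruption: prev->next != next\n");
		return false;
	}
	if (entry == prev || entry == next) {
		fprintf(stderr, "list_add double add: new=%p, prev=%p, next=%p.\n",
			(void *)entry, (void *)prev, (void *)next);
		return false;
	}
	return true;
}

/* Insert `entry` right after `head`, refusing the insert if the check fails. */
static bool list_add_checked(struct list_head *entry, struct list_head *head)
{
	struct list_head *prev = head;
	struct list_head *next = head->next;

	if (!list_add_valid(entry, prev, next))
		return false;

	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
	return true;
}

/* Adds the same entry twice; returns 1 if the second add was accepted, else 0. */
static int demo_double_add(void)
{
	struct list_head head, napi;
	bool first;

	init_list_head(&head);
	first = list_add_checked(&napi, &head);
	assert(first);
	(void)first;

	/* Second add: head.next is already &napi, so entry == next. */
	return list_add_checked(&napi, &head) ? 1 : 0;
}
```

In this sketch the second insert is rejected because the entry is already the first element after the head, which matches the new == next pattern in the report. Whether the veth NAPI instance was really added twice, or the list was corrupted some other way, is not established by the report itself.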

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-13 11:40 [syzbot] BUG: corrupted list in netif_napi_add syzbot
@ 2021-10-13 13:35 ` Daniel Borkmann
  2021-10-14 13:50   ` Paolo Abeni
  2021-10-13 14:41 ` Paolo Abeni
  2021-12-14  7:52 ` syzbot
  2 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2021-10-13 13:35 UTC (permalink / raw)
  To: syzbot, andrii, ast, bpf, davem, hawk, john.fastabend, kafai,
	kpsingh, kuba, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs
  Cc: pabeni, toke, joamaki

On 10/13/21 1:40 PM, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:

[ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]

> HEAD commit:    683f29b781ae Add linux-next specific files for 20211008
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=673b3589d970c
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+62e474dd92a35e3060d8@syzkaller.appspotmail.com
> 
> IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
> list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>   <TASK>
>   __list_add_rcu include/linux/rculist.h:79 [inline]
>   list_add_rcu include/linux/rculist.h:106 [inline]
>   netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
>   veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
>   veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
>   veth_xdp_set drivers/net/veth.c:1483 [inline]
>   veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
>   bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
>   bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
>   dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
>   dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
>   dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
>   do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
>   rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
>   __rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
>   rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
>   rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
>   netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
>   netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>   netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
>   netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
>   sock_sendmsg_nosec net/socket.c:704 [inline]
>   sock_sendmsg+0xcf/0x120 net/socket.c:724
>   ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
>   ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
>   __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
>   do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>   do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f841f2718d9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
> RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
>   </TASK>
> Modules linked in:
> ---[ end trace 7281cadbc8534f23 ]---
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
> syzbot can test patches for this issue, for details see:
> https://goo.gl/tpsmEJ#testing-patches
> 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-13 11:40 [syzbot] BUG: corrupted list in netif_napi_add syzbot
  2021-10-13 13:35 ` Daniel Borkmann
@ 2021-10-13 14:41 ` Paolo Abeni
  2021-12-14  7:52 ` syzbot
  2 siblings, 0 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-10-13 14:41 UTC (permalink / raw)
  To: syzbot, andrii, ast, bpf, daniel, davem, hawk, john.fastabend,
	kafai, kpsingh, kuba, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs

On Wed, 2021-10-13 at 04:40 -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    683f29b781ae Add linux-next specific files for 20211008
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=1525a614b00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=673b3589d970c
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=17c98e98b00000
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+62e474dd92a35e3060d8@syzkaller.appspotmail.com
> 
> IPv6: ADDRCONF(NETDEV_CHANGE): vcan0: link becomes ready
> list_add double add: new=ffff888023417160, prev=ffff88807de3a050, next=ffff888023417160.
> ------------[ cut here ]------------
> kernel BUG at lib/list_debug.c:29!
> invalid opcode: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 PID: 9490 Comm: syz-executor.1 Not tainted 5.15.0-rc4-next-20211008-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  __list_add_rcu include/linux/rculist.h:79 [inline]
>  list_add_rcu include/linux/rculist.h:106 [inline]
>  netif_napi_add+0x3fd/0x9c0 net/core/dev.c:6889
>  veth_enable_xdp_range+0x1b1/0x300 drivers/net/veth.c:1009
>  veth_enable_xdp+0x2a5/0x620 drivers/net/veth.c:1063
>  veth_xdp_set drivers/net/veth.c:1483 [inline]
>  veth_xdp+0x4d4/0x780 drivers/net/veth.c:1523
>  bond_xdp_set drivers/net/bonding/bond_main.c:5217 [inline]
>  bond_xdp+0x325/0x920 drivers/net/bonding/bond_main.c:5263
>  dev_xdp_install+0xd5/0x270 net/core/dev.c:9365
>  dev_xdp_attach+0x83d/0x1010 net/core/dev.c:9513
>  dev_change_xdp_fd+0x246/0x300 net/core/dev.c:9753
>  do_setlink+0x2fb4/0x3970 net/core/rtnetlink.c:2931
>  rtnl_group_changelink net/core/rtnetlink.c:3242 [inline]
>  __rtnl_newlink+0xc06/0x1750 net/core/rtnetlink.c:3396
>  rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3506
>  rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5572
>  netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2491
>  netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
>  netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
>  netlink_sendmsg+0x86d/0xda0 net/netlink/af_netlink.c:1916
>  sock_sendmsg_nosec net/socket.c:704 [inline]
>  sock_sendmsg+0xcf/0x120 net/socket.c:724
>  ____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
>  ___sys_sendmsg+0xf3/0x170 net/socket.c:2463
>  __sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f841f2718d9
> Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f841e9e8188 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> RAX: ffffffffffffffda RBX: 00007f841f375f60 RCX: 00007f841f2718d9
> RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
> RBP: 00007f841f2cbcb4 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007ffc8978d37f R14: 00007f841e9e8300 R15: 0000000000022000
>  </TASK>
> Modules linked in:
> ---[ end trace 7281cadbc8534f23 ]---
> RIP: 0010:__list_add_valid.cold+0x26/0x3c lib/list_debug.c:29
> Code: b1 24 c3 fa 4c 89 e1 48 c7 c7 60 56 04 8a e8 f2 8c f1 ff 0f 0b 48 89 f2 4c 89 e1 48 89 ee 48 c7 c7 a0 57 04 8a e8 db 8c f1 ff <0f> 0b 48 89 f1 48 c7 c7 20 57 04 8a 4c 89 e6 e8 c7 8c f1 ff 0f 0b
> RSP: 0018:ffffc90002c26a48 EFLAGS: 00010286
> RAX: 0000000000000058 RBX: 0000000000000040 RCX: 0000000000000000
> RDX: ffff888023263a00 RSI: ffffffff815e0d78 RDI: fffff52000584d3b
> RBP: ffff888023417160 R08: 0000000000000058 R09: 0000000000000000
> R10: ffffffff815dab5e R11: 0000000000000000 R12: ffff888023417160
> R13: ffff888023417000 R14: ffff888023417160 R15: ffff888023417160
> FS:  00007f841e9e8700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 00000000601bd000 CR4: 00000000003506f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

For the record: my wild guess is that this is related to:

https://syzkaller.appspot.com/bug?extid=67f89551088ea1a6850e

(hopefully they share the same root cause)

I spent some time investigating the latter, with no real clue. This has
a repro, so I'll ask syzbot to provide more info with debug patches.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-13 13:35 ` Daniel Borkmann
@ 2021-10-14 13:50   ` Paolo Abeni
  2021-10-18 14:04     ` Vlad Buslov
  2021-10-19 10:11     ` Jussi Maki
  0 siblings, 2 replies; 13+ messages in thread
From: Paolo Abeni @ 2021-10-14 13:50 UTC (permalink / raw)
  To: Daniel Borkmann, syzbot, andrii, ast, bpf, davem, hawk,
	john.fastabend, kafai, kpsingh, kuba, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs
  Cc: toke, joamaki

On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> On 10/13/21 1:40 PM, syzbot wrote:
> > Hello,
> > 
> > syzbot found the following issue on:
> 
> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]

For the record: Toke and I are actively investigating this issue and
the other recent related one. So far we have not found anything
relevant.

The only note is that the reproducer is not extremely reliable - I
could not reproduce locally, and multiple syzbot runs on the same code
give different results. Anyhow, so far the issue was only observable
on a specific 'next' commit which is currently "not reachable" from any
branch. I'm wondering if the issue was caused by some inconsistent
state of that tree.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-14 13:50   ` Paolo Abeni
@ 2021-10-18 14:04     ` Vlad Buslov
  2021-10-18 15:42       ` Jakub Kicinski
  2021-10-19 10:11     ` Jussi Maki
  1 sibling, 1 reply; 13+ messages in thread
From: Vlad Buslov @ 2021-10-18 14:04 UTC (permalink / raw)
  To: Paolo Abeni, Daniel Borkmann
  Cc: syzbot, andrii, ast, bpf, davem, hawk, john.fastabend, kafai,
	kpsingh, kuba, linux-kernel, netdev, songliubraving,
	syzkaller-bugs, yhs, toke, joamaki, Saeed Mahameed,
	Maxim Mikityanskiy


On Thu 14 Oct 2021 at 16:50, Paolo Abeni <pabeni@redhat.com> wrote:
> On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
>> On 10/13/21 1:40 PM, syzbot wrote:
>> > Hello,
>> > 
>> > syzbot found the following issue on:
>> 
>> [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]
>
> For the record: Toke and I are actively investigating this issue and
> the other recent related one. So far we have not found anything
> relevant.
>
> The only note is that the reproducer is not extremely reliable - I
> could not reproduce locally, and multiple syzbot runs on the same code
> give different results. Anyhow, so far the issue was only observable
> on a specific 'next' commit which is currently "not reachable" from any
> branch. I'm wondering if the issue was caused by some inconsistent
> state of that tree.

Hi,

We got a use-after-free with a very similar trace [0] during our nightly
regression run. The issue happens when the ip link up/down state is
flipped several times in a loop, and it doesn't reproduce for me
manually. The fact that it didn't reproduce after I ran the test ten
times suggests that it is either very hard to reproduce or that it is
the result of some interaction between several tests in our suite.

[0]:

[ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
 [ 3187.890694] ==================================================================
 [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
 [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
 [ 3187.895683] 
 [ 3187.896209] CPU: 0 PID: 119618 Comm: ip Not tainted 5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
 [ 3187.898445] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 3187.901075] Call Trace:
 [ 3187.901858]  dump_stack_lvl+0x57/0x7d
 [ 3187.902899]  print_address_description.constprop.0+0x1f/0x140
 [ 3187.904346]  ? __list_add_valid+0xc3/0xf0
 [ 3187.905439]  ? __list_add_valid+0xc3/0xf0
 [ 3187.906565]  kasan_report.cold+0x83/0xdf
 [ 3187.907619]  ? __list_add_valid+0xc3/0xf0
 [ 3187.908693]  __list_add_valid+0xc3/0xf0
 [ 3187.909765]  netif_napi_add+0x399/0x9a0
 [ 3187.910794]  ? kmalloc_order_trace+0x6a/0x120
 [ 3187.911944]  mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
 [ 3187.913872]  ? rwlock_bug.part.0+0x90/0x90
 [ 3187.914959]  ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
 [ 3187.916584]  ? mutex_is_locked+0x13/0x50
 [ 3187.917703]  mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
 [ 3187.919368]  mlx5e_open+0x35/0xb0 [mlx5_core]
 [ 3187.920863]  __dev_open+0x22f/0x420
 [ 3187.921852]  ? dev_set_rx_mode+0x80/0x80
 [ 3187.922920]  ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
 [ 3187.924866]  ? __local_bh_enable_ip+0xa2/0x100
 [ 3187.926148]  ? trace_hardirqs_on+0x32/0x120
 [ 3187.927270]  __dev_change_flags+0x451/0x670
 [ 3187.928387]  ? dev_set_allmulti+0x10/0x10
 [ 3187.929480]  ? rtnl_fill_vfinfo+0x936/0xdb0
 [ 3187.930592]  dev_change_flags+0x8b/0x150
 [ 3187.931651]  do_setlink+0x820/0x2d60
 [ 3187.932631]  ? rtnetlink_put_metrics+0x490/0x490
 [ 3187.933852]  ? lock_release+0x460/0x750
 [ 3187.934881]  ? kvm_async_pf_task_wake+0x410/0x410
 [ 3187.936122]  ? lock_downgrade+0x6e0/0x6e0
 [ 3187.937203]  ? do_raw_spin_unlock+0x54/0x220
 [ 3187.938351]  ? memset+0x20/0x40
 [ 3187.939246]  ? __nla_validate_parse+0xb2/0x22c0
 [ 3187.940426]  ? do_raw_spin_lock+0x126/0x270
 [ 3187.941568]  ? push_cpu_stop+0x830/0x830
 [ 3187.942638]  ? rwlock_bug.part.0+0x90/0x90
 [ 3187.943733]  ? devlink_compat_switch_id_get+0xbb/0x100
 [ 3187.945065]  ? nla_get_range_signed+0x540/0x540
 [ 3187.946272]  ? memcpy+0x39/0x60
 [ 3187.947162]  ? memset+0x20/0x40
 [ 3187.948058]  ? memset+0x20/0x40
 [ 3187.948943]  __rtnl_newlink+0xac0/0x1370
 [ 3187.950038]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3187.951380]  ? rtnl_setlink+0x330/0x330
 [ 3187.952417]  ? deref_stack_reg+0x160/0x160
 [ 3187.953534]  ? deref_stack_reg+0xe6/0x160
 [ 3187.954619]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3187.955848]  ? lock_release+0x460/0x750
 [ 3187.956886]  ? is_bpf_text_address+0x54/0x110
 [ 3187.958047]  ? lock_downgrade+0x6e0/0x6e0
 [ 3187.959133]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3187.960469]  ? deref_stack_reg+0x160/0x160
 [ 3187.961592]  ? is_bpf_text_address+0x73/0x110
 [ 3187.962759]  ? kernel_text_address+0xda/0x100
 [ 3187.963920]  ? __kernel_text_address+0xe/0x30
 [ 3187.965069]  ? unwind_get_return_address+0x56/0xa0
 [ 3187.966334]  ? __thaw_task+0x70/0x70
 [ 3187.967320]  ? arch_stack_walk+0x98/0xf0
 [ 3187.968405]  ? lock_downgrade+0x6e0/0x6e0
 [ 3187.969510]  ? trace_hardirqs_on+0x32/0x120
 [ 3187.970644]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3187.971883]  rtnl_newlink+0x5f/0x90
 [ 3187.972866]  rtnetlink_rcv_msg+0x32b/0x950
 [ 3187.973968]  ? deref_stack_reg+0x160/0x160
 [ 3187.975088]  ? rtnl_fdb_dump+0x830/0x830
 [ 3187.976160]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3187.977393]  ? lock_acquire+0x38d/0x4c0
 [ 3187.978443]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3187.979685]  ? lock_acquire+0x38d/0x4c0
 [ 3187.980733]  netlink_rcv_skb+0x11d/0x340
 [ 3187.981812]  ? rtnl_fdb_dump+0x830/0x830
 [ 3187.982862]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3187.984105]  ? netlink_ack+0x930/0x930
 [ 3187.985136]  ? netlink_deliver_tap+0x140/0xb10
 [ 3187.986316]  ? netlink_deliver_tap+0x14c/0xb10
 [ 3187.987495]  ? _copy_from_iter+0x282/0xbe0
 [ 3187.988597]  netlink_unicast+0x433/0x700
 [ 3187.989693]  ? netlink_attachskb+0x740/0x740
 [ 3187.990819]  ? __alloc_skb+0x117/0x2c0
 [ 3187.991855]  netlink_sendmsg+0x707/0xbf0
 [ 3187.992921]  ? netlink_unicast+0x700/0x700
 [ 3187.994024]  ? netlink_unicast+0x700/0x700
 [ 3187.995121]  sock_sendmsg+0xb0/0xe0
 [ 3187.996091]  ____sys_sendmsg+0x4fa/0x6d0
 [ 3187.997163]  ? iovec_from_user+0x136/0x280
 [ 3187.998276]  ? kernel_sendmsg+0x30/0x30
 [ 3188.012806]  ? __import_iovec+0x51/0x610
 [ 3188.013858]  ___sys_sendmsg+0x12e/0x1b0
 [ 3188.014875]  ? do_recvmmsg+0x500/0x500
 [ 3188.015877]  ? get_max_files+0x10/0x10
 [ 3188.016866]  ? kasan_record_aux_stack+0xab/0xc0
 [ 3188.018108]  ? call_rcu+0x87/0xd40
 [ 3188.019041]  ? task_work_run+0xc5/0x160
 [ 3188.020044]  ? exit_to_user_mode_prepare+0x1d9/0x1e0
 [ 3188.021271]  ? syscall_exit_to_user_mode+0x19/0x50
 [ 3188.022563]  ? do_syscall_64+0x4a/0x90
 [ 3188.023559]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.024858]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.026121]  ? lock_release+0x460/0x750
 [ 3188.027174]  ? mntput_no_expire+0x113/0xb40
 [ 3188.028302]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.029398]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.030555]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.031812]  ? mntput_no_expire+0x132/0xb40
 [ 3188.032940]  ? __fget_light+0x51/0x220
 [ 3188.033986]  __sys_sendmsg+0xa4/0x120
 [ 3188.034992]  ? __sys_sendmsg_sock+0x20/0x20
 [ 3188.036115]  ? call_rcu+0x543/0xd40
 [ 3188.037084]  ? syscall_enter_from_user_mode+0x1d/0x50
 [ 3188.038406]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.039515]  do_syscall_64+0x3d/0x90
 [ 3188.040502]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.041896] RIP: 0033:0x7f904ec94c17
 [ 3188.042891] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 [ 3188.047412] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 [ 3188.049361] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
 [ 3188.051121] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
 [ 3188.052881] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
 [ 3188.054645] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
 [ 3188.056403] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
 [ 3188.058189] 
 [ 3188.058732] The buggy address belongs to the page:
 [ 3188.059996] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
 [ 3188.062378] flags: 0x8000000000000000(zone=2)
 [ 3188.063551] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
 [ 3188.065548] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 [ 3188.067518] page dumped because: kasan: bad access detected
 [ 3188.068930] 
 [ 3188.069481] Memory state around the buggy address:
 [ 3188.070730]  ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.072618]  ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.074508] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.076378]                                         ^
 [ 3188.077711]  ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.079584]  ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.081470] ==================================================================
 [ 3188.083406] ==================================================================
 [ 3188.085280] BUG: KASAN: use-after-free in netif_napi_add+0x8b7/0x9a0
 [ 3188.086952] Write of size 8 at addr ffff8881150b3fb8 by task ip/119618
 [ 3188.089181] 
 [ 3188.089987] CPU: 0 PID: 119618 Comm: ip Tainted: G    B             5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
 [ 3188.092659] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 3188.095481] Call Trace:
 [ 3188.096222]  dump_stack_lvl+0x57/0x7d
 [ 3188.097238]  print_address_description.constprop.0+0x1f/0x140
 [ 3188.098764]  ? netif_napi_add+0x8b7/0x9a0
 [ 3188.099862]  ? netif_napi_add+0x8b7/0x9a0
 [ 3188.100940]  kasan_report.cold+0x83/0xdf
 [ 3188.102041]  ? netif_napi_add+0x8b7/0x9a0
 [ 3188.103140]  netif_napi_add+0x8b7/0x9a0
 [ 3188.104180]  ? kmalloc_order_trace+0x6a/0x120
 [ 3188.105336]  mlx5e_open_channels+0x91b/0x2e10 [mlx5_core]
 [ 3188.107145]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.108238]  ? mlx5e_close_cq+0x80/0x80 [mlx5_core]
 [ 3188.109882]  ? mutex_is_locked+0x13/0x50
 [ 3188.110985]  mlx5e_open_locked+0x6a/0x1f0 [mlx5_core]
 [ 3188.112644]  mlx5e_open+0x35/0xb0 [mlx5_core]
 [ 3188.114215]  __dev_open+0x22f/0x420
 [ 3188.115186]  ? dev_set_rx_mode+0x80/0x80
 [ 3188.116247]  ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
 [ 3188.118252]  ? __local_bh_enable_ip+0xa2/0x100
 [ 3188.119438]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.120554]  __dev_change_flags+0x451/0x670
 [ 3188.121705]  ? dev_set_allmulti+0x10/0x10
 [ 3188.122828]  ? rtnl_fill_vfinfo+0x936/0xdb0
 [ 3188.123943]  dev_change_flags+0x8b/0x150
 [ 3188.124995]  do_setlink+0x820/0x2d60
 [ 3188.126023]  ? rtnetlink_put_metrics+0x490/0x490
 [ 3188.127233]  ? lock_release+0x460/0x750
 [ 3188.128269]  ? kvm_async_pf_task_wake+0x410/0x410
 [ 3188.129502]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.130620]  ? do_raw_spin_unlock+0x54/0x220
 [ 3188.131781]  ? memset+0x20/0x40
 [ 3188.132663]  ? __nla_validate_parse+0xb2/0x22c0
 [ 3188.133894]  ? do_raw_spin_lock+0x126/0x270
 [ 3188.135066]  ? push_cpu_stop+0x830/0x830
 [ 3188.136136]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.137230]  ? devlink_compat_switch_id_get+0xbb/0x100
 [ 3188.138585]  ? nla_get_range_signed+0x540/0x540
 [ 3188.139780]  ? memcpy+0x39/0x60
 [ 3188.140683]  ? memset+0x20/0x40
 [ 3188.141580]  ? memset+0x20/0x40
 [ 3188.142517]  __rtnl_newlink+0xac0/0x1370
 [ 3188.143579]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.144914]  ? rtnl_setlink+0x330/0x330
 [ 3188.145974]  ? deref_stack_reg+0x160/0x160
 [ 3188.147078]  ? deref_stack_reg+0xe6/0x160
 [ 3188.148157]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.149378]  ? lock_release+0x460/0x750
 [ 3188.150490]  ? is_bpf_text_address+0x54/0x110
 [ 3188.151648]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.152725]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.154075]  ? deref_stack_reg+0x160/0x160
 [ 3188.155176]  ? is_bpf_text_address+0x73/0x110
 [ 3188.156353]  ? kernel_text_address+0xda/0x100
 [ 3188.157510]  ? __kernel_text_address+0xe/0x30
 [ 3188.158707]  ? unwind_get_return_address+0x56/0xa0
 [ 3188.159992]  ? __thaw_task+0x70/0x70
 [ 3188.160979]  ? arch_stack_walk+0x98/0xf0
 [ 3188.162072]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.163167]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.164295]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.165546]  rtnl_newlink+0x5f/0x90
 [ 3188.166558]  rtnetlink_rcv_msg+0x32b/0x950
 [ 3188.167677]  ? deref_stack_reg+0x160/0x160
 [ 3188.168782]  ? rtnl_fdb_dump+0x830/0x830
 [ 3188.169857]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.171089]  ? lock_acquire+0x38d/0x4c0
 [ 3188.172131]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.173367]  ? lock_acquire+0x38d/0x4c0
 [ 3188.174472]  netlink_rcv_skb+0x11d/0x340
 [ 3188.175531]  ? rtnl_fdb_dump+0x830/0x830
 [ 3188.176592]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.177824]  ? netlink_ack+0x930/0x930
 [ 3188.178848]  ? netlink_deliver_tap+0x140/0xb10
 [ 3188.180013]  ? netlink_deliver_tap+0x14c/0xb10
 [ 3188.181188]  ? _copy_from_iter+0x282/0xbe0
 [ 3188.182351]  netlink_unicast+0x433/0x700
 [ 3188.183418]  ? netlink_attachskb+0x740/0x740
 [ 3188.184552]  ? __alloc_skb+0x117/0x2c0
 [ 3188.185606]  netlink_sendmsg+0x707/0xbf0
 [ 3188.186672]  ? netlink_unicast+0x700/0x700
 [ 3188.187783]  ? netlink_unicast+0x700/0x700
 [ 3188.188882]  sock_sendmsg+0xb0/0xe0
 [ 3188.189862]  ____sys_sendmsg+0x4fa/0x6d0
 [ 3188.190971]  ? iovec_from_user+0x136/0x280
 [ 3188.192074]  ? kernel_sendmsg+0x30/0x30
 [ 3188.193130]  ? __import_iovec+0x51/0x610
 [ 3188.194225]  ___sys_sendmsg+0x12e/0x1b0
 [ 3188.195267]  ? do_recvmmsg+0x500/0x500
 [ 3188.196301]  ? get_max_files+0x10/0x10
 [ 3188.197333]  ? kasan_record_aux_stack+0xab/0xc0
 [ 3188.198558]  ? call_rcu+0x87/0xd40
 [ 3188.199519]  ? task_work_run+0xc5/0x160
 [ 3188.200557]  ? exit_to_user_mode_prepare+0x1d9/0x1e0
 [ 3188.201872]  ? syscall_exit_to_user_mode+0x19/0x50
 [ 3188.203134]  ? do_syscall_64+0x4a/0x90
 [ 3188.204152]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.205511]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.206782]  ? lock_release+0x460/0x750
 [ 3188.207870]  ? mntput_no_expire+0x113/0xb40
 [ 3188.209025]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.210272]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.211864]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.213644]  ? mntput_no_expire+0x132/0xb40
 [ 3188.215253]  ? __fget_light+0x51/0x220
 [ 3188.216535]  __sys_sendmsg+0xa4/0x120
 [ 3188.217574]  ? __sys_sendmsg_sock+0x20/0x20
 [ 3188.218707]  ? call_rcu+0x543/0xd40
 [ 3188.219679]  ? syscall_enter_from_user_mode+0x1d/0x50
 [ 3188.221004]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.235475]  do_syscall_64+0x3d/0x90
 [ 3188.236463]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.237744] RIP: 0033:0x7f904ec94c17
 [ 3188.238693] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 [ 3188.242968] RSP: 002b:00007ffc1a6c4a98 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 [ 3188.244834] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f904ec94c17
 [ 3188.246604] RDX: 0000000000000000 RSI: 00007ffc1a6c4b00 RDI: 0000000000000003
 [ 3188.248362] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007f904ed55a40
 [ 3188.250140] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
 [ 3188.251889] R13: 00007ffc1a6c51b0 R14: 00007ffc1a6c6c87 R15: 000000000048f520
 [ 3188.253667] 
 [ 3188.254215] The buggy address belongs to the page:
 [ 3188.255460] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
 [ 3188.257812] flags: 0x8000000000000000(zone=2)
 [ 3188.258985] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
 [ 3188.260971] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 [ 3188.262993] page dumped because: kasan: bad access detected
 [ 3188.264413] 
 [ 3188.264943] Memory state around the buggy address:
 [ 3188.266203]  ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.268082]  ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.269957] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.271818]                                         ^
 [ 3188.273122]  ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.275000]  ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.276862] ==================================================================
 [ 3188.371511] mlx5_core 0000:08:00.0 enp8s0f0: Link up
 [ 3188.376126] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0f0: link becomes ready
 [ 3188.430532] ==================================================================
 [ 3188.432378] BUG: KASAN: use-after-free in __list_del_entry_valid+0x14b/0x180
 [ 3188.434254] Read of size 8 at addr ffff8881150b3fb8 by task ip/119619
 [ 3188.435826] 
 [ 3188.436365] CPU: 3 PID: 119619 Comm: ip Tainted: G    B             5.15.0-rc5_for_upstream_debug_2021_10_17_12_06 #1
 [ 3188.439688] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 [ 3188.442423] Call Trace:
 [ 3188.443172]  dump_stack_lvl+0x57/0x7d
 [ 3188.444186]  print_address_description.constprop.0+0x1f/0x140
 [ 3188.445703]  ? __list_del_entry_valid+0x14b/0x180
 [ 3188.447004]  ? __list_del_entry_valid+0x14b/0x180
 [ 3188.448255]  kasan_report.cold+0x83/0xdf
 [ 3188.449323]  ? __list_del_entry_valid+0x14b/0x180
 [ 3188.450670]  __list_del_entry_valid+0x14b/0x180
 [ 3188.451887]  ? _raw_spin_unlock+0x1f/0x30
 [ 3188.452969]  __netif_napi_del.part.0+0xec/0x4a0
 [ 3188.454453]  mlx5e_close_channel+0x7d/0xd0 [mlx5_core]
 [ 3188.456988]  mlx5e_close_channels+0xf9/0x200 [mlx5_core]
 [ 3188.459599]  mlx5e_close_locked+0x101/0x130 [mlx5_core]
 [ 3188.462156]  mlx5e_close+0xad/0x100 [mlx5_core]
 [ 3188.463961]  __dev_close_many+0x18e/0x2b0
 [ 3188.465045]  ? list_netdevice+0x3a0/0x3a0
 [ 3188.466187]  ? __mlx5_eswitch_set_vport_vlan+0x290/0x290 [mlx5_core]
 [ 3188.468156]  ? __local_bh_enable_ip+0xa2/0x100
 [ 3188.469333]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.470496]  __dev_change_flags+0x254/0x670
 [ 3188.471605]  ? dev_set_allmulti+0x10/0x10
 [ 3188.472692]  ? rtnl_fill_vfinfo+0x936/0xdb0
 [ 3188.473854]  dev_change_flags+0x8b/0x150
 [ 3188.474965]  do_setlink+0x820/0x2d60
 [ 3188.475950]  ? rtnetlink_put_metrics+0x490/0x490
 [ 3188.477165]  ? lock_release+0x460/0x750
 [ 3188.478306]  ? kvm_async_pf_task_wake+0x410/0x410
 [ 3188.479542]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.480615]  ? do_raw_spin_unlock+0x54/0x220
 [ 3188.481790]  ? memset+0x20/0x40
 [ 3188.482963]  ? __nla_validate_parse+0xb2/0x22c0
 [ 3188.484167]  ? do_raw_spin_lock+0x126/0x270
 [ 3188.485281]  ? push_cpu_stop+0x830/0x830
 [ 3188.486457]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.487557]  ? devlink_compat_switch_id_get+0xbb/0x100
 [ 3188.488894]  ? nla_get_range_signed+0x540/0x540
 [ 3188.490168]  ? memcpy+0x39/0x60
 [ 3188.491083]  ? memset+0x20/0x40
 [ 3188.491966]  ? memset+0x20/0x40
 [ 3188.492855]  __rtnl_newlink+0xac0/0x1370
 [ 3188.493987]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.495384]  ? rtnl_setlink+0x330/0x330
 [ 3188.496446]  ? deref_stack_reg+0x160/0x160
 [ 3188.497551]  ? deref_stack_reg+0xe6/0x160
 [ 3188.498713]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.499929]  ? lock_release+0x460/0x750
 [ 3188.501232]  ? is_bpf_text_address+0x54/0x110
 [ 3188.502735]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.503831]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.505157]  ? deref_stack_reg+0x160/0x160
 [ 3188.506298]  ? is_bpf_text_address+0x73/0x110
 [ 3188.507459]  ? kernel_text_address+0xda/0x100
 [ 3188.508615]  ? __kernel_text_address+0xe/0x30
 [ 3188.509776]  ? unwind_get_return_address+0x56/0xa0
 [ 3188.511047]  ? __thaw_task+0x70/0x70
 [ 3188.512033]  ? arch_stack_walk+0x98/0xf0
 [ 3188.513059]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.514191]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.515303]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.516524]  rtnl_newlink+0x5f/0x90
 [ 3188.517513]  rtnetlink_rcv_msg+0x32b/0x950
 [ 3188.518652]  ? deref_stack_reg+0x160/0x160
 [ 3188.519761]  ? rtnl_fdb_dump+0x830/0x830
 [ 3188.520816]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.522119]  ? lock_acquire+0x38d/0x4c0
 [ 3188.523211]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.524435]  ? lock_acquire+0x38d/0x4c0
 [ 3188.525498]  netlink_rcv_skb+0x11d/0x340
 [ 3188.526649]  ? rtnl_fdb_dump+0x830/0x830
 [ 3188.527722]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.528949]  ? netlink_ack+0x930/0x930
 [ 3188.530055]  ? netlink_deliver_tap+0x140/0xb10
 [ 3188.531347]  ? netlink_deliver_tap+0x14c/0xb10
 [ 3188.532549]  ? _copy_from_iter+0x282/0xbe0
 [ 3188.533711]  netlink_unicast+0x433/0x700
 [ 3188.534845]  ? netlink_attachskb+0x740/0x740
 [ 3188.535987]  ? __alloc_skb+0x117/0x2c0
 [ 3188.537006]  netlink_sendmsg+0x707/0xbf0
 [ 3188.538150]  ? netlink_unicast+0x700/0x700
 [ 3188.539337]  ? netlink_unicast+0x700/0x700
 [ 3188.540448]  sock_sendmsg+0xb0/0xe0
 [ 3188.541424]  ____sys_sendmsg+0x4fa/0x6d0
 [ 3188.542743]  ? iovec_from_user+0x136/0x280
 [ 3188.543932]  ? kernel_sendmsg+0x30/0x30
 [ 3188.544963]  ? __import_iovec+0x51/0x610
 [ 3188.546063]  ___sys_sendmsg+0x12e/0x1b0
 [ 3188.547189]  ? do_recvmmsg+0x500/0x500
 [ 3188.548209]  ? get_max_files+0x10/0x10
 [ 3188.549226]  ? kasan_record_aux_stack+0xab/0xc0
 [ 3188.550547]  ? call_rcu+0x87/0xd40
 [ 3188.551509]  ? task_work_run+0xc5/0x160
 [ 3188.552546]  ? exit_to_user_mode_prepare+0x1d9/0x1e0
 [ 3188.553896]  ? syscall_exit_to_user_mode+0x19/0x50
 [ 3188.555195]  ? do_syscall_64+0x4a/0x90
 [ 3188.556206]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.557634]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.558903]  ? lock_release+0x460/0x750
 [ 3188.559948]  ? mntput_no_expire+0x113/0xb40
 [ 3188.561059]  ? lock_downgrade+0x6e0/0x6e0
 [ 3188.562231]  ? rwlock_bug.part.0+0x90/0x90
 [ 3188.563338]  ? rcu_read_lock_sched_held+0x12/0x70
 [ 3188.564583]  ? mntput_no_expire+0x132/0xb40
 [ 3188.565731]  ? __fget_light+0x51/0x220
 [ 3188.566858]  __sys_sendmsg+0xa4/0x120
 [ 3188.567878]  ? __sys_sendmsg_sock+0x20/0x20
 [ 3188.568995]  ? call_rcu+0x543/0xd40
 [ 3188.570047]  ? syscall_enter_from_user_mode+0x1d/0x50
 [ 3188.571387]  ? trace_hardirqs_on+0x32/0x120
 [ 3188.572502]  do_syscall_64+0x3d/0x90
 [ 3188.573491]  entry_SYSCALL_64_after_hwframe+0x44/0xae
 [ 3188.574916] RIP: 0033:0x7fc68ffd4c17
 [ 3188.575900] Code: 0c 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
 [ 3188.580625] RSP: 002b:00007ffd26634f18 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
 [ 3188.582945] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc68ffd4c17
 [ 3188.585684] RDX: 0000000000000000 RSI: 00007ffd26634f80 RDI: 0000000000000003
 [ 3188.587965] RBP: 00000000616c5eef R08: 0000000000000001 R09: 00007fc690095a40
 [ 3188.589788] R10: fffffffffffff3d6 R11: 0000000000000246 R12: 0000000000000001
 [ 3188.591618] R13: 00007ffd26635630 R14: 00007ffd26635c85 R15: 000000000048f520
 [ 3188.593365] 
 [ 3188.593953] The buggy address belongs to the page:
 [ 3188.595288] page:000000003ccb70fc refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1150b3
 [ 3188.597966] flags: 0x8000000000000000(zone=2)
 [ 3188.599643] raw: 8000000000000000 0000000000000000 dead000000000122 0000000000000000
 [ 3188.601766] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
 [ 3188.603786] page dumped because: kasan: bad access detected
 [ 3188.622507] 
 [ 3188.623291] Memory state around the buggy address:
 [ 3188.625031]  ffff8881150b3e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.627617]  ffff8881150b3f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.630275] >ffff8881150b3f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.632956]                                         ^
 [ 3188.634838]  ffff8881150b4000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.637544]  ffff8881150b4080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 [ 3188.640221] ==================================================================

[...]


* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-18 14:04     ` Vlad Buslov
@ 2021-10-18 15:42       ` Jakub Kicinski
  2021-10-18 16:12         ` Vlad Buslov
  2021-10-18 17:40         ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 13+ messages in thread
From: Jakub Kicinski @ 2021-10-18 15:42 UTC (permalink / raw)
  To: Vlad Buslov
  Cc: Paolo Abeni, Daniel Borkmann, syzbot, andrii, ast, bpf, davem,
	hawk, john.fastabend, kafai, kpsingh, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs, toke, joamaki,
	Saeed Mahameed, Maxim Mikityanskiy

On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> We got a use-after-free with very similar trace [0] during nightly
> regression. The issue happens when ip link up/down state is flipped
> several times in loop and doesn't reproduce for me manually. The fact
> that it didn't reproduce for me after running test ten times suggests
> that it is either very hard to reproduce or that it is a result of some
> interaction between several tests in our suite.
> 
> [0]:
> 
> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
>  [ 3187.890694] ==================================================================
>  [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
>  [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618

Hm, not sure how similar it is. This one looks like channel was freed
without deleting NAPI. Do you have list debug enabled?
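
For readers unfamiliar with the list debug question: a list-debug build sanity-checks the neighbouring pointers before every insertion, which is where the "list_add double add" message in the original report comes from. The following is a minimal userspace sketch of that kind of check, not the kernel's code. The names list_node, list_add_valid and list_add_checked are hypothetical, and the real checks report the problem instead of returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, modelled on struct list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Simplified stand-in for the sanity checks a list-debug build performs
 * before inserting `entry` between `prev` and `next`.
 */
static bool list_add_valid(const struct list_node *entry,
                           const struct list_node *prev,
                           const struct list_node *next)
{
    if (next->prev != prev)              /* neighbour links disagree */
        return false;
    if (prev->next != next)
        return false;
    if (entry == prev || entry == next)  /* entry is already on the list */
        return false;
    return true;
}

/* Insert `entry` right after `head`; refuse instead of corrupting the list. */
static bool list_add_checked(struct list_node *entry, struct list_node *head)
{
    struct list_node *prev = head;
    struct list_node *next = head->next;

    assert(entry != NULL && head != NULL);
    if (!list_add_valid(entry, prev, next))
        return false;

    next->prev = entry;
    entry->next = next;
    entry->prev = prev;
    prev->next = entry;
    return true;
}
```

Adding the same node twice in a row trips the `entry == next` test, matching the new == next addresses printed in the original report. If the memory holding a node is freed while the node is still linked, the neighbours keep pointing into freed memory, and the next insertion or deletion dereferences it. That access is what KASAN reports here.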


* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-18 15:42       ` Jakub Kicinski
@ 2021-10-18 16:12         ` Vlad Buslov
  2021-10-18 23:31           ` Saeed Mahameed
  2021-10-18 17:40         ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 13+ messages in thread
From: Vlad Buslov @ 2021-10-18 16:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Paolo Abeni, Daniel Borkmann, syzbot, andrii, ast, bpf, davem,
	hawk, john.fastabend, kafai, kpsingh, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs, toke, joamaki,
	Saeed Mahameed, Maxim Mikityanskiy

On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
>> We got a use-after-free with very similar trace [0] during nightly
>> regression. The issue happens when ip link up/down state is flipped
>> several times in loop and doesn't reproduce for me manually. The fact
>> that it didn't reproduce for me after running test ten times suggests
>> that it is either very hard to reproduce or that it is a result of some
>> interaction between several tests in our suite.
>> 
>> [0]:
>> 
>> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
>>  [ 3187.890694] ==================================================================
>>  [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
>>  [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
>
> Hm, not sure how similar it is. This one looks like channel was freed
> without deleting NAPI. Do you have list debug enabled?

Yes, CONFIG_DEBUG_LIST is enabled.



* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-18 15:42       ` Jakub Kicinski
  2021-10-18 16:12         ` Vlad Buslov
@ 2021-10-18 17:40         ` Toke Høiland-Jørgensen
  2021-10-18 17:58           ` Jakub Kicinski
  1 sibling, 1 reply; 13+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-10-18 17:40 UTC (permalink / raw)
  To: Jakub Kicinski, Vlad Buslov
  Cc: Paolo Abeni, Daniel Borkmann, syzbot, andrii, ast, bpf, davem,
	hawk, john.fastabend, kafai, kpsingh, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs, joamaki, Saeed Mahameed,
	Maxim Mikityanskiy

Jakub Kicinski <kuba@kernel.org> writes:

> On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
>> We got a use-after-free with very similar trace [0] during nightly
>> regression. The issue happens when ip link up/down state is flipped
>> several times in loop and doesn't reproduce for me manually. The fact
>> that it didn't reproduce for me after running test ten times suggests
>> that it is either very hard to reproduce or that it is a result of some
>> interaction between several tests in our suite.
>> 
>> [0]:
>> 
>> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
>>  [ 3187.890694] ==================================================================
>>  [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
>>  [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
>
> Hm, not sure how similar it is. This one looks like channel was freed
> without deleting NAPI. Do you have list debug enabled?

Well, the other report[0] also kinda looks like the NAPI thread keeps
running after it should have been disabled, so maybe they are in fact
related?

-Toke

[0] https://lore.kernel.org/r/000000000000c1524005cdeacc5f@google.com


* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-18 17:40         ` Toke Høiland-Jørgensen
@ 2021-10-18 17:58           ` Jakub Kicinski
  0 siblings, 0 replies; 13+ messages in thread
From: Jakub Kicinski @ 2021-10-18 17:58 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Vlad Buslov, Paolo Abeni, Daniel Borkmann, syzbot, andrii, ast,
	bpf, davem, hawk, john.fastabend, kafai, kpsingh, linux-kernel,
	netdev, songliubraving, syzkaller-bugs, yhs, joamaki,
	Saeed Mahameed, Maxim Mikityanskiy

On Mon, 18 Oct 2021 19:40:40 +0200 Toke Høiland-Jørgensen wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
> 
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:  
> >> We got a use-after-free with very similar trace [0] during nightly
> >> regression. The issue happens when ip link up/down state is flipped
> >> several times in loop and doesn't reproduce for me manually. The fact
> >> that it didn't reproduce for me after running test ten times suggests
> >> that it is either very hard to reproduce or that it is a result of some
> >> interaction between several tests in our suite.
> >> 
> >> [0]:
> >> 
> >> [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> >>  [ 3187.890694] ==================================================================
> >>  [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> >>  [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618  
> >
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?  
> 
> Well, the other report[0] also kinda looks like the NAPI thread keeps
> running after it should have been disabled, so maybe they are in fact
> related?
> 
> [0] https://lore.kernel.org/r/000000000000c1524005cdeacc5f@google.com

Could be, if napi->state gets corrupted it may lose NAPI_STATE_LISTED.

719c57197010 ("net: make napi_disable() symmetric with enable")
3765996e4f0b ("napi: fix race inside napi_enable")
is the only thing that comes to mind, but they look fine to me.
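
To illustrate why losing NAPI_STATE_LISTED would matter, here is a userspace sketch. It assumes the add and delete paths gate on that bit, so that an entry is linked only if the bit was clear and unlinked only if it was set. Everything in it (fake_napi, STATE_LISTED, the on_list flag) is a hypothetical stand-in, not the kernel's data structure.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define STATE_LISTED (1u << 0)  /* hypothetical stand-in for NAPI_STATE_LISTED */

struct fake_napi {
    atomic_uint state;
    bool on_list;   /* stands in for membership of the device's NAPI list */
};

/* Add: link the entry only if LISTED was not already set. */
static bool fake_napi_add(struct fake_napi *n)
{
    unsigned int old;

    assert(n);
    old = atomic_fetch_or(&n->state, STATE_LISTED);
    if (old & STATE_LISTED)
        return false;           /* already listed: refuse a double add */
    n->on_list = true;
    return true;
}

/* Del: unlink the entry only if LISTED was set. */
static bool fake_napi_del(struct fake_napi *n)
{
    unsigned int old;

    assert(n);
    old = atomic_fetch_and(&n->state, ~STATE_LISTED);
    if (!(old & STATE_LISTED))
        return false;           /* bit lost: the entry stays linked */
    n->on_list = false;
    return true;
}
```

In this model, if something clears the state word between add and del, the del becomes a no-op and the entry remains on the list. Freeing the containing object afterwards leaves a dangling list entry, which is consistent with the "channel was freed without deleting NAPI" reading above.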



* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-18 16:12         ` Vlad Buslov
@ 2021-10-18 23:31           ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2021-10-18 23:31 UTC (permalink / raw)
  To: Vlad Buslov, kuba
  Cc: songliubraving, hawk, syzkaller-bugs, kafai, davem,
	john.fastabend, andrii, linux-kernel, pabeni, kpsingh, ast,
	joamaki, yhs, toke, daniel, bpf, Maxim Mikityanskiy, netdev,
	syzbot+62e474dd92a35e3060d8

On Mon, 2021-10-18 at 19:12 +0300, Vlad Buslov wrote:
> On Mon 18 Oct 2021 at 18:42, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Mon, 18 Oct 2021 17:04:19 +0300 Vlad Buslov wrote:
> > > We got a use-after-free with very similar trace [0] during nightly
> > > regression. The issue happens when ip link up/down state is flipped
> > > several times in loop and doesn't reproduce for me manually. The fact
> > > that it didn't reproduce for me after running test ten times suggests
> > > that it is either very hard to reproduce or that it is a result of some
> > > interaction between several tests in our suite.
> > > 
> > > [0]:
> > > 
> > > [ 3187.779569] mlx5_core 0000:08:00.0 enp8s0f0: Link up
> > >  [ 3187.890694] ==================================================================
> > >  [ 3187.892518] BUG: KASAN: use-after-free in __list_add_valid+0xc3/0xf0
> > >  [ 3187.894132] Read of size 8 at addr ffff8881150b3fb8 by task ip/119618
> > 
> > Hm, not sure how similar it is. This one looks like channel was freed
> > without deleting NAPI. Do you have list debug enabled?
> 
> Yes, CONFIG_DEBUG_LIST is enabled.
> 
Do you have core dumps?
Let's enable kernel.panic_on_oops with core dumps and look at it the next
time we see this. I really don't think mlx5 is leaking.



* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-14 13:50   ` Paolo Abeni
  2021-10-18 14:04     ` Vlad Buslov
@ 2021-10-19 10:11     ` Jussi Maki
  1 sibling, 0 replies; 13+ messages in thread
From: Jussi Maki @ 2021-10-19 10:11 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: Daniel Borkmann, syzbot, Andrii Nakryiko, ast, bpf, davem, hawk,
	john.fastabend, kafai, kpsingh, kuba, linux-kernel,
	Network Development, songliubraving, syzkaller-bugs, yhs, toke

On Thu, Oct 14, 2021 at 3:50 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2021-10-13 at 15:35 +0200, Daniel Borkmann wrote:
> > On 10/13/21 1:40 PM, syzbot wrote:
> > > Hello,
> > >
> > > syzbot found the following issue on:
> >
> > [ +Paolo/Toke wrt veth/XDP, +Jussi wrt bond/XDP, please take a look, thanks! ]
>
> For the record: Toke and I are actively investigating this issue and
> the other recent related one. So far we could not find anything
> relevant.
>
> The only note is that the reproducer is not extremely reliable - I
> could not reproduce locally, and multiple syzbot runs on the same code
> give different results. Anyhow, so far the issue was only observable
> on a specific 'next' commit which is currently "not reachable" from any
> branch. I'm wondering if the issue was caused by some inconsistent
> state of that tree.
>

Hey,

I took a look at the bond/XDP related bits and couldn't find anything
obvious there. And for what it's worth, I was running the syzbot repro
under bpf-next tree (223f903e9c8) in the bpf vmtest.sh environment for
30 minutes without hitting this. An inconsistent tree might be a
plausible cause.


* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-10-13 11:40 [syzbot] BUG: corrupted list in netif_napi_add syzbot
  2021-10-13 13:35 ` Daniel Borkmann
  2021-10-13 14:41 ` Paolo Abeni
@ 2021-12-14  7:52 ` syzbot
  2021-12-14 17:56   ` Dmitry Vyukov
  2 siblings, 1 reply; 13+ messages in thread
From: syzbot @ 2021-12-14  7:52 UTC (permalink / raw)
  To: alexandr.lobakin, andrii, ast, bpf, daniel, davem, dvyukov,
	edumazet, hawk, hdanton, jesse.brandeburg, joamaki,
	john.fastabend, kafai, kpsingh, kuba, linux-kernel, maximmi,
	netdev, pabeni, saeedm, songliubraving, syzkaller-bugs, toke,
	vladbu, yhs

syzbot suspects this issue was fixed by commit:

commit 0315a075f1343966ea2d9a085666a88a69ea6a3d
Author: Alexander Lobakin <alexandr.lobakin@intel.com>
Date:   Wed Nov 10 19:56:05 2021 +0000

    net: fix premature exit from NAPI state polling in napi_disable()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=138dffbeb00000
start commit:   911e3a46fb38 net: phy: Fix unsigned comparison with less t..
git tree:       net-next
kernel config:  https://syzkaller.appspot.com/x/.config?x=d36d2402e8523638
dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=141592f2b00000

If the result looks correct, please mark the issue as fixed by replying with:

#syz fix: net: fix premature exit from NAPI state polling in napi_disable()

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: [syzbot] BUG: corrupted list in netif_napi_add
  2021-12-14  7:52 ` syzbot
@ 2021-12-14 17:56   ` Dmitry Vyukov
  0 siblings, 0 replies; 13+ messages in thread
From: Dmitry Vyukov @ 2021-12-14 17:56 UTC (permalink / raw)
  To: syzbot
  Cc: alexandr.lobakin, andrii, ast, bpf, daniel, davem, edumazet,
	hawk, hdanton, jesse.brandeburg, joamaki, john.fastabend, kafai,
	kpsingh, kuba, linux-kernel, maximmi, netdev, pabeni, saeedm,
	songliubraving, syzkaller-bugs, toke, vladbu, yhs

On Tue, 14 Dec 2021 at 08:52, syzbot
<syzbot+62e474dd92a35e3060d8@syzkaller.appspotmail.com> wrote:
>
> syzbot suspects this issue was fixed by commit:
>
> commit 0315a075f1343966ea2d9a085666a88a69ea6a3d
> Author: Alexander Lobakin <alexandr.lobakin@intel.com>
> Date:   Wed Nov 10 19:56:05 2021 +0000
>
>     net: fix premature exit from NAPI state polling in napi_disable()
>
> bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=138dffbeb00000
> start commit:   911e3a46fb38 net: phy: Fix unsigned comparison with less t..
> git tree:       net-next
> kernel config:  https://syzkaller.appspot.com/x/.config?x=d36d2402e8523638
> dashboard link: https://syzkaller.appspot.com/bug?extid=62e474dd92a35e3060d8
> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=141592f2b00000
>
> If the result looks correct, please mark the issue as fixed by replying with:
>
> #syz fix: net: fix premature exit from NAPI state polling in napi_disable()
>
> For information about bisection process see: https://goo.gl/tpsmEJ#bisection


Looks reasonable based on the subsystem:

#syz fix: net: fix premature exit from NAPI state polling in napi_disable()
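
The commit title refers to a premature exit from a state polling loop. Without claiming this is the exact code the commit touched, one C pitfall that produces such an exit is `continue` inside a `do { } while (cond)` loop: it jumps to the condition, not to the top of the body. The sketch below is a userspace model with hypothetical names (ST_BUSY, ST_OFF, disable_buggy, disable_fixed, and a wait hook that simulates the other party releasing the object).

```c
#include <assert.h>
#include <stdatomic.h>

#define ST_BUSY (1u << 0)   /* someone else is still using the object */
#define ST_OFF  (1u << 1)   /* we have disabled it */

/* Hypothetical wait hook: called once per poll; may change *state. */
typedef void (*wait_fn)(atomic_uint *state);

/* Pitfall: in a do-while, `continue` jumps to the loop condition. */
static unsigned int disable_buggy(atomic_uint *state, wait_fn wait)
{
    unsigned int val, new_val = 0;

    assert(state && wait);
    do {
        val = atomic_load(state);
        if (val & ST_BUSY) {
            wait(state);
            continue;   /* runs the compare-exchange with a stale new_val */
        }
        new_val = val | ST_OFF;
    } while (!atomic_compare_exchange_strong(state, &val, new_val));
    return new_val;
}

/* Same logic with an explicit loop: `continue` re-reads the state. */
static unsigned int disable_fixed(atomic_uint *state, wait_fn wait)
{
    unsigned int val, new_val;

    assert(state && wait);
    for (;;) {
        val = atomic_load(state);
        if (val & ST_BUSY) {
            wait(state);
            continue;   /* back to the top of the loop */
        }
        new_val = val | ST_OFF;
        if (atomic_compare_exchange_strong(state, &val, new_val))
            break;
    }
    return new_val;
}

/* Example hook: the "owner" releases the object on the third poll. */
static int polls;

static void release_on_third_poll(atomic_uint *state)
{
    if (++polls == 3)
        atomic_fetch_and(state, ~ST_BUSY);
}
```

Starting from a state of ST_BUSY, disable_buggy() returns after a single poll with the busy bit overwritten and ST_OFF never set, while disable_fixed() keeps polling until the hook clears ST_BUSY and then sets ST_OFF.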


Thread overview: 13+ messages
2021-10-13 11:40 [syzbot] BUG: corrupted list in netif_napi_add syzbot
2021-10-13 13:35 ` Daniel Borkmann
2021-10-14 13:50   ` Paolo Abeni
2021-10-18 14:04     ` Vlad Buslov
2021-10-18 15:42       ` Jakub Kicinski
2021-10-18 16:12         ` Vlad Buslov
2021-10-18 23:31           ` Saeed Mahameed
2021-10-18 17:40         ` Toke Høiland-Jørgensen
2021-10-18 17:58           ` Jakub Kicinski
2021-10-19 10:11     ` Jussi Maki
2021-10-13 14:41 ` Paolo Abeni
2021-12-14  7:52 ` syzbot
2021-12-14 17:56   ` Dmitry Vyukov
