* WARNING: locking bug in inet_autobind
@ 2019-05-16  5:46 syzbot
  2019-05-21  8:31 ` syzbot
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: syzbot @ 2019-05-16  5:46 UTC
  To: ast, bpf, daniel, davem, kafai, kuznet, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs, yoshfuji

Hello,

syzbot found the following crash on:

HEAD commit:    35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=10e970f4a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734  
arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734  
arch_local_save_flags arch/x86/include/asm/paravirt.h:760 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734  
look_up_lock_class kernel/locking/lockdep.c:725 [inline]
WARNING: CPU: 1 PID: 32543 at kernel/locking/lockdep.c:734  
register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 32543 Comm: syz-executor.4 Not tainted 5.1.0+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2cb/0x65c kernel/panic.c:214
  __warn.cold+0x20/0x45 kernel/panic.c:566
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:180 [inline]
  fixup_bug arch/x86/kernel/traps.c:175 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:273
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:292
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:972
RIP: 0010:look_up_lock_class kernel/locking/lockdep.c:734 [inline]
RIP: 0010:register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Code: 00 48 89 da 4d 8b 76 c0 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80  
3c 02 00 0f 85 23 07 00 00 4c 89 33 e9 e3 f4 ff ff 0f 0b <0f> 0b e9 ea f3  
ff ff 44 89 e0 4c 8b 95 50 ff ff ff 83 c0 01 4c 8b
RSP: 0018:ffff88806395f9e8 EFLAGS: 00010083
RAX: dffffc0000000000 RBX: ffff8880a947f1e0 RCX: 0000000000000000
RDX: 1ffff1101528fe3f RSI: 0000000000000000 RDI: ffff8880a947f1f8
RBP: ffff88806395fab0 R08: 1ffff1100c72bf45 R09: ffffffff8a459c80
R10: ffffffff8a0e47e0 R11: 0000000000000000 R12: ffffffff8a1235a0
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff87fe4c60
  __lock_acquire+0x116/0x5490 kernel/locking/lockdep.c:3673
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4302
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
  spin_lock_bh include/linux/spinlock.h:343 [inline]
  lock_sock_nested+0x41/0x120 net/core/sock.c:2917
  lock_sock include/net/sock.h:1525 [inline]
  inet_autobind+0x20/0x1a0 net/ipv4/af_inet.c:183
  inet_dgram_connect+0x252/0x2e0 net/ipv4/af_inet.c:573
  __sys_connect+0x266/0x330 net/socket.c:1840
  __do_sys_connect net/socket.c:1851 [inline]
  __se_sys_connect net/socket.c:1848 [inline]
  __x64_sys_connect+0x73/0xb0 net/socket.c:1848
  do_syscall_64+0x103/0x680 arch/x86/entry/common.c:301
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x458da9
Code: ad b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 7b b8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f695f8b6c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000458da9
RDX: 000000000000001c RSI: 0000000020000000 RDI: 0000000000000003
RBP: 000000000073bf00 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f695f8b76d4
R13: 00000000004bf1fe R14: 00000000004d04f8 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.


* Re: WARNING: locking bug in inet_autobind
  2019-05-16  5:46 WARNING: locking bug in inet_autobind syzbot
@ 2019-05-21  8:31 ` syzbot
  2019-05-22  3:16 ` syzbot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: syzbot @ 2019-05-21  8:31 UTC
  To: ast, bpf, daniel, davem, kafai, kuznet, linux-kernel, netdev,
	songliubraving, syzkaller-bugs, yhs, yoshfuji

syzbot has found a reproducer for the following crash on:

HEAD commit:    f49aa1de Merge tag 'for-5.2-rc1-tag' of git://git.kernel.o..
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=14e5b130a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fc045131472947d7
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=163731f8a00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com

WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734  
arch_local_save_flags arch/x86/include/asm/paravirt.h:762 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734  
arch_local_save_flags arch/x86/include/asm/paravirt.h:760 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734  
look_up_lock_class kernel/locking/lockdep.c:725 [inline]
WARNING: CPU: 1 PID: 28592 at kernel/locking/lockdep.c:734  
register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 28592 Comm: syz-executor.5 Not tainted 5.2.0-rc1+ #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
  panic+0x2cb/0x744 kernel/panic.c:218
  __warn.cold+0x20/0x4d kernel/panic.c:575
  report_bug+0x263/0x2b0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:179 [inline]
  fixup_bug arch/x86/kernel/traps.c:174 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272
  do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:986
RIP: 0010:look_up_lock_class kernel/locking/lockdep.c:734 [inline]
RIP: 0010:register_lock_class+0xe10/0x1860 kernel/locking/lockdep.c:1078
Code: 00 48 89 da 4d 8b 76 c0 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 80  
3c 02 00 0f 85 23 07 00 00 4c 89 33 e9 e3 f4 ff ff 0f 0b <0f> 0b e9 ea f3  
ff ff 44 89 e0 4c 8b 95 50 ff ff ff 83 c0 01 4c 8b
RSP: 0018:ffff888093d179e8 EFLAGS: 00010083
RAX: dffffc0000000000 RBX: ffff8880967cd160 RCX: 0000000000000000
RDX: 1ffff11012cf9a2f RSI: 0000000000000000 RDI: ffff8880967cd178
RBP: ffff888093d17ab0 R08: 1ffff110127a2f45 R09: ffffffff8a659d40
R10: ffffffff8a2e8440 R11: 0000000000000000 R12: ffffffff8a323030
R13: 0000000000000000 R14: 0000000000000000 R15: ffffffff88022ba0
  __lock_acquire+0x116/0x5490 kernel/locking/lockdep.c:3673
  lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4302
  __raw_spin_lock_bh include/linux/spinlock_api_smp.h:135 [inline]
  _raw_spin_lock_bh+0x33/0x50 kernel/locking/spinlock.c:175
  spin_lock_bh include/linux/spinlock.h:343 [inline]
  lock_sock_nested+0x41/0x120 net/core/sock.c:2917
  lock_sock include/net/sock.h:1525 [inline]
  inet_autobind+0x20/0x1a0 net/ipv4/af_inet.c:183
  inet_dgram_connect+0x243/0x2d0 net/ipv4/af_inet.c:573
  __sys_connect+0x264/0x330 net/socket.c:1840
  __do_sys_connect net/socket.c:1851 [inline]
  __se_sys_connect net/socket.c:1848 [inline]
  __x64_sys_connect+0x73/0xb0 net/socket.c:1848
  do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459279
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2321b1ac78 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459279
RDX: 000000000000001c RSI: 0000000020000000 RDI: 0000000000000003
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f2321b1b6d4
R13: 00000000004bf74d R14: 00000000004d0c18 R15: 00000000ffffffff
Kernel Offset: disabled
Rebooting in 86400 seconds..



* Re: WARNING: locking bug in inet_autobind
  2019-05-16  5:46 WARNING: locking bug in inet_autobind syzbot
  2019-05-21  8:31 ` syzbot
@ 2019-05-22  3:16 ` syzbot
       [not found]   ` <0000000000008b645c058971629b-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
  2022-09-18 15:52 ` Tetsuo Handa
  2022-12-29  6:26 ` [syzbot] " syzbot
  3 siblings, 1 reply; 18+ messages in thread
From: syzbot @ 2019-05-22  3:16 UTC
  To: Yong.Zhao, airlied, alexander.deucher, amd-gfx, ast, bpf,
	christian.koenig, daniel, daniel, davem, david1.zhou, dri-devel,
	evan.quan, felix.kuehling, harry.wentland, kafai, kuznet,
	linux-kernel, netdev, ozeng, ray.huang, rex.zhu, songliubraving,
	syzkaller-bugs, yhs, yong.zhao, yoshfuji

syzbot has bisected this bug to:

commit c0d9271ecbd891cdeb0fad1edcdd99ee717a655f
Author: Yong Zhao <Yong.Zhao@amd.com>
Date:   Fri Feb 1 23:36:21 2019 +0000

     drm/amdgpu: Delete user queue doorbell variables

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1433ece4a00000
start commit:   f49aa1de Merge tag 'for-5.2-rc1-tag' of git://git.kernel.o..
git tree:       net-next
final crash:    https://syzkaller.appspot.com/x/report.txt?x=1633ece4a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1233ece4a00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=fc045131472947d7
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=163731f8a00000

Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com
Fixes: c0d9271ecbd8 ("drm/amdgpu: Delete user queue doorbell variables")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection


* Re: WARNING: locking bug in inet_autobind
       [not found]   ` <0000000000008b645c058971629b-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
@ 2019-05-22  3:21     ` Zhao, Yong
  0 siblings, 0 replies; 18+ messages in thread
From: Zhao, Yong @ 2019-05-22  3:21 UTC
  To: syzbot, airlied-cv59FeDIM0c, Deucher, Alexander,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ast-DgEjT+Ai2ygdnm+yROfE0A, bpf-u79uwXL29TY76Z2rM5mHXA, Koenig,
	Christian, daniel-/w4YWyX8dFk, daniel-FeC+5ew28dpmcu3hnIyYJQ,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q, Zhou, David(ChunMing),
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Quan, Evan, Kuehling,
	Felix, Wentland, Harry, kafai-b10kYP2dOMg,
	kuznet-v/Mj1YrvjDBInbfyfbPRSQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA


This commit was reverted later. I guess the revert was probably not picked up properly.

Regards,
Yong

* Re: WARNING: locking bug in inet_autobind
  2019-05-16  5:46 WARNING: locking bug in inet_autobind syzbot
  2019-05-21  8:31 ` syzbot
  2019-05-22  3:16 ` syzbot
@ 2022-09-18 15:52 ` Tetsuo Handa
  2022-09-18 18:25   ` Boqun Feng
  2022-12-29  6:26 ` [syzbot] " syzbot
  3 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2022-09-18 15:52 UTC
  To: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	Boqun Feng, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, syzbot, syzkaller-bugs

syzbot is reporting a locking bug in inet_autobind(), because
commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
calling

  lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")

in l2tp_tunnel_create() (the call currently lives in l2tp_tunnel_register()).
How can we fix this problem?

  ------------[ cut here ]------------
  class->name=slock-AF_INET6 lock->name=l2tp_sock lock->key=l2tp_socket_class
  WARNING: CPU: 2 PID: 9237 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
  Modules linked in:
  CPU: 2 PID: 9237 Comm: a.out Not tainted 6.0.0-rc5-00094-ga335366bad13-dirty #860
  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
  RIP: 0010:look_up_lock_class+0xcc/0x140

On 2019/05/16 14:46, syzbot wrote:
> HEAD commit:    35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
> git tree:       net-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=10e970f4a00000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
> dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
> compiler:       gcc (GCC) 9.0.0 20181231 (experimental)

C reproducer is available at
https://syzkaller.appspot.com/text?tag=ReproC&x=15062310080000 .



* Re: WARNING: locking bug in inet_autobind
  2022-09-18 15:52 ` Tetsuo Handa
@ 2022-09-18 18:25   ` Boqun Feng
  2022-09-19  5:02     ` Tetsuo Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Boqun Feng @ 2022-09-18 18:25 UTC
  To: Tetsuo Handa
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, syzbot, syzkaller-bugs

On Mon, Sep 19, 2022 at 12:52:45AM +0900, Tetsuo Handa wrote:
> syzbot is reporting locking bug in inet_autobind(), for
> commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
> calling 
> 
>   lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
> 
> in l2tp_tunnel_create() (which is currently in l2tp_tunnel_register()).
> How can we fix this problem?
> 

Just a theory: it seems that a memory corruption happens around
lockdep_set_class_and_name(). In l2tp_tunnel_register(), the "sk" gets
published before lockdep_set_class_and_name() is called:

	tunnel->sock = sk;
	...
	lockdep_set_class_and_name(&sk->sk_lock.slock,...);

What could happen is that sock_lock_init() races with
l2tp_tunnel_register(), which results in two
lockdep_set_class_and_name() calls racing with each other.

Anyway, "sk" should not be published until its lock is properly
initialized. Could you try the following (untested)? It looks to me like
all the other code around lockdep_set_class_and_name() should be moved
upwards as well, but I don't want to pretend I'm an expert ;-)

Regards,
Boqun

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..1a01d23abc53 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1480,7 +1480,9 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

        sk = sock->sk;
        sock_hold(sk);
-       tunnel->sock = sk;
+       lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
+                                  "l2tp_sock");
+       smp_store_release(&tunnel->sock, sk);

        spin_lock_bh(&pn->l2tp_tunnel_list_lock);
        list_for_each_entry(tunnel_walk, &pn->l2tp_tunnel_list, list) {
@@ -1509,8 +1511,6 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,

        tunnel->old_sk_destruct = sk->sk_destruct;
        sk->sk_destruct = &l2tp_tunnel_destruct;
-       lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
-                                  "l2tp_sock");
        sk->sk_allocation = GFP_ATOMIC;

        trace_register_tunnel(tunnel);  
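
The ordering this diff aims for (finish setting up the lock class first, publish the pointer second) can be illustrated in userspace with C11 atomics. This is only a sketch under assumptions: toy_sock, toy_tunnel, toy_tunnel_publish() and toy_tunnel_sock() are made-up names, not kernel APIs, and atomic_store_explicit(..., memory_order_release) stands in for smp_store_release().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
	const char *lock_class_name;	/* models the class of sk_lock.slock */
};

/* Hypothetical stand-in for struct l2tp_tunnel. */
struct toy_tunnel {
	_Atomic(struct toy_sock *) sock;	/* NULL until published */
};

/* Finish initializing sk, then publish it with release semantics. */
void toy_tunnel_publish(struct toy_tunnel *tunnel, struct toy_sock *sk)
{
	sk->lock_class_name = "l2tp_sock";
	atomic_store_explicit(&tunnel->sock, sk, memory_order_release);
}

/* The acquire load pairs with the release store above, so a non-NULL
 * result already has lock_class_name set.
 */
struct toy_sock *toy_tunnel_sock(struct toy_tunnel *tunnel)
{
	struct toy_sock *sk;

	sk = atomic_load_explicit(&tunnel->sock, memory_order_acquire);
	assert(sk == NULL || sk->lock_class_name != NULL);
	return sk;
}
```

Note that this only orders readers that reach the object through tunnel->sock. It guarantees nothing for code that reaches the same object by another route.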

>   ------------[ cut here ]------------
>   class->name=slock-AF_INET6 lock->name=l2tp_sock lock->key=l2tp_socket_class
>   WARNING: CPU: 2 PID: 9237 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
>   Modules linked in:
>   CPU: 2 PID: 9237 Comm: a.out Not tainted 6.0.0-rc5-00094-ga335366bad13-dirty #860
>   Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
>   RIP: 0010:look_up_lock_class+0xcc/0x140
> 
> On 2019/05/16 14:46, syzbot wrote:
> > HEAD commit:    35c99ffa Merge tag 'for_linus' of git://git.kernel.org/pub..
> > git tree:       net-next
> > console output: https://syzkaller.appspot.com/x/log.txt?x=10e970f4a00000
> > kernel config:  https://syzkaller.appspot.com/x/.config?x=82f0809e8f0a8c87
> > dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
> > compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
> 
> C reproducer is available at
> https://syzkaller.appspot.com/text?tag=ReproC&x=15062310080000 .
> 


* Re: WARNING: locking bug in inet_autobind
  2022-09-18 18:25   ` Boqun Feng
@ 2022-09-19  5:02     ` Tetsuo Handa
  2022-09-27 13:00       ` Tetsuo Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2022-09-19  5:02 UTC
  To: Boqun Feng
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	netdev, syzbot, syzkaller-bugs

On 2022/09/19 3:25, Boqun Feng wrote:
> On Mon, Sep 19, 2022 at 12:52:45AM +0900, Tetsuo Handa wrote:
>> syzbot is reporting locking bug in inet_autobind(), for
>> commit 37159ef2c1ae1e69 ("l2tp: fix a lockdep splat") started
>> calling 
>>
>>   lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
>>
>> in l2tp_tunnel_create() (which is currently in l2tp_tunnel_register()).
>> How can we fix this problem?
>>
> 
> Just a theory, it seems that we have a memory corruption happened for
> lockdep_set_class_and_name(), in l2tp_tunnel_register(), the "sk" gets
> published before lockdep_set_class_and_name():
> 
> 	tunnel->sock = sk;
> 	...
> 	lockdep_set_class_and_name(&sk->sk_lock.slock,...);
> 
> And what could happen is that sock_lock_init() races with the
> l2tp_tunnel_register(), which results into two
> lockdep_set_class_and_name()s race with each other.
> 
> Anyway, "sk" should not be published until its lock gets properly
> initialized, could you try the following (untested)? Looks to me all
> other code around the lockdep_set_class_and_name() should be moved
> upwards, but I don't want to pretend I'm an expert ;-)

This diff did not help.

  ------------[ cut here ]------------
  Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
  WARNING: CPU: 1 PID: 14195 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
  Modules linked in:
  CPU: 1 PID: 14195 Comm: a.out Not tainted 6.0.0-rc6-dirty #863
  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
  RIP: 0010:look_up_lock_class+0xcc/0x140

A roughly simplified reproducer (which is unlikely to actually reproduce the problem) is shown below.

----------------------------------------
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/if_pppox.h>

int main(int argc, char *argv[])
{
        const int fd0 = socket(AF_PPPOX, SOCK_STREAM, 1);
        const int fd1 = socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP);
        struct sockaddr_pppol2tp addr0 = {
                .sa_family = AF_PPPOX, .sa_protocol = 1, .pppol2tp.fd = fd1, /* AF_INET6 UDP socket. */
                .pppol2tp.addr.sin_port = htons(1),
                .pppol2tp.addr.sin_addr = htonl(INADDR_LOOPBACK),
                .pppol2tp.s_tunnel = 2
        };
        struct sockaddr_in6 addr1 = { .sin6_family = AF_INET6, .sin6_port = htons(0), .sin6_addr = in6addr_loopback };
        if (fork() == 0) {
                connect(fd1, (struct sockaddr *) &addr1, sizeof(addr1)); /* Invoke inet_autobind() due to .sin6_port = htons(0). */
                _exit(0);
        }
        connect(fd0, (struct sockaddr *) &addr0, sizeof(addr0)); /* Call lockdep_set_class_and_name(sk) of already published fd1. */
        return 0;
}
----------------------------------------

The reproducer is creating two file descriptors via socket(AF_PPPOX, SOCK_STREAM, 1)
and socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP).

The connect() on the AF_PPPOX socket calls l2tp_tunnel_register() via pppol2tp_connect().
l2tp_tunnel_register() modifies the "sk" of an already published socket, which can be
reached from a file descriptor using sockfd_lookup(). For this reproducer, the "sk" created
via socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) is modified by the connect() on the AF_PPPOX socket.

But since this file descriptor is visible to userspace, userspace can concurrently
call connect() on the AF_INET6 socket (which invokes inet_autobind() because port == 0)
using this file descriptor. As a result, spin_lock_bh(&sk->sk_lock.slock) from
lock_sock_nested(sk) from lock_sock(sk) from inet_autobind() from inet_dgram_connect()
finds that a class "slock-AF_INET6" already exists. That would have been the normal
result if l2tp_tunnel_register() had not called
lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock")
on this AF_INET6 socket.
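
The consistency check that fires here can be modeled in a few lines of userspace C. This is a deliberately simplified sketch, not lockdep's implementation: toy_class, toy_lock, toy_register_class() and toy_look_up_class() are hypothetical names, and the table is a flat array instead of a hash. It only shows the condition reported in the warning: a class found by key whose name differs from the name the lock carries.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CLASSES 8

/* A registered class: identified by key, described by name. */
struct toy_class {
	const void *key;
	const char *name;
};

/* A lock instance carries the key and name it expects to be filed under. */
struct toy_lock {
	const void *key;
	const char *name;
};

static struct toy_class toy_classes[TOY_MAX_CLASSES];
static size_t toy_nr_classes;

/* File a class under (key, name); returns NULL when the table is full. */
struct toy_class *toy_register_class(const void *key, const char *name)
{
	struct toy_class *class;

	assert(key != NULL);
	if (toy_nr_classes == TOY_MAX_CLASSES)
		return NULL;
	class = &toy_classes[toy_nr_classes++];
	class->key = key;
	class->name = name;
	return class;
}

/* Find the class for lock by key. *mismatch is set when the class found
 * has the same key but a different name pointer than the lock.
 */
struct toy_class *toy_look_up_class(const struct toy_lock *lock, int *mismatch)
{
	size_t i;

	*mismatch = 0;
	for (i = 0; i < toy_nr_classes; i++) {
		if (toy_classes[i].key == lock->key) {
			*mismatch = (toy_classes[i].name != lock->name);
			return &toy_classes[i];
		}
	}
	return NULL;
}
```

How a class ends up filed under the l2tp key with the "slock-AF_INET6" name is not modeled. One plausible interleaving (an assumption) is a registration that observed the new key together with the old name while the key/name pair was being rewritten.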

It seems like a race condition, because the debug printk() patch shown below suggests that
this happens when lock_sock(sk) and lockdep_set_class_and_name(&sk->sk_lock.slock) run
in parallel.

----------------------------------------
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3ca0cc467886..57b31d06b0e1 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -174,6 +174,8 @@ static int inet_autobind(struct sock *sk)
 {
 	struct inet_sock *inet;
 	/* We may need to bind the socket. */
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("inet_autobind(sk=%px)\n", sk);
 	lock_sock(sk);
 	inet = inet_sk(sk);
 	if (!inet->inet_num) {
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..1bb14b19bca0 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1509,8 +1509,12 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 
 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("l2tp_tunnel_register(sk=%px) before\n", sk);
 	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
 				   "l2tp_sock");
+	if (!strcmp(current->comm, "a.out"))
+		pr_info("l2tp_tunnel_register(sk=%px) after\n", sk);
 	sk->sk_allocation = GFP_ATOMIC;
 
 	trace_register_tunnel(tunnel);
----------------------------------------

----------------------------------------
[  229.873612][T41464] l2tp_core: l2tp_tunnel_register(sk=ffff8880148a7800) before
[  229.873619][T41464] l2tp_core: l2tp_tunnel_register(sk=ffff8880148a7800) after
[  229.873654][T41465] IPv4: inet_autobind(sk=ffff8880148a7800)
[  229.879263][T41468] IPv4: inet_autobind(sk=ffff8880d63a1e00)
[  229.879264][T41467] l2tp_core: l2tp_tunnel_register(sk=ffff8880d63a1e00) before
[  229.879272][T41468] ------------[ cut here ]------------
[  229.879272][T41467] l2tp_core: l2tp_tunnel_register(sk=ffff8880d63a1e00) after
[  229.879275][T41468] Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
[  229.879932][T41450] l2tp_core: l2tp_tunnel_register(sk=ffff88807c416180) after
[  229.882029][T41468] WARNING: CPU: 0 PID: 41468 at kernel/locking/lockdep.c:940 look_up_lock_class+0xcc/0x140
[  229.888126][T41471] IPv4: inet_autobind(sk=ffff88807c410000)
[  229.888126][T41470] l2tp_core: l2tp_tunnel_register(sk=ffff88807c410000) before
[  229.888134][T41470] l2tp_core: l2tp_tunnel_register(sk=ffff88807c410000) after
[  229.889140][T41468] Modules linked in:
[  230.006548][T41468] CPU: 0 PID: 41468 Comm: a.out Not tainted 6.0.0-rc6-00001-g7def00e9a851-dirty #871
[  230.009327][T41468] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  230.012117][T41468] RIP: 0010:look_up_lock_class+0xcc/0x140
[  230.014633][T41468] Code: 8b 17 48 c7 c0 90 42 4b 88 48 39 c2 74 c4 f6 05 dd 31 dc 01 01 75 bb c6 05 d4 31 dc 01 01 48 c7 c7 26 5e f3 85 e8 f4 17 4c fc <0f> 0b eb a4 e8 5b c1 93 fd 48 c7 c7 fd 4c 19 86 89 de e8 c5 06 ff
[  230.020534][T41468] RSP: 0018:ffffc90013bc3ba0 EFLAGS: 00010046
[  230.023183][T41468] RAX: 4ca7765a49bbb600 RBX: ffffffff8837db90 RCX: ffff8880d5ddd580
[  230.025998][T41468] RDX: 0000000000000000 RSI: 0000000080000201 RDI: 0000000000000000
[  230.028984][T41468] RBP: 0000000000000001 R08: ffffffff8136457a R09: 0000000000000000
[  230.031785][T41468] R10: ffffffff81366013 R11: ffff8880d5ddd580 R12: 0000000000000000
[  230.034512][T41468] R13: ffff8880d63a1eb0 R14: 0000000000000000 R15: 0000000000000000
[  230.037347][T41468] FS:  00007efccdb44640(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[  230.040207][T41468] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  230.042940][T41468] CR2: 00007efccdb43ef8 CR3: 0000000011a99000 CR4: 00000000000506f0
[  230.045741][T41468] Call Trace:
[  230.048282][T41468]  <TASK>
[  230.050869][T41468]  register_lock_class+0x48/0x300
[  230.053474][T41468]  __lock_acquire+0x87/0x3340
[  230.056057][T41468]  ? __lock_acquire+0x65f/0x3340
[  230.058852][T41468]  ? console_trylock_spinning+0x187/0x2c0
[  230.061637][T41468]  lock_acquire+0xc6/0x1d0
[  230.064189][T41468]  ? lock_sock_nested+0x56/0xa0
[  230.066753][T41468]  ? lock_sock_nested+0x56/0xa0
[  230.069337][T41468]  _raw_spin_lock_bh+0x31/0x40
[  230.071879][T41468]  ? lock_sock_nested+0x56/0xa0
[  230.074527][T41468]  lock_sock_nested+0x56/0xa0
[  230.077195][T41468]  inet_dgram_connect+0xd7/0x1c0
[  230.079829][T41468]  __sys_connect+0x137/0x150
[  230.082440][T41468]  ? syscall_enter_from_user_mode+0x2e/0x1d0
[  230.085198][T41468]  ? lockdep_hardirqs_on+0x8d/0x130
[  230.087957][T41468]  __x64_sys_connect+0x18/0x20
[  230.090690][T41468]  do_syscall_64+0x3d/0x90
[  230.093232][T41468]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
----------------------------------------

But unfortunately, replacing

  tunnel->sock = sk;
  ...
  lockdep_set_class_and_name(&sk->sk_lock.slock,...);

with

  lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
  smp_store_release(&tunnel->sock, sk);

does not help, because connect() on the AF_INET6 socket does not find this "sk" by
accessing tunnel->sock.
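
The point can be shown with a small userspace model. This is a sketch under assumptions: toy_socket(), toy_sockfd_lookup() and toy_tunnel_register() are hypothetical stand-ins for socket(), sockfd_lookup() and l2tp_tunnel_register(), and the class is reduced to a name pointer. The object becomes reachable through the descriptor table when it is created, so the ordering relative to the tunnel->sock assignment does not matter to descriptor users.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 16

static const char toy_class_inet6[] = "slock-AF_INET6";
static const char toy_class_l2tp[] = "l2tp_sock";

struct toy_sock {
	const char *lock_class_name;
};

struct toy_tunnel {
	struct toy_sock *sock;
};

static struct toy_sock *toy_fd_table[TOY_MAX_FDS];

/* Models socket(): the sock becomes reachable via its descriptor here. */
int toy_socket(struct toy_sock *sk)
{
	int fd;

	assert(sk != NULL);
	sk->lock_class_name = toy_class_inet6;
	for (fd = 0; fd < TOY_MAX_FDS; fd++) {
		if (toy_fd_table[fd] == NULL) {
			toy_fd_table[fd] = sk;
			return fd;
		}
	}
	return -1;
}

/* Models sockfd_lookup(): every holder of fd gets the same object. */
struct toy_sock *toy_sockfd_lookup(int fd)
{
	if (fd < 0 || fd >= TOY_MAX_FDS)
		return NULL;
	return toy_fd_table[fd];
}

/* Models the reordered registration: change the class first, then set
 * tunnel->sock. The class change still hits an object fd users can see.
 */
int toy_tunnel_register(struct toy_tunnel *tunnel, int fd)
{
	struct toy_sock *sk = toy_sockfd_lookup(fd);

	if (sk == NULL)
		return -1;
	sk->lock_class_name = toy_class_l2tp;
	tunnel->sock = sk;
	return 0;
}
```

A caller that looked the object up through the descriptor before registration sees its class change underneath it, whether or not tunnel->sock has been written yet.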



* Re: WARNING: locking bug in inet_autobind
  2022-09-19  5:02     ` Tetsuo Handa
@ 2022-09-27 13:00       ` Tetsuo Handa
  2022-11-22 18:02         ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Tetsuo Handa @ 2022-09-27 13:00 UTC
  To: Boqun Feng, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, netdev,
	syzbot, syzkaller-bugs

On 2022/09/19 14:02, Tetsuo Handa wrote:
> But unfortunately reordering
> 
>   tunnel->sock = sk;
>   ...
>   lockdep_set_class_and_name(&sk->sk_lock.slock,...);
> 
> by
> 
>   lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
>   smp_store_release(&tunnel->sock, sk);
> 
> does not help, for connect() on AF_INET6 socket is not finding this "sk" by
> accessing tunnel->sock.
> 

I considered something like the diff below, but I came to think that this problem
cannot be solved unless l2tp_tunnel_register() stops using the userspace-supplied
file descriptor and always calls l2tp_tunnel_sock_create(), because
userspace can keep using the userspace-supplied file descriptor as if it were a normal
socket even after lockdep_set_class_and_name() has marked it as a tunneling
socket.

Since the userspace-supplied file descriptor has to be a datagram socket,
can we somehow copy the source/destination addresses from the
userspace-supplied socket to a kernel-created socket?


diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 7499c51b1850..07429bed7c4c 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1382,8 +1382,6 @@ static int l2tp_tunnel_sock_create(struct net *net,
 	return err;
 }
 
-static struct lock_class_key l2tp_socket_class;
-
 int l2tp_tunnel_create(int fd, int version, u32 tunnel_id, u32 peer_tunnel_id,
 		       struct l2tp_tunnel_cfg *cfg, struct l2tp_tunnel **tunnelp)
 {
@@ -1509,8 +1507,20 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 
 	tunnel->old_sk_destruct = sk->sk_destruct;
 	sk->sk_destruct = &l2tp_tunnel_destruct;
-	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
-				   "l2tp_sock");
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		static struct lock_class_key l2tp_socket_class;
+
+		/* Changing class/name of an already visible sock might race
+		 * with first lock_sock() call on that sock. In order to make
+		 * sure that register_lock_class() has completed before
+		 * lockdep_set_class_and_name() changes class/name, explicitly
+		 * lock/release that sock.
+		 */
+		lock_sock(sk);
+		release_sock(sk);
+		lockdep_set_class_and_name(&sk->sk_lock.slock,
+					   &l2tp_socket_class, "l2tp_sock");
+	}
 	sk->sk_allocation = GFP_ATOMIC;
 
 	trace_register_tunnel(tunnel);
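
For illustration only, the race described in the comment above can be modelled
in userspace. This is a sketch under assumptions, not lockdep's code: the
toy_* names are hypothetical, the class table is a plain array, and the store
order (name first, then key) is assumed as one interleaving that matches the
"Looking for class ... with the same key" warning text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct lockdep_map: only the two fields that matter here. */
struct toy_lock {
	const void *key;	/* identifies the lock class */
	const char *name;	/* human-readable class name */
};

struct toy_class {
	const void *key;
	const char *name;
};

#define TOY_MAX_CLASSES 8
static struct toy_class toy_classes[TOY_MAX_CLASSES];
static size_t toy_nr_classes;

/* The addresses of these objects play the role of lock_class_key addresses. */
static char toy_key_inet6, toy_key_l2tp;

/*
 * Look the lock's class up by key and register it if it is missing.
 * Returns 1 when a class with the same key but a different name already
 * exists (the condition behind the warning), 0 otherwise.
 */
static int toy_register_lock_class(const struct toy_lock *lock)
{
	size_t i;

	for (i = 0; i < toy_nr_classes; i++)
		if (toy_classes[i].key == lock->key)
			return strcmp(toy_classes[i].name, lock->name) != 0;

	assert(toy_nr_classes < TOY_MAX_CLASSES);
	toy_classes[toy_nr_classes].key = lock->key;
	toy_classes[toy_nr_classes].name = lock->name;
	toy_nr_classes++;
	return 0;
}

/* Reset the table, then register the class an unrelated AF_INET6 socket creates. */
static void toy_reset(void)
{
	struct toy_lock other = { &toy_key_inet6, "slock-AF_INET6" };

	toy_nr_classes = 0;
	toy_register_lock_class(&other);
}

/* The first lock attempt lands between the name store and the key store. */
static int toy_half_updated_lookup(void)
{
	struct toy_lock slock = { &toy_key_inet6, "slock-AF_INET6" };
	int mismatch;

	toy_reset();
	slock.name = "l2tp_sock";			/* reclassification, step 1 */
	mismatch = toy_register_lock_class(&slock);	/* concurrent first lock */
	slock.key = &toy_key_l2tp;			/* reclassification, step 2 */
	return mismatch;
}

/* The first lock attempt happens only after both stores are done. */
static int toy_fully_updated_lookup(void)
{
	struct toy_lock slock = { &toy_key_inet6, "slock-AF_INET6" };

	toy_reset();
	slock.name = "l2tp_sock";
	slock.key = &toy_key_l2tp;
	return toy_register_lock_class(&slock);
}
```

In this model toy_half_updated_lookup() reports the mismatch and
toy_fully_updated_lookup() does not. The model does not cover the remaining
concern that userspace may still be using the same socket while the class is
being changed.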


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: WARNING: locking bug in inet_autobind
  2022-09-27 13:00       ` Tetsuo Handa
@ 2022-11-22 18:02         ` Jakub Sitnicki
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Sitnicki @ 2022-11-22 18:02 UTC (permalink / raw)
  To: Eric Dumazet, Tetsuo Handa
  Cc: Boqun Feng, David S. Miller, Jakub Kicinski, Paolo Abeni,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, netdev,
	syzbot, syzkaller-bugs

On Tue, Sep 27, 2022 at 10:00 PM +09, Tetsuo Handa wrote:
> On 2022/09/19 14:02, Tetsuo Handa wrote:
>> But unfortunately reordering
>> 
>>   tunnel->sock = sk;
>>   ...
>>   lockdep_set_class_and_name(&sk->sk_lock.slock,...);
>> 
>> by
>> 
>>   lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class, "l2tp_sock");
>>   smp_store_release(&tunnel->sock, sk);
>> 
>> does not help, for connect() on AF_INET6 socket is not finding this "sk" by
>> accessing tunnel->sock.
>> 
>
> I considered something like below diff, but I came to think that this problem
> cannot be solved unless l2tp_tunnel_register() stops using userspace-supplied
> file descriptor and starts always calling l2tp_tunnel_sock_create(), for
> userspace can continue using userspace-supplied file descriptor as if a normal
> socket even after lockdep_set_class_and_name() told that this is a tunneling
> socket.
>
> Since userspace-supplied file descriptor has to be a datagram socket,
> can we somehow copy the source/destination addresses from
> userspace-supplied socket to kernel-created socket?
>
>
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 7499c51b1850..07429bed7c4c 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1382,8 +1382,6 @@ static int l2tp_tunnel_sock_create(struct net *net,
>  	return err;
>  }
>  
> -static struct lock_class_key l2tp_socket_class;
> -
>  int l2tp_tunnel_create(int fd, int version, u32 tunnel_id, u32 peer_tunnel_id,
>  		       struct l2tp_tunnel_cfg *cfg, struct l2tp_tunnel **tunnelp)
>  {
> @@ -1509,8 +1507,20 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>  
>  	tunnel->old_sk_destruct = sk->sk_destruct;
>  	sk->sk_destruct = &l2tp_tunnel_destruct;
> -	lockdep_set_class_and_name(&sk->sk_lock.slock, &l2tp_socket_class,
> -				   "l2tp_sock");
> +	if (IS_ENABLED(CONFIG_LOCKDEP)) {
> +		static struct lock_class_key l2tp_socket_class;
> +
> +		/* Changing class/name of an already visible sock might race
> +		 * with first lock_sock() call on that sock. In order to make
> +		 * sure that register_lock_class() has completed before
> +		 * lockdep_set_class_and_name() changes class/name, explicitly
> +		 * lock/release that sock.
> +		 */
> +		lock_sock(sk);
> +		release_sock(sk);
> +		lockdep_set_class_and_name(&sk->sk_lock.slock,
> +					   &l2tp_socket_class, "l2tp_sock");
> +	}
>  	sk->sk_allocation = GFP_ATOMIC;
>  
>  	trace_register_tunnel(tunnel);

What if we revisit Eric's lockdep splat fix in 37159ef2c1ae ("l2tp: fix
a lockdep splat") and:

1. remove the lockdep_set_class_and_name(...) call in l2tp; it looks
   like an odd case within the network stack, and

2. switch to bh_lock_sock_nested in l2tp_xmit_core so that we don't
   break what has been fixed in 37159ef2c1ae.

Eric, WDYT?
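
To illustrate why a nested lock annotation could stand in for a separate lock
class, here is a small userspace model. It is a sketch under assumptions: the
nest_* names are hypothetical, and it assumes the splat fixed by 37159ef2c1ae
was a same-class nesting report. It relies only on the fact that lockdep
treats each (key, subclass) pair as its own class, and that
bh_lock_sock_nested() takes the socket spinlock with a non-zero subclass.

```c
#include <assert.h>
#include <stddef.h>

/* One entry per lock currently held by the modelled task. */
struct nest_held {
	const void *key;	/* lock class key */
	unsigned int subclass;	/* 0 for plain locking, 1 for single-depth nesting */
};

#define NEST_MAX_HELD 8
static struct nest_held nest_stack[NEST_MAX_HELD];
static size_t nest_depth;

/* The addresses of these objects play the role of lock_class_key addresses. */
static char nest_key_inet, nest_key_l2tp;

/*
 * Record a lock acquisition. Returns 1 if a lock of the same
 * (key, subclass) class is already held, which is what a
 * "possible recursive locking" report is about, and 0 otherwise.
 */
static int nest_acquire(const void *key, unsigned int subclass)
{
	size_t i;

	for (i = 0; i < nest_depth; i++)
		if (nest_stack[i].key == key &&
		    nest_stack[i].subclass == subclass)
			return 1;

	assert(nest_depth < NEST_MAX_HELD);
	nest_stack[nest_depth].key = key;
	nest_stack[nest_depth].subclass = subclass;
	nest_depth++;
	return 0;
}

/* Outer socket and tunnel socket share one class, plain locking on both. */
static int nest_plain_same_class(void)
{
	nest_depth = 0;
	nest_acquire(&nest_key_inet, 0);
	return nest_acquire(&nest_key_inet, 0);
}

/* Current approach: the tunnel socket has been moved to its own class. */
static int nest_distinct_class(void)
{
	nest_depth = 0;
	nest_acquire(&nest_key_inet, 0);
	return nest_acquire(&nest_key_l2tp, 0);
}

/* Proposed approach: same class, but the inner lock uses subclass 1. */
static int nest_same_class_nested(void)
{
	nest_depth = 0;
	nest_acquire(&nest_key_inet, 0);
	return nest_acquire(&nest_key_inet, 1);
}
```

Only nest_plain_same_class() is flagged in this model. Both the separate
class and the nested subclass avoid the report, and the nested variant does
so without changing the class of a socket that userspace can already see.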

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2019-05-16  5:46 WARNING: locking bug in inet_autobind syzbot
                   ` (2 preceding siblings ...)
  2022-09-18 15:52 ` Tetsuo Handa
@ 2022-12-29  6:26 ` syzbot
  2023-01-03 15:39   ` Felix Kuehling
  3 siblings, 1 reply; 18+ messages in thread
From: syzbot @ 2022-12-29  6:26 UTC (permalink / raw)
  To: Alexander.Deucher, Christian.Koenig, David1.Zhou, Evan.Quan,
	Felix.Kuehling, Harry.Wentland, Oak.Zeng, Ray.Huang, Yong.Zhao,
	airlied, alexander.deucher, amd-gfx, ast, boqun.feng, bpf,
	christian.koenig, daniel, daniel, davem, david1.zhou, dri-devel,
	dsahern, edumazet, evan.quan, felix.kuehling, gautammenghani201,
	harry.wentland, jakub, kafai, kuba, kuznet, linux-kernel,
	longman, mingo, netdev, ozeng, pabeni, penguin-kernel,
	penguin-kernel, peterz, ray.huang, rex.zhu, songliubraving,
	syzkaller-bugs, will, yhs, yong.zhao, yoshfuji

syzbot has found a reproducer for the following issue on:

HEAD commit:    1b929c02afd3 Linux 6.2-rc1
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=145c6a68480000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2651619a26b4d687
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13e13e32480000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13790f08480000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/d1849f1ca322/disk-1b929c02.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/924cb8aa4ada/vmlinux-1b929c02.xz
kernel image: https://storage.googleapis.com/syzbot-assets/8c7330dae0a0/bzImage-1b929c02.xz

The issue was bisected to:

commit c0d9271ecbd891cdeb0fad1edcdd99ee717a655f
Author: Yong Zhao <Yong.Zhao@amd.com>
Date:   Fri Feb 1 23:36:21 2019 +0000

    drm/amdgpu: Delete user queue doorbell variables

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=1433ece4a00000
final oops:     https://syzkaller.appspot.com/x/report.txt?x=1633ece4a00000
console output: https://syzkaller.appspot.com/x/log.txt?x=1233ece4a00000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com
Fixes: c0d9271ecbd8 ("drm/amdgpu: Delete user queue doorbell variables")

------------[ cut here ]------------
Looking for class "l2tp_sock" with key l2tp_socket_class, but found a different class "slock-AF_INET6" with the same key
WARNING: CPU: 0 PID: 7280 at kernel/locking/lockdep.c:937 look_up_lock_class+0x97/0x110 kernel/locking/lockdep.c:937
Modules linked in:
CPU: 0 PID: 7280 Comm: syz-executor835 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:look_up_lock_class+0x97/0x110 kernel/locking/lockdep.c:937
Code: 17 48 81 fa e0 e5 f6 8f 74 59 80 3d 5d bc 57 04 00 75 50 48 c7 c7 00 4d 4c 8a 48 89 04 24 c6 05 49 bc 57 04 01 e8 a9 42 b9 ff <0f> 0b 48 8b 04 24 eb 31 9c 5a 80 e6 02 74 95 e8 45 38 02 fa 85 c0
RSP: 0018:ffffc9000b5378b8 EFLAGS: 00010082
RAX: 0000000000000000 RBX: ffffffff91c06a00 RCX: 0000000000000000
RDX: ffff8880292d0000 RSI: ffffffff8166721c RDI: fffff520016a6f09
RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000080000201 R11: 20676e696b6f6f4c R12: 0000000000000000
R13: ffff88802a5820b0 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f1fd7a97700(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000100 CR3: 0000000078ab4000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 register_lock_class+0xbe/0x1120 kernel/locking/lockdep.c:1289
 __lock_acquire+0x109/0x56d0 kernel/locking/lockdep.c:4934
 lock_acquire kernel/locking/lockdep.c:5668 [inline]
 lock_acquire+0x1e3/0x630 kernel/locking/lockdep.c:5633
 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:126 [inline]
 _raw_spin_lock_bh+0x33/0x40 kernel/locking/spinlock.c:178
 spin_lock_bh include/linux/spinlock.h:355 [inline]
 lock_sock_nested+0x5f/0xf0 net/core/sock.c:3473
 lock_sock include/net/sock.h:1725 [inline]
 inet_autobind+0x1a/0x190 net/ipv4/af_inet.c:177
 inet_send_prepare net/ipv4/af_inet.c:813 [inline]
 inet_send_prepare+0x325/0x4e0 net/ipv4/af_inet.c:807
 inet6_sendmsg+0x43/0xe0 net/ipv6/af_inet6.c:655
 sock_sendmsg_nosec net/socket.c:714 [inline]
 sock_sendmsg+0xd3/0x120 net/socket.c:734
 __sys_sendto+0x23a/0x340 net/socket.c:2117
 __do_sys_sendto net/socket.c:2129 [inline]
 __se_sys_sendto net/socket.c:2125 [inline]
 __x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f1fd78538b9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f1fd7a971f8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f1fd78f0038 RCX: 00007f1fd78538b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007f1fd78f0030 R08: 0000000020000100 R09: 000000000000001c
R10: 0000000004008000 R11: 0000000000000212 R12: 00007f1fd78f003c
R13: 00007f1fd79ffc8f R14: 00007f1fd7a97300 R15: 0000000000022000
 </TASK>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2022-12-29  6:26 ` [syzbot] " syzbot
@ 2023-01-03 15:39   ` Felix Kuehling
  2023-01-03 16:05     ` Waiman Long
  0 siblings, 1 reply; 18+ messages in thread
From: Felix Kuehling @ 2023-01-03 15:39 UTC (permalink / raw)
  To: syzbot, Alexander.Deucher, Christian.Koenig, David1.Zhou,
	Evan.Quan, Harry.Wentland, Oak.Zeng, Ray.Huang, Yong.Zhao,
	airlied, amd-gfx, ast, boqun.feng, bpf, daniel, daniel, davem,
	dri-devel, dsahern, edumazet, gautammenghani201, jakub, kafai,
	kuba, kuznet, linux-kernel, longman, mingo, netdev, ozeng,
	pabeni, penguin-kernel, peterz, rex.zhu, songliubraving,
	syzkaller-bugs, will, yhs, yoshfuji

The regression point doesn't make sense. The kernel config doesn't 
enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU 
could have caused this regression.

Regards,
   Felix



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2023-01-03 15:39   ` Felix Kuehling
@ 2023-01-03 16:05     ` Waiman Long
  2023-01-03 16:20       ` Felix Kuehling
  0 siblings, 1 reply; 18+ messages in thread
From: Waiman Long @ 2023-01-03 16:05 UTC (permalink / raw)
  To: Felix Kuehling, syzbot, Alexander.Deucher, Christian.Koenig,
	David1.Zhou, Evan.Quan, Harry.Wentland, Oak.Zeng, Ray.Huang,
	Yong.Zhao, airlied, amd-gfx, ast, boqun.feng, bpf, daniel,
	daniel, davem, dri-devel, dsahern, edumazet, gautammenghani201,
	jakub, kafai, kuba, kuznet, linux-kernel, mingo, netdev, ozeng,
	pabeni, penguin-kernel, peterz, rex.zhu, songliubraving,
	syzkaller-bugs, will, yhs, yoshfuji

On 1/3/23 10:39, Felix Kuehling wrote:
> The regression point doesn't make sense. The kernel config doesn't 
> enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU 
> could have caused this regression.
>
I agree. It is likely a pre-existing problem, or one caused by another commit,
that got triggered because of the change in cacheline alignment caused 
by commit c0d9271ecbd8 ("drm/amdgpu: Delete user queue doorbell variables").

Cheers,
Longman




^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2023-01-03 16:05     ` Waiman Long
@ 2023-01-03 16:20       ` Felix Kuehling
  2023-01-03 22:07           ` Tetsuo Handa
  0 siblings, 1 reply; 18+ messages in thread
From: Felix Kuehling @ 2023-01-03 16:20 UTC (permalink / raw)
  To: Waiman Long, syzbot, Alexander.Deucher, Christian.Koenig,
	David1.Zhou, Evan.Quan, Harry.Wentland, Oak.Zeng, Ray.Huang,
	Yong.Zhao, airlied, amd-gfx, ast, boqun.feng, bpf, daniel,
	daniel, davem, dri-devel, dsahern, edumazet, gautammenghani201,
	jakub, kafai, kuba, kuznet, linux-kernel, mingo, netdev, ozeng,
	pabeni, penguin-kernel, peterz, rex.zhu, songliubraving,
	syzkaller-bugs, will, yhs, yoshfuji


On 2023-01-03 at 11:05, Waiman Long wrote:
> On 1/3/23 10:39, Felix Kuehling wrote:
>> The regression point doesn't make sense. The kernel config doesn't 
>> enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU 
>> could have caused this regression.
>>
> I agree. It is likely a pre-existing problem or caused by another 
> commit that got triggered because of the change in cacheline alignment 
> caused by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell 
> variable").
I don't think the change can affect cache line alignment. The entire 
amdgpu driver doesn't even get compiled in the kernel config that was 
used, and the change doesn't touch any files outside 
drivers/gpu/drm/amd/amdgpu:

# CONFIG_DRM_AMDGPU is not set

My guess would be that it's an intermittent bug that is confusing bisect.

Regards,
   Felix



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2023-01-03 16:20       ` Felix Kuehling
  2023-01-03 22:07           ` Tetsuo Handa
@ 2023-01-03 22:07           ` Tetsuo Handa
  0 siblings, 0 replies; 18+ messages in thread
From: Tetsuo Handa @ 2023-01-03 22:07 UTC (permalink / raw)
  To: Felix Kuehling, Waiman Long, edumazet, jakub
  Cc: syzkaller-bugs, netdev, syzbot, Alexander.Deucher,
	Christian.Koenig, David1.Zhou, Evan.Quan, Harry.Wentland,
	Oak.Zeng, Ray.Huang, Yong.Zhao, airlied, ast, boqun.feng, daniel,
	daniel, davem, dsahern, gautammenghani201, kafai, kuba, kuznet,
	mingo, ozeng, pabeni, peterz, rex.zhu, songliubraving, will, yhs,
	yoshfuji

On 2023/01/04 1:20, Felix Kuehling wrote:
> 
> On 2023-01-03 at 11:05, Waiman Long wrote:
>> On 1/3/23 10:39, Felix Kuehling wrote:
>>> The regression point doesn't make sense. The kernel config doesn't enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU could have caused this regression.
>>>
>> I agree. It is likely a pre-existing problem or caused by another commit that got triggered because of the change in cacheline alignment caused by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell variable").
> I don't think the change can affect cache line alignment. The entire amdgpu driver doesn't even get compiled in the kernel config that was used, and the change doesn't touch any files outside drivers/gpu/drm/amd/amdgpu:
> 
> # CONFIG_DRM_AMDGPU is not set
> 
> My guess would be that it's an intermittent bug that is confusing bisect.
> 
> Regards,
>   Felix

This was already explained in https://groups.google.com/g/syzkaller-bugs/c/1rmGDmbXWIw/m/nIQm0EmxBAAJ .

Jakub Sitnicki suggested

  What if we revisit Eric's lockdep splat fix in 37159ef2c1ae ("l2tp: fix
  a lockdep splat") and: 

  1. remove the lockdep_set_class_and_name(...) call in l2tp; it looks
     like an odd case within the network stack, and

  2. switch to bh_lock_sock_nested in l2tp_xmit_core so that we don't
     break what has been fixed in 37159ef2c1ae.

and we are waiting for a response from Eric Dumazet.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
  2023-01-03 22:07           ` Tetsuo Handa
  (?)
  (?)
@ 2023-01-03 22:12           ` Eric Dumazet
  -1 siblings, 0 replies; 18+ messages in thread
From: Eric Dumazet @ 2023-01-03 22:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Felix Kuehling, Waiman Long, jakub, syzkaller-bugs, netdev,
	syzbot, Alexander.Deucher, Christian.Koenig, David1.Zhou,
	Evan.Quan, Harry.Wentland, Oak.Zeng, Ray.Huang, Yong.Zhao,
	airlied, ast, boqun.feng, daniel, daniel, davem, dsahern,
	gautammenghani201, kafai, kuba, kuznet, mingo, ozeng, pabeni,
	peterz, rex.zhu, songliubraving, will, yhs, yoshfuji

On Tue, Jan 3, 2023 at 11:08 PM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> On 2023/01/04 1:20, Felix Kuehling wrote:
> >
> > Am 2023-01-03 um 11:05 schrieb Waiman Long:
> >> On 1/3/23 10:39, Felix Kuehling wrote:
> >>> The regression point doesn't make sense. The kernel config doesn't enable CONFIG_DRM_AMDGPU, so there is no way that a change in AMDGPU could have caused this regression.
> >>>
> >> I agree. It is likely a pre-existing problem or caused by another commit that got triggered because of the change in cacheline alignment caused by commit c0d9271ecbd ("drm/amdgpu: Delete user queue doorbell variable").
> > I don't think the change can affect cache line alignment. The entire amdgpu driver doesn't even get compiled in the kernel config that was used, and the change doesn't touch any files outside drivers/gpu/drm/amd/amdgpu:
> >
> > # CONFIG_DRM_AMDGPU is not set
> >
> > My guess would be that it's an intermittent bug that is confusing bisect.
> >
> > Regards,
> >   Felix
>
> This was already explained in https://groups.google.com/g/syzkaller-bugs/c/1rmGDmbXWIw/m/nIQm0EmxBAAJ .
>
> Jakub Sitnicki suggested
>
>   What if we revisit Eric's lockdep splat fix in 37159ef2c1ae ("l2tp: fix
>   a lockdep splat") and:
>
>   1. remove the lockdep_set_class_and_name(...) call in l2tp; it looks
>      like an odd case within the network stack, and
>
>   2. switch to bh_lock_sock_nested in l2tp_xmit_core so that we don't
>      break what has been fixed in 37159ef2c1ae.
>
> and we are waiting for response from Eric Dumazet.
>

Eric Dumazet has been very busy.

Send a patch, instead of an idea/description.

Thanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [syzbot] WARNING: locking bug in inet_autobind
       [not found] <20221229101603.2931-1-hdanton@sina.com>
@ 2022-12-29 10:43 ` syzbot
  0 siblings, 0 replies; 18+ messages in thread
From: syzbot @ 2022-12-29 10:43 UTC (permalink / raw)
  To: hdanton, linux-kernel, syzkaller-bugs

Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
INFO: rcu detected stall in corrupted

rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { P5564 } 2687 jiffies s: 2885 root: 0x0/T
rcu: blocking rcu_node structures (internal RCU debug):


Tested on:

commit:         1b929c02 Linux 6.2-rc1
git tree:       https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
console output: https://syzkaller.appspot.com/x/log.txt?x=124c2632480000
kernel config:  https://syzkaller.appspot.com/x/.config?x=2651619a26b4d687
dashboard link: https://syzkaller.appspot.com/bug?extid=94cc2a66fc228b23f360
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
patch:          https://syzkaller.appspot.com/x/patch.diff?x=16485ff2480000


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-01-04  8:15 UTC | newest]

Thread overview: 18+ messages
2019-05-16  5:46 WARNING: locking bug in inet_autobind syzbot
2019-05-21  8:31 ` syzbot
2019-05-22  3:16 ` syzbot
     [not found]   ` <0000000000008b645c058971629b-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
2019-05-22  3:21     ` Zhao, Yong
2022-09-18 15:52 ` Tetsuo Handa
2022-09-18 18:25   ` Boqun Feng
2022-09-19  5:02     ` Tetsuo Handa
2022-09-27 13:00       ` Tetsuo Handa
2022-11-22 18:02         ` Jakub Sitnicki
2022-12-29  6:26 ` [syzbot] " syzbot
2023-01-03 15:39   ` Felix Kuehling
2023-01-03 16:05     ` Waiman Long
2023-01-03 16:20       ` Felix Kuehling
2023-01-03 22:07         ` Tetsuo Handa
2023-01-03 22:07           ` Tetsuo Handa
2023-01-03 22:07           ` Tetsuo Handa
2023-01-03 22:12           ` Eric Dumazet
     [not found] <20221229101603.2931-1-hdanton@sina.com>
2022-12-29 10:43 ` syzbot
