linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [syzbot] WARNING in c_start
@ 2022-10-14 21:16 syzbot
  2022-10-14 21:24 ` Borislav Petkov
  0 siblings, 1 reply; 11+ messages in thread
From: syzbot @ 2022-10-14 21:16 UTC (permalink / raw)
  To: bp, dave.hansen, hpa, linux-kernel, mingo, paulmck, peterz,
	rafael.j.wysocki, syzkaller-bugs, tglx, x86

Hello,

syzbot found the following issue on:

HEAD commit:    9c9155a3509a Merge tag 'drm-next-2022-10-14' of git://anon..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1030f562880000
kernel config:  https://syzkaller.appspot.com/x/.config?x=557e715ffce7a200
dashboard link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com

------------[ cut here ]------------
WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpu_max_bits_warn include/linux/cpumask.h:110 [inline]
WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpumask_check include/linux/cpumask.h:117 [inline]
WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpumask_next include/linux/cpumask.h:178 [inline]
WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 c_start+0x152/0x1b0 arch/x86/kernel/cpu/proc.c:156
Modules linked in:
CPU: 1 PID: 3671 Comm: syz-fuzzer Not tainted 6.0.0-syzkaller-11990-g9c9155a3509a #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
RIP: 0010:cpu_max_bits_warn include/linux/cpumask.h:110 [inline]
RIP: 0010:cpumask_check include/linux/cpumask.h:117 [inline]
RIP: 0010:cpumask_next include/linux/cpumask.h:178 [inline]
RIP: 0010:c_start+0x152/0x1b0 arch/x86/kernel/cpu/proc.c:156
Code: a0 68 99 8b e8 3f 97 4b 00 5b 5d 4c 89 e0 41 5c 41 5d c3 e8 30 97 4b 00 45 31 e4 5b 4c 89 e0 5d 41 5c 41 5d c3 e8 1e 97 4b 00 <0f> 0b e9 1e ff ff ff 48 c7 c7 40 54 e1 8d e8 fb 19 97 00 e9 6b ff
RSP: 0018:ffffc90002f27c70 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffffffff8de15440 RCX: 0000000000000000
RDX: ffff88801b3b6100 RSI: ffffffff812ffc02 RDI: 0000000000000004
RBP: 0000000000000008 R08: 0000000000000004 R09: 0000000000000008
R10: 0000000000000008 R11: 0000000000000001 R12: ffff88801c29c278
R13: 0000000000000008 R14: ffff888023b17b3c R15: 0000000000000000
FS:  000000c00004e890(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe53b98da00 CR3: 000000001b4cb000 CR4: 0000000000150ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 seq_read_iter+0x2c6/0x1280 fs/seq_file.c:225
 proc_reg_read_iter+0x111/0x2d0 fs/proc/inode.c:301
 call_read_iter include/linux/fs.h:2185 [inline]
 new_sync_read fs/read_write.c:389 [inline]
 vfs_read+0x67d/0x930 fs/read_write.c:470
 ksys_read+0x127/0x250 fs/read_write.c:613
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x4ae09b
Code: e8 ea 57 fb ff eb 88 cc cc cc cc cc cc cc cc e8 fb 9b fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
RSP: 002b:000000c0007233b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000000c00003e800 RCX: 00000000004ae09b
RDX: 0000000000000b49 RSI: 000000c0004fe4b7 RDI: 0000000000000003
RBP: 000000c000723408 R08: 0000000000000001 R09: 000000c000074480
R10: 0000000000000b49 R11: 0000000000000212 R12: 000000c000723570
R13: 0000000000000000 R14: 000000c0000001a0 R15: 00007fe53942c2f3
 </TASK>


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-14 21:16 [syzbot] WARNING in c_start syzbot
@ 2022-10-14 21:24 ` Borislav Petkov
       [not found]   ` <2eaf1386-8ab0-bd65-acee-e29f1c5a6623@I-love.SAKURA.ne.jp>
  0 siblings, 1 reply; 11+ messages in thread
From: Borislav Petkov @ 2022-10-14 21:24 UTC (permalink / raw)
  To: syzbot
  Cc: dave.hansen, hpa, linux-kernel, mingo, paulmck, peterz,
	rafael.j.wysocki, syzkaller-bugs, tglx, x86

On Fri, Oct 14, 2022 at 02:16:42PM -0700, syzbot wrote:
> Hello,
> 
> syzbot found the following issue on:
> 
> HEAD commit:    9c9155a3509a Merge tag 'drm-next-2022-10-14' of git://anon..
> git tree:       upstream
> console output: https://syzkaller.appspot.com/x/log.txt?x=1030f562880000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=557e715ffce7a200
> dashboard link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd
> compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> 
> Unfortunately, I don't have any reproducer for this issue yet.
> 
> IMPORTANT: if you fix the issue, please add the following tag to the commit:
> Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpu_max_bits_warn include/linux/cpumask.h:110 [inline]
> WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpumask_check include/linux/cpumask.h:117 [inline]
> WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 cpumask_next include/linux/cpumask.h:178 [inline]
> WARNING: CPU: 1 PID: 3671 at include/linux/cpumask.h:110 c_start+0x152/0x1b0 arch/x86/kernel/cpu/proc.c:156
> Modules linked in:
> CPU: 1 PID: 3671 Comm: syz-fuzzer Not tainted 6.0.0-syzkaller-11990-g9c9155a3509a #0
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
> RIP: 0010:cpu_max_bits_warn include/linux/cpumask.h:110 [inline]
> RIP: 0010:cpumask_check include/linux/cpumask.h:117 [inline]
> RIP: 0010:cpumask_next include/linux/cpumask.h:178 [inline]
> RIP: 0010:c_start+0x152/0x1b0 arch/x86/kernel/cpu/proc.c:156
> Code: a0 68 99 8b e8 3f 97 4b 00 5b 5d 4c 89 e0 41 5c 41 5d c3 e8 30 97 4b 00 45 31 e4 5b 4c 89 e0 5d 41 5c 41 5d c3 e8 1e 97 4b 00 <0f> 0b e9 1e ff ff ff 48 c7 c7 40 54 e1 8d e8 fb 19 97 00 e9 6b ff
> RSP: 0018:ffffc90002f27c70 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: ffffffff8de15440 RCX: 0000000000000000
> RDX: ffff88801b3b6100 RSI: ffffffff812ffc02 RDI: 0000000000000004
> RBP: 0000000000000008 R08: 0000000000000004 R09: 0000000000000008
> R10: 0000000000000008 R11: 0000000000000001 R12: ffff88801c29c278
> R13: 0000000000000008 R14: ffff888023b17b3c R15: 0000000000000000
> FS:  000000c00004e890(0000) GS:ffff88802c800000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007fe53b98da00 CR3: 000000001b4cb000 CR4: 0000000000150ef0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  <TASK>
>  seq_read_iter+0x2c6/0x1280 fs/seq_file.c:225
>  proc_reg_read_iter+0x111/0x2d0 fs/proc/inode.c:301
>  call_read_iter include/linux/fs.h:2185 [inline]
>  new_sync_read fs/read_write.c:389 [inline]
>  vfs_read+0x67d/0x930 fs/read_write.c:470
>  ksys_read+0x127/0x250 fs/read_write.c:613
>  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
>  do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
>  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x4ae09b
> Code: e8 ea 57 fb ff eb 88 cc cc cc cc cc cc cc cc e8 fb 9b fb ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
> RSP: 002b:000000c0007233b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 000000c00003e800 RCX: 00000000004ae09b
> RDX: 0000000000000b49 RSI: 000000c0004fe4b7 RDI: 0000000000000003
> RBP: 000000c000723408 R08: 0000000000000001 R09: 000000c000074480
> R10: 0000000000000b49 R11: 0000000000000212 R12: 000000c000723570
> R13: 0000000000000000 R14: 000000c0000001a0 R15: 00007fe53942c2f3
>  </TASK>
> 
> 
> ---
> This report is generated by a bot. It may contain errors.
> See https://goo.gl/tpsmEJ for more information about syzbot.
> syzbot engineers can be reached at syzkaller@googlegroups.com.
> 
> syzbot will keep track of this issue. See:
> https://goo.gl/tpsmEJ#status for how to communicate with syzbot.

Let's see if the bot understands fetching patches from mailing lists:

# syz test: https://lore.kernel.org/r/20221014155845.1986223-3-ajones@ventanamicro.com

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
       [not found]     ` <Y0qfLyhSoTodAdxu@zn.tnic>
@ 2022-10-15 20:44       ` Yury Norov
  2022-10-16  0:24         ` Tetsuo Handa
  0 siblings, 1 reply; 11+ messages in thread
From: Yury Norov @ 2022-10-15 20:44 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Tetsuo Handa, syzbot, syzkaller-bugs, Andrew Jones, netdev,
	David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, yury.norov, caraitto, willemb,
	jonolson, amritha.nambiar, linux-kernel

Add people from other threads discussing this.

On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
> > That's an invalid command line. The correct syntax is:
> > 
> > #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> The fix is not in Linus' tree yet.
> 
> > Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
> > other architectures have the same problem. And fixing all callers will
> > not be in time for this merge window.
> 
> Why won't there be time? That's why the -rcs are for.
> 
> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
> 
> So no, we will take Andrew's fixes for all arches in time for 6.1.

Summarizing things:

1. cpumask_check() was introduced to make sure that the cpu number
passed into cpumask API belongs to a valid range. But the check is
broken for a very long time. And because of that there are a lot of
places where cpumask API is used wrongly.

2. Underlying bitmap functions handle that correctly - when user
passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
everything is working smoothly.

3. I fixed all warnings that I was aware at the time of submitting the
patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
proposed fixes for c_start() in arch code.

4. The code paths mentioned above are all known to me that violate
cpumask_check() rules. (Did I miss something?)

With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
me about this problem with c_start(). But I don't like the idea to revert
cpumask_check() fix. This way we'll never clean that mess. 

If for some reason those warnings are unacceptable for -rcs (and like
Boris, I don't understand why), than instead of reverting commits, I'd
suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
fix their code. I can send a patch shortly, if we'll decide going this way.

How people would even realize that they're doing something wrong if
they will not get warned about it?

Thanks,
Yury

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-15 20:44       ` Yury Norov
@ 2022-10-16  0:24         ` Tetsuo Handa
  2022-10-16  0:28           ` Randy Dunlap
  2022-10-16  1:12           ` Yury Norov
  0 siblings, 2 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-10-16  0:24 UTC (permalink / raw)
  To: Yury Norov, Borislav Petkov, Linus Torvalds
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

On 2022/10/16 5:44, Yury Norov wrote:
> Add people from other threads discussing this.
> 
> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>> That's an invalid command line. The correct syntax is:
>>>
>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>
>> The fix is not in Linus' tree yet.
>>
>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>> other architectures have the same problem. And fixing all callers will
>>> not be in time for this merge window.
>>
>> Why won't there be time? That's why the -rcs are for.
>>
>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>
>> So no, we will take Andrew's fixes for all arches in time for 6.1.
> 
> Summarizing things:
> 
> 1. cpumask_check() was introduced to make sure that the cpu number
> passed into cpumask API belongs to a valid range. But the check is
> broken for a very long time. And because of that there are a lot of
> places where cpumask API is used wrongly.
> 
> 2. Underlying bitmap functions handle that correctly - when user
> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
> everything is working smoothly.
> 
> 3. I fixed all warnings that I was aware at the time of submitting the
> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
> proposed fixes for c_start() in arch code.
> 
> 4. The code paths mentioned above are all known to me that violate
> cpumask_check() rules. (Did I miss something?)
> 
> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
> me about this problem with c_start(). But I don't like the idea to revert
> cpumask_check() fix. This way we'll never clean that mess. 
> 
> If for some reason those warnings are unacceptable for -rcs (and like
> Boris, I don't understand why), than instead of reverting commits, I'd
> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
> fix their code. I can send a patch shortly, if we'll decide going this way.
> 
> How people would even realize that they're doing something wrong if
> they will not get warned about it?

I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
Just printing messages (without "BUG:"/"WARNING:" string which also breaks
syzkaller) like below diff is sufficient for people to realize that they're
doing something wrong.

Again, please do revert "cpumask: fix checking valid cpu range" immediately.

 include/linux/cpumask.h | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index c2aa0aa26b45..31af2cc5f0c2 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
 	return cpu;
 }
 
+/*
+ * We want to avoid passing -1 as a valid cpu argument.
+ * But we should not crash the kernel until all in-tree callers are fixed.
+ */
+static __always_inline void report_negative_cpuid(void)
+{
+#ifdef CONFIG_DEBUG_PER_CPU_MAPS
+	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
+	DO_ONCE_LITE(dump_stack);
+#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
+}
+
 /**
  * cpumask_first - get the first cpu in a cpumask
  * @srcp: the cpumask pointer
@@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
 }
 
@@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
 }
 
@@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
 	/* -1 is a legal arg here. */
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
 		nr_cpumask_bits, n + 1);
 }
@@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
 	cpumask_check(start);
 	if (n != -1)
 		cpumask_check(n);
+	else
+		report_negative_cpuid();
 
 	/*
 	 * Return the first available CPU when wrapping, or when starting before cpu0,
-- 
2.18.4

> 
> Thanks,
> Yury


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:24         ` Tetsuo Handa
@ 2022-10-16  0:28           ` Randy Dunlap
  2022-10-16  0:34             ` Tetsuo Handa
  2022-10-16  1:12           ` Yury Norov
  1 sibling, 1 reply; 11+ messages in thread
From: Randy Dunlap @ 2022-10-16  0:28 UTC (permalink / raw)
  To: Tetsuo Handa, Yury Norov, Borislav Petkov, Linus Torvalds
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

Hi--

On 10/15/22 17:24, Tetsuo Handa wrote:
> On 2022/10/16 5:44, Yury Norov wrote:
>> Add people from other threads discussing this.
>>
>> On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
>>> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
>>>> That's an invalid command line. The correct syntax is:
>>>>
>>>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>>>
>>> The fix is not in Linus' tree yet.
>>>
>>>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
>>>> other architectures have the same problem. And fixing all callers will
>>>> not be in time for this merge window.
>>>
>>> Why won't there be time? That's why the -rcs are for.
>>>
>>> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
>>>
>>> So no, we will take Andrew's fixes for all arches in time for 6.1.
>>
>> Summarizing things:
>>
>> 1. cpumask_check() was introduced to make sure that the cpu number
>> passed into cpumask API belongs to a valid range. But the check is
>> broken for a very long time. And because of that there are a lot of
>> places where cpumask API is used wrongly.
>>
>> 2. Underlying bitmap functions handle that correctly - when user
>> passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
>> what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
>> everything is working smoothly.
>>
>> 3. I fixed all warnings that I was aware at the time of submitting the
>> patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
>> netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
>> cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
>> proposed fixes for c_start() in arch code.
>>
>> 4. The code paths mentioned above are all known to me that violate
>> cpumask_check() rules. (Did I miss something?)
>>
>> With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
>> me about this problem with c_start(). But I don't like the idea to revert
>> cpumask_check() fix. This way we'll never clean that mess. 
>>
>> If for some reason those warnings are unacceptable for -rcs (and like
>> Boris, I don't understand why), than instead of reverting commits, I'd
>> suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
>> config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
>> fix their code. I can send a patch shortly, if we'll decide going this way.
>>
>> How people would even realize that they're doing something wrong if
>> they will not get warned about it?
> 
> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
> syzkaller) like below diff is sufficient for people to realize that they're
> doing something wrong.
> 
> Again, please do revert "cpumask: fix checking valid cpu range" immediately.
> 
>  include/linux/cpumask.h | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> index c2aa0aa26b45..31af2cc5f0c2 100644
> --- a/include/linux/cpumask.h
> +++ b/include/linux/cpumask.h
> @@ -118,6 +118,18 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu)
>  	return cpu;
>  }
>  
> +/*
> + * We want to avoid passing -1 as a valid cpu argument.
> + * But we should not crash the kernel until all in-tree callers are fixed.
> + */

Why not say that any negative cpu argument is invalid?
Or is it OK to pass -2 as the cpu arg?

> +static __always_inline void report_negative_cpuid(void)
> +{
> +#ifdef CONFIG_DEBUG_PER_CPU_MAPS
> +	pr_warn_once("FIXME: Passing -1 as CPU argument needs to be avoided.\n");
> +	DO_ONCE_LITE(dump_stack);
> +#endif /* CONFIG_DEBUG_PER_CPU_MAPS */
> +}
> +
>  /**
>   * cpumask_first - get the first cpu in a cpumask
>   * @srcp: the cpumask pointer
> @@ -177,6 +189,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
>  }
>  
> @@ -192,6 +206,8 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp)
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1);
>  }
>  
> @@ -234,6 +250,8 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p,
>  	/* -1 is a legal arg here. */
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  	return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
>  		nr_cpumask_bits, n + 1);
>  }
> @@ -265,6 +283,8 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo
>  	cpumask_check(start);
>  	if (n != -1)
>  		cpumask_check(n);
> +	else
> +		report_negative_cpuid();
>  
>  	/*
>  	 * Return the first available CPU when wrapping, or when starting before cpu0,

-- 
~Randy

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:28           ` Randy Dunlap
@ 2022-10-16  0:34             ` Tetsuo Handa
  0 siblings, 0 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-10-16  0:34 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: syzbot, syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel, Yury Norov, Borislav Petkov,
	Linus Torvalds

On 2022/10/16 9:28, Randy Dunlap wrote:
>> +/*
>> + * We want to avoid passing -1 as a valid cpu argument.
>> + * But we should not crash the kernel until all in-tree callers are fixed.
>> + */
> 
> Why not say that any negative cpu argument is invalid?

Currently only -1 is accepted as exception.

>>  	if (n != -1)
>>  		cpumask_check(n);
>> +	else
>> +		report_negative_cpuid();

> Or is it OK to pass -2 as the cpu arg?

Passing -2 will hit WARN_ON_ONCE(cpu >= nr_cpumask_bits) path.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  0:24         ` Tetsuo Handa
  2022-10-16  0:28           ` Randy Dunlap
@ 2022-10-16  1:12           ` Yury Norov
  2022-10-16  4:10             ` Tetsuo Handa
  2022-10-16 17:52             ` Linus Torvalds
  1 sibling, 2 replies; 11+ messages in thread
From: Yury Norov @ 2022-10-16  1:12 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Borislav Petkov, Linus Torvalds, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
> On 2022/10/16 5:44, Yury Norov wrote:
> > Add people from other threads discussing this.
> > 
> > On Sat, Oct 15, 2022 at 01:53:19PM +0200, Borislav Petkov wrote:
> >> On Sat, Oct 15, 2022 at 08:39:19PM +0900, Tetsuo Handa wrote:
> >>> That's an invalid command line. The correct syntax is:
> >>>
> >>> #syz test: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >>
> >> The fix is not in Linus' tree yet.
> >>
> >>> Andrew Jones proposed a fix for x86 and riscv architectures [2]. But
> >>> other architectures have the same problem. And fixing all callers will
> >>> not be in time for this merge window.
> >>
> >> Why won't there be time? That's why the -rcs are for.
> >>
> >> Also, that thing fires only when CONFIG_DEBUG_PER_CPU_MAPS is enabled.
> >>
> >> So no, we will take Andrew's fixes for all arches in time for 6.1.
> > 
> > Summarizing things:
> > 
> > 1. cpumask_check() was introduced to make sure that the cpu number
> > passed into cpumask API belongs to a valid range. But the check is
> > broken for a very long time. And because of that there are a lot of
> > places where cpumask API is used wrongly.
> > 
> > 2. Underlying bitmap functions handle that correctly - when user
> > passes out-of-range CPU index, the nr_cpu_ids is returned, and this is
> > what expected by client code. So if DEBUG_PER_CPU_MAPS config is off,
> > everything is working smoothly.
> > 
> > 3. I fixed all warnings that I was aware at the time of submitting the
> > patch. 2 follow-up series are on review: "[PATCH v2 0/4] net: drop
> > netif_attrmask_next*()" and "[PATCH 0/9] lib/cpumask: simplify
> > cpumask_next_wrap()". Also, Andrew Jones, Alexander Gordeev and Guo Ren
> > proposed fixes for c_start() in arch code.
> > 
> > 4. The code paths mentioned above are all known to me that violate
> > cpumask_check() rules. (Did I miss something?)
> > 
> > With all that, I agree with Borislav. Unfortunately, syzcall didn't CC
> > me about this problem with c_start(). But I don't like the idea to revert
> > cpumask_check() fix. This way we'll never clean that mess. 
> > 
> > If for some reason those warnings are unacceptable for -rcs (and like
> > Boris, I don't understand why), than instead of reverting commits, I'd
> > suggest moving cpumask sanity check from DEBUG_PER_CPU_MAPS under a new
> > config, say CONFIG_CPUMASK_DEBUG, which will be inactive until people will
> > fix their code. I can send a patch shortly, if we'll decide going this way.
> > 
> > How people would even realize that they're doing something wrong if
> > they will not get warned about it?
> 
> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.

It's not me who added WARN_ON() in the cpumask_check(). You're asking
a wrong person. 

What for do we have WARN_ON(), if we can't use it?

> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
> syzkaller) like below diff is sufficient for people to realize that they're
> doing something wrong.
> 
> Again, please do revert "cpumask: fix checking valid cpu range" immediately.

The revert is already in Jakub's batch for -rc2, AFAIK.

Thanks,
Yury

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  1:12           ` Yury Norov
@ 2022-10-16  4:10             ` Tetsuo Handa
  2022-10-16 17:52             ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Tetsuo Handa @ 2022-10-16  4:10 UTC (permalink / raw)
  To: Yury Norov
  Cc: Borislav Petkov, Linus Torvalds, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On 2022/10/16 10:12, Yury Norov wrote:
> On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
>> I'm asking you not to use BUG_ON()/WARN_ON() etc. which breaks syzkaller.
> 
> It's not me who added WARN_ON() in the cpumask_check(). You're asking
> a wrong person. 

Because you broke the kernel testing. It was obvious that passing "cpu + 1"
instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition.
If your patch were reviewed and tested, we would not have done this discussion.

> 
> What for do we have WARN_ON(), if we can't use it?

WARN_ON() could be used which should not happen.
But cpu == nr_cpumask_bits condition shall happen in your patch.

RCs are not for fixing bugs that causes boot failures. Such bugs
should have been tested and fixed before patches are sent to linux.git .

> 
>> Just printing messages (without "BUG:"/"WARNING:" string which also breaks
>> syzkaller) like below diff is sufficient for people to realize that they're
>> doing something wrong.
>>
>> Again, please do revert "cpumask: fix checking valid cpu range" immediately.
> 
> The revert is already in Jakub's batch for -rc2, AFAIK.

OK.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16  1:12           ` Yury Norov
  2022-10-16  4:10             ` Tetsuo Handa
@ 2022-10-16 17:52             ` Linus Torvalds
  2022-10-17  2:54               ` Tetsuo Handa
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2022-10-16 17:52 UTC (permalink / raw)
  To: Yury Norov
  Cc: Tetsuo Handa, Borislav Petkov, syzbot, syzkaller-bugs,
	Andrew Jones, netdev, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Sat, Oct 15, 2022 at 6:12 PM Yury Norov <yury.norov@gmail.com> wrote:
>
> On Sun, Oct 16, 2022 at 09:24:57AM +0900, Tetsuo Handa wrote:
> >
> > Again, please do revert "cpumask: fix checking valid cpu range" immediately.
>
> The revert is already in Jakub's batch for -rc2, AFAIK.

Hmm.

I've looked at this, and at the discussion, and the various reports,
and my gut feel is that the problem is that the whole
"cpumask_check()" is completely bogus for all the "starting at bit X"
cases.

And I think it was wrong even before, and yes, I think the "+1"
simplification just made things worse.

I think that where it makes sense is in contexts where we actually
*use* the bit value as a bit, so cpumask_clear_cpu() doing

        clear_bit(cpumask_check(cpu), cpumask_bits(dstp));

makes 100% sense and is unequivocally something that merits a warning.
An out-of-range cpu number is a serious bug in that context.

But all the cases where the function fundamentally already limits
things to the number of CPU's (with comments like "Returns >=
nr_cpu_ids if no further cpus unset.") should simply not have the
cpumask_check() at all.

All it results in just moving the onus of testing things into the
callers, or just makes for odd complications ("-1 is valid, because it
acts as the previous cpu for the beginning because we add one to get
the next CPU").

Anyway, since rc1 is fairly imminent, I will just revert it for now -
I don't want to have a pending revert wait until -rc2.

But I actually suspect that the thing we should really do is to just
remove the check entirely for these functions that are already defined
in terms of "if no more bits, return nr_cpu_ids". They already
basically return an error case, having them *warn* about the error
they are going to return is just obnoxious.

                     Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-16 17:52             ` Linus Torvalds
@ 2022-10-17  2:54               ` Tetsuo Handa
  2022-10-17 21:26                 ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Tetsuo Handa @ 2022-10-17  2:54 UTC (permalink / raw)
  To: Linus Torvalds, Yury Norov, Jakub Kicinski
  Cc: Borislav Petkov, syzbot, syzkaller-bugs, Andrew Jones, netdev,
	David S . Miller, Eric Dumazet, Paolo Abeni,
	Sebastian Andrzej Siewior, Menglong Dong, Kuniyuki Iwashima,
	Petr Machata, Guo Ren, Michael S . Tsirkin, Alexander Gordeev,
	andriy.shevchenko, linux, caraitto, willemb, jonolson,
	amritha.nambiar, linux-kernel

On 2022/10/17 2:52, Linus Torvalds wrote:
> Anyway, since rc1 is fairly imminent, I will just revert it for now -
> I don't want to have a pending revert wait until -rc2.

Thank you, Linus.

Yury or Jakub, please send a revert request on commit 854701ba4c39 ("net: fix cpu_max_bits_warn()
usage in netif_attrmask_next{,_and}"), for https://syzkaller.appspot.com/bug?extid=9abe5ecc348676215427
says that boot is still failing.

> But I actually suspect that the thing we should really do is to just
> remove the check entirely for these functions that are already defined
> in terms of "if no more bits, return nr_cpu_ids". They already
> basically return an error case, having them *warn* about the error
> they are going to return is just obnoxious.

I agree.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [syzbot] WARNING in c_start
  2022-10-17  2:54               ` Tetsuo Handa
@ 2022-10-17 21:26                 ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2022-10-17 21:26 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Linus Torvalds, Yury Norov, Borislav Petkov, syzbot,
	syzkaller-bugs, Andrew Jones, netdev, David S . Miller,
	Eric Dumazet, Paolo Abeni, Sebastian Andrzej Siewior,
	Menglong Dong, Kuniyuki Iwashima, Petr Machata, Guo Ren,
	Michael S . Tsirkin, Alexander Gordeev, andriy.shevchenko, linux,
	caraitto, willemb, jonolson, amritha.nambiar, linux-kernel

On Mon, 17 Oct 2022 11:54:01 +0900 Tetsuo Handa wrote:
> On 2022/10/17 2:52, Linus Torvalds wrote:
> > Anyway, since rc1 is fairly imminent, I will just revert it for now -
> > I don't want to have a pending revert wait until -rc2.  
> 
> Thank you, Linus.
> 
> Yury or Jakub, please send a revert request on commit 854701ba4c39 ("net: fix cpu_max_bits_warn()
> usage in netif_attrmask_next{,_and}"), for https://syzkaller.appspot.com/bug?extid=9abe5ecc348676215427
> says that boot is still failing.

Yup, we got that revert queued up. LMK if it's a show stopper 
otherwise it'll get to Linus on Thu.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-10-17 21:26 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-14 21:16 [syzbot] WARNING in c_start syzbot
2022-10-14 21:24 ` Borislav Petkov
     [not found]   ` <2eaf1386-8ab0-bd65-acee-e29f1c5a6623@I-love.SAKURA.ne.jp>
     [not found]     ` <Y0qfLyhSoTodAdxu@zn.tnic>
2022-10-15 20:44       ` Yury Norov
2022-10-16  0:24         ` Tetsuo Handa
2022-10-16  0:28           ` Randy Dunlap
2022-10-16  0:34             ` Tetsuo Handa
2022-10-16  1:12           ` Yury Norov
2022-10-16  4:10             ` Tetsuo Handa
2022-10-16 17:52             ` Linus Torvalds
2022-10-17  2:54               ` Tetsuo Handa
2022-10-17 21:26                 ` Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).