linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: cpumask: re-introduce constant-sized cpumask optimizations
@ 2023-03-06 11:20 Geert Uytterhoeven
  2023-03-06 19:19 ` Linus Torvalds
  0 siblings, 1 reply; 3+ messages in thread
From: Geert Uytterhoeven @ 2023-03-06 11:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-Renesas, Linux ARM, Linux Kernel Mailing List

Hi Linus,

Your final commit 596ff4a09b898179 ("cpumask: re-introduce
constant-sized cpumask optimizations") in v6.3-rc1 introduced a
regression.  During Debian userspace startup, the kernel crashes with:

    Alignment trap: not handling instruction e1931f9f at [<c015f0b4>]
    8<--- cut here ---
    Unhandled fault: alignment exception (0x001) at 0xc0c5b701
    [c0c5b701] *pgd=40c1141e(bad)
    Internal error: : 1 [#1] SMP ARM
    CPU: 0 PID: 1 Comm: systemd Not tainted 6.3.0-rc1-shmobile #1519
    Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
    PC is at arch_spin_lock+0x10/0x48
    LR is at arch_spin_lock+0xc/0x48
    pc : [<c015f0b8>]    lr : [<c015f0b4>]    psr: 80060093
    sp : f0815e48  ip : c0c5b700  fp : c0d04e08
    r10: c0d05b34  r9 : c0e5c284  r8 : c10ad140
    r7 : f0815e84  r6 : 00000008  r5 : c0c5b701  r4 : f0815e84
    r3 : c0c5b701  r2 : c0858678  r1 : 40060013  r0 : c0c5b701
    Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
    Control: 10c5387d  Table: 4204c06a  DAC: 00000051
    Register r0 information: non-slab/vmalloc memory
    Register r1 information: non-paged memory
    Register r2 information: non-slab/vmalloc memory
    Register r3 information: non-slab/vmalloc memory
    Register r4 information: 2-page vmalloc region starting at
0xf0814000 allocated at kernel_clone+0xa0/0x258
    Register r5 information: non-slab/vmalloc memory
    Register r6 information: non-paged memory
    Register r7 information: 2-page vmalloc region starting at
0xf0814000 allocated at kernel_clone+0xa0/0x258
    Register r8 information: slab task_struct start c10ad140 pointer
offset 0 size 2176
    Register r9 information: non-slab/vmalloc memory
    Register r10 information: non-slab/vmalloc memory
    Register r11 information: non-slab/vmalloc memory
    Register r12 information: non-slab/vmalloc memory
    Process systemd (pid: 1, stack limit = 0x(ptrval))
    Stack: (0xf0815e48 to 0xf0816000)
    5e40:                   f0815e84 c0186694 40060013 96063d9c
f0815e80 00000008
    5e60: 00000002 c08584a4 00000000 96063d9c 00000000 04a183ac
00000003 00000001
    5e80: 04a183ac 00000122 00000000 ffff8dd8 c0858678 06040001
00000001 00000002
    5ea0: f7e1016b 00000007 c2143015 004c6000 00000000 c1d05000
c12c20d0 00000101
    5ec0: 00000000 00000000 00000000 0000007a 00000038 00000000
00000000 96063d9c
    5ee0: f0815ee4 c0e5c284 f0815f18 c22ad240 f0815f78 be847850
c10ad140 00000003
    5f00: b6e61130 c0445328 00000000 00000010 c22ad240 c024ab18
01000006 00000000
    5f20: 00000010 be847850 00000000 00000000 c22ad240 00000000
00000000 00000000
    5f40: 00000000 00000000 00000000 00004004 00000000 00000000
00000001 96063d9c
    5f60: c22ad240 be847850 f0815f78 f0815f84 00000010 c024af94
00000000 00000000
    5f80: 00000000 c22ad240 00000000 96063d9c 00000074 be847850
00000000 00000003
    5fa0: c01002c4 c0100060 00000074 be847850 0000000c be847850
00000010 00000000
    5fc0: 00000074 be847850 00000000 00000003 00000001 00000001
00000001 b6e61130
    5fe0: 00000003 be8477c0 b6ef152f b6e7a746 60060030 0000000c
00000000 00000000
     arch_spin_lock from add_timer_on+0xe8/0x124
     add_timer_on from try_to_generate_entropy+0x1f4/0x250
     try_to_generate_entropy from urandom_read_iter+0x2c/0xc8
     urandom_read_iter from vfs_read+0x124/0x178
     vfs_read from ksys_read+0x74/0xc8
     ksys_read from ret_fast_syscall+0x0/0x54
    Exception stack(0xf0815fa8 to 0xf0815ff0)
    5fa0:                   00000074 be847850 0000000c be847850
00000010 00000000
    5fc0: 00000074 be847850 00000000 00000003 00000001 00000001
00000001 b6e61130
    5fe0: 00000003 be8477c0 b6ef152f b6e7a746
    Code: e92d4010 e1a03000 ebfffff7 e1931f9f (e2812801)
    ---[ end trace 0000000000000000 ]---
    note: systemd[1] exited with irqs disabled

|     #define for_each_cpu_wrap(cpu, mask, start)                            \
|    -       for_each_set_bit_wrap(cpu, cpumask_bits(mask),
nr_cpumask_bits, start)
|    +       for_each_set_bit_wrap(cpu, cpumask_bits(mask),
small_cpumask_bits, start)

Presumably using small_cpumask_bits instead of nr_cpu_ids accesses
some uninitialized array members?

    NR_CPUS = 8
    small_cpumask_bits = 8
    nr_cpu_ids = 2

A similar kernel on an arm64 system that does have 8 CPU cores works fine.
On an arm64 system with 2 CPU cores, it crashes in a similar way.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: cpumask: re-introduce constant-sized cpumask optimizations
  2023-03-06 11:20 cpumask: re-introduce constant-sized cpumask optimizations Geert Uytterhoeven
@ 2023-03-06 19:19 ` Linus Torvalds
  2023-03-06 20:18   ` Geert Uytterhoeven
  0 siblings, 1 reply; 3+ messages in thread
From: Linus Torvalds @ 2023-03-06 19:19 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linux-Renesas, Linux ARM, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1687 bytes --]

On Mon, Mar 6, 2023 at 3:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Your final commit 596ff4a09b898179 ("cpumask: re-introduce
> constant-sized cpumask optimizations") in v6.3-rc1 introduced a
> regression.  During Debian userspace startup, the kernel crashes with:

I'm pretty sure the attached patch should fix it. If you can confirm,
that would be lovely.

The only relevant part is actually just the oneliner to
drivers/char/random.c - the others are fixing the exact same problem
elsewhere, but that's not the case you're actually hitting.

> Presumably using small_cpumask_bits instead of nr_cpu_ids accesses
> some uninitialized array members?

No, it's actually the other way around - some drivers end up using
that "nr_cpumask_bits" in invalid ways, and the small_cpumask_bits
optimization then made that just _very_ obvious.

The bug was pre-existing, it's just that you couldn't trigger it before.

> A similar kernel on an arm64 system that does have 8 CPU cores works fine.
> On an arm64 system with 2 CPU cores, it crashes in a similar way.

Yeah, it's rather machine-specific, and that's why I never saw it in
my local testing either.

So on my machine I always either filled up the cpumask entirely
(because I did my testing on my beefy desktop), or NR_CPUS was so
large that the problem case never happened in the first place because
nr_cpumask_bits was the same as small_cpumask_bits.

I thought I had tested the interesting cases, but in this case, the
case that fails is actually the really trivial case. It's just that my
big machine would never trigger it, because it has so many cores.

               Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 4183 bytes --]

 arch/powerpc/xmon/xmon.c         |  2 +-
 drivers/char/random.c            |  2 +-
 drivers/net/wireguard/queueing.h |  2 +-
 drivers/scsi/lpfc/lpfc_init.c    | 14 +++++++-------
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
index 73c620c2a3a1..e753a6bd4888 100644
--- a/arch/powerpc/xmon/xmon.c
+++ b/arch/powerpc/xmon/xmon.c
@@ -1275,7 +1275,7 @@ static int xmon_batch_next_cpu(void)
 	while (!cpumask_empty(&xmon_batch_cpus)) {
 		cpu = cpumask_next_wrap(smp_processor_id(), &xmon_batch_cpus,
 					xmon_batch_start_cpu, true);
-		if (cpu == nr_cpumask_bits)
+		if (cpu >= nr_cpu_ids)
 			break;
 		if (xmon_batch_start_cpu == -1)
 			xmon_batch_start_cpu = cpu;
diff --git a/drivers/char/random.c b/drivers/char/random.c
index ce3ccd172cc8..253f2ddb8913 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1311,7 +1311,7 @@ static void __cold try_to_generate_entropy(void)
 			/* Basic CPU round-robin, which avoids the current CPU. */
 			do {
 				cpu = cpumask_next(cpu, &timer_cpus);
-				if (cpu == nr_cpumask_bits)
+				if (cpu >= nr_cpu_ids)
 					cpu = cpumask_first(&timer_cpus);
 			} while (cpu == smp_processor_id() && num_cpus > 1);
 
diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h
index 583adb37ee1e..125284b346a7 100644
--- a/drivers/net/wireguard/queueing.h
+++ b/drivers/net/wireguard/queueing.h
@@ -106,7 +106,7 @@ static inline int wg_cpumask_choose_online(int *stored_cpu, unsigned int id)
 {
 	unsigned int cpu = *stored_cpu, cpu_index, i;
 
-	if (unlikely(cpu == nr_cpumask_bits ||
+	if (unlikely(cpu >= nr_cpu_ids ||
 		     !cpumask_test_cpu(cpu, cpu_online_mask))) {
 		cpu_index = id % cpumask_weight(cpu_online_mask);
 		cpu = cpumask_first(cpu_online_mask);
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index 61958a24a43d..73b544bfbb2e 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -12563,7 +12563,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 					goto found_same;
 				new_cpu = cpumask_next(
 					new_cpu, cpu_present_mask);
-				if (new_cpu == nr_cpumask_bits)
+				if (new_cpu >= nr_cpu_ids)
 					new_cpu = first_cpu;
 			}
 			/* At this point, we leave the CPU as unassigned */
@@ -12577,7 +12577,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			 * selecting the same IRQ.
 			 */
 			start_cpu = cpumask_next(new_cpu, cpu_present_mask);
-			if (start_cpu == nr_cpumask_bits)
+			if (start_cpu >= nr_cpu_ids)
 				start_cpu = first_cpu;
 
 			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
@@ -12613,7 +12613,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 					goto found_any;
 				new_cpu = cpumask_next(
 					new_cpu, cpu_present_mask);
-				if (new_cpu == nr_cpumask_bits)
+				if (new_cpu >= nr_cpu_ids)
 					new_cpu = first_cpu;
 			}
 			/* We should never leave an entry unassigned */
@@ -12631,7 +12631,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 			 * selecting the same IRQ.
 			 */
 			start_cpu = cpumask_next(new_cpu, cpu_present_mask);
-			if (start_cpu == nr_cpumask_bits)
+			if (start_cpu >= nr_cpu_ids)
 				start_cpu = first_cpu;
 
 			lpfc_printf_log(phba, KERN_INFO, LOG_INIT,
@@ -12704,7 +12704,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 				goto found_hdwq;
 			}
 			new_cpu = cpumask_next(new_cpu, cpu_present_mask);
-			if (new_cpu == nr_cpumask_bits)
+			if (new_cpu >= nr_cpu_ids)
 				new_cpu = first_cpu;
 		}
 
@@ -12719,7 +12719,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
 				goto found_hdwq;
 
 			new_cpu = cpumask_next(new_cpu, cpu_present_mask);
-			if (new_cpu == nr_cpumask_bits)
+			if (new_cpu >= nr_cpu_ids)
 				new_cpu = first_cpu;
 		}
 
@@ -12730,7 +12730,7 @@ lpfc_cpu_affinity_check(struct lpfc_hba *phba, int vectors)
  found_hdwq:
 		/* We found an available entry, copy the IRQ info */
 		start_cpu = cpumask_next(new_cpu, cpu_present_mask);
-		if (start_cpu == nr_cpumask_bits)
+		if (start_cpu >= nr_cpu_ids)
 			start_cpu = first_cpu;
 		cpup->hdwq = new_cpup->hdwq;
  logit:

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: cpumask: re-introduce constant-sized cpumask optimizations
  2023-03-06 19:19 ` Linus Torvalds
@ 2023-03-06 20:18   ` Geert Uytterhoeven
  0 siblings, 0 replies; 3+ messages in thread
From: Geert Uytterhoeven @ 2023-03-06 20:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-Renesas, Linux ARM, Linux Kernel Mailing List

Hi Linus,

On Mon, Mar 6, 2023 at 8:19 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Mon, Mar 6, 2023 at 3:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > Your final commit 596ff4a09b898179 ("cpumask: re-introduce
> > constant-sized cpumask optimizations") in v6.3-rc1 introduced a
> > regression.  During Debian userspace startup, the kernel crashes with:
>
> I'm pretty sure the attached patch should fix it. If you can confirm,
> that would be lovely.

Thanks, works fine again!
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-03-06 20:19 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-06 11:20 cpumask: re-introduce constant-sized cpumask optimizations Geert Uytterhoeven
2023-03-06 19:19 ` Linus Torvalds
2023-03-06 20:18   ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).