Out-of-bounds access when hartid >= NR_CPUS

* Out-of-bounds access when hartid >= NR_CPUS
@ 2021-10-25 15:54 ` Geert Uytterhoeven
  0 siblings, 0 replies; 30+ messages in thread
From: Geert Uytterhoeven @ 2021-10-25 15:54 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: linux-riscv, Linux Kernel Mailing List

Hi all,

When booting a kernel with CONFIG_NR_CPUS=4 on Microchip PolarFire,
the 4th CPU either fails to come online, or the system crashes.

This happens because PolarFire has 5 CPU cores: hart 0 is an e51,
and harts 1-4 are u54s, with the latter becoming CPUs 0-3 in Linux:
  - unused core has hartid 0 (sifive,e51),
  - processor 0 has hartid 1 (sifive,u74-mc),
  - processor 1 has hartid 2 (sifive,u74-mc),
  - processor 2 has hartid 3 (sifive,u74-mc),
  - processor 3 has hartid 4 (sifive,u74-mc).

I assume the same issue is present on the SiFive fu540 and fu740
SoCs, but I don't have access to these.  The issue is not present
on StarFive JH7100, as processor 0 has hartid 1, and processor 1 has
hartid 0.

arch/riscv/kernel/cpu_ops.c has:

    void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
    void *__cpu_up_task_pointer[NR_CPUS] __section(".data");

    void cpu_update_secondary_bootdata(unsigned int cpuid,
                                       struct task_struct *tidle)
    {
            int hartid = cpuid_to_hartid_map(cpuid);

            /* Make sure tidle is updated */
            smp_mb();
            WRITE_ONCE(__cpu_up_stack_pointer[hartid],
                       task_stack_page(tidle) + THREAD_SIZE);
            WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);

The above two writes cause out-of-bound accesses beyond
__cpu_up_{stack,pointer}_pointer[] if hartid >= CONFIG_NR_CPUS.

    }

arch/riscv/kernel/smpboot.c:setup_smp(void) detects CPUs like this:

    for_each_of_cpu_node(dn) {
            hart = riscv_of_processor_hartid(dn);
            if (hart < 0)
                    continue;

            if (hart == cpuid_to_hartid_map(0)) {
                    BUG_ON(found_boot_cpu);
                    found_boot_cpu = 1;
                    early_map_cpu_to_node(0, of_node_to_nid(dn));
                    continue;
            }
            if (cpuid >= NR_CPUS) {
                    pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
                            cpuid, hart);
                    break;
            }

            cpuid_to_hartid_map(cpuid) = hart;
            early_map_cpu_to_node(cpuid, of_node_to_nid(dn));
            cpuid++;
    }

So cpuid >= CONFIG_NR_CPUS (too many CPU cores) is already rejected.

How to fix this?

We could skip hartids >= NR_CPUS, but that feels strange to me, as
you need NR_CPUS to be larger (much larger if the first usable hartid
is a large number) than the number of CPUs used.

We could store the minimum hartid, and always subtract that when
accessing __cpu_up_{stack,pointer}_pointer[] (also in
arch/riscv/kernel/head.S), but that means unused cores cannot be in the
middle of the hartid range.

Are hartids guaranteed to be continuous? If not, we have no choice but
to index __cpu_up_{stack,pointer}_pointer[] by cpuid instead, which
needs a more expensive conversion in arch/riscv/kernel/head.S.

Thanks for your comments!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 30+ messages in thread