From: Atish Patra <atishp@atishpatra.org>
To: linux-kernel@vger.kernel.org
Cc: Atish Patra <atishp@atishpatra.org>, Alexandre Ghiti <alex@ghiti.fr>,
    Anup Patel <anup.patel@wdc.com>, Greentime Hu <greentime.hu@sifive.com>,
    Guo Ren <guoren@linux.alibaba.com>, Heinrich Schuchardt <xypron.glpk@gmx.de>,
    Ingo Molnar <mingo@kernel.org>, Jisheng Zhang <jszhang@kernel.org>,
    kvm-riscv@lists.infradead.org, kvm@vger.kernel.org,
    linux-riscv@lists.infradead.org, Marc Zyngier <maz@kernel.org>,
    Nanyong Sun <sunnanyong@huawei.com>, Nick Kossifidis <mick@ics.forth.gr>,
    Palmer Dabbelt <palmer@dabbelt.com>, Paul Walmsley <paul.walmsley@sifive.com>,
    Pekka Enberg <penberg@kernel.org>, Vincent Chen <vincent.chen@sifive.com>,
    Vitaly Wool <vitaly.wool@konsulko.com>
Subject: [RFC 0/6] Sparse HART id support
Date: Fri, 3 Dec 2021 16:20:32 -0800
Message-ID: <20211204002038.113653-1-atishp@atishpatra.org>

Currently, sparse hartids are not supported by Linux on RISC-V for the following reasons:

1. Both the spinwait and the ordered booting methods use __cpu_up_stack/task_pointer, which are arrays of size NR_CPUS.
2. During early boot, any hart whose hartid is greater than or equal to NR_CPUS is not booted at all.
3. riscv_cpuid_to_hartid_mask uses a struct cpumask, which only has NR_CPUS bits, to generate the hartid bitmap.
4. The SBI v0.2 implementation uses NR_CPUS as the maximum hartid while generating the hartmask.

To support sparse hartids, the hartid and NR_CPUS need to be decoupled; tying them together was logically incorrect anyway. NR_CPUS is the maximum number of logical CPUs configured in the kernel, while the hartid is the physical hart id stored in the mhartid CSR defined by the privileged specification. Thus, a hartid can be much larger than any logical cpuid.

Currently, there are two booting methods. In ordered booting, the boot hart brings up each non-booting hart one by one using the SBI HSM extension. In spinwait booting, the harts jump into the Linux kernel in arbitrary order and the boot hart is selected by a lottery.
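The spinwait scheme, and why it ties hartids to NR_CPUS (issues #1 and #2), can be sketched in user-space C. This is a minimal sketch under stated assumptions, not the kernel's head.S code: C11 atomics stand in for the atomic add used by the real lottery, NR_CPUS=4 is an assumed configuration, and lottery, up_stack_pointer, up_task_pointer, spinwait_enter(), spinwait_release() and spinwait_poll() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed configuration, matching the NR_CPUS=4 case from the report. */
#define NR_CPUS 4

/* Hypothetical stand-ins for hart_lottery and the
 * __cpu_up_stack_pointer / __cpu_up_task_pointer arrays. */
static atomic_int lottery;
static _Atomic(void *) up_stack_pointer[NR_CPUS];
static _Atomic(void *) up_task_pointer[NR_CPUS];

enum hart_role { HART_BOOT, HART_SECONDARY, HART_PARKED };

/* Run by every hart on kernel entry. */
static enum hart_role spinwait_enter(unsigned long hartid)
{
	/* The per-hart slots are indexed by hartid, so a hart with
	 * hartid >= NR_CPUS has no slot and can only be parked. */
	if (hartid >= NR_CPUS)
		return HART_PARKED;
	/* The first hart to increment the counter sees 0 and wins. */
	if (atomic_fetch_add(&lottery, 1) == 0)
		return HART_BOOT;
	return HART_SECONDARY;
}

/* Boot hart: publish the task and stack pointer for a secondary hart.
 * The slot is selected by hartid, not by logical cpuid. */
static bool spinwait_release(unsigned long hartid, void *stack, void *task)
{
	if (hartid >= NR_CPUS)
		return false; /* no slot: this hart can never be released */
	atomic_store(&up_task_pointer[hartid], task);
	atomic_store(&up_stack_pointer[hartid], stack);
	return true;
}

/* Secondary hart: one iteration of the spin loop. Returns true once
 * both pointers have been published for this hartid. */
static bool spinwait_poll(unsigned long hartid, void **stack, void **task)
{
	assert(hartid < NR_CPUS);
	*stack = atomic_load(&up_stack_pointer[hartid]);
	*task = atomic_load(&up_task_pointer[hartid]);
	return *stack != NULL && *task != NULL;
}
```

Each hart calls spinwait_enter() once; the winner goes on to boot, and the secondaries loop on spinwait_poll() until the boot hart calls spinwait_release() for their hartid. A hart with hartid 4 under NR_CPUS=4 never gets a slot in this scheme.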
In the spinwait method, all other non-booting harts keep spinning on __cpu_up_stack/task_pointer until the boot hart initializes the data. Both methods rely on __cpu_up_stack/task_pointer to set up the stack/task pointer.

The spinwait method is mostly used to support M-mode Linux and older firmware without the SBI HSM extension. The ordered booting method is the preferred method for booting general Linux because it supports CPU hotplug and kexec.

The first patch modifies the ordered booting method to use the opaque parameter already available in the HSM start API to set up the stack/task pointer. The third patch resolves issue #1 by limiting the use of __cpu_up_stack/task_pointer to the spinwait booting method. The fourth and fifth patches move the entire hart lottery selection and the spinwait method to a separate config option that can be disabled if required, which solves issue #2. The sixth patch solves issues #3 and #4 by removing riscv_cpuid_to_hartid_mask completely: all the SBI APIs now take a pointer to a struct cpumask directly, and the SBI implementation generates the hart bitmap from the cpumask.

Supporting sparse hartids with the spinwait booting method is not trivial, and there is no use case for it either: any platform with sparse hartids will probably require more advanced features such as CPU hotplug and kexec. Thus, the series supports sparse hartids only via the ordered booting method. To maintain backward compatibility, the spinwait booting method is currently enabled in defconfig so that M-mode Linux continues to work. Any platform that requires sparse hartid support must disable the spinwait method.

This series also fixes the out-of-bounds access error[1] reported by Geert. The issue can be reproduced by booting SMP with NR_CPUS=4 on platforms with discontiguous hart numbering (HiFive Unleashed/Unmatched and PolarFire).
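The out-of-bounds access and the direction taken by the sixth patch can be illustrated with a small user-space sketch. It assumes a hart layout like that of the HiFive Unleashed, where Linux runs on harts 1-4 and hart 0 is a monitor core, so with NR_CPUS=4 hartid 4 has no slot in an NR_CPUS-sized array. The names unleashed_hartids, hartid_fits_cpu_array(), struct hart_window and cpus_to_hart_windows() are hypothetical; the (hbase, hmask) pair mirrors the hart_mask/hart_mask_base arguments of SBI v0.2 calls such as sbi_send_ipi, but this is a simplified illustration, not the code from the series.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NR_CPUS 4
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Assumed layout: logical CPUs 0-3 run on harts 1-4, hart 0 being a
 * monitor core that Linux does not use. */
static const unsigned long unleashed_hartids[NR_CPUS] = { 1, 2, 3, 4 };

/* Old scheme: an NR_CPUS-sized array indexed by hartid only has slots
 * for hartids 0..NR_CPUS-1. */
static int hartid_fits_cpu_array(unsigned long hartid)
{
	return hartid < NR_CPUS;
}

/* One SBI v0.2-style request: bit i of hmask selects hart hbase + i. */
struct hart_window {
	unsigned long hbase;
	unsigned long hmask;
};

/* Build hart windows from a bitmap of logical CPUs (bit n = CPU n),
 * mapping each CPU to its hartid. Nothing here is sized by the largest
 * hartid. Returns the number of windows written to out. */
static size_t cpus_to_hart_windows(const unsigned long *hartid_of,
				   unsigned int nr_cpus, unsigned long cpu_bits,
				   struct hart_window *out, size_t max_out)
{
	struct hart_window cur = { 0, 0 };
	size_t n = 0;

	assert(nr_cpus <= BITS_PER_LONG);
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!(cpu_bits & (1UL << cpu)))
			continue;
		unsigned long hartid = hartid_of[cpu];

		/* Flush the current window if this hart does not fit in it. */
		if (cur.hmask && (hartid < cur.hbase ||
				  hartid - cur.hbase >= BITS_PER_LONG)) {
			assert(n < max_out);
			out[n++] = cur;
			cur.hmask = 0;
		}
		if (!cur.hmask)
			cur.hbase = hartid;
		cur.hmask |= 1UL << (hartid - cur.hbase);
	}
	if (cur.hmask) {
		assert(n < max_out);
		out[n++] = cur;
	}
	return n;
}
```

For the assumed layout, all four CPUs collapse into a single request with hbase 1 and hmask 0xF. A sparse layout such as hartids 0, 100 and 200 simply yields three requests, because this simplified version starts a new window whenever a hart falls outside the current BITS_PER_LONG-wide range.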
The spinwait method should also be disabled in configurations where NR_CPUS is less than or equal to the maximum hartid on the platform.

[1] https://lore.kernel.org/lkml/CAMuHMdUPWOjJfJohxLJefHOrJBtXZ0xfHQt4=hXpUXnasiN+AQ@mail.gmail.com/#t

The series is based on the queue branch of kvm-riscv because it includes KVM-related changes as well. I have tested it on HiFive Unmatched and QEMU.

Atish Patra (6):
  RISC-V: Avoid using per cpu array for ordered booting
  RISC-V: Do not print the SBI version during HSM extension boot print
  RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
  RISC-V: Move the entire hart selection via lottery to SMP
  RISC-V: Move spinwait booting method to its own config
  RISC-V: Do not use cpumask data structure for hartid bitmap

 arch/riscv/Kconfig                   |  14 ++
 arch/riscv/include/asm/cpu_ops.h     |   2 -
 arch/riscv/include/asm/cpu_ops_sbi.h |  28 ++++
 arch/riscv/include/asm/sbi.h         |  19 +--
 arch/riscv/include/asm/smp.h         |   8 --
 arch/riscv/kernel/Makefile           |   3 +-
 arch/riscv/kernel/cpu_ops.c          |  26 ++--
 arch/riscv/kernel/cpu_ops_sbi.c      |  23 +++-
 arch/riscv/kernel/cpu_ops_spinwait.c |  27 +++-
 arch/riscv/kernel/head.S             |  33 +++--
 arch/riscv/kernel/head.h             |   6 +-
 arch/riscv/kernel/sbi.c              | 189 +++++++++++++++------------
 arch/riscv/kernel/smp.c              |  10 --
 arch/riscv/kernel/smpboot.c          |   2 +-
 arch/riscv/kvm/mmu.c                 |   4 +-
 arch/riscv/kvm/vcpu_sbi_replace.c    |  11 +-
 arch/riscv/kvm/vcpu_sbi_v01.c        |  11 +-
 arch/riscv/kvm/vmid.c                |   4 +-
 arch/riscv/mm/cacheflush.c           |   5 +-
 arch/riscv/mm/tlbflush.c             |   9 +-
 20 files changed, 252 insertions(+), 182 deletions(-)
 create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h

--
2.33.1