linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC 0/6] Sparse HART id support
@ 2021-12-04  0:20 Atish Patra
  2021-12-04  0:20 ` [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting Atish Patra
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

Currently, sparse hartid is not supported for Linux RISC-V for the following
reasons.
1. Both spinwait and ordered booting method uses __cpu_up_stack/task_pointer
   which is an array size of NR_CPUs.
2. During early booting, any hartid greater than NR_CPUs are not booted at all.
3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating hartid bitmap.
4. SBI v0.2 implementation uses NR_CPUs as the maximum hartid number while
   generating hartmask.

In order to support sparse hartid, the hartid & NR_CPUS needs to be disassociated
which was logically incorrect anyways. NR_CPUs represent the maximum logical|
CPU id configured in the kernel while the hartid represent the physical hartid
stored in mhartid CSR defined by the privilege specification. Thus, hartid
can have much greater value than logical cpuid.

Currently, we have two methods of booting. Ordered booting where the booting
hart brings up each non-booting hart one by one using SBI HSM extension.
The spinwait booting method relies on harts jumping to Linux kernel randomly
and boot hart is selected by a lottery. All other non-booting harts keep
spinning on __cpu_up_stack/task_pointer until boot hart initializes the data.
Both these methods rely on __cpu_up_stack/task_pointer to setup the stack/
task pointer. The spinwait method is mostly used to support older firmwares
without SBI HSM extension and M-mode Linux.  The ordered booting method is the
preferred booting method for booting general Linux because it can support
cpu hotplug and kexec.

The first patch modified the ordered booting method to use an opaque parameter
already available in HSM start API to setup the stack/task pointer. The third
patch resolves the issue #1 by limiting the usage of
__cpu_up_stack/task_pointer to spinwait specific booting method. The fourth 
and fifth patch moves the entire hart lottery selection and spinwait method
to a separate config that can be disabled if required. It solves the issue #2.
The 6th patch solves issue #3 and #4 by removing riscv_cpuid_to_hartid_mask
completely. All the SBI APIs directly pass a pointer to struct cpumask and
the SBI implementation takes care of generating the hart bitmap from the
cpumask.

It is not trivial to support sparse hartid for spinwait booting method and
there are no usecases to support sparse hartid for spinwait method as well.
Any platform with sparse hartid will probably require more advanced features
such as cpu hotplug and kexec. Thus, the series supports the sparse hartid via
ordered booting method only. To maintain backward compatibility, spinwait
booting method is currently enabled in defconfig so that M-mode linux will
continue to work. Any platform that requires to sparse hartid must disable the
spinwait method.

This series also fixes the out-of-bounds access error[1] reported by Geert.
The issue can be reproduced with SMP booting with NR_CPUS=4 on platforms with
discontiguous hart numbering (HiFive unleashed/unmatched & polarfire).
Spinwait method should also be disabled for such configuration where NR_CPUS
value is less than maximum hartid in the platform. 

[1] https://lore.kernel.org/lkml/CAMuHMdUPWOjJfJohxLJefHOrJBtXZ0xfHQt4=hXpUXnasiN+AQ@mail.gmail.com/#t

The series is based on queue branch on kvm-riscv as it has kvm related changes
as well. I have tested it on HiFive Unmatched and Qemu.

Atish Patra (6):
RISC-V: Avoid using per cpu array for ordered booting
RISC-V: Do not print the SBI version during HSM extension boot print
RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
RISC-V: Move the entire hart selection via lottery to SMP
RISC-V: Move spinwait booting method to its own config
RISC-V: Do not use cpumask data structure for hartid bitmap

arch/riscv/Kconfig                   |  14 ++
arch/riscv/include/asm/cpu_ops.h     |   2 -
arch/riscv/include/asm/cpu_ops_sbi.h |  28 ++++
arch/riscv/include/asm/sbi.h         |  19 +--
arch/riscv/include/asm/smp.h         |   8 --
arch/riscv/kernel/Makefile           |   3 +-
arch/riscv/kernel/cpu_ops.c          |  26 ++--
arch/riscv/kernel/cpu_ops_sbi.c      |  23 +++-
arch/riscv/kernel/cpu_ops_spinwait.c |  27 +++-
arch/riscv/kernel/head.S             |  33 +++--
arch/riscv/kernel/head.h             |   6 +-
arch/riscv/kernel/sbi.c              | 189 +++++++++++++++------------
arch/riscv/kernel/smp.c              |  10 --
arch/riscv/kernel/smpboot.c          |   2 +-
arch/riscv/kvm/mmu.c                 |   4 +-
arch/riscv/kvm/vcpu_sbi_replace.c    |  11 +-
arch/riscv/kvm/vcpu_sbi_v01.c        |  11 +-
arch/riscv/kvm/vmid.c                |   4 +-
arch/riscv/mm/cacheflush.c           |   5 +-
arch/riscv/mm/tlbflush.c             |   9 +-
20 files changed, 252 insertions(+), 182 deletions(-)
create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h

--
2.33.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
@ 2021-12-04  0:20 ` Atish Patra
  2021-12-13 12:48   ` Anup Patel
  2021-12-04  0:20 ` [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print Atish Patra
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

From: Atish Patra <atishp@rivosinc.com>

Currently both order booting and spinwait approach uses a per cpu
array to update stack & task pointer. This approach will not work for the
following cases.
1. If NR_CPUs are configured to be less than highest hart id.
2. A platform has sparse hartid.

This issue can be fixed for ordered booting as the booting cpu brings up
one cpu at a time using SBI HSM extension which has opaque parameter
that is unused until now.

Introduce a common secondary boot data structure that can store the stack
and task pointer. Secondary harts will use this data while booting up
to setup the sp & tp.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/cpu_ops_sbi.h | 28 ++++++++++++++++++++++++++++
 arch/riscv/kernel/cpu_ops_sbi.c      | 23 ++++++++++++++++++++---
 arch/riscv/kernel/head.S             | 19 ++++++++++---------
 3 files changed, 58 insertions(+), 12 deletions(-)
 create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h

diff --git a/arch/riscv/include/asm/cpu_ops_sbi.h b/arch/riscv/include/asm/cpu_ops_sbi.h
new file mode 100644
index 000000000000..ccb9a6d30486
--- /dev/null
+++ b/arch/riscv/include/asm/cpu_ops_sbi.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2021 by Rivos Inc.
+ */
+#ifndef __ASM_CPU_OPS_SBI_H
+#define __ASM_CPU_OPS_SBI_H
+
+#ifndef __ASSEMBLY__
+#include <linux/init.h>
+#include <linux/sched.h>
+#include <linux/threads.h>
+
+/**
+ * struct sbi_hart_boot_data - Hart specific boot used during booting and
+ *			       cpu hotplug.
+ * @task_ptr: A pointer to the hart specific tp
+ * @stack_ptr: A pointer to the hart specific sp
+ */
+struct sbi_hart_boot_data {
+	void *task_ptr;
+	void *stack_ptr;
+};
+#endif
+
+#define SBI_HART_BOOT_TASK_PTR_OFFSET (0x00)
+#define SBI_HART_BOOT_STACK_PTR_OFFSET RISCV_SZPTR
+
+#endif /* ifndef __ASM_CPU_OPS_H */
diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
index 685fae72b7f5..2e7a9dd9c2a7 100644
--- a/arch/riscv/kernel/cpu_ops_sbi.c
+++ b/arch/riscv/kernel/cpu_ops_sbi.c
@@ -7,13 +7,22 @@
 
 #include <linux/init.h>
 #include <linux/mm.h>
+#include <linux/sched/task_stack.h>
 #include <asm/cpu_ops.h>
+#include <asm/cpu_ops_sbi.h>
 #include <asm/sbi.h>
 #include <asm/smp.h>
 
 extern char secondary_start_sbi[];
 const struct cpu_operations cpu_ops_sbi;
 
+/*
+ * Ordered booting via HSM brings one cpu at a time. However, cpu hotplug can
+ * be invoked from multiple threads in paralle. Define a per cpu data
+ * to handle that.
+ */
+DEFINE_PER_CPU(struct sbi_hart_boot_data, boot_data);
+
 static int sbi_hsm_hart_start(unsigned long hartid, unsigned long saddr,
 			      unsigned long priv)
 {
@@ -58,9 +67,17 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
 	int rc;
 	unsigned long boot_addr = __pa_symbol(secondary_start_sbi);
 	int hartid = cpuid_to_hartid_map(cpuid);
-
-	cpu_update_secondary_bootdata(cpuid, tidle);
-	rc = sbi_hsm_hart_start(hartid, boot_addr, 0);
+	unsigned long hsm_data;
+	struct sbi_hart_boot_data *bdata = &per_cpu(boot_data, cpuid);
+
+	/* Make sure tidle is updated */
+	smp_mb();
+	bdata->task_ptr = tidle;
+	bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
+	/* Make sure boot data is updated */
+	smp_mb();
+	hsm_data = __pa(bdata);
+	rc = sbi_hsm_hart_start(hartid, boot_addr, hsm_data);
 
 	return rc;
 }
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index f52f01ecbeea..40d4c625513c 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -11,6 +11,7 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/csr.h>
+#include <asm/cpu_ops_sbi.h>
 #include <asm/hwcap.h>
 #include <asm/image.h>
 #include "efi-header.S"
@@ -167,15 +168,15 @@ secondary_start_sbi:
 	la a3, .Lsecondary_park
 	csrw CSR_TVEC, a3
 
-	slli a3, a0, LGREG
-	la a4, __cpu_up_stack_pointer
-	XIP_FIXUP_OFFSET a4
-	la a5, __cpu_up_task_pointer
-	XIP_FIXUP_OFFSET a5
-	add a4, a3, a4
-	add a5, a3, a5
-	REG_L sp, (a4)
-	REG_L tp, (a5)
+	/* a0 contains the hartid & a1 contains boot data */
+	li a2, SBI_HART_BOOT_TASK_PTR_OFFSET
+	XIP_FIXUP_OFFSET a2
+	add a2, a2, a1
+	REG_L tp, (a2)
+	li a3, SBI_HART_BOOT_STACK_PTR_OFFSET
+	XIP_FIXUP_OFFSET a3
+	add a3, a3, a1
+	REG_L sp, (a3)
 
 	.global secondary_start_common
 secondary_start_common:
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
  2021-12-04  0:20 ` [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting Atish Patra
@ 2021-12-04  0:20 ` Atish Patra
  2021-12-13 12:49   ` Anup Patel
  2021-12-04  0:20 ` [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method Atish Patra
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

From: Atish Patra <atishp@rivosinc.com>

The HSM extension information log also prints the SBI version v0.2. This
is misleading as the underlying firmware SBI version may be different
from v0.2.

Remove the unncessary printing of SBI version.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/kernel/cpu_ops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
index 1985884fe829..3f5a38b03044 100644
--- a/arch/riscv/kernel/cpu_ops.c
+++ b/arch/riscv/kernel/cpu_ops.c
@@ -38,7 +38,7 @@ void __init cpu_set_ops(int cpuid)
 #if IS_ENABLED(CONFIG_RISCV_SBI)
 	if (sbi_probe_extension(SBI_EXT_HSM) > 0) {
 		if (!cpuid)
-			pr_info("SBI v0.2 HSM extension detected\n");
+			pr_info("SBI HSM extension detected\n");
 		cpu_ops[cpuid] = &cpu_ops_sbi;
 	} else
 #endif
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
  2021-12-04  0:20 ` [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting Atish Patra
  2021-12-04  0:20 ` [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print Atish Patra
@ 2021-12-04  0:20 ` Atish Patra
  2021-12-13 12:50   ` Anup Patel
  2021-12-13 12:59   ` Marc Zyngier
  2021-12-04  0:20 ` [RFC 5/6] RISC-V: Move spinwait booting method to its own config Atish Patra
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

From: Atish Patra <atishp@rivosinc.com>

The __cpu_up_stack/task_pointer array is only used for spinwait method
now. The per cpu array based lookup is also fragile for platforms with
discontiguous/sparse hartids. The spinwait method is only used for
M-mode Linux or older firmwares without SBI HSM extension. For general
Linux systems, ordered booting method is preferred anyways to support
cpu hotplug and kexec.

Make sure that __cpu_up_stack/task_pointer is only used for spinwait
method. Take this opportunity to rename it to
__cpu_spinwait_stack/task_pointer to emphasize the purpose as well.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/cpu_ops.h     |  2 --
 arch/riscv/kernel/cpu_ops.c          | 16 ----------------
 arch/riscv/kernel/cpu_ops_spinwait.c | 27 ++++++++++++++++++++++++++-
 arch/riscv/kernel/head.S             |  4 ++--
 arch/riscv/kernel/head.h             |  4 ++--
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/arch/riscv/include/asm/cpu_ops.h b/arch/riscv/include/asm/cpu_ops.h
index a8ec3c5c1bd2..134590f1b843 100644
--- a/arch/riscv/include/asm/cpu_ops.h
+++ b/arch/riscv/include/asm/cpu_ops.h
@@ -40,7 +40,5 @@ struct cpu_operations {
 
 extern const struct cpu_operations *cpu_ops[NR_CPUS];
 void __init cpu_set_ops(int cpu);
-void cpu_update_secondary_bootdata(unsigned int cpuid,
-				   struct task_struct *tidle);
 
 #endif /* ifndef __ASM_CPU_OPS_H */
diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
index 3f5a38b03044..c1e30f403c3b 100644
--- a/arch/riscv/kernel/cpu_ops.c
+++ b/arch/riscv/kernel/cpu_ops.c
@@ -8,31 +8,15 @@
 #include <linux/of.h>
 #include <linux/string.h>
 #include <linux/sched.h>
-#include <linux/sched/task_stack.h>
 #include <asm/cpu_ops.h>
 #include <asm/sbi.h>
 #include <asm/smp.h>
 
 const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
 
-void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
-void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
-
 extern const struct cpu_operations cpu_ops_sbi;
 extern const struct cpu_operations cpu_ops_spinwait;
 
-void cpu_update_secondary_bootdata(unsigned int cpuid,
-				   struct task_struct *tidle)
-{
-	int hartid = cpuid_to_hartid_map(cpuid);
-
-	/* Make sure tidle is updated */
-	smp_mb();
-	WRITE_ONCE(__cpu_up_stack_pointer[hartid],
-		   task_stack_page(tidle) + THREAD_SIZE);
-	WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
-}
-
 void __init cpu_set_ops(int cpuid)
 {
 #if IS_ENABLED(CONFIG_RISCV_SBI)
diff --git a/arch/riscv/kernel/cpu_ops_spinwait.c b/arch/riscv/kernel/cpu_ops_spinwait.c
index b2c957bb68c1..9f398eb94f7a 100644
--- a/arch/riscv/kernel/cpu_ops_spinwait.c
+++ b/arch/riscv/kernel/cpu_ops_spinwait.c
@@ -6,11 +6,36 @@
 #include <linux/errno.h>
 #include <linux/of.h>
 #include <linux/string.h>
+#include <linux/sched/task_stack.h>
 #include <asm/cpu_ops.h>
 #include <asm/sbi.h>
 #include <asm/smp.h>
 
 const struct cpu_operations cpu_ops_spinwait;
+void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
+void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
+
+static void cpu_update_secondary_bootdata(unsigned int cpuid,
+				   struct task_struct *tidle)
+{
+	int hartid = cpuid_to_hartid_map(cpuid);
+
+	/*
+	 * The hartid must be less than NR_CPUS to avoid out-of-bound access
+	 * errors for __cpu_spinwait_stack/task_pointer. That is not always possible
+	 * for platforms with discontiguous hartid numbering scheme. That's why
+	 * spinwait booting is not the recommended approach for any platforms
+	 * and will be removed in future.
+	 */
+	if (hartid == INVALID_HARTID || hartid >= NR_CPUS)
+		return;
+
+	/* Make sure tidle is updated */
+	smp_mb();
+	WRITE_ONCE(__cpu_spinwait_stack_pointer[hartid],
+		   task_stack_page(tidle) + THREAD_SIZE);
+	WRITE_ONCE(__cpu_spinwait_task_pointer[hartid], tidle);
+}
 
 static int spinwait_cpu_prepare(unsigned int cpuid)
 {
@@ -28,7 +53,7 @@ static int spinwait_cpu_start(unsigned int cpuid, struct task_struct *tidle)
 	 * selects the first cpu to boot the kernel and causes the remainder
 	 * of the cpus to spin in a loop waiting for their stack pointer to be
 	 * setup by that main cpu.  Writing to bootdata
-	 * (i.e __cpu_up_stack_pointer) signals to the spinning cpus that they
+	 * (i.e __cpu_spinwait_stack_pointer) signals to the spinning cpus that they
 	 * can continue the boot process.
 	 */
 	cpu_update_secondary_bootdata(cpuid, tidle);
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 40d4c625513c..6f8e99eac6a1 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -347,9 +347,9 @@ clear_bss_done:
 	csrw CSR_TVEC, a3
 
 	slli a3, a0, LGREG
-	la a1, __cpu_up_stack_pointer
+	la a1, __cpu_spinwait_stack_pointer
 	XIP_FIXUP_OFFSET a1
-	la a2, __cpu_up_task_pointer
+	la a2, __cpu_spinwait_task_pointer
 	XIP_FIXUP_OFFSET a2
 	add a1, a3, a1
 	add a2, a3, a2
diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
index aabbc3ac3e48..5393cca77790 100644
--- a/arch/riscv/kernel/head.h
+++ b/arch/riscv/kernel/head.h
@@ -16,7 +16,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa);
 asmlinkage void __init __copy_data(void);
 #endif
 
-extern void *__cpu_up_stack_pointer[];
-extern void *__cpu_up_task_pointer[];
+extern void *__cpu_spinwait_stack_pointer[];
+extern void *__cpu_spinwait_task_pointer[];
 
 #endif /* __ASM_HEAD_H */
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 5/6] RISC-V: Move spinwait booting method to its own config
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
                   ` (2 preceding siblings ...)
  2021-12-04  0:20 ` [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method Atish Patra
@ 2021-12-04  0:20 ` Atish Patra
  2021-12-04  0:40   ` Randy Dunlap
  2021-12-13 13:01   ` Anup Patel
  2021-12-04  0:20 ` [RFC 6/6] RISC-V: Do not use cpumask data structure for hartid bitmap Atish Patra
  2021-12-06 15:28 ` [RFC 0/6] Sparse HART id support Rob Herring
  5 siblings, 2 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

From: Atish Patra <atishp@rivosinc.com>

The spinwait booting method should only be used for platforms with older
firmware without SBI HSM extension or M-mode firmware because spinwait
method can't support cpu hotplug, kexec or sparse hartid. It is better
to move the entire spinwait implementation to its own config which can
be disabled if required. It is enabled by default to maintain backward
compatibility and M-mode Linux.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/Kconfig          | 14 ++++++++++++++
 arch/riscv/kernel/Makefile  |  3 ++-
 arch/riscv/kernel/cpu_ops.c |  8 ++++++++
 arch/riscv/kernel/head.S    |  6 +++---
 arch/riscv/kernel/head.h    |  2 ++
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 821252b65f89..4afb42d5707d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -403,6 +403,20 @@ config RISCV_SBI_V01
 	  This config allows kernel to use SBI v0.1 APIs. This will be
 	  deprecated in future once legacy M-mode software are no longer in use.
 
+config RISCV_BOOT_SPINWAIT
+	bool "Spinwait booting method"
+	depends on SMP
+	default y
+	help
+	  This enables support for booting Linux via spinwait method. In the
+	  spinwait method, all cores randomly jump to Linux. One of the core
+	  gets chosen via lottery and all other keeps spinning on a percpu
+	  variable. This method can not support cpu hotplug and sparse hartid
+	  scheme. It should be only enabled for M-mode Linux or platforms relying
+	  on older firmware without SBI HSM extension. All other platform should
+	  rely on ordered booing via SBI HSM extension which gets chosen
+          dynamically at runtime if the firmware supports it.
+
 config KEXEC
 	bool "Kexec system call"
 	select KEXEC_CORE
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 3397ddac1a30..612556faa527 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -43,7 +43,8 @@ obj-$(CONFIG_FPU)		+= fpu.o
 obj-$(CONFIG_SMP)		+= smpboot.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= cpu_ops.o
-obj-$(CONFIG_SMP)		+= cpu_ops_spinwait.o
+
+obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
 obj-$(CONFIG_MODULES)		+= module.o
 obj-$(CONFIG_MODULE_SECTIONS)	+= module-sections.o
 
diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
index c1e30f403c3b..170d07e57721 100644
--- a/arch/riscv/kernel/cpu_ops.c
+++ b/arch/riscv/kernel/cpu_ops.c
@@ -15,7 +15,15 @@
 const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
 
 extern const struct cpu_operations cpu_ops_sbi;
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
 extern const struct cpu_operations cpu_ops_spinwait;
+#else
+const struct cpu_operations cpu_ops_spinwait = {
+	.name		= "",
+	.cpu_prepare	= NULL,
+	.cpu_start	= NULL,
+};
+#endif
 
 void __init cpu_set_ops(int cpuid)
 {
diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
index 9f16bfe9307e..4a694e15b95b 100644
--- a/arch/riscv/kernel/head.S
+++ b/arch/riscv/kernel/head.S
@@ -259,7 +259,7 @@ pmp_done:
 	li t0, SR_FS
 	csrc CSR_STATUS, t0
 
-#ifdef CONFIG_SMP
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
 	li t0, CONFIG_NR_CPUS
 	blt a0, t0, .Lgood_cores
 	tail .Lsecondary_park
@@ -285,7 +285,7 @@ pmp_done:
 	beq t0, t1, .Lsecondary_start
 
 #endif /* CONFIG_XIP */
-#endif /* CONFIG_SMP */
+#endif /* CONFIG_RISCV_BOOT_SPINWAIT */
 
 #ifdef CONFIG_XIP_KERNEL
 	la sp, _end + THREAD_SIZE
@@ -344,7 +344,7 @@ clear_bss_done:
 	call soc_early_init
 	tail start_kernel
 
-#ifdef CONFIG_SMP
+#if defined(CONFIG_SMP) && defined(CONFIG_RISCV_BOOT_SPINWAIT)
 .Lsecondary_start:
 	/* Set trap vector to spin forever to help debug */
 	la a3, .Lsecondary_park
diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
index 5393cca77790..726731ada534 100644
--- a/arch/riscv/kernel/head.h
+++ b/arch/riscv/kernel/head.h
@@ -16,7 +16,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa);
 asmlinkage void __init __copy_data(void);
 #endif
 
+#ifdef CONFIG_RISCV_BOOT_SPINWAIT
 extern void *__cpu_spinwait_stack_pointer[];
 extern void *__cpu_spinwait_task_pointer[];
+#endif
 
 #endif /* __ASM_HEAD_H */
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC 6/6] RISC-V: Do not use cpumask data structure for hartid bitmap
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
                   ` (3 preceding siblings ...)
  2021-12-04  0:20 ` [RFC 5/6] RISC-V: Move spinwait booting method to its own config Atish Patra
@ 2021-12-04  0:20 ` Atish Patra
  2021-12-06 15:28 ` [RFC 0/6] Sparse HART id support Rob Herring
  5 siblings, 0 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-04  0:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

From: Atish Patra <atishp@rivosinc.com>

Currently, SBI APIs accept a hartmask that is generated from struct cpumask.
Cpumask data structure can hold upto NR_CPUs value. Thus, it is not
the correct data structure for hartids as it can be higher than NR_CPUs
for platforms with sparse or discontguous hartids.

Remove all association between hartid mask and struct cpumask.

Signed-off-by: Atish Patra <atishp@rivosinc.com>
---
 arch/riscv/include/asm/sbi.h      |  19 +--
 arch/riscv/include/asm/smp.h      |   8 --
 arch/riscv/kernel/sbi.c           | 189 +++++++++++++++++-------------
 arch/riscv/kernel/smp.c           |  10 --
 arch/riscv/kernel/smpboot.c       |   2 +-
 arch/riscv/kvm/mmu.c              |   4 +-
 arch/riscv/kvm/vcpu_sbi_replace.c |  11 +-
 arch/riscv/kvm/vcpu_sbi_v01.c     |  11 +-
 arch/riscv/kvm/vmid.c             |   4 +-
 arch/riscv/mm/cacheflush.c        |   5 +-
 arch/riscv/mm/tlbflush.c          |   9 +-
 11 files changed, 130 insertions(+), 142 deletions(-)

diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
index 9c46dd3ff4a2..94abf4c300e4 100644
--- a/arch/riscv/include/asm/sbi.h
+++ b/arch/riscv/include/asm/sbi.h
@@ -8,6 +8,7 @@
 #define _ASM_RISCV_SBI_H
 
 #include <linux/types.h>
+#include <linux/cpumask.h>
 
 #ifdef CONFIG_RISCV_SBI
 enum sbi_ext_id {
@@ -112,27 +113,27 @@ long sbi_get_mimpid(void);
 void sbi_set_timer(uint64_t stime_value);
 void sbi_shutdown(void);
 void sbi_clear_ipi(void);
-int sbi_send_ipi(const unsigned long *hart_mask);
-int sbi_remote_fence_i(const unsigned long *hart_mask);
-int sbi_remote_sfence_vma(const unsigned long *hart_mask,
+int sbi_send_ipi(const struct cpumask *cpu_mask);
+int sbi_remote_fence_i(const struct cpumask *cpu_mask);
+int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size);
 
-int sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long asid);
-int sbi_remote_hfence_gvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size);
-int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma_vmid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long vmid);
-int sbi_remote_hfence_vvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size);
-int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma_asid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long asid);
@@ -159,7 +160,7 @@ static inline unsigned long sbi_minor_version(void)
 
 int sbi_err_map_linux_errno(int err);
 #else /* CONFIG_RISCV_SBI */
-static inline int sbi_remote_fence_i(const unsigned long *hart_mask) { return -1; }
+static inline int sbi_remote_fence_i(const struct cpumask *cpu_mask) { return -1; }
 static inline void sbi_init(void) {}
 #endif /* CONFIG_RISCV_SBI */
 #endif /* _ASM_RISCV_SBI_H */
diff --git a/arch/riscv/include/asm/smp.h b/arch/riscv/include/asm/smp.h
index a7d2811f3536..e07ecfb5d925 100644
--- a/arch/riscv/include/asm/smp.h
+++ b/arch/riscv/include/asm/smp.h
@@ -43,7 +43,6 @@ void arch_send_call_function_ipi_mask(struct cpumask *mask);
 void arch_send_call_function_single_ipi(int cpu);
 
 int riscv_hartid_to_cpuid(int hartid);
-void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out);
 
 /* Set custom IPI operations */
 void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops);
@@ -85,13 +84,6 @@ static inline unsigned long cpuid_to_hartid_map(int cpu)
 	return boot_cpu_hartid;
 }
 
-static inline void riscv_cpuid_to_hartid_mask(const struct cpumask *in,
-					      struct cpumask *out)
-{
-	cpumask_clear(out);
-	cpumask_set_cpu(boot_cpu_hartid, out);
-}
-
 static inline void riscv_set_ipi_ops(const struct riscv_ipi_ops *ops)
 {
 }
diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c
index 7402a417f38e..2438d6fdb788 100644
--- a/arch/riscv/kernel/sbi.c
+++ b/arch/riscv/kernel/sbi.c
@@ -15,8 +15,8 @@ unsigned long sbi_spec_version __ro_after_init = SBI_SPEC_VERSION_DEFAULT;
 EXPORT_SYMBOL(sbi_spec_version);
 
 static void (*__sbi_set_timer)(uint64_t stime) __ro_after_init;
-static int (*__sbi_send_ipi)(const unsigned long *hart_mask) __ro_after_init;
-static int (*__sbi_rfence)(int fid, const unsigned long *hart_mask,
+static int (*__sbi_send_ipi)(const struct cpumask *cpu_mask) __ro_after_init;
+static int (*__sbi_rfence)(int fid, const struct cpumask *cpu_mask,
 			   unsigned long start, unsigned long size,
 			   unsigned long arg4, unsigned long arg5) __ro_after_init;
 
@@ -66,6 +66,30 @@ int sbi_err_map_linux_errno(int err)
 EXPORT_SYMBOL(sbi_err_map_linux_errno);
 
 #ifdef CONFIG_RISCV_SBI_V01
+static unsigned long __sbi_v01_cpumask_to_hartmask(const struct cpumask *cpu_mask)
+{
+	unsigned long cpuid, hartid;
+	unsigned long hmask = 0;
+
+	/*
+	 * There is no maximum hartid concept in RISC-V and NR_CPUS must not be
+	 * associated with hartid. As SBI v0.1 is only kept for backward compatibility
+	 * and will be removed in the future, there is no point in supporting hartid
+	 * greater than BITS_PER_LONG (32 for RV32 and 64 for RV64). Ideally, SBI v0.2
+	 * should be used for platforms with hartid greater than BITS_PER_LONG.
+	 */
+	for_each_cpu(cpuid, cpu_mask) {
+		hartid = cpuid_to_hartid_map(cpuid);
+		if (hartid >= BITS_PER_LONG) {
+			pr_warn("Unable to send any request to hartid > BITS_PER_LONG for SBI v0.1\n");
+			break;
+		}
+		hmask |= 1 << hartid;
+	}
+
+	return hmask;
+}
+
 /**
  * sbi_console_putchar() - Writes given character to the console device.
  * @ch: The data to be written to the console.
@@ -131,33 +155,44 @@ static void __sbi_set_timer_v01(uint64_t stime_value)
 #endif
 }
 
-static int __sbi_send_ipi_v01(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v01(const struct cpumask *cpu_mask)
 {
-	sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)hart_mask,
+	unsigned long hart_mask;
+
+	if (!cpu_mask)
+		cpu_mask = cpu_online_mask;
+	hart_mask = __sbi_v01_cpumask_to_hartmask(cpu_mask);
+
+	sbi_ecall(SBI_EXT_0_1_SEND_IPI, 0, (unsigned long)(&hart_mask),
 		  0, 0, 0, 0, 0);
 	return 0;
 }
 
-static int __sbi_rfence_v01(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v01(int fid, const struct cpumask *cpu_mask,
 			    unsigned long start, unsigned long size,
 			    unsigned long arg4, unsigned long arg5)
 {
 	int result = 0;
+	unsigned long hart_mask;
+
+	if (!cpu_mask)
+		cpu_mask = cpu_online_mask;
+	hart_mask = __sbi_v01_cpumask_to_hartmask(cpu_mask);
 
 	/* v0.2 function IDs are equivalent to v0.1 extension IDs */
 	switch (fid) {
 	case SBI_EXT_RFENCE_REMOTE_FENCE_I:
 		sbi_ecall(SBI_EXT_0_1_REMOTE_FENCE_I, 0,
-			  (unsigned long)hart_mask, 0, 0, 0, 0, 0);
+			  (unsigned long)&hart_mask, 0, 0, 0, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
 		sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA, 0,
-			  (unsigned long)hart_mask, start, size,
+			  (unsigned long)&hart_mask, start, size,
 			  0, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
 		sbi_ecall(SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID, 0,
-			  (unsigned long)hart_mask, start, size,
+			  (unsigned long)&hart_mask, start, size,
 			  arg4, 0, 0);
 		break;
 	default:
@@ -179,7 +214,7 @@ static void __sbi_set_timer_v01(uint64_t stime_value)
 		sbi_major_version(), sbi_minor_version());
 }
 
-static int __sbi_send_ipi_v01(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v01(const struct cpumask *cpu_mask)
 {
 	pr_warn("IPI extension is not available in SBI v%lu.%lu\n",
 		sbi_major_version(), sbi_minor_version());
@@ -187,7 +222,7 @@ static int __sbi_send_ipi_v01(const unsigned long *hart_mask)
 	return 0;
 }
 
-static int __sbi_rfence_v01(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v01(int fid, const struct cpumask *cpu_mask,
 			    unsigned long start, unsigned long size,
 			    unsigned long arg4, unsigned long arg5)
 {
@@ -211,37 +246,33 @@ static void __sbi_set_timer_v02(uint64_t stime_value)
 #endif
 }
 
-static int __sbi_send_ipi_v02(const unsigned long *hart_mask)
+static int __sbi_send_ipi_v02(const struct cpumask *cpu_mask)
 {
-	unsigned long hartid, hmask_val, hbase;
-	struct cpumask tmask;
+	unsigned long hartid, cpuid, hmask = 0, hbase = 0;
 	struct sbiret ret = {0};
 	int result;
 
-	if (!hart_mask || !(*hart_mask)) {
-		riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask);
-		hart_mask = cpumask_bits(&tmask);
-	}
+	if (!cpu_mask)
+		cpu_mask = cpu_online_mask;
 
-	hmask_val = 0;
-	hbase = 0;
-	for_each_set_bit(hartid, hart_mask, NR_CPUS) {
-		if (hmask_val && ((hbase + BITS_PER_LONG) <= hartid)) {
+	for_each_cpu(cpuid, cpu_mask) {
+		hartid = cpuid_to_hartid_map(cpuid);
+		if (hmask && ((hbase + BITS_PER_LONG) <= hartid)) {
 			ret = sbi_ecall(SBI_EXT_IPI, SBI_EXT_IPI_SEND_IPI,
-					hmask_val, hbase, 0, 0, 0, 0);
+					hmask, hbase, 0, 0, 0, 0);
 			if (ret.error)
 				goto ecall_failed;
-			hmask_val = 0;
+			hmask = 0;
 			hbase = 0;
 		}
-		if (!hmask_val)
+		if (!hmask)
 			hbase = hartid;
-		hmask_val |= 1UL << (hartid - hbase);
+		hmask |= 1UL << (hartid - hbase);
 	}
 
-	if (hmask_val) {
+	if (hmask) {
 		ret = sbi_ecall(SBI_EXT_IPI, SBI_EXT_IPI_SEND_IPI,
-				hmask_val, hbase, 0, 0, 0, 0);
+				hmask, hbase, 0, 0, 0, 0);
 		if (ret.error)
 			goto ecall_failed;
 	}
@@ -251,11 +282,11 @@ static int __sbi_send_ipi_v02(const unsigned long *hart_mask)
 ecall_failed:
 	result = sbi_err_map_linux_errno(ret.error);
 	pr_err("%s: hbase = [%lu] hmask = [0x%lx] failed (error [%d])\n",
-	       __func__, hbase, hmask_val, result);
+	       __func__, hbase, hmask, result);
 	return result;
 }
 
-static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask_val,
+static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask,
 				 unsigned long hbase, unsigned long start,
 				 unsigned long size, unsigned long arg4,
 				 unsigned long arg5)
@@ -266,31 +297,31 @@ static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask_val,
 
 	switch (fid) {
 	case SBI_EXT_RFENCE_REMOTE_FENCE_I:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, 0, 0, 0, 0);
+		ret = sbi_ecall(ext, fid, hmask, hbase, 0, 0, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, arg4, 0);
 		break;
 
 	case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, arg4, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, 0, 0);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID:
-		ret = sbi_ecall(ext, fid, hmask_val, hbase, start,
+		ret = sbi_ecall(ext, fid, hmask, hbase, start,
 				size, arg4, 0);
 		break;
 	default:
@@ -302,43 +333,39 @@ static int __sbi_rfence_v02_call(unsigned long fid, unsigned long hmask_val,
 	if (ret.error) {
 		result = sbi_err_map_linux_errno(ret.error);
 		pr_err("%s: hbase = [%lu] hmask = [0x%lx] failed (error [%d])\n",
-		       __func__, hbase, hmask_val, result);
+		       __func__, hbase, hmask, result);
 	}
 
 	return result;
 }
 
-static int __sbi_rfence_v02(int fid, const unsigned long *hart_mask,
+static int __sbi_rfence_v02(int fid, const struct cpumask *cpu_mask,
 			    unsigned long start, unsigned long size,
 			    unsigned long arg4, unsigned long arg5)
 {
-	unsigned long hmask_val, hartid, hbase;
-	struct cpumask tmask;
+	unsigned long hartid, cpuid, hmask = 0, hbase = 0;
 	int result;
 
-	if (!hart_mask || !(*hart_mask)) {
-		riscv_cpuid_to_hartid_mask(cpu_online_mask, &tmask);
-		hart_mask = cpumask_bits(&tmask);
-	}
+	if (!cpu_mask)
+		cpu_mask = cpu_online_mask;
 
-	hmask_val = 0;
-	hbase = 0;
-	for_each_set_bit(hartid, hart_mask, NR_CPUS) {
-		if (hmask_val && ((hbase + BITS_PER_LONG) <= hartid)) {
-			result = __sbi_rfence_v02_call(fid, hmask_val, hbase,
+	for_each_cpu(cpuid, cpu_mask) {
+		hartid = cpuid_to_hartid_map(cpuid);
+		if (hmask && ((hbase + BITS_PER_LONG) <= hartid)) {
+			result = __sbi_rfence_v02_call(fid, hmask, hbase,
 						       start, size, arg4, arg5);
 			if (result)
 				return result;
-			hmask_val = 0;
+			hmask = 0;
 			hbase = 0;
 		}
-		if (!hmask_val)
+		if (!hmask)
 			hbase = hartid;
-		hmask_val |= 1UL << (hartid - hbase);
+		hmask |= 1UL << (hartid - hbase);
 	}
 
-	if (hmask_val) {
-		result = __sbi_rfence_v02_call(fid, hmask_val, hbase,
+	if (hmask) {
+		result = __sbi_rfence_v02_call(fid, hmask, hbase,
 					       start, size, arg4, arg5);
 		if (result)
 			return result;
@@ -360,44 +387,44 @@ void sbi_set_timer(uint64_t stime_value)
 
 /**
  * sbi_send_ipi() - Send an IPI to any hart.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  *
  * Return: 0 on success, appropriate linux error code otherwise.
  */
-int sbi_send_ipi(const unsigned long *hart_mask)
+int sbi_send_ipi(const struct cpumask *cpu_mask)
 {
-	return __sbi_send_ipi(hart_mask);
+	return __sbi_send_ipi(cpu_mask);
 }
 EXPORT_SYMBOL(sbi_send_ipi);
 
 /**
  * sbi_remote_fence_i() - Execute FENCE.I instruction on given remote harts.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  *
  * Return: 0 on success, appropriate linux error code otherwise.
  */
-int sbi_remote_fence_i(const unsigned long *hart_mask)
+int sbi_remote_fence_i(const struct cpumask *cpu_mask)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_FENCE_I,
-			    hart_mask, 0, 0, 0, 0);
+			    cpu_mask, 0, 0, 0, 0);
 }
 EXPORT_SYMBOL(sbi_remote_fence_i);
 
 /**
  * sbi_remote_sfence_vma() - Execute SFENCE.VMA instructions on given remote
  *			     harts for the specified virtual address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the virtual address
  * @size: Total size of the virtual address range.
  *
  * Return: 0 on success, appropriate linux error code otherwise.
  */
-int sbi_remote_sfence_vma(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA,
-			    hart_mask, start, size, 0, 0);
+			    cpu_mask, start, size, 0, 0);
 }
 EXPORT_SYMBOL(sbi_remote_sfence_vma);
 
@@ -405,38 +432,38 @@ EXPORT_SYMBOL(sbi_remote_sfence_vma);
  * sbi_remote_sfence_vma_asid() - Execute SFENCE.VMA instructions on given
  * remote harts for a virtual address range belonging to a specific ASID.
  *
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the virtual address
  * @size: Total size of the virtual address range.
  * @asid: The value of address space identifier (ASID).
  *
  * Return: 0 on success, appropriate linux error code otherwise.
  */
-int sbi_remote_sfence_vma_asid(const unsigned long *hart_mask,
+int sbi_remote_sfence_vma_asid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long asid)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID,
-			    hart_mask, start, size, asid, 0);
+			    cpu_mask, start, size, asid, 0);
 }
 EXPORT_SYMBOL(sbi_remote_sfence_vma_asid);
 
 /**
  * sbi_remote_hfence_gvma() - Execute HFENCE.GVMA instructions on given remote
  *			   harts for the specified guest physical address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the guest physical address
  * @size: Total size of the guest physical address range.
  *
  * Return: None
  */
-int sbi_remote_hfence_gvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA,
-			    hart_mask, start, size, 0, 0);
+			    cpu_mask, start, size, 0, 0);
 }
 EXPORT_SYMBOL_GPL(sbi_remote_hfence_gvma);
 
@@ -444,38 +471,38 @@ EXPORT_SYMBOL_GPL(sbi_remote_hfence_gvma);
  * sbi_remote_hfence_gvma_vmid() - Execute HFENCE.GVMA instructions on given
  * remote harts for a guest physical address range belonging to a specific VMID.
  *
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the guest physical address
  * @size: Total size of the guest physical address range.
  * @vmid: The value of guest ID (VMID).
  *
  * Return: 0 if success, Error otherwise.
  */
-int sbi_remote_hfence_gvma_vmid(const unsigned long *hart_mask,
+int sbi_remote_hfence_gvma_vmid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long vmid)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA_VMID,
-			    hart_mask, start, size, vmid, 0);
+			    cpu_mask, start, size, vmid, 0);
 }
 EXPORT_SYMBOL(sbi_remote_hfence_gvma_vmid);
 
 /**
  * sbi_remote_hfence_vvma() - Execute HFENCE.VVMA instructions on given remote
  *			     harts for the current guest virtual address range.
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the current guest virtual address
  * @size: Total size of the current guest virtual address range.
  *
  * Return: None
  */
-int sbi_remote_hfence_vvma(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma(const struct cpumask *cpu_mask,
 			   unsigned long start,
 			   unsigned long size)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA,
-			    hart_mask, start, size, 0, 0);
+			    cpu_mask, start, size, 0, 0);
 }
 EXPORT_SYMBOL(sbi_remote_hfence_vvma);
 
@@ -484,20 +511,20 @@ EXPORT_SYMBOL(sbi_remote_hfence_vvma);
  * remote harts for current guest virtual address range belonging to a specific
  * ASID.
  *
- * @hart_mask: A cpu mask containing all the target harts.
+ * @cpu_mask: A cpu mask containing all the target harts.
  * @start: Start of the current guest virtual address
  * @size: Total size of the current guest virtual address range.
  * @asid: The value of address space identifier (ASID).
  *
  * Return: None
  */
-int sbi_remote_hfence_vvma_asid(const unsigned long *hart_mask,
+int sbi_remote_hfence_vvma_asid(const struct cpumask *cpu_mask,
 				unsigned long start,
 				unsigned long size,
 				unsigned long asid)
 {
 	return __sbi_rfence(SBI_EXT_RFENCE_REMOTE_HFENCE_VVMA_ASID,
-			    hart_mask, start, size, asid, 0);
+			    cpu_mask, start, size, asid, 0);
 }
 EXPORT_SYMBOL(sbi_remote_hfence_vvma_asid);
 
@@ -564,11 +591,7 @@ long sbi_get_mimpid(void)
 
 static void sbi_send_cpumask_ipi(const struct cpumask *target)
 {
-	struct cpumask hartid_mask;
-
-	riscv_cpuid_to_hartid_mask(target, &hartid_mask);
-
-	sbi_send_ipi(cpumask_bits(&hartid_mask));
+	sbi_send_ipi(target);
 }
 
 static const struct riscv_ipi_ops sbi_ipi_ops = {
diff --git a/arch/riscv/kernel/smp.c b/arch/riscv/kernel/smp.c
index 2f6da845c9ae..b5d30ea92292 100644
--- a/arch/riscv/kernel/smp.c
+++ b/arch/riscv/kernel/smp.c
@@ -59,16 +59,6 @@ int riscv_hartid_to_cpuid(int hartid)
 	return -ENOENT;
 }
 
-void riscv_cpuid_to_hartid_mask(const struct cpumask *in, struct cpumask *out)
-{
-	int cpu;
-
-	cpumask_clear(out);
-	for_each_cpu(cpu, in)
-		cpumask_set_cpu(cpuid_to_hartid_map(cpu), out);
-}
-EXPORT_SYMBOL_GPL(riscv_cpuid_to_hartid_mask);
-
 bool arch_match_cpu_phys_id(int cpu, u64 phys_id)
 {
 	return phys_id == cpuid_to_hartid_map(cpu);
diff --git a/arch/riscv/kernel/smpboot.c b/arch/riscv/kernel/smpboot.c
index bd82375db51a..622f226454d5 100644
--- a/arch/riscv/kernel/smpboot.c
+++ b/arch/riscv/kernel/smpboot.c
@@ -96,7 +96,7 @@ void __init setup_smp(void)
 		if (cpuid >= NR_CPUS) {
 			pr_warn("Invalid cpuid [%d] for hartid [%d]\n",
 				cpuid, hart);
-			break;
+			continue;
 		}
 
 		cpuid_to_hartid_map(cpuid) = hart;
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 9ffd0255af43..81702a4829ae 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -114,7 +114,6 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
 
 static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
 {
-	struct cpumask hmask;
 	unsigned long size = PAGE_SIZE;
 	struct kvm_vmid *vmid = &kvm->arch.vmid;
 
@@ -127,8 +126,7 @@ static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
 	 * where the Guest/VM is running.
 	 */
 	preempt_disable();
-	riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
-	sbi_remote_hfence_gvma_vmid(cpumask_bits(&hmask), addr, size,
+	sbi_remote_hfence_gvma_vmid(cpu_online_mask, addr, size,
 				    READ_ONCE(vmid->vmid));
 	preempt_enable();
 }
diff --git a/arch/riscv/kvm/vcpu_sbi_replace.c b/arch/riscv/kvm/vcpu_sbi_replace.c
index 67a64db1efc9..734b38b1846b 100644
--- a/arch/riscv/kvm/vcpu_sbi_replace.c
+++ b/arch/riscv/kvm/vcpu_sbi_replace.c
@@ -80,7 +80,7 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
 				      struct kvm_cpu_trap *utrap, bool *exit)
 {
 	int i, ret = 0;
-	struct cpumask cm, hm;
+	struct cpumask cm;
 	struct kvm_vcpu *tmp;
 	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
 	unsigned long hmask = cp->a0;
@@ -88,7 +88,6 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
 	unsigned long funcid = cp->a6;
 
 	cpumask_clear(&cm);
-	cpumask_clear(&hm);
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
 		if (hbase != -1UL) {
 			if (tmp->vcpu_id < hbase)
@@ -101,17 +100,15 @@ static int kvm_sbi_ext_rfence_handler(struct kvm_vcpu *vcpu, struct kvm_run *run
 		cpumask_set_cpu(tmp->cpu, &cm);
 	}
 
-	riscv_cpuid_to_hartid_mask(&cm, &hm);
-
 	switch (funcid) {
 	case SBI_EXT_RFENCE_REMOTE_FENCE_I:
-		ret = sbi_remote_fence_i(cpumask_bits(&hm));
+		ret = sbi_remote_fence_i(&cm);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA:
-		ret = sbi_remote_hfence_vvma(cpumask_bits(&hm), cp->a2, cp->a3);
+		ret = sbi_remote_hfence_vvma(&cm, cp->a2, cp->a3);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_SFENCE_VMA_ASID:
-		ret = sbi_remote_hfence_vvma_asid(cpumask_bits(&hm), cp->a2,
+		ret = sbi_remote_hfence_vvma_asid(&cm, cp->a2,
 						  cp->a3, cp->a4);
 		break;
 	case SBI_EXT_RFENCE_REMOTE_HFENCE_GVMA:
diff --git a/arch/riscv/kvm/vcpu_sbi_v01.c b/arch/riscv/kvm/vcpu_sbi_v01.c
index 08097d1c13c1..84d7b96874f2 100644
--- a/arch/riscv/kvm/vcpu_sbi_v01.c
+++ b/arch/riscv/kvm/vcpu_sbi_v01.c
@@ -38,7 +38,7 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	int i, ret = 0;
 	u64 next_cycle;
 	struct kvm_vcpu *rvcpu;
-	struct cpumask cm, hm;
+	struct cpumask cm;
 	struct kvm *kvm = vcpu->kvm;
 	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;
 
@@ -101,15 +101,12 @@ static int kvm_sbi_ext_v01_handler(struct kvm_vcpu *vcpu, struct kvm_run *run,
 				continue;
 			cpumask_set_cpu(rvcpu->cpu, &cm);
 		}
-		riscv_cpuid_to_hartid_mask(&cm, &hm);
 		if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I)
-			ret = sbi_remote_fence_i(cpumask_bits(&hm));
+			ret = sbi_remote_fence_i(&cm);
 		else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA)
-			ret = sbi_remote_hfence_vvma(cpumask_bits(&hm),
-						cp->a1, cp->a2);
+			ret = sbi_remote_hfence_vvma(&cm, cp->a1, cp->a2);
 		else
-			ret = sbi_remote_hfence_vvma_asid(cpumask_bits(&hm),
-						cp->a1, cp->a2, cp->a3);
+			ret = sbi_remote_hfence_vvma_asid(&cm, cp->a1, cp->a2, cp->a3);
 		break;
 	default:
 		ret = -EINVAL;
diff --git a/arch/riscv/kvm/vmid.c b/arch/riscv/kvm/vmid.c
index 2c6253b293bc..1bd4779d124e 100644
--- a/arch/riscv/kvm/vmid.c
+++ b/arch/riscv/kvm/vmid.c
@@ -67,7 +67,6 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
 {
 	int i;
 	struct kvm_vcpu *v;
-	struct cpumask hmask;
 	struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
 
 	if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
@@ -102,8 +101,7 @@ void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
 		 * running, we force VM exits on all host CPUs using IPI and
 		 * flush all Guest TLBs.
 		 */
-		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
-		sbi_remote_hfence_gvma(cpumask_bits(&hmask), 0, 0);
+		sbi_remote_hfence_gvma(cpu_online_mask, 0, 0);
 	}
 
 	vmid->vmid = vmid_next;
diff --git a/arch/riscv/mm/cacheflush.c b/arch/riscv/mm/cacheflush.c
index 89f81067e09e..6cb7d96ad9c7 100644
--- a/arch/riscv/mm/cacheflush.c
+++ b/arch/riscv/mm/cacheflush.c
@@ -67,10 +67,7 @@ void flush_icache_mm(struct mm_struct *mm, bool local)
 		 */
 		smp_mb();
 	} else if (IS_ENABLED(CONFIG_RISCV_SBI)) {
-		cpumask_t hartid_mask;
-
-		riscv_cpuid_to_hartid_mask(&others, &hartid_mask);
-		sbi_remote_fence_i(cpumask_bits(&hartid_mask));
+		sbi_remote_fence_i(&others);
 	} else {
 		on_each_cpu_mask(&others, ipi_remote_fence_i, NULL, 1);
 	}
diff --git a/arch/riscv/mm/tlbflush.c b/arch/riscv/mm/tlbflush.c
index 64f8201237c2..37ed760d007c 100644
--- a/arch/riscv/mm/tlbflush.c
+++ b/arch/riscv/mm/tlbflush.c
@@ -32,7 +32,6 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 				  unsigned long size, unsigned long stride)
 {
 	struct cpumask *cmask = mm_cpumask(mm);
-	struct cpumask hmask;
 	unsigned int cpuid;
 	bool broadcast;
 
@@ -46,9 +45,7 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 		unsigned long asid = atomic_long_read(&mm->context.id);
 
 		if (broadcast) {
-			riscv_cpuid_to_hartid_mask(cmask, &hmask);
-			sbi_remote_sfence_vma_asid(cpumask_bits(&hmask),
-						   start, size, asid);
+			sbi_remote_sfence_vma_asid(cmask, start, size, asid);
 		} else if (size <= stride) {
 			local_flush_tlb_page_asid(start, asid);
 		} else {
@@ -56,9 +53,7 @@ static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start,
 		}
 	} else {
 		if (broadcast) {
-			riscv_cpuid_to_hartid_mask(cmask, &hmask);
-			sbi_remote_sfence_vma(cpumask_bits(&hmask),
-					      start, size);
+			sbi_remote_sfence_vma(cmask, start, size);
 		} else if (size <= stride) {
 			local_flush_tlb_page(start);
 		} else {
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC 5/6] RISC-V: Move spinwait booting method to its own config
  2021-12-04  0:20 ` [RFC 5/6] RISC-V: Move spinwait booting method to its own config Atish Patra
@ 2021-12-04  0:40   ` Randy Dunlap
  2021-12-13 13:01   ` Anup Patel
  1 sibling, 0 replies; 19+ messages in thread
From: Randy Dunlap @ 2021-12-04  0:40 UTC (permalink / raw)
  To: Atish Patra, linux-kernel
  Cc: Atish Patra, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

Hi--

On 12/3/21 16:20, Atish Patra wrote:
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 821252b65f89..4afb42d5707d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -403,6 +403,20 @@ config RISCV_SBI_V01
>  	  This config allows kernel to use SBI v0.1 APIs. This will be
>  	  deprecated in future once legacy M-mode software are no longer in use.
>  
> +config RISCV_BOOT_SPINWAIT
> +	bool "Spinwait booting method"
> +	depends on SMP
> +	default y
> +	help
> +	  This enables support for booting Linux via spinwait method. In the
> +	  spinwait method, all cores randomly jump to Linux. One of the core

	                                                                cores

> +	  gets chosen via lottery and all other keeps spinning on a percpu

	                                  others keep

> +	  variable. This method can not support cpu hotplug and sparse hartid

	                        cannot support CPU hotplug and sparse hartid

> +	  scheme. It should be only enabled for M-mode Linux or platforms relying
> +	  on older firmware without SBI HSM extension. All other platform should

	                                                         platforms

> +	  rely on ordered booing via SBI HSM extension which gets chosen

	                  booting

> +          dynamically at runtime if the firmware supports it.

	  dynamically at runtime if the firmware supports it.

(Last line should be indented with tab + 2 spaces, not 10 spaces.)


-- 
~Randy

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/6] Sparse HART id support
  2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
                   ` (4 preceding siblings ...)
  2021-12-04  0:20 ` [RFC 6/6] RISC-V: Do not use cpumask data structure for hartid bitmap Atish Patra
@ 2021-12-06 15:28 ` Rob Herring
  2021-12-13 21:27   ` Atish Patra
  5 siblings, 1 reply; 19+ messages in thread
From: Rob Herring @ 2021-12-06 15:28 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Alexandre Ghiti, Anup Patel, Greentime Hu, Guo Ren,
	Heinrich Schuchardt, Ingo Molnar, Jisheng Zhang, kvm-riscv, kvm,
	linux-riscv, Marc Zyngier, Nanyong Sun, Nick Kossifidis,
	Palmer Dabbelt, Paul Walmsley, Pekka Enberg, Vincent Chen,
	Vitaly Wool

On Fri, Dec 03, 2021 at 04:20:32PM -0800, Atish Patra wrote:
> Currently, sparse hartid is not supported for Linux RISC-V for the following
> reasons.
> 1. Both spinwait and ordered booting method uses __cpu_up_stack/task_pointer
>    which is an array size of NR_CPUs.
> 2. During early booting, any hartid greater than NR_CPUs are not booted at all.
> 3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating hartid bitmap.
> 4. SBI v0.2 implementation uses NR_CPUs as the maximum hartid number while
>    generating hartmask.
> 
> In order to support sparse hartid, the hartid & NR_CPUS needs to be disassociated
> which was logically incorrect anyways. NR_CPUs represent the maximum logical|
> CPU id configured in the kernel while the hartid represent the physical hartid
> stored in mhartid CSR defined by the privilege specification. Thus, hartid
> can have much greater value than logical cpuid.

We already have a couple of architectures with logical to physical CPU 
id maps. See cpu_logical_map. Can we make that common and use it here? 
That would also possibly allow for common populating the map from DT.

Rob

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting
  2021-12-04  0:20 ` [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting Atish Patra
@ 2021-12-13 12:48   ` Anup Patel
  2021-12-13 21:05     ` Atish Patra
  0 siblings, 1 reply; 19+ messages in thread
From: Anup Patel @ 2021-12-13 12:48 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Sat, Dec 4, 2021 at 5:50 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> From: Atish Patra <atishp@rivosinc.com>
>
> Currently both order booting and spinwait approach uses a per cpu
> array to update stack & task pointer. This approach will not work for the
> following cases.
> 1. If NR_CPUs are configured to be less than highest hart id.
> 2. A platform has sparse hartid.
>
> This issue can be fixed for ordered booting as the booting cpu brings up
> one cpu at a time using SBI HSM extension which has opaque parameter
> that is unused until now.
>
> Introduce a common secondary boot data structure that can store the stack
> and task pointer. Secondary harts will use this data while booting up
> to setup the sp & tp.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/cpu_ops_sbi.h | 28 ++++++++++++++++++++++++++++
>  arch/riscv/kernel/cpu_ops_sbi.c      | 23 ++++++++++++++++++++---
>  arch/riscv/kernel/head.S             | 19 ++++++++++---------
>  3 files changed, 58 insertions(+), 12 deletions(-)
>  create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h
>
> diff --git a/arch/riscv/include/asm/cpu_ops_sbi.h b/arch/riscv/include/asm/cpu_ops_sbi.h
> new file mode 100644
> index 000000000000..ccb9a6d30486
> --- /dev/null
> +++ b/arch/riscv/include/asm/cpu_ops_sbi.h
> @@ -0,0 +1,28 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2021 by Rivos Inc.
> + */
> +#ifndef __ASM_CPU_OPS_SBI_H
> +#define __ASM_CPU_OPS_SBI_H
> +
> +#ifndef __ASSEMBLY__
> +#include <linux/init.h>
> +#include <linux/sched.h>
> +#include <linux/threads.h>
> +
> +/**
> + * struct sbi_hart_boot_data - Hart specific boot used during booting and
> + *                            cpu hotplug.
> + * @task_ptr: A pointer to the hart specific tp
> + * @stack_ptr: A pointer to the hart specific sp
> + */
> +struct sbi_hart_boot_data {
> +       void *task_ptr;
> +       void *stack_ptr;
> +};
> +#endif
> +
> +#define SBI_HART_BOOT_TASK_PTR_OFFSET (0x00)
> +#define SBI_HART_BOOT_STACK_PTR_OFFSET RISCV_SZPTR

Don't manually create these defines instead generate this
defines at compile time by adding entries in kernel/asm-offsets.c

> +
> +#endif /* ifndef __ASM_CPU_OPS_H */
> diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
> index 685fae72b7f5..2e7a9dd9c2a7 100644
> --- a/arch/riscv/kernel/cpu_ops_sbi.c
> +++ b/arch/riscv/kernel/cpu_ops_sbi.c
> @@ -7,13 +7,22 @@
>
>  #include <linux/init.h>
>  #include <linux/mm.h>
> +#include <linux/sched/task_stack.h>
>  #include <asm/cpu_ops.h>
> +#include <asm/cpu_ops_sbi.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
>
>  extern char secondary_start_sbi[];
>  const struct cpu_operations cpu_ops_sbi;
>
> +/*
> + * Ordered booting via HSM brings one cpu at a time. However, cpu hotplug can
> + * be invoked from multiple threads in paralle. Define a per cpu data
> + * to handle that.
> + */
> +DEFINE_PER_CPU(struct sbi_hart_boot_data, boot_data);
> +
>  static int sbi_hsm_hart_start(unsigned long hartid, unsigned long saddr,
>                               unsigned long priv)
>  {
> @@ -58,9 +67,17 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
>         int rc;
>         unsigned long boot_addr = __pa_symbol(secondary_start_sbi);
>         int hartid = cpuid_to_hartid_map(cpuid);
> -
> -       cpu_update_secondary_bootdata(cpuid, tidle);
> -       rc = sbi_hsm_hart_start(hartid, boot_addr, 0);
> +       unsigned long hsm_data;
> +       struct sbi_hart_boot_data *bdata = &per_cpu(boot_data, cpuid);
> +
> +       /* Make sure tidle is updated */
> +       smp_mb();
> +       bdata->task_ptr = tidle;
> +       bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
> +       /* Make sure boot data is updated */
> +       smp_mb();
> +       hsm_data = __pa(bdata);
> +       rc = sbi_hsm_hart_start(hartid, boot_addr, hsm_data);
>
>         return rc;
>  }
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index f52f01ecbeea..40d4c625513c 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -11,6 +11,7 @@
>  #include <asm/page.h>
>  #include <asm/pgtable.h>
>  #include <asm/csr.h>
> +#include <asm/cpu_ops_sbi.h>
>  #include <asm/hwcap.h>
>  #include <asm/image.h>
>  #include "efi-header.S"
> @@ -167,15 +168,15 @@ secondary_start_sbi:
>         la a3, .Lsecondary_park
>         csrw CSR_TVEC, a3
>
> -       slli a3, a0, LGREG
> -       la a4, __cpu_up_stack_pointer
> -       XIP_FIXUP_OFFSET a4
> -       la a5, __cpu_up_task_pointer
> -       XIP_FIXUP_OFFSET a5
> -       add a4, a3, a4
> -       add a5, a3, a5
> -       REG_L sp, (a4)
> -       REG_L tp, (a5)
> +       /* a0 contains the hartid & a1 contains boot data */
> +       li a2, SBI_HART_BOOT_TASK_PTR_OFFSET
> +       XIP_FIXUP_OFFSET a2
> +       add a2, a2, a1
> +       REG_L tp, (a2)
> +       li a3, SBI_HART_BOOT_STACK_PTR_OFFSET
> +       XIP_FIXUP_OFFSET a3
> +       add a3, a3, a1
> +       REG_L sp, (a3)
>
>         .global secondary_start_common
>  secondary_start_common:
> --
> 2.33.1
>

Regards,
Anup

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print
  2021-12-04  0:20 ` [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print Atish Patra
@ 2021-12-13 12:49   ` Anup Patel
  0 siblings, 0 replies; 19+ messages in thread
From: Anup Patel @ 2021-12-13 12:49 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Sat, Dec 4, 2021 at 5:50 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> From: Atish Patra <atishp@rivosinc.com>
>
> The HSM extension information log also prints the SBI version v0.2. This
> is misleading as the underlying firmware SBI version may be different
> from v0.2.
>
> Remove the unncessary printing of SBI version.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

Looks good to me.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/riscv/kernel/cpu_ops.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> index 1985884fe829..3f5a38b03044 100644
> --- a/arch/riscv/kernel/cpu_ops.c
> +++ b/arch/riscv/kernel/cpu_ops.c
> @@ -38,7 +38,7 @@ void __init cpu_set_ops(int cpuid)
>  #if IS_ENABLED(CONFIG_RISCV_SBI)
>         if (sbi_probe_extension(SBI_EXT_HSM) > 0) {
>                 if (!cpuid)
> -                       pr_info("SBI v0.2 HSM extension detected\n");
> +                       pr_info("SBI HSM extension detected\n");
>                 cpu_ops[cpuid] = &cpu_ops_sbi;
>         } else
>  #endif
> --
> 2.33.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
  2021-12-04  0:20 ` [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method Atish Patra
@ 2021-12-13 12:50   ` Anup Patel
  2021-12-13 12:59   ` Marc Zyngier
  1 sibling, 0 replies; 19+ messages in thread
From: Anup Patel @ 2021-12-13 12:50 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Sat, Dec 4, 2021 at 5:51 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> From: Atish Patra <atishp@rivosinc.com>
>
> The __cpu_up_stack/task_pointer array is only used for spinwait method
> now. The per cpu array based lookup is also fragile for platforms with
> discontiguous/sparse hartids. The spinwait method is only used for
> M-mode Linux or older firmwares without SBI HSM extension. For general
> Linux systems, ordered booting method is preferred anyways to support
> cpu hotplug and kexec.
>
> Make sure that __cpu_up_stack/task_pointer is only used for spinwait
> method. Take this opportunity to rename it to
> __cpu_spinwait_stack/task_pointer to emphasize the purpose as well.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>

Looks good to me.

Reviewed-by: Anup Patel <anup@brainfault.org>

Regards,
Anup

> ---
>  arch/riscv/include/asm/cpu_ops.h     |  2 --
>  arch/riscv/kernel/cpu_ops.c          | 16 ----------------
>  arch/riscv/kernel/cpu_ops_spinwait.c | 27 ++++++++++++++++++++++++++-
>  arch/riscv/kernel/head.S             |  4 ++--
>  arch/riscv/kernel/head.h             |  4 ++--
>  5 files changed, 30 insertions(+), 23 deletions(-)
>
> diff --git a/arch/riscv/include/asm/cpu_ops.h b/arch/riscv/include/asm/cpu_ops.h
> index a8ec3c5c1bd2..134590f1b843 100644
> --- a/arch/riscv/include/asm/cpu_ops.h
> +++ b/arch/riscv/include/asm/cpu_ops.h
> @@ -40,7 +40,5 @@ struct cpu_operations {
>
>  extern const struct cpu_operations *cpu_ops[NR_CPUS];
>  void __init cpu_set_ops(int cpu);
> -void cpu_update_secondary_bootdata(unsigned int cpuid,
> -                                  struct task_struct *tidle);
>
>  #endif /* ifndef __ASM_CPU_OPS_H */
> diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> index 3f5a38b03044..c1e30f403c3b 100644
> --- a/arch/riscv/kernel/cpu_ops.c
> +++ b/arch/riscv/kernel/cpu_ops.c
> @@ -8,31 +8,15 @@
>  #include <linux/of.h>
>  #include <linux/string.h>
>  #include <linux/sched.h>
> -#include <linux/sched/task_stack.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
>
>  const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
>
> -void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> -void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> -
>  extern const struct cpu_operations cpu_ops_sbi;
>  extern const struct cpu_operations cpu_ops_spinwait;
>
> -void cpu_update_secondary_bootdata(unsigned int cpuid,
> -                                  struct task_struct *tidle)
> -{
> -       int hartid = cpuid_to_hartid_map(cpuid);
> -
> -       /* Make sure tidle is updated */
> -       smp_mb();
> -       WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> -                  task_stack_page(tidle) + THREAD_SIZE);
> -       WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> -}
> -
>  void __init cpu_set_ops(int cpuid)
>  {
>  #if IS_ENABLED(CONFIG_RISCV_SBI)
> diff --git a/arch/riscv/kernel/cpu_ops_spinwait.c b/arch/riscv/kernel/cpu_ops_spinwait.c
> index b2c957bb68c1..9f398eb94f7a 100644
> --- a/arch/riscv/kernel/cpu_ops_spinwait.c
> +++ b/arch/riscv/kernel/cpu_ops_spinwait.c
> @@ -6,11 +6,36 @@
>  #include <linux/errno.h>
>  #include <linux/of.h>
>  #include <linux/string.h>
> +#include <linux/sched/task_stack.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
>
>  const struct cpu_operations cpu_ops_spinwait;
> +void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
> +void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
> +
> +static void cpu_update_secondary_bootdata(unsigned int cpuid,
> +                                  struct task_struct *tidle)
> +{
> +       int hartid = cpuid_to_hartid_map(cpuid);
> +
> +       /*
> +        * The hartid must be less than NR_CPUS to avoid out-of-bound access
> +        * errors for __cpu_spinwait_stack/task_pointer. That is not always possible
> +        * for platforms with discontiguous hartid numbering scheme. That's why
> +        * spinwait booting is not the recommended approach for any platforms
> +        * and will be removed in future.
> +        */
> +       if (hartid == INVALID_HARTID || hartid >= NR_CPUS)
> +               return;
> +
> +       /* Make sure tidle is updated */
> +       smp_mb();
> +       WRITE_ONCE(__cpu_spinwait_stack_pointer[hartid],
> +                  task_stack_page(tidle) + THREAD_SIZE);
> +       WRITE_ONCE(__cpu_spinwait_task_pointer[hartid], tidle);
> +}
>
>  static int spinwait_cpu_prepare(unsigned int cpuid)
>  {
> @@ -28,7 +53,7 @@ static int spinwait_cpu_start(unsigned int cpuid, struct task_struct *tidle)
>          * selects the first cpu to boot the kernel and causes the remainder
>          * of the cpus to spin in a loop waiting for their stack pointer to be
>          * setup by that main cpu.  Writing to bootdata
> -        * (i.e __cpu_up_stack_pointer) signals to the spinning cpus that they
> +        * (i.e __cpu_spinwait_stack_pointer) signals to the spinning cpus that they
>          * can continue the boot process.
>          */
>         cpu_update_secondary_bootdata(cpuid, tidle);
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 40d4c625513c..6f8e99eac6a1 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -347,9 +347,9 @@ clear_bss_done:
>         csrw CSR_TVEC, a3
>
>         slli a3, a0, LGREG
> -       la a1, __cpu_up_stack_pointer
> +       la a1, __cpu_spinwait_stack_pointer
>         XIP_FIXUP_OFFSET a1
> -       la a2, __cpu_up_task_pointer
> +       la a2, __cpu_spinwait_task_pointer
>         XIP_FIXUP_OFFSET a2
>         add a1, a3, a1
>         add a2, a3, a2
> diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
> index aabbc3ac3e48..5393cca77790 100644
> --- a/arch/riscv/kernel/head.h
> +++ b/arch/riscv/kernel/head.h
> @@ -16,7 +16,7 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa);
>  asmlinkage void __init __copy_data(void);
>  #endif
>
> -extern void *__cpu_up_stack_pointer[];
> -extern void *__cpu_up_task_pointer[];
> +extern void *__cpu_spinwait_stack_pointer[];
> +extern void *__cpu_spinwait_task_pointer[];
>
>  #endif /* __ASM_HEAD_H */
> --
> 2.33.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
  2021-12-04  0:20 ` [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method Atish Patra
  2021-12-13 12:50   ` Anup Patel
@ 2021-12-13 12:59   ` Marc Zyngier
  2021-12-13 21:12     ` Atish Patra
  1 sibling, 1 reply; 19+ messages in thread
From: Marc Zyngier @ 2021-12-13 12:59 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel, Atish Patra, Alexandre Ghiti, Anup Patel,
	Greentime Hu, Guo Ren, Heinrich Schuchardt, Ingo Molnar,
	Jisheng Zhang, kvm-riscv, kvm, linux-riscv, Nanyong Sun,
	Nick Kossifidis, Palmer Dabbelt, Paul Walmsley, Pekka Enberg,
	Vincent Chen, Vitaly Wool

On 2021-12-04 00:20, Atish Patra wrote:
> From: Atish Patra <atishp@rivosinc.com>
> 
> The __cpu_up_stack/task_pointer array is only used for spinwait method
> now. The per cpu array based lookup is also fragile for platforms with
> discontiguous/sparse hartids. The spinwait method is only used for
> M-mode Linux or older firmwares without SBI HSM extension. For general
> Linux systems, ordered booting method is preferred anyways to support
> cpu hotplug and kexec.
> 
> Make sure that __cpu_up_stack/task_pointer is only used for spinwait
> method. Take this opportunity to rename it to
> __cpu_spinwait_stack/task_pointer to emphasize the purpose as well.
> 
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/include/asm/cpu_ops.h     |  2 --
>  arch/riscv/kernel/cpu_ops.c          | 16 ----------------
>  arch/riscv/kernel/cpu_ops_spinwait.c | 27 ++++++++++++++++++++++++++-
>  arch/riscv/kernel/head.S             |  4 ++--
>  arch/riscv/kernel/head.h             |  4 ++--
>  5 files changed, 30 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/riscv/include/asm/cpu_ops.h 
> b/arch/riscv/include/asm/cpu_ops.h
> index a8ec3c5c1bd2..134590f1b843 100644
> --- a/arch/riscv/include/asm/cpu_ops.h
> +++ b/arch/riscv/include/asm/cpu_ops.h
> @@ -40,7 +40,5 @@ struct cpu_operations {
> 
>  extern const struct cpu_operations *cpu_ops[NR_CPUS];
>  void __init cpu_set_ops(int cpu);
> -void cpu_update_secondary_bootdata(unsigned int cpuid,
> -				   struct task_struct *tidle);
> 
>  #endif /* ifndef __ASM_CPU_OPS_H */
> diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> index 3f5a38b03044..c1e30f403c3b 100644
> --- a/arch/riscv/kernel/cpu_ops.c
> +++ b/arch/riscv/kernel/cpu_ops.c
> @@ -8,31 +8,15 @@
>  #include <linux/of.h>
>  #include <linux/string.h>
>  #include <linux/sched.h>
> -#include <linux/sched/task_stack.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
> 
>  const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
> 
> -void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> -void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> -
>  extern const struct cpu_operations cpu_ops_sbi;
>  extern const struct cpu_operations cpu_ops_spinwait;
> 
> -void cpu_update_secondary_bootdata(unsigned int cpuid,
> -				   struct task_struct *tidle)
> -{
> -	int hartid = cpuid_to_hartid_map(cpuid);
> -
> -	/* Make sure tidle is updated */
> -	smp_mb();
> -	WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> -		   task_stack_page(tidle) + THREAD_SIZE);
> -	WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> -}
> -
>  void __init cpu_set_ops(int cpuid)
>  {
>  #if IS_ENABLED(CONFIG_RISCV_SBI)
> diff --git a/arch/riscv/kernel/cpu_ops_spinwait.c
> b/arch/riscv/kernel/cpu_ops_spinwait.c
> index b2c957bb68c1..9f398eb94f7a 100644
> --- a/arch/riscv/kernel/cpu_ops_spinwait.c
> +++ b/arch/riscv/kernel/cpu_ops_spinwait.c
> @@ -6,11 +6,36 @@
>  #include <linux/errno.h>
>  #include <linux/of.h>
>  #include <linux/string.h>
> +#include <linux/sched/task_stack.h>
>  #include <asm/cpu_ops.h>
>  #include <asm/sbi.h>
>  #include <asm/smp.h>
> 
>  const struct cpu_operations cpu_ops_spinwait;
> +void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
> +void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
> +
> +static void cpu_update_secondary_bootdata(unsigned int cpuid,
> +				   struct task_struct *tidle)
> +{
> +	int hartid = cpuid_to_hartid_map(cpuid);
> +
> +	/*
> +	 * The hartid must be less than NR_CPUS to avoid out-of-bound access
> +	 * errors for __cpu_spinwait_stack/task_pointer. That is not always 
> possible
> +	 * for platforms with discontiguous hartid numbering scheme. That's 
> why
> +	 * spinwait booting is not the recommended approach for any platforms
> +	 * and will be removed in future.

How can you do that? Yes, spinning schemes are terrible.
However, once you started supporting them, you are stuck.

Best case, you can have an allow-list and only allow some
older platforms to use them. You can also make some features
dependent on non-spin schemes (kexec being one).

But dropping support isn't a valid option, I'm afraid.

Thanks,

          M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 5/6] RISC-V: Move spinwait booting method to its own config
  2021-12-04  0:20 ` [RFC 5/6] RISC-V: Move spinwait booting method to its own config Atish Patra
  2021-12-04  0:40   ` Randy Dunlap
@ 2021-12-13 13:01   ` Anup Patel
  2021-12-13 21:08     ` Atish Patra
  1 sibling, 1 reply; 19+ messages in thread
From: Anup Patel @ 2021-12-13 13:01 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Sat, Dec 4, 2021 at 5:51 AM Atish Patra <atishp@atishpatra.org> wrote:
>
> From: Atish Patra <atishp@rivosinc.com>
>
> The spinwait booting method should only be used for platforms with older
> firmware without SBI HSM extension or M-mode firmware because spinwait
> method can't support cpu hotplug, kexec or sparse hartid. It is better
> to move the entire spinwait implementation to its own config which can
> be disabled if required. It is enabled by default to maintain backward
> compatibility and M-mode Linux.
>
> Signed-off-by: Atish Patra <atishp@rivosinc.com>
> ---
>  arch/riscv/Kconfig          | 14 ++++++++++++++
>  arch/riscv/kernel/Makefile  |  3 ++-
>  arch/riscv/kernel/cpu_ops.c |  8 ++++++++
>  arch/riscv/kernel/head.S    |  6 +++---
>  arch/riscv/kernel/head.h    |  2 ++
>  5 files changed, 29 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 821252b65f89..4afb42d5707d 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -403,6 +403,20 @@ config RISCV_SBI_V01
>           This config allows kernel to use SBI v0.1 APIs. This will be
>           deprecated in future once legacy M-mode software are no longer in use.
>
> +config RISCV_BOOT_SPINWAIT
> +       bool "Spinwait booting method"
> +       depends on SMP
> +       default y
> +       help
> +         This enables support for booting Linux via spinwait method. In the
> +         spinwait method, all cores randomly jump to Linux. One of the core
> +         gets chosen via lottery and all other keeps spinning on a percpu
> +         variable. This method can not support cpu hotplug and sparse hartid
> +         scheme. It should be only enabled for M-mode Linux or platforms relying
> +         on older firmware without SBI HSM extension. All other platform should
> +         rely on ordered booing via SBI HSM extension which gets chosen
> +          dynamically at runtime if the firmware supports it.
> +
>  config KEXEC
>         bool "Kexec system call"
>         select KEXEC_CORE
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 3397ddac1a30..612556faa527 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -43,7 +43,8 @@ obj-$(CONFIG_FPU)             += fpu.o
>  obj-$(CONFIG_SMP)              += smpboot.o
>  obj-$(CONFIG_SMP)              += smp.o
>  obj-$(CONFIG_SMP)              += cpu_ops.o
> -obj-$(CONFIG_SMP)              += cpu_ops_spinwait.o
> +
> +obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
>  obj-$(CONFIG_MODULES)          += module.o
>  obj-$(CONFIG_MODULE_SECTIONS)  += module-sections.o
>
> diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> index c1e30f403c3b..170d07e57721 100644
> --- a/arch/riscv/kernel/cpu_ops.c
> +++ b/arch/riscv/kernel/cpu_ops.c
> @@ -15,7 +15,15 @@
>  const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
>
>  extern const struct cpu_operations cpu_ops_sbi;
> +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
>  extern const struct cpu_operations cpu_ops_spinwait;
> +#else
> +const struct cpu_operations cpu_ops_spinwait = {
> +       .name           = "",
> +       .cpu_prepare    = NULL,
> +       .cpu_start      = NULL,
> +};
> +#endif
>
>  void __init cpu_set_ops(int cpuid)
>  {
> diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> index 9f16bfe9307e..4a694e15b95b 100644
> --- a/arch/riscv/kernel/head.S
> +++ b/arch/riscv/kernel/head.S
> @@ -259,7 +259,7 @@ pmp_done:
>         li t0, SR_FS
>         csrc CSR_STATUS, t0
>
> -#ifdef CONFIG_SMP
> +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
>         li t0, CONFIG_NR_CPUS
>         blt a0, t0, .Lgood_cores
>         tail .Lsecondary_park
> @@ -285,7 +285,7 @@ pmp_done:
>         beq t0, t1, .Lsecondary_start
>
>  #endif /* CONFIG_XIP */
> -#endif /* CONFIG_SMP */
> +#endif /* CONFIG_RISCV_BOOT_SPINWAIT */
>
>  #ifdef CONFIG_XIP_KERNEL
>         la sp, _end + THREAD_SIZE
> @@ -344,7 +344,7 @@ clear_bss_done:
>         call soc_early_init
>         tail start_kernel
>
> -#ifdef CONFIG_SMP
> +#if defined(CONFIG_SMP) && defined(CONFIG_RISCV_BOOT_SPINWAIT)

The RISCV_BOOT_SPINWAIT option already depends on SMP.

Do you still need to check defined(CONFIG_SMP) here ?

Regards,
Anup

>  .Lsecondary_start:
>         /* Set trap vector to spin forever to help debug */
>         la a3, .Lsecondary_park
> diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
> index 5393cca77790..726731ada534 100644
> --- a/arch/riscv/kernel/head.h
> +++ b/arch/riscv/kernel/head.h
> @@ -16,7 +16,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa);
>  asmlinkage void __init __copy_data(void);
>  #endif
>
> +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
>  extern void *__cpu_spinwait_stack_pointer[];
>  extern void *__cpu_spinwait_task_pointer[];
> +#endif
>
>  #endif /* __ASM_HEAD_H */
> --
> 2.33.1
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting
  2021-12-13 12:48   ` Anup Patel
@ 2021-12-13 21:05     ` Atish Patra
  0 siblings, 0 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-13 21:05 UTC (permalink / raw)
  To: Anup Patel
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 13, 2021 at 4:49 AM Anup Patel <anup@brainfault.org> wrote:
>
> On Sat, Dec 4, 2021 at 5:50 AM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > From: Atish Patra <atishp@rivosinc.com>
> >
> > Currently both order booting and spinwait approach uses a per cpu
> > array to update stack & task pointer. This approach will not work for the
> > following cases.
> > 1. If NR_CPUs are configured to be less than highest hart id.
> > 2. A platform has sparse hartid.
> >
> > This issue can be fixed for ordered booting as the booting cpu brings up
> > one cpu at a time using SBI HSM extension which has opaque parameter
> > that is unused until now.
> >
> > Introduce a common secondary boot data structure that can store the stack
> > and task pointer. Secondary harts will use this data while booting up
> > to setup the sp & tp.
> >
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/cpu_ops_sbi.h | 28 ++++++++++++++++++++++++++++
> >  arch/riscv/kernel/cpu_ops_sbi.c      | 23 ++++++++++++++++++++---
> >  arch/riscv/kernel/head.S             | 19 ++++++++++---------
> >  3 files changed, 58 insertions(+), 12 deletions(-)
> >  create mode 100644 arch/riscv/include/asm/cpu_ops_sbi.h
> >
> > diff --git a/arch/riscv/include/asm/cpu_ops_sbi.h b/arch/riscv/include/asm/cpu_ops_sbi.h
> > new file mode 100644
> > index 000000000000..ccb9a6d30486
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/cpu_ops_sbi.h
> > @@ -0,0 +1,28 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (c) 2021 by Rivos Inc.
> > + */
> > +#ifndef __ASM_CPU_OPS_SBI_H
> > +#define __ASM_CPU_OPS_SBI_H
> > +
> > +#ifndef __ASSEMBLY__
> > +#include <linux/init.h>
> > +#include <linux/sched.h>
> > +#include <linux/threads.h>
> > +
> > +/**
> > + * struct sbi_hart_boot_data - Hart specific boot used during booting and
> > + *                            cpu hotplug.
> > + * @task_ptr: A pointer to the hart specific tp
> > + * @stack_ptr: A pointer to the hart specific sp
> > + */
> > +struct sbi_hart_boot_data {
> > +       void *task_ptr;
> > +       void *stack_ptr;
> > +};
> > +#endif
> > +
> > +#define SBI_HART_BOOT_TASK_PTR_OFFSET (0x00)
> > +#define SBI_HART_BOOT_STACK_PTR_OFFSET RISCV_SZPTR
>
> Don't manually create these defines instead generate this
> defines at compile time by adding entries in kernel/asm-offsets.c
>

Sure. I will fix this in the next version.

> > +
> > +#endif /* ifndef __ASM_CPU_OPS_H */
> > diff --git a/arch/riscv/kernel/cpu_ops_sbi.c b/arch/riscv/kernel/cpu_ops_sbi.c
> > index 685fae72b7f5..2e7a9dd9c2a7 100644
> > --- a/arch/riscv/kernel/cpu_ops_sbi.c
> > +++ b/arch/riscv/kernel/cpu_ops_sbi.c
> > @@ -7,13 +7,22 @@
> >
> >  #include <linux/init.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched/task_stack.h>
> >  #include <asm/cpu_ops.h>
> > +#include <asm/cpu_ops_sbi.h>
> >  #include <asm/sbi.h>
> >  #include <asm/smp.h>
> >
> >  extern char secondary_start_sbi[];
> >  const struct cpu_operations cpu_ops_sbi;
> >
> > +/*
> > + * Ordered booting via HSM brings one cpu at a time. However, cpu hotplug can
> > + * be invoked from multiple threads in paralle. Define a per cpu data
> > + * to handle that.
> > + */
> > +DEFINE_PER_CPU(struct sbi_hart_boot_data, boot_data);
> > +
> >  static int sbi_hsm_hart_start(unsigned long hartid, unsigned long saddr,
> >                               unsigned long priv)
> >  {
> > @@ -58,9 +67,17 @@ static int sbi_cpu_start(unsigned int cpuid, struct task_struct *tidle)
> >         int rc;
> >         unsigned long boot_addr = __pa_symbol(secondary_start_sbi);
> >         int hartid = cpuid_to_hartid_map(cpuid);
> > -
> > -       cpu_update_secondary_bootdata(cpuid, tidle);
> > -       rc = sbi_hsm_hart_start(hartid, boot_addr, 0);
> > +       unsigned long hsm_data;
> > +       struct sbi_hart_boot_data *bdata = &per_cpu(boot_data, cpuid);
> > +
> > +       /* Make sure tidle is updated */
> > +       smp_mb();
> > +       bdata->task_ptr = tidle;
> > +       bdata->stack_ptr = task_stack_page(tidle) + THREAD_SIZE;
> > +       /* Make sure boot data is updated */
> > +       smp_mb();
> > +       hsm_data = __pa(bdata);
> > +       rc = sbi_hsm_hart_start(hartid, boot_addr, hsm_data);
> >
> >         return rc;
> >  }
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index f52f01ecbeea..40d4c625513c 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -11,6 +11,7 @@
> >  #include <asm/page.h>
> >  #include <asm/pgtable.h>
> >  #include <asm/csr.h>
> > +#include <asm/cpu_ops_sbi.h>
> >  #include <asm/hwcap.h>
> >  #include <asm/image.h>
> >  #include "efi-header.S"
> > @@ -167,15 +168,15 @@ secondary_start_sbi:
> >         la a3, .Lsecondary_park
> >         csrw CSR_TVEC, a3
> >
> > -       slli a3, a0, LGREG
> > -       la a4, __cpu_up_stack_pointer
> > -       XIP_FIXUP_OFFSET a4
> > -       la a5, __cpu_up_task_pointer
> > -       XIP_FIXUP_OFFSET a5
> > -       add a4, a3, a4
> > -       add a5, a3, a5
> > -       REG_L sp, (a4)
> > -       REG_L tp, (a5)
> > +       /* a0 contains the hartid & a1 contains boot data */
> > +       li a2, SBI_HART_BOOT_TASK_PTR_OFFSET
> > +       XIP_FIXUP_OFFSET a2
> > +       add a2, a2, a1
> > +       REG_L tp, (a2)
> > +       li a3, SBI_HART_BOOT_STACK_PTR_OFFSET
> > +       XIP_FIXUP_OFFSET a3
> > +       add a3, a3, a1
> > +       REG_L sp, (a3)
> >
> >         .global secondary_start_common
> >  secondary_start_common:
> > --
> > 2.33.1
> >
>
> Regards,
> Anup



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 5/6] RISC-V: Move spinwait booting method to its own config
  2021-12-13 13:01   ` Anup Patel
@ 2021-12-13 21:08     ` Atish Patra
  0 siblings, 0 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-13 21:08 UTC (permalink / raw)
  To: Anup Patel
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Marc Zyngier, Nanyong Sun, Nick Kossifidis, Palmer Dabbelt,
	Paul Walmsley, Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 13, 2021 at 5:02 AM Anup Patel <anup@brainfault.org> wrote:
>
> On Sat, Dec 4, 2021 at 5:51 AM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > From: Atish Patra <atishp@rivosinc.com>
> >
> > The spinwait booting method should only be used for platforms with older
> > firmware without SBI HSM extension or M-mode firmware because spinwait
> > method can't support cpu hotplug, kexec or sparse hartid. It is better
> > to move the entire spinwait implementation to its own config which can
> > be disabled if required. It is enabled by default to maintain backward
> > compatibility and M-mode Linux.
> >
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/Kconfig          | 14 ++++++++++++++
> >  arch/riscv/kernel/Makefile  |  3 ++-
> >  arch/riscv/kernel/cpu_ops.c |  8 ++++++++
> >  arch/riscv/kernel/head.S    |  6 +++---
> >  arch/riscv/kernel/head.h    |  2 ++
> >  5 files changed, 29 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 821252b65f89..4afb42d5707d 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -403,6 +403,20 @@ config RISCV_SBI_V01
> >           This config allows kernel to use SBI v0.1 APIs. This will be
> >           deprecated in future once legacy M-mode software are no longer in use.
> >
> > +config RISCV_BOOT_SPINWAIT
> > +       bool "Spinwait booting method"
> > +       depends on SMP
> > +       default y
> > +       help
> > +         This enables support for booting Linux via spinwait method. In the
> > +         spinwait method, all cores randomly jump to Linux. One of the core
> > +         gets chosen via lottery and all other keeps spinning on a percpu
> > +         variable. This method can not support cpu hotplug and sparse hartid
> > +         scheme. It should be only enabled for M-mode Linux or platforms relying
> > +         on older firmware without SBI HSM extension. All other platform should
> > +         rely on ordered booing via SBI HSM extension which gets chosen
> > +          dynamically at runtime if the firmware supports it.
> > +
> >  config KEXEC
> >         bool "Kexec system call"
> >         select KEXEC_CORE
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 3397ddac1a30..612556faa527 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -43,7 +43,8 @@ obj-$(CONFIG_FPU)             += fpu.o
> >  obj-$(CONFIG_SMP)              += smpboot.o
> >  obj-$(CONFIG_SMP)              += smp.o
> >  obj-$(CONFIG_SMP)              += cpu_ops.o
> > -obj-$(CONFIG_SMP)              += cpu_ops_spinwait.o
> > +
> > +obj-$(CONFIG_RISCV_BOOT_SPINWAIT) += cpu_ops_spinwait.o
> >  obj-$(CONFIG_MODULES)          += module.o
> >  obj-$(CONFIG_MODULE_SECTIONS)  += module-sections.o
> >
> > diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> > index c1e30f403c3b..170d07e57721 100644
> > --- a/arch/riscv/kernel/cpu_ops.c
> > +++ b/arch/riscv/kernel/cpu_ops.c
> > @@ -15,7 +15,15 @@
> >  const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
> >
> >  extern const struct cpu_operations cpu_ops_sbi;
> > +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
> >  extern const struct cpu_operations cpu_ops_spinwait;
> > +#else
> > +const struct cpu_operations cpu_ops_spinwait = {
> > +       .name           = "",
> > +       .cpu_prepare    = NULL,
> > +       .cpu_start      = NULL,
> > +};
> > +#endif
> >
> >  void __init cpu_set_ops(int cpuid)
> >  {
> > diff --git a/arch/riscv/kernel/head.S b/arch/riscv/kernel/head.S
> > index 9f16bfe9307e..4a694e15b95b 100644
> > --- a/arch/riscv/kernel/head.S
> > +++ b/arch/riscv/kernel/head.S
> > @@ -259,7 +259,7 @@ pmp_done:
> >         li t0, SR_FS
> >         csrc CSR_STATUS, t0
> >
> > -#ifdef CONFIG_SMP
> > +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
> >         li t0, CONFIG_NR_CPUS
> >         blt a0, t0, .Lgood_cores
> >         tail .Lsecondary_park
> > @@ -285,7 +285,7 @@ pmp_done:
> >         beq t0, t1, .Lsecondary_start
> >
> >  #endif /* CONFIG_XIP */
> > -#endif /* CONFIG_SMP */
> > +#endif /* CONFIG_RISCV_BOOT_SPINWAIT */
> >
> >  #ifdef CONFIG_XIP_KERNEL
> >         la sp, _end + THREAD_SIZE
> > @@ -344,7 +344,7 @@ clear_bss_done:
> >         call soc_early_init
> >         tail start_kernel
> >
> > -#ifdef CONFIG_SMP
> > +#if defined(CONFIG_SMP) && defined(CONFIG_RISCV_BOOT_SPINWAIT)
>
> The RISCV_BOOT_SPINWAIT option already depends on SMP.
>
> Do you still need to check defined(CONFIG_SMP) here ?
>

Nope. I guess this one slipped through the cracks. All other related
#ifdef have only CONFIG_RISCV_BOOT_SPINWAIT
I will fix it in v2.

> Regards,
> Anup
>
> >  .Lsecondary_start:
> >         /* Set trap vector to spin forever to help debug */
> >         la a3, .Lsecondary_park
> > diff --git a/arch/riscv/kernel/head.h b/arch/riscv/kernel/head.h
> > index 5393cca77790..726731ada534 100644
> > --- a/arch/riscv/kernel/head.h
> > +++ b/arch/riscv/kernel/head.h
> > @@ -16,7 +16,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa);
> >  asmlinkage void __init __copy_data(void);
> >  #endif
> >
> > +#ifdef CONFIG_RISCV_BOOT_SPINWAIT
> >  extern void *__cpu_spinwait_stack_pointer[];
> >  extern void *__cpu_spinwait_task_pointer[];
> > +#endif
> >
> >  #endif /* __ASM_HEAD_H */
> > --
> > 2.33.1
> >



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method
  2021-12-13 12:59   ` Marc Zyngier
@ 2021-12-13 21:12     ` Atish Patra
  0 siblings, 0 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-13 21:12 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-kernel@vger.kernel.org List, Atish Patra, Alexandre Ghiti,
	Anup Patel, Greentime Hu, Guo Ren, Heinrich Schuchardt,
	Ingo Molnar, Jisheng Zhang, kvm-riscv, KVM General, linux-riscv,
	Nanyong Sun, Nick Kossifidis, Palmer Dabbelt, Paul Walmsley,
	Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 13, 2021 at 4:59 AM Marc Zyngier <maz@kernel.org> wrote:
>
> On 2021-12-04 00:20, Atish Patra wrote:
> > From: Atish Patra <atishp@rivosinc.com>
> >
> > The __cpu_up_stack/task_pointer array is only used for spinwait method
> > now. The per cpu array based lookup is also fragile for platforms with
> > discontiguous/sparse hartids. The spinwait method is only used for
> > M-mode Linux or older firmwares without SBI HSM extension. For general
> > Linux systems, ordered booting method is preferred anyways to support
> > cpu hotplug and kexec.
> >
> > Make sure that __cpu_up_stack/task_pointer is only used for spinwait
> > method. Take this opportunity to rename it to
> > __cpu_spinwait_stack/task_pointer to emphasize the purpose as well.
> >
> > Signed-off-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  arch/riscv/include/asm/cpu_ops.h     |  2 --
> >  arch/riscv/kernel/cpu_ops.c          | 16 ----------------
> >  arch/riscv/kernel/cpu_ops_spinwait.c | 27 ++++++++++++++++++++++++++-
> >  arch/riscv/kernel/head.S             |  4 ++--
> >  arch/riscv/kernel/head.h             |  4 ++--
> >  5 files changed, 30 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/cpu_ops.h
> > b/arch/riscv/include/asm/cpu_ops.h
> > index a8ec3c5c1bd2..134590f1b843 100644
> > --- a/arch/riscv/include/asm/cpu_ops.h
> > +++ b/arch/riscv/include/asm/cpu_ops.h
> > @@ -40,7 +40,5 @@ struct cpu_operations {
> >
> >  extern const struct cpu_operations *cpu_ops[NR_CPUS];
> >  void __init cpu_set_ops(int cpu);
> > -void cpu_update_secondary_bootdata(unsigned int cpuid,
> > -                                struct task_struct *tidle);
> >
> >  #endif /* ifndef __ASM_CPU_OPS_H */
> > diff --git a/arch/riscv/kernel/cpu_ops.c b/arch/riscv/kernel/cpu_ops.c
> > index 3f5a38b03044..c1e30f403c3b 100644
> > --- a/arch/riscv/kernel/cpu_ops.c
> > +++ b/arch/riscv/kernel/cpu_ops.c
> > @@ -8,31 +8,15 @@
> >  #include <linux/of.h>
> >  #include <linux/string.h>
> >  #include <linux/sched.h>
> > -#include <linux/sched/task_stack.h>
> >  #include <asm/cpu_ops.h>
> >  #include <asm/sbi.h>
> >  #include <asm/smp.h>
> >
> >  const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
> >
> > -void *__cpu_up_stack_pointer[NR_CPUS] __section(".data");
> > -void *__cpu_up_task_pointer[NR_CPUS] __section(".data");
> > -
> >  extern const struct cpu_operations cpu_ops_sbi;
> >  extern const struct cpu_operations cpu_ops_spinwait;
> >
> > -void cpu_update_secondary_bootdata(unsigned int cpuid,
> > -                                struct task_struct *tidle)
> > -{
> > -     int hartid = cpuid_to_hartid_map(cpuid);
> > -
> > -     /* Make sure tidle is updated */
> > -     smp_mb();
> > -     WRITE_ONCE(__cpu_up_stack_pointer[hartid],
> > -                task_stack_page(tidle) + THREAD_SIZE);
> > -     WRITE_ONCE(__cpu_up_task_pointer[hartid], tidle);
> > -}
> > -
> >  void __init cpu_set_ops(int cpuid)
> >  {
> >  #if IS_ENABLED(CONFIG_RISCV_SBI)
> > diff --git a/arch/riscv/kernel/cpu_ops_spinwait.c
> > b/arch/riscv/kernel/cpu_ops_spinwait.c
> > index b2c957bb68c1..9f398eb94f7a 100644
> > --- a/arch/riscv/kernel/cpu_ops_spinwait.c
> > +++ b/arch/riscv/kernel/cpu_ops_spinwait.c
> > @@ -6,11 +6,36 @@
> >  #include <linux/errno.h>
> >  #include <linux/of.h>
> >  #include <linux/string.h>
> > +#include <linux/sched/task_stack.h>
> >  #include <asm/cpu_ops.h>
> >  #include <asm/sbi.h>
> >  #include <asm/smp.h>
> >
> >  const struct cpu_operations cpu_ops_spinwait;
> > +void *__cpu_spinwait_stack_pointer[NR_CPUS] __section(".data");
> > +void *__cpu_spinwait_task_pointer[NR_CPUS] __section(".data");
> > +
> > +static void cpu_update_secondary_bootdata(unsigned int cpuid,
> > +                                struct task_struct *tidle)
> > +{
> > +     int hartid = cpuid_to_hartid_map(cpuid);
> > +
> > +     /*
> > +      * The hartid must be less than NR_CPUS to avoid out-of-bound access
> > +      * errors for __cpu_spinwait_stack/task_pointer. That is not always
> > possible
> > +      * for platforms with discontiguous hartid numbering scheme. That's
> > why
> > +      * spinwait booting is not the recommended approach for any platforms
> > +      * and will be removed in future.
>
> How can you do that? Yes, spinning schemes are terrible.
> However, once you started supporting them, you are stuck.
>

This was a typo. It is supposed to say "disabled" in the future based
on other configs.
For now, it is enabled by default. Any platform with sparse hartid
needs to disable it in their own config.

In the future, we may be able to make it only available to M-mode
Linux if we can ensure that nobody is running
older firmware without SBI HSM extension.

> Best case, you can have an allow-list and only allow some
> older platforms to use them. You can also make some features
> dependent on non-spin schemes (kexec being one).
>
> But dropping support isn't a valid option, I'm afraid.
>
> Thanks,
>
>           M.
> --
> Jazz is not dead. It just smells funny...



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/6] Sparse HART id support
  2021-12-06 15:28 ` [RFC 0/6] Sparse HART id support Rob Herring
@ 2021-12-13 21:27   ` Atish Patra
  2021-12-13 23:11     ` Rob Herring
  0 siblings, 1 reply; 19+ messages in thread
From: Atish Patra @ 2021-12-13 21:27 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel@vger.kernel.org List, Alexandre Ghiti, Anup Patel,
	Greentime Hu, Guo Ren, Heinrich Schuchardt, Ingo Molnar,
	Jisheng Zhang, kvm-riscv, KVM General, linux-riscv, Marc Zyngier,
	Nanyong Sun, Nick Kossifidis, Palmer Dabbelt, Paul Walmsley,
	Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 6, 2021 at 7:28 AM Rob Herring <robh@kernel.org> wrote:
>
> On Fri, Dec 03, 2021 at 04:20:32PM -0800, Atish Patra wrote:
> > Currently, sparse hartid is not supported for Linux RISC-V for the following
> > reasons.
> > 1. Both spinwait and ordered booting method uses __cpu_up_stack/task_pointer
> >    which is an array size of NR_CPUs.
> > 2. During early booting, any hartid greater than NR_CPUs are not booted at all.
> > 3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating hartid bitmap.
> > 4. SBI v0.2 implementation uses NR_CPUs as the maximum hartid number while
> >    generating hartmask.
> >
> > In order to support sparse hartid, the hartid & NR_CPUS needs to be disassociated
> > which was logically incorrect anyways. NR_CPUs represent the maximum logical|
> > CPU id configured in the kernel while the hartid represent the physical hartid
> > stored in mhartid CSR defined by the privilege specification. Thus, hartid
> > can have much greater value than logical cpuid.
>
> We already have a couple of architectures with logical to physical CPU
> id maps. See cpu_logical_map. Can we make that common and use it here?

Yes. We can move the cpu_logical_map(which is a macro) &
__cpu_logical_map(actual array with NR_CPUS size)
to common code so that all the architecture can use it instead of
defining it separately.

> That would also possibly allow for common populating the map from DT.
>

I didn't understand this part. The mapping is populated at run time
[1] as the boot cpu can be any hart in RISC-V.
That booting hart will be mapped to cpu 0. All others will be mapped
based on how the cpu node is laid out in the DT.
Do you mean we can move the 2nd part to common code as well ?

[1] RISC-V: https://elixir.bootlin.com/linux/v5.16-rc5/source/arch/riscv/kernel/smpboot.c#L102

> Rob



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/6] Sparse HART id support
  2021-12-13 21:27   ` Atish Patra
@ 2021-12-13 23:11     ` Rob Herring
  2021-12-14  0:58       ` Atish Patra
  0 siblings, 1 reply; 19+ messages in thread
From: Rob Herring @ 2021-12-13 23:11 UTC (permalink / raw)
  To: Atish Patra
  Cc: linux-kernel@vger.kernel.org List, Alexandre Ghiti, Anup Patel,
	Greentime Hu, Guo Ren, Heinrich Schuchardt, Ingo Molnar,
	Jisheng Zhang, kvm-riscv, KVM General, linux-riscv, Marc Zyngier,
	Nanyong Sun, Nick Kossifidis, Palmer Dabbelt, Paul Walmsley,
	Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 13, 2021 at 3:27 PM Atish Patra <atishp@atishpatra.org> wrote:
>
> On Mon, Dec 6, 2021 at 7:28 AM Rob Herring <robh@kernel.org> wrote:
> >
> > On Fri, Dec 03, 2021 at 04:20:32PM -0800, Atish Patra wrote:
> > > Currently, sparse hartid is not supported for Linux RISC-V for the following
> > > reasons.
> > > 1. Both spinwait and ordered booting method uses __cpu_up_stack/task_pointer
> > >    which is an array size of NR_CPUs.
> > > 2. During early booting, any hartid greater than NR_CPUs are not booted at all.
> > > 3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating hartid bitmap.
> > > 4. SBI v0.2 implementation uses NR_CPUs as the maximum hartid number while
> > >    generating hartmask.
> > >
> > > In order to support sparse hartid, the hartid & NR_CPUS needs to be disassociated
> > > which was logically incorrect anyways. NR_CPUs represent the maximum logical|
> > > CPU id configured in the kernel while the hartid represent the physical hartid
> > > stored in mhartid CSR defined by the privilege specification. Thus, hartid
> > > can have much greater value than logical cpuid.
> >
> > We already have a couple of architectures with logical to physical CPU
> > id maps. See cpu_logical_map. Can we make that common and use it here?
>
> Yes. We can move the cpu_logical_map(which is a macro) &
> __cpu_logical_map(actual array with NR_CPUS size)
> to common code so that all the architecture can use it instead of
> defining it separately.

IIRC, the macro is what varies by arch and I would move to static
inlines rather than supporting:

cpu_logical_map(cpu) = 0xdeadbeef;

>
> > That would also possibly allow for common populating the map from DT.
> >
>
> I didn't understand this part. The mapping is populated at run time
> [1] as the boot cpu can be any hart in RISC-V.
> That booting hart will be mapped to cpu 0. All others will be mapped
> based on how the cpu node is laid out in the DT.
> Do you mean we can move the 2nd part to common code as well ?

Yes, as the DT platforms just loop thru the cpu nodes and fill the
logical map based on 'reg', I don't think that needs to be per arch
once we have a common map. But not asking for that now.

Rob

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC 0/6] Sparse HART id support
  2021-12-13 23:11     ` Rob Herring
@ 2021-12-14  0:58       ` Atish Patra
  0 siblings, 0 replies; 19+ messages in thread
From: Atish Patra @ 2021-12-14  0:58 UTC (permalink / raw)
  To: Rob Herring
  Cc: linux-kernel@vger.kernel.org List, Alexandre Ghiti, Anup Patel,
	Greentime Hu, Guo Ren, Heinrich Schuchardt, Ingo Molnar,
	Jisheng Zhang, kvm-riscv, KVM General, linux-riscv, Marc Zyngier,
	Nanyong Sun, Nick Kossifidis, Palmer Dabbelt, Paul Walmsley,
	Pekka Enberg, Vincent Chen, Vitaly Wool

On Mon, Dec 13, 2021 at 3:11 PM Rob Herring <robh@kernel.org> wrote:
>
> On Mon, Dec 13, 2021 at 3:27 PM Atish Patra <atishp@atishpatra.org> wrote:
> >
> > On Mon, Dec 6, 2021 at 7:28 AM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Fri, Dec 03, 2021 at 04:20:32PM -0800, Atish Patra wrote:
> > > > Currently, sparse hartid is not supported for Linux RISC-V for the following
> > > > reasons.
> > > > 1. Both spinwait and ordered booting method uses __cpu_up_stack/task_pointer
> > > >    which is an array size of NR_CPUs.
> > > > 2. During early booting, any hartid greater than NR_CPUs are not booted at all.
> > > > 3. riscv_cpuid_to_hartid_mask uses struct cpumask for generating hartid bitmap.
> > > > 4. SBI v0.2 implementation uses NR_CPUs as the maximum hartid number while
> > > >    generating hartmask.
> > > >
> > > > In order to support sparse hartid, the hartid & NR_CPUS needs to be disassociated
> > > > which was logically incorrect anyways. NR_CPUs represent the maximum logical|
> > > > CPU id configured in the kernel while the hartid represent the physical hartid
> > > > stored in mhartid CSR defined by the privilege specification. Thus, hartid
> > > > can have much greater value than logical cpuid.
> > >
> > > We already have a couple of architectures with logical to physical CPU
> > > id maps. See cpu_logical_map. Can we make that common and use it here?
> >
> > Yes. We can move the cpu_logical_map(which is a macro) &
> > __cpu_logical_map(actual array with NR_CPUS size)
> > to common code so that all the architecture can use it instead of
> > defining it separately.
>
> IIRC, the macro is what varies by arch and I would move to static
> inlines rather than supporting:
>
> cpu_logical_map(cpu) = 0xdeadbeef;
>

Sounds good.


> >
> > > That would also possibly allow for common populating the map from DT.
> > >
> >
> > I didn't understand this part. The mapping is populated at run time
> > [1] as the boot cpu can be any hart in RISC-V.
> > That booting hart will be mapped to cpu 0. All others will be mapped
> > based on how the cpu node is laid out in the DT.
> > Do you mean we can move the 2nd part to common code as well ?
>
> Yes, as the DT platforms just loop thru the cpu nodes and fill the
> logical map based on 'reg', I don't think that needs to be per arch
> once we have a common map. But not asking for that now.
>

It would make sense to keep them together in a series. I can take a stab
at it once this series is merged so that we don't end up in
conflicting changes between
two series.

> Rob



-- 
Regards,
Atish

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-12-14  0:58 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-04  0:20 [RFC 0/6] Sparse HART id support Atish Patra
2021-12-04  0:20 ` [RFC 1/6] RISC-V: Avoid using per cpu array for ordered booting Atish Patra
2021-12-13 12:48   ` Anup Patel
2021-12-13 21:05     ` Atish Patra
2021-12-04  0:20 ` [RFC 2/6] RISC-V: Do not print the SBI version during HSM extension boot print Atish Patra
2021-12-13 12:49   ` Anup Patel
2021-12-04  0:20 ` [RFC 3/6] RISC-V: Use __cpu_up_stack/task_pointer only for spinwait method Atish Patra
2021-12-13 12:50   ` Anup Patel
2021-12-13 12:59   ` Marc Zyngier
2021-12-13 21:12     ` Atish Patra
2021-12-04  0:20 ` [RFC 5/6] RISC-V: Move spinwait booting method to its own config Atish Patra
2021-12-04  0:40   ` Randy Dunlap
2021-12-13 13:01   ` Anup Patel
2021-12-13 21:08     ` Atish Patra
2021-12-04  0:20 ` [RFC 6/6] RISC-V: Do not use cpumask data structure for hartid bitmap Atish Patra
2021-12-06 15:28 ` [RFC 0/6] Sparse HART id support Rob Herring
2021-12-13 21:27   ` Atish Patra
2021-12-13 23:11     ` Rob Herring
2021-12-14  0:58       ` Atish Patra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).