All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
@ 2020-09-07 18:16 Sean Anderson
  2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
                   ` (7 more replies)
  0 siblings, 8 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

On the K210, the prior stage bootloader does not clear IPIs. This presents
a problem, because U-Boot up until this point assumes (with one exception)
that IPIs are cleared when it starts. This series attempts to fix this in a
robust manner, and fix several concurrency bugs I noticed while fixing
these other areas. Heinrich previously submitted a patch addressing part of
this problem in [1].

[1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/


Sean Anderson (7):
  Revert "riscv: Clear pending interrupts before enabling IPIs"
  riscv: Match memory barriers between send_ipi_many and handle_ipi
  riscv: Use NULL as a sentinel value for smp_call_function
  riscv: Clear pending IPIs on initialization
  riscv: Add fence to available_harts_lock
  riscv: Ensure gp is NULL or points to valid data
  riscv: Add some comments to start.S

 arch/riscv/cpu/cpu.c        | 18 ++++++++++++++
 arch/riscv/cpu/start.S      | 47 +++++++++++++++++++++++++++++--------
 arch/riscv/lib/interrupts.c |  3 ++-
 arch/riscv/lib/smp.c        | 26 +++++++++++++++++---
 4 files changed, 80 insertions(+), 14 deletions(-)

-- 
2.28.0

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-09  7:50   ` Rick Chen
  2020-09-11  7:38   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi Sean Anderson
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

Clearing MIP doesn't do anything. Whoops. The following commits should
tackle this problem in a more robust manner.

This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/cpu/start.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
index bf9fdf369b..e3222b1ea7 100644
--- a/arch/riscv/cpu/start.S
+++ b/arch/riscv/cpu/start.S
@@ -65,8 +65,6 @@ _start:
 #else
 	li	t0, SIE_SSIE
 #endif
-	/* Clear any pending IPIs */
-	csrc	MODE_PREFIX(ip), t0
 	csrs	MODE_PREFIX(ie), t0
 #endif
 
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
  2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-11  7:45   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function Sean Anderson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

Without a matching barrier on the write side, the barrier in handle_ipi
does nothing. It was entirely possible for the boot hart to write to addr,
arg0, and arg1 *after* sending the IPI, because there was no barrier on the
sending side.

Fixes: 90ae28143700bae4edd23930a7772899ad259058
Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/lib/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
index ac22136314..ab6d8bd7fa 100644
--- a/arch/riscv/lib/smp.c
+++ b/arch/riscv/lib/smp.c
@@ -54,6 +54,8 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
 		gd->arch.ipi[reg].arg0 = ipi->arg0;
 		gd->arch.ipi[reg].arg1 = ipi->arg1;
 
+		__smp_mb();
+
 		ret = riscv_send_ipi(reg);
 		if (ret) {
 			pr_err("Cannot send IPI to hart %d\n", reg);
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
  2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
  2020-09-07 18:16 ` [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-09  8:33   ` Rick Chen
  2020-09-11  8:04   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 4/7] riscv: Clear pending IPIs on initialization Sean Anderson
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

Some IPIs may already be pending when U-Boot is started. This could be a
problem if a secondary hart tries to handle an IPI before the boot hart has
initialized the IPI device.

This commit uses NULL as a sentinel for secondary harts so they know when
the IPI is initialized, and it is safe to use the IPI API. The smp addr
parameter is initialized to NULL by board_init_f_init_reserve. Before this,
secondary harts wait in wait_for_gd_init.

This imposes a minor restriction because harts may no longer jump to NULL.
However, given that the RISC-V debug device is likely to be located at
0x400, it is unlikely for any RISC-V implementation to have usable ram
located at 0x0.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
index ab6d8bd7fa..8c25755330 100644
--- a/arch/riscv/lib/smp.c
+++ b/arch/riscv/lib/smp.c
@@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
 	u32 reg;
 	int ret, pending;
 
+	/* NULL is used as a sentinel value */
+	if (!ipi->addr) {
+		pr_err("smp_function cannot be set to 0x0\n");
+		return -EINVAL;
+	}
+
 	cpus = ofnode_path("/cpus");
 	if (!ofnode_valid(cpus)) {
 		pr_err("Can't find cpus node!\n");
@@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
 			continue;
 #endif
 
-		gd->arch.ipi[reg].addr = ipi->addr;
 		gd->arch.ipi[reg].arg0 = ipi->arg0;
 		gd->arch.ipi[reg].arg1 = ipi->arg1;
 
-		__smp_mb();
+		/*
+		 * Ensure addr only becomes non-NULL when arg0 and arg1 are
+		 * valid. An IPI may already be pending on other harts, so we
+		 * need a way to signal that the IPI device has been
+		 * initialized, and that it is ok to call the function.
+		 */
+		__smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
 
 		ret = riscv_send_ipi(reg);
 		if (ret) {
@@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
 	if (hart >= CONFIG_NR_CPUS)
 		return;
 
-	__smp_mb();
+	smp_function = (void (*)(ulong, ulong, ulong))
+			__smp_load_acquire(&gd->arch.ipi[hart].addr);
+	/*
+	 * If the function is NULL, then U-Boot has not requested the IPI. The
+	 * IPI device may not be initialized, so all we can do is wait for
+	 * U-Boot to initialize it and send an IPI
+	 */
+	if (!smp_function)
+		return;
 
-	smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
 	invalidate_icache_all();
 
 	/*
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/7] riscv: Clear pending IPIs on initialization
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
                   ` (2 preceding siblings ...)
  2020-09-07 18:16 ` [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-14  2:08   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 5/7] riscv: Add fence to available_harts_lock Sean Anderson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

Even though we no longer call smp_function if an IPI was not sent by
U-Boot, we still need to clear any IPIs which were pending from the
execution environment. Otherwise, secondary harts will busy-wait in
secondary_hart_loop, instead of relaxing.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/cpu/cpu.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/riscv/cpu/cpu.c b/arch/riscv/cpu/cpu.c
index bfa2d4a426..ad94744c0f 100644
--- a/arch/riscv/cpu/cpu.c
+++ b/arch/riscv/cpu/cpu.c
@@ -72,6 +72,15 @@ static int riscv_cpu_probe(void)
 	return 0;
 }
 
+/*
+ * This is called on secondary harts just after the IPI is init'd. Currently
+ * there's nothing to do, since we just need to clear any existing IPIs, and
+ * that is handled by the sending of an ipi itself.
+ */
+void riscv_ipi_init_secondary_hart(ulong hart, ulong arg0, ulong arg1)
+{
+}
+
 int arch_cpu_init_dm(void)
 {
 	int ret;
@@ -111,6 +120,15 @@ int arch_cpu_init_dm(void)
 	ret = riscv_init_ipi();
 	if (ret)
 		return ret;
+
+	/*
+	 * Clear all pending IPIs on secondary harts. We don't do anything on
+	 * the boot hart, since we never send an IPI to ourselves, and no
+	 * interrupts are enabled
+	 */
+	ret = smp_call_function((ulong)riscv_ipi_init_secondary_hart, 0, 0, 0);
+	if (ret)
+		return ret;
 #endif
 
 	return 0;
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/7] riscv: Add fence to available_harts_lock
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
                   ` (3 preceding siblings ...)
  2020-09-07 18:16 ` [PATCH 4/7] riscv: Clear pending IPIs on initialization Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-10  3:26   ` Rick Chen
  2020-09-11 14:47   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data Sean Anderson
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

Without these fences, it is perfectly valid for an out-of-order core to
re-order memory accesses to outside of the available_harts_lock critical
section.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/cpu/start.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
index e3222b1ea7..ad18e746b6 100644
--- a/arch/riscv/cpu/start.S
+++ b/arch/riscv/cpu/start.S
@@ -126,12 +126,12 @@ call_board_init_f_0:
 #ifndef CONFIG_XIP
 	la	t0, available_harts_lock
 	fence	rw, w
-	amoswap.w zero, zero, 0(t0)
+	amoswap.w.rl zero, zero, 0(t0)
 
 wait_for_gd_init:
 	la	t0, available_harts_lock
 	li	t1, 1
-1:	amoswap.w t1, t1, 0(t0)
+1:	amoswap.w.aq t1, t1, 0(t0)
 	fence	r, rw
 	bnez	t1, 1b
 
@@ -143,7 +143,7 @@ wait_for_gd_init:
 	SREG	t2, GD_AVAILABLE_HARTS(gp)
 
 	fence	rw, w
-	amoswap.w zero, zero, 0(t0)
+	amoswap.w.rl zero, zero, 0(t0)
 
 	/*
 	 * Continue on hart lottery winner, others branch to
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
                   ` (4 preceding siblings ...)
  2020-09-07 18:16 ` [PATCH 5/7] riscv: Add fence to available_harts_lock Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-14  5:25   ` Bin Meng
  2020-09-07 18:16 ` [PATCH 7/7] riscv: Add some comments to start.S Sean Anderson
  2020-09-09  2:02 ` [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Rick Chen
  7 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

This allows code to use a construct like `if (gd & gd->...) { ... }` when
accessing the global data pointer. Without this change, it was possible for
a very early trap to cause _exit_trap to read arbitrary memory. This could
cause a second trap, preventing show_regs from being printed.

Fixes: 7c6ca03eaed0035ca6676e9bc7f5f1dfcaae7e8f
Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/cpu/start.S      | 20 +++++++++++++++++---
 arch/riscv/lib/interrupts.c |  3 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
index ad18e746b6..59d3d7bbf4 100644
--- a/arch/riscv/cpu/start.S
+++ b/arch/riscv/cpu/start.S
@@ -47,6 +47,13 @@ _start:
 	mv	tp, a0
 	mv	s1, a1
 
+	/*
+	 * Set the global data pointer to a known value in case we get a very
+	 * early trap. The global data pointer will be set its actual value only
+	 * after it has been initialized.
+	 */
+	mv	gp, zero
+
 	la	t0, trap_entry
 	csrw	MODE_PREFIX(tvec), t0
 
@@ -85,10 +92,10 @@ call_board_init_f_0:
 	jal	board_init_f_alloc_reserve
 
 	/*
-	 * Set global data pointer here for all harts, uninitialized at this
-	 * point.
+	 * Save global data pointer for later. We don't set it here because it
+	 * is not initialized yet.
 	 */
-	mv	gp, a0
+	mv	s0, a0
 
 	/* setup stack */
 #if CONFIG_IS_ENABLED(SMP)
@@ -135,6 +142,13 @@ wait_for_gd_init:
 	fence	r, rw
 	bnez	t1, 1b
 
+	/*
+	 * Set the global data pointer only when gd_t has been initialized.
+	 * This was already set by arch_setup_gd on the boot hart, but all other
+	 * harts' global data pointers gets set here.
+	 */
+	mv	gp, s0
+
 	/* register available harts in the available_harts mask */
 	li	t1, 1
 	sll	t1, t1, tp
diff --git a/arch/riscv/lib/interrupts.c b/arch/riscv/lib/interrupts.c
index cd47e64487..ad870e98d8 100644
--- a/arch/riscv/lib/interrupts.c
+++ b/arch/riscv/lib/interrupts.c
@@ -78,7 +78,8 @@ static void _exit_trap(ulong code, ulong epc, ulong tval, struct pt_regs *regs)
 
 	printf("EPC: " REG_FMT " RA: " REG_FMT " TVAL: " REG_FMT "\n",
 	       epc, regs->ra, tval);
-	if (gd->flags & GD_FLG_RELOC)
+	/* Print relocation adjustments, but only if gd is initialized */
+	if (gd && gd->flags & GD_FLG_RELOC)
 		printf("EPC: " REG_FMT " RA: " REG_FMT " reloc adjusted\n\n",
 		       epc - gd->reloc_off, regs->ra - gd->reloc_off);
 
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 7/7] riscv: Add some comments to start.S
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
                   ` (5 preceding siblings ...)
  2020-09-07 18:16 ` [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data Sean Anderson
@ 2020-09-07 18:16 ` Sean Anderson
  2020-09-14  5:26   ` Bin Meng
  2020-09-09  2:02 ` [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Rick Chen
  7 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-07 18:16 UTC (permalink / raw)
  To: u-boot

This adds comments regarding the ordering and purpose of certain
instructions as I understand them.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
---

 arch/riscv/cpu/start.S | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
index 59d3d7bbf4..c659c6df53 100644
--- a/arch/riscv/cpu/start.S
+++ b/arch/riscv/cpu/start.S
@@ -43,7 +43,10 @@ _start:
 	csrr	a0, CSR_MHARTID
 #endif
 
-	/* save hart id and dtb pointer */
+	/*
+	 * Save hart id and dtb pointer. The thread pointer register is not
+	 * modified by C code, and may be used in trap handlers.
+	 */
 	mv	tp, a0
 	mv	s1, a1
 
@@ -54,10 +57,18 @@ _start:
 	 */
 	mv	gp, zero
 
+	/*
+	 * Set the trap handler. This must happen after initializing tp and gp
+	 * because the handler may use these registers.
+	 */
 	la	t0, trap_entry
 	csrw	MODE_PREFIX(tvec), t0
 
-	/* mask all interrupts */
+	/*
+	 * Mask all interrupts. Interrupts are disabled globally (in m/sstatus)
+	 * for U-Boot, but we will need to read m/sip to determine if we get an
+	 * IPI
+	 */
 	csrw	MODE_PREFIX(ie), zero
 
 #if CONFIG_IS_ENABLED(SMP)
@@ -407,6 +418,10 @@ secondary_hart_relocate:
 	mv	gp, a2
 #endif
 
+/*
+ * Interrupts are disabled globally, but they can still be read from m/sip. The
+ * wfi function will wake us up if we get an IPI, even if we do not trap.
+ */
 secondary_hart_loop:
 	wfi
 
-- 
2.28.0

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
  2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
                   ` (6 preceding siblings ...)
  2020-09-07 18:16 ` [PATCH 7/7] riscv: Add some comments to start.S Sean Anderson
@ 2020-09-09  2:02 ` Rick Chen
  2020-09-09  2:38   ` Sean Anderson
  7 siblings, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-09  2:02 UTC (permalink / raw)
  To: u-boot

Hi Sean

> On the K210, the prior stage bootloader does not clear IPIs. This presents
> a problem, because U-Boot up until this point assumes (with one exception)
> that IPIs are cleared when it starts. This series attempts to fix this in a
> robust manner, and fix several concurrency bugs I noticed while fixing
> these other areas. Heinrich previously submitted a patch addressing part of
> this problem in [1].
>
> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/
>

It sounds like that the bootloader does not deal with SMP flow well
and jump to u-boot-spl, right ?

I have a question that why not try to fix the prior stage bootloader
to clear IPIs correctly?

Actually I have encounter a similar SMP issue like you.
Our prior stage bootloader will jump to u-boot-spl with the incorrect
mstatus and result in the SMP working abnormal in u-boot-spl.

I mean this is an individual case, not a general case.
If we try to cover any errors which come from any prior stage bootloaders,
the SMP flow will become more and more complicated and hard to debug.

That is why it does not need implement SMP flow in U-Boot proper with
SBI v0.2 HSM extension.

Thanks,
Rick

>
> Sean Anderson (7):
>   Revert "riscv: Clear pending interrupts before enabling IPIs"
>   riscv: Match memory barriers between send_ipi_many and handle_ipi
>   riscv: Use NULL as a sentinel value for smp_call_function
>   riscv: Clear pending IPIs on initialization
>   riscv: Add fence to available_harts_lock
>   riscv: Ensure gp is NULL or points to valid data
>   riscv: Add some comments to start.S
>
>  arch/riscv/cpu/cpu.c        | 18 ++++++++++++++
>  arch/riscv/cpu/start.S      | 47 +++++++++++++++++++++++++++++--------
>  arch/riscv/lib/interrupts.c |  3 ++-
>  arch/riscv/lib/smp.c        | 26 +++++++++++++++++---
>  4 files changed, 80 insertions(+), 14 deletions(-)
>
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
  2020-09-09  2:02 ` [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Rick Chen
@ 2020-09-09  2:38   ` Sean Anderson
  2020-09-09  2:44     ` Sean Anderson
  2020-09-10  7:08     ` Rick Chen
  0 siblings, 2 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-09  2:38 UTC (permalink / raw)
  To: u-boot

On 9/8/20 10:02 PM, Rick Chen wrote:
> Hi Sean
> 
>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>> a problem, because U-Boot up until this point assumes (with one exception)
>> that IPIs are cleared when it starts. This series attempts to fix this in a
>> robust manner, and fix several concurrency bugs I noticed while fixing
>> these other areas. Heinrich previously submitted a patch addressing part of
>> this problem in [1].
>>
>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/
>>
> 
> It sounds like that the bootloader does not deal with SMP flow well
> and jump to u-boot-spl, right ?
> 
> I have a question that why not try to fix the prior stage bootloader
> to clear IPIs correctly?

Because it is a ROM :)

> 
> Actually I have encounter a similar SMP issue like you.
> Our prior stage bootloader will jump to u-boot-spl with the incorrect
> mstatus and result in the SMP working abnormal in u-boot-spl.

Perhaps we should just clear MIE then? I originally had a patch in this
series which moved the handle_ipi code into handle_trap, and got rid of
the manual checks on the interrupt. Something like

secondary_hart_loop:
	wfi
	j	secondary_hart_loop

Of course as part of that we would need to explicitly enable and disable
interrupts. Perhaps not the worst idea, but I didn't include it here
because I figure the current system work OK, even if it is not what one
might expect.

> I mean this is an individual case, not a general case.
> If we try to cover any errors which come from any prior stage bootloaders,
> the SMP flow will become more and more complicated and hard to debug.

Of course, if a prior bootloader doesn't hold up its end of the
contract, we can be left with some awful bugs to fix. U-Boot is
generally not too bad to debug, but I've had an awful time whenever some
concurrency sneaks into the mix. I think it's much better to confine the
(necessary) complexity to as few files as possible, so that the rest of
the code can be ignorant. I think part of that is verifying that we have
everything in a known state, so that when we see something unexpected,
we can handle it/panic/whatever instead of silently getting a bug.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
  2020-09-09  2:38   ` Sean Anderson
@ 2020-09-09  2:44     ` Sean Anderson
  2020-09-10  7:08     ` Rick Chen
  1 sibling, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-09  2:44 UTC (permalink / raw)
  To: u-boot

On 9/8/20 10:38 PM, Sean Anderson wrote:
> On 9/8/20 10:02 PM, Rick Chen wrote:
>> Hi Sean
>>
>>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>>> a problem, because U-Boot up until this point assumes (with one exception)
>>> that IPIs are cleared when it starts. This series attempts to fix this in a
>>> robust manner, and fix several concurrency bugs I noticed while fixing
>>> these other areas. Heinrich previously submitted a patch addressing part of
>>> this problem in [1].
>>>
>>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/
>>>
>>
>> It sounds like that the bootloader does not deal with SMP flow well
>> and jump to u-boot-spl, right ?
>>
>> I have a question that why not try to fix the prior stage bootloader
>> to clear IPIs correctly?
> 
> Because it is a ROM :)

Err, perhaps I should clarify. When I say "prior stage bootloader," I
mean that in the general sense of "anything which comes before U-Boot,"
and not to refer specifically to SPL or TPL. For the k210, this is
something akin to the ZSBL on an fu540.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
@ 2020-09-09  7:50   ` Rick Chen
  2020-09-09 10:23     ` Sean Anderson
  2020-09-11  7:38   ` Bin Meng
  1 sibling, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-09  7:50 UTC (permalink / raw)
  To: u-boot

Hi Sean

> Clearing MIP doesn't do anything. Whoops. The following commits should
> tackle this problem in a more robust manner.

I still not catch your points about that  this commit 947263033 really
help to fix  pending IPIs not clean problem on k210 platform at that
time, but you just said it do nothing and remove it here currently.

Thanks,
Rick

>
> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> index bf9fdf369b..e3222b1ea7 100644
> --- a/arch/riscv/cpu/start.S
> +++ b/arch/riscv/cpu/start.S
> @@ -65,8 +65,6 @@ _start:
>  #else
>         li      t0, SIE_SSIE
>  #endif
> -       /* Clear any pending IPIs */
> -       csrc    MODE_PREFIX(ip), t0
>         csrs    MODE_PREFIX(ie), t0
>  #endif
>
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-07 18:16 ` [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function Sean Anderson
@ 2020-09-09  8:33   ` Rick Chen
  2020-09-09  9:01     ` Rick Chen
  2020-09-11  8:04   ` Bin Meng
  1 sibling, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-09  8:33 UTC (permalink / raw)
  To: u-boot

Hi Sean

> Some IPIs may already be pending when U-Boot is started. This could be a
> problem if a secondary hart tries to handle an IPI before the boot hart has
> initialized the IPI device.
>
> This commit uses NULL as a sentinel for secondary harts so they know when
> the IPI is initialized, and it is safe to use the IPI API. The smp addr
> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> secondary harts wait in wait_for_gd_init.
>
> This imposes a minor restriction because harts may no longer jump to NULL.
> However, given that the RISC-V debug device is likely to be located at
> 0x400, it is unlikely for any RISC-V implementation to have usable ram
> located at 0x0.

The ram location of AE350 is at 0x0.

>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> index ab6d8bd7fa..8c25755330 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>         u32 reg;
>         int ret, pending;
>
> +       /* NULL is used as a sentinel value */
> +       if (!ipi->addr) {
> +               pr_err("smp_function cannot be set to 0x0\n");
> +               return -EINVAL;
> +       }
> +

This conflict with memory configurations of AE350.
Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
find BBL is configured as zero address on AE350 platform.

>         cpus = ofnode_path("/cpus");
>         if (!ofnode_valid(cpus)) {
>                 pr_err("Can't find cpus node!\n");
> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>                         continue;
>  #endif
>
> -               gd->arch.ipi[reg].addr = ipi->addr;
>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>
> -               __smp_mb();
> +               /*
> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
> +                * valid. An IPI may already be pending on other harts, so we
> +                * need a way to signal that the IPI device has been
> +                * initialized, and that it is ok to call the function.
> +                */
> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);

It is too tricky and hack by using zero address to be a signal for the
other pending harts waiting the IPI device been initialized.

>
>                 ret = riscv_send_ipi(reg);
>                 if (ret) {
> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>         if (hart >= CONFIG_NR_CPUS)
>                 return;
>
> -       __smp_mb();
> +       smp_function = (void (*)(ulong, ulong, ulong))
> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
> +       /*
> +        * If the function is NULL, then U-Boot has not requested the IPI. The
> +        * IPI device may not be initialized, so all we can do is wait for
> +        * U-Boot to initialize it and send an IPI
> +        */
> +       if (!smp_function)
> +               return;

It will boot BBL+Kernel payload fail here on AE350 platforms with this check.

Thanks,
Rick

>
> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>         invalidate_icache_all();
>
>         /*
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09  8:33   ` Rick Chen
@ 2020-09-09  9:01     ` Rick Chen
  2020-09-09 10:16       ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-09  9:01 UTC (permalink / raw)
  To: u-boot

Hi Sean

> Hi Sean
>
> > Some IPIs may already be pending when U-Boot is started. This could be a
> > problem if a secondary hart tries to handle an IPI before the boot hart has
> > initialized the IPI device.
> >
> > This commit uses NULL as a sentinel for secondary harts so they know when
> > the IPI is initialized, and it is safe to use the IPI API. The smp addr
> > parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> > secondary harts wait in wait_for_gd_init.
> >
> > This imposes a minor restriction because harts may no longer jump to NULL.
> > However, given that the RISC-V debug device is likely to be located at
> > 0x400, it is unlikely for any RISC-V implementation to have usable ram
> > located at 0x0.
>
> The ram location of AE350 is at 0x0.
>
> >
> > Signed-off-by: Sean Anderson <seanga2@gmail.com>
> > ---
> >
> >  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> > index ab6d8bd7fa..8c25755330 100644
> > --- a/arch/riscv/lib/smp.c
> > +++ b/arch/riscv/lib/smp.c
> > @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >         u32 reg;
> >         int ret, pending;
> >
> > +       /* NULL is used as a sentinel value */
> > +       if (!ipi->addr) {
> > +               pr_err("smp_function cannot be set to 0x0\n");
> > +               return -EINVAL;
> > +       }
> > +
>
> This conflict with memory configurations of AE350.
> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
> find BBL is configured as zero address on AE350 platform.
>
> >         cpus = ofnode_path("/cpus");
> >         if (!ofnode_valid(cpus)) {
> >                 pr_err("Can't find cpus node!\n");
> > @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >                         continue;
> >  #endif
> >
> > -               gd->arch.ipi[reg].addr = ipi->addr;
> >                 gd->arch.ipi[reg].arg0 = ipi->arg0;
> >                 gd->arch.ipi[reg].arg1 = ipi->arg1;
> >
> > -               __smp_mb();

Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?

Thanks,
Rick

> > +               /*
> > +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
> > +                * valid. An IPI may already be pending on other harts, so we
> > +                * need a way to signal that the IPI device has been
> > +                * initialized, and that it is ok to call the function.
> > +                */
> > +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>
> It is too tricky and hack by using zero address to be a signal for the
> other pending harts waiting the IPI device been initialized.
>
> >
> >                 ret = riscv_send_ipi(reg);
> >                 if (ret) {
> > @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
> >         if (hart >= CONFIG_NR_CPUS)
> >                 return;
> >
> > -       __smp_mb();
> > +       smp_function = (void (*)(ulong, ulong, ulong))
> > +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
> > +       /*
> > +        * If the function is NULL, then U-Boot has not requested the IPI. The
> > +        * IPI device may not be initialized, so all we can do is wait for
> > +        * U-Boot to initialize it and send an IPI
> > +        */
> > +       if (!smp_function)
> > +               return;
>
> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>
> Thanks,
> Rick
>
> >
> > -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
> >         invalidate_icache_all();
> >
> >         /*
> > --
> > 2.28.0
> >

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09  9:01     ` Rick Chen
@ 2020-09-09 10:16       ` Sean Anderson
  2020-09-09 10:26         ` Heinrich Schuchardt
                           ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-09 10:16 UTC (permalink / raw)
  To: u-boot

On 9/9/20 5:01 AM, Rick Chen wrote:
> Hi Sean
> 
>> Hi Sean
>>
>>> Some IPIs may already be pending when U-Boot is started. This could be a
>>> problem if a secondary hart tries to handle an IPI before the boot hart has
>>> initialized the IPI device.
>>>
>>> This commit uses NULL as a sentinel for secondary harts so they know when
>>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>>> secondary harts wait in wait_for_gd_init.
>>>
>>> This imposes a minor restriction because harts may no longer jump to NULL.
>>> However, given that the RISC-V debug device is likely to be located at
>>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>>> located at 0x0.
>>
>> The ram location of AE350 is at 0x0.

Huh. Does it not have a debug device?

>>
>>>
>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>> ---
>>>
>>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>>> index ab6d8bd7fa..8c25755330 100644
>>> --- a/arch/riscv/lib/smp.c
>>> +++ b/arch/riscv/lib/smp.c
>>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>         u32 reg;
>>>         int ret, pending;
>>>
>>> +       /* NULL is used as a sentinel value */
>>> +       if (!ipi->addr) {
>>> +               pr_err("smp_function cannot be set to 0x0\n");
>>> +               return -EINVAL;
>>> +       }
>>> +
>>
>> This conflict with memory configurations of AE350.
>> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
>> find BBL is configured as zero address on AE350 platform.

Ok, that is a strange choice because any accidental NULL-pointer
dereference turns into code modification. In the next revision, I will
add an arch.ipi[reg].valid variable for the same prupose, instead of
re-using addr.

>>>         cpus = ofnode_path("/cpus");
>>>         if (!ofnode_valid(cpus)) {
>>>                 pr_err("Can't find cpus node!\n");
>>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>                         continue;
>>>  #endif
>>>
>>> -               gd->arch.ipi[reg].addr = ipi->addr;
>>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>>>
>>> -               __smp_mb();
> 
> Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?

Because conceptually, patch 2 is independent of this patch. It is still
a bug even if this patch is not applied. I think by making this change
over two patches, it is more obvious why the barrier was added, and then
weakened, as opposed to if I made the change in one patch.

> 
> Thanks,
> Rick
> 
>>> +               /*
>>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
>>> +                * valid. An IPI may already be pending on other harts, so we
>>> +                * need a way to signal that the IPI device has been
>>> +                * initialized, and that it is ok to call the function.
>>> +                */
>>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>>
>> It is too tricky and hack by using zero address to be a signal for the
>> other pending harts waiting the IPI device been initialized.
>>
>>>
>>>                 ret = riscv_send_ipi(reg);
>>>                 if (ret) {
>>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>>>         if (hart >= CONFIG_NR_CPUS)
>>>                 return;
>>>
>>> -       __smp_mb();
>>> +       smp_function = (void (*)(ulong, ulong, ulong))
>>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
>>> +       /*
>>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
>>> +        * IPI device may not be initialized, so all we can do is wait for
>>> +        * U-Boot to initialize it and send an IPI
>>> +        */
>>> +       if (!smp_function)
>>> +               return;
>>
>> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>>
>> Thanks,
>> Rick
>>
>>>
>>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>         invalidate_icache_all();
>>>
>>>         /*
>>> --
>>> 2.28.0
>>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-09  7:50   ` Rick Chen
@ 2020-09-09 10:23     ` Sean Anderson
  2020-09-10  6:39       ` Rick Chen
  0 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-09 10:23 UTC (permalink / raw)
  To: u-boot

On 9/9/20 3:50 AM, Rick Chen wrote:
> Hi Sean
> 
>> Clearing MIP doesn't do anything. Whoops. The following commits should
>> tackle this problem in a more robust manner.
> 
> I still not catch your points about that  this commit 947263033 really
> help to fix  pending IPIs not clean problem on k210 platform at that
> time, but you just said it do nothing and remove it here currently.

After refactoring the original smp patch to remove the null check, I
neglected to re-test with a debug uart enabled. Without doing that test,
it is impossible to catch the pending IPI bug. The secondary hart will
crash before the serial driver is initialized. I didn't do that test at
the time, because I was mostly worried about the secondary hart
corrupting the device tree and causing the boot to fail. That failure
mode was fixed by 40686c394. So I saw that and thought that the pending
IPI issue was solved as well.

--Sean

>>
>> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>>
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  arch/riscv/cpu/start.S | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>> index bf9fdf369b..e3222b1ea7 100644
>> --- a/arch/riscv/cpu/start.S
>> +++ b/arch/riscv/cpu/start.S
>> @@ -65,8 +65,6 @@ _start:
>>  #else
>>         li      t0, SIE_SSIE
>>  #endif
>> -       /* Clear any pending IPIs */
>> -       csrc    MODE_PREFIX(ip), t0
>>         csrs    MODE_PREFIX(ie), t0
>>  #endif
>>
>> --
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09 10:16       ` Sean Anderson
@ 2020-09-09 10:26         ` Heinrich Schuchardt
  2020-09-09 10:36           ` Sean Anderson
  2020-09-10  8:09         ` Rick Chen
  2020-09-14  3:21         ` Rick Chen
  2 siblings, 1 reply; 45+ messages in thread
From: Heinrich Schuchardt @ 2020-09-09 10:26 UTC (permalink / raw)
  To: u-boot

On 9/9/20 12:16 PM, Sean Anderson wrote:
> On 9/9/20 5:01 AM, Rick Chen wrote:
>> Hi Sean
>>
>>> Hi Sean
>>>
>>>> Some IPIs may already be pending when U-Boot is started. This could be a
>>>> problem if a secondary hart tries to handle an IPI before the boot hart has
>>>> initialized the IPI device.
>>>>
>>>> This commit uses NULL as a sentinel for secondary harts so they know when
>>>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>>>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>>>> secondary harts wait in wait_for_gd_init.
>>>>
>>>> This imposes a minor restriction because harts may no longer jump to NULL.
>>>> However, given that the RISC-V debug device is likely to be located at
>>>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>>>> located at 0x0.
>>>
>>> The ram location of AE350 is at 0x0.
>
> Huh. Does it not have a debug device?

Wouldn't it make sense to use an odd value (e.g. (void *)-1UL) instead
NULL as sentinal value? RISC-V code addresses are always even.

Best regards

Heinrich

>
>>>
>>>>
>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>> ---
>>>>
>>>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>>>> index ab6d8bd7fa..8c25755330 100644
>>>> --- a/arch/riscv/lib/smp.c
>>>> +++ b/arch/riscv/lib/smp.c
>>>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>         u32 reg;
>>>>         int ret, pending;
>>>>
>>>> +       /* NULL is used as a sentinel value */
>>>> +       if (!ipi->addr) {
>>>> +               pr_err("smp_function cannot be set to 0x0\n");
>>>> +               return -EINVAL;
>>>> +       }
>>>> +
>>>
>>> This conflict with memory configurations of AE350.
>>> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
>>> find BBL is configured as zero address on AE350 platform.
>
> Ok, that is a strange choice because any accidental NULL-pointer
> dereference turns into code modification. In the next revision, I will
> add an arch.ipi[reg].valid variable for the same prupose, instead of
> re-using addr.
>
>>>>         cpus = ofnode_path("/cpus");
>>>>         if (!ofnode_valid(cpus)) {
>>>>                 pr_err("Can't find cpus node!\n");
>>>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>                         continue;
>>>>  #endif
>>>>
>>>> -               gd->arch.ipi[reg].addr = ipi->addr;
>>>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>>>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>>>>
>>>> -               __smp_mb();
>>
>> Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?
>
> Because conceptually, patch 2 is independent of this patch. It is still
> a bug even if this patch is not applied. I think by making this change
> over two patches, it is more obvious why the barrier was added, and then
> weakened, as opposed to if I made the change in one patch.
>
>>
>> Thanks,
>> Rick
>>
>>>> +               /*
>>>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
>>>> +                * valid. An IPI may already be pending on other harts, so we
>>>> +                * need a way to signal that the IPI device has been
>>>> +                * initialized, and that it is ok to call the function.
>>>> +                */
>>>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>>>
>>> It is too tricky and hack by using zero address to be a signal for the
>>> other pending harts waiting the IPI device been initialized.
>>>
>>>>
>>>>                 ret = riscv_send_ipi(reg);
>>>>                 if (ret) {
>>>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>>>>         if (hart >= CONFIG_NR_CPUS)
>>>>                 return;
>>>>
>>>> -       __smp_mb();
>>>> +       smp_function = (void (*)(ulong, ulong, ulong))
>>>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
>>>> +       /*
>>>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
>>>> +        * IPI device may not be initialized, so all we can do is wait for
>>>> +        * U-Boot to initialize it and send an IPI
>>>> +        */
>>>> +       if (!smp_function)
>>>> +               return;
>>>
>>> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>>>
>>> Thanks,
>>> Rick
>>>
>>>>
>>>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>>         invalidate_icache_all();
>>>>
>>>>         /*
>>>> --
>>>> 2.28.0
>>>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09 10:26         ` Heinrich Schuchardt
@ 2020-09-09 10:36           ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-09 10:36 UTC (permalink / raw)
  To: u-boot

On 9/9/20 6:26 AM, Heinrich Schuchardt wrote:
> On 9/9/20 12:16 PM, Sean Anderson wrote:
>> On 9/9/20 5:01 AM, Rick Chen wrote:
>>> Hi Sean
>>>
>>>> Hi Sean
>>>>
>>>>> Some IPIs may already be pending when U-Boot is started. This could be a
>>>>> problem if a secondary hart tries to handle an IPI before the boot hart has
>>>>> initialized the IPI device.
>>>>>
>>>>> This commit uses NULL as a sentinel for secondary harts so they know when
>>>>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>>>>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>>>>> secondary harts wait in wait_for_gd_init.
>>>>>
>>>>> This imposes a minor restriction because harts may no longer jump to NULL.
>>>>> However, given that the RISC-V debug device is likely to be located at
>>>>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>>>>> located at 0x0.
>>>>
>>>> The ram location of AE350 is at 0x0.
>>
>> Huh. Does it not have a debug device?
> 
> Wouldn't it make sense to use an odd value (e.g. (void *)-1UL) instead
> NULL as sentinal value? RISC-V code addresses are always even.

Well, the advantage of NULL is that it gets set when gd gets
initialized, so we don't have to do anything else. On the K210, we are
effectively already using it as a sentinel value for this reason (except
then we jump to it and crash).

To implement your suggestion, we would have to add something like

void riscv_early_ipi_init(void)
{
	int i;

	for (i = 0; i < CONFIG_NR_CPUS; i++)
		gd->arch.ipi[i].addr = -1;
}

and call that after board_init_f_init_reserve but before releasing
available_harts_lock.

--Sean

> 
> Best regards
> 
> Heinrich
> 
>>
>>>>
>>>>>
>>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>>> ---
>>>>>
>>>>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>>>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
>>>>> index ab6d8bd7fa..8c25755330 100644
>>>>> --- a/arch/riscv/lib/smp.c
>>>>> +++ b/arch/riscv/lib/smp.c
>>>>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>>         u32 reg;
>>>>>         int ret, pending;
>>>>>
>>>>> +       /* NULL is used as a sentinel value */
>>>>> +       if (!ipi->addr) {
>>>>> +               pr_err("smp_function cannot be set to 0x0\n");
>>>>> +               return -EINVAL;
>>>>> +       }
>>>>> +
>>>>
>>>> This conflict with memory configurations of AE350.
>>>> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
>>>> find BBL is configured as zero address on AE350 platform.
>>
>> Ok, that is a strange choice because any accidental NULL-pointer
>> dereference turns into code modification. In the next revision, I will
>> add an arch.ipi[reg].valid variable for the same prupose, instead of
>> re-using addr.
>>
>>>>>         cpus = ofnode_path("/cpus");
>>>>>         if (!ofnode_valid(cpus)) {
>>>>>                 pr_err("Can't find cpus node!\n");
>>>>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>>>>>                         continue;
>>>>>  #endif
>>>>>
>>>>> -               gd->arch.ipi[reg].addr = ipi->addr;
>>>>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>>>>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>>>>>
>>>>> -               __smp_mb();
>>>
>>> Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?
>>
>> Because conceptually, patch 2 is independent of this patch. It is still
>> a bug even if this patch is not applied. I think by making this change
>> over two patches, it is more obvious why the barrier was added, and then
>> weakened, as opposed to if I made the change in one patch.
>>
>>>
>>> Thanks,
>>> Rick
>>>
>>>>> +               /*
>>>>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
>>>>> +                * valid. An IPI may already be pending on other harts, so we
>>>>> +                * need a way to signal that the IPI device has been
>>>>> +                * initialized, and that it is ok to call the function.
>>>>> +                */
>>>>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
>>>>
>>>> It is too tricky and hack by using zero address to be a signal for the
>>>> other pending harts waiting the IPI device been initialized.
>>>>
>>>>>
>>>>>                 ret = riscv_send_ipi(reg);
>>>>>                 if (ret) {
>>>>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
>>>>>         if (hart >= CONFIG_NR_CPUS)
>>>>>                 return;
>>>>>
>>>>> -       __smp_mb();
>>>>> +       smp_function = (void (*)(ulong, ulong, ulong))
>>>>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
>>>>> +       /*
>>>>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
>>>>> +        * IPI device may not be initialized, so all we can do is wait for
>>>>> +        * U-Boot to initialize it and send an IPI
>>>>> +        */
>>>>> +       if (!smp_function)
>>>>> +               return;
>>>>
>>>> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
>>>>
>>>> Thanks,
>>>> Rick
>>>>
>>>>>
>>>>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
>>>>>         invalidate_icache_all();
>>>>>
>>>>>         /*
>>>>> --
>>>>> 2.28.0
>>>>>
>>
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 5/7] riscv: Add fence to available_harts_lock
  2020-09-07 18:16 ` [PATCH 5/7] riscv: Add fence to available_harts_lock Sean Anderson
@ 2020-09-10  3:26   ` Rick Chen
  2020-09-11 10:39     ` Sean Anderson
  2020-09-11 14:47   ` Bin Meng
  1 sibling, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-10  3:26 UTC (permalink / raw)
  To: u-boot

Hi Sean

> Without these fences, it is perfectly valid for an out-of-order core to
> re-order memory accesses to outside of the available_harts_lock critical
> section.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> index e3222b1ea7..ad18e746b6 100644
> --- a/arch/riscv/cpu/start.S
> +++ b/arch/riscv/cpu/start.S
> @@ -126,12 +126,12 @@ call_board_init_f_0:
>  #ifndef CONFIG_XIP
>         la      t0, available_harts_lock
>         fence   rw, w
> -       amoswap.w zero, zero, 0(t0)
> +       amoswap.w.rl zero, zero, 0(t0)

fence   rw, w can also be removed.

>
>  wait_for_gd_init:
>         la      t0, available_harts_lock
>         li      t1, 1
> -1:     amoswap.w t1, t1, 0(t0)
> +1:     amoswap.w.aq t1, t1, 0(t0)
>         fence   r, rw

fence   r, rw can also be removed.

>         bnez    t1, 1b
>
> @@ -143,7 +143,7 @@ wait_for_gd_init:
>         SREG    t2, GD_AVAILABLE_HARTS(gp)
>
>         fence   rw, w

fence   rw, w can also be removed.

Thanks,
Rick

> -       amoswap.w zero, zero, 0(t0)
> +       amoswap.w.rl zero, zero, 0(t0)
>
>         /*
>          * Continue on hart lottery winner, others branch to
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-09 10:23     ` Sean Anderson
@ 2020-09-10  6:39       ` Rick Chen
  2020-09-10 10:18         ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-10  6:39 UTC (permalink / raw)
  To: u-boot

Hi Sean

> On 9/9/20 3:50 AM, Rick Chen wrote:
> > Hi Sean
> >
> >> Clearing MIP doesn't do anything. Whoops. The following commits should
> >> tackle this problem in a more robust manner.
> >
> > I still not catch your points about that  this commit 947263033 really
> > help to fix  pending IPIs not clean problem on k210 platform at that
> > time, but you just said it do nothing and remove it here currently.
>
> After refactoring the original smp patch to remove the null check, I
> neglected to re-test with a debug uart enabled. Without doing that test,
> it is impossible to catch the pending IPI bug. The secondary hart will
> crash before the serial driver is initialized. I didn't do that test at
> the time, because I was mostly worried about the secondary hart
> corrupting the device tree and causing the boot to fail. That failure
> mode was fixed by 40686c394. So I saw that and thought that the pending
> IPI issue was solved as well.

Can you describe more clearly why with a debug uart enabled will
trigger the pending IPI bug ?

And why the pending IPI be raised and not clear before jump to U-Boot ?

Why the
csrc MODE_PREFIX(ip), t0
don't help to fix this bug ?

Thanks,
Rick


>
> --Sean
>
> >>
> >> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
> >>
> >> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >> ---
> >>
> >>  arch/riscv/cpu/start.S | 2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> >> index bf9fdf369b..e3222b1ea7 100644
> >> --- a/arch/riscv/cpu/start.S
> >> +++ b/arch/riscv/cpu/start.S
> >> @@ -65,8 +65,6 @@ _start:
> >>  #else
> >>         li      t0, SIE_SSIE
> >>  #endif
> >> -       /* Clear any pending IPIs */
> >> -       csrc    MODE_PREFIX(ip), t0
> >>         csrs    MODE_PREFIX(ie), t0
> >>  #endif
> >>
> >> --
> >> 2.28.0
> >>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
  2020-09-09  2:38   ` Sean Anderson
  2020-09-09  2:44     ` Sean Anderson
@ 2020-09-10  7:08     ` Rick Chen
  2020-09-10 10:49       ` Sean Anderson
  1 sibling, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-10  7:08 UTC (permalink / raw)
  To: u-boot

Hi Sean

> On 9/8/20 10:02 PM, Rick Chen wrote:
> > Hi Sean
> >
> >> On the K210, the prior stage bootloader does not clear IPIs. This presents
> >> a problem, because U-Boot up until this point assumes (with one exception)
> >> that IPIs are cleared when it starts. This series attempts to fix this in a
> >> robust manner, and fix several concurrency bugs I noticed while fixing
> >> these other areas. Heinrich previously submitted a patch addressing part of
> >> this problem in [1].
> >>
> >> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/
> >>
> >
> > It sounds like that the bootloader does not deal with SMP flow well
> > and jump to u-boot-spl, right ?
> >
> > I have a question that why not try to fix the prior stage bootloader
> > to clear IPIs correctly?
>
> Because it is a ROM :)

Is it a mask ROM or flash ROM ?

>
> >
> > Actually I have encounter a similar SMP issue like you.
> > Our prior stage bootloader will jump to u-boot-spl with the incorrect
> > mstatus and result in the SMP working abnormal in u-boot-spl.
>
> Perhaps we should just clear MIE then? I originally had a patch in this
> series which moved the handle_ipi code into handle_trap, and got rid of
> the manual checks on the interrupt. Something like
>
> secondary_hart_loop:
>         wfi
>         j       secondary_hart_loop
>
> Of course as part of that we would need to explicitly enable and disable
> interrupts. Perhaps not the worst idea, but I didn't include it here
> because I figure the current system work OK, even if it is not what one
> might expect.
>
> > I mean this is an individual case, not a general case.
> > If we try to cover any errors which come from any prior stage bootloaders,
> > the SMP flow will become more and more complicated and hard to debug.
>
> Of course, if a prior bootloader doesn't hold up its end of the
> contract, we can be left with some awful bugs to fix. U-Boot is
> generally not too bad to debug, but I've had an awful time whenever some
> concurrency sneaks into the mix. I think it's much better to confine the
> (necessary) complexity to as few files as possible, so that the rest of
> the code can be ignorant. I think part of that is verifying that we have
> everything in a known state, so that when we see something unexpected,
> we can handle it/panic/whatever instead of silently getting a bug.

It sounds like an error handling and the errors come from the prior
stage bootloader.
Without U-Boot, does Kernel handle this kind of IPIs not clean
unexpected errors ?

Thanks,
Rick

>
> --Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09 10:16       ` Sean Anderson
  2020-09-09 10:26         ` Heinrich Schuchardt
@ 2020-09-10  8:09         ` Rick Chen
  2020-09-14  3:21         ` Rick Chen
  2 siblings, 0 replies; 45+ messages in thread
From: Rick Chen @ 2020-09-10  8:09 UTC (permalink / raw)
  To: u-boot

Hi Sean

> On 9/9/20 5:01 AM, Rick Chen wrote:
> > Hi Sean
> >
> >> Hi Sean
> >>
> >>> Some IPIs may already be pending when U-Boot is started. This could be a
> >>> problem if a secondary hart tries to handle an IPI before the boot hart has
> >>> initialized the IPI device.
> >>>
> >>> This commit uses NULL as a sentinel for secondary harts so they know when
> >>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
> >>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> >>> secondary harts wait in wait_for_gd_init.
> >>>
> >>> This imposes a minor restriction because harts may no longer jump to NULL.
> >>> However, given that the RISC-V debug device is likely to be located at
> >>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
> >>> located at 0x0.
> >>
> >> The ram location of AE350 is at 0x0.
>
> Huh. Does it not have a debug device?

No, our debug device is not in here.

>
> >>
> >>>
> >>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >>> ---
> >>>
> >>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> >>>  1 file changed, 22 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> >>> index ab6d8bd7fa..8c25755330 100644
> >>> --- a/arch/riscv/lib/smp.c
> >>> +++ b/arch/riscv/lib/smp.c
> >>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >>>         u32 reg;
> >>>         int ret, pending;
> >>>
> >>> +       /* NULL is used as a sentinel value */
> >>> +       if (!ipi->addr) {
> >>> +               pr_err("smp_function cannot be set to 0x0\n");
> >>> +               return -EINVAL;
> >>> +       }
> >>> +
> >>
> >> This conflict with memory configurations of AE350.
> >> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
> >> find BBL is configured as zero address on AE350 platform.
>
> Ok, that is a strange choice because any accidental NULL-pointer
> dereference turns into code modification. In the next revision, I will
> add an arch.ipi[reg].valid variable for the same prupose, instead of
> re-using addr.
>
> >>>         cpus = ofnode_path("/cpus");
> >>>         if (!ofnode_valid(cpus)) {
> >>>                 pr_err("Can't find cpus node!\n");
> >>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >>>                         continue;
> >>>  #endif
> >>>
> >>> -               gd->arch.ipi[reg].addr = ipi->addr;
> >>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
> >>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
> >>>
> >>> -               __smp_mb();
> >
> > Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?
>
> Because conceptually, patch 2 is independent of this patch. It is still
> a bug even if this patch is not applied. I think by making this change
> over two patches, it is more obvious why the barrier was added, and then
> weakened, as opposed to if I made the change in one patch.

OK.
Thanks for explanations.

>
> >
> > Thanks,
> > Rick
> >
> >>> +               /*
> >>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
> >>> +                * valid. An IPI may already be pending on other harts, so we
> >>> +                * need a way to signal that the IPI device has been
> >>> +                * initialized, and that it is ok to call the function.
> >>> +                */
> >>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
> >>
> >> It is too tricky and hack by using zero address to be a signal for the
> >> other pending harts waiting the IPI device been initialized.
> >>
> >>>
> >>>                 ret = riscv_send_ipi(reg);
> >>>                 if (ret) {
> >>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
> >>>         if (hart >= CONFIG_NR_CPUS)
> >>>                 return;
> >>>
> >>> -       __smp_mb();
> >>> +       smp_function = (void (*)(ulong, ulong, ulong))
> >>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
> >>> +       /*
> >>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
> >>> +        * IPI device may not be initialized, so all we can do is wait for
> >>> +        * U-Boot to initialize it and send an IPI
> >>> +        */
> >>> +       if (!smp_function)
> >>> +               return;
> >>
> >> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
> >>
> >> Thanks,
> >> Rick
> >>
> >>>
> >>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
> >>>         invalidate_icache_all();
> >>>
> >>>         /*
> >>> --
> >>> 2.28.0
> >>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-10  6:39       ` Rick Chen
@ 2020-09-10 10:18         ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-10 10:18 UTC (permalink / raw)
  To: u-boot

On 9/10/20 2:39 AM, Rick Chen wrote:
> Hi Sean
> 
>> On 9/9/20 3:50 AM, Rick Chen wrote:
>>> Hi Sean
>>>
>>>> Clearing MIP doesn't do anything. Whoops. The following commits should
>>>> tackle this problem in a more robust manner.
>>>
>>> I still not catch your points about that  this commit 947263033 really
>>> help to fix  pending IPIs not clean problem on k210 platform at that
>>> time, but you just said it do nothing and remove it here currently.
>>
>> After refactoring the original smp patch to remove the null check, I
>> neglected to re-test with a debug uart enabled. Without doing that test,
>> it is impossible to catch the pending IPI bug. The secondary hart will
>> crash before the serial driver is initialized. I didn't do that test at
>> the time, because I was mostly worried about the secondary hart
>> corrupting the device tree and causing the boot to fail. That failure
>> mode was fixed by 40686c394. So I saw that and thought that the pending
>> IPI issue was solved as well.
> 
> Can you describe more clearly why with a debug uart enabled will
> trigger the pending IPI bug ?

It doesn't trigger the bug, it always happens. However, if there is no
debug uart nothing gets printed, because it occurs before the serial
driver is initialized.

> 
> And why the pending IPI be raised and not clear before jump to U-Boot ?

I don't know. Presumably the boot rom raises it to signal to jump to
U-Boot and never clears it.

> 
> Why the
> csrc MODE_PREFIX(ip), t0
> don't help to fix this bug ?

The problem is that MSIP in MIP is not necessarily writable.

> Each lower privilege level has a separate software interrupt-pending
> bit (HSIP, SSIP, USIP), which can be both read and written by CSR
> accesses from code running on the local hart at the associated or any
> higher privilege level. The machine-level MSIP bits are written by
> accesses to memory- mapped control registers, which are used by remote
> harts to provide machine-mode interprocessor interrupts.
> Interprocessor interrupts for lower privilege levels are implemented
> through ABI, SBI or HBI calls to the AEE, SEE or HEE respectively,
> which might ultimately result in a machine- mode write to the
> receiving hart?s MSIP bit. A hart can write its own MSIP bit using the
> same memory-mapped control register.

Contrast for example with SSIP in SIP

> Three types of interrupts are defined: software interrupts, timer
> interrupts, and external interrupts. A supervisor-level software
> interrupt is triggered on the current hart by writing 1 to its
> supervisor software interrupt-pending (SSIP) bit in the sip register.
> A pending supervisor-level software interrupt can be cleared by
> writing 0 to the SSIP bit in sip. Supervisor-level software interrupts
> are disabled when the SSIE bit in the sie register is clear.

I originally added this at your suggestion, but I never ended up
testing it separately from the IPI cleanup patch.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot
  2020-09-10  7:08     ` Rick Chen
@ 2020-09-10 10:49       ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-10 10:49 UTC (permalink / raw)
  To: u-boot

On 9/10/20 3:08 AM, Rick Chen wrote:
> Hi Sean
> 
>> On 9/8/20 10:02 PM, Rick Chen wrote:
>>> Hi Sean
>>>
>>>> On the K210, the prior stage bootloader does not clear IPIs. This presents
>>>> a problem, because U-Boot up until this point assumes (with one exception)
>>>> that IPIs are cleared when it starts. This series attempts to fix this in a
>>>> robust manner, and fix several concurrency bugs I noticed while fixing
>>>> these other areas. Heinrich previously submitted a patch addressing part of
>>>> this problem in [1].
>>>>
>>>> [1] https://patchwork.ozlabs.org/project/uboot/patch/20200811035648.3284-1-xypron.glpk at gmx.de/
>>>>
>>>
>>> It sounds like that the bootloader does not deal with SMP flow well
>>> and jump to u-boot-spl, right ?
>>>
>>> I have a question that why not try to fix the prior stage bootloader
>>> to clear IPIs correctly?
>>
>> Because it is a ROM :)
> 
> Is it a mask ROM or flash ROM ?

I don't know, it's not documented. From what I can tell, it's controlled
by the OTP device. However, I haven't thoroughly investigated it.

> 
>>
>>>
>>> Actually I have encounter a similar SMP issue like you.
>>> Our prior stage bootloader will jump to u-boot-spl with the incorrect
>>> mstatus and result in the SMP working abnormal in u-boot-spl.
>>
>> Perhaps we should just clear MIE then? I originally had a patch in this
>> series which moved the handle_ipi code into handle_trap, and got rid of
>> the manual checks on the interrupt. Something like
>>
>> secondary_hart_loop:
>>         wfi
>>         j       secondary_hart_loop
>>
>> Of course as part of that we would need to explicitly enable and disable
>> interrupts. Perhaps not the worst idea, but I didn't include it here
>> because I figure the current system work OK, even if it is not what one
>> might expect.
>>
>>> I mean this is an individual case, not a general case.
>>> If we try to cover any errors which come from any prior stage bootloaders,
>>> the SMP flow will become more and more complicated and hard to debug.
>>
>> Of course, if a prior bootloader doesn't hold up its end of the
>> contract, we can be left with some awful bugs to fix. U-Boot is
>> generally not too bad to debug, but I've had an awful time whenever some
>> concurrency sneaks into the mix. I think it's much better to confine the
>> (necessary) complexity to as few files as possible, so that the rest of
>> the code can be ignorant. I think part of that is verifying that we have
>> everything in a known state, so that when we see something unexpected,
>> we can handle it/panic/whatever instead of silently getting a bug.
> 
> It sounds like an error handling and the errors come from the prior
> stage bootloader.
> Without U-Boot, does Kernel handle this kind of IPIs not clean
> unexpected errors ?

Well, software interrupts are disabled on each hart until
riscv_intc_init is called (and enables soft irqs). This prevents the
handler from being called before ipi_data is initialized. After that,
handle_IPI can be called, but if ops == 0 (e.g. the IPI wasn't signaled
by Linux), then it just goes to done.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
  2020-09-09  7:50   ` Rick Chen
@ 2020-09-11  7:38   ` Bin Meng
  2020-09-11 10:22     ` Sean Anderson
  1 sibling, 1 reply; 45+ messages in thread
From: Bin Meng @ 2020-09-11  7:38 UTC (permalink / raw)
  To: u-boot

Hi Sean,

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> Clearing MIP doesn't do anything. Whoops. The following commits should

Which following commits?

> tackle this problem in a more robust manner.
>
> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> index bf9fdf369b..e3222b1ea7 100644
> --- a/arch/riscv/cpu/start.S
> +++ b/arch/riscv/cpu/start.S
> @@ -65,8 +65,6 @@ _start:
>  #else
>         li      t0, SIE_SSIE
>  #endif
> -       /* Clear any pending IPIs */
> -       csrc    MODE_PREFIX(ip), t0

Did you mean the clearing MIP.MSIP actually does nothing, but the
following commit is the correct fix?

commit 40686c394e533fec765fe237936e353c84e73fff
Author: Sean Anderson <seanga2@gmail.com>
Date:   Wed Jun 24 06:41:18 2020 -0400

    riscv: Clean up IPI initialization code

    The previous IPI code initialized the device whenever the first call was
    made to a riscv_*_ipi function. This made it difficult to determine when
    the IPI device was initialized. This patch introduces a new function
    riscv_init_ipi. It is called once during arch_cpu_init_dm. In SPL, it is
    called in spl_invoke_opensbi. Before this point, no riscv_*_ipi functions
    should be called.

    Signed-off-by: Sean Anderson <seanga2@gmail.com>
    Reviewed-by: Rick Chen <rick@andestech.com>

>         csrs    MODE_PREFIX(ie), t0
>  #endif
>

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi
  2020-09-07 18:16 ` [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi Sean Anderson
@ 2020-09-11  7:45   ` Bin Meng
  0 siblings, 0 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-11  7:45 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> Without a matching barrier on the write side, the barrier in handle_ipi
> does nothing. It was entirely possible for the boot hart to write to addr,
> arg0, and arg1 *after* sending the IPI, because there was no barrier on the
> sending side.
>
> Fixes: 90ae28143700bae4edd23930a7772899ad259058

nits: wrong format of Fixes tag

> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/lib/smp.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> index ac22136314..ab6d8bd7fa 100644
> --- a/arch/riscv/lib/smp.c
> +++ b/arch/riscv/lib/smp.c
> @@ -54,6 +54,8 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
>
> +               __smp_mb();
> +
>                 ret = riscv_send_ipi(reg);
>                 if (ret) {
>                         pr_err("Cannot send IPI to hart %d\n", reg);

Reviewed-by: Bin Meng <bin.meng@windriver.com>

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-07 18:16 ` [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function Sean Anderson
  2020-09-09  8:33   ` Rick Chen
@ 2020-09-11  8:04   ` Bin Meng
  2020-09-14  1:58     ` Leo Liang
  2020-09-14 14:05     ` Sean Anderson
  1 sibling, 2 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-11  8:04 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> Some IPIs may already be pending when U-Boot is started. This could be a
> problem if a secondary hart tries to handle an IPI before the boot hart has
> initialized the IPI device.
>
> This commit uses NULL as a sentinel for secondary harts so they know when
> the IPI is initialized, and it is safe to use the IPI API. The smp addr
> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> secondary harts wait in wait_for_gd_init.
>
> This imposes a minor restriction because harts may no longer jump to NULL.
> However, given that the RISC-V debug device is likely to be located at
> 0x400, it is unlikely for any RISC-V implementation to have usable ram
> located at 0x0.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>

Reviewed-by: Bin Meng <bin.meng@windriver.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-11  7:38   ` Bin Meng
@ 2020-09-11 10:22     ` Sean Anderson
  2020-09-11 14:45       ` Bin Meng
  0 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-11 10:22 UTC (permalink / raw)
  To: u-boot

On 9/11/20 3:38 AM, Bin Meng wrote:
> Hi Sean,
> 
> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>
>> Clearing MIP doesn't do anything. Whoops. The following commits should
> 
> Which following commits?
> 
>> tackle this problem in a more robust manner.
>>
>> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>>
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  arch/riscv/cpu/start.S | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>> index bf9fdf369b..e3222b1ea7 100644
>> --- a/arch/riscv/cpu/start.S
>> +++ b/arch/riscv/cpu/start.S
>> @@ -65,8 +65,6 @@ _start:
>>  #else
>>         li      t0, SIE_SSIE
>>  #endif
>> -       /* Clear any pending IPIs */
>> -       csrc    MODE_PREFIX(ip), t0
> 
> Did you mean the clearing MIP.MSIP actually does nothing, but the
> following commit is the correct fix?

Yes, but we also need

[PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function

so the secondary harts know not to take any IPIs not raised by U-Boot.

> 
> commit 40686c394e533fec765fe237936e353c84e73fff
> Author: Sean Anderson <seanga2@gmail.com>
> Date:   Wed Jun 24 06:41:18 2020 -0400
> 
>     riscv: Clean up IPI initialization code
> 
>     The previous IPI code initialized the device whenever the first call was
>     made to a riscv_*_ipi function. This made it difficult to determine when
>     the IPI device was initialized. This patch introduces a new function
>     riscv_init_ipi. It is called once during arch_cpu_init_dm. In SPL, it is
>     called in spl_invoke_opensbi. Before this point, no riscv_*_ipi functions
>     should be called.
> 
>     Signed-off-by: Sean Anderson <seanga2@gmail.com>
>     Reviewed-by: Rick Chen <rick@andestech.com>
> 
>>         csrs    MODE_PREFIX(ie), t0
>>  #endif
>>

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 5/7] riscv: Add fence to available_harts_lock
  2020-09-10  3:26   ` Rick Chen
@ 2020-09-11 10:39     ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-11 10:39 UTC (permalink / raw)
  To: u-boot

On 9/9/20 11:26 PM, Rick Chen wrote:
> Hi Sean
> 
>> Without these fences, it is perfectly valid for an out-of-order core to
>> re-order memory accesses to outside of the available_harts_lock critical
>> section.
>>
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  arch/riscv/cpu/start.S | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>> index e3222b1ea7..ad18e746b6 100644
>> --- a/arch/riscv/cpu/start.S
>> +++ b/arch/riscv/cpu/start.S
>> @@ -126,12 +126,12 @@ call_board_init_f_0:
>>  #ifndef CONFIG_XIP
>>         la      t0, available_harts_lock
>>         fence   rw, w
>> -       amoswap.w zero, zero, 0(t0)
>> +       amoswap.w.rl zero, zero, 0(t0)
> 
> fence   rw, w can also be removed.

Hmm. Actually, upon reading the spec I noticed this paragraph

> To help implement multiprocessor synchronization, the AMOs optionally
> provide release consistency semantics. If the aq bit is set, then no
> later memory operations in this RISC-V hart can be observed to take
> place before the AMO. Conversely, if the rl bit is set, then other
> RISC-V harts will not observe the AMO before memory accesses preceding
> the AMO in this RISC-V hart. Setting both the aq and the rl bit on an
> AMO makes the sequence sequentially consistent, meaning that it cannot
> be reordered with earlier or later memory operations from the same
> hart.

The key word being optionally. However, I could not find any information
on how to detect if this option is enabled. Perhaps a separate fence was
used to prevent having to try and detect this feature?

>>
>>  wait_for_gd_init:
>>         la      t0, available_harts_lock
>>         li      t1, 1
>> -1:     amoswap.w t1, t1, 0(t0)
>> +1:     amoswap.w.aq t1, t1, 0(t0)
>>         fence   r, rw
> 
> fence   r, rw can also be removed.
> 
>>         bnez    t1, 1b
>>
>> @@ -143,7 +143,7 @@ wait_for_gd_init:
>>         SREG    t2, GD_AVAILABLE_HARTS(gp)
>>
>>         fence   rw, w
> 
> fence   rw, w can also be removed.
> 
> Thanks,
> Rick
> 
>> -       amoswap.w zero, zero, 0(t0)
>> +       amoswap.w.rl zero, zero, 0(t0)
>>
>>         /*
>>          * Continue on hart lottery winner, others branch to
>> --
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-11 10:22     ` Sean Anderson
@ 2020-09-11 14:45       ` Bin Meng
  2020-09-11 18:30         ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Bin Meng @ 2020-09-11 14:45 UTC (permalink / raw)
  To: u-boot

On Fri, Sep 11, 2020 at 6:22 PM Sean Anderson <seanga2@gmail.com> wrote:
>
> On 9/11/20 3:38 AM, Bin Meng wrote:
> > Hi Sean,
> >
> > On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> >>
> >> Clearing MIP doesn't do anything. Whoops. The following commits should
> >
> > Which following commits?
> >
> >> tackle this problem in a more robust manner.
> >>
> >> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
> >>
> >> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >> ---
> >>
> >>  arch/riscv/cpu/start.S | 2 --
> >>  1 file changed, 2 deletions(-)
> >>
> >> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> >> index bf9fdf369b..e3222b1ea7 100644
> >> --- a/arch/riscv/cpu/start.S
> >> +++ b/arch/riscv/cpu/start.S
> >> @@ -65,8 +65,6 @@ _start:
> >>  #else
> >>         li      t0, SIE_SSIE
> >>  #endif
> >> -       /* Clear any pending IPIs */
> >> -       csrc    MODE_PREFIX(ip), t0
> >
> > Did you mean the clearing MIP.MSIP actually does nothing, but the
> > following commit is the correct fix?
>
> Yes, but we also need

Is MIP.MSIP read-only on K210?

>
> [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
>
> so the secondary harts know not to take any IPIs not raised by U-Boot.
>
> >
> > commit 40686c394e533fec765fe237936e353c84e73fff
> > Author: Sean Anderson <seanga2@gmail.com>
> > Date:   Wed Jun 24 06:41:18 2020 -0400
> >
> >     riscv: Clean up IPI initialization code
> >
> >     The previous IPI code initialized the device whenever the first call was
> >     made to a riscv_*_ipi function. This made it difficult to determine when
> >     the IPI device was initialized. This patch introduces a new function
> >     riscv_init_ipi. It is called once during arch_cpu_init_dm. In SPL, it is
> >     called in spl_invoke_opensbi. Before this point, no riscv_*_ipi functions
> >     should be called.
> >
> >     Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >     Reviewed-by: Rick Chen <rick@andestech.com>
> >
> >>         csrs    MODE_PREFIX(ie), t0
> >>  #endif

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 5/7] riscv: Add fence to available_harts_lock
  2020-09-07 18:16 ` [PATCH 5/7] riscv: Add fence to available_harts_lock Sean Anderson
  2020-09-10  3:26   ` Rick Chen
@ 2020-09-11 14:47   ` Bin Meng
  1 sibling, 0 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-11 14:47 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> Without these fences, it is perfectly valid for an out-of-order core to
> re-order memory accesses to outside of the available_harts_lock critical
> section.

The commit message should be reworded, because current codes do
nothing wrong as "fence" instruction is used. What was done in this
patch is using .aq / .rl to replace "fence".

>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-11 14:45       ` Bin Meng
@ 2020-09-11 18:30         ` Sean Anderson
  2020-09-14  3:10           ` Rick Chen
  0 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-11 18:30 UTC (permalink / raw)
  To: u-boot

On 9/11/20 10:45 AM, Bin Meng wrote:
> On Fri, Sep 11, 2020 at 6:22 PM Sean Anderson <seanga2@gmail.com> wrote:
>>
>> On 9/11/20 3:38 AM, Bin Meng wrote:
>>> Hi Sean,
>>>
>>> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>>>
>>>> Clearing MIP doesn't do anything. Whoops. The following commits should
>>>
>>> Which following commits?
>>>
>>>> tackle this problem in a more robust manner.
>>>>
>>>> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>>>>
>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>> ---
>>>>
>>>>  arch/riscv/cpu/start.S | 2 --
>>>>  1 file changed, 2 deletions(-)
>>>>
>>>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>>>> index bf9fdf369b..e3222b1ea7 100644
>>>> --- a/arch/riscv/cpu/start.S
>>>> +++ b/arch/riscv/cpu/start.S
>>>> @@ -65,8 +65,6 @@ _start:
>>>>  #else
>>>>         li      t0, SIE_SSIE
>>>>  #endif
>>>> -       /* Clear any pending IPIs */
>>>> -       csrc    MODE_PREFIX(ip), t0
>>>
>>> Did you mean the clearing MIP.MSIP actually does nothing, but the
>>> following commit is the correct fix?
>>
>> Yes, but we also need
> 
> Is MIP.MSIP read-only on K210?

I think so. See [1] where only ssip, stip, and seip are written (and
new_mip is not otherwise used). The spec doesn't require MIP.MSIP to be
writable at all.

--Sean

[1] https://github.com/chipsalliance/rocket-chip/blob/master/src/main/scala/rocket/CSR.scala#L859

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-11  8:04   ` Bin Meng
@ 2020-09-14  1:58     ` Leo Liang
  2020-09-14  2:07       ` Bin Meng
  2020-09-14 14:05     ` Sean Anderson
  1 sibling, 1 reply; 45+ messages in thread
From: Leo Liang @ 2020-09-14  1:58 UTC (permalink / raw)
  To: u-boot

On Fri, Sep 11, 2020 at 04:04:13PM +0800, Bin Meng wrote:
> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> >
> > Some IPIs may already be pending when U-Boot is started. This could be a
> > problem if a secondary hart tries to handle an IPI before the boot hart has
> > initialized the IPI device.
> >
> > This commit uses NULL as a sentinel for secondary harts so they know when
> > the IPI is initialized, and it is safe to use the IPI API. The smp addr
> > parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> > secondary harts wait in wait_for_gd_init.
> >
> > This imposes a minor restriction because harts may no longer jump to NULL.
> > However, given that the RISC-V debug device is likely to be located at
> > 0x400, it is unlikely for any RISC-V implementation to have usable ram
> > located at 0x0.
> >
> > Signed-off-by: Sean Anderson <seanga2@gmail.com>
> > ---
> >
> >  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> 
> Reviewed-by: Bin Meng <bin.meng@windriver.com>

Hi Bin,

There is a series of follow-up discussion on this patch that you might miss reading.

This current patch will cause AE350 to fail booting,

so maybe we should wait until Sean's next patch comes up to consider a reviewed-by tag.

Best regards,

Leo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-14  1:58     ` Leo Liang
@ 2020-09-14  2:07       ` Bin Meng
  2020-09-14  6:10         ` Leo Liang
  0 siblings, 1 reply; 45+ messages in thread
From: Bin Meng @ 2020-09-14  2:07 UTC (permalink / raw)
  To: u-boot

Hi Leo,

On Mon, Sep 14, 2020 at 9:58 AM Leo Liang <ycliang@andestech.com> wrote:
>
> On Fri, Sep 11, 2020 at 04:04:13PM +0800, Bin Meng wrote:
> > On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> > >
> > > Some IPIs may already be pending when U-Boot is started. This could be a
> > > problem if a secondary hart tries to handle an IPI before the boot hart has
> > > initialized the IPI device.
> > >
> > > This commit uses NULL as a sentinel for secondary harts so they know when
> > > the IPI is initialized, and it is safe to use the IPI API. The smp addr
> > > parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> > > secondary harts wait in wait_for_gd_init.
> > >
> > > This imposes a minor restriction because harts may no longer jump to NULL.
> > > However, given that the RISC-V debug device is likely to be located at
> > > 0x400, it is unlikely for any RISC-V implementation to have usable ram
> > > located at 0x0.
> > >
> > > Signed-off-by: Sean Anderson <seanga2@gmail.com>
> > > ---
> > >
> > >  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > >
> >
> > Reviewed-by: Bin Meng <bin.meng@windriver.com>
>
> Hi Bin,
>
> There is a series of follow-up discussion on this patch that you might miss reading.
>
> This current patch will cause AE350 to fail booting,
>
> so maybe we should wait until Sean's next patch comes up to consider a reviewed-by tag.
>

Do we know why AE350 fails to boot with this patch?

If some big changes are reworked in newer patches, I believe author
should remove the tag and re-request the review.

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 4/7] riscv: Clear pending IPIs on initialization
  2020-09-07 18:16 ` [PATCH 4/7] riscv: Clear pending IPIs on initialization Sean Anderson
@ 2020-09-14  2:08   ` Bin Meng
  0 siblings, 0 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-14  2:08 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> Even though we no longer call smp_function if an IPI was not sent by
> U-Boot, we still need to clear any IPIs which were pending from the
> execution environment. Otherwise, secondary harts will busy-wait in
> secondary_hart_loop, instead of relaxing.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/cpu.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>

Reviewed-by: Bin Meng <bin.meng@windriver.com>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-11 18:30         ` Sean Anderson
@ 2020-09-14  3:10           ` Rick Chen
  2020-09-14 12:45             ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Rick Chen @ 2020-09-14  3:10 UTC (permalink / raw)
  To: u-boot

HI Sean

> On 9/11/20 10:45 AM, Bin Meng wrote:
> > On Fri, Sep 11, 2020 at 6:22 PM Sean Anderson <seanga2@gmail.com> wrote:
> >>
> >> On 9/11/20 3:38 AM, Bin Meng wrote:
> >>> Hi Sean,
> >>>
> >>> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> >>>>
> >>>> Clearing MIP doesn't do anything. Whoops. The following commits should
> >>>
> >>> Which following commits?
> >>>
> >>>> tackle this problem in a more robust manner.
> >>>>
> >>>> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
> >>>>
> >>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >>>> ---
> >>>>
> >>>>  arch/riscv/cpu/start.S | 2 --
> >>>>  1 file changed, 2 deletions(-)
> >>>>
> >>>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> >>>> index bf9fdf369b..e3222b1ea7 100644
> >>>> --- a/arch/riscv/cpu/start.S
> >>>> +++ b/arch/riscv/cpu/start.S
> >>>> @@ -65,8 +65,6 @@ _start:
> >>>>  #else
> >>>>         li      t0, SIE_SSIE
> >>>>  #endif
> >>>> -       /* Clear any pending IPIs */
> >>>> -       csrc    MODE_PREFIX(ip), t0
> >>>
> >>> Did you mean the clearing MIP.MSIP actually does nothing, but the
> >>> following commit is the correct fix?
> >>
> >> Yes, but we also need
> >
> > Is MIP.MSIP read-only on K210?

Since clear mip will not affect anything in K210 and it is writable
for other RISC-V platforms.
I will prefer to keep this instruction stay here for standard startup
initialization.

Thanks,
Rick

>
> I think so. See [1] where only ssip, stip, and seip are written (and
> new_mip is not otherwise used). The spec doesn't require MIP.MSIP to be
> writable at all.
>
> --Sean
>
> [1] https://github.com/chipsalliance/rocket-chip/blob/master/src/main/scala/rocket/CSR.scala#L859

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-09 10:16       ` Sean Anderson
  2020-09-09 10:26         ` Heinrich Schuchardt
  2020-09-10  8:09         ` Rick Chen
@ 2020-09-14  3:21         ` Rick Chen
  2 siblings, 0 replies; 45+ messages in thread
From: Rick Chen @ 2020-09-14  3:21 UTC (permalink / raw)
  To: u-boot

Hi Sean

> On 9/9/20 5:01 AM, Rick Chen wrote:
> > Hi Sean
> >
> >> Hi Sean
> >>
> >>> Some IPIs may already be pending when U-Boot is started. This could be a
> >>> problem if a secondary hart tries to handle an IPI before the boot hart has
> >>> initialized the IPI device.
> >>>
> >>> This commit uses NULL as a sentinel for secondary harts so they know when
> >>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
> >>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> >>> secondary harts wait in wait_for_gd_init.
> >>>
> >>> This imposes a minor restriction because harts may no longer jump to NULL.
> >>> However, given that the RISC-V debug device is likely to be located at
> >>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
> >>> located at 0x0.
> >>
> >> The ram location of AE350 is at 0x0.
>
> Huh. Does it not have a debug device?
>
> >>
> >>>
> >>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> >>> ---
> >>>
> >>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> >>>  1 file changed, 22 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/arch/riscv/lib/smp.c b/arch/riscv/lib/smp.c
> >>> index ab6d8bd7fa..8c25755330 100644
> >>> --- a/arch/riscv/lib/smp.c
> >>> +++ b/arch/riscv/lib/smp.c
> >>> @@ -18,6 +18,12 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >>>         u32 reg;
> >>>         int ret, pending;
> >>>
> >>> +       /* NULL is used as a sentinel value */
> >>> +       if (!ipi->addr) {
> >>> +               pr_err("smp_function cannot be set to 0x0\n");
> >>> +               return -EINVAL;
> >>> +       }
> >>> +
> >>
> >> This conflict with memory configurations of AE350.
> >> Please check about doc\board\AndesTech\ax25-ae350.rst, and you can
> >> find BBL is configured as zero address on AE350 platform.
>
> Ok, that is a strange choice because any accidental NULL-pointer
> dereference turns into code modification. In the next revision, I will
> add an arch.ipi[reg].valid variable for the same prupose, instead of
> re-using addr.

Adding arch.ipi[reg].valid instead of re-using addr is OK for me.

Thanks,
Rick

>
> >>>         cpus = ofnode_path("/cpus");
> >>>         if (!ofnode_valid(cpus)) {
> >>>                 pr_err("Can't find cpus node!\n");
> >>> @@ -50,11 +56,16 @@ static int send_ipi_many(struct ipi_data *ipi, int wait)
> >>>                         continue;
> >>>  #endif
> >>>
> >>> -               gd->arch.ipi[reg].addr = ipi->addr;
> >>>                 gd->arch.ipi[reg].arg0 = ipi->arg0;
> >>>                 gd->arch.ipi[reg].arg1 = ipi->arg1;
> >>>
> >>> -               __smp_mb();
> >
> > Why do you add this in [PATCH 2/7] but remove it in  [PATCH 3/7] ?
>
> Because conceptually, patch 2 is independent of this patch. It is still
> a bug even if this patch is not applied. I think by making this change
> over two patches, it is more obvious why the barrier was added, and then
> weakened, as opposed to if I made the change in one patch.
>
> >
> > Thanks,
> > Rick
> >
> >>> +               /*
> >>> +                * Ensure addr only becomes non-NULL when arg0 and arg1 are
> >>> +                * valid. An IPI may already be pending on other harts, so we
> >>> +                * need a way to signal that the IPI device has been
> >>> +                * initialized, and that it is ok to call the function.
> >>> +                */
> >>> +               __smp_store_release(&gd->arch.ipi[reg].addr, ipi->addr);
> >>
> >> It is too tricky and hack by using zero address to be a signal for the
> >> other pending harts waiting the IPI device been initialized.
> >>
> >>>
> >>>                 ret = riscv_send_ipi(reg);
> >>>                 if (ret) {
> >>> @@ -83,9 +94,16 @@ void handle_ipi(ulong hart)
> >>>         if (hart >= CONFIG_NR_CPUS)
> >>>                 return;
> >>>
> >>> -       __smp_mb();
> >>> +       smp_function = (void (*)(ulong, ulong, ulong))
> >>> +                       __smp_load_acquire(&gd->arch.ipi[hart].addr);
> >>> +       /*
> >>> +        * If the function is NULL, then U-Boot has not requested the IPI. The
> >>> +        * IPI device may not be initialized, so all we can do is wait for
> >>> +        * U-Boot to initialize it and send an IPI
> >>> +        */
> >>> +       if (!smp_function)
> >>> +               return;
> >>
> >> It will boot BBL+Kernel payload fail here on AE350 platforms with this check.
> >>
> >> Thanks,
> >> Rick
> >>
> >>>
> >>> -       smp_function = (void (*)(ulong, ulong, ulong))gd->arch.ipi[hart].addr;
> >>>         invalidate_icache_all();
> >>>
> >>>         /*
> >>> --
> >>> 2.28.0
> >>>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data
  2020-09-07 18:16 ` [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data Sean Anderson
@ 2020-09-14  5:25   ` Bin Meng
  2020-09-14 13:03     ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Bin Meng @ 2020-09-14  5:25 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> This allows code to use a construct like `if (gd & gd->...) { ... }` when
> accessing the global data pointer. Without this change, it was possible for
> a very early trap to cause _exit_trap to read arbitrary memory. This could
> cause a second trap, preventing show_regs from being printed.
>
> Fixes: 7c6ca03eaed0035ca6676e9bc7f5f1dfcaae7e8f

nits: wrong fixes tag format

> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S      | 20 +++++++++++++++++---
>  arch/riscv/lib/interrupts.c |  3 ++-
>  2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> index ad18e746b6..59d3d7bbf4 100644
> --- a/arch/riscv/cpu/start.S
> +++ b/arch/riscv/cpu/start.S
> @@ -47,6 +47,13 @@ _start:
>         mv      tp, a0
>         mv      s1, a1
>
> +       /*
> +        * Set the global data pointer to a known value in case we get a very
> +        * early trap. The global data pointer will be set its actual value only
> +        * after it has been initialized.
> +        */
> +       mv      gp, zero
> +
>         la      t0, trap_entry
>         csrw    MODE_PREFIX(tvec), t0
>
> @@ -85,10 +92,10 @@ call_board_init_f_0:
>         jal     board_init_f_alloc_reserve
>
>         /*
> -        * Set global data pointer here for all harts, uninitialized at this
> -        * point.
> +        * Save global data pointer for later. We don't set it here because it
> +        * is not initialized yet.
>          */
> -       mv      gp, a0
> +       mv      s0, a0
>
>         /* setup stack */
>  #if CONFIG_IS_ENABLED(SMP)
> @@ -135,6 +142,13 @@ wait_for_gd_init:
>         fence   r, rw
>         bnez    t1, 1b
>
> +       /*
> +        * Set the global data pointer only when gd_t has been initialized.
> +        * This was already set by arch_setup_gd on the boot hart, but all other
> +        * harts' global data pointers gets set here.
> +        */
> +       mv      gp, s0

This is currently in the #ifndef CONFIG_XIP block. Should consider the
#else branch?

> +
>         /* register available harts in the available_harts mask */
>         li      t1, 1
>         sll     t1, t1, tp
> diff --git a/arch/riscv/lib/interrupts.c b/arch/riscv/lib/interrupts.c
> index cd47e64487..ad870e98d8 100644
> --- a/arch/riscv/lib/interrupts.c
> +++ b/arch/riscv/lib/interrupts.c
> @@ -78,7 +78,8 @@ static void _exit_trap(ulong code, ulong epc, ulong tval, struct pt_regs *regs)
>
>         printf("EPC: " REG_FMT " RA: " REG_FMT " TVAL: " REG_FMT "\n",
>                epc, regs->ra, tval);
> -       if (gd->flags & GD_FLG_RELOC)
> +       /* Print relocation adjustments, but only if gd is initialized */
> +       if (gd && gd->flags & GD_FLG_RELOC)
>                 printf("EPC: " REG_FMT " RA: " REG_FMT " reloc adjusted\n\n",
>                        epc - gd->reloc_off, regs->ra - gd->reloc_off);
>
> --

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 7/7] riscv: Add some comments to start.S
  2020-09-07 18:16 ` [PATCH 7/7] riscv: Add some comments to start.S Sean Anderson
@ 2020-09-14  5:26   ` Bin Meng
  0 siblings, 0 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-14  5:26 UTC (permalink / raw)
  To: u-boot

On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>
> This adds comments regarding the ordering and purpose of certain
> instructions as I understand them.
>
> Signed-off-by: Sean Anderson <seanga2@gmail.com>
> ---
>
>  arch/riscv/cpu/start.S | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
> index 59d3d7bbf4..c659c6df53 100644
> --- a/arch/riscv/cpu/start.S
> +++ b/arch/riscv/cpu/start.S
> @@ -43,7 +43,10 @@ _start:
>         csrr    a0, CSR_MHARTID
>  #endif
>
> -       /* save hart id and dtb pointer */
> +       /*
> +        * Save hart id and dtb pointer. The thread pointer register is not

U-Boot does not use tp register in C code.

> +        * modified by C code, and may be used in trap handlers.
> +        */
>         mv      tp, a0
>         mv      s1, a1
>
> @@ -54,10 +57,18 @@ _start:
>          */
>         mv      gp, zero
>
> +       /*
> +        * Set the trap handler. This must happen after initializing tp and gp
> +        * because the handler may use these registers.
> +        */
>         la      t0, trap_entry
>         csrw    MODE_PREFIX(tvec), t0
>
> -       /* mask all interrupts */
> +       /*
> +        * Mask all interrupts. Interrupts are disabled globally (in m/sstatus)
> +        * for U-Boot, but we will need to read m/sip to determine if we get an
> +        * IPI
> +        */
>         csrw    MODE_PREFIX(ie), zero
>
>  #if CONFIG_IS_ENABLED(SMP)
> @@ -407,6 +418,10 @@ secondary_hart_relocate:
>         mv      gp, a2
>  #endif
>
> +/*
> + * Interrupts are disabled globally, but they can still be read from m/sip. The
> + * wfi function will wake us up if we get an IPI, even if we do not trap.
> + */
>  secondary_hart_loop:
>         wfi

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-14  2:07       ` Bin Meng
@ 2020-09-14  6:10         ` Leo Liang
  2020-09-14  6:15           ` Bin Meng
  0 siblings, 1 reply; 45+ messages in thread
From: Leo Liang @ 2020-09-14  6:10 UTC (permalink / raw)
  To: u-boot

Hi, Bin
On Mon, Sep 14, 2020 at 10:07:57AM +0800, Bin Meng wrote:
> Hi Leo,
> 
> On Mon, Sep 14, 2020 at 9:58 AM Leo Liang <ycliang@andestech.com> wrote:
> >
> > On Fri, Sep 11, 2020 at 04:04:13PM +0800, Bin Meng wrote:
> > > On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> > > >
> > > > Some IPIs may already be pending when U-Boot is started. This could be a
> > > > problem if a secondary hart tries to handle an IPI before the boot hart has
> > > > initialized the IPI device.
> > > >
> > > > This commit uses NULL as a sentinel for secondary harts so they know when
> > > > the IPI is initialized, and it is safe to use the IPI API. The smp addr
> > > > parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> > > > secondary harts wait in wait_for_gd_init.
> > > >
> > > > This imposes a minor restriction because harts may no longer jump to NULL.
> > > > However, given that the RISC-V debug device is likely to be located at
> > > > 0x400, it is unlikely for any RISC-V implementation to have usable ram
> > > > located at 0x0.
> > > >
> > > > Signed-off-by: Sean Anderson <seanga2@gmail.com>
> > > > ---
> > > >
> > > >  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > >
> > > Reviewed-by: Bin Meng <bin.meng@windriver.com>
> >
> > Hi Bin,
> >
> > There is a series of follow-up discussion on this patch that you might miss reading.
> >
> > This current patch will cause AE350 to fail booting,
> >
> > so maybe we should wait until Sean's next patch comes up to consider a reviewed-by tag.
> >
> 
> Do we know why AE350 fails to boot with this patch?
>

When booting with bootm command,
boot hart will use smp_call_function(addr, arg0, arg1) to tell other harts to jump to "addr",
and other hart will use handle_ipi() to get the "addr".

The address of bbl+kernel payload when booting with AE350 is at 0x0,
so the "addr" would be 0x0 that other harts have to jump to.

With this patch

+	if (!ipi->addr) {
+		pr_err("smp_function cannot be set to 0x0\n");
+		return -EINVAL;
+	}

+	if (!smp_function)
+		return;

these two code snippets would cause u-boot to hang and thus fail the booting process.

> If some big changes are reworked in newer patches, I believe author
> should remove the tag and re-request the review.
> 

Oh I see! Thanks for the explanation.

> Regards,
> Bin

Best,
Leo

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-14  6:10         ` Leo Liang
@ 2020-09-14  6:15           ` Bin Meng
  0 siblings, 0 replies; 45+ messages in thread
From: Bin Meng @ 2020-09-14  6:15 UTC (permalink / raw)
  To: u-boot

Hi Leo,

On Mon, Sep 14, 2020 at 2:10 PM Leo Liang <ycliang@andestech.com> wrote:
>
> Hi, Bin
> On Mon, Sep 14, 2020 at 10:07:57AM +0800, Bin Meng wrote:
> > Hi Leo,
> >
> > On Mon, Sep 14, 2020 at 9:58 AM Leo Liang <ycliang@andestech.com> wrote:
> > >
> > > On Fri, Sep 11, 2020 at 04:04:13PM +0800, Bin Meng wrote:
> > > > On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
> > > > >
> > > > > Some IPIs may already be pending when U-Boot is started. This could be a
> > > > > problem if a secondary hart tries to handle an IPI before the boot hart has
> > > > > initialized the IPI device.
> > > > >
> > > > > This commit uses NULL as a sentinel for secondary harts so they know when
> > > > > the IPI is initialized, and it is safe to use the IPI API. The smp addr
> > > > > parameter is initialized to NULL by board_init_f_init_reserve. Before this,
> > > > > secondary harts wait in wait_for_gd_init.
> > > > >
> > > > > This imposes a minor restriction because harts may no longer jump to NULL.
> > > > > However, given that the RISC-V debug device is likely to be located at
> > > > > 0x400, it is unlikely for any RISC-V implementation to have usable ram
> > > > > located at 0x0.
> > > > >
> > > > > Signed-off-by: Sean Anderson <seanga2@gmail.com>
> > > > > ---
> > > > >
> > > > >  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > >
> > > > Reviewed-by: Bin Meng <bin.meng@windriver.com>
> > >
> > > Hi Bin,
> > >
> > > There is a series of follow-up discussion on this patch that you might miss reading.
> > >
> > > This current patch will cause AE350 to fail booting,
> > >
> > > so maybe we should wait until Sean's next patch comes up to consider a reviewed-by tag.
> > >
> >
> > Do we know why AE350 fails to boot with this patch?
> >
>
> When booting with bootm command,
> boot hart will use smp_call_function(addr, arg0, arg1) to tell other harts to jump to "addr",
> and other hart will use handle_ipi() to get the "addr".
>
> The address of bbl+kernel payload when booting with AE350 is at 0x0,
> so the "addr" would be 0x0 that other harts have to jump to.
>

Thanks for the info! It's weird that AE350 has set the kernel boot
entry at address 0.

> With this patch
>
> +       if (!ipi->addr) {
> +               pr_err("smp_function cannot be set to 0x0\n");
> +               return -EINVAL;
> +       }
>
> +       if (!smp_function)
> +               return;
>
> these two code snippets would cause u-boot to hang and thus fail the booting process.
>
> > If some big changes are reworked in newer patches, I believe author
> > should remove the tag and re-request the review.
> >
>
> Oh I see! Thanks for the explanation.

Regards,
Bin

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs"
  2020-09-14  3:10           ` Rick Chen
@ 2020-09-14 12:45             ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-14 12:45 UTC (permalink / raw)
  To: u-boot

On 9/13/20 11:10 PM, Rick Chen wrote:
> HI Sean
> 
>> On 9/11/20 10:45 AM, Bin Meng wrote:
>>> On Fri, Sep 11, 2020 at 6:22 PM Sean Anderson <seanga2@gmail.com> wrote:
>>>>
>>>> On 9/11/20 3:38 AM, Bin Meng wrote:
>>>>> Hi Sean,
>>>>>
>>>>> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>>>>>
>>>>>> Clearing MIP doesn't do anything. Whoops. The following commits should
>>>>>
>>>>> Which following commits?
>>>>>
>>>>>> tackle this problem in a more robust manner.
>>>>>>
>>>>>> This reverts commit 9472630337e7c4ac442066b5a752aaa8c3b4d4a6.
>>>>>>
>>>>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>>>>> ---
>>>>>>
>>>>>>  arch/riscv/cpu/start.S | 2 --
>>>>>>  1 file changed, 2 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>>>>>> index bf9fdf369b..e3222b1ea7 100644
>>>>>> --- a/arch/riscv/cpu/start.S
>>>>>> +++ b/arch/riscv/cpu/start.S
>>>>>> @@ -65,8 +65,6 @@ _start:
>>>>>>  #else
>>>>>>         li      t0, SIE_SSIE
>>>>>>  #endif
>>>>>> -       /* Clear any pending IPIs */
>>>>>> -       csrc    MODE_PREFIX(ip), t0
>>>>>
>>>>> Did you mean the clearing MIP.MSIP actually does nothing, but the
>>>>> following commit is the correct fix?
>>>>
>>>> Yes, but we also need
>>>
>>> Is MIP.MSIP read-only on K210?
> 
> Since clear mip will not affect anything in K210 and it is writable
> for other RISC-V platforms.

Does it do anything for other RISC-V platforms? I checked the manuals
for the Sifive fu540 and fe310 and the Nuclei Bumblebee (the core
Gigadevice's GD32VF103 series), and none of them do anything with writes
to MIP. Does Andes do anything with writes in their cores? At the very
least I think the comment should be changed so as not to mislead
readers.

> I will prefer to keep this instruction stay here for standard startup
> initialization.

I would prefer not to, since the rest of this series should handle the
original intent of this commit in a much more robust manner.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data
  2020-09-14  5:25   ` Bin Meng
@ 2020-09-14 13:03     ` Sean Anderson
  2020-09-14 13:27       ` Sean Anderson
  0 siblings, 1 reply; 45+ messages in thread
From: Sean Anderson @ 2020-09-14 13:03 UTC (permalink / raw)
  To: u-boot

On 9/14/20 1:25 AM, Bin Meng wrote:
> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>
>> This allows code to use a construct like `if (gd & gd->...) { ... }` when
>> accessing the global data pointer. Without this change, it was possible for
>> a very early trap to cause _exit_trap to read arbitrary memory. This could
>> cause a second trap, preventing show_regs from being printed.
>>
>> Fixes: 7c6ca03eaed0035ca6676e9bc7f5f1dfcaae7e8f
> 
> nits: wrong fixes tag format
> 
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  arch/riscv/cpu/start.S      | 20 +++++++++++++++++---
>>  arch/riscv/lib/interrupts.c |  3 ++-
>>  2 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>> index ad18e746b6..59d3d7bbf4 100644
>> --- a/arch/riscv/cpu/start.S
>> +++ b/arch/riscv/cpu/start.S
>> @@ -47,6 +47,13 @@ _start:
>>         mv      tp, a0
>>         mv      s1, a1
>>
>> +       /*
>> +        * Set the global data pointer to a known value in case we get a very
>> +        * early trap. The global data pointer will be set its actual value only
>> +        * after it has been initialized.
>> +        */
>> +       mv      gp, zero
>> +
>>         la      t0, trap_entry
>>         csrw    MODE_PREFIX(tvec), t0
>>
>> @@ -85,10 +92,10 @@ call_board_init_f_0:
>>         jal     board_init_f_alloc_reserve
>>
>>         /*
>> -        * Set global data pointer here for all harts, uninitialized at this
>> -        * point.
>> +        * Save global data pointer for later. We don't set it here because it
>> +        * is not initialized yet.
>>          */
>> -       mv      gp, a0
>> +       mv      s0, a0
>>
>>         /* setup stack */
>>  #if CONFIG_IS_ENABLED(SMP)
>> @@ -135,6 +142,13 @@ wait_for_gd_init:
>>         fence   r, rw
>>         bnez    t1, 1b
>>
>> +       /*
>> +        * Set the global data pointer only when gd_t has been initialized.
>> +        * This was already set by arch_setup_gd on the boot hart, but all other
>> +        * harts' global data pointers gets set here.
>> +        */
>> +       mv      gp, s0
> 
> This is currently in the #ifndef CONFIG_XIP block. Should consider the
> #else branch?

gp is set by arch_setup_gd in board_init_f_init_reserve on the boot
hart. Setting it here is only necessary for secondary harts.

> 
>> +
>>         /* register available harts in the available_harts mask */
>>         li      t1, 1
>>         sll     t1, t1, tp
>> diff --git a/arch/riscv/lib/interrupts.c b/arch/riscv/lib/interrupts.c
>> index cd47e64487..ad870e98d8 100644
>> --- a/arch/riscv/lib/interrupts.c
>> +++ b/arch/riscv/lib/interrupts.c
>> @@ -78,7 +78,8 @@ static void _exit_trap(ulong code, ulong epc, ulong tval, struct pt_regs *regs)
>>
>>         printf("EPC: " REG_FMT " RA: " REG_FMT " TVAL: " REG_FMT "\n",
>>                epc, regs->ra, tval);
>> -       if (gd->flags & GD_FLG_RELOC)
>> +       /* Print relocation adjustments, but only if gd is initialized */
>> +       if (gd && gd->flags & GD_FLG_RELOC)
>>                 printf("EPC: " REG_FMT " RA: " REG_FMT " reloc adjusted\n\n",
>>                        epc - gd->reloc_off, regs->ra - gd->reloc_off);
>>
>> --
> 
> Regards,
> Bin
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data
  2020-09-14 13:03     ` Sean Anderson
@ 2020-09-14 13:27       ` Sean Anderson
  0 siblings, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-14 13:27 UTC (permalink / raw)
  To: u-boot

On 9/14/20 9:03 AM, Sean Anderson wrote:
> On 9/14/20 1:25 AM, Bin Meng wrote:
>> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>>
>>> This allows code to use a construct like `if (gd & gd->...) { ... }` when
>>> accessing the global data pointer. Without this change, it was possible for
>>> a very early trap to cause _exit_trap to read arbitrary memory. This could
>>> cause a second trap, preventing show_regs from being printed.
>>>
>>> Fixes: 7c6ca03eaed0035ca6676e9bc7f5f1dfcaae7e8f
>>
>> nits: wrong fixes tag format
>>
>>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>>> ---
>>>
>>>  arch/riscv/cpu/start.S      | 20 +++++++++++++++++---
>>>  arch/riscv/lib/interrupts.c |  3 ++-
>>>  2 files changed, 19 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/riscv/cpu/start.S b/arch/riscv/cpu/start.S
>>> index ad18e746b6..59d3d7bbf4 100644
>>> --- a/arch/riscv/cpu/start.S
>>> +++ b/arch/riscv/cpu/start.S
>>> @@ -47,6 +47,13 @@ _start:
>>>         mv      tp, a0
>>>         mv      s1, a1
>>>
>>> +       /*
>>> +        * Set the global data pointer to a known value in case we get a very
>>> +        * early trap. The global data pointer will be set its actual value only
>>> +        * after it has been initialized.
>>> +        */
>>> +       mv      gp, zero
>>> +
>>>         la      t0, trap_entry
>>>         csrw    MODE_PREFIX(tvec), t0
>>>
>>> @@ -85,10 +92,10 @@ call_board_init_f_0:
>>>         jal     board_init_f_alloc_reserve
>>>
>>>         /*
>>> -        * Set global data pointer here for all harts, uninitialized at this
>>> -        * point.
>>> +        * Save global data pointer for later. We don't set it here because it
>>> +        * is not initialized yet.
>>>          */
>>> -       mv      gp, a0
>>> +       mv      s0, a0
>>>
>>>         /* setup stack */
>>>  #if CONFIG_IS_ENABLED(SMP)
>>> @@ -135,6 +142,13 @@ wait_for_gd_init:
>>>         fence   r, rw
>>>         bnez    t1, 1b
>>>
>>> +       /*
>>> +        * Set the global data pointer only when gd_t has been initialized.
>>> +        * This was already set by arch_setup_gd on the boot hart, but all other
>>> +        * harts' global data pointers gets set here.
>>> +        */
>>> +       mv      gp, s0
>>
>> This is currently in the #ifndef CONFIG_XIP block. Should consider the
>> #else branch?
> 
> gp is set by arch_setup_gd in board_init_f_init_reserve on the boot
> hart. Setting it here is only necessary for secondary harts.

Ok, I see what you mean. I guess I can set gp just before XIP jumps to
secondary_hart_loop. Doing it that way is still vulnerable to pending
IPIs, but not all cores should have that problem. Perhaps that
limitation should be documented somewhere? In any case, I don't have an
XIP core to test on (and there isn't a CI configuration either).

--Sean

>>
>>> +
>>>         /* register available harts in the available_harts mask */
>>>         li      t1, 1
>>>         sll     t1, t1, tp
>>> diff --git a/arch/riscv/lib/interrupts.c b/arch/riscv/lib/interrupts.c
>>> index cd47e64487..ad870e98d8 100644
>>> --- a/arch/riscv/lib/interrupts.c
>>> +++ b/arch/riscv/lib/interrupts.c
>>> @@ -78,7 +78,8 @@ static void _exit_trap(ulong code, ulong epc, ulong tval, struct pt_regs *regs)
>>>
>>>         printf("EPC: " REG_FMT " RA: " REG_FMT " TVAL: " REG_FMT "\n",
>>>                epc, regs->ra, tval);
>>> -       if (gd->flags & GD_FLG_RELOC)
>>> +       /* Print relocation adjustments, but only if gd is initialized */
>>> +       if (gd && gd->flags & GD_FLG_RELOC)
>>>                 printf("EPC: " REG_FMT " RA: " REG_FMT " reloc adjusted\n\n",
>>>                        epc - gd->reloc_off, regs->ra - gd->reloc_off);
>>>
>>> --
>>
>> Regards,
>> Bin
>>
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function
  2020-09-11  8:04   ` Bin Meng
  2020-09-14  1:58     ` Leo Liang
@ 2020-09-14 14:05     ` Sean Anderson
  1 sibling, 0 replies; 45+ messages in thread
From: Sean Anderson @ 2020-09-14 14:05 UTC (permalink / raw)
  To: u-boot


On 9/11/20 4:04 AM, Bin Meng wrote:
> On Tue, Sep 8, 2020 at 2:17 AM Sean Anderson <seanga2@gmail.com> wrote:
>>
>> Some IPIs may already be pending when U-Boot is started. This could be a
>> problem if a secondary hart tries to handle an IPI before the boot hart has
>> initialized the IPI device.
>>
>> This commit uses NULL as a sentinel for secondary harts so they know when
>> the IPI is initialized, and it is safe to use the IPI API. The smp addr
>> parameter is initialized to NULL by board_init_f_init_reserve. Before this,
>> secondary harts wait in wait_for_gd_init.
>>
>> This imposes a minor restriction because harts may no longer jump to NULL.
>> However, given that the RISC-V debug device is likely to be located at
>> 0x400, it is unlikely for any RISC-V implementation to have usable ram
>> located at 0x0.
>>
>> Signed-off-by: Sean Anderson <seanga2@gmail.com>
>> ---
>>
>>  arch/riscv/lib/smp.c | 26 ++++++++++++++++++++++----
>>  1 file changed, 22 insertions(+), 4 deletions(-)
>>
> 
> Reviewed-by: Bin Meng <bin.meng@windriver.com>
> 

Per Leo's comments, I will not include this tag in the next revision, as
it uses a different mechanism to accomplish this behavior.

--Sean

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2020-09-14 14:05 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-07 18:16 [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Sean Anderson
2020-09-07 18:16 ` [PATCH 1/7] Revert "riscv: Clear pending interrupts before enabling IPIs" Sean Anderson
2020-09-09  7:50   ` Rick Chen
2020-09-09 10:23     ` Sean Anderson
2020-09-10  6:39       ` Rick Chen
2020-09-10 10:18         ` Sean Anderson
2020-09-11  7:38   ` Bin Meng
2020-09-11 10:22     ` Sean Anderson
2020-09-11 14:45       ` Bin Meng
2020-09-11 18:30         ` Sean Anderson
2020-09-14  3:10           ` Rick Chen
2020-09-14 12:45             ` Sean Anderson
2020-09-07 18:16 ` [PATCH 2/7] riscv: Match memory barriers between send_ipi_many and handle_ipi Sean Anderson
2020-09-11  7:45   ` Bin Meng
2020-09-07 18:16 ` [PATCH 3/7] riscv: Use NULL as a sentinel value for smp_call_function Sean Anderson
2020-09-09  8:33   ` Rick Chen
2020-09-09  9:01     ` Rick Chen
2020-09-09 10:16       ` Sean Anderson
2020-09-09 10:26         ` Heinrich Schuchardt
2020-09-09 10:36           ` Sean Anderson
2020-09-10  8:09         ` Rick Chen
2020-09-14  3:21         ` Rick Chen
2020-09-11  8:04   ` Bin Meng
2020-09-14  1:58     ` Leo Liang
2020-09-14  2:07       ` Bin Meng
2020-09-14  6:10         ` Leo Liang
2020-09-14  6:15           ` Bin Meng
2020-09-14 14:05     ` Sean Anderson
2020-09-07 18:16 ` [PATCH 4/7] riscv: Clear pending IPIs on initialization Sean Anderson
2020-09-14  2:08   ` Bin Meng
2020-09-07 18:16 ` [PATCH 5/7] riscv: Add fence to available_harts_lock Sean Anderson
2020-09-10  3:26   ` Rick Chen
2020-09-11 10:39     ` Sean Anderson
2020-09-11 14:47   ` Bin Meng
2020-09-07 18:16 ` [PATCH 6/7] riscv: Ensure gp is NULL or points to valid data Sean Anderson
2020-09-14  5:25   ` Bin Meng
2020-09-14 13:03     ` Sean Anderson
2020-09-14 13:27       ` Sean Anderson
2020-09-07 18:16 ` [PATCH 7/7] riscv: Add some comments to start.S Sean Anderson
2020-09-14  5:26   ` Bin Meng
2020-09-09  2:02 ` [PATCH 0/7] riscv: Correctly handle IPIs already pending upon boot Rick Chen
2020-09-09  2:38   ` Sean Anderson
2020-09-09  2:44     ` Sean Anderson
2020-09-10  7:08     ` Rick Chen
2020-09-10 10:49       ` Sean Anderson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.