All of lore.kernel.org
 help / color / mirror / Atom feed
* [kvm-unit-tests PATCH v5 0/2] arm: Add PSCI CPU_OFF test
@ 2023-01-27 17:59 Alexandru Elisei
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online Alexandru Elisei
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case Alexandru Elisei
  0 siblings, 2 replies; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-27 17:59 UTC (permalink / raw)
  To: andrew.jones, kvm, kvmarm

The series adds a test for the PSCI function CPU_OFF. The test is
implemented in patch #2, "arm/psci: Add PSCI CPU_OFF test case".  Patch #1,
"arm/psci: Test that CPU 1 has been successfully brought online" is a
preparatory patch to allow the CPU_OFF test to run after the CPU_ON test.
Executing the CPU_OFF test after the CPU_ON test makes the most sense,
since CPU_OFF requires all the CPUs to be brought online before the test
can execute, and this is exactly what the CPU_ON test does.

I believe that proving that a CPU has been successfully offlined is an
undecidable problem - it might just be that the CPU is stuck at a higher
exception level doing something else entirely, and if the test were to wait
long enough, the CPU would return from the CPU_OFF call and start executing
code, thus failing to be offlined. Right now, the test waits for 100ms
after checking that all the other CPUs are offline. I thought this was a
good balance between making the test fast and being reasonably sure that
the offline succeeded. I'm open to suggestions here if anyone thinks
otherwise.

Tested for both the arm and arm64 architectures: on an x86 machine (qemu
with TCG), and a rockpro64 (qemu and kvmtool).

And Nikita is not longer at Arm, and I don't have a new email address where
she can be reached, so I didn't CC her.

Changelog:

v4 -> v5:
- Open code smp_prepare_secondary in psci.c instead of exposing it as a
  library function in the cpu_on test (Drew).
- Don't return early if a CPU failed to online CPU1 in the cpu_on test, and
  check the return values for the CPUs that succeeded to online CPU 1
  (Drew).
- Drop cpu_off_success in the cpu_off test in favour of checking
  AFFINITY_INFO and the cpu idle mask (Drew).
- Hopefully made the cpu_off test faster (Drew).

v3 -> v4:
- Moved ownership of the series to Alexandru Elisei. All bugs are mine.
- Moved the timeout for the CPU_ON test to patch #1.
- Changed the include order for arm/psic.c in patch #1 to order the headers
  alphabetically.
- Run the CPU_OFF test only if CPU_ON succeeds, because CPU_OFF expects all
  CPUs to be online, the test would otherwise hang forever.
- Minor style changes here and there.

v2 -> v3:
- Add timeout so that test does not hang if CPU1 fails to come online
- Remove unnecessary call of on_cpus() in the condition where target CPU is not online.

v1 -> v2:
- Modify PSCI CPU_ON test to ensure CPU 1 remains online after the execution of the test.
- Addition of PSCI CPU_OFF test and calling it after PSCI CPU_ON test has been executed.

Alexandru Elisei (1):
  arm/psci: Test that CPU 1 has been successfully brought online

Nikita Venkatesh (1):
  arm/psci: Add PSCI CPU_OFF test case

 arm/psci.c        | 132 +++++++++++++++++++++++++++++++++++++++-------
 lib/arm/asm/smp.h |   1 +
 lib/arm/smp.c     |  12 +++--
 lib/errata.h      |   2 +
 4 files changed, 125 insertions(+), 22 deletions(-)


base-commit: 2480430a36102f8ea276b3bfb1d64d5dacc23b8f
-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online
  2023-01-27 17:59 [kvm-unit-tests PATCH v5 0/2] arm: Add PSCI CPU_OFF test Alexandru Elisei
@ 2023-01-27 17:59 ` Alexandru Elisei
  2023-01-31  6:43   ` Andrew Jones
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case Alexandru Elisei
  1 sibling, 1 reply; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-27 17:59 UTC (permalink / raw)
  To: andrew.jones, kvm, kvmarm

For the PSCI CPU_ON function test, all other CPUs perform a CPU_ON call
that target CPU 1. The test is considered a success if CPU_ON returns PSCI
SUCCESS exactly once, and for the rest of the calls PSCI ALREADY_ON.

Enhance the test by checking that CPU 1 is actually online and able to
execute code. Also make the test more robust by checking that the CPU_ON
call returns, instead of assuming that it will always succeed and
hanging indefinitely if it doesn't.

Since the CPU 1 thread is now being set up properly by kvm-unit-tests
when being brought online, it becomes possible to add other tests in the
future that require all CPUs.

The include header order in arm/psci.c has been changed to be in
alphabetic order. This means moving the errata.h include before
libcflat.h, which causes compilation to fail because of missing includes
in errata.h. Fix that also by including the needed header in errata.h.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arm/psci.c        | 67 ++++++++++++++++++++++++++++++++++++++---------
 lib/arm/asm/smp.h |  9 ++++++-
 lib/arm/smp.c     |  4 ---
 lib/errata.h      |  2 ++
 4 files changed, 64 insertions(+), 18 deletions(-)

diff --git a/arm/psci.c b/arm/psci.c
index efa0722c0566..f7238f8e0bbd 100644
--- a/arm/psci.c
+++ b/arm/psci.c
@@ -7,11 +7,14 @@
  *
  * This work is licensed under the terms of the GNU LGPL, version 2.
  */
-#include <libcflat.h>
 #include <errata.h>
+#include <libcflat.h>
+
+#include <asm/delay.h>
+#include <asm/mmu.h>
 #include <asm/processor.h>
-#include <asm/smp.h>
 #include <asm/psci.h>
+#include <asm/smp.h>
 
 static bool invalid_function_exception;
 
@@ -72,46 +75,84 @@ static int cpu_on_ret[NR_CPUS];
 static cpumask_t cpu_on_ready, cpu_on_done;
 static volatile int cpu_on_start;
 
-static void cpu_on_secondary_entry(void)
+extern void secondary_entry(void);
+static void cpu_on_do_wake_target(void)
 {
 	int cpu = smp_processor_id();
 
 	cpumask_set_cpu(cpu, &cpu_on_ready);
 	while (!cpu_on_start)
 		cpu_relax();
-	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(halt));
+	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(secondary_entry));
 	cpumask_set_cpu(cpu, &cpu_on_done);
 }
 
+static void cpu_on_target(void)
+{
+	int cpu = smp_processor_id();
+
+	cpumask_set_cpu(cpu, &cpu_on_done);
+}
+
+extern struct secondary_data secondary_data;
+
+/* Open code the setup part from smp_boot_secondary(). */
+static void psci_cpu_on_prepare_secondary(int cpu, secondary_entry_fn entry)
+{
+	secondary_data.stack = thread_stack_alloc();
+	secondary_data.entry = entry;
+	mmu_mark_disabled(cpu);
+}
+
 static bool psci_cpu_on_test(void)
 {
 	bool failed = false;
 	int ret_success = 0;
-	int cpu;
-
-	cpumask_set_cpu(1, &cpu_on_ready);
-	cpumask_set_cpu(1, &cpu_on_done);
+	int i, cpu;
 
 	for_each_present_cpu(cpu) {
 		if (cpu < 2)
 			continue;
-		smp_boot_secondary(cpu, cpu_on_secondary_entry);
+		smp_boot_secondary(cpu, cpu_on_do_wake_target);
 	}
 
 	cpumask_set_cpu(0, &cpu_on_ready);
+	cpumask_set_cpu(1, &cpu_on_ready);
 	while (!cpumask_full(&cpu_on_ready))
 		cpu_relax();
 
+	/*
+	 * Configure CPU 1 after all secondaries are online to avoid
+	 * secondary_data being overwritten.
+	 */
+	psci_cpu_on_prepare_secondary(1, cpu_on_target);
+
 	cpu_on_start = 1;
 	smp_mb();
 
-	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(halt));
+	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(secondary_entry));
 	cpumask_set_cpu(0, &cpu_on_done);
 
-	while (!cpumask_full(&cpu_on_done))
-		cpu_relax();
+	report_info("waiting for CPU1 to come online...");
+	for (i = 0; i < 100; i++) {
+		mdelay(10);
+		if (cpumask_full(&cpu_on_done))
+			break;
+	}
 
-	for_each_present_cpu(cpu) {
+	if (!cpumask_full(&cpu_on_done)) {
+		for_each_present_cpu(cpu) {
+			if (!cpumask_test_cpu(cpu, &cpu_on_done)) {
+				if (cpu == 1)
+					report_info("CPU1 failed to come online");
+				else
+					report_info("CPU%d failed to online CPU1", cpu);
+			}
+		}
+		failed = true;
+	}
+
+	for_each_cpu(cpu, &cpu_on_done) {
 		if (cpu == 1)
 			continue;
 		if (cpu_on_ret[cpu] == PSCI_RET_SUCCESS) {
diff --git a/lib/arm/asm/smp.h b/lib/arm/asm/smp.h
index 077afde85520..dee4c1a883e7 100644
--- a/lib/arm/asm/smp.h
+++ b/lib/arm/asm/smp.h
@@ -10,6 +10,14 @@
 
 #define smp_processor_id()		(current_thread_info()->cpu)
 
+typedef void (*secondary_entry_fn)(void);
+
+struct secondary_data {
+	void *stack;            /* must be first member of struct */
+	secondary_entry_fn entry;
+};
+extern struct secondary_data secondary_data;
+
 extern bool cpu0_calls_idle;
 
 extern void halt(void);
@@ -48,7 +56,6 @@ static inline void set_cpu_idle(int cpu, bool idle)
 		cpumask_clear_cpu(cpu, &cpu_idle_mask);
 }
 
-typedef void (*secondary_entry_fn)(void);
 extern void smp_boot_secondary(int cpu, secondary_entry_fn entry);
 extern void on_cpu_async(int cpu, void (*func)(void *data), void *data);
 extern void on_cpu(int cpu, void (*func)(void *data), void *data);
diff --git a/lib/arm/smp.c b/lib/arm/smp.c
index 98a5054e039b..1d470d1aab45 100644
--- a/lib/arm/smp.c
+++ b/lib/arm/smp.c
@@ -21,10 +21,6 @@ cpumask_t cpu_present_mask;
 cpumask_t cpu_online_mask;
 cpumask_t cpu_idle_mask;
 
-struct secondary_data {
-	void *stack;            /* must be first member of struct */
-	secondary_entry_fn entry;
-};
 struct secondary_data secondary_data;
 static struct spinlock lock;
 
diff --git a/lib/errata.h b/lib/errata.h
index 5af0eb3bf8e2..de8205d8b370 100644
--- a/lib/errata.h
+++ b/lib/errata.h
@@ -6,6 +6,8 @@
  */
 #ifndef _ERRATA_H_
 #define _ERRATA_H_
+#include <libcflat.h>
+
 #include "config.h"
 
 #ifndef CONFIG_ERRATA_FORCE
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-27 17:59 [kvm-unit-tests PATCH v5 0/2] arm: Add PSCI CPU_OFF test Alexandru Elisei
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online Alexandru Elisei
@ 2023-01-27 17:59 ` Alexandru Elisei
  2023-01-31  6:56   ` Andrew Jones
  1 sibling, 1 reply; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-27 17:59 UTC (permalink / raw)
  To: andrew.jones, kvm, kvmarm

From: Nikita Venkatesh <Nikita.Venkatesh@arm.com>

The test uses the following method.

The primary CPU brings up all the secondary CPUs, which are held in a wait
loop. Once the primary releases the CPUs, each of the secondary CPUs
proceed to issue CPU_OFF.

The primary CPU then checks for the status of the individual CPU_OFF
request. There is a chance that some CPUs might return from the CPU_OFF
function call after the primary CPU has finished the scan. There is no
foolproof method to handle this, but the test tries its best to
eliminate these false positives by introducing an extra delay if all the
CPUs are reported offline after the initial scan.

Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
[ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in
	  favour of checking AFFINITY_INFO, commit message tweaking ]
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---

Decided to drop Drew's Reviewed-by tag because the changes are not trivial
from the previous version.

 arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 75 insertions(+), 5 deletions(-)

diff --git a/arm/psci.c b/arm/psci.c
index f7238f8e0bbd..7034d8ebe6e1 100644
--- a/arm/psci.c
+++ b/arm/psci.c
@@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void)
 }
 
 static int cpu_on_ret[NR_CPUS];
-static cpumask_t cpu_on_ready, cpu_on_done;
+static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done;
 static volatile int cpu_on_start;
+static volatile int cpu_off_start;
 
 extern void secondary_entry(void);
 static void cpu_on_do_wake_target(void)
@@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void)
 	return !failed;
 }
 
-int main(void)
+static void cpu_off_secondary_entry(void *data)
+{
+	int cpu = smp_processor_id();
+
+	while (!cpu_off_start)
+		cpu_relax();
+	cpumask_set_cpu(cpu, &cpu_off_done);
+	cpu_psci_cpu_die();
+}
+
+static bool psci_cpu_off_test(void)
+{
+	bool failed = false;
+	int i, count, cpu;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+		on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
+	}
+
+	cpumask_set_cpu(0, &cpu_off_done);
+
+	cpu_off_start = 1;
+	report_info("waiting for the CPUs to be offlined...");
+	while (!cpumask_full(&cpu_off_done))
+		cpu_relax();
+
+	/* Allow all the other CPUs to complete the operation */
+	for (i = 0; i < 100; i++) {
+		mdelay(10);
+
+		count = 0;
+		for_each_present_cpu(cpu) {
+			if (cpu == 0)
+				continue;
+			if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF)
+				count++;
+		}
+		if (count > 0)
+			continue;
+	}
+
+	/* Try to catch CPUs that return from CPU_OFF. */
+	if (count == 0)
+		mdelay(100);
+
+	for_each_present_cpu(cpu) {
+		if (cpu == 0)
+			continue;
+		if (cpu_idle(cpu)) {
+			report_info("CPU%d failed to be offlined", cpu);
+			if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF)
+				report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu);
+			failed = true;
+		}
+	}
+
+	return !failed;
+}
+
+int main(int argc, char **argv)
 {
 	int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
+	bool cpu_on_success = true;
 
 	report_prefix_push("psci");
 
@@ -188,10 +251,17 @@ int main(void)
 	report(psci_affinity_info_on(), "affinity-info-on");
 	report(psci_affinity_info_off(), "affinity-info-off");
 
-	if (ERRATA(6c7a5dce22b3))
-		report(psci_cpu_on_test(), "cpu-on");
-	else
+	if (ERRATA(6c7a5dce22b3)) {
+		cpu_on_success = psci_cpu_on_test();
+		report(cpu_on_success, "cpu-on");
+	} else {
 		report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
+	}
+
+	if (!cpu_on_success)
+		report_skip("Skipping cpu-off test because the cpu-on test failed");
+	else
+		report(psci_cpu_off_test(), "cpu-off");
 
 done:
 #if 0
-- 
2.39.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online Alexandru Elisei
@ 2023-01-31  6:43   ` Andrew Jones
  0 siblings, 0 replies; 10+ messages in thread
From: Andrew Jones @ 2023-01-31  6:43 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, kvmarm

On Fri, Jan 27, 2023 at 05:59:15PM +0000, Alexandru Elisei wrote:
> For the PSCI CPU_ON function test, all other CPUs perform a CPU_ON call
> that target CPU 1. The test is considered a success if CPU_ON returns PSCI
> SUCCESS exactly once, and for the rest of the calls PSCI ALREADY_ON.
> 
> Enhance the test by checking that CPU 1 is actually online and able to
> execute code. Also make the test more robust by checking that the CPU_ON
> call returns, instead of assuming that it will always succeed and
> hanging indefinitely if it doesn't.
> 
> Since the CPU 1 thread is now being set up properly by kvm-unit-tests
> when being brought online, it becomes possible to add other tests in the
> future that require all CPUs.
> 
> The include header order in arm/psci.c has been changed to be in
> alphabetic order. This means moving the errata.h include before
> libcflat.h, which causes compilation to fail because of missing includes
> in errata.h. Fix that also by including the needed header in errata.h.
> 
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
>  arm/psci.c        | 67 ++++++++++++++++++++++++++++++++++++++---------
>  lib/arm/asm/smp.h |  9 ++++++-
>  lib/arm/smp.c     |  4 ---
>  lib/errata.h      |  2 ++
>  4 files changed, 64 insertions(+), 18 deletions(-)
> 
> diff --git a/arm/psci.c b/arm/psci.c
> index efa0722c0566..f7238f8e0bbd 100644
> --- a/arm/psci.c
> +++ b/arm/psci.c
> @@ -7,11 +7,14 @@
>   *
>   * This work is licensed under the terms of the GNU LGPL, version 2.
>   */
> -#include <libcflat.h>
>  #include <errata.h>
> +#include <libcflat.h>
> +
> +#include <asm/delay.h>
> +#include <asm/mmu.h>
>  #include <asm/processor.h>
> -#include <asm/smp.h>
>  #include <asm/psci.h>
> +#include <asm/smp.h>
>  
>  static bool invalid_function_exception;
>  
> @@ -72,46 +75,84 @@ static int cpu_on_ret[NR_CPUS];
>  static cpumask_t cpu_on_ready, cpu_on_done;
>  static volatile int cpu_on_start;
>  
> -static void cpu_on_secondary_entry(void)
> +extern void secondary_entry(void);
> +static void cpu_on_do_wake_target(void)
>  {
>  	int cpu = smp_processor_id();
>  
>  	cpumask_set_cpu(cpu, &cpu_on_ready);
>  	while (!cpu_on_start)
>  		cpu_relax();
> -	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(halt));
> +	cpu_on_ret[cpu] = psci_cpu_on(cpus[1], __pa(secondary_entry));
>  	cpumask_set_cpu(cpu, &cpu_on_done);
>  }
>  
> +static void cpu_on_target(void)
> +{
> +	int cpu = smp_processor_id();
> +
> +	cpumask_set_cpu(cpu, &cpu_on_done);
> +}
> +
> +extern struct secondary_data secondary_data;
> +
> +/* Open code the setup part from smp_boot_secondary(). */
> +static void psci_cpu_on_prepare_secondary(int cpu, secondary_entry_fn entry)
> +{
> +	secondary_data.stack = thread_stack_alloc();
> +	secondary_data.entry = entry;
> +	mmu_mark_disabled(cpu);
> +}
> +
>  static bool psci_cpu_on_test(void)
>  {
>  	bool failed = false;
>  	int ret_success = 0;
> -	int cpu;
> -
> -	cpumask_set_cpu(1, &cpu_on_ready);
> -	cpumask_set_cpu(1, &cpu_on_done);
> +	int i, cpu;
>  
>  	for_each_present_cpu(cpu) {
>  		if (cpu < 2)
>  			continue;
> -		smp_boot_secondary(cpu, cpu_on_secondary_entry);
> +		smp_boot_secondary(cpu, cpu_on_do_wake_target);
>  	}
>  
>  	cpumask_set_cpu(0, &cpu_on_ready);
> +	cpumask_set_cpu(1, &cpu_on_ready);
>  	while (!cpumask_full(&cpu_on_ready))
>  		cpu_relax();
>  
> +	/*
> +	 * Configure CPU 1 after all secondaries are online to avoid
> +	 * secondary_data being overwritten.
> +	 */
> +	psci_cpu_on_prepare_secondary(1, cpu_on_target);
> +
>  	cpu_on_start = 1;
>  	smp_mb();
>  
> -	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(halt));
> +	cpu_on_ret[0] = psci_cpu_on(cpus[1], __pa(secondary_entry));
>  	cpumask_set_cpu(0, &cpu_on_done);
>  
> -	while (!cpumask_full(&cpu_on_done))
> -		cpu_relax();
> +	report_info("waiting for CPU1 to come online...");
> +	for (i = 0; i < 100; i++) {
> +		mdelay(10);
> +		if (cpumask_full(&cpu_on_done))
> +			break;
> +	}
>  
> -	for_each_present_cpu(cpu) {
> +	if (!cpumask_full(&cpu_on_done)) {
> +		for_each_present_cpu(cpu) {
> +			if (!cpumask_test_cpu(cpu, &cpu_on_done)) {
> +				if (cpu == 1)
> +					report_info("CPU1 failed to come online");
> +				else
> +					report_info("CPU%d failed to online CPU1", cpu);
> +			}
> +		}
> +		failed = true;
> +	}
> +
> +	for_each_cpu(cpu, &cpu_on_done) {
>  		if (cpu == 1)
>  			continue;
>  		if (cpu_on_ret[cpu] == PSCI_RET_SUCCESS) {
> diff --git a/lib/arm/asm/smp.h b/lib/arm/asm/smp.h
> index 077afde85520..dee4c1a883e7 100644
> --- a/lib/arm/asm/smp.h
> +++ b/lib/arm/asm/smp.h
> @@ -10,6 +10,14 @@
>  
>  #define smp_processor_id()		(current_thread_info()->cpu)
>  
> +typedef void (*secondary_entry_fn)(void);
> +
> +struct secondary_data {
> +	void *stack;            /* must be first member of struct */
> +	secondary_entry_fn entry;
> +};
> +extern struct secondary_data secondary_data;
> +
>  extern bool cpu0_calls_idle;
>  
>  extern void halt(void);
> @@ -48,7 +56,6 @@ static inline void set_cpu_idle(int cpu, bool idle)
>  		cpumask_clear_cpu(cpu, &cpu_idle_mask);
>  }
>  
> -typedef void (*secondary_entry_fn)(void);
>  extern void smp_boot_secondary(int cpu, secondary_entry_fn entry);
>  extern void on_cpu_async(int cpu, void (*func)(void *data), void *data);
>  extern void on_cpu(int cpu, void (*func)(void *data), void *data);
> diff --git a/lib/arm/smp.c b/lib/arm/smp.c
> index 98a5054e039b..1d470d1aab45 100644
> --- a/lib/arm/smp.c
> +++ b/lib/arm/smp.c
> @@ -21,10 +21,6 @@ cpumask_t cpu_present_mask;
>  cpumask_t cpu_online_mask;
>  cpumask_t cpu_idle_mask;
>  
> -struct secondary_data {
> -	void *stack;            /* must be first member of struct */
> -	secondary_entry_fn entry;
> -};
>  struct secondary_data secondary_data;
>  static struct spinlock lock;
>  
> diff --git a/lib/errata.h b/lib/errata.h
> index 5af0eb3bf8e2..de8205d8b370 100644
> --- a/lib/errata.h
> +++ b/lib/errata.h
> @@ -6,6 +6,8 @@
>   */
>  #ifndef _ERRATA_H_
>  #define _ERRATA_H_
> +#include <libcflat.h>
> +
>  #include "config.h"
>  
>  #ifndef CONFIG_ERRATA_FORCE
> -- 
> 2.39.0
>

Reviewed-by: Andrew Jones <andrew.jones@linux.dev>

Thanks,
drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case Alexandru Elisei
@ 2023-01-31  6:56   ` Andrew Jones
  2023-01-31  9:52     ` Alexandru Elisei
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Jones @ 2023-01-31  6:56 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, kvmarm

On Fri, Jan 27, 2023 at 05:59:16PM +0000, Alexandru Elisei wrote:
> From: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> 
> The test uses the following method.
> 
> The primary CPU brings up all the secondary CPUs, which are held in a wait
> loop. Once the primary releases the CPUs, each of the secondary CPUs
> proceed to issue CPU_OFF.
> 
> The primary CPU then checks for the status of the individual CPU_OFF
> request. There is a chance that some CPUs might return from the CPU_OFF
> function call after the primary CPU has finished the scan. There is no
> foolproof method to handle this, but the test tries its best to
> eliminate these false positives by introducing an extra delay if all the
> CPUs are reported offline after the initial scan.
> 
> Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> [ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in
> 	  favour of checking AFFINITY_INFO, commit message tweaking ]
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> 
> Decided to drop Drew's Reviewed-by tag because the changes are not trivial
> from the previous version.
> 
>  arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 75 insertions(+), 5 deletions(-)
> 
> diff --git a/arm/psci.c b/arm/psci.c
> index f7238f8e0bbd..7034d8ebe6e1 100644
> --- a/arm/psci.c
> +++ b/arm/psci.c
> @@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void)
>  }
>  
>  static int cpu_on_ret[NR_CPUS];
> -static cpumask_t cpu_on_ready, cpu_on_done;
> +static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done;
>  static volatile int cpu_on_start;
> +static volatile int cpu_off_start;
>  
>  extern void secondary_entry(void);
>  static void cpu_on_do_wake_target(void)
> @@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void)
>  	return !failed;
>  }
>  
> -int main(void)
> +static void cpu_off_secondary_entry(void *data)
> +{
> +	int cpu = smp_processor_id();
> +
> +	while (!cpu_off_start)
> +		cpu_relax();
> +	cpumask_set_cpu(cpu, &cpu_off_done);
> +	cpu_psci_cpu_die();
> +}
> +
> +static bool psci_cpu_off_test(void)
> +{
> +	bool failed = false;
> +	int i, count, cpu;
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == 0)
> +			continue;
> +		on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
> +	}
> +
> +	cpumask_set_cpu(0, &cpu_off_done);
> +
> +	cpu_off_start = 1;
> +	report_info("waiting for the CPUs to be offlined...");
> +	while (!cpumask_full(&cpu_off_done))
> +		cpu_relax();
> +
> +	/* Allow all the other CPUs to complete the operation */
> +	for (i = 0; i < 100; i++) {
> +		mdelay(10);
> +
> +		count = 0;
> +		for_each_present_cpu(cpu) {
> +			if (cpu == 0)
> +				continue;
> +			if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF)
> +				count++;
> +		}
> +		if (count > 0)
> +			continue;

This should be

if (count == 0)
   break;

otherwise we never leave the loop early.

> +	}
> +
> +	/* Try to catch CPUs that return from CPU_OFF. */
> +	if (count == 0)
> +		mdelay(100);
> +
> +	for_each_present_cpu(cpu) {
> +		if (cpu == 0)
> +			continue;
> +		if (cpu_idle(cpu)) {
> +			report_info("CPU%d failed to be offlined", cpu);
> +			if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF)
> +				report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu);
> +			failed = true;
> +		}
> +	}
> +
> +	return !failed;
> +}
> +
> +int main(int argc, char **argv)
>  {
>  	int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> +	bool cpu_on_success = true;
>  
>  	report_prefix_push("psci");
>  
> @@ -188,10 +251,17 @@ int main(void)
>  	report(psci_affinity_info_on(), "affinity-info-on");
>  	report(psci_affinity_info_off(), "affinity-info-off");
>  
> -	if (ERRATA(6c7a5dce22b3))
> -		report(psci_cpu_on_test(), "cpu-on");
> -	else
> +	if (ERRATA(6c7a5dce22b3)) {
> +		cpu_on_success = psci_cpu_on_test();
> +		report(cpu_on_success, "cpu-on");
> +	} else {
>  		report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> +	}
> +
> +	if (!cpu_on_success)
> +		report_skip("Skipping cpu-off test because the cpu-on test failed");

We should output "was skipped" when the cpu-on test was skipped, rather
than always reporting "failed". We need two booleans, try_cpu_on_test and
cpu_on_success.

> +	else
> +		report(psci_cpu_off_test(), "cpu-off");
>  
>  done:
>  #if 0
> -- 
> 2.39.0
> 

Thanks,
drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-31  6:56   ` Andrew Jones
@ 2023-01-31  9:52     ` Alexandru Elisei
  2023-01-31 10:46       ` Andrew Jones
  0 siblings, 1 reply; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-31  9:52 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, kvmarm

Hi Drew,

On Tue, Jan 31, 2023 at 07:56:23AM +0100, Andrew Jones wrote:
> On Fri, Jan 27, 2023 at 05:59:16PM +0000, Alexandru Elisei wrote:
> > From: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > 
> > The test uses the following method.
> > 
> > The primary CPU brings up all the secondary CPUs, which are held in a wait
> > loop. Once the primary releases the CPUs, each of the secondary CPUs
> > proceed to issue CPU_OFF.
> > 
> > The primary CPU then checks for the status of the individual CPU_OFF
> > request. There is a chance that some CPUs might return from the CPU_OFF
> > function call after the primary CPU has finished the scan. There is no
> > foolproof method to handle this, but the test tries its best to
> > eliminate these false positives by introducing an extra delay if all the
> > CPUs are reported offline after the initial scan.
> > 
> > Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > [ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in
> > 	  favour of checking AFFINITY_INFO, commit message tweaking ]
> > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > ---
> > 
> > Decided to drop Drew's Reviewed-by tag because the changes are not trivial
> > from the previous version.
> > 
> >  arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 75 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arm/psci.c b/arm/psci.c
> > index f7238f8e0bbd..7034d8ebe6e1 100644
> > --- a/arm/psci.c
> > +++ b/arm/psci.c
> > @@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void)
> >  }
> >  
> >  static int cpu_on_ret[NR_CPUS];
> > -static cpumask_t cpu_on_ready, cpu_on_done;
> > +static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done;
> >  static volatile int cpu_on_start;
> > +static volatile int cpu_off_start;
> >  
> >  extern void secondary_entry(void);
> >  static void cpu_on_do_wake_target(void)
> > @@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void)
> >  	return !failed;
> >  }
> >  
> > -int main(void)
> > +static void cpu_off_secondary_entry(void *data)
> > +{
> > +	int cpu = smp_processor_id();
> > +
> > +	while (!cpu_off_start)
> > +		cpu_relax();
> > +	cpumask_set_cpu(cpu, &cpu_off_done);
> > +	cpu_psci_cpu_die();
> > +}
> > +
> > +static bool psci_cpu_off_test(void)
> > +{
> > +	bool failed = false;
> > +	int i, count, cpu;
> > +
> > +	for_each_present_cpu(cpu) {
> > +		if (cpu == 0)
> > +			continue;
> > +		on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
> > +	}
> > +
> > +	cpumask_set_cpu(0, &cpu_off_done);
> > +
> > +	cpu_off_start = 1;
> > +	report_info("waiting for the CPUs to be offlined...");
> > +	while (!cpumask_full(&cpu_off_done))
> > +		cpu_relax();
> > +
> > +	/* Allow all the other CPUs to complete the operation */
> > +	for (i = 0; i < 100; i++) {
> > +		mdelay(10);
> > +
> > +		count = 0;
> > +		for_each_present_cpu(cpu) {
> > +			if (cpu == 0)
> > +				continue;
> > +			if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF)
> > +				count++;
> > +		}
> > +		if (count > 0)
> > +			continue;
> 
> This should be
> 
> if (count == 0)
>    break;
> 
> otherwise we never leave the loop early.

Duh, don't know what I was thinking. Thanks for noticing it.

> 
> > +	}
> > +
> > +	/* Try to catch CPUs that return from CPU_OFF. */
> > +	if (count == 0)
> > +		mdelay(100);
> > +
> > +	for_each_present_cpu(cpu) {
> > +		if (cpu == 0)
> > +			continue;
> > +		if (cpu_idle(cpu)) {
> > +			report_info("CPU%d failed to be offlined", cpu);
> > +			if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF)
> > +				report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu);
> > +			failed = true;
> > +		}
> > +	}
> > +
> > +	return !failed;
> > +}
> > +
> > +int main(int argc, char **argv)
> >  {
> >  	int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> > +	bool cpu_on_success = true;
> >  
> >  	report_prefix_push("psci");
> >  
> > @@ -188,10 +251,17 @@ int main(void)
> >  	report(psci_affinity_info_on(), "affinity-info-on");
> >  	report(psci_affinity_info_off(), "affinity-info-off");
> >  
> > -	if (ERRATA(6c7a5dce22b3))
> > -		report(psci_cpu_on_test(), "cpu-on");
> > -	else
> > +	if (ERRATA(6c7a5dce22b3)) {
> > +		cpu_on_success = psci_cpu_on_test();
> > +		report(cpu_on_success, "cpu-on");
> > +	} else {
> >  		report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> > +	}
> > +
> > +	if (!cpu_on_success)
> > +		report_skip("Skipping cpu-off test because the cpu-on test failed");
> 
> We should output "was skipped" when the cpu-on test was skipped, rather
> than always reporting "failed". We need two booleans, try_cpu_on_test and
> cpu_on_success.

This is not about cpu-on being a precondition for cpu-off. cpu-off makes
only one assumption, and that is that all secondaries can be onlined
successfully. Even if cpu-on is never run, cpu-off calls on_cpu_async,
which will online a secondary. This is safe even if the errata is not
present, because the errata is about concurrent CPU_ON calls for the same
VCPU, not for different VCPUs.

The cpu-off test is skipped here because it can hang indefinitely if
onlining CPU1 was not successful:

[..]
        for_each_present_cpu(cpu) {
                if (cpu == 0)
                        continue;
                on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
        }

        cpumask_set_cpu(0, &cpu_off_done);

        cpu_off_start = 1;
        report_info("waiting for the CPUs to be offlined...");
        while (!cpumask_full(&cpu_off_done))	// infinite loop if CPU1
                cpu_relax();			// cannot be onlined.

Does that make sense? Should I add a comment to make it clear why cpu-off
is skipped when cpu-on fails?

Thanks,
Alex

> 
> > +	else
> > +		report(psci_cpu_off_test(), "cpu-off");
> >  
> >  done:
> >  #if 0
> > -- 
> > 2.39.0
> > 
> 
> Thanks,
> drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-31  9:52     ` Alexandru Elisei
@ 2023-01-31 10:46       ` Andrew Jones
  2023-01-31 11:16         ` Alexandru Elisei
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Jones @ 2023-01-31 10:46 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, kvmarm

On Tue, Jan 31, 2023 at 09:52:56AM +0000, Alexandru Elisei wrote:
> Hi Drew,
> 
> On Tue, Jan 31, 2023 at 07:56:23AM +0100, Andrew Jones wrote:
> > On Fri, Jan 27, 2023 at 05:59:16PM +0000, Alexandru Elisei wrote:
> > > From: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > > 
> > > The test uses the following method.
> > > 
> > > The primary CPU brings up all the secondary CPUs, which are held in a wait
> > > loop. Once the primary releases the CPUs, each of the secondary CPUs
> > > proceed to issue CPU_OFF.
> > > 
> > > The primary CPU then checks for the status of the individual CPU_OFF
> > > request. There is a chance that some CPUs might return from the CPU_OFF
> > > function call after the primary CPU has finished the scan. There is no
> > > foolproof method to handle this, but the test tries its best to
> > > eliminate these false positives by introducing an extra delay if all the
> > > CPUs are reported offline after the initial scan.
> > > 
> > > Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > > [ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in
> > > 	  favour of checking AFFINITY_INFO, commit message tweaking ]
> > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > ---
> > > 
> > > Decided to drop Drew's Reviewed-by tag because the changes are not trivial
> > > from the previous version.
> > > 
> > >  arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> > >  1 file changed, 75 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/arm/psci.c b/arm/psci.c
> > > index f7238f8e0bbd..7034d8ebe6e1 100644
> > > --- a/arm/psci.c
> > > +++ b/arm/psci.c
> > > @@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void)
> > >  }
> > >  
> > >  static int cpu_on_ret[NR_CPUS];
> > > -static cpumask_t cpu_on_ready, cpu_on_done;
> > > +static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done;
> > >  static volatile int cpu_on_start;
> > > +static volatile int cpu_off_start;
> > >  
> > >  extern void secondary_entry(void);
> > >  static void cpu_on_do_wake_target(void)
> > > @@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void)
> > >  	return !failed;
> > >  }
> > >  
> > > -int main(void)
> > > +static void cpu_off_secondary_entry(void *data)
> > > +{
> > > +	int cpu = smp_processor_id();
> > > +
> > > +	while (!cpu_off_start)
> > > +		cpu_relax();
> > > +	cpumask_set_cpu(cpu, &cpu_off_done);
> > > +	cpu_psci_cpu_die();
> > > +}
> > > +
> > > +static bool psci_cpu_off_test(void)
> > > +{
> > > +	bool failed = false;
> > > +	int i, count, cpu;
> > > +
> > > +	for_each_present_cpu(cpu) {
> > > +		if (cpu == 0)
> > > +			continue;
> > > +		on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
> > > +	}
> > > +
> > > +	cpumask_set_cpu(0, &cpu_off_done);
> > > +
> > > +	cpu_off_start = 1;
> > > +	report_info("waiting for the CPUs to be offlined...");
> > > +	while (!cpumask_full(&cpu_off_done))
> > > +		cpu_relax();
> > > +
> > > +	/* Allow all the other CPUs to complete the operation */
> > > +	for (i = 0; i < 100; i++) {
> > > +		mdelay(10);
> > > +
> > > +		count = 0;
> > > +		for_each_present_cpu(cpu) {
> > > +			if (cpu == 0)
> > > +				continue;
> > > +			if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF)
> > > +				count++;
> > > +		}
> > > +		if (count > 0)
> > > +			continue;
> > 
> > This should be
> > 
> > if (count == 0)
> >    break;
> > 
> > otherwise we never leave the loop early.
> 
> Duh, don't know what I was thinking. Thanks for noticing it.
> 
> > 
> > > +	}
> > > +
> > > +	/* Try to catch CPUs that return from CPU_OFF. */
> > > +	if (count == 0)
> > > +		mdelay(100);
> > > +
> > > +	for_each_present_cpu(cpu) {
> > > +		if (cpu == 0)
> > > +			continue;
> > > +		if (cpu_idle(cpu)) {
> > > +			report_info("CPU%d failed to be offlined", cpu);
> > > +			if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF)
> > > +				report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu);
> > > +			failed = true;
> > > +		}
> > > +	}
> > > +
> > > +	return !failed;
> > > +}
> > > +
> > > +int main(int argc, char **argv)

I just noticed we're adding argc,argv in this patch, but not using them.

> > >  {
> > >  	int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> > > +	bool cpu_on_success = true;
> > >  
> > >  	report_prefix_push("psci");
> > >  
> > > @@ -188,10 +251,17 @@ int main(void)
> > >  	report(psci_affinity_info_on(), "affinity-info-on");
> > >  	report(psci_affinity_info_off(), "affinity-info-off");
> > >  
> > > -	if (ERRATA(6c7a5dce22b3))
> > > -		report(psci_cpu_on_test(), "cpu-on");
> > > -	else
> > > +	if (ERRATA(6c7a5dce22b3)) {
> > > +		cpu_on_success = psci_cpu_on_test();
> > > +		report(cpu_on_success, "cpu-on");
> > > +	} else {
> > >  		report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> > > +	}
> > > +
> > > +	if (!cpu_on_success)
> > > +		report_skip("Skipping cpu-off test because the cpu-on test failed");
> > 
> > We should output "was skipped" when the cpu-on test was skipped, rather
> > than always reporting "failed". We need two booleans, try_cpu_on_test and
> > cpu_on_success.
> 
> This is not about cpu-on being a precondition for cpu-off. cpu-off makes
> only one assumption, and that is that all secondaries can be onlined
> successfully. Even if cpu-on is never run, cpu-off calls on_cpu_async,
> which will online a secondary. This is safe even if the errata is not
> present, because the errata is about concurrent CPU_ON calls for the same
> VCPU, not for different VCPUs.
> 
> The cpu-off test is skipped here because it can hang indefinitely if
> onlining CPU1 was not successful:
> 
> [..]
>         for_each_present_cpu(cpu) {
>                 if (cpu == 0)
>                         continue;
>                 on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
>         }
> 
>         cpumask_set_cpu(0, &cpu_off_done);
> 
>         cpu_off_start = 1;
>         report_info("waiting for the CPUs to be offlined...");
>         while (!cpumask_full(&cpu_off_done))	// infinite loop if CPU1
>                 cpu_relax();			// cannot be onlined.
> 
> Does that make sense? Should I add a comment to make it clear why cpu-off
> is skipped when cpu-on fails?

I missed that cpu_on_success was initialized to true. Seeing that now, I
understand how the only time it's false is if the cpu-on test failed. When
I thought it was initialized to false it had two ways to be false, failure
or skip. I think it's a bit confusing to set a 'success' variable to true
when the test is skipped. Also, we can relax the condition as to whether
or not we try cpu-off by simply checking that all cpus, other than cpu0,
are in idle. How about

 if (ERRATA(6c7a5dce22b3))
     report(psci_cpu_on_test(), "cpu-on");
 else
     report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");

 assert(!cpu_idle(0));

 if (!ERRATA(6c7a5dce22b3) || cpumask_weight(&cpu_idle_mask) == nr_cpus - 1)
     report(psci_cpu_off_test(), "cpu-off");
 else
     report_skip("Skipping cpu-off test because the cpu-on test failed");


Thanks,
drew


> 
> Thanks,
> Alex
> 
> > 
> > > +	else
> > > +		report(psci_cpu_off_test(), "cpu-off");
> > >  
> > >  done:
> > >  #if 0
> > > -- 
> > > 2.39.0
> > > 
> > 
> > Thanks,
> > drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-31 10:46       ` Andrew Jones
@ 2023-01-31 11:16         ` Alexandru Elisei
  2023-01-31 11:45           ` Andrew Jones
  0 siblings, 1 reply; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-31 11:16 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, kvmarm

Hi Drew,

On Tue, Jan 31, 2023 at 11:46:10AM +0100, Andrew Jones wrote:
> On Tue, Jan 31, 2023 at 09:52:56AM +0000, Alexandru Elisei wrote:
> > Hi Drew,
> > 
> > On Tue, Jan 31, 2023 at 07:56:23AM +0100, Andrew Jones wrote:
> > > On Fri, Jan 27, 2023 at 05:59:16PM +0000, Alexandru Elisei wrote:
> > > > From: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > > > 
> > > > The test uses the following method.
> > > > 
> > > > The primary CPU brings up all the secondary CPUs, which are held in a wait
> > > > loop. Once the primary releases the CPUs, each of the secondary CPUs
> > > > proceed to issue CPU_OFF.
> > > > 
> > > > The primary CPU then checks for the status of the individual CPU_OFF
> > > > request. There is a chance that some CPUs might return from the CPU_OFF
> > > > function call after the primary CPU has finished the scan. There is no
> > > > foolproof method to handle this, but the test tries its best to
> > > > eliminate these false positives by introducing an extra delay if all the
> > > > CPUs are reported offline after the initial scan.
> > > > 
> > > > Signed-off-by: Nikita Venkatesh <Nikita.Venkatesh@arm.com>
> > > > [ Alex E: Skip CPU_OFF test if CPU_ON failed, drop cpu_off_success in
> > > > 	  favour of checking AFFINITY_INFO, commit message tweaking ]
> > > > Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> > > > ---
> > > > 
> > > > Decided to drop Drew's Reviewed-by tag because the changes are not trivial
> > > > from the previous version.
> > > > 
> > > >  arm/psci.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++----
> > > >  1 file changed, 75 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/arm/psci.c b/arm/psci.c
> > > > index f7238f8e0bbd..7034d8ebe6e1 100644
> > > > --- a/arm/psci.c
> > > > +++ b/arm/psci.c
> > > > @@ -72,8 +72,9 @@ static bool psci_affinity_info_off(void)
> > > >  }
> > > >  
> > > >  static int cpu_on_ret[NR_CPUS];
> > > > -static cpumask_t cpu_on_ready, cpu_on_done;
> > > > +static cpumask_t cpu_on_ready, cpu_on_done, cpu_off_done;
> > > >  static volatile int cpu_on_start;
> > > > +static volatile int cpu_off_start;
> > > >  
> > > >  extern void secondary_entry(void);
> > > >  static void cpu_on_do_wake_target(void)
> > > > @@ -171,9 +172,71 @@ static bool psci_cpu_on_test(void)
> > > >  	return !failed;
> > > >  }
> > > >  
> > > > -int main(void)
> > > > +static void cpu_off_secondary_entry(void *data)
> > > > +{
> > > > +	int cpu = smp_processor_id();
> > > > +
> > > > +	while (!cpu_off_start)
> > > > +		cpu_relax();
> > > > +	cpumask_set_cpu(cpu, &cpu_off_done);
> > > > +	cpu_psci_cpu_die();
> > > > +}
> > > > +
> > > > +static bool psci_cpu_off_test(void)
> > > > +{
> > > > +	bool failed = false;
> > > > +	int i, count, cpu;
> > > > +
> > > > +	for_each_present_cpu(cpu) {
> > > > +		if (cpu == 0)
> > > > +			continue;
> > > > +		on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
> > > > +	}
> > > > +
> > > > +	cpumask_set_cpu(0, &cpu_off_done);
> > > > +
> > > > +	cpu_off_start = 1;
> > > > +	report_info("waiting for the CPUs to be offlined...");
> > > > +	while (!cpumask_full(&cpu_off_done))
> > > > +		cpu_relax();
> > > > +
> > > > +	/* Allow all the other CPUs to complete the operation */
> > > > +	for (i = 0; i < 100; i++) {
> > > > +		mdelay(10);
> > > > +
> > > > +		count = 0;
> > > > +		for_each_present_cpu(cpu) {
> > > > +			if (cpu == 0)
> > > > +				continue;
> > > > +			if (psci_affinity_info(cpus[cpu], 0) != PSCI_0_2_AFFINITY_LEVEL_OFF)
> > > > +				count++;
> > > > +		}
> > > > +		if (count > 0)
> > > > +			continue;
> > > 
> > > This should be
> > > 
> > > if (count == 0)
> > >    break;
> > > 
> > > otherwise we never leave the loop early.
> > 
> > Duh, don't know what I was thinking. Thanks for noticing it.
> > 
> > > 
> > > > +	}
> > > > +
> > > > +	/* Try to catch CPUs that return from CPU_OFF. */
> > > > +	if (count == 0)
> > > > +		mdelay(100);
> > > > +
> > > > +	for_each_present_cpu(cpu) {
> > > > +		if (cpu == 0)
> > > > +			continue;
> > > > +		if (cpu_idle(cpu)) {
> > > > +			report_info("CPU%d failed to be offlined", cpu);
> > > > +			if (psci_affinity_info(cpus[cpu], 0) == PSCI_0_2_AFFINITY_LEVEL_OFF)
> > > > +				report_info("AFFINITY_INFO incorrectly reports CPU%d as offline", cpu);
> > > > +			failed = true;
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return !failed;
> > > > +}
> > > > +
> > > > +int main(int argc, char **argv)
> 
> I just noticed we're adding argc,argv in this patch, but not using them.

Will remove that. It's a leftoever from a previous version where the
cpu-off test was a separate test selected via the command line.

> 
> > > >  {
> > > >  	int ver = psci_invoke(PSCI_0_2_FN_PSCI_VERSION, 0, 0, 0);
> > > > +	bool cpu_on_success = true;
> > > >  
> > > >  	report_prefix_push("psci");
> > > >  
> > > > @@ -188,10 +251,17 @@ int main(void)
> > > >  	report(psci_affinity_info_on(), "affinity-info-on");
> > > >  	report(psci_affinity_info_off(), "affinity-info-off");
> > > >  
> > > > -	if (ERRATA(6c7a5dce22b3))
> > > > -		report(psci_cpu_on_test(), "cpu-on");
> > > > -	else
> > > > +	if (ERRATA(6c7a5dce22b3)) {
> > > > +		cpu_on_success = psci_cpu_on_test();
> > > > +		report(cpu_on_success, "cpu-on");
> > > > +	} else {
> > > >  		report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> > > > +	}
> > > > +
> > > > +	if (!cpu_on_success)
> > > > +		report_skip("Skipping cpu-off test because the cpu-on test failed");
> > > 
> > > We should output "was skipped" when the cpu-on test was skipped, rather
> > > than always reporting "failed". We need two booleans, try_cpu_on_test and
> > > cpu_on_success.
> > 
> > This is not about cpu-on being a precondition for cpu-off. cpu-off makes
> > only one assumption, and that is that all secondaries can be onlined
> > successfully. Even if cpu-on is never run, cpu-off calls on_cpu_async,
> > which will online a secondary. This is safe even if the errata is not
> > present, because the errata is about concurrent CPU_ON calls for the same
> > VCPU, not for different VCPUs.
> > 
> > The cpu-off test is skipped here because it can hang indefinitely if
> > onlining CPU1 was not successful:
> > 
> > [..]
> >         for_each_present_cpu(cpu) {
> >                 if (cpu == 0)
> >                         continue;
> >                 on_cpu_async(cpu, cpu_off_secondary_entry, NULL);
> >         }
> > 
> >         cpumask_set_cpu(0, &cpu_off_done);
> > 
> >         cpu_off_start = 1;
> >         report_info("waiting for the CPUs to be offlined...");
> >         while (!cpumask_full(&cpu_off_done))	// infinite loop if CPU1
> >                 cpu_relax();			// cannot be onlined.
> > 
> > Does that make sense? Should I add a comment to make it clear why cpu-off
> > is skipped when cpu-on fails?
> 
> I missed that cpu_on_success was initialized to true. Seeing that now, I
> understand how the only time it's false is if the cpu-on test failed. When
> I thought it was initialized to false it had two ways to be false, failure
> or skip. I think it's a bit confusing to set a 'success' variable to true
> when the test is skipped. Also, we can relax the condition as to whether
> or not we try cpu-off by simply checking that all cpus, other than cpu0,
> are in idle. How about
> 
>  if (ERRATA(6c7a5dce22b3))
>      report(psci_cpu_on_test(), "cpu-on");
>  else
>      report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> 
>  assert(!cpu_idle(0));

cpu0 is the boot CPU, I don't see how cpu0 can execute this line of code
and be in idle at the same time. Unless this is done for documenting
purposes, to explain why we compare the number of cpus in idle to nr_cpus
-1 below. But I still find it confusing, especially considering (almost)
the same assert is in smp.c:

void on_cpu_async(int cpu, void (*func)(void *data), void *data)
{
	[..]
        assert_msg(cpu != 0 || cpu0_calls_idle, "Waiting on CPU0, which is unlikely to idle. "
                                                "If this is intended set cpu0_calls_idle=1");

I know, it's a different scenario, but the point I'm trying to make is that
kvm-unit-tests really doesn't expect cpu0 to be in idle. I would prefer not
to have the assert here.

> 
>  if (!ERRATA(6c7a5dce22b3) || cpumask_weight(&cpu_idle_mask) == nr_cpus - 1)
>      report(psci_cpu_off_test(), "cpu-off");
>  else
>      report_skip("Skipping cpu-off test because the cpu-on test failed");

Looks good, will do it this way.

Thanks,
Alex

> 
> 
> Thanks,
> drew
> 
> 
> > 
> > Thanks,
> > Alex
> > 
> > > 
> > > > +	else
> > > > +		report(psci_cpu_off_test(), "cpu-off");
> > > >  
> > > >  done:
> > > >  #if 0
> > > > -- 
> > > > 2.39.0
> > > > 
> > > 
> > > Thanks,
> > > drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-31 11:16         ` Alexandru Elisei
@ 2023-01-31 11:45           ` Andrew Jones
  2023-01-31 11:49             ` Alexandru Elisei
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Jones @ 2023-01-31 11:45 UTC (permalink / raw)
  To: Alexandru Elisei; +Cc: kvm, kvmarm

On Tue, Jan 31, 2023 at 11:16:22AM +0000, Alexandru Elisei wrote:
...
> > > Does that make sense? Should I add a comment to make it clear why cpu-off
> > > is skipped when cpu-on fails?
> > 
> > I missed that cpu_on_success was initialized to true. Seeing that now, I
> > understand how the only time it's false is if the cpu-on test failed. When
> > I thought it was initialized to false it had two ways to be false, failure
> > or skip. I think it's a bit confusing to set a 'success' variable to true
> > when the test is skipped. Also, we can relax the condition as to whether
> > or not we try cpu-off by simply checking that all cpus, other than cpu0,
> > are in idle. How about
> > 
> >  if (ERRATA(6c7a5dce22b3))
> >      report(psci_cpu_on_test(), "cpu-on");
> >  else
> >      report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> > 
> >  assert(!cpu_idle(0));
> 
> cpu0 is the boot CPU, I don't see how cpu0 can execute this line of code
> and be in idle at the same time.

That's why it's an assert and not an if, i.e. it should never happen. It
could happen if things are messed up in the lib code, a previous test
mucked with cpu_idle_mask, or a previous test idled cpu0 and manipulated
another cpu into executing this line.

> Unless this is done for documenting
> purposes, to explain why we compare the number of cpus in idle to nr_cpus
> -1 below.

Exactly, and furthermore that we expect the missing cpu to be cpu0.

> But I still find it confusing, especially considering (almost)
> the same assert is in smp.c:
> 
> void on_cpu_async(int cpu, void (*func)(void *data), void *data)
> {
> 	[..]
>         assert_msg(cpu != 0 || cpu0_calls_idle, "Waiting on CPU0, which is unlikely to idle. "
>                                                 "If this is intended set cpu0_calls_idle=1");
> 
> I know, it's a different scenario, but the point I'm trying to make is that
> kvm-unit-tests really doesn't expect cpu0 to be in idle. I would prefer not
> to have the assert here.

asserts are for things we assume, but also want to ensure, as other code
depends on the assumptions. Please keep the assert. It doesn't hurt :-)

Thanks,
drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case
  2023-01-31 11:45           ` Andrew Jones
@ 2023-01-31 11:49             ` Alexandru Elisei
  0 siblings, 0 replies; 10+ messages in thread
From: Alexandru Elisei @ 2023-01-31 11:49 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvm, kvmarm

Hi Drew,

On Tue, Jan 31, 2023 at 12:45:49PM +0100, Andrew Jones wrote:
> On Tue, Jan 31, 2023 at 11:16:22AM +0000, Alexandru Elisei wrote:
> ...
> > > > Does that make sense? Should I add a comment to make it clear why cpu-off
> > > > is skipped when cpu-on fails?
> > > 
> > > I missed that cpu_on_success was initialized to true. Seeing that now, I
> > > understand how the only time it's false is if the cpu-on test failed. When
> > > I thought it was initialized to false it had two ways to be false, failure
> > > or skip. I think it's a bit confusing to set a 'success' variable to true
> > > when the test is skipped. Also, we can relax the condition as to whether
> > > or not we try cpu-off by simply checking that all cpus, other than cpu0,
> > > are in idle. How about
> > > 
> > >  if (ERRATA(6c7a5dce22b3))
> > >      report(psci_cpu_on_test(), "cpu-on");
> > >  else
> > >      report_skip("Skipping unsafe cpu-on test. Set ERRATA_6c7a5dce22b3=y to enable.");
> > > 
> > >  assert(!cpu_idle(0));
> > 
> > cpu0 is the boot CPU, I don't see how cpu0 can execute this line of code
> > and be in idle at the same time.
> 
> That's why it's an assert and not an if, i.e. it should never happen. It
> could happen if things are messed up in the lib code, a previous test
> mucked with cpu_idle_mask, or a previous test idled cpu0 and manipulated
> another cpu into executing this line.
> 
> > Unless this is done for documenting
> > purposes, to explain why we compare the number of cpus in idle to nr_cpus
> > -1 below.
> 
> Exactly, and furthermore that we expect the missing cpu to be cpu0.
> 
> > But I still find it confusing, especially considering (almost)
> > the same assert is in smp.c:
> > 
> > void on_cpu_async(int cpu, void (*func)(void *data), void *data)
> > {
> > 	[..]
> >         assert_msg(cpu != 0 || cpu0_calls_idle, "Waiting on CPU0, which is unlikely to idle. "
> >                                                 "If this is intended set cpu0_calls_idle=1");
> > 
> > I know, it's a different scenario, but the point I'm trying to make is that
> > kvm-unit-tests really doesn't expect cpu0 to be in idle. I would prefer not
> > to have the assert here.
> 
> asserts are for things we assume, but also want to ensure, as other code
> depends on the assumptions. Please keep the assert. It doesn't hurt :-)

I'm keeping the assert then :)

Thanks,
Alex

> 
> Thanks,
> drew

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-01-31 11:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-27 17:59 [kvm-unit-tests PATCH v5 0/2] arm: Add PSCI CPU_OFF test Alexandru Elisei
2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 1/2] arm/psci: Test that CPU 1 has been successfully brought online Alexandru Elisei
2023-01-31  6:43   ` Andrew Jones
2023-01-27 17:59 ` [kvm-unit-tests PATCH v5 2/2] arm/psci: Add PSCI CPU_OFF test case Alexandru Elisei
2023-01-31  6:56   ` Andrew Jones
2023-01-31  9:52     ` Alexandru Elisei
2023-01-31 10:46       ` Andrew Jones
2023-01-31 11:16         ` Alexandru Elisei
2023-01-31 11:45           ` Andrew Jones
2023-01-31 11:49             ` Alexandru Elisei

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.