All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] some boot/shutdown improvements
@ 2017-10-06  6:10 Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot Nicholas Piggin
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-06  6:10 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Vasant Hegde, Stewart Smith, Benjamin Herrenschmidt

Hi,

These are a couple of improvements to powernv/opal boot and
shutdown paths. Also a patch to move smp_send_stop over to
use NMI IPIs, which gives us a significantly better chance to
stop secondaries on platforms which support it (pSeries and
PowerNV POWER9 so far).

Patch 1 in particular it would be good if people could take
a look at. I *think* it's okay wrt kexec, but I could miss
something or some old firmware might do the wrong thing.

Thanks,
Nick

Nicholas Piggin (3):
  powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  powerpc/powernv: Always stop secondaries before reboot/shutdown
  powerpc: use NMI IPI for smp_send_stop

 arch/powerpc/include/asm/opal.h             |  2 +-
 arch/powerpc/kernel/head_64.S               |  4 +++-
 arch/powerpc/kernel/setup_64.c              | 14 ++++++++++++--
 arch/powerpc/kernel/smp.c                   |  9 +++++----
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---------------------------
 arch/powerpc/platforms/powernv/setup.c      | 15 +++++----------
 6 files changed, 27 insertions(+), 45 deletions(-)

-- 
2.13.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-06  6:10 [PATCH 0/3] some boot/shutdown improvements Nicholas Piggin
@ 2017-10-06  6:10 ` Nicholas Piggin
  2017-10-10 11:11   ` Michael Ellerman
  2017-10-06  6:10 ` [PATCH 2/3] powerpc/powernv: Always stop secondaries before reboot/shutdown Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 3/3] powerpc: use NMI IPI for smp_send_stop Nicholas Piggin
  2 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-06  6:10 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Vasant Hegde, Stewart Smith, Benjamin Herrenschmidt

OPAL boot does not insert secondaries at 0x60 to wait at the secondary
hold spinloop. Instead it keeps them held in firmware until the
opal_start_cpu call is made, which directs them where the caller
specifies. Linux inserts them into generic_secondary_smp_init(), which
is after the secondary hold spinloop (they go on to spin at the per-CPU
paca loops, but that is another step).

So avoid waiting on this spinloop when booting with OPAL firmware.
It always just times out.

This saves 100ms boot time on bare metal, and 10s of seconds when
booting the simulator in SMP.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/head_64.S  |  4 +++-
 arch/powerpc/kernel/setup_64.c | 14 ++++++++++++--
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c9e760ec7530..1ebfb3f2cbbb 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -60,7 +60,9 @@
  *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
  *      with device-tree in gpr3. We also get OPAL base in r8 and
  *	entry in r9 for debugging purposes
- *   2. Secondary processors enter at 0x60 with PIR in gpr3
+ *   2. Secondary processors enter as directed by opal_start_cpu(), which
+ *      is generic_secondary_smp_init, with PIR in gpr3. The secondary spin
+ *      code is not used.
  *
  *  For Book3E processors:
  *   1. The MMU is on running in AS0 in a state defined in ePAPR
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3f2453858f60..eada0a7b73f8 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -363,8 +363,18 @@ void early_setup_secondary(void)
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
-		return true;
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+		/*
+		 * With OPAL, secondaries do not use the secondary hold
+		 * spinloop, rather they are held in firmware until
+		 * opal_start_cpu() sends them to generic_secondary_smp_init
+		 * directly.
+		 */
+		if (firmware_has_feature(FW_FEATURE_OPAL))
+			return false;
+		else
+			return true;
+	}
 
 	/*
 	 * When book3e boots from kexec, the ePAPR spin table does
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] powerpc/powernv: Always stop secondaries before reboot/shutdown
  2017-10-06  6:10 [PATCH 0/3] some boot/shutdown improvements Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot Nicholas Piggin
@ 2017-10-06  6:10 ` Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 3/3] powerpc: use NMI IPI for smp_send_stop Nicholas Piggin
  2 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-06  6:10 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Vasant Hegde, Stewart Smith, Benjamin Herrenschmidt

Currently powernv reboot and shutdown requests just leave secondaries
to do their own things. This is undesirable because they can trigger
any number of watchdogs while waiting for reboot, but also we don't
know what else they might be doing, or they might be stuck somewhere
causing trouble.

The opal scheduled flash update code already ran into watchdog problems
due to flashing taking a long time, but it's possible for regular
reboots to trigger problems too (this is with watchdog_thresh set to 1,
but I have seen it with watchdog_thresh at the default value once too):

  reboot: Restarting system
  [  360.038896709,5] OPAL: Reboot request...
  Watchdog CPU:0 Hard LOCKUP
  Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
  Watchdog CPU:16 Hard LOCKUP
  watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]

So remove the special case for flash update, and unconditionally do
smp_send_stop before rebooting.

Return the CPUs to Linux stop loops rather than OPAL. The reason for
this is that in firmware, CPUs will check for jobs, whereas smp_send_stop
puts them into a simple infinite loop. If there is some corruption, it
is better to do the latter, to maximize the chance of a successful
reboot.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/opal.h             |  2 +-
 arch/powerpc/platforms/powernv/opal-flash.c | 28 +---------------------------
 arch/powerpc/platforms/powernv/setup.c      | 15 +++++----------
 3 files changed, 7 insertions(+), 38 deletions(-)

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 04c32b08ffa1..ce58f4139ff5 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -317,7 +317,7 @@ struct rtc_time;
 extern unsigned long opal_get_boot_time(void);
 extern void opal_nvram_init(void);
 extern void opal_flash_update_init(void);
-extern void opal_flash_term_callback(void);
+extern void opal_flash_update_print_message(void);
 extern int opal_elog_init(void);
 extern void opal_platform_dump_init(void);
 extern void opal_sys_param_init(void);
diff --git a/arch/powerpc/platforms/powernv/opal-flash.c b/arch/powerpc/platforms/powernv/opal-flash.c
index 2fa3ac80cb4e..632871d78576 100644
--- a/arch/powerpc/platforms/powernv/opal-flash.c
+++ b/arch/powerpc/platforms/powernv/opal-flash.c
@@ -303,26 +303,9 @@ static int opal_flash_update(int op)
 	return rc;
 }
 
-/* Return CPUs to OPAL before starting FW update */
-static void flash_return_cpu(void *info)
-{
-	int cpu = smp_processor_id();
-
-	if (!cpu_online(cpu))
-		return;
-
-	/* Disable IRQ */
-	hard_irq_disable();
-
-	/* Return the CPU to OPAL */
-	opal_return_cpu();
-}
-
 /* This gets called just before system reboots */
-void opal_flash_term_callback(void)
+void opal_flash_update_print_message(void)
 {
-	struct cpumask mask;
-
 	if (update_flash_data.status != FLASH_IMG_READY)
 		return;
 
@@ -333,15 +316,6 @@ void opal_flash_term_callback(void)
 
 	/* Small delay to help getting the above message out */
 	msleep(500);
-
-	/* Return secondary CPUs to firmware */
-	cpumask_copy(&mask, cpu_online_mask);
-	cpumask_clear_cpu(smp_processor_id(), &mask);
-	if (!cpumask_empty(&mask))
-		smp_call_function_many(&mask,
-				       flash_return_cpu, NULL, false);
-	/* Hard disable interrupts */
-	hard_irq_disable();
 }
 
 /*
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index cf52d53da460..0d2f70d24747 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -112,17 +112,12 @@ static void pnv_prepare_going_down(void)
 	 */
 	opal_event_shutdown();
 
-	/* Soft disable interrupts */
-	local_irq_disable();
+	/* Print flash update message if one is scheduled. */
+	opal_flash_update_print_message();
 
-	/*
-	 * Return secondary CPUs to firwmare if a flash update
-	 * is pending otherwise we will get all sort of error
-	 * messages about CPU being stuck etc.. This will also
-	 * have the side effect of hard disabling interrupts so
-	 * past this point, the kernel is effectively dead.
-	 */
-	opal_flash_term_callback();
+	smp_send_stop();
+
+	hard_irq_disable();
 }
 
 static void  __noreturn pnv_restart(char *cmd)
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] powerpc: use NMI IPI for smp_send_stop
  2017-10-06  6:10 [PATCH 0/3] some boot/shutdown improvements Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot Nicholas Piggin
  2017-10-06  6:10 ` [PATCH 2/3] powerpc/powernv: Always stop secondaries before reboot/shutdown Nicholas Piggin
@ 2017-10-06  6:10 ` Nicholas Piggin
  2 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-06  6:10 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nicholas Piggin, Vasant Hegde, Stewart Smith, Benjamin Herrenschmidt

Use the NMI IPI rather than smp_call_function for smp_send_stop.
Have stopped CPUs hard disable interrupts rather than just soft
disable.

This function is used in crash/panic/shutdown paths to bring other
CPUs down as quickly and reliably as possible, and minimizing their
potential to cause trouble.

Avoiding the Linux smp_call_function infrastructure and (if supported)
using true NMI IPIs makes this more robust.

Also use spin loop primitives in the stop callback, mainly to help
processing speed of the active thread speed in the simulator.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/smp.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e0a4c1f82e25..ce891030d925 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -547,19 +547,20 @@ void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
 }
 #endif
 
-static void stop_this_cpu(void *dummy)
+static void __noreturn stop_this_cpu(struct pt_regs *regs)
 {
 	/* Remove this CPU */
 	set_cpu_online(smp_processor_id(), false);
 
-	local_irq_disable();
+	hard_irq_disable();
+	spin_begin();
 	while (1)
-		;
+		spin_cpu_relax();
 }
 
 void smp_send_stop(void)
 {
-	smp_call_function(stop_this_cpu, NULL, 0);
+	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, stop_this_cpu, 1000000);
 }
 
 struct thread_info *current_set[NR_CPUS];
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-06  6:10 ` [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot Nicholas Piggin
@ 2017-10-10 11:11   ` Michael Ellerman
  2017-10-10 11:44     ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-10-10 11:11 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: Vasant Hegde, Stewart Smith, Nicholas Piggin

Nicholas Piggin <npiggin@gmail.com> writes:

> OPAL boot does not insert secondaries at 0x60 to wait at the secondary
> hold spinloop. Instead it keeps them held in firmware until the
> opal_start_cpu call is made, which directs them where the caller
> specifies. Linux inserts them into generic_secondary_smp_init(), which
> is after the secondary hold spinloop (they go on to spin at the per-CPU
> paca loops, but that is another step).
>
> So avoid waiting on this spinloop when booting with OPAL firmware.
> It always just times out.
>
> This saves 100ms boot time on bare metal, and 10s of seconds when
> booting the simulator in SMP.

Oh nice, that's real facepalm territory.

It'd be neater if we just inserted them at 0x60, but the sequence is
wrong.

Can we fix it just by making spinning_secondaries zero on OPAL?

cheers

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-10 11:11   ` Michael Ellerman
@ 2017-10-10 11:44     ` Nicholas Piggin
  2017-10-10 15:58       ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-10 11:44 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Vasant Hegde, Stewart Smith

On Tue, 10 Oct 2017 22:11:46 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > OPAL boot does not insert secondaries at 0x60 to wait at the secondary
> > hold spinloop. Instead it keeps them held in firmware until the
> > opal_start_cpu call is made, which directs them where the caller
> > specifies. Linux inserts them into generic_secondary_smp_init(), which
> > is after the secondary hold spinloop (they go on to spin at the per-CPU
> > paca loops, but that is another step).
> >
> > So avoid waiting on this spinloop when booting with OPAL firmware.
> > It always just times out.
> >
> > This saves 100ms boot time on bare metal, and 10s of seconds when
> > booting the simulator in SMP.  
> 
> Oh nice, that's real facepalm territory.
> 
> It'd be neater if we just inserted them at 0x60, but the sequence is
> wrong.
> 
> Can we fix it just by making spinning_secondaries zero on OPAL?

I had a look at that, but generic_secondary_smp_init() still
decrements it, so it would underflow which I thought was
uglier.

I actually have to look a bit further, because KVM guests are
also having the loop time out too by the looks.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-10 11:44     ` Nicholas Piggin
@ 2017-10-10 15:58       ` Nicholas Piggin
  2017-10-10 18:52         ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-10 15:58 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Vasant Hegde, Stewart Smith

On Tue, 10 Oct 2017 21:44:15 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On Tue, 10 Oct 2017 22:11:46 +1100
> Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> > Nicholas Piggin <npiggin@gmail.com> writes:
> >   
> > > OPAL boot does not insert secondaries at 0x60 to wait at the secondary
> > > hold spinloop. Instead it keeps them held in firmware until the
> > > opal_start_cpu call is made, which directs them where the caller
> > > specifies. Linux inserts them into generic_secondary_smp_init(), which
> > > is after the secondary hold spinloop (they go on to spin at the per-CPU
> > > paca loops, but that is another step).
> > >
> > > So avoid waiting on this spinloop when booting with OPAL firmware.
> > > It always just times out.
> > >
> > > This saves 100ms boot time on bare metal, and 10s of seconds when
> > > booting the simulator in SMP.    
> > 
> > Oh nice, that's real facepalm territory.
> > 
> > It'd be neater if we just inserted them at 0x60, but the sequence is
> > wrong.
> > 
> > Can we fix it just by making spinning_secondaries zero on OPAL?  
> 
> I had a look at that, but generic_secondary_smp_init() still
> decrements it, so it would underflow which I thought was
> uglier.
> 
> I actually have to look a bit further, because KVM guests are
> also having the loop time out too by the looks.

Ahh okay, pseries is using the start-cpu RTAS call to enter at
generic_secondary_smp_init() as well. So we can take it out for
pseries as well.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-10 15:58       ` Nicholas Piggin
@ 2017-10-10 18:52         ` Nicholas Piggin
  2017-10-11 11:27           ` Michael Ellerman
  0 siblings, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-10 18:52 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Vasant Hegde, Stewart Smith

On Wed, 11 Oct 2017 01:58:28 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:


> Ahh okay, pseries is using the start-cpu RTAS call to enter at
> generic_secondary_smp_init() as well. So we can take it out for
> pseries as well.

This patch seems to do the trick for pseries guests too:

powerpc/64s: Avoid waiting for secondary hold spinloop if it is not used

OPAL and some RTAS boot does not insert secondaries at 0x60 to wait at
the secondary hold spinloop. Instead they are started later, at
generic_secondary_smp_init(), which is after the secondary hold
spinloop.

Avoid waiting on this spinloop when booting with OPAL firmware, or
when the RTAS boot does not use this loop. This wait always times
out in those cases.

This saves 100ms boot time on bare metal (10s of seconds of real time
when booting on the simulator in SMP), and 100ms on modern pseries
guests.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kernel/head_64.S  | 16 +++++++++++-----
 arch/powerpc/kernel/setup_64.c | 12 +++++++++++-
 2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index c9e760ec7530..0deef350004f 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -55,12 +55,18 @@
  *
  *  For pSeries or server processors:
  *   1. The MMU is off & open firmware is running in real mode.
- *   2. The kernel is entered at __start
+ *   2. The primary CPU enters at __start.
+ *   3. If the RTAS supports "query-cpu-stopped-state", then secondary
+ *      CPUs will enter as directed by "start-cpu" RTAS call, which is
+ *      generic_secondary_smp_init, with PIR in r3.
+ *   4. Else the secondary CPUs will enter at secondary_hold (0x60) as
+ *      directed by the "start-cpu" RTS call, with PIR in r3.
  * -or- For OPAL entry:
- *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
- *      with device-tree in gpr3. We also get OPAL base in r8 and
- *	entry in r9 for debugging purposes
- *   2. Secondary processors enter at 0x60 with PIR in gpr3
+ *   1. The MMU is off, processor in HV mode.
+ *   2. The primary CPU enters at 0 with device-tree in r3, OPAL base
+ *      in r8, and entry in r9 for debugging purposes.
+ *   3. Secondary CPUs enter as directed by OPAL_START_CPU call, which
+ *      is at generic_secondary_smp_init, with PIR in r3.
  *
  *  For Book3E processors:
  *   1. The MMU is on running in AS0 in a state defined in ePAPR
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 3f2453858f60..afa79e8d56a6 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -363,8 +363,18 @@ void early_setup_secondary(void)
 #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
 static bool use_spinloop(void)
 {
-	if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
+	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
+		/*
+		 * See comments in head_64.S -- not all platforms insert
+		 * secondaries at __secondary_hold and wait at the spin
+		 * loop.
+		 */
+		if (firmware_has_feature(FW_FEATURE_OPAL))
+			return false;
+		if (rtas_token("query-cpu-stopped-state") != RTAS_UNKNOWN_SERVICE)
+			return false;
 		return true;
+	}
 
 	/*
 	 * When book3e boots from kexec, the ePAPR spin table does
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-10 18:52         ` Nicholas Piggin
@ 2017-10-11 11:27           ` Michael Ellerman
  2017-10-11 14:00             ` Nicholas Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Ellerman @ 2017-10-11 11:27 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev, Vasant Hegde, Stewart Smith

Nicholas Piggin <npiggin@gmail.com> writes:

> On Wed, 11 Oct 2017 01:58:28 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
>
>
>> Ahh okay, pseries is using the start-cpu RTAS call to enter at
>> generic_secondary_smp_init() as well. So we can take it out for
>> pseries as well.
>
> This patch seems to do the trick for pseries guests too:
>
> powerpc/64s: Avoid waiting for secondary hold spinloop if it is not used
>
> OPAL and some RTAS boot does not insert secondaries at 0x60 to wait at
> the secondary hold spinloop. Instead they are started later, at
> generic_secondary_smp_init(), which is after the secondary hold
> spinloop.
>
> Avoid waiting on this spinloop when booting with OPAL firmware, or
> when the RTAS boot does not use this loop. This wait always times
> out in those cases.
>
> This saves 100ms boot time on bare metal (10s of seconds of real time
> when booting on the simulator in SMP), and 100ms on modern pseries
> guests.

My instinct was to say "huh, that's not how it works on pseries!".

But then I see this was all changed in:

  dbe78b401186 ("powerpc/pseries: Do not start secondaries in Open Firmware") (Sep 2013)


So that is where my confusion comes from. Most of the code and comments
still talk about secondaries coming in at 0x60, but that's really only
on "legacy" machines.

I guess I can merge this, but really this code needs a proper cleanup. I
dislike all this platform specific knowledge ending up in setup_64.c.

If we had an smp_ops->spinning_secondaries() that tells the spin
loop how many secondaries to wait for, it could all go in platform code
I think.

The check for use_spinloop() would just become a short-circuit check of
spinning_secondaries == 0.

The other issue is kexec. IIRC when we kexec on pseries we don't return
the CPUs to RTAS, so then they *are* spinning at 0x60. But maybe that's
changed since I last looked at it too :)

cheers

> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index c9e760ec7530..0deef350004f 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -55,12 +55,18 @@
>   *
>   *  For pSeries or server processors:
>   *   1. The MMU is off & open firmware is running in real mode.
> - *   2. The kernel is entered at __start
> + *   2. The primary CPU enters at __start.
> + *   3. If the RTAS supports "query-cpu-stopped-state", then secondary
> + *      CPUs will enter as directed by "start-cpu" RTAS call, which is
> + *      generic_secondary_smp_init, with PIR in r3.
> + *   4. Else the secondary CPUs will enter at secondary_hold (0x60) as
> + *      directed by the "start-cpu" RTS call, with PIR in r3.
>   * -or- For OPAL entry:
> - *   1. The MMU is off, processor in HV mode, primary CPU enters at 0
> - *      with device-tree in gpr3. We also get OPAL base in r8 and
> - *	entry in r9 for debugging purposes
> - *   2. Secondary processors enter at 0x60 with PIR in gpr3
> + *   1. The MMU is off, processor in HV mode.
> + *   2. The primary CPU enters at 0 with device-tree in r3, OPAL base
> + *      in r8, and entry in r9 for debugging purposes.
> + *   3. Secondary CPUs enter as directed by OPAL_START_CPU call, which
> + *      is at generic_secondary_smp_init, with PIR in r3.
>   *
>   *  For Book3E processors:
>   *   1. The MMU is on running in AS0 in a state defined in ePAPR
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 3f2453858f60..afa79e8d56a6 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -363,8 +363,18 @@ void early_setup_secondary(void)
>  #if defined(CONFIG_SMP) || defined(CONFIG_KEXEC_CORE)
>  static bool use_spinloop(void)
>  {
> -	if (!IS_ENABLED(CONFIG_PPC_BOOK3E))
> +	if (IS_ENABLED(CONFIG_PPC_BOOK3S)) {
> +		/*
> +		 * See comments in head_64.S -- not all platforms insert
> +		 * secondaries at __secondary_hold and wait at the spin
> +		 * loop.
> +		 */
> +		if (firmware_has_feature(FW_FEATURE_OPAL))
> +			return false;
> +		if (rtas_token("query-cpu-stopped-state") != RTAS_UNKNOWN_SERVICE)
> +			return false;
>  		return true;
> +	}
>  
>  	/*
>  	 * When book3e boots from kexec, the ePAPR spin table does
> -- 
> 2.13.3

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot
  2017-10-11 11:27           ` Michael Ellerman
@ 2017-10-11 14:00             ` Nicholas Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2017-10-11 14:00 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, Vasant Hegde, Stewart Smith

On Wed, 11 Oct 2017 22:27:23 +1100
Michael Ellerman <mpe@ellerman.id.au> wrote:

> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> > On Wed, 11 Oct 2017 01:58:28 +1000
> > Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> >  
> >> Ahh okay, pseries is using the start-cpu RTAS call to enter at
> >> generic_secondary_smp_init() as well. So we can take it out for
> >> pseries as well.  
> >
> > This patch seems to do the trick for pseries guests too:
> >
> > powerpc/64s: Avoid waiting for secondary hold spinloop if it is not used
> >
> > OPAL and some RTAS boot does not insert secondaries at 0x60 to wait at
> > the secondary hold spinloop. Instead they are started later, at
> > generic_secondary_smp_init(), which is after the secondary hold
> > spinloop.
> >
> > Avoid waiting on this spinloop when booting with OPAL firmware, or
> > when the RTAS boot does not use this loop. This wait always times
> > out in those cases.
> >
> > This saves 100ms boot time on bare metal (10s of seconds of real time
> > when booting on the simulator in SMP), and 100ms on modern pseries
> > guests.  
> 
> My instinct was to say "huh, that's not how it works on pseries!".
> 
> But then I see this was all changed in:
> 
>   dbe78b401186 ("powerpc/pseries: Do not start secondaries in Open Firmware") (Sep 2013)
> 
> 
> So that is where my confusion comes from. Most of the code and comments
> still talk about secondaries coming in at 0x60, but that's really only
> on "legacy" machines.
> 
> I guess I can merge this, but really this code needs a proper cleanup. I
> dislike all this platform specific knowledge ending up in setup_64.c.
> 
> If we had an smp_ops->spinning_secondaries() that tells the spin
> loop how many secondaries to wait for, it could all go in platform code
> I think.

Yeah, not sure how best to do it. What I wanted to do was just increment
spinning_secondaries in prom_init as we inserted them to 0x60 (or the
0x100 for pmac or whatever). But prom_init doesn't like referencing outside
variables so there goes that.

> 
> The check for use_spinloop() would just become a short-circuit check of
> spinning_secondaries == 0.

Yeah maybe that would be enough. I don't know if half that setup_arch
could be per-platformirized, including smp_release_cpus().

> 
> The other issue is kexec. IIRC when we kexec on pseries we don't return
> the CPUs to RTAS, so then they *are* spinning at 0x60. But maybe that's
> changed since I last looked at it too :)

Oh I might have forgotten to test that on pseries, so I'll try that.

Thanks,
Nick

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-10-11 14:00 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-06  6:10 [PATCH 0/3] some boot/shutdown improvements Nicholas Piggin
2017-10-06  6:10 ` [PATCH 1/3] powerpc/powernv: Avoid the secondary hold spinloop for OPAL boot Nicholas Piggin
2017-10-10 11:11   ` Michael Ellerman
2017-10-10 11:44     ` Nicholas Piggin
2017-10-10 15:58       ` Nicholas Piggin
2017-10-10 18:52         ` Nicholas Piggin
2017-10-11 11:27           ` Michael Ellerman
2017-10-11 14:00             ` Nicholas Piggin
2017-10-06  6:10 ` [PATCH 2/3] powerpc/powernv: Always stop secondaries before reboot/shutdown Nicholas Piggin
2017-10-06  6:10 ` [PATCH 3/3] powerpc: use NMI IPI for smp_send_stop Nicholas Piggin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.