* [PATCH] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()
@ 2022-01-25  7:21 ` Nathan Lynch
  0 siblings, 0 replies; 6+ messages in thread
From: Nathan Lynch @ 2022-01-25  7:21 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: brking, linux-kernel, npiggin, srikar

Replace the outdated iteration and timeout calculations here with an
indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
already does this when waiting for the cpu to set its online bit before
returning, so this change is not really making the function more brittle.

Removing the msleep(1) in the hotplug path here reduces the time it takes
to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
via thaw_secondary_cpus().

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
---
 arch/powerpc/kernel/smp.c | 25 ++-----------------------
 1 file changed, 2 insertions(+), 23 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index b7fd6a72aa76..990893365fe0 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1270,7 +1270,7 @@ static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
 
 int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 {
-	int rc, c;
+	int rc;
 
 	/*
 	 * Don't allow secondary threads to come online if inhibited
@@ -1314,28 +1314,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 		return rc;
 	}
 
-	/*
-	 * wait to see if the cpu made a callin (is actually up).
-	 * use this value that I found through experimentation.
-	 * -- Cort
-	 */
-	if (system_state < SYSTEM_RUNNING)
-		for (c = 50000; c && !cpu_callin_map[cpu]; c--)
-			udelay(100);
-#ifdef CONFIG_HOTPLUG_CPU
-	else
-		/*
-		 * CPUs can take much longer to come up in the
-		 * hotplug case.  Wait five seconds.
-		 */
-		for (c = 5000; c && !cpu_callin_map[cpu]; c--)
-			msleep(1);
-#endif
-
-	if (!cpu_callin_map[cpu]) {
-		printk(KERN_ERR "Processor %u is stuck.\n", cpu);
-		return -ENOENT;
-	}
+	spin_until_cond(cpu_callin_map[cpu] != 0);
 
 	DBG("Processor %u found.\n", cpu);
 
-- 
2.34.1


* Re: [PATCH] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()
  2022-01-25  7:21 ` Nathan Lynch
@ 2022-06-29  9:19   ` Michael Ellerman
  -1 siblings, 0 replies; 6+ messages in thread
From: Michael Ellerman @ 2022-06-29  9:19 UTC (permalink / raw)
  To: Nathan Lynch, linuxppc-dev; +Cc: linux-kernel, npiggin, brking, srikar

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Replace the outdated iteration and timeout calculations here with an
> indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
> already does this when waiting for the cpu to set its online bit before
> returning, so this change is not really making the function more brittle.

Sorry for the glacial response.

I'm not sure I agree that this doesn't make the code more brittle.

The existing indefinite wait you mention is later in the function, and
happens after the CPU has successfully come into the kernel.

I think it's more common that a stuck/borked CPU doesn't come into the
kernel at all, rather than comes in and then fails to online.

So I think the bail out when the CPU fails to call in is useful. I would
guess I see that "Processor x is stuck" message multiple times a year
while debugging various things.

> Removing the msleep(1) in the hotplug path here reduces the time it takes
> to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
> via thaw_secondary_cpus().

That is a nice improvement.

Can we do something that returns quickly in the happy case and still has
a timeout when things go wrong? Seems like a busy loop with a
time_after() check would do the trick.

cheers

* Re: [PATCH] powerpc/smp: poll cpu_callin_map more aggressively in __cpu_up()
  2022-06-29  9:19   ` Michael Ellerman
@ 2022-06-29 17:51     ` Nathan Lynch
  -1 siblings, 0 replies; 6+ messages in thread
From: Nathan Lynch @ 2022-06-29 17:51 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linux-kernel, npiggin, brking, srikar, linuxppc-dev

Michael Ellerman <mpe@ellerman.id.au> writes:

> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> Replace the outdated iteration and timeout calculations here with an
>> indefinite spin_until_cond()-wrapped poll of cpu_callin_map. __cpu_up()
>> already does this when waiting for the cpu to set its online bit before
>> returning, so this change is not really making the function more brittle.
>
> I'm not sure I agree that this doesn't make the code more brittle.
>
> The existing indefinite wait you mention is later in the function, and
> happens after the CPU has successfully come into the kernel.
>
> I think it's more common that a stuck/borked CPU doesn't come into the
> kernel at all, rather than comes in and then fails to online.
>
> So I think the bail out when the CPU fails to call in is useful. I would
> guess I see that "Processor x is stuck" message multiple times a year
> while debugging various things.

Yeah I can see how my claim is too strong here.

>> Removing the msleep(1) in the hotplug path here reduces the time it takes
>> to online a CPU on a P9 PowerVM LPAR from about 30ms to 1ms when exercised
>> via thaw_secondary_cpus().
>
> That is a nice improvement.
>
> Can we do something that returns quickly in the happy case and still has
> a timeout when things go wrong? Seems like a busy loop with a
> time_after() check would do the trick.

Yes, I'll rework it like that. Thanks.
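
For illustration only, the shape of a busy loop with a deadline can be
sketched in userspace C. This is not the reworked patch: wait_for_callin()
and now_ns() are hypothetical names, the monotonic clock stands in for
jiffies, and the final comparison plays the role that time_after() would
play in the kernel. A kernel version would presumably also call
cpu_relax() in the loop body and keep the existing "Processor %u is
stuck" message and -ENOENT return on timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; a userspace stand-in for jiffies. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Hypothetical helper: spin until *callin becomes nonzero or timeout_ms
 * milliseconds have passed. Returns 0 if the flag was seen set, -1 on
 * timeout. The happy case returns as soon as the flag is observed, with
 * no sleep granularity in the way.
 */
static int wait_for_callin(atomic_int *callin, unsigned int timeout_ms)
{
	uint64_t deadline;

	assert(callin != NULL);
	deadline = now_ns() + (uint64_t)timeout_ms * 1000000ull;

	/* Busy-wait; kernel code would call cpu_relax() in the body. */
	while (!atomic_load(callin) && now_ns() < deadline)
		;

	/*
	 * Re-read the flag after the loop so that a flag set right at the
	 * deadline is still reported as success.
	 */
	return atomic_load(callin) ? 0 : -1;
}
```

A caller would treat a -1 return the way the existing code treats a CPU
that never sets cpu_callin_map[cpu]: log the failure and bail out.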
