* [PATCH 1/1] powerpc: Clear cpu_sibling_map in cpu_die
@ 2010-08-11 20:34 Brian King
2010-08-24 5:24 ` Benjamin Herrenschmidt
From: Brian King @ 2010-08-11 20:34 UTC
To: benh; +Cc: brking, linuxppc-dev
While testing CPU DLPAR, the following problem was discovered.
We were DLPAR removing the first CPU, which in this case was
logical CPUs 0-3. CPUs 0-2 were already marked offline and
we were in the process of offlining CPU 3. After marking
the CPU inactive and offline in cpu_disable, but before the
cpu was completely idle (cpu_die), we ended up in __make_request
on CPU 3. There we looked at the topology map to see which CPU
to complete the I/O on and found no CPUs in the cpu_sibling_map.
This resulted in the block layer setting the completion cpu
to be NR_CPUS, which then caused an oops when we tried to
complete the I/O.
Fix this by leaving the cpu we are offlining in its own sibling
map in cpu_disable, and delaying clearing that entry until cpu_die.
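
The failure and the effect of the fix can be modeled in userspace. This is
a minimal sketch, not kernel code: the model_* names, the 32-bit mask type
and the NR_CPUS value of 8 are assumptions made for illustration. It only
mirrors the behavior described above, where asking for the first CPU of an
empty sibling map yields NR_CPUS.

```c
#include <stdint.h>

#define NR_CPUS 8	/* hypothetical small value for this model */

/* One sibling mask per CPU: bit n set means CPU n is a sibling. */
uint32_t model_sibling[NR_CPUS];

/* Lowest set bit of the mask, or NR_CPUS if the mask is empty. */
int model_mask_first(uint32_t mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/* Bring all threads of one core online as mutual siblings. */
void model_core_online(int base, int threads)
{
	uint32_t all = ((1u << threads) - 1) << base;

	for (int i = 0; i < threads; i++)
		model_sibling[base + i] = all;
}

/*
 * Sibling map update done while disabling a CPU.
 * keep_self == 0 models the old behavior, keep_self == 1 the patch:
 * the CPU stays in its own sibling map until model_cpu_die().
 */
void model_cpu_disable(int cpu, int base, int threads, int keep_self)
{
	for (int i = 0; i < threads; i++) {
		if (keep_self && base + i == cpu)
			continue;
		model_sibling[base + i] &= ~(1u << cpu);
		model_sibling[cpu] &= ~(1u << (base + i));
	}
}

/* Final step of offlining: drop the CPU from its own sibling map. */
void model_cpu_die(int cpu)
{
	model_sibling[cpu] &= ~(1u << cpu);
}

/* Completion CPU as chosen from the sibling map of the submitting CPU. */
int model_completion_cpu(int cpu)
{
	return model_mask_first(model_sibling[cpu]);
}
```

With threads 0-3 online and CPUs 0-2 already disabled, disabling CPU 3 with
keep_self == 0 leaves its map empty, so model_completion_cpu(3) returns
NR_CPUS. With keep_self == 1 it keeps returning 3 until model_cpu_die(3)
runs.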
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
---
arch/powerpc/kernel/smp.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff -puN arch/powerpc/kernel/smp.c~powerpc_sibling_map_offline arch/powerpc/kernel/smp.c
--- linux-2.6/arch/powerpc/kernel/smp.c~powerpc_sibling_map_offline 2010-08-09 16:49:47.000000000 -0500
+++ linux-2.6-bjking1/arch/powerpc/kernel/smp.c 2010-08-09 16:49:47.000000000 -0500
@@ -598,8 +598,11 @@ int __cpu_disable(void)
 	/* Update sibling maps */
 	base = cpu_first_thread_in_core(cpu);
 	for (i = 0; i < threads_per_core; i++) {
-		cpumask_clear_cpu(cpu, cpu_sibling_mask(base + i));
-		cpumask_clear_cpu(base + i, cpu_sibling_mask(cpu));
+		if ((base + i) != cpu) {
+			cpumask_clear_cpu(cpu, cpu_sibling_mask(base + i));
+			cpumask_clear_cpu(base + i, cpu_sibling_mask(cpu));
+		}
+
 		cpumask_clear_cpu(cpu, cpu_core_mask(base + i));
 		cpumask_clear_cpu(base + i, cpu_core_mask(cpu));
 	}
@@ -641,6 +644,8 @@ void cpu_hotplug_driver_unlock()
 
 void cpu_die(void)
 {
+	cpumask_clear_cpu(smp_processor_id(), cpu_sibling_mask(smp_processor_id()));
+
 	if (ppc_md.cpu_die)
 		ppc_md.cpu_die();
 }
_
* Re: [PATCH 1/1] powerpc: Clear cpu_sibling_map in cpu_die
From: Benjamin Herrenschmidt @ 2010-08-24 5:24 UTC
To: Brian King; +Cc: linuxppc-dev
On Wed, 2010-08-11 at 15:34 -0500, Brian King wrote:
> While testing CPU DLPAR, the following problem was discovered.
> We were DLPAR removing the first CPU, which in this case was
> logical CPUs 0-3. CPUs 0-2 were already marked offline and
> we were in the process of offlining CPU 3. After marking
> the CPU inactive and offline in cpu_disable, but before the
> cpu was completely idle (cpu_die), we ended up in __make_request
> on CPU 3. There we looked at the topology map to see which CPU
> to complete the I/O on and found no CPUs in the cpu_sibling_map.
> This resulted in the block layer setting the completion cpu
> to be NR_CPUS, which then caused an oops when we tried to
> complete the I/O.
>
> Fix this by leaving the cpu we are offlining in its own sibling
> map in cpu_disable, and delaying clearing that entry until cpu_die.
So I'm not getting a clear mental picture of the situation, sorry about
that.
We are offlining CPU 3, and we have already marked it inactive and
offline, so how come we end up in __make_request() on it at this stage,
and shouldn't it be the block layer that notices that it's targeting an
offlined CPU ?
IE. I have doubts about leaving a CPU in the sibling map which isn't
online... Wouldn't we end up "scheduling" things to it after it's
supposed to have freed itself of everything (timers, workqueues,
etc...) ?
As I said, I'm probably missing a part of the puzzle ..
Ben.
* Re: [PATCH 1/1] powerpc: Clear cpu_sibling_map in cpu_die
From: Brian King @ 2010-08-24 21:40 UTC
To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
On 08/24/2010 12:24 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2010-08-11 at 15:34 -0500, Brian King wrote:
>> While testing CPU DLPAR, the following problem was discovered.
>> We were DLPAR removing the first CPU, which in this case was
>> logical CPUs 0-3. CPUs 0-2 were already marked offline and
>> we were in the process of offlining CPU 3. After marking
>> the CPU inactive and offline in cpu_disable, but before the
>> cpu was completely idle (cpu_die), we ended up in __make_request
>> on CPU 3. There we looked at the topology map to see which CPU
>> to complete the I/O on and found no CPUs in the cpu_sibling_map.
>> This resulted in the block layer setting the completion cpu
>> to be NR_CPUS, which then caused an oops when we tried to
>> complete the I/O.
>>
>> Fix this by leaving the cpu we are offlining in its own sibling
>> map in cpu_disable, and delaying clearing that entry until cpu_die.
>
> So I'm not getting a clear mental picture of the situation, sorry about
> that.
>
> We are offlining CPU 3, and we have already marked it inactive and
> offline, so how come we end up in __make_request() on it at this stage
I'm not sure about that. My thought was that until we get into cpu_die,
the cpu could still be executing code.
> and shouldn't it be the block layer that notices that it's targeting an
> offlined CPU ?
It could be easily fixed in blk_cpu_to_group as well. I'll look into
this.
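
A minimal sketch of what such a check might look like, written as a
userspace model rather than the real block layer helper. The function
names, the plain unsigned mask and the NR_CPUS value are assumptions for
illustration only; the idea is simply to fall back to the submitting CPU
when the sibling map turns out to be empty.

```c
#define NR_CPUS 8	/* hypothetical small value for this model */

/* Lowest set bit of the mask, or NR_CPUS if the mask is empty. */
int sketch_first_cpu(unsigned int mask)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mask & (1u << cpu))
			return cpu;
	return NR_CPUS;
}

/*
 * Pick the completion group for 'cpu' from its sibling mask.
 * If the mask is empty (cpu is being offlined), return cpu itself
 * instead of the out-of-range value NR_CPUS.
 */
int sketch_cpu_to_group(int cpu, unsigned int sibling_mask)
{
	int group = sketch_first_cpu(sibling_mask);

	return group < NR_CPUS ? group : cpu;
}
```

With this fallback, an empty sibling map for CPU 3 yields group 3 rather
than NR_CPUS, while a non-empty map still yields its first CPU.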
> IE. I have doubts about leaving a CPU in the sibling map which isn't
> online... Wouldn't we end up "scheduling" things to it after it's
> supposed to have freed itself of everything (timers, workqueues,
> etc...) ?
I was assuming this wouldn't happen since the cpu is no longer online.
Thanks,
Brian
--
Brian King
Linux on Power Virtualization
IBM Linux Technology Center