All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover().
@ 2012-10-19  5:45 Tang Chen
  2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-19  5:45 UTC (permalink / raw)
  To: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel

1. cmci_rediscover() is only called by the CPU_POST_DEAD event handler, which
means the corresponding cpu has already dead. As a result, it won't be accessed
in the for_each_online_cpu loop.
So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().

2. cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
running cpu, and migrate itself to the dest cpu. But worker processes are not
allowed to be migrated. If current is a worker, the worker will be migrated to
another cpu, but the corresponding  worker_pool is still on the original cpu.

In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
BUG_ON(rq != this_rq());

This will cause the kernel panic.

This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
jobs onto all the other cpus using system_wq. This could bring some delay for
the jobs.

The following is call trace.

[ 6155.451107] ------------[ cut here ]------------
[ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
......
[ 6155.452019] RIP: 0010:[<ffffffff810add15>]  [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
......
[ 6155.452019] Call Trace:
[ 6155.452019]  [<ffffffff8166fc14>] __schedule+0x764/0x880
[ 6155.452019]  [<ffffffff81670059>] schedule+0x29/0x70
[ 6155.452019]  [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
[ 6155.452019]  [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
[ 6155.452019]  [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
[ 6155.452019]  [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
[ 6155.452019]  [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 6155.452019]  [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
[ 6155.452019]  [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
[ 6155.452019]  [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
[ 6155.452019]  [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
[ 6155.452019]  [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
[ 6155.452019]  [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 6155.452019]  [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
[ 6155.452019]  [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
[ 6155.452019]  [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
[ 6155.452019]  [<ffffffff81676187>] notifier_call_chain+0x67/0x150
[ 6155.452019]  [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
[ 6155.452019]  [<ffffffff81070470>] __cpu_notify+0x20/0x40
[ 6155.452019]  [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
[ 6155.452019]  [<ffffffff81655182>] _cpu_down+0x262/0x2e0
[ 6155.452019]  [<ffffffff81655236>] cpu_down+0x36/0x50
[ 6155.452019]  [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
[ 6155.452019]  [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
[ 6155.452019]  [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
[ 6155.452019]  [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
[ 6155.452019]  [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
[ 6155.452019]  [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
[ 6155.452019]  [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
[ 6155.452019]  [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
[ 6155.452019]  [<ffffffff81090589>] process_one_work+0x219/0x680
[ 6155.452019]  [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
[ 6155.452019]  [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
[ 6155.452019]  [<ffffffff810923be>] worker_thread+0x12e/0x320
[ 6155.452019]  [<ffffffff81092290>] ? manage_workers+0x110/0x110
[ 6155.452019]  [<ffffffff81098396>] kthread+0xc6/0xd0
[ 6155.452019]  [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
[ 6155.452019]  [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
[ 6155.452019]  [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
[ 6155.452019]  [<ffffffff8167c4c0>] ? gs_change+0x13/0x13


Tang Chen (2):
  Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  Do not change worker's running cpu in cmci_rediscover().

 arch/x86/kernel/cpu/mcheck/mce_intel.c |   34 +++++++++++++++++--------------
 1 files changed, 19 insertions(+), 15 deletions(-)


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-19  5:45 [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover() Tang Chen
@ 2012-10-19  5:45 ` Tang Chen
  2012-10-19 14:07   ` Greg KH
  2012-10-19 16:40   ` Borislav Petkov
  2012-10-19  5:45 ` [PATCH v2 2/2] Do not change worker's running cpu " Tang Chen
  2012-10-19  7:21 ` [PATCH v2 0/2] " Tang Chen
  2 siblings, 2 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-19  5:45 UTC (permalink / raw)
  To: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel

cmci_rediscover() is only called by the CPU_POST_DEAD event handler,
which means the corresponding cpu has already dead. As a result, it
won't be accessed in the for_each_online_cpu loop.
So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 38e49bc..481d152 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -180,8 +180,8 @@ void cmci_rediscover(int dying)
 	cpumask_copy(old, &current->cpus_allowed);
 
 	for_each_online_cpu(cpu) {
-		if (cpu == dying)
-			continue;
+		WARN_ON_ONCE(cpu == dying);
+
 		if (set_cpus_allowed_ptr(current, cpumask_of(cpu)))
 			continue;
 		/* Recheck banks in case CPUs don't all have the same */
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-19  5:45 [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover() Tang Chen
  2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
@ 2012-10-19  5:45 ` Tang Chen
  2012-10-19 16:42   ` Borislav Petkov
  2012-10-19  7:21 ` [PATCH v2 0/2] " Tang Chen
  2 siblings, 1 reply; 24+ messages in thread
From: Tang Chen @ 2012-10-19  5:45 UTC (permalink / raw)
  To: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel

cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
running cpu, and migrate itself to the dest cpu. But worker processes are not
allowed to be migrated. If current is a worker, the worker will be migrated to
another cpu, but the corresponding  worker_pool is still on the original cpu.

In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
BUG_ON(rq != this_rq());

This will cause the kernel panic.

This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
jobs onto all the other cpus using system_wq. This could bring some delay for
the jobs.

Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce_intel.c |   30 +++++++++++++++++-------------
 1 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 481d152..d8f0da6 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -163,34 +163,38 @@ void cmci_clear(void)
 	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
 }
 
+static long cmci_rediscover_work_func(void *arg)
+{
+	int banks;
+
+	/* Recheck banks in case CPUs don't all have the same */
+	if (cmci_supported(&banks))
+		cmci_discover(banks, 0);
+
+	return 0;
+}
+
 /*
  * After a CPU went down cycle through all the others and rediscover
  * Must run in process context.
  */
 void cmci_rediscover(int dying)
 {
-	int banks;
-	int cpu;
-	cpumask_var_t old;
+	int cpu, banks;
 
 	if (!cmci_supported(&banks))
 		return;
-	if (!alloc_cpumask_var(&old, GFP_KERNEL))
-		return;
-	cpumask_copy(old, &current->cpus_allowed);
 
 	for_each_online_cpu(cpu) {
 		WARN_ON_ONCE(cpu == dying);
 
-		if (set_cpus_allowed_ptr(current, cpumask_of(cpu)))
+		if (cpu == smp_processor_id()) {
+			cmci_rediscover_work_func(NULL);
 			continue;
-		/* Recheck banks in case CPUs don't all have the same */
-		if (cmci_supported(&banks))
-			cmci_discover(banks, 0);
-	}
+		}
 
-	set_cpus_allowed_ptr(current, old);
-	free_cpumask_var(old);
+		work_on_cpu(cpu, cmci_rediscover_work_func, NULL);
+	}
 }
 
 /*
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-19  5:45 [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover() Tang Chen
  2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
  2012-10-19  5:45 ` [PATCH v2 2/2] Do not change worker's running cpu " Tang Chen
@ 2012-10-19  7:21 ` Tang Chen
  2 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-19  7:21 UTC (permalink / raw)
  To: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel, Tejun Heo

Hi,

CC to Tejun Heo, and add change log:

Changes from v1 to v2:

   - split one single patch into two.
   - use WARN_ON_ONCE() but not BUG_ON(), as Tejun said.

Thanks. :)

On 10/19/2012 01:45 PM, Tang Chen wrote:
> 1. cmci_rediscover() is only called by the CPU_POST_DEAD event handler, which
> means the corresponding cpu has already dead. As a result, it won't be accessed
> in the for_each_online_cpu loop.
> So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().
>
> 2. cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
> running cpu, and migrate itself to the dest cpu. But worker processes are not
> allowed to be migrated. If current is a worker, the worker will be migrated to
> another cpu, but the corresponding  worker_pool is still on the original cpu.
>
> In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
> BUG_ON(rq != this_rq());
>
> This will cause the kernel panic.
>
> This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
> jobs onto all the other cpus using system_wq. This could bring some delay for
> the jobs.
>
> The following is call trace.
>
> [ 6155.451107] ------------[ cut here ]------------
> [ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
> ......
> [ 6155.452019] RIP: 0010:[<ffffffff810add15>]  [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
> ......
> [ 6155.452019] Call Trace:
> [ 6155.452019]  [<ffffffff8166fc14>] __schedule+0x764/0x880
> [ 6155.452019]  [<ffffffff81670059>] schedule+0x29/0x70
> [ 6155.452019]  [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
> [ 6155.452019]  [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
> [ 6155.452019]  [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
> [ 6155.452019]  [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
> [ 6155.452019]  [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
> [ 6155.452019]  [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
> [ 6155.452019]  [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
> [ 6155.452019]  [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
> [ 6155.452019]  [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
> [ 6155.452019]  [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
> [ 6155.452019]  [<ffffffff810a6ab8>] ? complete+0x28/0x60
> [ 6155.452019]  [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
> [ 6155.452019]  [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
> [ 6155.452019]  [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
> [ 6155.452019]  [<ffffffff81676187>] notifier_call_chain+0x67/0x150
> [ 6155.452019]  [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
> [ 6155.452019]  [<ffffffff81070470>] __cpu_notify+0x20/0x40
> [ 6155.452019]  [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
> [ 6155.452019]  [<ffffffff81655182>] _cpu_down+0x262/0x2e0
> [ 6155.452019]  [<ffffffff81655236>] cpu_down+0x36/0x50
> [ 6155.452019]  [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
> [ 6155.452019]  [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
> [ 6155.452019]  [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
> [ 6155.452019]  [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
> [ 6155.452019]  [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
> [ 6155.452019]  [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
> [ 6155.452019]  [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
> [ 6155.452019]  [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
> [ 6155.452019]  [<ffffffff81090589>] process_one_work+0x219/0x680
> [ 6155.452019]  [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
> [ 6155.452019]  [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
> [ 6155.452019]  [<ffffffff810923be>] worker_thread+0x12e/0x320
> [ 6155.452019]  [<ffffffff81092290>] ? manage_workers+0x110/0x110
> [ 6155.452019]  [<ffffffff81098396>] kthread+0xc6/0xd0
> [ 6155.452019]  [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
> [ 6155.452019]  [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
> [ 6155.452019]  [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
> [ 6155.452019]  [<ffffffff8167c4c0>] ? gs_change+0x13/0x13
>
>
> Tang Chen (2):
>    Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
>    Do not change worker's running cpu in cmci_rediscover().
>
>   arch/x86/kernel/cpu/mcheck/mce_intel.c |   34 +++++++++++++++++--------------
>   1 files changed, 19 insertions(+), 15 deletions(-)
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
@ 2012-10-19 14:07   ` Greg KH
  2012-10-19 16:40   ` Borislav Petkov
  1 sibling, 0 replies; 24+ messages in thread
From: Greg KH @ 2012-10-19 14:07 UTC (permalink / raw)
  To: Tang Chen
  Cc: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel

On Fri, Oct 19, 2012 at 01:45:27PM +0800, Tang Chen wrote:
> cmci_rediscover() is only called by the CPU_POST_DEAD event handler,
> which means the corresponding cpu has already dead. As a result, it
> won't be accessed in the for_each_online_cpu loop.
> So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
  2012-10-19 14:07   ` Greg KH
@ 2012-10-19 16:40   ` Borislav Petkov
  2012-10-22  2:10     ` Tang Chen
  1 sibling, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2012-10-19 16:40 UTC (permalink / raw)
  To: Tang Chen
  Cc: stable, tony.luck, bp, tglx, mingo, hpa, miaox, laijs, wency,
	x86, linux-edac, linux-kernel, Borislav Petkov

On Fri, Oct 19, 2012 at 01:45:27PM +0800, Tang Chen wrote:
> cmci_rediscover() is only called by the CPU_POST_DEAD event handler,
> which means the corresponding cpu has already dead. As a result, it
> won't be accessed in the for_each_online_cpu loop.
> So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_intel.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> index 38e49bc..481d152 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
> @@ -180,8 +180,8 @@ void cmci_rediscover(int dying)
>  	cpumask_copy(old, &current->cpus_allowed);
>  
>  	for_each_online_cpu(cpu) {
> -		if (cpu == dying)
> -			continue;
> +		WARN_ON_ONCE(cpu == dying);

Ok, I don't understand that:

we want to warn that the rediscovering is happening on a dying cpu?? And
before that, we simply jumped over it and didn't do the rediscovering
there? Why should we warn at all?

Huh?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-19  5:45 ` [PATCH v2 2/2] Do not change worker's running cpu " Tang Chen
@ 2012-10-19 16:42   ` Borislav Petkov
  2012-10-19 17:21     ` Luck, Tony
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2012-10-19 16:42 UTC (permalink / raw)
  To: Tang Chen, Tony Luck
  Cc: stable, tglx, mingo, hpa, miaox, laijs, wency, x86, linux-edac,
	linux-kernel, Borislav Petkov

On Fri, Oct 19, 2012 at 01:45:28PM +0800, Tang Chen wrote:
> cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
> running cpu, and migrate itself to the dest cpu. But worker processes are not
> allowed to be migrated. If current is a worker, the worker will be migrated to
> another cpu, but the corresponding  worker_pool is still on the original cpu.
> 
> In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
> BUG_ON(rq != this_rq());
> 
> This will cause the kernel panic.
> 
> This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
> jobs onto all the other cpus using system_wq. This could bring some delay for
> the jobs.
> 
> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

I guess this is ok. Tony?

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-19 16:42   ` Borislav Petkov
@ 2012-10-19 17:21     ` Luck, Tony
  2012-10-22  3:33       ` Tang Chen
  0 siblings, 1 reply; 24+ messages in thread
From: Luck, Tony @ 2012-10-19 17:21 UTC (permalink / raw)
  To: Borislav Petkov, Tang Chen
  Cc: stable, tglx, mingo, hpa, miaox, laijs, wency, x86, linux-edac,
	linux-kernel, Borislav Petkov

> In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
> BUG_ON(rq != this_rq());

Logically this looks OK - what is the test case to trigger this?  I've done a moderate
amount of testing of cpu online/offline while injecting corrected errors (when testing
the CMCI storm patches) ... but didn't see this problem.

-Tony

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-19 16:40   ` Borislav Petkov
@ 2012-10-22  2:10     ` Tang Chen
  2012-10-22 10:14       ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: Tang Chen @ 2012-10-22  2:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, tglx, mingo, hpa, miaox, laijs, wency, x86,
	linux-edac, linux-kernel, Borislav Petkov

On 10/20/2012 12:40 AM, Borislav Petkov wrote:
> On Fri, Oct 19, 2012 at 01:45:27PM +0800, Tang Chen wrote:
>> cmci_rediscover() is only called by the CPU_POST_DEAD event handler,
>> which means the corresponding cpu has already dead. As a result, it
>> won't be accessed in the for_each_online_cpu loop.
>> So, we could change the if(cpu == dying) statement into a WARN_ON_ONCE().
>>
>> Signed-off-by: Tang Chen<tangchen@cn.fujitsu.com>
>> ---
>>   arch/x86/kernel/cpu/mcheck/mce_intel.c |    4 ++--
>>   1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
>> index 38e49bc..481d152 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
>> @@ -180,8 +180,8 @@ void cmci_rediscover(int dying)
>>   	cpumask_copy(old,&current->cpus_allowed);
>>
>>   	for_each_online_cpu(cpu) {
>> -		if (cpu == dying)
>> -			continue;
>> +		WARN_ON_ONCE(cpu == dying);
>
> Ok, I don't understand that:
>
> we want to warn that the rediscovering is happening on a dying cpu?? And
> before that, we simply jumped over it and didn't do the rediscovering
> there? Why should we warn at all?

Hi Borislav,

As far as I know, cmci_rediscover() is only called in
mce_cpu_callback() to handle CPU_POST_DEAD event.

2362         if (action == CPU_POST_DEAD) {
2363                 /* intentionally ignoring frozen here */
2364                 cmci_rediscover(cpu);
2365         }

I didn't find anywhere else using this function. So I think the cpu
should be dead already when this function is called.

I don't why before we just jumped over it. But I think if we have an
online cpu == dying here, it must be wrong. So I think we should warn
it, not just jump over it.

Thanks. :)


>
> Huh?
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-19 17:21     ` Luck, Tony
@ 2012-10-22  3:33       ` Tang Chen
  2012-10-22 10:18         ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: Tang Chen @ 2012-10-22  3:33 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, tglx, mingo, hpa, miaox, laijs, wency, x86,
	linux-edac, linux-kernel, Borislav Petkov

On 10/20/2012 01:21 AM, Luck, Tony wrote:
>> In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
>> BUG_ON(rq != this_rq());
>
> Logically this looks OK - what is the test case to trigger this?  I've done a moderate
> amount of testing of cpu online/offline while injecting corrected errors (when testing
> the CMCI storm patches) ... but didn't see this problem.

Hi Tony, Borislav,

Here is my case.

I have 2 nodes, node0 and node1. node1 could be hotpluged.
node0 has cpu0 ~ cpu15, node1 has cpu16 ~ cpu31.

I online all the cpus on node1, and hot-remove node1 directly.

When this problem is triggered, current is a work thread.

For example: cpu20 is dying. current is on cpu21, it migrates
itself to cpu22.

Assume current is process1, and it is a work thread.

        cpu21                        cpu22
p1:
....
cmci_rediscover()
|-set_cpus_allowed_ptr()
   |-stop_one_cpu()
     |-create a work to excute migration_cpu_stop()
       |-wait_for_completion()
         |-wait_for_common()
           |-might_sleep()
Here, p1 gives up cpu21.

The work starts:
migration_cpu_stop()
|-migrate p1 to cpu22

                                On cpu22, p1 wakes up:
                                p1:
                                In wait_for_common()
                                   |-do_wait_for_common()
                                     |-schedule_timeout()
                                       |-schedule()
                                         |-__schedule()
                                           |-try_to_wake_up_local()
                                             |-wq_worker_sleeping()
                                             |-BUG_ON(rq != this_rq())

On cpu22, wq_worker_sleeping() uses p1's worker_pool to find a worker
to go on to execute p1. But p1's worker_pool is on cpu21, and p1 is now
on cpu22. So the BUG_ON(rq != this_rq()) is triggered.

Thanks. :)

>
> -Tony
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-22  2:10     ` Tang Chen
@ 2012-10-22 10:14       ` Borislav Petkov
  2012-10-23  1:35         ` Tang Chen
  2012-10-23  2:55         ` Tang Chen
  0 siblings, 2 replies; 24+ messages in thread
From: Borislav Petkov @ 2012-10-22 10:14 UTC (permalink / raw)
  To: Tang Chen
  Cc: Borislav Petkov, tony.luck, tglx, mingo, hpa, miaox, laijs,
	wency, x86, linux-edac, linux-kernel

On Mon, Oct 22, 2012 at 10:10:24AM +0800, Tang Chen wrote:
> I don't why before we just jumped over it. But I think if we have an
> online cpu == dying here, it must be wrong. So I think we should warn
> it, not just jump over it.

Why do we need to warn? What good would that bring us?

AFAICT, the check in cmci_rediscover is there to make sure we absolutely
don't rediscover on the dying cpu. I think it is a safety precaution in
concurrency scenarios between cpu hotplug and mce code.

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-22  3:33       ` Tang Chen
@ 2012-10-22 10:18         ` Borislav Petkov
  2012-10-23  1:30           ` Tang Chen
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2012-10-22 10:18 UTC (permalink / raw)
  To: Tang Chen
  Cc: Luck, Tony, Borislav Petkov, tglx, mingo, hpa, miaox, laijs,
	wency, x86, linux-edac, linux-kernel

On Mon, Oct 22, 2012 at 11:33:16AM +0800, Tang Chen wrote:
> I have 2 nodes, node0 and node1. node1 could be hotpluged.
> node0 has cpu0 ~ cpu15, node1 has cpu16 ~ cpu31.
> 
> I online all the cpus on node1, and hot-remove node1 directly.

Hold on, I need to ask here: you soft-online all cores on node1 and
*then* you *hot* *remove* it? So with all cores online you physically
take out the processor from the socket? Am I reading this correctly?

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 2/2] Do not change worker's running cpu in cmci_rediscover().
  2012-10-22 10:18         ` Borislav Petkov
@ 2012-10-23  1:30           ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-23  1:30 UTC (permalink / raw)
  To: Borislav Petkov, Luck, Tony, Borislav Petkov, tglx, mingo, hpa,
	miaox, laijs, wency, x86, linux-edac, linux-kernel

On 10/22/2012 06:18 PM, Borislav Petkov wrote:
> On Mon, Oct 22, 2012 at 11:33:16AM +0800, Tang Chen wrote:
>> I have 2 nodes, node0 and node1. node1 could be hotpluged.
>> node0 has cpu0 ~ cpu15, node1 has cpu16 ~ cpu31.
>>
>> I online all the cpus on node1, and hot-remove node1 directly.
>
> Hold on, I need to ask here: you soft-online all cores on node1 and
> *then* you *hot* *remove* it? So with all cores online you physically
> take out the processor from the socket? Am I reading this correctly?

Hi Borislav,

No, it's not like that. I'm sorry to make you confused.

Firstly, let me do a little explanation here. :)

I was doing ACPI based hotplug. In a container device, it contains
cpus and memories and so on. I didn't physically remove cpus from
hardware. I emulated a SCI with my own test module, and the
container_notify_cb() was called, and it recursively remove all the
sub-devices.

The "remove" here doesn't mean "taking it away". And of course, when
the "remove" is done, you can physically take the whole system board
away.

Please refer to this url:
http://www.spinics.net/lists/linux-pci/msg17651.html
This is the module we are doing the SCI emulation.

In summary, the hotplug process could be like the following:

hot-add(SCI) -> online(softly) -> device working -> offline(softly) -> 
hot-remove(SCI)


Secondly, in kernel, before a container device is removed, all its
sub-devices will be offlined and removed first. So I have all the cpus
online, and hot-remove a container(which I think is a node, maybe not)
directly. This operation has no problem, I think.

Maybe I should not say container as a node. :)

And again, sorry for the confusion.

Thanks. :)


>
> Thanks.
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-22 10:14       ` Borislav Petkov
@ 2012-10-23  1:35         ` Tang Chen
  2012-10-23  2:55         ` Tang Chen
  1 sibling, 0 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-23  1:35 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, tony.luck, tglx, mingo, hpa,
	miaox, laijs, wency, x86, linux-edac, linux-kernel

On 10/22/2012 06:14 PM, Borislav Petkov wrote:
> On Mon, Oct 22, 2012 at 10:10:24AM +0800, Tang Chen wrote:
>> I don't why before we just jumped over it. But I think if we have an
>> online cpu == dying here, it must be wrong. So I think we should warn
>> it, not just jump over it.
>
> Why do we need to warn? What good would that bring us?
>
> AFAICT, the check in cmci_rediscover is there to make sure we absolutely
> don't rediscover on the dying cpu. I think it is a safety precaution in
> concurrency scenarios between cpu hotplug and mce code.

Hi Borislav,

As I said, cmci_rediscover() is only used to handle CPU_POST_DEAD event.
And is uses for_each_online_cpu to iterate all online cpus. If we get a
online cpu == dying, I think we are in a wrong situation. I think at
least, we should give a warning.

Thanks. :)

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-22 10:14       ` Borislav Petkov
  2012-10-23  1:35         ` Tang Chen
@ 2012-10-23  2:55         ` Tang Chen
  2012-10-23  9:52           ` Borislav Petkov
  1 sibling, 1 reply; 24+ messages in thread
From: Tang Chen @ 2012-10-23  2:55 UTC (permalink / raw)
  To: Borislav Petkov, Borislav Petkov, tony.luck, tglx, mingo, hpa,
	miaox, laijs, wency, x86, linux-edac, linux-kernel

On 10/22/2012 06:14 PM, Borislav Petkov wrote:
> On Mon, Oct 22, 2012 at 10:10:24AM +0800, Tang Chen wrote:
>> I don't why before we just jumped over it. But I think if we have an
>> online cpu == dying here, it must be wrong. So I think we should warn
>> it, not just jump over it.
>
> Why do we need to warn? What good would that bring us?
>
> AFAICT, the check in cmci_rediscover is there to make sure we absolutely
> don't rediscover on the dying cpu. I think it is a safety precaution in
> concurrency scenarios between cpu hotplug and mce code.

Well, I see. I dropped the if statement. :)

So, how about warn once, and continue:
	if (cpu == dying) {
		WARN_ON_ONCE(cpu == dying);
		continue;
	}

or, use BUG_ON() instead ?

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23  2:55         ` Tang Chen
@ 2012-10-23  9:52           ` Borislav Petkov
  2012-10-23 10:17             ` Miao Xie
  2012-10-23 11:30             ` Tang Chen
  0 siblings, 2 replies; 24+ messages in thread
From: Borislav Petkov @ 2012-10-23  9:52 UTC (permalink / raw)
  To: Tang Chen
  Cc: tony.luck, tglx, mingo, hpa, miaox, laijs, wency, x86,
	linux-edac, linux-kernel

On Tue, Oct 23, 2012 at 10:55:13AM +0800, Tang Chen wrote:
> So, how about warn once, and continue:
> 	if (cpu == dying) {
> 		WARN_ON_ONCE(cpu == dying);
> 		continue;
> 	}
> 
> or, use BUG_ON() instead ?

Let me ask you again, but I want you to think real hard this time:

"Why do we need to warn? What good would that bring us?"

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23  9:52           ` Borislav Petkov
@ 2012-10-23 10:17             ` Miao Xie
  2012-10-23 10:20               ` Borislav Petkov
  2012-10-23 11:30             ` Tang Chen
  1 sibling, 1 reply; 24+ messages in thread
From: Miao Xie @ 2012-10-23 10:17 UTC (permalink / raw)
  To: Borislav Petkov, Tang Chen, tony.luck, tglx, mingo, hpa, laijs,
	wency, x86, linux-edac, linux-kernel

On tue, 23 Oct 2012 11:52:34 +0200, Borislav Petkov wrote:
> On Tue, Oct 23, 2012 at 10:55:13AM +0800, Tang Chen wrote:
>> So, how about warn once, and continue:
>> 	if (cpu == dying) {
>> 		WARN_ON_ONCE(cpu == dying);
>> 		continue;
>> 	}
>>
>> or, use BUG_ON() instead ?
> 
> Let me ask you again, but I want you to think real hard this time:
> 
> "Why do we need to warn? What good would that bring us?"
> 

This function is called after a cpu is offline, in other words, it is
impossible that the cpu is still in cpu_online_mask. otherwise there is
something wrong in the code.

Thanks
Miao

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 10:17             ` Miao Xie
@ 2012-10-23 10:20               ` Borislav Petkov
  2012-10-23 10:34                 ` Miao Xie
  0 siblings, 1 reply; 24+ messages in thread
From: Borislav Petkov @ 2012-10-23 10:20 UTC (permalink / raw)
  To: Miao Xie
  Cc: Tang Chen, tony.luck, tglx, mingo, hpa, laijs, wency, x86,
	linux-edac, linux-kernel

On Tue, Oct 23, 2012 at 06:17:31PM +0800, Miao Xie wrote:
> This function is called after a cpu is offline, in other words, it is
> impossible that the cpu is still in cpu_online_mask. otherwise there
> is something wrong in the code.

And?

Are you answering my question or explaining the code just for the fun of
it?

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 10:20               ` Borislav Petkov
@ 2012-10-23 10:34                 ` Miao Xie
  2012-10-23 13:14                   ` Borislav Petkov
  0 siblings, 1 reply; 24+ messages in thread
From: Miao Xie @ 2012-10-23 10:34 UTC (permalink / raw)
  To: Borislav Petkov, Tang Chen, tony.luck, tglx, mingo, hpa, laijs,
	wency, x86, linux-edac, linux-kernel

On tue, 23 Oct 2012 12:20:08 +0200, Borislav Petkov wrote:
> On Tue, Oct 23, 2012 at 06:17:31PM +0800, Miao Xie wrote:
>> This function is called after a cpu is offline, in other words, it is
>> impossible that the cpu is still in cpu_online_mask. otherwise there
>> is something wrong in the code.
> 
> And?

So we add this WARN_ON_ONCE(), it can tell the developers that there is
something wrong in the code if it is triggered.

Thanks
Miao

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23  9:52           ` Borislav Petkov
  2012-10-23 10:17             ` Miao Xie
@ 2012-10-23 11:30             ` Tang Chen
  2012-10-23 14:17               ` Borislav Petkov
  2012-10-23 16:16               ` Luck, Tony
  1 sibling, 2 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-23 11:30 UTC (permalink / raw)
  To: Borislav Petkov, tony.luck, tglx, mingo, hpa, miaox, laijs,
	wency, x86, linux-edac, linux-kernel

On 10/23/2012 05:52 PM, Borislav Petkov wrote:
> On Tue, Oct 23, 2012 at 10:55:13AM +0800, Tang Chen wrote:
>> So, how about warn once, and continue:
>> 	if (cpu == dying) {
>> 		WARN_ON_ONCE(cpu == dying);
>> 		continue;
>> 	}
>>
>> or, use BUG_ON() instead ?
>
> Let me ask you again, but I want you to think real hard this time:
>
> "Why do we need to warn? What good would that bring us?"

Hi,

First of all, I do think I was answering your question. As I said
before, if an online cpu == dying here, there must be something wrong.
Am I right here ?

If so, I think the "good" is obvious. If we don't output anything when
an online cpu == dying, nobody will know this happens. The kernel is in
wrong state, but nobody knows that, I don't see any good.

Actually, I used BUG_ON() in my v1 patch. So I dropped the if statement.
But Tejun asked me to use WARN_ON_ONCE(). And I forgot to add the if
statement.
Please refer to https://lkml.org/lkml/2012/10/16/528

And again, the "good" is inform user the kernel is in wrong state.

Thanks. :)


>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 10:34                 ` Miao Xie
@ 2012-10-23 13:14                   ` Borislav Petkov
  0 siblings, 0 replies; 24+ messages in thread
From: Borislav Petkov @ 2012-10-23 13:14 UTC (permalink / raw)
  To: Miao Xie
  Cc: Tang Chen, tony.luck, tglx, mingo, hpa, laijs, wency, x86,
	linux-edac, linux-kernel

On Tue, Oct 23, 2012 at 06:34:33PM +0800, Miao Xie wrote:
> So we add this WARN_ON_ONCE(), it can tell the developers that there
> is something wrong in the code if it is triggered.

First of all, the WARN_ON_ONCE will fire only once during system
lifetime (well, doh, of course) which diminishes debuggability
significantly and then, the only other place which deals with
CPU_POST_DEAD is kernel/stop_machine.c:cpu_stop_cpu_callback.

So, just to sum up and finish this fruitless discussion:
cmci_rediscover() correctly ignores the dying cpu and there's
*absolutely* no need to warn.

If you still think there is, you have to come up with a concrete example
and a way for others to reproduce it. Then we can talk.

End of story.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 11:30             ` Tang Chen
@ 2012-10-23 14:17               ` Borislav Petkov
  2012-10-23 16:16               ` Luck, Tony
  1 sibling, 0 replies; 24+ messages in thread
From: Borislav Petkov @ 2012-10-23 14:17 UTC (permalink / raw)
  To: Tang Chen
  Cc: tony.luck, tglx, mingo, hpa, miaox, laijs, wency, x86,
	linux-edac, linux-kernel

On Tue, Oct 23, 2012 at 07:30:21PM +0800, Tang Chen wrote:
> First of all, I do think I was answering your question. As I said
> before, if an online cpu == dying here, there must be something wrong.
> Am I right here ?

Please read the code. We're skipping the cpu == dying case.

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* RE: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 11:30             ` Tang Chen
  2012-10-23 14:17               ` Borislav Petkov
@ 2012-10-23 16:16               ` Luck, Tony
  2012-10-24  1:31                 ` Tang Chen
  1 sibling, 1 reply; 24+ messages in thread
From: Luck, Tony @ 2012-10-23 16:16 UTC (permalink / raw)
  To: Tang Chen, Borislav Petkov, tglx, mingo, hpa, miaox, laijs,
	wency, x86, linux-edac, linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 1265 bytes --]

> First of all, I do think I was answering your question. As I said
> before, if an online cpu == dying here, there must be something wrong.
> Am I right here ?

Yes - but there is a fuzzy line over where it is good to check for "something wrong"
or whether to trust that the caller of the function knew what they were doing.

For example we trust that "dying" is a valid cpu number.  If we were
super-paranoid that someone might change the code and call us with a
bad argument, we might add:

	BUG_ON(dying < 0 || dying >= MAX_NR_CPUS);

This would certainly help debug the case if someone did make a bogus
change ... but I think it is clear that this test is way past the fuzzy line and
into pointless.

Back to the case in question: do we think there is a credible case where
the "dying" cpu can show up in our "for_each_cpu_online()" loop? The
original author of the code was worried enough to make a test, but thought
that the appropriate action was to silently skip it. You want to add a WARN_ON,
which will cause users who read the console logs to worry, but that most users
will never see.

-Tony
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() in cmci_rediscover().
  2012-10-23 16:16               ` Luck, Tony
@ 2012-10-24  1:31                 ` Tang Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tang Chen @ 2012-10-24  1:31 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, tglx, mingo, hpa, miaox, laijs, wency, x86,
	linux-edac, linux-kernel

Hi Luck, Borislav,

OK, since you all think it is not necessary, I think I will drop patch1.
And thanks for your comments. :)

So, how about patch2 ?
If you need more detail, please tell me. Thanks. :)

On 10/24/2012 12:16 AM, Luck, Tony wrote:
>> First of all, I do think I was answering your question. As I said
>> before, if an online cpu == dying here, there must be something wrong.
>> Am I right here ?
>
> Yes - but there is a fuzzy line over where it is good to check for "something wrong"
> or whether to trust that the caller of the function knew what they were doing.
>
> For example we trust that "dying" is a valid cpu number.  If we were
> super-paranoid that someone might change the code and call us with a
> bad argument, we might add:
>
> 	BUG_ON(dying<  0 || dying>= MAX_NR_CPUS);
>
> This would certainly help debug the case if someone did make a bogus
> change ... but I think it is clear that this test is way past the fuzzy line and
> into pointless.
>
> Back to the case in question: do we think there is a credible case where
> the "dying" cpu can show up in our "for_each_cpu_online()" loop? The
> original author of the code was worried enough to make a test, but thought
> that the appropriate action was to silently skip it. You want to add a WARN_ON,
> which will cause users who read the console logs to worry, but that most users
> will never see.
>
> -Tony



^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2012-10-24  2:00 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-19  5:45 [PATCH v2 0/2] Do not change worker's running cpu in cmci_rediscover() Tang Chen
2012-10-19  5:45 ` [PATCH v2 1/2] Replace if statement with WARN_ON_ONCE() " Tang Chen
2012-10-19 14:07   ` Greg KH
2012-10-19 16:40   ` Borislav Petkov
2012-10-22  2:10     ` Tang Chen
2012-10-22 10:14       ` Borislav Petkov
2012-10-23  1:35         ` Tang Chen
2012-10-23  2:55         ` Tang Chen
2012-10-23  9:52           ` Borislav Petkov
2012-10-23 10:17             ` Miao Xie
2012-10-23 10:20               ` Borislav Petkov
2012-10-23 10:34                 ` Miao Xie
2012-10-23 13:14                   ` Borislav Petkov
2012-10-23 11:30             ` Tang Chen
2012-10-23 14:17               ` Borislav Petkov
2012-10-23 16:16               ` Luck, Tony
2012-10-24  1:31                 ` Tang Chen
2012-10-19  5:45 ` [PATCH v2 2/2] Do not change worker's running cpu " Tang Chen
2012-10-19 16:42   ` Borislav Petkov
2012-10-19 17:21     ` Luck, Tony
2012-10-22  3:33       ` Tang Chen
2012-10-22 10:18         ` Borislav Petkov
2012-10-23  1:30           ` Tang Chen
2012-10-19  7:21 ` [PATCH v2 0/2] " Tang Chen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.