All of lore.kernel.org
* [PATCH] x86: Don't print number of MCE banks for every CPU
@ 2009-10-15 21:21 Roland Dreier
  2009-10-16  7:20 ` Ingo Molnar
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-15 21:21 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin; +Cc: linux-kernel, x86

The MCE initialization code explicitly says it doesn't handle asymmetric
configurations where different CPUs support different numbers of MCE
banks, and it prints a big warning in that case.  Therefore, printing
the "mce: CPU supports <x> MCE banks" message into the kernel log for
every CPU is pure redundancy that clutters the log significantly for
systems with lots of CPUs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
 
 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING
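
The effect of the `if (!banks)` guard can be modelled in a small userspace program. This is a sketch, not kernel code: `model_cap_init()`, `simulate_boot()` and the `cap` argument (standing in for the `rdmsrl(MSR_IA32_MCG_CAP, cap)` read) are hypothetical, and it assumes, as the asymmetric-configuration warning implies, that the real function later stores `b` into the global `banks`.

```c
#include <assert.h>
#include <stdio.h>

#define MCG_BANKCNT_MASK 0xffULL  /* low 8 bits of IA32_MCG_CAP: bank count */

static int banks;          /* models the global shared by all CPUs */
static int lines_printed;  /* counts messages the guard lets through */

/* Models the patched logic for one CPU; cap stands in for the MSR read. */
static void model_cap_init(unsigned long long cap)
{
	int b = (int)(cap & MCG_BANKCNT_MASK);

	if (!banks) {
		printf("mce: CPU supports %d MCE banks\n", b);
		lines_printed++;
	}
	banks = b;  /* assumption: the real function stores b into banks */
}

/* Boots ncpus identical CPUs and returns how many messages were printed. */
static int simulate_boot(unsigned long long cap, int ncpus)
{
	assert(ncpus >= 0);
	banks = 0;
	lines_printed = 0;
	for (int cpu = 0; cpu < ncpus; cpu++)
		model_cap_init(cap);
	return lines_printed;
}
```

With 16 CPUs reporting 9 banks, `simulate_boot(9, 16)` prints one line. If every CPU reports 0 banks, `banks` never becomes non-zero, so `simulate_boot(0, 16)` prints 16 lines.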

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
  2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
@ 2009-10-16  7:20 ` Ingo Molnar
  2009-10-16  7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
  2009-10-27 19:42 ` [PATCH] " Mike Travis
  2 siblings, 0 replies; 17+ messages in thread
From: Ingo Molnar @ 2009-10-16  7:20 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86


* Roland Dreier <rdreier@cisco.com> wrote:

> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case.  Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)

Applied, thanks Roland!

	Ingo

* [tip:x86/urgent] x86: Don't print number of MCE banks for every CPU
  2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
  2009-10-16  7:20 ` Ingo Molnar
@ 2009-10-16  7:22 ` tip-bot for Roland Dreier
  2009-10-27 19:42 ` [PATCH] " Mike Travis
  2 siblings, 0 replies; 17+ messages in thread
From: tip-bot for Roland Dreier @ 2009-10-16  7:22 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, rdreier, rolandd, tglx, mingo

Commit-ID:  93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Gitweb:     http://git.kernel.org/tip/93ae5012a79b11e7fc855b52c7ce1e16fe1540b0
Author:     Roland Dreier <rdreier@cisco.com>
AuthorDate: Thu, 15 Oct 2009 14:21:14 -0700
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 16 Oct 2009 09:20:03 +0200

x86: Don't print number of MCE banks for every CPU

The MCE initialization code explicitly says it doesn't handle
asymmetric configurations where different CPUs support different
numbers of MCE banks, and it prints a big warning in that case.

Therefore, printing the "mce: CPU supports <x> MCE banks"
message into the kernel log for every CPU is pure redundancy
that clutters the log significantly for systems with lots of
CPUs.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <adaeip473qt.fsf@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index b1598a9..721a77c 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
-	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
+	if (!banks)
+		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
 
 	if (b > MAX_NR_BANKS) {
 		printk(KERN_WARNING

* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
  2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
  2009-10-16  7:20 ` Ingo Molnar
  2009-10-16  7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
@ 2009-10-27 19:42 ` Mike Travis
  2009-10-27 20:53   ` Mike Travis
  2 siblings, 1 reply; 17+ messages in thread
From: Mike Travis @ 2009-10-27 19:42 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Hi Roland,

I've found that I'm getting one of these lines for every cpu:

mce: CPU supports 0 MCE banks

Regards,
Mike

Roland Dreier wrote:
> The MCE initialization code explicitly says it doesn't handle asymmetric
> configurations where different CPUs support different numbers of MCE
> banks, and it prints a big warning in that case.  Therefore, printing
> the "mce: CPU supports <x> MCE banks" message into the kernel log for
> every CPU is pure redundancy that clutters the log significantly for
> systems with lots of CPUs.
> 
> Signed-off-by: Roland Dreier <rolandd@cisco.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index b1598a9..721a77c 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
>  	rdmsrl(MSR_IA32_MCG_CAP, cap);
>  
>  	b = cap & MCG_BANKCNT_MASK;
> -	printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
> +	if (!banks)
> +		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>  
>  	if (b > MAX_NR_BANKS) {
>  		printk(KERN_WARNING
> 

* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
  2009-10-27 19:42 ` [PATCH] " Mike Travis
@ 2009-10-27 20:53   ` Mike Travis
  2009-10-28  4:07     ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
  2009-10-28  4:26     ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
  0 siblings, 2 replies; 17+ messages in thread
From: Mike Travis @ 2009-10-27 20:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86


Mike Travis wrote:
> Hi Roland,
> 
> I've found that I'm getting one of these lines for every cpu:
> 
> mce: CPU supports 0 MCE banks
> 

A bit more info.  The data above was from our simulator which
apparently is not simulating mce very well.  On a live system
I get 383 lines (for 383 additional cpus) with what appears to be
redundant lines...

[    4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
[    4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
...
[    4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21


> Regards,
> Mike
> 
> Roland Dreier wrote:
>> The MCE initialization code explicitly says it doesn't handle asymmetric
>> configurations where different CPUs support different numbers of MCE
>> banks, and it prints a big warning in that case.  Therefore, printing
>> the "mce: CPU supports <x> MCE banks" message into the kernel log for
>> every CPU is pure redundancy that clutters the log significantly for
>> systems with lots of CPUs.
>>
>> Signed-off-by: Roland Dreier <rolandd@cisco.com>
>> ---
>>  arch/x86/kernel/cpu/mcheck/mce.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
>> index b1598a9..721a77c 100644
>> --- a/arch/x86/kernel/cpu/mcheck/mce.c
>> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
>> @@ -1214,7 +1214,8 @@ static int __cpuinit mce_cap_init(void)
>>      rdmsrl(MSR_IA32_MCG_CAP, cap);
>>  
>>      b = cap & MCG_BANKCNT_MASK;
>> -    printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>> +    if (!banks)
>> +        printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
>>  
>>      if (b > MAX_NR_BANKS) {
>>          printk(KERN_WARNING
>>

* [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-27 20:53   ` Mike Travis
@ 2009-10-28  4:07     ` Hidetoshi Seto
  2009-10-28  5:24       ` Andi Kleen
  2009-10-28  4:26     ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
  1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28  4:07 UTC (permalink / raw)
  To: Mike Travis
  Cc: Roland Dreier, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-kernel, x86, Andi Kleen

Mike Travis wrote:
> 
> Mike Travis wrote:
>> Hi Roland,
>>
>> I've found that I'm getting one of these lines for every cpu:
>>
>> mce: CPU supports 0 MCE banks

I believe the patch at the end of this mail will solve this issue.

> A bit more info.  The data above was from our simulator which
> apparently is not simulating mce very well.  On a live system
> I get 383 lines (for 383 additional cpus) with what appears to be
> redundant lines...
> 
> [    4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
> [    4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21
> ...
> [    4.978893] CPU 2 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

Hmm, I suppose the line for CPU 0 was slightly different from the others,
because SHD means "this bank is a shared bank and is controlled by another CPU".
Maybe:
 CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21

But I agree that we could do some work on these messages...
Would it be better to change the message level from info to debug?
How about changing the format to something like:
  CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
  CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
   :

If there are no complaints, I'll make another patch to do so.
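
The proposed map line can be produced with a short formatter. This sketch assumes a hypothetical `enum bank_state` whose values are the legend letters (C = CMCI, P = polled, s = shared and owned by another CPU) and a hypothetical `format_bank_map()` helper; it emits one letter per bank and a space after every fourth bank.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-bank states; values are the letters used in the map. */
enum bank_state { BANK_CMCI = 'C', BANK_POLL = 'P', BANK_SHARED = 's' };

/*
 * Writes e.g. "CCCC PCCC CCPP" into buf: one letter per bank, a space before
 * every group of four after the first. buf needs room for
 * nbanks + (nbanks - 1) / 4 + 1 characters.
 */
static void format_bank_map(const enum bank_state *st, int nbanks,
			    char *buf, size_t buflen)
{
	size_t pos = 0;

	for (int i = 0; i < nbanks; i++) {
		if (i > 0 && i % 4 == 0) {
			assert(pos + 1 < buflen);  /* room for space + NUL */
			buf[pos++] = ' ';
		}
		assert(pos + 1 < buflen);          /* room for letter + NUL */
		buf[pos++] = (char)st[i];
	}
	assert(pos < buflen);
	buf[pos] = '\0';
}
```

Marking banks 4, 10 and 11 as `BANK_POLL` and the rest as `BANK_CMCI` reproduces the CPU 0 line above, `CCCC PCCC CCPP CCCC CCCC CC`.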


Thanks,
H.Seto

===

Subject: [PATCH] x86, mce: disable MCE if cpu has no MCE banks

If the CPU has no MCE banks (e.g. a simulated processor on a VM), it is
better to disable MCE support on the system, since we cannot handle MCEs
well in that case.

Reported-by: Mike Travis <travis@sgi.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 8080170..29055ab 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1228,6 +1228,10 @@ static int __cpuinit __mcheck_cpu_cap_init(void)
 	rdmsrl(MSR_IA32_MCG_CAP, cap);
 
 	b = cap & MCG_BANKCNT_MASK;
+	if (!b) {
+		pr_info("MCE: no MCE banks - not enabling MCE support.\n");
+		return -ENODEV;
+	}
 	if (!banks)
 		printk(KERN_INFO "mce: CPU supports %d MCE banks\n", b);
 
-- 
1.6.5.2
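
A rough userspace model of the patch above, in the same spirit as a simulation rather than kernel code: the early `return -ENODEV` skips the "CPU supports" line entirely. It assumes, as a hypothesis rather than a reading of the actual caller, that a negative return disables MCE so later CPUs never re-run the init; `model_cap_init()`, `model_cpu_init()`, `simulate_boot()` and the `mce_disabled` flag here are illustrative stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define MCG_BANKCNT_MASK 0xffULL  /* low 8 bits of IA32_MCG_CAP: bank count */

static int banks;
static int mce_disabled;   /* hypothetical "MCE turned off" flag */
static int lines_printed;

/* Patched per-CPU logic: bail out before printing when there are no banks. */
static int model_cap_init(unsigned long long cap)
{
	int b = (int)(cap & MCG_BANKCNT_MASK);

	if (!b) {
		printf("MCE: no MCE banks - not enabling MCE support.\n");
		lines_printed++;
		return -ENODEV;
	}
	if (!banks) {
		printf("mce: CPU supports %d MCE banks\n", b);
		lines_printed++;
	}
	banks = b;
	return 0;
}

/* Assumed caller behaviour: a failure disables MCE for all later CPUs. */
static void model_cpu_init(unsigned long long cap)
{
	if (mce_disabled)
		return;
	if (model_cap_init(cap) < 0)
		mce_disabled = 1;
}

static int simulate_boot(unsigned long long cap, int ncpus)
{
	banks = 0;
	mce_disabled = 0;
	lines_printed = 0;
	for (int cpu = 0; cpu < ncpus; cpu++)
		model_cpu_init(cap);
	return lines_printed;
}
```

Under that assumption, a machine whose CPUs all report zero banks logs the "no MCE banks" line once and ends with `mce_disabled` set, while a machine with banks still logs a single "CPU supports" line.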



* Re: [PATCH] x86: Don't print number of MCE banks for every CPU
  2009-10-27 20:53   ` Mike Travis
  2009-10-28  4:07     ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
@ 2009-10-28  4:26     ` Roland Dreier
  1 sibling, 0 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-28  4:26 UTC (permalink / raw)
  To: Mike Travis
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-kernel, x86


 > [    4.882085] CPU 1 MCA banks SHD:0 SHD:1 CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:7 SHD:8 SHD:9 SHD:12 SHD:13 SHD:14 SHD:15 SHD:16 SHD:17 SHD:18 SHD:19 SHD:20 SHD:21

Yes, we should probably kill that debug output as well, that was on my
list of things to do.

 - R.

* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  4:07     ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
@ 2009-10-28  5:24       ` Andi Kleen
  2009-10-28  6:26         ` Hidetoshi Seto
  2009-10-28 12:03         ` Valdis.Kletnieks
  0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28  5:24 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86

Hidetoshi Seto wrote:
> Mike Travis wrote:
>> Mike Travis wrote:
>>> Hi Roland,
>>>
>>> I've found that I'm getting one of these lines for every cpu:
>>>
>>> mce: CPU supports 0 MCE banks

That message can be just removed I think. I don't see much value in it
because the value is in sysfs and when you see the CPU type you can easily
determine it anyways.

I don't think the patch below really solves the problem because they
would have the same noise problem back once they switch from the simulator
to a real box which has banks.

> Hum, I suppose the line for CPU 0 was slightly different from others,
> because SHD means "this bank is shared bank and controlled by other".
> Maybe:
>  CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
> 
> But I agree that we could some work for this messages...
> Is it better to change the message level to debug from info?

Can be made INFO yes, but I would prefer not removing them
from the dmesg for now.

Perhaps they could be also compressed a bit like SRAT.

-Andi

* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  5:24       ` Andi Kleen
@ 2009-10-28  6:26         ` Hidetoshi Seto
  2009-10-28  6:48           ` Andi Kleen
  2009-10-28 12:03         ` Valdis.Kletnieks
  1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28  6:26 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Mike Travis wrote:
>>> Mike Travis wrote:
>>>> Hi Roland,
>>>>
>>>> I've found that I'm getting one of these lines for every cpu:
>>>>
>>>> mce: CPU supports 0 MCE banks
> 
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.
> 
> I don't think the patch below really solves the problem because they
> would have the same noise problem back once they switch from the simulator
> to a real box which has banks.

If the box has more than 0 banks, the line above will appear only once,
for CPU 0.  Only on the simulator, with an MCE-capable processor that has
no banks, does this message become unacceptable noise, because it appears
for every CPU.

Anyway, I think my patch is nice to have, to avoid unexpected behavior
in an uncertain environment.

Without disabling it, what can we do on an MCE with no banks?
I found that do_machine_check() does nothing if banks==0 ... would it be
better to let the system panic with "Machine check from unknown source"?


>> Hum, I suppose the line for CPU 0 was slightly different from others,
>> because SHD means "this bank is shared bank and controlled by other".
>> Maybe:
>>  CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>
>> But I agree that we could some work for this messages...
>> Is it better to change the message level to debug from info?
> 
> Can be made INFO yes, but I would prefer not removing them
> from the dmesg for now.
> 
> Perhaps they could be also compressed a bit like SRAT.

Like SRAT?  I couldn't catch the meaning ... could you give an example?


Thanks,
H.Seto


* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  6:26         ` Hidetoshi Seto
@ 2009-10-28  6:48           ` Andi Kleen
  2009-10-28  8:18             ` Hidetoshi Seto
  2009-10-28 17:12             ` Roland Dreier
  0 siblings, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28  6:48 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86

Hidetoshi Seto wrote:

> 
> Without disabling, what can we do on MCE with no bank?

Nothing, but is it really worth adding a special case?

> I found that do_machine_check() does nothing if banks==0 ... it is better
> to let system to panic with "Machine check from unknown source"?

IMHO yes. In this case the system must be very confused and panic is the
best you can do. Otherwise it won't do anything interesting anyways.
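
To make the `banks == 0` case concrete, here is a simplified, hypothetical model of a handler's bank scan (not the kernel's `do_machine_check()`): it looks for the first bank whose IA32_MCi_STATUS value has the VAL bit (bit 63) set. The `status[]` array standing in for MSR reads and the `find_reporting_bank()` name are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MCI_STATUS_VAL (UINT64_C(1) << 63)  /* "valid error" bit of MCi_STATUS */

/*
 * Returns the first bank whose status has the valid bit set, or -1 if none.
 * With nbanks == 0 the loop body never runs, so the event can never be
 * attributed to a bank.
 */
static int find_reporting_bank(const uint64_t *status, int nbanks)
{
	assert(nbanks >= 0);
	for (int i = 0; i < nbanks; i++)
		if (status[i] & MCI_STATUS_VAL)
			return i;
	return -1;
}
```

A result of -1 is the situation where a "Machine check from unknown source" panic would apply; with zero banks that is the only possible result.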

> 
>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>> because SHD means "this bank is shared bank and controlled by other".
>>> Maybe:
>>>  CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>
>>> But I agree that we could some work for this messages...
>>> Is it better to change the message level to debug from info?
>> Can be made INFO yes, but I would prefer not removing them
>> from the dmesg for now.
>>
>> Perhaps they could be also compressed a bit like SRAT.
> 
> Like SRAT?  I could not catch the meaning ... For example?

See the recent patches from David Rientjes in the same original thread.

-Andi


* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  6:48           ` Andi Kleen
@ 2009-10-28  8:18             ` Hidetoshi Seto
  2009-10-28 17:09               ` Mike Travis
  2009-10-28 17:12             ` Roland Dreier
  1 sibling, 1 reply; 17+ messages in thread
From: Hidetoshi Seto @ 2009-10-28  8:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mike Travis, Roland Dreier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86

Andi Kleen wrote:
> Hidetoshi Seto wrote:
>> Without disabling, what can we do on MCE with no bank?
> 
> Nothing, but is it really worth adding a special case?

If the question is:
  - is it really worth supporting this special environment,
    "MCE-capable but no MCE banks"?
then I'd say no.

So I suggested disabling MCE in this uncertain environment.
Otherwise we will end up adding more code for special cases...

>> I found that do_machine_check() does nothing if banks==0 ... it is better
>> to let system to panic with "Machine check from unknown source"?
> 
> IMHO yes. In this case the system must be very confused and panic is the
> best you can do. Otherwise it won't do anything interesting anyways.

Agreed, but this is also a special case.
Regardless of the real number of banks, a confused system could fail to
read the value from memory... Hmm, in theory the MCE handler must be
implemented carefully, but I bet the confused value will not always be 0
... is it worth doing?

>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>> because SHD means "this bank is shared bank and controlled by other".
>>>> Maybe:
>>>>  CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>
>>>> But I agree that we could some work for this messages...
>>>> Is it better to change the message level to debug from info?
>>> Can be made INFO yes, but I would prefer not removing them
>>> from the dmesg for now.
>>>
>>> Perhaps they could be also compressed a bit like SRAT.
>>
>> Like SRAT?  I could not catch the meaning ... For example?
> 
> See the recent patches from David Rientjes in the same original thread.

I found it, thanks.

So I suppose your idea is like:
  CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
  CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
right?
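
The SRAT-style notation only needs a run-length pass over each sorted bank list. A sketch with a hypothetical `format_ranges()` helper: runs of three or more consecutive banks collapse to `first-last` and shorter runs are listed individually, which matches the `{0,1,6-9,12-21}` and `{4,10,11}` examples above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats a sorted, duplicate-free list of bank numbers, e.g.
 * {0,1,2,3,5,6,7,8,9} -> "0-3,5-9". Runs of three or more consecutive
 * values become "first-last"; shorter runs are listed one by one.
 * Returns the length written, or -1 if buf is too small.
 */
static int format_ranges(const int *v, int n, char *buf, size_t buflen)
{
	size_t len = 0;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';
	for (int i = 0; i < n; ) {
		const char *sep = len ? "," : "";
		int j = i, w, next;

		/* extend the run while the next value is consecutive */
		while (j + 1 < n && v[j + 1] == v[j] + 1)
			j++;
		if (j - i >= 2) {
			w = snprintf(buf + len, buflen - len, "%s%d-%d", sep, v[i], v[j]);
			next = j + 1;
		} else {
			w = snprintf(buf + len, buflen - len, "%s%d", sep, v[i]);
			next = i + 1;
		}
		if (w < 0 || (size_t)w >= buflen - len)
			return -1;  /* output would not fit */
		len += (size_t)w;
		i = next;
	}
	return (int)len;
}
```

For the CPU 0 CMCI banks this yields `0-3,5-9,12-21`, and for the polled banks `4,10,11`.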

IMHO the format I suggested is easier to read, as long as the number of
banks is not too large.
  CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
  CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss


Thanks,
H.Seto


* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  5:24       ` Andi Kleen
  2009-10-28  6:26         ` Hidetoshi Seto
@ 2009-10-28 12:03         ` Valdis.Kletnieks
  2009-10-28 13:44           ` Andi Kleen
  1 sibling, 1 reply; 17+ messages in thread
From: Valdis.Kletnieks @ 2009-10-28 12:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Hidetoshi Seto, Mike Travis, Roland Dreier, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-kernel, x86

On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:

> >>> mce: CPU supports 0 MCE banks
> 
> That message can be just removed I think. I don't see much value in it
> because the value is in sysfs and when you see the CPU type you can easily
> determine it anyways.

Maybe it should only print a message if it finds an unexpected number of banks?
"Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
hardware says there's only 4. What's up with that?"


* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28 12:03         ` Valdis.Kletnieks
@ 2009-10-28 13:44           ` Andi Kleen
  0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2009-10-28 13:44 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Hidetoshi Seto, Mike Travis, Roland Dreier, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-kernel, x86

Valdis.Kletnieks@vt.edu wrote:
> On Wed, 28 Oct 2009 06:24:45 BST, Andi Kleen said:
> 
>>>>> mce: CPU supports 0 MCE banks
>> That message can be just removed I think. I don't see much value in it
>> because the value is in sysfs and when you see the CPU type you can easily
>> determine it anyways.
> 
> Maybe it should only print a message if it finds an unexpected number of banks?
> "Hey dood - we're on a Core3.5 and there should be 6 banks here, but the
> hardware says there's only 4. What's up with that?"

The kernel doesn't know what number of banks is expected; only humans do.

-Andi



* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  8:18             ` Hidetoshi Seto
@ 2009-10-28 17:09               ` Mike Travis
  0 siblings, 0 replies; 17+ messages in thread
From: Mike Travis @ 2009-10-28 17:09 UTC (permalink / raw)
  To: Hidetoshi Seto
  Cc: Andi Kleen, Roland Dreier, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86



Hidetoshi Seto wrote:
> Andi Kleen wrote:
>> Hidetoshi Seto wrote:
>>> Without disabling, what can we do on MCE with no bank?
>> Nothing, but is it really worth adding a special case?
> 
> If question were:
>   - is it really worth to support this special environment,
>     "MCE-capable but no MCE banks" ?
> then I'd like to say no.
> 
> So I suggested to disable MCE on this uncertain environment.
> Or we will end up adding more codes for special cases...
> 
>>> I found that do_machine_check() does nothing if banks==0 ... it is better
>>> to let system to panic with "Machine check from unknown source"?
>> IMHO yes. In this case the system must be very confused and panic is the
>> best you can do. Otherwise it won't do anything interesting anyways.
> 
> Agreed, but this is also a special case.
> Not depending on the real number of banks, confused system could fail to
> get the value from memory... Humm, in theory MCE handler must be
> implemented carefully, but I bet the confused value will not be always 0,
> ... is it worth to do?
> 
>>>>> Hum, I suppose the line for CPU 0 was slightly different from others,
>>>>> because SHD means "this bank is shared bank and controlled by other".
>>>>> Maybe:
>>>>>  CPU 0 MCA banks CMCI:0 CMCI:1 CMCI:2 CMCI:3 CMCI:5 ... CMCI:21
>>>>>
>>>>> But I agree that we could some work for this messages...
>>>>> Is it better to change the message level to debug from info?
>>>> Can be made INFO yes, but I would prefer not removing them
>>>> from the dmesg for now.
>>>>
>>>> Perhaps they could be also compressed a bit like SRAT.
>>> Like SRAT?  I could not catch the meaning ... For example?
>> See the recent patches from David Rientjes in the same original thread.
> 
> I found it, thanks.
> 
> So I suppose your idea is like:
>   CPU 0 MCA banks CMCI:{0-3,5-9,12-21} POLL:{4,10,11}
>   CPU 1 MCA banks SHD:{0,1,6-9,12-21} CMCI:{2,3,5} POLL:{4,10,11}
> right?
> 
> IMHO the format I suggested is better to read, as far as banks is
> not so big number.
>   CPU 0 MCA banks map : CCCC PCCC CCPP CCCC CCCC CC
>   CPU 1 MCA banks map : ssCC PCss ssPP ssss ssss ss
> 
> 
> Thanks,
> H.Seto

The problem comes up when you have a whole bunch of cpus, and the lines
become redundant.  Can you compress the lines so that cpus with the
same given mappings are printed on one line?

Thanks,
Mike

* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28  6:48           ` Andi Kleen
  2009-10-28  8:18             ` Hidetoshi Seto
@ 2009-10-28 17:12             ` Roland Dreier
  2009-10-28 17:37               ` Mike Travis
  1 sibling, 1 reply; 17+ messages in thread
From: Roland Dreier @ 2009-10-28 17:12 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Hidetoshi Seto, Mike Travis, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86


 > Perhaps they could be also compressed a bit like SRAT.

Seems like a good idea... but I wonder what the best way to represent
things is.  For example I have a 2-socket Nehalem system that shows:

 2 times:  MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
 6 times:  MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
 8 times:  MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

presumably the first line is once per package, the next line is for the
first sibling in all the other cores in a package, and the last line is
for the SMT siblings of all the cores.

But would we want to accumulate all the different combinations of banks
along with a CPU mask and then print something like:

 CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
 CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
 CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

of course output like that is going to lead to super-long lines on a
64-thread system.

Also I'm not sure of a clean way to implement this; unlike the SRAT
stuff, we need to deal with CPU hotplug so all this at best could be
__cpuinitdata, ie we can't discard it in most configs.

However the "MCA banks" output definitely is annoying on a 64-thread
system -- the amount of output is far greater than the utility of said
output.  So ideas on the best way to reduce this would be appreciated.

Thanks,
  Roland
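
The grouping idea above can be sketched in userspace: walk the CPUs, bucket them by identical bank description, and print one line per bucket. Everything here is hypothetical: the `struct bank_group` layout, the `MAX_GROUPS` limit, a 64-CPU `uint64_t` mask in place of a kernel cpumask, and descriptions passed in as ready-made strings. It ignores the hotplug and `__cpuinitdata` concerns raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_GROUPS 8  /* hypothetical cap on distinct bank layouts */

struct bank_group {
	const char *desc;  /* e.g. "CMCI:2 CMCI:3 SHD:8" */
	uint64_t cpus;     /* bit n set => CPU n has this layout */
};

/*
 * Buckets CPUs by identical description, in order of first appearance.
 * Returns the number of groups, or -1 if ncpus > 64 or too many layouts.
 */
static int group_cpus(const char *const *desc, int ncpus, struct bank_group *g)
{
	int ng = 0;

	if (ncpus > 64)
		return -1;
	for (int cpu = 0; cpu < ncpus; cpu++) {
		int k = 0;

		while (k < ng && strcmp(g[k].desc, desc[cpu]) != 0)
			k++;
		if (k == ng) {  /* first CPU with this layout: new group */
			if (ng == MAX_GROUPS)
				return -1;
			g[ng].desc = desc[cpu];
			g[ng].cpus = 0;
			ng++;
		}
		g[k].cpus |= UINT64_C(1) << cpu;
	}
	return ng;
}

/* Prints one "CPUs 0 4: MCA banks ..." line per group. */
static void print_groups(const struct bank_group *g, int ng)
{
	for (int k = 0; k < ng; k++) {
		printf("CPUs");
		for (int cpu = 0; cpu < 64; cpu++)
			if (g[k].cpus & (UINT64_C(1) << cpu))
				printf(" %d", cpu);
		printf(": MCA banks %s\n", g[k].desc);
	}
}
```

Fed the 16-CPU example above (CPUs 0 and 4, CPUs 1-3 and 5-7, CPUs 8-15), `group_cpus()` returns three groups with masks `0x11`, `0xEE` and `0xFF00`, and `print_groups()` prints the same three lines as the `CPUs 0 4: ...` example.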

* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28 17:12             ` Roland Dreier
@ 2009-10-28 17:37               ` Mike Travis
  2009-10-28 18:03                 ` Roland Dreier
  0 siblings, 1 reply; 17+ messages in thread
From: Mike Travis @ 2009-10-28 17:37 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Andi Kleen, Hidetoshi Seto, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86



Roland Dreier wrote:
>  > Perhaps they could be also compressed a bit like SRAT.
> 
> Seems like a good idea... but I wonder what the best way to represent
> things is.  For example I have a 2-socket Nehalem system that shows:
> 
>  2 times:  MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
>  6 times:  MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
>  8 times:  MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
> 
> presumably the first line is once per package, the next line is for the
> first sibling in all the other cores in a package, and the last line is
> for the SMT siblings of all the cores.
> 
> But would we want to accumulate all the different combinations of banks
> along with a CPU mask and then print something like:
> 
>  CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
>  CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
>  CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8

Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.
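
As a userspace illustration of the condensed form: this mimics the output style of `cpulist_scnprintf()`, where consecutive CPUs print as `first-last`, but it is not the kernel implementation; `mask_to_cpulist()` and the 64-bit mask are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats a 64-bit CPU mask as a cpulist, e.g. 0xEE -> "1-3,5-7".
 * Returns the length written, or -1 if buf is too small.
 */
static int mask_to_cpulist(uint64_t mask, char *buf, size_t buflen)
{
	size_t len = 0;

	if (buflen == 0)
		return -1;
	buf[0] = '\0';
	for (int cpu = 0; cpu < 64; cpu++) {
		int last = cpu, w;

		if (!(mask & (UINT64_C(1) << cpu)))
			continue;
		/* find the end of this run of set bits */
		while (last + 1 < 64 && (mask & (UINT64_C(1) << (last + 1))))
			last++;
		if (last > cpu)
			w = snprintf(buf + len, buflen - len, "%s%d-%d",
				     len ? "," : "", cpu, last);
		else
			w = snprintf(buf + len, buflen - len, "%s%d",
				     len ? "," : "", cpu);
		if (w < 0 || (size_t)w >= buflen - len)
			return -1;  /* output would not fit */
		len += (size_t)w;
		cpu = last;  /* the loop increment moves past the run */
	}
	return (int)len;
}
```

With the CPU sets from the example, the masks `0x11`, `0xEE` and `0xFF00` become `0,4`, `1-3,5-7` and `8-15`.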

> 
> of course output like that is going to lead to super-long lines on a
> 64-thread system.
> 
> Also I'm not sure of a clean way to implement this; unlike the SRAT
> stuff, we need to deal with CPU hotplug so all this at best could be
> __cpuinitdata, ie we can't discard it in most configs.
> 
> However the "MCA banks" output definitely is annoying on a 64-thread
> system -- the amount of output is far greater than the utility of said
> output.  So ideas on the best way to reduce this would be appreciated.
> 
> Thanks,
>   Roland

* Re: [PATCH] x86, mce: disable MCE if cpu has no MCE banks
  2009-10-28 17:37               ` Mike Travis
@ 2009-10-28 18:03                 ` Roland Dreier
  0 siblings, 0 replies; 17+ messages in thread
From: Roland Dreier @ 2009-10-28 18:03 UTC (permalink / raw)
  To: Mike Travis
  Cc: Andi Kleen, Hidetoshi Seto, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-kernel, x86


 > > But would we want to accumulate all the different combinations of banks
 > > along with a CPU mask and then print something like:
 > >
 > >  CPUs 0 4: MCA banks CMCI:2 CMCI:3 CMCI:5 CMCI:6 SHD:8
 > >  CPUs 1 2 3 5 6 7: MCA banks CMCI:2 CMCI:3 CMCI:5 SHD:6 SHD:8
 > >  CPUs 8 9 10 11 12 13 14 15: MCA banks SHD:2 SHD:3 SHD:5 SHD:6 SHD:8
 > 
 > Or use a cpumask and cpulist_scnprintf which condenses the cpu list nicely.

Thanks!  I didn't know about that API.

However with that said I think the real issue is whether that style of
output is a good idea, no matter how nicely the CPU list is formatted :)

 - R.

end of thread, other threads:[~2009-10-28 18:03 UTC | newest]

Thread overview: 17+ messages
2009-10-15 21:21 [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
2009-10-16  7:20 ` Ingo Molnar
2009-10-16  7:22 ` [tip:x86/urgent] " tip-bot for Roland Dreier
2009-10-27 19:42 ` [PATCH] " Mike Travis
2009-10-27 20:53   ` Mike Travis
2009-10-28  4:07     ` [PATCH] x86, mce: disable MCE if cpu has no MCE banks Hidetoshi Seto
2009-10-28  5:24       ` Andi Kleen
2009-10-28  6:26         ` Hidetoshi Seto
2009-10-28  6:48           ` Andi Kleen
2009-10-28  8:18             ` Hidetoshi Seto
2009-10-28 17:09               ` Mike Travis
2009-10-28 17:12             ` Roland Dreier
2009-10-28 17:37               ` Mike Travis
2009-10-28 18:03                 ` Roland Dreier
2009-10-28 12:03         ` Valdis.Kletnieks
2009-10-28 13:44           ` Andi Kleen
2009-10-28  4:26     ` [PATCH] x86: Don't print number of MCE banks for every CPU Roland Dreier
