linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode
@ 2015-08-11 10:09 Xie XiuQi
  2015-08-11 14:46 ` Borislav Petkov
  0 siblings, 1 reply; 5+ messages in thread
From: Xie XiuQi @ 2015-08-11 10:09 UTC
  To: tony.luck, bp, tglx, mingo, hpa
  Cc: x86, linux-edac, linux-kernel, zhangliguang

Zhang Liguang reported the following bug:
1) The system detects a CMCI storm on the current CPU.
2) It disables the CMCI interrupt on the banks owned by the current CPU
   and switches to poll mode.
3) A few minutes later, the system switches back to interrupt mode on
   the current CPU.
4) We expect the system to reenable the CMCI interrupt on the banks
   owned by the current CPU:
   cmci_intel_adjust_timer
   |-> cmci_reenable
       |-> cmci_discover     # but owned banks are skipped here

> static void cmci_discover(int banks)
>	...
>	for (i = 0; i < banks; i++) {
>		...
>		if (test_bit(i, owned))	/* owned banks are skipped here */
>			continue;

This patch adds a function, cmci_storm_enable_banks(), which only enables
CMCI on the banks owned by the current CPU and does not clear the owned
flags. It is called instead of cmci_reenable() when switching back to
interrupt mode.

Reported-by: Zhang Liguang <zhangliguang@huawei.com>
Cc: stable@vger.kernel.org  # v4.1+
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
 arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
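
Not for the changelog: the sequence described above can be modelled in
userspace. The sketch below is only an illustration under assumptions.
Every model_* name and MODEL_NR_BANKS are made up for this note, the
arrays stand in for the per-CPU owned bitmap and the IA32_MCi_CTL2 MSRs,
and locking is left out. It shows that a discovery pass which skips owned
banks leaves CMCI masked after a storm, while setting the enable bit on
each owned bank (what cmci_storm_enable_banks() does) restores it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_BANKS 8            /* hypothetical bank count for the model */
#define MODEL_CMCI_EN  (1ULL << 30) /* CMCI enable bit in IA32_MCi_CTL2 */

static uint64_t model_ctl2[MODEL_NR_BANKS]; /* stands in for the CTL2 MSRs */
static unsigned long model_owned;           /* stands in for mce_banks_owned */

/* Take ownership of a bank and enable CMCI on it, as initial discovery would. */
static void model_claim(int bank)
{
	assert(bank >= 0 && bank < MODEL_NR_BANKS);
	model_owned |= 1UL << bank;
	model_ctl2[bank] |= MODEL_CMCI_EN;
}

/* Storm detected: mask CMCI on owned banks but keep them owned for polling. */
static void model_storm_disable(void)
{
	for (int bank = 0; bank < MODEL_NR_BANKS; bank++)
		if (model_owned & (1UL << bank))
			model_ctl2[bank] &= ~MODEL_CMCI_EN;
}

/* Mirrors the loop quoted in the changelog: owned banks are skipped. */
static void model_discover(void)
{
	for (int bank = 0; bank < MODEL_NR_BANKS; bank++) {
		if (model_owned & (1UL << bank))
			continue;
		/* Unowned banks would be probed and claimed here (not modelled). */
	}
}

/* What the patch does instead: set the enable bit on every owned bank. */
static void model_storm_enable(void)
{
	for (int bank = 0; bank < MODEL_NR_BANKS; bank++)
		if (model_owned & (1UL << bank))
			model_ctl2[bank] |= MODEL_CMCI_EN;
}

/* Report whether CMCI is currently enabled on a bank. */
static bool model_cmci_enabled(int bank)
{
	assert(bank >= 0 && bank < MODEL_NR_BANKS);
	return (model_ctl2[bank] & MODEL_CMCI_EN) != 0;
}
```

Calling model_claim(2), model_storm_disable() and then model_discover()
leaves bank 2 with CMCI masked; a following model_storm_enable() turns it
back on without touching the owned bit.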

diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index 844f56c..d4e98c7 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -146,6 +146,22 @@ void mce_intel_hcpu_update(unsigned long cpu)
 	per_cpu(cmci_storm_state, cpu) = CMCI_STORM_NONE;
 }
 
+static void cmci_storm_enable_banks(void)
+{
+	unsigned long flags, *owned;
+	int bank;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	owned = this_cpu_ptr(mce_banks_owned);
+	for_each_set_bit(bank, owned, MAX_NR_BANKS) {
+		rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+		val |= MCI_CTL2_CMCI_EN;
+		wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	}
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
+
 unsigned long cmci_intel_adjust_timer(unsigned long interval)
 {
 	if ((this_cpu_read(cmci_backoff_cnt) > 0) &&
@@ -175,7 +191,7 @@ unsigned long cmci_intel_adjust_timer(unsigned long interval)
 		 */
 		if (!atomic_read(&cmci_storm_on_cpus)) {
 			__this_cpu_write(cmci_storm_state, CMCI_STORM_NONE);
-			cmci_reenable();
+			cmci_storm_enable_banks();
 			cmci_recheck();
 		}
 		return CMCI_POLL_INTERVAL;
-- 
2.0.0



* Re: [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode
  2015-08-11 10:09 [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode Xie XiuQi
@ 2015-08-11 14:46 ` Borislav Petkov
  2015-08-11 18:52   ` Luck, Tony
  2015-08-12  2:07   ` Xie XiuQi
  0 siblings, 2 replies; 5+ messages in thread
From: Borislav Petkov @ 2015-08-11 14:46 UTC
  To: Xie XiuQi
  Cc: tony.luck, tglx, mingo, hpa, x86, linux-edac, linux-kernel, zhangliguang

On Tue, Aug 11, 2015 at 06:09:37PM +0800, Xie XiuQi wrote:
> Zhang Liguang reported the following bug:
> 1) The system detects a CMCI storm on the current CPU.
> 2) It disables the CMCI interrupt on the banks owned by the current CPU
>    and switches to poll mode.
> 3) A few minutes later, the system switches back to interrupt mode on
>    the current CPU.
> 4) We expect the system to reenable the CMCI interrupt on the banks
>    owned by the current CPU:
>    cmci_intel_adjust_timer
>    |-> cmci_reenable
>        |-> cmci_discover     # but owned banks are skipped here
> 
> > static void cmci_discover(int banks)
> >	...
> >	for (i = 0; i < banks; i++) {
> >		...
> >		if (test_bit(i, owned))	/* owned banks are skipped here */
> >			continue;
> 
> This patch adds a function, cmci_storm_enable_banks(), which only enables
> CMCI on the banks owned by the current CPU and does not clear the owned
> flags. It is called instead of cmci_reenable() when switching back to
> interrupt mode.

Hmm, and we cannot clear the owned bit because those banks won't be
polled otherwise, see:

27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")

Yuck.

Well, ok, but do it differently, please: rename
cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
turns them on and off. Unless Tony has a better suggestion...

> Reported-by: Zhang Liguang <zhangliguang@huawei.com>
> Cc: stable@vger.kernel.org  # v4.1+

Why 4.1 only?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--


* RE: [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode
  2015-08-11 14:46 ` Borislav Petkov
@ 2015-08-11 18:52   ` Luck, Tony
  2015-08-12  2:08     ` Xie XiuQi
  2015-08-12  2:07   ` Xie XiuQi
  1 sibling, 1 reply; 5+ messages in thread
From: Luck, Tony @ 2015-08-11 18:52 UTC
  To: Borislav Petkov, Xie XiuQi
  Cc: tglx, mingo, hpa, x86, linux-edac, linux-kernel, zhangliguang


> Well, ok, but do it differently, please: rename
> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
> turns them on and off. Unless Tony has a better suggestion...

I like the boolean argument ... but not the "switch_banks" name. It sounds more
like we are juggling between banks, rather than setting a switch/flag in a bank.

How does "cmci_storm_set_cmci(bool on)" sound?  Too many "cmci" in one name?

-Tony


* Re: [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode
  2015-08-11 14:46 ` Borislav Petkov
  2015-08-11 18:52   ` Luck, Tony
@ 2015-08-12  2:07   ` Xie XiuQi
  1 sibling, 0 replies; 5+ messages in thread
From: Xie XiuQi @ 2015-08-12  2:07 UTC
  To: Borislav Petkov
  Cc: tony.luck, tglx, mingo, hpa, x86, linux-edac, linux-kernel, zhangliguang

On 2015/8/11 22:46, Borislav Petkov wrote:
> On Tue, Aug 11, 2015 at 06:09:37PM +0800, Xie XiuQi wrote:
>> Zhang Liguang reported the following bug:
>> 1) The system detects a CMCI storm on the current CPU.
>> 2) It disables the CMCI interrupt on the banks owned by the current CPU
>>    and switches to poll mode.
>> 3) A few minutes later, the system switches back to interrupt mode on
>>    the current CPU.
>> 4) We expect the system to reenable the CMCI interrupt on the banks
>>    owned by the current CPU:
>>     cmci_intel_adjust_timer
>>     |-> cmci_reenable
>>         |-> cmci_discover     # but owned banks are skipped here
>>
>>> static void cmci_discover(int banks)
>>> 	...
>>> 	for (i = 0; i < banks; i++) {
>>> 		...
>>> 		if (test_bit(i, owned))	/* owned banks are skipped here */
>>> 			continue;
>>
>> This patch adds a function, cmci_storm_enable_banks(), which only enables
>> CMCI on the banks owned by the current CPU and does not clear the owned
>> flags. It is called instead of cmci_reenable() when switching back to
>> interrupt mode.
>
> Hmm, and we cannot clear the owned bit because those banks won't be
> polled otherwise, see:
>
> 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms")

OK, thanks.

>
> Yuck.
>
> Well, ok, but do it differently, please: rename
> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
> turns them on and off. Unless Tony has a better suggestion...
>
>> Reported-by: Zhang Liguang <zhangliguang@huawei.com>
>> Cc: stable@vger.kernel.org  # v4.1+
>
> Why 4.1 only?

My mistake, it should be v3.15+.

Thanks,
Xie XiuQi

>



* Re: [PATCH] x86/mce: fix failure to reenable cmci when switching to interrupt mode
  2015-08-11 18:52   ` Luck, Tony
@ 2015-08-12  2:08     ` Xie XiuQi
  0 siblings, 0 replies; 5+ messages in thread
From: Xie XiuQi @ 2015-08-12  2:08 UTC
  To: Luck, Tony, Borislav Petkov
  Cc: tglx, mingo, hpa, x86, linux-edac, linux-kernel, zhangliguang

On 2015/8/12 2:52, Luck, Tony wrote:
>> Well, ok, but do it differently, please: rename
>> cmci_storm_disable_banks() to cmci_storm_switch_banks(bool on) which
>> turns them on and off. Unless Tony has a better suggestion...
>
> I like the boolean argument ... but not the "switch_banks" name. It sounds more
> like we are juggling between banks, rather than setting a switch/flag in a bank.
>
> How does "cmci_storm_set_cmci(bool on)" sound?  Too many "cmci" in one name?

Thanks, I'll use this name.
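
For illustration only, a single helper with a boolean argument could
behave roughly like the userspace model below. This is a sketch under
assumptions and not the next version of the patch: the sketch_* names and
SKETCH_NR_BANKS are invented here, the arrays stand in for the owned
bitmap and the IA32_MCi_CTL2 MSRs, and the cmci_discover_lock handling is
omitted. The point is that one loop over the owned banks can both mask
and unmask CMCI while leaving ownership and the other CTL2 bits alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_BANKS 8            /* hypothetical bank count for the model */
#define SKETCH_CMCI_EN  (1ULL << 30) /* CMCI enable bit in IA32_MCi_CTL2 */

static uint64_t sketch_ctl2[SKETCH_NR_BANKS]; /* stands in for the CTL2 MSRs */
static unsigned long sketch_owned;            /* stands in for mce_banks_owned */

/* Mark a bank as owned by this CPU and enable CMCI on it. */
static void sketch_own_bank(int bank)
{
	assert(bank >= 0 && bank < SKETCH_NR_BANKS);
	sketch_owned |= 1UL << bank;
	sketch_ctl2[bank] |= SKETCH_CMCI_EN;
}

/*
 * Model of a cmci_storm_set_cmci(bool on) style helper: set or clear the
 * enable bit on every owned bank. The owned bitmap is never modified, so
 * the banks stay eligible for polling, and unowned banks are not touched.
 */
static void sketch_storm_set_cmci(bool on)
{
	for (int bank = 0; bank < SKETCH_NR_BANKS; bank++) {
		if (!(sketch_owned & (1UL << bank)))
			continue;
		if (on)
			sketch_ctl2[bank] |= SKETCH_CMCI_EN;
		else
			sketch_ctl2[bank] &= ~SKETCH_CMCI_EN;
	}
}
```

With this shape the storm path would call the helper with false when
entering poll mode and with true when returning to interrupt mode, so the
disable and enable sides cannot drift apart.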

--
Xie XiuQi

>
> -Tony
>


