* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
[not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org>
@ 2014-04-02 8:51 ` William Dauchy
2014-04-02 9:01 ` Borislav Petkov
2014-04-11 17:01 ` H. Peter Anvin
2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong
2 siblings, 1 reply; 12+ messages in thread
From: William Dauchy @ 2014-04-02 8:51 UTC (permalink / raw)
To: linux-kernel, Ingo Molnar, hpa, William Dauchy, tglx, Tony, Gong
Cc: linux-tip-commits, William Dauchy
On Wed, Apr 2, 2014 at 9:55 AM, tip-bot for Chen, Gong <tipbot@zytor.com> wrote:
> Commit-ID: 27f6c573e0f77f7d1cc907c1494c99a61e48b7d8
> Gitweb: http://git.kernel.org/tip/27f6c573e0f77f7d1cc907c1494c99a61e48b7d8
> Author: Chen, Gong <gong.chen@linux.intel.com>
> AuthorDate: Thu, 27 Mar 2014 21:24:36 -0400
> Committer: Tony Luck <tony.luck@intel.com>
> CommitDate: Fri, 28 Mar 2014 13:40:16 -0700
>
> x86, CMCI: Add proper detection of end of CMCI storms
>
> When CMCI storm persists for a long time(at least beyond predefined
> threshold. It's 30 seconds for now), we can watch CMCI storm is
> detected immediately after it subsides.
>
> ...
> Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode
> Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode
> Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode
> Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode
> ...
>
> The problem is that our logic that determines that the storm has
> ended is incorrect. We announce the end, re-enable interrupts and
> realize that the storm is still going on, so we switch back to
> polling mode. Rinse, repeat.
>
> When a storm happens we disable signaling of errors via CMCI and begin
> polling machine check banks instead. If we find any logged errors,
> then we need to set a per-cpu flag so that our per-cpu tests that
> check whether the storm is ongoing will see that errors are still
> being logged independently of whether mce_notify_irq() says that the
> error has been fully processed.
>
> cmci_clear() is not the right tool to disable a bank. It disables the
> interrupt for the bank as desired, but it also clears the bit for
> this bank in "mce_banks_owned" so we will skip the bank when polling
> (so we fail to see that the storm continues because we stop looking).
> New cmci_storm_disable_banks() just disables the interrupt while
> allowing polling to continue.
>
> Reported-by: William Dauchy <wdauchy@gmail.com>
Could you use the following address instead?
Reported-by: William Dauchy <william@gandi.net>
Thanks,
> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
> Signed-off-by: Tony Luck <tony.luck@intel.com>
--
William
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy
@ 2014-04-02 9:01 ` Borislav Petkov
2014-04-02 9:06 ` William Dauchy
2014-04-02 10:46 ` Ingo Molnar
0 siblings, 2 replies; 12+ messages in thread
From: Borislav Petkov @ 2014-04-02 9:01 UTC (permalink / raw)
To: William Dauchy
Cc: linux-kernel, Ingo Molnar, hpa, tglx, Tony, Gong,
linux-tip-commits, William Dauchy
On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote:
> Could you use the following address instead?
> Reported-by: William Dauchy <william@gandi.net>
It is too late for that now as the patch is in -tip already... Unless
Ingo can still amend it, that is.
But, we're working on a real solution for the storm issue and there
we'll be asking you to test stuff anyway so we'll make sure to use this
mail address then, ok?
:-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
2014-04-02 9:01 ` Borislav Petkov
@ 2014-04-02 9:06 ` William Dauchy
2014-04-02 10:46 ` Ingo Molnar
1 sibling, 0 replies; 12+ messages in thread
From: William Dauchy @ 2014-04-02 9:06 UTC (permalink / raw)
To: Borislav Petkov
Cc: William Dauchy, linux-kernel, Ingo Molnar, hpa, tglx, Tony, Gong,
linux-tip-commits, William Dauchy
[-- Attachment #1: Type: text/plain, Size: 347 bytes --]
On Apr02 11:01, Borislav Petkov wrote:
> It is too late for that now as the patch is in -tip already... Unless
> Ingo can still amend it, that is.
>
> But, we're working on a real solution for the storm issue and there
> we'll be asking you to test stuff anyway so we'll make sure to use this
> mail address then, ok?
ack
--
William
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
2014-04-02 9:01 ` Borislav Petkov
2014-04-02 9:06 ` William Dauchy
@ 2014-04-02 10:46 ` Ingo Molnar
2014-04-02 10:52 ` Borislav Petkov
1 sibling, 1 reply; 12+ messages in thread
From: Ingo Molnar @ 2014-04-02 10:46 UTC (permalink / raw)
To: Borislav Petkov
Cc: William Dauchy, linux-kernel, hpa, tglx, Tony, Gong,
linux-tip-commits, William Dauchy
* Borislav Petkov <bp@alien8.de> wrote:
> On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote:
> > Could you use the following address instead?
> > Reported-by: William Dauchy <william@gandi.net>
>
> It is too late for that now as the patch is in -tip already... Unless
> Ingo can still amend it, that is.
There are already patches on top of it, so it's not possible - but
even if it was the last one, since it got committed by Tony I cannot
rebase or amend it.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
2014-04-02 10:46 ` Ingo Molnar
@ 2014-04-02 10:52 ` Borislav Petkov
0 siblings, 0 replies; 12+ messages in thread
From: Borislav Petkov @ 2014-04-02 10:52 UTC (permalink / raw)
To: Ingo Molnar
Cc: William Dauchy, linux-kernel, hpa, tglx, Tony, Gong,
linux-tip-commits, William Dauchy
On Wed, Apr 02, 2014 at 12:46:12PM +0200, Ingo Molnar wrote:
> There are already patches on top of it, so it's not possible - but
> even if it was the last one, since it got committed by Tony I cannot
> rebase or amend it.
Ah right, you pulled it from him, sure.
Thanks.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms
[not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org>
2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy
@ 2014-04-11 17:01 ` H. Peter Anvin
2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong
2 siblings, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2014-04-11 17:01 UTC (permalink / raw)
To: linux-kernel, mingo, wdauchy, tglx, tony.luck, gong.chen,
linux-tip-commits
On 04/02/2014 12:55 AM, tip-bot for Chen, Gong wrote:
> @@ -614,6 +618,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> if (!(m.status & MCI_STATUS_VAL))
> continue;
>
> + v = &get_cpu_var(mce_polled_error);
> + set_bit(0, v);
> /*
> * Uncorrected or signalled events are handled by the exception
> * handler when it is enabled, so don't process those here.
> @@ -1278,10 +1284,18 @@ static unsigned long mce_adjust_timer_default(unsigned long interval)
> static unsigned long (*mce_adjust_timer)(unsigned long interval) =
> mce_adjust_timer_default;
>
> +static int cmc_error_seen(void)
> +{
> + unsigned long *v = &__get_cpu_var(mce_polled_error);
> +
> + return test_and_clear_bit(0, v);
> +}
> +
Please use this_cpu_*() whereever possible instead of __get_cpu_var().
Since this is not actually a bitmask this_cpu_xchg() can be used at the end.
In fact, using set_bit() is completely wasteful.
I'll push this onward since it is a bit late, but please submit a
cleanup patch.
-hpa
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var
[not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org>
2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy
2014-04-11 17:01 ` H. Peter Anvin
@ 2014-04-14 8:39 ` Chen, Gong
2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong
2 siblings, 1 reply; 12+ messages in thread
From: Chen, Gong @ 2014-04-14 8:39 UTC (permalink / raw)
To: hpa, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits, Chen, Gong
This issue is introduced in commit 27f6c573e0. I forget to
execute put_cpu_var operation after get_cpu_var.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index eeee23f..26eaf3b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -620,6 +620,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
v = &get_cpu_var(mce_polled_error);
set_bit(0, v);
+ put_cpu_var(mce_polled_error);
/*
* Uncorrected or signalled events are handled by the exception
* handler when it is enabled, so don't process those here.
--
1.9.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var
2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong
@ 2014-04-14 8:39 ` Chen, Gong
2014-04-14 16:19 ` H. Peter Anvin
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Chen, Gong @ 2014-04-14 8:39 UTC (permalink / raw)
To: hpa, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits, Chen, Gong
According to Peter's suggestion, use this_cpu_* instead of
__get_cpu_var. BTW, remove bitmask ops to avoid unnecessary
overhead.
Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Suggested-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/kernel/cpu/mcheck/mce.c | 36 ++++++++++++++++++------------------
1 file changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 26eaf3b..a44506e 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -402,7 +402,7 @@ static u64 mce_rdmsrl(u32 msr)
if (offset < 0)
return 0;
- return *(u64 *)((char *)&__get_cpu_var(injectm) + offset);
+ return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset);
}
if (rdmsrl_safe(msr, &v)) {
@@ -424,7 +424,7 @@ static void mce_wrmsrl(u32 msr, u64 v)
int offset = msr_to_offset(msr);
if (offset >= 0)
- *(u64 *)((char *)&__get_cpu_var(injectm) + offset) = v;
+ *(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v;
return;
}
wrmsrl(msr, v);
@@ -480,7 +480,7 @@ static DEFINE_PER_CPU(struct mce_ring, mce_ring);
/* Runs with CPU affinity in workqueue */
static int mce_ring_empty(void)
{
- struct mce_ring *r = &__get_cpu_var(mce_ring);
+ struct mce_ring *r = this_cpu_ptr(&mce_ring);
return r->start == r->end;
}
@@ -492,7 +492,7 @@ static int mce_ring_get(unsigned long *pfn)
*pfn = 0;
get_cpu();
- r = &__get_cpu_var(mce_ring);
+ r = this_cpu_ptr(&mce_ring);
if (r->start == r->end)
goto out;
*pfn = r->ring[r->start];
@@ -506,7 +506,7 @@ out:
/* Always runs in MCE context with preempt off */
static int mce_ring_add(unsigned long pfn)
{
- struct mce_ring *r = &__get_cpu_var(mce_ring);
+ struct mce_ring *r = this_cpu_ptr(&mce_ring);
unsigned next;
next = (r->end + 1) % MCE_RING_SIZE;
@@ -528,7 +528,7 @@ int mce_available(struct cpuinfo_x86 *c)
static void mce_schedule_work(void)
{
if (!mce_ring_empty())
- schedule_work(&__get_cpu_var(mce_work));
+ schedule_work(this_cpu_ptr(&mce_work));
}
DEFINE_PER_CPU(struct irq_work, mce_irq_work);
@@ -553,7 +553,7 @@ static void mce_report_event(struct pt_regs *regs)
return;
}
- irq_work_queue(&__get_cpu_var(mce_irq_work));
+ irq_work_queue(this_cpu_ptr(&mce_irq_work));
}
/*
@@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
continue;
v = &get_cpu_var(mce_polled_error);
- set_bit(0, v);
+ *v = 1;
put_cpu_var(mce_polled_error);
/*
* Uncorrected or signalled events are handled by the exception
@@ -1053,7 +1053,7 @@ void do_machine_check(struct pt_regs *regs, long error_code)
mce_gather_info(&m, regs);
- final = &__get_cpu_var(mces_seen);
+ final = this_cpu_ptr(&mces_seen);
*final = m;
memset(valid_banks, 0, sizeof(valid_banks));
@@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) =
static int cmc_error_seen(void)
{
- unsigned long *v = &__get_cpu_var(mce_polled_error);
+ unsigned long *v = this_cpu_ptr(&mce_polled_error);
- return test_and_clear_bit(0, v);
+ return this_cpu_xchg(*v, 0);
}
static void mce_timer_fn(unsigned long data)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long iv;
int notify;
@@ -1302,7 +1302,7 @@ static void mce_timer_fn(unsigned long data)
if (mce_available(__this_cpu_ptr(&cpu_info))) {
machine_check_poll(MCP_TIMESTAMP,
- &__get_cpu_var(mce_poll_banks));
+ this_cpu_ptr(&mce_poll_banks));
mce_intel_cmci_poll();
}
@@ -1332,7 +1332,7 @@ static void mce_timer_fn(unsigned long data)
*/
void mce_timer_kick(unsigned long interval)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned long when = jiffies + interval;
unsigned long iv = __this_cpu_read(mce_next_interval);
@@ -1668,7 +1668,7 @@ static void mce_start_timer(unsigned int cpu, struct timer_list *t)
static void __mcheck_cpu_init_timer(void)
{
- struct timer_list *t = &__get_cpu_var(mce_timer);
+ struct timer_list *t = this_cpu_ptr(&mce_timer);
unsigned int cpu = smp_processor_id();
setup_timer(t, mce_timer_fn, cpu);
@@ -1711,8 +1711,8 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
__mcheck_cpu_init_generic();
__mcheck_cpu_init_vendor(c);
__mcheck_cpu_init_timer();
- INIT_WORK(&__get_cpu_var(mce_work), mce_process_work);
- init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb);
+ INIT_WORK(this_cpu_ptr(&mce_work), mce_process_work);
+ init_irq_work(this_cpu_ptr(&mce_irq_work), &mce_irq_work_cb);
}
/*
@@ -1964,7 +1964,7 @@ static struct miscdevice mce_chrdev_device = {
static void __mce_disable_bank(void *arg)
{
int bank = *((int *)arg);
- __clear_bit(bank, __get_cpu_var(mce_poll_banks));
+ __clear_bit(bank, *this_cpu_ptr(&mce_poll_banks));
cmci_disable_bank(bank);
}
--
1.9.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var
2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong
@ 2014-04-14 16:19 ` H. Peter Anvin
2014-04-14 16:20 ` H. Peter Anvin
2014-04-14 16:23 ` H. Peter Anvin
2 siblings, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2014-04-14 16:19 UTC (permalink / raw)
To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits
On 04/14/2014 01:39 AM, Chen, Gong wrote:
> @@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
> continue;
>
> v = &get_cpu_var(mce_polled_error);
> - set_bit(0, v);
> + *v = 1;
> put_cpu_var(mce_polled_error);
> /*
> * Uncorrected or signalled events are handled by the exception
The amazing thing is that you managed to miss the one place where you
could actively elide a pointer.
The above should simply be:
this_cpu_write(mce_polled_error, 1);
-hpa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var
2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong
2014-04-14 16:19 ` H. Peter Anvin
@ 2014-04-14 16:20 ` H. Peter Anvin
2014-04-14 16:23 ` H. Peter Anvin
2 siblings, 0 replies; 12+ messages in thread
From: H. Peter Anvin @ 2014-04-14 16:20 UTC (permalink / raw)
To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits
On 04/14/2014 01:39 AM, Chen, Gong wrote:
> @@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) =
>
> static int cmc_error_seen(void)
> {
> - unsigned long *v = &__get_cpu_var(mce_polled_error);
> + unsigned long *v = this_cpu_ptr(&mce_polled_error);
>
> - return test_and_clear_bit(0, v);
> + return this_cpu_xchg(*v, 0);
> }
>
Here you produce a pointer and *then* passing it through a this_cpu_
function... this is actively wrong.
It should simply be:
return this_cpu_xchg(mce_polled_error, 0);
-hpa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var
2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong
2014-04-14 16:19 ` H. Peter Anvin
2014-04-14 16:20 ` H. Peter Anvin
@ 2014-04-14 16:23 ` H. Peter Anvin
2014-04-15 2:28 ` Chen, Gong
2 siblings, 1 reply; 12+ messages in thread
From: H. Peter Anvin @ 2014-04-14 16:23 UTC (permalink / raw)
To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits
Please read Documentation/this_cpu_ops.txt for reference for how to use
these functions. However, you want to avoid forming a pointer if you
can; it is relatively expensive to do so.
-hpa
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var
2014-04-14 16:23 ` H. Peter Anvin
@ 2014-04-15 2:28 ` Chen, Gong
0 siblings, 0 replies; 12+ messages in thread
From: Chen, Gong @ 2014-04-15 2:28 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: tony.luck, tglx, mingo, linux-kernel, linux-tip-commits
[-- Attachment #1: Type: text/plain, Size: 433 bytes --]
On Mon, Apr 14, 2014 at 09:23:08AM -0700, H. Peter Anvin wrote:
> Please read Documentation/this_cpu_ops.txt for reference for how to use
> these functions. However, you want to avoid forming a pointer if you
> can; it is relatively expensive to do so.
>
:-). I just found your comment yesterday so that I was eager to finish it.
I should do more homework as you told and send it again. Thanks very much
for your patience.
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-15 2:53 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org>
2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy
2014-04-02 9:01 ` Borislav Petkov
2014-04-02 9:06 ` William Dauchy
2014-04-02 10:46 ` Ingo Molnar
2014-04-02 10:52 ` Borislav Petkov
2014-04-11 17:01 ` H. Peter Anvin
2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong
2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong
2014-04-14 16:19 ` H. Peter Anvin
2014-04-14 16:20 ` H. Peter Anvin
2014-04-14 16:23 ` H. Peter Anvin
2014-04-15 2:28 ` Chen, Gong
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.