* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms [not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org> @ 2014-04-02 8:51 ` William Dauchy 2014-04-02 9:01 ` Borislav Petkov 2014-04-11 17:01 ` H. Peter Anvin 2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong 2 siblings, 1 reply; 12+ messages in thread From: William Dauchy @ 2014-04-02 8:51 UTC (permalink / raw) To: linux-kernel, Ingo Molnar, hpa, William Dauchy, tglx, Tony, Gong Cc: linux-tip-commits, William Dauchy On Wed, Apr 2, 2014 at 9:55 AM, tip-bot for Chen, Gong <tipbot@zytor.com> wrote: > Commit-ID: 27f6c573e0f77f7d1cc907c1494c99a61e48b7d8 > Gitweb: http://git.kernel.org/tip/27f6c573e0f77f7d1cc907c1494c99a61e48b7d8 > Author: Chen, Gong <gong.chen@linux.intel.com> > AuthorDate: Thu, 27 Mar 2014 21:24:36 -0400 > Committer: Tony Luck <tony.luck@intel.com> > CommitDate: Fri, 28 Mar 2014 13:40:16 -0700 > > x86, CMCI: Add proper detection of end of CMCI storms > > When CMCI storm persists for a long time(at least beyond predefined > threshold. It's 30 seconds for now), we can watch CMCI storm is > detected immediately after it subsides. > > ... > Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode > Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode > Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode > Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode > ... > > The problem is that our logic that determines that the storm has > ended is incorrect. We announce the end, re-enable interrupts and > realize that the storm is still going on, so we switch back to > polling mode. Rinse, repeat. > > When a storm happens we disable signaling of errors via CMCI and begin > polling machine check banks instead. If we find any logged errors, > then we need to set a per-cpu flag so that our per-cpu tests that > check whether the storm is ongoing will see that errors are still > being logged independently of whether mce_notify_irq() says that the > error has been fully processed. > > cmci_clear() is not the right tool to disable a bank. It disables the > interrupt for the bank as desired, but it also clears the bit for > this bank in "mce_banks_owned" so we will skip the bank when polling > (so we fail to see that the storm continues because we stop looking). > New cmci_storm_disable_banks() just disables the interrupt while > allowing polling to continue. > > Reported-by: William Dauchy <wdauchy@gmail.com> Could you use the following address instead? Reported-by: William Dauchy <william@gandi.net> Thanks, > Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> > Signed-off-by: Tony Luck <tony.luck@intel.com> -- William ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms 2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy @ 2014-04-02 9:01 ` Borislav Petkov 2014-04-02 9:06 ` William Dauchy 2014-04-02 10:46 ` Ingo Molnar 0 siblings, 2 replies; 12+ messages in thread From: Borislav Petkov @ 2014-04-02 9:01 UTC (permalink / raw) To: William Dauchy Cc: linux-kernel, Ingo Molnar, hpa, tglx, Tony, Gong, linux-tip-commits, William Dauchy On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote: > Could you use the following address instead? > Reported-by: William Dauchy <william@gandi.net> It is too late for that now as the patch is in -tip already... Unless Ingo can still amend it, that is. But, we're working on a real solution for the storm issue and there we'll be asking you to test stuff anyway so we'll make sure to use this mail address then, ok? :-) -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms 2014-04-02 9:01 ` Borislav Petkov @ 2014-04-02 9:06 ` William Dauchy 2014-04-02 10:46 ` Ingo Molnar 1 sibling, 0 replies; 12+ messages in thread From: William Dauchy @ 2014-04-02 9:06 UTC (permalink / raw) To: Borislav Petkov Cc: William Dauchy, linux-kernel, Ingo Molnar, hpa, tglx, Tony, Gong, linux-tip-commits, William Dauchy [-- Attachment #1: Type: text/plain, Size: 347 bytes --] On Apr02 11:01, Borislav Petkov wrote: > It is too late for that now as the patch is in -tip already... Unless > Ingo can still amend it, that is. > > But, we're working on a real solution for the storm issue and there > we'll be asking you to test stuff anyway so we'll make sure to use this > mail address then, ok? ack -- William [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms 2014-04-02 9:01 ` Borislav Petkov 2014-04-02 9:06 ` William Dauchy @ 2014-04-02 10:46 ` Ingo Molnar 2014-04-02 10:52 ` Borislav Petkov 1 sibling, 1 reply; 12+ messages in thread From: Ingo Molnar @ 2014-04-02 10:46 UTC (permalink / raw) To: Borislav Petkov Cc: William Dauchy, linux-kernel, hpa, tglx, Tony, Gong, linux-tip-commits, William Dauchy * Borislav Petkov <bp@alien8.de> wrote: > On Wed, Apr 02, 2014 at 10:51:49AM +0200, William Dauchy wrote: > > Could you use the following address instead? > > Reported-by: William Dauchy <william@gandi.net> > > It is too late for that now as the patch is in -tip already... Unless > Ingo can still amend it, that is. There are already patches on top of it, so it's not possible - but even if it was the last one, since it got committed by Tony I cannot rebase or amend it. Thanks, Ingo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms 2014-04-02 10:46 ` Ingo Molnar @ 2014-04-02 10:52 ` Borislav Petkov 0 siblings, 0 replies; 12+ messages in thread From: Borislav Petkov @ 2014-04-02 10:52 UTC (permalink / raw) To: Ingo Molnar Cc: William Dauchy, linux-kernel, hpa, tglx, Tony, Gong, linux-tip-commits, William Dauchy On Wed, Apr 02, 2014 at 12:46:12PM +0200, Ingo Molnar wrote: > There are already patches on top of it, so it's not possible - but > even if it was the last one, since it got committed by Tony I cannot > rebase or amend it. Ah right, you pulled it from him, sure. Thanks. -- Regards/Gruss, Boris. Sent from a fat crate under my desk. Formatting is fine. -- ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms [not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org> 2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy @ 2014-04-11 17:01 ` H. Peter Anvin 2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong 2 siblings, 0 replies; 12+ messages in thread From: H. Peter Anvin @ 2014-04-11 17:01 UTC (permalink / raw) To: linux-kernel, mingo, wdauchy, tglx, tony.luck, gong.chen, linux-tip-commits On 04/02/2014 12:55 AM, tip-bot for Chen, Gong wrote: > @@ -614,6 +618,8 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) > if (!(m.status & MCI_STATUS_VAL)) > continue; > > + v = &get_cpu_var(mce_polled_error); > + set_bit(0, v); > /* > * Uncorrected or signalled events are handled by the exception > * handler when it is enabled, so don't process those here. > @@ -1278,10 +1284,18 @@ static unsigned long mce_adjust_timer_default(unsigned long interval) > static unsigned long (*mce_adjust_timer)(unsigned long interval) = > mce_adjust_timer_default; > > +static int cmc_error_seen(void) > +{ > + unsigned long *v = &__get_cpu_var(mce_polled_error); > + > + return test_and_clear_bit(0, v); > +} > + Please use this_cpu_*() whereever possible instead of __get_cpu_var(). Since this is not actually a bitmask this_cpu_xchg() can be used at the end. In fact, using set_bit() is completely wasteful. I'll push this onward since it is a bit late, but please submit a cleanup patch. -hpa ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var [not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org> 2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy 2014-04-11 17:01 ` H. Peter Anvin @ 2014-04-14 8:39 ` Chen, Gong 2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong 2 siblings, 1 reply; 12+ messages in thread From: Chen, Gong @ 2014-04-14 8:39 UTC (permalink / raw) To: hpa, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits, Chen, Gong This issue is introduced in commit 27f6c573e0. I forget to execute put_cpu_var operation after get_cpu_var. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> --- arch/x86/kernel/cpu/mcheck/mce.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index eeee23f..26eaf3b 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -620,6 +620,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) v = &get_cpu_var(mce_polled_error); set_bit(0, v); + put_cpu_var(mce_polled_error); /* * Uncorrected or signalled events are handled by the exception * handler when it is enabled, so don't process those here. -- 1.9.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var 2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong @ 2014-04-14 8:39 ` Chen, Gong 2014-04-14 16:19 ` H. Peter Anvin ` (2 more replies) 0 siblings, 3 replies; 12+ messages in thread From: Chen, Gong @ 2014-04-14 8:39 UTC (permalink / raw) To: hpa, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits, Chen, Gong According to Peter's suggestion, use this_cpu_* instead of __get_cpu_var. BTW, remove bitmask ops to avoid unnecessary overhead. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Suggested-by: H. Peter Anvin <hpa@zytor.com> --- arch/x86/kernel/cpu/mcheck/mce.c | 36 ++++++++++++++++++------------------ 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index 26eaf3b..a44506e 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -402,7 +402,7 @@ static u64 mce_rdmsrl(u32 msr) if (offset < 0) return 0; - return *(u64 *)((char *)&__get_cpu_var(injectm) + offset); + return *(u64 *)((char *)this_cpu_ptr(&injectm) + offset); } if (rdmsrl_safe(msr, &v)) { @@ -424,7 +424,7 @@ static void mce_wrmsrl(u32 msr, u64 v) int offset = msr_to_offset(msr); if (offset >= 0) - *(u64 *)((char *)&__get_cpu_var(injectm) + offset) = v; + *(u64 *)((char *)this_cpu_ptr(&injectm) + offset) = v; return; } wrmsrl(msr, v); @@ -480,7 +480,7 @@ static DEFINE_PER_CPU(struct mce_ring, mce_ring); /* Runs with CPU affinity in workqueue */ static int mce_ring_empty(void) { - struct mce_ring *r = &__get_cpu_var(mce_ring); + struct mce_ring *r = this_cpu_ptr(&mce_ring); return r->start == r->end; } @@ -492,7 +492,7 @@ static int mce_ring_get(unsigned long *pfn) *pfn = 0; get_cpu(); - r = &__get_cpu_var(mce_ring); + r = this_cpu_ptr(&mce_ring); if (r->start == r->end) goto out; *pfn = r->ring[r->start]; @@ -506,7 +506,7 @@ out: /* Always runs in MCE context with preempt off */ static int mce_ring_add(unsigned long pfn) { - struct mce_ring *r = &__get_cpu_var(mce_ring); + struct mce_ring *r = this_cpu_ptr(&mce_ring); unsigned next; next = (r->end + 1) % MCE_RING_SIZE; @@ -528,7 +528,7 @@ int mce_available(struct cpuinfo_x86 *c) static void mce_schedule_work(void) { if (!mce_ring_empty()) - schedule_work(&__get_cpu_var(mce_work)); + schedule_work(this_cpu_ptr(&mce_work)); } DEFINE_PER_CPU(struct irq_work, mce_irq_work); @@ -553,7 +553,7 @@ static void mce_report_event(struct pt_regs *regs) return; } - irq_work_queue(&__get_cpu_var(mce_irq_work)); + irq_work_queue(this_cpu_ptr(&mce_irq_work)); } /* @@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) continue; v = &get_cpu_var(mce_polled_error); - set_bit(0, v); + *v = 1; put_cpu_var(mce_polled_error); /* * Uncorrected or signalled events are handled by the exception @@ -1053,7 +1053,7 @@ void do_machine_check(struct pt_regs *regs, long error_code) mce_gather_info(&m, regs); - final = &__get_cpu_var(mces_seen); + final = this_cpu_ptr(&mces_seen); *final = m; memset(valid_banks, 0, sizeof(valid_banks)); @@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) = static int cmc_error_seen(void) { - unsigned long *v = &__get_cpu_var(mce_polled_error); + unsigned long *v = this_cpu_ptr(&mce_polled_error); - return test_and_clear_bit(0, v); + return this_cpu_xchg(*v, 0); } static void mce_timer_fn(unsigned long data) { - struct timer_list *t = &__get_cpu_var(mce_timer); + struct timer_list *t = this_cpu_ptr(&mce_timer); unsigned long iv; int notify; @@ -1302,7 +1302,7 @@ static void mce_timer_fn(unsigned long data) if (mce_available(__this_cpu_ptr(&cpu_info))) { machine_check_poll(MCP_TIMESTAMP, - &__get_cpu_var(mce_poll_banks)); + this_cpu_ptr(&mce_poll_banks)); mce_intel_cmci_poll(); } @@ -1332,7 +1332,7 @@ static void mce_timer_fn(unsigned long data) */ void mce_timer_kick(unsigned long interval) { - struct timer_list *t = &__get_cpu_var(mce_timer); + struct timer_list *t = this_cpu_ptr(&mce_timer); unsigned long when = jiffies + interval; unsigned long iv = __this_cpu_read(mce_next_interval); @@ -1668,7 +1668,7 @@ static void mce_start_timer(unsigned int cpu, struct timer_list *t) static void __mcheck_cpu_init_timer(void) { - struct timer_list *t = &__get_cpu_var(mce_timer); + struct timer_list *t = this_cpu_ptr(&mce_timer); unsigned int cpu = smp_processor_id(); setup_timer(t, mce_timer_fn, cpu); @@ -1711,8 +1711,8 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c) __mcheck_cpu_init_generic(); __mcheck_cpu_init_vendor(c); __mcheck_cpu_init_timer(); - INIT_WORK(&__get_cpu_var(mce_work), mce_process_work); - init_irq_work(&__get_cpu_var(mce_irq_work), &mce_irq_work_cb); + INIT_WORK(this_cpu_ptr(&mce_work), mce_process_work); + init_irq_work(this_cpu_ptr(&mce_irq_work), &mce_irq_work_cb); } /* @@ -1964,7 +1964,7 @@ static struct miscdevice mce_chrdev_device = { static void __mce_disable_bank(void *arg) { int bank = *((int *)arg); - __clear_bit(bank, __get_cpu_var(mce_poll_banks)); + __clear_bit(bank, *this_cpu_ptr(&mce_poll_banks)); cmci_disable_bank(bank); } -- 1.9.0 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var 2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong @ 2014-04-14 16:19 ` H. Peter Anvin 2014-04-14 16:20 ` H. Peter Anvin 2014-04-14 16:23 ` H. Peter Anvin 2 siblings, 0 replies; 12+ messages in thread From: H. Peter Anvin @ 2014-04-14 16:19 UTC (permalink / raw) To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits On 04/14/2014 01:39 AM, Chen, Gong wrote: > @@ -619,7 +619,7 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b) > continue; > > v = &get_cpu_var(mce_polled_error); > - set_bit(0, v); > + *v = 1; > put_cpu_var(mce_polled_error); > /* > * Uncorrected or signalled events are handled by the exception The amazing thing is that you managed to miss the one place where you could actively elide a pointer. The above should simply be: this_cpu_write(mce_polled_error, 1); -hpa ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var 2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong 2014-04-14 16:19 ` H. Peter Anvin @ 2014-04-14 16:20 ` H. Peter Anvin 2014-04-14 16:23 ` H. Peter Anvin 2 siblings, 0 replies; 12+ messages in thread From: H. Peter Anvin @ 2014-04-14 16:20 UTC (permalink / raw) To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits On 04/14/2014 01:39 AM, Chen, Gong wrote: > @@ -1287,14 +1287,14 @@ static unsigned long (*mce_adjust_timer)(unsigned long interval) = > > static int cmc_error_seen(void) > { > - unsigned long *v = &__get_cpu_var(mce_polled_error); > + unsigned long *v = this_cpu_ptr(&mce_polled_error); > > - return test_and_clear_bit(0, v); > + return this_cpu_xchg(*v, 0); > } > Here you produce a pointer and *then* passing it through a this_cpu_ function... this is actively wrong. It should simply be: return this_cpu_xchg(mce_polled_error, 0); -hpa ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var 2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong 2014-04-14 16:19 ` H. Peter Anvin 2014-04-14 16:20 ` H. Peter Anvin @ 2014-04-14 16:23 ` H. Peter Anvin 2014-04-15 2:28 ` Chen, Gong 2 siblings, 1 reply; 12+ messages in thread From: H. Peter Anvin @ 2014-04-14 16:23 UTC (permalink / raw) To: Chen, Gong, tony.luck; +Cc: tglx, mingo, linux-kernel, linux-tip-commits Please read Documentation/this_cpu_ops.txt for reference for how to use these functions. However, you want to avoid forming a pointer if you can; it is relatively expensive to do so. -hpa ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var 2014-04-14 16:23 ` H. Peter Anvin @ 2014-04-15 2:28 ` Chen, Gong 0 siblings, 0 replies; 12+ messages in thread From: Chen, Gong @ 2014-04-15 2:28 UTC (permalink / raw) To: H. Peter Anvin; +Cc: tony.luck, tglx, mingo, linux-kernel, linux-tip-commits [-- Attachment #1: Type: text/plain, Size: 433 bytes --] On Mon, Apr 14, 2014 at 09:23:08AM -0700, H. Peter Anvin wrote: > Please read Documentation/this_cpu_ops.txt for reference for how to use > these functions. However, you want to avoid forming a pointer if you > can; it is relatively expensive to do so. > :-). I just found your comment yesterday so that I was eager to finish it. I should do more homework as you told and send it again. Thanks very much for your patience. [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2014-04-15 2:53 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <tip-27f6c573e0f77f7d1cc907c1494c99a61e48b7d8@git.kernel.org> 2014-04-02 8:51 ` [tip:x86/urgent] x86, CMCI: Add proper detection of end of CMCI storms William Dauchy 2014-04-02 9:01 ` Borislav Petkov 2014-04-02 9:06 ` William Dauchy 2014-04-02 10:46 ` Ingo Molnar 2014-04-02 10:52 ` Borislav Petkov 2014-04-11 17:01 ` H. Peter Anvin 2014-04-14 8:39 ` [PATCH 1/2] x86, CMCI: Fix a missed put_cpu_var Chen, Gong 2014-04-14 8:39 ` [PATCH 2/2] x86, MCE: Cleanup macro __get_cpu_var Chen, Gong 2014-04-14 16:19 ` H. Peter Anvin 2014-04-14 16:20 ` H. Peter Anvin 2014-04-14 16:23 ` H. Peter Anvin 2014-04-15 2:28 ` Chen, Gong
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.