From: "Paul E. McKenney" <paulmck@kernel.org>
To: Borislav Petkov <bp@alien8.de>
Cc: linux-kernel@vger.kernel.org, x86@kernel.org,
	linux-edac@vger.kernel.org, tony.luck@intel.com,
	tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com,
	kernel-team@fb.com
Subject: Re: [PATCH RFC x86/mce] Make mce_timed_out() identify holdout CPUs
Date: Fri, 8 Jan 2021 06:55:14 -0800
Message-ID: <20210108145514.GS2743@paulmck-ThinkPad-P72>
In-Reply-To: <20210108123156.GD4042@zn.tnic>

On Fri, Jan 08, 2021 at 01:31:56PM +0100, Borislav Petkov wrote:
> On Thu, Jan 07, 2021 at 09:08:44AM -0800, Paul E. McKenney wrote:
> > Some information is usually better than none.  And I bet that failing
> > hardware is capable of all sorts of tricks at all sorts of levels.  ;-)
> 
> Tell me about it.
> 
> > Updated patch below.  Is this what you had in mind?
> 
> Ok, so I've massaged it into the below locally while taking another
> detailed look. Made the pr_info pr_emerg and poked at the text more, as
> I do. :)
> 
> Lemme know if something else needs to be adjusted, otherwise I'll queue
> it.

Looks good to me!  I agree that your change to the pr_emerg() string is
much better than my original.  And good point on your added comment,
plus it was fun to see that my original "holdouts" wording has not
completely vanished.  ;-)

Thank you very much!!!

							Thanx, Paul

> Thx.
> 
> ---
> Author: Paul E. McKenney <paulmck@kernel.org>
> Date:   Wed Dec 23 17:04:19 2020 -0800
> 
>     x86/mce: Make mce_timed_out() identify holdout CPUs
>     
>     The
>     
>       "Timeout: Not all CPUs entered broadcast exception handler"
>     
>     message will appear from time to time given enough systems, but this
>     message does not identify which CPUs failed to enter the broadcast
>     exception handler. This information would be valuable if available,
>     for example, in order to correlate with other hardware-oriented error
>     messages.
>     
>     Add a cpumask of CPUs which maintains which CPUs have entered this
>     handler, and print out which ones failed to enter in the event of a
>     timeout.
>     
>      [ bp: Massage. ]
>     
>     Reported-by: Jonathan Lemon <bsd@fb.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>     Signed-off-by: Borislav Petkov <bp@suse.de>
>     Tested-by: Tony Luck <tony.luck@intel.com>
>     Link: https://lkml.kernel.org/r/20210106174102.GA23874@paulmck-ThinkPad-P72
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 13d3f1cbda17..6c81d0998e0a 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -877,6 +877,12 @@ static atomic_t mce_executing;
>   */
>  static atomic_t mce_callin;
>  
> +/*
> + * Track which CPUs entered the MCA broadcast synchronization and which not in
> + * order to print holdouts.
> + */
> +static cpumask_t mce_missing_cpus = CPU_MASK_ALL;
> +
>  /*
>   * Check if a timeout waiting for other CPUs happened.
>   */
> @@ -894,8 +900,12 @@ static int mce_timed_out(u64 *t, const char *msg)
>  	if (!mca_cfg.monarch_timeout)
>  		goto out;
>  	if ((s64)*t < SPINUNIT) {
> -		if (mca_cfg.tolerant <= 1)
> +		if (mca_cfg.tolerant <= 1) {
> +			if (cpumask_and(&mce_missing_cpus, cpu_online_mask, &mce_missing_cpus))
> +				pr_emerg("CPUs not responding to MCE broadcast (may include false positives): %*pbl\n",
> +					 cpumask_pr_args(&mce_missing_cpus));
>  			mce_panic(msg, NULL, NULL);
> +		}
>  		cpu_missing = 1;
>  		return 1;
>  	}
> @@ -1006,6 +1016,7 @@ static int mce_start(int *no_way_out)
>  	 * is updated before mce_callin.
>  	 */
>  	order = atomic_inc_return(&mce_callin);
> +	cpumask_clear_cpu(smp_processor_id(), &mce_missing_cpus);
>  
>  	/*
>  	 * Wait for everyone.
> @@ -1114,6 +1125,7 @@ static int mce_end(int order)
>  reset:
>  	atomic_set(&global_nwo, 0);
>  	atomic_set(&mce_callin, 0);
> +	cpumask_setall(&mce_missing_cpus);
>  	barrier();
>  
>  	/*
> @@ -2712,6 +2724,7 @@ static void mce_reset(void)
>  	atomic_set(&mce_executing, 0);
>  	atomic_set(&mce_callin, 0);
>  	atomic_set(&global_nwo, 0);
> +	cpumask_setall(&mce_missing_cpus);
>  }
>  
>  static int fake_panic_get(void *data, u64 *val)
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
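
For readers following the thread, the holdout tracking in the quoted patch can be sketched in plain userspace C. This is only an illustration under stated assumptions: a 64-bit integer stands in for cpumask_t, the names mask_t, call_in(), reset_tracking(), holdouts() and format_cpu_list() are hypothetical and not kernel API, and the sketch is single-threaded, so it does not model the concurrent call-ins of the real handler. The formatter only approximates the range-list output that the patch gets from "%*pbl".

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU, at most 64 CPUs. */
typedef uint64_t mask_t;

/* Every CPU starts out "missing", mirroring the CPU_MASK_ALL initializer. */
static mask_t missing_cpus = ~(mask_t)0;

/* A CPU entering the handler clears its own bit (cf. cpumask_clear_cpu()). */
static void call_in(unsigned int cpu)
{
	missing_cpus &= ~((mask_t)1 << cpu);
}

/* Re-arm the mask for the next event (cf. cpumask_setall()). */
static void reset_tracking(void)
{
	missing_cpus = ~(mask_t)0;
}

/*
 * On timeout, keep only online CPUs (cf. cpumask_and()) and return the
 * result. A nonzero return means there are holdouts worth printing.
 */
static mask_t holdouts(mask_t online)
{
	missing_cpus &= online;
	return missing_cpus;
}

/*
 * Format a mask as a comma-separated range list such as "1,4-7".
 * Returns the number of characters written, excluding the terminator.
 */
static size_t format_cpu_list(mask_t mask, char *buf, size_t len)
{
	size_t used = 0;
	unsigned int cpu = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';

	while (cpu < 64) {
		unsigned int first, last;
		int n;

		if (!((mask >> cpu) & 1)) {
			cpu++;
			continue;
		}

		/* Extend the run of consecutive set bits. */
		first = cpu;
		while (cpu < 64 && ((mask >> cpu) & 1))
			cpu++;
		last = cpu - 1;

		if (first == last)
			n = snprintf(buf + used, len - used, "%s%u",
				     used ? "," : "", first);
		else
			n = snprintf(buf + used, len - used, "%s%u-%u",
				     used ? "," : "", first, last);

		/* Stop on error or if the output was truncated. */
		if (n < 0 || (size_t)n >= len - used)
			break;
		used += (size_t)n;
	}
	return used;
}
```

With eight CPUs online (mask 0xFF) and only CPUs 0, 2 and 3 having called in, holdouts(0xFF) leaves bits 1 and 4-7 set, and format_cpu_list() renders that as "1,4-7". As in the patch, the mask is set back to all ones after each event so that the next one starts from a clean slate.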
