All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel Thompson <daniel.thompson@linaro.org>
To: Douglas Anderson <dianders@chromium.org>
Cc: Jason Wessel <jason.wessel@windriver.com>,
	kgdb-bugreport@lists.sourceforge.net,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 3/4] kgdb: Don't round up a CPU that failed rounding up before
Date: Wed, 7 Nov 2018 11:59:53 +0000	[thread overview]
Message-ID: <20181107115953.cnar4u2axi4poaxe@holly.lan> (raw)
In-Reply-To: <20181107010028.184543-4-dianders@chromium.org>

On Tue, Nov 06, 2018 at 05:00:27PM -0800, Douglas Anderson wrote:
> If we're using the default implementation of kgdb_roundup_cpus() that
> uses smp_call_function_single_async() we can end up hanging
> kgdb_roundup_cpus() if we try to round up a CPU that failed to round
> up before.
> 
> Specifically smp_call_function_single_async() will try to wait on the
> csd lock for the CPU that we're trying to round up.  If the previous
> round up never finished then that lock could still be held and we'll
> just sit there hanging.
> 
> There's not a lot of use trying to round up a CPU that failed to round
> up before.  Let's keep a flag that indicates whether the CPU started
> but didn't finish to round up before.  If we see that flag set then
> we'll skip the next round up.
> 
> In general we have a few goals here:
> - We never want to end up calling smp_call_function_single_async()
>   when the csd is still locked.  This is accomplished because
>   flush_smp_call_function_queue() unlocks the csd _before_ invoking
>   the callback.  That means that when kgdb_nmicallback() runs we know
>   for sure the the csd is no longer locked.  Thus when we set
>   "rounding_up = false" we know for sure that the csd is unlocked.
> - If there are no timeouts rounding up we should never skip a round
>   up.
> 
> NOTE #1: In general trying to continue running after failing to round
> up CPUs doesn't appear to be supported in the debugger.  When I
> simulate this I find that kdb reports "Catastrophic error detected"
> when I try to continue.  I can overrule and continue anyway, but it
> should be noted that we may be entering the land of dragons here.

It's been quite a while but AFAIR I decided to set the catastrophic
error here *because* the stuck csd lock would make restarting fragile.

So arguably we are now able to remove the code that sets this flag when
a CPU fails to round up.


> NOTE #3: setting 'kgdb_info[cpu].rounding_up = false' is in
> kgdb_nmicallback() instead of kgdb_call_nmi_hook() because some
> implementations override kgdb_call_nmi_hook().  It shouldn't hurt to
> have it in kgdb_nmicallback() in any case.

Slightly icky but I guess this is OK.

> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index 23f2b5613afa..324cba8917f1 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -246,6 +246,20 @@ void __weak kgdb_roundup_cpus(void)
>  			continue;
>  
>  		csd = &per_cpu(kgdb_roundup_csd, cpu);
> +
> +		/*
> +		 * If it didn't round up last time, don't try again
> +		 * since smp_call_function_single_async() will block.
> +		 *
> +		 * If rounding_up is false then we know that the
> +		 * previous call must have at least started and that
> +		 * means smp_call_function_single_async() won't block.
> +		 */
> +		smp_mb();

Not commented and I suspect this may have no useful effect. What 
(harmful) orderings does this barrier render impossible?


> +		if (kgdb_info[cpu].rounding_up)
> +			continue;
> +		kgdb_info[cpu].rounding_up = true;
> +
>  		csd->func = kgdb_call_nmi_hook;
>  		smp_call_function_single_async(cpu, csd);
>  	}
> @@ -782,6 +796,9 @@ int kgdb_nmicallback(int cpu, void *regs)
>  	struct kgdb_state kgdb_var;
>  	struct kgdb_state *ks = &kgdb_var;
>  
> +	kgdb_info[cpu].rounding_up = false;
> +	smp_mb();

Also not commented. Here I think the barrier may have a purpose (to
ensure rounding_up gets cleared before we peek at dbg_master_lock) but
if that is the case we need to comment it.
 

Daniel.

  reply	other threads:[~2018-11-07 12:00 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-07  1:00 [PATCH v3 0/4] kgdb: Fix kgdb_roundup_cpus() Douglas Anderson
2018-11-07  1:00 ` Douglas Anderson
2018-11-07  1:00 ` Douglas Anderson
2018-11-07  1:00 ` Douglas Anderson
2018-11-07  1:00 ` Douglas Anderson
2018-11-07  1:00 ` Douglas Anderson
2018-11-07  1:00 ` [PATCH v3 1/4] kgdb: Remove irq flags from roundup Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00 ` [PATCH v3 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function() Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00   ` Douglas Anderson
2018-11-07  1:00 ` [PATCH v3 3/4] kgdb: Don't round up a CPU that failed rounding up before Douglas Anderson
2018-11-07 11:59   ` Daniel Thompson [this message]
2018-11-07 16:44     ` Doug Anderson
2018-11-07  1:00 ` [PATCH v3 4/4] kdb: Don't back trace on a cpu that didn't round up Douglas Anderson
2018-11-07 12:30   ` Daniel Thompson
2018-11-07 18:44     ` Doug Anderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181107115953.cnar4u2axi4poaxe@holly.lan \
    --to=daniel.thompson@linaro.org \
    --cc=dianders@chromium.org \
    --cc=jason.wessel@windriver.com \
    --cc=kgdb-bugreport@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.