All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Balbir Singh <bsingharora@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org, mpe@ellerman.id.au
Subject: Re: [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's
Date: Thu, 14 Dec 2017 11:51:37 +1000	[thread overview]
Message-ID: <20171214115137.5603f77d@roar.ozlabs.ibm.com> (raw)
In-Reply-To: <20171214111213.07ba99a4@gmail.com>

On Thu, 14 Dec 2017 11:12:13 +1100
Balbir Singh <bsingharora@gmail.com> wrote:

> On Wed, 13 Dec 2017 20:51:01 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> > This is looking pretty nice now...
> > 
> > On Wed, 13 Dec 2017 19:08:28 +1100
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >   
> > > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
> > >  #ifdef CONFIG_KEXEC_CORE
> > >  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> > >  {
> > > +	int cpu;
> > > +
> > >  	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> > > +	if (kdump_in_progress() && crash_wake_offline) {
> > > +		for_each_present_cpu(cpu) {
> > > +			if (cpu_online(cpu))
> > > +				continue;
> > > +			/*
> > > +			 * crash_ipi_callback will wait for
> > > +			 * all cpus, including offline CPUs.
> > > +			 * We don't care about nmi_ipi_function.
> > > +			 * Offline cpus will jump straight into
> > > +			 * crash_ipi_callback, we can skip the
> > > +			 * entire NMI dance and waiting for
> > > +			 * cpus to clear pending mask, etc.
> > > +			 */
> > > +			do_smp_send_nmi_ipi(cpu);    
> > 
> > Still a little bit concerned about using NMI IPI for this.
> >  
> 
> OK -- for offline CPUs you mean?

Yes.

> > If you take an NMI IPI from stop, the idle code should do the
> > right thing and we would just return the system reset wakeup
> > reason in SRR1 here (which does not need to be cleared).
> > 
> > If you take the system reset anywhere else in the loop, it's
> > going to go out via system_reset_exception. I guess that
> > would end up doing the right thing, it probably gets to
> > crash_ipi_callback from crash_kexec_secondary?  
> 
> You mean like if we are online at the time of NMI'ing? If so
> the original loop will NMI us back into crash_ipi_callback
> anyway. We don't expect this to occur for offline CPUs

No, if the offline CPU is executing any instruction except for
stop when the crash occurs.

> 
> > 
> > It's just going to be a very untested code path :( What we
> > gain I suppose is better ability to handle a CPU that's locked
> > up somewhere in the cpu offline path. Assuming the uncommon
> > case works...
> > 
> > Actually, if you *always* go via the system reset exception
> > handler, then code paths will be shared. That might be the
> > way to go. So I would check for system reset wakeup SRR1 reason
> > and call replay_system_reset() for it. What do you think?
> >   
> 
> We could do that, but that would call pnv_system_reset_exception
> and try to call the NMI function, but we've not used that path
> to initiate the NMI, so it should call the stale nmi_ipi_function
> which is crash_ipi_callback and not go via the crash_kexec path.

It shouldn't, if the CPU is not set in the NMI bitmask, I think
it should go back out and do the rest of the system_reset_exception
handler.

Anyway we have to get this case right, because it can already hit
as I said if the offline CPU takes the NMI when it is not stopped.
This is why I want to try to use a unified code path.

> I can't call smp_send_nmi_ipi due to the nmi_ipi_busy_count and
> I'm worried about calling a stale nmi_ipi_function via the
> system_reset_exception path, if we are OK with it, I can revisit
> the code path

You shouldn't get a stale one, that would also be a bug -- we
have to cope with NMIs coming in at any time that are triggered
externally (not by smp_send_nmi_ipi), so if you see any bugs
there those need to be fixed separately.

Thanks,
Nick

  reply	other threads:[~2017-12-14  1:51 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-13  8:08 [PATCH 1/2] powerpc/crash: Remove the test for cpu_online in the IPI callback Balbir Singh
2017-12-13  8:08 ` [PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's Balbir Singh
2017-12-13 10:51   ` Nicholas Piggin
2017-12-14  0:12     ` Balbir Singh
2017-12-14  1:51       ` Nicholas Piggin [this message]
2017-12-14 12:16         ` Balbir Singh
2017-12-14 13:32           ` Nicholas Piggin
2017-12-15  2:55             ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171214115137.5603f77d@roar.ozlabs.ibm.com \
    --to=npiggin@gmail.com \
    --cc=bsingharora@gmail.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mpe@ellerman.id.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.