All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Gibson <david@gibson.dropbear.id.au>
To: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, mdroth@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] [Qemu-ppc] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug
Date: Wed, 30 Aug 2017 12:07:22 +1000	[thread overview]
Message-ID: <20170830020722.GB3386@umbus.fritz.box> (raw)
In-Reply-To: <df49df06-caaf-81f5-7c24-f4324654a262@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3271 bytes --]

On Tue, Aug 29, 2017 at 05:54:28PM -0300, Daniel Henrique Barboza wrote:
> 
> 
> On 08/29/2017 04:23 AM, David Gibson wrote:
> > On Fri, Aug 25, 2017 at 06:11:18PM -0300, Daniel Henrique Barboza wrote:
> > > v2:
> > > - rebased with ppc-for-2.11
> > > - function 'spapr_cas_completed' dropped
> > > - function 'spapr_drc_needed' made public and it's now used inside
> > >    'spapr_hotplugged_dev_before_cas'
> > > - 'spapr_drc_needed' was changed to support the migration of logical
> > >    DRCs with devs attached in UNUSED state
> > > - new function: 'spapr_clear_pending_events'. This function is used
> > >    inside ppc_spapr_reset to reset the pending_events QTAILQ
> > Thanks for the followup, unfortunately there is still an important bug
> > left, see comments on the patch itself.
> > 
> > At a higher level, though, looking at the event reset code made me
> > think of a possible even simpler solution to this problem.
> > 
> > The queue of events (both hotplug and epow) is already in a simple
> > internal form that's independent of the two delivery mechanisms.  The
> > only difference is what event source triggers the interrupt.  This
> > explains why an extra hotplug event after the CAS "unstuck" the queue.
> > 
> > AFAICT, a spurious interrupts here should be harmless - the kernel
> > will just check the queue and find nothing there.
> > 
> > So, it should be sufficient to, after CAS, pulse the hotplug queue
> > interrupt if the hotplug queue is negotiated.
> > 
> This is something I've tried in my first attempts at this problem, before
> sending the first patch in which I blocked hotplug before CAS. Back then,
> the problem was that the kernel panics with sig 11 (acess of bad area) when
> receiving the pulse after CAS.

Huh.

> I've investigated it a bit today and it seems that it still the case. Firing
> an IRQ right
> after CAS breaks the kernel. In fact, if you time a regular CPU hotplug
> right after
> CAS you'll get the same sig 11 kernel ooops. It looks like there is a time
> window after
> CAS that the kernel can't handle the hotplug process and pulsing the hotplug
> queue in this window breaks the guest. I've tried some hacks such as pulsing
> the queue
> in the first 'event_scan' call made by the guest, but apparently it is still
> too early.
> 
> I've sent an email to the linuxppc-dev mailing list talking about this
> behavior
> and asking if there is a reliable way to know when  we can safely pulse the
> hotplug
> queue. Meanwhile, I'll keep working in the v3 respin of this patch in case
> this
> solution of pulsing the hotplug queue ends up being not feasible.

Right.  As Ben's reply says that definitely looks like a guest kernel
bug.  But, it's in enough kernels in the wild that we really need to
work around it anyway.  I think the reset-at-CAS approach is our best
bet to accomplish that at this stage.

Note that the clear-queue-at-reset preliminary cleanup will be
valuable even if we end up not needing the rest of the reset at CAS
stuff.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2017-08-30  2:11 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-25 21:11 [Qemu-devel] [PATCH for-2.11 v2] hw/ppc: CAS reset on early device hotplug Daniel Henrique Barboza
2017-08-25 21:11 ` Daniel Henrique Barboza
2017-08-29  6:45   ` David Gibson
2017-08-29  7:23 ` David Gibson
2017-08-29 20:54   ` [Qemu-devel] [Qemu-ppc] " Daniel Henrique Barboza
2017-08-30  2:07     ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170830020722.GB3386@umbus.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=danielhb@linux.vnet.ibm.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.