All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: David Gibson <david@gibson.dropbear.id.au>
Cc: aik@ozlabs.ru, mdroth@linux.vnet.ibm.com, gwshan@au1.ibm.com,
	qemu-devel@nongnu.org, qemu-ppc@nongnu.org
Subject: Re: [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface
Date: Thu, 03 Dec 2015 14:02:55 -0700	[thread overview]
Message-ID: <1449176575.15753.227.camel@redhat.com> (raw)
In-Reply-To: <20151203042219.GI3107@voom.redhat.com>

On Thu, 2015-12-03 at 15:22 +1100, David Gibson wrote:
> On Wed, Dec 02, 2015 at 01:09:34PM -0700, Alex Williamson wrote:
> > On Tue, 2015-12-01 at 13:23 +1100, David Gibson wrote:
> > > On Mon, Nov 23, 2015 at 02:58:11PM -0700, Alex Williamson wrote:
> > > > On Thu, 2015-11-19 at 15:29 +1100, David Gibson wrote:
> > > > > At present the code handling IBM's Enhanced Error Handling (EEH) interface
> > > > > on VFIO devices operates by bypassing the usual VFIO logic with
> > > > > vfio_container_ioctl().  That's a poorly designed interface with unclear
> > > > > semantics about exactly what can be operated on.
> > > > > 
> > > > > In particular it operates on a single vfio container internally (hence the
> > > > > name), but takes an address space and group id, from which it deduces the
> > > > > container in a rather roundabout way.  groupids are something that code
> > > > > outside vfio shouldn't even be aware of.
> > > > > 
> > > > > This patch creates new interfaces for EEH operations.  Internally we
> > > > > have vfio_eeh_container_op() which takes a VFIOContainer object
> > > > > directly.  For external use we have vfio_eeh_as_ok() which determines
> > > > > if an AddressSpace is usable for EEH (at present this means it has a
> > > > > single container and at most a single group attached), and
> > > > > vfio_eeh_as_op() which will perform an operation on an AddressSpace in
> > > > > the unambiguous case, and otherwise returns an error.
> > > > > 
> > > > > This interface still isn't great, but it's enough of an improvement to
> > > > > allow a number of cleanups in other places.
> > > > > 
> > > > > Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
> > > > > ---
> > > > >  hw/vfio/common.c       | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++
> > > > >  include/hw/vfio/vfio.h |  2 ++
> > > > >  2 files changed, 79 insertions(+)
> > > > > 
> > > > > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > > > > index 6797208..4733625 100644
> > > > > --- a/hw/vfio/common.c
> > > > > +++ b/hw/vfio/common.c
> > > > > @@ -1002,3 +1002,80 @@ int vfio_container_ioctl(AddressSpace *as, int32_t groupid,
> > > > >  
> > > > >      return vfio_container_do_ioctl(as, groupid, req, param);
> > > > >  }
> > > > > +
> > > > > +/*
> > > > > + * Interfaces for IBM EEH (Enhanced Error Handling)
> > > > > + */
> > > > > +static bool vfio_eeh_container_ok(VFIOContainer *container)
> > > > > +{
> > > > > +    /* A broken kernel implementation means EEH operations won't work
> > > > > +     * correctly if there are multiple groups in a container */
> > > > > +
> > > > > +    if (!QLIST_EMPTY(&container->group_list)
> > > > > +        && QLIST_NEXT(QLIST_FIRST(&container->group_list), container_next)) {
> > > > > +        return false;
> > > > > +    }
> > > > > +
> > > > > +    return true;
> > > > > +}
> > > > 
> > > > Seems like there are ways to make this a non-eeh specific function,
> > > > vfio_container_group_count(), vfio_container_group_empty_or_singleton(),
> > > > etc.
> > > 
> > > I guess, but I don't know of anything else that needs to know, so is
> > > there a point?
> > 
> > Yes, long term maintainability.  Simple functions that are named based
> > on what they do are building blocks for other users, even if we don't
> > yet know they exist.  Functions tainted with the name and purpose of
> > their currently intended callers are cruft and code duplication waiting
> > to happen.
> 
> Ok, point taken.
> 
> > > Actually.. I could do with a another opinion here: so, logically EEH
> > > operations should be possible on a container basis - the kernel
> > > interface correctly reflects that (my previous comments that the
> > > interface was broken were mistaken).
> > > 
> > > The current kernel implementation *is* broken (and is non-trivial to
> > > fix) which is what this test is about.  But is checking for a probably
> > > broken kernel state something that we ought to be checking for in
> > > qemu?  As it stands when the kernel is fixed we'll need a new
> > > capability so that qemu can know to disable this test.
> > > 
> > > Should we instead just proceed with any container and just advise
> > > people not to attach multiple groups until the kernel is fixed?
> > > 
> > > A relevant point here might be that while I haven't implemented it so
> > > far, I think it will be possible to workaround the broken kernel with
> > > full functionality by forcing each group into a separate container and
> > > using one of a couple of possible different methods to handle EEH
> > > functionality across multiple containers on a vPHB.
> > 
> > This sounds vaguely similar to the discussions we're having around AER
> > handling.  We really need to be able to translate a guest bus reset to a
> > host bus reset to enable guest participation in AER recovery, but iommu
> > grouping doesn't encompass any sort of shared bus property on x86 like
> > it does on power.  Therefore the configurations where we can enable AER
> > are only a subset of what we can enable otherwise.  However, not
> > everyone cares about AER recovery and perhaps the same is true of EEH.
> > So you really don't want to prevent useful configurations if the user
> > doesn't opt-in for that feature.
> > 
> > So for AER we're thinking about a new vfio-pci option, aer=on, that
> > indicates the device must be in a configuration that supports AER or the
> > VM instantiation (or device hotplug) should fail.  Should EEH do
> > something similar?
> 
> Yes, I think that's a good idea.  I'd been thinking about a PHB option
> for enabling EEH, but I think one on the devices themselves makes
> things work better.
> 
> > Should we share an option to make life easier for
> > libvirt so it doesn't need to care about EEH vs AER?
> 
> My initial thought is yes, but I'm not really sure if there are
> wrinkles that could make that a problem.
> 
> > If the kernel
> > interface is eventually fixed, maybe that just relaxes some of the
> > configuration parameters making EEH support easier to achieve, but still
> > optional?  Thanks,
> 
> So, yes, and that's good, but that's not really what I was asking
> about.
> 
> The kernel *interface* is not broken, just the implementation.  Which
> means when it's fixed it won't be discoverable unless we also add a
> capability advertising the fix.
> 
> So the question is: do we workaround in qemu until such a capability
> comes along, or just assume that it's (potentially) working and
> declare it a kernel problem if it doesn't?

I'm in favor of deterministic behavior, so I think we better consider
the limitations of the current implementation to be part of the ABI and
add a capability when that limitation is lifted.  Whether you choose to
abandon EEH in QEMU until the implementation better suits your needs or
stumble along with restricted configurations is up to you, but QEMU
shouldn't need to assume anything.  Thanks,

Alex

  reply	other threads:[~2015-12-03 21:03 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-19  4:29 [Qemu-devel] [RFC 00/12] Merge EEH support into spapr-pci-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 01/12] vfio: Start improving VFIO/EEH interface David Gibson
2015-11-23 21:58   ` Alex Williamson
2015-12-01  2:23     ` David Gibson
2015-12-02 20:09       ` Alex Williamson
2015-12-03  4:22         ` David Gibson
2015-12-03 21:02           ` Alex Williamson [this message]
2015-11-19  4:29 ` [Qemu-devel] [RFC 02/12] spapr_pci: Switch to vfio_eeh_as_op() interface David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 03/12] spapr_pci: Eliminate class callbacks David Gibson
2015-11-24  9:00   ` [Qemu-devel] [Qemu-ppc] " Alexander Graf
2015-11-25  6:22     ` David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 04/12] spapr_pci: Fold spapr_phb_vfio_eeh_configure() into spapr_pci code David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 05/12] spapr_pci: Fold spapr_phb_vfio_eeh_reset() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 06/12] spapr_pci: Fold spapr_phb_vfio_eeh_get_state() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 07/12] spapr_pci: Fold spapr_phb_vfio_eeh_set_option() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 08/12] spapr_pci: Fold spapr_phb_vfio_reset() " David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 09/12] spapr_pci: Allow EEH on spapr-pci-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 10/12] spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 11/12] spapr_pci: Remove finish_realize hook David Gibson
2015-11-19  4:29 ` [Qemu-devel] [RFC 12/12] vfio: Eliminate vfio_container_ioctl() David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1449176575.15753.227.camel@redhat.com \
    --to=alex.williamson@redhat.com \
    --cc=aik@ozlabs.ru \
    --cc=david@gibson.dropbear.id.au \
    --cc=gwshan@au1.ibm.com \
    --cc=mdroth@linux.vnet.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.