From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: Thomas Huth <thuth@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-ppc@nongnu.org, Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH] spapr_pci: Robustify support of PCI bridges
Date: Fri, 17 Jul 2020 09:50:23 +1000
Message-ID: <20200716235023.GC5607@umbus.fritz.box>
In-Reply-To: <20200716164200.2bea2977@bahia.lan>

On Thu, Jul 16, 2020 at 04:42:00PM +0200, Greg Kurz wrote:
> On Thu, 16 Jul 2020 16:01:18 +0200
> Markus Armbruster <armbru@redhat.com> wrote:
> 
> > David Gibson <david@gibson.dropbear.id.au> writes:
> > 
> > > On Thu, Jul 09, 2020 at 07:12:47PM +0200, Greg Kurz wrote:
> > >> Some recent error handling cleanups unveiled issues with our support of
> > >> PCI bridges:
> > >> 
> > >> 1) QEMU aborts when using non-standard PCI bridge types,
> > >>    unveiled by commit 7ef1553dac "spapr_pci: Drop some dead error handling"
> > >> 
> > >> $ qemu-system-ppc64 -M pseries -device pcie-pci-bridge
> > >> Unexpected error in object_property_find() at qom/object.c:1240:
> > >> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > >> Aborted (core dumped)
> > >
> > > Oops, I thought we had a check that we actually had a "pci-bridge"
> > > device before continuing with the hotplug, but I guess not.
> > >
> > >> This happens because we assume all PCI bridge types to have a "chassis_nr"
> > >> property. This property only exists with the standard PCI bridge type
> > >> "pci-bridge" actually. We could possibly revert 7ef1553dac but it seems
> > >> much simpler to check the presence of "chassis_nr" earlier.
> > >
> > > Hrm, right, 7ef1553dac was not really correct since add_drcs() really
> > > can fail.
> > 
> > Right.  I failed to see that we can run into a bridge without a
> > "chassis_nr" here.

And I missed it on review, as well.
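
To illustrate the "check the presence of chassis_nr earlier" idea, here
is a minimal self-contained sketch. It is a toy model, not QEMU code:
ToyDevice, ToyProp, toy_prop_find() and toy_bridge_chassis_nr() are
hypothetical names invented for this example, and the real fix would go
through the QOM property API instead. The point is only that a missing
property becomes an ordinary, reportable error rather than an abort.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a device and its properties (not QOM types). */
typedef struct {
    const char *name;
    uint32_t value;
} ToyProp;

typedef struct {
    const char *type_name;
    const ToyProp *props;
    size_t nprops;
} ToyDevice;

/* Return the named property, or NULL when the device does not have it. */
static const ToyProp *toy_prop_find(const ToyDevice *dev, const char *name)
{
    for (size_t i = 0; i < dev->nprops; i++) {
        if (strcmp(dev->props[i].name, name) == 0) {
            return &dev->props[i];
        }
    }
    return NULL;
}

/*
 * Fetch "chassis_nr" into *chassis_nr. Returns false and sets *errmsg
 * when the bridge type has no such property, so the caller can reject
 * the device cleanly instead of aborting.
 */
static bool toy_bridge_chassis_nr(const ToyDevice *bridge,
                                  uint32_t *chassis_nr,
                                  const char **errmsg)
{
    const ToyProp *prop = toy_prop_find(bridge, "chassis_nr");

    if (!prop) {
        *errmsg = "PCI bridge lacks a \"chassis_nr\" property";
        return false;
    }
    *chassis_nr = prop->value;
    return true;
}
```

With this shape, a bridge type such as "pcie-pci-bridge" that carries no
"chassis_nr" is refused at plug time with a message, and the caller
never reaches the code that assumes the property exists.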

> > >> 2) QEMU aborts if the same "chassis_nr" value is used several times,
> > >>    unveiled by commit d2623129a7de "qom: Drop parameter @errp of
> > >>    object_property_add() & friends"
> > >> 
> > >> $ qemu-system-ppc64 -M pseries -device pci-bridge,chassis_nr=1 \
> > >>                         -device pci-bridge,chassis_nr=1
> > >> Unexpected error in object_property_try_add() at qom/object.c:1167:
> > >> qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
> > >> Aborted (core dumped)
> > 
> > Before d2623129a7de, the error got *ignored* in
> > spapr_dr_connector_new():
> > 
> >     SpaprDrc *spapr_dr_connector_new(Object *owner, const char *type,
> >                                              uint32_t id)
> >     {
> >         SpaprDrc *drc = SPAPR_DR_CONNECTOR(object_new(type));
> >         char *prop_name;
> > 
> >         drc->id = id;
> >         drc->owner = owner;
> >         prop_name = g_strdup_printf("dr-connector[%"PRIu32"]",
> >                                     spapr_drc_index(drc));
> >         object_property_add_child(owner, prop_name, OBJECT(drc), &error_abort);
> >         object_unref(OBJECT(drc));
> > --->    object_property_set_bool(OBJECT(drc), true, "realized", NULL);
> >         g_free(prop_name);
> > 
> >         return drc;
> >     }
> > 
> > I doubt that's healthy.

Indeed.
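
As a rough picture of what proper error handling in such a constructor
could look like, here is a self-contained toy sketch. ToyDrc,
toy_drc_realize() and toy_drc_new() are hypothetical names, and the
"id 0 fails" rule is an arbitrary stand-in for whatever can make realize
fail; none of this is the actual spapr_dr_connector_new() code. The only
thing it shows is the pattern: the realize result is checked, the
half-built object is released, and the error reaches the caller.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical connector object (not QEMU's SpaprDrc). */
typedef struct {
    uint32_t id;
    bool realized;
} ToyDrc;

/* Toy realize step: fails for id 0 to stand in for any realize error. */
static bool toy_drc_realize(ToyDrc *drc, const char **errmsg)
{
    if (drc->id == 0) {
        *errmsg = "toy realize failure";
        return false;
    }
    drc->realized = true;
    return true;
}

/*
 * Return a realized connector, or NULL with *errmsg set.
 * The caller never sees an object whose realize step failed.
 */
static ToyDrc *toy_drc_new(uint32_t id, const char **errmsg)
{
    ToyDrc *drc = calloc(1, sizeof(*drc));

    if (!drc) {
        *errmsg = "out of memory";
        return NULL;
    }
    drc->id = id;
    if (!toy_drc_realize(drc, errmsg)) {
        free(drc);   /* do not leak or hand back a half-built object */
        return NULL;
    }
    return drc;
}
```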

> This isn't. The object_property_set_bool() was later converted to
> qdev_realize() (thanks again for the cleanups!) but the problem
> remains. Realize can fail and I see no reason we don't do proper
> error handling when it comes to the DRCs.
> 
> I'll look into fixing that.
> 
> > >> This happens because we assume that "chassis_nr" values are unique, but
> > >> nobody enforces that and we end up generating duplicate DRC ids. The PCI
> > >> code doesn't really care for duplicate "chassis_nr" properties since it
> > >> is only used to initialize the "Chassis Number Register" of the bridge,
> > >> with no functional impact on QEMU. So, even if passing the same value
> > >> several times might look weird, it never broke anything before, so
> > >> I guess we don't necessarily want to enforce strict checking in the PCI
> > >> code now.
> > >
> > > Yeah, I guess.  I'm pretty sure that the chassis number of bridges is
> > > supposed to be system-unique (well, unique within the PCI domain at
> > > least, I guess) as part of the hardware spec.  So specifying multiple
> > > chassis ids the same is a user error, but we need a better failure
> > > mode.
> > >
> > >> Work around both issues in the PAPR code: check that the bridge has a
> > >> unique and non-null "chassis_nr" when plugging it into its parent bus.
> > >>
> > >> Fixes: 05929a6c5dfe ("spapr: Don't use bus number for building DRC ids")
> > >
> > > Arguably, it's really fixing 7ef1553dac.
> > 
> > I agree 7ef1553dac broke the "use a bridge that doesn't have property
> > 'chassis_nr'" case.
> > 
> > I suspect the "duplicate chassis_nr" case has always been broken, and
> > d2623129a7de merely uncovered it.
> 
> Yes.

I agree.
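
For readers following along, the sketch below shows why a repeated
"chassis_nr" ends in a duplicate property name. The bit layout in
toy_drc_index() is an assumption chosen only because it reproduces the
'40000100' from the error message above (type 4, PHB 0, chassis 1,
devfn 0); it is not copied from the QEMU source. toy_chassis_nr_usable()
is likewise a hypothetical helper showing the "unique and non-null"
check from the patch description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed type code, matching the leading '4' of '40000100'. */
#define TOY_DRC_TYPE_PCI 4u

/* Assumed layout: type[31:28] | phb[27:16] | chassis_nr[15:8] | devfn[7:0]. */
static uint32_t toy_drc_index(uint32_t phb_index, uint8_t chassis_nr,
                              uint8_t devfn)
{
    return (TOY_DRC_TYPE_PCI << 28) |
           ((phb_index & 0xfffu) << 16) |
           ((uint32_t)chassis_nr << 8) |
           devfn;
}

/*
 * True when chassis_nr is non-zero and not already taken by one of
 * the n bridges listed in in_use.
 */
static bool toy_chassis_nr_usable(const uint8_t *in_use, size_t n,
                                  uint8_t chassis_nr)
{
    if (chassis_nr == 0) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (in_use[i] == chassis_nr) {
            return false;
        }
    }
    return true;
}
```

Under that assumed layout, two bridges on the same PHB with
chassis_nr=1 both produce the same index for slot 0, so the second one
tries to add a child with a name that already exists. Rejecting the
second bridge up front avoids ever generating the colliding id.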

> > If we can trigger the abort with hot-plug, then d2623129a7de made things
> > materially worse (new way to accidentally kill your guest and maybe lose
> > data), and I'd add a Fixes: blaming it.
> > 
> 
> Yes it does.
> 
> David,
> 
> Maybe consider folding a third Fixes: tag into this patch ?

Done.

> > >> Reported-by: Thomas Huth <thuth@redhat.com>
> > >> Signed-off-by: Greg Kurz <groug@kaod.org>
> > >
> > > I had a few misgivings about the details of this, but I think I've
> > > convinced myself they're fine.  There's a couple of things I'd like to
> > > polish, but I'll do that as a follow up.
> > 
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
