* qemu-system-ppc64 abort()s with pcie bridges
@ 2020-07-08  8:03 Thomas Huth
  2020-07-08  9:57 ` Greg Kurz
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Huth @ 2020-07-08  8:03 UTC (permalink / raw)
  To: qemu-ppc
  Cc: Laurent Vivier, David Gibson, QEMU Developers, Greg Kurz,
	Cédric Le Goater


 Hi,

qemu-system-ppc64 currently abort()s when it is started with a pcie
bridge device:

$ qemu-system-ppc64 -M pseries-5.1 -device pcie-pci-bridge
Unexpected error in object_property_find() at qom/object.c:1240:
qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
Aborted (core dumped)

or:

$ qemu-system-ppc64 -M pseries -device dec-21154-p2p-bridge
Unexpected error in object_property_find() at qom/object.c:1240:
qemu-system-ppc64: -device dec-21154-p2p-bridge: Property '.chassis_nr' not found
Aborted (core dumped)

That's kind of ugly, and it shows up as an error when running
scripts/device-crash-test. Is there an easy way to avoid the abort() and
fail more gracefully here?

 Thomas




* Re: qemu-system-ppc64 abort()s with pcie bridges
  2020-07-08  8:03 qemu-system-ppc64 abort()s with pcie bridges Thomas Huth
@ 2020-07-08  9:57 ` Greg Kurz
  2020-07-09  9:28   ` Greg Kurz
  0 siblings, 1 reply; 3+ messages in thread
From: Greg Kurz @ 2020-07-08  9:57 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Laurent Vivier, Markus Armbruster, QEMU Developers, qemu-ppc,
	Cédric Le Goater, David Gibson

On Wed, 8 Jul 2020 10:03:47 +0200
Thomas Huth <thuth@redhat.com> wrote:

> 
>  Hi,
> 
> qemu-system-ppc64 currently abort()s when it is started with a pcie
> bridge device:
> 
> $ qemu-system-ppc64 -M pseries-5.1 -device pcie-pci-bridge
> Unexpected error in object_property_find() at qom/object.c:1240:
> qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> Aborted (core dumped)
> 
> or:
> 
> $ qemu-system-ppc64 -M pseries -device dec-21154-p2p-bridge
> Unexpected error in object_property_find() at qom/object.c:1240:
> qemu-system-ppc64: -device dec-21154-p2p-bridge: Property '.chassis_nr'
> not found
> Aborted (core dumped)
> 
> That's kind of ugly, and it shows up as an error when running
> scripts/device-crash-test. Is there an easy way to avoid the abort() and
> fail more gracefully here?
> 

And even worse, this can tear down a running guest with hotplug :\

(qemu) device_add pcie-pci-bridge 
Unexpected error in object_property_find() at /home/greg/Work/qemu/qemu-ppc/qom/object.c:1240:
Property '.chassis_nr' not found
Aborted (core dumped)

This is caused by recent commit:

commit 7ef1553dac8ef8dbe547b58d7420461a16be0eeb
Author: Markus Armbruster <armbru@redhat.com>
Date:   Tue May 5 17:29:25 2020 +0200

    spapr_pci: Drop some dead error handling
    
    chassis_from_bus() uses object_property_get_uint() to get property
    "chassis_nr" of the bridge device.  Failure would be a programming
    error.  Pass &error_abort, and simplify its callers.
    
    Cc: David Gibson <david@gibson.dropbear.id.au>
    Cc: qemu-ppc@nongnu.org
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Acked-by: David Gibson <david@gibson.dropbear.id.au>
    Reviewed-by: Greg Kurz <groug@kaod.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20200505152926.18877-18-armbru@redhat.com>

Before that, we would simply print the "chassis_nr not found" error
and, in the case of a cold-plugged device, exit.

The root cause is that the sPAPR PCI code assumes that a PCI bridge
has a "chassis_nr" property, i.e. it is a standard PCI bridge. Other
PCI bridge types don't have that. Not sure yet why this information
is required, I'll check LoPAPR.

In the meantime, since we're in soft freeze, I guess we should
revert Markus's patch, add a big fat comment to explain
what's going on, and maybe change the error message to something
more informative, e.g. "PCIE-to-PCI bridges are not supported".

Thoughts ?

>  Thomas
> 




* Re: qemu-system-ppc64 abort()s with pcie bridges
  2020-07-08  9:57 ` Greg Kurz
@ 2020-07-09  9:28   ` Greg Kurz
  0 siblings, 0 replies; 3+ messages in thread
From: Greg Kurz @ 2020-07-09  9:28 UTC (permalink / raw)
  To: Thomas Huth
  Cc: Laurent Vivier, Michael S. Tsirkin, Markus Armbruster,
	QEMU Developers, qemu-ppc, Cédric Le Goater, David Gibson

On Wed, 8 Jul 2020 11:57:03 +0200
Greg Kurz <groug@kaod.org> wrote:

> On Wed, 8 Jul 2020 10:03:47 +0200
> Thomas Huth <thuth@redhat.com> wrote:
> 
> > 
> >  Hi,
> > 
> > qemu-system-ppc64 currently abort()s when it is started with a pcie
> > bridge device:
> > 
> > $ qemu-system-ppc64 -M pseries-5.1 -device pcie-pci-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device pcie-pci-bridge: Property '.chassis_nr' not found
> > Aborted (core dumped)
> > 
> > or:
> > 
> > $ qemu-system-ppc64 -M pseries -device dec-21154-p2p-bridge
> > Unexpected error in object_property_find() at qom/object.c:1240:
> > qemu-system-ppc64: -device dec-21154-p2p-bridge: Property '.chassis_nr'
> > not found
> > Aborted (core dumped)
> > 
> > That's kind of ugly, and it shows up as an error when running
> > scripts/device-crash-test. Is there an easy way to avoid the abort() and
> > fail more gracefully here?
> > 
> 
> And even worse, this can tear down a running guest with hotplug :\
> 
> (qemu) device_add pcie-pci-bridge 
> Unexpected error in object_property_find() at /home/greg/Work/qemu/qemu-ppc/qom/object.c:1240:
> Property '.chassis_nr' not found
> Aborted (core dumped)
> 
> This is caused by recent commit:
> 
> commit 7ef1553dac8ef8dbe547b58d7420461a16be0eeb
> Author: Markus Armbruster <armbru@redhat.com>
> Date:   Tue May 5 17:29:25 2020 +0200
> 
>     spapr_pci: Drop some dead error handling
>     
>     chassis_from_bus() uses object_property_get_uint() to get property
>     "chassis_nr" of the bridge device.  Failure would be a programming
>     error.  Pass &error_abort, and simplify its callers.
>     
>     Cc: David Gibson <david@gibson.dropbear.id.au>
>     Cc: qemu-ppc@nongnu.org
>     Signed-off-by: Markus Armbruster <armbru@redhat.com>
>     Acked-by: David Gibson <david@gibson.dropbear.id.au>
>     Reviewed-by: Greg Kurz <groug@kaod.org>
>     Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
>     Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
>     Message-Id: <20200505152926.18877-18-armbru@redhat.com>
> 
> Before that, we would simply print the "chassis_nr not found" error
> and, in the case of a cold-plugged device, exit.
> 
> The root cause is that the sPAPR PCI code assumes that a PCI bridge
> has a "chassis_nr" property, i.e. it is a standard PCI bridge. Other
> PCI bridge types don't have that. Not sure yet why this information
> is required, I'll check LoPAPR.
> 

More on this: each slot of a PCI bridge is associated with a DRC (a
PAPR thingy to handle hot plug/unplug). Each DRC must have a unique
identifier system-wide. We used to use the bus number to compute
the DRC id but that was broken, so we now _hijack_ "chassis_nr" as an
alternative since this commit:

commit 05929a6c5dfe1028ef66250b7bbf11939f8e77cd
Author: David Gibson <david@gibson.dropbear.id.au>
Date:   Wed Apr 10 11:49:28 2019 +1000

    spapr: Don't use bus number for building DRC ids

This means that we only support the standard pci-bridge device,
and this relies on the availability of "chassis_nr". Failure
to find this property is then not a programming error, but
an expected case where we want to fail gracefully (i.e. revert
Markus's commit mentioned above).
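
To illustrate what "fail gracefully" could look like, here is a minimal,
self-contained sketch. It does not use QEMU's QOM or Error API; struct prop,
struct bridge and chassis_from_bridge() are hypothetical stand-ins. The point
is only that a missing "chassis_nr" is reported to the caller as an ordinary
error instead of triggering abort():

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a device object: a type name plus properties. */
struct prop {
    const char *name;
    uint64_t value;
};

struct bridge {
    const char *type;
    const struct prop *props;
    size_t nprops;
};

/*
 * Look up "chassis_nr" on a bridge.
 * Returns 0 and stores the value in *chassis on success.
 * Returns -ENOENT and writes a message to errbuf if the property is
 * missing, so the caller can reject the device without aborting.
 */
static int chassis_from_bridge(const struct bridge *br, uint64_t *chassis,
                               char *errbuf, size_t errlen)
{
    for (size_t i = 0; i < br->nprops; i++) {
        if (strcmp(br->props[i].name, "chassis_nr") == 0) {
            *chassis = br->props[i].value;
            return 0;
        }
    }
    snprintf(errbuf, errlen,
             "%s: bridge type not supported (no 'chassis_nr' property)",
             br->type);
    return -ENOENT;
}
```

With this shape, both the cold-plug and the hotplug paths can print the
message and refuse the device, and a running guest is left untouched.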

While reading the code I realized that we have another problem: the
realization of the pci-bridge device does fail if "chassis_nr" is
zero, but I failed to find a uniqueness check. And we get:

$ qemu-system-ppc64 -device pci-bridge,chassis_nr=1 -device pci-bridge,chassis_nr=1
Unexpected error in object_property_try_add() at qom/object.c:1167:
qemu-system-ppc64: -device pci-bridge,chassis_nr=1: attempt to add duplicate property '40000100' to object (type 'container')
Aborted (core dumped)
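
The duplicate name '40000100' is what one would expect if the DRC index packs
a type tag, the PHB index, the chassis number and the devfn into one 32-bit
value. The sketch below is a reconstruction under that assumption, not code
copied from QEMU: the constants and the helpers drc_id() and drc_index_pci()
are hypothetical and were chosen to reproduce the value in the error above.

```c
#include <stdint.h>

/* Assumed layout: type tag in the top 4 bits, id in the low 28 bits. */
#define DRC_INDEX_TYPE_SHIFT 28
#define DRC_INDEX_ID_MASK    0x0fffffffu
#define DRC_TYPE_TAG_PCI     4u   /* assumption, matches the leading '4' */

/* Hypothetical helper: pack PHB index, chassis number and devfn. */
static uint32_t drc_id(uint32_t phb_index, uint8_t chassis, uint8_t devfn)
{
    return (phb_index << 16) | ((uint32_t)chassis << 8) | devfn;
}

/* Hypothetical helper: build the full DRC index for a PCI slot. */
static uint32_t drc_index_pci(uint32_t id)
{
    return (DRC_TYPE_TAG_PCI << DRC_INDEX_TYPE_SHIFT) |
           (id & DRC_INDEX_ID_MASK);
}
```

Under these assumptions, PHB 0, chassis 1, devfn 0 gives 0x40000100 for both
bridges, so the second one tries to register a name that already exists.
Nothing in this layout distinguishes two bridges that share a chassis number.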

It is very confusing to see that we state that "chassis_nr" is unique
several times in slotid_cap_init() but it is never enforced anywhere.

    if (!chassis) {
        error_setg(errp, "Bridge chassis not specified. Each bridge is required"
                   " to be assigned a unique chassis id > 0.");
        return -EINVAL;
    }

or

    /* We make each chassis unique, this way each bridge is First in Chassis */
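
For reference, a uniqueness check of the kind that seems to be missing could
be as small as the sketch below. It is purely illustrative: struct
chassis_registry and chassis_claim() are hypothetical names, nothing like
them is known to exist in QEMU, and the sketch assumes an 8-bit chassis
number.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registry: one flag per possible 8-bit chassis number. */
struct chassis_registry {
    bool used[256];
};

/*
 * Claim a chassis number for a bridge.
 * Returns 0 on success, -EINVAL if chassis is 0 (not specified),
 * and -EEXIST if another bridge already claimed the same number.
 */
static int chassis_claim(struct chassis_registry *reg, uint8_t chassis)
{
    if (chassis == 0) {
        return -EINVAL;
    }
    if (reg->used[chassis]) {
        return -EEXIST;
    }
    reg->used[chassis] = true;
    return 0;
}
```

A real implementation would also have to release the number on unplug and
decide on the scope of uniqueness (per machine, per PHB, ...), which is
exactly the semantics question below.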


Michael, Marcel or anyone with PCI knowledge,

Can you shed some light on the semantics of "chassis_nr"?

> In the meantime, since we're in soft freeze, I guess we should
> revert Markus's patch, add a big fat comment to explain
> what's going on, and maybe change the error message to something
> more informative, e.g. "PCIE-to-PCI bridges are not supported".
> 
> Thoughts ?
> 
> >  Thomas
> > 
> 


