All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes
@ 2019-01-22 12:51 David Hildenbrand
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: David Hildenbrand @ 2019-01-22 12:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

Hotplugging of PCI bridges is right now pretty much broken. Coldplugging
and hotplugging will assign wrong primary bus numbers in some scenarios.

I base my knowledge on how this is supposed to work on
http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html

I did a couple of tests, building whole hierarchies of bridges, both
hot and coldplugged. "info pci" as well as the Linux guests showed
what I was expecting.

David Hildenbrand (2):
  s390x/pci: Fix primary bus number for PCI bridges
  s390x/pci: Fix hotplugging of PCI bridges

 hw/s390x/s390-pci-bus.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

-- 
2.17.2

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges
  2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
@ 2019-01-22 12:51 ` David Hildenbrand
  2019-01-23 10:26   ` Thomas Huth
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of " David Hildenbrand
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2019-01-22 12:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

The primary bus number corresponds always to the bus number of the
bus the bridge is attached to.

Right now, if we have two bridges attached to the same bus (e.g. root
bus) this is however not the case. Fix assignment.

While at it
- Add a comment why we have to reassign durign every reset (which I
  found to be supprising)
- Drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. As we are
  setting it via a DFS and not via a BFS (as discussed e.g. in [1]), this
  is not necessary. The last number when we return is the highest
  number.

Please note that hotplugging of bridges is in general still broken, will
be fixed next.

[1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index f017c1ded0..b7c4613fde 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -862,7 +862,8 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         qbus_set_hotplug_handler(bus, DEVICE(s), errp);
 
         if (dev->hotplugged) {
-            pci_default_write_config(pdev, PCI_PRIMARY_BUS, s->bus_no, 1);
+            pci_default_write_config(pdev, PCI_PRIMARY_BUS,
+                                     pci_dev_bus_num(pdev), 1);
             s->bus_no += 1;
             pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
             do {
@@ -1016,8 +1017,6 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
                                       void *opaque)
 {
     S390pciState *s = opaque;
-    unsigned int primary = s->bus_no;
-    unsigned int subordinate = 0xff;
     PCIBus *sec_bus = NULL;
 
     if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) !=
@@ -1026,7 +1025,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
     }
 
     (s->bus_no)++;
-    pci_default_write_config(pdev, PCI_PRIMARY_BUS, primary, 1);
+    pci_default_write_config(pdev, PCI_PRIMARY_BUS, pci_dev_bus_num(pdev), 1);
     pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
 
@@ -1035,7 +1034,7 @@ static void s390_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
         return;
     }
 
-    pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, subordinate, 1);
+    /* Assign numbers to all child bridges. The last is the highest number. */
     pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
                         s390_pci_enumerate_bridge, s);
     pci_default_write_config(pdev, PCI_SUBORDINATE_BUS, s->bus_no, 1);
@@ -1046,6 +1045,10 @@ static void s390_pcihost_reset(DeviceState *dev)
     S390pciState *s = S390_PCI_HOST_BRIDGE(dev);
     PCIBus *bus = s->parent_obj.bus;
 
+    /*
+     * When resetting a PCI bridge, the assigned numbers are set to 0. So
+     * on every system reset, we also have to reassign numbers.
+     */
     s->bus_no = 0;
     pci_for_each_device(bus, pci_bus_num(bus), s390_pci_enumerate_bridge, s);
 }
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
  2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
@ 2019-01-22 12:51 ` David Hildenbrand
  2019-01-23 11:16   ` Thomas Huth
  2019-01-22 13:01 ` [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes Cornelia Huck
  2019-01-28 11:42 ` Cornelia Huck
  3 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2019-01-22 12:51 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-s390x, Collin Walling, Thomas Huth, Christian Borntraeger,
	Cornelia Huck, Richard Henderson, David Hildenbrand

When hotplugging a PCI bridge right now to the root port, we resolve
pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
really only works right now when hotplugging to another bridge.

Instead, we have to properly check if we are already at the root.

Let's cleanup the code while at it a bit and factor out updating the
subordiante bus number into a separate function. The check for
"old_nr < nr" is right now not strictly necessary, but makes it more
obvious what is actually going on.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
index b7c4613fde..9b5c5fff60 100644
--- a/hw/s390x/s390-pci-bus.c
+++ b/hw/s390x/s390-pci-bus.c
@@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 }
 
+static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
+{
+    uint32_t old_nr;
+
+    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
+    while (!pci_bus_is_root(pci_get_bus(dev))) {
+        dev = pci_get_bus(dev)->parent_dev;
+
+        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
+        if (old_nr < nr) {
+            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
+        }
+    }
+}
+
 static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
                               Error **errp)
 {
@@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
     S390PCIBusDevice *pbdev = NULL;
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
-        BusState *bus;
         PCIBridge *pb = PCI_BRIDGE(dev);
-        PCIDevice *pdev = PCI_DEVICE(dev);
 
+        pdev = PCI_DEVICE(dev);
         pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
         pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
 
-        bus = BUS(&pb->sec_bus);
-        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
+        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
 
         if (dev->hotplugged) {
             pci_default_write_config(pdev, PCI_PRIMARY_BUS,
                                      pci_dev_bus_num(pdev), 1);
             s->bus_no += 1;
             pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
-            do {
-                pdev = pci_get_bus(pdev)->parent_dev;
-                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
-                                         s->bus_no, 1);
-            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
+
+            s390_pci_update_subordinate(pdev, s->bus_no);
         }
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         pdev = PCI_DEVICE(dev);
-- 
2.17.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes
  2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of " David Hildenbrand
@ 2019-01-22 13:01 ` Cornelia Huck
  2019-01-23  8:02   ` David Hildenbrand
  2019-01-28 11:42 ` Cornelia Huck
  3 siblings, 1 reply; 15+ messages in thread
From: Cornelia Huck @ 2019-01-22 13:01 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson, Pierre Morel

On Tue, 22 Jan 2019 13:51:31 +0100
David Hildenbrand <david@redhat.com> wrote:

> Hotplugging of PCI bridges is right now pretty much broken. Coldplugging
> and hotplugging will assign wrong primary bus numbers in some scenarios.
> 
> I base my knowledge on how this is supposed to work on
> http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
> 
> I did a couple of tests, building whole hierarchies of bridges, both
> hot and coldplugged. "info pci" as well as the Linux guests showed
> what I was expecting.
> 
> David Hildenbrand (2):
>   s390x/pci: Fix primary bus number for PCI bridges
>   s390x/pci: Fix hotplugging of PCI bridges
> 
>  hw/s390x/s390-pci-bus.c | 41 +++++++++++++++++++++++++++--------------
>  1 file changed, 27 insertions(+), 14 deletions(-)
> 

I'll leave the actual review of this to folks familiar with how zPCI is
supposed to work :)

Does the guest actually see anything of this? Consistency is good, and
not crashing even better, but I think all of the topology is invisible
on the guest side anyway...

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes
  2019-01-22 13:01 ` [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes Cornelia Huck
@ 2019-01-23  8:02   ` David Hildenbrand
  2019-01-23  9:47     ` Cornelia Huck
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2019-01-23  8:02 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson, Pierre Morel

On 22.01.19 14:01, Cornelia Huck wrote:
> On Tue, 22 Jan 2019 13:51:31 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> Hotplugging of PCI bridges is right now pretty much broken. Coldplugging
>> and hotplugging will assign wrong primary bus numbers in some scenarios.
>>
>> I base my knowledge on how this is supposed to work on
>> http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
>>
>> I did a couple of tests, building whole hierarchies of bridges, both
>> hot and coldplugged. "info pci" as well as the Linux guests showed
>> what I was expecting.
>>
>> David Hildenbrand (2):
>>   s390x/pci: Fix primary bus number for PCI bridges
>>   s390x/pci: Fix hotplugging of PCI bridges
>>
>>  hw/s390x/s390-pci-bus.c | 41 +++++++++++++++++++++++++++--------------
>>  1 file changed, 27 insertions(+), 14 deletions(-)
>>
> 
> I'll leave the actual review of this to folks familiar with how zPCI is
> supposed to work :)
> 
> Does the guest actually see anything of this? Consistency is good, and
> not crashing even better, but I think all of the topology is invisible
> on the guest side anyway...
> 

I am no PCI expert, but I think the guest is able to read/write these
numbers via the configuration space. As far as I can see, on x86 the
BIOS builds the topology. On Power/s390x this job is delegated to
firmware / QEMU.

There are quite some numbers in QEMU relying on these numbers to be
correct: E.g. pci_secondary_bus_in_range()

If the guest relies on the topology to be correct, things can go wrong.
I have no idea how Linux guests actually use the topology.

Also, the output of "info pci" will be wrong.

Anyhow, this is the right thing to do, but I agree that Patch #1 might
not be as critical as patch #2.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes
  2019-01-23  8:02   ` David Hildenbrand
@ 2019-01-23  9:47     ` Cornelia Huck
  0 siblings, 0 replies; 15+ messages in thread
From: Cornelia Huck @ 2019-01-23  9:47 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson, Pierre Morel

On Wed, 23 Jan 2019 09:02:34 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 22.01.19 14:01, Cornelia Huck wrote:
> > On Tue, 22 Jan 2019 13:51:31 +0100
> > David Hildenbrand <david@redhat.com> wrote:
> >   
> >> Hotplugging of PCI bridges is right now pretty much broken. Coldplugging
> >> and hotplugging will assign wrong primary bus numbers in some scenarios.
> >>
> >> I base my knowledge on how this is supposed to work on
> >> http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
> >>
> >> I did a couple of tests, building whole hierarchies of bridges, both
> >> hot and coldplugged. "info pci" as well as the Linux guests showed
> >> what I was expecting.
> >>
> >> David Hildenbrand (2):
> >>   s390x/pci: Fix primary bus number for PCI bridges
> >>   s390x/pci: Fix hotplugging of PCI bridges
> >>
> >>  hw/s390x/s390-pci-bus.c | 41 +++++++++++++++++++++++++++--------------
> >>  1 file changed, 27 insertions(+), 14 deletions(-)
> >>  
> > 
> > I'll leave the actual review of this to folks familiar with how zPCI is
> > supposed to work :)
> > 
> > Does the guest actually see anything of this? Consistency is good, and
> > not crashing even better, but I think all of the topology is invisible
> > on the guest side anyway...
> >   
> 
> I am no PCI expert, but I think the guest is able to read/write these
> numbers via the configuration space. As far as I can see, on x86 the
> BIOS builds the topology. On Power/s390x this job is delegated to
> firmware / QEMU.
> 
> There are quite some numbers in QEMU relying on these numbers to be
> correct: E.g. pci_secondary_bus_in_range()
> 
> If the guest relies on the topology to be correct, things can go wrong.
> I have no idea how Linux guests actually use the topology.

Ok, if this shows up in the config space, it is guest visible. I was
thinking mostly about the zPCI enumeration, which results in a flat
layout.

> 
> Also, the output of "info pci" will be wrong.
> 
> Anyhow, this is the right thing to do, but I agree that Patch #1 might
> not be as critical as patch #2.

I'm certainly not arguing against inclusion :)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
@ 2019-01-23 10:26   ` Thomas Huth
  2019-01-23 10:32     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Huth @ 2019-01-23 10:26 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson

On 2019-01-22 13:51, David Hildenbrand wrote:
> The primary bus number corresponds always to the bus number of the
> bus the bridge is attached to.
> 
> Right now, if we have two bridges attached to the same bus (e.g. root
> bus) this is however not the case. Fix assignment.
> 
> While at it
> - Add a comment why we have to reassign durign every reset (which I

s/durign/during/

>   found to be supprising)

s/supprising/surprising/

> - Drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. As we are
>   setting it via a DFS and not via a BFS (as discussed e.g. in [1]), this
>   is not necessary. The last number when we return is the highest
>   number.

I think that explanation is slightly wrong / misleading. It's not about
DFS vs. BFS, it's about guest code vs. QEMU code. If you do a bridge
setup from the guest side, you've got to set the subordinate bus number
to 0xff to make sure that the bridge forwards all config space accesses
to the attached devices while you're scanning the devices that are
attached to the bridge.
But this code is not running in the guest, it is running in QEMU, and so
it can access the config space of the attached devices directly via
pci_default_write_config() - the write do not need to pass the parent
bridge in this case, and thus the subordinate bus number in the bridge
is ignored for the config space write access.

> Please note that hotplugging of bridges is in general still broken, will
> be fixed next.
> 
> [1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-pci-bus.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)

With the commit message fixed:

Reviewed-by: Thomas Huth <thuth@redhat.com>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges
  2019-01-23 10:26   ` Thomas Huth
@ 2019-01-23 10:32     ` David Hildenbrand
  2019-01-23 10:37       ` Cornelia Huck
  2019-01-23 10:39       ` Thomas Huth
  0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2019-01-23 10:32 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson

On 23.01.19 11:26, Thomas Huth wrote:
> On 2019-01-22 13:51, David Hildenbrand wrote:
>> The primary bus number corresponds always to the bus number of the
>> bus the bridge is attached to.
>>
>> Right now, if we have two bridges attached to the same bus (e.g. root
>> bus) this is however not the case. Fix assignment.
>>
>> While at it
>> - Add a comment why we have to reassign durign every reset (which I
> 
> s/durign/during/
> 
>>   found to be supprising)
> 
> s/supprising/surprising/
> 
>> - Drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. As we are
>>   setting it via a DFS and not via a BFS (as discussed e.g. in [1]), this
>>   is not necessary. The last number when we return is the highest
>>   number.
> 
> I think that explanation is slightly wrong / misleading. It's not about
> DFS vs. BFS, it's about guest code vs. QEMU code. If you do a bridge
> setup from the guest side, you've got to set the subordinate bus number
> to 0xff to make sure that the bridge forwards all config space accesses
> to the attached devices while you're scanning the devices that are
> attached to the bridge.
> But this code is not running in the guest, it is running in QEMU, and so
> it can access the config space of the attached devices directly via
> pci_default_write_config() - the write do not need to pass the parent
> bridge in this case, and thus the subordinate bus number in the bridge
> is ignored for the config space write access.

Indeed, I phrased that better in the spapr/pci patch I sent, What about
this:

"
The primary bus number corresponds always to the bus number of the
bus the bridge is attached to.

Right now, if we have two bridges attached to the same bus (e.g. root
bus) this is however not the case. The first bridge will have primary
bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment.

While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff.
Setting it temporarily to that value (as discussed e.g. in [1]), is
only relevant for a running system that probes the buses. The value is
effectively unused for us just doing a DFS.

Also add a comment why we have to reassign during every reset (which I
found to be surprising.

Please note that hotplugging of bridges is in general still broken, will
be fixed next.
"

> 
>> Please note that hotplugging of bridges is in general still broken, will
>> be fixed next.
>>
>> [1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/s390x/s390-pci-bus.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> With the commit message fixed:

@Conny can you fix up when applying if there are no other comments?

> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 

Thanks Thomas!


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges
  2019-01-23 10:32     ` David Hildenbrand
@ 2019-01-23 10:37       ` Cornelia Huck
  2019-01-23 10:39       ` Thomas Huth
  1 sibling, 0 replies; 15+ messages in thread
From: Cornelia Huck @ 2019-01-23 10:37 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Thomas Huth, qemu-devel, qemu-s390x, Collin Walling,
	Christian Borntraeger, Richard Henderson

On Wed, 23 Jan 2019 11:32:48 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 23.01.19 11:26, Thomas Huth wrote:
> > On 2019-01-22 13:51, David Hildenbrand wrote:  
> >> The primary bus number corresponds always to the bus number of the
> >> bus the bridge is attached to.
> >>
> >> Right now, if we have two bridges attached to the same bus (e.g. root
> >> bus) this is however not the case. Fix assignment.
> >>
> >> While at it
> >> - Add a comment why we have to reassign durign every reset (which I  
> > 
> > s/durign/during/
> >   
> >>   found to be supprising)  
> > 
> > s/supprising/surprising/
> >   
> >> - Drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. As we are
> >>   setting it via a DFS and not via a BFS (as discussed e.g. in [1]), this
> >>   is not necessary. The last number when we return is the highest
> >>   number.  
> > 
> > I think that explanation is slightly wrong / misleading. It's not about
> > DFS vs. BFS, it's about guest code vs. QEMU code. If you do a bridge
> > setup from the guest side, you've got to set the subordinate bus number
> > to 0xff to make sure that the bridge forwards all config space accesses
> > to the attached devices while you're scanning the devices that are
> > attached to the bridge.
> > But this code is not running in the guest, it is running in QEMU, and so
> > it can access the config space of the attached devices directly via
> > pci_default_write_config() - the write do not need to pass the parent
> > bridge in this case, and thus the subordinate bus number in the bridge
> > is ignored for the config space write access.  
> 
> Indeed, I phrased that better in the spapr/pci patch I sent, What about
> this:
> 
> "
> The primary bus number corresponds always to the bus number of the
> bus the bridge is attached to.
> 
> Right now, if we have two bridges attached to the same bus (e.g. root
> bus) this is however not the case. The first bridge will have primary
> bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment.
> 
> While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff.
> Setting it temporarily to that value (as discussed e.g. in [1]), is
> only relevant for a running system that probes the buses. The value is
> effectively unused for us just doing a DFS.
> 
> Also add a comment why we have to reassign during every reset (which I
> found to be surprising.
> 
> Please note that hotplugging of bridges is in general still broken, will
> be fixed next.
> "
> 
> >   
> >> Please note that hotplugging of bridges is in general still broken, will
> >> be fixed next.
> >>
> >> [1] http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
> >>
> >> Signed-off-by: David Hildenbrand <david@redhat.com>
> >> ---
> >>  hw/s390x/s390-pci-bus.c | 13 ++++++++-----
> >>  1 file changed, 8 insertions(+), 5 deletions(-)  
> > 
> > With the commit message fixed:  
> 
> @Conny can you fix up when applying if there are no other comments?

Sure, can do. Waiting for zpci maintainer ack :)

> 
> > 
> > Reviewed-by: Thomas Huth <thuth@redhat.com>
> >   
> 
> Thanks Thomas!
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges
  2019-01-23 10:32     ` David Hildenbrand
  2019-01-23 10:37       ` Cornelia Huck
@ 2019-01-23 10:39       ` Thomas Huth
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Huth @ 2019-01-23 10:39 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson

On 2019-01-23 11:32, David Hildenbrand wrote:
> On 23.01.19 11:26, Thomas Huth wrote:
>> On 2019-01-22 13:51, David Hildenbrand wrote:
>>> The primary bus number corresponds always to the bus number of the
>>> bus the bridge is attached to.
>>>
>>> Right now, if we have two bridges attached to the same bus (e.g. root
>>> bus) this is however not the case. Fix assignment.
>>>
>>> While at it
>>> - Add a comment why we have to reassign durign every reset (which I
>>
>> s/durign/during/
>>
>>>   found to be supprising)
>>
>> s/supprising/surprising/
>>
>>> - Drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff. As we are
>>>   setting it via a DFS and not via a BFS (as discussed e.g. in [1]), this
>>>   is not necessary. The last number when we return is the highest
>>>   number.
>>
>> I think that explanation is slightly wrong / misleading. It's not about
>> DFS vs. BFS, it's about guest code vs. QEMU code. If you do a bridge
>> setup from the guest side, you've got to set the subordinate bus number
>> to 0xff to make sure that the bridge forwards all config space accesses
>> to the attached devices while you're scanning the devices that are
>> attached to the bridge.
>> But this code is not running in the guest, it is running in QEMU, and so
>> it can access the config space of the attached devices directly via
>> pci_default_write_config() - the write do not need to pass the parent
>> bridge in this case, and thus the subordinate bus number in the bridge
>> is ignored for the config space write access.
> 
> Indeed, I phrased that better in the spapr/pci patch I sent, What about
> this:
> 
> "
> The primary bus number corresponds always to the bus number of the
> bus the bridge is attached to.
> 
> Right now, if we have two bridges attached to the same bus (e.g. root
> bus) this is however not the case. The first bridge will have primary
> bus 0, the second bridge primary bus 1, which is wrong. Fix the assignment.
> 
> While at it, drop setting the PCI_SUBORDINATE_BUS temporarily to 0xff.
> Setting it temporarily to that value (as discussed e.g. in [1]), is
> only relevant for a running system that probes the buses. The value is
> effectively unused for us just doing a DFS.

I'd rather replace the last sentence with:

The value is not necessary in QEMU code since we can access the config
space of the devices directly here, without the need to pass the config
space write requests through the superordinate bridges.

Apart from that, text looks fine to me now.

 Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
  2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of " David Hildenbrand
@ 2019-01-23 11:16   ` Thomas Huth
  2019-01-23 11:42     ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Huth @ 2019-01-23 11:16 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson, Pierre Morel, Michael S . Tsirkin,
	Laurent Vivier, Marcel Apfelbaum

On 2019-01-22 13:51, David Hildenbrand wrote:
> When hotplugging a PCI bridge right now to the root port, we resolve
> pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
> really only works right now when hotplugging to another bridge.
> 
> Instead, we have to properly check if we are already at the root.
> 
> Let's cleanup the code while at it a bit and factor out updating the
> subordiante bus number into a separate function. The check for
> "old_nr < nr" is right now not strictly necessary, but makes it more
> obvious what is actually going on.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
>  1 file changed, 19 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
> index b7c4613fde..9b5c5fff60 100644
> --- a/hw/s390x/s390-pci-bus.c
> +++ b/hw/s390x/s390-pci-bus.c
> @@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      }
>  }
>  
> +static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
> +{
> +    uint32_t old_nr;
> +
> +    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
> +    while (!pci_bus_is_root(pci_get_bus(dev))) {
> +        dev = pci_get_bus(dev)->parent_dev;
> +
> +        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
> +        if (old_nr < nr) {
> +            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
> +        }
> +    }
> +}
> +
>  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>                                Error **errp)
>  {
> @@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>      S390PCIBusDevice *pbdev = NULL;
>  
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
> -        BusState *bus;
>          PCIBridge *pb = PCI_BRIDGE(dev);
> -        PCIDevice *pdev = PCI_DEVICE(dev);
>  
> +        pdev = PCI_DEVICE(dev);

Why not keep the direct assignment?

>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
>          pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
>  
> -        bus = BUS(&pb->sec_bus);
> -        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
> +        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
>  
>          if (dev->hotplugged) {
>              pci_default_write_config(pdev, PCI_PRIMARY_BUS,
>                                       pci_dev_bus_num(pdev), 1);
>              s->bus_no += 1;
>              pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
> -            do {
> -                pdev = pci_get_bus(pdev)->parent_dev;
> -                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
> -                                         s->bus_no, 1);
> -            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
> +
> +            s390_pci_update_subordinate(pdev, s->bus_no);


While your changes look OK at a first glance, and certainly fix the
crash, I wonder whether this is the right thing to do here at all. Why
does the hypervisor set up the all the bridge registers here? I'd rather
expect that this is the job of the guest after it detects a hot-plugged
device?

Also I'm not sure whether the numbering is right in every case. Let's
use the example from the URL that you've mentioned in the cover letter
(http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html step 4):

Assume that you hot-plug another bridge "5" to "Bridge 2". If I get the
code right, the setup will now look like this:

               CPU
                |
  --------------+-------------- Bus 0
                |
                |  Pri=0
               BR1 Sec=1
                |  Sub=5
                |
  ------+-------+----------+--- Bus 1
        |                  |
        |  Pri=1           |  Pri=1
       BR3 Sec=3          BR2 Sec=2
        |  Sub=4           |  Sub=5
        |                  |
  --+---+--- Bus 3       --+-+- Bus 2
    |                        |
    |  Pri=3                 |  Pri=2
   BR4 Sec=4                BR5 Sec=5
    |  Sub=4                 |  Sub=5
    |                        |
  --+---- Bus 4            --+-- Bus 5

Requests to bus 3 and 4 are now also routed through to bus 2 and 5. That
sounds wrong to me. Or is this how hot-plugging of PCI bridges is
supposed to work??

 Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
  2019-01-23 11:16   ` Thomas Huth
@ 2019-01-23 11:42     ` David Hildenbrand
  2019-01-23 13:42       ` Thomas Huth
  0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2019-01-23 11:42 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson, Pierre Morel, Michael S . Tsirkin,
	Laurent Vivier, Marcel Apfelbaum

On 23.01.19 12:16, Thomas Huth wrote:
> On 2019-01-22 13:51, David Hildenbrand wrote:
>> When hotplugging a PCI bridge right now to the root port, we resolve
>> pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
>> really only works right now when hotplugging to another bridge.
>>
>> Instead, we have to properly check if we are already at the root.
>>
>> Let's cleanup the code while at it a bit and factor out updating the
>> subordiante bus number into a separate function. The check for
>> "old_nr < nr" is right now not strictly necessary, but makes it more
>> obvious what is actually going on.
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>
>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>> index b7c4613fde..9b5c5fff60 100644
>> --- a/hw/s390x/s390-pci-bus.c
>> +++ b/hw/s390x/s390-pci-bus.c
>> @@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>      }
>>  }
>>  
>> +static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>> +{
>> +    uint32_t old_nr;
>> +
>> +    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>> +    while (!pci_bus_is_root(pci_get_bus(dev))) {
>> +        dev = pci_get_bus(dev)->parent_dev;
>> +
>> +        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
>> +        if (old_nr < nr) {
>> +            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>> +        }
>> +    }
>> +}
>> +
>>  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>                                Error **errp)
>>  {
>> @@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>      S390PCIBusDevice *pbdev = NULL;
>>  
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>> -        BusState *bus;
>>          PCIBridge *pb = PCI_BRIDGE(dev);
>> -        PCIDevice *pdev = PCI_DEVICE(dev);
>>  
>> +        pdev = PCI_DEVICE(dev);
> 
> Why not keep the direct assignment?

We have the same local variable in that function already. I'm just
reusing it.

> 
>>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
>>          pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
>>  
>> -        bus = BUS(&pb->sec_bus);
>> -        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
>> +        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
>>  
>>          if (dev->hotplugged) {
>>              pci_default_write_config(pdev, PCI_PRIMARY_BUS,
>>                                       pci_dev_bus_num(pdev), 1);
>>              s->bus_no += 1;
>>              pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
>> -            do {
>> -                pdev = pci_get_bus(pdev)->parent_dev;
>> -                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
>> -                                         s->bus_no, 1);
>> -            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
>> +
>> +            s390_pci_update_subordinate(pdev, s->bus_no);
> 
> 
> While your changes look OK at a first glance, and certainly fix the
> crash, I wonder whether this is the right thing to do here at all. Why
> does the hypervisor set up the all the bridge registers here? I'd rather
> expect that this is the job of the guest after it detects a hot-plugged
> device?

I honestly have no idea who is responsible for that. I can see that
spapr does not seem to handle hotplugging the way s390x does. I was
hoping that Colin maybe know why we have to do that on s390x this way.
At least this patch does not change the behavior but only fixes it.

> 
> Also I'm not sure whether the numbering is right in every case. Let's
> use the example from the URL that you've mentioned in the cover letter
> (http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html step 4):
> 
> Assume that you hot-plug another bridge "5" to "Bridge 2". If I get the
> code right, the setup will now look like this:

Yes, this is what I would expect! This implies that any bus in the
hierarchy lies between Sec and Sub. But that there might be multiple
candidates, which feels wrong as you correctly say.

> 
>                CPU
>                 |
>   --------------+-------------- Bus 0
>                 |
>                 |  Pri=0
>                BR1 Sec=1
>                 |  Sub=5
>                 |
>   ------+-------+----------+--- Bus 1
>         |                  |
>         |  Pri=1           |  Pri=1
>        BR3 Sec=3          BR2 Sec=2
>         |  Sub=4           |  Sub=5
>         |                  |
>   --+---+--- Bus 3       --+-+- Bus 2
>     |                        |
>     |  Pri=3                 |  Pri=2
>    BR4 Sec=4                BR5 Sec=5
>     |  Sub=4                 |  Sub=5
>     |                        |
>   --+---- Bus 4            --+-- Bus 5
> 
> Requests to bus 3 and 4 are now also routed through to bus 2 and 5. That
> sounds wrong to me. Or is this how hot-plugging of PCI bridges is
> supposed to work??
> 

Learning more about PCI

Quoting from [2]

"PCIe enumeration is generally done twice. First, your BIOS (UEFI or
otherwise) will do it, to figure out who's present and how much memory
they need. This data can then be passed on to the host OS who can take
it as-is, but Linux and Windows often perform their own enumeration
procedure ..."

"The custom software part comes in during this enumeration process and
that is you must reserve ahead of time PCI Bus numbers, and memory
segments for potential future devices -- this is sometimes called 'bus
padding'. This avoids the need to re-enumerate the bus in the future
which can often not be done without disruption to the system ..."


So if I understand correctly, the BIOS/Firmware/QEMU only provides the
initial state. After that, the guest is responsible to manage it,
especially to manage hotplug.

However, I wonder if it would also be up to BIOS/Firmware/QEMU to do
this bus padding or if it is completely on the OS side. And if
hotplugged devices simply have all 0's.

What about special cases where hotplug happends after BIOS enumerated
but before the guest started?

[2]
https://electronics.stackexchange.com/questions/208767/does-pcie-hotplug-actually-work-in-practice


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
  2019-01-23 11:42     ` David Hildenbrand
@ 2019-01-23 13:42       ` Thomas Huth
  2019-01-23 14:26         ` David Hildenbrand
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Huth @ 2019-01-23 13:42 UTC (permalink / raw)
  To: David Hildenbrand, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson, Pierre Morel, Michael S . Tsirkin,
	Laurent Vivier, Marcel Apfelbaum, Gerald Schaefer, linux-s390,
	Sebastian Ott

On 2019-01-23 12:42, David Hildenbrand wrote:
> On 23.01.19 12:16, Thomas Huth wrote:
>> On 2019-01-22 13:51, David Hildenbrand wrote:
>>> When hotplugging a PCI bridge right now to the root port, we resolve
>>> pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
>>> really only works right now when hotplugging to another bridge.
>>>
>>> Instead, we have to properly check if we are already at the root.
>>>
>>> Let's cleanup the code while at it a bit and factor out updating the
>>> subordiante bus number into a separate function. The check for
>>> "old_nr < nr" is right now not strictly necessary, but makes it more
>>> obvious what is actually going on.
>>>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
>>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>> index b7c4613fde..9b5c5fff60 100644
>>> --- a/hw/s390x/s390-pci-bus.c
>>> +++ b/hw/s390x/s390-pci-bus.c
>>> @@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>      }
>>>  }
>>>  
>>> +static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>>> +{
>>> +    uint32_t old_nr;
>>> +
>>> +    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>> +    while (!pci_bus_is_root(pci_get_bus(dev))) {
>>> +        dev = pci_get_bus(dev)->parent_dev;
>>> +
>>> +        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
>>> +        if (old_nr < nr) {
>>> +            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>> +        }
>>> +    }
>>> +}
>>> +
>>>  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>                                Error **errp)
>>>  {
>>> @@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>      S390PCIBusDevice *pbdev = NULL;
>>>  
>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>>> -        BusState *bus;
>>>          PCIBridge *pb = PCI_BRIDGE(dev);
>>> -        PCIDevice *pdev = PCI_DEVICE(dev);
>>>  
>>> +        pdev = PCI_DEVICE(dev);
>>
>> Why not keep the direct assignment?
> 
> We have the same local variable in that function already. I'm just
> reusing it.

D'oh, right. Please disregard my comment.

>>
>>>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
>>>          pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
>>>  
>>> -        bus = BUS(&pb->sec_bus);
>>> -        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
>>> +        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
>>>  
>>>          if (dev->hotplugged) {
>>>              pci_default_write_config(pdev, PCI_PRIMARY_BUS,
>>>                                       pci_dev_bus_num(pdev), 1);
>>>              s->bus_no += 1;
>>>              pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
>>> -            do {
>>> -                pdev = pci_get_bus(pdev)->parent_dev;
>>> -                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
>>> -                                         s->bus_no, 1);
>>> -            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
>>> +
>>> +            s390_pci_update_subordinate(pdev, s->bus_no);
>>
>>
>> While your changes look OK at a first glance, and certainly fix the
>> crash, I wonder whether this is the right thing to do here at all. Why
>> does the hypervisor set up the all the bridge registers here? I'd rather
>> expect that this is the job of the guest after it detects a hot-plugged
>> device?
> 
> I honestly have no idea who is responsible for that. I can see that
> spapr does not seem to handle hotplugging the way s390x does. I was
> hoping that Colin maybe know why we have to do that on s390x this way.
> At least this patch does not change the behavior but only fixes it.

Right, you're patch certainly fixes a crash, and does not make it any
worse than it was before, so:

Reviewed-by: Thomas Huth <thuth@redhat.com>

>>
>> Also I'm not sure whether the numbering is right in every case. Let's
>> use the example from the URL that you've mentioned in the cover letter
>> (http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html step 4):
>>
>> Assume that you hot-plug another bridge "5" to "Bridge 2". If I get the
>> code right, the setup will now look like this:
> 
> Yes, this is what I would expect! This implies that any bus in the
> hierarchy lies between Sec and Sub. But that there might be multiple
> candidates, which feels wrong as you correctly say.
> 
>>
>>                CPU
>>                 |
>>   --------------+-------------- Bus 0
>>                 |
>>                 |  Pri=0
>>                BR1 Sec=1
>>                 |  Sub=5
>>                 |
>>   ------+-------+----------+--- Bus 1
>>         |                  |
>>         |  Pri=1           |  Pri=1
>>        BR3 Sec=3          BR2 Sec=2
>>         |  Sub=4           |  Sub=5
>>         |                  |
>>   --+---+--- Bus 3       --+-+- Bus 2
>>     |                        |
>>     |  Pri=3                 |  Pri=2
>>    BR4 Sec=4                BR5 Sec=5
>>     |  Sub=4                 |  Sub=5
>>     |                        |
>>   --+---- Bus 4            --+-- Bus 5
>>
>> Requests to bus 3 and 4 are now also routed through to bus 2 and 5. That
>> sounds wrong to me. Or is this how hot-plugging of PCI bridges is
>> supposed to work??
>>
> 
> Learning more about PCI
> 
> Quoting from [2]
> 
> "PCIe enumeration is generally done twice. First, your BIOS (UEFI or
> otherwise) will do it, to figure out who's present and how much memory
> they need. This data can then be passed on to the host OS who can take
> it as-is, but Linux and Windows often perform their own enumeration
> procedure ..."
> 
> "The custom software part comes in during this enumeration process and
> that is you must reserve ahead of time PCI Bus numbers, and memory
> segments for potential future devices -- this is sometimes called 'bus
> padding'. This avoids the need to re-enumerate the bus in the future
> which can often not be done without disruption to the system ..."
> 
> So if I understand correctly, the BIOS/Firmware/QEMU only provides the
> initial state. After that, the guest is responsible to manage it,
> especially to manage hotplug.
> 
> However, I wonder if it would also be up to BIOS/Firmware/QEMU to do
> this bus padding or if it is completely on the OS side. And if
> hotplugged devices simply have all 0's.

Anyway, the current behavior sounds pretty wrong to me. I am sure that
we can not simply traverse the bridge hierarchy up to the top and change
the subordinate bus setting everywhere without informing the guest about
that change. What if the guest OS has cached the values from the
bridges? It then might use the wrong value for future calculations...

> What about special cases where hotplug happends after BIOS enumerated
> but before the guest started?

I can't comment on s390x, but IIRC, on POWER, there is a hotplug
interrupt which keeps being pending until the guest OS has initialized
it's interrupt handler. Then it is up to the guest OS to set up the PCI
device, if I remember clearly.

 Thomas


> [2]
> https://electronics.stackexchange.com/questions/208767/does-pcie-hotplug-actually-work-in-practice
> 
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of PCI bridges
  2019-01-23 13:42       ` Thomas Huth
@ 2019-01-23 14:26         ` David Hildenbrand
  0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2019-01-23 14:26 UTC (permalink / raw)
  To: Thomas Huth, qemu-devel
  Cc: qemu-s390x, Collin Walling, Christian Borntraeger, Cornelia Huck,
	Richard Henderson, Pierre Morel, Michael S . Tsirkin,
	Laurent Vivier, Marcel Apfelbaum, Gerald Schaefer, linux-s390,
	Sebastian Ott

On 23.01.19 14:42, Thomas Huth wrote:
> On 2019-01-23 12:42, David Hildenbrand wrote:
>> On 23.01.19 12:16, Thomas Huth wrote:
>>> On 2019-01-22 13:51, David Hildenbrand wrote:
>>>> When hotplugging a PCI bridge right now to the root port, we resolve
>>>> pci_get_bus(pdev)->parent_dev, which results in a SEGFAULT. Hotplugging
>>>> really only works right now when hotplugging to another bridge.
>>>>
>>>> Instead, we have to properly check if we are already at the root.
>>>>
>>>> Let's cleanup the code while at it a bit and factor out updating the
>>>> subordiante bus number into a separate function. The check for
>>>> "old_nr < nr" is right now not strictly necessary, but makes it more
>>>> obvious what is actually going on.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>>  hw/s390x/s390-pci-bus.c | 28 +++++++++++++++++++---------
>>>>  1 file changed, 19 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/hw/s390x/s390-pci-bus.c b/hw/s390x/s390-pci-bus.c
>>>> index b7c4613fde..9b5c5fff60 100644
>>>> --- a/hw/s390x/s390-pci-bus.c
>>>> +++ b/hw/s390x/s390-pci-bus.c
>>>> @@ -843,6 +843,21 @@ static void s390_pcihost_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>      }
>>>>  }
>>>>  
>>>> +static void s390_pci_update_subordinate(PCIDevice *dev, uint32_t nr)
>>>> +{
>>>> +    uint32_t old_nr;
>>>> +
>>>> +    pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>>> +    while (!pci_bus_is_root(pci_get_bus(dev))) {
>>>> +        dev = pci_get_bus(dev)->parent_dev;
>>>> +
>>>> +        old_nr = pci_default_read_config(dev, PCI_SUBORDINATE_BUS, 1);
>>>> +        if (old_nr < nr) {
>>>> +            pci_default_write_config(dev, PCI_SUBORDINATE_BUS, nr, 1);
>>>> +        }
>>>> +    }
>>>> +}
>>>> +
>>>>  static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>                                Error **errp)
>>>>  {
>>>> @@ -851,26 +866,21 @@ static void s390_pcihost_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
>>>>      S390PCIBusDevice *pbdev = NULL;
>>>>  
>>>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
>>>> -        BusState *bus;
>>>>          PCIBridge *pb = PCI_BRIDGE(dev);
>>>> -        PCIDevice *pdev = PCI_DEVICE(dev);
>>>>  
>>>> +        pdev = PCI_DEVICE(dev);
>>>
>>> Why not keep the direct assignment?
>>
>> We have the same local variable in that function already. I'm just
>> reusing it.
> 
> D'oh, right. Please disregard my comment.
> 
>>>
>>>>          pci_bridge_map_irq(pb, dev->id, s390_pci_map_irq);
>>>>          pci_setup_iommu(&pb->sec_bus, s390_pci_dma_iommu, s);
>>>>  
>>>> -        bus = BUS(&pb->sec_bus);
>>>> -        qbus_set_hotplug_handler(bus, DEVICE(s), errp);
>>>> +        qbus_set_hotplug_handler(BUS(&pb->sec_bus), DEVICE(s), errp);
>>>>  
>>>>          if (dev->hotplugged) {
>>>>              pci_default_write_config(pdev, PCI_PRIMARY_BUS,
>>>>                                       pci_dev_bus_num(pdev), 1);
>>>>              s->bus_no += 1;
>>>>              pci_default_write_config(pdev, PCI_SECONDARY_BUS, s->bus_no, 1);
>>>> -            do {
>>>> -                pdev = pci_get_bus(pdev)->parent_dev;
>>>> -                pci_default_write_config(pdev, PCI_SUBORDINATE_BUS,
>>>> -                                         s->bus_no, 1);
>>>> -            } while (pci_get_bus(pdev) && pci_dev_bus_num(pdev));
>>>> +
>>>> +            s390_pci_update_subordinate(pdev, s->bus_no);
>>>
>>>
>>> While your changes look OK at a first glance, and certainly fix the
>>> crash, I wonder whether this is the right thing to do here at all. Why
>>> does the hypervisor set up the all the bridge registers here? I'd rather
>>> expect that this is the job of the guest after it detects a hot-plugged
>>> device?
>>
>> I honestly have no idea who is responsible for that. I can see that
>> spapr does not seem to handle hotplugging the way s390x does. I was
>> hoping that Colin maybe know why we have to do that on s390x this way.
>> At least this patch does not change the behavior but only fixes it.
> 
> Right, you're patch certainly fixes a crash, and does not make it any
> worse than it was before, so:
> 
> Reviewed-by: Thomas Huth <thuth@redhat.com>
> 
>>>
>>> Also I'm not sure whether the numbering is right in every case. Let's
>>> use the example from the URL that you've mentioned in the cover letter
>>> (http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html step 4):
>>>
>>> Assume that you hot-plug another bridge "5" to "Bridge 2". If I get the
>>> code right, the setup will now look like this:
>>
>> Yes, this is what I would expect! This implies that any bus in the
>> hierarchy lies between Sec and Sub. But that there might be multiple
>> candidates, which feels wrong as you correctly say.
>>
>>>
>>>                CPU
>>>                 |
>>>   --------------+-------------- Bus 0
>>>                 |
>>>                 |  Pri=0
>>>                BR1 Sec=1
>>>                 |  Sub=5
>>>                 |
>>>   ------+-------+----------+--- Bus 1
>>>         |                  |
>>>         |  Pri=1           |  Pri=1
>>>        BR3 Sec=3          BR2 Sec=2
>>>         |  Sub=4           |  Sub=5
>>>         |                  |
>>>   --+---+--- Bus 3       --+-+- Bus 2
>>>     |                        |
>>>     |  Pri=3                 |  Pri=2
>>>    BR4 Sec=4                BR5 Sec=5
>>>     |  Sub=4                 |  Sub=5
>>>     |                        |
>>>   --+---- Bus 4            --+-- Bus 5
>>>
>>> Requests to bus 3 and 4 are now also routed through to bus 2 and 5. That
>>> sounds wrong to me. Or is this how hot-plugging of PCI bridges is
>>> supposed to work??
>>>
>>
>> Learning more about PCI
>>
>> Quoting from [2]
>>
>> "PCIe enumeration is generally done twice. First, your BIOS (UEFI or
>> otherwise) will do it, to figure out who's present and how much memory
>> they need. This data can then be passed on to the host OS who can take
>> it as-is, but Linux and Windows often perform their own enumeration
>> procedure ..."
>>
>> "The custom software part comes in during this enumeration process and
>> that is you must reserve ahead of time PCI Bus numbers, and memory
>> segments for potential future devices -- this is sometimes called 'bus
>> padding'. This avoids the need to re-enumerate the bus in the future
>> which can often not be done without disruption to the system ..."
>>
>> So if I understand correctly, the BIOS/Firmware/QEMU only provides the
>> initial state. After that, the guest is responsible to manage it,
>> especially to manage hotplug.
>>
>> However, I wonder if it would also be up to BIOS/Firmware/QEMU to do
>> this bus padding or if it is completely on the OS side. And if
>> hotplugged devices simply have all 0's.
> 
> Anyway, the current behavior sounds pretty wrong to me. I am sure that
> we can not simply traverse the bridge hierarchy up to the top and change
> the subordinate bus setting everywhere without informing the guest about
> that change. What if the guest OS has cached the values from the
> bridges? It then might use the wrong value for future calculations...

Yes, it also sounds wrong to me. Maybe this is what Firmware manages on
s390x - maybe they reserve bus numbers. Or maybe hotplug has to really
be handled by the guests. Only IBM people can clarify that.

Thanks!

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes
  2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
                   ` (2 preceding siblings ...)
  2019-01-22 13:01 ` [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes Cornelia Huck
@ 2019-01-28 11:42 ` Cornelia Huck
  3 siblings, 0 replies; 15+ messages in thread
From: Cornelia Huck @ 2019-01-28 11:42 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: qemu-devel, qemu-s390x, Collin Walling, Thomas Huth,
	Christian Borntraeger, Richard Henderson

On Tue, 22 Jan 2019 13:51:31 +0100
David Hildenbrand <david@redhat.com> wrote:

> Hotplugging of PCI bridges is right now pretty much broken. Coldplugging
> and hotplugging will assign wrong primary bus numbers in some scenarios.
> 
> I base my knowledge on how this is supposed to work on
> http://www.science.unitn.it/~fiorella/guidelinux/tlk/node76.html
> 
> I did a couple of tests, building whole hierarchies of bridges, both
> hot and coldplugged. "info pci" as well as the Linux guests showed
> what I was expecting.
> 
> David Hildenbrand (2):
>   s390x/pci: Fix primary bus number for PCI bridges
>   s390x/pci: Fix hotplugging of PCI bridges
> 
>  hw/s390x/s390-pci-bus.c | 41 +++++++++++++++++++++++++++--------------
>  1 file changed, 27 insertions(+), 14 deletions(-)
> 

So, if I understand the discussion correctly:
- Patch 1 is fine, just needs a description tweak.
- Patch 2 fixes a crash, and certainly makes things better. However,
  current handling seems to be at least a bit odd, if not broken;
  something to be looked into later.

I'd like to queue these, can I get an ack/feedback from IBM?

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-01-28 11:42 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-22 12:51 [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes David Hildenbrand
2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 1/2] s390x/pci: Fix primary bus number for PCI bridges David Hildenbrand
2019-01-23 10:26   ` Thomas Huth
2019-01-23 10:32     ` David Hildenbrand
2019-01-23 10:37       ` Cornelia Huck
2019-01-23 10:39       ` Thomas Huth
2019-01-22 12:51 ` [Qemu-devel] [PATCH v1 2/2] s390x/pci: Fix hotplugging of " David Hildenbrand
2019-01-23 11:16   ` Thomas Huth
2019-01-23 11:42     ` David Hildenbrand
2019-01-23 13:42       ` Thomas Huth
2019-01-23 14:26         ` David Hildenbrand
2019-01-22 13:01 ` [Qemu-devel] [PATCH v1 0/2] s390x/pci: PCI bridge plugging fixes Cornelia Huck
2019-01-23  8:02   ` David Hildenbrand
2019-01-23  9:47     ` Cornelia Huck
2019-01-28 11:42 ` Cornelia Huck

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.