From: David Gibson <david@gibson.dropbear.id.au>
To: Greg Kurz <groug@kaod.org>
Cc: qemu-devel@nongnu.org, Thomas Huth <thuth@redhat.com>,
	Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-ppc@nongnu.org, Marcel Apfelbaum <marcel@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] spapr/pci: populate PCI DT in reverse order
Date: Tue, 28 Feb 2017 11:51:32 +1100
Message-ID: <20170228005132.GI17615@umbus.fritz.box>
In-Reply-To: <148776029578.5865.5785337570950575739.stgit@bahia>

On Wed, Feb 22, 2017 at 11:56:53AM +0100, Greg Kurz wrote:
> From: Greg Kurz <gkurz@linux.vnet.ibm.com>
> 
> Since commit 1d2d974244c6 "spapr_pci: enumerate and add PCI device tree", QEMU
> populates the PCI device tree in the opposite order compared to SLOF.
> 
> Before 1d2d974244c6:
> 
> Populating /pci@800000020000000
>                      00 0000 (D) : 1af4 1000    virtio [ net ]
>                      00 0800 (D) : 1af4 1001    virtio [ block ]
>                      00 1000 (D) : 1af4 1009    virtio [ network ]
> Populating /pci@800000020000000/unknown-legacy-device@2
> 
> 7e5294b8 :  /pci@800000020000000
> 7e52b998 :  |-- ethernet@0
> 7e52c0c8 :  |-- scsi@1
> 7e52c7e8 :  +-- unknown-legacy-device@2 ok
> 
> Since 1d2d974244c6:
> 
> Populating /pci@800000020000000
>                      00 1000 (D) : 1af4 1009    virtio [ network ]
> Populating /pci@800000020000000/unknown-legacy-device@2
>                      00 0800 (D) : 1af4 1001    virtio [ block ]
>                      00 0000 (D) : 1af4 1000    virtio [ net ]
> 
> 7e5e8118 :  /pci@800000020000000
> 7e5ea6a0 :  |-- unknown-legacy-device@2
> 7e5eadb8 :  |-- scsi@1
> 7e5eb4d8 :  +-- ethernet@0 ok
> 
> This behaviour change is not actually a bug since no assumptions should be
> made on DT ordering. But it has no real justification either, other than
> being the consequence of the way fdt_add_subnode() inserts new elements
> to the front of the FDT rather than adding them to the tail.
> 
> This patch reverts to the historical SLOF ordering by walking PCI devices
> in reverse order. This reconciles pseries with x86 machine types behavior.
> It is expected to make things easier when porting existing applications to
> power.
> 
> Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
> Tested-by: Thomas Huth <thuth@redhat.com>
> Reviewed-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
> (slight update to the changelog)
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  hw/pci/pci.c         |   28 ++++++++++++++++++++++++++++
>  hw/ppc/spapr_pci.c   |   12 ++++++------
>  include/hw/pci/pci.h |    4 ++++
>  3 files changed, 38 insertions(+), 6 deletions(-)
> 
> David,
> 
> This patch was posted and already discussed during 2.5 development:
> 
> http://patchwork.ozlabs.org/patch/549925/
> 
> The "consensus" at the time was that guests should not rely on device
> ordering (i.e. use persistent naming instead).
> 
> I was recently contacted by OpenStack people who had several complaints
> about the reverse ordering of PCI devices in pseries: different behavior
> between ppc64 and x86, lots of time spent in debugging when porting
> applications from x86 to ppc64 before realizing that it is caused by the
> reverse ordering, necessity to carry hacky workarounds...
> 
> One strong argument against handling this properly with persistent naming
> is that it requires systemd/udev. This option is considered painful
> with CirrOS, which aims at remaining as minimal as possible and is widely
> used in the OpenStack ecosystem.
> 
> Would you reconsider your position and apply this patch?

As it happens, I'd thought about this from time to time already, and
concluded that (re-)reversing the DT order was probably the least bad
approach.

So, applied to ppc-for-2.9.
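
For anyone following the thread, here is a minimal sketch of why walking the devices in reverse restores the ascending order. It is not QEMU or libfdt code: `struct node`, `add_front()`, `populate()` and `SLOTS` are hypothetical names, and the only behaviour modelled is the one stated in the changelog, namely that each new child ends up at the front of its parent's children.

```c
#include <stddef.h>

#define SLOTS 8 /* hypothetical; stands in for ARRAY_SIZE(bus->devices) */

/* Hypothetical stand-in for one child node under the PHB. */
struct node {
    int slot;
    struct node *next;
};

/* Models "insert at the front": the newest child becomes the first child. */
static void add_front(struct node **head, struct node *n)
{
    n->next = *head;
    *head = n;
}

/*
 * Walk present[] forward (reverse == 0) or backward (reverse != 0),
 * prepending a node for every occupied slot. The resulting child order
 * is written to out[]; the return value is the number of children.
 */
static size_t populate(const int present[SLOTS], int reverse, int out[SLOTS])
{
    struct node pool[SLOTS];
    struct node *head = NULL;
    const struct node *p;
    size_t i, n = 0;

    for (i = 0; i < SLOTS; i++) {
        /* Same index arithmetic as the reverse walker in the patch. */
        size_t slot = reverse ? SLOTS - 1 - i : i;

        if (present[slot]) {
            pool[slot].slot = (int)slot;
            add_front(&head, &pool[slot]);
        }
    }

    for (p = head; p; p = p->next) {
        out[n++] = p->slot;
    }
    return n;
}
```

With slots 0, 1 and 2 occupied, the forward walk leaves the children as 2, 1, 0 (the ordering seen since 1d2d974244c6), while the reverse walk leaves them as 0, 1, 2 (the historical SLOF ordering).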

> 
> Cheers.
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a563555e7da7..273f1e46025a 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -1530,6 +1530,34 @@ static const pci_class_desc pci_class_descriptions[] =
>      { 0, NULL}
>  };
>  
> +static void pci_for_each_device_under_bus_reverse(PCIBus *bus,
> +                                                  void (*fn)(PCIBus *b,
> +                                                             PCIDevice *d,
> +                                                             void *opaque),
> +                                                  void *opaque)
> +{
> +    PCIDevice *d;
> +    int devfn;
> +
> +    for (devfn = 0; devfn < ARRAY_SIZE(bus->devices); devfn++) {
> +        d = bus->devices[ARRAY_SIZE(bus->devices) - 1 - devfn];
> +        if (d) {
> +            fn(bus, d, opaque);
> +        }
> +    }
> +}
> +
> +void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> +                         void (*fn)(PCIBus *b, PCIDevice *d, void *opaque),
> +                         void *opaque)
> +{
> +    bus = pci_find_bus_nr(bus, bus_num);
> +
> +    if (bus) {
> +        pci_for_each_device_under_bus_reverse(bus, fn, opaque);
> +    }
> +}
> +
>  static void pci_for_each_device_under_bus(PCIBus *bus,
>                                            void (*fn)(PCIBus *b, PCIDevice *d,
>                                                       void *opaque),
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index fd6fc1d95344..2a20c2a140fc 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -1782,9 +1782,9 @@ static void spapr_populate_pci_devices_dt(PCIBus *bus, PCIDevice *pdev,
>      s_fdt.fdt = p->fdt;
>      s_fdt.node_off = offset;
>      s_fdt.sphb = p->sphb;
> -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> -                        spapr_populate_pci_devices_dt,
> -                        &s_fdt);
> +    pci_for_each_device_reverse(sec_bus, pci_bus_num(sec_bus),
> +                                spapr_populate_pci_devices_dt,
> +                                &s_fdt);
>  }
>  
>  static void spapr_phb_pci_enumerate_bridge(PCIBus *bus, PCIDevice *pdev,
> @@ -1953,9 +1953,9 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb,
>      s_fdt.fdt = fdt;
>      s_fdt.node_off = bus_off;
>      s_fdt.sphb = phb;
> -    pci_for_each_device(bus, pci_bus_num(bus),
> -                        spapr_populate_pci_devices_dt,
> -                        &s_fdt);
> +    pci_for_each_device_reverse(bus, pci_bus_num(bus),
> +                                spapr_populate_pci_devices_dt,
> +                                &s_fdt);
>  
>      ret = spapr_drc_populate_dt(fdt, bus_off, OBJECT(phb),
>                                  SPAPR_DR_CONNECTOR_TYPE_PCI);
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 6983f13745a5..9349acbfb278 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -429,6 +429,10 @@ int pci_bus_numa_node(PCIBus *bus);
>  void pci_for_each_device(PCIBus *bus, int bus_num,
>                           void (*fn)(PCIBus *bus, PCIDevice *d, void *opaque),
>                           void *opaque);
> +void pci_for_each_device_reverse(PCIBus *bus, int bus_num,
> +                                 void (*fn)(PCIBus *bus, PCIDevice *d,
> +                                            void *opaque),
> +                                 void *opaque);
>  void pci_for_each_bus_depth_first(PCIBus *bus,
>                                    void *(*begin)(PCIBus *bus, void *parent_state),
>                                    void (*end)(PCIBus *bus, void *state),
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

