From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Andrea Bolognani <abologna@redhat.com>,
	David Gibson <david@gibson.dropbear.id.au>
Cc: Greg Kurz <groug@kaod.org>, Paolo Bonzini <pbonzini@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
	libvir-list@redhat.com, Michael Roth <mdroth@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [Qemu-ppc] [RFC PATCH qemu] spapr_pci: Create PCI-express root bus by default
Date: Thu, 17 Nov 2016 13:02:57 +1100
Message-ID: <3353ecef-2308-13e3-025d-df41b2e89945@ozlabs.ru>
In-Reply-To: <1479218565.3319.18.camel@redhat.com>

On 16/11/16 01:02, Andrea Bolognani wrote:
> On Tue, 2016-11-01 at 13:46 +1100, David Gibson wrote:
>> On Mon, Oct 31, 2016 at 03:10:23PM +1100, Alexey Kardashevskiy wrote:
>>>  
>>> On 31/10/16 13:53, David Gibson wrote:
>>>>  
>>>> On Fri, Oct 28, 2016 at 12:07:12PM +0200, Greg Kurz wrote:
>>>>>  
>>>>> On Fri, 28 Oct 2016 18:56:40 +1100
>>>>> Alexey Kardashevskiy <aik@ozlabs.ru> wrote:
>>>>>  
>>>>>>  
>>>>>> At the moment sPAPR PHB creates a root bus of TYPE_PCI_BUS type.
>>>>>> This means that vfio-pci devices attached to it (and this is
>>>>>> a default behaviour) hide PCIe extended capabilities as
>>>>>> the bus does not pass a pci_bus_is_express(pdev->bus) check.
>>>>>>  
>>>>>> This change adds a default PCI bus type property to sPAPR PHB
>>>>>> and uses TYPE_PCIE_BUS if none passed; older machines get TYPE_PCI_BUS
>>>>>> for backward compatibility as a bus type is used in the bus name
>>>>>> so the root bus name becomes "pcie.0" instead of "pci.0".
>>>>>>  
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>> ---
>>>>>>  
>>>>>> What can possibly go wrong with such a change of name?
>>>>>> From the devices' perspective, I cannot see anything.
>>>>>>  
>>>>>> libvirt might get upset as "pci.0" will not be available.
>>>>>> Would it make sense to create pcie.0 as a root bus and always
>>>>>> add a PCIe->PCI bridge and name its bus "pci.0"?
>>>>>>  
>>>>>> Or create root bus from TYPE_PCIE_BUS and force name to "pci.0"?
>>>>>> pci_register_bus() can do this.
>>>>>>  
>>>>>>  
>>>>>> ---
>>>>>>   hw/ppc/spapr.c              | 5 +++++
>>>>>>   hw/ppc/spapr_pci.c          | 5 ++++-
>>>>>>   include/hw/pci-host/spapr.h | 1 +
>>>>>>   3 files changed, 10 insertions(+), 1 deletion(-)
>>>>>>  
>>>>>> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
>>>>>> index 0b3820b..a268511 100644
>>>>>> --- a/hw/ppc/spapr.c
>>>>>> +++ b/hw/ppc/spapr.c
>>>>>> @@ -2541,6 +2541,11 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", true);
>>>>>>           .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
>>>>>>           .property = "mem64_win_size",               \
>>>>>>           .value    = "0",                            \
>>>>>> +    },                                              \
>>>>>> +    {                                               \
>>>>>> +        .driver   = TYPE_SPAPR_PCI_HOST_BRIDGE,     \
>>>>>> +        .property = "root_bus_type",                \
>>>>>> +        .value    = TYPE_PCI_BUS,                   \
>>>>>>       },
>>>>>>   
>>>>>>   static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t index,
>>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>>> index 7cde30e..2fa1f22 100644
>>>>>> --- a/hw/ppc/spapr_pci.c
>>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>>> @@ -1434,7 +1434,9 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp)
>>>>>>       bus = pci_register_bus(dev, NULL,
>>>>>>                              pci_spapr_set_irq, pci_spapr_map_irq, sphb,
>>>>>>                              &sphb->memspace, &sphb->iospace,
>>>>>> -                           PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS);
>>>>>> +                           PCI_DEVFN(0, 0), PCI_NUM_PINS,
>>>>>> +                           sphb->root_bus_type ? sphb->root_bus_type :
>>>>>> +                           TYPE_PCIE_BUS);
>>>>>  
>>>>> Shouldn't we ensure that sphb->root_bus_type is either TYPE_PCIE_BUS or
>>>>> TYPE_PCI_BUS ?
>>>>  
>>>> Yes, I think so.  In fact, I think it would be better to make the
>>>> property a boolean that just selects PCI-E, rather than this which
>>>> exposes qemu (semi-)internal type names on the command line.
>>>  
>>> Sure, a "pcie-root" boolean property should do.
>>>  
>>> However this is not my main concern, I rather wonder if we have to have
>>> pci.0 when we pick PCIe for the root.
>>  
>> Right.
>>  
>> I've added Andrea Bolognani to the CC list to get a libvirt perspective.
> 
> Thanks for doing so: changes such as this one can have quite
> an impact on the upper layers of the stack, so the earlier
> libvirt is involved in the discussion, the better.
> 
> I'm going to go a step further and cross-post to libvir-list
> in order to give other libvirt contributors a chance to chime
> in too.
> 
>> Andrea,
>>  
>> To summarise the issue here:
>>     * As I've said before the PAPR spec kinda-sorta abstracts the
>>       difference between vanilla PCI and PCI-E
>>     * However, because within qemu we're declaring the bus as PCI that
>>       means some PCI-E devices aren't working right
>>     * In particular it means that PCI-E extended config space isn't
>>       available
>>  
>> The proposal is to change (on newer machine types) the spapr PHB code
>> to declare a PCI-E bus instead.  AIUI this still won't make the root
>> complex guest visible (which it's not supposed to be under PAPR), and
>> the guest shouldn't see a difference in most cases - it will still see
>> the PAPR abstracted PCIish bus, but will now be able to get extended
>> config space.
>>  
>> The possible problem from a libvirt perspective is that doing this in
>> the simplest way in qemu would change the name of the default bus from
>> pci.0 to pcie.0.  We have two suggested ways to mitigate this:
>>     1) Automatically create a PCI-E to PCI bridge, so that new machine
>>        types will have both a pcie.0 and pci.0 bus
>>     2) Force the name of the bus to be pci.0, even though it's treated
>>        as PCI-E in other ways.
>>  
>> We're trying to work out exactly what will and won't cause trouble for
>> libvirt.
> 
> Option 2) is definitely a no-no, as we don't want to be piling
> up even more hacks and architecture-specific code: the PCI
> Express Root Bus should be called pcie.0, just as it is on q35
> and mach-virt machine types.
> 
> Option 1) doesn't look too bad, but devices that are added
> automatically by QEMU are an issue since we need to hardcode
> knowledge of them into libvirt if we want the rest of the PCI
> address allocation logic to handle them correctly.
> 
> Moreover, libvirt now has the ability to build a legacy PCI
> topology without user intervention, if needed to plug in
> legacy devices, on machines that have a PCI Express Root Bus,
> which makes the additional bridge fully redundant...
> 
> ... or at least it would, if we actually had a proper
> PCIe-to-PCI bridge; AFAIK, though, the closest we have is the
> i82801b11-bridge that is Intel-specific despite having so far
> been abused as a generic PCIe-to-PCI bridge. I'm not even
> sure whether it would work at all on ppc64.
> 
> Moving from legacy PCI to PCI Express would definitely be an
> improvement, in my opinion. As mentioned, that's already the
> case for at least two other architectures, so the more we can
> standardize on that, the better.
> 
> That said, considering that a big part of the PCI address
> allocation logic is based off whether the specific machine
> type exposes a legacy PCI Root Bus or a PCI Express Root Bus,
> libvirt will need a way to be able to tell which one is which.
> 
> Version checks are pretty much out of the question, as they
> fail as soon as downstream releases enter the picture. A
> few ways we could deal with the situation:
> 
>   1) switch to PCI Express on newer machine types, and
>      expose some sort of capability through QMP so that
>      libvirt can know about the switch
> 
>   2) switch between legacy PCI and PCI Express based on a
>      machine type option. libvirt would be able to find out
>      whether the option is available or not, and default to
>      either
> 
>        <controller type='pci' model='pci-root'/>
> 
>      or
> 
>        <controller type='pci' model='pcie-root'/>
> 
>      based on that. In order to support multiple PHBs
>      properly, those would have to be switchable with an
>      option as well
> 
>   3) create an entirely new machine type, eg. pseries-pcie
>      or whatever someone with the ability to come up with
>      decent names can suggest :) That would make ppc64
>      similar to x86, where i440fx and q35 have different
>      root buses. libvirt would learn about the new machine
>      type, know that it has a PCI Express Root Bus, and
>      behave accordingly
> 
> Option 1) would break horribly with existing libvirt
> versions, and so would Option 2) if we default to using


How exactly will 1) break libvirt? Migrating from pseries-2.7 to
pseries-2.8 does not work anyway, and machines are allowed to behave
differently from version to version, so what difference will using
"pseries-pcie-X.Y" make? I believe that once we introduce the very first
pseries-pcie-X.Y, we will just stop adding new pseries-X.Y.



> PCI Express. Option 2) with default to legacy PCI and
> option 3) would work just fine with existing libvirt
> versions AFAICT, but wouldn't of course expose the new
> capabilities.
> 
> Option 3) is probably the one that will be less confusing
> to users; we might even decide to take the chance and fix
> other small annoyances with the current pseries machine
> type, if there are any. On the other hand, it might very well
> be considered to be too big a hammer for such a small nail.
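
For reference, here is a minimal standalone sketch of the "pcie-root"
boolean discussed earlier in the thread. It is a model, not QEMU code:
ModelPHB, model_root_bus_type() and model_root_bus_name() are hypothetical
names, the "PCI"/"PCIE" strings stand in for TYPE_PCI_BUS/TYPE_PCIE_BUS,
and the naming rule (lowercased type name plus ".<index>") is an
assumption based on the "pci.0" vs "pcie.0" behaviour described in the
patch. It shows that a boolean can only ever select one of the two valid
bus types, and that the default bus name follows from that choice.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the QEMU bus type names (assumed values). */
#define MODEL_TYPE_PCI_BUS  "PCI"
#define MODEL_TYPE_PCIE_BUS "PCIE"

/* Hypothetical per-PHB state: a boolean instead of a free-form type name. */
typedef struct ModelPHB {
    bool pcie_root;
} ModelPHB;

/* Map the boolean to one of exactly two bus types; no validation needed. */
static const char *model_root_bus_type(const ModelPHB *phb)
{
    return phb->pcie_root ? MODEL_TYPE_PCIE_BUS : MODEL_TYPE_PCI_BUS;
}

/*
 * Derive the default bus name the way the thread describes it:
 * the lowercased type name followed by "." and the bus index.
 */
static void model_root_bus_name(const ModelPHB *phb, int index,
                                char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s.%d", model_root_bus_type(phb), index);
    for (char *p = buf; *p != '\0'; p++) {
        *p = (char)tolower((unsigned char)*p);
    }
}
```

With pcie_root left false (the compat setting for older machine types) the
name stays "pci.0"; with it set to true the name becomes "pcie.0", which
is the change libvirt would need to be able to detect.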



-- 
Alexey
