All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcel Apfelbaum <marcel@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>, qemu-devel@nongnu.org
Cc: mst@redhat.com, Peter Maydell <peter.maydell@linaro.org>,
	Drew Jones <drjones@redhat.com>, Laine Stump <laine@redhat.com>,
	Andrea Bolognani <abologna@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>
Subject: Re: [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines
Date: Tue, 6 Sep 2016 17:46:42 +0300	[thread overview]
Message-ID: <339e185b-648e-1c4a-081b-5fd52ceeef21@redhat.com> (raw)
In-Reply-To: <739af9d6-2382-c4bd-679d-87b153124491@redhat.com>

On 09/06/2016 04:31 PM, Laszlo Ersek wrote:
> On 09/05/16 22:02, Marcel Apfelbaum wrote:
>> On 09/05/2016 07:24 PM, Laszlo Ersek wrote:
>>> On 09/01/16 15:22, Marcel Apfelbaum wrote:
>>>> Proposes best practices on how to use PCIe/PCI device
>>>> in PCIe based machines and explain the reasoning behind them.
>>>>
>>>> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
>>>> ---
>>>>
>>>> Hi,
>>>>
>>>> Please add your comments on what to add/remove/edit to make this doc
>>>> usable.
>>>
>>

[...]

>>
>>> (But, certainly no IO reservation for PCI Express root port, upstream
>>> port, or downstream port! And i'll need your help for telling these
>>> apart in OVMF.)
>>>
>>
>> Just let me know how can I help.
>
> Well, in the EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding()
> implementation, I'll have to look at the PCI config space of the
> "bridge-like" PCI device that the generic PCI Bus driver of edk2 passes
> back to me, asking me about resource reservation.
>
> Based on the config space, I should be able to tell apart "PCI-PCI
> bridge" from "PCI Express downstream or root port". So what I'd need
> here is a semi-formal natural language description of these conditions.

You can use PCI Express Spec: 7.8.2. PCI Express Capabilities Register (Offset 02h)

Bit 7:4 Register Description:
Device/Port Type – Indicates the specific type of this PCI
Express Function. Note that different Functions in a multi-
Function device can generally be of different types.
Defined encodings are:
0000b PCI Express Endpoint
0001b Legacy PCI Express Endpoint
0100b Root Port of PCI Express Root Complex*
0101b Upstream Port of PCI Express Switch*
0110b Downstream Port of PCI Express Switch*
0111b PCI Express to PCI/PCI-X Bridge*
1000b PCI/PCI-X to PCI Express Bridge*
1001b Root Complex Integrated Endpoint



>
> Hmm, actually I think I've already written code, for another patch, that
> identifies the latter category. So everything where that check doesn't
> fire can be deemed "PCI-PCI bridge". (This hook gets called only for
> bridges.)
>
> Yet another alternative: if we go for the special PCI capability, for
> exposing reservation sizes from QEMU to the firmware, then I can simply
> search the capability list for just that capability. I think that could
> be the easiest for me.
>

That would be a "later" step.
BTW, following a offline chat with Michael S. Tsirkin
regarding virto 1.0 requiring 8M MMIO by default we arrived to a conclusion that
is not really needed and we came up with an alternative that will require less then 2M
MMIO space.
I put this here because the above solution will give us some time to deal with
the MMIO ranges reservation.

[...]

>>>> +
>>>> +
>>>> +4. Hot Plug
>>>> +============
>>>> +The root bus pcie.0 does not support hot-plug, so Integrated Devices,
>>>
>>> s/root bus/root complex/? Also, any root complexes added with pxb-pcie
>>> don't support hotplug.
>>>
>>
>> Actually pxb-pcie should support PCI Express Native Hotplug.
>
> Huh, interesting.
>
>> If they don't is a bug and I'll take care of it.
>
> Hmm, a bit lower down you mention that PCI Express native hot plug is
> based on SHPCs. So, when you say that pxb-pcie should support PCI
> Express Native Hotplug, you mean that it should occur through SHPC, right?
>

Yes, but I was talking about the Integrated SHPCs of the PCI Express
Root Ports and PCI Express Downstream Ports. (devices plugged into them)


> However, for pxb-pci*, we had to disable SHPC: see QEMU commit
> d10dda2d60c8 ("hw/pci-bridge: disable SHPC in PXB"), in June 2015.
>

This is only for the pxb device (not pxb-pcie) and only for the internal pci-bridge that comes with it.
And... we don't use SHPC based hot-plug for PCI, only for PCI Express.
For PCI we are using only the ACPI hotplug. So disabling it is not so bad.

The pxb-pcie does not have the internal PCI bridge. You don't need it because:
1. You can't have Integrated Devices for pxb-pcie
2. The PCI Express Upstream Port is a type of PCI-Bridge anyway.


> For background, the series around it was
> <https://lists.nongnu.org/archive/html/qemu-devel/2015-06/msg05136.html>
> -- I think v7 was the last version.
>
> ... Actually, now I wonder if d10dda2d60c8 should be possible to revert
> at this point! Namely, in OVMF I may have unwittingly fixed this issue
> -- obviously much later than the QEMU commit: in March 2016. See
>
> https://github.com/tianocore/edk2/commit/8f35eb92c419
>
> If you look at the commit message of the QEMU patch, it says
>
>     [...]
>
>     Unfortunately, when this happens, the PCI_COMMAND_MEMORY bit is
>     clear in the root bus's command register [...]
>
> which I think should no longer be true, thanks to edk2 commit 8f35eb92c419.
>
> So maybe we should re-evaluate QEMU commit d10dda2d60c8. If pxb-pci and
> pxb-pcie work with current OVMF, due to edk2 commit 8f35eb92c419, then
> maybe we should revert QEMU commit d10dda2d60c8.
>
> Not urgent for me :), obviously, I'm just explaining so you can make a
> note for later, if you wish to (if hot-plugging directly into pxb-pcie
> should be necessary -- I think it's very low priority).
>

As stated above, since we don't use it anyway it doesn't matter.

[...]

>> Nope, you cannot hotplug a PCI Express Root Port or a PCI Express
>> Downstream Port.
>> The reason: The PCI Express Native Hotplug is based on SHPCs (Standard
>> HotPlug Controllers)
>> which are integrated only in the mentioned ports and not in Upstream
>> Ports or the Root Complex.
>> The "other" reason: When you buy a switch/server it has a number of
>> ports and that's it.
>> You cannot add "later".
>
> Makes sense, thank you. I think if you add the HMP example, it will make
> it clear. I only assumed that you needed several monitor commands for
> hotplugging a single switch (i.e., one command per one port) because on
> the QEMU command line you do need a separate -device option for the
> upstream port, and every single downstream port, of the same switch.
>
> If, using the monitor, it's just one device_add for the upstream port,
> and the downstream ports are added automatically, then I guess it'll be
> easy to understand.
>

No it doesn't work like that, you would need to add them one by one (upstream ports and then downstream ports)
as far as I understand it.
Actually I've never done it before, I'll try it first and update the doc on
how it should be done. (if it can be done...)

>>
>>>
>>> But, this question is actually irrelevant IMO, because here I would add
>>> another subsection about *planning* for hot-plug. (I think that's pretty
>>> important.) And those plans should make the hotplugging of switches
>>> unnecessary!
>>>
>>
>> I'll add a subsection for it. But when you are out of options you *can*
>> hotplug a switch if your sysadmin skills are limited...
>
> You probably can, but then we'll run into the resource allocation
> problem again:
>
> (1) The user will hotplug a switch (= S1) under a root port with, say,
> two downstream ports (= S1-DP1, S1-DP2).
>
> (2) They'll then plug a PCI Express device into one of those downstream
> ports (S1-DP1-Dev1).
>
> (3) Then they'll want to hot-plug *another* switch into the *other*
> downstream port (S1-DP2-S2).
>
>
>                          DP1 -- Dev1 (2)
>                         /
>      root port -- S1 (1)
>                         \
>                          DP2 -- S2 (3)
>
> However, concerning the resource needs of S2 (and especially the devices
> hot-plugged under S2!), S1 won't have enough left over, because Dev1
> (under DP1) will have eaten into them, and Dev1's BARs will have been
> programmed!
>

Theoretically the Guest OS should trigger PCI resources re-allocation
but I agree we should not count on them.

> We could never credibly explain our way out of this situation in a bug
> report. For that reason, I think we should discourage hotplug ideas that
> would change the topology, and require recursive resource allocation at
> higher levels and/or parallel branches of the topology.
>
> I know Linux can do that, and it even succeeds if there is enough room,
> but from the messages seen in the guest dmesg when it fails, how do you
> explain to the user that they should have plugged in S2 first, and Dev1
> second?
>
> So, we should recommend *not* to hotplug switches or PCI-PCI bridges.
> Instead,
> - keep a very flat hierarchy from the start;
> - for PCI Express, add as many root ports and downstream ports as you
> deem enough for future hotplug needs (keeping the flat formula I described);
> - for legacy PCI, add as many sibling PCI-PCI bridges directly under the
> one DMI-PCI bridge as you deem sufficient for future hotplug needs.
>
> In short, don't change the hierarchy at runtime by hotplugging internal
> nodes; hotplug *leaf nodes* only.
>

Agreed. I'll re-use some of your comments in the doc.

>>

[...]

>>
>> Gerd explicitly asked for the second idea (vendor specific capability)
>
> Nice, thank you for confirming it; let's do this then. It will also
> simplify my work in the
> EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding() function: it should
> suffice to scan the config space of the bridge, regardless of the
> "PCI-PCI bridge / PCI Express root or downstream port" distinction.
>

Will do, but since we have a quick way to deal with the current issue
(virtio 1.0 requiring 8MB MMIO while firmware reserving 2MB for PCI-Bridges hotplug)

[...]

Thanks,
Marcel

>
> Cheers!
> Laszlo
>

  reply	other threads:[~2016-09-06 14:47 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-01 13:22 [Qemu-devel] [PATCH RFC] docs: add PCIe devices placement guidelines Marcel Apfelbaum
2016-09-01 13:27 ` Peter Maydell
2016-09-01 13:51   ` Marcel Apfelbaum
2016-09-01 17:14     ` Laszlo Ersek
2016-09-05 16:24 ` Laszlo Ersek
2016-09-05 20:02   ` Marcel Apfelbaum
2016-09-06 13:31     ` Laszlo Ersek
2016-09-06 14:46       ` Marcel Apfelbaum [this message]
2016-09-07  6:21       ` Gerd Hoffmann
2016-09-07  8:06         ` Laszlo Ersek
2016-09-07  8:23           ` Marcel Apfelbaum
2016-09-07  8:06         ` Marcel Apfelbaum
2016-09-07 16:08           ` Alex Williamson
2016-09-07 19:32             ` Marcel Apfelbaum
2016-09-07 17:55           ` Laine Stump
2016-09-07 19:39             ` Marcel Apfelbaum
2016-09-07 20:34               ` Laine Stump
2016-09-15  8:38               ` Andrew Jones
2016-09-15 14:20                 ` Marcel Apfelbaum
2016-09-16 16:50                   ` Andrea Bolognani
2016-09-08  7:33             ` Gerd Hoffmann
2016-09-06 11:35   ` Gerd Hoffmann
2016-09-06 13:58     ` Laine Stump
2016-09-07  7:04       ` Gerd Hoffmann
2016-09-07 18:20         ` Laine Stump
2016-09-08  7:26           ` Gerd Hoffmann
2016-09-06 14:47     ` Marcel Apfelbaum
2016-09-07  7:53     ` Laszlo Ersek
2016-09-07  7:57       ` Marcel Apfelbaum
2016-10-04 14:59   ` Daniel P. Berrange
2016-10-04 15:40     ` Laszlo Ersek
2016-10-04 16:10       ` Laine Stump
2016-10-04 16:43         ` Laszlo Ersek
2016-10-04 18:08           ` Laine Stump
2016-10-04 18:52             ` Alex Williamson
2016-10-10 12:02               ` Andrea Bolognani
2016-10-10 14:36                 ` Marcel Apfelbaum
2016-10-11 15:37                   ` Andrea Bolognani
2016-10-04 18:56             ` Laszlo Ersek
2016-10-04 17:54         ` Laine Stump
2016-10-05  9:17           ` Marcel Apfelbaum
2016-10-10 11:09             ` Andrea Bolognani
2016-10-10 14:15               ` Marcel Apfelbaum
2016-10-11 13:30                 ` Andrea Bolognani
2016-10-04 15:45     ` Alex Williamson
2016-10-04 16:25       ` Laine Stump
2016-10-05 10:03         ` Marcel Apfelbaum
2016-09-06 15:38 ` Alex Williamson
2016-09-06 18:14   ` Marcel Apfelbaum
2016-09-06 18:32     ` Alex Williamson
2016-09-06 18:59       ` Marcel Apfelbaum
2016-09-07  7:44       ` Laszlo Ersek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=339e185b-648e-1c4a-081b-5fd52ceeef21@redhat.com \
    --to=marcel@redhat.com \
    --cc=abologna@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=drjones@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=laine@redhat.com \
    --cc=lersek@redhat.com \
    --cc=mst@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.