All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
@ 2016-10-13 13:52 Marcel Apfelbaum
  2016-10-13 14:05 ` Marcel Apfelbaum
  0 siblings, 1 reply; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-13 13:52 UTC (permalink / raw)
  To: qemu-devel

Proposes best practices on how to use PCI Express/PCI device
in PCI Express based machines and explain the reasoning behind them.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
---

Hi,

I am sending the doc  twice, it appears the first time didn't make it to qemu-devel list.

RFC->v2:
 - Addressed a lot of comments from the reviewers (many thanks to all, especially to Laszlo)

Since the RFC mail-thread was relatively long and already
has passed a lot of time from the RFC, I post this version
even if is very possible that I left some of the comments out,
my apologies if so.

I will go over the comments again, in the meantime please
feel free to comment on this version, even if on something
you've already pointed out.

It may take a day or two until I'll be able to respond, but I
will do my best to address all comments.

Thanks,
Marcel


 docs/pcie.txt | 273 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 273 insertions(+)
 create mode 100644 docs/pcie.txt

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 0000000..7d852f1
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,273 @@
+PCI EXPRESS GUIDELINES
+======================
+
+1. Introduction
+================
+The doc proposes best practices on how to use PCI Express/PCI device
+in PCI Express based machines and explains the reasoning behind them.
+
+
+2. Device placement strategy
+============================
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
+Plugging a PCI device into a PCI Express slot might not always work and
+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCI Express device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCI Express and PCI hierarchies.
+PCI Express devices should be plugged only into PCI Express Root Ports and
+PCI Express Downstream ports.
+
+2.1 Root Bus (pcie.0)
+=====================
+Place only the following kinds of devices directly on the Root Complex:
+    (1) Devices with dedicated, specific functionality (network card,
+        graphics card, IDE controller, etc); place only legacy PCI devices on
+        the Root Complex. These will be considered Integrated Endpoints.
+        Note: Integrated devices are not hot-pluggable.
+
+        Although the PCI Express spec does not forbid PCI Express devices as
+        Integrated Endpoints, existing hardware mostly integrates legacy PCI
+        devices with the Root Complex. Guest OSes are suspected to behave
+        strangely when PCI Express devices are integrated with the Root Complex.
+
+    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
+        hierarchies.
+
+    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI hierarchies.
+
+    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses are needed.
+
+   pcie.0 bus
+   -----------------------------------------------------------------------------
+        |                |                    |                  |
+   -----------   ------------------   ------------------   --------------
+   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   |  pxb-pcie  |
+   -----------   ------------------   ------------------   --------------
+
+2.1.1 To plug a device into a pcie.0 as Root Complex Integrated Device use:
+          -device <dev>[,bus=pcie.0]
+2.1.2 To expose a new PCI Express Root Bus use:
+          -device pxb-pcie,id=pcie.1,bus_nr=x,[numa_node=y],[addr=z]
+      Only PCI Express Root Ports and DMI-PCI bridges can be connected to the pcie.1 bus:
+          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
+          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
+
+
+2.2 PCI Express only hierarchy
+==============================
+Always use PCI Express Root Ports to start PCI Express hierarchies.
+
+A PCI Express Root bus supports up to 32 devices. Since each
+PCI Express Root Port is a function and a multi-function
+device may support up to 8 functions, the maximum possible
+PCI Express Root Ports per PCI Express Root Bus is 256.
+
+Prefer coupling PCI Express Root Ports into multi-function devices
+to keep a simple flat hierarchy that is enough for most scenarios.
+Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
+if there is no more room for PCI Express Root Ports.
+Please see section 4. for further justifications.
+
+Plug only PCI Express devices into PCI Express Ports.
+
+
+   pcie.0 bus
+   ----------------------------------------------------------------------------------
+        |                 |                                    |
+   -------------    -------------                        -------------
+   | Root Port |    | Root Port |                        | Root Port |
+   ------------     -------------                        -------------
+         |                            -------------------------|------------------------
+    ------------                      |                 -----------------              |
+    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
+    ------------                      |      Switch     -----------------              |
+                                      |                  |            |                |
+                                      |    -------------------    -------------------  |
+                                      |    | Downstream Port |    | Downstream Port |  |
+                                      |    -------------------    -------------------  |
+                                      -------------|-----------------------|------------
+                                             ------------
+                                             | PCIe Dev |
+                                             ------------
+
+2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
+          -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
+          -device <dev>,bus=root_port1
+      Note that chassis parameter is compulsory, and must be unique
+      for each PCI Express Root Port.
+2.2.2 Using multi-function PCI Express Root Ports:
+      -device ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0] \
+      -device ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
+      -device ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \
+2.2.2 Plugging a PCI Express device into a Switch:
+      -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
+      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
+      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]] \
+      -device <dev>,bus=downstream_port1
+
+
+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
+Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI hierarchies.
+
+Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge (having 32 slots)
+and several PCI-PCI bridges attached to it (each supporting also 32 slots) will support
+hundreds of legacy devices. The recommendation is to populate one PCI-PCI bridge
+under the DMI-PCI bridge until is full and then plug a new PCI-PCI bridge...
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                            |
+   -----------               ------------------
+   | PCI Dev |               | DMI-PCI BRIDGE |
+   ----------                ------------------
+                               |            |
+                        -----------    ------------------
+                        | PCI Dev |    | PCI-PCI Bridge |
+                        -----------    ------------------
+                                         |           |
+                                  -----------     -----------
+                                  | PCI Dev |     | PCI Dev |
+                                  -----------     -----------
+
+2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
+      -device <dev>[,bus=pcie.0]
+2.3.2 Plugging a PCI device into a DMI-PCI bridge:
+      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
+      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
+2.3.3 Plugging a PCI device into a PCI-PCI bridge:
+      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]                        \
+      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]    \
+      -device <dev>,bus=pci_bridge1[,addr=x]
+
+
+3. IO space issues
+===================
+The PCI Express Root Ports and PCI Express Downstream ports are seen by
+Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
+should reserve a 4K IO range for each even if only one (multifunction)
+device can be plugged into them, resulting in poor IO space utilization.
+
+The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
+by not allocating IO space if possible:
+    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
+    (2) - If the device behind the PCI Express Root Ports/PCI Express
+          Downstream has no IO BARs.
+
+The IO space is very limited, 65536 byte-wide IO ports, but it's fragmented
+resulting in ~10 PCI Express Root Ports (or PCI Express Downstream/Upstream ports) 
+ports per system if devices with IO BARs are used in the PCI Express hierarchy.
+
+Using the proposed device placing strategy solves this issue
+by using only PCI Express devices within PCI Express hierarchy.
+
+The PCI Express spec requires the PCI Express devices to work without using IO.
+The PCI hierarchy has no such limitations.
+
+
+4. Bus numbers issues
+======================
+Each PCI domain can have up to only 256 buses and the QEMU PCI Express
+machines do not support multiple PCI domains even if extra Root
+Complexes (pxb-pcie) are used.
+
+Each element of the PCI Express hierarchy (Root Complexes,
+PCI Express Root Ports, PCI Express Downstream/Upstream ports)
+takes up bus numbers. Since only one (multifunction) device
+can be attached to a PCI Express Root Port or PCI Express Downstream
+Port it is advised to plan in advance for the expected number of
+devices to prevent bus numbers starvation.
+
+
+5. Hot Plug
+============
+The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
+do not support hot-plug, so any devices plugged into Root Complexes
+cannot be hot-plugged/hot-unplugged:
+    (1) PCI Express Integrated Devices
+    (2) PCI Express Root Ports
+    (3) DMI-PCI bridges
+    (4) pxb-pcie
+
+PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
+be hot-plugged into DMI-PCI bridges.
+The PCI hotplug is ACPI based and can work side by side with the
+PCI Express native hotplug.
+
+PCI Express devices can be natively hot-plugged/hot-unplugged into/from
+PCI Express Root Ports (and PCI Express Downstream Ports).
+
+5.1 Planning for hotplug:
+    (1) PCI hierarchy
+        Leave enough PCI-PCI bridge slots empty or add one
+        or more empty PCI-PCI bridges to the DMI-PCI bridge.
+
+        For each such bridge the Guest Firmware is expected to reserve 4K IO
+        space and 2M MMIO range to be used for all devices behind it.
+
+        Because of the hard IO limit of around 10 PCI bridges (~ 40K space) per system
+        don't use more than 9 bridges, leaving 4K for the Integrated devices
+        and none for the PCI Express Hierarchy.
+
+    (2) PCI Express hierarchy:
+        Leave enough PCI Express Root Ports empty. Use multifunction
+        PCI Express Root Ports to prevent going out of PCI bus numbers.
+        Don't use PCI Express Switches if you don't have too, they use
+        an extra PCI bus that may handy to plug another device id it comes to it.
+
+5.3 Hot plug example:
+Using HMP: (add -monitor stdio to QEMU command line)
+  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>
+
+
+6. Device assignment
+====================
+Host devices are mostly PCI Express and should be plugged only into
+PCI Express Root Ports or PCI Express Downstream Ports.
+PCI-PCI bridge slots can be used for legacy PCI host devices.
+
+6.1 How to detect if a device is PCI Express:
+  > lspci -s 03:00.0 -v (as root)
+
+    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
+    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
+    Flags: bus master, fast devsel, latency 0, IRQ 50
+    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
+    Capabilities: [c8] Power Management version 3
+    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
+    Capabilities: [40] Express Endpoint, MSI 00
+
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+    Capabilities: [100] Advanced Error Reporting
+    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
+    Capabilities: [14c] Latency Tolerance Reporting
+    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 
+
+
+7. Virtio devices
+=================
+Virtio devices plugged into the PCI hierarchy or as Integrated Devices
+will remain PCI and have transitional behaviour as default.
+Transitional virtio devices work in both IO and MMIO modes depending on
+the guest support.
+
+Virtio devices plugged into PCI Express ports are PCI Express devices and
+have "1.0" behavior by default without IO support.
+In both case disable-* properties can be used to override the behaviour.
+
+Note that setting disable-legacy=off will enable legacy mode (enabling
+legacy behavior) for PCI Express virtio devices causing them to
+require IO space, which, given our PCI Express hierarchy, may quickly
+lead to resource exhaustion, and is therefore strongly discouraged.
+
+
+8. Conclusion
+==============
+The proposal offers a usage model that is easy to understand and follow
+and in the same time overcomes the PCI Express architecture limitations.
+
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-13 13:52 [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines Marcel Apfelbaum
@ 2016-10-13 14:05 ` Marcel Apfelbaum
  2016-10-14 11:36   ` Laszlo Ersek
  2016-10-17 14:18   ` Andrea Bolognani
  0 siblings, 2 replies; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-13 14:05 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu-devel
  Cc: Laszlo Ersek, Gerd Hoffmann, Laine Stump, Peter Maydell,
	Andrew Jones, Andrea Bolognani, Alex Williamson, Daniel Berrange,
	Michael S. Tsirkin

On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:
> Proposes best practices on how to use PCI Express/PCI device
> in PCI Express based machines and explain the reasoning behind them.
>
> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
> ---
>
> Hi,
>
> I am sending the doc  twice, it appears the first time didn't make it to qemu-devel list.

Hi,

Adding people to CC. Sorry for the earlier noise.

Thanks,
Marcel

>
> RFC->v2:
>  - Addressed a lot of comments from the reviewers (many thanks to all, especially to Laszlo)
>
> Since the RFC mail-thread was relatively long and already
> has passed a lot of time from the RFC, I post this version
> even if is very possible that I left some of the comments out,
> my apologies if so.
>
> I will go over the comments again, in the meantime please
> feel free to comment on this version, even if on something
> you've already pointed out.
>
> It may take a day or two until I'll be able to respond, but I
> will do my best to address all comments.
>
> Thanks,
> Marcel
>
>
>  docs/pcie.txt | 273 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 273 insertions(+)
>  create mode 100644 docs/pcie.txt
>
> diff --git a/docs/pcie.txt b/docs/pcie.txt
> new file mode 100644
> index 0000000..7d852f1
> --- /dev/null
> +++ b/docs/pcie.txt
> @@ -0,0 +1,273 @@
> +PCI EXPRESS GUIDELINES
> +======================
> +
> +1. Introduction
> +================
> +The doc proposes best practices on how to use PCI Express/PCI device
> +in PCI Express based machines and explains the reasoning behind them.
> +
> +
> +2. Device placement strategy
> +============================
> +QEMU does not have a clear socket-device matching mechanism
> +and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
> +Plugging a PCI device into a PCI Express slot might not always work and
> +is weird anyway since it cannot be done for "bare metal".
> +Plugging a PCI Express device into a PCI slot will hide the Extended
> +Configuration Space thus is also not recommended.
> +
> +The recommendation is to separate the PCI Express and PCI hierarchies.
> +PCI Express devices should be plugged only into PCI Express Root Ports and
> +PCI Express Downstream ports.
> +
> +2.1 Root Bus (pcie.0)
> +=====================
> +Place only the following kinds of devices directly on the Root Complex:
> +    (1) Devices with dedicated, specific functionality (network card,
> +        graphics card, IDE controller, etc); place only legacy PCI devices on
> +        the Root Complex. These will be considered Integrated Endpoints.
> +        Note: Integrated devices are not hot-pluggable.
> +
> +        Although the PCI Express spec does not forbid PCI Express devices as
> +        Integrated Endpoints, existing hardware mostly integrates legacy PCI
> +        devices with the Root Complex. Guest OSes are suspected to behave
> +        strangely when PCI Express devices are integrated with the Root Complex.
> +
> +    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
> +        hierarchies.
> +
> +    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI hierarchies.
> +
> +    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses are needed.
> +
> +   pcie.0 bus
> +   -----------------------------------------------------------------------------
> +        |                |                    |                  |
> +   -----------   ------------------   ------------------   --------------
> +   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   |  pxb-pcie  |
> +   -----------   ------------------   ------------------   --------------
> +
> +2.1.1 To plug a device into a pcie.0 as Root Complex Integrated Device use:
> +          -device <dev>[,bus=pcie.0]
> +2.1.2 To expose a new PCI Express Root Bus use:
> +          -device pxb-pcie,id=pcie.1,bus_nr=x,[numa_node=y],[addr=z]
> +      Only PCI Express Root Ports and DMI-PCI bridges can be connected to the pcie.1 bus:
> +          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
> +          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
> +
> +
> +2.2 PCI Express only hierarchy
> +==============================
> +Always use PCI Express Root Ports to start PCI Express hierarchies.
> +
> +A PCI Express Root bus supports up to 32 devices. Since each
> +PCI Express Root Port is a function and a multi-function
> +device may support up to 8 functions, the maximum possible
> +PCI Express Root Ports per PCI Express Root Bus is 256.
> +
> +Prefer coupling PCI Express Root Ports into multi-function devices
> +to keep a simple flat hierarchy that is enough for most scenarios.
> +Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
> +if there is no more room for PCI Express Root Ports.
> +Please see section 4. for further justifications.
> +
> +Plug only PCI Express devices into PCI Express Ports.
> +
> +
> +   pcie.0 bus
> +   ----------------------------------------------------------------------------------
> +        |                 |                                    |
> +   -------------    -------------                        -------------
> +   | Root Port |    | Root Port |                        | Root Port |
> +   ------------     -------------                        -------------
> +         |                            -------------------------|------------------------
> +    ------------                      |                 -----------------              |
> +    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
> +    ------------                      |      Switch     -----------------              |
> +                                      |                  |            |                |
> +                                      |    -------------------    -------------------  |
> +                                      |    | Downstream Port |    | Downstream Port |  |
> +                                      |    -------------------    -------------------  |
> +                                      -------------|-----------------------|------------
> +                                             ------------
> +                                             | PCIe Dev |
> +                                             ------------
> +
> +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
> +          -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
> +          -device <dev>,bus=root_port1
> +      Note that chassis parameter is compulsory, and must be unique
> +      for each PCI Express Root Port.
> +2.2.2 Using multi-function PCI Express Root Ports:
> +      -device ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0] \
> +      -device ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
> +      -device ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \
> +2.2.2 Plugging a PCI Express device into a Switch:
> +      -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
> +      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
> +      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]] \
> +      -device <dev>,bus=downstream_port1
> +
> +
> +2.3 PCI only hierarchy
> +======================
> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
> +Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI hierarchies.
> +
> +Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge (having 32 slots)
> +and several PCI-PCI bridges attached to it (each supporting also 32 slots) will support
> +hundreds of legacy devices. The recommendation is to populate one PCI-PCI bridge
> +under the DMI-PCI bridge until is full and then plug a new PCI-PCI bridge...
> +
> +   pcie.0 bus
> +   ----------------------------------------------
> +        |                            |
> +   -----------               ------------------
> +   | PCI Dev |               | DMI-PCI BRIDGE |
> +   ----------                ------------------
> +                               |            |
> +                        -----------    ------------------
> +                        | PCI Dev |    | PCI-PCI Bridge |
> +                        -----------    ------------------
> +                                         |           |
> +                                  -----------     -----------
> +                                  | PCI Dev |     | PCI Dev |
> +                                  -----------     -----------
> +
> +2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
> +      -device <dev>[,bus=pcie.0]
> +2.3.2 Plugging a PCI device into a DMI-PCI bridge:
> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
> +      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
> +2.3.3 Plugging a PCI device into a PCI-PCI bridge:
> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]                        \
> +      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]    \
> +      -device <dev>,bus=pci_bridge1[,addr=x]
> +
> +
> +3. IO space issues
> +===================
> +The PCI Express Root Ports and PCI Express Downstream ports are seen by
> +Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
> +should reserve a 4K IO range for each even if only one (multifunction)
> +device can be plugged into them, resulting in poor IO space utilization.
> +
> +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
> +by not allocating IO space if possible:
> +    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
> +    (2) - If the device behind the PCI Express Root Ports/PCI Express
> +          Downstream has no IO BARs.
> +
> +The IO space is very limited, 65536 byte-wide IO ports, but it's fragmented
> +resulting in ~10 PCI Express Root Ports (or PCI Express Downstream/Upstream ports)
> +ports per system if devices with IO BARs are used in the PCI Express hierarchy.
> +
> +Using the proposed device placing strategy solves this issue
> +by using only PCI Express devices within PCI Express hierarchy.
> +
> +The PCI Express spec requires the PCI Express devices to work without using IO.
> +The PCI hierarchy has no such limitations.
> +
> +
> +4. Bus numbers issues
> +======================
> +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
> +machines do not support multiple PCI domains even if extra Root
> +Complexes (pxb-pcie) are used.
> +
> +Each element of the PCI Express hierarchy (Root Complexes,
> +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
> +takes up bus numbers. Since only one (multifunction) device
> +can be attached to a PCI Express Root Port or PCI Express Downstream
> +Port it is advised to plan in advance for the expected number of
> +devices to prevent bus numbers starvation.
> +
> +
> +5. Hot Plug
> +============
> +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
> +do not support hot-plug, so any devices plugged into Root Complexes
> +cannot be hot-plugged/hot-unplugged:
> +    (1) PCI Express Integrated Devices
> +    (2) PCI Express Root Ports
> +    (3) DMI-PCI bridges
> +    (4) pxb-pcie
> +
> +PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
> +be hot-plugged into DMI-PCI bridges.
> +The PCI hotplug is ACPI based and can work side by side with the
> +PCI Express native hotplug.
> +
> +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
> +PCI Express Root Ports (and PCI Express Downstream Ports).
> +
> +5.1 Planning for hotplug:
> +    (1) PCI hierarchy
> +        Leave enough PCI-PCI bridge slots empty or add one
> +        or more empty PCI-PCI bridges to the DMI-PCI bridge.
> +
> +        For each such bridge the Guest Firmware is expected to reserve 4K IO
> +        space and 2M MMIO range to be used for all devices behind it.
> +
> +        Because of the hard IO limit of around 10 PCI bridges (~ 40K space) per system
> +        don't use more than 9 bridges, leaving 4K for the Integrated devices
> +        and none for the PCI Express Hierarchy.
> +
> +    (2) PCI Express hierarchy:
> +        Leave enough PCI Express Root Ports empty. Use multifunction
> +        PCI Express Root Ports to prevent going out of PCI bus numbers.
> +        Don't use PCI Express Switches if you don't have too, they use
> +        an extra PCI bus that may handy to plug another device id it comes to it.
> +
> +5.3 Hot plug example:
> +Using HMP: (add -monitor stdio to QEMU command line)
> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>
> +
> +
> +6. Device assignment
> +====================
> +Host devices are mostly PCI Express and should be plugged only into
> +PCI Express Root Ports or PCI Express Downstream Ports.
> +PCI-PCI bridge slots can be used for legacy PCI host devices.
> +
> +6.1 How to detect if a device is PCI Express:
> +  > lspci -s 03:00.0 -v (as root)
> +
> +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
> +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
> +    Flags: bus master, fast devsel, latency 0, IRQ 50
> +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
> +    Capabilities: [c8] Power Management version 3
> +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> +    Capabilities: [40] Express Endpoint, MSI 00
> +
> +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +    Capabilities: [100] Advanced Error Reporting
> +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
> +    Capabilities: [14c] Latency Tolerance Reporting
> +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014
> +
> +
> +7. Virtio devices
> +=================
> +Virtio devices plugged into the PCI hierarchy or as Integrated Devices
> +will remain PCI and have transitional behaviour as default.
> +Transitional virtio devices work in both IO and MMIO modes depending on
> +the guest support.
> +
> +Virtio devices plugged into PCI Express ports are PCI Express devices and
> +have "1.0" behavior by default without IO support.
> +In both case disable-* properties can be used to override the behaviour.
> +
> +Note that setting disable-legacy=off will enable legacy mode (enabling
> +legacy behavior) for PCI Express virtio devices causing them to
> +require IO space, which, given our PCI Express hierarchy, may quickly
> +lead to resource exhaustion, and is therefore strongly discouraged.
> +
> +
> +8. Conclusion
> +==============
> +The proposal offers a usage model that is easy to understand and follow
> +and in the same time overcomes the PCI Express architecture limitations.
> +
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-13 14:05 ` Marcel Apfelbaum
@ 2016-10-14 11:36   ` Laszlo Ersek
  2016-10-17 12:07     ` Gerd Hoffmann
  2016-10-27 11:27     ` Marcel Apfelbaum
  2016-10-17 14:18   ` Andrea Bolognani
  1 sibling, 2 replies; 13+ messages in thread
From: Laszlo Ersek @ 2016-10-14 11:36 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu-devel
  Cc: Peter Maydell, Andrew Jones, Michael S. Tsirkin,
	Andrea Bolognani, Alex Williamson, Gerd Hoffmann, Laine Stump

On 10/13/16 16:05, Marcel Apfelbaum wrote:
> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:
>> Proposes best practices on how to use PCI Express/PCI device
>> in PCI Express based machines and explain the reasoning behind them.
>>
>> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
>> ---
>>
>> Hi,
>>
>> I am sending the doc  twice, it appears the first time didn't make it
>> to qemu-devel list.
> 
> Hi,
> 
> Adding people to CC. Sorry for the earlier noise.
> 
> Thanks,
> Marcel
> 
>>
>> RFC->v2:
>>  - Addressed a lot of comments from the reviewers (many thanks to all,
>> especially to Laszlo)
>>
>> Since the RFC mail-thread was relatively long and already
>> has passed a lot of time from the RFC, I post this version
>> even if is very possible that I left some of the comments out,
>> my apologies if so.
>>
>> I will go over the comments again, in the meantime please
>> feel free to comment on this version, even if on something
>> you've already pointed out.
>>
>> It may take a day or two until I'll be able to respond, but I
>> will do my best to address all comments.
>>
>> Thanks,
>> Marcel
>>
>>
>>  docs/pcie.txt | 273
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 273 insertions(+)
>>  create mode 100644 docs/pcie.txt
>>
>> diff --git a/docs/pcie.txt b/docs/pcie.txt
>> new file mode 100644
>> index 0000000..7d852f1
>> --- /dev/null
>> +++ b/docs/pcie.txt
>> @@ -0,0 +1,273 @@
>> +PCI EXPRESS GUIDELINES
>> +======================
>> +
>> +1. Introduction
>> +================
>> +The doc proposes best practices on how to use PCI Express/PCI device
>> +in PCI Express based machines and explains the reasoning behind them.
>> +
>> +
>> +2. Device placement strategy
>> +============================
>> +QEMU does not have a clear socket-device matching mechanism
>> +and allows any PCI/PCI Express device to be plugged into any PCI/PCI
>> Express slot.

{1} This line seems too long; I suggest to rewrap the entire document at
79 chars (except command line fragments where the wrapping would be
impractical).

>> +Plugging a PCI device into a PCI Express slot might not always work and
>> +is weird anyway since it cannot be done for "bare metal".
>> +Plugging a PCI Express device into a PCI slot will hide the Extended
>> +Configuration Space thus is also not recommended.
>> +
>> +The recommendation is to separate the PCI Express and PCI hierarchies.
>> +PCI Express devices should be plugged only into PCI Express Root
>> Ports and
>> +PCI Express Downstream ports.
>> +
>> +2.1 Root Bus (pcie.0)
>> +=====================
>> +Place only the following kinds of devices directly on the Root Complex:
>> +    (1) Devices with dedicated, specific functionality (network card,
>> +        graphics card, IDE controller, etc); place only legacy PCI

{2} I recommend to replace the semicolon (;) with a colon (:) here.

>> devices on
>> +        the Root Complex. These will be considered Integrated Endpoints.
>> +        Note: Integrated devices are not hot-pluggable.
>> +
>> +        Although the PCI Express spec does not forbid PCI Express
>> devices as
>> +        Integrated Endpoints, existing hardware mostly integrates
>> legacy PCI
>> +        devices with the Root Complex. Guest OSes are suspected to
>> behave
>> +        strangely when PCI Express devices are integrated with the
>> Root Complex.
>> +
>> +    (2) PCI Express Root Ports (ioh3420), for starting exclusively
>> PCI Express
>> +        hierarchies.
>> +
>> +    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI
>> hierarchies.
>> +
>> +    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses
>> are needed.

{3} s/PCIe/PCI Express/, please :)

>> +
>> +   pcie.0 bus
>> +  
>> -----------------------------------------------------------------------------
>>
>> +        |                |                    |                  |
>> +   -----------   ------------------   ------------------  
>> --------------
>> +   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   | 
>> pxb-pcie  |
>> +   -----------   ------------------   ------------------  
>> --------------
>> +
>> +2.1.1 To plug a device into a pcie.0 as Root Complex Integrated

{4} s/a pcie.0 as/pcie.0 as a/

>> Device use:
>> +          -device <dev>[,bus=pcie.0]
>> +2.1.2 To expose a new PCI Express Root Bus use:
>> +          -device pxb-pcie,id=pcie.1,bus_nr=x,[numa_node=y],[addr=z]

{5} I think the commas should be moved into the brackets (like below).

>> +      Only PCI Express Root Ports and DMI-PCI bridges can be
>> connected to the pcie.1 bus:
>> +          -device
>> ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
>> +          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
>> +
>> +
>> +2.2 PCI Express only hierarchy
>> +==============================
>> +Always use PCI Express Root Ports to start PCI Express hierarchies.
>> +
>> +A PCI Express Root bus supports up to 32 devices. Since each
>> +PCI Express Root Port is a function and a multi-function
>> +device may support up to 8 functions, the maximum possible
>> +PCI Express Root Ports per PCI Express Root Bus is 256.

{6} s/maximum possible/maximum possible number of/, I suggest

>> +
>> +Prefer coupling PCI Express Root Ports into multi-function devices
>> +to keep a simple flat hierarchy that is enough for most scenarios.
>> +Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
>> +if there is no more room for PCI Express Root Ports.
>> +Please see section 4. for further justifications.
>> +
>> +Plug only PCI Express devices into PCI Express Ports.
>> +
>> +
>> +   pcie.0 bus
>> +  
>> ----------------------------------------------------------------------------------
>>
>> +        |                 |                                    |
>> +   -------------    -------------                        -------------
>> +   | Root Port |    | Root Port |                        | Root Port |
>> +   ------------     -------------                        -------------
>> +         |                           
>> -------------------------|------------------------
>> +    ------------                      |                
>> -----------------              |
>> +    | PCIe Dev |                      |    PCI Express  | Upstream
>> Port |              |
>> +    ------------                      |      Switch    
>> -----------------              |
>> +                                      |                  |           
>> |                |
>> +                                      |    -------------------   
>> -------------------  |
>> +                                      |    | Downstream Port |    |
>> Downstream Port |  |
>> +                                      |    -------------------   
>> -------------------  |
>> +                                     
>> -------------|-----------------------|------------
>> +                                             ------------
>> +                                             | PCIe Dev |
>> +                                             ------------
>> +
>> +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
>> +          -device
>> ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>> +          -device <dev>,bus=root_port1
>> +      Note that chassis parameter is compulsory, and must be unique
>> +      for each PCI Express Root Port.

{7} Hmmm, I think it's rather that the (chassis, slot) *pair* must be
unique. You can get away with leaving the default chassis=0 unspecified,
and spell out just the slots, I think.

>> +2.2.2 Using multi-function PCI Express Root Ports:
>> +      -device
>> ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0]
>> \
>> +      -device
>> ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
>> +      -device
>> ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \

{8} This looks good to me, except for the double-comma typos: ",,".

>> +2.2.2 Plugging a PCI Express device into a Switch:
>> +      -device
>> ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>> +      -device
>> x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
>> +      -device
>> xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]]
>> \
>> +      -device <dev>,bus=downstream_port1
>> +

{9} For all of these command lines, can you specify if z=0 (that is,
device#0) is valid or not in the addr=z properties?

Earlier discussion on this question:

On 10/04/16 18:25, Laine Stump wrote:
> On 10/04/2016 11:45 AM, Alex Williamson wrote:

>>  Same with the restriction from using slot
>> 0 on PCI bridges, there's no basis for that except on the root bus.
>
> I tried allowing devices to be plugged into slot 0 of a pci-bridge in
> libvirt - qemu barfed, so I moved the "minSlot" for pci-bridge back up
> to 1. Slot 0 is completely usable on a dmi-to-pci-bridge though (and
> libvirt allows it). At this point, even if qemu enabled using slot 0
> of a pci-bridge, libvirt wouldn't be able to expose that to users
> (unless the min/max slot of each PCI controller was made visible
> somewhere via QMP)

On 10/05/16 12:03, Marcel Apfelbaum wrote:
> The reason for not being able to plug a device into slot 0 of a PCI
> Bridge is the SHPC (Hot-plug controller) device embedded in the PCI
> bridge by default. The SHPC spec requires this. If one disables it
> with shpc=false, he should be able to use the slot 0.

For simplicity's sake I guess we should just recommend >=1 slot numbers.

Back to the patch:

On 10/13/16 16:05, Marcel Apfelbaum wrote:
> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:

>> +
>> +2.3 PCI only hierarchy
>> +======================
>> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
>> +Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI
>> hierarchies.
>> +
>> +Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge
>> (having 32 slots)
>> +and several PCI-PCI bridges attached to it (each supporting also 32
>> slots) will support
>> +hundreds of legacy devices. The recommendation is to populate one
>> PCI-PCI bridge
>> +under the DMI-PCI bridge until is full and then plug a new PCI-PCI
>> bridge...
>> +
>> +   pcie.0 bus
>> +   ----------------------------------------------
>> +        |                            |
>> +   -----------               ------------------
>> +   | PCI Dev |               | DMI-PCI BRIDGE |
>> +   ----------                ------------------
>> +                               |            |
>> +                        -----------    ------------------
>> +                        | PCI Dev |    | PCI-PCI Bridge |
>> +                        -----------    ------------------
>> +                                         |           |
>> +                                  -----------     -----------
>> +                                  | PCI Dev |     | PCI Dev |
>> +                                  -----------     -----------
>> +
>> +2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
>> +      -device <dev>[,bus=pcie.0]

(This is repeated from 2.1.1, but I guess it doesn't hurt, for
completeness in this chapter.)

>> +2.3.2 Plugging a PCI device into a DMI-PCI bridge:
>> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
>> +      -device <dev>,bus=dmi_pci_bridge1[,addr=x]

{10} I recall that we discussed this at length, that is, placing PCI
(non-express) devices directly on the DMI-PCI bridge. IIRC the argument
was that it's technically possible, it just won't support hot-plug.

I'd like if we removed this use case from the document. It might be
possible, but for simplicity's sake, we shouldn't advertise it, in my
opinion. (Unless someone strongly disagrees with this idea, of course.)
I recall esp. Laine, Alex and Daniel focusing on this, but I don't
remember if (and what for) they wanted this option. Personally I'd like
to see it disappear (unless convinced otherwise).

... Actually, going through the RFC thread again, it seems that the use
case that I'm arguing against -- i.e., "use DMI-PCI as a generic
PCIe-to-PCI bridge" -- makes Alex cringe as well. So there's that :)

{11} Syntax remark, should we keep this section: the commas are not
right in ",[,bus=pcie.0]".

>> +2.3.3 Plugging a PCI device into a PCI-PCI bridge:
>> +      -device
>> i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]                       
>> \

{12} double comma again

>> +      -device
>> pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]   
>> \
>> +      -device <dev>,bus=pci_bridge1[,addr=x]

{13} It would be nice to spell out the valid device addresses (y and x)
here too -- can we use 0 for them? SHPC again?

Can we / should we simply go with >=1 device addresses?

>> +
>> +
>> +3. IO space issues
>> +===================
>> +The PCI Express Root Ports and PCI Express Downstream ports are seen by
>> +Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
>> +should reserve a 4K IO range for each even if only one (multifunction)
>> +device can be plugged into them, resulting in poor IO space utilization.

{14} I completely agree with this paragraph, I'd just recommend to allow
the reader more time to digest it. Something like:

----
The PCI Express Root Ports and PCI Express Downstream ports are seen by
Firmware/Guest OS as PCI-PCI bridges. As required by the PCI spec, each
such Port should be reserved a 4K IO range for, even though only one
(multifunction) device can be plugged into each Port. This results in
poor IO space utilization.
----

>> +
>> +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
>> +by not allocating IO space if possible:
>> +    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
>> +    (2) - If the device behind the PCI Express Root Ports/PCI Express
>> +          Downstream has no IO BARs.

{15} I'd say:

... by not allocating IO space for each PCI Express Root / PCI Express
Downstream port if:
(1) the port is empty, or
(2) the device behind the port has no IO BARs.

>> +
>> +The IO space is very limited, 65536 byte-wide IO ports, but it's
>> fragmented

{16} Suggestion: "The IO space is very limited, to 65536 byte-wide IO
ports, and may even be fragmented by fixed IO ports owned by platform
devices."

>> +resulting in ~10 PCI Express Root Ports (or PCI Express

{17} s/resulting in/resulting in at most/

>> Downstream/Upstream ports)

{18} Whoa, upstream ports? Why do we need to spell them out here?
Upstream ports need their own bus numbers, but their IO space assignment
only collects the IO space assignments of their downstream ports. Isn't
that right?

(I believe spelling out upstream ports here was recommended by Alex, but
I don't understand why.)

>> +ports per system if devices with IO BARs are used in the PCI Express

{19} The word "ports" is unnecessary here.

>> hierarchy.
>> +
>> +Using the proposed device placing strategy solves this issue
>> +by using only PCI Express devices within PCI Express hierarchy.
>> +
>> +The PCI Express spec requires the PCI Express devices to work without
>> using IO.
>> +The PCI hierarchy has no such limitations.

{20} There should be no empty line (or even a line break) between the
two above paragraphs: the second paragraph explains the first one.

>> +
>> +
>> +4. Bus numbers issues
>> +======================
>> +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
>> +machines do not support multiple PCI domains even if extra Root
>> +Complexes (pxb-pcie) are used.
>> +
>> +Each element of the PCI Express hierarchy (Root Complexes,
>> +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
>> +takes up bus numbers. Since only one (multifunction) device
>> +can be attached to a PCI Express Root Port or PCI Express Downstream
>> +Port it is advised to plan in advance for the expected number of
>> +devices to prevent bus numbers starvation.

{21} Please add:

"""
In particular:

- Avoiding PCI Express Switches (and thereby striving for a flat PCI
Express hierarchy) enables the hierarchy to not spend bus numbers on
Upstream Ports.

- The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
number space. All bus numbers assigned to the buses recursively behind a
given pxb-pcie device's root bus must fit between the bus_nr property of
that pxb-pcie device, and the lowest of the higher bus_nr properties
that the command line sets for other pxb-pcie devices.
"""

(You do mention switches being more bus number-hungry, under chapter 5,
hotplug; however, saving bus numbers makes sense for a purely
cold-plugged scenario as well, especially in combination with pxb-pcie,
where the partitioning can become a limiting factor.)

>> +
>> +
>> +5. Hot Plug
>> +============
>> +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie
>> devices)
>> +do not support hot-plug, so any devices plugged into Root Complexes
>> +cannot be hot-plugged/hot-unplugged:
>> +    (1) PCI Express Integrated Devices
>> +    (2) PCI Express Root Ports
>> +    (3) DMI-PCI bridges
>> +    (4) pxb-pcie
>> +
>> +PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
>> +be hot-plugged into DMI-PCI bridges.

{22} If you agree with my suggestion to remove 2.3.2, that is, the "PCI
device cold-plugged directly into DMI-PCI bridge" case, then the second
half of the sentence can be dropped.

>> +The PCI hotplug is ACPI based and can work side by side with the
>> +PCI Express native hotplug.
>> +
>> +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
>> +PCI Express Root Ports (and PCI Express Downstream Ports).
>> +
>> +5.1 Planning for hotplug:
>> +    (1) PCI hierarchy
>> +        Leave enough PCI-PCI bridge slots empty or add one
>> +        or more empty PCI-PCI bridges to the DMI-PCI bridge.
>> +
>> +        For each such bridge the Guest Firmware is expected to
>> reserve 4K IO
>> +        space and 2M MMIO range to be used for all devices behind it.
>> +
>> +        Because of the hard IO limit of around 10 PCI bridges (~ 40K
>> space) per system
>> +        don't use more than 9 bridges, leaving 4K for the Integrated
>> devices
>> +        and none for the PCI Express Hierarchy.

{23} s/9 bridges/9 PCI-PCI bridges/

>> +
>> +    (2) PCI Express hierarchy:
>> +        Leave enough PCI Express Root Ports empty. Use multifunction
>> +        PCI Express Root Ports to prevent going out of PCI bus numbers.

{24} I agree, but I'd put it a bit differently: use multifunction PCI
Express Root Ports on the Root Complex(es), for keeping the hierarchy as
flat as possible, thereby saving PCI bus numbers.

>> +        Don't use PCI Express Switches if you don't have too, they use
>> +        an extra PCI bus that may handy to plug another device id it
>> comes to it.
>> +

{25} I'd put it as: Don't use PCI Express Switches if you don't have
too, each one of those uses an extra PCI bus (for its Upstream Port)
that could be put to better use with another Root Port or Downstream
Port, which may come handy for hotplugging another device.

{26} Another remark (important to me) in this section: the document
doesn't state firmware expectations. It's clear the firmware is expected
to reserve no IO space for PCI Express Downstream Ports and Root Ports,
but what about MMIO?

We discussed this at length with Alex, but I think we didn't conclude
anything. It would be nice if firmware received some instructions from
this document in this regard, even before we implement our own ports and
bridges in QEMU.

<digression>

If we think such recommendations are out of scope at this point, *and*
noone disagrees strongly (Gerd?), then I could add some experimental
fw_cfg knobs to OVMF for this, such as (units in MB):

-fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/PrefMmio32Mb,string=...
-fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/NonPrefMmio32Mb,string=...
-fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/PrefMmio64Mb,string=..
-fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/NonPrefMmio64Mb,string=..

Under this idea, I would reserve no resources at all for Downstream
Ports and Root Ports in OVMF by default; but users could influence those
reservations. I think that would be enough to kick things off. It also
needs no modifications for QEMU.

</digression>

>> +5.3 Hot plug example:
>> +Using HMP: (add -monitor stdio to QEMU command line)
>> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port
>> Id/PCI-PCI bridge Id/pxb-pcie Id>

{27} I think the bus=<...> part is incorrect here. Based on the rest of
the guidelines, we have to specify the ID of:
- a PCI Express Root Port, or
- a PCI Express Downstream Port, or
- a PCI-PCI bridge.

>> +
>> +
>> +6. Device assignment
>> +====================
>> +Host devices are mostly PCI Express and should be plugged only into
>> +PCI Express Root Ports or PCI Express Downstream Ports.
>> +PCI-PCI bridge slots can be used for legacy PCI host devices.
>> +
>> +6.1 How to detect if a device is PCI Express:
>> +  > lspci -s 03:00.0 -v (as root)
>> +
>> +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
>> +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
>> +    Flags: bus master, fast devsel, latency 0, IRQ 50
>> +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
>> +    Capabilities: [c8] Power Management version 3
>> +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>> +    Capabilities: [40] Express Endpoint, MSI 00
>> +
>> +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +    Capabilities: [100] Advanced Error Reporting
>> +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
>> +    Capabilities: [14c] Latency Tolerance Reporting
>> +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1
>> Len=014
>> +
>> +
>> +7. Virtio devices
>> +=================
>> +Virtio devices plugged into the PCI hierarchy or as Integrated Devices
>> +will remain PCI and have transitional behaviour as default.
>> +Transitional virtio devices work in both IO and MMIO modes depending on
>> +the guest support.

{28} Suggest to add: firmware will assign both IO and MMIO resources to
transitional virtio devices.

>> +
>> +Virtio devices plugged into PCI Express ports are PCI Express devices
>> and
>> +have "1.0" behavior by default without IO support.
>> +In both case disable-* properties can be used to override the behaviour.

{29} s/case/cases/; also, please spell out the disable-* properties fully.

>> +
>> +Note that setting disable-legacy=off will enable legacy mode (enabling
>> +legacy behavior) for PCI Express virtio devices causing them to
>> +require IO space, which, given our PCI Express hierarchy, may quickly

{30} s/given our PCI Express hierarchy/given the limited available IO space/

>> +lead to resource exhaustion, and is therefore strongly discouraged.
>> +
>> +
>> +8. Conclusion
>> +==============
>> +The proposal offers a usage model that is easy to understand and follow
>> +and in the same time overcomes the PCI Express architecture limitations.
>> +
>>
> 
> 

I think this version has seen big improvements, and I think it's
structurally complete. While composing this review, I went through the
entire RFC thread again, and I *think* you didn't miss anything from
that. Great job again!

My comments vary in importance. I trust you to take each comment with an
appropriately sized grain of salt ;)

Thank you!
Laszlo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-14 11:36   ` Laszlo Ersek
@ 2016-10-17 12:07     ` Gerd Hoffmann
  2016-10-17 14:07       ` Laszlo Ersek
  2016-10-27 11:27     ` Marcel Apfelbaum
  1 sibling, 1 reply; 13+ messages in thread
From: Gerd Hoffmann @ 2016-10-17 12:07 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Marcel Apfelbaum, qemu-devel, Peter Maydell, Andrew Jones,
	Michael S. Tsirkin, Andrea Bolognani, Alex Williamson,
	Laine Stump

  Hi,

> {26} Another remark (important to me) in this section: the document
> doesn't state firmware expectations. It's clear the firmware is expected
> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
> but what about MMIO?
> 
> We discussed this at length with Alex, but I think we didn't conclude
> anything. It would be nice if firmware received some instructions from
> this document in this regard, even before we implement our own ports and
> bridges in QEMU.

Where do we stand in terms of generic pcie ports btw?

I think the plan is still to communicate suggestions to the firmware via
pci config space, either by using reset defaults of the limit register,
or of that doesn't work due to initialization order issues using some
vendor specific pcie capability.

As long as we don't have that there is nothing do document, other than
maybe briefly mentioning the plans we have and documenting the current
state (2M mmio in seabios, and I think the same for ovmf).

The patches adding the generic ports can also update the documentation
of course.

> <digression>
> 
> If we think such recommendations are out of scope at this point, *and*
> noone disagrees strongly (Gerd?), then I could add some experimental
> fw_cfg knobs to OVMF for this, such as (units in MB):

Why?  Given that the virtio mmio bar size issue is solved I don't see a
strong reason to hurry with this.  Just wait until the generic ports are
there.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-17 12:07     ` Gerd Hoffmann
@ 2016-10-17 14:07       ` Laszlo Ersek
  2016-10-27 11:28         ` Marcel Apfelbaum
  0 siblings, 1 reply; 13+ messages in thread
From: Laszlo Ersek @ 2016-10-17 14:07 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Marcel Apfelbaum, qemu-devel, Peter Maydell, Andrew Jones,
	Michael S. Tsirkin, Andrea Bolognani, Alex Williamson,
	Laine Stump

On 10/17/16 14:07, Gerd Hoffmann wrote:
>   Hi,
> 
>> {26} Another remark (important to me) in this section: the document
>> doesn't state firmware expectations. It's clear the firmware is expected
>> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
>> but what about MMIO?
>>
>> We discussed this at length with Alex, but I think we didn't conclude
>> anything. It would be nice if firmware received some instructions from
>> this document in this regard, even before we implement our own ports and
>> bridges in QEMU.
> 
> Where do we stand in terms of generic pcie ports btw?

"planning phase", AFAIR

> 
> I think the plan is still to communicate suggestions to the firmware via
> pci config space, either by using reset defaults of the limit register,
> or of that doesn't work due to initialization order issues using some
> vendor specific pcie capability.
> 
> As long as we don't have that there is nothing do document, other than
> maybe briefly mentioning the plans we have and documenting the current
> state (2M mmio in seabios, and I think the same for ovmf).
> 
> The patches adding the generic ports can also update the documentation
> of course.
> 
>> <digression>
>>
>> If we think such recommendations are out of scope at this point, *and*
>> noone disagrees strongly (Gerd?), then I could add some experimental
>> fw_cfg knobs to OVMF for this, such as (units in MB):
> 
> Why?  Given that the virtio mmio bar size issue is solved I don't see a
> strong reason to hurry with this.  Just wait until the generic ports are
> there.

Fine.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-13 14:05 ` Marcel Apfelbaum
  2016-10-14 11:36   ` Laszlo Ersek
@ 2016-10-17 14:18   ` Andrea Bolognani
  2016-10-17 14:26     ` Laszlo Ersek
  2016-10-27 14:36     ` Marcel Apfelbaum
  1 sibling, 2 replies; 13+ messages in thread
From: Andrea Bolognani @ 2016-10-17 14:18 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu-devel
  Cc: Laszlo Ersek, Gerd Hoffmann, Laine Stump, Peter Maydell,
	Andrew Jones, Alex Williamson, Daniel Berrange,
	Michael S. Tsirkin

On Thu, 2016-10-13 at 17:05 +0300, Marcel Apfelbaum wrote:
> > +PCI EXPRESS GUIDELINES
> > +======================
> > +
> > +1. Introduction
> > +================
> > +The doc proposes best practices on how to use PCI Express/PCI device
> > +in PCI Express based machines and explains the reasoning behind them.
> > +
> > +
> > +2. Device placement strategy
> > +============================
> > +QEMU does not have a clear socket-device matching mechanism
> > +and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
> > +Plugging a PCI device into a PCI Express slot might not always work and
> > +is weird anyway since it cannot be done for "bare metal".
> > +Plugging a PCI Express device into a PCI slot will hide the Extended
> > +Configuration Space thus is also not recommended.
> > +
> > +The recommendation is to separate the PCI Express and PCI hierarchies.
> > +PCI Express devices should be plugged only into PCI Express Root Ports and
> > +PCI Express Downstream ports.
> > +
> > +2.1 Root Bus (pcie.0)
> > +=====================
> > +Place only the following kinds of devices directly on the Root Complex:
> > +    (1) Devices with dedicated, specific functionality (network card,
> > +        graphics card, IDE controller, etc); place only legacy PCI devices on
> > +        the Root Complex. These will be considered Integrated Endpoints.
> > +        Note: Integrated devices are not hot-pluggable.

s/Integrated devices/Integrated Endpoints/ (which I assume
is a Spec-Originated Term) in the last sentence, to be
consistent with the one right before it.

I'm also not sure what you mean by devices with "dedicated,
specific functionality", and unfortunately the examples don't
seem to be helping me.

> > +        Although the PCI Express spec does not forbid PCI Express devices as
> > +        Integrated Endpoints, existing hardware mostly integrates legacy PCI
> > +        devices with the Root Complex. Guest OSes are suspected to behave
> > +        strangely when PCI Express devices are integrated with the Root Complex.
> > +
> > +    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
> > +        hierarchies.
> > +
> > +    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI hierarchies.
> > +
> > +    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses are needed.
> > +
> > +   pcie.0 bus
> > +   -----------------------------------------------------------------------------
> > +        |                |                    |                  |
> > +   -----------   ------------------   ------------------   --------------
> > +   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   |  pxb-pcie  |
> > +   -----------   ------------------   ------------------   --------------
> > +
> > +2.1.1 To plug a device into a pcie.0 as Root Complex Integrated Device use:

s/Root Complex Integrated Device/Integrated Endpoint/ ?

I don't know how much any of these terms can be used
interchangeably, but it would be good IMHO if we could choose
a single term and stick to it throughout the document.

> > +          -device <dev>[,bus=pcie.0]

Is the bus=pcie.0 bit really optional? Will QEMU just assign
devices to pcie.0 automatically unless you provide an explicit
bus= option telling it otherwise?

> > +2.1.2 To expose a new PCI Express Root Bus use:
> > +          -device pxb-pcie,id=pcie.1,bus_nr=x,[numa_node=y],[addr=z]
> > +      Only PCI Express Root Ports and DMI-PCI bridges can be connected to the pcie.1 bus:
> > +          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
> > +          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1

Here I really can't see how the bus= option would be optional
for ioh3420 but mandatory for i82801b11-bridge.

Neither for <dev> above nor for i82801b11-bridge you show
that the slot= option exists. Of course these are merely
usage examples and are not intended to replace the full
documentation: since this is the case, I think we should
make that very explicit and possibly avoid listing options
we don't use at all, eg.

  -device e1000,bus=pcie.0

  This will plug a e1000 Ethernet adapter into pcie.0 as an
  Integrated Endpoint.

and so on.

> > +2.2 PCI Express only hierarchy
> > +==============================
> > +Always use PCI Express Root Ports to start PCI Express hierarchies.
> > +
> > +A PCI Express Root bus supports up to 32 devices. Since each
> > +PCI Express Root Port is a function and a multi-function
> > +device may support up to 8 functions, the maximum possible
> > +PCI Express Root Ports per PCI Express Root Bus is 256.
> > +
> > +Prefer coupling PCI Express Root Ports into multi-function devices

s/coupling/grouping/

> > +to keep a simple flat hierarchy that is enough for most scenarios.
> > +Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
> > +if there is no more room for PCI Express Root Ports.
> > +Please see section 4. for further justifications.
> > +
> > +Plug only PCI Express devices into PCI Express Ports.
> > +
> > +
> > +   pcie.0 bus
> > +   ----------------------------------------------------------------------------------
> > +        |                 |                                    |
> > +   -------------    -------------                        -------------
> > +   | Root Port |    | Root Port |                        | Root Port |
> > +   ------------     -------------                        -------------
> > +         |                            -------------------------|------------------------
> > +    ------------                      |                 -----------------              |
> > +    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
> > +    ------------                      |      Switch     -----------------              |
> > +                                      |                  |            |                |
> > +                                      |    -------------------    -------------------  |
> > +                                      |    | Downstream Port |    | Downstream Port |  |
> > +                                      |    -------------------    -------------------  |
> > +                                      -------------|-----------------------|------------
> > +                                             ------------
> > +                                             | PCIe Dev |
> > +                                             ------------
> > +
> > +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
> > +          -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
> > +          -device <dev>,bus=root_port1
> > +      Note that chassis parameter is compulsory, and must be unique

s/compulsory/mandatory/

> > +      for each PCI Express Root Port.
> > +2.2.2 Using multi-function PCI Express Root Ports:
> > +      -device ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0] \
> > +      -device ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
> > +      -device ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \
> > +2.2.2 Plugging a PCI Express device into a Switch:
> > +      -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
> > +      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
> > +      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]] \
> > +      -device <dev>,bus=downstream_port1
> > +
> > +
> > +2.3 PCI only hierarchy
> > +======================
> > +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.

Maybe we could add something like

  but, as mentioned above, doing so means the legacy PCI
  device in question will be incapable of hot-unplugging.

just to stress the fact.

Aside: "device" or "endpoint"? My understanding is that
endpoints are things like Ethernet adapters, and devices
are endpoints *or* controller, but I might be way off base
here ;)

> > +Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI hierarchies.
> > +
> > +Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge (having 32 slots)
> > +and several PCI-PCI bridges attached to it (each supporting also 32 slots) will support
> > +hundreds of legacy devices. The recommendation is to populate one PCI-PCI bridge
> > +under the DMI-PCI bridge until is full and then plug a new PCI-PCI bridge...
> > +
> > +   pcie.0 bus
> > +   ----------------------------------------------
> > +        |                            |
> > +   -----------               ------------------
> > +   | PCI Dev |               | DMI-PCI BRIDGE |
> > +   ----------                ------------------
> > +                               |            |
> > +                        -----------    ------------------
> > +                        | PCI Dev |    | PCI-PCI Bridge |
> > +                        -----------    ------------------
> > +                                         |           |
> > +                                  -----------     -----------
> > +                                  | PCI Dev |     | PCI Dev |
> > +                                  -----------     -----------
> > +
> > +2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
> > +      -device <dev>[,bus=pcie.0]
> > +2.3.2 Plugging a PCI device into a DMI-PCI bridge:
> > +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
> > +      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
> > +2.3.3 Plugging a PCI device into a PCI-PCI bridge:
> > +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]                        \
> > +      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]    \
> > +      -device <dev>,bus=pci_bridge1[,addr=x]
> > +
> > +
> > +3. IO space issues
> > +===================
> > +The PCI Express Root Ports and PCI Express Downstream ports are seen by
> > +Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
> > +should reserve a 4K IO range for each even if only one (multifunction)
> > +device can be plugged into them, resulting in poor IO space utilization.
> > +
> > +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
> > +by not allocating IO space if possible:
> > +    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
> > +    (2) - If the device behind the PCI Express Root Ports/PCI Express
> > +          Downstream has no IO BARs.
> > +
> > +The IO space is very limited, 65536 byte-wide IO ports, but it's fragmented
> > +resulting in ~10 PCI Express Root Ports (or PCI Express Downstream/Upstream ports)
> > +ports per system if devices with IO BARs are used in the PCI Express hierarchy.
> > +
> > +Using the proposed device placing strategy solves this issue
> > +by using only PCI Express devices within PCI Express hierarchy.
> > +
> > +The PCI Express spec requires the PCI Express devices to work without using IO.
> > +The PCI hierarchy has no such limitations.

After reading this section and piecing all bits together, my
understanding is that

  a) the spec requires firmware and OS to allocate 4K of IO
     space for each PCI Express Root Port or PCI Express
     Switch Downstream Port

  b) that's necessary because legacy PCI Devices require IO
     space to operate

  c) however, IO space is a very limited resource

  d) the spec also requires PCI Express Devices (or rather
     Endpoints?) to work without using any IO space

  e) if we stick to the plan outlined in this document, there
     will never be legacy PCI Devices plugged into PCI Express
     Root Ports or PCI Express Switch Downstream Ports (only
     PCI Express Endpoints or PCI Express Switch Upstream
     Ports)

  f) thanks to d) and e), the firmware (and OS? IIRC Linux
     currently doesn't do this) can ignore a) and allocate no
     IO space for PCI Express Root Ports and PCI Express
     Switch Downstream Ports

  g) thus effectively making c) irrelevant

If my reading is correct, I think we should put more emphasis
on the fact that our device placement strategy is effective
in avoiding IO space exaustion because we stick to PCI Express
Devices only which, unlike legacy PCI Devices, require no IO
space to operate.

> > +4. Bus numbers issues
> > +======================
> > +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
> > +machines do not support multiple PCI domains even if extra Root
> > +Complexes (pxb-pcie) are used.
> > +
> > +Each element of the PCI Express hierarchy (Root Complexes,
> > +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
> > +takes up bus numbers. Since only one (multifunction) device
> > +can be attached to a PCI Express Root Port or PCI Express Downstream
> > +Port it is advised to plan in advance for the expected number of
> > +devices to prevent bus numbers starvation.
> > +
> > +
> > +5. Hot Plug

Hot Plug, hot-plug or hotplug?

Pick one and stick with it :)

> > +============
> > +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
> > +do not support hot-plug, so any devices plugged into Root Complexes
> > +cannot be hot-plugged/hot-unplugged:
> > +    (1) PCI Express Integrated Devices
> > +    (2) PCI Express Root Ports
> > +    (3) DMI-PCI bridges
> > +    (4) pxb-pcie
> > +
> > +PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
> > +be hot-plugged into DMI-PCI bridges.
> > +The PCI hotplug is ACPI based and can work side by side with the
> > +PCI Express native hotplug.
> > +
> > +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
> > +PCI Express Root Ports (and PCI Express Downstream Ports).
> > +
> > +5.1 Planning for hotplug:
> > +    (1) PCI hierarchy
> > +        Leave enough PCI-PCI bridge slots empty or add one
> > +        or more empty PCI-PCI bridges to the DMI-PCI bridge.
> > +
> > +        For each such bridge the Guest Firmware is expected to reserve 4K IO
> > +        space and 2M MMIO range to be used for all devices behind it.
> > +
> > +        Because of the hard IO limit of around 10 PCI bridges (~ 40K space) per system
> > +        don't use more than 9 bridges, leaving 4K for the Integrated devices
> > +        and none for the PCI Express Hierarchy.

I would rewrite the last line as

  (the PCI Express hierarchy needs no IO space).

or something like that. Unless I misunderstood :)

> > +    (2) PCI Express hierarchy:
> > +        Leave enough PCI Express Root Ports empty. Use multifunction
> > +        PCI Express Root Ports to prevent going out of PCI bus numbers.

When you say "multifunction PCI Express Root Ports", are you
suggesting the user should take advantage of all 8 functions
in a PCI Express Root Port, eg. by plugging 4 Ethernet
adapters into the same PCI Express Root Port instead of using
a single PCI Express Root Port for each one of them, or that
multiple PCI Express Root Ports should be plugged into
functions of the same pcie.0 slot?

Either way, I don't see how doing so would prevent you from
running out of PCI bus numbers - if anything, not doing the
latter will make it so you'll run out of ports way before
you come anywhere close to 256 PCI buses.

Point is, being less terse here could be very helpful.

> > +        Don't use PCI Express Switches if you don't have too, they use
> > +        an extra PCI bus that may handy to plug another device id it comes to it.

s/too/to/
s/may handy/may come handy/
s/id it comes/if it comes/

> > +5.3 Hot plug example:
> > +Using HMP: (add -monitor stdio to QEMU command line)
> > +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>
> > +
> > +
> > +6. Device assignment
> > +====================
> > +Host devices are mostly PCI Express and should be plugged only into
> > +PCI Express Root Ports or PCI Express Downstream Ports.
> > +PCI-PCI bridge slots can be used for legacy PCI host devices.
> > +
> > +6.1 How to detect if a device is PCI Express:
> > +  > lspci -s 03:00.0 -v (as root)
> > +
> > +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
> > +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
> > +    Flags: bus master, fast devsel, latency 0, IRQ 50
> > +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
> > +    Capabilities: [c8] Power Management version 3
> > +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > +    Capabilities: [40] Express Endpoint, MSI 00
> > +

I'd skip this empty line...

> > +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > +

... and this one too.

> > +    Capabilities: [100] Advanced Error Reporting
> > +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
> > +    Capabilities: [14c] Latency Tolerance Reporting
> > +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014

I'd also add a small note along the lines of

  If you can see the "Express Endpoint" capability in the
  output, then the device is indeed PCI Express.

> > +7. Virtio devices
> > +=================
> > +Virtio devices plugged into the PCI hierarchy or as Integrated Devices
> > +will remain PCI and have transitional behaviour as default.
> > +Transitional virtio devices work in both IO and MMIO modes depending on
> > +the guest support.
> > +
> > +Virtio devices plugged into PCI Express ports are PCI Express devices and
> > +have "1.0" behavior by default without IO support.
> > +In both case disable-* properties can be used to override the behaviour.
> > +
> > +Note that setting disable-legacy=off will enable legacy mode (enabling
> > +legacy behavior) for PCI Express virtio devices causing them to
> > +require IO space, which, given our PCI Express hierarchy, may quickly
> > +lead to resource exhaustion, and is therefore strongly discouraged.
> > +
> > +
> > +8. Conclusion
> > +==============
> > +The proposal offers a usage model that is easy to understand and follow
> > +and in the same time overcomes the PCI Express architecture limitations.

s/in the same/at the same/


Overall I think the contents we care about are pretty much
there and are exposed and organized in a very clear fashion;
pretty much all of my comments are meant to improve the few
areas where I believe we could make things even easier to
grasp for the reader, remove potential ambiguity, or be more
consistent.

Thanks for working on this! Looking forward to v3 ;)

-- 
Andrea Bolognani / Red Hat / Virtualization

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-17 14:18   ` Andrea Bolognani
@ 2016-10-17 14:26     ` Laszlo Ersek
  2016-10-17 14:53       ` Andrea Bolognani
  2016-10-27 14:36     ` Marcel Apfelbaum
  1 sibling, 1 reply; 13+ messages in thread
From: Laszlo Ersek @ 2016-10-17 14:26 UTC (permalink / raw)
  To: Andrea Bolognani, Marcel Apfelbaum, qemu-devel
  Cc: Gerd Hoffmann, Laine Stump, Peter Maydell, Andrew Jones,
	Alex Williamson, Daniel Berrange, Michael S. Tsirkin

On 10/17/16 16:18, Andrea Bolognani wrote:
> On Thu, 2016-10-13 at 17:05 +0300, Marcel Apfelbaum wrote:
>>> +PCI EXPRESS GUIDELINES
>>> +======================
>>> +
>>> +1. Introduction
>>> +================
>>> +The doc proposes best practices on how to use PCI Express/PCI device
>>> +in PCI Express based machines and explains the reasoning behind them.
>>> +
>>> +
>>> +2. Device placement strategy
>>> +============================
>>> +QEMU does not have a clear socket-device matching mechanism
>>> +and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
>>> +Plugging a PCI device into a PCI Express slot might not always work and
>>> +is weird anyway since it cannot be done for "bare metal".
>>> +Plugging a PCI Express device into a PCI slot will hide the Extended
>>> +Configuration Space thus is also not recommended.
>>> +
>>> +The recommendation is to separate the PCI Express and PCI hierarchies.
>>> +PCI Express devices should be plugged only into PCI Express Root Ports and
>>> +PCI Express Downstream ports.
>>> +
>>> +2.1 Root Bus (pcie.0)
>>> +=====================
>>> +Place only the following kinds of devices directly on the Root Complex:
>>> +    (1) Devices with dedicated, specific functionality (network card,
>>> +        graphics card, IDE controller, etc); place only legacy PCI devices on
>>> +        the Root Complex. These will be considered Integrated Endpoints.
>>> +        Note: Integrated devices are not hot-pluggable.
> 
> s/Integrated devices/Integrated Endpoints/ (which I assume
> is a Spec-Originated Term) in the last sentence, to be
> consistent with the one right before it.
> 
> I'm also not sure what you mean by devices with "dedicated,
> specific functionality", and unfortunately the examples don't
> seem to be helping me.

That language is from me (I suggested it earlier). It means devices that
you want to use for some actual task, rather than just to build your PCI
(Express) hierarchy with them. "Leaf nodes" or whatever. "Everything
non-bridge".

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-17 14:26     ` Laszlo Ersek
@ 2016-10-17 14:53       ` Andrea Bolognani
  0 siblings, 0 replies; 13+ messages in thread
From: Andrea Bolognani @ 2016-10-17 14:53 UTC (permalink / raw)
  To: Laszlo Ersek, Marcel Apfelbaum, qemu-devel
  Cc: Gerd Hoffmann, Laine Stump, Peter Maydell, Andrew Jones,
	Alex Williamson, Daniel Berrange, Michael S. Tsirkin

On Mon, 2016-10-17 at 16:26 +0200, Laszlo Ersek wrote:
> > > > +2.1 Root Bus (pcie.0)
> > > > +=====================
> > > > +Place only the following kinds of devices directly on the Root Complex:
> > > > +    (1) Devices with dedicated, specific functionality (network card,
> > > > +        graphics card, IDE controller, etc); place only legacy PCI devices on
> > > > +        the Root Complex. These will be considered Integrated Endpoints.
> > > > +        Note: Integrated devices are not hot-pluggable.
> > 
> > s/Integrated devices/Integrated Endpoints/ (which I assume
> > is a Spec-Originated Term) in the last sentence, to be
> > consistent with the one right before it.
> > 
> > I'm also not sure what you mean by devices with "dedicated,
> > specific functionality", and unfortunately the examples don't
> > seem to be helping me.
> 
> That language is from me (I suggested it earlier). It means devices that
> you want to use for some actual task, rather than just to build your PCI
> (Express) hierarchy with them. "Leaf nodes" or whatever. "Everything
> non-bridge".

Isn't that basically the definition of Endpoint? It certainly
is what I convinced myself "Endpoint" means :)

Either way, I think we should spend a few more words there,
for clarity's sake. Rewording the phrase to put more emphasis
on the "legacy PCI devices only" part would probably be
warranted as well.

-- 
Andrea Bolognani / Red Hat / Virtualization

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-14 11:36   ` Laszlo Ersek
  2016-10-17 12:07     ` Gerd Hoffmann
@ 2016-10-27 11:27     ` Marcel Apfelbaum
  2016-10-27 15:44       ` Laszlo Ersek
  1 sibling, 1 reply; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-27 11:27 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel
  Cc: Peter Maydell, Andrew Jones, Michael S. Tsirkin,
	Andrea Bolognani, Alex Williamson, Gerd Hoffmann, Laine Stump

On 10/14/2016 02:36 PM, Laszlo Ersek wrote:
> On 10/13/16 16:05, Marcel Apfelbaum wrote:
>> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:
>>> Proposes best practices on how to use PCI Express/PCI device
>>> in PCI Express based machines and explain the reasoning behind them.
>>>
>>> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
>>> ---
>>>
>>> Hi,
>>>
>>> I am sending the doc  twice, it appears the first time didn't make it
>>> to qemu-devel list.
>>
>> Hi,
>>
>> Adding people to CC. Sorry for the earlier noise.
>>
>> Thanks,
>> Marcel
>>
>>>

Hi Laszlo,
Thanks for the review, I'll do my best to address  all
the comments in nest version.

[...]

>>> +
>>> +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
>>> +          -device
>>> ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>>> +          -device <dev>,bus=root_port1
>>> +      Note that chassis parameter is compulsory, and must be unique
>>> +      for each PCI Express Root Port.
>
> {7} Hmmm, I think it's rather that the (chassis, slot) *pair* must be
> unique. You can get away with leaving the default chassis=0 unspecified,
> and spell out just the slots, I think.
>

Yes you can, I wanted to let the "slot" out of the equation and use the chassis
as a known parameter from the PCI bridge - easier to digest.
However your comment make sense, I'll update the doc.

>>> +2.2.2 Using multi-function PCI Express Root Ports:
>>> +      -device
>>> ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0]
>>> \
>>> +      -device
>>> ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
>>> +      -device
>>> ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \
>
> {8} This looks good to me, except for the double-comma typos: ",,".
>
>>> +2.2.2 Plugging a PCI Express device into a Switch:
>>> +      -device
>>> ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>>> +      -device
>>> x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
>>> +      -device
>>> xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]]
>>> \
>>> +      -device <dev>,bus=downstream_port1
>>> +
>
> {9} For all of these command lines, can you specify if z=0 (that is,
> device#0) is valid or not in the addr=z properties?
>


Address 0 is valid for all PCI bridges (we see PCI Express Root/Upstream/Downstream/Ports as PCI bridges)
that do not have an SHPC component.
Even the "regular" pci bridge can use slot 0 if we pass shpc=off as parameter.
For PCI Express slot 0 is always valid, this is why I didn't add it to the doc.


> Earlier discussion on this question:
>
> On 10/04/16 18:25, Laine Stump wrote:
>> On 10/04/2016 11:45 AM, Alex Williamson wrote:
>
>>>  Same with the restriction from using slot
>>> 0 on PCI bridges, there's no basis for that except on the root bus.
>>
>> I tried allowing devices to be plugged into slot 0 of a pci-bridge in
>> libvirt - qemu barfed, so I moved the "minSlot" for pci-bridge back up
>> to 1. Slot 0 is completely usable on a dmi-to-pci-bridge though (and
>> libvirt allows it). At this point, even if qemu enabled using slot 0
>> of a pci-bridge, libvirt wouldn't be able to expose that to users
>> (unless the min/max slot of each PCI controller was made visible
>> somewhere via QMP)
>
> On 10/05/16 12:03, Marcel Apfelbaum wrote:
>> The reason for not being able to plug a device into slot 0 of a PCI
>> Bridge is the SHPC (Hot-plug controller) device embedded in the PCI
>> bridge by default. The SHPC spec requires this. If one disables it
>> with shpc=false, he should be able to use the slot 0.
>
> For simplicity's sake I guess we should just recommend >=1 slot numbers.
>

Is a pity to skip slot 0 for PCI Express bridges.
- Root ports and Downstream ports have only slot 0 :)
- Root Complexes (pice.0 and pxb-pcies) can use slot 0

> Back to the patch:
>
> On 10/13/16 16:05, Marcel Apfelbaum wrote:
>> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:
>
>>> +
>>> +2.3 PCI only hierarchy
>>> +======================
>>> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
>>> +Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI
>>> hierarchies.
>>> +
>>> +Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge
>>> (having 32 slots)
>>> +and several PCI-PCI bridges attached to it (each supporting also 32
>>> slots) will support
>>> +hundreds of legacy devices. The recommendation is to populate one
>>> PCI-PCI bridge
>>> +under the DMI-PCI bridge until is full and then plug a new PCI-PCI
>>> bridge...
>>> +
>>> +   pcie.0 bus
>>> +   ----------------------------------------------
>>> +        |                            |
>>> +   -----------               ------------------
>>> +   | PCI Dev |               | DMI-PCI BRIDGE |
>>> +   ----------                ------------------
>>> +                               |            |
>>> +                        -----------    ------------------
>>> +                        | PCI Dev |    | PCI-PCI Bridge |
>>> +                        -----------    ------------------
>>> +                                         |           |
>>> +                                  -----------     -----------
>>> +                                  | PCI Dev |     | PCI Dev |
>>> +                                  -----------     -----------
>>> +
>>> +2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
>>> +      -device <dev>[,bus=pcie.0]
>
> (This is repeated from 2.1.1, but I guess it doesn't hurt, for
> completeness in this chapter.)
>
>>> +2.3.2 Plugging a PCI device into a DMI-PCI bridge:
>>> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
>>> +      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
>
> {10} I recall that we discussed this at length, that is, placing PCI
> (non-express) devices directly on the DMI-PCI bridge. IIRC the argument
> was that it's technically possible, it just won't support hot-plug.
>

Exactly

> I'd like if we removed this use case from the document. It might be
> possible, but for simplicity's sake, we shouldn't advertise it, in my
> opinion. (Unless someone strongly disagrees with this idea, of course.)
> I recall esp. Laine, Alex and Daniel focusing on this, but I don't
> remember if (and what for) they wanted this option. Personally I'd like
> to see it disappear (unless convinced otherwise).
>
> ... Actually, going through the RFC thread again, it seems that the use
> case that I'm arguing against -- i.e., "use DMI-PCI as a generic
> PCIe-to-PCI bridge" -- makes Alex cringe as well. So there's that :)
>

I can do that, even if for non-hotplug scenarios we don't have any reason
to avoid that. On the other hand, not advertising is not a big crime.

> {11} Syntax remark, should we keep this section: the commas are not
> right in ",[,bus=pcie.0]".
>
>>> +2.3.3 Plugging a PCI device into a PCI-PCI bridge:
>>> +      -device
>>> i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]
>>> \
>
> {12} double comma again
>
>>> +      -device
>>> pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]
>>> \
>>> +      -device <dev>,bus=pci_bridge1[,addr=x]
>
> {13} It would be nice to spell out the valid device addresses (y and x)
> here too -- can we use 0 for them? SHPC again?
>
> Can we / should we simply go with >=1 device addresses?
>

For pci-bridges only - yes. A better idea (I think) is to disable SHPC by default
from the next QEMU version. Make this slot usable. Sounds OK?

>>> +
>>> +
>>> +3. IO space issues
>>> +===================
>>> +The PCI Express Root Ports and PCI Express Downstream ports are seen by
>>> +Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
>>> +should reserve a 4K IO range for each even if only one (multifunction)
>>> +device can be plugged into them, resulting in poor IO space utilization.
>
> {14} I completely agree with this paragraph, I'd just recommend to allow
> the reader more time to digest it. Something like:
>
> ----
> The PCI Express Root Ports and PCI Express Downstream ports are seen by
> Firmware/Guest OS as PCI-PCI bridges. As required by the PCI spec, each
> such Port should be reserved a 4K IO range for, even though only one
> (multifunction) device can be plugged into each Port. This results in
> poor IO space utilization.
> ----
>
>>> +
>>> +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
>>> +by not allocating IO space if possible:
>>> +    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
>>> +    (2) - If the device behind the PCI Express Root Ports/PCI Express
>>> +          Downstream has no IO BARs.
>
> {15} I'd say:
>
> ... by not allocating IO space for each PCI Express Root / PCI Express
> Downstream port if:
> (1) the port is empty, or
> (2) the device behind the port has no IO BARs.
>
>>> +
>>> +The IO space is very limited, 65536 byte-wide IO ports, but it's
>>> fragmented
>
> {16} Suggestion: "The IO space is very limited, to 65536 byte-wide IO
> ports, and may even be fragmented by fixed IO ports owned by platform
> devices."
>
>>> +resulting in ~10 PCI Express Root Ports (or PCI Express
>
> {17} s/resulting in/resulting in at most/
>
>>> Downstream/Upstream ports)
>
> {18} Whoa, upstream ports? Why do we need to spell them out here?
> Upstream ports need their own bus numbers, but their IO space assignment
> only collects the IO space assignments of their downstream ports. Isn't
> that right?
>

Yes, I wad referring to the "couple", if you have an Upstream port you obviously
have at least a Downstream one.

> (I believe spelling out upstream ports here was recommended by Alex, but
> I don't understand why.)
>

Maybe to understand the connection between them? Usage?

>>> +ports per system if devices with IO BARs are used in the PCI Express
>
> {19} The word "ports" is unnecessary here.
>
>>> hierarchy.
>>> +
>>> +Using the proposed device placing strategy solves this issue
>>> +by using only PCI Express devices within PCI Express hierarchy.
>>> +
>>> +The PCI Express spec requires the PCI Express devices to work without
>>> using IO.
>>> +The PCI hierarchy has no such limitations.
>
> {20} There should be no empty line (or even a line break) between the
> two above paragraphs: the second paragraph explains the first one.
>
>>> +
>>> +
>>> +4. Bus numbers issues
>>> +======================
>>> +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
>>> +machines do not support multiple PCI domains even if extra Root
>>> +Complexes (pxb-pcie) are used.
>>> +
>>> +Each element of the PCI Express hierarchy (Root Complexes,
>>> +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
>>> +takes up bus numbers. Since only one (multifunction) device
>>> +can be attached to a PCI Express Root Port or PCI Express Downstream
>>> +Port it is advised to plan in advance for the expected number of
>>> +devices to prevent bus numbers starvation.
>
> {21} Please add:
>
> """
> In particular:
>
> - Avoiding PCI Express Switches (and thereby striving for a flat PCI
> Express hierarchy) enables the hierarchy to not spend bus numbers on
> Upstream Ports.
>
> - The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
> number space. All bus numbers assigned to the buses recursively behind a
> given pxb-pcie device's root bus must fit between the bus_nr property of
> that pxb-pcie device, and the lowest of the higher bus_nr properties
> that the command line sets for other pxb-pcie devices.
> """
>
> (You do mention switches being more bus number-hungry, under chapter 5,
> hotplug; however, saving bus numbers makes sense for a purely
> cold-plugged scenario as well, especially in combination with pxb-pcie,
> where the partitioning can become a limiting factor.)
>
>>> +
>>> +
>>> +5. Hot Plug
>>> +============
>>> +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie
>>> devices)
>>> +do not support hot-plug, so any devices plugged into Root Complexes
>>> +cannot be hot-plugged/hot-unplugged:
>>> +    (1) PCI Express Integrated Devices
>>> +    (2) PCI Express Root Ports
>>> +    (3) DMI-PCI bridges
>>> +    (4) pxb-pcie
>>> +
>>> +PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
>>> +be hot-plugged into DMI-PCI bridges.
>
> {22} If you agree with my suggestion to remove 2.3.2, that is, the "PCI
> device cold-plugged directly into DMI-PCI bridge" case, then the second
> half of the sentence can be dropped.
>
>>> +The PCI hotplug is ACPI based and can work side by side with the
>>> +PCI Express native hotplug.
>>> +
>>> +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
>>> +PCI Express Root Ports (and PCI Express Downstream Ports).
>>> +
>>> +5.1 Planning for hotplug:
>>> +    (1) PCI hierarchy
>>> +        Leave enough PCI-PCI bridge slots empty or add one
>>> +        or more empty PCI-PCI bridges to the DMI-PCI bridge.
>>> +
>>> +        For each such bridge the Guest Firmware is expected to
>>> reserve 4K IO
>>> +        space and 2M MMIO range to be used for all devices behind it.
>>> +
>>> +        Because of the hard IO limit of around 10 PCI bridges (~ 40K
>>> space) per system
>>> +        don't use more than 9 bridges, leaving 4K for the Integrated
>>> devices
>>> +        and none for the PCI Express Hierarchy.
>
> {23} s/9 bridges/9 PCI-PCI bridges/
>
>>> +
>>> +    (2) PCI Express hierarchy:
>>> +        Leave enough PCI Express Root Ports empty. Use multifunction
>>> +        PCI Express Root Ports to prevent going out of PCI bus numbers.
>
> {24} I agree, but I'd put it a bit differently: use multifunction PCI
> Express Root Ports on the Root Complex(es), for keeping the hierarchy as
> flat as possible, thereby saving PCI bus numbers.
>
>>> +        Don't use PCI Express Switches if you don't have too, they use
>>> +        an extra PCI bus that may handy to plug another device id it
>>> comes to it.
>>> +
>
> {25} I'd put it as: Don't use PCI Express Switches if you don't have
> too, each one of those uses an extra PCI bus (for its Upstream Port)
> that could be put to better use with another Root Port or Downstream
> Port, which may come handy for hotplugging another device.
>
> {26} Another remark (important to me) in this section: the document
> doesn't state firmware expectations. It's clear the firmware is expected
> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
> but what about MMIO?
>
> We discussed this at length with Alex, but I think we didn't conclude
> anything. It would be nice if firmware received some instructions from
> this document in this regard, even before we implement our own ports and
> bridges in QEMU.
>

Hmm, I have no idea what to add here, except:
The firmware is expected to reserve at least 2M for each pci bridge?

> <digression>
>
> If we think such recommendations are out of scope at this point, *and*
> noone disagrees strongly (Gerd?), then I could add some experimental
> fw_cfg knobs to OVMF for this, such as (units in MB):
>
> -fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/PrefMmio32Mb,string=...
> -fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/NonPrefMmio32Mb,string=...
> -fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/PrefMmio64Mb,string=..
> -fw_cfg opt/org.tianocore.ovmf/X-ReservePciE/NonPrefMmio64Mb,string=..
>
> Under this idea, I would reserve no resources at all for Downstream
> Ports and Root Ports in OVMF by default; but users could influence those
> reservations. I think that would be enough to kick things off. It also
> needs no modifications for QEMU.
>
> </digression>
>
>>> +5.3 Hot plug example:
>>> +Using HMP: (add -monitor stdio to QEMU command line)
>>> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port
>>> Id/PCI-PCI bridge Id/pxb-pcie Id>
>
> {27} I think the bus=<...> part is incorrect here. Based on the rest of
> the guidelines, we have to specify the ID of:
> - a PCI Express Root Port, or
> - a PCI Express Downstream Port, or
> - a PCI-PCI bridge.
>

I don't get it, you specify what you wrote above as the bus, right?
For example if you start the machine with
     .... -device ioh3420,id=root_port1,
you hotplug with: device_add e1000,bus=root_port1.

>>> +
>>> +
>>> +6. Device assignment
>>> +====================
>>> +Host devices are mostly PCI Express and should be plugged only into
>>> +PCI Express Root Ports or PCI Express Downstream Ports.
>>> +PCI-PCI bridge slots can be used for legacy PCI host devices.
>>> +
>>> +6.1 How to detect if a device is PCI Express:
>>> +  > lspci -s 03:00.0 -v (as root)
>>> +
>>> +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
>>> +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
>>> +    Flags: bus master, fast devsel, latency 0, IRQ 50
>>> +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
>>> +    Capabilities: [c8] Power Management version 3
>>> +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> +    Capabilities: [40] Express Endpoint, MSI 00
>>> +
>>> +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>>> +    Capabilities: [100] Advanced Error Reporting
>>> +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
>>> +    Capabilities: [14c] Latency Tolerance Reporting
>>> +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1
>>> Len=014
>>> +
>>> +
>>> +7. Virtio devices
>>> +=================
>>> +Virtio devices plugged into the PCI hierarchy or as Integrated Devices
>>> +will remain PCI and have transitional behaviour as default.
>>> +Transitional virtio devices work in both IO and MMIO modes depending on
>>> +the guest support.
>
> {28} Suggest to add: firmware will assign both IO and MMIO resources to
> transitional virtio devices.
>
>>> +
>>> +Virtio devices plugged into PCI Express ports are PCI Express devices
>>> and
>>> +have "1.0" behavior by default without IO support.
>>> +In both case disable-* properties can be used to override the behaviour.
>
> {29} s/case/cases/; also, please spell out the disable-* properties fully.
>
>>> +
>>> +Note that setting disable-legacy=off will enable legacy mode (enabling
>>> +legacy behavior) for PCI Express virtio devices causing them to
>>> +require IO space, which, given our PCI Express hierarchy, may quickly
>
> {30} s/given our PCI Express hierarchy/given the limited available IO space/
>
>>> +lead to resource exhaustion, and is therefore strongly discouraged.
>>> +
>>> +
>>> +8. Conclusion
>>> +==============
>>> +The proposal offers a usage model that is easy to understand and follow
>>> +and in the same time overcomes the PCI Express architecture limitations.
>>> +
>>>
>>
>>
>
> I think this version has seen big improvements, and I think it's
> structurally complete. While composing this review, I went through the
> entire RFC thread again, and I *think* you didn't miss anything from
> that. Great job again!
>

Thanks!

> My comments vary in importance. I trust you to take each comment with an
> appropriately sized grain of salt ;)
>

Sure, thanks
Marcel

> Thank you!
> Laszlo
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-17 14:07       ` Laszlo Ersek
@ 2016-10-27 11:28         ` Marcel Apfelbaum
  0 siblings, 0 replies; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-27 11:28 UTC (permalink / raw)
  To: Laszlo Ersek, Gerd Hoffmann
  Cc: qemu-devel, Peter Maydell, Andrew Jones, Michael S. Tsirkin,
	Andrea Bolognani, Alex Williamson, Laine Stump

On 10/17/2016 05:07 PM, Laszlo Ersek wrote:
> On 10/17/16 14:07, Gerd Hoffmann wrote:
>>   Hi,
>>
>>> {26} Another remark (important to me) in this section: the document
>>> doesn't state firmware expectations. It's clear the firmware is expected
>>> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
>>> but what about MMIO?
>>>
>>> We discussed this at length with Alex, but I think we didn't conclude
>>> anything. It would be nice if firmware received some instructions from
>>> this document in this regard, even before we implement our own ports and
>>> bridges in QEMU.
>>
>> Where do we stand in terms of generic pcie ports btw?
>
> "planning phase", AFAIR
>

Indeed, I'll start working on it once this doc is finished.
Thanks,
Marcel

>>
>> I think the plan is still to communicate suggestions to the firmware via
>> pci config space, either by using reset defaults of the limit register,
>> or of that doesn't work due to initialization order issues using some
>> vendor specific pcie capability.
>>
>> As long as we don't have that there is nothing do document, other than
>> maybe briefly mentioning the plans we have and documenting the current
>> state (2M mmio in seabios, and I think the same for ovmf).
>>
>> The patches adding the generic ports can also update the documentation
>> of course.
>>
>>> <digression>
>>>
>>> If we think such recommendations are out of scope at this point, *and*
>>> noone disagrees strongly (Gerd?), then I could add some experimental
>>> fw_cfg knobs to OVMF for this, such as (units in MB):
>>
>> Why?  Given that the virtio mmio bar size issue is solved I don't see a
>> strong reason to hurry with this.  Just wait until the generic ports are
>> there.
>
> Fine.
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-17 14:18   ` Andrea Bolognani
  2016-10-17 14:26     ` Laszlo Ersek
@ 2016-10-27 14:36     ` Marcel Apfelbaum
  1 sibling, 0 replies; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-27 14:36 UTC (permalink / raw)
  To: Andrea Bolognani, qemu-devel
  Cc: Laszlo Ersek, Gerd Hoffmann, Laine Stump, Peter Maydell,
	Andrew Jones, Alex Williamson, Daniel Berrange,
	Michael S. Tsirkin

On 10/17/2016 05:18 PM, Andrea Bolognani wrote:
> On Thu, 2016-10-13 at 17:05 +0300, Marcel Apfelbaum wrote:
>>> +PCI EXPRESS GUIDELINES
>>> +======================
>>> +
>>> +1. Introduction
>>> +================
>>> +The doc proposes best practices on how to use PCI Express/PCI device
>>> +in PCI Express based machines and explains the reasoning behind them.
>>> +
>>> +
>>> +2. Device placement strategy
>>> +============================
>>> +QEMU does not have a clear socket-device matching mechanism
>>> +and allows any PCI/PCI Express device to be plugged into any PCI/PCI Express slot.
>>> +Plugging a PCI device into a PCI Express slot might not always work and
>>> +is weird anyway since it cannot be done for "bare metal".
>>> +Plugging a PCI Express device into a PCI slot will hide the Extended
>>> +Configuration Space thus is also not recommended.
>>> +
>>> +The recommendation is to separate the PCI Express and PCI hierarchies.
>>> +PCI Express devices should be plugged only into PCI Express Root Ports and
>>> +PCI Express Downstream ports.
>>> +
>>> +2.1 Root Bus (pcie.0)
>>> +=====================
>>> +Place only the following kinds of devices directly on the Root Complex:
>>> +    (1) Devices with dedicated, specific functionality (network card,
>>> +        graphics card, IDE controller, etc); place only legacy PCI devices on
>>> +        the Root Complex. These will be considered Integrated Endpoints.
>>> +        Note: Integrated devices are not hot-pluggable.
>

Hi Andrea,
Thanks for the review.

> s/Integrated devices/Integrated Endpoints/ (which I assume
> is a Spec-Originated Term) in the last sentence, to be
> consistent with the one right before it.
>

OK

> I'm also not sure what you mean by devices with "dedicated,
> specific functionality", and unfortunately the examples don't
> seem to be helping me.

I'll try to re-phrase a little,  the PCI Express spec doesn't
say anything specific about what Endpoints can be (as far as I know)
and what is happening is that on-board devices remain *usually*
PCI and not PCI express.They may be NICs, Sound cards..

Sounds better?

>
>>> +        Although the PCI Express spec does not forbid PCI Express devices as
>>> +        Integrated Endpoints, existing hardware mostly integrates legacy PCI
>>> +        devices with the Root Complex. Guest OSes are suspected to behave
>>> +        strangely when PCI Express devices are integrated with the Root Complex.
>>> +
>>> +    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
>>> +        hierarchies.
>>> +
>>> +    (3) DMI-PCI bridges (i82801b11-bridge), for starting legacy PCI hierarchies.
>>> +
>>> +    (4) Extra Root Complexes (pxb-pcie), if multiple PCIe Root Buses are needed.
>>> +
>>> +   pcie.0 bus
>>> +   -----------------------------------------------------------------------------
>>> +        |                |                    |                  |
>>> +   -----------   ------------------   ------------------   --------------
>>> +   | PCI Dev |   | PCIe Root Port |   | DMI-PCI bridge |   |  pxb-pcie  |
>>> +   -----------   ------------------   ------------------   --------------
>>> +
>>> +2.1.1 To plug a device into a pcie.0 as Root Complex Integrated Device use:
>
> s/Root Complex Integrated Device/Integrated Endpoint/ ?
>

sure

> I don't know how much any of these terms can be used
> interchangeably, but it would be good IMHO if we could choose
> a single term and stick to it throughout the document.
>
>>> +          -device <dev>[,bus=pcie.0]
>
> Is the bus=pcie.0 bit really optional? Will QEMU just assign
> devices to pcie.0 automatically unless you provide an explicit
> bus= option telling it otherwise?

yes, that is if you dont have a pxb-pcie devices that exposes
a new PCI Root Bus.

If you have only one PCI Root Bus is OK.

>
>>> +2.1.2 To expose a new PCI Express Root Bus use:
>>> +          -device pxb-pcie,id=pcie.1,bus_nr=x,[numa_node=y],[addr=z]
>>> +      Only PCI Express Root Ports and DMI-PCI bridges can be connected to the pcie.1 bus:
>>> +          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z] \
>>> +          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
>
> Here I really can't see how the bus= option would be optional
> for ioh3420 but mandatory for i82801b11-bridge.
>

As explained above, is not related to the DMI-PCI bridge, but to the
fact we have 2 PCI root buses, so we have to specify which one we need.

> Neither for <dev> above nor for i82801b11-bridge you show
> that the slot= option exists.

Because we don't have this option.

Of course these are merely
> usage examples and are not intended to replace the full
> documentation: since this is the case, I think we should
> make that very explicit and possibly avoid listing options
> we don't use at all, eg.
>

I will mention that the document does not list all the options,
only the ones needed for the examples.


>   -device e1000,bus=pcie.0
>
>   This will plug a e1000 Ethernet adapter into pcie.0 as an
>   Integrated Endpoint.
>
> and so on.
>

Right, but the bus is optional if you don't have multiple PCI root buses (pxb/pxb-pcie).

>>> +2.2 PCI Express only hierarchy
>>> +==============================
>>> +Always use PCI Express Root Ports to start PCI Express hierarchies.
>>> +
>>> +A PCI Express Root bus supports up to 32 devices. Since each
>>> +PCI Express Root Port is a function and a multi-function
>>> +device may support up to 8 functions, the maximum possible
>>> +PCI Express Root Ports per PCI Express Root Bus is 256.
>>> +
>>> +Prefer coupling PCI Express Root Ports into multi-function devices
>
> s/coupling/grouping/
>

OK

>>> +to keep a simple flat hierarchy that is enough for most scenarios.
>>> +Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
>>> +if there is no more room for PCI Express Root Ports.
>>> +Please see section 4. for further justifications.
>>> +
>>> +Plug only PCI Express devices into PCI Express Ports.
>>> +
>>> +
>>> +   pcie.0 bus
>>> +   ----------------------------------------------------------------------------------
>>> +        |                 |                                    |
>>> +   -------------    -------------                        -------------
>>> +   | Root Port |    | Root Port |                        | Root Port |
>>> +   ------------     -------------                        -------------
>>> +         |                            -------------------------|------------------------
>>> +    ------------                      |                 -----------------              |
>>> +    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
>>> +    ------------                      |      Switch     -----------------              |
>>> +                                      |                  |            |                |
>>> +                                      |    -------------------    -------------------  |
>>> +                                      |    | Downstream Port |    | Downstream Port |  |
>>> +                                      |    -------------------    -------------------  |
>>> +                                      -------------|-----------------------|------------
>>> +                                             ------------
>>> +                                             | PCIe Dev |
>>> +                                             ------------
>>> +
>>> +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
>>> +          -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>>> +          -device <dev>,bus=root_port1
>>> +      Note that chassis parameter is compulsory, and must be unique
>
> s/compulsory/mandatory/
>

OK

>>> +      for each PCI Express Root Port.
>>> +2.2.2 Using multi-function PCI Express Root Ports:
>>> +      -device ioh3420,id=root_port1,multifunction=on,chassis=x[,bus=pcie.0][,slot=y][,addr=z.0] \
>>> +      -device ioh3420,id=root_port2,,chassis=x1[,bus=pcie.0][,slot=y1][,addr=z.1] \
>>> +      -device ioh3420,id=root_port3,,chassis=x2[,bus=pcie.0][,slot=y2][,addr=z.2] \
>>> +2.2.2 Plugging a PCI Express device into a Switch:
>>> +      -device ioh3420,id=root_port1,chassis=x[,bus=pcie.0][,slot=y][,addr=z]  \
>>> +      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
>>> +      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1[,slot=y1][,addr=z1]] \
>>> +      -device <dev>,bus=downstream_port1
>>> +
>>> +
>>> +2.3 PCI only hierarchy
>>> +======================
>>> +Legacy PCI devices can be plugged into pcie.0 as Integrated Devices.
>
> Maybe we could add something like
>
>   but, as mentioned above, doing so means the legacy PCI
>   device in question will be incapable of hot-unplugging.
>
> just to stress the fact.

good idea.

>
> Aside: "device" or "endpoint"? My understanding is that
> endpoints are things like Ethernet adapters, and devices
> are endpoints *or* controller, but I might be way off base
> here ;)
>

You are right, I will stick with "endpoints" and make the distinction.
I actually used "Integrated devices" as it is clearer than "endpoints"
for the reader, but if is not clear enough I'll do it "right".

>>> +Besides that use DMI-PCI bridges (i82801b11-bridge) to start PCI hierarchies.
>>> +
>>> +Prefer flat hierarchies. For most scenarios a single DMI-PCI bridge (having 32 slots)
>>> +and several PCI-PCI bridges attached to it (each supporting also 32 slots) will support
>>> +hundreds of legacy devices. The recommendation is to populate one PCI-PCI bridge
>>> +under the DMI-PCI bridge until is full and then plug a new PCI-PCI bridge...
>>> +
>>> +   pcie.0 bus
>>> +   ----------------------------------------------
>>> +        |                            |
>>> +   -----------               ------------------
>>> +   | PCI Dev |               | DMI-PCI BRIDGE |
>>> +   ----------                ------------------
>>> +                               |            |
>>> +                        -----------    ------------------
>>> +                        | PCI Dev |    | PCI-PCI Bridge |
>>> +                        -----------    ------------------
>>> +                                         |           |
>>> +                                  -----------     -----------
>>> +                                  | PCI Dev |     | PCI Dev |
>>> +                                  -----------     -----------
>>> +
>>> +2.3.1 To plug a PCI device into a pcie.0 as Integrated Device use:
>>> +      -device <dev>[,bus=pcie.0]
>>> +2.3.2 Plugging a PCI device into a DMI-PCI bridge:
>>> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]    \
>>> +      -device <dev>,bus=dmi_pci_bridge1[,addr=x]
>>> +2.3.3 Plugging a PCI device into a PCI-PCI bridge:
>>> +      -device i82801b11-bridge,id=dmi_pci_bridge1,[,bus=pcie.0]                        \
>>> +      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]    \
>>> +      -device <dev>,bus=pci_bridge1[,addr=x]
>>> +
>>> +
>>> +3. IO space issues
>>> +===================
>>> +The PCI Express Root Ports and PCI Express Downstream ports are seen by
>>> +Firmware/Guest OS as PCI-PCI bridges and, as required by PCI spec,
>>> +should reserve a 4K IO range for each even if only one (multifunction)
>>> +device can be plugged into them, resulting in poor IO space utilization.
>>> +
>>> +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
>>> +by not allocating IO space if possible:
>>> +    (1) - For empty PCI Express Root Ports/PCI Express Downstream ports.
>>> +    (2) - If the device behind the PCI Express Root Ports/PCI Express
>>> +          Downstream has no IO BARs.
>>> +
>>> +The IO space is very limited, 65536 byte-wide IO ports, but it's fragmented
>>> +resulting in ~10 PCI Express Root Ports (or PCI Express Downstream/Upstream ports)
>>> +ports per system if devices with IO BARs are used in the PCI Express hierarchy.
>>> +
>>> +Using the proposed device placing strategy solves this issue
>>> +by using only PCI Express devices within PCI Express hierarchy.
>>> +
>>> +The PCI Express spec requires the PCI Express devices to work without using IO.
>>> +The PCI hierarchy has no such limitations.
>
> After reading this section and piecing all bits together, my
> understanding is that
>
>   a) the spec requires firmware and OS to allocate 4K of IO
>      space for each PCI Express Root Port or PCI Express
>      Switch Downstream Port

at least

>
>   b) that's necessary because legacy PCI Devices require IO
>      space to operate
>

right

>   c) however, IO space is a very limited resource
>

right again

>   d) the spec also requires PCI Express Devices (or rather
>      Endpoints?) to work without using any IO space
>

PCI Express Devices, not Endpoints.


>   e) if we stick to the plan outlined in this document, there
>      will never be legacy PCI Devices plugged into PCI Express
>      Root Ports or PCI Express Switch Downstream Ports (only
>      PCI Express Endpoints or PCI Express Switch Upstream
>      Ports)
>

Yes, this is the idea.

>   f) thanks to d) and e), the firmware (and OS? IIRC Linux
>      currently doesn't do this) can ignore a) and allocate no
>      IO space for PCI Express Root Ports and PCI Express
>      Switch Downstream Ports
>

Right, and even if the firmware/OS would try to allocate IO and fail,
a device not requiring IO should still be able to function properly.

>   g) thus effectively making c) irrelevant
>

Yes.

> If my reading is correct, I think we should put more emphasis
> on the fact that our device placement strategy is effective
> in avoiding IO space exaustion because we stick to PCI Express
> Devices only which, unlike legacy PCI Devices, require no IO
> space to operate.
>

Well, this is what the "3." point is about.
But since the device placement "theory" proposed by this document
is not aimed only for the IO limitation, we have several points,
each of them explaining another reason to do so.

>>> +4. Bus numbers issues
>>> +======================
>>> +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
>>> +machines do not support multiple PCI domains even if extra Root
>>> +Complexes (pxb-pcie) are used.
>>> +
>>> +Each element of the PCI Express hierarchy (Root Complexes,
>>> +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
>>> +takes up bus numbers. Since only one (multifunction) device
>>> +can be attached to a PCI Express Root Port or PCI Express Downstream
>>> +Port it is advised to plan in advance for the expected number of
>>> +devices to prevent bus numbers starvation.
>>> +
>>> +
>>> +5. Hot Plug
>
> Hot Plug, hot-plug or hotplug?
>
> Pick one and stick with it :)
>

You got me here :) I'll pick one.


>>> +============
>>> +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
>>> +do not support hot-plug, so any devices plugged into Root Complexes
>>> +cannot be hot-plugged/hot-unplugged:
>>> +    (1) PCI Express Integrated Devices
>>> +    (2) PCI Express Root Ports
>>> +    (3) DMI-PCI bridges
>>> +    (4) pxb-pcie
>>> +
>>> +PCI devices can be hot-plugged into PCI-PCI bridges, however cannot
>>> +be hot-plugged into DMI-PCI bridges.
>>> +The PCI hotplug is ACPI based and can work side by side with the
>>> +PCI Express native hotplug.
>>> +
>>> +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
>>> +PCI Express Root Ports (and PCI Express Downstream Ports).
>>> +
>>> +5.1 Planning for hotplug:
>>> +    (1) PCI hierarchy
>>> +        Leave enough PCI-PCI bridge slots empty or add one
>>> +        or more empty PCI-PCI bridges to the DMI-PCI bridge.
>>> +
>>> +        For each such bridge the Guest Firmware is expected to reserve 4K IO
>>> +        space and 2M MMIO range to be used for all devices behind it.
>>> +
>>> +        Because of the hard IO limit of around 10 PCI bridges (~ 40K space) per system
>>> +        don't use more than 9 bridges, leaving 4K for the Integrated devices
>>> +        and none for the PCI Express Hierarchy.
>
> I would rewrite the last line as
>
>   (the PCI Express hierarchy needs no IO space).
>
> or something like that. Unless I misunderstood :)


maybe "...the PCI Express hierarchy that needs no IO space"

>
>>> +    (2) PCI Express hierarchy:
>>> +        Leave enough PCI Express Root Ports empty. Use multifunction
>>> +        PCI Express Root Ports to prevent going out of PCI bus numbers.
>
> When you say "multifunction PCI Express Root Ports", are you
> suggesting the user should take advantage of all 8 functions
> in a PCI Express Root Port, eg. by plugging 4 Ethernet
> adapters into the same PCI Express Root Port instead of using
> a single PCI Express Root Port for each one of them, or that
> multiple PCI Express Root Ports should be plugged into
> functions of the same pcie.0 slot?
>


The later, please see the example 2.2.2

> Either way, I don't see how doing so would prevent you from
> running out of PCI bus numbers - if anything, not doing the
> latter will make it so you'll run out of ports way before
> you come anywhere close to 256 PCI buses.
>

You are right, the reason is to not run out of ports. (the context is hot-plug here)

> Point is, being less terse here could be very helpful.
>
>>> +        Don't use PCI Express Switches if you don't have too, they use
>>> +        an extra PCI bus that may handy to plug another device id it comes to it.
>
> s/too/to/
> s/may handy/may come handy/
> s/id it comes/if it comes/
>

thanks again

>>> +5.3 Hot plug example:
>>> +Using HMP: (add -monitor stdio to QEMU command line)
>>> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>
>>> +
>>> +
>>> +6. Device assignment
>>> +====================
>>> +Host devices are mostly PCI Express and should be plugged only into
>>> +PCI Express Root Ports or PCI Express Downstream Ports.
>>> +PCI-PCI bridge slots can be used for legacy PCI host devices.
>>> +
>>> +6.1 How to detect if a device is PCI Express:
>>> +  > lspci -s 03:00.0 -v (as root)
>>> +
>>> +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
>>> +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
>>> +    Flags: bus master, fast devsel, latency 0, IRQ 50
>>> +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
>>> +    Capabilities: [c8] Power Management version 3
>>> +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>> +    Capabilities: [40] Express Endpoint, MSI 00
>>> +
>
> I'd skip this empty line...
>
>>> +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> +
>
> ... and this one too.
>
>>> +    Capabilities: [100] Advanced Error Reporting
>>> +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
>>> +    Capabilities: [14c] Latency Tolerance Reporting
>>> +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014
>
> I'd also add a small note along the lines of
>
>   If you can see the "Express Endpoint" capability in the
>   output, then the device is indeed PCI Express.
>


OK

>>> +7. Virtio devices
>>> +=================
>>> +Virtio devices plugged into the PCI hierarchy or as Integrated Devices
>>> +will remain PCI and have transitional behaviour as default.
>>> +Transitional virtio devices work in both IO and MMIO modes depending on
>>> +the guest support.
>>> +
>>> +Virtio devices plugged into PCI Express ports are PCI Express devices and
>>> +have "1.0" behavior by default without IO support.
>>> +In both case disable-* properties can be used to override the behaviour.
>>> +
>>> +Note that setting disable-legacy=off will enable legacy mode (enabling
>>> +legacy behavior) for PCI Express virtio devices causing them to
>>> +require IO space, which, given our PCI Express hierarchy, may quickly
>>> +lead to resource exhaustion, and is therefore strongly discouraged.
>>> +
>>> +
>>> +8. Conclusion
>>> +==============
>>> +The proposal offers a usage model that is easy to understand and follow
>>> +and in the same time overcomes the PCI Express architecture limitations.
>
> s/in the same/at the same/
>
>
> Overall I think the contents we care about are pretty much
> there and are exposed and organized in a very clear fashion;
> pretty much all of my comments are meant to improve the few
> areas where I believe we could make things even easier to
> grasp for the reader, remove potential ambiguity, or be more
> consistent.
>
> Thanks for working on this! Looking forward to v3 ;)
>

Thanks a lot, the review is valuable and much appreciated!
Marcel

> --
> Andrea Bolognani / Red Hat / Virtualization
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-27 11:27     ` Marcel Apfelbaum
@ 2016-10-27 15:44       ` Laszlo Ersek
  2016-10-27 18:06         ` Marcel Apfelbaum
  0 siblings, 1 reply; 13+ messages in thread
From: Laszlo Ersek @ 2016-10-27 15:44 UTC (permalink / raw)
  To: Marcel Apfelbaum, qemu-devel
  Cc: Peter Maydell, Andrew Jones, Michael S. Tsirkin,
	Andrea Bolognani, Alex Williamson, Gerd Hoffmann, Laine Stump

On 10/27/16 13:27, Marcel Apfelbaum wrote:
> On 10/14/2016 02:36 PM, Laszlo Ersek wrote:
>> On 10/13/16 16:05, Marcel Apfelbaum wrote:
>>> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:

>>>> +      -device
>>>> pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]
>>>> \
>>>> +      -device <dev>,bus=pci_bridge1[,addr=x]
>>
>> {13} It would be nice to spell out the valid device addresses (y and x)
>> here too -- can we use 0 for them? SHPC again?
>>
>> Can we / should we simply go with >=1 device addresses?
>>
> 
> For pci-bridges only - yes. A better idea (I think) is to disable SHPC
> by default
> from the next QEMU version. Make this slot usable. Sounds OK?

Sure.

>>>> +        Don't use PCI Express Switches if you don't have too, they use
>>>> +        an extra PCI bus that may handy to plug another device id it
>>>> comes to it.
>>>> +
>>
>> {25} I'd put it as: Don't use PCI Express Switches if you don't have
>> too, each one of those uses an extra PCI bus (for its Upstream Port)
>> that could be put to better use with another Root Port or Downstream
>> Port, which may come handy for hotplugging another device.
>>
>> {26} Another remark (important to me) in this section: the document
>> doesn't state firmware expectations. It's clear the firmware is expected
>> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
>> but what about MMIO?
>>
>> We discussed this at length with Alex, but I think we didn't conclude
>> anything. It would be nice if firmware received some instructions from
>> this document in this regard, even before we implement our own ports and
>> bridges in QEMU.
>>
> 
> Hmm, I have no idea what to add here, except:
> The firmware is expected to reserve at least 2M for each pci bridge?

Just ignore {26} for now please. I'll come back to this later when we
have our own (generic) ports.

>>>> +5.3 Hot plug example:
>>>> +Using HMP: (add -monitor stdio to QEMU command line)
>>>> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port
>>>> Id/PCI-PCI bridge Id/pxb-pcie Id>
>>
>> {27} I think the bus=<...> part is incorrect here. Based on the rest of
>> the guidelines, we have to specify the ID of:
>> - a PCI Express Root Port, or
>> - a PCI Express Downstream Port, or
>> - a PCI-PCI bridge.
>>
> 
> I don't get it, you specify what you wrote above as the bus, right?
> For example if you start the machine with
>     .... -device ioh3420,id=root_port1,
> you hotplug with: device_add e1000,bus=root_port1.

My point is that your example names

  bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>

which suggests that the following bus *types* can receive hotplugged
devices:
(a) main root bus ("pcie.0")
(c) root port ("PCI Express Root Port Id")
(c) PCI-PCI bridge ("PCI-PCI bridge Id")
(d) extra root bus ("pxb-pcie Id")

Based on the rest of the guidelines, suggestions (a) and (d) are invalid
-- those bus types cannot accept hotplugged devices --, plus option (e)
is missing, namely:
(e) downstream port.

In other words, your example IDs *infer* bus types, and the set of bus
types inferred is both wrong (incorrect elements) and incomplete (one
correct element missing).

Therefore my proposal is to provide the following example:

  bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI
bridge Id>

That's all.

Cheers
Laszlo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines
  2016-10-27 15:44       ` Laszlo Ersek
@ 2016-10-27 18:06         ` Marcel Apfelbaum
  0 siblings, 0 replies; 13+ messages in thread
From: Marcel Apfelbaum @ 2016-10-27 18:06 UTC (permalink / raw)
  To: Laszlo Ersek, qemu-devel
  Cc: Peter Maydell, Andrew Jones, Michael S. Tsirkin,
	Andrea Bolognani, Alex Williamson, Gerd Hoffmann, Laine Stump

On 10/27/2016 06:44 PM, Laszlo Ersek wrote:
> On 10/27/16 13:27, Marcel Apfelbaum wrote:
>> On 10/14/2016 02:36 PM, Laszlo Ersek wrote:
>>> On 10/13/16 16:05, Marcel Apfelbaum wrote:
>>>> On 10/13/2016 04:52 PM, Marcel Apfelbaum wrote:
>
>>>>> +      -device
>>>>> pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]
>>>>> \
>>>>> +      -device <dev>,bus=pci_bridge1[,addr=x]
>>>
>>> {13} It would be nice to spell out the valid device addresses (y and x)
>>> here too -- can we use 0 for them? SHPC again?
>>>
>>> Can we / should we simply go with >=1 device addresses?
>>>
>>
>> For pci-bridges only - yes. A better idea (I think) is to disable SHPC
>> by default
>> from the next QEMU version. Make this slot usable. Sounds OK?
>
> Sure.
>
>>>>> +        Don't use PCI Express Switches if you don't have too, they use
>>>>> +        an extra PCI bus that may handy to plug another device id it
>>>>> comes to it.
>>>>> +
>>>
>>> {25} I'd put it as: Don't use PCI Express Switches if you don't have
>>> too, each one of those uses an extra PCI bus (for its Upstream Port)
>>> that could be put to better use with another Root Port or Downstream
>>> Port, which may come handy for hotplugging another device.
>>>
>>> {26} Another remark (important to me) in this section: the document
>>> doesn't state firmware expectations. It's clear the firmware is expected
>>> to reserve no IO space for PCI Express Downstream Ports and Root Ports,
>>> but what about MMIO?
>>>
>>> We discussed this at length with Alex, but I think we didn't conclude
>>> anything. It would be nice if firmware received some instructions from
>>> this document in this regard, even before we implement our own ports and
>>> bridges in QEMU.
>>>
>>
>> Hmm, I have no idea what to add here, except:
>> The firmware is expected to reserve at least 2M for each pci bridge?
>
> Just ignore {26} for now please. I'll come back to this later when we
> have our own (generic) ports.
>
>>>>> +5.3 Hot plug example:
>>>>> +Using HMP: (add -monitor stdio to QEMU command line)
>>>>> +  device_add <dev>,id=<id>,bus=<pcie.0/PCI Express Root Port
>>>>> Id/PCI-PCI bridge Id/pxb-pcie Id>
>>>
>>> {27} I think the bus=<...> part is incorrect here. Based on the rest of
>>> the guidelines, we have to specify the ID of:
>>> - a PCI Express Root Port, or
>>> - a PCI Express Downstream Port, or
>>> - a PCI-PCI bridge.
>>>
>>
>> I don't get it, you specify what you wrote above as the bus, right?
>> For example if you start the machine with
>>     .... -device ioh3420,id=root_port1,
>> you hotplug with: device_add e1000,bus=root_port1.
>
> My point is that your example names
>
>   bus=<pcie.0/PCI Express Root Port Id/PCI-PCI bridge Id/pxb-pcie Id>
>
> which suggests that the following bus *types* can receive hotplugged
> devices:
> (a) main root bus ("pcie.0")
> (c) root port ("PCI Express Root Port Id")
> (c) PCI-PCI bridge ("PCI-PCI bridge Id")
> (d) extra root bus ("pxb-pcie Id")
>
> Based on the rest of the guidelines, suggestions (a) and (d) are invalid
> -- those bus types cannot accept hotplugged devices --, plus option (e)
> is missing, namely:
> (e) downstream port.
>
> In other words, your example IDs *infer* bus types, and the set of bus
> types inferred is both wrong (incorrect elements) and incomplete (one
> correct element missing).
>
> Therefore my proposal is to provide the following example:
>
>   bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI
> bridge Id>
>

And now I understood the 'bug' :)

Thanks,
Marcel

> That's all.
>
> Cheers
> Laszlo
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-10-27 18:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-13 13:52 [Qemu-devel] [PATCH V2 RESEND] docs: add PCIe devices placement guidelines Marcel Apfelbaum
2016-10-13 14:05 ` Marcel Apfelbaum
2016-10-14 11:36   ` Laszlo Ersek
2016-10-17 12:07     ` Gerd Hoffmann
2016-10-17 14:07       ` Laszlo Ersek
2016-10-27 11:28         ` Marcel Apfelbaum
2016-10-27 11:27     ` Marcel Apfelbaum
2016-10-27 15:44       ` Laszlo Ersek
2016-10-27 18:06         ` Marcel Apfelbaum
2016-10-17 14:18   ` Andrea Bolognani
2016-10-17 14:26     ` Laszlo Ersek
2016-10-17 14:53       ` Andrea Bolognani
2016-10-27 14:36     ` Marcel Apfelbaum

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.