xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Alexey Gerasimenko <x1917x@gmail.com>
To: xen-devel@lists.xenproject.org
Cc: Anthony Perard <anthony.perard@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Alexey Gerasimenko <x1917x@gmail.com>,
	qemu-devel@nongnu.org
Subject: [RFC PATCH 21/30] xen/pt: Xen PCIe passthrough support for Q35: bypass PCIe topology check
Date: Tue, 13 Mar 2018 04:34:06 +1000	[thread overview]
Message-ID: <bc91bcb3b29bda26289019574cac4302eb7ca085.1520867956.git.x1917x@gmail.com> (raw)
In-Reply-To: <cover.1520867740.git.x1917x@gmail.com>
In-Reply-To: <cover.1520867740.git.x1917x@gmail.com>

Compared to legacy i440 system, there are certain difficulties while
passing through PCIe devices to guest OSes like Windows 7 and above
on platforms with native support of PCIe bus (in our case Q35). This
problem is not applicable to older OSes like Windows XP -- PCIe
passthrough on such OSes can be used normally as these OSes have
no support for PCIe-specific features and treat all PCIe devices as legacy
PCI ones.

The problem manifests itself as "Code 10" error for a passed thru PCIe
device in Windows Device Manager (along with exclamation mark on it). The
device with such error do not function no matter the fact that Windows
successfully booted while actually using this device, ex. as a primary VGA
card with VBE features, LFB, etc. working properly during boot time.
It doesn't matter which PCI class the device have -- the problem is common
to GPUs, NIC cards, USB controllers, etc. In the same time, all these
devices can be passed thru successfully using i440 emulation on same
Windows 7+ OSes.

The actual root cause of the problem lies in the fact that Windows kernel
(PnP manager particularly) while processing StartDevice IRP refuses
to continue to start the device and control flow actually doesn't even
reach the IRP handler in the device driver at all. The real reason for
this typically does not appear at the time PnP manager tries to start the
device, but happens much earlier -- during the Windows boot stage, while
enumerating devices on a PCI/PCIe bus in the Windows pci.sys driver. There
is a set of checks for every discovered device on the PCIe bus. Failing
some of them leads to marking the discovered PCIe device as 'invalid'
by setting the flag. Later on, StartDevice attempt will fail due to this
flag, finally resulting in Code 10 error.

The actual check in pci.sys which results in the PCIe device being marked
as 'invalid' in our case is a validation of upstream PCIe bus hierarchy
to which passed through device belongs. Basically, pci.sys checks if the
PCIe device has parent devices, such as PCIe Root Port or upstream PCIe
switch. In our case the PCIe device has no parents and resides on bus
0 without eg. corresponding Root Port.

Therefore, in order to resolve this problem in a architecturally correct
way, we need to introduce to Xen some support of at least trivial non-flat
PCI bus hierarchy. In very simplest case - just one virtual Root Port,
on secondary bus of which all physical functions of the real passed thru
device will reside, eg. GPU and its HDAudio function.

This solution is not hard to implement technically, but there are multiple
affecting limitations present in Xen (many related to each other)
currently:

- in many places the code is limited to use bus 0 only. This applicable
  to both hypervisor and supplemental modules like hvmloader. This
  limitation is enforced on API level -- many functions and interfaces
  allow to specify only devfn argument while bus 0 being implied.

- lot of code assumes Type0 PCI config space layout only, while we need
  to handle Type1 PCI devices as well

- currently there no way to assign to a guest domain even a simplest
  linked hierarchy of passed thru PCI devices. In some cases we might need
  to passthrough a real PCIe Switch/Root Port with his downstream child
  devices.

- in a similar way Xen/hvmloader lacks the concept of IO/MMIO space
  nesting. Both code which does MMIO hole sizing and code which allocates
  BARs to MMIO hole have no idea of MMIO ranges nesting and their relations.
  In case of virtual Root Port we have basically an emulated PCI-PCI bridge
  with some parts of its MMIO range used for real MMIO ranges of passed
  through device(s).

So, adding to Xen multiple PCI buses support will require a bit of effort
and discussions regarding the actual design of the feature.  Nevertheless,
this task is crucial for PCI/GPU passthrough features of Xen to work
properly.

To summarize, we need to implement following things in the future:
1) Get rid of PCI bus 0 limitation everywhere. This could've been
  a simplest of subtasks but in reality this will require to change
  interfaces as well - AFAIR even adding a PCI device via QMP only allows
  to specify a device slot while we need to have some way to place the
  device on an arbitrary bus.

2) Fully or partially emulated PCI-PCI bridge which will provide
  a secondary bus for PCIe device placement - there might be a possibility
  to reuse some existing emulation QEMU provides. This also includes Type1
  devices support.
  The task will become more complicated if there arise necessity, for
  example, to control the PCIe link for a passed through PCIe device. As PT
  device reset is mandatory in most cases, there might be a chance
  to encounter a situation when we need to retrain the PCIe link to restore
  PCIe link speed after the reset. In this case there will be a need
  to selectively translate accesses to certain registers of emulated PCIe
  Switch/Root Port to the corresponding physical upstream PCIe
  Switch/RootPort. This will require some interaction with Dom0, hopefully
  extending xen-pciback will be enough.

3) The concept of I/O and MMIO ranges nesting, for tasks like sizing MMIO
  hole or PCI BAR allocation. This one should be pretty simple.

The actual implementation still is a matter to discuss of course.

In the meantime there can be used a very simple workaround which allows
to bypass pci.sys limitation for PCIe topology check - there exist one
good exception to "must have upstream PCIe parent" rule of pci.sys. It's
chipset-integrated devices. How pci.sys can tell if it deals with
a chipset built-in device? It checks one of PCI Express Capability fields
in the device PCI conf space. For chipset built-in devices this field will
state "root complex integrated device" while in our  case for a normal
passed thru PCIe device there will be a "PCIe endpoint" type. So that's
what the workaround does - it intercepts reading of this particular field
for passed through devices and returns the "root complex integrated
device" value for PCIe endpoints. This makes pci.sys happy and allows
Windows 7 and above to use PT device on PCIe-capable system normally.
So far no negative side effects were encountered while using this
approach, so it's a good temporary solution until multiple PCI bus support
will be added to Xen.

Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
---
 hw/xen/xen_pt_config_init.c | 60 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

diff --git a/hw/xen/xen_pt_config_init.c b/hw/xen/xen_pt_config_init.c
index 02e8c97f3c..91de215407 100644
--- a/hw/xen/xen_pt_config_init.c
+++ b/hw/xen/xen_pt_config_init.c
@@ -902,6 +902,55 @@ static int xen_pt_linkctrl2_reg_init(XenPCIPassthroughState *s,
     *data = reg_field;
     return 0;
 }
+/* initialize PCI Express Capabilities register */
+static int xen_pt_pcie_capabilities_reg_init(XenPCIPassthroughState *s,
+                                             XenPTRegInfo *reg,
+                                             uint32_t real_offset,
+                                             uint32_t *data)
+{
+    uint8_t dev_type = get_pcie_device_type(s);
+    uint16_t reg_field;
+
+    if (xen_host_pci_get_word(&s->real_device,
+                             real_offset - reg->offset + PCI_EXP_FLAGS,
+                             &reg_field)) {
+        XEN_PT_ERR(&s->dev, "Error reading PCIe Capabilities reg\n");
+        *data = 0;
+        return 0;
+    }
+
+    /*
+     * Q35 workaround for Win7+ pci.sys PCIe topology check.
+     * As our PT device currently located on a bus 0, fake the
+     * device/port type field to the "Root Complex integrated device"
+     * value to bypass the check
+     */
+    switch (dev_type) {
+    case PCI_EXP_TYPE_ENDPOINT:
+    case PCI_EXP_TYPE_LEG_END:
+        XEN_PT_LOG(&s->dev, "Original PCIe Capabilities reg is 0x%04X\n",
+            reg_field);
+        reg_field &= ~PCI_EXP_FLAGS_TYPE;
+        reg_field |= ((PCI_EXP_TYPE_RC_END /*9*/ << 4) & PCI_EXP_FLAGS_TYPE);
+        XEN_PT_LOG(&s->dev, "Q35 PCIe topology check workaround: "
+                   "faking Capabilities reg to 0x%04X\n", reg_field);
+        break;
+
+    case PCI_EXP_TYPE_ROOT_PORT:
+    case PCI_EXP_TYPE_UPSTREAM:
+    case PCI_EXP_TYPE_DOWNSTREAM:
+    case PCI_EXP_TYPE_PCI_BRIDGE:
+    case PCI_EXP_TYPE_PCIE_BRIDGE:
+    case PCI_EXP_TYPE_RC_END:
+    case PCI_EXP_TYPE_RC_EC:
+    default:
+        /* do nothing, return as is */
+        break;
+    }
+
+    *data = reg_field;
+    return 0;
+}
 
 /* PCI Express Capability Structure reg static information table */
 static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
@@ -916,6 +965,17 @@ static XenPTRegInfo xen_pt_emu_reg_pcie[] = {
         .u.b.read   = xen_pt_byte_reg_read,
         .u.b.write  = xen_pt_byte_reg_write,
     },
+    /* PCI Express Capabilities Register */
+    {
+        .offset     = PCI_EXP_FLAGS,
+        .size       = 2,
+        .init_val   = 0x0000,
+        .ro_mask    = 0xFFFF,
+        .emu_mask   = 0xFFFF,
+        .init       = xen_pt_pcie_capabilities_reg_init,
+        .u.w.read   = xen_pt_word_reg_read,
+        .u.w.write  = xen_pt_word_reg_write,
+    },
     /* Device Capabilities reg */
     {
         .offset     = PCI_EXP_DEVCAP,
-- 
2.11.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  parent reply	other threads:[~2018-03-12 18:35 UTC|newest]

Thread overview: 155+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-12 18:33 [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices Alexey Gerasimenko
2018-03-12 18:33 ` [RFC PATCH 01/12] libacpi: new DSDT ACPI table for Q35 Alexey Gerasimenko
2018-03-12 19:38   ` Konrad Rzeszutek Wilk
2018-03-12 20:10     ` Alexey G
2018-03-12 20:32       ` Konrad Rzeszutek Wilk
2018-03-12 21:19         ` Alexey G
2018-03-13  2:41           ` Tian, Kevin
2018-03-19 12:43   ` Roger Pau Monné
2018-03-19 13:57     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 02/12] Makefile: build and use new DSDT " Alexey Gerasimenko
2018-03-19 12:46   ` Roger Pau Monné
2018-03-19 14:18     ` Alexey G
2018-03-19 13:07   ` Jan Beulich
2018-03-19 14:10     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 03/12] hvmloader: add function to query an emulated machine type (i440/Q35) Alexey Gerasimenko
2018-03-13 17:26   ` Wei Liu
2018-03-13 17:58     ` Alexey G
2018-03-13 18:04       ` Wei Liu
2018-03-19 12:56   ` Roger Pau Monné
2018-03-19 16:26     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 04/12] hvmloader: add ACPI enabling for Q35 Alexey Gerasimenko
2018-03-13 17:26   ` Wei Liu
2018-03-19 13:01   ` Roger Pau Monné
2018-03-19 23:59     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 05/12] hvmloader: add Q35 DSDT table loading Alexey Gerasimenko
2018-03-19 14:45   ` Roger Pau Monné
2018-03-20  0:15     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 06/12] hvmloader: add basic Q35 support Alexey Gerasimenko
2018-03-19 15:30   ` Roger Pau Monné
2018-03-19 23:44     ` Alexey G
2018-03-20  9:20       ` Roger Pau Monné
2018-03-20 21:23         ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 07/12] hvmloader: allocate MMCONFIG area in the MMIO hole + minor code refactoring Alexey Gerasimenko
2018-03-19 15:58   ` Roger Pau Monné
2018-03-19 19:49     ` Alexey G
2018-03-20  8:50       ` Roger Pau Monné
2018-03-20  9:25         ` Paul Durrant
2018-03-21  0:58         ` Alexey G
2018-03-21  9:09           ` Roger Pau Monné
2018-03-21  9:36             ` Paul Durrant
2018-03-21 14:35               ` Alexey G
2018-03-21 14:58                 ` Paul Durrant
2018-03-21 14:25             ` Alexey G
2018-03-21 14:54               ` Paul Durrant
2018-03-21 17:41                 ` Alexey G
2018-03-21 15:20               ` Roger Pau Monné
2018-03-21 16:56                 ` Alexey G
2018-03-21 17:06                   ` Paul Durrant
2018-03-22  0:31                     ` Alexey G
2018-03-22  9:04                       ` Jan Beulich
2018-03-22  9:55                         ` Alexey G
2018-03-22 10:06                           ` Paul Durrant
2018-03-22 11:56                             ` Alexey G
2018-03-22 12:09                               ` Jan Beulich
2018-03-22 13:05                                 ` Alexey G
2018-03-22 13:20                                   ` Jan Beulich
2018-03-22 14:34                                     ` Alexey G
2018-03-22 14:42                                       ` Jan Beulich
2018-03-22 15:08                                         ` Alexey G
2018-03-23 13:57                                           ` Paul Durrant
2018-03-23 22:32                                             ` Alexey G
2018-03-26  9:24                                               ` Roger Pau Monné
2018-03-26 19:42                                                 ` Alexey G
2018-03-27  8:45                                                   ` Roger Pau Monné
2018-03-27 15:37                                                     ` Alexey G
2018-03-28  9:30                                                       ` Roger Pau Monné
2018-03-28 11:42                                                         ` Alexey G
2018-03-28 12:05                                                           ` Paul Durrant
2018-03-28 10:03                                                       ` Paul Durrant
2018-03-28 14:14                                                         ` Alexey G
2018-03-21 17:15                   ` Roger Pau Monné
2018-03-21 22:49                     ` Alexey G
2018-03-22  9:29                       ` Paul Durrant
2018-03-22 10:05                         ` Roger Pau Monné
2018-03-22 10:09                           ` Paul Durrant
2018-03-22 11:36                             ` Alexey G
2018-03-22 10:50                         ` Alexey G
2018-03-22  9:57                       ` Roger Pau Monné
2018-03-22 12:29                         ` Alexey G
2018-03-22 12:44                           ` Roger Pau Monné
2018-03-22 15:31                             ` Alexey G
2018-03-23 10:29                               ` Paul Durrant
2018-03-23 11:38                                 ` Jan Beulich
2018-03-23 13:52                                   ` Paul Durrant
2018-05-29 14:23   ` Jan Beulich
2018-05-29 17:56     ` Alexey G
2018-05-29 18:47       ` Alexey G
2018-05-30  4:32         ` Alexey G
2018-05-30  8:13           ` Jan Beulich
2018-05-31  4:25             ` Alexey G
2018-05-30  8:12         ` Jan Beulich
2018-05-31  5:15           ` Alexey G
2018-06-01  5:30             ` Jan Beulich
2018-06-01 15:53               ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 08/12] libxl: Q35 support (new option device_model_machine) Alexey Gerasimenko
2018-03-13 17:25   ` Wei Liu
2018-03-13 17:32     ` Anthony PERARD
2018-03-19 17:01   ` Roger Pau Monné
2018-03-19 22:11     ` Alexey G
2018-03-20  9:11       ` Roger Pau Monné
2018-03-21 16:27         ` Wei Liu
2018-03-21 17:03           ` Anthony PERARD
2018-03-21 16:25       ` Wei Liu
2018-03-12 18:33 ` [RFC PATCH 09/12] libxl: Xen Platform device support for Q35 Alexey Gerasimenko
2018-03-19 15:05   ` Alexey G
2018-03-21 16:32     ` Wei Liu
2018-03-12 18:33 ` [RFC PATCH 10/12] libacpi: build ACPI MCFG table if requested Alexey Gerasimenko
2018-03-19 17:33   ` Roger Pau Monné
2018-03-19 21:46     ` Alexey G
2018-03-20  9:03       ` Roger Pau Monné
2018-03-20 21:06         ` Alexey G
2018-05-29 14:36   ` Jan Beulich
2018-05-29 18:20     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 11/12] hvmloader: use libacpi to build MCFG table Alexey Gerasimenko
2018-03-14 17:48   ` Alexey G
2018-03-19 17:49   ` Roger Pau Monné
2018-03-19 21:20     ` Alexey G
2018-03-20  8:58       ` Roger Pau Monné
2018-03-20  9:36       ` Jan Beulich
2018-03-20 20:53         ` Alexey G
2018-03-21  7:36           ` Jan Beulich
2018-05-29 14:46   ` Jan Beulich
2018-05-29 17:26     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 12/12] docs: provide description for device_model_machine option Alexey Gerasimenko
2018-03-12 18:33 ` [RFC PATCH 13/30] pc/xen: Xen Q35 support: provide IRQ handling for PCI devices Alexey Gerasimenko
2018-03-14 10:48   ` Paolo Bonzini
     [not found]   ` <406abf99-4311-f08d-9f61-df72a9a3ef05@redhat.com>
2018-03-14 11:28     ` Alexey G
2018-03-12 18:33 ` [RFC PATCH 14/30] pc/q35: Apply PCI bus BSEL property for Xen PCI device hotplug Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 15/30] q35/acpi/xen: Provide ACPI PCI hotplug interface for Xen on Q35 Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 16/30] q35/xen: Add Xen platform device support for Q35 Alexey Gerasimenko
2018-03-12 19:44   ` Eduardo Habkost
     [not found]   ` <20180312194406.GX3417@localhost.localdomain>
2018-03-12 20:56     ` Alexey G
2018-03-12 21:44       ` Eduardo Habkost
     [not found]       ` <20180312214402.GY3417@localhost.localdomain>
2018-03-13 23:49         ` Alexey G
2018-03-13  9:24   ` [Qemu-devel] " Daniel P. Berrangé
2018-03-12 18:34 ` [RFC PATCH 17/30] q35: Fix incorrect values for PCIEXBAR masks Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 18/30] xen/pt: XenHostPCIDevice: provide functions for PCI Capabilities and PCIe Extended Capabilities enumeration Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 19/30] xen/pt: avoid reading PCIe device type and cap version multiple times Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 20/30] xen/pt: determine the legacy/PCIe mode for a passed through device Alexey Gerasimenko
2018-03-12 18:34 ` Alexey Gerasimenko [this message]
2018-03-12 18:34 ` [RFC PATCH 22/30] xen/pt: add support for PCIe Extended Capabilities and larger config space Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 23/30] xen/pt: handle PCIe Extended Capabilities Next register Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 24/30] xen/pt: allow to hide PCIe Extended Capabilities Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 25/30] xen/pt: add Vendor-specific PCIe Extended Capability descriptor and sizing Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 26/30] xen/pt: add fixed-size PCIe Extended Capabilities descriptors Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 27/30] xen/pt: add AER PCIe Extended Capability descriptor and sizing Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 28/30] xen/pt: add descriptors and size calculation for RCLD/ACS/PMUX/DPA/MCAST/TPH/DPC PCIe Extended Capabilities Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 29/30] xen/pt: add Resizable BAR PCIe Extended Capability descriptor and sizing Alexey Gerasimenko
2018-03-12 18:34 ` [RFC PATCH 30/30] xen/pt: add VC/VC9/MFVC PCIe Extended Capabilities descriptors " Alexey Gerasimenko
2018-03-13  9:21 ` [Qemu-devel] [RFC PATCH 00/30] Xen Q35 Bringup patches + support for PCIe Extended Capabilities for passed through devices Daniel P. Berrangé
2018-03-13 11:37   ` Alexey G
2018-03-13 11:44     ` Daniel P. Berrangé
2018-03-16 17:34 ` Alexey G
2018-03-16 18:26   ` Stefano Stabellini
2018-03-16 18:36   ` Roger Pau Monné

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc91bcb3b29bda26289019574cac4302eb7ca085.1520867956.git.x1917x@gmail.com \
    --to=x1917x@gmail.com \
    --cc=anthony.perard@citrix.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sstabellini@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).