archive mirror
 help / color / mirror / Atom feed
From: (Casey Leedom)
Subject: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
Date: Wed, 9 Aug 2017 12:33:34 +0000	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

| From: Bjorn Helgaas <>
| Sent: Tuesday, August 8, 2017 7:22 PM
| ...
| and the caller should do something like this:
|     if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev)))
|         adapter->flags |= ROOT_NO_RELAXED_ORDERING;
| That way it's obvious where the issue is, and it's obvious that the
| answer might be different for peer-to-peer transactions than it is for
| transactions to the root port, i.e., to coherent memory.
| ...

Which is back to something very close to what I initially suggested in my
first patch submission.  Because you're right, this isn't about broken
Source Devices, it's about broken Completing Devices.

Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be
able to see our Root Port in order to make this determination.  (And in some
Hypervisor implementations I've seen, there's not even a synthetic Root Port
available to the VM at all, let alone read-only access to the real one.)

So the current scheme was developed of having the Hypervisor kernel traverse
down the PCIe Fabric when it finds a broken Root Port implementation (the
issue that we've mostly been primarily focused upon), and turning off the
PCIe Capability Device Control[Relaxed Ordering Enable].  This was to serve
two purposes:

 1. Turn that off in order to prevent sending any Transaction Layer
    Packets with the Relaxed Ordering Attribute to any Completer.
    Which unfortunately would also prevent Peer-to-Peer use of the
    Relaxed Ordering Attribute.

 2. Act as a message to Device Drivers for those downstream devices
    that the they were dealing with a broken Root Port implementation.
    And this would work even for a driver in a VM with an attached
    device since it would be able to see the PCIe Configuration Space
    for the attached device.

I haven't been excited about any of this because:

 A. While so far all of the examples we've talked about are broken
    Root Port Completers, it's perfectly possible that other devices
    could be broken -- say an NVMe Device which is not "Coherent
    Memory".  How would this fit into the framework APIs being

 B. I have yet to see a n example of how the currently proposed
    API framework would be used in a hybrid environment where
    TLPs to the Root Port would not use Relaxed Ordering, but TLPs
    to a Peer would use Relaxed Ordering.  So far its all been about
    using a "big hammer" to completely disable the use of Relaxed

But the VM problem keeps cropping up over and over.  A driver in a VM
doesn't have access to the Root Port to determine if its on a "Black List"
and our only way of communicating with the driver in the VM is to leave the
device in a particular state (i.e. initialize the PCIe Capability Device
Control[Relaxed Ordering Enable] to "off").

Oh, and also, on the current patch submission's focus on broken Root Port
implementations: one could suggest that even if we're stuck with the "Device
attached to a VM Conundrum", that what we should really be thinking about is
if ANY device within a PCIe Fabric has broken Relaxed Ordering completion
problems, and, if so, "poisoning" the entire containing PCIe Fabric by
turning off Relaxed Ordering Enable for every device, up, down sideways --
including the Root Port itself.

| ...
| This associates the message with the Requester that may potentially
| use relaxed ordering. But there's nothing wrong or unusual about the
| Requester; the issue is with the *Completer*, so I think the message
| should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING.
| Maybe it should be both places; I dunno.
| This implementation assumes the device only initiates transactions to
| coherent memory, i.e., it assumes the device never does peer-to-peer
| DMA.  I guess we'll have to wait and see if we trip over any
| peer-to-peer issues, then figure out how to handle them.
| ...

Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I
mentioned above.  And that the Intel document mentions itself.


  parent reply	other threads:[~2017-08-09 12:33 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-08-05  7:15 [PATCH v9 0/4] Add new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 1/4] PCI: Add new PCIe Fabric End Node flag, PCI_DEV_FLAGS_NO_RELAXED_ORDERING Ding Tianhong
2017-08-08 23:22   ` Bjorn Helgaas
2017-08-09  1:40     ` Casey Leedom
2017-08-09  3:02       ` Bjorn Helgaas
2017-08-09 12:17         ` Ding Tianhong
2017-08-09 16:36           ` Casey Leedom
2017-08-09 15:58     ` Raj, Ashok
2017-08-09 16:46       ` Casey Leedom
2017-08-09 18:00         ` Raj, Ashok
2017-08-09 20:11           ` Casey Leedom
2017-08-05  7:15 ` [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported Ding Tianhong
2017-08-09  2:22   ` Bjorn Helgaas
2017-08-09  3:25     ` Bjorn Helgaas
2017-08-09 13:42       ` Ding Tianhong
2017-08-09 12:33     ` Casey Leedom [this message]
2017-08-09 13:23     ` Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 3/4] net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag Ding Tianhong
2017-08-05  7:15 ` [PATCH v9 4/4] net/cxgb4vf: " Ding Tianhong
2017-08-07  3:47 ` [PATCH v9 0/4] Add " David Miller
2017-08-07  4:13   ` Ding Tianhong
2017-08-07 21:14     ` David Miller
2017-08-08  1:56       ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).