From: Aya Levin <ayal@mellanox.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: David Miller <davem@davemloft.net>,
kuba@kernel.org, saeedm@mellanox.com, mkubecek@suse.cz,
linux-pci@vger.kernel.org, netdev@vger.kernel.org,
tariqt@mellanox.com, alexander.h.duyck@linux.intel.com,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering
Date: Tue, 14 Jul 2020 13:47:15 +0300
Message-ID: <5ecc588d-6397-37e4-3104-a32a639312f0@mellanox.com>
In-Reply-To: <20200708231630.GA472767@bjorn-Precision-5520>
On 7/9/2020 2:16 AM, Bjorn Helgaas wrote:
> On Wed, Jul 08, 2020 at 11:22:12AM +0300, Aya Levin wrote:
>> On 7/6/2020 10:49 PM, David Miller wrote:
>>> From: Aya Levin <ayal@mellanox.com>
>>> Date: Mon, 6 Jul 2020 16:00:59 +0300
>>>
>>>> Assuming the discussions with Bjorn will conclude in a well-trusted
>>>> API that ensures relaxed ordering is enabled, I'd still like a method
>>>> to turn off relaxed ordering for performance debugging sake.
>>>> Bjorn highlighted the fact that the PCIe sub system can only offer a
>>>> query method. Even if theoretically a set API will be provided, this
>>>> will not fit a netdev debugging - I wonder if CPU vendors even support
>>>> relaxed ordering set/unset...
>>>> On the driver's side relaxed ordering is an attribute of the mkey and
>>>> should be available for configuration (similar to number of CPU
>>>> vs. number of channels).
>>>> Based on the above, and binding the driver's default relaxed ordering
>>>> to the return value from pcie_relaxed_ordering_enabled(), may I
>>>> continue with previous direction of a private-flag to control the
>>>> client side (my driver) ?
>>>
>>> I don't like this situation at all.
>>>
>>> If RO is so dodgy that it potentially needs to be disabled, that is
>>> going to be an issue not just with networking devices but also with
>>> storage and other device types as well.
>>>
>>> Will every device type have a custom way to disable RO, thus
>>> inconsistently, in order to accommodate this?
>>>
>>> That makes no sense and is a terrible user experience.
>>>
>>> That's why the knob belongs generically in PCI or similar.
>>>
>> Hi Bjorn,
>>
>> Mellanox NIC supports relaxed ordering operation over DMA buffers.
>> However, for debug purposes we must have a chicken bit to disable
>> relaxed ordering on a specific system without affecting others at
>> run-time. In order to meet this requirement, I added a netdev
>> private-flag to ethtool as an API to set RO.
>>
>> Dave raised a concern regarding embedding relaxed ordering set API
>> per system (networking, storage and others). We need the ability to
>> manage relaxed ordering in a unified manner. Could you please define a
>> PCI sub-system solution to meet this requirement?
>
> I agree, this is definitely a mess. Let me just outline what I think
> we have today and what we're missing.
>
> - On the hardware side, device use of Relaxed Ordering is controlled
> by the Enable Relaxed Ordering bit in the PCIe Device Control
> register (or the PCI-X Command register). If set, the device is
> allowed but not required to set the Relaxed Ordering bit in
> transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec
> 7.2.3).
>
> I suspect there may be device-specific controls, too, because [1]
> claims to enable/disable Relaxed Ordering but doesn't touch the
> PCIe Device Control register. Device-specific controls are
> certainly allowed, but of course it would be up to the driver, and
> the device cannot generate TLPs with Relaxed Ordering unless the
> architected PCIe Enable Relaxed Ordering bit is *also* set.
>
> - Platform firmware can enable Relaxed Ordering for a device either
> before handoff to the OS or via the _HPX ACPI method.
>
> - The PCI core never enables Relaxed Ordering itself except when
> applying _HPX.
>
> - At enumeration-time, the PCI core disables Relaxed Ordering in
> pci_configure_relaxed_ordering() if the device is below a Root
> Port that has a quirk indicating an erratum. This quirk currently
> includes many Intel Root Ports, but not all, and is an ongoing
> maintenance problem.
>
> - The PCI core provides pcie_relaxed_ordering_enabled() which tells
> you whether Relaxed Ordering is enabled. Only used by cxgb4 and
> csio, which use that information to fill in Ingress Queue
> Commands.
>
> - The PCI core does not provide a driver interface to enable or
> disable Relaxed Ordering.
>
> - Some drivers disable Relaxed Ordering themselves: mtip32xx,
> netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA),
> tsi721, kp2000_pcie.
>
> - Some drivers enable Relaxed Ordering themselves: niu, tegra.
>
> What are we missing and what should the PCI core do?
>
> - Currently the Enable Relaxed Ordering bit depends on what firmware
> did. Maybe the PCI core should always clear it during
> enumeration?
>
> - The PCI core should probably have a driver interface like
> pci_set_relaxed_ordering(dev, enable) that can set or clear the
> architected PCI-X or PCIe Enable Relaxed Ordering bit.
>
> - Maybe there should be a kernel command-line parameter like
> "pci=norelax" that disables Relaxed Ordering for every device and
> prevents pci_set_relaxed_ordering() from enabling it.
>
> I'm mixed on this because these tend to become folklore about how
> to "fix" problems and we end up with systems that don't work
> unless you happen to find the option on the web. For debugging
> issues, it might be enough to disable Relaxed Ordering using
> setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0"
>
> [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/
>
Hi Bjorn,
Thanks for the detailed reply. From initial testing I can say that
turning off the relaxed ordering on the PCI (setpci -s02:00.0
CAP_EXP+8.w=0) is the chicken bit I was looking for.
This lowers the risk of depending on pcie_relaxed_ordering_enabled(). I
will update my patch and resubmit.
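
For reference, the setpci command above writes 0 to the whole 16-bit
Device Control register (offset 8 in the PCI Express capability), so it
clears Enable Relaxed Ordering together with every other bit in that
register. Enable Relaxed Ordering itself is bit 4 (0x0010, the value of
PCI_EXP_DEVCTL_RELAX_EN in the Linux headers). Below is a minimal
user-space sketch of the bit arithmetic only. The helper names are
hypothetical, nothing here touches hardware, and it is not the
pci_set_relaxed_ordering() interface proposed above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Enable Relaxed Ordering is bit 4 of the PCIe Device Control register
 * (same value as PCI_EXP_DEVCTL_RELAX_EN in the Linux headers). */
#define DEVCTL_RELAX_EN 0x0010u

/* Hypothetical helper: report whether the bit is set in a register value. */
static bool devctl_relaxed_ordering_enabled(uint16_t devctl)
{
	return (devctl & DEVCTL_RELAX_EN) != 0;
}

/* Hypothetical helper: set or clear only the Relaxed Ordering bit and
 * leave all other Device Control bits as they were. */
static uint16_t devctl_set_relaxed_ordering(uint16_t devctl, bool enable)
{
	if (enable)
		return devctl | DEVCTL_RELAX_EN;
	return devctl & (uint16_t)~DEVCTL_RELAX_EN;
}
```

If only that one bit should change on a live device, setpci's
value:mask form should do the same thing, e.g.
"setpci -s02:00.0 CAP_EXP+8.w=0:10" (not tested here).
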
Thanks,
Aya
Thread overview: 23+ messages
[not found] <20200623195229.26411-1-saeedm@mellanox.com>
[not found] ` <20200623195229.26411-11-saeedm@mellanox.com>
[not found] ` <20200623143118.51373eb7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
[not found] ` <dda5c2b729bbaf025592aa84e2bdb84d0cda7570.camel@mellanox.com>
[not found] ` <082c6bfe-5146-c213-9220-65177717c342@mellanox.com>
2020-06-24 17:22 ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Jakub Kicinski
2020-06-24 20:15 ` Saeed Mahameed
[not found] ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2020-07-06 13:00 ` Aya Levin
2020-07-06 16:52 ` Jakub Kicinski
2020-07-06 19:49 ` David Miller
2020-07-08 8:22 ` Aya Levin
2020-07-08 23:16 ` Bjorn Helgaas
2020-07-08 23:26 ` Jason Gunthorpe
2020-07-09 17:35 ` Jonathan Lemon
2020-07-09 18:20 ` Jason Gunthorpe
2020-07-09 19:47 ` Jakub Kicinski
2020-07-10 2:18 ` Saeed Mahameed
2020-07-10 12:21 ` Jason Gunthorpe
2020-07-09 20:33 ` Jonathan Lemon
2020-07-14 10:47 ` Aya Levin [this message]
2020-07-23 21:03 ` Alexander Duyck
2020-06-26 20:12 ` Bjorn Helgaas
2020-06-26 20:24 ` David Miller
2020-06-29 9:32 ` Aya Levin
2020-06-29 19:33 ` Bjorn Helgaas
2020-06-29 19:57 ` Raj, Ashok
2020-06-30 7:32 ` Ding Tianhong
2020-07-05 11:15 ` Aya Levin