linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Aya Levin <ayal@mellanox.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: David Miller <davem@davemloft.net>,
	kuba@kernel.org, saeedm@mellanox.com, mkubecek@suse.cz,
	linux-pci@vger.kernel.org, netdev@vger.kernel.org,
	tariqt@mellanox.com, alexander.h.duyck@linux.intel.com,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering
Date: Tue, 14 Jul 2020 13:47:15 +0300	[thread overview]
Message-ID: <5ecc588d-6397-37e4-3104-a32a639312f0@mellanox.com> (raw)
In-Reply-To: <20200708231630.GA472767@bjorn-Precision-5520>



On 7/9/2020 2:16 AM, Bjorn Helgaas wrote:
> On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote:
>> On 7/6/2020 10:49 PM, David Miller wrote:
>>> From: Aya Levin <ayal@mellanox.com>
>>> Date: Mon, 6 Jul 2020 16:00:59 +0300
>>>
>>>> Assuming the discussions with Bjorn will conclude in a well-trusted
>>>> API that ensures relaxed ordering in enabled, I'd still like a method
>>>> to turn off relaxed ordering for performance debugging sake.
>>>> Bjorn highlighted the fact that the PCIe sub system can only offer a
>>>> query method. Even if theoretically a set API will be provided, this
>>>> will not fit a netdev debugging - I wonder if CPU vendors even support
>>>> relaxed ordering set/unset...
>>>> On the driver's side relaxed ordering is an attribute of the mkey and
>>>> should be available for configuration (similar to number of CPU
>>>> vs. number of channels).
>>>> Based on the above, and binding the driver's default relaxed ordering
>>>> to the return value from pcie_relaxed_ordering_enabled(), may I
>>>> continue with previous direction of a private-flag to control the
>>>> client side (my driver) ?
>>>
>>> I don't like this situation at all.
>>>
>>> If RO is so dodgy that it potentially needs to be disabled, that is
>>> going to be an issue not just with networking devices but also with
>>> storage and other device types as well.
>>>
>>> Will every device type have a custom way to disable RO, thus
>>> inconsistently, in order to accomodate this?
>>>
>>> That makes no sense and is a terrible user experience.
>>>
>>> That's why the knob belongs generically in PCI or similar.
>>>
>> Hi Bjorn,
>>
>> Mellanox NIC supports relaxed ordering operation over DMA buffers.
>> However for debug prepossess we must have a chicken bit to disable
>> relaxed ordering on a specific system without effecting others in
>> run-time. In order to meet this requirement, I added a netdev
>> private-flag to ethtool for set RO API.
>>
>> Dave raised a concern regarding embedding relaxed ordering set API
>> per system (networking, storage and others). We need the ability to
>> manage relaxed ordering in a unify manner. Could you please define a
>> PCI sub-system solution to meet this requirement?
> 
> I agree, this is definitely a mess.  Let me just outline what I think
> we have today and what we're missing.
> 
>    - On the hardware side, device use of Relaxed Ordering is controlled
>      by the Enable Relaxed Ordering bit in the PCIe Device Control
>      register (or the PCI-X Command register).  If set, the device is
>      allowed but not required to set the Relaxed Ordering bit in
>      transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec
>      7.2.3).
> 
>      I suspect there may be device-specific controls, too, because [1]
>      claims to enable/disable Relaxed Ordering but doesn't touch the
>      PCIe Device Control register.  Device-specific controls are
>      certainly allowed, but of course it would be up to the driver, and
>      the device cannot generate TLPs with Relaxed Ordering unless the
>      architected PCIe Enable Relaxed Ordering bit is *also* set.
> 
>    - Platform firmware can enable Relaxed Ordering for a device either
>      before handoff to the OS or via the _HPX ACPI method.
> 
>    - The PCI core never enables Relaxed Ordering itself except when
>      applying _HPX.
> 
>    - At enumeration-time, the PCI core disables Relaxed Ordering in
>      pci_configure_relaxed_ordering() if the device is below a Root
>      Port that has a quirk indicating an erratum.  This quirk currently
>      includes many Intel Root Ports, but not all, and is an ongoing
>      maintenance problem.
> 
>    - The PCI core provides pcie_relaxed_ordering_enabled() which tells
>      you whether Relaxed Ordering is enabled.  Only used by cxgb4 and
>      csio, which use that information to fill in Ingress Queue
>      Commands.
> 
>    - The PCI core does not provide a driver interface to enable or
>      disable Relaxed Ordering.
> 
>    - Some drivers disable Relaxed Ordering themselves: mtip32xx,
>      netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA),
>      tsi721, kp2000_pcie.
> 
>    - Some drivers enable Relaxed Ordering themselves: niu, tegra.
> 
> What are we missing and what should the PCI core do?
> 
>    - Currently the Enable Relaxed Ordering bit depends on what firmware
>      did.  Maybe the PCI core should always clear it during
>      enumeration?
> 
>    - The PCI core should probably have a driver interface like
>      pci_set_relaxed_ordering(dev, enable) that can set or clear the
>      architected PCI-X or PCIe Enable Relaxed Ordering bit.
> 
>    - Maybe there should be a kernel command-line parameter like
>      "pci=norelax" that disables Relaxed Ordering for every device and
>      prevents pci_set_relaxed_ordering() from enabling it.
> 
>      I'm mixed on this because these tend to become folklore about how
>      to "fix" problems and we end up with systems that don't work
>      unless you happen to find the option on the web.  For debugging
>      issues, it might be enough to disable Relaxed Ordering using
>      setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0"
> 
> [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/
> 
Hi Bjorn,

Thanks for the detailed reply. From initial testing I can say that 
turning off the relaxed ordering on the PCI (setpci -s02:00.0 
CAP_EXP+8.w=0) is the chicken bit I was looking for.
This lower the risk of depending on pcie_relaxed_ordering_enabled(). I 
will update my patch and resubmit.

Thanks,
Aya

  parent reply	other threads:[~2020-07-14 10:47 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200623195229.26411-1-saeedm@mellanox.com>
     [not found] ` <20200623195229.26411-11-saeedm@mellanox.com>
     [not found]   ` <20200623143118.51373eb7@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
     [not found]     ` <dda5c2b729bbaf025592aa84e2bdb84d0cda7570.camel@mellanox.com>
     [not found]       ` <082c6bfe-5146-c213-9220-65177717c342@mellanox.com>
2020-06-24 17:22         ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Jakub Kicinski
2020-06-24 20:15           ` Saeed Mahameed
     [not found]             ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2020-07-06 13:00               ` Aya Levin
2020-07-06 16:52                 ` Jakub Kicinski
2020-07-06 19:49                 ` David Miller
2040-07-08  8:22                   ` Aya Levin
2020-07-08 23:16                     ` Bjorn Helgaas
2020-07-08 23:26                       ` Jason Gunthorpe
2020-07-09 17:35                         ` Jonathan Lemon
2020-07-09 18:20                           ` Jason Gunthorpe
2020-07-09 19:47                             ` Jakub Kicinski
2020-07-10  2:18                               ` Saeed Mahameed
2020-07-10 12:21                                 ` Jason Gunthorpe
2020-07-09 20:33                             ` Jonathan Lemon
2020-07-14 10:47                       ` Aya Levin [this message]
2020-07-23 21:03                     ` Alexander Duyck
2020-06-26 20:12           ` Bjorn Helgaas
2020-06-26 20:24             ` David Miller
2020-06-29  9:32             ` Aya Levin
2020-06-29 19:33               ` Bjorn Helgaas
2020-06-29 19:57                 ` Raj, Ashok
2020-06-30  7:32                   ` Ding Tianhong
2020-07-05 11:15                     ` Aya Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ecc588d-6397-37e4-3104-a32a639312f0@mellanox.com \
    --to=ayal@mellanox.com \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=helgaas@kernel.org \
    --cc=jgg@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mkubecek@suse.cz \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).