netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Aya Levin <ayal@mellanox.com>
Cc: David Miller <davem@davemloft.net>,
	kuba@kernel.org, saeedm@mellanox.com, mkubecek@suse.cz,
	linux-pci@vger.kernel.org, netdev@vger.kernel.org,
	tariqt@mellanox.com, alexander.h.duyck@linux.intel.com,
	Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering
Date: Wed, 8 Jul 2020 18:16:30 -0500	[thread overview]
Message-ID: <20200708231630.GA472767@bjorn-Precision-5520> (raw)
In-Reply-To: <0506f0aa-f35e-09c7-5ba0-b74cd9eb1384@mellanox.com>

On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote:
> On 7/6/2020 10:49 PM, David Miller wrote:
> > From: Aya Levin <ayal@mellanox.com>
> > Date: Mon, 6 Jul 2020 16:00:59 +0300
> > 
> > > Assuming the discussions with Bjorn will conclude in a well-trusted
> > > API that ensures relaxed ordering in enabled, I'd still like a method
> > > to turn off relaxed ordering for performance debugging sake.
> > > Bjorn highlighted the fact that the PCIe sub system can only offer a
> > > query method. Even if theoretically a set API will be provided, this
> > > will not fit a netdev debugging - I wonder if CPU vendors even support
> > > relaxed ordering set/unset...
> > > On the driver's side relaxed ordering is an attribute of the mkey and
> > > should be available for configuration (similar to number of CPU
> > > vs. number of channels).
> > > Based on the above, and binding the driver's default relaxed ordering
> > > to the return value from pcie_relaxed_ordering_enabled(), may I
> > > continue with previous direction of a private-flag to control the
> > > client side (my driver) ?
> > 
> > I don't like this situation at all.
> > 
> > If RO is so dodgy that it potentially needs to be disabled, that is
> > going to be an issue not just with networking devices but also with
> > storage and other device types as well.
> > 
> > Will every device type have a custom way to disable RO, thus
> > inconsistently, in order to accomodate this?
> > 
> > That makes no sense and is a terrible user experience.
> > 
> > That's why the knob belongs generically in PCI or similar.
> > 
> Hi Bjorn,
> 
> Mellanox NIC supports relaxed ordering operation over DMA buffers.
> However for debug prepossess we must have a chicken bit to disable
> relaxed ordering on a specific system without effecting others in
> run-time. In order to meet this requirement, I added a netdev
> private-flag to ethtool for set RO API.
> 
> Dave raised a concern regarding embedding relaxed ordering set API
> per system (networking, storage and others). We need the ability to
> manage relaxed ordering in a unify manner. Could you please define a
> PCI sub-system solution to meet this requirement?

I agree, this is definitely a mess.  Let me just outline what I think
we have today and what we're missing.

  - On the hardware side, device use of Relaxed Ordering is controlled
    by the Enable Relaxed Ordering bit in the PCIe Device Control
    register (or the PCI-X Command register).  If set, the device is
    allowed but not required to set the Relaxed Ordering bit in
    transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec
    7.2.3).

    I suspect there may be device-specific controls, too, because [1]
    claims to enable/disable Relaxed Ordering but doesn't touch the
    PCIe Device Control register.  Device-specific controls are
    certainly allowed, but of course it would be up to the driver, and
    the device cannot generate TLPs with Relaxed Ordering unless the
    architected PCIe Enable Relaxed Ordering bit is *also* set.

  - Platform firmware can enable Relaxed Ordering for a device either
    before handoff to the OS or via the _HPX ACPI method.

  - The PCI core never enables Relaxed Ordering itself except when
    applying _HPX.

  - At enumeration-time, the PCI core disables Relaxed Ordering in
    pci_configure_relaxed_ordering() if the device is below a Root
    Port that has a quirk indicating an erratum.  This quirk currently
    includes many Intel Root Ports, but not all, and is an ongoing
    maintenance problem.

  - The PCI core provides pcie_relaxed_ordering_enabled() which tells
    you whether Relaxed Ordering is enabled.  Only used by cxgb4 and
    csio, which use that information to fill in Ingress Queue
    Commands.

  - The PCI core does not provide a driver interface to enable or
    disable Relaxed Ordering.

  - Some drivers disable Relaxed Ordering themselves: mtip32xx,
    netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA),
    tsi721, kp2000_pcie.

  - Some drivers enable Relaxed Ordering themselves: niu, tegra.

What are we missing and what should the PCI core do?

  - Currently the Enable Relaxed Ordering bit depends on what firmware
    did.  Maybe the PCI core should always clear it during
    enumeration?

  - The PCI core should probably have a driver interface like
    pci_set_relaxed_ordering(dev, enable) that can set or clear the
    architected PCI-X or PCIe Enable Relaxed Ordering bit.

  - Maybe there should be a kernel command-line parameter like
    "pci=norelax" that disables Relaxed Ordering for every device and
    prevents pci_set_relaxed_ordering() from enabling it.

    I'm mixed on this because these tend to become folklore about how
    to "fix" problems and we end up with systems that don't work
    unless you happen to find the option on the web.  For debugging
    issues, it might be enough to disable Relaxed Ordering using
    setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0"

[1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/

  reply	other threads:[~2020-07-08 23:16 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-23 19:52 [pull request][net-next 00/10] mlx5 updates 2020-06-23 Saeed Mahameed
2020-06-23 19:52 ` [net-next 01/10] net/mlx5: Avoid eswitch header inclusion in fs core layer Saeed Mahameed
2020-06-23 21:00   ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 02/10] net/mlx5: FWTrace: Add missing space Saeed Mahameed
2020-06-23 19:52 ` [net-next 03/10] net/mlx5: Add a missing macro undefinition Saeed Mahameed
2020-06-23 19:52 ` [net-next 04/10] net/mlx5: Use kfree(ft->g) in arfs_create_groups() Saeed Mahameed
2020-06-23 19:52 ` [net-next 05/10] net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel Saeed Mahameed
2020-06-23 19:52 ` [net-next 06/10] net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c Saeed Mahameed
2020-06-23 21:02   ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT Saeed Mahameed
2020-06-23 21:03   ` Jakub Kicinski
2020-06-23 21:26     ` Saeed Mahameed
2020-06-23 21:33       ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 08/10] net/mlx5e: vxlan: Use RCU for vxlan table lookup Saeed Mahameed
2020-06-23 19:52 ` [net-next 09/10] net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup() Saeed Mahameed
2020-06-23 19:52 ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Saeed Mahameed
2020-06-23 21:31   ` Jakub Kicinski
2020-06-24  6:56     ` Saeed Mahameed
2020-06-24  7:34       ` Aya Levin
2020-06-24 17:22         ` Jakub Kicinski
2020-06-24 20:15           ` Saeed Mahameed
     [not found]             ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2020-07-06 13:00               ` Aya Levin
2020-07-06 16:52                 ` Jakub Kicinski
2020-07-06 19:49                 ` David Miller
2040-07-08  8:22                   ` Aya Levin
2020-07-08 23:16                     ` Bjorn Helgaas [this message]
2020-07-08 23:26                       ` Jason Gunthorpe
2020-07-09 17:35                         ` Jonathan Lemon
2020-07-09 18:20                           ` Jason Gunthorpe
2020-07-09 19:47                             ` Jakub Kicinski
2020-07-10  2:18                               ` Saeed Mahameed
2020-07-10 12:21                                 ` Jason Gunthorpe
2020-07-09 20:33                             ` Jonathan Lemon
2020-07-14 10:47                       ` Aya Levin
2020-07-23 21:03                     ` Alexander Duyck
2020-06-26 20:12           ` Bjorn Helgaas
2020-06-26 20:24             ` David Miller
2020-06-29  9:32             ` Aya Levin
2020-06-29 19:33               ` Bjorn Helgaas
2020-06-29 19:57                 ` Raj, Ashok
2020-06-30  7:32                   ` Ding Tianhong
2020-07-05 11:15                     ` Aya Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200708231630.GA472767@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=ayal@mellanox.com \
    --cc=davem@davemloft.net \
    --cc=jgg@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mkubecek@suse.cz \
    --cc=netdev@vger.kernel.org \
    --cc=saeedm@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).