All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Alex G." <mr.nuke.me@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>, Alex_Gagniuc@Dellteam.com
Cc: bhelgaas@google.com, Austin.Bolen@dell.com, Shyam.Iyer@dell.com,
	keith.busch@intel.com, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, jeffrey.t.kirsher@intel.com,
	ariel.elior@cavium.com, michael.chan@broadcom.com,
	ganeshgr@chelsio.com, tariqt@mellanox.com,
	jakub.kicinski@netronome.com, talgi@mellanox.com,
	airlied@gmail.com, alexander.deucher@amd.com,
	Mike Marciniszyn <mike.marciniszyn@intel.com>
Subject: Re: [PATCH v3] PCI: Check for PCIe downtraining conditions
Date: Thu, 19 Jul 2018 10:46:43 -0500	[thread overview]
Message-ID: <8baf16fb-7e7c-ec59-19ef-a709e164dc94@gmail.com> (raw)
In-Reply-To: <20180718215359.GG128988@bhelgaas-glaptop.roam.corp.google.com>



On 07/18/2018 04:53 PM, Bjorn Helgaas wrote:
> [+cc Mike (hfi1)]
> 
> On Mon, Jul 16, 2018 at 10:28:35PM +0000, Alex_Gagniuc@Dellteam.com wrote:
>> On 7/16/2018 4:17 PM, Bjorn Helgaas wrote:
>>>> ...
>>>> The easiest way to detect this is with pcie_print_link_status(),
>>>> since the bottleneck is usually the link that is downtrained. It's not
>>>> a perfect solution, but it works extremely well in most cases.
>>>
>>> This is an interesting idea.  I have two concerns:
>>>
>>> Some drivers already do this on their own, and we probably don't want
>>> duplicate output for those devices.  In most cases (ixgbe and mlx* are
>>> exceptions), the drivers do this unconditionally so we *could* remove
>>> it from the driver if we add it to the core.  The dmesg order would
>>> change, and the message wouldn't be associated with the driver as it
>>> now is.
>>
>> Oh, there are only 8 users of that. Even I could patch up the drivers to
>> remove the call, assuming we reach agreement about this change.
>>
>>> Also, I think some of the GPU devices might come up at a lower speed,
>>> then download firmware, then reset the device so it comes up at a
>>> higher speed.  I think this patch will make us complain about about
>>> the low initial speed, which might confuse users.
>>
>> I spoke to one of the PCIe spec writers. It's allowable for a device to
>> downtrain speed or width. It would also be extremely dumb to downtrain
>> with the intent to re-train at a higher speed later, but it's possible
>> devices do dumb stuff like that. That's why it's an informational
>> message, instead of a warning.
> 
> FWIW, here's some of the discussion related to hfi1 from [1]:
> 
>    > Btw, why is the driver configuring the PCIe link speed?  Isn't
>    > this something we should be handling in the PCI core?
> 
>    The device comes out of reset at the 5GT/s speed. The driver
>    downloads device firmware, programs PCIe registers, and co-ordinates
>    the transition to 8GT/s.
> 
>    This recipe is device specific and is therefore implemented in the
>    hfi1 driver built on top of PCI core functions and macros.
> 
> Also several DRM drivers seem to do this (see ),
> si_pcie_gen3_enable()); from [2]:
> 
>    My understanding was that some platfoms only bring up the link in gen 1
>    mode for compatibility reasons.
> 
> [1] https://lkml.kernel.org/r/32E1700B9017364D9B60AED9960492BC627FF54C@fmsmsx120.amr.corp.intel.com
> [2] https://lkml.kernel.org/r/BN6PR12MB1809BD30AA5B890C054F9832F7B50@BN6PR12MB1809.namprd12.prod.outlook.com

Downtraining a link "for compatibility reasons" is one of those dumb 
things that devices do. I'm SURPRISED AMD HW does it, although it is 
perfectly permissible by PCIe spec.

>> Another case: Some devices (lower-end GPUs) use silicon (and marketing)
>> that advertises x16, but they're only routed for x8. I'm okay with
>> seeing an informational message in this case. In fact, I didn't know
>> that my Quadro card for three years is only wired for x8 until I was
>> testing this patch.
> 
> Yeah, it's probably OK.  I don't want bug reports from people who
> think something's broken when it's really just a hardware limitation
> of their system.  But hopefully the message is not alarming.

It looks fairly innocent:

[    0.749415] pci 0000:18:00.0: 4.000 Gb/s available PCIe bandwidth, 
limited by 5 GT/s x1 link at 0000:17:03.0 (capable of 15.752 Gb/s with 8 
GT/s x2 link)

>>> So I'm not sure whether it's better to do this in the core for all
>>> devices, or if we should just add it to the high-performance drivers
>>> that really care.
>>
>> You're thinking "do I really need that bandwidth" because I'm using a
>> function called "_bandwidth_". The point of the change is very far from
>> that: it is to help in system troubleshooting by detecting downtraining
>> conditions.
> 
> I'm not sure what you think I'm thinking :)  My question is whether
> it's worthwhile to print this extra information for *every* PCIe
> device, given that your use case is the tiny percentage of broken
> systems.

I think this information is a lot more useful than a bunch of other info 
that's printed. Is "type 00 class 0x088000" more valuable? What about 
"reg 0x20: [mem 0x9d950000-0x9d95ffff 64bit pref]", which is also 
available under /proc/iomem for those curious?

> If we only printed the info in the "bw_avail < bw_cap" case, i.e.,
> when the device is capable of more than it's getting, that would make
> a lot of sense to me.  The normal case line is more questionable.  I
> think the reason that's there is because the network drivers are very
> performance sensitive and like to see that info all the time.

I agree that can be an acceptable compromise.

> Maybe we need something like this:
> 
>    pcie_print_link_status(struct pci_dev *dev, int verbose)
>    {
>      ...
>      if (bw_avail >= bw_cap) {
>        if (verbose)
>          pci_info(dev, "... available PCIe bandwidth ...");
>      } else
>        pci_info(dev, "... available PCIe bandwidth, limited by ...");
>    }
> 
> So the core could print only the potential problems with:
> 
>    pcie_print_link_status(dev, 0);
> 
> and drivers that really care even if there's no problem could do:
> 
>    pcie_print_link_status(dev, 1);

Sounds good. I'll try to push out updated PATCH early next week.

>>>> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
>> [snip]
>>>> +	/* Look from the device up to avoid downstream ports with no devices. */
>>>> +	if ((pci_pcie_type(dev) != PCI_EXP_TYPE_ENDPOINT) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_LEG_END) &&
>>>> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_UPSTREAM))
>>>> +		return;
>>>
>>> Do we care about Upstream Ports here?
>>
>> YES! Switches. e.g. an x16 switch with 4x downstream ports could
>> downtrain at 8x and 4x, and we'd never catch it.
> 
> OK, I think I see your point: if the upstream port *could* do 16x but
> only trains to 4x, and two endpoints below it are both capable of 4x,
> the endpoints *think* they're happy but in fact they have to share 4x
> when they could use more.
> 
> Bjorn
> 

  reply	other threads:[~2018-07-19 15:46 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-04 15:55 [PATCH v3] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-06-05 12:27 ` Andy Shevchenko
2018-06-05 13:04   ` Andy Shevchenko
2018-07-16 21:17 ` Bjorn Helgaas
2018-07-16 22:28   ` Alex_Gagniuc
2018-07-16 22:28     ` Alex_Gagniuc
2018-07-18 21:53     ` Bjorn Helgaas
2018-07-19 15:46       ` Alex G. [this message]
2018-07-19 17:07         ` Deucher, Alexander
2018-07-23 20:01       ` [PATCH v2] PCI/AER: Do not clear AER bits if we don't own AER Alexandru Gagniuc
2018-07-25  1:24         ` kbuild test robot
2018-07-23 20:03       ` [PATCH v5] PCI: Check for PCIe downtraining conditions Alexandru Gagniuc
2018-07-23 21:01         ` Jakub Kicinski
2018-07-23 21:52           ` Tal Gilboa
2018-07-23 22:14             ` Jakub Kicinski
2018-07-23 23:59               ` Alex G.
2018-07-24 13:39                 ` Tal Gilboa
2018-07-30 23:26                   ` Alex_Gagniuc
2018-07-30 23:26                     ` Alex_Gagniuc
2018-07-31  6:40             ` Tal Gilboa
2018-07-31 15:10               ` Alex G.
2018-08-05  7:05                 ` Tal Gilboa
2018-08-06 18:39                   ` Alex_Gagniuc
2018-08-06 18:39                     ` Alex_Gagniuc
2018-08-06 19:46                     ` Bjorn Helgaas
2018-08-06 23:25                       ` [PATCH v6 1/9] " Alexandru Gagniuc
2018-08-06 23:25                         ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 2/9] bnx2x: Do not call pcie_print_link_status() Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 3/9] bnxt_en: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 4/9] cxgb4: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-06 23:25                         ` [PATCH v6 5/9] fm10k: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 17:52                           ` Jeff Kirsher
2018-08-07 17:52                             ` [Intel-wired-lan] " Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 6/9] ixgbe: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 17:51                           ` Jeff Kirsher
2018-08-07 17:51                             ` [Intel-wired-lan] " Jeff Kirsher
2018-08-06 23:25                         ` [PATCH v6 7/9] net/mlx4: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-08  6:10                           ` Leon Romanovsky
2018-08-08  6:10                             ` [Intel-wired-lan] " Leon Romanovsky
2018-08-06 23:25                         ` [PATCH v6 8/9] net/mlx5: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-08  6:08                           ` Leon Romanovsky
2018-08-08  6:08                             ` [Intel-wired-lan] " Leon Romanovsky
2018-08-08 14:23                             ` Tal Gilboa
2018-08-08 14:23                               ` [Intel-wired-lan] " Tal Gilboa
2018-08-08 14:23                               ` Tal Gilboa
2018-08-08 15:41                               ` Leon Romanovsky
2018-08-08 15:41                                 ` [Intel-wired-lan] " Leon Romanovsky
2018-08-08 15:56                                 ` Tal Gilboa
2018-08-08 15:56                                   ` [Intel-wired-lan] " Tal Gilboa
2018-08-08 16:33                                   ` Alex G.
2018-08-08 16:33                                     ` [Intel-wired-lan] " Alex G.
2018-08-08 17:27                                     ` Leon Romanovsky
2018-08-08 17:27                                       ` [Intel-wired-lan] " Leon Romanovsky
2018-08-09 14:02                                       ` Bjorn Helgaas
2018-08-09 14:02                                         ` [Intel-wired-lan] " Bjorn Helgaas
2018-08-06 23:25                         ` [PATCH v6 9/9] nfp: " Alexandru Gagniuc
2018-08-06 23:25                           ` [Intel-wired-lan] " Alexandru Gagniuc
2018-08-07 19:44                         ` [PATCH v6 1/9] PCI: Check for PCIe downtraining conditions David Miller
2018-08-07 19:44                           ` [Intel-wired-lan] " David Miller
2018-08-07 21:41                         ` Bjorn Helgaas
2018-08-07 21:41                           ` [Intel-wired-lan] " Bjorn Helgaas
2018-07-18 13:38   ` [PATCH v3] " Tal Gilboa
2018-07-19 15:49     ` Alex G.
2018-07-23  5:21       ` Tal Gilboa
2018-07-23 17:01         ` Alex G.
2018-07-23 21:35           ` Tal Gilboa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8baf16fb-7e7c-ec59-19ef-a709e164dc94@gmail.com \
    --to=mr.nuke.me@gmail.com \
    --cc=Alex_Gagniuc@Dellteam.com \
    --cc=Austin.Bolen@dell.com \
    --cc=Shyam.Iyer@dell.com \
    --cc=airlied@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=ariel.elior@cavium.com \
    --cc=bhelgaas@google.com \
    --cc=ganeshgr@chelsio.com \
    --cc=helgaas@kernel.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=keith.busch@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=michael.chan@broadcom.com \
    --cc=mike.marciniszyn@intel.com \
    --cc=talgi@mellanox.com \
    --cc=tariqt@mellanox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.