From: Bjorn Helgaas <helgaas@kernel.org>
To: Ian Kumlien <ian.kumlien@gmail.com>
Cc: kai.heng.feng@canonical.com, linux-pci@vger.kernel.org,
	alexander.duyck@gmail.com, refactormyself@gmail.com,
	puranjay12@gmail.com
Subject: Re: [PATCH 1/3] PCI/ASPM: Use the path max in L1 ASPM latency check
Date: Tue, 12 Jan 2021 14:42:02 -0600
Message-ID: <20210112204202.GA1489918@bjorn-Precision-5520>
In-Reply-To: <20201024205548.1837770-1-ian.kumlien@gmail.com>

On Sat, Oct 24, 2020 at 10:55:46PM +0200, Ian Kumlien wrote:
> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> "5.4.1.2.2. Exit from the L1 State"
> 
> That section makes it clear that each Switch is required to initiate an
> L1 exit transition on its Upstream Port within 1 μs of an exit beginning
> on one of its Downstream Ports. These 1 μs delays accumulate along the
> path, and we must also wait for the slowest link on the path to exit L1
> before the path can enter L0.
> 
> The current code doesn't take the maximum latency into account.
> 
> From the example:
>    +----------------+
>    |                |
>    |  Root complex  |
>    |                |
>    |    +-----+     |
>    |    |32 μs|     |
>    +----------------+
>            |
>            |  Link 1
>            |
>    +----------------+
>    |     |8 μs|     |
>    |     +----+     |
>    |    Switch A    |
>    |     +----+     |
>    |     |8 μs|     |
>    +----------------+
>            |
>            |  Link 2
>            |
>    +----------------+
>    |    |32 μs|     |
>    |    +-----+     |
>    |    Switch B    |
>    |    +-----+     |
>    |    |32 μs|     |
>    +----------------+
>            |
>            |  Link 3
>            |
>    +----------------+
>    |     |8 μs|     |
>    |     +----+     |
>    |   Endpoint C   |
>    |                |
>    |                |
>    +----------------+
> 
> Links 1, 2 and 3 are all in the L1 state - endpoint C initiates the
> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> its ports, Link 3 will transition to L0 at T+32 (the longest of T+8
> for endpoint C and T+32 for switch B).
> 
> Switch B is required to initiate a transition from the L1 state on its
> upstream port no more than 1 μs after the beginning of the L1 exit
> transition on its downstream port. Therefore, the transition from L1 to
> L0 will begin on link 2 at T+1 and cascade up the path.
> 
> The path will exit L1 at T+34.
> 
> On my specific system:
> 03:00.0 Ethernet controller: Intel Corporation I211 Gigabit Network Connection (rev 03)
> 04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. Device 816e (rev 1a)
> 
>             Exit latency       Acceptable latency
> Tree:       L1       L0s       L1       L0s
> ----------  -------  -----     -------  ------
> 00:01.2     <32 us   -
> | 01:00.0   <32 us   -
> |- 02:03.0  <32 us   -
> | \03:00.0  <16 us   <2us      <64 us   <512ns
> |
> \- 02:04.0  <32 us   -
>   \04:00.0  <64 us   unlimited <64 us   <512ns
> 
> 04:00.0's exit latency equals the maximum latency it can accept, so as
> we walk the path, the first Switch's 1 μs wake-up delay pushes the total
> past the acceptable limit for the link; as a side effect this also fixes
> my issues with 03:00.0.

I don't think this is quite right.  We're looking at the path to
04:00.0, which includes two Links and one Switch:

The upstream Link:
  00:01.2 AMD Root Port               L1 Exit Latency <32us
  01:00.0 AMD Switch Upstream Port    L1 Exit Latency <32us

The downstream Link:
  02:04.0 AMD Switch Downstream Port  L1 Exit Latency <32us
  04:00.0 Realtek Endpoint            L1 Exit Latency <64us, Acceptable 64us

If both Links are in L1 and 04:00.0 needs to use the Link at time T,
I think the following events are relevant:

  T        04:00.0 initiates L1 exit on downstream Link
  T+1us    01:00.0 initiates L1 exit on upstream Link
  T+33us   upstream Link is in L0 (32us after initiating exit)
  T+64us   downstream Link is in L0 (64us after initiating exit)

The upstream Link's L1 exit latency is completely covered by the
downstream Link's, so 04:00.0 *should* only see its own exit latency
(64us), which it claims to be able to tolerate.
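
To make the overlap concrete, here's a tiny user-space sketch of that
timeline (a sketch only: the exit latencies are hardcoded from the
example above, and the 1us-per-Switch delay is the spec's upper bound,
not something read from hardware):

  #include <stdio.h>

  int main(void)
  {
          /*
           * L1 exit latencies in us.  Index 0 is the Link nearest the
           * endpoint (02:04.0 <-> 04:00.0), index 1 the upstream Link
           * (00:01.2 <-> 01:00.0), per the example above.
           */
          unsigned int exit_us[] = { 64, 32 };
          unsigned int n = sizeof(exit_us) / sizeof(exit_us[0]);
          unsigned int path_ready = 0;

          for (unsigned int i = 0; i < n; i++) {
                  /* Link i starts its exit i us after the endpoint
                   * (1 us per intervening Switch) and reaches L0
                   * exit_us[i] later. */
                  unsigned int ready = i + exit_us[i];

                  printf("Link %u in L0 at T+%uus\n", i, ready);
                  if (ready > path_ready)
                          path_ready = ready;
          }

          /* Prints T+64us and T+33us; the whole path is usable at
           * T+64us, i.e., 04:00.0 only sees its own exit latency. */
          printf("path in L0 at T+%uus\n", path_ready);
          return 0;
  }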

This patch computes "l1_max_latency + l1_switch_latency", which is
64us + 1us in this case.  I don't think it's correct to add the 1us
here because that delay is only relevant to the upstream Link.  We
should add it to the *upstream Link's* exit latency, but even with
that, its exit latency is only 33us from 04:00.0's point of view.

> Without this patch, 03:00.0 misbehaves and only gives me ~40 Mbit/s over
> links with 6 or more hops. With this patch I'm back to a maximum of ~933
> Mbit/s.
> 
> The original code path did:
> 04:00.0-02:04.0 max latency 64    -> ok
> 02:04.0-01:00.0 max latency 32 +1 -> ok
> 01:00.0-00:01.2 max latency 32 +2 -> ok
> 
> And thus didn't see any L1 ASPM latency issues.
> 
> The new code does:
> 04:00.0-02:04.0 max latency 64    -> ok
> 02:04.0-01:00.0 max latency 64 +1 -> latency exceeded
> 01:00.0-00:01.2 max latency 64 +2 -> latency exceeded
> 
> It correctly identifies the issue.
> 
> For reference, pcie information:
> https://bugzilla.kernel.org/show_bug.cgi?id=209725
> 
> Kai-Heng Feng has a machine that will not boot with ASPM without this patch,
> information is documented here:
> https://bugzilla.kernel.org/show_bug.cgi?id=209671
> 
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> ---
>  drivers/pci/pcie/aspm.c | 22 ++++++++++++++--------
>  1 file changed, 14 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 253c30cc1967..c03ead0f1013 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>  
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -	u32 latency, l1_switch_latency = 0;
> +	u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
>  	struct aspm_latency *acceptable;
>  	struct pcie_link_state *link;
>  
> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  		if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
>  		    (link->latency_dw.l0s > acceptable->l0s))
>  			link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +
>  		/*
>  		 * Check L1 latency.
> -		 * Every switch on the path to root complex need 1
> -		 * more microsecond for L1. Spec doesn't mention L0s.
> +		 *
> +		 * PCIe r5.0, sec 5.4.1.2.2 states:
> +		 * A Switch is required to initiate an L1 exit transition on its
> +		 * Upstream Port Link after no more than 1 μs from the beginning of an
> +		 * L1 exit transition on any of its Downstream Port Links.
>  		 *
>  		 * The exit latencies for L1 substates are not advertised
>  		 * by a device.  Since the spec also doesn't mention a way
> @@ -469,11 +473,13 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  		 * L1 exit latencies advertised by a device include L1
>  		 * substate latencies (and hence do not do any check).
>  		 */
> -		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -		if ((link->aspm_capable & ASPM_STATE_L1) &&
> -		    (latency + l1_switch_latency > acceptable->l1))
> -			link->aspm_capable &= ~ASPM_STATE_L1;
> -		l1_switch_latency += 1000;
> +		if (link->aspm_capable & ASPM_STATE_L1) {
> +			latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +			l1_max_latency = max_t(u32, latency, l1_max_latency);
> +			if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +				link->aspm_capable &= ~ASPM_STATE_L1;
> +			l1_switch_latency += 1000;
> +		}

This is pretty subtle but I *think* the existing code is actually
correct.  The exit latency of a downstream Link overlaps all except
1us of the latency of the next upstream Link, so I don't think we have
to add the total Switch delay to the max Link exit latency.  We only
have to add the Switch delays downstream of Link X to Link X's exit
latency.

Also, I think we should accumulate the 1us Switch latency for *all*
Switches as the existing code does.  For the case where a Switch's
Upstream Port is L1-capable but the Downstream Port is not, this patch
doesn't add any l1_switch_latency.  That assumes the Switch can start
the upstream L1 exit instantly upon receipt of a TLP at the Downstream
Port.  I think we should assume it takes the same time (up to 1us) to
start that exit as it would if the Downstream Port were in L1.
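
Putting those two points together, the shape I'd expect is essentially
the pre-patch loop, annotated with the reasoning above (a sketch
against the existing aspm.c structures, not a tested change):

  u32 latency, l1_switch_latency = 0;

  while (link) {
          /* Worst advertised L1 exit latency on either end of this Link */
          latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);

          /*
           * The endpoint sees this Link's own exit latency plus up to
           * 1us of wake-up propagation per Switch *below* this Link;
           * Links above it exit in parallel, hidden behind the
           * downstream exits.
           */
          if ((link->aspm_capable & ASPM_STATE_L1) &&
              (latency + l1_switch_latency > acceptable->l1))
                  link->aspm_capable &= ~ASPM_STATE_L1;

          /*
           * Accumulate 1us (values are in ns) for every Switch we
           * pass, whether or not its downstream Link had L1 enabled;
           * the propagation delay applies either way.
           */
          l1_switch_latency += 1000;

          link = link->parent;
  }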

>  		link = link->parent;
>  	}

My guess is the real problem is the Switch is advertising incorrect
exit latencies.  If the Switch advertised "<64us" exit latency for its
Upstream Port, we'd compute "64us exit latency + 1us Switch delay =
65us", which is more than either 03:00.0 or 04:00.0 can tolerate, so
we would disable L1 on that upstream Link.

Working around this would require some sort of quirk to override the
values read from the Switch, which is a little messy.  Here's what I'm
thinking (not asking you to do this; just trying to think of an
approach):

  - Configure common clock earlier, in pci_configure_device(), because
    this changes the "read-only" L1 exit latencies in Link
    Capabilities.

  - Cache Link Capabilities in the pci_dev.

  - Add a quirk to override the cached values for the Switch based on
    Vendor/Device ID and probably DMI motherboard/BIOS info (rough
    sketch below).
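
For illustration only, such a quirk might look roughly like this.  It
assumes a hypothetical pdev->lnkcap field caching Link Capabilities at
enumeration time (which would have to be added first), and the device
ID and DMI board name are placeholders, not the real values:

  #include <linux/bitfield.h>
  #include <linux/dmi.h>
  #include <linux/pci.h>

  static void quirk_switch_l1_exit_latency(struct pci_dev *pdev)
  {
          /* Placeholder board match; the real quirk would key on the
           * affected motherboard/BIOS from the bugzilla reports. */
          if (!dmi_match(DMI_BOARD_NAME, "PLACEHOLDER-BOARD"))
                  return;

          /*
           * Override the cached L1 Exit Latency field (Link
           * Capabilities bits 17:15) to advertise "<64us" (110b), so
           * the latency check computes 64us + 1us and disables L1 on
           * the upstream Link.
           */
          pdev->lnkcap &= ~PCI_EXP_LNKCAP_L1EL;
          pdev->lnkcap |= FIELD_PREP(PCI_EXP_LNKCAP_L1EL, 0x6);
          pci_info(pdev, "working around advertised L1 exit latency\n");
  }
  DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x0000 /* placeholder */,
                          quirk_switch_l1_exit_latency);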

Bjorn
