Re: [RFC] ASPM L1 link latencies

From: Ian Kumlien <ian.kumlien@gmail.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org
Subject: Re: [RFC] ASPM L1 link latencies
Date: Thu, 30 Jul 2020 01:26:25 +0200	[thread overview]
Message-ID: <CAA85sZuQchm20C4v9V+TnOx+GZ4DJ13jX4g7fPdDB0QJN2Ot=A@mail.gmail.com> (raw)
In-Reply-To: <CAA85sZvL4_mR=w2MU7JUx5eksnCt1yBZD=jbhAMoMVz38OJ5aA@mail.gmail.com>

On Thu, Jul 30, 2020 at 1:12 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Thu, Jul 30, 2020 at 1:02 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > On Thu, Jul 30, 2020 at 12:47 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sat, Jul 25, 2020 at 09:47:05PM +0200, Ian Kumlien wrote:
> > > > Hi,
> > > >
> > > > A while ago I realised that I was having all kinds off issues with my
> > > > connection, ~933 mbit had become ~40 mbit
> > > >
> > > > This only applied on links to the internet (via a linux fw running
> > > > NAT) however while debugging with the help of Alexander Duyck
> > > > we realised that ASPM could be the culprit (at least disabling ASPM on
> > > > the nic it self made things work just fine)...
> > > >
> > > > So while trying to understand PCIe and such things, I found this:
> > > >
> > > > The calculations of the max delay looked at "that node" + start latency * "hops"
> > > >
> > > > But one hop might have a larger latency and break the acceptable delay...
> > > >
> > > > So after a lot playing around with the code, i ended up with this, and
> > > > it seems to fix my problem and does
> > > > set two pcie bridges to ASPM Disabled that didn't happen before.
> > > >
> > > > I do however have questions.... Shouldn't the change be applied to
> > > > the endpoint?  Or should it be applied recursively along the path to
> > > > the endpoint?
> > >
> > > I don't understand this very well, but I think we do need to consider
> > > the latencies along the entire path.  PCIe r5.0, sec 5.4.1.3, contains
> > > this:
> > >
> > >   Power management software, using the latency information reported by
> > >   all components in the Hierarchy, can enable the appropriate level of
> > >   ASPM by comparing exit latency for each given path from Root to
> > >   Endpoint against the acceptable latency that each corresponding
> > >   Endpoint can withstand.
> >
> > One of the questions is this:
> > They say from root to endpoint while we walk from endpoint to root
> >
> > So, is that more optimal in some way? or should latencies always be
> > considered from root to endpoint?
> > In that case, should the link ASPM be disabled somewhere else?
> > (I tried to disable them on the "endpoint" and it didn't help for some reason)
> >
> > > Also this:
> >
> > [--8<--]
> >
> > >   - For each component with one or more Endpoint Functions, PCI
> > >     Express system software examines the Endpoint L0s/L1 Acceptable
> > >     Latency, as reported by each Endpoint Function in its Link
> > >     Capabilities register, and enables or disables L0s/L1 entry (via
> > >     the ASPM Control field in the Link Control register) accordingly
> > >     in some or all of the intervening device Ports on that hierarchy.
> >
> > > > Also, the L0S checks are only done on the local links, is this
> > > > correct?
> > >
> > > ASPM configuration is done on both ends of a link.  I'm not sure it
> > > makes sense to enable any state (L0s, L1, L1.1, L1.2) unless both ends
> > > of the link support it.  In particular, sec 5.4.1.3 says:
> > >
> > >   Software must not enable L0s in either direction on a given Link
> > >   unless components on both sides of the Link each support L0s;
> > >   otherwise, the result is undefined.
> > >
> > > But I think we do need to consider the entire path when enabling L0s;
> > > from sec 7.5.3.3:
> > >
> > >   Endpoint L0s Acceptable Latency - This field indicates the
> > >   acceptable total latency that an Endpoint can withstand due to the
> > >   transition from L0s state to the L0 state. It is essentially an
> > >   indirect measure of the Endpoint’s internal buffering.  Power
> > >   management software uses the reported L0s Acceptable Latency number
> > >   to compare against the L0s exit latencies reported by all components
> > >   comprising the data path from this Endpoint to the Root Complex Root
> > >   Port to determine whether ASPM L0s entry can be used with no loss of
> > >   performance.
> > >
> > > Does any of that help answer your question?
> >
> > Yes! It's exactly what I wanted to know, :)
> >
> > So now the question is should I group the fixes into one patch or
> > separate them for easier bisecting?
>
> Actually this raises a few questions...
>
> It does sound like this is sum(link->latency_up.l0s) +
> sum(link->latency_dw.l0s) of the link vs acceptable->l0s
>
> But, would that mean that we walk the link backwards? so it's both sides?
>
> Currently they are separated - and they are not diaabled as a whole...
>
> How should we handle the difference between up and down to keep the
> finer grained control we have?

so, this:

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index bd53fba7f382..18b659e6d3e8 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+               l0s_max_latency = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct
pci_dev *endpoint)

        while (link) {
                /* Check upstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                   (link->latency_up.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+                       l0s_max_latency += link->latency_up.l0s;
+                       if (l0s_max_latency > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               }

                /* Check downstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                   (link->latency_dw.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+                       l0s_max_latency += link->latency_dw.l0s;
+                       if (l0s_max_latency > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               }
                /*
                 * Check L1 latency.
                 * Every switch on the path to root complex need 1

--- vs ----

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index bd53fba7f382..74dee09e788b 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+               l0s_max_latency = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -448,14 +449,17 @@ static void pcie_aspm_check_latency(struct
pci_dev *endpoint)

        while (link) {
                /* Check upstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                   (link->latency_up.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               if (link->aspm_capable & ASPM_STATE_L0S_UP)
+                       l0s_max_latency += link->latency_up.l0s;

                /* Check downstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                   (link->latency_dw.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               if (link->aspm_capable & ASPM_STATE_L0S_DW)
+                       l0s_max_latency += link->latency_dw.l0s;
+
+               /* If the latency exceeds, disable both */
+               if (l0s_max_latency > acceptable->l0s)
+                       link->aspm_capable &= ~ASPM_STATE_L0S;
+
                /*
                 * Check L1 latency.
                 * Every switch on the path to root complex need 1

----

Currently we don't make a difference between them and I don't quite
understand the upstream and downstream terminology since it's a serial
bus ;)