* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
       [not found] <CAA85sZvrPApeAYPVSYdVuKnp84xCpLBLf+f32e=R9tdPC0dvOw@mail.gmail.com>
@ 2020-09-25 15:49 ` Bjorn Helgaas
  2020-09-25 22:26   ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 15:49 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

[+cc linux-pci, others again]

On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > So.......
> > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> >
> > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > on your bugzilla, here's the path:
> 
> Correct, or you could do it like this:
> 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> I211 Gigabit Network Connection (rev 03)
> 
> >   00:01.2 Root Port              to [bus 01-07]
> >   01:00.0 Switch Upstream Port   to [bus 02-07]
> >   02:03.0 Switch Downstream Port to [bus 03]
> >   03:00.0 Endpoint (Intel I211 NIC)
> >
> > Your system does also have an 04:00.0 here:
> >
> >   00:01.2 Root Port              to [bus 01-07]
> >   01:00.0 Switch Upstream Port   to [bus 02-07]
> >   02:04.0 Switch Downstream Port to [bus 04]
> >   04:00.0 Endpoint (Realtek 816e)
> >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> >   04:00.2 Endpoint (Realtek 816a)
> >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> >   04:00.7 Endpoint (Realtek 816c IPMI)
> >
> > Which NIC is the problem?
> 
> The intel one - so the side effect of the realtek nic is that it fixes
> the intel nics issues.
> 
> It would be that the intel nic actually has a bug with L1 (and it
> would seem that it's to kind with latencies) so it actually has a
> smaller buffer...
> 
> And afair, the realtek has a larger buffer, since it behaves better
> with L1 enabled.
> 
> Either way, it's a fix that's needed ;)

OK, what specifically is the fix that's needed?  The L0s change seems
like a "this looks wrong" thing that doesn't actually affect your
situation, so let's skip that for now.

And let's support the change you *do* need with the "lspci -vv" for
all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
since they share the 00:01.2 - 01:00.0 link), before and after the
change.

I want to identify something in the "before" configuration that is
wrong according to the spec, and a change in the "after" config so it
now conforms to the spec.

Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-25 15:49 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Bjorn Helgaas
@ 2020-09-25 22:26   ` Ian Kumlien
  2020-09-28  0:06     ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-25 22:26 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc linux-pci, others again]
>
> On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > So.......
> > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > >
> > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > on your bugzilla, here's the path:
> >
> > Correct, or you could do it like this:
> > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > I211 Gigabit Network Connection (rev 03)
> >
> > >   00:01.2 Root Port              to [bus 01-07]
> > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > >   02:03.0 Switch Downstream Port to [bus 03]
> > >   03:00.0 Endpoint (Intel I211 NIC)
> > >
> > > Your system does also have an 04:00.0 here:
> > >
> > >   00:01.2 Root Port              to [bus 01-07]
> > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > >   02:04.0 Switch Downstream Port to [bus 04]
> > >   04:00.0 Endpoint (Realtek 816e)
> > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > >   04:00.2 Endpoint (Realtek 816a)
> > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > >
> > > Which NIC is the problem?
> >
> > The intel one - so the side effect of the realtek nic is that it fixes
> > the intel nics issues.
> >
> > It would be that the intel nic actually has a bug with L1 (and it
> > would seem that it's to kind with latencies) so it actually has a
> > smaller buffer...
> >
> > And afair, the realtek has a larger buffer, since it behaves better
> > with L1 enabled.
> >
> > Either way, it's a fix that's needed ;)
>
> OK, what specifically is the fix that's needed?  The L0s change seems
> like a "this looks wrong" thing that doesn't actually affect your
> situation, so let's skip that for now.

The L1 latency calculation is not good enough: it treats whichever link
it is currently looking at as if it were the highest-latency link in
the path - which is incorrect.

The latency to bring a path back out of L1 is
number-of-hops*1000ns + maximum-exit-latency-along-the-path.

Currently the code only does number-of-hops*1000ns +
exit-latency-of-the-current-link.
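
As a rough, self-contained sketch of that rule (illustration only, not
the actual drivers/pci/pcie/aspm.c code; exactly what counts as a "hop"
is worked out further down the thread):

  #include <stddef.h>

  /*
   * Proposed rule: the L1 exit latency of a whole path is the largest
   * per-link L1 exit latency found anywhere along it, plus 1000 ns for
   * each additional hop the wake-up has to propagate through.
   */
  unsigned int path_l1_exit_ns(const unsigned int *link_exit_ns,
                               size_t nlinks, unsigned int extra_hops)
  {
          unsigned int max_ns = 0;
          size_t i;

          for (i = 0; i < nlinks; i++)
                  if (link_exit_ns[i] > max_ns)
                          max_ns = link_exit_ns[i];

          return max_ns + extra_hops * 1000;
  }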

> And let's support the change you *do* need with the "lspci -vv" for
> all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> since they share the 00:01.2 - 01:00.0 link), before and after the
> change.

They are all included in the lspci output attached to the bug report.

> I want to identify something in the "before" configuration that is
> wrong according to the spec, and a change in the "after" config so it
> now conforms to the spec.

So there are a few issues here; the current code does not conform to
the spec.

The intel nic gets fixed as a side effect of making the code conform to
the spec (it should still get a proper fix of its own).

Basically, while hunting for the issue, I found that the L1 and L0s
latency calculations used to determine whether they should be enabled
are wrong - that is what I'm currently trying to push.  It also seems
like the intel nic claims it can handle 64us but apparently it can't.

So, three bugs: two are "fixed", one needs additional fixing.

> Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-25 22:26   ` Ian Kumlien
@ 2020-09-28  0:06     ` Bjorn Helgaas
  2020-09-28 10:24       ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-28  0:06 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > So.......
> > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > >
> > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > on your bugzilla, here's the path:
> > >
> > > Correct, or you could do it like this:
> > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > I211 Gigabit Network Connection (rev 03)
> > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > >
> > > > Your system does also have an 04:00.0 here:
> > > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > >   04:00.0 Endpoint (Realtek 816e)
> > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > >   04:00.2 Endpoint (Realtek 816a)
> > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > >
> > > > Which NIC is the problem?
> > >
> > > The intel one - so the side effect of the realtek nic is that it fixes
> > > the intel nics issues.
> > >
> > > It would be that the intel nic actually has a bug with L1 (and it
> > > would seem that it's to kind with latencies) so it actually has a
> > > smaller buffer...
> > >
> > > And afair, the realtek has a larger buffer, since it behaves better
> > > with L1 enabled.
> > >
> > > Either way, it's a fix that's needed ;)
> >
> > OK, what specifically is the fix that's needed?  The L0s change seems
> > like a "this looks wrong" thing that doesn't actually affect your
> > situation, so let's skip that for now.
> 
> L1 latency calculation is not good enough, it assumes that *any*
> link is the highest latency link - which is incorrect.
> 
> The latency to bring L1 up is number-of-hops*1000 +
> maximum-latency-along-the-path
> 
> currently it's only doing number-of-hops*1000 +
> arbitrary-latency-of-current-link
> 
> > And let's support the change you *do* need with the "lspci -vv" for
> > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > since they share the 00:01.2 - 01:00.0 link), before and after the
> > change.
> 
> They are all included in all lspci output in the bug

No doubt.  But I spent a long time going through those and the
differences I found are not enough to show a spec violation or a fix.

Here's what I extracted (this is a repeat; I mentioned this before):

                                                    LnkCtl    LnkCtl
           ------DevCap-------  ----LnkCap-------  -Before-  -After--
  00:01.2                                L1 <32us       L1+       L1-
  01:00.0                                L1 <32us       L1+       L1-
  02:03.0                                L1 <32us       L1+       L1+
  03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-

I don't see anything wrong here yet.  03:00.0 claims it can handle up
to 64us of L1 exit latency, and the L1 exit latency of the entire path
should be 33us.  What am I missing?

> > I want to identify something in the "before" configuration that is
> > wrong according to the spec, and a change in the "after" config so it
> > now conforms to the spec.
> 
> So there are a few issues here, the current code does not apply to spec.
> 
> The intel nic gets fixed as a side effect (it should still get a
> proper fix) of making
> the code apply to spec.
> 
> Basically, while hunting for the issue, I found that the L1 and L0s
> latency calculations used to determine
> if they should be enabled or not is wrong - that is what I'm currently
> trying to push - it also seems like the
> intel nic claims that it can handle 64us but apparently it can't.
> 
> So, three bugs, two are "fixed" one needs additional fixing.

OK, we just need to split these up as much as possible and support
them with the relevant lspci output, analysis of what specifically is
wrong, and the lspci output showing the effect of the fix.

Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28  0:06     ` Bjorn Helgaas
@ 2020-09-28 10:24       ` Ian Kumlien
  2020-09-28 17:09         ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-28 10:24 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > >
> > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > So.......
> > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > >
> > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > on your bugzilla, here's the path:
> > > >
> > > > Correct, or you could do it like this:
> > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > I211 Gigabit Network Connection (rev 03)
> > > >
> > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > >
> > > > > Your system does also have an 04:00.0 here:
> > > > >
> > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > >
> > > > > Which NIC is the problem?
> > > >
> > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > the intel nics issues.
> > > >
> > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > would seem that it's to kind with latencies) so it actually has a
> > > > smaller buffer...
> > > >
> > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > with L1 enabled.
> > > >
> > > > Either way, it's a fix that's needed ;)
> > >
> > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > like a "this looks wrong" thing that doesn't actually affect your
> > > situation, so let's skip that for now.
> >
> > L1 latency calculation is not good enough, it assumes that *any*
> > link is the highest latency link - which is incorrect.
> >
> > The latency to bring L1 up is number-of-hops*1000 +
> > maximum-latency-along-the-path
> >
> > currently it's only doing number-of-hops*1000 +
> > arbitrary-latency-of-current-link
> >
> > > And let's support the change you *do* need with the "lspci -vv" for
> > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > change.
> >
> > They are all included in all lspci output in the bug
>
> No doubt.  But I spent a long time going through those and the
> differences I found are not enough to show a spec violation or a fix.
>
> Here's what I extracted (this is a repeat; I mentioned this before):
>
>                                                     LnkCtl    LnkCtl
>            ------DevCap-------  ----LnkCap-------  -Before-  -After--
>   00:01.2                                L1 <32us       L1+       L1-
>   01:00.0                                L1 <32us       L1+       L1-
>   02:03.0                                L1 <32us       L1+       L1+
>   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
>
> I don't see anything wrong here yet.  03:00.0 claims it can handle up
> to 64us of L1 exit latency, and the L1 exit latency of the entire path
> should be 33us.  What am I missing?

it should be 32+3, so 35 us - but the intel nic claims something it
can't live up to.

Since this is triggered by the realtek device:
                                                     LnkCtl    LnkCtl
            ------DevCap-------  ----LnkCap-------  -Before-  -After--
   00:01.2                                L1 <32us       L1+       L1-
   01:00.0                                L1 <32us       L1+       L1-
   02:04.0                                L1 <32us       L1+       L1+
   04:00.0  L0s <512ns L1 <64us

But the L1 exit latency for 04:00.0 is 64us, which means it breaks its
own latency requirement once it's behind anything.
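
As a back-of-the-envelope check of that claim (illustration only, using
the advertised values above and the usual 1us-per-extra-link wake-up
adder):

  #include <stdio.h>

  int main(void)
  {
          /* 04:00.0's DevCap says it tolerates up to 64us of L1 exit
           * latency, but its own link (02:04.0 <-> 04:00.0) is already
           * the slowest link on the path at 64us, and the
           * 00:01.2 <-> 01:00.0 link above the switch adds roughly 1us
           * of wake-up propagation. */
          unsigned int acceptable_ns = 64000;
          unsigned int slowest_link_ns = 64000;
          unsigned int extra_link_ns = 1000;
          unsigned int path_exit_ns = slowest_link_ns + extra_link_ns;

          printf("path exit %u ns vs acceptable %u ns -> %s\n",
                 path_exit_ns, acceptable_ns,
                 path_exit_ns > acceptable_ns ?
                 "L1 must stay off somewhere on the path" : "L1 is fine");
          return 0;
  }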

> > > I want to identify something in the "before" configuration that is
> > > wrong according to the spec, and a change in the "after" config so it
> > > now conforms to the spec.
> >
> > So there are a few issues here, the current code does not apply to spec.
> >
> > The intel nic gets fixed as a side effect (it should still get a
> > proper fix) of making
> > the code apply to spec.
> >
> > Basically, while hunting for the issue, I found that the L1 and L0s
> > latency calculations used to determine
> > if they should be enabled or not is wrong - that is what I'm currently
> > trying to push - it also seems like the
> > intel nic claims that it can handle 64us but apparently it can't.
> >
> > So, three bugs, two are "fixed" one needs additional fixing.
>
> OK, we just need to split these up as much as possible and support
> them with the relevant lspci output, analysis of what specifically is
> wrong, and the lspci output showing the effect of the fix.

Could I have a copy of the PCIe spec?  I have found sections of it, but
it's really hard to find them again when you want to refer to
something... which I need to do to show that it doesn't conform...

I.e. I can't show all the errors on my system :)

> Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 10:24       ` Ian Kumlien
@ 2020-09-28 17:09         ` Bjorn Helgaas
  2020-09-28 17:41           ` Ian Kumlien
  2020-09-28 18:10           ` Alexander Duyck
  0 siblings, 2 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-28 17:09 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > So.......
> > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > >
> > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > on your bugzilla, here's the path:
> > > > >
> > > > > Correct, or you could do it like this:
> > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > I211 Gigabit Network Connection (rev 03)
> > > > >
> > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > >
> > > > > > Your system does also have an 04:00.0 here:
> > > > > >
> > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > >
> > > > > > Which NIC is the problem?
> > > > >
> > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > the intel nics issues.
> > > > >
> > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > smaller buffer...
> > > > >
> > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > with L1 enabled.
> > > > >
> > > > > Either way, it's a fix that's needed ;)
> > > >
> > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > situation, so let's skip that for now.
> > >
> > > L1 latency calculation is not good enough, it assumes that *any*
> > > link is the highest latency link - which is incorrect.
> > >
> > > The latency to bring L1 up is number-of-hops*1000 +
> > > maximum-latency-along-the-path
> > >
> > > currently it's only doing number-of-hops*1000 +
> > > arbitrary-latency-of-current-link
> > >
> > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > change.
> > >
> > > They are all included in all lspci output in the bug
> >
> > No doubt.  But I spent a long time going through those and the
> > differences I found are not enough to show a spec violation or a fix.
> >
> > Here's what I extracted (this is a repeat; I mentioned this before):
> >
> >                                                     LnkCtl    LnkCtl
> >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >   00:01.2                                L1 <32us       L1+       L1-
> >   01:00.0                                L1 <32us       L1+       L1-
> >   02:03.0                                L1 <32us       L1+       L1+
> >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> >
> > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > should be 33us.  What am I missing?
> 
> it should be 32+3 so 35 us - but the intel nic claims something it
> can't live up to.

How did you compute 35us?  Here's my math for 33us:

  It looks like we *should* be able to enable L1 on both links since
  the exit latency should be <33us (first link starts exit at T=0,
  completes by T=32; second link starts exit at T=1, completes by
  T=33), and 03:00.0 can tolerate up to 64us.

If 03:00.0 advertises that it can tolerate 64us but it really can't,
the fix would be a quirk to override the DevCap value for that device.
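
Such a quirk could look roughly like the sketch below (hypothetical -
whether it is warranted is exactly what this thread is trying to
establish).  Since the DevCap field itself is read-only, the simplest
variant just keeps L1 off for the device, in the same spirit as the
existing ASPM L0s quirks in drivers/pci/quirks.c; 0x1539 should be the
I211 device ID, but double-check it against "lspci -nn":

  #include <linux/pci.h>

  /* Hypothetical: if the I211 really cannot tolerate the 64us of L1
   * exit latency it advertises in DevCap, keep ASPM L1 disabled for it. */
  static void quirk_i211_no_aspm_l1(struct pci_dev *dev)
  {
          pci_info(dev, "disabling ASPM L1 (advertised acceptable latency not honored)\n");
          pci_disable_link_state(dev, PCIE_LINK_STATE_L1);
  }
  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x1539, quirk_i211_no_aspm_l1);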

> Since this is triggered by the realtek device

I'm still under the impression that the throughput problem is with
03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
Realtek device?  Does the I211 work correctly until we add the Realtek
device below the same switch?

>                                                      LnkCtl    LnkCtl
>             ------DevCap-------  ----LnkCap-------  -Before-  -After--
>    00:01.2                                L1 <32us       L1+       L1-
>    01:00.0                                L1 <32us       L1+       L1-
>    02:04.0                                L1 <32us       L1+       L1+
>    04:00.0  L0s <512 L1 <64us
> 
> But exit for 04:00.0 is 64us which means it breaks its own latency
> requirements once it's behind anything

From your "lspci-before" attachment, 04:00.0 advertises:

  04:00.0 DevCap: Latency L1 <64us
          LnkCap: Exit Latency L1 <64us

So I see what you mean; if it's behind a switch, the link on the
upstream side of the switch would add some L1 exit latency, so we
wouldn't be able to exit L1 for the entire path in <64us.  This isn't
broken in the sense of a hardware defect; it just means we won't be
able to enable L1 on some parts of the path.

I wouldn't be surprised if Linux is doing that wrong right now, but we
would just need to nail down exactly what's wrong.

> > > > I want to identify something in the "before" configuration that is
> > > > wrong according to the spec, and a change in the "after" config so it
> > > > now conforms to the spec.
> > >
> > > So there are a few issues here, the current code does not apply to spec.
> > >
> > > The intel nic gets fixed as a side effect (it should still get a
> > > proper fix) of making
> > > the code apply to spec.
> > >
> > > Basically, while hunting for the issue, I found that the L1 and L0s
> > > latency calculations used to determine
> > > if they should be enabled or not is wrong - that is what I'm currently
> > > trying to push - it also seems like the
> > > intel nic claims that it can handle 64us but apparently it can't.
> > >
> > > So, three bugs, two are "fixed" one needs additional fixing.
> >
> > OK, we just need to split these up as much as possible and support
> > them with the relevant lspci output, analysis of what specifically is
> > wrong, and the lspci output showing the effect of the fix.
> 
> Could i have a copy of the pcie spec? I have found sections of it but
> it's really hard to find them again when you
> want to refer to something... Which I need to do to show that it
> doesn't conform...

Wish I could give you the spec, but it's not public.  There are books
that cover most of this material, e.g., Mindshare's "PCI Express
System Architecture".  I have a 2004 copy and it covers ASPM (but not
the L1 Substates).  I have also heard rumors that it's possible to
find older copies of the spec on the web.

> Ie I can't show all the errors on my system :)
> 
> > Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 17:09         ` Bjorn Helgaas
@ 2020-09-28 17:41           ` Ian Kumlien
  2020-09-28 19:53             ` Alexander Duyck
  2020-09-28 18:10           ` Alexander Duyck
  1 sibling, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-28 17:41 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Mon, Sep 28, 2020 at 7:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> > On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > >
> > > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > > So.......
> > > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > > >
> > > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > > on your bugzilla, here's the path:
> > > > > >
> > > > > > Correct, or you could do it like this:
> > > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > > I211 Gigabit Network Connection (rev 03)
> > > > > >
> > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > > >
> > > > > > > Your system does also have an 04:00.0 here:
> > > > > > >
> > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > > >
> > > > > > > Which NIC is the problem?
> > > > > >
> > > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > > the intel nics issues.
> > > > > >
> > > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > > smaller buffer...
> > > > > >
> > > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > > with L1 enabled.
> > > > > >
> > > > > > Either way, it's a fix that's needed ;)
> > > > >
> > > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > > situation, so let's skip that for now.
> > > >
> > > > L1 latency calculation is not good enough, it assumes that *any*
> > > > link is the highest latency link - which is incorrect.
> > > >
> > > > The latency to bring L1 up is number-of-hops*1000 +
> > > > maximum-latency-along-the-path
> > > >
> > > > currently it's only doing number-of-hops*1000 +
> > > > arbitrary-latency-of-current-link
> > > >
> > > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > > change.
> > > >
> > > > They are all included in all lspci output in the bug
> > >
> > > No doubt.  But I spent a long time going through those and the
> > > differences I found are not enough to show a spec violation or a fix.
> > >
> > > Here's what I extracted (this is a repeat; I mentioned this before):
> > >
> > >                                                     LnkCtl    LnkCtl
> > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > >   00:01.2                                L1 <32us       L1+       L1-
> > >   01:00.0                                L1 <32us       L1+       L1-
> > >   02:03.0                                L1 <32us       L1+       L1+
> > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > >
> > > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > > should be 33us.  What am I missing?
> >
> > it should be 32+3 so 35 us - but the intel nic claims something it
> > can't live up to.
>
> How did you compute 35us?  Here's my math for 33us:
>
>   It looks like we *should* be able to enable L1 on both links since
>   the exit latency should be <33us (first link starts exit at T=0,
>   completes by T=32; second link starts exit at T=1, completes by
>   T=33), and 03:00.0 can tolerate up to 64us.

03:00.0 + 0
02:03.0 + 1000
01:00.0 + 1000
00:01.2 + 1000

so, 32us + 3us - or am I missing something?

> If 03:00.0 advertises that it can tolerate 64us but it really can't,
> the fix would be a quirk to override the DevCap value for that device.

Yeah, I wonder what it should be - I assume we could calculate it from
the latencies, or perhaps there is something hidden in PCIe 4 or higher
L1 modes...

I'm very uncertain of what level lspci handles and how much of the data
it extracts (and I don't know better myself either ;))

> > Since this is triggered by the realtek device
>
> I'm still under the impression that the throughput problem is with
> 03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
> Realtek device?  Does the I211 work correctly until we add the Realtek
> device below the same switch?

They are both embedded devices on the motherboard, an Asus Pro WS
X570-ACE with BIOS 2206.

So I can't test one without the other...

> >                                                      LnkCtl    LnkCtl
> >             ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >    00:01.2                                L1 <32us       L1+       L1-
> >    01:00.0                                L1 <32us       L1+       L1-
> >    02:04.0                                L1 <32us       L1+       L1+
> >    04:00.0  L0s <512 L1 <64us
> >
> > But exit for 04:00.0 is 64us which means it breaks its own latency
> > requirements once it's behind anything
>
> From your "lspci-before" attachment, 04:00.0 advertises:
>
>   04:00.0 DevCap: Latency L1 <64us
>           LnkCap: Exit Latency L1 <64us
>
> So I see what you mean; if it's behind a switch, the link on the
> upstream side of the switch would add some L1 exit latency, so we
> wouldn't be able to exit L1 for the entire path in <64us.  This isn't
> broken in the sense of a hardware defect; it just means we won't be
> able to enable L1 on some parts of the path.

Yes, it just seems odd for it to advertise the same exit latency as its
maximum acceptable latency...

It could also be that they are assuming that we will not check the
endpoint's own latency as we walk the path... I have heard that this is
quite tricky to get right and that Microsoft also had problems with it.

(I assume they would have a bit more resources and devices to test with.)

> I wouldn't be surprised if Linux is doing that wrong right now, but we
> would just need to nail down exactly what's wrong.

OK

> > > > > I want to identify something in the "before" configuration that is
> > > > > wrong according to the spec, and a change in the "after" config so it
> > > > > now conforms to the spec.
> > > >
> > > > So there are a few issues here, the current code does not apply to spec.
> > > >
> > > > The intel nic gets fixed as a side effect (it should still get a
> > > > proper fix) of making
> > > > the code apply to spec.
> > > >
> > > > Basically, while hunting for the issue, I found that the L1 and L0s
> > > > latency calculations used to determine
> > > > if they should be enabled or not is wrong - that is what I'm currently
> > > > trying to push - it also seems like the
> > > > intel nic claims that it can handle 64us but apparently it can't.
> > > >
> > > > So, three bugs, two are "fixed" one needs additional fixing.
> > >
> > > OK, we just need to split these up as much as possible and support
> > > them with the relevant lspci output, analysis of what specifically is
> > > wrong, and the lspci output showing the effect of the fix.
> >
> > Could i have a copy of the pcie spec? I have found sections of it but
> > it's really hard to find them again when you
> > want to refer to something... Which I need to do to show that it
> > doesn't conform...
>
> Wish I could give you the spec, but it's not public.  There are books
> that cover most of this material, e.g., Mindshare's "PCI Express
> System Architecture".  I have a 2004 copy and it covers ASPM (but not
> the L1 Substates).  I have also heard rumors that it's possible to
> find older copies of the spec on the web.

Dang... :/

> > Ie I can't show all the errors on my system :)
> >
> > > Bjorn


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 17:09         ` Bjorn Helgaas
  2020-09-28 17:41           ` Ian Kumlien
@ 2020-09-28 18:10           ` Alexander Duyck
  1 sibling, 0 replies; 33+ messages in thread
From: Alexander Duyck @ 2020-09-28 18:10 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Ian Kumlien, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 10:11 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> > On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > >
> > > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > > So.......
> > > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > > >
> > > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > > on your bugzilla, here's the path:
> > > > > >
> > > > > > Correct, or you could do it like this:
> > > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > > I211 Gigabit Network Connection (rev 03)
> > > > > >
> > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > > >
> > > > > > > Your system does also have an 04:00.0 here:
> > > > > > >
> > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > > >
> > > > > > > Which NIC is the problem?
> > > > > >
> > > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > > the intel nics issues.
> > > > > >
> > > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > > smaller buffer...
> > > > > >
> > > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > > with L1 enabled.
> > > > > >
> > > > > > Either way, it's a fix that's needed ;)
> > > > >
> > > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > > situation, so let's skip that for now.
> > > >
> > > > L1 latency calculation is not good enough, it assumes that *any*
> > > > link is the highest latency link - which is incorrect.
> > > >
> > > > The latency to bring L1 up is number-of-hops*1000 +
> > > > maximum-latency-along-the-path
> > > >
> > > > currently it's only doing number-of-hops*1000 +
> > > > arbitrary-latency-of-current-link
> > > >
> > > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > > change.
> > > >
> > > > They are all included in all lspci output in the bug
> > >
> > > No doubt.  But I spent a long time going through those and the
> > > differences I found are not enough to show a spec violation or a fix.
> > >
> > > Here's what I extracted (this is a repeat; I mentioned this before):
> > >
> > >                                                     LnkCtl    LnkCtl
> > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > >   00:01.2                                L1 <32us       L1+       L1-
> > >   01:00.0                                L1 <32us       L1+       L1-
> > >   02:03.0                                L1 <32us       L1+       L1+
> > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > >
> > > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > > should be 33us.  What am I missing?
> >
> > it should be 32+3 so 35 us - but the intel nic claims something it
> > can't live up to.
>
> How did you compute 35us?  Here's my math for 33us:
>
>   It looks like we *should* be able to enable L1 on both links since
>   the exit latency should be <33us (first link starts exit at T=0,
>   completes by T=32; second link starts exit at T=1, completes by
>   T=33), and 03:00.0 can tolerate up to 64us.
>
> If 03:00.0 advertises that it can tolerate 64us but it really can't,
> the fix would be a quirk to override the DevCap value for that device.
>
> > Since this is triggered by the realtek device
>
> I'm still under the impression that the throughput problem is with
> 03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
> Realtek device?  Does the I211 work correctly until we add the Realtek
> device below the same switch?
>
> >                                                      LnkCtl    LnkCtl
> >             ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >    00:01.2                                L1 <32us       L1+       L1-
> >    01:00.0                                L1 <32us       L1+       L1-
> >    02:04.0                                L1 <32us       L1+       L1+
> >    04:00.0  L0s <512 L1 <64us
> >
> > But exit for 04:00.0 is 64us which means it breaks its own latency
> > requirements once it's behind anything
>
> From your "lspci-before" attachment, 04:00.0 advertises:
>
>   04:00.0 DevCap: Latency L1 <64us
>           LnkCap: Exit Latency L1 <64us
>
> So I see what you mean; if it's behind a switch, the link on the
> upstream side of the switch would add some L1 exit latency, so we
> wouldn't be able to exit L1 for the entire path in <64us.  This isn't
> broken in the sense of a hardware defect; it just means we won't be
> able to enable L1 on some parts of the path.
>
> I wouldn't be surprised if Linux is doing that wrong right now, but we
> would just need to nail down exactly what's wrong.

Actually I wonder if we shouldn't be looking at the wakeup latency
being the highest possible latency for any endpoint on the switch, plus
1us per link?  Wouldn't it be possible for us to see head-of-line
blocking on the PCIe bus if TLPs are being routed to the realtek,
resulting in us having to buffer in the switch until the realtek link
is awake?  The current code makes sense from the upstream perspective,
but I don't think it takes the downstream routing into account, and
that could result in head-of-line blocking, could it not?
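
A rough sketch of that aggregation (illustration only, with a made-up
helper; nothing like this exists in aspm.c today) - look at the slowest
link hanging off the switch rather than only the link being configured:

  #include <stddef.h>

  /* Worst-case wake-up latency seen by traffic crossing a switch: the
   * slowest of *all* its downstream links (any of them can hold up the
   * switch's buffers while it exits L1), plus 1000 ns per link crossed. */
  unsigned int switch_worst_wake_ns(const unsigned int *downstream_exit_ns,
                                    size_t nports, unsigned int links_crossed)
  {
          unsigned int worst = 0;
          size_t i;

          for (i = 0; i < nports; i++)
                  if (downstream_exit_ns[i] > worst)
                          worst = downstream_exit_ns[i];

          return worst + links_crossed * 1000;
  }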


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 17:41           ` Ian Kumlien
@ 2020-09-28 19:53             ` Alexander Duyck
  2020-09-28 20:04               ` Ian Kumlien
  2020-09-28 21:43               ` Ian Kumlien
  0 siblings, 2 replies; 33+ messages in thread
From: Alexander Duyck @ 2020-09-28 19:53 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 10:42 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 7:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> > > On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > > > So.......
> > > > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > > > >
> > > > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > > > on your bugzilla, here's the path:
> > > > > > >
> > > > > > > Correct, or you could do it like this:
> > > > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > > > I211 Gigabit Network Connection (rev 03)
> > > > > > >
> > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > > > >
> > > > > > > > Your system does also have an 04:00.0 here:
> > > > > > > >
> > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > > > >
> > > > > > > > Which NIC is the problem?
> > > > > > >
> > > > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > > > the intel nics issues.
> > > > > > >
> > > > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > > > smaller buffer...
> > > > > > >
> > > > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > > > with L1 enabled.
> > > > > > >
> > > > > > > Either way, it's a fix that's needed ;)
> > > > > >
> > > > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > > > situation, so let's skip that for now.
> > > > >
> > > > > L1 latency calculation is not good enough, it assumes that *any*
> > > > > link is the highest latency link - which is incorrect.
> > > > >
> > > > > The latency to bring L1 up is number-of-hops*1000 +
> > > > > maximum-latency-along-the-path
> > > > >
> > > > > currently it's only doing number-of-hops*1000 +
> > > > > arbitrary-latency-of-current-link
> > > > >
> > > > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > > > change.
> > > > >
> > > > > They are all included in all lspci output in the bug
> > > >
> > > > No doubt.  But I spent a long time going through those and the
> > > > differences I found are not enough to show a spec violation or a fix.
> > > >
> > > > Here's what I extracted (this is a repeat; I mentioned this before):
> > > >
> > > >                                                     LnkCtl    LnkCtl
> > > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > >   00:01.2                                L1 <32us       L1+       L1-
> > > >   01:00.0                                L1 <32us       L1+       L1-
> > > >   02:03.0                                L1 <32us       L1+       L1+
> > > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > > >
> > > > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > > > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > > > should be 33us.  What am I missing?
> > >
> > > it should be 32+3 so 35 us - but the intel nic claims something it
> > > can't live up to.
> >
> > How did you compute 35us?  Here's my math for 33us:
> >
> >   It looks like we *should* be able to enable L1 on both links since
> >   the exit latency should be <33us (first link starts exit at T=0,
> >   completes by T=32; second link starts exit at T=1, completes by
> >   T=33), and 03:00.0 can tolerate up to 64us.
>
> 03:00.0 + 0
> 02:03.0 + 1000
> 01:00.0 + 1000
> 00:01.2 + 1000
>
> so, 32 us + 3us or am I missing something?

You are adding two too many.  Specifically, you should be counting the
links, not the endpoints.  If I am not mistaken you only have two
links: 00:01.2<->01:00.0 and 02:03.0<->03:00.0.  That is how Bjorn is
coming up with 33, because you only need to add 1us for the additional
link.
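
Plugging in the numbers from the table above (a quick illustrative
calculation; each link's exit latency is taken as the larger LnkCap
value of its two ends):

  #include <stdio.h>

  int main(void)
  {
          /* Two links on the 03:00.0 path, both advertising an L1 exit
           * latency of <32us; only the second link adds an extra 1us of
           * wake-up propagation.  Counting the three upstream devices
           * instead is what produced the 35us figure above. */
          unsigned int max_link_exit_ns = 32000;
          unsigned int extra_links = 1;

          /* 33000 ns, comfortably below the 64us that 03:00.0 accepts. */
          printf("path L1 exit ~%u ns\n",
                 max_link_exit_ns + extra_links * 1000);
          return 0;
  }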

> > If 03:00.0 advertises that it can tolerate 64us but it really can't,
> > the fix would be a quirk to override the DevCap value for that device.
>
> Yeah, I wonder what it should be - I assume we could calculate it from latencies
> or perhaps there is something hidden in pcie 4 or higher L1 modes...
>
> I'm very uncertain of what level lspci handles and how much of the
> data it extracts
> (And i don't know better myself either ;))

The one question I would have is whether we are actually seeing less
than 64us or not.  In the testing you did, did we ever try disabling
L1 on just the realtek devices?  That would help to eliminate that as a
possible source of issues.  As per my other comments, I am wondering if
we are seeing head-of-line blocking occurring where requests for the
realtek device are blocking the downstream PCIe bus while it is
waiting on the device to wake up.  If we are held for 64us with a
transaction stuck on the switch waiting for the realtek link to wake
up, that might be long enough to start causing throughput problems, as
we cannot move the data between the CPU and the NIC fast enough due to
the potential stalls.

> > > Since this is triggered by the realtek device
> >
> > I'm still under the impression that the throughput problem is with
> > 03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
> > Realtek device?  Does the I211 work correctly until we add the Realtek
> > device below the same switch?
>
> They are both embedded devices on the motherboard
> Asus  Pro WS X570-ACE with bios: 2206
>
> So i can't test one without the other...

You should be able to manually disable L1 on the realtek link
(04:00.0<->02:04.0) instead of doing it on the upstream link of the
switch.  That may provide a data point on the L1 behavior of the setup.
Basically, if you took the realtek out of the equation in terms of the
L1 exit time, you should see the exit time drop to no more than 33us,
like what would be expected with just the i210.

Also, I wonder if we shouldn't prioritize the upstream side of the
switches for power savings versus the downstream side.  In most cases
the upstream port on a switch will have more lanes than the downstream
side, so enabling power saving there should result in greater power
savings than on individual devices such as an i210, which is only a
single PCIe lane anyway.

> > >                                                      LnkCtl    LnkCtl
> > >             ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > >    00:01.2                                L1 <32us       L1+       L1-
> > >    01:00.0                                L1 <32us       L1+       L1-
> > >    02:04.0                                L1 <32us       L1+       L1+
> > >    04:00.0  L0s <512 L1 <64us
> > >
> > > But exit for 04:00.0 is 64us which means it breaks its own latency
> > > requirements once it's behind anything
> >
> > From your "lspci-before" attachment, 04:00.0 advertises:
> >
> >   04:00.0 DevCap: Latency L1 <64us
> >           LnkCap: Exit Latency L1 <64us
> >
> > So I see what you mean; if it's behind a switch, the link on the
> > upstream side of the switch would add some L1 exit latency, so we
> > wouldn't be able to exit L1 for the entire path in <64us.  This isn't
> > broken in the sense of a hardware defect; it just means we won't be
> > able to enable L1 on some parts of the path.
>
> Yes, it just seems odd to have the same latency as max latency..
>
> It could also be that they are assuming that we will not check that
> endpoint's latency
> as we walk the path... I have heard that this is quite tricky to get
> right and that Microsoft also had problems with it.
>
> (I assume that they would have a bit more resources and devices to test with)

I read that as this device is not configured to support stacked L1 ASPM.

> > I wouldn't be surprised if Linux is doing that wrong right now, but we
> > would just need to nail down exactly what's wrong.
>
> OK

This is something we didn't enable until recently so that isn't too
surprising. Previously, we just disabled ASPM if a switch was present.

> > > > > > I want to identify something in the "before" configuration that is
> > > > > > wrong according to the spec, and a change in the "after" config so it
> > > > > > now conforms to the spec.
> > > > >
> > > > > So there are a few issues here, the current code does not apply to spec.
> > > > >
> > > > > The intel nic gets fixed as a side effect (it should still get a
> > > > > proper fix) of making
> > > > > the code apply to spec.
> > > > >
> > > > > Basically, while hunting for the issue, I found that the L1 and L0s
> > > > > latency calculations used to determine
> > > > > if they should be enabled or not is wrong - that is what I'm currently
> > > > > trying to push - it also seems like the
> > > > > intel nic claims that it can handle 64us but apparently it can't.
> > > > >
> > > > > So, three bugs, two are "fixed" one needs additional fixing.
> > > >
> > > > OK, we just need to split these up as much as possible and support
> > > > them with the relevant lspci output, analysis of what specifically is
> > > > wrong, and the lspci output showing the effect of the fix.
> > >
> > > Could i have a copy of the pcie spec? I have found sections of it but
> > > it's really hard to find them again when you
> > > want to refer to something... Which I need to do to show that it
> > > doesn't conform...
> >
> > Wish I could give you the spec, but it's not public.  There are books
> > that cover most of this material, e.g., Mindshare's "PCI Express
> > System Architecture".  I have a 2004 copy and it covers ASPM (but not
> > the L1 Substates).  I have also heard rumors that it's possible to
> > find older copies of the spec on the web.
>
> Dang... :/

Just do a Google search for "pcie gen 2 specification"; you should find
a few results that way.

Thanks.

- Alex


* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 19:53             ` Alexander Duyck
@ 2020-09-28 20:04               ` Ian Kumlien
  2020-09-28 20:33                 ` Ian Kumlien
  2020-09-28 21:43               ` Ian Kumlien
  1 sibling, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-28 20:04 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 10:42 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Mon, Sep 28, 2020 at 7:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> > > > On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > > > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > > > > So.......
> > > > > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > > > > >
> > > > > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > > > > on your bugzilla, here's the path:
> > > > > > > >
> > > > > > > > Correct, or you could do it like this:
> > > > > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > > > > I211 Gigabit Network Connection (rev 03)
> > > > > > > >
> > > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > > > > >
> > > > > > > > > Your system does also have an 04:00.0 here:
> > > > > > > > >
> > > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > > > > >
> > > > > > > > > Which NIC is the problem?
> > > > > > > >
> > > > > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > > > > the intel nics issues.
> > > > > > > >
> > > > > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > > > > smaller buffer...
> > > > > > > >
> > > > > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > > > > with L1 enabled.
> > > > > > > >
> > > > > > > > Either way, it's a fix that's needed ;)
> > > > > > >
> > > > > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > > > > situation, so let's skip that for now.
> > > > > >
> > > > > > L1 latency calculation is not good enough, it assumes that *any*
> > > > > > link is the highest latency link - which is incorrect.
> > > > > >
> > > > > > The latency to bring L1 up is number-of-hops*1000 +
> > > > > > maximum-latency-along-the-path
> > > > > >
> > > > > > currently it's only doing number-of-hops*1000 +
> > > > > > arbitrary-latency-of-current-link
> > > > > >
> > > > > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > > > > change.
> > > > > >
> > > > > > They are all included in all lspci output in the bug
> > > > >
> > > > > No doubt.  But I spent a long time going through those and the
> > > > > differences I found are not enough to show a spec violation or a fix.
> > > > >
> > > > > Here's what I extracted (this is a repeat; I mentioned this before):
> > > > >
> > > > >                                                     LnkCtl    LnkCtl
> > > > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > > >   00:01.2                                L1 <32us       L1+       L1-
> > > > >   01:00.0                                L1 <32us       L1+       L1-
> > > > >   02:03.0                                L1 <32us       L1+       L1+
> > > > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > > > >
> > > > > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > > > > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > > > > should be 33us.  What am I missing?
> > > >
> > > > it should be 32+3 so 35 us - but the intel nic claims something it
> > > > can't live up to.
> > >
> > > How did you compute 35us?  Here's my math for 33us:
> > >
> > >   It looks like we *should* be able to enable L1 on both links since
> > >   the exit latency should be <33us (first link starts exit at T=0,
> > >   completes by T=32; second link starts exit at T=1, completes by
> > >   T=33), and 03:00.0 can tolerate up to 64us.
> >
> > 03:00.0 + 0
> > 02:03.0 + 1000
> > 01:00.0 + 1000
> > 00:01.2 + 1000
> >
> > so, 32 us + 3us or am I missing something?
>
> You are adding two too many. Specifically you should be counting the
> links, not the endpoints. If I am not mistaken you only have two
> links. 00:01.2<->01:00.0 and 02:03.0<->3:00.0. That is how Bjorn is
> coming up with 33, because you only need to add 1 for the additional
> link.

Then I'm missing something, since I saw the L1 transition to the
power-up step, and it's the link-to-link step that is 1us

> > > If 03:00.0 advertises that it can tolerate 64us but it really can't,
> > > the fix would be a quirk to override the DevCap value for that device.
> >
> > Yeah, I wonder what it should be - I assume we could calculate it from latencies
> > or perhaps there is something hidden in pcie 4 or higher L1 modes...
> >
> > I'm very uncertain of what level lspci handles and how much of the
> > data it extracts
> > (And i don't know better myself either ;))
>
> The one question I would have is if we are actually seeing less than
> 64us or not. In the testing you did, did we ever try disabling the L1
> on just the realtek devices? That would help to eliminate that as a
> possible source of issues. As per my other comments I am wondering if
> we are seeing head-of-line blocking occurring where requests for the
> realtek device are blocking the downstream PCIe bus while it is
> waiting on the device to wake up. If we are held for 64us with a
> transaction stuck on the switch waiting for the realtek link to wake
> up that might be long enough to start causing us to see throughput
> problems as we cannot move the data between the CPU and the NIC fast
> enough due to the potential stalls.

No, we did not try to disable it on this specific Realtek device;
I'll try it and see

It also looks like I have a lot of ASPM L1.1-supporting devices...

Anyway, I have no clue about the head-of-line blocking, but it does
seem plausible since it's all serial...

> > > > Since this is triggered by the realtek device
> > >
> > > I'm still under the impression that the throughput problem is with
> > > 03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
> > > Realtek device?  Does the I211 work correctly until we add the Realtek
> > > device below the same switch?
> >
> > They are both embedded devices on the motherboard
> > Asus  Pro WS X570-ACE with bios: 2206
> >
> > So i can't test one without the other...
>
> You should be able to manually disable L1 on the realtek link
> (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> switch. That may provide a datapoint on the L1 behavior of the setup.
> Basically if you took the realtek out of the equation in terms of the
> L1 exit time you should see the exit time drop to no more than 33us
> like what would be expected with just the i210.

Yeah, will try it out with echo 0 >
/sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
(which is the device reported by my patch)
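
(In case it's useful for scripting, a tiny C equivalent of that echo -
just a sketch, the one-liner above does the same thing; the sysfs path
is the one from this box and l1_aspm is the per-link ASPM attribute:)

/* Minimal sketch: write "0" to a device's link/l1_aspm sysfs attribute
 * to disable ASPM L1 on its upstream link, or "1" to re-enable it.
 * Equivalent to the echo above. */
#include <stdio.h>

static int set_l1_aspm(const char *path, int enable)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fprintf(f, "%d\n", enable ? 1 : 0);
    return fclose(f);
}

int main(void)
{
    /* sysfs path taken from the message above */
    return set_l1_aspm("/sys/devices/pci0000:00/0000:00:01.2/"
                       "0000:01:00.0/0000:02:04.0/0000:04:00.0/"
                       "link/l1_aspm", 0) ? 1 : 0;
}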

> Also I wonder if we shouldn't prioritize the upstream side of the
> switches for power savings versus the downstream side. In most cases
> the upstream port on a switch will have more lanes than the downstream
> side so enabling power saving there should result in greater power
> savings versus the individual devices such as a i210 which is only a
> single PCIe lane anyway.

That makes sense it would also avoid a lot of errors due to buffering
limitations, but L1 doesn't seem to be separated like that
L0s is however...

> > > >                                                      LnkCtl    LnkCtl
> > > >             ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > >    00:01.2                                L1 <32us       L1+       L1-
> > > >    01:00.0                                L1 <32us       L1+       L1-
> > > >    02:04.0                                L1 <32us       L1+       L1+
> > > >    04:00.0  L0s <512 L1 <64us
> > > >
> > > > But exit for 04:00.0 is 64us which means it breaks its own latency
> > > > requirements once it's behind anything
> > >
> > > From your "lspci-before" attachment, 04:00.0 advertises:
> > >
> > >   04:00.0 DevCap: Latency L1 <64us
> > >           LnkCap: Exit Latency L1 <64us
> > >
> > > So I see what you mean; if it's behind a switch, the link on the
> > > upstream side of the switch would add some L1 exit latency, so we
> > > wouldn't be able to exit L1 for the entire path in <64us.  This isn't
> > > broken in the sense of a hardware defect; it just means we won't be
> > > able to enable L1 on some parts of the path.
> >
> > Yes, it just seems odd to have the same latency as max latency..
> >
> > It could also be that they are assuming that we will not check that
> > endpoint's latency
> > as we walk the path... I have heard that this is quite tricky to get
> > right and that Microsoft also had problems with it.
> >
> > (I assume that they would have a bit more resources and devices to test with)
>
> I read that as this device is not configured to support stacked L1 ASPM.

Yeah, and since my reasoning was off by one, it makes more sense =)

> > > I wouldn't be surprised if Linux is doing that wrong right now, but we
> > > would just need to nail down exactly what's wrong.
> >
> > OK
>
> This is something we didn't enable until recently so that isn't too
> surprising. Previously, we just disabled ASPM if a switch was present.
>
> > > > > > > I want to identify something in the "before" configuration that is
> > > > > > > wrong according to the spec, and a change in the "after" config so it
> > > > > > > now conforms to the spec.
> > > > > >
> > > > > > So there are a few issues here, the current code does not apply to spec.
> > > > > >
> > > > > > The intel nic gets fixed as a side effect (it should still get a
> > > > > > proper fix) of making
> > > > > > the code apply to spec.
> > > > > >
> > > > > > Basically, while hunting for the issue, I found that the L1 and L0s
> > > > > > latency calculations used to determine
> > > > > > if they should be enabled or not is wrong - that is what I'm currently
> > > > > > trying to push - it also seems like the
> > > > > > intel nic claims that it can handle 64us but apparently it can't.
> > > > > >
> > > > > > So, three bugs, two are "fixed" one needs additional fixing.
> > > > >
> > > > > OK, we just need to split these up as much as possible and support
> > > > > them with the relevant lspci output, analysis of what specifically is
> > > > > wrong, and the lspci output showing the effect of the fix.
> > > >
> > > > Could i have a copy of the pcie spec? I have found sections of it but
> > > > it's really hard to find them again when you
> > > > want to refer to something... Which I need to do to show that it
> > > > doesn't conform...
> > >
> > > Wish I could give you the spec, but it's not public.  There are books
> > > that cover most of this material, e.g., Mindshare's "PCI Express
> > > System Architecture".  I have a 2004 copy and it covers ASPM (but not
> > > the L1 Substates).  I have also heard rumors that it's possible to
> > > find older copies of the spec on the web.
> >
> > Dang... :/
>
> Just do a google search for "pcie gen 2 specification" you should find
> a few results that way.
>
> Thanks.
>
> - Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 20:04               ` Ian Kumlien
@ 2020-09-28 20:33                 ` Ian Kumlien
  2020-09-28 23:30                   ` Alexander Duyck
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-28 20:33 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Mon, Sep 28, 2020 at 10:42 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 7:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Mon, Sep 28, 2020 at 12:24:11PM +0200, Ian Kumlien wrote:
> > > > > On Mon, Sep 28, 2020 at 2:06 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> > > > > > > On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > > > > > > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > > > > > > > So.......
> > > > > > > > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > > > > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > > > > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > > > > > > > >
> > > > > > > > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > > > > > > > on your bugzilla, here's the path:
> > > > > > > > >
> > > > > > > > > Correct, or you could do it like this:
> > > > > > > > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > > > > > > > I211 Gigabit Network Connection (rev 03)
> > > > > > > > >
> > > > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > > > > > > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > > > > > > > >
> > > > > > > > > > Your system does also have an 04:00.0 here:
> > > > > > > > > >
> > > > > > > > > >   00:01.2 Root Port              to [bus 01-07]
> > > > > > > > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > > > > > > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > > > > > > > >   04:00.0 Endpoint (Realtek 816e)
> > > > > > > > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > > > > > > > >   04:00.2 Endpoint (Realtek 816a)
> > > > > > > > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > > > > > > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > > > > > > > >
> > > > > > > > > > Which NIC is the problem?
> > > > > > > > >
> > > > > > > > > The intel one - so the side effect of the realtek nic is that it fixes
> > > > > > > > > the intel nics issues.
> > > > > > > > >
> > > > > > > > > It would be that the intel nic actually has a bug with L1 (and it
> > > > > > > > > would seem that it's to kind with latencies) so it actually has a
> > > > > > > > > smaller buffer...
> > > > > > > > >
> > > > > > > > > And afair, the realtek has a larger buffer, since it behaves better
> > > > > > > > > with L1 enabled.
> > > > > > > > >
> > > > > > > > > Either way, it's a fix that's needed ;)
> > > > > > > >
> > > > > > > > OK, what specifically is the fix that's needed?  The L0s change seems
> > > > > > > > like a "this looks wrong" thing that doesn't actually affect your
> > > > > > > > situation, so let's skip that for now.
> > > > > > >
> > > > > > > L1 latency calculation is not good enough, it assumes that *any*
> > > > > > > link is the highest latency link - which is incorrect.
> > > > > > >
> > > > > > > The latency to bring L1 up is number-of-hops*1000 +
> > > > > > > maximum-latency-along-the-path
> > > > > > >
> > > > > > > currently it's only doing number-of-hops*1000 +
> > > > > > > arbitrary-latency-of-current-link
> > > > > > >
> > > > > > > > And let's support the change you *do* need with the "lspci -vv" for
> > > > > > > > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > > > > > > > since they share the 00:01.2 - 01:00.0 link), before and after the
> > > > > > > > change.
> > > > > > >
> > > > > > > They are all included in all lspci output in the bug
> > > > > >
> > > > > > No doubt.  But I spent a long time going through those and the
> > > > > > differences I found are not enough to show a spec violation or a fix.
> > > > > >
> > > > > > Here's what I extracted (this is a repeat; I mentioned this before):
> > > > > >
> > > > > >                                                     LnkCtl    LnkCtl
> > > > > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > > > >   00:01.2                                L1 <32us       L1+       L1-
> > > > > >   01:00.0                                L1 <32us       L1+       L1-
> > > > > >   02:03.0                                L1 <32us       L1+       L1+
> > > > > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > > > > >
> > > > > > I don't see anything wrong here yet.  03:00.0 claims it can handle up
> > > > > > to 64us of L1 exit latency, and the L1 exit latency of the entire path
> > > > > > should be 33us.  What am I missing?
> > > > >
> > > > > it should be 32+3 so 35 us - but the intel nic claims something it
> > > > > can't live up to.
> > > >
> > > > How did you compute 35us?  Here's my math for 33us:
> > > >
> > > >   It looks like we *should* be able to enable L1 on both links since
> > > >   the exit latency should be <33us (first link starts exit at T=0,
> > > >   completes by T=32; second link starts exit at T=1, completes by
> > > >   T=33), and 03:00.0 can tolerate up to 64us.
> > >
> > > 03:00.0 + 0
> > > 02:03.0 + 1000
> > > 01:00.0 + 1000
> > > 00:01.2 + 1000
> > >
> > > so, 32 us + 3us or am I missing something?
> >
> > You are adding two too many. Specifically you should be counting the
> > links, not the endpoints. If I am not mistaken you only have two
> > links. 00:01.2<->01:00.0 and 02:03.0<->3:00.0. That is how Bjorn is
> > coming up with 33, because you only need to add 1 for the additional
> > link.
>
> Then I'm missing something, since i saw L1 transition to the power up
> step, and it's the link to link step that is 1us
>
> > > > If 03:00.0 advertises that it can tolerate 64us but it really can't,
> > > > the fix would be a quirk to override the DevCap value for that device.
> > >
> > > Yeah, I wonder what it should be - I assume we could calculate it from latencies
> > > or perhaps there is something hidden in pcie 4 or higher L1 modes...
> > >
> > > I'm very uncertain of what level lspci handles and how much of the
> > > data it extracts
> > > (And i don't know better myself either ;))
> >
> > The one question I would have is if we are actually seeing less than
> > 64us or not. In the testing you did, did we ever try disabling the L1
> > on just the realtek devices? That would help to eliminate that as a
> > possible source of issues. As per my other comments I am wondering if
> > we are seeing head-of-line blocking occurring where requests for the
> > realtek device are blocking the downstream PCIe bus while it is
> > waiting on the device to wake up. If we are held for 64us with a
> > transaction stuck on the switch waiting for the realtek link to wake
> > up that might be long enough to start causing us to see throughput
> > problems as we cannot move the data between the CPU and the NIC fast
> > enough due to the potential stalls.
>
> No, we did not try to disable it on this specific realtek device, I'll
> try it and see
>
> It also looks like i have a lot of aspm L1.1 supporting devices...
>
> Anyway, I have no clue of the head-of-line blocking but it does seem
> plausible since it's all serial.. .

> > > > > Since this is triggered by the realtek device
> > > >
> > > > I'm still under the impression that the throughput problem is with
> > > > 03:00.0, the Intel I211 NIC.  In what sense is this triggered by the
> > > > Realtek device?  Does the I211 work correctly until we add the Realtek
> > > > device below the same switch?
> > >
> > > They are both embedded devices on the motherboard
> > > Asus  Pro WS X570-ACE with bios: 2206
> > >
> > > So i can't test one without the other...
> >
> > You should be able to manually disable L1 on the realtek link
> > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > switch. That may provide a datapoint on the L1 behavior of the setup.
> > Basically if you took the realtek out of the equation in terms of the
> > L1 exit time you should see the exit time drop to no more than 33us
> > like what would be expected with just the i210.
>
> Yeah, will try it out with echo 0 >
> /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> (which is the device reported by my patch)

So, 04:00.0 is already disabled; the existing code apparently handled
that correctly... *but*

given the path:
00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
Semiconductor Co., Ltd. Device 816e (rev 1a)

Walking backwards:
-- 04:00.0 has L1 disabled
-- 02:04.0 doesn't have ASPM?!

lspci reports:
Capabilities: [370 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
L1SubCtl2:
Capabilities: [400 v1] Data Link Feature <?>
Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [440 v1] Lane Margining at the Receiver <?>

However, the link directory is empty.

Anything we should know about these unknown capabilities? Also ASPM
L1.1 and L1.2, heh =)

-- 01:00.0 has L1 enabled; disabling it makes the Intel NIC work again

ASPM L1 enabled:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
[  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
[  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
[  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
[  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
[  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
[  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
[  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
[  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
[  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
[  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver

ASPM L1 disabled:
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
[  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
[  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
[  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
[  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
[  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
[  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
[  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
[  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
[  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
[  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver

(all measurements are over the live internet - hence the variance)

> > Also I wonder if we shouldn't prioritize the upstream side of the
> > switches for power savings versus the downstream side. In most cases
> > the upstream port on a switch will have more lanes than the downstream
> > side so enabling power saving there should result in greater power
> > savings versus the individual devices such as a i210 which is only a
> > single PCIe lane anyway.
>
> That makes sense it would also avoid a lot of errors due to buffering
> limitations, but L1 doesn't seem to be separated like that
> L0s is however...
>
> > > > >                                                      LnkCtl    LnkCtl
> > > > >             ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > > >    00:01.2                                L1 <32us       L1+       L1-
> > > > >    01:00.0                                L1 <32us       L1+       L1-
> > > > >    02:04.0                                L1 <32us       L1+       L1+
> > > > >    04:00.0  L0s <512 L1 <64us
> > > > >
> > > > > But exit for 04:00.0 is 64us which means it breaks its own latency
> > > > > requirements once it's behind anything
> > > >
> > > > From your "lspci-before" attachment, 04:00.0 advertises:
> > > >
> > > >   04:00.0 DevCap: Latency L1 <64us
> > > >           LnkCap: Exit Latency L1 <64us
> > > >
> > > > So I see what you mean; if it's behind a switch, the link on the
> > > > upstream side of the switch would add some L1 exit latency, so we
> > > > wouldn't be able to exit L1 for the entire path in <64us.  This isn't
> > > > broken in the sense of a hardware defect; it just means we won't be
> > > > able to enable L1 on some parts of the path.
> > >
> > > Yes, it just seems odd to have the same latency as max latency..
> > >
> > > It could also be that they are assuming that we will not check that
> > > endpoint's latency
> > > as we walk the path... I have heard that this is quite tricky to get
> > > right and that Microsoft also had problems with it.
> > >
> > > (I assume that they would have a bit more resources and devices to test with)
> >
> > I read that as this device is not configured to support stacked L1 ASPM.
>
> Yeah and since my reasoning was off by one, it makes more sense =)
>
> > > > I wouldn't be surprised if Linux is doing that wrong right now, but we
> > > > would just need to nail down exactly what's wrong.
> > >
> > > OK
> >
> > This is something we didn't enable until recently so that isn't too
> > surprising. Previously, we just disabled ASPM if a switch was present.
> >
> > > > > > > > I want to identify something in the "before" configuration that is
> > > > > > > > wrong according to the spec, and a change in the "after" config so it
> > > > > > > > now conforms to the spec.
> > > > > > >
> > > > > > > So there are a few issues here, the current code does not apply to spec.
> > > > > > >
> > > > > > > The intel nic gets fixed as a side effect (it should still get a
> > > > > > > proper fix) of making
> > > > > > > the code apply to spec.
> > > > > > >
> > > > > > > Basically, while hunting for the issue, I found that the L1 and L0s
> > > > > > > latency calculations used to determine
> > > > > > > if they should be enabled or not is wrong - that is what I'm currently
> > > > > > > trying to push - it also seems like the
> > > > > > > intel nic claims that it can handle 64us but apparently it can't.
> > > > > > >
> > > > > > > So, three bugs, two are "fixed" one needs additional fixing.
> > > > > >
> > > > > > OK, we just need to split these up as much as possible and support
> > > > > > them with the relevant lspci output, analysis of what specifically is
> > > > > > wrong, and the lspci output showing the effect of the fix.
> > > > >
> > > > > Could i have a copy of the pcie spec? I have found sections of it but
> > > > > it's really hard to find them again when you
> > > > > want to refer to something... Which I need to do to show that it
> > > > > doesn't conform...
> > > >
> > > > Wish I could give you the spec, but it's not public.  There are books
> > > > that cover most of this material, e.g., Mindshare's "PCI Express
> > > > System Architecture".  I have a 2004 copy and it covers ASPM (but not
> > > > the L1 Substates).  I have also heard rumors that it's possible to
> > > > find older copies of the spec on the web.
> > >
> > > Dang... :/
> >
> > Just do a google search for "pcie gen 2 specification" you should find
> > a few results that way.
> >
> > Thanks.
> >
> > - Alex

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 19:53             ` Alexander Duyck
  2020-09-28 20:04               ` Ian Kumlien
@ 2020-09-28 21:43               ` Ian Kumlien
  1 sibling, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-28 21:43 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 10:42 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:

> Just do a google search for "pcie gen 2 specification" you should find
> a few results that way.

Actually found a draft PCIe gen 4 spec...

and according to 5.4.1.2.2, Exit from the L1 State:
"A Switch is required to initiate an L1 exit transition on its
Upstream Port Link after no more than 1
μs from the beginning of an L1 exit transition on any of its
Downstream Port Links. Refer to
Section 4.2 for details of the Physical Layer signaling during L1 exit.

Consider the example in Figure 5-8. The numbers attached to each Port
represent the
corresponding Port’s reported Transmit Lanes L1 exit latency in units
of microseconds.

Links 1, 2, and 3 are all in the L1 state, and Endpoint C initiates a
transition to the L0 state at time
T. Since Switch B takes 32 μs to exit L1 on its Ports, Link 3 will
transition to the L0 state at T+32
(longest time considering T+8 for the Endpoint C, and T+32 for Switch B).

Switch B is required to initiate a transition from the L1 state on its
Upstream Port Link (Link 2)
after no more than 1 μs from the beginning of the transition from the
L1 state on Link 3.
Therefore, transition to the L0 state will begin on Link 2 at T+1.
Similarly, Link 1 will start its
transition to the L0 state at time T+2.

Following along as above, Link 2 will complete its transition to the
L0 state at time T+33 (since
Switch B takes longer to transition and it started at time T+1). Link
1 will complete its transition to
the L0 state at time T+34 (since the Root Complex takes 32 μs to
transition and it started at time
T+2).

Therefore, among Links 1, 2, and 3, the Link to complete the
transition to the L0 state last is Link 1
with a 34 μs delay. This is the delay experienced by the packet that
initiated the transition in
Endpoint C."

So basically, my change to L1 is validated by this, ;)
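
To make that arithmetic concrete, here is a small standalone C sketch
of the calculation described above (not the kernel code; it just adds
the 1us-per-upstream-hop start offset to each link's own exit latency,
taking a link's exit latency as the slower of its two ends):

/* L1 exit timing per the spec example above.  link_exit_us[0] is the
 * link closest to the endpoint that initiates the exit; each link
 * further upstream starts its own exit at most 1us later. */
#include <stdio.h>

static unsigned int path_l1_exit_us(const unsigned int *link_exit_us,
                                    unsigned int nlinks)
{
    unsigned int i, t, worst = 0;

    for (i = 0; i < nlinks; i++) {
        t = i + link_exit_us[i];  /* starts at T+i, done at T+i+exit */
        if (t > worst)
            worst = t;
    }
    return worst;
}

int main(void)
{
    /* Figure 5-8 example: all three links need 32us to exit L1 */
    unsigned int spec_example[] = { 32, 32, 32 };
    /* The I211 path here: 02:03.0<->03:00.0 and 00:01.2<->01:00.0,
     * both reporting <32us */
    unsigned int i211_path[] = { 32, 32 };

    printf("spec example: %uus\n", path_l1_exit_us(spec_example, 3)); /* 34 */
    printf("i211 path:    %uus\n", path_l1_exit_us(i211_path, 2));    /* 33 */
    return 0;
}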

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 20:33                 ` Ian Kumlien
@ 2020-09-28 23:30                   ` Alexander Duyck
  2020-09-29 12:51                     ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Duyck @ 2020-09-28 23:30 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Mon, Sep 28, 2020 at 1:33 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:

<snip>

> > > You should be able to manually disable L1 on the realtek link
> > > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > > switch. That may provide a datapoint on the L1 behavior of the setup.
> > > Basically if you took the realtek out of the equation in terms of the
> > > L1 exit time you should see the exit time drop to no more than 33us
> > > like what would be expected with just the i210.
> >
> > Yeah, will try it out with echo 0 >
> > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> > (which is the device reported by my patch)
>
> So, 04:00.0 is already disabled, the existing code apparently handled
> that correctly... *but*
>
> given the path:
> 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
> Semiconductor Co., Ltd. Device 816e (rev 1a)
>
> Walking backwards:
> -- 04:00.0 has l1 disabled
> -- 02:04.0 doesn't have aspm?!
>
> lspci reports:
> Capabilities: [370 v1] L1 PM Substates
> L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> L1SubCtl2:
> Capabilities: [400 v1] Data Link Feature <?>
> Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> Capabilities: [440 v1] Lane Margining at the Receiver <?>
>
> However the link directory is empty.
>
> Anything we should know about these unknown capabilities? also aspm
> L1.1 and .1.2, heh =)
>
> -- 01:00.0 has L1, disabling it makes the intel nic work again

I recall that much. However, the question is why? If it already takes
32us to bring up the link between the NIC and the switch, why would
the additional 1us to also bring up the upstream port have that much
of an effect? That is why I am thinking it may be worthwhile to
isolate things further so that only the upstream port and the NIC
have L1 enabled. If we are still seeing issues in that state, then I
can only assume there is something off with the 00:01.2<->1:00.0
link, where it isn't advertising its actual L1 recovery time. For
example, the "Width x4 (downgraded)" looks very suspicious and could
be responsible for something like that if the link training is having
to go through exception cases to work out the x4 link instead of x8.
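
(For reference, the advertised exit latency is only a coarse 3-bit
range in the Link Capabilities register - here's a quick decode
sketch, not the kernel's code, using the same upper-bound strings
lspci prints, so the real recovery time can sit anywhere inside the
range the device reports:)

#include <stdio.h>
#include <stdint.h>

/* Decode the L1 Exit Latency field of a PCIe Link Capabilities value
 * (bits 17:15).  These are coarse ranges; lspci prints the upper
 * bound, e.g. "<32us" for the 16-32us encoding. */
static const char *lnkcap_l1_exit_latency(uint32_t lnkcap)
{
    static const char * const l1el[] = {
        "<1us", "<2us", "<4us", "<8us",
        "<16us", "<32us", "<64us", ">64us",
    };

    return l1el[(lnkcap >> 15) & 0x7];
}

int main(void)
{
    /* Example value only: encoding 5 (16-32us) in bits 17:15 */
    uint32_t lnkcap = 5u << 15;

    printf("L1 exit latency: %s\n", lnkcap_l1_exit_latency(lnkcap));
    return 0;
}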

> ASPM L1 enabled:
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
> [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
> [  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
> [  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> [  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
> [  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
> [  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> [  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
> [  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
> [  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
> [  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver
>
> ASPM L1 disabled:
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
> [  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
> [  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
> [  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
> [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
> [  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
> [  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
> [  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
> [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
> [  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
> [  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver
>
> (all measurements are over live internet - so thus variance)

I forgot there were 5 devices in total hanging off of there. You
might try checking whether disabling L1 on devices 5:00.0, 6:00.0
and/or 7:00.0 has any effect while leaving L1 on 01:00.0 and the NIC
active. The basic idea is to go through and make certain we aren't
seeing an L1 issue with one of the other downstream links on the
switch.

The more I think about it, the entire setup for this does seem a bit
suspicious. I was looking over the lspci tree and the dump from the
system. From what I can tell, the upstream switch link at 01.2 <->
1:00.0 is only a Gen4 x4 link. However, coming off of that are 5
devices: two NICs using either Gen1 or Gen2 at x1, and then a USB
controller and two SATA controllers reporting Gen4 x16. Specifically,
those last 3 devices have me a bit curious, as they are all reporting
L0s and L1 exit latencies that are the absolute minimum, which has me
wondering if they are even reporting actual values.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-28 23:30                   ` Alexander Duyck
@ 2020-09-29 12:51                     ` Ian Kumlien
  2020-09-29 16:23                       ` Alexander Duyck
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-29 12:51 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Tue, Sep 29, 2020 at 1:31 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Mon, Sep 28, 2020 at 1:33 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> > > <alexander.duyck@gmail.com> wrote:
>
> <snip>
>
> > > > You should be able to manually disable L1 on the realtek link
> > > > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > > > switch. That may provide a datapoint on the L1 behavior of the setup.
> > > > Basically if you took the realtek out of the equation in terms of the
> > > > L1 exit time you should see the exit time drop to no more than 33us
> > > > like what would be expected with just the i210.
> > >
> > > Yeah, will try it out with echo 0 >
> > > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> > > (which is the device reported by my patch)
> >
> > So, 04:00.0 is already disabled, the existing code apparently handled
> > that correctly... *but*
> >
> > given the path:
> > 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
> > Semiconductor Co., Ltd. Device 816e (rev 1a)
> >
> > Walking backwards:
> > -- 04:00.0 has l1 disabled
> > -- 02:04.0 doesn't have aspm?!
> >
> > lspci reports:
> > Capabilities: [370 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > L1SubCtl2:
> > Capabilities: [400 v1] Data Link Feature <?>
> > Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> > Capabilities: [440 v1] Lane Margining at the Receiver <?>
> >
> > However the link directory is empty.
> >
> > Anything we should know about these unknown capabilities? also aspm
> > L1.1 and .1.2, heh =)
> >
> > -- 01:00.0 has L1, disabling it makes the intel nic work again
>
> I recall that much. However the question is why? If there is already a
> 32us time to bring up the link between the NIC and the switch why
> would the additional 1us to also bring up the upstream port have that
> much of an effect? That is why I am thinking that it may be worthwhile
> to try to isolate things further so that only the upstream port and
> the NIC have L1 enabled. If we are still seeing issues in that state
> then I can only assume there is something off with the
> 00:01.2<->1:00.0 link to where it either isn't advertising the actual
> L1 recovery time. For example the "Width x4 (downgraded)" looks very
> suspicious and could be responsible for something like that if the
> link training is having to go through exception cases to work out the
> x4 link instead of a x8.

It is an x4 link; all links that aren't "fully populated" or "fully
utilized" are listed as downgraded...

So, an x16 card in an x8 slot or a PCIe 3 card in a PCIe 2 slot - all
list as downgraded

> > ASPM L1 enabled:
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
> > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
> > [  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
> > [  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > [  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
> > [  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
> > [  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > [  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
> > [  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
> > [  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
> > [  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver
> >
> > ASPM L1 disabled:
> > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > [  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
> > [  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
> > [  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
> > [  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
> > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
> > [  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
> > [  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
> > [  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
> > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
> > [  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval           Transfer     Bitrate         Retr
> > [  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
> > [  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver
> >
> > (all measurements are over live internet - so thus variance)
>
> I forgot there were 5 total devices that were hanging off of there as
> well. You might try checking to see if disabling L1 on devices 5:00.0,
> 6:00.0 and/or 7:00.0 has any effect while leaving the L1 on 01:00.0
> and the NIC active. The basic idea is to go through and make certain
> we aren't seeing an L1 issue with one of the other downstream links on
> the switch.

I did, and I saw no change; only disabling L1 on 01:00.0 gives any
effect. But I'd say you're right in your thinking - with L0s,
head-of-queue stalling can happen due to retry buffers and so on; it
was interesting to see it detailed...

> The more I think about it the entire setup for this does seem a bit
> suspicious. I was looking over the lspci tree and the dump from the
> system. From what I can tell the upstream switch link at 01.2 <->
> 1:00.0 is only a Gen4 x4 link. However coming off of that is 5
> devices, two NICs using either Gen1 or 2 at x1, and then a USB
> controller and 2 SATA controller reporting Gen 4 x16. Specifically
> those last 3 devices have me a bit curious as they are all reporting
> L0s and L1 exit latencies that are the absolute minimum which has me
> wondering if they are even reporting actual values.

Heh, I have been trying to google for errata wrt:
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
Upstream aka 1022:57ad

and the CPU, to see if there is something else I could have missed,
but I haven't found anything relating to this yet...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-29 12:51                     ` Ian Kumlien
@ 2020-09-29 16:23                       ` Alexander Duyck
  2020-09-29 21:13                         ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Alexander Duyck @ 2020-09-29 16:23 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Tue, Sep 29, 2020 at 5:51 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 1:31 AM Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
> >
> > On Mon, Sep 28, 2020 at 1:33 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> > > > <alexander.duyck@gmail.com> wrote:
> >
> > <snip>
> >
> > > > > You should be able to manually disable L1 on the realtek link
> > > > > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > > > > switch. That may provide a datapoint on the L1 behavior of the setup.
> > > > > Basically if you took the realtek out of the equation in terms of the
> > > > > L1 exit time you should see the exit time drop to no more than 33us
> > > > > like what would be expected with just the i210.
> > > >
> > > > Yeah, will try it out with echo 0 >
> > > > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> > > > (which is the device reported by my patch)
> > >
> > > So, 04:00.0 is already disabled, the existing code apparently handled
> > > that correctly... *but*
> > >
> > > given the path:
> > > 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
> > > Semiconductor Co., Ltd. Device 816e (rev 1a)
> > >
> > > Walking backwards:
> > > -- 04:00.0 has l1 disabled
> > > -- 02:04.0 doesn't have aspm?!
> > >
> > > lspci reports:
> > > Capabilities: [370 v1] L1 PM Substates
> > > L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > > L1SubCtl2:
> > > Capabilities: [400 v1] Data Link Feature <?>
> > > Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> > > Capabilities: [440 v1] Lane Margining at the Receiver <?>
> > >
> > > However the link directory is empty.
> > >
> > > Anything we should know about these unknown capabilities? also aspm
> > > L1.1 and .1.2, heh =)
> > >
> > > -- 01:00.0 has L1, disabling it makes the intel nic work again
> >
> > I recall that much. However the question is why? If there is already a
> > 32us time to bring up the link between the NIC and the switch why
> > would the additional 1us to also bring up the upstream port have that
> > much of an effect? That is why I am thinking that it may be worthwhile
> > to try to isolate things further so that only the upstream port and
> > the NIC have L1 enabled. If we are still seeing issues in that state
> > then I can only assume there is something off with the
> > 00:01.2<->1:00.0 link to where it either isn't advertising the actual
> > L1 recovery time. For example the "Width x4 (downgraded)" looks very
> > suspicious and could be responsible for something like that if the
> > link training is having to go through exception cases to work out the
> > x4 link instead of a x8.
>
> It is a x4 link, all links that aren't "fully populated" or "fully
> utilized" are listed as downgraded...
>
> So, x16 card in x8 slot or pcie 3 card in pcie 2 slot - all lists as downgraded

Right, but when both sides say they are capable of x8 and are
reporting x4, as is the case on the 00:01.2 <-> 01:00.0 link, that
raises some eyebrows. It makes me wonder if the lanes were only run
for x4 and the BIOS/firmware wasn't configured correctly, or if only
4 of the lanes are working, resulting in x4 due to an electrical
issue:

00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD]
Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <32us
LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)

01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <32us
LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)

I bring it up because in the past I have seen some NICs that start
out x4 and, after a week with ASPM on and moderate activity, end up
dropping to x1 and eventually fall off the bus due to electrical
issues on the motherboard. I recall you mentioning that this has
always connected at no higher than x4, but I still don't know if that
is by design or simply because it cannot go higher due to some other
issue.

> > > ASPM L1 enabled:
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
> > > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
> > > [  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
> > > [  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > > [  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
> > > [  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
> > > [  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > > [  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
> > > [  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
> > > [  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
> > > [  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver
> > >
> > > ASPM L1 disabled:
> > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > [  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
> > > [  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
> > > [  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
> > > [  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
> > > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
> > > [  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
> > > [  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
> > > [  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
> > > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
> > > [  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval           Transfer     Bitrate         Retr
> > > [  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
> > > [  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver
> > >
> > > (all measurements are over live internet - so thus variance)
> >
> > I forgot there were 5 total devices that were hanging off of there as
> > well. You might try checking to see if disabling L1 on devices 5:00.0,
> > 6:00.0 and/or 7:00.0 has any effect while leaving the L1 on 01:00.0
> > and the NIC active. The basic idea is to go through and make certain
> > we aren't seeing an L1 issue with one of the other downstream links on
> > the switch.
>
> I did, and i saw no change, only disabling L1 on 01:00.0 gives any effect.
> But i'd say you're right in your thinking - with L0s head-of-queue
> stalling can happen
> due to retry buffers and so on, was interesting to see it detailed...

Okay, so the issue then is definitely the use of L1 on the 00:01.2 <->
01:00.0 link. The only piece we don't have the answer to is why, which
is something we might only be able to answer if we had a PCIe
analyzer.

> > The more I think about it the entire setup for this does seem a bit
> > suspicious. I was looking over the lspci tree and the dump from the
> > system. From what I can tell the upstream switch link at 01.2 <->
> > 1:00.0 is only a Gen4 x4 link. However coming off of that is 5
> > devices, two NICs using either Gen1 or 2 at x1, and then a USB
> > controller and 2 SATA controller reporting Gen 4 x16. Specifically
> > those last 3 devices have me a bit curious as they are all reporting
> > L0s and L1 exit latencies that are the absolute minimum which has me
> > wondering if they are even reporting actual values.
>
> Heh, I have been trying to google for erratas wrt to:
> 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
> Upstream aka 1022:57ad
>
> and the cpu, to see if there is something else I could have missed,
> but i haven't found anything relating to this yet...

The thing is, this could be something there isn't an erratum for. All
it takes is a bad component somewhere and you can have one lane that
is a bit flaky and causes the link establishment to take longer than
it is supposed to.

The fact that the patch resolves the issue ends up being more
coincidental than intentional, though. We should be able to have the
NIC work with just the upstream and NIC links on the switch running
with ASPM enabled; the fact that we can't makes me wonder about that
upstream port link. Did you only have this one system, or were there
other similar systems you could test with?

If we only have the one system, it might make sense to just update
the description for the patch and get away from focusing on this
issue, and instead focus on the fact that the PCIe spec indicates
that this is the way it is supposed to be calculated. If we had more
of these systems to test with and found this is a common thing, then
we could look at adding a PCI quirk for the device to just disable ASPM
whenever we saw it.
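
(For what it's worth, such a quirk could probably follow the existing
ASPM quirks in drivers/pci/quirks.c - the below is only an untested
sketch of the shape it would take, and keying it off 1022:57ad is an
assumption that would need to be confirmed on more systems first:)

/* Hypothetical sketch only, modeled on the existing ASPM quirks in
 * drivers/pci/quirks.c (it would live there, next to them) - not a
 * tested or submitted patch. */
static void quirk_disable_aspm_l1(struct pci_dev *dev)
{
    pci_info(dev, "Disabling ASPM L1\n");
    pci_disable_link_state(dev, PCIE_LINK_STATE_L1);
}

/* 1022:57ad is the Matisse switch upstream port discussed above;
 * whether it is the right device to key off is an assumption. */
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x57ad, quirk_disable_aspm_l1);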

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-29 16:23                       ` Alexander Duyck
@ 2020-09-29 21:13                         ` Ian Kumlien
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-29 21:13 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Bjorn Helgaas, linux-pci, Saheed O. Bolarinwa, Puranjay Mohan,
	Alexander Duyck

On Tue, Sep 29, 2020 at 6:23 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Tue, Sep 29, 2020 at 5:51 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> >
> > On Tue, Sep 29, 2020 at 1:31 AM Alexander Duyck
> > <alexander.duyck@gmail.com> wrote:
> > >
> > > On Mon, Sep 28, 2020 at 1:33 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 28, 2020 at 10:04 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
> > > > >
> > > > > On Mon, Sep 28, 2020 at 9:53 PM Alexander Duyck
> > > > > <alexander.duyck@gmail.com> wrote:
> > >
> > > <snip>
> > >
> > > > > > You should be able to manually disable L1 on the realtek link
> > > > > > (4:00.0<->2:04.0) instead of doing it on the upstream link on the
> > > > > > switch. That may provide a datapoint on the L1 behavior of the setup.
> > > > > > Basically if you took the realtek out of the equation in terms of the
> > > > > > L1 exit time you should see the exit time drop to no more than 33us
> > > > > > like what would be expected with just the i210.
> > > > >
> > > > > Yeah, will try it out with echo 0 >
> > > > > /sys/devices/pci0000:00/0000:00:01.2/0000:01:00.0/0000:02:04.0/0000:04:00.0/link/l1_aspm
> > > > > (which is the device reported by my patch)
> > > >
> > > > So, 04:00.0 is already disabled, the existing code apparently handled
> > > > that correctly... *but*
> > > >
> > > > given the path:
> > > > 00:01.2/01:00.0/02:04.0/04:00.0 Unassigned class [ff00]: Realtek
> > > > Semiconductor Co., Ltd. Device 816e (rev 1a)
> > > >
> > > > Walking backwards:
> > > > -- 04:00.0 has l1 disabled
> > > > -- 02:04.0 doesn't have aspm?!
> > > >
> > > > lspci reports:
> > > > Capabilities: [370 v1] L1 PM Substates
> > > > L1SubCap: PCI-PM_L1.2- PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1+ L1_PM_Substates+
> > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > > > L1SubCtl2:
> > > > Capabilities: [400 v1] Data Link Feature <?>
> > > > Capabilities: [410 v1] Physical Layer 16.0 GT/s <?>
> > > > Capabilities: [440 v1] Lane Margining at the Receiver <?>
> > > >
> > > > However the link directory is empty.
> > > >
> > > > Anything we should know about these unknown capabilities? also aspm
> > > > L1.1 and .1.2, heh =)
> > > >
> > > > -- 01:00.0 has L1, disabling it makes the intel nic work again
> > >
> > > I recall that much. However the question is why? If there is already a
> > > 32us time to bring up the link between the NIC and the switch why
> > > would the additional 1us to also bring up the upstream port have that
> > > much of an effect? That is why I am thinking that it may be worthwhile
> > > to try to isolate things further so that only the upstream port and
> > > the NIC have L1 enabled. If we are still seeing issues in that state
> > > then I can only assume there is something off with the
> > > 00:01.2<->1:00.0 link to where it either isn't advertising the actual
> > > L1 recovery time. For example the "Width x4 (downgraded)" looks very
> > > suspicious and could be responsible for something like that if the
> > > link training is having to go through exception cases to work out the
> > > x4 link instead of a x8.
> >
> > It is an x4 link; all links that aren't "fully populated" or "fully
> > utilized" are listed as downgraded...
> >
> > So, an x16 card in an x8 slot or a PCIe 3 card in a PCIe 2 slot - all list as downgraded
>
> Right, but when both sides say they are capable of x8 and are
> reporting a x4 as is the case in the 00:01.2 <-> 01:00.0 link, that
> raises some eyebrows as both sides say they are capable of x8 so it
> makes me wonder if the lanes were only run for x4 and BIOS/firmware
> wasn't configured correctly, or if only 4 of the lanes are working
> resulting in a x4 due to an electrical issue:

I think there are only 4 physical lanes, and as far as I recall it has nothing to do with the BIOS.

As I stated before, it looks the same for the Mellanox card.

> 00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD]
> Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
> LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <32us
> LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)
>
> 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
> LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM L1, Exit Latency L1 <32us
> LnkSta: Speed 16GT/s (ok), Width x4 (downgraded)
>
> I bring it up because in the past I have seen some NICs that start out
> x4 and after a week with ASPM on and moderate activity end up dropping
> to a x1 and eventually fall off the bus due to electrical issues on
> the motherboard. I recall you mentioning that this has always
> connected at no higher than x4, but I still don't know if that is by
> design or simply because it cannot due to some other issue.

I would have to check with ASUS but I suspect that is as intended

> > > > ASPM L1 enabled:
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec  5.40 MBytes  45.3 Mbits/sec    0   62.2 KBytes
> > > > [  5]   1.00-2.00   sec  4.47 MBytes  37.5 Mbits/sec    0   70.7 KBytes
> > > > [  5]   2.00-3.00   sec  4.10 MBytes  34.4 Mbits/sec    0   42.4 KBytes
> > > > [  5]   3.00-4.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > > > [  5]   4.00-5.00   sec  4.47 MBytes  37.5 Mbits/sec    0    105 KBytes
> > > > [  5]   5.00-6.00   sec  4.47 MBytes  37.5 Mbits/sec    0   84.8 KBytes
> > > > [  5]   6.00-7.00   sec  4.47 MBytes  37.5 Mbits/sec    0   65.0 KBytes
> > > > [  5]   7.00-8.00   sec  4.10 MBytes  34.4 Mbits/sec    0   45.2 KBytes
> > > > [  5]   8.00-9.00   sec  4.47 MBytes  37.5 Mbits/sec    0   56.6 KBytes
> > > > [  5]   9.00-10.00  sec  4.47 MBytes  37.5 Mbits/sec    0   48.1 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.00  sec  44.9 MBytes  37.7 Mbits/sec    0             sender
> > > > [  5]   0.00-10.01  sec  44.0 MBytes  36.9 Mbits/sec                  receiver
> > > >
> > > > ASPM L1 disabled:
> > > > [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> > > > [  5]   0.00-1.00   sec   111 MBytes   935 Mbits/sec  733    761 KBytes
> > > > [  5]   1.00-2.00   sec   110 MBytes   923 Mbits/sec  733    662 KBytes
> > > > [  5]   2.00-3.00   sec   109 MBytes   912 Mbits/sec  1036   1.20 MBytes
> > > > [  5]   3.00-4.00   sec   109 MBytes   912 Mbits/sec  647    738 KBytes
> > > > [  5]   4.00-5.00   sec   110 MBytes   923 Mbits/sec  852    744 KBytes
> > > > [  5]   5.00-6.00   sec   109 MBytes   912 Mbits/sec  546    908 KBytes
> > > > [  5]   6.00-7.00   sec   109 MBytes   912 Mbits/sec  303    727 KBytes
> > > > [  5]   7.00-8.00   sec   109 MBytes   912 Mbits/sec  432    769 KBytes
> > > > [  5]   8.00-9.00   sec   110 MBytes   923 Mbits/sec  462    652 KBytes
> > > > [  5]   9.00-10.00  sec   109 MBytes   912 Mbits/sec  576    764 KBytes
> > > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > > [ ID] Interval           Transfer     Bitrate         Retr
> > > > [  5]   0.00-10.00  sec  1.07 GBytes   918 Mbits/sec  6320             sender
> > > > [  5]   0.00-10.01  sec  1.06 GBytes   912 Mbits/sec                  receiver
> > > >
> > > > (all measurements are over the live internet - hence the variance)
> > >
> > > I forgot there were 5 total devices that were hanging off of there as
> > > well. You might try checking to see if disabling L1 on devices 5:00.0,
> > > 6:00.0 and/or 7:00.0 has any effect while leaving the L1 on 01:00.0
> > > and the NIC active. The basic idea is to go through and make certain
> > > we aren't seeing an L1 issue with one of the other downstream links on
> > > the switch.
> >
> > I did, and I saw no change; only disabling L1 on 01:00.0 has any effect.
> > But I'd say you're right in your thinking - with L0s, head-of-queue
> > stalling can happen
> > due to retry buffers and so on; it was interesting to see it detailed...
>
> Okay, so the issue then is definitely the use of L1 on the 00:01.2 <->
> 01:00.0 link. The only piece we don't have the answer to is why, which
> is something we might only be able to answer if we had a PCIe
> analyzer.

Yeah... Maybe these should always have L1 disabled; I have only found
L1.1 and L1.2 errata.

> > > The more I think about it the entire setup for this does seem a bit
> > > suspicious. I was looking over the lspci tree and the dump from the
> > > system. From what I can tell the upstream switch link at 01.2 <->
> > > 1:00.0 is only a Gen4 x4 link. However coming off of that is 5
> > > devices, two NICs using either Gen1 or 2 at x1, and then a USB
> > > controller and 2 SATA controller reporting Gen 4 x16. Specifically
> > > those last 3 devices have me a bit curious as they are all reporting
> > > L0s and L1 exit latencies that are the absolute minimum which has me
> > > wondering if they are even reporting actual values.
> >
> > Heh, I have been trying to google for errata regarding:
> > 01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Matisse Switch
> > Upstream aka 1022:57ad
> >
> > and the CPU, to see if there is something else I could have missed,
> > but I haven't found anything relating to this yet...
>
> The thing is this could be something that there isn't an errata for.
> All it takes is a bad component somewhere and you can have one lane
> that is a bit flaky and causes the link establishment to take longer
> than it is supposed to.

> The fact that the patch resolves the issue ends up being more
> coincidental than intentional though. We should be able to have the
> NIC work with just the upstream and NIC link on the switch running
> with ASPM enabled, the fact that we can't makes me wonder about that
> upstream port link. Did you only have this one system or were there
> other similar systems you could test with?

I only have this one system...

One thing to bring up: I did have some issues recreating the
low bandwidth state when
testing the L1 settings on the other PCIe IDs... i.e. it worked better for a while.

So, it could very well be a race condition with the hardware.

> If we only have the one system it might make sense to just update the
> description for the patch and get away from focusing on this issue,
> and instead focus on the fact that the PCIe spec indicates that this
> is the way it is supposed to be calculated. If we had more of these
> systems to test with and found this is a common thing then we could
> look at adding a PCI quirk for the device to just disable ASPM
> whenever we saw it.

Yeah, agreed, I'll try ASUS support... but I wonder if I'll get a
good answer.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-10-07 11:31         ` Ian Kumlien
@ 2020-10-07 13:03           ` Bjorn Helgaas
  0 siblings, 0 replies; 33+ messages in thread
From: Bjorn Helgaas @ 2020-10-07 13:03 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: linux-pci

On Wed, Oct 07, 2020 at 01:31:48PM +0200, Ian Kumlien wrote:
> On Mon, Oct 5, 2020 at 9:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Mon, Oct 05, 2020 at 08:38:55PM +0200, Ian Kumlien wrote:
> > > On Mon, Oct 5, 2020 at 8:31 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> > > > > Changes:
> > > > > * Handle L0s correctly as well, making it per direction
> > > > > * Moved the switch cost into the if statement since a non-L1 switch has
> > > > >   no additional cost.
> > > > >
> > > > > For L0s:
> > > > > We sum up the entire latency per direction to see if it's acceptable
> > > > > for the PCIe endpoint.
> > > > >
> > > > > If it's not, we clear the link for the path that had too large latency.
> > > > >
> > > > > For L1:
> > > > > Currently we check the maximum latency of upstream and downstream
> > > > > per link, not the maximum for the path
> > > > >
> > > > > This would work if all links have the same latency, but:
> > > > > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > > > >
> > > > > If c or b has the highest latency, it will not register
> > > > >
> > > > > Fix this by maintaining the maximum latency value for the path
> > > > >
> > > > > This change fixes a regression introduced (but not caused) by:
> > > > > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > > > >
> > > > > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > > >
> > > > I'm not sure where we're at with this.  If we can come up with:
> > > >
> > > >   - "lspci -vv" for the entire affected hierarchy before the fix
> > > >
> > > >   - specific identification of incorrect configuration per spec
> > > >
> > > >   - patch that fixes that specific misconfiguration
> > > >
> > > >   - "lspci -vv" for the entire affected hierarchy after the fix
> > > >
> > > > then we have something to work with.  It doesn't have to (and should
> > > > not) fix all the problems at once.
> > >
> > > So detail the changes on my specific machine and then mention
> > > 5.4.1.2.2 of the pci spec
> > > detailing the exit from PCIe ASPM L1?
> >
> > Like I said, I need to see the current ASPM configuration, a note
> > about what is wrong with it (this probably involves a comparison with
> > what the spec says it *should* be), and the configuration after the
> > patch showing that it's now fixed.
> >
> > > Basically writing a better changelog for the first patch?
> > >
> > > Any comments on the L0s patch?
> >
> > Not yet.  When it's packaged up in mergeable form I'll review it.  I
> > just don't have time to extract everything myself.
> 
> So, I did it like this, since I don't think the output from my system
> is actually important.
> (I added some descriptive text that is loosely based on the spec.)

I think the before/after lspci from your system *is* important.
Concrete examples are a big help.

The commit log below looks accurate, but of course needs to be
attached to a specific patch.  Can you please post the complete patch
with commit log all by itself so it's not buried in the middle of this
long thread?

> ----
> Use maximum latency when determining L1 ASPM
> 
> Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
> "5.4.1.2.2. Exit from the L1 State"
> 
> This section makes it clear that each switch is required to initiate a
> transition within 1 μs of receiving it; these latencies accumulate, and
> then we have to wait for the slowest link along the path before
> entering the L0 state from L1.
> 
> The current code doesn't take the maximum latency into account.
> 
> From the example:
>    +----------------+
>    |                |
>    |  Root complex  |
>    |                |
>    |    +-----+     |
>    |    |32 μs|     |
>    +----------------+
>            |
>            |  Link 1
>            |
>    +----------------+
>    |     |8 μs|     |
>    |     +----+     |
>    |    Switch A    |
>    |     +----+     |
>    |     |8 μs|     |
>    +----------------+
>            |
>            |  Link 2
>            |
>    +----------------+
>    |    |32 μs|     |
>    |    +-----+     |
>    |    Switch B    |
>    |    +-----+     |
>    |    |32 μs|     |
>    +----------------+
>            |
>            |  Link 3
>            |
>    +----------------+
>    |     |8μs|      |
>    |     +---+      |
>    |   Endpoint C   |
>    |                |
>    |                |
>    +----------------+
> 
> Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
> transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
> its ports, Link 3 will transition to L0 at T+32 (longest time
> considering T+8 for endpoint C and T+32 for switch B).
> 
> Switch B is required to initiate a transition from the L1 state on its
> upstream port no more than 1 μs after the beginning of the transition
> from the L1 state on its downstream port. Therefore, the transition from
> L1 to L0 will begin on link 2 at T+1, and this will cascade up the path.
> 
> The path will exit L1 at T+34.
> 
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ----
> 
> > > > > ---
> > > > >  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
> > > > >  1 file changed, 26 insertions(+), 15 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > > index b17e5ffd31b1..bc512e217258 100644
> > > > > --- a/drivers/pci/pcie/aspm.c
> > > > > +++ b/drivers/pci/pcie/aspm.c
> > > > > @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> > > > >
> > > > >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > > >  {
> > > > > -     u32 latency, l1_switch_latency = 0;
> > > > > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> > > > > +             l0s_latency_up = 0, l0s_latency_dw = 0;
> > > > >       struct aspm_latency *acceptable;
> > > > >       struct pcie_link_state *link;
> > > > >
> > > > > @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > > >       acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
> > > > >
> > > > >       while (link) {
> > > > > -             /* Check upstream direction L0s latency */
> > > > > -             if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > > > > -                 (link->latency_up.l0s > acceptable->l0s))
> > > > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > > > -
> > > > > -             /* Check downstream direction L0s latency */
> > > > > -             if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > > > > -                 (link->latency_dw.l0s > acceptable->l0s))
> > > > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > > > +             if (link->aspm_capable & ASPM_STATE_L0S) {
> > > > > +                     /* Check upstream direction L0s latency */
> > > > > +                     if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > > > > +                             l0s_latency_up += link->latency_up.l0s;
> > > > > +                             if (l0s_latency_up > acceptable->l0s)
> > > > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > > > +                     }
> > > > > +
> > > > > +                     /* Check downstream direction L0s latency */
> > > > > +                     if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> > > > > +                             l0s_latency_dw += link->latency_dw.l0s;
> > > > > +                             if (l0s_latency_dw > acceptable->l0s)
> > > > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > > > +                     }
> > > > > +             }
> > > > > +
> > > > >               /*
> > > > >                * Check L1 latency.
> > > > >                * Every switch on the path to root complex need 1
> > > > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > > >                * L1 exit latencies advertised by a device include L1
> > > > >                * substate latencies (and hence do not do any check).
> > > > >                */
> > > > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > > > -             l1_switch_latency += 1000;
> > > > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > > > +
> > > > > +                     l1_switch_latency += 1000;
> > > > > +             }
> > > > >
> > > > >               link = link->parent;
> > > > >       }
> > > > > --
> > > > > 2.28.0
> > > > >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-10-05 19:09       ` Bjorn Helgaas
@ 2020-10-07 11:31         ` Ian Kumlien
  2020-10-07 13:03           ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-10-07 11:31 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

On Mon, Oct 5, 2020 at 9:09 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Oct 05, 2020 at 08:38:55PM +0200, Ian Kumlien wrote:
> > On Mon, Oct 5, 2020 at 8:31 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > >
> > > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> > > > Changes:
> > > > * Handle L0s correctly as well, making it per direction
> > > > * Moved the switch cost into the if statement since a non-L1 switch has
> > > >   no additional cost.
> > > >
> > > > For L0s:
> > > > We sum up the entire latency per direction to see if it's acceptable
> > > > for the PCIe endpoint.
> > > >
> > > > If it's not, we clear the link for the path that had too large latency.
> > > >
> > > > For L1:
> > > > Currently we check the maximum latency of upstream and downstream
> > > > per link, not the maximum for the path
> > > >
> > > > This would work if all links have the same latency, but:
> > > > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > > >
> > > > If c or b has the highest latency, it will not register
> > > >
> > > > Fix this by maintaining the maximum latency value for the path
> > > >
> > > > This change fixes a regression introduced (but not caused) by:
> > > > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > > >
> > > > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > >
> > > I'm not sure where we're at with this.  If we can come up with:
> > >
> > >   - "lspci -vv" for the entire affected hierarchy before the fix
> > >
> > >   - specific identification of incorrect configuration per spec
> > >
> > >   - patch that fixes that specific misconfiguration
> > >
> > >   - "lspci -vv" for the entire affected hierarchy after the fix
> > >
> > > then we have something to work with.  It doesn't have to (and should
> > > not) fix all the problems at once.
> >
> > So detail the changes on my specific machine and then mention
> > 5.4.1.2.2 of the pci spec
> > detailing the exit from PCIe ASPM L1?
>
> Like I said, I need to see the current ASPM configuration, a note
> about what is wrong with it (this probably involves a comparison with
> what the spec says it *should* be), and the configuration after the
> patch showing that it's now fixed.
>
> > Basically writing a better changelog for the first patch?
> >
> > Any comments on the L0s patch?
>
> Not yet.  When it's packaged up in mergeable form I'll review it.  I
> just don't have time to extract everything myself.

So, I did it like this, since I don't think the output from my system
is actually important.
(I added some descriptive text that is loosely based on the spec.)
----
Use maximum latency when determining L1 ASPM

Make pcie_aspm_check_latency comply with the PCIe spec, specifically:
"5.4.1.2.2. Exit from the L1 State"

This section makes it clear that each switch is required to initiate a
transition within 1 μs of receiving it; these latencies accumulate, and
then we have to wait for the slowest link along the path before
entering the L0 state from L1.

The current code doesn't take the maximum latency into account.

From the example:
   +----------------+
   |                |
   |  Root complex  |
   |                |
   |    +-----+     |
   |    |32 μs|     |
   +----------------+
           |
           |  Link 1
           |
   +----------------+
   |     |8 μs|     |
   |     +----+     |
   |    Switch A    |
   |     +----+     |
   |     |8 μs|     |
   +----------------+
           |
           |  Link 2
           |
   +----------------+
   |    |32 μs|     |
   |    +-----+     |
   |    Switch B    |
   |    +-----+     |
   |    |32 μs|     |
   +----------------+
           |
           |  Link 3
           |
   +----------------+
   |     |8μs|      |
   |     +---+      |
   |   Endpoint C   |
   |                |
   |                |
   +----------------+

Links 1, 2 and 3 are all in L1 state - endpoint C initiates the
transition to L0 at time T. Since switch B takes 32 μs to exit L1 on
its ports, Link 3 will transition to L0 at T+32 (longest time
considering T+8 for endpoint C and T+32 for switch B).

Switch B is required to initiate a transition from the L1 state on its
upstream port no more than 1 μs after the beginning of the transition
from the L1 state on its downstream port. Therefore, the transition from
L1 to L0 will begin on link 2 at T+1, and this will cascade up the path.

The path will exit L1 at T+34.

Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
----
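
As a cross-check of the arithmetic, a small userspace sketch (not kernel
code) that walks the example path from the endpoint towards the root,
mirroring the l1_max_latency/l1_switch_latency bookkeeping in the patch.
The per-link values are the illustrative ones from the diagram above, not
read from real hardware; it prints 34, matching T+34.

  /* Sketch only: per-link numbers are the example values above, in us. */
  #include <stdio.h>

  static unsigned int max_u(unsigned int a, unsigned int b)
  {
          return a > b ? a : b;
  }

  int main(void)
  {
          /*
           * max(upstream, downstream) L1 exit latency per link,
           * walked from the endpoint towards the root complex:
           *   link 3: switch B (32) <-> endpoint C (8)  -> 32
           *   link 2: switch A (8)  <-> switch B (32)   -> 32
           *   link 1: root (32)     <-> switch A (8)    -> 32
           */
          unsigned int link_l1_exit[] = { 32, 32, 32 };
          unsigned int i, l1_max_latency = 0, l1_switch_latency = 0, path_exit = 0;

          for (i = 0; i < 3; i++) {
                  l1_max_latency = max_u(l1_max_latency, link_l1_exit[i]);
                  /* links further from the endpoint start their exit later,
                   * since each switch forwards the exit upstream within 1 us */
                  path_exit = max_u(path_exit, l1_max_latency + l1_switch_latency);
                  l1_switch_latency += 1;
          }

          printf("worst-case path L1 exit: %u us\n", path_exit);  /* 34 */
          return 0;
  }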

> > > > ---
> > > >  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
> > > >  1 file changed, 26 insertions(+), 15 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > > index b17e5ffd31b1..bc512e217258 100644
> > > > --- a/drivers/pci/pcie/aspm.c
> > > > +++ b/drivers/pci/pcie/aspm.c
> > > > @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> > > >
> > > >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > >  {
> > > > -     u32 latency, l1_switch_latency = 0;
> > > > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> > > > +             l0s_latency_up = 0, l0s_latency_dw = 0;
> > > >       struct aspm_latency *acceptable;
> > > >       struct pcie_link_state *link;
> > > >
> > > > @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > >       acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
> > > >
> > > >       while (link) {
> > > > -             /* Check upstream direction L0s latency */
> > > > -             if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > > > -                 (link->latency_up.l0s > acceptable->l0s))
> > > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > > -
> > > > -             /* Check downstream direction L0s latency */
> > > > -             if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > > > -                 (link->latency_dw.l0s > acceptable->l0s))
> > > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > > +             if (link->aspm_capable & ASPM_STATE_L0S) {
> > > > +                     /* Check upstream direction L0s latency */
> > > > +                     if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > > > +                             l0s_latency_up += link->latency_up.l0s;
> > > > +                             if (l0s_latency_up > acceptable->l0s)
> > > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > > +                     }
> > > > +
> > > > +                     /* Check downstream direction L0s latency */
> > > > +                     if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> > > > +                             l0s_latency_dw += link->latency_dw.l0s;
> > > > +                             if (l0s_latency_dw > acceptable->l0s)
> > > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > > +                     }
> > > > +             }
> > > > +
> > > >               /*
> > > >                * Check L1 latency.
> > > >                * Every switch on the path to root complex need 1
> > > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > >                * L1 exit latencies advertised by a device include L1
> > > >                * substate latencies (and hence do not do any check).
> > > >                */
> > > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > > -             l1_switch_latency += 1000;
> > > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > > +
> > > > +                     l1_switch_latency += 1000;
> > > > +             }
> > > >
> > > >               link = link->parent;
> > > >       }
> > > > --
> > > > 2.28.0
> > > >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-10-05 18:38     ` Ian Kumlien
@ 2020-10-05 19:09       ` Bjorn Helgaas
  2020-10-07 11:31         ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-10-05 19:09 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: linux-pci

On Mon, Oct 05, 2020 at 08:38:55PM +0200, Ian Kumlien wrote:
> On Mon, Oct 5, 2020 at 8:31 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> > > Changes:
> > > * Handle L0s correctly as well, making it per direction
> > > * Moved the switch cost into the if statement since a non-L1 switch has
> > >   no additional cost.
> > >
> > > For L0s:
> > > We sum up the entire latency per direction to see if it's acceptable
> > > for the PCIe endpoint.
> > >
> > > If it's not, we clear the link for the path that had too large latency.
> > >
> > > For L1:
> > > Currently we check the maximum latency of upstream and downstream
> > > per link, not the maximum for the path
> > >
> > > This would work if all links have the same latency, but:
> > > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > >
> > > If c or b has the highest latency, it will not register
> > >
> > > Fix this by maintaining the maximum latency value for the path
> > >
> > > This change fixes a regression introduced (but not caused) by:
> > > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > >
> > > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> >
> > I'm not sure where we're at with this.  If we can come up with:
> >
> >   - "lspci -vv" for the entire affected hierarchy before the fix
> >
> >   - specific identification of incorrect configuration per spec
> >
> >   - patch that fixes that specific misconfiguration
> >
> >   - "lspci -vv" for the entire affected hierarchy after the fix
> >
> > then we have something to work with.  It doesn't have to (and should
> > not) fix all the problems at once.
> 
> So detail the changes on my specific machine and then mention
> 5.4.1.2.2 of the pci spec
> detailing the exit from PCIe ASPM L1?

Like I said, I need to see the current ASPM configuration, a note
about what is wrong with it (this probably involves a comparison with
what the spec says it *should* be), and the configuration after the
patch showing that it's now fixed.

> Basically writing a better changelog for the first patch?
> 
> Any comments on the L0s patch?

Not yet.  When it's packaged up in mergeable form I'll review it.  I
just don't have time to extract everything myself.

> > > ---
> > >  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
> > >  1 file changed, 26 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index b17e5ffd31b1..bc512e217258 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> > >
> > >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >  {
> > > -     u32 latency, l1_switch_latency = 0;
> > > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> > > +             l0s_latency_up = 0, l0s_latency_dw = 0;
> > >       struct aspm_latency *acceptable;
> > >       struct pcie_link_state *link;
> > >
> > > @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >       acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
> > >
> > >       while (link) {
> > > -             /* Check upstream direction L0s latency */
> > > -             if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > > -                 (link->latency_up.l0s > acceptable->l0s))
> > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > -
> > > -             /* Check downstream direction L0s latency */
> > > -             if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > > -                 (link->latency_dw.l0s > acceptable->l0s))
> > > -                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > +             if (link->aspm_capable & ASPM_STATE_L0S) {
> > > +                     /* Check upstream direction L0s latency */
> > > +                     if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > > +                             l0s_latency_up += link->latency_up.l0s;
> > > +                             if (l0s_latency_up > acceptable->l0s)
> > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > +                     }
> > > +
> > > +                     /* Check downstream direction L0s latency */
> > > +                     if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> > > +                             l0s_latency_dw += link->latency_dw.l0s;
> > > +                             if (l0s_latency_dw > acceptable->l0s)
> > > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > +                     }
> > > +             }
> > > +
> > >               /*
> > >                * Check L1 latency.
> > >                * Every switch on the path to root complex need 1
> > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >                * L1 exit latencies advertised by a device include L1
> > >                * substate latencies (and hence do not do any check).
> > >                */
> > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > -             l1_switch_latency += 1000;
> > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > +
> > > +                     l1_switch_latency += 1000;
> > > +             }
> > >
> > >               link = link->parent;
> > >       }
> > > --
> > > 2.28.0
> > >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-10-05 18:31   ` Bjorn Helgaas
@ 2020-10-05 18:38     ` Ian Kumlien
  2020-10-05 19:09       ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-10-05 18:38 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci

On Mon, Oct 5, 2020 at 8:31 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> > Changes:
> > * Handle L0s correctly as well, making it per direction
> > * Moved the switch cost into the if statement since a non-L1 switch has
> >   no additional cost.
> >
> > For L0s:
> > We sum up the entire latency per direction to see if it's acceptable
> > for the PCIe endpoint.
> >
> > If it's not, we clear the link for the path that had too large latency.
> >
> > For L1:
> > Currently we check the maximum latency of upstream and downstream
> > per link, not the maximum for the path
> >
> > This would work if all links have the same latency, but:
> > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> >
> > If c or b has the highest latency, it will not register
> >
> > Fix this by maintaining the maximum latency value for the path
> >
> > This change fixes a regression introduced (but not caused) by:
> > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> >
> > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
>
> I'm not sure where we're at with this.  If we can come up with:
>
>   - "lspci -vv" for the entire affected hierarchy before the fix
>
>   - specific identification of incorrect configuration per spec
>
>   - patch that fixes that specific misconfiguration
>
>   - "lspci -vv" for the entire affected hierarchy after the fix
>
> then we have something to work with.  It doesn't have to (and should
> not) fix all the problems at once.

So detail the changes on my specific machine and then mention
section 5.4.1.2.2 of the PCIe spec,
detailing the exit from PCIe ASPM L1?

Basically writing a better changelog for the first patch?

Any comments on the L0s patch?

> > ---
> >  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
> >  1 file changed, 26 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b17e5ffd31b1..bc512e217258 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >  {
> > -     u32 latency, l1_switch_latency = 0;
> > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> > +             l0s_latency_up = 0, l0s_latency_dw = 0;
> >       struct aspm_latency *acceptable;
> >       struct pcie_link_state *link;
> >
> > @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >       acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
> >
> >       while (link) {
> > -             /* Check upstream direction L0s latency */
> > -             if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > -                 (link->latency_up.l0s > acceptable->l0s))
> > -                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > -
> > -             /* Check downstream direction L0s latency */
> > -             if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > -                 (link->latency_dw.l0s > acceptable->l0s))
> > -                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +             if (link->aspm_capable & ASPM_STATE_L0S) {
> > +                     /* Check upstream direction L0s latency */
> > +                     if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > +                             l0s_latency_up += link->latency_up.l0s;
> > +                             if (l0s_latency_up > acceptable->l0s)
> > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > +                     }
> > +
> > +                     /* Check downstream direction L0s latency */
> > +                     if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> > +                             l0s_latency_dw += link->latency_dw.l0s;
> > +                             if (l0s_latency_dw > acceptable->l0s)
> > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +                     }
> > +             }
> > +
> >               /*
> >                * Check L1 latency.
> >                * Every switch on the path to root complex need 1
> > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >                * L1 exit latencies advertised by a device include L1
> >                * substate latencies (and hence do not do any check).
> >                */
> > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > -                 (latency + l1_switch_latency > acceptable->l1))
> > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > -             l1_switch_latency += 1000;
> > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > +
> > +                     l1_switch_latency += 1000;
> > +             }
> >
> >               link = link->parent;
> >       }
> > --
> > 2.28.0
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
                     ` (2 preceding siblings ...)
  2020-09-22 20:19   ` Bjorn Helgaas
@ 2020-10-05 18:31   ` Bjorn Helgaas
  2020-10-05 18:38     ` Ian Kumlien
  3 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-10-05 18:31 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: linux-pci

On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> Changes:
> * Handle L0s correctly as well, making it per direction
> * Moved the switch cost into the if statement since a non-L1 switch has
>   no additional cost.
> 
> For L0s:
> We sum up the entire latency per direction to see if it's acceptable
> for the PCIe endpoint.
> 
> If it's not, we clear the link for the path that had too large latency.
> 
> For L1:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
> 
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root  (in the order we walk the path)
> 
> If c or b has the highest latency, it will not register
> 
> Fix this by maintaining the maximum latency value for the path
> 
> This change fixes a regression introduced (but not caused) by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> 
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>

I'm not sure where we're at with this.  If we can come up with:

  - "lspci -vv" for the entire affected hierarchy before the fix

  - specific identification of incorrect configuration per spec

  - patch that fixes that specific misconfiguration

  - "lspci -vv" for the entire affected hierarchy after the fix

then we have something to work with.  It doesn't have to (and should
not) fix all the problems at once.

> ---
>  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bc512e217258 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>  
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -	u32 latency, l1_switch_latency = 0;
> +	u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> +		l0s_latency_up = 0, l0s_latency_dw = 0;
>  	struct aspm_latency *acceptable;
>  	struct pcie_link_state *link;
>  
> @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  	acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
>  
>  	while (link) {
> -		/* Check upstream direction L0s latency */
> -		if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> -		    (link->latency_up.l0s > acceptable->l0s))
> -			link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> -
> -		/* Check downstream direction L0s latency */
> -		if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> -		    (link->latency_dw.l0s > acceptable->l0s))
> -			link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +		if (link->aspm_capable & ASPM_STATE_L0S) {
> +			/* Check upstream direction L0s latency */
> +			if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> +				l0s_latency_up += link->latency_up.l0s;
> +				if (l0s_latency_up > acceptable->l0s)
> +					link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> +			}
> +
> +			/* Check downstream direction L0s latency */
> +			if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> +				l0s_latency_dw += link->latency_dw.l0s;
> +				if (l0s_latency_dw > acceptable->l0s)
> +					link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +			}
> +		}
> +
>  		/*
>  		 * Check L1 latency.
>  		 * Every switch on the path to root complex need 1
> @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  		 * L1 exit latencies advertised by a device include L1
>  		 * substate latencies (and hence do not do any check).
>  		 */
> -		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -		if ((link->aspm_capable & ASPM_STATE_L1) &&
> -		    (latency + l1_switch_latency > acceptable->l1))
> -			link->aspm_capable &= ~ASPM_STATE_L1;
> -		l1_switch_latency += 1000;
> +		if (link->aspm_capable & ASPM_STATE_L1) {
> +			latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +			l1_max_latency = max_t(u32, latency, l1_max_latency);
> +			if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +				link->aspm_capable &= ~ASPM_STATE_L1;
> +
> +			l1_switch_latency += 1000;
> +		}
>  
>  		link = link->parent;
>  	}
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-24 16:24               ` Bjorn Helgaas
@ 2020-09-25  8:06                 ` Ian Kumlien
  0 siblings, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-25  8:06 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

Ok so it might not be related to that change then...

but:
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.00   sec  4.34 MBytes  36.4 Mbits/sec
[  5]   1.00-2.00   sec  7.11 MBytes  59.6 Mbits/sec
[  5]   2.00-3.00   sec  4.76 MBytes  39.9 Mbits/sec
[  5]   3.00-4.00   sec  4.13 MBytes  34.7 Mbits/sec
[  5]   4.00-5.00   sec  4.19 MBytes  35.1 Mbits/sec
[  5]   5.00-6.00   sec  5.70 MBytes  47.8 Mbits/sec
[  5]   6.00-7.00   sec  5.96 MBytes  50.0 Mbits/sec
[  5]   7.00-8.00   sec  4.17 MBytes  35.0 Mbits/sec
[  5]   8.00-9.00   sec  4.14 MBytes  34.7 Mbits/sec
[  5]   9.00-10.00  sec  4.08 MBytes  34.3 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-10.00  sec  48.6 MBytes  40.8 Mbits/sec                  receiver

The issue persists

On Thu, Sep 24, 2020 at 6:24 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Sep 23, 2020 at 11:36:00PM +0200, Ian Kumlien wrote:
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66ff14e59e8a
> >
> > "7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
> > Linux to enable ASPM, but for some undocumented reason, it didn't enable
> > ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> >
> > Remove this exclusion so we can enable ASPM on these links."
> > ...
>
> > And all of this worked before the commit above.
>
> OK, really sorry, I got myself totally confused here, and I need to
> start over from scratch.  Correct me when I go off the rails.
>
> You're running 5.8.11+, and you get ~40 Mbit/s on the Intel I211 NIC.
> Reverting 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to
> PCIe-to-PCI/PCI-X Bridges") gets your bandwidth up to the 900+ Mbit/s
> you expect.
>
> 66ff14e59e8a only makes a difference if you have a PCIe-to-PCI/PCI-X
> Bridge (PCI_EXP_TYPE_PCI_BRIDGE) in your system.  But from your lspci
> and pci=earlydump output, I don't see any of those.  The only bridges
> I see are:
>
> [    0.810346] pci 0000:00:01.2: [1022:1483] type 01 Root Port to [bus 01-07]
> [    0.810587] pci 0000:00:03.1: [1022:1483] type 01 Root Port to [bus 08]
> [    0.810587] pci 0000:00:03.2: [1022:1483] type 01 Root Port to [bus 09]
> [    0.810837] pci 0000:00:07.1: [1022:1484] type 01 Root Port to [bus 0a]
> [    0.811587] pci 0000:00:08.1: [1022:1484] type 01 Root Port to [bus 0b]
> [    0.812586] pci 0000:01:00.0: [1022:57ad] type 01 Upstream Port to [bus 02-07]
> [    0.812629] pci 0000:02:03.0: [1022:57a3] type 01 Downstream Port to [bus 03]
> [    0.813584] pci 0000:02:04.0: [1022:57a3] type 01 Downstream Port to [bus 04]
> [    0.814584] pci 0000:02:08.0: [1022:57a4] type 01 Downstream Port to [bus 05]
> [    0.815584] pci 0000:02:09.0: [1022:57a4] type 01 Downstream Port to [bus 06]
> [    0.815584] pci 0000:02:0a.0: [1022:57a4] type 01 Downstream Port to [bus 07]
>
> So I'm lost right off the bat.  You have no PCI_EXP_TYPE_PCI_BRIDGE
> device, so how can 66ff14e59e8a make a difference for you?
>
> Can you add a printk there, e.g.,
>
>         list_for_each_entry(child, &linkbus->devices, bus_list) {
>                 if (pci_pcie_type(child) == PCI_EXP_TYPE_PCI_BRIDGE) {
>   +                     pci_info(child, "PCIe-to-PCI bridge, disabling ASPM\n");
>                         link->aspm_disable = ASPM_STATE_ALL;
>                         break;
>                 }
>         }
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-23 21:36             ` Ian Kumlien
  2020-09-23 21:48               ` Ian Kumlien
@ 2020-09-24 16:24               ` Bjorn Helgaas
  2020-09-25  8:06                 ` Ian Kumlien
  1 sibling, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 16:24 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Wed, Sep 23, 2020 at 11:36:00PM +0200, Ian Kumlien wrote:

> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66ff14e59e8a
> 
> "7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
> Linux to enable ASPM, but for some undocumented reason, it didn't enable
> ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
> 
> Remove this exclusion so we can enable ASPM on these links."
> ...

> And all of this worked before the commit above.

OK, really sorry, I got myself totally confused here, and I need to
start over from scratch.  Correct me when I go off the rails.

You're running 5.8.11+, and you get ~40 Mbit/s on the Intel I211 NIC.
Reverting 66ff14e59e8a ("PCI/ASPM: Allow ASPM on links to
PCIe-to-PCI/PCI-X Bridges") gets your bandwidth up to the 900+ Mbit/s
you expect.

66ff14e59e8a only makes a difference if you have a PCIe-to-PCI/PCI-X
Bridge (PCI_EXP_TYPE_PCI_BRIDGE) in your system.  But from your lspci
and pci=earlydump output, I don't see any of those.  The only bridges
I see are:

[    0.810346] pci 0000:00:01.2: [1022:1483] type 01 Root Port to [bus 01-07]
[    0.810587] pci 0000:00:03.1: [1022:1483] type 01 Root Port to [bus 08]
[    0.810587] pci 0000:00:03.2: [1022:1483] type 01 Root Port to [bus 09]
[    0.810837] pci 0000:00:07.1: [1022:1484] type 01 Root Port to [bus 0a]
[    0.811587] pci 0000:00:08.1: [1022:1484] type 01 Root Port to [bus 0b]
[    0.812586] pci 0000:01:00.0: [1022:57ad] type 01 Upstream Port to [bus 02-07]
[    0.812629] pci 0000:02:03.0: [1022:57a3] type 01 Downstream Port to [bus 03]
[    0.813584] pci 0000:02:04.0: [1022:57a3] type 01 Downstream Port to [bus 04]
[    0.814584] pci 0000:02:08.0: [1022:57a4] type 01 Downstream Port to [bus 05]
[    0.815584] pci 0000:02:09.0: [1022:57a4] type 01 Downstream Port to [bus 06]
[    0.815584] pci 0000:02:0a.0: [1022:57a4] type 01 Downstream Port to [bus 07]

So I'm lost right off the bat.  You have no PCI_EXP_TYPE_PCI_BRIDGE
device, so how can 66ff14e59e8a make a difference for you?

Can you add a printk there, e.g.,

        list_for_each_entry(child, &linkbus->devices, bus_list) {
                if (pci_pcie_type(child) == PCI_EXP_TYPE_PCI_BRIDGE) {
  +                     pci_info(child, "PCIe-to-PCI bridge, disabling ASPM\n");
                        link->aspm_disable = ASPM_STATE_ALL;
                        break;
                }
        }


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-23 21:36             ` Ian Kumlien
@ 2020-09-23 21:48               ` Ian Kumlien
  2020-09-24 16:24               ` Bjorn Helgaas
  1 sibling, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-23 21:48 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

Added dmesg with pci=earlydump on the bugzilla

https://bugzilla.kernel.org/attachment.cgi?id=292601

On Wed, Sep 23, 2020 at 11:36 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Sep 23, 2020 at 11:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > On Wed, Sep 23, 2020 at 01:29:18AM +0200, Ian Kumlien wrote:
> > > On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
> >
> > > > > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > > > > Author: Ian Kumlien <ian.kumlien@gmail.com>
> > > > > Date:   Sun Jul 26 16:01:15 2020 +0200
> > > > >
> > > > >     Use maximum latency when determining L1 ASPM
> > > > >
> > > > >     If it's not, we clear the link for the path that had too large latency.
> > > > >
> > > > >     Currently we check the maximum latency of upstream and downstream
> > > > >     per link, not the maximum for the path
> > > > >
> > > > >     This would work if all links have the same latency, but:
> > > > >     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > > > >
> > > > >     If c or b has the highest latency, it will not register
> > > > >
> > > > >     Fix this by maintaining the maximum latency value for the path
> > > > >
> > > > >     See this bugzilla for more information:
> > > > >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> > > > >
> > > > >     This fixes an issue for me where my desktops machines maximum bandwidth
> > > > >     for remote connections dropped from 933 MBit to ~40 MBit.
> > > > >
> > > > >     The bug became obvious once we enabled ASPM on all links:
> > > > >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > > >
> > > > I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> > > > bridge in your lspci, so I can't figure out why this commit would make
> > > > a difference for you.
> > > >
> > > > IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> > > > path to it:
> > > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > >
> > > > And I think this is the relevant info:
> > > >
> > > >                                                     LnkCtl    LnkCtl
> > > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > > >   00:01.2                                L1 <32us       L1+       L1-
> > > >   01:00.0                                L1 <32us       L1+       L1-
> > > >   02:03.0                                L1 <32us       L1+       L1+
> > > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > > >
> > > > The NIC says it can tolerate at most 512ns of L0s exit latency and at
> > > > most 64us of L1 exit latency.
> > > >
> > > > 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> > > > fast anyway (it can only do <2us), so L0s should be out of the picture
> > > > completely.
> > > >
> > > > Before your patch, apparently we (or BIOS) enabled L1 on the link from
> > > > 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> > > > to 03:00.0.
> > >
> > > According to the spec, this is managed by the OS - which was the
> > > change introduced...
> >
> > BIOS frequently enables ASPM and, if CONFIG_PCIEASPM_DEFAULT=y, I
> > don't think Linux touches it unless a user requests it via sysfs.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66ff14e59e8a
>
> "7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
> Linux to enable ASPM, but for some undocumented reason, it didn't enable
> ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.
>
> Remove this exclusion so we can enable ASPM on these links."
>
> > What does "grep ASPM .config" tell you?
>
> CONFIG_PCIEASPM=y
> CONFIG_PCIEASPM_POWERSAVE=y
>
> And all of this worked before the commit above.
>
> > Boot with "pci=earlydump" and the dmesg will tell us what the BIOS
> > did.
> >
> > If you do this in the unmodified kernel:
> >
> >   # echo 1 > /sys/.../0000:03:00.0/l1_aspm
> >
> > it should enable L1 for 03:00.0.  I'd like to know whether it actually
> > does, and whether the NIC behaves any differently when L1 is enabled
> > for the entire path instead of just the first three components.
>
> With an unmodified kernel, I have ~40 Mbit bandwidth.
>
> > If the above doesn't work, you should be able to enable ASPM manually:
> >
> >   # setpci -s03:00.0 CAP_EXP+0x10.w=0x0042
>
> I used something similar to disable ASPM, which made the NIC work again.
>
> > > > It looks like we *should* be able to enable L1 on both links since the
> > > > exit latency should be <33us (first link starts exit at T=0, completes
> > > > by T=32; second link starts exit at T=1, completes by T=33), and
> > > > 03:00.0 can tolerate up to 64us.
> > > >
> > > > I guess the effect of your patch is to disable L1 on the 00:01.2 -
> > > > 01:00.0 link?  And that makes the NIC work better?  I am obviously
> > > > missing something because I don't understand why the patch does that
> > > > or why it works better.
> > >
> > > It makes it work like normal again, like if i disable ASPM on the
> > > nic itself...
> >
> > I wonder if ASPM is just broken on this device.
> > __e1000e_disable_aspm() mentions hardware errata on a different Intel
> > NIC.
>
> I doubt it. The total time seems to surpass the time it can handle
> with its buffers.
>
> Note that the NIC is running with ASPM now, and it is working =)
>
> > I don't know which value that reflects, up or down - since we take the max
> > of both values, and
> > it actually disables ASPM.
> > >
> > > What we can see is that the first device that passes the threshold
> > > is 01:00.0
> >
> > I don't understand this.  03:00.0 claims to be able to tolerate 64us
> > of L1 exit latency.  The path should only have <33us of latency, so
> > it's not even close to the 64us threshold.
>
> Well, this is why I dumped the raw data - I don't know how lspci reads it
>
> > > How can I read more data from PCIe without needing to add kprint...
> > >
> > > This is what lspci uses apparently:
> > > #define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
> > > #define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */
> > >
> > > But which latencies are those? up or down?
> >
> > I think the idea in aspm.c that exit latencies depend on which
> > direction traffic is going is incorrect.  The components at the
> > upstream and downstream ends of the link may have different exit
> > latencies, of course.
>
> Yes, and the max latency is the maximum time the device can buffer before
> the link has to be up, so the maximum time to establish the link must be
> max(up, down), right?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-23 21:23           ` Bjorn Helgaas
@ 2020-09-23 21:36             ` Ian Kumlien
  2020-09-23 21:48               ` Ian Kumlien
  2020-09-24 16:24               ` Bjorn Helgaas
  0 siblings, 2 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-23 21:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Wed, Sep 23, 2020 at 11:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Wed, Sep 23, 2020 at 01:29:18AM +0200, Ian Kumlien wrote:
> > On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
>
> > > > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > > > Author: Ian Kumlien <ian.kumlien@gmail.com>
> > > > Date:   Sun Jul 26 16:01:15 2020 +0200
> > > >
> > > >     Use maximum latency when determining L1 ASPM
> > > >
> > > >     If it's not, we clear the link for the path that had too large latency.
> > > >
> > > >     Currently we check the maximum latency of upstream and downstream
> > > >     per link, not the maximum for the path
> > > >
> > > >     This would work if all links have the same latency, but:
> > > >     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > > >
> > > >     If c or b has the highest latency, it will not register
> > > >
> > > >     Fix this by maintaining the maximum latency value for the path
> > > >
> > > >     See this bugzilla for more information:
> > > >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> > > >
> > > >     This fixes an issue for me where my desktop machine's maximum bandwidth
> > > >     for remote connections dropped from 933 MBit to ~40 MBit.
> > > >
> > > >     The bug became obvious once we enabled ASPM on all links:
> > > >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> > >
> > > I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> > > bridge in your lspci, so I can't figure out why this commit would make
> > > a difference for you.
> > >
> > > IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> > > path to it:
> > >
> > >   00:01.2 Root Port              to [bus 01-07]
> > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > >   02:03.0 Switch Downstream Port to [bus 03]
> > >   03:00.0 Endpoint (Intel I211 NIC)
> > >
> > > And I think this is the relevant info:
> > >
> > >                                                     LnkCtl    LnkCtl
> > >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> > >   00:01.2                                L1 <32us       L1+       L1-
> > >   01:00.0                                L1 <32us       L1+       L1-
> > >   02:03.0                                L1 <32us       L1+       L1+
> > >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> > >
> > > The NIC says it can tolerate at most 512ns of L0s exit latency and at
> > > most 64us of L1 exit latency.
> > >
> > > 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> > > fast anyway (it can only do <2us), so L0s should be out of the picture
> > > completely.
> > >
> > > Before your patch, apparently we (or BIOS) enabled L1 on the link from
> > > 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> > > to 03:00.0.
> >
> > According to the spec, this is managed by the OS - which was the
> > change introduced...
>
> BIOS frequently enables ASPM and, if CONFIG_PCIEASPM_DEFAULT=y, I
> don't think Linux touches it unless a user requests it via sysfs.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66ff14e59e8a

"7d715a6c1ae5 ("PCI: add PCI Express ASPM support") added the ability for
Linux to enable ASPM, but for some undocumented reason, it didn't enable
ASPM on links where the downstream component is a PCIe-to-PCI/PCI-X Bridge.

Remove this exclusion so we can enable ASPM on these links."

> What does "grep ASPM .config" tell you?

CONFIG_PCIEASPM=y
CONFIG_PCIEASPM_POWERSAVE=y

And all of this worked before the commit above.

> Boot with "pci=earlydump" and the dmesg will tell us what the BIOS
> did.
>
> If you do this in the unmodified kernel:
>
>   # echo 1 > /sys/.../0000:03:00.0/l1_aspm
>
> it should enable L1 for 03:00.0.  I'd like to know whether it actually
> does, and whether the NIC behaves any differently when L1 is enabled
> for the entire path instead of just the first three components.

With an unmodified kernel, I get ~40 Mbit of bandwidth.

> If the above doesn't work, you should be able to enable ASPM manually:
>
>   # setpci -s03:00.0 CAP_EXP+0x10.w=0x0042

I used something similar to disable ASPM, which made the NIC work again.

> > > It looks like we *should* be able to enable L1 on both links since the
> > > exit latency should be <33us (first link starts exit at T=0, completes
> > > by T=32; second link starts exit at T=1, completes by T=33), and
> > > 03:00.0 can tolerate up to 64us.
> > >
> > > I guess the effect of your patch is to disable L1 on the 00:01.2 -
> > > 01:00.0 link?  And that makes the NIC work better?  I am obviously
> > > missing something because I don't understand why the patch does that
> > > or why it works better.
> >
> > It makes it work like normal again, as if I disable ASPM on the
> > NIC itself...
>
> I wonder if ASPM is just broken on this device.
> __e1000e_disable_aspm() mentions hardware errata on a different Intel
> NIC.

I doubt it. The total time seems to surpass the time it can handle
with its buffers.

Note that the NIC is running with ASPM now, and it is working =)

> > I don't know which value that reflects, up or down - since we do max
> > of both values and
> > it actually disables ASPM.
> >
> > What we can see is that the first device that passes the threshold
> > is 01:00.0
>
> I don't understand this.  03:00.0 claims to be able to tolerate 64us
> of L1 exit latency.  The path should only have <33us of latency, so
> it's not even close to the 64us threshold.

Well, this is why I dumped the raw data - I don't know how lspci reads it

> > How can I read more data from PCIe without needing to add kprint...
> >
> > This is what lspci uses apparently:
> > #define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
> > #define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */
> >
> > But which latencies are those? up or down?
>
> I think the idea in aspm.c that exit latencies depend on which
> direction traffic is going is incorrect.  The components at the
> upstream and downstream ends of the link may have different exit
> latencies, of course.

Yes, and the max latency is the maximum time the device can buffer before
the link has to be up, so the maximum time to establish the link must be
max(up, down), right?
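
As a rough sketch of that model (illustrative only - this is not the actual
aspm.c code, and the flat array and names are made up for the example), the
per-path L1 check being discussed looks roughly like:

  #include <stdbool.h>

  /*
   * Walk from the endpoint towards the root.  Each entry in link_l1_exit[]
   * is already max(upstream end, downstream end) for that link, in ns.
   * Every switch crossed adds 1us, since it must start the exit on its
   * upstream link within 1us of seeing the exit begin on a downstream link.
   */
  static bool l1_path_acceptable(const unsigned int *link_l1_exit, int nlinks,
                                 unsigned int acceptable_l1)
  {
          unsigned int max_latency = 0, switch_latency = 0;
          int i;

          for (i = 0; i < nlinks; i++) {
                  if (link_l1_exit[i] > max_latency)
                          max_latency = link_l1_exit[i];
                  if (max_latency + switch_latency > acceptable_l1)
                          return false;
                  switch_latency += 1000;   /* 1us per switch hop */
          }
          return true;
  }

Only the largest per-link exit latency counts because the links wake up in
parallel; each switch just adds the 1us hand-off, not its own full exit time.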

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-22 23:29         ` Ian Kumlien
  2020-09-22 23:31           ` Ian Kumlien
@ 2020-09-23 21:23           ` Bjorn Helgaas
  2020-09-23 21:36             ` Ian Kumlien
  1 sibling, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-23 21:23 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Wed, Sep 23, 2020 at 01:29:18AM +0200, Ian Kumlien wrote:
> On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:

> > > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > > Author: Ian Kumlien <ian.kumlien@gmail.com>
> > > Date:   Sun Jul 26 16:01:15 2020 +0200
> > >
> > >     Use maximum latency when determining L1 ASPM
> > >
> > >     If it's not, we clear the link for the path that had too large latency.
> > >
> > >     Currently we check the maximum latency of upstream and downstream
> > >     per link, not the maximum for the path
> > >
> > >     This would work if all links have the same latency, but:
> > >     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > >
> > >     If c or b has the highest latency, it will not register
> > >
> > >     Fix this by maintaining the maximum latency value for the path
> > >
> > >     See this bugzilla for more information:
> > >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> > >
> > >     This fixes an issue for me where my desktop machine's maximum bandwidth
> > >     for remote connections dropped from 933 MBit to ~40 MBit.
> > >
> > >     The bug became obvious once we enabled ASPM on all links:
> > >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> >
> > I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> > bridge in your lspci, so I can't figure out why this commit would make
> > a difference for you.
> >
> > IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> > path to it:
> >
> >   00:01.2 Root Port              to [bus 01-07]
> >   01:00.0 Switch Upstream Port   to [bus 02-07]
> >   02:03.0 Switch Downstream Port to [bus 03]
> >   03:00.0 Endpoint (Intel I211 NIC)
> >
> > And I think this is the relevant info:
> >
> >                                                     LnkCtl    LnkCtl
> >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >   00:01.2                                L1 <32us       L1+       L1-
> >   01:00.0                                L1 <32us       L1+       L1-
> >   02:03.0                                L1 <32us       L1+       L1+
> >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> >
> > The NIC says it can tolerate at most 512ns of L0s exit latency and at
> > most 64us of L1 exit latency.
> >
> > 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> > fast anyway (it can only do <2us), so L0s should be out of the picture
> > completely.
> >
> > Before your patch, apparently we (or BIOS) enabled L1 on the link from
> > 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> > to 03:00.0.
> 
> According to the spec, this is managed by the OS - which was the
> change introduced...

BIOS frequently enables ASPM and, if CONFIG_PCIEASPM_DEFAULT=y, I
don't think Linux touches it unless a user requests it via sysfs.

What does "grep ASPM .config" tell you?

Boot with "pci=earlydump" and the dmesg will tell us what the BIOS
did.

If you do this in the unmodified kernel:

  # echo 1 > /sys/.../0000:03:00.0/l1_aspm

it should enable L1 for 03:00.0.  I'd like to know whether it actually
does, and whether the NIC behaves any differently when L1 is enabled
for the entire path instead of just the first three components.

If the above doesn't work, you should be able to enable ASPM manually:

  # setpci -s03:00.0 CAP_EXP+0x10.w=0x0042
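
(For reference, and assuming the standard Link Control register layout, the
value written above decomposes as below - the macro names are only for
illustration:

  #define LNKCTL_ASPM_L1           0x0002  /* ASPM Control = 10b: L1 entry enabled */
  #define LNKCTL_COMMON_CLOCK_CFG  0x0040  /* Common Clock Configuration */

  /* 0x0042 == LNKCTL_COMMON_CLOCK_CFG | LNKCTL_ASPM_L1 */

i.e. it enables L1, leaves L0s disabled and keeps the common-clock bit set.)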

> > It looks like we *should* be able to enable L1 on both links since the
> > exit latency should be <33us (first link starts exit at T=0, completes
> > by T=32; second link starts exit at T=1, completes by T=33), and
> > 03:00.0 can tolerate up to 64us.
> >
> > I guess the effect of your patch is to disable L1 on the 00:01.2 -
> > 01:00.0 link?  And that makes the NIC work better?  I am obviously
> > missing something because I don't understand why the patch does that
> > or why it works better.
> 
> It makes it work like normal again, as if I disable ASPM on the
> NIC itself...

I wonder if ASPM is just broken on this device.
__e1000e_disable_aspm() mentions hardware errata on a different Intel
NIC.

> I don't know which value that reflects, up or down - since we do max
> of both values and
> it actually disables ASPM.
> 
> What we can see is that the first device that passes the threshold
> is 01:00.0

I don't understand this.  03:00.0 claims to be able to tolerate 64us
of L1 exit latency.  The path should only have <33us of latency, so
it's not even close to the 64us threshold.

> How can I read more data from PCIe without needing to add kprint...
> 
> This is what lspci uses apparently:
> #define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
> #define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */
> 
> But which latencies are those? up or down?

I think the idea in aspm.c that exit latencies depend on which
direction traffic is going is incorrect.  The components at the
upstream and downstream ends of the link may have different exit
latencies, of course.
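
(On the decoding question above: the LNKCAP exit-latency fields describe the
component that owns the register, so each end of the link advertises its own
exit latency and lspci reports them per component, not per direction.  A
rough decode of the usual encodings - treat this as a sketch, not as
authoritative:

  /* LnkCap[14:12] = L0s exit latency, LnkCap[17:15] = L1 exit latency */
  static unsigned int lnkcap_l0s_exit_ns(unsigned int lnkcap)
  {
          unsigned int enc = (lnkcap >> 12) & 0x7;

          /* 0: <64ns, 1: 64-128ns, ... doubling up to 6: 2-4us, 7: >4us */
          return enc == 7 ? 5000 : 64u << enc;
  }

  static unsigned int lnkcap_l1_exit_ns(unsigned int lnkcap)
  {
          unsigned int enc = (lnkcap >> 15) & 0x7;

          /* 0: <1us, 1: 1-2us, ... doubling up to 6: 32-64us, 7: >64us */
          return enc == 7 ? 65000 : 1000u << enc;
  }

This is where worst-case figures like "L1 <32us" seen elsewhere in the thread
come from.)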

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-22 23:29         ` Ian Kumlien
@ 2020-09-22 23:31           ` Ian Kumlien
  2020-09-23 21:23           ` Bjorn Helgaas
  1 sibling, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-22 23:31 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

Actually, from reading the code, it should theoretically only be up...

since:
        /*
         * Re-read upstream/downstream components' register state
         * after clock configuration
         */
        pcie_get_aspm_reg(parent, &upreg);
        pcie_get_aspm_reg(child, &dwreg);
...

But the max was there before? And also, it fixes actual issues? I'm
off to bed.... :)

On Wed, Sep 23, 2020 at 1:29 AM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >
> > [+cc Alexander]
> >
> > On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
> > > On Tue, Sep 22, 2020 at 10:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> >
> > > > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > > >                * L1 exit latencies advertised by a device include L1
> > > > >                * substate latencies (and hence do not do any check).
> > > > >                */
> > > > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > > > -             l1_switch_latency += 1000;
> > > > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > > > +
> > > > > +                     l1_switch_latency += 1000;
> > > > > +             }
> > > >
> > > > This accumulates the 1 usec delays for a Switch to propagate the exit
> > > > transition from its Downstream Port to its Upstream Port, but it
> > > > doesn't accumulate the L1 exit latencies themselves for the entire
> > > > path, does it?  I.e., we don't accumulate "latency" for the whole
> > > > path.  Don't we need that?
> > >
> > > Not for L1 apparently; from what I gather, the maximum link latency
> > > is "largest latency" + 1us * number-of-hops
> > >
> > > I.e., just like the comment above states - the L1 total time might be
> > > more, but 1us is all that is needed to "start" and that propagates
> > > over the link.
> >
> > Ah, you're right!  I don't think this is clear from the existing code
> > comment, but it *is* clear from the example in sec 5.4.1.2.2 (Figure
> > 5-8) of the spec.
> >
> > @@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >
> > >         while (link) {
> > >                 /* Check upstream direction L0s latency */
> > > -               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > > -                   (link->latency_up.l0s > acceptable->l0s))
> > > -                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > > +               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > > +                       l0s_latency_up += link->latency_up.l0s;
> >
> > It's pretty clear from sec 5.4.1.2.2 that we *don't* need to
> > accumulate the L1 exit latencies.  Unfortunately sec 5.4.1.1.2 about
> > L0s exit doesn't have such a nice example.
> >
> > The L0s *language* is similar though:
> >
> >   5.4.1.1.2: If the Upstream component is a Switch (i.e., it is not
> >   the Root Complex), then it must initiate a transition on its
> >   Upstream Port Transmit Lanes (if the Upstream Port's Transmit Lanes
> >   are in a low-power state) as soon as it detects an exit from L0s on
> >   any of its Downstream Ports.
>
> "it detects an exit"
>
> >   5.4.1.2.1: A Switch is required to initiate an L1 exit transition on
> >   its Upstream Port Link after no more than 1 μs from the beginning of
> >   an L1 exit transition on any of its Downstream Port Links.
>
> vs "from the beginning of"
>
> So to me, this looks like edge triggering - the only sense I could make
> of it would be cumulative
>
> (you should also note that I have no L0s issues, but I suspect that
> the code is wrong currently)
>
> > So a switch must start upstream L0s exit "as soon as" it sees L0s exit
> > on any downstream port, while it must start L1 exit "no more than 1 μs"
> > after seeing an L1 exit.
> >
> > And I really can't tell from the spec whether we need to accumulate
> > the L0s exit latencies or not.  Maybe somebody can clarify this.
> >
> > > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > > Author: Ian Kumlien <ian.kumlien@gmail.com>
> > > Date:   Sun Jul 26 16:01:15 2020 +0200
> > >
> > >     Use maximum latency when determining L1 ASPM
> > >
> > >     If it's not, we clear the link for the path that had too large latency.
> > >
> > >     Currently we check the maximum latency of upstream and downstream
> > >     per link, not the maximum for the path
> > >
> > >     This would work if all links have the same latency, but:
> > >     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> > >
> > >     If c or b has the highest latency, it will not register
> > >
> > >     Fix this by maintaining the maximum latency value for the path
> > >
> > >     See this bugzilla for more information:
> > >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> > >
> > >     This fixes an issue for me where my desktop machine's maximum bandwidth
> > >     for remote connections dropped from 933 MBit to ~40 MBit.
> > >
> > >     The bug became obvious once we enabled ASPM on all links:
> > >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
> >
> > I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> > bridge in your lspci, so I can't figure out why this commit would make
> > a difference for you.
> >
> > IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> > path to it:
> >
> >   00:01.2 Root Port              to [bus 01-07]
> >   01:00.0 Switch Upstream Port   to [bus 02-07]
> >   02:03.0 Switch Downstream Port to [bus 03]
> >   03:00.0 Endpoint (Intel I211 NIC)
> >
> > And I think this is the relevant info:
> >
> >                                                     LnkCtl    LnkCtl
> >            ------DevCap-------  ----LnkCap-------  -Before-  -After--
> >   00:01.2                                L1 <32us       L1+       L1-
> >   01:00.0                                L1 <32us       L1+       L1-
> >   02:03.0                                L1 <32us       L1+       L1+
> >   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
> >
> > The NIC says it can tolerate at most 512ns of L0s exit latency and at
> > most 64us of L1 exit latency.
> >
> > 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> > fast anyway (it can only do <2us), so L0s should be out of the picture
> > completely.
> >
> > Before your patch, apparently we (or BIOS) enabled L1 on the link from
> > 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> > to 03:00.0.
>
> According to the spec, this is managed by the OS - which was the
> change introduced...
>
> > It looks like we *should* be able to enable L1 on both links since the
> > exit latency should be <33us (first link starts exit at T=0, completes
> > by T=32; second link starts exit at T=1, completes by T=33), and
> > 03:00.0 can tolerate up to 64us.
> >
> > I guess the effect of your patch is to disable L1 on the 00:01.2 -
> > 01:00.0 link?  And that makes the NIC work better?  I am obviously
> > missing something because I don't understand why the patch does that
> > or why it works better.
>
> It makes it work like normal again, as if I disable ASPM on the NIC itself...
>
> I don't know which value that reflects, up or down - since we do max
> of both values and
> it actually disables ASPM.
>
> What we can see is that the first device that passes the threshold is 01:00.0
>
> How can I read more data from PCIe without needing to add kprint...
>
> This is what lspci uses apparently:
> #define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
> #define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */
>
> But which latencies are those? up or down?
>
> > I added Alexander to cc since it sounds like he's helped debug this,
> > too.
> >
> > >     Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > >
> > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > > index 253c30cc1967..893b37669087 100644
> > > --- a/drivers/pci/pcie/aspm.c
> > > +++ b/drivers/pci/pcie/aspm.c
> > > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> > >
> > >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >  {
> > > -       u32 latency, l1_switch_latency = 0;
> > > +       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> > >         struct aspm_latency *acceptable;
> > >         struct pcie_link_state *link;
> > >
> > > @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >                 if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > >                     (link->latency_dw.l0s > acceptable->l0s))
> > >                         link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > > +
> > >                 /*
> > >                  * Check L1 latency.
> > > -                * Every switch on the path to root complex need 1
> > > -                * more microsecond for L1. Spec doesn't mention L0s.
> > > +                *
> > > +                * PCIe r5.0, sec 5.4.1.2.2 states:
> > > +                * A Switch is required to initiate an L1 exit transition on its
> > > +                * Upstream Port Link after no more than 1 μs from the beginning of an
> > > +                * L1 exit transition on any of its Downstream Port Links.
> > >                  *
> > >                  * The exit latencies for L1 substates are not advertised
> > >                  * by a device.  Since the spec also doesn't mention a way
> > > @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >                  * L1 exit latencies advertised by a device include L1
> > >                  * substate latencies (and hence do not do any check).
> > >                  */
> > > -               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > -               if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > -                   (latency + l1_switch_latency > acceptable->l1))
> > > -                       link->aspm_capable &= ~ASPM_STATE_L1;
> > > -               l1_switch_latency += 1000;
> > > +               if (link->aspm_capable & ASPM_STATE_L1) {
> > > +                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > +                       l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > +                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > +                               link->aspm_capable &= ~ASPM_STATE_L1;
> > > +
> > > +                       l1_switch_latency += 1000;
> > > +               }
> > >
> > >                 link = link->parent;
> > >         }
> > > ----------------------

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-22 23:00       ` Bjorn Helgaas
@ 2020-09-22 23:29         ` Ian Kumlien
  2020-09-22 23:31           ` Ian Kumlien
  2020-09-23 21:23           ` Bjorn Helgaas
  0 siblings, 2 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-22 23:29 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

On Wed, Sep 23, 2020 at 1:00 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Alexander]
>
> On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
> > On Tue, Sep 22, 2020 at 10:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
>
> > > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > > >                * L1 exit latencies advertised by a device include L1
> > > >                * substate latencies (and hence do not do any check).
> > > >                */
> > > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > > -             l1_switch_latency += 1000;
> > > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > > +
> > > > +                     l1_switch_latency += 1000;
> > > > +             }
> > >
> > > This accumulates the 1 usec delays for a Switch to propagate the exit
> > > transition from its Downstream Port to its Upstream Port, but it
> > > doesn't accumulate the L1 exit latencies themselves for the entire
> > > path, does it?  I.e., we don't accumulate "latency" for the whole
> > > path.  Don't we need that?
> >
> > Not for L1 apparently; from what I gather, the maximum link latency
> > is "largest latency" + 1us * number-of-hops
> >
> > I.e., just like the comment above states - the L1 total time might be
> > more, but 1us is all that is needed to "start" and that propagates
> > over the link.
>
> Ah, you're right!  I don't think this is clear from the existing code
> comment, but it *is* clear from the example in sec 5.4.1.2.2 (Figure
> 5-8) of the spec.
>
> > @@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >
> >         while (link) {
> >                 /* Check upstream direction L0s latency */
> > -               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > -                   (link->latency_up.l0s > acceptable->l0s))
> > -                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > +               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > +                       l0s_latency_up += link->latency_up.l0s;
>
> It's pretty clear from sec 5.4.1.2.2 that we *don't* need to
> accumulate the L1 exit latencies.  Unfortunately sec 5.4.1.1.2 about
> L0s exit doesn't have such a nice example.
>
> The L0s *language* is similar though:
>
>   5.4.1.1.2: If the Upstream component is a Switch (i.e., it is not
>   the Root Complex), then it must initiate a transition on its
>   Upstream Port Transmit Lanes (if the Upstream Port's Transmit Lanes
>   are in a low-power state) as soon as it detects an exit from L0s on
>   any of its Downstream Ports.

"it detects an exit"

>   5.4.1.2.1: A Switch is required to initiate an L1 exit transition on
>   its Upstream Port Link after no more than 1 μs from the beginning of
>   an L1 exit transition on any of its Downstream Port Links.

vs "from the beginning of"

So to me, this looks like edge triggering - the only sense I could make
of it would be cumulative

(you should also note that I have no L0s issues, but I suspect that
the code is wrong currently)

> So a switch must start upstream L0s exit "as soon as" it sees L0s exit
> on any downstream port, while it must start L1 exit "no more than 1 μs"
> after seeing an L1 exit.
>
> And I really can't tell from the spec whether we need to accumulate
> the L0s exit latencies or not.  Maybe somebody can clarify this.
>
> > commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> > Author: Ian Kumlien <ian.kumlien@gmail.com>
> > Date:   Sun Jul 26 16:01:15 2020 +0200
> >
> >     Use maximum latency when determining L1 ASPM
> >
> >     If it's not, we clear the link for the path that had too large latency.
> >
> >     Currently we check the maximum latency of upstream and downstream
> >     per link, not the maximum for the path
> >
> >     This would work if all links have the same latency, but:
> >     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> >
> >     If c or b has the highest latency, it will not register
> >
> >     Fix this by maintaining the maximum latency value for the path
> >
> >     See this bugzilla for more information:
> >     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> >
> >     This fixes an issue for me where my desktop machine's maximum bandwidth
> >     for remote connections dropped from 933 MBit to ~40 MBit.
> >
> >     The bug became obvious once we enabled ASPM on all links:
> >     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
> bridge in your lspci, so I can't figure out why this commit would make
> a difference for you.
>
> IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
> path to it:
>
>   00:01.2 Root Port              to [bus 01-07]
>   01:00.0 Switch Upstream Port   to [bus 02-07]
>   02:03.0 Switch Downstream Port to [bus 03]
>   03:00.0 Endpoint (Intel I211 NIC)
>
> And I think this is the relevant info:
>
>                                                     LnkCtl    LnkCtl
>            ------DevCap-------  ----LnkCap-------  -Before-  -After--
>   00:01.2                                L1 <32us       L1+       L1-
>   01:00.0                                L1 <32us       L1+       L1-
>   02:03.0                                L1 <32us       L1+       L1+
>   03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-
>
> The NIC says it can tolerate at most 512ns of L0s exit latency and at
> most 64us of L1 exit latency.
>
> 02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
> fast anyway (it can only do <2us), so L0s should be out of the picture
> completely.
>
> Before your patch, apparently we (or BIOS) enabled L1 on the link from
> 00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
> to 03:00.0.

According to the spec, this is managed by the OS - which was the
change introduced...

> It looks like we *should* be able to enable L1 on both links since the
> exit latency should be <33us (first link starts exit at T=0, completes
> by T=32; second link starts exit at T=1, completes by T=33), and
> 03:00.0 can tolerate up to 64us.
>
> I guess the effect of your patch is to disable L1 on the 00:01.2 -
> 01:00.0 link?  And that makes the NIC work better?  I am obviously
> missing something because I don't understand why the patch does that
> or why it works better.

It makes it work like normal again, as if I disable ASPM on the NIC itself...

I don't know which value that reflects, up or down - since we do max
of both values and
it actually disables ASPM.

What we can see is that the first device that passes the threshold is 01:00.0

How can I read more data from PCIe without needing to add kprint...

This is what lspci uses apparently:
#define  PCI_EXP_LNKCAP_L0S     0x07000 /* L0s Exit Latency */
#define  PCI_EXP_LNKCAP_L1      0x38000 /* L1 Exit Latency */

But which latencies are those? up or down?

> I added Alexander to cc since it sounds like he's helped debug this,
> too.
>
> >     Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index 253c30cc1967..893b37669087 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >  {
> > -       u32 latency, l1_switch_latency = 0;
> > +       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
> >         struct aspm_latency *acceptable;
> >         struct pcie_link_state *link;
> >
> > @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >                 if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> >                     (link->latency_dw.l0s > acceptable->l0s))
> >                         link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +
> >                 /*
> >                  * Check L1 latency.
> > -                * Every switch on the path to root complex need 1
> > -                * more microsecond for L1. Spec doesn't mention L0s.
> > +                *
> > +                * PCIe r5.0, sec 5.4.1.2.2 states:
> > +                * A Switch is required to initiate an L1 exit transition on its
> > +                * Upstream Port Link after no more than 1 μs from the beginning of an
> > +                * L1 exit transition on any of its Downstream Port Links.
> >                  *
> >                  * The exit latencies for L1 substates are not advertised
> >                  * by a device.  Since the spec also doesn't mention a way
> > @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >                  * L1 exit latencies advertised by a device include L1
> >                  * substate latencies (and hence do not do any check).
> >                  */
> > -               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > -               if ((link->aspm_capable & ASPM_STATE_L1) &&
> > -                   (latency + l1_switch_latency > acceptable->l1))
> > -                       link->aspm_capable &= ~ASPM_STATE_L1;
> > -               l1_switch_latency += 1000;
> > +               if (link->aspm_capable & ASPM_STATE_L1) {
> > +                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > +                       l1_max_latency = max_t(u32, latency, l1_max_latency);
> > +                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > +                               link->aspm_capable &= ~ASPM_STATE_L1;
> > +
> > +                       l1_switch_latency += 1000;
> > +               }
> >
> >                 link = link->parent;
> >         }
> > ----------------------

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-22 21:02     ` Ian Kumlien
@ 2020-09-22 23:00       ` Bjorn Helgaas
  2020-09-22 23:29         ` Ian Kumlien
  0 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-22 23:00 UTC (permalink / raw)
  To: Ian Kumlien
  Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan, Alexander Duyck

[+cc Alexander]

On Tue, Sep 22, 2020 at 11:02:35PM +0200, Ian Kumlien wrote:
> On Tue, Sep 22, 2020 at 10:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:

> > > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> > >                * L1 exit latencies advertised by a device include L1
> > >                * substate latencies (and hence do not do any check).
> > >                */
> > > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > > -                 (latency + l1_switch_latency > acceptable->l1))
> > > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > > -             l1_switch_latency += 1000;
> > > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > > +
> > > +                     l1_switch_latency += 1000;
> > > +             }
> >
> > This accumulates the 1 usec delays for a Switch to propagate the exit
> > transition from its Downstream Port to its Upstream Port, but it
> > doesn't accumulate the L1 exit latencies themselves for the entire
> > path, does it?  I.e., we don't accumulate "latency" for the whole
> > path.  Don't we need that?
> 
> Not for L1 apparently; from what I gather, the maximum link latency
> is "largest latency" + 1us * number-of-hops
>
> I.e., just like the comment above states - the L1 total time might be
> more, but 1us is all that is needed to "start" and that propagates
> over the link.

Ah, you're right!  I don't think this is clear from the existing code
comment, but it *is* clear from the example in sec 5.4.1.2.2 (Figure
5-8) of the spec.

> @@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> 
>         while (link) {
>                 /* Check upstream direction L0s latency */
> -               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> -                   (link->latency_up.l0s > acceptable->l0s))
> -                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> +               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> +                       l0s_latency_up += link->latency_up.l0s;

It's pretty clear from sec 5.4.1.2.2 that we *don't* need to
accumulate the L1 exit latencies.  Unfortunately sec 5.4.1.1.2 about
L0s exit doesn't have such a nice example.

The L0s *language* is similar though:

  5.4.1.1.2: If the Upstream component is a Switch (i.e., it is not
  the Root Complex), then it must initiate a transition on its
  Upstream Port Transmit Lanes (if the Upstream Port's Transmit Lanes
  are in a low-power state) as soon as it detects an exit from L0s on
  any of its Downstream Ports.

  5.4.1.2.1: A Switch is required to initiate an L1 exit transition on
  its Upstream Port Link after no more than 1 μs from the beginning of
  an L1 exit transition on any of its Downstream Port Links.

So a switch must start upstream L0s exit "as soon as" it sees L0s exit
on any downstream port, while it must start L1 exit "no more than 1 μs"
after seeing an L1 exit.

And I really can't tell from the spec whether we need to accumulate
the L0s exit latencies or not.  Maybe somebody can clarify this.
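
(The two readings being weighed here, written out as minimal checks - purely
illustrative, neither is presented as what the spec mandates:

  #include <stdbool.h>

  /* reading 1: each link only has to satisfy the endpoint on its own */
  static bool l0s_ok_per_link(unsigned int link_l0s_exit,
                              unsigned int acceptable_l0s)
  {
          return link_l0s_exit <= acceptable_l0s;
  }

  /* reading 2: exit latencies add up along the path from the endpoint */
  static bool l0s_ok_cumulative(unsigned int *path_sum,
                                unsigned int link_l0s_exit,
                                unsigned int acceptable_l0s)
  {
          *path_sum += link_l0s_exit;
          return *path_sum <= acceptable_l0s;
  }

The current code implements the first reading; the proposed L0s change
further down in the thread implements the second.)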

> commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
> Author: Ian Kumlien <ian.kumlien@gmail.com>
> Date:   Sun Jul 26 16:01:15 2020 +0200
> 
>     Use maximum latency when determining L1 ASPM
> 
>     If it's not, we clear the link for the path that had too large latency.
> 
>     Currently we check the maximum latency of upstream and downstream
>     per link, not the maximum for the path
> 
>     This would work if all links have the same latency, but:
>     endpoint -> c -> b -> a -> root  (in the order we walk the path)
> 
>     If c or b has the highest latency, it will not register
> 
>     Fix this by maintaining the maximum latency value for the path
> 
>     See this bugzilla for more information:
>     https://bugzilla.kernel.org/show_bug.cgi?id=208741
> 
>     This fixes an issue for me where my desktop machine's maximum bandwidth
>     for remote connections dropped from 933 MBit to ~40 MBit.
> 
>     The bug became obvious once we enabled ASPM on all links:
>     66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

I can't connect the dots here yet.  I don't see a PCIe-to-PCI/PCI-X
bridge in your lspci, so I can't figure out why this commit would make
a difference for you.

IIUC, the problem device is 03:00.0, the Intel I211 NIC.  Here's the
path to it:

  00:01.2 Root Port              to [bus 01-07]
  01:00.0 Switch Upstream Port   to [bus 02-07]
  02:03.0 Switch Downstream Port to [bus 03]
  03:00.0 Endpoint (Intel I211 NIC)

And I think this is the relevant info:

						    LnkCtl    LnkCtl
	   ------DevCap-------  ----LnkCap-------  -Before-  -After--
  00:01.2                                L1 <32us       L1+       L1-
  01:00.0                                L1 <32us       L1+       L1-
  02:03.0                                L1 <32us       L1+       L1+
  03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-

The NIC says it can tolerate at most 512ns of L0s exit latency and at
most 64us of L1 exit latency.

02:03.0 doesn't support L0s, and the NIC itself can't exit L0s that
fast anyway (it can only do <2us), so L0s should be out of the picture
completely.

Before your patch, apparently we (or BIOS) enabled L1 on the link from
00:01.2 to 01:00.0, and partially enabled it on the link from 02:03.0
to 03:00.0.

It looks like we *should* be able to enable L1 on both links since the
exit latency should be <33us (first link starts exit at T=0, completes
by T=32; second link starts exit at T=1, completes by T=33), and
03:00.0 can tolerate up to 64us.

I guess the effect of your patch is to disable L1 on the 00:01.2 -
01:00.0 link?  And that makes the NIC work better?  I am obviously
missing something because I don't understand why the patch does that
or why it works better.

I added Alexander to cc since it sounds like he's helped debug this,
too.

>     Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 253c30cc1967..893b37669087 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> 
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -       u32 latency, l1_switch_latency = 0;
> +       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
>         struct aspm_latency *acceptable;
>         struct pcie_link_state *link;
> 
> @@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>                 if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
>                     (link->latency_dw.l0s > acceptable->l0s))
>                         link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +
>                 /*
>                  * Check L1 latency.
> -                * Every switch on the path to root complex need 1
> -                * more microsecond for L1. Spec doesn't mention L0s.
> +                *
> +                * PCIe r5.0, sec 5.4.1.2.2 states:
> +                * A Switch is required to initiate an L1 exit transition on its
> +                * Upstream Port Link after no more than 1 μs from the beginning of an
> +                * L1 exit transition on any of its Downstream Port Links.
>                  *
>                  * The exit latencies for L1 substates are not advertised
>                  * by a device.  Since the spec also doesn't mention a way
> @@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>                  * L1 exit latencies advertised by a device include L1
>                  * substate latencies (and hence do not do any check).
>                  */
> -               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -               if ((link->aspm_capable & ASPM_STATE_L1) &&
> -                   (latency + l1_switch_latency > acceptable->l1))
> -                       link->aspm_capable &= ~ASPM_STATE_L1;
> -               l1_switch_latency += 1000;
> +               if (link->aspm_capable & ASPM_STATE_L1) {
> +                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +                       l1_max_latency = max_t(u32, latency, l1_max_latency);
> +                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +                               link->aspm_capable &= ~ASPM_STATE_L1;
> +
> +                       l1_switch_latency += 1000;
> +               }
> 
>                 link = link->parent;
>         }
> ----------------------

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-09-22 20:19   ` Bjorn Helgaas
@ 2020-09-22 21:02     ` Ian Kumlien
  2020-09-22 23:00       ` Bjorn Helgaas
  0 siblings, 1 reply; 33+ messages in thread
From: Ian Kumlien @ 2020-09-22 21:02 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan

On Tue, Sep 22, 2020 at 10:19 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Saheed, Puranjay]
>
> On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> > Changes:
> > * Handle L0s correctly as well, making it per direction
> > * Moved the switch cost into the if statement since a non-L1 switch has
> >   no additional cost.
> >
> > For L0s:
> > We summarize the entire latency per direction to see if it's acceptable
> > for the PCIe endpoint.
> >
> > If it's not, we clear the link for the path that had too large latency.
> >
> > For L1:
> > Currently we check the maximum latency of upstream and downstream
> > per link, not the maximum for the path
> >
> > This would work if all links have the same latency, but:
> > endpoint -> c -> b -> a -> root  (in the order we walk the path)
> >
> > If c or b has the highest latency, it will not register
> >
> > Fix this by maintaining the maximum latency value for the path
> >
> > This change fixes a regression introduced (but not caused) by:
> > 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> We need to include some info about the problem here, e.g., the sort of
> info hinted at in
> https://lore.kernel.org/r/CAA85sZvJQge6ETwF1GkdvK1Mpwazh_cYJcmeZVAohmt0FjbMZg@mail.gmail.com
>
> It would be *really* nice to have "lspci -vv" output for the system
> when broken and when working correctly.  If they were attached to a
> bugzilla.kernel.org entry, we could include the URL to that here.

I did create a bugzilla entry, and I see what you mean, will fix

> > Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
> >  1 file changed, 26 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index b17e5ffd31b1..bc512e217258 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
> >
> >  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >  {
> > -     u32 latency, l1_switch_latency = 0;
> > +     u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> > +             l0s_latency_up = 0, l0s_latency_dw = 0;
> >       struct aspm_latency *acceptable;
> >       struct pcie_link_state *link;
> >
> > @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >       acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
> >
> >       while (link) {
> > -             /* Check upstream direction L0s latency */
> > -             if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> > -                 (link->latency_up.l0s > acceptable->l0s))
> > -                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > -
> > -             /* Check downstream direction L0s latency */
> > -             if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> > -                 (link->latency_dw.l0s > acceptable->l0s))
> > -                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +             if (link->aspm_capable & ASPM_STATE_L0S) {
> > +                     /* Check upstream direction L0s latency */
> > +                     if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> > +                             l0s_latency_up += link->latency_up.l0s;
> > +                             if (l0s_latency_up > acceptable->l0s)
> > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> > +                     }
> > +
> > +                     /* Check downstream direction L0s latency */
> > +                     if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> > +                             l0s_latency_dw += link->latency_dw.l0s;
> > +                             if (l0s_latency_dw > acceptable->l0s)
> > +                                     link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> > +                     }
> > +             }
>
> The main point of the above is to accumulate l0s_latency_up and
> l0s_latency_dw along the entire path.  I don't understand the
> additional:
>
>   if (link->aspm_capable & ASPM_STATE_L0S)
>
> around the whole block.  I don't think it's *wrong*, but unless I'm
> missing something it's unnecessary since we already check for
> ASPM_STATE_L0S_UP and ASPM_STATE_L0S_DW.  It does make it arguably
> more parallel with the ASPM_STATE_L1 case below, but it complicates
> the diff enough that I'm not sure it's worth it.

Yeah, it's a leftover from an earlier version of the patch actually,
sorry about that

> I think this could also be split off as a separate patch to fix the
> L0s latency checking.

Ok, will do!

> >               /*
> >                * Check L1 latency.
> >                * Every switch on the path to root complex need 1
>
> Let's take the opportunity to update the comment to add the spec
> citation for this additional 1 usec of L1 exit latency for every
> switch.  I think it is PCIe r5.0, sec 5.4.1.2.2, which says:
>
>   A Switch is required to initiate an L1 exit transition on its
>   Upstream Port Link after no more than 1 μs from the beginning of an
>   L1 exit transition on any of its Downstream Port Links.

Will do!

> > @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
> >                * L1 exit latencies advertised by a device include L1
> >                * substate latencies (and hence do not do any check).
> >                */
> > -             latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > -             if ((link->aspm_capable & ASPM_STATE_L1) &&
> > -                 (latency + l1_switch_latency > acceptable->l1))
> > -                     link->aspm_capable &= ~ASPM_STATE_L1;
> > -             l1_switch_latency += 1000;
> > +             if (link->aspm_capable & ASPM_STATE_L1) {
> > +                     latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> > +                     l1_max_latency = max_t(u32, latency, l1_max_latency);
> > +                     if (l1_max_latency + l1_switch_latency > acceptable->l1)
> > +                             link->aspm_capable &= ~ASPM_STATE_L1;
> > +
> > +                     l1_switch_latency += 1000;
> > +             }
>
> This accumulates the 1 usec delays for a Switch to propagate the exit
> transition from its Downstream Port to its Upstream Port, but it
> doesn't accumulate the L1 exit latencies themselves for the entire
> path, does it?  I.e., we don't accumulate "latency" for the whole
> path.  Don't we need that?

Not for L1 apparently; from what I gather, the maximum link latency
is "largest latency" + 1us * number-of-hops

I.e., just like the comment above states - the L1 total time might be
more, but 1us is all that is needed to "start" and that propagates
over the link.
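
Plugging in the numbers for the path in question (both links advertise an L1
exit latency of <32us, and 03:00.0 accepts up to 64us), that model gives:

  max(32us, 32us) + 1us (switch hand-off) = 33us  <  64us acceptable

which matches the <33us figure quoted elsewhere in the thread.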

> >               link = link->parent;
> >       }
> > --
> > 2.28.0
> >

Included inline for discussion, will send it separately as well - when
I know it's OK :)


So this now became:
commit dde058f731f97b6f86600eb6578c8b02b8720edb (HEAD)
Author: Ian Kumlien <ian.kumlien@gmail.com>
Date:   Tue Sep 22 22:58:19 2020 +0200

    PCIe L0s latency is cumulative from the root

    From what I have been able to figure out, L0s latency is
    cumulative from the root and should be treated as such.

    Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 893b37669087..15d64832a988 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+               l0s_latency_up = 0, l0s_latency_dw = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)

        while (link) {
                /* Check upstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                   (link->latency_up.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+                       l0s_latency_up += link->latency_up.l0s;
+                       if (l0s_latency_up > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+               }

                /* Check downstream direction L0s latency */
-               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                   (link->latency_dw.l0s > acceptable->l0s))
-                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+                       l0s_latency_dw += link->latency_dw.l0s;
+                       if (l0s_latency_dw > acceptable->l0s)
+                               link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+               }

                /*
                 * Check L1 latency.
--------------------------

And:
commit db3d9c4baf4ab177d87b5cd41f624f5901e7390f
Author: Ian Kumlien <ian.kumlien@gmail.com>
Date:   Sun Jul 26 16:01:15 2020 +0200

    Use maximum latency when determining L1 ASPM

    If it's not, we clear the link for the path that had too large latency.

    Currently we check the maximum latency of upstream and downstream
    per link, not the maximum for the path

    This would work if all links have the same latency, but:
    endpoint -> c -> b -> a -> root  (in the order we walk the path)

    If c or b has the highest latency, it will not register

    Fix this by maintaining the maximum latency value for the path

    See this bugzilla for more information:
    https://bugzilla.kernel.org/show_bug.cgi?id=208741

    This fixes an issue for me where my desktop machine's maximum bandwidth
    for remote connections dropped from 933 MBit to ~40 MBit.

    The bug became obvious once we enabled ASPM on all links:
    66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

    Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..893b37669087 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,

 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-       u32 latency, l1_switch_latency = 0;
+       u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
        struct aspm_latency *acceptable;
        struct pcie_link_state *link;

@@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
                    (link->latency_dw.l0s > acceptable->l0s))
                        link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+
                /*
                 * Check L1 latency.
-                * Every switch on the path to root complex need 1
-                * more microsecond for L1. Spec doesn't mention L0s.
+                *
+                * PCIe r5.0, sec 5.4.1.2.2 states:
+                * A Switch is required to initiate an L1 exit transition on its
+                * Upstream Port Link after no more than 1 μs from the beginning of an
+                * L1 exit transition on any of its Downstream Port Links.
                 *
                 * The exit latencies for L1 substates are not advertised
                 * by a device.  Since the spec also doesn't mention a way
@@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                 * L1 exit latencies advertised by a device include L1
                 * substate latencies (and hence do not do any check).
                 */
-               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
-               if ((link->aspm_capable & ASPM_STATE_L1) &&
-                   (latency + l1_switch_latency > acceptable->l1))
-                       link->aspm_capable &= ~ASPM_STATE_L1;
-               l1_switch_latency += 1000;
+               if (link->aspm_capable & ASPM_STATE_L1) {
+                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+                       l1_max_latency = max_t(u32, latency, l1_max_latency);
+                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
+                               link->aspm_capable &= ~ASPM_STATE_L1;
+
+                       l1_switch_latency += 1000;
+               }

                link = link->parent;
        }
----------------------
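
To see what the hunk above changes, here is a throwaway userspace model
(not kernel code; the per-link L1 exit latencies and the 32 us endpoint
budget below are made up, walked endpoint -> c -> b -> a as in the
commit message):

#include <stdio.h>

int main(void)
{
        /* Hypothetical worst-case L1 exit latency per link (ns). */
        unsigned int link_l1[] = { 32000, 8000, 8000 };   /* c, b, a */
        unsigned int acceptable_l1 = 32000;               /* endpoint budget (ns) */
        unsigned int l1_max_latency = 0, l1_switch_latency = 0;
        int old_ok = 1, new_ok = 1;

        for (int i = 0; i < 3; i++) {
                /* Old check: this link's latency plus the accumulated switch delay. */
                if (link_l1[i] + l1_switch_latency > acceptable_l1)
                        old_ok = 0;

                /* New check: carry the worst latency seen so far along the path. */
                if (link_l1[i] > l1_max_latency)
                        l1_max_latency = link_l1[i];
                if (l1_max_latency + l1_switch_latency > acceptable_l1)
                        new_ok = 0;

                l1_switch_latency += 1000;  /* 1 us per switch on the path */
        }

        printf("old per-link check: L1 %s\n", old_ok ? "stays enabled" : "gets disabled");
        printf("new path-max check: L1 %s\n", new_ok ? "stays enabled" : "gets disabled");
        return 0;
}

With these numbers the per-link check leaves L1 enabled everywhere even
though the real worst case is 32 us plus two switch delays, while the
path-maximum check disables L1 from the second link onwards.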

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
  2020-08-15 19:39   ` Ian Kumlien
  2020-09-18 22:47   ` Ian Kumlien
@ 2020-09-22 20:19   ` Bjorn Helgaas
  2020-09-22 21:02     ` Ian Kumlien
  2020-10-05 18:31   ` Bjorn Helgaas
  3 siblings, 1 reply; 33+ messages in thread
From: Bjorn Helgaas @ 2020-09-22 20:19 UTC (permalink / raw)
  To: Ian Kumlien; +Cc: linux-pci, Saheed O. Bolarinwa, Puranjay Mohan

[+cc Saheed, Puranjay]

On Mon, Aug 03, 2020 at 04:58:32PM +0200, Ian Kumlien wrote:
> Changes:
> * Handle L0s correctly as well, making it per direction
> * Moved the switch cost into the if statement since a non-L1 switch has
>   no additional cost.
> 
> For L0s:
> We sum up the entire latency per direction to see if it's acceptable
> for the PCIe endpoint.
> 
> If it's not, we clear the link for the path that had too large latency.
> 
> For L1:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
> 
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root  (in the order we walk the path)
> 
> If c or b has the highest latency, it will not register
> 
> Fix this by maintaining the maximum latency value for the path
> 
> This change fixes a regression introduced (but not caused) by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

We need to include some info about the problem here, e.g., the sort of
info hinted at in
https://lore.kernel.org/r/CAA85sZvJQge6ETwF1GkdvK1Mpwazh_cYJcmeZVAohmt0FjbMZg@mail.gmail.com

It would be *really* nice to have "lspci -vv" output for the system
when broken and when working correctly.  If they were attached to a
bugzilla.kernel.org entry, we could include the URL to that here.

> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bc512e217258 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>  
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -	u32 latency, l1_switch_latency = 0;
> +	u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> +		l0s_latency_up = 0, l0s_latency_dw = 0;
>  	struct aspm_latency *acceptable;
>  	struct pcie_link_state *link;
>  
> @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  	acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
>  
>  	while (link) {
> -		/* Check upstream direction L0s latency */
> -		if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> -		    (link->latency_up.l0s > acceptable->l0s))
> -			link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> -
> -		/* Check downstream direction L0s latency */
> -		if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> -		    (link->latency_dw.l0s > acceptable->l0s))
> -			link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +		if (link->aspm_capable & ASPM_STATE_L0S) {
> +			/* Check upstream direction L0s latency */
> +			if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> +				l0s_latency_up += link->latency_up.l0s;
> +				if (l0s_latency_up > acceptable->l0s)
> +					link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> +			}
> +
> +			/* Check downstream direction L0s latency */
> +			if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> +				l0s_latency_dw += link->latency_dw.l0s;
> +				if (l0s_latency_dw > acceptable->l0s)
> +					link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +			}
> +		}

The main point of the above is to accumulate l0s_latency_up and
l0s_latency_dw along the entire path.  I don't understand the
additional:

  if (link->aspm_capable & ASPM_STATE_L0S)

around the whole block.  I don't think it's *wrong*, but unless I'm
missing something it's unnecessary since we already check for
ASPM_STATE_L0S_UP and ASPM_STATE_L0S_DW.  It does make it arguably
more parallel with the ASPM_STATE_L1 case below, but it complicates
the diff enough that I'm not sure it's worth it.

I think this could also be split off as a separate patch to fix the
L0s latency checking.
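
For reference, a stand-alone sketch of the per-direction accumulation
without the outer test (plain structs and invented flag values and
latencies, not the real struct pcie_link_state or ASPM_STATE_* bits):

#include <stdio.h>

#define L0S_UP 0x1
#define L0S_DW 0x2

struct link {
        unsigned int capable;   /* which L0s directions are still allowed */
        unsigned int l0s_up;    /* advertised upstream L0s exit latency (ns) */
        unsigned int l0s_dw;    /* advertised downstream L0s exit latency (ns) */
};

int main(void)
{
        /* Two links walked from the endpoint towards the root, made-up numbers. */
        struct link path[] = {
                { L0S_UP | L0S_DW, 300, 300 },
                { L0S_UP | L0S_DW, 400, 200 },
        };
        unsigned int acceptable_l0s = 512;
        unsigned int l0s_latency_up = 0, l0s_latency_dw = 0;

        for (int i = 0; i < 2; i++) {
                struct link *link = &path[i];

                /* Accumulate upstream L0s latency over the path so far. */
                if (link->capable & L0S_UP) {
                        l0s_latency_up += link->l0s_up;
                        if (l0s_latency_up > acceptable_l0s)
                                link->capable &= ~L0S_UP;
                }

                /* Accumulate downstream L0s latency over the path so far. */
                if (link->capable & L0S_DW) {
                        l0s_latency_dw += link->l0s_dw;
                        if (l0s_latency_dw > acceptable_l0s)
                                link->capable &= ~L0S_DW;
                }

                printf("link %d: up total %u, dw total %u, capable 0x%x\n",
                       i, l0s_latency_up, l0s_latency_dw, link->capable);
        }
        return 0;
}

The two inner tests already skip links that are not L0s-capable in the
given direction, so the outer test only saves evaluating them when
neither direction is capable.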

>  		/*
>  		 * Check L1 latency.
>  		 * Every switch on the path to root complex need 1

Let's take the opportunity to update the comment to add the spec
citation for this additional 1 usec of L1 exit latency for every
switch.  I think it is PCIe r5.0, sec 5.4.1.2.2, which says:

  A Switch is required to initiate an L1 exit transition on its
  Upstream Port Link after no more than 1 μs from the beginning of an
  L1 exit transition on any of its Downstream Port Links.

> @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  		 * L1 exit latencies advertised by a device include L1
>  		 * substate latencies (and hence do not do any check).
>  		 */
> -		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -		if ((link->aspm_capable & ASPM_STATE_L1) &&
> -		    (latency + l1_switch_latency > acceptable->l1))
> -			link->aspm_capable &= ~ASPM_STATE_L1;
> -		l1_switch_latency += 1000;
> +		if (link->aspm_capable & ASPM_STATE_L1) {
> +			latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +			l1_max_latency = max_t(u32, latency, l1_max_latency);
> +			if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +				link->aspm_capable &= ~ASPM_STATE_L1;
> +
> +			l1_switch_latency += 1000;
> +		}

This accumulates the 1 usec delays for a Switch to propagate the exit
transition from its Downstream Port to its Upstream Port, but it
doesn't accumulate the L1 exit latencies themselves for the entire
path, does it?  I.e., we don't accumulate "latency" for the whole
path.  Don't we need that?
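
To put numbers on that question, a quick model (invented latencies, not
taken from this system) of how the path maximum and the path sum can
disagree:

#include <stdio.h>

int main(void)
{
        /* Hypothetical per-link L1 exit latencies (ns), endpoint side first. */
        unsigned int link_l1[] = { 8000, 8000, 32000 };
        unsigned int acceptable = 40000;         /* assumed endpoint L1 budget (ns) */
        unsigned int switch_latency = 2 * 1000;  /* rough: 1 us per intervening switch */
        unsigned int path_max = 0, path_sum = 0;

        for (int i = 0; i < 3; i++) {
                if (link_l1[i] > path_max)
                        path_max = link_l1[i];
                path_sum += link_l1[i];
        }

        printf("path max + switch delays: %u ns -> %s\n", path_max + switch_latency,
               path_max + switch_latency > acceptable ? "disable L1" : "keep L1");
        printf("path sum + switch delays: %u ns -> %s\n", path_sum + switch_latency,
               path_sum + switch_latency > acceptable ? "disable L1" : "keep L1");
        return 0;
}

The patch keeps the maximum, which amounts to assuming the upstream
links wake in parallel while the slowest one is still exiting; summing
instead would assume the exits happen back to back, which is exactly
the question raised above.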

>  		link = link->parent;
>  	}
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
  2020-08-15 19:39   ` Ian Kumlien
@ 2020-09-18 22:47   ` Ian Kumlien
  2020-09-22 20:19   ` Bjorn Helgaas
  2020-10-05 18:31   ` Bjorn Helgaas
  3 siblings, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-09-18 22:47 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas

Any comments on this? I'm still patching my kernels...

On Mon, Aug 3, 2020 at 4:58 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Changes:
> * Handle L0s correctly as well, making it per direction
> * Moved the switch cost into the if statement since a non-L1 switch has
>   no additional cost.
>
> For L0s:
> We sum up the entire latency per direction to see if it's acceptable
> for the PCIe endpoint.
>
> If it's not, we clear the link for the path that had too large latency.
>
> For L1:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
>
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root  (in the order we walk the path)
>
> If c or b has the highest latency, it will not register
>
> Fix this by maintaining the maximum latency value for the path
>
> This change fixes a regression introduced (but not caused) by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bc512e217258 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -       u32 latency, l1_switch_latency = 0;
> +       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> +               l0s_latency_up = 0, l0s_latency_dw = 0;
>         struct aspm_latency *acceptable;
>         struct pcie_link_state *link;
>
> @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>         acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
>
>         while (link) {
> -               /* Check upstream direction L0s latency */
> -               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> -                   (link->latency_up.l0s > acceptable->l0s))
> -                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> -
> -               /* Check downstream direction L0s latency */
> -               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> -                   (link->latency_dw.l0s > acceptable->l0s))
> -                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +               if (link->aspm_capable & ASPM_STATE_L0S) {
> +                       /* Check upstream direction L0s latency */
> +                       if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> +                               l0s_latency_up += link->latency_up.l0s;
> +                               if (l0s_latency_up > acceptable->l0s)
> +                                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> +                       }
> +
> +                       /* Check downstream direction L0s latency */
> +                       if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> +                               l0s_latency_dw += link->latency_dw.l0s;
> +                               if (l0s_latency_dw > acceptable->l0s)
> +                                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +                       }
> +               }
> +
>                 /*
>                  * Check L1 latency.
>                  * Every switch on the path to root complex need 1
> @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>                  * L1 exit latencies advertised by a device include L1
>                  * substate latencies (and hence do not do any check).
>                  */
> -               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -               if ((link->aspm_capable & ASPM_STATE_L1) &&
> -                   (latency + l1_switch_latency > acceptable->l1))
> -                       link->aspm_capable &= ~ASPM_STATE_L1;
> -               l1_switch_latency += 1000;
> +               if (link->aspm_capable & ASPM_STATE_L1) {
> +                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +                       l1_max_latency = max_t(u32, latency, l1_max_latency);
> +                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +                               link->aspm_capable &= ~ASPM_STATE_L1;
> +
> +                       l1_switch_latency += 1000;
> +               }
>
>                 link = link->parent;
>         }
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
@ 2020-08-15 19:39   ` Ian Kumlien
  2020-09-18 22:47   ` Ian Kumlien
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-08-15 19:39 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas

Hi again,

Just trying to bump and also asking a question ...

That 1 us - could it be related to L0s latency? If so, should we add a
potential workaround by doing max_t(u32, 1000, L0s.up + L0s.dw) to
ensure that the assumed time is never exceeded on any hardware ;)

And also warn about it... =)
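
In case a sketch helps, this is roughly the shape of that idea as a
stand-alone snippet (the L0s exit latencies are invented, and max_u32()
just stands in for the kernel's max_t()):

#include <stdio.h>

/* Userspace stand-in for max_t(u32, ...). */
static unsigned int max_u32(unsigned int a, unsigned int b)
{
        return a > b ? a : b;
}

int main(void)
{
        /* Hypothetical advertised L0s exit latencies for one switch link (ns). */
        unsigned int l0s_up = 700, l0s_dw = 512;

        /* Never assume less than the 1 us from the spec, but also never less
         * than the link's own L0s exit latencies. */
        unsigned int switch_delay = max_u32(1000, l0s_up + l0s_dw);

        if (switch_delay > 1000)
                printf("warning: switch needs %u ns, more than the assumed 1 us\n",
                       switch_delay);
        return 0;
}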

On Mon, Aug 3, 2020 at 4:58 PM Ian Kumlien <ian.kumlien@gmail.com> wrote:
>
> Changes:
> * Handle L0s correctly as well, making it per direction
> * Moved the switch cost into the if statement since a non-L1 switch has
>   no additional cost.
>
> For L0s:
> We sum up the entire latency per direction to see if it's acceptable
> for the PCIe endpoint.
>
> If it's not, we clear the link for the path that had too large latency.
>
> For L1:
> Currently we check the maximum latency of upstream and downstream
> per link, not the maximum for the path
>
> This would work if all links have the same latency, but:
> endpoint -> c -> b -> a -> root  (in the order we walk the path)
>
> If c or b has the highest latency, it will not register
>
> Fix this by maintaining the maximum latency value for the path
>
> This change fixes a regression introduced (but not caused) by:
> 66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)
>
> Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index b17e5ffd31b1..bc512e217258 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
>
>  static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>  {
> -       u32 latency, l1_switch_latency = 0;
> +       u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
> +               l0s_latency_up = 0, l0s_latency_dw = 0;
>         struct aspm_latency *acceptable;
>         struct pcie_link_state *link;
>
> @@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>         acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
>
>         while (link) {
> -               /* Check upstream direction L0s latency */
> -               if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
> -                   (link->latency_up.l0s > acceptable->l0s))
> -                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> -
> -               /* Check downstream direction L0s latency */
> -               if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
> -                   (link->latency_dw.l0s > acceptable->l0s))
> -                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +               if (link->aspm_capable & ASPM_STATE_L0S) {
> +                       /* Check upstream direction L0s latency */
> +                       if (link->aspm_capable & ASPM_STATE_L0S_UP) {
> +                               l0s_latency_up += link->latency_up.l0s;
> +                               if (l0s_latency_up > acceptable->l0s)
> +                                       link->aspm_capable &= ~ASPM_STATE_L0S_UP;
> +                       }
> +
> +                       /* Check downstream direction L0s latency */
> +                       if (link->aspm_capable & ASPM_STATE_L0S_DW) {
> +                               l0s_latency_dw += link->latency_dw.l0s;
> +                               if (l0s_latency_dw > acceptable->l0s)
> +                                       link->aspm_capable &= ~ASPM_STATE_L0S_DW;
> +                       }
> +               }
> +
>                 /*
>                  * Check L1 latency.
>                  * Every switch on the path to root complex need 1
> @@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
>                  * L1 exit latencies advertised by a device include L1
>                  * substate latencies (and hence do not do any check).
>                  */
> -               latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> -               if ((link->aspm_capable & ASPM_STATE_L1) &&
> -                   (latency + l1_switch_latency > acceptable->l1))
> -                       link->aspm_capable &= ~ASPM_STATE_L1;
> -               l1_switch_latency += 1000;
> +               if (link->aspm_capable & ASPM_STATE_L1) {
> +                       latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
> +                       l1_max_latency = max_t(u32, latency, l1_max_latency);
> +                       if (l1_max_latency + l1_switch_latency > acceptable->l1)
> +                               link->aspm_capable &= ~ASPM_STATE_L1;
> +
> +                       l1_switch_latency += 1000;
> +               }
>
>                 link = link->parent;
>         }
> --
> 2.28.0
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH] Use maximum latency when determining L1/L0s ASPM v2
  2020-07-29 22:27 [PATCH] Use maximum latency when determining L1 ASPM Bjorn Helgaas
@ 2020-08-03 14:58 ` Ian Kumlien
  2020-08-15 19:39   ` Ian Kumlien
                     ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Ian Kumlien @ 2020-08-03 14:58 UTC (permalink / raw)
  To: linux-pci, helgaas; +Cc: Ian Kumlien

Changes:
* Handle L0s correctly as well, making it per direction
* Moved the switch cost into the if statement since a non-L1 switch has
  no additional cost.

For L0s:
We sum up the entire latency per direction to see if it's acceptable
for the PCIe endpoint.

If it's not, we clear the link for the path that had too large latency.

For L1:
Currently we check the maximum latency of upstream and downstream
per link, not the maximum for the path

This would work if all links have the same latency, but:
endpoint -> c -> b -> a -> root  (in the order we walk the path)

If c or b has the highest latency, it will not register

Fix this by maintaining the maximum latency value for the path

This change fixes a regression introduced (but not caused) by:
66ff14e59e8a (PCI/ASPM: Allow ASPM on links to PCIe-to-PCI/PCI-X Bridges)

Signed-off-by: Ian Kumlien <ian.kumlien@gmail.com>
---
 drivers/pci/pcie/aspm.c | 41 ++++++++++++++++++++++++++---------------
 1 file changed, 26 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..bc512e217258 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
 
 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-	u32 latency, l1_switch_latency = 0;
+	u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+		l0s_latency_up = 0, l0s_latency_dw = 0;
 	struct aspm_latency *acceptable;
 	struct pcie_link_state *link;
 
@@ -447,15 +448,22 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 	acceptable = &link->acceptable[PCI_FUNC(endpoint->devfn)];
 
 	while (link) {
-		/* Check upstream direction L0s latency */
-		if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-		    (link->latency_up.l0s > acceptable->l0s))
-			link->aspm_capable &= ~ASPM_STATE_L0S_UP;
-
-		/* Check downstream direction L0s latency */
-		if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-		    (link->latency_dw.l0s > acceptable->l0s))
-			link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+		if (link->aspm_capable & ASPM_STATE_L0S) {
+			/* Check upstream direction L0s latency */
+			if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+				l0s_latency_up += link->latency_up.l0s;
+				if (l0s_latency_up > acceptable->l0s)
+					link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+			}
+
+			/* Check downstream direction L0s latency */
+			if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+				l0s_latency_dw += link->latency_dw.l0s;
+				if (l0s_latency_dw > acceptable->l0s)
+					link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+			}
+		}
+
 		/*
 		 * Check L1 latency.
 		 * Every switch on the path to root complex need 1
@@ -469,11 +477,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 		 * L1 exit latencies advertised by a device include L1
 		 * substate latencies (and hence do not do any check).
 		 */
-		latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
-		if ((link->aspm_capable & ASPM_STATE_L1) &&
-		    (latency + l1_switch_latency > acceptable->l1))
-			link->aspm_capable &= ~ASPM_STATE_L1;
-		l1_switch_latency += 1000;
+		if (link->aspm_capable & ASPM_STATE_L1) {
+			latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+			l1_max_latency = max_t(u32, latency, l1_max_latency);
+			if (l1_max_latency + l1_switch_latency > acceptable->l1)
+				link->aspm_capable &= ~ASPM_STATE_L1;
+
+			l1_switch_latency += 1000;
+		}
 
 		link = link->parent;
 	}
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-10-07 13:03 UTC | newest]

Thread overview: 33+ messages
     [not found] <CAA85sZvrPApeAYPVSYdVuKnp84xCpLBLf+f32e=R9tdPC0dvOw@mail.gmail.com>
2020-09-25 15:49 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Bjorn Helgaas
2020-09-25 22:26   ` Ian Kumlien
2020-09-28  0:06     ` Bjorn Helgaas
2020-09-28 10:24       ` Ian Kumlien
2020-09-28 17:09         ` Bjorn Helgaas
2020-09-28 17:41           ` Ian Kumlien
2020-09-28 19:53             ` Alexander Duyck
2020-09-28 20:04               ` Ian Kumlien
2020-09-28 20:33                 ` Ian Kumlien
2020-09-28 23:30                   ` Alexander Duyck
2020-09-29 12:51                     ` Ian Kumlien
2020-09-29 16:23                       ` Alexander Duyck
2020-09-29 21:13                         ` Ian Kumlien
2020-09-28 21:43               ` Ian Kumlien
2020-09-28 18:10           ` Alexander Duyck
2020-07-29 22:27 [PATCH] Use maximum latency when determining L1 ASPM Bjorn Helgaas
2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
2020-08-15 19:39   ` Ian Kumlien
2020-09-18 22:47   ` Ian Kumlien
2020-09-22 20:19   ` Bjorn Helgaas
2020-09-22 21:02     ` Ian Kumlien
2020-09-22 23:00       ` Bjorn Helgaas
2020-09-22 23:29         ` Ian Kumlien
2020-09-22 23:31           ` Ian Kumlien
2020-09-23 21:23           ` Bjorn Helgaas
2020-09-23 21:36             ` Ian Kumlien
2020-09-23 21:48               ` Ian Kumlien
2020-09-24 16:24               ` Bjorn Helgaas
2020-09-25  8:06                 ` Ian Kumlien
2020-10-05 18:31   ` Bjorn Helgaas
2020-10-05 18:38     ` Ian Kumlien
2020-10-05 19:09       ` Bjorn Helgaas
2020-10-07 11:31         ` Ian Kumlien
2020-10-07 13:03           ` Bjorn Helgaas
