All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Ian Kumlien <ian.kumlien@gmail.com>
Cc: linux-pci@vger.kernel.org,
	"Saheed O. Bolarinwa" <refactormyself@gmail.com>,
	Puranjay Mohan <puranjay12@gmail.com>,
	Alexander Duyck <alexander.h.duyck@linux.intel.com>
Subject: Re: [PATCH] Use maximum latency when determining L1/L0s ASPM v2
Date: Sun, 27 Sep 2020 19:06:37 -0500	[thread overview]
Message-ID: <20200928000637.GA2471891@bjorn-Precision-5520> (raw)
In-Reply-To: <CAA85sZsRgEEQ9bRqvMKTkGW4QhVp+cqNbw+VmRZQHfpy==F0+Q@mail.gmail.com>

On Sat, Sep 26, 2020 at 12:26:53AM +0200, Ian Kumlien wrote:
> On Fri, Sep 25, 2020 at 5:49 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Sep 25, 2020 at 03:54:11PM +0200, Ian Kumlien wrote:
> > > On Fri, Sep 25, 2020 at 3:39 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > >
> > > > On Fri, Sep 25, 2020 at 12:18:50PM +0200, Ian Kumlien wrote:
> > > > > So.......
> > > > > [    0.815843] pci 0000:04:00.0: L1 latency exceeded - path: 1000 - max: 64000
> > > > > [    0.815843] pci 0000:00:01.2: Upstream device - 32000
> > > > > [    0.815844] pci 0000:01:00.0: Downstream device - 32000
> > > >
> > > > Wait a minute.  I've been looking at *03:00.0*, not 04:00.0.  Based
> > > > on your bugzilla, here's the path:
> > >
> > > Correct, or you could do it like this:
> > > 00:01.2/01:00.0/02:03.0/03:00.0 Ethernet controller: Intel Corporation
> > > I211 Gigabit Network Connection (rev 03)
> > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:03.0 Switch Downstream Port to [bus 03]
> > > >   03:00.0 Endpoint (Intel I211 NIC)
> > > >
> > > > Your system does also have an 04:00.0 here:
> > > >
> > > >   00:01.2 Root Port              to [bus 01-07]
> > > >   01:00.0 Switch Upstream Port   to [bus 02-07]
> > > >   02:04.0 Switch Downstream Port to [bus 04]
> > > >   04:00.0 Endpoint (Realtek 816e)
> > > >   04:00.1 Endpoint (Realtek RTL8111/8168/8411 NIC)
> > > >   04:00.2 Endpoint (Realtek 816a)
> > > >   04:00.4 Endpoint (Realtek 816d EHCI USB)
> > > >   04:00.7 Endpoint (Realtek 816c IPMI)
> > > >
> > > > Which NIC is the problem?
> > >
> > > The intel one - so the side effect of the realtek nic is that it fixes
> > > the intel nics issues.
> > >
> > > It would be that the intel nic actually has a bug with L1 (and it
> > > would seem that it's to kind with latencies) so it actually has a
> > > smaller buffer...
> > >
> > > And afair, the realtek has a larger buffer, since it behaves better
> > > with L1 enabled.
> > >
> > > Either way, it's a fix that's needed ;)
> >
> > OK, what specifically is the fix that's needed?  The L0s change seems
> > like a "this looks wrong" thing that doesn't actually affect your
> > situation, so let's skip that for now.
> 
> L1 latency calculation is not good enough, it assumes that *any*
> link is the highest latency link - which is incorrect.
> 
> The latency to bring L1 up is number-of-hops*1000 +
> maximum-latency-along-the-path
> 
> currently it's only doing number-of-hops*1000 +
> arbitrary-latency-of-current-link
> 
> > And let's support the change you *do* need with the "lspci -vv" for
> > all the relevant devices (including both 03:00.0 and 04:00.x, I guess,
> > since they share the 00:01.2 - 01:00.0 link), before and after the
> > change.
> 
> They are all included in all lspci output in the bug

No doubt.  But I spent a long time going through those and the
differences I found are not enough to show a spec violation or a fix.

Here's what I extracted (this is a repeat; I mentioned this before):

                                                    LnkCtl    LnkCtl
           ------DevCap-------  ----LnkCap-------  -Before-  -After--
  00:01.2                                L1 <32us       L1+       L1-
  01:00.0                                L1 <32us       L1+       L1-
  02:03.0                                L1 <32us       L1+       L1+
  03:00.0  L0s <512ns L1 <64us  L0s <2us L1 <16us  L0s- L1-  L0s- L1-

I don't see anything wrong here yet.  03:00.0 claims it can handle up
to 64us of L1 exit latency, and the L1 exit latency of the entire path
should be 33us.  What am I missing?

> > I want to identify something in the "before" configuration that is
> > wrong according to the spec, and a change in the "after" config so it
> > now conforms to the spec.
> 
> So there are a few issues here, the current code does not apply to spec.
> 
> The intel nic gets fixed as a side effect (it should still get a
> proper fix) of making
> the code apply to spec.
> 
> Basically, while hunting for the issue, I found that the L1 and L0s
> latency calculations used to determine
> if they should be enabled or not is wrong - that is what I'm currently
> trying to push - it also seems like the
> intel nic claims that it can handle 64us but apparently it can't.
> 
> So, three bugs, two are "fixed" one needs additional fixing.

OK, we just need to split these up as much as possible and support
them with the relevant lspci output, analysis of what specifically is
wrong, and the lspci output showing the effect of the fix.

Bjorn

  reply	other threads:[~2020-09-28  0:06 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAA85sZvrPApeAYPVSYdVuKnp84xCpLBLf+f32e=R9tdPC0dvOw@mail.gmail.com>
2020-09-25 15:49 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Bjorn Helgaas
2020-09-25 22:26   ` Ian Kumlien
2020-09-28  0:06     ` Bjorn Helgaas [this message]
2020-09-28 10:24       ` Ian Kumlien
2020-09-28 17:09         ` Bjorn Helgaas
2020-09-28 17:41           ` Ian Kumlien
2020-09-28 19:53             ` Alexander Duyck
2020-09-28 20:04               ` Ian Kumlien
2020-09-28 20:33                 ` Ian Kumlien
2020-09-28 23:30                   ` Alexander Duyck
2020-09-29 12:51                     ` Ian Kumlien
2020-09-29 16:23                       ` Alexander Duyck
2020-09-29 21:13                         ` Ian Kumlien
2020-09-28 21:43               ` Ian Kumlien
2020-09-28 18:10           ` Alexander Duyck
2020-07-29 22:27 [PATCH] Use maximum latency when determining L1 ASPM Bjorn Helgaas
2020-08-03 14:58 ` [PATCH] Use maximum latency when determining L1/L0s ASPM v2 Ian Kumlien
2020-08-15 19:39   ` Ian Kumlien
2020-09-18 22:47   ` Ian Kumlien
2020-09-22 20:19   ` Bjorn Helgaas
2020-09-22 21:02     ` Ian Kumlien
2020-09-22 23:00       ` Bjorn Helgaas
2020-09-22 23:29         ` Ian Kumlien
2020-09-22 23:31           ` Ian Kumlien
2020-09-23 21:23           ` Bjorn Helgaas
2020-09-23 21:36             ` Ian Kumlien
2020-09-23 21:48               ` Ian Kumlien
2020-09-24 16:24               ` Bjorn Helgaas
2020-09-25  8:06                 ` Ian Kumlien
2020-10-05 18:31   ` Bjorn Helgaas
2020-10-05 18:38     ` Ian Kumlien
2020-10-05 19:09       ` Bjorn Helgaas
2020-10-07 11:31         ` Ian Kumlien
2020-10-07 13:03           ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200928000637.GA2471891@bjorn-Precision-5520 \
    --to=helgaas@kernel.org \
    --cc=alexander.h.duyck@linux.intel.com \
    --cc=ian.kumlien@gmail.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=puranjay12@gmail.com \
    --cc=refactormyself@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.