netdev.vger.kernel.org archive mirror
From: Alexander Duyck <alexander.duyck@gmail.com>
To: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	David Miller <davem@davemloft.net>,
	Realtek linux nic maintainers <nic_swsd@realtek.com>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Stephen Hemminger <stephen@networkplumber.org>
Subject: Re: [PATCH net-next resubmit v2] r8169: disable ASPM in case of tx timeout
Date: Wed, 11 Jan 2023 14:38:04 -0800
Message-ID: <CAKgT0UewG-nfgd3mz6GPy=KLk8gkerToyapg4R+=g4wUo5fMWQ@mail.gmail.com>
In-Reply-To: <fc80b42a-e488-e8a2-9669-d33a5150ac9b@gmail.com>

On Wed, Jan 11, 2023 at 12:17 PM Heiner Kallweit <hkallweit1@gmail.com> wrote:
>
> On 11.01.2023 17:16, Alexander H Duyck wrote:
> > On Tue, 2023-01-10 at 23:03 +0100, Heiner Kallweit wrote:
> >> There are still single reports of systems where ASPM incompatibilities
> >> cause tx timeouts. It's not clear whom to blame, so let's disable
> >> ASPM in case of a tx timeout.
> >>
> >> v2:
> >> - add one-time warning for informing the user
> >>
> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> >
> > From past experience I have seen ASPM issues cause the device to
> > disappear from the bus after failing to come out of L1. If that occurs
> > this won't be able to recover after the timeout without resetting the
> > bus itself. As such it may be necessary to disable the link states
> > prior to using the device rather than waiting until after the error.
> > That can be addressed in a follow-on patch if this doesn't resolve the
> > issue.
> >
>
> Interesting, reports about disappearing devices I haven't seen yet.
> Symptoms I've seen differ, based on combination of more or less faulty
> NIC chipset version, BIOS bugs, PCIe mainboard chipset.
> Typically users experienced missed rx packets, tx timeouts or NIC lockups.
> Disabling ASPM resulted in complaints of notebook users about reduced
> system runtime on battery.
> Meanwhile we found a good balance and reports about ASPM issues
> became quite rare.
> Just L1.2 still causes issues under load even with newer chipset versions,
> therefore L1.2 is disabled per default.

Does your driver check for MMIO failures on reads? Once the device
stops responding, MMIO reads should start returning ~0. The device
itself doesn't disappear, but it no longer responds to requests, so
this might be the "NIC lockups" case you mentioned. The Intel parts
would disappear because they would trigger their "surprise removal"
logic, which would detach the netdevice. I have seen that issue on
some platforms. It is kind of interesting when you can actually watch
it happen: the issue was essentially a marginal PCIe connection, so
the link would start out at x4, renegotiate down with each ASPM L1
link bounce, and eventually end up at x1 before just dropping off the
bus.
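
As a rough illustration of the kind of read check meant here, the
following is a userspace sketch, not r8169 code. It assumes the chosen
register can never legitimately hold all ones; reg_read_fn, read_alive,
read_gone and device_unresponsive are hypothetical names, and the
function pointer merely stands in for a real MMIO read such as readl().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an MMIO register read (readl() in a driver). */
typedef uint32_t (*reg_read_fn)(uint32_t offset);

/* Simulated healthy device: returns an arbitrary plausible register value. */
static uint32_t read_alive(uint32_t offset)
{
	(void)offset;
	return 0x00000c0fu;
}

/* Simulated device that no longer answers: reads come back as all ones. */
static uint32_t read_gone(uint32_t offset)
{
	(void)offset;
	return UINT32_MAX;
}

/*
 * Assumption: the register at 'offset' never holds ~0 in normal
 * operation, so an all-ones read suggests the device stopped responding.
 */
static bool device_unresponsive(reg_read_fn rd, uint32_t offset)
{
	return rd(offset) == UINT32_MAX;
}
```

With these stubs, device_unresponsive(read_gone, 0) is true and
device_unresponsive(read_alive, 0) is false.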

I agree that proactively disabling ASPM is bad for power savings. So
if this approach can resolve it, I am more than willing to give it a
try. My main concern is that if MMIO is already borked, updating the
ASPM settings may not be enough to bring it back, and it may require a
secondary bus reset.
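
The concern can be put as a small decision sketch. This is only an
illustration under stated assumptions, not what the patch does: the
enum, its values and pick_recovery() are hypothetical, and an all-ones
register value is taken as the sign that MMIO is already dead.

```c
#include <stdint.h>

/* Hypothetical recovery choices for a tx timeout handler. */
enum recovery_action {
	RECOVER_DISABLE_ASPM,	/* device still answers: changing ASPM may help */
	RECOVER_BUS_RESET,	/* MMIO dead: a secondary bus reset may be needed */
};

/*
 * Assumption: 'reg_value' was read from a register that never holds ~0
 * in normal operation. All ones then means MMIO is no longer working,
 * in which case updating the ASPM settings alone may not be enough.
 */
static enum recovery_action pick_recovery(uint32_t reg_value)
{
	if (reg_value == UINT32_MAX)
		return RECOVER_BUS_RESET;
	return RECOVER_DISABLE_ASPM;
}
```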

Thread overview: 6+ messages
2023-01-10 22:03 [PATCH net-next resubmit v2] r8169: disable ASPM in case of tx timeout Heiner Kallweit
2023-01-11 16:16 ` Alexander H Duyck
2023-01-11 20:17   ` Heiner Kallweit
2023-01-11 22:38     ` Alexander Duyck [this message]
2023-01-11 22:57       ` Heiner Kallweit
2023-01-12  4:20 ` patchwork-bot+netdevbpf