All of lore.kernel.org
 help / color / mirror / Atom feed
From: Douglas Anderson <dianders@chromium.org>
To: Jakub Kicinski <kuba@kernel.org>,
	Hayes Wang <hayeswang@realtek.com>,
	"David S . Miller" <davem@davemloft.net>
Cc: "Edward Hill" <ecgh@chromium.org>,
	"Laura Nao" <laura.nao@collabora.com>,
	"Alan Stern" <stern@rowland.harvard.edu>,
	"Simon Horman" <horms@kernel.org>,
	linux-usb@vger.kernel.org,
	"Grant Grundler" <grundler@chromium.org>,
	"Douglas Anderson" <dianders@chromium.org>,
	"Bjørn Mork" <bjorn@mork.no>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Prashant Malani" <pmalani@chromium.org>,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org
Subject: [PATCH v5 0/8] r8152: Avoid writing garbage to the adapter's registers
Date: Fri, 20 Oct 2023 14:06:51 -0700	[thread overview]
Message-ID: <20231020210751.3415723-1-dianders@chromium.org> (raw)

This series is the result of a cooperative debug effort between
Realtek and the ChromeOS team. On ChromeOS, we've noticed that Realtek
Ethernet adapters can sometimes get so wedged that even a reboot of
the host can't get them to enumerate again, assuming that the adapter
was on a powered hub and din't lose power when the host rebooted. This
is sometimes seen in the ChromeOS automated testing lab. The only way
to recover adapters in this state is to manually power cycle them.

I managed to reproduce one instance of this wedging (unknown if this
is truly related to what the test lab sees) by doing this:
1. Start a flood ping from a host to the device.
2. Drop the device into kdb.
3. Wait 90 seconds.
4. Resume from kdb (the "g" command).
5. Wait another 45 seconds.

Upon analysis, Realtek realized this was happening:

1. The Linux driver was getting a "Tx timeout" after resuming from kdb
   and then trying to reset itself.
2. As part of the reset, the Linux driver was attempting to do a
   read-modify-write of the adapter's registers.
3. The read would fail (due to a timeout) and the driver pretended
   that the register contained all 0xFFs. See commit f53a7ad18959
   ("r8152: Set memory to all 0xFFs on failed reg reads")
4. The driver would take this value of all 0xFFs, modify it, and
   attempt to write it back to the adapter.
5. By this time the USB channel seemed to recover and thus we'd
   successfully write a value that was mostly 0xFFs to the adpater.
6. The adapter didn't like this and would wedge itself.

Another Engineer also managed to reproduce wedging of the Realtek
Ethernet adpater during a reboot test on an AMD Chromebook. In that
case he was sometimes seeing -EPIPE returned from the control
transfers.

This patch series fixes both issues.

Changes in v5:
- ("Run the unload routine if we have errors during probe") new for v5.
- ("Cancel hw_phy_work if we have an error in probe") new for v5.
- ("Release firmware if we have an error in probe") new for v5.
- Removed extra mutex_unlock() left over in v4.
- Fixed minor typos.
- Don't do queue an unbind/bind reset if probe fails; just retry probe.

Changes in v4:
- Took out some unnecessary locks/unlocks of the control mutex.
- Added comment about reading version causing probe fail if 3 fails.
- Added text to commit msg about the potential unbind/bind loop.

Changes in v3:
- Fixed v2 changelog ending up in the commit message.
- farmework -> framework in comments.

Changes in v2:
- ("Check for unplug in rtl_phy_patch_request()") new for v2.
- ("Check for unplug in r8153b_ups_en() / r8153c_ups_en()") new for v2.
- ("Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") new for v2.
- Reset patch no longer based on retry patch, since that was dropped.
- Reset patch should be robust even if failures happen in probe.
- Switched booleans to bits in the "flags" variable.
- Check for -ENODEV instead of "udev->state == USB_STATE_NOTATTACHED"

Douglas Anderson (8):
  r8152: Increase USB control msg timeout to 5000ms as per spec
  r8152: Run the unload routine if we have errors during probe
  r8152: Cancel hw_phy_work if we have an error in probe
  r8152: Release firmware if we have an error in probe
  r8152: Check for unplug in rtl_phy_patch_request()
  r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en()
  r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE
  r8152: Block future register access if register access fails

 drivers/net/usb/r8152.c | 303 ++++++++++++++++++++++++++++++----------
 1 file changed, 230 insertions(+), 73 deletions(-)

-- 
2.42.0.758.gaed0368e0e-goog


             reply	other threads:[~2023-10-20 21:08 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-20 21:06 Douglas Anderson [this message]
2023-10-20 21:06 ` [PATCH v5 1/8] r8152: Increase USB control msg timeout to 5000ms as per spec Douglas Anderson
2023-10-21 14:48   ` Grant Grundler
2023-10-24  1:23   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 2/8] r8152: Run the unload routine if we have errors during probe Douglas Anderson
2023-10-21 14:50   ` Grant Grundler
2023-10-24  1:24   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 3/8] r8152: Cancel hw_phy_work if we have an error in probe Douglas Anderson
2023-10-21 14:52   ` Grant Grundler
2023-10-24  1:24   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 4/8] r8152: Release firmware " Douglas Anderson
2023-10-21 15:01   ` Grant Grundler
2023-10-24  1:25   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 5/8] r8152: Check for unplug in rtl_phy_patch_request() Douglas Anderson
2023-10-21 15:03   ` Grant Grundler
2023-10-24  1:25   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 6/8] r8152: Check for unplug in r8153b_ups_en() / r8153c_ups_en() Douglas Anderson
2023-10-21 15:05   ` Grant Grundler
2023-10-24  1:25   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 7/8] r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE Douglas Anderson
2023-10-21 15:06   ` Grant Grundler
2023-10-24  1:26   ` Florian Fainelli
2023-10-20 21:06 ` [PATCH v5 8/8] r8152: Block future register access if register access fails Douglas Anderson
2023-10-21 15:35   ` Grant Grundler
2023-10-25 16:28   ` Simon Horman
2023-10-25 20:24     ` Doug Anderson
2023-11-03 16:52       ` Simon Horman
2023-10-22 10:50 ` [PATCH v5 0/8] r8152: Avoid writing garbage to the adapter's registers patchwork-bot+netdevbpf
2023-10-24  1:27   ` Florian Fainelli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231020210751.3415723-1-dianders@chromium.org \
    --to=dianders@chromium.org \
    --cc=bjorn@mork.no \
    --cc=davem@davemloft.net \
    --cc=ecgh@chromium.org \
    --cc=edumazet@google.com \
    --cc=grundler@chromium.org \
    --cc=hayeswang@realtek.com \
    --cc=horms@kernel.org \
    --cc=kuba@kernel.org \
    --cc=laura.nao@collabora.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pmalani@chromium.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.