All of lore.kernel.org
 help / color / mirror / Atom feed
* wireless dongle causing entire machine to hang
@ 2018-08-08 17:58 Randy Oostdyk
  2018-08-08 19:04 ` Larry Finger
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-08 17:58 UTC (permalink / raw)
  To: linux-wireless

Good day all,

I'm writing to report an issue with the linux kernel, and I'm hoping
this is the right place to report it. (short aside: I tried to ask in
the #linux-wireless IRC channel, but wasn't permitted to speak!)

I'm aware that this is a USB protocol issue, and so this may be the
wrong place to report the bug, but the warning seems to be generated
by the wireless driver, and that appears to be the key issue here.

My USB wifi dongle is on the end of a very long USB cable, and was
connected to a USB hub. On two different occasions (after hours or
days of use), the machine it was attached to (Raspberry Pi 3) stopped
responding. I was unable to SSH in, even over ethernet. After a hard
reboot, I found that the following error was repeated **many thousands
of times per second** in three different log files:

Rpi kernel: [857011.857581] ieee80211 phy1:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71

As the machine continued in that state for hours, those log files had
grown to several gigabytes in size each! (/var/log/syslog,
/var/log/kern.log, and /var/log/messages)

It appears to be a very similar (if not same) bug referenced here:
https://www.spinics.net/lists/linux-wireless/msg150555.html

He resolved the "soft lockup" issue with some changes to the driver
(diff included in that thread), so I'm hoping this is the right place
to bring this issue up.

Output of uname -a:
Linux RCBLpi 4.14.52-v7+ #1123 SMP Wed Jun 27 17:35:49 BST 2018 armv7l GNU/Linux

If any further details are needed, feel free to email me directly, as
I haven't subscribed to this list.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-08 17:58 wireless dongle causing entire machine to hang Randy Oostdyk
@ 2018-08-08 19:04 ` Larry Finger
  2018-08-08 19:14 ` Arend van Spriel
  2018-08-09 11:10 ` Stanislaw Gruszka
  2 siblings, 0 replies; 14+ messages in thread
From: Larry Finger @ 2018-08-08 19:04 UTC (permalink / raw)
  To: Randy Oostdyk, linux-wireless

On 08/08/2018 12:58 PM, Randy Oostdyk wrote:
> Good day all,
> 
> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)
> 
> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
> 
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:
> 
> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> 
> As the machine continued in that state for hours, those log files had
> grown to several gigabytes in size each! (/var/log/syslog,
> /var/log/kern.log, and /var/log/messages)
> 
> It appears to be a very similar (if not same) bug referenced here:
> https://www.spinics.net/lists/linux-wireless/msg150555.html
> 
> He resolved the "soft lockup" issue with some changes to the driver
> (diff included in that thread), so I'm hoping this is the right place
> to bring this issue up.
> 
> Output of uname -a:
> Linux RCBLpi 4.14.52-v7+ #1123 SMP Wed Jun 27 17:35:49 BST 2018 armv7l GNU/Linux

My browsing shows that error 71 (EPROTO) is some kind of protocol error. Was the 
one you noted the first, or had others been logged? If that was the first, that 
infrequent an error will be hard to find.

I suspect some subtle USB timing issue. Is that very long cable within specs of 
5 meters for USB2 or 3 for USB3? Was the hub powered? If your setup permits, you 
might try with a shorter cable.

Larry

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-08 17:58 wireless dongle causing entire machine to hang Randy Oostdyk
  2018-08-08 19:04 ` Larry Finger
@ 2018-08-08 19:14 ` Arend van Spriel
  2018-08-08 21:43   ` Jonas Gorski
  2018-08-09 11:10 ` Stanislaw Gruszka
  2 siblings, 1 reply; 14+ messages in thread
From: Arend van Spriel @ 2018-08-08 19:14 UTC (permalink / raw)
  To: Randy Oostdyk, linux-wireless

On 8/8/2018 7:58 PM, Randy Oostdyk wrote:
> Good day all,

Hi Randy

> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)

uhm. why?

> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
>
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:

I assume that USB is powered, right? Now probably asking the obvious, 
but you did not write it down so here it is: did you try with a very 
short USB cable?

I guess you read this already [1]

> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71

So this -EPROTO. The usb host controller of the RPi3 can be found here:

https://github.com/raspberrypi/linux/tree/rpi-4.18.y/drivers/usb/host/dwc_otg

Not sure it that tells us anything though.

Regards,
Arend

[1] 
https://www.raspberrypi.org/documentation/hardware/raspberrypi/usb/README.md

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-08 19:14 ` Arend van Spriel
@ 2018-08-08 21:43   ` Jonas Gorski
  0 siblings, 0 replies; 14+ messages in thread
From: Jonas Gorski @ 2018-08-08 21:43 UTC (permalink / raw)
  To: Arend van Spriel; +Cc: Randy Oostdyk, linux-wireless

On 8 August 2018 at 21:14, Arend van Spriel
<arend.vanspriel@broadcom.com> wrote:
> On 8/8/2018 7:58 PM, Randy Oostdyk wrote:
>>
>> Good day all,
>
>
> Hi Randy
>
>> I'm writing to report an issue with the linux kernel, and I'm hoping
>> this is the right place to report it. (short aside: I tried to ask in
>> the #linux-wireless IRC channel, but wasn't permitted to speak!)
>
>
> uhm. why?

There was (and still is) a lot of link spamming recently on freenode
in #linux-wireless (and several other open [source] channels) from
unregistered clients; to prevent further disruptions unregistered
clients were prevented from writing to the channel for now.

Quite likely this was/is Randys issue. Registering helps, not sure how
easy it is to give individual unregistered users the right to talk.


Regards
Jonas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-08 17:58 wireless dongle causing entire machine to hang Randy Oostdyk
  2018-08-08 19:04 ` Larry Finger
  2018-08-08 19:14 ` Arend van Spriel
@ 2018-08-09 11:10 ` Stanislaw Gruszka
  2018-08-09 18:49   ` Randy Oostdyk
  2 siblings, 1 reply; 14+ messages in thread
From: Stanislaw Gruszka @ 2018-08-09 11:10 UTC (permalink / raw)
  To: Randy Oostdyk; +Cc: linux-wireless

On Wed, Aug 08, 2018 at 11:58:36AM -0600, Randy Oostdyk wrote:
> Good day all,
> 
> I'm writing to report an issue with the linux kernel, and I'm hoping
> this is the right place to report it. (short aside: I tried to ask in
> the #linux-wireless IRC channel, but wasn't permitted to speak!)
> 
> I'm aware that this is a USB protocol issue, and so this may be the
> wrong place to report the bug, but the warning seems to be generated
> by the wireless driver, and that appears to be the key issue here.
> 
> My USB wifi dongle is on the end of a very long USB cable, and was
> connected to a USB hub. On two different occasions (after hours or
> days of use), the machine it was attached to (Raspberry Pi 3) stopped
> responding. I was unable to SSH in, even over ethernet. After a hard
> reboot, I found that the following error was repeated **many thousands
> of times per second** in three different log files:
> 
> Rpi kernel: [857011.857581] ieee80211 phy1:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71

We have this change that make device not presend when we get -EPROTO
error from USB driver.

commit 2ad69ac5976191e9bb7dc4044204a504653ad1bb
Author: Richard Genoud <richard.genoud@gmail.com>
Date:   Thu Apr 3 16:48:56 2014 +0200

    rt2x00: Endless loop on hub port power down 

However I reverted it in

commit dc50a52a36f459e0e3bed492d72a78bf64772c25
Author: Stanislaw Gruszka <sgruszka@redhat.com>
Date:   Wed Nov 26 15:29:16 2014 +0100

    Revert "rt2x00: Endless loop on hub port power down"

Because it caused that if we get this error only one time,
it prevent devices to work, which worked otherwise.

rt2800usb driver expect thet we get -ENODEV or -ENOENT from
usb host driver when we can not talk to device anymore. On other usb
errors rt2800usb just try to work. I think this should be fixed in
usb host driver to return proper error code.

> As the machine continued in that state for hours, those log files had
> grown to several gigabytes in size each! (/var/log/syslog,
> /var/log/kern.log, and /var/log/messages)
> 
> It appears to be a very similar (if not same) bug referenced here:
> https://www.spinics.net/lists/linux-wireless/msg150555.html
> 
> He resolved the "soft lockup" issue with some changes to the driver
> (diff included in that thread), so I'm hoping this is the right place
> to bring this issue up.

I'm reluctant to replace _warn by _dbg messages as if somethings
will go wrong we will not notice that. We can use printk_ratelimited()
variant instead to do not spam log in speed that it will hung
the machine. But the correct fix should be in USB host drivers,
which seems to be not in perfect shape on those embedded platforms.

Regards
Stanislaw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-09 11:10 ` Stanislaw Gruszka
@ 2018-08-09 18:49   ` Randy Oostdyk
  2018-08-10 11:14     ` Stanislaw Gruszka
  0 siblings, 1 reply; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-09 18:49 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless

On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> I'm reluctant to replace _warn by _dbg messages as if somethings
> will go wrong we will not notice that. We can use printk_ratelimited()
> variant instead to do not spam log in speed that it will hung
> the machine. But the correct fix should be in USB host drivers,
> which seems to be not in perfect shape on those embedded platforms.

Agreed about the "correct" fix likely being on the USB side, but at
least using printk_ratelimited() will avoid the log spamming, and
avoid locking the machine, as you said.

Will someone else take that approach and come up with a patch, or is
this something I should try to take on myself? I could certainly test
such a patch, if desired.

Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-09 18:49   ` Randy Oostdyk
@ 2018-08-10 11:14     ` Stanislaw Gruszka
  2018-08-10 11:47       ` Randy Oostdyk
  2018-08-15  5:19       ` Randy Oostdyk
  0 siblings, 2 replies; 14+ messages in thread
From: Stanislaw Gruszka @ 2018-08-10 11:14 UTC (permalink / raw)
  To: Randy Oostdyk; +Cc: linux-wireless

[-- Attachment #1: Type: text/plain, Size: 1409 bytes --]

On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> > I'm reluctant to replace _warn by _dbg messages as if somethings
> > will go wrong we will not notice that. We can use printk_ratelimited()
> > variant instead to do not spam log in speed that it will hung
> > the machine. But the correct fix should be in USB host drivers,
> > which seems to be not in perfect shape on those embedded platforms.
> 
> Agreed about the "correct" fix likely being on the USB side, but at
> least using printk_ratelimited() will avoid the log spamming, and
> avoid locking the machine, as you said.
> 
> Will someone else take that approach and come up with a patch, or is
> this something I should try to take on myself? I could certainly test
> such a patch, if desired.
> 
> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.

I'm attaching the ratelimit patch for test. 

However after reading email from Larry I withdraw blaming USB host 
driver. The issue could be faulty HW and USB host driver can not do
much much about this. 

Another question is if machine hung due to log spam or because there
was some crash. Those error conditions trigger code paths that are
not usually used, so there could be some oops that hung the system.

Anyway you can test the patch and report back.

Cheers
Stanislaw

[-- Attachment #2: rt2x00_ratelimited_err.patch --]
[-- Type: text/plain, Size: 1197 bytes --]

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
index 4b1744e..94459d5 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
@@ -65,14 +65,19 @@
 /* Utility printing macros
  * rt2x00_probe_err is for messages when rt2x00_dev is uninitialized
  */
+#define wiphy_err_ratelimited(wiphy, format, args...)		\
+	dev_err_ratelimited(&(wiphy)->dev, format, ##args)
+#define wiphy_warn_ratelimited(wiphy, format, args...)		\
+	dev_warn_ratelimited(&(wiphy)->dev, format, ##args)
+
 #define rt2x00_probe_err(fmt, ...)					\
 	printk(KERN_ERR KBUILD_MODNAME ": %s: Error - " fmt,		\
 	       __func__, ##__VA_ARGS__)
 #define rt2x00_err(dev, fmt, ...)					\
-	wiphy_err((dev)->hw->wiphy, "%s: Error - " fmt,			\
+	wiphy_err_ratelimited((dev)->hw->wiphy, "%s: Error - " fmt,	\
 		  __func__, ##__VA_ARGS__)
 #define rt2x00_warn(dev, fmt, ...)					\
-	wiphy_warn((dev)->hw->wiphy, "%s: Warning - " fmt,		\
+	wiphy_warn_ratelimited((dev)->hw->wiphy, "%s: Warning - " fmt,	\
 		   __func__, ##__VA_ARGS__)
 #define rt2x00_info(dev, fmt, ...)					\
 	wiphy_info((dev)->hw->wiphy, "%s: Info - " fmt,			\

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-10 11:14     ` Stanislaw Gruszka
@ 2018-08-10 11:47       ` Randy Oostdyk
  2018-08-15  5:19       ` Randy Oostdyk
  1 sibling, 0 replies; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-10 11:47 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless

It's indeed possible that the cause was faulty or out-of-spec HW. I'm
confident the machine hung (soft lock?) due to log spam, because upon
reboot, log files were several GB each, covering the time period I was
unable to connect to it.

Thanks for the patch. I'll try to test it with my previous HW setup,
and let you know how it goes. It might take a few weeks, due to the
intermittent nature of the USB problem that triggered it.

On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> > will go wrong we will not notice that. We can use printk_ratelimited()
>> > variant instead to do not spam log in speed that it will hung
>> > the machine. But the correct fix should be in USB host drivers,
>> > which seems to be not in perfect shape on those embedded platforms.
>>
>> Agreed about the "correct" fix likely being on the USB side, but at
>> least using printk_ratelimited() will avoid the log spamming, and
>> avoid locking the machine, as you said.
>>
>> Will someone else take that approach and come up with a patch, or is
>> this something I should try to take on myself? I could certainly test
>> such a patch, if desired.
>>
>> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>
> I'm attaching the ratelimit patch for test.
>
> However after reading email from Larry I withdraw blaming USB host
> driver. The issue could be faulty HW and USB host driver can not do
> much much about this.
>
> Another question is if machine hung due to log spam or because there
> was some crash. Those error conditions trigger code paths that are
> not usually used, so there could be some oops that hung the system.
>
> Anyway you can test the patch and report back.
>
> Cheers
> Stanislaw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-10 11:14     ` Stanislaw Gruszka
  2018-08-10 11:47       ` Randy Oostdyk
@ 2018-08-15  5:19       ` Randy Oostdyk
  2018-08-15  8:07         ` Stanislaw Gruszka
  1 sibling, 1 reply; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-15  5:19 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless

Finally got an opportunity to implement the patch and test it out.
Looks like it suppressed the log spam (excellent!), but the machine
unfortunately remains unresponsive. Attached is a relevant section of
the log: I'm not able to interpret the CPU stall backtrace info, but
perhaps someone here can determine whether it's USB or the wireless
driver hanging the machine? In the log, it repeats with further
warnings and CPU stall backtraces. I can supply more, if needed.

As mentioned earlier, it seems clear I have some kind of USB hardware
issue going on, and I'll sort that out myself, I just wanted to report
the hang, in case it might help contribute toward improving the
reliability of the kernel.

Cheers!


Aug 10 14:44:09 RCBLpi kernel: [  284.697778] wlan1: AP
fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
(2437/0 MHz)
Aug 10 14:45:00 RCBLpi kernel: [  335.680872]
rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
Aug 10 14:45:00 RCBLpi kernel: [  335.680887] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.680986] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681059] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681126] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681193] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681282] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681348] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681415] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681548] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  335.681623] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:00 RCBLpi kernel: [  336.171569] ieee80211 phy4:
rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
offset 0x1700 with error -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681558]
rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
Aug 10 14:45:05 RCBLpi kernel: [  340.681573] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.681659] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.681754] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.681829] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.681903] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.681984] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.682079] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.682149] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.682236] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:05 RCBLpi kernel: [  340.682302] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691516]
rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
Aug 10 14:45:10 RCBLpi kernel: [  345.691530] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691603] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691704] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691810] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691877] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.691958] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.692024] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.692091] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.692173] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:10 RCBLpi kernel: [  345.692241] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.701580]
rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
Aug 10 14:45:15 RCBLpi kernel: [  350.701596] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.701722] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.701789] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.701870] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.701935] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.702001] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.702084] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.702150] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.702241] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:15 RCBLpi kernel: [  350.702307] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711589]
rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
Aug 10 14:45:20 RCBLpi kernel: [  355.711603] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711674] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711763] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711829] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711917] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.711983] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.712048] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.712135] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.712199] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  355.712265] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:20 RCBLpi kernel: [  356.171492] INFO: rcu_sched
self-detected stall on CPU
Aug 10 14:45:20 RCBLpi kernel: [  356.171522]   0-...: (2099 ticks
this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
Aug 10 14:45:20 RCBLpi kernel: [  356.171525]    (t=2100 jiffies
g=5626 c=5625 q=17568)
Aug 10 14:45:20 RCBLpi kernel: [  356.171541] NMI backtrace for cpu 0
Aug 10 14:45:20 RCBLpi kernel: [  356.171553] CPU: 0 PID: 1007 Comm:
kworker/u8:17 Tainted: G         C      4.14.61-v7+ #1
Aug 10 14:45:20 RCBLpi kernel: [  356.171558] Hardware name: BCM2835
Aug 10 14:45:20 RCBLpi kernel: [  356.171588] Workqueue: phy4
rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:20 RCBLpi kernel: [  356.171622] [<8010ffc0>]
(unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.171640] [<8010c1fc>]
(show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171659] [<807894fc>]
(dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
Aug 10 14:45:20 RCBLpi kernel: [  356.171676] [<8078f43c>]
(nmi_cpu_backtrace) from [<8078f520>]
(nmi_trigger_cpumask_backtrace+0xe0/0x12c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171692] [<8078f520>]
(nmi_trigger_cpumask_backtrace) from [<8010e648>]
(arch_trigger_cpumask_backtrace+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.171710] [<8010e648>]
(arch_trigger_cpumask_backtrace) from [<80185710>]
(rcu_dump_cpu_stacks+0xb0/0xdc)
Aug 10 14:45:20 RCBLpi kernel: [  356.171728] [<80185710>]
(rcu_dump_cpu_stacks) from [<801850b0>]
(rcu_check_callbacks+0x7e8/0x960)
Aug 10 14:45:20 RCBLpi kernel: [  356.171745] [<801850b0>]
(rcu_check_callbacks) from [<8018b314>]
(update_process_times+0x44/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.171762] [<8018b314>]
(update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.171779] [<8019d32c>]
(tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
Aug 10 14:45:20 RCBLpi kernel: [  356.171795] [<8019d590>]
(tick_sched_timer) from [<8018c2cc>]
(__hrtimer_run_queues+0x158/0x2ec)
Aug 10 14:45:20 RCBLpi kernel: [  356.171810] [<8018c2cc>]
(__hrtimer_run_queues) from [<8018c704>]
(hrtimer_interrupt+0xb8/0x20c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171827] [<8018c704>]
(hrtimer_interrupt) from [<8063de74>]
(arch_timer_handler_phys+0x40/0x48)
Aug 10 14:45:20 RCBLpi kernel: [  356.171846] [<8063de74>]
(arch_timer_handler_phys) from [<8017a150>]
(handle_percpu_devid_irq+0x88/0x23c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171863] [<8017a150>]
(handle_percpu_devid_irq) from [<801749fc>]
(generic_handle_irq+0x34/0x44)
Aug 10 14:45:20 RCBLpi kernel: [  356.171879] [<801749fc>]
(generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [  356.171894] [<80175050>]
(__handle_domain_irq) from [<80101520>]
(bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [  356.171911] [<80101520>]
(bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
(__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171918] Exception
stack(0xadbffab0 to 0xadbffaf8)
Aug 10 14:45:20 RCBLpi kernel: [  356.171927] faa0:
                 00000000 00000000 a980bd00 00000000
Aug 10 14:45:20 RCBLpi kernel: [  356.171939] fac0: b7716e80 60000113
b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
Aug 10 14:45:20 RCBLpi kernel: [  356.171949] fae0: adbffb00 adbffb00
805b3388 805b339c 60000113 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [  356.171969] [<807a4ffc>] (__irq_svc)
from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171988] [<805b339c>]
(__usb_hcd_giveback_urb) from [<805b3530>]
(usb_hcd_giveback_urb+0x4c/0xfc)
Aug 10 14:45:20 RCBLpi kernel: [  356.172006] [<805b3530>]
(usb_hcd_giveback_urb) from [<805dde40>]
(completion_tasklet_func+0x80/0xac)
Aug 10 14:45:20 RCBLpi kernel: [  356.172024] [<805dde40>]
(completion_tasklet_func) from [<805ecc60>]
(tasklet_callback+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.172040] [<805ecc60>]
(tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
Aug 10 14:45:20 RCBLpi kernel: [  356.172055] [<80123ac4>]
(tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
Aug 10 14:45:20 RCBLpi kernel: [  356.172069] [<8010169c>]
(__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
Aug 10 14:45:20 RCBLpi kernel: [  356.172083] [<801236a8>] (irq_exit)
from [<80175054>] (__handle_domain_irq+0x70/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [  356.172097] [<80175054>]
(__handle_domain_irq) from [<80101520>]
(bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172111] [<80101520>]
(bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
(__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172117] Exception
stack(0xadbffc58 to 0xadbffca0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172125] fc40:
                                   b9d83a40 00000000
Aug 10 14:45:20 RCBLpi kernel: [  356.172136] fc60: 00000001 00000000
60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
Aug 10 14:45:20 RCBLpi kernel: [  356.172148] fc80: 00000000 adbffcbc
adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [  356.172164] [<807a4ffc>] (__irq_svc)
from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.172180] [<807a4d40>]
(_raw_spin_unlock_irqrestore) from [<805ecbf8>]
(DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172196] [<805ecbf8>]
(DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
(dwc_otg_urb_enqueue+0x1dc/0x31c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172212] [<805e27f4>]
(dwc_otg_urb_enqueue) from [<805b4f34>]
(usb_hcd_submit_urb+0xc8/0x934)
Aug 10 14:45:20 RCBLpi kernel: [  356.172226] [<805b4f34>]
(usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
Aug 10 14:45:20 RCBLpi kernel: [  356.172250] [<805b687c>]
(usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
[rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172296] [<7f4a76e0>]
(rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
(rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172354] [<7f4a7758>]
(rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
(rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
Aug 10 14:45:20 RCBLpi kernel: [  356.172412] [<7f492a44>]
(rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
(rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172440] [<7f4a8088>]
(rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
(process_one_work+0x224/0x518)
Aug 10 14:45:20 RCBLpi kernel: [  356.172459] [<80137384>]
(process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172476] [<8013835c>]
(worker_thread) from [<8013db68>] (kthread+0x144/0x174)
Aug 10 14:45:20 RCBLpi kernel: [  356.172492] [<8013db68>] (kthread)
from [<801080ac>] (ret_from_fork+0x14/0x28)
Aug 10 14:45:22 RCBLpi kernel: [  357.591518] INFO: rcu_sched detected
expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
0x1/.
Aug 10 14:45:22 RCBLpi kernel: [  357.591545] blocking rcu_node structures:
Aug 10 14:45:22 RCBLpi kernel: [  357.591554] Task dump for CPU 0:
Aug 10 14:45:22 RCBLpi kernel: [  357.591560] kworker/u8:17   R
running task        0  1007      2 0x00000002
Aug 10 14:45:22 RCBLpi kernel: [  357.591603] Workqueue: phy4
rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:25 RCBLpi kernel: [  360.721537]
rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
Aug 10 14:45:25 RCBLpi kernel: [  360.721552] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:25 RCBLpi kernel: [  360.721651] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71
Aug 10 14:45:25 RCBLpi kernel: [  360.721720] ieee80211 phy4:
rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
-71

On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> > will go wrong we will not notice that. We can use printk_ratelimited()
>> > variant instead to do not spam log in speed that it will hung
>> > the machine. But the correct fix should be in USB host drivers,
>> > which seems to be not in perfect shape on those embedded platforms.
>>
>> Agreed about the "correct" fix likely being on the USB side, but at
>> least using printk_ratelimited() will avoid the log spamming, and
>> avoid locking the machine, as you said.
>>
>> Will someone else take that approach and come up with a patch, or is
>> this something I should try to take on myself? I could certainly test
>> such a patch, if desired.
>>
>> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>
> I'm attaching the ratelimit patch for test.
>
> However after reading email from Larry I withdraw blaming USB host
> driver. The issue could be faulty HW and USB host driver can not do
> much much about this.
>
> Another question is if machine hung due to log spam or because there
> was some crash. Those error conditions trigger code paths that are
> not usually used, so there could be some oops that hung the system.
>
> Anyway you can test the patch and report back.
>
> Cheers
> Stanislaw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-15  5:19       ` Randy Oostdyk
@ 2018-08-15  8:07         ` Stanislaw Gruszka
  2018-08-16  4:25           ` Randy Oostdyk
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislaw Gruszka @ 2018-08-15  8:07 UTC (permalink / raw)
  To: Randy Oostdyk; +Cc: linux-wireless

Please attach logs as txt file. Your mail client break lines
what makes logs unreadable for me.

Thanks
Stanislaw

On Tue, Aug 14, 2018 at 11:19:09PM -0600, Randy Oostdyk wrote:
> Finally got an opportunity to implement the patch and test it out.
> Looks like it suppressed the log spam (excellent!), but the machine
> unfortunately remains unresponsive. Attached is a relevant section of
> the log: I'm not able to interpret the CPU stall backtrace info, but
> perhaps someone here can determine whether it's USB or the wireless
> driver hanging the machine? In the log, it repeats with further
> warnings and CPU stall backtraces. I can supply more, if needed.
> 
> As mentioned earlier, it seems clear I have some kind of USB hardware
> issue going on, and I'll sort that out myself, I just wanted to report
> the hang, in case it might help contribute toward improving the
> reliability of the kernel.
> 
> Cheers!
> 
> 
> Aug 10 14:44:09 RCBLpi kernel: [  284.697778] wlan1: AP
> fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
> (2437/0 MHz)
> Aug 10 14:45:00 RCBLpi kernel: [  335.680872]
> rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
> Aug 10 14:45:00 RCBLpi kernel: [  335.680887] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.680986] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681059] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681126] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681193] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681282] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681348] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681415] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681548] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  335.681623] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:00 RCBLpi kernel: [  336.171569] ieee80211 phy4:
> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
> offset 0x1700 with error -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681558]
> rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
> Aug 10 14:45:05 RCBLpi kernel: [  340.681573] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681659] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681754] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681829] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681903] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.681984] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.682079] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.682149] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.682236] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:05 RCBLpi kernel: [  340.682302] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691516]
> rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
> Aug 10 14:45:10 RCBLpi kernel: [  345.691530] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691603] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691704] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691810] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691877] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.691958] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.692024] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.692091] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.692173] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:10 RCBLpi kernel: [  345.692241] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.701580]
> rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
> Aug 10 14:45:15 RCBLpi kernel: [  350.701596] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.701722] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.701789] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.701870] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.701935] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.702001] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.702084] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.702150] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.702241] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:15 RCBLpi kernel: [  350.702307] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711589]
> rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
> Aug 10 14:45:20 RCBLpi kernel: [  355.711603] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711674] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711763] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711829] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711917] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.711983] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.712048] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.712135] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.712199] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  355.712265] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:20 RCBLpi kernel: [  356.171492] INFO: rcu_sched
> self-detected stall on CPU
> Aug 10 14:45:20 RCBLpi kernel: [  356.171522]   0-...: (2099 ticks
> this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
> Aug 10 14:45:20 RCBLpi kernel: [  356.171525]    (t=2100 jiffies
> g=5626 c=5625 q=17568)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171541] NMI backtrace for cpu 0
> Aug 10 14:45:20 RCBLpi kernel: [  356.171553] CPU: 0 PID: 1007 Comm:
> kworker/u8:17 Tainted: G         C      4.14.61-v7+ #1
> Aug 10 14:45:20 RCBLpi kernel: [  356.171558] Hardware name: BCM2835
> Aug 10 14:45:20 RCBLpi kernel: [  356.171588] Workqueue: phy4
> rt2x00usb_work_rxdone [rt2x00usb]
> Aug 10 14:45:20 RCBLpi kernel: [  356.171622] [<8010ffc0>]
> (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171640] [<8010c1fc>]
> (show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171659] [<807894fc>]
> (dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171676] [<8078f43c>]
> (nmi_cpu_backtrace) from [<8078f520>]
> (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171692] [<8078f520>]
> (nmi_trigger_cpumask_backtrace) from [<8010e648>]
> (arch_trigger_cpumask_backtrace+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171710] [<8010e648>]
> (arch_trigger_cpumask_backtrace) from [<80185710>]
> (rcu_dump_cpu_stacks+0xb0/0xdc)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171728] [<80185710>]
> (rcu_dump_cpu_stacks) from [<801850b0>]
> (rcu_check_callbacks+0x7e8/0x960)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171745] [<801850b0>]
> (rcu_check_callbacks) from [<8018b314>]
> (update_process_times+0x44/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171762] [<8018b314>]
> (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171779] [<8019d32c>]
> (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171795] [<8019d590>]
> (tick_sched_timer) from [<8018c2cc>]
> (__hrtimer_run_queues+0x158/0x2ec)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171810] [<8018c2cc>]
> (__hrtimer_run_queues) from [<8018c704>]
> (hrtimer_interrupt+0xb8/0x20c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171827] [<8018c704>]
> (hrtimer_interrupt) from [<8063de74>]
> (arch_timer_handler_phys+0x40/0x48)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171846] [<8063de74>]
> (arch_timer_handler_phys) from [<8017a150>]
> (handle_percpu_devid_irq+0x88/0x23c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171863] [<8017a150>]
> (handle_percpu_devid_irq) from [<801749fc>]
> (generic_handle_irq+0x34/0x44)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171879] [<801749fc>]
> (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171894] [<80175050>]
> (__handle_domain_irq) from [<80101520>]
> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171911] [<80101520>]
> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
> (__irq_svc+0x5c/0x7c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171918] Exception
> stack(0xadbffab0 to 0xadbffaf8)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171927] faa0:
>                  00000000 00000000 a980bd00 00000000
> Aug 10 14:45:20 RCBLpi kernel: [  356.171939] fac0: b7716e80 60000113
> b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
> Aug 10 14:45:20 RCBLpi kernel: [  356.171949] fae0: adbffb00 adbffb00
> 805b3388 805b339c 60000113 ffffffff
> Aug 10 14:45:20 RCBLpi kernel: [  356.171969] [<807a4ffc>] (__irq_svc)
> from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.171988] [<805b339c>]
> (__usb_hcd_giveback_urb) from [<805b3530>]
> (usb_hcd_giveback_urb+0x4c/0xfc)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172006] [<805b3530>]
> (usb_hcd_giveback_urb) from [<805dde40>]
> (completion_tasklet_func+0x80/0xac)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172024] [<805dde40>]
> (completion_tasklet_func) from [<805ecc60>]
> (tasklet_callback+0x20/0x24)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172040] [<805ecc60>]
> (tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172055] [<80123ac4>]
> (tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172069] [<8010169c>]
> (__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172083] [<801236a8>] (irq_exit)
> from [<80175054>] (__handle_domain_irq+0x70/0xc4)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172097] [<80175054>]
> (__handle_domain_irq) from [<80101520>]
> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172111] [<80101520>]
> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
> (__irq_svc+0x5c/0x7c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172117] Exception
> stack(0xadbffc58 to 0xadbffca0)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172125] fc40:
>                                    b9d83a40 00000000
> Aug 10 14:45:20 RCBLpi kernel: [  356.172136] fc60: 00000001 00000000
> 60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
> Aug 10 14:45:20 RCBLpi kernel: [  356.172148] fc80: 00000000 adbffcbc
> adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
> Aug 10 14:45:20 RCBLpi kernel: [  356.172164] [<807a4ffc>] (__irq_svc)
> from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172180] [<807a4d40>]
> (_raw_spin_unlock_irqrestore) from [<805ecbf8>]
> (DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172196] [<805ecbf8>]
> (DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
> (dwc_otg_urb_enqueue+0x1dc/0x31c)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172212] [<805e27f4>]
> (dwc_otg_urb_enqueue) from [<805b4f34>]
> (usb_hcd_submit_urb+0xc8/0x934)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172226] [<805b4f34>]
> (usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172250] [<805b687c>]
> (usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
> [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [  356.172296] [<7f4a76e0>]
> (rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
> (rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [  356.172354] [<7f4a7758>]
> (rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
> (rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
> Aug 10 14:45:20 RCBLpi kernel: [  356.172412] [<7f492a44>]
> (rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
> (rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
> Aug 10 14:45:20 RCBLpi kernel: [  356.172440] [<7f4a8088>]
> (rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
> (process_one_work+0x224/0x518)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172459] [<80137384>]
> (process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172476] [<8013835c>]
> (worker_thread) from [<8013db68>] (kthread+0x144/0x174)
> Aug 10 14:45:20 RCBLpi kernel: [  356.172492] [<8013db68>] (kthread)
> from [<801080ac>] (ret_from_fork+0x14/0x28)
> Aug 10 14:45:22 RCBLpi kernel: [  357.591518] INFO: rcu_sched detected
> expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
> 0x1/.
> Aug 10 14:45:22 RCBLpi kernel: [  357.591545] blocking rcu_node structures:
> Aug 10 14:45:22 RCBLpi kernel: [  357.591554] Task dump for CPU 0:
> Aug 10 14:45:22 RCBLpi kernel: [  357.591560] kworker/u8:17   R
> running task        0  1007      2 0x00000002
> Aug 10 14:45:22 RCBLpi kernel: [  357.591603] Workqueue: phy4
> rt2x00usb_work_rxdone [rt2x00usb]
> Aug 10 14:45:25 RCBLpi kernel: [  360.721537]
> rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
> Aug 10 14:45:25 RCBLpi kernel: [  360.721552] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:25 RCBLpi kernel: [  360.721651] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> Aug 10 14:45:25 RCBLpi kernel: [  360.721720] ieee80211 phy4:
> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
> -71
> 
> On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> > On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
> >> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> >> > I'm reluctant to replace _warn by _dbg messages as if somethings
> >> > will go wrong we will not notice that. We can use printk_ratelimited()
> >> > variant instead to do not spam log in speed that it will hung
> >> > the machine. But the correct fix should be in USB host drivers,
> >> > which seems to be not in perfect shape on those embedded platforms.
> >>
> >> Agreed about the "correct" fix likely being on the USB side, but at
> >> least using printk_ratelimited() will avoid the log spamming, and
> >> avoid locking the machine, as you said.
> >>
> >> Will someone else take that approach and come up with a patch, or is
> >> this something I should try to take on myself? I could certainly test
> >> such a patch, if desired.
> >>
> >> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
> >
> > I'm attaching the ratelimit patch for test.
> >
> > However after reading email from Larry I withdraw blaming USB host
> > driver. The issue could be faulty HW and USB host driver can not do
> > much much about this.
> >
> > Another question is if machine hung due to log spam or because there
> > was some crash. Those error conditions trigger code paths that are
> > not usually used, so there could be some oops that hung the system.
> >
> > Anyway you can test the patch and report back.
> >
> > Cheers
> > Stanislaw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-15  8:07         ` Stanislaw Gruszka
@ 2018-08-16  4:25           ` Randy Oostdyk
  2018-08-16 10:31             ` Stanislaw Gruszka
  0 siblings, 1 reply; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-16  4:25 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless

[-- Attachment #1: Type: text/plain, Size: 19363 bytes --]

Sorry about the mangled log file! Here's the txt file as an
attachment, hope it's useful.

Cheers.

On Wed, Aug 15, 2018 at 2:07 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> Please attach logs as txt file. Your mail client break lines
> what makes logs unreadable for me.
>
> Thanks
> Stanislaw
>
> On Tue, Aug 14, 2018 at 11:19:09PM -0600, Randy Oostdyk wrote:
>> Finally got an opportunity to implement the patch and test it out.
>> Looks like it suppressed the log spam (excellent!), but the machine
>> unfortunately remains unresponsive. Attached is a relevant section of
>> the log: I'm not able to interpret the CPU stall backtrace info, but
>> perhaps someone here can determine whether it's USB or the wireless
>> driver hanging the machine? In the log, it repeats with further
>> warnings and CPU stall backtraces. I can supply more, if needed.
>>
>> As mentioned earlier, it seems clear I have some kind of USB hardware
>> issue going on, and I'll sort that out myself, I just wanted to report
>> the hang, in case it might help contribute toward improving the
>> reliability of the kernel.
>>
>> Cheers!
>>
>>
>> Aug 10 14:44:09 RCBLpi kernel: [  284.697778] wlan1: AP
>> fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1
>> (2437/0 MHz)
>> Aug 10 14:45:00 RCBLpi kernel: [  335.680872]
>> rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
>> Aug 10 14:45:00 RCBLpi kernel: [  335.680887] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.680986] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681059] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681126] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681193] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681282] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681348] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681415] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681548] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  335.681623] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:00 RCBLpi kernel: [  336.171569] ieee80211 phy4:
>> rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for
>> offset 0x1700 with error -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681558]
>> rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681573] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681659] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681754] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681829] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681903] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.681984] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.682079] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.682149] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.682236] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:05 RCBLpi kernel: [  340.682302] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691516]
>> rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691530] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691603] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691704] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691810] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691877] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.691958] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.692024] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.692091] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.692173] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:10 RCBLpi kernel: [  345.692241] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701580]
>> rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701596] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701722] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701789] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701870] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.701935] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.702001] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.702084] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.702150] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.702241] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:15 RCBLpi kernel: [  350.702307] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711589]
>> rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711603] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711674] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711763] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711829] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711917] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.711983] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.712048] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.712135] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.712199] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  355.712265] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171492] INFO: rcu_sched
>> self-detected stall on CPU
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171522]   0-...: (2099 ticks
>> this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171525]    (t=2100 jiffies
>> g=5626 c=5625 q=17568)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171541] NMI backtrace for cpu 0
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171553] CPU: 0 PID: 1007 Comm:
>> kworker/u8:17 Tainted: G         C      4.14.61-v7+ #1
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171558] Hardware name: BCM2835
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171588] Workqueue: phy4
>> rt2x00usb_work_rxdone [rt2x00usb]
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171622] [<8010ffc0>]
>> (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171640] [<8010c1fc>]
>> (show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171659] [<807894fc>]
>> (dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171676] [<8078f43c>]
>> (nmi_cpu_backtrace) from [<8078f520>]
>> (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171692] [<8078f520>]
>> (nmi_trigger_cpumask_backtrace) from [<8010e648>]
>> (arch_trigger_cpumask_backtrace+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171710] [<8010e648>]
>> (arch_trigger_cpumask_backtrace) from [<80185710>]
>> (rcu_dump_cpu_stacks+0xb0/0xdc)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171728] [<80185710>]
>> (rcu_dump_cpu_stacks) from [<801850b0>]
>> (rcu_check_callbacks+0x7e8/0x960)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171745] [<801850b0>]
>> (rcu_check_callbacks) from [<8018b314>]
>> (update_process_times+0x44/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171762] [<8018b314>]
>> (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171779] [<8019d32c>]
>> (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171795] [<8019d590>]
>> (tick_sched_timer) from [<8018c2cc>]
>> (__hrtimer_run_queues+0x158/0x2ec)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171810] [<8018c2cc>]
>> (__hrtimer_run_queues) from [<8018c704>]
>> (hrtimer_interrupt+0xb8/0x20c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171827] [<8018c704>]
>> (hrtimer_interrupt) from [<8063de74>]
>> (arch_timer_handler_phys+0x40/0x48)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171846] [<8063de74>]
>> (arch_timer_handler_phys) from [<8017a150>]
>> (handle_percpu_devid_irq+0x88/0x23c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171863] [<8017a150>]
>> (handle_percpu_devid_irq) from [<801749fc>]
>> (generic_handle_irq+0x34/0x44)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171879] [<801749fc>]
>> (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171894] [<80175050>]
>> (__handle_domain_irq) from [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171911] [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
>> (__irq_svc+0x5c/0x7c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171918] Exception
>> stack(0xadbffab0 to 0xadbffaf8)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171927] faa0:
>>                  00000000 00000000 a980bd00 00000000
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171939] fac0: b7716e80 60000113
>> b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171949] fae0: adbffb00 adbffb00
>> 805b3388 805b339c 60000113 ffffffff
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171969] [<807a4ffc>] (__irq_svc)
>> from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.171988] [<805b339c>]
>> (__usb_hcd_giveback_urb) from [<805b3530>]
>> (usb_hcd_giveback_urb+0x4c/0xfc)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172006] [<805b3530>]
>> (usb_hcd_giveback_urb) from [<805dde40>]
>> (completion_tasklet_func+0x80/0xac)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172024] [<805dde40>]
>> (completion_tasklet_func) from [<805ecc60>]
>> (tasklet_callback+0x20/0x24)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172040] [<805ecc60>]
>> (tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172055] [<80123ac4>]
>> (tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172069] [<8010169c>]
>> (__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172083] [<801236a8>] (irq_exit)
>> from [<80175054>] (__handle_domain_irq+0x70/0xc4)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172097] [<80175054>]
>> (__handle_domain_irq) from [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172111] [<80101520>]
>> (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>]
>> (__irq_svc+0x5c/0x7c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172117] Exception
>> stack(0xadbffc58 to 0xadbffca0)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172125] fc40:
>>                                    b9d83a40 00000000
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172136] fc60: 00000001 00000000
>> 60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172148] fc80: 00000000 adbffcbc
>> adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172164] [<807a4ffc>] (__irq_svc)
>> from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172180] [<807a4d40>]
>> (_raw_spin_unlock_irqrestore) from [<805ecbf8>]
>> (DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172196] [<805ecbf8>]
>> (DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>]
>> (dwc_otg_urb_enqueue+0x1dc/0x31c)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172212] [<805e27f4>]
>> (dwc_otg_urb_enqueue) from [<805b4f34>]
>> (usb_hcd_submit_urb+0xc8/0x934)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172226] [<805b4f34>]
>> (usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172250] [<805b687c>]
>> (usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc
>> [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172296] [<7f4a76e0>]
>> (rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>]
>> (rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172354] [<7f4a7758>]
>> (rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>]
>> (rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172412] [<7f492a44>]
>> (rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>]
>> (rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172440] [<7f4a8088>]
>> (rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>]
>> (process_one_work+0x224/0x518)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172459] [<80137384>]
>> (process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172476] [<8013835c>]
>> (worker_thread) from [<8013db68>] (kthread+0x144/0x174)
>> Aug 10 14:45:20 RCBLpi kernel: [  356.172492] [<8013db68>] (kthread)
>> from [<801080ac>] (ret_from_fork+0x14/0x28)
>> Aug 10 14:45:22 RCBLpi kernel: [  357.591518] INFO: rcu_sched detected
>> expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root:
>> 0x1/.
>> Aug 10 14:45:22 RCBLpi kernel: [  357.591545] blocking rcu_node structures:
>> Aug 10 14:45:22 RCBLpi kernel: [  357.591554] Task dump for CPU 0:
>> Aug 10 14:45:22 RCBLpi kernel: [  357.591560] kworker/u8:17   R
>> running task        0  1007      2 0x00000002
>> Aug 10 14:45:22 RCBLpi kernel: [  357.591603] Workqueue: phy4
>> rt2x00usb_work_rxdone [rt2x00usb]
>> Aug 10 14:45:25 RCBLpi kernel: [  360.721537]
>> rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
>> Aug 10 14:45:25 RCBLpi kernel: [  360.721552] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:25 RCBLpi kernel: [  360.721651] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>> Aug 10 14:45:25 RCBLpi kernel: [  360.721720] ieee80211 phy4:
>> rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed
>> -71
>>
>> On Fri, Aug 10, 2018 at 5:14 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> > On Thu, Aug 09, 2018 at 12:49:58PM -0600, Randy Oostdyk wrote:
>> >> On Thu, Aug 9, 2018 at 5:10 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>> >> > I'm reluctant to replace _warn by _dbg messages as if somethings
>> >> > will go wrong we will not notice that. We can use printk_ratelimited()
>> >> > variant instead to do not spam log in speed that it will hung
>> >> > the machine. But the correct fix should be in USB host drivers,
>> >> > which seems to be not in perfect shape on those embedded platforms.
>> >>
>> >> Agreed about the "correct" fix likely being on the USB side, but at
>> >> least using printk_ratelimited() will avoid the log spamming, and
>> >> avoid locking the machine, as you said.
>> >>
>> >> Will someone else take that approach and come up with a patch, or is
>> >> this something I should try to take on myself? I could certainly test
>> >> such a patch, if desired.
>> >>
>> >> Meanwhile, I'll look into reporting the USB bug, if it hasn't already been.
>> >
>> > I'm attaching the ratelimit patch for test.
>> >
>> > However after reading email from Larry I withdraw blaming USB host
>> > driver. The issue could be faulty HW and USB host driver can not do
>> > much much about this.
>> >
>> > Another question is if machine hung due to log spam or because there
>> > was some crash. Those error conditions trigger code paths that are
>> > not usually used, so there could be some oops that hung the system.
>> >
>> > Anyway you can test the patch and report back.
>> >
>> > Cheers
>> > Stanislaw

[-- Attachment #2: rt2800usb_cpu_stall.txt --]
[-- Type: text/plain, Size: 15683 bytes --]

Aug 10 14:44:09 RCBLpi kernel: [  284.697778] wlan1: AP fc:2d:5e:dd:54:32 changed bandwidth, new config is 2437 MHz, width 1 (2437/0 MHz)
Aug 10 14:45:00 RCBLpi kernel: [  335.680872] rt2800usb_tx_sta_fifo_read_completed: 1389 callbacks suppressed
Aug 10 14:45:00 RCBLpi kernel: [  335.680887] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.680986] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681059] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681126] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681193] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681282] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681348] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681415] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681548] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  335.681623] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:00 RCBLpi kernel: [  336.171569] ieee80211 phy4: rt2x00usb_vendor_request: Error - Vendor Request 0x07 failed for offset 0x1700 with error -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681558] rt2800usb_tx_sta_fifo_read_completed: 73528 callbacks suppressed
Aug 10 14:45:05 RCBLpi kernel: [  340.681573] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681659] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681754] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681829] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681903] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.681984] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.682079] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.682149] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.682236] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:05 RCBLpi kernel: [  340.682302] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691516] rt2800usb_tx_sta_fifo_read_completed: 76991 callbacks suppressed
Aug 10 14:45:10 RCBLpi kernel: [  345.691530] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691603] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691704] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691810] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691877] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.691958] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.692024] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.692091] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.692173] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:10 RCBLpi kernel: [  345.692241] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.701580] rt2800usb_tx_sta_fifo_read_completed: 77030 callbacks suppressed
Aug 10 14:45:15 RCBLpi kernel: [  350.701596] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.701722] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.701789] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.701870] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.701935] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.702001] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.702084] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.702150] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.702241] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:15 RCBLpi kernel: [  350.702307] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711589] rt2800usb_tx_sta_fifo_read_completed: 77771 callbacks suppressed
Aug 10 14:45:20 RCBLpi kernel: [  355.711603] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711674] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711763] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711829] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711917] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.711983] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.712048] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.712135] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.712199] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  355.712265] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:20 RCBLpi kernel: [  356.171492] INFO: rcu_sched self-detected stall on CPU
Aug 10 14:45:20 RCBLpi kernel: [  356.171522]   0-...: (2099 ticks this GP) idle=72a/140000000000002/0 softirq=11632/11632 fqs=1002
Aug 10 14:45:20 RCBLpi kernel: [  356.171525]    (t=2100 jiffies g=5626 c=5625 q=17568)
Aug 10 14:45:20 RCBLpi kernel: [  356.171541] NMI backtrace for cpu 0
Aug 10 14:45:20 RCBLpi kernel: [  356.171553] CPU: 0 PID: 1007 Comm: kworker/u8:17 Tainted: G         C      4.14.61-v7+ #1
Aug 10 14:45:20 RCBLpi kernel: [  356.171558] Hardware name: BCM2835
Aug 10 14:45:20 RCBLpi kernel: [  356.171588] Workqueue: phy4 rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:20 RCBLpi kernel: [  356.171622] [<8010ffc0>] (unwind_backtrace) from [<8010c1fc>] (show_stack+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.171640] [<8010c1fc>] (show_stack) from [<807894fc>] (dump_stack+0xc8/0x10c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171659] [<807894fc>] (dump_stack) from [<8078f43c>] (nmi_cpu_backtrace+0x11c/0x120)
Aug 10 14:45:20 RCBLpi kernel: [  356.171676] [<8078f43c>] (nmi_cpu_backtrace) from [<8078f520>] (nmi_trigger_cpumask_backtrace+0xe0/0x12c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171692] [<8078f520>] (nmi_trigger_cpumask_backtrace) from [<8010e648>] (arch_trigger_cpumask_backtrace+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.171710] [<8010e648>] (arch_trigger_cpumask_backtrace) from [<80185710>] (rcu_dump_cpu_stacks+0xb0/0xdc)
Aug 10 14:45:20 RCBLpi kernel: [  356.171728] [<80185710>] (rcu_dump_cpu_stacks) from [<801850b0>] (rcu_check_callbacks+0x7e8/0x960)
Aug 10 14:45:20 RCBLpi kernel: [  356.171745] [<801850b0>] (rcu_check_callbacks) from [<8018b314>] (update_process_times+0x44/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.171762] [<8018b314>] (update_process_times) from [<8019d32c>] (tick_sched_handle+0x64/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.171779] [<8019d32c>] (tick_sched_handle) from [<8019d590>] (tick_sched_timer+0x50/0xac)
Aug 10 14:45:20 RCBLpi kernel: [  356.171795] [<8019d590>] (tick_sched_timer) from [<8018c2cc>] (__hrtimer_run_queues+0x158/0x2ec)
Aug 10 14:45:20 RCBLpi kernel: [  356.171810] [<8018c2cc>] (__hrtimer_run_queues) from [<8018c704>] (hrtimer_interrupt+0xb8/0x20c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171827] [<8018c704>] (hrtimer_interrupt) from [<8063de74>] (arch_timer_handler_phys+0x40/0x48)
Aug 10 14:45:20 RCBLpi kernel: [  356.171846] [<8063de74>] (arch_timer_handler_phys) from [<8017a150>] (handle_percpu_devid_irq+0x88/0x23c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171863] [<8017a150>] (handle_percpu_devid_irq) from [<801749fc>] (generic_handle_irq+0x34/0x44)
Aug 10 14:45:20 RCBLpi kernel: [  356.171879] [<801749fc>] (generic_handle_irq) from [<80175050>] (__handle_domain_irq+0x6c/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [  356.171894] [<80175050>] (__handle_domain_irq) from [<80101520>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [  356.171911] [<80101520>] (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>] (__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171918] Exception stack(0xadbffab0 to 0xadbffaf8)
Aug 10 14:45:20 RCBLpi kernel: [  356.171927] faa0:                                     00000000 00000000 a980bd00 00000000
Aug 10 14:45:20 RCBLpi kernel: [  356.171939] fac0: b7716e80 60000113 b9fbe400 adae99d0 00000001 adbfe000 00000001 adbffb1c
Aug 10 14:45:20 RCBLpi kernel: [  356.171949] fae0: adbffb00 adbffb00 805b3388 805b339c 60000113 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [  356.171969] [<807a4ffc>] (__irq_svc) from [<805b339c>] (__usb_hcd_giveback_urb+0x118/0x15c)
Aug 10 14:45:20 RCBLpi kernel: [  356.171988] [<805b339c>] (__usb_hcd_giveback_urb) from [<805b3530>] (usb_hcd_giveback_urb+0x4c/0xfc) 
Aug 10 14:45:20 RCBLpi kernel: [  356.172006] [<805b3530>] (usb_hcd_giveback_urb) from [<805dde40>] (completion_tasklet_func+0x80/0xac)
Aug 10 14:45:20 RCBLpi kernel: [  356.172024] [<805dde40>] (completion_tasklet_func) from [<805ecc60>] (tasklet_callback+0x20/0x24)
Aug 10 14:45:20 RCBLpi kernel: [  356.172040] [<805ecc60>] (tasklet_callback) from [<80123ac4>] (tasklet_hi_action+0x70/0x104)
Aug 10 14:45:20 RCBLpi kernel: [  356.172055] [<80123ac4>] (tasklet_hi_action) from [<8010169c>] (__do_softirq+0x174/0x3d8)
Aug 10 14:45:20 RCBLpi kernel: [  356.172069] [<8010169c>] (__do_softirq) from [<801236a8>] (irq_exit+0xe4/0x140)
Aug 10 14:45:20 RCBLpi kernel: [  356.172083] [<801236a8>] (irq_exit) from [<80175054>] (__handle_domain_irq+0x70/0xc4)
Aug 10 14:45:20 RCBLpi kernel: [  356.172097] [<80175054>] (__handle_domain_irq) from [<80101520>] (bcm2836_arm_irqchip_handle_irq+0xac/0xb0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172111] [<80101520>] (bcm2836_arm_irqchip_handle_irq) from [<807a4ffc>] (__irq_svc+0x5c/0x7c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172117] Exception stack(0xadbffc58 to 0xadbffca0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172125] fc40:                                                       b9d83a40 00000000
Aug 10 14:45:20 RCBLpi kernel: [  356.172136] fc60: 00000001 00000000 60000013 00000000 aa2d4b00 a9949600 b9db0400 b9fbe400
Aug 10 14:45:20 RCBLpi kernel: [  356.172148] fc80: 00000000 adbffcbc adbffca8 adbffca8 807a4d2c 807a4d40 60000013 ffffffff
Aug 10 14:45:20 RCBLpi kernel: [  356.172164] [<807a4ffc>] (__irq_svc) from [<807a4d40>] (_raw_spin_unlock_irqrestore+0x50/0x70)
Aug 10 14:45:20 RCBLpi kernel: [  356.172180] [<807a4d40>] (_raw_spin_unlock_irqrestore) from [<805ecbf8>] (DWC_SPINUNLOCK_IRQRESTORE+0x18/0x1c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172196] [<805ecbf8>] (DWC_SPINUNLOCK_IRQRESTORE) from [<805e27f4>] (dwc_otg_urb_enqueue+0x1dc/0x31c)
Aug 10 14:45:20 RCBLpi kernel: [  356.172212] [<805e27f4>] (dwc_otg_urb_enqueue) from [<805b4f34>] (usb_hcd_submit_urb+0xc8/0x934)
Aug 10 14:45:20 RCBLpi kernel: [  356.172226] [<805b4f34>] (usb_hcd_submit_urb) from [<805b687c>] (usb_submit_urb+0x284/0x510)
Aug 10 14:45:20 RCBLpi kernel: [  356.172250] [<805b687c>] (usb_submit_urb) from [<7f4a76e0>] (rt2x00usb_kick_rx_entry+0xb4/0xfc [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172296] [<7f4a76e0>] (rt2x00usb_kick_rx_entry [rt2x00usb]) from [<7f4a7758>] (rt2x00usb_clear_entry+0x30/0x34 [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172354] [<7f4a7758>] (rt2x00usb_clear_entry [rt2x00usb]) from [<7f492a44>] (rt2x00lib_rxdone+0x148/0x5b8 [rt2x00lib])
Aug 10 14:45:20 RCBLpi kernel: [  356.172412] [<7f492a44>] (rt2x00lib_rxdone [rt2x00lib]) from [<7f4a8088>] (rt2x00usb_work_rxdone+0x58/0x8c [rt2x00usb])
Aug 10 14:45:20 RCBLpi kernel: [  356.172440] [<7f4a8088>] (rt2x00usb_work_rxdone [rt2x00usb]) from [<80137384>] (process_one_work+0x224/0x518)
Aug 10 14:45:20 RCBLpi kernel: [  356.172459] [<80137384>] (process_one_work) from [<8013835c>] (worker_thread+0x60/0x5f0)
Aug 10 14:45:20 RCBLpi kernel: [  356.172476] [<8013835c>] (worker_thread) from [<8013db68>] (kthread+0x144/0x174)
Aug 10 14:45:20 RCBLpi kernel: [  356.172492] [<8013db68>] (kthread) from [<801080ac>] (ret_from_fork+0x14/0x28)
Aug 10 14:45:22 RCBLpi kernel: [  357.591518] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-... } 2109 jiffies s: 221 root: 0x1/.
Aug 10 14:45:22 RCBLpi kernel: [  357.591545] blocking rcu_node structures:
Aug 10 14:45:22 RCBLpi kernel: [  357.591554] Task dump for CPU 0:
Aug 10 14:45:22 RCBLpi kernel: [  357.591560] kworker/u8:17   R  running task        0  1007      2 0x00000002
Aug 10 14:45:22 RCBLpi kernel: [  357.591603] Workqueue: phy4 rt2x00usb_work_rxdone [rt2x00usb]
Aug 10 14:45:25 RCBLpi kernel: [  360.721537] rt2800usb_tx_sta_fifo_read_completed: 77886 callbacks suppressed
Aug 10 14:45:25 RCBLpi kernel: [  360.721552] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:25 RCBLpi kernel: [  360.721651] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71
Aug 10 14:45:25 RCBLpi kernel: [  360.721720] ieee80211 phy4: rt2800usb_tx_sta_fifo_read_completed: Warning - TX status read failed -71

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-16  4:25           ` Randy Oostdyk
@ 2018-08-16 10:31             ` Stanislaw Gruszka
  2018-08-16 11:02               ` Stanislaw Gruszka
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislaw Gruszka @ 2018-08-16 10:31 UTC (permalink / raw)
  To: Randy Oostdyk; +Cc: linux-wireless

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]

On Wed, Aug 15, 2018 at 10:25:48PM -0600, Randy Oostdyk wrote:
> Sorry about the mangled log file! Here's the txt file as an
> attachment, hope it's useful.

We loop infinitely in rt2x00_rxdone because we can not submit
urb. We always get EPROTO error. I think usb host driver should
give ENODEV at some point, but this is out of tree dwc_otg driver,
so I don't expect we can get fix for it.

I prepared a patch that count EPROTO errors and if it's bigger
than 10 mark device as gone. It should not make a problem
when we will get one time random error.  I'm attaching it,
please test.

Thanks
Stanislaw


[-- Attachment #2: rt2x00_check_proto_err.patch --]
[-- Type: text/plain, Size: 2235 bytes --]

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00.h b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
index a279a43..2689bca 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00.h
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00.h
@@ -1014,6 +1014,7 @@ struct rt2x00_dev {
 	unsigned int extra_tx_headroom;
 
 	struct usb_anchor *anchor;
+	unsigned int num_proto_errs;
 
 	/* Clock for System On Chip devices. */
 	struct clk *clk;
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
index 086aad2..3607b28 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
@@ -31,6 +31,22 @@
 #include "rt2x00.h"
 #include "rt2x00usb.h"
 
+static bool rt2x00usb_check_usb_error(struct rt2x00_dev *rt2x00dev, int status)
+{
+	if (status == -ENODEV || status == -ENOENT)
+		return true;
+	
+	if (status == -EPROTO)
+		rt2x00_dev->num_proto_errs++;
+	else
+		rt2x00_dev->num_proto_errs = 0;
+
+	if (rt2x00_dev->num_proto_errs > 10)
+		return true;
+
+	return false
+}
+
 /*
  * Interfacing with the HW.
  */
@@ -57,7 +73,7 @@ int rt2x00usb_vendor_request(struct rt2x00_dev *rt2x00dev,
 		if (status >= 0)
 			return 0;
 
-		if (status == -ENODEV || status == -ENOENT) {
+		if (rt2x00usb_check_usb_error(rt2x00dev, status)) {
 			/* Device has disappeared. */
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 			break;
@@ -321,7 +337,7 @@ static bool rt2x00usb_kick_tx_entry(struct queue_entry *entry, void *data)
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
-		if (status == -ENODEV || status == -ENOENT)
+		if (rt2x00usb_check_usb_error(rt2x00dev, status))
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 		set_bit(ENTRY_DATA_IO_FAILED, &entry->flags);
 		rt2x00lib_dmadone(entry);
@@ -410,7 +426,7 @@ static bool rt2x00usb_kick_rx_entry(struct queue_entry *entry, void *data)
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
-		if (status == -ENODEV || status == -ENOENT)
+		if (rt2x00usb_check_usb_error(rt2x00dev, status))
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 		set_bit(ENTRY_DATA_IO_FAILED, &entry->flags);
 		rt2x00lib_dmadone(entry);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-16 10:31             ` Stanislaw Gruszka
@ 2018-08-16 11:02               ` Stanislaw Gruszka
  2018-08-28 16:09                 ` Randy Oostdyk
  0 siblings, 1 reply; 14+ messages in thread
From: Stanislaw Gruszka @ 2018-08-16 11:02 UTC (permalink / raw)
  To: Randy Oostdyk; +Cc: linux-wireless

[-- Attachment #1: Type: text/plain, Size: 343 bytes --]

On Thu, Aug 16, 2018 at 12:31:07PM +0200, Stanislaw Gruszka wrote:
> I prepared a patch that count EPROTO errors and if it's bigger
> than 10 mark device as gone. It should not make a problem
> when we will get one time random error.  I'm attaching it,
> please test.

Forgot to compile the patch, here is one that compiles.

Cheers
Stanislaw

[-- Attachment #2: rt2x00_check_proto_err_v2.patch --]
[-- Type: text/plain, Size: 1799 bytes --]

diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
index 086aad2..60b8bcc 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
@@ -31,6 +31,22 @@
 #include "rt2x00.h"
 #include "rt2x00usb.h"
 
+static bool rt2x00usb_check_usb_error(struct rt2x00_dev *rt2x00dev, int status)
+{
+	if (status == -ENODEV || status == -ENOENT)
+		return true;
+	
+	if (status == -EPROTO)
+		rt2x00dev->num_proto_errs++;
+	else
+		rt2x00dev->num_proto_errs = 0;
+
+	if (rt2x00dev->num_proto_errs > 10)
+		return true;
+
+	return false;
+}
+
 /*
  * Interfacing with the HW.
  */
@@ -57,7 +73,7 @@ int rt2x00usb_vendor_request(struct rt2x00_dev *rt2x00dev,
 		if (status >= 0)
 			return 0;
 
-		if (status == -ENODEV || status == -ENOENT) {
+		if (rt2x00usb_check_usb_error(rt2x00dev, status)) {
 			/* Device has disappeared. */
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 			break;
@@ -321,7 +337,7 @@ static bool rt2x00usb_kick_tx_entry(struct queue_entry *entry, void *data)
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
-		if (status == -ENODEV || status == -ENOENT)
+		if (rt2x00usb_check_usb_error(rt2x00dev, status))
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 		set_bit(ENTRY_DATA_IO_FAILED, &entry->flags);
 		rt2x00lib_dmadone(entry);
@@ -410,7 +426,7 @@ static bool rt2x00usb_kick_rx_entry(struct queue_entry *entry, void *data)
 
 	status = usb_submit_urb(entry_priv->urb, GFP_ATOMIC);
 	if (status) {
-		if (status == -ENODEV || status == -ENOENT)
+		if (rt2x00usb_check_usb_error(rt2x00dev, status))
 			clear_bit(DEVICE_STATE_PRESENT, &rt2x00dev->flags);
 		set_bit(ENTRY_DATA_IO_FAILED, &entry->flags);
 		rt2x00lib_dmadone(entry);

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: wireless dongle causing entire machine to hang
  2018-08-16 11:02               ` Stanislaw Gruszka
@ 2018-08-28 16:09                 ` Randy Oostdyk
  0 siblings, 0 replies; 14+ messages in thread
From: Randy Oostdyk @ 2018-08-28 16:09 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless

Hi Stanislaw,

I had considerable difficulty getting the module compiled and
installed. I had ignored your first patch, and only applied the second
patch (that you said compiles), but it turns out to be missing the
variable declarations portion of the first patch. After taking that
portion of the first patch, I managed to get the module to compile and
installed.

After one week of testing, I'm not able to cause the USB protocol
error (-71) to happen, except by disconnecting the device from the USB
bus. So I'm not able to fully ascertain if your patch really solved
the issue or not. Sorry... At least I can say that it doesn't break
anything!

I will try to continue testing, and report the results if I can manage
to get the error to surface again.

Cheers,
Randy

On Thu, Aug 16, 2018 at 5:02 AM, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
> On Thu, Aug 16, 2018 at 12:31:07PM +0200, Stanislaw Gruszka wrote:
>> I prepared a patch that count EPROTO errors and if it's bigger
>> than 10 mark device as gone. It should not make a problem
>> when we will get one time random error.  I'm attaching it,
>> please test.
>
> Forgot to compile the patch, here is one that compiles.
>
> Cheers
> Stanislaw

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-08-28 20:02 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-08 17:58 wireless dongle causing entire machine to hang Randy Oostdyk
2018-08-08 19:04 ` Larry Finger
2018-08-08 19:14 ` Arend van Spriel
2018-08-08 21:43   ` Jonas Gorski
2018-08-09 11:10 ` Stanislaw Gruszka
2018-08-09 18:49   ` Randy Oostdyk
2018-08-10 11:14     ` Stanislaw Gruszka
2018-08-10 11:47       ` Randy Oostdyk
2018-08-15  5:19       ` Randy Oostdyk
2018-08-15  8:07         ` Stanislaw Gruszka
2018-08-16  4:25           ` Randy Oostdyk
2018-08-16 10:31             ` Stanislaw Gruszka
2018-08-16 11:02               ` Stanislaw Gruszka
2018-08-28 16:09                 ` Randy Oostdyk

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.