linux-usb.vger.kernel.org archive mirror
* Deadlock under load with Linux 5.9 and other recent kernels
From: Christian Hewitt @ 2020-09-26  7:55 UTC
  To: linux-block, linux-usb, linux-amlogic; +Cc: furkan, Brad Harper

I am using an ARM SBC device with an Amlogic S922X chip (Beelink GS-King-X, an Android STB) to boot the Kodi mediacentre distro LibreELEC (which I work on). The issue is also reproducible with Manjaro and Armbian on the same hardware, and with the GT-King and GT-King Pro devices from the same vendor. All three devices use a common dtsi:

https://github.com/chewitt/linux/blob/amlogic-5.9-integ/arch/arm64/boot/dts/amlogic/meson-g12b-gsking-x.dts
https://github.com/chewitt/linux/blob/amlogic-5.9-integ/arch/arm64/boot/dts/amlogic/meson-g12b-gtking-pro.dts
https://github.com/chewitt/linux/blob/amlogic-5.9-integ/arch/arm64/boot/dts/amlogic/meson-g12b-gtking.dts
https://github.com/chewitt/linux/blob/amlogic-5.9-integ/arch/arm64/boot/dts/amlogic/meson-g12b-w400.dtsi

I have schematics for the devices, but can only share those privately on request.

For testing I am booting LibreELEC from SD card. The box has a 4TB SATA drive connected internally through a USB-to-SATA bridge (see dmesg: http://ix.io/2yLh). I connect a USB stick holding a 4GB ISO file and copy that file to the internal SATA drive. Within 10-20 seconds of starting the copy the box deadlocks and needs a hard power cycle to recover. The timing of the deadlock varies, but the device _always_ deadlocks. Although I am using a simple copy use-case, there are similar reports in the Armbian forums from users performing tasks like installs/updates that involve I/O load.
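
For anyone trying to reproduce the workload, the copy described above can be sketched roughly as below. The report does not say which command or mount points were used, so the helper name copy_and_sync and both path arguments are hypothetical; this only illustrates a single large copy followed by a flush to disk.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): copy one large file
# and flush it to disk, roughly mirroring the USB-stick-to-SATA copy.
copy_and_sync() {
    src="$1"   # assumed: path to the ISO on the USB stick
    dst="$2"   # assumed: destination on the USB-attached SATA drive
    cp -- "$src" "$dst" || return 1
    sync       # push dirty pages out so the write load reaches the drive
}
```

It would be called as `copy_and_sync /path/to/stick/image.iso /path/to/sata/`, with both paths replaced by the real mount points.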

Following advice in the #linux-amlogic IRC channel I added CONFIG_SOFTLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK and was able to get output on the HDMI screen (it is not possible to connect to UART pins without destroying the box case). If you advance the following video frame by frame in VLC you can see the output:

https://www.dropbox.com/s/klvcizim8cs5lze/lockup_clip.mov?dl=0
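
For reference, a kernel config fragment along these lines enables the two detectors named above. It is a sketch rather than the configuration actually used in this report; CONFIG_DEBUG_KERNEL is included on the assumption that both options depend on it in lib/Kconfig.debug.

```
# Assumed prerequisite for the two detectors below
CONFIG_DEBUG_KERNEL=y
# Report CPUs stuck in kernel mode without scheduling
CONFIG_SOFTLOCKUP_DETECTOR=y
# Report tasks stuck in uninterruptible sleep
CONFIG_DETECT_HUNG_TASK=y
```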

I am not a coding developer so the output doesn’t mean much to me, but I am happy to follow guidance or install patches to get more output or test things.

Christian

* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Jens Axboe @ 2020-09-26 10:51 UTC
  To: Christian Hewitt, linux-block, linux-usb, linux-amlogic
  Cc: furkan, Brad Harper

On 9/26/20 1:55 AM, Christian Hewitt wrote:
> Following advice in the #linux-amlogic IRC channel I added
> CONFIG_SOFTLOCKUP_DETECTOR and CONFIG_DETECT_HUNG_TASK and was able to
> get output on the HDMI screen (it is not possible to connect to UART
> pins without destroying the box case). If you advance the following
> video frame by frame in VLC you can see the output:
> 
> https://www.dropbox.com/s/klvcizim8cs5lze/lockup_clip.mov?dl=0

Try with this patch:

https://lore.kernel.org/linux-block/20200925191902.543953-1-shakeelb@google.com/

-- 
Jens Axboe


* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Christian Hewitt @ 2020-09-26 11:55 UTC
  To: Jens Axboe; +Cc: linux-block, linux-usb, linux-amlogic, furkan, Brad Harper

> 
> On 26 Sep 2020, at 2:51 pm, Jens Axboe <axboe@kernel.dk> wrote:
> 
> Try with this patch:
> 
> https://lore.kernel.org/linux-block/20200925191902.543953-1-shakeelb@google.com/

It still locks up approx. 25 seconds into the copy operation. Here’s the output in video again (a little blurry):

https://www.dropbox.com/s/3j2czaq509arg6g/lockup_clip2.mov?dl=0

Christian

* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Jens Axboe @ 2020-09-26 12:13 UTC
  To: Christian Hewitt
  Cc: linux-block, linux-usb, linux-amlogic, furkan, Brad Harper

On 9/26/20 5:55 AM, Christian Hewitt wrote:
> It still locks up approx. 25 seconds into the copy operation. Here’s the output in video again (a little blurry):
> 
> https://www.dropbox.com/s/3j2czaq509arg6g/lockup_clip2.mov?dl=0

Can you try and set CONFIG_SLUB in your .config instead of CONFIG_SLAB?

Also, just take a picture, should be easier to get readable than a video.
And the static trace is all that is needed.

-- 
Jens Axboe


* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Christian Hewitt @ 2020-09-26 12:28 UTC
  To: Jens Axboe; +Cc: linux-block, linux-usb, linux-amlogic, furkan, Brad Harper


> On 26 Sep 2020, at 4:13 pm, Jens Axboe <axboe@kernel.dk> wrote:
> 
> Can you try and set CONFIG_SLUB in your .config instead of CONFIG_SLAB?

CONFIG_SLUB is already set. Here’s the full defconfig: http://paste.ubuntu.com/p/5BNdZv6J3c/

# dmesg | grep -i slub
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=6, Nodes=1

> Also, just take a picture, should be easier to get readable than a video.
> And the static trace is all that is needed.

This is from a GT-King Pro which someone reminded me has a large RS232 port on the rear:

https://pastebin.com/raw/sGtzgreN

Christian

* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Christian Hewitt @ 2020-09-28  1:37 UTC
  To: Jens Axboe; +Cc: linux-block, linux-usb, linux-amlogic, furkan, Brad Harper


> On 26 Sep 2020, at 4:28 pm, Christian Hewitt <christianshewitt@gmail.com> wrote:
> 
> This is from a GT-King Pro which someone reminded me has a large RS232 port on the rear:
> 
> https://pastebin.com/raw/sGtzgreN

Output from 5.9-rc7: https://pastebin.com/raw/nbHJmrqe

Christian





* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Patrik Nilsson @ 2020-09-28 11:06 UTC
  To: Christian Hewitt, Jens Axboe
  Cc: linux-block, linux-usb, linux-amlogic, furkan, Brad Harper

Hi!

To me this bug description looks very similar to what I'm struggling with
on an amd64 platform.

When too much data is sent over USB, it seems the USB control message is
delayed so that it times out and the block device is unmounted.

I have been working on my related bug for a long time to make it easily
reproducible, but have failed so far. It is there all the time. New
hardware is on its way so I can continue my testing.

Maybe you can test the patch I'm using to see if it works better for you?

In the meantime, here is the description of my bug:

> I have stress tested the USB system. Seven mechanical hard disks and
> two SSDs are now connected over USB. Six processes are writing random
> data to the disks at the same time. One of them writes to the SSD that
> I previously couldn't write to without it failing. The other USB SSD
> holds my root partition.
>
> Before I applied the patch, my root partition sometimes failed to stay
> mounted. Now I have not had any crashes.
>
> This is a quick fix for hard disks, but it works. It continued to work
> when I started three VirtualBox guests and let them do work as well. The
> guests' hard disks are on my USB root partition.
>
> It doesn't work if I also use my usb2ethernet adapter (ID 2001:4a00
> D-Link Corp.), although my root partition and two randomize tests
> survived. Maybe a much larger timeout would help in this case? But I
> don't consider that a good solution.
>
> The behavior is the same on the other (much slower) computer with a
> different USB hub. I have also tested it with exactly the same setup
> as earlier, with no mechanical hard disks, and it works with the patch
> and not without it.

Best regards,
Patrik

---start of diff---
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 5b768b80d1ee..3c550934815c 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -105,7 +105,7 @@ MODULE_PARM_DESC(use_both_schemes,
 DECLARE_RWSEM(ehci_cf_port_reset_rwsem);
 EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem);
 
-#define HUB_DEBOUNCE_TIMEOUT    2000
+#define HUB_DEBOUNCE_TIMEOUT    10000
 #define HUB_DEBOUNCE_STEP      25
 #define HUB_DEBOUNCE_STABLE     100
 
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 20c555db4621..e64d441bb78f 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1841,8 +1841,8 @@ extern int usb_set_configuration(struct usb_device *dev, int configuration);
  * USB identifies 5 second timeouts, maybe more in a few cases, and a few
  * slow devices (like some MGE Ellipse UPSes) actually push that limit.
  */
-#define USB_CTRL_GET_TIMEOUT    5000
-#define USB_CTRL_SET_TIMEOUT    5000
+#define USB_CTRL_GET_TIMEOUT    10000
+#define USB_CTRL_SET_TIMEOUT    10000
 
 
 /**
---end of diff---


-- 
PGP-key fingerprint: 1B30 7F61 AF9E 538A FCD6  2BE7 CED7 B0E4 3BF9 8D6C


* Re: Deadlock under load with Linux 5.9 and other recent kernels
From: Christian Hewitt @ 2020-09-28 13:36 UTC
  To: Patrik Nilsson
  Cc: Jens Axboe, linux-block, linux-usb, linux-amlogic, furkan, Brad Harper


> On 28 Sep 2020, at 3:06 pm, Patrik Nilsson <nipatriknilsson@gmail.com> wrote:
> 
> Maybe you can test the patch I'm using to see if it works better for you?

No obvious change with this patch applied. Here’s the output: https://pastebin.com/raw/ZMgwNqgm

Christian
