linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
@ 2019-07-24 13:27 Maik Stohn
  2019-07-24 14:20 ` Greg KH
  0 siblings, 1 reply; 7+ messages in thread
From: Maik Stohn @ 2019-07-24 13:27 UTC (permalink / raw)
  To: linux-usb

[-- Attachment #1: Type: text/plain, Size: 5949 bytes --]

KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) 

This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...

Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...

This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.

Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):

 - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
 - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
 - command line tool "sg_raw" from "sg3-utils"
 - execute: and press a key + return (-s1 sends one byte which is read from stdin)
   $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
   
 -> KERNEL Oops
 
 - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
 
Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.

---

Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"

Reason: NULL pointer dereference

---

I took me quite some time to find the cause of this.

I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded

...
    while (sg && sent_len >= block_len) {
      /* New sg entry */
      --num_sgs;
      sent_len -= block_len;
      if (num_sgs != 0) {
        sg = sg_next(sg);
        block_len = sg_dma_len(sg);           <================= CRASH
                                                                 The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was 
                                                                 omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
        addr = (u64) sg_dma_address(sg);
        addr += sent_len;
      }
    }
    block_len -= sent_len;
    send_addr = addr;
...

This only happens if the transfer was cosnideres suitable for IDT. 
When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine. 


Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.


Greetings,

Maik Stohn


kernel.log:

Jul 19 16:43:29 lenovo kernel: BUG: kernel NULL pointer dereference, address: 0000000000000018
Jul 19 16:43:29 lenovo kernel: #PF: supervisor read access in kernel mode
Jul 19 16:43:29 lenovo kernel: #PF: error_code(0x0000) - not-present page
Jul 19 16:43:29 lenovo kernel: PGD 0 P4D 0 
Jul 19 16:43:29 lenovo kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Jul 19 16:43:29 lenovo kernel: CPU: 3 PID: 3901 Comm: usb-storage Tainted: P          IOE     5.2.1-arch1-1-ARCH #1
Jul 19 16:43:29 lenovo kernel: Hardware name: LENOVO 20287/AILZAZBZC, BIOS 8DCN26WW 09/23/2013
Jul 19 16:43:29 lenovo kernel: RIP: 0010:xhci_queue_bulk_tx+0x285/0x990 [xhci_hcd]
Jul 19 16:43:29 lenovo kernel: Code: 77 38 48 85 ed 74 33 48 89 ef eb 0a 48 85 ff 74 26 44 39 e3 72 21 44 29 e3 41 83 ee 01 74 ed e8 71 a2 1e e0 4c 63 d3 48 89 c7 <44> 8b 60 18 4c 03 50 10 >
Jul 19 16:43:29 lenovo kernel: RSP: 0018:ffffa7e6c1ce3b20 EFLAGS: 00010046
Jul 19 16:43:29 lenovo kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000410
Jul 19 16:43:29 lenovo kernel: RDX: ffff8b9112353ab0 RSI: 0000000000000010 RDI: 0000000000000000
Jul 19 16:43:29 lenovo kernel: RBP: ffff8b9113c5d570 R08: 0000000000000000 R09: 0000000000000000
Jul 19 16:43:29 lenovo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Jul 19 16:43:29 lenovo kernel: R13: 0000000000000410 R14: 00000000ffffffff R15: 0000000000000001
Jul 19 16:43:29 lenovo kernel: FS:  0000000000000000(0000) GS:ffff8b91572c0000(0000) knlGS:0000000000000000
Jul 19 16:43:29 lenovo kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 19 16:43:29 lenovo kernel: CR2: 0000000000000018 CR3: 0000000053a0a002 CR4: 00000000001606e0
Jul 19 16:43:29 lenovo kernel: Call Trace:
Jul 19 16:43:29 lenovo kernel:  ? __switch_to_asm+0x41/0x70
Jul 19 16:43:29 lenovo kernel:  ? finish_task_switch+0x84/0x2d0
Jul 19 16:43:29 lenovo kernel:  ? __switch_to+0x87/0x460
Jul 19 16:43:29 lenovo kernel:  xhci_urb_enqueue+0x334/0x5b0 [xhci_hcd]
Jul 19 16:43:29 lenovo kernel:  usb_hcd_submit_urb+0xc7/0xbb0
Jul 19 16:43:29 lenovo kernel:  ? __kmalloc+0x189/0x220
Jul 19 16:43:29 lenovo kernel:  ? usb_alloc_urb+0x3e/0x60
Jul 19 16:43:29 lenovo kernel:  usb_sg_wait+0x7b/0x150
Jul 19 16:43:29 lenovo kernel:  usb_stor_bulk_transfer_sglist.part.0+0x71/0xd0 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  usb_stor_bulk_srb+0x60/0x90 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  usb_stor_Bulk_transport+0x179/0x3f0 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  usb_stor_invoke_transport+0x63/0x520 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  usb_stor_control_thread+0x233/0x300 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  kthread+0xfd/0x130
Jul 19 16:43:29 lenovo kernel:  ? fill_inquiry_response+0x40/0x40 [usb_storage]
Jul 19 16:43:29 lenovo kernel:  ? kthread_park+0x90/0x90
Jul 19 16:43:29 lenovo kernel:  ret_from_fork+0x35/0x40
Jul 19 16:43:29 lenovo kernel: Modules linked in: hid_generic uas usbhid usb_storage hid rfcomm fuse xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mang>


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4165 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 13:27 KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) Maik Stohn
@ 2019-07-24 14:20 ` Greg KH
  2019-07-24 14:34   ` Maik Stohn
  0 siblings, 1 reply; 7+ messages in thread
From: Greg KH @ 2019-07-24 14:20 UTC (permalink / raw)
  To: Maik Stohn; +Cc: linux-usb

On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) 
> 
> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
> 
> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
> 
> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
> 
> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
> 
>  - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>  - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>  - command line tool "sg_raw" from "sg3-utils"
>  - execute: and press a key + return (-s1 sends one byte which is read from stdin)
>    $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>    
>  -> KERNEL Oops
>  
>  - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
>  
> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
> 
> ---
> 
> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
> 
> Reason: NULL pointer dereference
> 
> ---
> 
> I took me quite some time to find the cause of this.
> 
> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
> 
> ...
>     while (sg && sent_len >= block_len) {
>       /* New sg entry */
>       --num_sgs;
>       sent_len -= block_len;
>       if (num_sgs != 0) {
>         sg = sg_next(sg);
>         block_len = sg_dma_len(sg);           <================= CRASH
>                                                                  The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was 
>                                                                  omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
>         addr = (u64) sg_dma_address(sg);
>         addr += sent_len;
>       }
>     }
>     block_len -= sent_len;
>     send_addr = addr;
> ...
> 
> This only happens if the transfer was cosnideres suitable for IDT. 
> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine. 
> 
> 
> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.

What patch did you find that caused this regression?  We can revert it
if that is the easiest thing to do.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 14:20 ` Greg KH
@ 2019-07-24 14:34   ` Maik Stohn
  2019-07-24 15:53     ` Greg KH
  2019-07-24 16:03     ` Mathias Nyman
  0 siblings, 2 replies; 7+ messages in thread
From: Maik Stohn @ 2019-07-24 14:34 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-usb

[-- Attachment #1: Type: text/plain, Size: 3361 bytes --]

> 
> Am 24.07.2019 um 16:20 schrieb Greg KH <gregkh@linuxfoundation.org>:
> 
> On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
>> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) 
>> 
>> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
>> 
>> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
>> 
>> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
>> 
>> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
>> 
>> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>> - command line tool "sg_raw" from "sg3-utils"
>> - execute: and press a key + return (-s1 sends one byte which is read from stdin)
>>   $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>> 
>> -> KERNEL Oops
>> 
>> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
>> 
>> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
>> 
>> ---
>> 
>> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
>> 
>> Reason: NULL pointer dereference
>> 
>> ---
>> 
>> I took me quite some time to find the cause of this.
>> 
>> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
>> 
>> ...
>>    while (sg && sent_len >= block_len) {
>>      /* New sg entry */
>>      --num_sgs;
>>      sent_len -= block_len;
>>      if (num_sgs != 0) {
>>        sg = sg_next(sg);
>>        block_len = sg_dma_len(sg);           <================= CRASH
>>                                                                 The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was 
>>                                                                 omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
>>        addr = (u64) sg_dma_address(sg);
>>        addr += sent_len;
>>      }
>>    }
>>    block_len -= sent_len;
>>    send_addr = addr;
>> ...
>> 
>> This only happens if the transfer was cosnideres suitable for IDT. 
>> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine. 
>> 
>> 
>> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.
> 
> What patch did you find that caused this regression?  We can revert it
> if that is the easiest thing to do.
> 
> thanks,
> 
> greg k-h

I included the patch causing it above: https://patchwork.kernel.org/patch/10919167/  


Greetings, 

Maik Stohn


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4165 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 14:34   ` Maik Stohn
@ 2019-07-24 15:53     ` Greg KH
  2019-07-24 16:03     ` Mathias Nyman
  1 sibling, 0 replies; 7+ messages in thread
From: Greg KH @ 2019-07-24 15:53 UTC (permalink / raw)
  To: Maik Stohn; +Cc: linux-usb

On Wed, Jul 24, 2019 at 04:34:06PM +0200, Maik Stohn wrote:
> > 
> > Am 24.07.2019 um 16:20 schrieb Greg KH <gregkh@linuxfoundation.org>:
> > 
> > On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
> >> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) 
> >> 
> >> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
> >> 
> >> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
> >> 
> >> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
> >> 
> >> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
> >> 
> >> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
> >> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
> >> - command line tool "sg_raw" from "sg3-utils"
> >> - execute: and press a key + return (-s1 sends one byte which is read from stdin)
> >>   $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
> >> 
> >> -> KERNEL Oops
> >> 
> >> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
> >> 
> >> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
> >> 
> >> ---
> >> 
> >> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
> >> 
> >> Reason: NULL pointer dereference
> >> 
> >> ---
> >> 
> >> I took me quite some time to find the cause of this.
> >> 
> >> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
> >> 
> >> ...
> >>    while (sg && sent_len >= block_len) {
> >>      /* New sg entry */
> >>      --num_sgs;
> >>      sent_len -= block_len;
> >>      if (num_sgs != 0) {
> >>        sg = sg_next(sg);
> >>        block_len = sg_dma_len(sg);           <================= CRASH
> >>                                                                 The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was 
> >>                                                                 omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
> >>        addr = (u64) sg_dma_address(sg);
> >>        addr += sent_len;
> >>      }
> >>    }
> >>    block_len -= sent_len;
> >>    send_addr = addr;
> >> ...
> >> 
> >> This only happens if the transfer was cosnideres suitable for IDT. 
> >> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine. 
> >> 
> >> 
> >> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.
> > 
> > What patch did you find that caused this regression?  We can revert it
> > if that is the easiest thing to do.
> > 
> > thanks,
> > 
> > greg k-h
> 
> I included the patch causing it above: https://patchwork.kernel.org/patch/10919167/  

That commit is only in 5.2, and there is another fix for it already, in
5.2.

So do you still have the issue if you only revert the one commit?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 14:34   ` Maik Stohn
  2019-07-24 15:53     ` Greg KH
@ 2019-07-24 16:03     ` Mathias Nyman
  2019-07-24 16:29       ` Maik Stohn
  1 sibling, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2019-07-24 16:03 UTC (permalink / raw)
  To: Maik Stohn, Greg KH; +Cc: linux-usb

On 24.7.2019 17.34, Maik Stohn wrote:
>>
>> Am 24.07.2019 um 16:20 schrieb Greg KH <gregkh@linuxfoundation.org>:
>>
>> On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
>>> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
>>>
>>> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
>>>
>>> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
>>>
>>> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
>>>
>>> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
>>>
>>> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>>> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>>> - command line tool "sg_raw" from "sg3-utils"
>>> - execute: and press a key + return (-s1 sends one byte which is read from stdin)
>>>    $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>>>
>>> -> KERNEL Oops
>>>
>>> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
>>>
>>> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
>>>
>>> ---
>>>
>>> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
>>>
>>> Reason: NULL pointer dereference
>>>
>>> ---
>>>
>>> I took me quite some time to find the cause of this.
>>>
>>> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
>>>
>>> ...
>>>     while (sg && sent_len >= block_len) {
>>>       /* New sg entry */
>>>       --num_sgs;
>>>       sent_len -= block_len;
>>>       if (num_sgs != 0) {
>>>         sg = sg_next(sg);
>>>         block_len = sg_dma_len(sg);           <================= CRASH
>>>                                                                  The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was
>>>                                                                  omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
>>>         addr = (u64) sg_dma_address(sg);
>>>         addr += sent_len;
>>>       }
>>>     }
>>>     block_len -= sent_len;
>>>     send_addr = addr;
>>> ...
>>>
>>> This only happens if the transfer was cosnideres suitable for IDT.
>>> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine.
>>>
>>>
>>> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.

Nice catch.
The immediate data (IDT) support assumed that when there is max 8 bytes of data,
and it's not already dma mapped then we can just copy the data directly from
urb->transfer_buffer.
											
I didn't take into account that this small amount of data can be in a sg list.

Does the below code help:

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 7a26496..f5c4144 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -2175,7 +2175,8 @@ static inline bool xhci_urb_suitable_for_idt(struct urb *urb)
         if (!usb_endpoint_xfer_isoc(&urb->ep->desc) && usb_urb_dir_out(urb) &&
             usb_endpoint_maxp(&urb->ep->desc) >= TRB_IDT_MAX_SIZE &&
             urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
-           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
+           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) &&
+           !urb->num_sgs)
                 return true;
  
         return false;


-Mathias

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 16:03     ` Mathias Nyman
@ 2019-07-24 16:29       ` Maik Stohn
  2019-07-25  7:31         ` Mathias Nyman
  0 siblings, 1 reply; 7+ messages in thread
From: Maik Stohn @ 2019-07-24 16:29 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Greg KH, linux-usb

[-- Attachment #1: Type: text/plain, Size: 4747 bytes --]

> 
> Am 24.07.2019 um 18:03 schrieb Mathias Nyman <mathias.nyman@linux.intel.com>:
> 
> On 24.7.2019 17.34, Maik Stohn wrote:
>>> 
>>> Am 24.07.2019 um 16:20 schrieb Greg KH <gregkh@linuxfoundation.org>:
>>> 
>>> On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
>>>> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
>>>> 
>>>> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
>>>> 
>>>> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
>>>> 
>>>> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
>>>> 
>>>> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
>>>> 
>>>> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>>>> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>>>> - command line tool "sg_raw" from "sg3-utils"
>>>> - execute: and press a key + return (-s1 sends one byte which is read from stdin)
>>>>   $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>>>> 
>>>> -> KERNEL Oops
>>>> 
>>>> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
>>>> 
>>>> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
>>>> 
>>>> ---
>>>> 
>>>> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
>>>> 
>>>> Reason: NULL pointer dereference
>>>> 
>>>> ---
>>>> 
>>>> I took me quite some time to find the cause of this.
>>>> 
>>>> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
>>>> 
>>>> ...
>>>>    while (sg && sent_len >= block_len) {
>>>>      /* New sg entry */
>>>>      --num_sgs;
>>>>      sent_len -= block_len;
>>>>      if (num_sgs != 0) {
>>>>        sg = sg_next(sg);
>>>>        block_len = sg_dma_len(sg);           <================= CRASH
>>>>                                                                 The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was
>>>>                                                                 omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
>>>>        addr = (u64) sg_dma_address(sg);
>>>>        addr += sent_len;
>>>>      }
>>>>    }
>>>>    block_len -= sent_len;
>>>>    send_addr = addr;
>>>> ...
>>>> 
>>>> This only happens if the transfer was cosnideres suitable for IDT.
>>>> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine.
>>>> 
>>>> 
>>>> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.
> 
> Nice catch.
> The immediate data (IDT) support assumed that when there is max 8 bytes of data,
> and it's not already dma mapped then we can just copy the data directly from
> urb->transfer_buffer.
> 											
> I didn't take into account that this small amount of data can be in a sg list.
> 
> Does the below code help:
> 
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index 7a26496..f5c4144 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -2175,7 +2175,8 @@ static inline bool xhci_urb_suitable_for_idt(struct urb *urb)
>        if (!usb_endpoint_xfer_isoc(&urb->ep->desc) && usb_urb_dir_out(urb) &&
>            usb_endpoint_maxp(&urb->ep->desc) >= TRB_IDT_MAX_SIZE &&
>            urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
> -           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
> +           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) &&
> +           !urb->num_sgs)
>                return true;
>         return false;
> 
> 
> -Mathias

Hello Matthias,

Your patch fixes the problem with short SGs and IDT. 
No more crashes. Tested on 5.3-rc.

Will this be applied to 5.2.*  as well? 
I need to inform some users how this problem is mitigated (e.g. using updated 5.2.3? when available or not using 5.2 at all and wait for 5.3).

Thanks for the fast feedback,

Greetings,

Maik Stohn

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 4165 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
  2019-07-24 16:29       ` Maik Stohn
@ 2019-07-25  7:31         ` Mathias Nyman
  0 siblings, 0 replies; 7+ messages in thread
From: Mathias Nyman @ 2019-07-25  7:31 UTC (permalink / raw)
  To: Maik Stohn; +Cc: Greg KH, linux-usb

On 24.7.2019 19.29, Maik Stohn wrote:
>>
>> Am 24.07.2019 um 18:03 schrieb Mathias Nyman <mathias.nyman@linux.intel.com>:
>>
>> On 24.7.2019 17.34, Maik Stohn wrote:
>>>>
>>>> Am 24.07.2019 um 16:20 schrieb Greg KH <gregkh@linuxfoundation.org>:
>>>>
>>>> On Wed, Jul 24, 2019 at 03:27:51PM +0200, Maik Stohn wrote:
>>>>> KERNEL CRASH when using XHCI devices (affects any architecture, any USB device)
>>>>>
>>>>> This was already reported as a kernel bug in bugzilla (https://bugzilla.kernel.org/show_bug.cgi?id=204257) but I got told to report it here since it is usb related...
>>>>>
>>>>> Affected kernels: 5.2, 5.2.1, 5.2.2, 5.3-rc1, ...
>>>>>
>>>>> This bug is already causing real world problems with existing software and devices using SCSI BOT with raw SCSI commands and libusb software.
>>>>>
>>>>> Reproduce (tested on several different machines with 5.2,5.2.1,5.2.2,5.3rc1):
>>>>>
>>>>> - usb flash drive attached to XHCI controller (e.g. USB3.0 flash drive attached to USB3.0 port)
>>>>> - generic scsi module loaded (e.g. /dev/sg0 comes up when attaching the flash drive)
>>>>> - command line tool "sg_raw" from "sg3-utils"
>>>>> - execute: and press a key + return (-s1 sends one byte which is read from stdin)
>>>>>    $ sudo sg_raw -s1 /dev/sg0 00 00 00 00 00 00 00 00 00 00
>>>>>
>>>>> -> KERNEL Oops
>>>>>
>>>>> - same for -s2, -s3, ... up to -s8   (sending 1 to 8 bytes, exactly the maximum of bytes on my 64 bit machine where the "DMA bypass optimization / IDT" kicks in, see below)
>>>>>
>>>>> Since this can be triggered by any normal user (without any special USB device needed) I think it is important enough to fix it for the existing 5.2 kernel as well.
>>>>>
>>>>> ---
>>>>>
>>>>> Patch introducing the crash:  https://patchwork.kernel.org/patch/10919167 / commit 33e39350ebd20fe6a77a51b8c21c3aa6b4a208cf - "usb: xhci: add Immediate Data Transfer support"
>>>>>
>>>>> Reason: NULL pointer dereference
>>>>>
>>>>> ---
>>>>>
>>>>> I took me quite some time to find the cause of this.
>>>>>
>>>>> I narrowed the crash down to the place inside of "xhci_queue_bulk_tx" in "xhci-ring.c" where the next SG is loaded
>>>>>
>>>>> ...
>>>>>     while (sg && sent_len >= block_len) {
>>>>>       /* New sg entry */
>>>>>       --num_sgs;
>>>>>       sent_len -= block_len;
>>>>>       if (num_sgs != 0) {
>>>>>         sg = sg_next(sg);
>>>>>         block_len = sg_dma_len(sg);           <================= CRASH
>>>>>                                                                  The comment of "sg_dma_len" clearly states "These macros should be used after a dma_map_sg call has been done..." - which was
>>>>>                                                                  omitted by the new "xhci_map_urb_for_dma" function since the transfer was considered suitable for IDT.
>>>>>         addr = (u64) sg_dma_address(sg);
>>>>>         addr += sent_len;
>>>>>       }
>>>>>     }
>>>>>     block_len -= sent_len;
>>>>>     send_addr = addr;
>>>>> ...
>>>>>
>>>>> This only happens if the transfer was cosnideres suitable for IDT.
>>>>> When I patched the function "xhci_urb_suitable_for_idt" to always return false (nothing suitable for IDT) everything was working fine.
>>>>>
>>>>>
>>>>> Unfortunately I'm not deep enough into the inner workings of the kernel usb host driver to find a solution for this other than reverting the patch for IDT.
>>
>> Nice catch.
>> The immediate data (IDT) support assumed that when there is max 8 bytes of data,
>> and it's not already dma mapped then we can just copy the data directly from
>> urb->transfer_buffer.
>> 											
>> I didn't take into account that this small amount of data can be in a sg list.
>>
>> Does the below code help:
>>
>> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
>> index 7a26496..f5c4144 100644
>> --- a/drivers/usb/host/xhci.h
>> +++ b/drivers/usb/host/xhci.h
>> @@ -2175,7 +2175,8 @@ static inline bool xhci_urb_suitable_for_idt(struct urb *urb)
>>         if (!usb_endpoint_xfer_isoc(&urb->ep->desc) && usb_urb_dir_out(urb) &&
>>             usb_endpoint_maxp(&urb->ep->desc) >= TRB_IDT_MAX_SIZE &&
>>             urb->transfer_buffer_length <= TRB_IDT_MAX_SIZE &&
>> -           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP))
>> +           !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP) &&
>> +           !urb->num_sgs)
>>                 return true;
>>          return false;
>>
>>
>> -Mathias
> 
> Hello Matthias,
> 
> Your patch fixes the problem with short SGs and IDT.
> No more crashes. Tested on 5.3-rc.
> 
> Will this be applied to 5.2.*  as well?
> I need to inform some users how this problem is mitigated (e.g. using updated 5.2.3? when available or not using 5.2 at all and wait for 5.3).
> 
> Thanks for the fast feedback,
> 
> Greetings,
> 
> Maik Stohn
> 

I'll make a proper patch, and add stable tag so it should end up in a later 5.2.y kernel

-Mathias

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-25  7:29 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-24 13:27 KERNEL CRASH when using XHCI devices (affects any architecture, any USB device) Maik Stohn
2019-07-24 14:20 ` Greg KH
2019-07-24 14:34   ` Maik Stohn
2019-07-24 15:53     ` Greg KH
2019-07-24 16:03     ` Mathias Nyman
2019-07-24 16:29       ` Maik Stohn
2019-07-25  7:31         ` Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).