linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
@ 2019-12-06  3:24 Marek Marczykowski-Górecki
  2019-12-06  6:50 ` Suwan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-06  3:24 UTC (permalink / raw)
  To: linux-usb; +Cc: Suwan Kim, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 5097 bytes --]

Hello,

I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
connect Yubikey 4 and use its CCID interface (for example `ykman oath
list` command), the client side hangs (100% reliably). After 60s I get a
message that a CPU hangs waiting for a spinlock (see below).

I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
to vhci-hcd and stub driver") commit. Which indeed is also backported to
4.19.

Any idea what is going on here? I can easily provide more information,
if you tell me how to get it.

The kernel log:
[ 6452.701016] usb 1-1: New USB device found, idVendor=1050, idProduct=0407, bcdDevice= 4.27
[ 6452.701049] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 6452.701075] usb 1-1: Product: Yubikey 4 OTP+U2F+CCID
[ 6452.701092] usb 1-1: Manufacturer: Yubico
[ 6452.711566] input: Yubico Yubikey 4 OTP+U2F+CCID as /devices/platform/vhci_hcd.0/usb1/1-1/1-1:1.0/0003:1050:0407.0001/input/input1
[ 6452.762251] hid-generic 0003:1050:0407.0001: input,hidraw0: USB HID v1.10 Keyboard [Yubico Yubikey 4 OTP+U2F+CCID] on usb-vhci_hcd.0-1/input0
[ 6452.770270] hid-generic 0003:1050:0407.0002: hiddev96,hidraw1: USB HID v1.10 Device [Yubico Yubikey 4 OTP+U2F+CCID] on usb-vhci_hcd.0-1/input1
[ 6530.805002] vhci_hcd: unlink->seqnum 111
[ 6530.805024] vhci_hcd: urb->status -104
[ 6531.386607] usb 1-1: recv xbuf, 42
[ 6531.386701] vhci_hcd: stop threads
[ 6531.386718] vhci_hcd: release socket
[ 6531.386734] vhci_hcd: disconnect device
[ 6531.386800] usb 1-1: USB disconnect, device number 2
[ 6591.409099] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 6591.409130] rcu:     1-...!: (0 ticks this GP) idle=53a/1/0x4000000000000000 softirq=18978/18978 fqs=0 
[ 6591.409158] rcu:     (detected by 0, t=60002 jiffies, g=17933, q=893)
[ 6591.409181] Sending NMI from CPU 0 to CPUs 1:
[ 6591.410415] NMI backtrace for cpu 1
[ 6591.410416] CPU: 1 PID: 338 Comm: kworker/1:2 Tainted: G           O      4.19.84-1.pvops.qubes.x86_64 #1
[ 6591.410417] Workqueue: usb_hub_wq hub_event
[ 6591.410417] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[ 6591.410418] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f b1 17 <85> c0 75 05 48 89 d8 5b c3 89 c6 e8 3e 1c 7e ff 66 90 48 89 d8 5b
[ 6591.410419] RSP: 0000:ffffc900009dfaf8 EFLAGS: 00000046
[ 6591.410419] RAX: 0000000000000000 RBX: 0000000000000082 RCX: 0000000000000000
[ 6591.410420] RDX: 0000000000000001 RSI: ffff88801e633e40 RDI: ffff888006740be0
[ 6591.410420] RBP: ffff888006740be0 R08: 0000000000000000 R09: ffffffff81346f00
[ 6591.410420] R10: ffff888014f5b5d8 R11: 0000000000000000 R12: 00000000ffffff94
[ 6591.410421] R13: ffff88803ff06000 R14: ffff88803ff06000 R15: ffff88801e633e40
[ 6591.410421] FS:  0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
[ 6591.410421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6591.410422] CR2: 00005eede8b122a8 CR3: 000000000220a002 CR4: 00000000003606e0
[ 6591.410422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6591.410423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6591.410423] Call Trace:
[ 6591.410423]  vhci_urb_dequeue+0x2b/0x280 [vhci_hcd]
[ 6591.410423]  usb_hcd_flush_endpoint+0x119/0x190
[ 6591.410424]  usb_disable_endpoint+0x7b/0xa0
[ 6591.410424]  usb_disable_interface+0x3e/0x50
[ 6591.410424]  usb_unbind_interface+0x117/0x250
[ 6591.410425]  device_release_driver_internal+0x17d/0x240
[ 6591.410425]  bus_remove_device+0xe5/0x150
[ 6591.410425]  device_del+0x161/0x360
[ 6591.410426]  ? usb_remove_ep_devs+0x1b/0x30
[ 6591.410426]  usb_disable_device+0x93/0x240
[ 6591.410426]  usb_disconnect+0x90/0x270
[ 6591.410427]  hub_port_connect+0x83/0xab0
[ 6591.410427]  hub_event+0x8d1/0xab0
[ 6591.410427]  process_one_work+0x191/0x370
[ 6591.410428]  worker_thread+0x4f/0x3b0
[ 6591.410428]  kthread+0xf8/0x130
[ 6591.410428]  ? rescuer_thread+0x340/0x340
[ 6591.410428]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 6591.410429]  ret_from_fork+0x35/0x40
[ 6591.410432] rcu: rcu_sched kthread starved for 60002 jiffies! g17933 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[ 6591.411962] rcu: RCU grace-period kthread stack dump:
[ 6591.411980] rcu_sched       I    0    10      2 0x80000000
[ 6591.411998] Call Trace:
[ 6591.412015]  ? __schedule+0x3f5/0x870
[ 6591.412030]  schedule+0x32/0x80
[ 6591.412044]  schedule_timeout+0x16f/0x350
[ 6591.412059]  ? __next_timer_interrupt+0xc0/0xc0
[ 6591.412077]  rcu_gp_kthread+0x569/0x950
[ 6591.412092]  kthread+0xf8/0x130
[ 6591.412106]  ? rcu_nocb_kthread+0x560/0x560
[ 6591.412119]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 6591.412136]  ret_from_fork+0x35/0x40

Some more details are available here:
https://github.com/QubesOS/qubes-issues/issues/5498

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-06  3:24 "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock Marek Marczykowski-Górecki
@ 2019-12-06  6:50 ` Suwan Kim
  2019-12-06 20:57   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-06  6:50 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> Hello,
> 
> I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> connect Yubikey 4 and use its CCID interface (for example `ykman oath
> list` command), the client side hangs (100% reliably). After 60s I get a
> message that a CPU hangs waiting for a spinlock (see below).
> 
> I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> to vhci-hcd and stub driver") commit. Which indeed is also backported to
> 4.19.
> 
> Any idea what is going on here? I can easily provide more information,
> if you tell me how to get it.
> 

Hi,

Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
in kernel config and send dmesg log? It will be helpful to figure
out lock dependency in vhci_hcd.

Regards,
Suwan Kim

> The kernel log:
> [ 6452.701016] usb 1-1: New USB device found, idVendor=1050, idProduct=0407, bcdDevice= 4.27
> [ 6452.701049] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [ 6452.701075] usb 1-1: Product: Yubikey 4 OTP+U2F+CCID
> [ 6452.701092] usb 1-1: Manufacturer: Yubico
> [ 6452.711566] input: Yubico Yubikey 4 OTP+U2F+CCID as /devices/platform/vhci_hcd.0/usb1/1-1/1-1:1.0/0003:1050:0407.0001/input/input1
> [ 6452.762251] hid-generic 0003:1050:0407.0001: input,hidraw0: USB HID v1.10 Keyboard [Yubico Yubikey 4 OTP+U2F+CCID] on usb-vhci_hcd.0-1/input0
> [ 6452.770270] hid-generic 0003:1050:0407.0002: hiddev96,hidraw1: USB HID v1.10 Device [Yubico Yubikey 4 OTP+U2F+CCID] on usb-vhci_hcd.0-1/input1
> [ 6530.805002] vhci_hcd: unlink->seqnum 111
> [ 6530.805024] vhci_hcd: urb->status -104
> [ 6531.386607] usb 1-1: recv xbuf, 42
> [ 6531.386701] vhci_hcd: stop threads
> [ 6531.386718] vhci_hcd: release socket
> [ 6531.386734] vhci_hcd: disconnect device
> [ 6531.386800] usb 1-1: USB disconnect, device number 2
> [ 6591.409099] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [ 6591.409130] rcu:     1-...!: (0 ticks this GP) idle=53a/1/0x4000000000000000 softirq=18978/18978 fqs=0 
> [ 6591.409158] rcu:     (detected by 0, t=60002 jiffies, g=17933, q=893)
> [ 6591.409181] Sending NMI from CPU 0 to CPUs 1:
> [ 6591.410415] NMI backtrace for cpu 1
> [ 6591.410416] CPU: 1 PID: 338 Comm: kworker/1:2 Tainted: G           O      4.19.84-1.pvops.qubes.x86_64 #1
> [ 6591.410417] Workqueue: usb_hub_wq hub_event
> [ 6591.410417] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
> [ 6591.410418] Code: 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 9c 58 0f 1f 44 00 00 48 89 c3 fa 66 0f 1f 44 00 00 31 c0 ba 01 00 00 00 f0 0f b1 17 <85> c0 75 05 48 89 d8 5b c3 89 c6 e8 3e 1c 7e ff 66 90 48 89 d8 5b
> [ 6591.410419] RSP: 0000:ffffc900009dfaf8 EFLAGS: 00000046
> [ 6591.410419] RAX: 0000000000000000 RBX: 0000000000000082 RCX: 0000000000000000
> [ 6591.410420] RDX: 0000000000000001 RSI: ffff88801e633e40 RDI: ffff888006740be0
> [ 6591.410420] RBP: ffff888006740be0 R08: 0000000000000000 R09: ffffffff81346f00
> [ 6591.410420] R10: ffff888014f5b5d8 R11: 0000000000000000 R12: 00000000ffffff94
> [ 6591.410421] R13: ffff88803ff06000 R14: ffff88803ff06000 R15: ffff88801e633e40
> [ 6591.410421] FS:  0000000000000000(0000) GS:ffff8880f5b00000(0000) knlGS:0000000000000000
> [ 6591.410421] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6591.410422] CR2: 00005eede8b122a8 CR3: 000000000220a002 CR4: 00000000003606e0
> [ 6591.410422] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6591.410423] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 6591.410423] Call Trace:
> [ 6591.410423]  vhci_urb_dequeue+0x2b/0x280 [vhci_hcd]
> [ 6591.410423]  usb_hcd_flush_endpoint+0x119/0x190
> [ 6591.410424]  usb_disable_endpoint+0x7b/0xa0
> [ 6591.410424]  usb_disable_interface+0x3e/0x50
> [ 6591.410424]  usb_unbind_interface+0x117/0x250
> [ 6591.410425]  device_release_driver_internal+0x17d/0x240
> [ 6591.410425]  bus_remove_device+0xe5/0x150
> [ 6591.410425]  device_del+0x161/0x360
> [ 6591.410426]  ? usb_remove_ep_devs+0x1b/0x30
> [ 6591.410426]  usb_disable_device+0x93/0x240
> [ 6591.410426]  usb_disconnect+0x90/0x270
> [ 6591.410427]  hub_port_connect+0x83/0xab0
> [ 6591.410427]  hub_event+0x8d1/0xab0
> [ 6591.410427]  process_one_work+0x191/0x370
> [ 6591.410428]  worker_thread+0x4f/0x3b0
> [ 6591.410428]  kthread+0xf8/0x130
> [ 6591.410428]  ? rescuer_thread+0x340/0x340
> [ 6591.410428]  ? kthread_create_worker_on_cpu+0x70/0x70
> [ 6591.410429]  ret_from_fork+0x35/0x40
> [ 6591.410432] rcu: rcu_sched kthread starved for 60002 jiffies! g17933 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
> [ 6591.411962] rcu: RCU grace-period kthread stack dump:
> [ 6591.411980] rcu_sched       I    0    10      2 0x80000000
> [ 6591.411998] Call Trace:
> [ 6591.412015]  ? __schedule+0x3f5/0x870
> [ 6591.412030]  schedule+0x32/0x80
> [ 6591.412044]  schedule_timeout+0x16f/0x350
> [ 6591.412059]  ? __next_timer_interrupt+0xc0/0xc0
> [ 6591.412077]  rcu_gp_kthread+0x569/0x950
> [ 6591.412092]  kthread+0xf8/0x130
> [ 6591.412106]  ? rcu_nocb_kthread+0x560/0x560
> [ 6591.412119]  ? kthread_create_worker_on_cpu+0x70/0x70
> [ 6591.412136]  ret_from_fork+0x35/0x40
> 
> Some more details are available here:
> https://github.com/QubesOS/qubes-issues/issues/5498
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-06  6:50 ` Suwan Kim
@ 2019-12-06 20:57   ` Marek Marczykowski-Górecki
  2019-12-06 21:12     ` Shuah Khan
  2019-12-09  2:01     ` Suwan Kim
  0 siblings, 2 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-06 20:57 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 4201 bytes --]

On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > Hello,
> > 
> > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > list` command), the client side hangs (100% reliably). After 60s I get a
> > message that a CPU hangs waiting for a spinlock (see below).
> > 
> > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > 4.19.
> > 
> > Any idea what is going on here? I can easily provide more information,
> > if you tell me how to get it.
> > 
> 
> Hi,
> 
> Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> in kernel config and send dmesg log? It will be helpful to figure
> out lock dependency in vhci_hcd.

Hmm, I've tried, but I don't see much more information there (see
below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
anything more, like some boot option?

Also, this one (as the previous one) is from 4.19.84. Interestingly, on
4.19.87 I don't get the message at all.

Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
support to vhci-hcd and stub driver" reverted and it still hangs...

I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).

[  212.890519] usb 1-1: recv xbuf, 42
[  212.891177] vhci_hcd: vhci_shutdown_connection:1024: stop threads
[  212.891228] vhci_hcd: vhci_shutdown_connection:1032: release socket
[  212.891388] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
[  212.891637] usb 1-1: USB disconnect, device number 2
[  277.967398] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  277.967456] rcu: 	1-...0: (1 GPs behind) idle=d66/1/0x4000000000000000 softirq=32543/32544 fqs=15856 
[  277.967544] rcu: 	(detected by 2, t=65009 jiffies, g=65285, q=5921)
[  277.967560] Sending NMI from CPU 2 to CPUs 1:
[  277.968606] NMI backtrace for cpu 1
[  277.968607] CPU: 1 PID: 450 Comm: kworker/1:3 Tainted: G           O      4.19.84-1.pvops.qubes.x86_64 #2
[  277.968608] Workqueue: usb_hub_wq hub_event
[  277.968608] RIP: 0010:_raw_spin_lock+0x2c/0x40
[  277.968609] Code: 44 00 00 55 48 89 fd 65 ff 05 40 b8 60 7c ff 74 24 08 48 8d 7f 18 45 31 c9 31 c9 41 b8 01 00 00 00 31 d2 31 f6 e8 64 f0 71 ff <48> 89 ef 58 5d e9 1a 3a 72 ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
[  277.968610] RSP: 0018:ffffb29f00bbbb90 EFLAGS: 00000082
[  277.968610] RAX: ffff8fa938768000 RBX: ffff8fa981fe3950 RCX: a50d71adbc2b783b
[  277.968611] RDX: 00000000c093239f RSI: 00000000158aa627 RDI: 0000000000000046
[  277.968611] RBP: ffffffff8450c860 R08: 0000000000000001 R09: 00000000000c0490
[  277.968612] R10: 0000000000000000 R11: a50d71adbc2b783b R12: ffff8fa981fe3968
[  277.968612] R13: ffff8fa91728e000 R14: ffffffff8434e626 R15: 0000000000040200
[  277.968613] FS:  0000000000000000(0000) GS:ffff8fa9f5a80000(0000) knlGS:0000000000000000
[  277.968613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  277.968614] CR2: 00006335daf937b8 CR3: 00000000b0410002 CR4: 00000000001606e0
[  277.968614] Call Trace:
[  277.968614]  ? usb_hcd_flush_endpoint+0x122/0x170
[  277.968615]  usb_hcd_flush_endpoint+0x122/0x170
[  277.968615]  usb_disable_interface+0x3c/0x50
[  277.968616]  usb_unbind_interface+0x181/0x260
[  277.968616]  device_release_driver_internal+0x18b/0x250
[  277.968617]  bus_remove_device+0xfc/0x170
[  277.968617]  device_del+0x165/0x380
[  277.968617]  usb_disable_device+0x93/0x240
[  277.968618]  usb_disconnect+0xc1/0x2c0
[  277.968618]  hub_event+0xcc4/0x1620
[  277.968618]  process_one_work+0x221/0x580
[  277.968619]  worker_thread+0x50/0x3b0
[  277.968619]  ? process_one_work+0x580/0x580
[  277.968620]  kthread+0x122/0x140
[  277.968620]  ? kthread_create_worker_on_cpu+0x70/0x70
[  277.968620]  ret_from_fork+0x3a/0x50



-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-06 20:57   ` Marek Marczykowski-Górecki
@ 2019-12-06 21:12     ` Shuah Khan
  2019-12-07  0:58       ` Marek Marczykowski-Górecki
  2019-12-09  2:01     ` Suwan Kim
  1 sibling, 1 reply; 17+ messages in thread
From: Shuah Khan @ 2019-12-06 21:12 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki, Suwan Kim
  Cc: linux-usb, Valentina Manea, Shuah Khan

On 12/6/19 1:57 PM, Marek Marczykowski-Górecki wrote:
> On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
>> On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
>>> Hello,
>>>
>>> I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
>>> connect Yubikey 4 and use its CCID interface (for example `ykman oath
>>> list` command), the client side hangs (100% reliably). After 60s I get a
>>> message that a CPU hangs waiting for a spinlock (see below).
>>>
>>> I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
>>> to vhci-hcd and stub driver") commit. Which indeed is also backported to
>>> 4.19.
>>>
>>> Any idea what is going on here? I can easily provide more information,
>>> if you tell me how to get it.
>>>
>>
>> Hi,
>>
>> Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
>> in kernel config and send dmesg log? It will be helpful to figure
>> out lock dependency in vhci_hcd.
> 
> Hmm, I've tried, but I don't see much more information there (see
> below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> anything more, like some boot option?
> 
> Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> 4.19.87 I don't get the message at all.
> 
> Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> support to vhci-hcd and stub driver" reverted and it still hangs...
> 
> I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).

Does 4.19.82 work?

thanks,
-- Shuah


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-06 21:12     ` Shuah Khan
@ 2019-12-07  0:58       ` Marek Marczykowski-Górecki
  2019-12-07 18:45         ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-07  0:58 UTC (permalink / raw)
  To: Shuah Khan; +Cc: Suwan Kim, linux-usb, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 1982 bytes --]

On Fri, Dec 06, 2019 at 02:12:08PM -0700, Shuah Khan wrote:
> On 12/6/19 1:57 PM, Marek Marczykowski-Górecki wrote:
> > On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> > > On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > Hello,
> > > > 
> > > > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > > > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > > > list` command), the client side hangs (100% reliably). After 60s I get a
> > > > message that a CPU hangs waiting for a spinlock (see below).
> > > > 
> > > > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > > > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > > > 4.19.
> > > > 
> > > > Any idea what is going on here? I can easily provide more information,
> > > > if you tell me how to get it.
> > > > 
> > > 
> > > Hi,
> > > 
> > > Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> > > in kernel config and send dmesg log? It will be helpful to figure
> > > out lock dependency in vhci_hcd.
> > 
> > Hmm, I've tried, but I don't see much more information there (see
> > below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> > anything more, like some boot option?
> > 
> > Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> > 4.19.87 I don't get the message at all.
> > 
> > Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> > support to vhci-hcd and stub driver" reverted and it still hangs...
> > 
> > I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).
> 
> Does 4.19.82 work?

Yes, it does, or at least something after 4.19.82 works (9da271c1c).

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-07  0:58       ` Marek Marczykowski-Górecki
@ 2019-12-07 18:45         ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-07 18:45 UTC (permalink / raw)
  To: Shuah Khan; +Cc: Suwan Kim, linux-usb, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 4274 bytes --]

On Sat, Dec 07, 2019 at 01:58:09AM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Dec 06, 2019 at 02:12:08PM -0700, Shuah Khan wrote:
> > On 12/6/19 1:57 PM, Marek Marczykowski-Górecki wrote:
> > > On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> > > > On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > Hello,
> > > > > 
> > > > > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > > > > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > > > > list` command), the client side hangs (100% reliably). After 60s I get a
> > > > > message that a CPU hangs waiting for a spinlock (see below).
> > > > > 
> > > > > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > > > > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > > > > 4.19.
> > > > > 
> > > > > Any idea what is going on here? I can easily provide more information,
> > > > > if you tell me how to get it.
> > > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> > > > in kernel config and send dmesg log? It will be helpful to figure
> > > > out lock dependency in vhci_hcd.
> > > 
> > > Hmm, I've tried, but I don't see much more information there (see
> > > below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> > > anything more, like some boot option?
> > > 
> > > Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> > > 4.19.87 I don't get the message at all.
> > > 
> > > Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> > > support to vhci-hcd and stub driver" reverted and it still hangs...
> > > 
> > > I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).
> > 
> > Does 4.19.82 work?
> 
> Yes, it does, or at least something after 4.19.82 works (9da271c1c).

I've done bisect again and again got into the same commit. Most of the
time I didn't get any message about the deadlock, it simply hanged.

Full bisect log:

git bisect start
# good: [ef244c3088856cf048c77231653b4c92a7b2213c] Linux 4.19.81
git bisect good ef244c3088856cf048c77231653b4c92a7b2213c
# bad: [c555efaf14026c7751fa68d87403a5eb5ae7dcaf] Linux 4.19.84
git bisect bad c555efaf14026c7751fa68d87403a5eb5ae7dcaf
# good: [9da271c1cdc14839b694e23889a653c1ed0b5f8f] net: usb: lan78xx: Disable interrupts before calling generic_handle_irq()
git bisect good 9da271c1cdc14839b694e23889a653c1ed0b5f8f
# bad: [8181146cd7de890cdfdda68ddc3730250887d7fc] PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30
git bisect bad 8181146cd7de890cdfdda68ddc3730250887d7fc
# good: [8e6bf4bc3a88e4b84e5c4ec50143a71a61503336] mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
git bisect good 8e6bf4bc3a88e4b84e5c4ec50143a71a61503336
# good: [0327c7818da27b018464c0b9f541c5d276c57172] can: flexcan: disable completely the ECC mechanism
git bisect good 0327c7818da27b018464c0b9f541c5d276c57172
# good: [5e36cf8edb5812e378b57511263d1a0a9172eeb9] configfs: fix a deadlock in configfs_symlink()
git bisect good 5e36cf8edb5812e378b57511263d1a0a9172eeb9
# good: [4f6c5200269998f0066e1bd4db931297b7b2b906] ALSA: usb-audio: Fix possible NULL dereference at create_yamaha_midi_quirk()
git bisect good 4f6c5200269998f0066e1bd4db931297b7b2b906
# good: [502bd151448c2c76a927b26783e5538875c534ff] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
git bisect good 502bd151448c2c76a927b26783e5538875c534ff
# good: [f865ae473c16fb2b8b8601fa04f4f6517ad557b3] usbip: Fix vhci_urb_enqueue() URB null transfer buffer error path
git bisect good f865ae473c16fb2b8b8601fa04f4f6517ad557b3
# bad: [e2dd254bde5cdac24e7774584d6f3c2c61fe09e5] usbip: Implement SG support to vhci-hcd and stub driver
git bisect bad e2dd254bde5cdac24e7774584d6f3c2c61fe09e5
# first bad commit: [e2dd254bde5cdac24e7774584d6f3c2c61fe09e5] usbip: Implement SG support to vhci-hcd and stub driver


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-06 20:57   ` Marek Marczykowski-Górecki
  2019-12-06 21:12     ` Shuah Khan
@ 2019-12-09  2:01     ` Suwan Kim
  2019-12-09  3:37       ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-09  2:01 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> > On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > Hello,
> > > 
> > > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > > list` command), the client side hangs (100% reliably). After 60s I get a
> > > message that a CPU hangs waiting for a spinlock (see below).
> > > 
> > > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > > 4.19.
> > > 
> > > Any idea what is going on here? I can easily provide more information,
> > > if you tell me how to get it.
> > > 
> > 
> > Hi,
> > 
> > Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> > in kernel config and send dmesg log? It will be helpful to figure
> > out lock dependency in vhci_hcd.
> 
> Hmm, I've tried, but I don't see much more information there (see
> below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> anything more, like some boot option?
> 
> Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> 4.19.87 I don't get the message at all.
> 
> Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> support to vhci-hcd and stub driver" reverted and it still hangs...

If so, deadlock is caused by other causes, and why is it different
from the results of bisect?

> 
> I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).
> 
> [  212.890519] usb 1-1: recv xbuf, 42

This message is printed by receive error and before that, driver
canceled URB transmission. we need to know the exact situation
before this message.

Could you send me a longer log messages showing the situation
before "[  212.890519] usb 1-1: recv xbuf, 42"?

And please also send the result of "lsusb -v".

Regards,
Suwan Kim

> [  212.891177] vhci_hcd: vhci_shutdown_connection:1024: stop threads
> [  212.891228] vhci_hcd: vhci_shutdown_connection:1032: release socket
> [  212.891388] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
> [  212.891637] usb 1-1: USB disconnect, device number 2
> [  277.967398] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> [  277.967456] rcu: 	1-...0: (1 GPs behind) idle=d66/1/0x4000000000000000 softirq=32543/32544 fqs=15856 
> [  277.967544] rcu: 	(detected by 2, t=65009 jiffies, g=65285, q=5921)
> [  277.967560] Sending NMI from CPU 2 to CPUs 1:
> [  277.968606] NMI backtrace for cpu 1
> [  277.968607] CPU: 1 PID: 450 Comm: kworker/1:3 Tainted: G           O      4.19.84-1.pvops.qubes.x86_64 #2
> [  277.968608] Workqueue: usb_hub_wq hub_event
> [  277.968608] RIP: 0010:_raw_spin_lock+0x2c/0x40
> [  277.968609] Code: 44 00 00 55 48 89 fd 65 ff 05 40 b8 60 7c ff 74 24 08 48 8d 7f 18 45 31 c9 31 c9 41 b8 01 00 00 00 31 d2 31 f6 e8 64 f0 71 ff <48> 89 ef 58 5d e9 1a 3a 72 ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
> [  277.968610] RSP: 0018:ffffb29f00bbbb90 EFLAGS: 00000082
> [  277.968610] RAX: ffff8fa938768000 RBX: ffff8fa981fe3950 RCX: a50d71adbc2b783b
> [  277.968611] RDX: 00000000c093239f RSI: 00000000158aa627 RDI: 0000000000000046
> [  277.968611] RBP: ffffffff8450c860 R08: 0000000000000001 R09: 00000000000c0490
> [  277.968612] R10: 0000000000000000 R11: a50d71adbc2b783b R12: ffff8fa981fe3968
> [  277.968612] R13: ffff8fa91728e000 R14: ffffffff8434e626 R15: 0000000000040200
> [  277.968613] FS:  0000000000000000(0000) GS:ffff8fa9f5a80000(0000) knlGS:0000000000000000
> [  277.968613] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  277.968614] CR2: 00006335daf937b8 CR3: 00000000b0410002 CR4: 00000000001606e0
> [  277.968614] Call Trace:
> [  277.968614]  ? usb_hcd_flush_endpoint+0x122/0x170
> [  277.968615]  usb_hcd_flush_endpoint+0x122/0x170
> [  277.968615]  usb_disable_interface+0x3c/0x50
> [  277.968616]  usb_unbind_interface+0x181/0x260
> [  277.968616]  device_release_driver_internal+0x18b/0x250
> [  277.968617]  bus_remove_device+0xfc/0x170
> [  277.968617]  device_del+0x165/0x380
> [  277.968617]  usb_disable_device+0x93/0x240
> [  277.968618]  usb_disconnect+0xc1/0x2c0
> [  277.968618]  hub_event+0xcc4/0x1620
> [  277.968618]  process_one_work+0x221/0x580
> [  277.968619]  worker_thread+0x50/0x3b0
> [  277.968619]  ? process_one_work+0x580/0x580
> [  277.968620]  kthread+0x122/0x140
> [  277.968620]  ? kthread_create_worker_on_cpu+0x70/0x70
> [  277.968620]  ret_from_fork+0x3a/0x50
> 
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-09  2:01     ` Suwan Kim
@ 2019-12-09  3:37       ` Marek Marczykowski-Górecki
  2019-12-09  6:35         ` Suwan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-09  3:37 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 15813 bytes --]

On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> > > On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > Hello,
> > > > 
> > > > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > > > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > > > list` command), the client side hangs (100% reliably). After 60s I get a
> > > > message that a CPU hangs waiting for a spinlock (see below).
> > > > 
> > > > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > > > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > > > 4.19.
> > > > 
> > > > Any idea what is going on here? I can easily provide more information,
> > > > if you tell me how to get it.
> > > > 
> > > 
> > > Hi,
> > > 
> > > Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> > > in kernel config and send dmesg log? It will be helpful to figure
> > > out lock dependency in vhci_hcd.
> > 
> > Hmm, I've tried, but I don't see much more information there (see
> > below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> > anything more, like some boot option?
> > 
> > Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> > 4.19.87 I don't get the message at all.
> > 
> > Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> > support to vhci-hcd and stub driver" reverted and it still hangs...
> 
> If so, deadlock is caused by other causes, and why is it different
> from the results of bisect?

No idea, but as you've seen in the other email, another bisect returned
exactly the same commit.


> > I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).
> > 
> > [  212.890519] usb 1-1: recv xbuf, 42
> 
> This message is printed by receive error and before that, driver
> canceled URB transmission. we need to know the exact situation
> before this message.

I've added some more messages and found recv_size is 0.

> Could you send me a longer log messages showing the situation
> before "[  212.890519] usb 1-1: recv xbuf, 42"?

Sure, with added extra messages (debug patch below).

[  131.397522] usb 1-1: num_sgs 0
[  131.406588] usb 1-1: num_sgs 0
[  131.410621] usb 1-1: num_sgs 0
[  131.411950] usb 1-1: num_sgs 0
[  131.413186] usb 1-1: num_sgs 0
[  131.414590] usb 1-1: num_sgs 0
[  131.417086] usb 1-1: num_sgs 0
[  131.418188] usb 1-1: num_sgs 0
[  131.419228] usb 1-1: num_sgs 0
[  131.420248] usb 1-1: num_sgs 0
[  131.457315] usb 1-1: num_sgs 5
[  131.457345] usb 1-1: size 42, copy 42 recv 42, recv_size 42, sg->length 16384
[  131.457359] usb 1-1: size 42, copy 0 recv -22, recv_size 0, sg->length 16384
[  131.457372] usb 1-1: recv xbuf, 42 size 42
[  131.458263] vhci_hcd: vhci_shutdown_connection:1024: stop threads
[  131.458318] vhci_hcd: vhci_shutdown_connection:1032: release socket
[  131.458431] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
[  131.460171] usb 1-1: USB disconnect, device number 2

(...)

If I add "if (!recv_size) continue;" there, it works!

> And please also send the result of "lsusb -v".

Bus 003 Device 006: ID 1050:0407 Yubico.com Yubikey 4 OTP+U2F+CCID
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 
  bDeviceSubClass         0 
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x1050 Yubico.com
  idProduct          0x0407 Yubikey 4 OTP+U2F+CCID
  bcdDevice            4.34
  iManufacturer           1 Yubico
  iProduct                2 Yubikey 4 OTP+U2F+CCID
  iSerial                 0 
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x0096
    bNumInterfaces          3
    bConfigurationValue     1
    iConfiguration          0 
    bmAttributes         0x80
      (Bus Powered)
    MaxPower               30mA
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           1
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      1 Boot Interface Subclass
      bInterfaceProtocol      1 Keyboard
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.10
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      71
          Report Descriptor: (length is 71)
            Item(Global): Usage Page, data= [ 0x01 ] 1
                            Generic Desktop Controls
            Item(Local ): Usage, data= [ 0x06 ] 6
                            Keyboard
            Item(Main  ): Collection, data= [ 0x01 ] 1
                            Application
            Item(Global): Usage Page, data= [ 0x07 ] 7
                            Keyboard
            Item(Local ): Usage Minimum, data= [ 0xe0 ] 224
                            Control Left
            Item(Local ): Usage Maximum, data= [ 0xe7 ] 231
                            GUI Right
            Item(Global): Logical Minimum, data= [ 0x00 ] 0
            Item(Global): Logical Maximum, data= [ 0x01 ] 1
            Item(Global): Report Size, data= [ 0x01 ] 1
            Item(Global): Report Count, data= [ 0x08 ] 8
            Item(Main  ): Input, data= [ 0x02 ] 2
                            Data Variable Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Global): Report Count, data= [ 0x01 ] 1
            Item(Global): Report Size, data= [ 0x08 ] 8
            Item(Main  ): Input, data= [ 0x01 ] 1
                            Constant Array Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Global): Report Count, data= [ 0x05 ] 5
            Item(Global): Report Size, data= [ 0x01 ] 1
            Item(Global): Usage Page, data= [ 0x08 ] 8
                            LEDs
            Item(Local ): Usage Minimum, data= [ 0x01 ] 1
                            NumLock
            Item(Local ): Usage Maximum, data= [ 0x05 ] 5
                            Kana
            Item(Main  ): Output, data= [ 0x02 ] 2
                            Data Variable Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Global): Report Count, data= [ 0x01 ] 1
            Item(Global): Report Size, data= [ 0x03 ] 3
            Item(Main  ): Output, data= [ 0x01 ] 1
                            Constant Array Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Global): Report Count, data= [ 0x06 ] 6
            Item(Global): Report Size, data= [ 0x08 ] 8
            Item(Global): Logical Minimum, data= [ 0x00 ] 0
            Item(Global): Logical Maximum, data= [ 0x65 ] 101
            Item(Global): Usage Page, data= [ 0x07 ] 7
                            Keyboard
            Item(Local ): Usage Minimum, data= [ 0x00 ] 0
                            No Event
            Item(Local ): Usage Maximum, data= [ 0x65 ] 101
                            Keyboard Application (Windows Key for Win95 or Compose)
            Item(Main  ): Input, data= [ 0x00 ] 0
                            Data Array Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Local ): Usage, data= [ 0x03 ] 3
                            Keyboard Error Undefined
            Item(Global): Report Size, data= [ 0x08 ] 8
            Item(Global): Report Count, data= [ 0x08 ] 8
            Item(Main  ): Feature, data= [ 0x02 ] 2
                            Data Variable Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Main  ): End Collection, data=none
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x81  EP 1 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              10
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         3 Human Interface Device
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
        HID Device Descriptor:
          bLength                 9
          bDescriptorType        33
          bcdHID               1.10
          bCountryCode            0 Not supported
          bNumDescriptors         1
          bDescriptorType        34 Report
          wDescriptorLength      34
          Report Descriptor: (length is 34)
            Item(Global): Usage Page, data= [ 0xd0 0xf1 ] 61904
                            (null)
            Item(Local ): Usage, data= [ 0x01 ] 1
                            (null)
            Item(Main  ): Collection, data= [ 0x01 ] 1
                            Application
            Item(Local ): Usage, data= [ 0x20 ] 32
                            (null)
            Item(Global): Logical Minimum, data= [ 0x00 ] 0
            Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
            Item(Global): Report Size, data= [ 0x08 ] 8
            Item(Global): Report Count, data= [ 0x40 ] 64
            Item(Main  ): Input, data= [ 0x02 ] 2
                            Data Variable Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Local ): Usage, data= [ 0x21 ] 33
                            (null)
            Item(Global): Logical Minimum, data= [ 0x00 ] 0
            Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
            Item(Global): Report Size, data= [ 0x08 ] 8
            Item(Global): Report Count, data= [ 0x40 ] 64
            Item(Main  ): Output, data= [ 0x02 ] 2
                            Data Variable Absolute No_Wrap Linear
                            Preferred_State No_Null_Position Non_Volatile Bitfield
            Item(Main  ): End Collection, data=none
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x04  EP 4 OUT
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               2
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               2
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           3
      bInterfaceClass        11 Chip/SmartCard
      bInterfaceSubClass      0 
      bInterfaceProtocol      0 
      iInterface              0 
      ChipCard Interface Descriptor:
        bLength                54
        bDescriptorType        33
        bcdCCID              1.00
        nMaxSlotIndex           0
        bVoltageSupport         7  5.0V 3.0V 1.8V 
        dwProtocols             2  T=1
        dwDefaultClock       4000
        dwMaxiumumClock      4000
        bNumClockSupported      0
        dwDataRate         307200 bps
        dwMaxDataRate      307200 bps
        bNumDataRatesSupp.      0
        dwMaxIFSD            2038
        dwSyncProtocols  00000000 
        dwMechanical     00000000 
        dwFeatures       000400FE
          Auto configuration based on ATR
          Auto activation on insert
          Auto voltage selection
          Auto clock change
          Auto baud rate change
          Auto parameter negotiation made by CCID
          Short and extended APDU level exchange
        dwMaxCCIDMsgLen      3072
        bClassGetResponse    echo
        bClassEnvelope       echo
        wlcdLayout           none
        bPINSupport             0 
        bMaxCCIDBusySlots       1
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x02  EP 2 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x82  EP 2 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0040  1x 64 bytes
        bInterval               0
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes            3
          Transfer Type            Interrupt
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0008  1x 8 bytes
        bInterval              32
Device Status:     0x0000
  (Bus Powered)



And the debug patch:

diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
index d88a5b15f..89b87d5ed 100644
--- a/drivers/usb/usbip/usbip_common.c
+++ b/drivers/usb/usbip/usbip_common.c
@@ -709,6 +709,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
 		/* should not happen, probably malicious packet */
 		goto error;
 
+	dev_err(&urb->dev->dev, "num_sgs %d\n", urb->num_sgs);
 	if (urb->num_sgs) {
 		copy = size;
 		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
@@ -721,6 +722,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
 
 			recv = usbip_recv(ud->tcp_socket, sg_virt(sg),
 						recv_size);
+			dev_err(&urb->dev->dev, "size %d, copy %d recv %d, recv_size %d, sg->length %d\n", size, copy, recv, recv_size, sg->length);
 
 			if (recv != recv_size)
 				goto error;
@@ -740,7 +742,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
 	return ret;
 
 error:
-	dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
+	dev_err(&urb->dev->dev, "recv xbuf, %d size %d\n", ret, size);
 	if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC)
 		usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
 	else


-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-09  3:37       ` Marek Marczykowski-Górecki
@ 2019-12-09  6:35         ` Suwan Kim
  2019-12-09 14:19           ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-09  6:35 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Fri, Dec 06, 2019 at 03:50:58PM +0900, Suwan Kim wrote:
> > > > On Fri, Dec 06, 2019 at 04:24:06AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > Hello,
> > > > > 
> > > > > I've hit an issue with recent 4.19 and 5.4 kernels. In short: if I
> > > > > connect Yubikey 4 and use its CCID interface (for example `ykman oath
> > > > > list` command), the client side hangs (100% reliably). After 60s I get a
> > > > > message that a CPU hangs waiting for a spinlock (see below).
> > > > > 
> > > > > I've bisected it to a ea44d190764b4422af ("usbip: Implement SG support
> > > > > to vhci-hcd and stub driver") commit. Which indeed is also backported to
> > > > > 4.19.
> > > > > 
> > > > > Any idea what is going on here? I can easily provide more information,
> > > > > if you tell me how to get it.
> > > > > 
> > > > 
> > > > Hi,
> > > > 
> > > > Thanks for reporting. Could you turn on lockdep and USBIP_DEBUG
> > > > in kernel config and send dmesg log? It will be helpful to figure
> > > > out lock dependency in vhci_hcd.
> > > 
> > > Hmm, I've tried, but I don't see much more information there (see
> > > below). I've just enabled PROVE_LOCKING and USBIP_DEBUG. Do I need to do
> > > anything more, like some boot option?
> > > 
> > > Also, this one (as the previous one) is from 4.19.84. Interestingly, on
> > > 4.19.87 I don't get the message at all.
> > > 
> > > Hmm, I've done also another test: got 4.19.84 with "usbip: Implement SG
> > > support to vhci-hcd and stub driver" reverted and it still hangs...
> > 
> > If so, deadlock is caused by other causes, and why is it different
> > from the results of bisect?
> 
> No idea, but as you've seen in the other email, another bisect returned
> exactly the same commit.
> 
> 
> > > I'm going for another bisect round (4.19.81 works, 4.19.84 doesn't).
> > > 
> > > [  212.890519] usb 1-1: recv xbuf, 42
> > 
> > This message is printed by receive error and before that, driver
> > canceled URB transmission. we need to know the exact situation
> > before this message.
> 
> I've added some more messages and found recv_size is 0.

That is the bug point. "size" is urb->actual_length that means
amount of data actually received from device. And "copy" is
amount of data received from usbip server. So, in this situation,
vhci-hcd received all the data from usbip server even if there
are more sg entries left. So, "copy == 0" means vhci-hcd receives
all data from the server and we should check "if (copy == 0)" in
for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not
to add error event.

> 
> > Could you send me a longer log messages showing the situation
> > before "[  212.890519] usb 1-1: recv xbuf, 42"?
> 
> Sure, with added extra messages (debug patch below).
> 
> [  131.397522] usb 1-1: num_sgs 0
> [  131.406588] usb 1-1: num_sgs 0
> [  131.410621] usb 1-1: num_sgs 0
> [  131.411950] usb 1-1: num_sgs 0
> [  131.413186] usb 1-1: num_sgs 0
> [  131.414590] usb 1-1: num_sgs 0
> [  131.417086] usb 1-1: num_sgs 0
> [  131.418188] usb 1-1: num_sgs 0
> [  131.419228] usb 1-1: num_sgs 0
> [  131.420248] usb 1-1: num_sgs 0
> [  131.457315] usb 1-1: num_sgs 5
> [  131.457345] usb 1-1: size 42, copy 42 recv 42, recv_size 42, sg->length 16384

Device sent 42 bytes data (size 42) and vhci-hcd received 42 bytes
data from the server. vhci-hcd received all the data and It should
exit the loop.

> [  131.457359] usb 1-1: size 42, copy 0 recv -22, recv_size 0, sg->length 16384
> [  131.457372] usb 1-1: recv xbuf, 42 size 42
> [  131.458263] vhci_hcd: vhci_shutdown_connection:1024: stop threads
> [  131.458318] vhci_hcd: vhci_shutdown_connection:1032: release socket
> [  131.458431] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
> [  131.460171] usb 1-1: USB disconnect, device number 2
> 
> (...)
> 
> If I add "if (!recv_size) continue;" there, it works!

I think we should check "copy" not the "recv_size" because "copy"
shows the amount of data received from the server.

int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
...
...
	if (urb->num_sgs) {
		copy = size;
		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
			int recv_size;

			if (copy < sg->length)
				recv_size = copy;
			else
				recv_size = sg->length;

			recv = usbip_recv(ud->tcp_socket, sg_virt(sg),
						recv_size);

			if (recv != recv_size)
				goto error;

			copy -= recv;
			ret += recv;
			
			/* Add here */
			if (!copy)
				break;
			^^^^^^^^^^^^^^
		}

Regards,
Suwan Kim

> > And please also send the result of "lsusb -v".
> 
> Bus 003 Device 006: ID 1050:0407 Yubico.com Yubikey 4 OTP+U2F+CCID
> Device Descriptor:
>   bLength                18
>   bDescriptorType         1
>   bcdUSB               2.00
>   bDeviceClass            0 
>   bDeviceSubClass         0 
>   bDeviceProtocol         0 
>   bMaxPacketSize0        64
>   idVendor           0x1050 Yubico.com
>   idProduct          0x0407 Yubikey 4 OTP+U2F+CCID
>   bcdDevice            4.34
>   iManufacturer           1 Yubico
>   iProduct                2 Yubikey 4 OTP+U2F+CCID
>   iSerial                 0 
>   bNumConfigurations      1
>   Configuration Descriptor:
>     bLength                 9
>     bDescriptorType         2
>     wTotalLength       0x0096
>     bNumInterfaces          3
>     bConfigurationValue     1
>     iConfiguration          0 
>     bmAttributes         0x80
>       (Bus Powered)
>     MaxPower               30mA
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        0
>       bAlternateSetting       0
>       bNumEndpoints           1
>       bInterfaceClass         3 Human Interface Device
>       bInterfaceSubClass      1 Boot Interface Subclass
>       bInterfaceProtocol      1 Keyboard
>       iInterface              0 
>         HID Device Descriptor:
>           bLength                 9
>           bDescriptorType        33
>           bcdHID               1.10
>           bCountryCode            0 Not supported
>           bNumDescriptors         1
>           bDescriptorType        34 Report
>           wDescriptorLength      71
>           Report Descriptor: (length is 71)
>             Item(Global): Usage Page, data= [ 0x01 ] 1
>                             Generic Desktop Controls
>             Item(Local ): Usage, data= [ 0x06 ] 6
>                             Keyboard
>             Item(Main  ): Collection, data= [ 0x01 ] 1
>                             Application
>             Item(Global): Usage Page, data= [ 0x07 ] 7
>                             Keyboard
>             Item(Local ): Usage Minimum, data= [ 0xe0 ] 224
>                             Control Left
>             Item(Local ): Usage Maximum, data= [ 0xe7 ] 231
>                             GUI Right
>             Item(Global): Logical Minimum, data= [ 0x00 ] 0
>             Item(Global): Logical Maximum, data= [ 0x01 ] 1
>             Item(Global): Report Size, data= [ 0x01 ] 1
>             Item(Global): Report Count, data= [ 0x08 ] 8
>             Item(Main  ): Input, data= [ 0x02 ] 2
>                             Data Variable Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Global): Report Count, data= [ 0x01 ] 1
>             Item(Global): Report Size, data= [ 0x08 ] 8
>             Item(Main  ): Input, data= [ 0x01 ] 1
>                             Constant Array Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Global): Report Count, data= [ 0x05 ] 5
>             Item(Global): Report Size, data= [ 0x01 ] 1
>             Item(Global): Usage Page, data= [ 0x08 ] 8
>                             LEDs
>             Item(Local ): Usage Minimum, data= [ 0x01 ] 1
>                             NumLock
>             Item(Local ): Usage Maximum, data= [ 0x05 ] 5
>                             Kana
>             Item(Main  ): Output, data= [ 0x02 ] 2
>                             Data Variable Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Global): Report Count, data= [ 0x01 ] 1
>             Item(Global): Report Size, data= [ 0x03 ] 3
>             Item(Main  ): Output, data= [ 0x01 ] 1
>                             Constant Array Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Global): Report Count, data= [ 0x06 ] 6
>             Item(Global): Report Size, data= [ 0x08 ] 8
>             Item(Global): Logical Minimum, data= [ 0x00 ] 0
>             Item(Global): Logical Maximum, data= [ 0x65 ] 101
>             Item(Global): Usage Page, data= [ 0x07 ] 7
>                             Keyboard
>             Item(Local ): Usage Minimum, data= [ 0x00 ] 0
>                             No Event
>             Item(Local ): Usage Maximum, data= [ 0x65 ] 101
>                             Keyboard Application (Windows Key for Win95 or Compose)
>             Item(Main  ): Input, data= [ 0x00 ] 0
>                             Data Array Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Local ): Usage, data= [ 0x03 ] 3
>                             Keyboard Error Undefined
>             Item(Global): Report Size, data= [ 0x08 ] 8
>             Item(Global): Report Count, data= [ 0x08 ] 8
>             Item(Main  ): Feature, data= [ 0x02 ] 2
>                             Data Variable Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Main  ): End Collection, data=none
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x81  EP 1 IN
>         bmAttributes            3
>           Transfer Type            Interrupt
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0008  1x 8 bytes
>         bInterval              10
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        1
>       bAlternateSetting       0
>       bNumEndpoints           2
>       bInterfaceClass         3 Human Interface Device
>       bInterfaceSubClass      0 
>       bInterfaceProtocol      0 
>       iInterface              0 
>         HID Device Descriptor:
>           bLength                 9
>           bDescriptorType        33
>           bcdHID               1.10
>           bCountryCode            0 Not supported
>           bNumDescriptors         1
>           bDescriptorType        34 Report
>           wDescriptorLength      34
>           Report Descriptor: (length is 34)
>             Item(Global): Usage Page, data= [ 0xd0 0xf1 ] 61904
>                             (null)
>             Item(Local ): Usage, data= [ 0x01 ] 1
>                             (null)
>             Item(Main  ): Collection, data= [ 0x01 ] 1
>                             Application
>             Item(Local ): Usage, data= [ 0x20 ] 32
>                             (null)
>             Item(Global): Logical Minimum, data= [ 0x00 ] 0
>             Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
>             Item(Global): Report Size, data= [ 0x08 ] 8
>             Item(Global): Report Count, data= [ 0x40 ] 64
>             Item(Main  ): Input, data= [ 0x02 ] 2
>                             Data Variable Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Local ): Usage, data= [ 0x21 ] 33
>                             (null)
>             Item(Global): Logical Minimum, data= [ 0x00 ] 0
>             Item(Global): Logical Maximum, data= [ 0xff 0x00 ] 255
>             Item(Global): Report Size, data= [ 0x08 ] 8
>             Item(Global): Report Count, data= [ 0x40 ] 64
>             Item(Main  ): Output, data= [ 0x02 ] 2
>                             Data Variable Absolute No_Wrap Linear
>                             Preferred_State No_Null_Position Non_Volatile Bitfield
>             Item(Main  ): End Collection, data=none
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x04  EP 4 OUT
>         bmAttributes            3
>           Transfer Type            Interrupt
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0040  1x 64 bytes
>         bInterval               2
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x84  EP 4 IN
>         bmAttributes            3
>           Transfer Type            Interrupt
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0040  1x 64 bytes
>         bInterval               2
>     Interface Descriptor:
>       bLength                 9
>       bDescriptorType         4
>       bInterfaceNumber        2
>       bAlternateSetting       0
>       bNumEndpoints           3
>       bInterfaceClass        11 Chip/SmartCard
>       bInterfaceSubClass      0 
>       bInterfaceProtocol      0 
>       iInterface              0 
>       ChipCard Interface Descriptor:
>         bLength                54
>         bDescriptorType        33
>         bcdCCID              1.00
>         nMaxSlotIndex           0
>         bVoltageSupport         7  5.0V 3.0V 1.8V 
>         dwProtocols             2  T=1
>         dwDefaultClock       4000
>         dwMaxiumumClock      4000
>         bNumClockSupported      0
>         dwDataRate         307200 bps
>         dwMaxDataRate      307200 bps
>         bNumDataRatesSupp.      0
>         dwMaxIFSD            2038
>         dwSyncProtocols  00000000 
>         dwMechanical     00000000 
>         dwFeatures       000400FE
>           Auto configuration based on ATR
>           Auto activation on insert
>           Auto voltage selection
>           Auto clock change
>           Auto baud rate change
>           Auto parameter negotiation made by CCID
>           Short and extended APDU level exchange
>         dwMaxCCIDMsgLen      3072
>         bClassGetResponse    echo
>         bClassEnvelope       echo
>         wlcdLayout           none
>         bPINSupport             0 
>         bMaxCCIDBusySlots       1
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x02  EP 2 OUT
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0040  1x 64 bytes
>         bInterval               0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x82  EP 2 IN
>         bmAttributes            2
>           Transfer Type            Bulk
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0040  1x 64 bytes
>         bInterval               0
>       Endpoint Descriptor:
>         bLength                 7
>         bDescriptorType         5
>         bEndpointAddress     0x83  EP 3 IN
>         bmAttributes            3
>           Transfer Type            Interrupt
>           Synch Type               None
>           Usage Type               Data
>         wMaxPacketSize     0x0008  1x 8 bytes
>         bInterval              32
> Device Status:     0x0000
>   (Bus Powered)
> 
> 
> 
> And the debug patch:
> 
> diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c
> index d88a5b15f..89b87d5ed 100644
> --- a/drivers/usb/usbip/usbip_common.c
> +++ b/drivers/usb/usbip/usbip_common.c
> @@ -709,6 +709,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
>  		/* should not happen, probably malicious packet */
>  		goto error;
>  
> +	dev_err(&urb->dev->dev, "num_sgs %d\n", urb->num_sgs);
>  	if (urb->num_sgs) {
>  		copy = size;
>  		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
> @@ -721,6 +722,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
>  
>  			recv = usbip_recv(ud->tcp_socket, sg_virt(sg),
>  						recv_size);
> +			dev_err(&urb->dev->dev, "size %d, copy %d recv %d, recv_size %d, sg->length %d\n", size, copy, recv, recv_size, sg->length);
>  
>  			if (recv != recv_size)
>  				goto error;
> @@ -740,7 +742,7 @@ int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
>  	return ret;
>  
>  error:
> -	dev_err(&urb->dev->dev, "recv xbuf, %d\n", ret);
> +	dev_err(&urb->dev->dev, "recv xbuf, %d size %d\n", ret, size);
>  	if (ud->side == USBIP_STUB || ud->side == USBIP_VUDC)
>  		usbip_event_add(ud, SDEV_EVENT_ERROR_TCP);
>  	else
> 
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-09  6:35         ` Suwan Kim
@ 2019-12-09 14:19           ` Marek Marczykowski-Górecki
  2019-12-10 14:25             ` Suwan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-09 14:19 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 3675 bytes --]

On Mon, Dec 09, 2019 at 03:35:43PM +0900, Suwan Kim wrote:
> On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> > > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > > > [  212.890519] usb 1-1: recv xbuf, 42
> > > 
> > > This message is printed by receive error and before that, driver
> > > canceled URB transmission. we need to know the exact situation
> > > before this message.
> > 
> > I've added some more messages and found recv_size is 0.
> 
> That is the bug point. "size" is urb->actual_length that means
> amount of data actually received from device. And "copy" is
> amount of data received from usbip server. So, in this situation,
> vhci-hcd received all the data from usbip server even if there
> are more sg entries left. So, "copy == 0" means vhci-hcd receives
> all data from the server and we should check "if (copy == 0)" in
> for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not
> to add error event.

That makes sense. But I think there is also another issue here: hang in
case of an error. Here it was EINVAL, but there are probably other
reasons why usbip_recv can fail, like network error or misbehaving
server. This definitely should not cause the client to fail this way...
And also, the actual error code is lost.

> > > Could you send me a longer log messages showing the situation
> > > before "[  212.890519] usb 1-1: recv xbuf, 42"?
> > 
> > Sure, with added extra messages (debug patch below).
> > 
> > [  131.397522] usb 1-1: num_sgs 0
> > [  131.406588] usb 1-1: num_sgs 0
> > [  131.410621] usb 1-1: num_sgs 0
> > [  131.411950] usb 1-1: num_sgs 0
> > [  131.413186] usb 1-1: num_sgs 0
> > [  131.414590] usb 1-1: num_sgs 0
> > [  131.417086] usb 1-1: num_sgs 0
> > [  131.418188] usb 1-1: num_sgs 0
> > [  131.419228] usb 1-1: num_sgs 0
> > [  131.420248] usb 1-1: num_sgs 0
> > [  131.457315] usb 1-1: num_sgs 5
> > [  131.457345] usb 1-1: size 42, copy 42 recv 42, recv_size 42, sg->length 16384
> 
> Device sent 42 bytes data (size 42) and vhci-hcd received 42 bytes
> data from the server. vhci-hcd received all the data and It should
> exit the loop.
> 
> > [  131.457359] usb 1-1: size 42, copy 0 recv -22, recv_size 0, sg->length 16384
> > [  131.457372] usb 1-1: recv xbuf, 42 size 42
> > [  131.458263] vhci_hcd: vhci_shutdown_connection:1024: stop threads
> > [  131.458318] vhci_hcd: vhci_shutdown_connection:1032: release socket
> > [  131.458431] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
> > [  131.460171] usb 1-1: USB disconnect, device number 2
> > 
> > (...)
> > 
> > If I add "if (!recv_size) continue;" there, it works!
> 
> I think we should check "copy" not the "recv_size" because "copy"
> shows the amount of data received from the server.
> 
> int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
> ...
> ...
> 	if (urb->num_sgs) {
> 		copy = size;
> 		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
> 			int recv_size;
> 
> 			if (copy < sg->length)
> 				recv_size = copy;
> 			else
> 				recv_size = sg->length;
> 
> 			recv = usbip_recv(ud->tcp_socket, sg_virt(sg),
> 						recv_size);
> 
> 			if (recv != recv_size)
> 				goto error;
> 
> 			copy -= recv;
> 			ret += recv;
> 			
> 			/* Add here */
> 			if (!copy)
> 				break;
> 			^^^^^^^^^^^^^^
> 		}

This helps too.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-09 14:19           ` Marek Marczykowski-Górecki
@ 2019-12-10 14:25             ` Suwan Kim
  2019-12-10 15:32               ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-10 14:25 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Mon, Dec 09, 2019 at 03:19:59PM +0100, Marek Marczykowski-Górecki wrote:
> On Mon, Dec 09, 2019 at 03:35:43PM +0900, Suwan Kim wrote:
> > On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote:
> > > On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> > > > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > [  212.890519] usb 1-1: recv xbuf, 42
> > > > 
> > > > This message is printed by receive error and before that, driver
> > > > canceled URB transmission. we need to know the exact situation
> > > > before this message.
> > > 
> > > I've added some more messages and found recv_size is 0.
> > 
> > That is the bug point. "size" is urb->actual_length that means
> > amount of data actually received from device. And "copy" is
> > amount of data received from usbip server. So, in this situation,
> > vhci-hcd received all the data from usbip server even if there
> > are more sg entries left. So, "copy == 0" means vhci-hcd receives
> > all data from the server and we should check "if (copy == 0)" in
> > for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not
> > to add error event.
> 
> That makes sense. But I think there is also another issue here: hang in
> case of an error. Here it was EINVAL, but there are probably other
> reasons why usbip_recv can fail, like network error or misbehaving
> server. This definitely should not cause the client to fail this way...
> And also, the actual error code is lost.

I agree. I have been taking a look at it and trying to reproduce
the same issue on my machine. I guess race condition between hub
irq thread and driver (vhci_urb_enqueue or hcd?)
But I'm not sure...

> 
> > > > Could you send me a longer log messages showing the situation
> > > > before "[  212.890519] usb 1-1: recv xbuf, 42"?
> > > 
> > > Sure, with added extra messages (debug patch below).
> > > 
> > > [  131.397522] usb 1-1: num_sgs 0
> > > [  131.406588] usb 1-1: num_sgs 0
> > > [  131.410621] usb 1-1: num_sgs 0
> > > [  131.411950] usb 1-1: num_sgs 0
> > > [  131.413186] usb 1-1: num_sgs 0
> > > [  131.414590] usb 1-1: num_sgs 0
> > > [  131.417086] usb 1-1: num_sgs 0
> > > [  131.418188] usb 1-1: num_sgs 0
> > > [  131.419228] usb 1-1: num_sgs 0
> > > [  131.420248] usb 1-1: num_sgs 0
> > > [  131.457315] usb 1-1: num_sgs 5
> > > [  131.457345] usb 1-1: size 42, copy 42 recv 42, recv_size 42, sg->length 16384
> > 
> > Device sent 42 bytes data (size 42) and vhci-hcd received 42 bytes
> > data from the server. vhci-hcd received all the data and It should
> > exit the loop.
> > 
> > > [  131.457359] usb 1-1: size 42, copy 0 recv -22, recv_size 0, sg->length 16384
> > > [  131.457372] usb 1-1: recv xbuf, 42 size 42
> > > [  131.458263] vhci_hcd: vhci_shutdown_connection:1024: stop threads
> > > [  131.458318] vhci_hcd: vhci_shutdown_connection:1032: release socket
> > > [  131.458431] vhci_hcd: vhci_shutdown_connection:1058: disconnect device
> > > [  131.460171] usb 1-1: USB disconnect, device number 2
> > > 
> > > (...)
> > > 
> > > If I add "if (!recv_size) continue;" there, it works!
> > 
> > I think we should check "copy" not the "recv_size" because "copy"
> > shows the amount of data received from the server.
> > 
> > int usbip_recv_xbuff(struct usbip_device *ud, struct urb *urb)
> > ...
> > ...
> > 	if (urb->num_sgs) {
> > 		copy = size;
> > 		for_each_sg(urb->sg, sg, urb->num_sgs, i) {
> > 			int recv_size;
> > 
> > 			if (copy < sg->length)
> > 				recv_size = copy;
> > 			else
> > 				recv_size = sg->length;
> > 
> > 			recv = usbip_recv(ud->tcp_socket, sg_virt(sg),
> > 						recv_size);
> > 
> > 			if (recv != recv_size)
> > 				goto error;
> > 
> > 			copy -= recv;
> > 			ret += recv;
> > 			
> > 			/* Add here */
> > 			if (!copy)
> > 				break;
> > 			^^^^^^^^^^^^^^
> > 		}
> 
> This helps too.

Good. We should fix this issue first. I will submit a patch soon.

Regrads,
Suwan Kim

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-10 14:25             ` Suwan Kim
@ 2019-12-10 15:32               ` Marek Marczykowski-Górecki
  2019-12-11  3:07                 ` Suwan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-10 15:32 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 3683 bytes --]

On Tue, Dec 10, 2019 at 11:25:35PM +0900, Suwan Kim wrote:
> On Mon, Dec 09, 2019 at 03:19:59PM +0100, Marek Marczykowski-Górecki wrote:
> > On Mon, Dec 09, 2019 at 03:35:43PM +0900, Suwan Kim wrote:
> > > On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote:
> > > > On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> > > > > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > [  212.890519] usb 1-1: recv xbuf, 42
> > > > > 
> > > > > This message is printed by receive error and before that, driver
> > > > > canceled URB transmission. we need to know the exact situation
> > > > > before this message.
> > > > 
> > > > I've added some more messages and found recv_size is 0.
> > > 
> > > That is the bug point. "size" is urb->actual_length that means
> > > amount of data actually received from device. And "copy" is
> > > amount of data received from usbip server. So, in this situation,
> > > vhci-hcd received all the data from usbip server even if there
> > > are more sg entries left. So, "copy == 0" means vhci-hcd receives
> > > all data from the server and we should check "if (copy == 0)" in
> > > for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not
> > > to add error event.
> > 
> > That makes sense. But I think there is also another issue here: hang in
> > case of an error. Here it was EINVAL, but there are probably other
> > reasons why usbip_recv can fail, like network error or misbehaving
> > server. This definitely should not cause the client to fail this way...
> > And also, the actual error code is lost.
> 
> I agree. I have been taking a look at it and trying to reproduce
> the same issue on my machine. I guess race condition between hub
> irq thread and driver (vhci_urb_enqueue or hcd?)
> But I'm not sure...

I've tried it some more time and one time I've got a traceback pointing
at lock_release() called from vhci_urb_dequeue, not lock_acquire(). This
get me thinking it may not be deadlock on a spinlock, but some infinite
loop. Looking at the source, I think it's about usb_hcd_flush_endpoint
looping indefinitely because vhci_urb_dequeue() exit early on this:


    spin_lock_irqsave(&vhci->lock, flags);

    priv = urb->hcpriv;
    if (!priv) {
        /* URB was never linked! or will be soon given back by
         * vhci_rx. */
        spin_unlock_irqrestore(&vhci->lock, flags);
        return -EIDRM;
    }

Adding a print there confirms it.

And I think it's because of vhci_recv_ret_submit():


    spin_lock_irqsave(&vdev->priv_lock, flags);
    urb = pickup_urb_and_free_priv(vdev, pdu->base.seqnum);

// ****** priv freed here

    spin_unlock_irqrestore(&vdev->priv_lock, flags);

    if (!urb) {
        pr_err("cannot find a urb of seqnum %u max seqnum %d\n",
            pdu->base.seqnum,
            atomic_read(&vhci_hcd->seqnum));
        usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
        return;
    }
    
    /* unpack the pdu to a urb */
    usbip_pack_pdu(pdu, urb, USBIP_RET_SUBMIT, 0);

    /* recv transfer buffer */
    if (usbip_recv_xbuff(ud, urb) < 0)
// ***** exit early here
        return;

    /* recv iso_packet_descriptor */
    if (usbip_recv_iso(ud, urb) < 0)
        return;

I'm not really sure what should happen, but I think some cleanup in case
of usbip_recv_xbuff() failure is missing. And probably in case of
usbip_recv_iso() failure too.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-10 15:32               ` Marek Marczykowski-Górecki
@ 2019-12-11  3:07                 ` Suwan Kim
  2019-12-11  3:20                   ` Marek Marczykowski-Górecki
  0 siblings, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-11  3:07 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Tue, Dec 10, 2019 at 04:32:21PM +0100, Marek Marczykowski-Górecki wrote:
> On Tue, Dec 10, 2019 at 11:25:35PM +0900, Suwan Kim wrote:
> > On Mon, Dec 09, 2019 at 03:19:59PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Mon, Dec 09, 2019 at 03:35:43PM +0900, Suwan Kim wrote:
> > > > On Mon, Dec 09, 2019 at 04:37:40AM +0100, Marek Marczykowski-Górecki wrote:
> > > > > On Mon, Dec 09, 2019 at 11:01:30AM +0900, Suwan Kim wrote:
> > > > > > On Fri, Dec 06, 2019 at 09:57:42PM +0100, Marek Marczykowski-Górecki wrote:
> > > > > > > [  212.890519] usb 1-1: recv xbuf, 42
> > > > > > 
> > > > > > This message is printed by receive error and before that, driver
> > > > > > canceled URB transmission. we need to know the exact situation
> > > > > > before this message.
> > > > > 
> > > > > I've added some more messages and found recv_size is 0.
> > > > 
> > > > That is the bug point. "size" is urb->actual_length that means
> > > > amount of data actually received from device. And "copy" is
> > > > amount of data received from usbip server. So, in this situation,
> > > > vhci-hcd received all the data from usbip server even if there
> > > > are more sg entries left. So, "copy == 0" means vhci-hcd receives
> > > > all data from the server and we should check "if (copy == 0)" in
> > > > for_each_sg() loop of usbip_recv_xbuff() to exit the loop and not
> > > > to add error event.
> > > 
> > > That makes sense. But I think there is also another issue here: hang in
> > > case of an error. Here it was EINVAL, but there are probably other
> > > reasons why usbip_recv can fail, like network error or misbehaving
> > > server. This definitely should not cause the client to fail this way...
> > > And also, the actual error code is lost.
> > 
> > I agree. I have been taking a look at it and trying to reproduce
> > the same issue on my machine. I guess race condition between hub
> > irq thread and driver (vhci_urb_enqueue or hcd?)
> > But I'm not sure...
> 
> I've tried it some more time and one time I've got a traceback pointing
> at lock_release() called from vhci_urb_dequeue, not lock_acquire(). This
> get me thinking it may not be deadlock on a spinlock, but some infinite
> loop. Looking at the source, I think it's about usb_hcd_flush_endpoint
> looping indefinitely because vhci_urb_dequeue() exit early on this:

That really makes sense. Very good debugging. I can also see the
infinite loop now.

When there is transaction error, vhci rx thread adds error event
and event handler tries to shutdown the connection and also changes
the port status of vhci that kicks hub_irq. Then hub_irq cleans up
the device resources including pending URBs (vhci_urb_dequeue).
But as you pointed out, "priv" is already released in vhci rx
thread.

> 
> 
>     spin_lock_irqsave(&vhci->lock, flags);
> 
>     priv = urb->hcpriv;
>     if (!priv) {
>         /* URB was never linked! or will be soon given back by
>          * vhci_rx. */
>         spin_unlock_irqrestore(&vhci->lock, flags);
>         return -EIDRM;
>     }
> 
> Adding a print there confirms it.
> 
> And I think it's because of vhci_recv_ret_submit():
> 
> 
>     spin_lock_irqsave(&vdev->priv_lock, flags);
>     urb = pickup_urb_and_free_priv(vdev, pdu->base.seqnum);
> 
> // ****** priv freed here
> 
>     spin_unlock_irqrestore(&vdev->priv_lock, flags);
> 
>     if (!urb) {
>         pr_err("cannot find a urb of seqnum %u max seqnum %d\n",
>             pdu->base.seqnum,
>             atomic_read(&vhci_hcd->seqnum));
>         usbip_event_add(ud, VDEV_EVENT_ERROR_TCP);
>         return;
>     }
>     
>     /* unpack the pdu to a urb */
>     usbip_pack_pdu(pdu, urb, USBIP_RET_SUBMIT, 0);
> 
>     /* recv transfer buffer */
>     if (usbip_recv_xbuff(ud, urb) < 0)
> // ***** exit early here
>         return;
> 
>     /* recv iso_packet_descriptor */
>     if (usbip_recv_iso(ud, urb) < 0)
>         return;
> 
> I'm not really sure what should happen, but I think some cleanup in case
> of usbip_recv_xbuff() failure is missing. And probably in case of
> usbip_recv_iso() failure too.

HCD should giveback URB to driver calling usb_hcd_giveback_urb().
But in the case of transaction error, vhci_recv_ret_submit()
terminates without giveback URB. So, I think the error path in
usbip_recv_xbuff and usbip_recv_iso should unlink URB from ep and
insert proper error code in urb->status that indicates error
status to driver and handle giveback URB to driver.

Then hub_irq doesn't need to flush error URB.
That will be helpful to graceful connection shutdown.


static void vhci_recv_ret_submit(struct vhci_device *vdev,
				 struct usbip_header *pdu)
...
...
	if (usbip_recv_xbuff(ud, urb) < 0) {
		urb->status = -EPIPE;
		goto error;	// goto error path
	}

	/* recv iso_packet_descriptor */
	if (usbip_recv_iso(ud, urb) < 0) {
		urb->status = -EPIPE;
		goto error;	// goto error path
	}
...
...
error://error path
	spin_lock_irqsave(&vhci->lock, flags);
	usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
	spin_unlock_irqrestore(&vhci->lock, flags);

	usb_hcd_giveback_urb(vhci_hcd_to_hcd(vhci_hcd), urb, urb->status);

	usbip_dbg_vhci_rx("Leave\n");
}

Regards,
Suwan Kim

> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-11  3:07                 ` Suwan Kim
@ 2019-12-11  3:20                   ` Marek Marczykowski-Górecki
  2019-12-11  4:53                     ` Suwan Kim
  2019-12-11  6:27                     ` Suwan Kim
  0 siblings, 2 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-11  3:20 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]

On Wed, Dec 11, 2019 at 12:07:54PM +0900, Suwan Kim wrote:
> HCD should giveback URB to driver calling usb_hcd_giveback_urb().
> But in the case of transaction error, vhci_recv_ret_submit()
> terminates without giveback URB. So, I think the error path in
> usbip_recv_xbuff and usbip_recv_iso should unlink URB from ep and
> insert proper error code in urb->status that indicates error
> status to driver and handle giveback URB to driver.
> 
> Then hub_irq doesn't need to flush error URB.
> That will be helpful to graceful connection shutdown.
> 
> 
> static void vhci_recv_ret_submit(struct vhci_device *vdev,
> 				 struct usbip_header *pdu)
> ...
> ...
> 	if (usbip_recv_xbuff(ud, urb) < 0) {
> 		urb->status = -EPIPE;
> 		goto error;	// goto error path
> 	}
> 
> 	/* recv iso_packet_descriptor */
> 	if (usbip_recv_iso(ud, urb) < 0) {
> 		urb->status = -EPIPE;
> 		goto error;	// goto error path
> 	}
> ...
> ...
> error://error path
> 	spin_lock_irqsave(&vhci->lock, flags);
> 	usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
> 	spin_unlock_irqrestore(&vhci->lock, flags);
> 
> 	usb_hcd_giveback_urb(vhci_hcd_to_hcd(vhci_hcd), urb, urb->status);
> 
> 	usbip_dbg_vhci_rx("Leave\n");
> }

Yup, that works! Now the error is handled gracefully instead of hanging.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-11  3:20                   ` Marek Marczykowski-Górecki
@ 2019-12-11  4:53                     ` Suwan Kim
  2019-12-11  6:27                     ` Suwan Kim
  1 sibling, 0 replies; 17+ messages in thread
From: Suwan Kim @ 2019-12-11  4:53 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Wed, Dec 11, 2019 at 04:20:32AM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Dec 11, 2019 at 12:07:54PM +0900, Suwan Kim wrote:
> > HCD should giveback URB to driver calling usb_hcd_giveback_urb().
> > But in the case of transaction error, vhci_recv_ret_submit()
> > terminates without giveback URB. So, I think the error path in
> > usbip_recv_xbuff and usbip_recv_iso should unlink URB from ep and
> > insert proper error code in urb->status that indicates error
> > status to driver and handle giveback URB to driver.
> > 
> > Then hub_irq doesn't need to flush error URB.
> > That will be helpful to graceful connection shutdown.
> > 
> > 
> > static void vhci_recv_ret_submit(struct vhci_device *vdev,
> > 				 struct usbip_header *pdu)
> > ...
> > ...
> > 	if (usbip_recv_xbuff(ud, urb) < 0) {
> > 		urb->status = -EPIPE;
> > 		goto error;	// goto error path
> > 	}
> > 
> > 	/* recv iso_packet_descriptor */
> > 	if (usbip_recv_iso(ud, urb) < 0) {
> > 		urb->status = -EPIPE;
> > 		goto error;	// goto error path
> > 	}
> > ...
> > ...
> > error://error path
> > 	spin_lock_irqsave(&vhci->lock, flags);
> > 	usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
> > 	spin_unlock_irqrestore(&vhci->lock, flags);
> > 
> > 	usb_hcd_giveback_urb(vhci_hcd_to_hcd(vhci_hcd), urb, urb->status);
> > 
> > 	usbip_dbg_vhci_rx("Leave\n");
> > }
> 
> Yup, that works! Now the error is handled gracefully instead of hanging.

Ok, good! I will send a patch including Reported-by tag with you :)
Thanks for reporting the bugs.

Regards,
Suwan Kim

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-11  3:20                   ` Marek Marczykowski-Górecki
  2019-12-11  4:53                     ` Suwan Kim
@ 2019-12-11  6:27                     ` Suwan Kim
  2019-12-11 11:01                       ` Marek Marczykowski-Górecki
  1 sibling, 1 reply; 17+ messages in thread
From: Suwan Kim @ 2019-12-11  6:27 UTC (permalink / raw)
  To: Marek Marczykowski-Górecki; +Cc: linux-usb, Shuah Khan, Valentina Manea

On Wed, Dec 11, 2019 at 04:20:32AM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Dec 11, 2019 at 12:07:54PM +0900, Suwan Kim wrote:
> > HCD should giveback URB to driver calling usb_hcd_giveback_urb().
> > But in the case of transaction error, vhci_recv_ret_submit()
> > terminates without giveback URB. So, I think the error path in
> > usbip_recv_xbuff and usbip_recv_iso should unlink URB from ep and
> > insert proper error code in urb->status that indicates error
> > status to driver and handle giveback URB to driver.
> > 
> > Then hub_irq doesn't need to flush error URB.
> > That will be helpful to graceful connection shutdown.
> > 
> > 
> > static void vhci_recv_ret_submit(struct vhci_device *vdev,
> > 				 struct usbip_header *pdu)
> > ...
> > ...
> > 	if (usbip_recv_xbuff(ud, urb) < 0) {
> > 		urb->status = -EPIPE;
> > 		goto error;	// goto error path
> > 	}
> > 
> > 	/* recv iso_packet_descriptor */
> > 	if (usbip_recv_iso(ud, urb) < 0) {
> > 		urb->status = -EPIPE;
> > 		goto error;	// goto error path
> > 	}
> > ...
> > ...
> > error://error path
> > 	spin_lock_irqsave(&vhci->lock, flags);
> > 	usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
> > 	spin_unlock_irqrestore(&vhci->lock, flags);
> > 
> > 	usb_hcd_giveback_urb(vhci_hcd_to_hcd(vhci_hcd), urb, urb->status);
> > 
> > 	usbip_dbg_vhci_rx("Leave\n");
> > }
> 
> Yup, that works! Now the error is handled gracefully instead of hanging.

Marek, Could you test if it works too?

Regards,
Suwan Kim


diff --git a/drivers/usb/usbip/vhci_rx.c b/drivers/usb/usbip/vhci_rx.c
index 33f8972ba842..19a807e398eb 100644
--- a/drivers/usb/usbip/vhci_rx.c
+++ b/drivers/usb/usbip/vhci_rx.c
@@ -78,11 +78,11 @@ static void vhci_recv_ret_submit(struct vhci_device *vdev,

        /* recv transfer buffer */
        if (usbip_recv_xbuff(ud, urb) < 0)
-               return;
+               urb->status = -EPIPE;

        /* recv iso_packet_descriptor */
        if (usbip_recv_iso(ud, urb) < 0)
-               return;
+               urb->status = -EPIPE;

        /* restore the padding in iso packets */
        usbip_pad_iso(ud, urb);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock
  2019-12-11  6:27                     ` Suwan Kim
@ 2019-12-11 11:01                       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 17+ messages in thread
From: Marek Marczykowski-Górecki @ 2019-12-11 11:01 UTC (permalink / raw)
  To: Suwan Kim; +Cc: linux-usb, Shuah Khan, Valentina Manea

[-- Attachment #1: Type: text/plain, Size: 2595 bytes --]

On Wed, Dec 11, 2019 at 03:27:05PM +0900, Suwan Kim wrote:
> On Wed, Dec 11, 2019 at 04:20:32AM +0100, Marek Marczykowski-Górecki wrote:
> > On Wed, Dec 11, 2019 at 12:07:54PM +0900, Suwan Kim wrote:
> > > HCD should giveback URB to driver calling usb_hcd_giveback_urb().
> > > But in the case of transaction error, vhci_recv_ret_submit()
> > > terminates without giveback URB. So, I think the error path in
> > > usbip_recv_xbuff and usbip_recv_iso should unlink URB from ep and
> > > insert proper error code in urb->status that indicates error
> > > status to driver and handle giveback URB to driver.
> > > 
> > > Then hub_irq doesn't need to flush error URB.
> > > That will be helpful to graceful connection shutdown.
> > > 
> > > 
> > > static void vhci_recv_ret_submit(struct vhci_device *vdev,
> > > 				 struct usbip_header *pdu)
> > > ...
> > > ...
> > > 	if (usbip_recv_xbuff(ud, urb) < 0) {
> > > 		urb->status = -EPIPE;
> > > 		goto error;	// goto error path
> > > 	}
> > > 
> > > 	/* recv iso_packet_descriptor */
> > > 	if (usbip_recv_iso(ud, urb) < 0) {
> > > 		urb->status = -EPIPE;
> > > 		goto error;	// goto error path
> > > 	}
> > > ...
> > > ...
> > > error://error path
> > > 	spin_lock_irqsave(&vhci->lock, flags);
> > > 	usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb);
> > > 	spin_unlock_irqrestore(&vhci->lock, flags);
> > > 
> > > 	usb_hcd_giveback_urb(vhci_hcd_to_hcd(vhci_hcd), urb, urb->status);
> > > 
> > > 	usbip_dbg_vhci_rx("Leave\n");
> > > }
> > 
> > Yup, that works! Now the error is handled gracefully instead of hanging.
> 
> Marek, Could you test if it works too?

Yes, this seems to work too.


> Regards,
> Suwan Kim
> 
> 
> diff --git a/drivers/usb/usbip/vhci_rx.c b/drivers/usb/usbip/vhci_rx.c
> index 33f8972ba842..19a807e398eb 100644
> --- a/drivers/usb/usbip/vhci_rx.c
> +++ b/drivers/usb/usbip/vhci_rx.c
> @@ -78,11 +78,11 @@ static void vhci_recv_ret_submit(struct vhci_device *vdev,
> 
>         /* recv transfer buffer */
>         if (usbip_recv_xbuff(ud, urb) < 0)
> -               return;
> +               urb->status = -EPIPE;
> 
>         /* recv iso_packet_descriptor */
>         if (usbip_recv_iso(ud, urb) < 0)
> -               return;
> +               urb->status = -EPIPE;
> 
>         /* restore the padding in iso packets */
>         usbip_pad_iso(ud, urb);

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-12-11 11:01 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-06  3:24 "usbip: Implement SG support to vhci-hcd and stub driver" causes a deadlock Marek Marczykowski-Górecki
2019-12-06  6:50 ` Suwan Kim
2019-12-06 20:57   ` Marek Marczykowski-Górecki
2019-12-06 21:12     ` Shuah Khan
2019-12-07  0:58       ` Marek Marczykowski-Górecki
2019-12-07 18:45         ` Marek Marczykowski-Górecki
2019-12-09  2:01     ` Suwan Kim
2019-12-09  3:37       ` Marek Marczykowski-Górecki
2019-12-09  6:35         ` Suwan Kim
2019-12-09 14:19           ` Marek Marczykowski-Górecki
2019-12-10 14:25             ` Suwan Kim
2019-12-10 15:32               ` Marek Marczykowski-Górecki
2019-12-11  3:07                 ` Suwan Kim
2019-12-11  3:20                   ` Marek Marczykowski-Górecki
2019-12-11  4:53                     ` Suwan Kim
2019-12-11  6:27                     ` Suwan Kim
2019-12-11 11:01                       ` Marek Marczykowski-Górecki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).