linux-media.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* hdpvr mutex deadlock on kernel 5.1.x
@ 2019-06-15 20:06 Keith Pyle
  2019-06-17 10:22 ` Hans Verkuil
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Pyle @ 2019-06-15 20:06 UTC (permalink / raw)
  To: Linux Media Mailing List

We have observed a hard mutex deadlock in the hdpvr driver on both 5.1.8
and 5.1.10.  The problem occurs while reading from the HD-PVR device
and results in an unkillable user process.  It is readily reproducible.

The problem has been reproduced on two different systems and with two
different HD-PVR devices using the 0x1e firmware.

We're not particularly familiar with the hdpvr code and could use
advice/assistance on this problem.  We're certainly willing to test patches.

As of this writing, origin/master is
0011572c883082a95e02d47f45fc4a42dc0e8634 (a commit in Linus' tree).

`git log v5.1.10..origin/master --grep=hdpvr --` returns no output, so
there appear to be no post-5.1 commits mentioning the hdpvr directly.

`git log v5.1.10..origin/master -- drivers/media/usb/hdpvr/` shows
only some comment updates and a strncpy change.  There's nothing that
appears locking related.

-----

Steps taken to characterize and demonstrate the problem:

We enabled kernel lock debugging using these options:

CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

The kernel was built with:

gcc version 7.4.0 (Gentoo Hardened 7.4.0-r1 p1.2)
GNU ld (Gentoo 2.31.1 p5) 2.31.1

Three snippets of lockdep output are included below.

1. On opening the device, lockdep reported the following:

[  113.195852] ------------[ cut here ]------------
[  113.195869] do not call blocking ops when !TASK_RUNNING; state=1 set 
at [<00000000e54b9652>] prepare_to_wait_event+0xa7/0xe5
[  113.195885] WARNING: CPU: 2 PID: 2616 at __might_sleep+0x52/0x65
[  113.195889] Modules linked in: hdpvr x86_pkg_temp_thermal efivarfs
[  113.195902] CPU: 2 PID: 2616 Comm: cat Not tainted 5.1.10-ld+ #4
[  113.195906] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 
1.60 07/11/2017
[  113.195914] RIP: 0010:__might_sleep+0x52/0x65
[  113.195920] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 
c7 c7 ce 46 11 b8 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd 
ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd ff ff 83 fe 07
[  113.195925] RSP: 0018:ffffb73b0057fd30 EFLAGS: 00010282
[  113.195930] RAX: 0000000000000070 RBX: ffffffffb81164a0 RCX: 
0000000000000007
[  113.195935] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 
ffff9ed35eb15610
[  113.195939] RBP: 000000000000038c R08: 0000000000000001 R09: 
000000000000000c
[  113.195943] R10: ffffb73b0057fe18 R11: 0000008391c86b93 R12: 
0000000000000000
[  113.195947] R13: 0000000000020000 R14: 0000000000000002 R15: 
0000000000000000
[  113.195953] FS:  00007fa6eee2c580(0000) GS:ffff9ed35eb00000(0000) 
knlGS:0000000000000000
[  113.195958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  113.195963] CR2: 000056533197de84 CR3: 00000004425de002 CR4: 
00000000001606e0
[  113.195967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  113.195972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  113.195975] Call Trace:
[  113.195985]  __mutex_lock+0x47/0x7ac
[  113.195997]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  113.196005]  ? _raw_spin_unlock_irqrestore+0x39/0x4a
[  113.196013]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  113.196020]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  113.196029]  hdpvr_read+0x174/0x450 [hdpvr]
[  113.196037]  ? wait_woken+0x86/0x86
[  113.196044]  v4l2_read+0x38/0x7a
[  113.196052]  vfs_read+0xad/0x136
[  113.196059]  ksys_read+0x53/0x95
[  113.196066]  do_syscall_64+0x52/0x155
[  113.196074]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  113.196079] RIP: 0033:0x7fa6eed52005
[  113.196085] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 
00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 
05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
[  113.196089] RSP: 002b:00007fffd97d5630 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
[  113.196096] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 
00007fa6eed52005
[  113.196100] RDX: 0000000000020000 RSI: 00007fa6eee31000 RDI: 
0000000000000003
[  113.196104] RBP: 0000000000020000 R08: 00000000ffffffff R09: 
0000000000000000
[  113.196109] R10: 0000000000000022 R11: 0000000000000246 R12: 
00007fa6eee31000
[  113.196113] R13: 0000000000000003 R14: 0000000000020000 R15: 
0000000000000000
[  113.196119] irq event stamp: 1514
[  113.196128] hardirqs last  enabled at (1513): [<ffffffffb714832b>] 
console_unlock+0x493/0x4da
[  113.196134] hardirqs last disabled at (1514): [<ffffffffb7001460>] 
trace_hardirqs_off_thunk+0x1a/0x1c
[  113.196140] softirqs last  enabled at (1500): [<ffffffffb7c00376>] 
__do_softirq+0x376/0x3b2
[  113.196148] softirqs last disabled at (1491): [<ffffffffb70f1ea2>] 
irq_exit+0x4e/0x9d
[  113.196152] ---[ end trace 0881f93401450644 ]---

2. A short capture from the device proceeded successfully and lockdep
reported the following on turning the device off:

[  227.915296] usb 1-10: USB disconnect, device number 9
[  227.917973] hdpvr 1-10:1.0: device video0 disconnected

[  228.024287] ======================================================
[  228.024291] WARNING: possible circular locking dependency detected
[  228.024296] 5.1.10-ld+ #4 Tainted: G        W
[  228.024300] ------------------------------------------------------
[  228.024304] kworker/3:1/41 is trying to acquire lock:
[  228.024308] 000000002a90d673 ((work_completion)(&dev->worker)){+.+.}, 
at: __flush_work+0x3d/0x245
[  228.024322]
                but task is already holding lock:
[  228.024326] 000000009cff49c4 (&dev->io_mutex){+.+.}, at: 
hdpvr_device_release+0x22/0x89 [hdpvr]
[  228.024337]
                which lock already depends on the new lock.

[  228.024341]
                the existing dependency chain (in reverse order) is:
[  228.024344]
                -> #1 (&dev->io_mutex){+.+.}:
[  228.024354]        __mutex_lock+0x81/0x7ac
[  228.024360]        hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
[  228.024366]        process_one_work+0x2b9/0x4db
[  228.024372]        worker_thread+0x1d1/0x2a5
[  228.024377]        kthread+0x11c/0x124
[  228.024383]        ret_from_fork+0x3a/0x50
[  228.024386]
                -> #0 ((work_completion)(&dev->worker)){+.+.}:
[  228.024395]        lock_acquire+0x14f/0x17a
[  228.024400]        __flush_work+0x60/0x245
[  228.024406]        hdpvr_device_release+0x2e/0x89 [hdpvr]
[  228.024413]        v4l2_device_release+0xaa/0xc0
[  228.024420]        device_release+0x52/0x7a
[  228.024427]        kobject_put+0x78/0x8f
[  228.024433]        hdpvr_disconnect+0xd1/0xdd [hdpvr]
[  228.024441]        usb_unbind_interface+0x8a/0x1c5
[  228.024447]        device_release_driver_internal+0xe7/0x198
[  228.024452]        bus_remove_device+0xf8/0x10d
[  228.024459]        device_del+0x19e/0x2c4
[  228.024465]        usb_disable_device+0x7b/0x18d
[  228.024470]        usb_disconnect+0x90/0x190
[  228.024475]        hub_event+0x6c4/0xfc4
[  228.024481]        process_one_work+0x2b9/0x4db
[  228.024487]        worker_thread+0x1d1/0x2a5
[  228.024491]        kthread+0x11c/0x124
[  228.024497]        ret_from_fork+0x3a/0x50
[  228.024500]
                other info that might help us debug this:

[  228.024503]  Possible unsafe locking scenario:

[  228.024506]        CPU0                    CPU1
[  228.024509]        ----                    ----
[  228.024512]   lock(&dev->io_mutex);
[  228.024516] lock((work_completion)(&dev->worker));
[  228.024520] lock(&dev->io_mutex);
[  228.024524]   lock((work_completion)(&dev->worker));
[  228.024528]
                 *** DEADLOCK ***

[  228.024532] 6 locks held by kworker/3:1/41:
[  228.024535]  #0: 00000000d157af1b ((wq_completion)usb_hub_wq){+.+.}, 
at: process_one_work+0x187/0x4db
[  228.024545]  #1: 000000009ea387d9 
((work_completion)(&hub->events)){+.+.}, at: process_one_work+0x187/0x4db
[  228.024553]  #2: 00000000cf24eaef (&dev->mutex){....}, at: 
hub_event+0x5b/0xfc4
[  228.024561]  #3: 0000000096a7bf78 (&dev->mutex){....}, at: 
usb_disconnect+0x55/0x190
[  228.024568]  #4: 00000000681cf121 (&dev->mutex){....}, at: 
device_release_driver_internal+0x15/0x198
[  228.024576]  #5: 000000009cff49c4 (&dev->io_mutex){+.+.}, at: 
hdpvr_device_release+0x22/0x89 [hdpvr]
[  228.024585]
                stack backtrace:
[  228.024592] CPU: 3 PID: 41 Comm: kworker/3:1 Tainted: G W         
5.1.10-ld+ #4
[  228.024595] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 
1.60 07/11/2017
[  228.024602] Workqueue: usb_hub_wq hub_event
[  228.024606] Call Trace:
[  228.024614]  dump_stack+0x67/0x8e
[  228.024622]  print_circular_bug.isra.21+0x265/0x272
[  228.024629]  check_prev_add.constprop.29+0x479/0xa1c
[  228.024636]  ? add_lock_to_list.isra.13+0x28/0xc1
[  228.024642]  ? __lock_acquire+0xe83/0xf65
[  228.024647]  ? alloc_lock_chain+0x11/0x34
[  228.024652]  __lock_acquire+0xe83/0xf65
[  228.024658]  lock_acquire+0x14f/0x17a
[  228.024664]  ? __flush_work+0x3d/0x245
[  228.024670]  __flush_work+0x60/0x245
[  228.024675]  ? __flush_work+0x3d/0x245
[  228.024682]  ? hdpvr_device_release+0x22/0x89 [hdpvr]
[  228.024690]  ? kfree+0x196/0x217
[  228.024696]  ? hdpvr_free_queue+0x68/0x73 [hdpvr]
[  228.024703]  hdpvr_device_release+0x2e/0x89 [hdpvr]
[  228.024708]  v4l2_device_release+0xaa/0xc0
[  228.024716]  device_release+0x52/0x7a
[  228.024723]  kobject_put+0x78/0x8f
[  228.024729]  hdpvr_disconnect+0xd1/0xdd [hdpvr]
[  228.024738]  usb_unbind_interface+0x8a/0x1c5
[  228.024745]  device_release_driver_internal+0xe7/0x198
[  228.024750]  bus_remove_device+0xf8/0x10d
[  228.024758]  device_del+0x19e/0x2c4
[  228.024764]  ? usb_remove_ep_devs+0x16/0x21
[  228.024771]  usb_disable_device+0x7b/0x18d
[  228.024777]  usb_disconnect+0x90/0x190
[  228.024783]  hub_event+0x6c4/0xfc4
[  228.024790]  ? process_one_work+0x187/0x4db
[  228.024796]  process_one_work+0x2b9/0x4db
[  228.024803]  ? worker_thread+0x22d/0x2a5
[  228.024809]  worker_thread+0x1d1/0x2a5
[  228.024815]  ? rescuer_thread+0x278/0x278
[  228.024820]  kthread+0x11c/0x124
[  228.024824]  ? kthread_park+0x71/0x71
[  228.024831]  ret_from_fork+0x3a/0x50

3. A longer capture test was then run.  After ~20 minutes, the reading 
process
deadlocked and lockdep reported the following.  A reboot was required.

[ 3196.020009] INFO: task kworker/2:4:412 blocked for more than 122 seconds.
[ 3196.020020]       Tainted: G        W         5.1.10-ld+ #4
[ 3196.020025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 3196.020032] kworker/2:4     D    0   412      2 0x80000000
[ 3196.020049] Workqueue: events hdpvr_transmit_buffers [hdpvr]
[ 3196.020055] Call Trace:
[ 3196.020069]  ? __schedule+0x8f8/0x959
[ 3196.020077]  ? __mutex_lock+0x2f7/0x7ac
[ 3196.020084]  schedule+0x69/0x81
[ 3196.020092]  schedule_preempt_disabled+0xc/0x14
[ 3196.020099]  __mutex_lock+0x4d0/0x7ac
[ 3196.020107]  ? __switch_to_asm+0x31/0x60
[ 3196.020115]  ? __switch_to_asm+0x25/0x60
[ 3196.020123]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
[ 3196.020131]  ? __switch_to_asm+0x25/0x60
[ 3196.020138]  ? __switch_to_asm+0x25/0x60
[ 3196.020146]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
[ 3196.020154]  hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
[ 3196.020164]  ? process_one_work+0x187/0x4db
[ 3196.020172]  process_one_work+0x2b9/0x4db
[ 3196.020181]  ? worker_thread+0x22d/0x2a5
[ 3196.020188]  worker_thread+0x1d1/0x2a5
[ 3196.020196]  ? rescuer_thread+0x278/0x278
[ 3196.020202]  kthread+0x11c/0x124
[ 3196.020209]  ? kthread_park+0x71/0x71
[ 3196.020217]  ret_from_fork+0x3a/0x50
[ 3196.020250] INFO: task cat_hdpvr:3736 blocked for more than 122 seconds.
[ 3196.020256]       Tainted: G        W         5.1.10-ld+ #4
[ 3196.020261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 3196.020266] cat_hdpvr       D    0  3736   3735 0x80000002
[ 3196.020273] Call Trace:
[ 3196.020281]  ? __schedule+0x8f8/0x959
[ 3196.020287]  ? __mutex_lock+0x2f7/0x7ac
[ 3196.020294]  schedule+0x69/0x81
[ 3196.020301]  schedule_preempt_disabled+0xc/0x14
[ 3196.020307]  __mutex_lock+0x4d0/0x7ac
[ 3196.020316]  ? hdpvr_release+0x24/0x5b [hdpvr]
[ 3196.020327]  ? kmem_cache_free+0x13a/0x20c
[ 3196.020334]  ? hdpvr_release+0x24/0x5b [hdpvr]
[ 3196.020341]  hdpvr_release+0x24/0x5b [hdpvr]
[ 3196.020350]  v4l2_release+0x83/0xbe
[ 3196.020360]  __fput+0xfa/0x19b
[ 3196.020369]  task_work_run+0x7a/0x9c
[ 3196.020377]  do_exit+0x4f3/0xa41
[ 3196.020385]  do_group_exit+0xad/0xad
[ 3196.020393]  __x64_sys_exit_group+0xf/0xf
[ 3196.020400]  do_syscall_64+0x52/0x155
[ 3196.020409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3196.020416] RIP: 0033:0x7f41505bdcd6
[ 3196.020427] Code: Bad RIP value.
[ 3196.020433] RSP: 002b:00007ffe864cdad8 EFLAGS: 00000246 ORIG_RAX: 
00000000000000e7
[ 3196.020442] RAX: ffffffffffffffda RBX: 00007f41506b25c0 RCX: 
00007f41505bdcd6
[ 3196.020447] RDX: 0000000000000000 RSI: 000000000000003c RDI: 
0000000000000000
[ 3196.020453] RBP: 0000000000000000 R08: 00000000000000e7 R09: 
ffffffffffffff80
[ 3196.020458] R10: 0000000000000002 R11: 0000000000000246 R12: 
00007f41506b25c0
[ 3196.020463] R13: 0000000000000001 R14: 00007f41506bb268 R15: 
0000000000000000
[ 3196.020495] INFO: lockdep is turned off.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-15 20:06 hdpvr mutex deadlock on kernel 5.1.x Keith Pyle
@ 2019-06-17 10:22 ` Hans Verkuil
  2019-06-18  4:17   ` Keith Pyle
  0 siblings, 1 reply; 9+ messages in thread
From: Hans Verkuil @ 2019-06-17 10:22 UTC (permalink / raw)
  To: Keith Pyle, Linux Media Mailing List

On 6/15/19 10:06 PM, Keith Pyle wrote:
> We have observed a hard mutex deadlock in the hdpvr driver on both 5.1.8
> and 5.1.10.  The problem occurs while reading from the HD-PVR device
> and results in an unkillable user process.  It is readily reproducible.
> 
> The problem has been reproduced on two different systems and with two
> different HD-PVR devices using the 0x1e firmware.
> 
> We're not particularly familiar with the hdpvr code and could use
> advice/assistance on this problem.  We're certainly willing to test patches.

Great! :-)

Can you try this first:

Edit drivers/media/usb/hdpvr/hdpvr-video.c, go to function hdpvr_device_release()
and remove the mutex_lock/unlock around the flush_work(&dev->worker); call. That's
definitely wrong.

Compile and try again. I expect this will at least solve 1 and 2, but I'm not sure
about 3 (read deadlock).

Regards,

	Hans

> 
> As of this writing, origin/master is
> 0011572c883082a95e02d47f45fc4a42dc0e8634 (a commit in Linus' tree).
> 
> `git log v5.1.10..origin/master --grep=hdpvr --` returns no output, so
> there appear to be no post-5.1 commits mentioning the hdpvr directly.
> 
> `git log v5.1.10..origin/master -- drivers/media/usb/hdpvr/` shows
> only some comment updates and a strncpy change.  There's nothing that
> appears locking related.
> 
> -----
> 
> Steps taken to characterize and demonstrate the problem:
> 
> We enabled kernel lock debugging using these options:
> 
> CONFIG_LOCK_DEBUGGING_SUPPORT=y
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
> CONFIG_DEBUG_RWSEMS=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> 
> The kernel was built with:
> 
> gcc version 7.4.0 (Gentoo Hardened 7.4.0-r1 p1.2)
> GNU ld (Gentoo 2.31.1 p5) 2.31.1
> 
> Three snippets of lockdep output are included below.
> 
> 1. On opening the device, lockdep reported the following:
> 
> [  113.195852] ------------[ cut here ]------------
> [  113.195869] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000e54b9652>] prepare_to_wait_event+0xa7/0xe5
> [  113.195885] WARNING: CPU: 2 PID: 2616 at __might_sleep+0x52/0x65
> [  113.195889] Modules linked in: hdpvr x86_pkg_temp_thermal efivarfs
> [  113.195902] CPU: 2 PID: 2616 Comm: cat Not tainted 5.1.10-ld+ #4
> [  113.195906] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
> [  113.195914] RIP: 0010:__might_sleep+0x52/0x65
> [  113.195920] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 c7 c7 ce 46 11 b8 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd
> ff ff 83 fe 07
> [  113.195925] RSP: 0018:ffffb73b0057fd30 EFLAGS: 00010282
> [  113.195930] RAX: 0000000000000070 RBX: ffffffffb81164a0 RCX: 0000000000000007
> [  113.195935] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff9ed35eb15610
> [  113.195939] RBP: 000000000000038c R08: 0000000000000001 R09: 000000000000000c
> [  113.195943] R10: ffffb73b0057fe18 R11: 0000008391c86b93 R12: 0000000000000000
> [  113.195947] R13: 0000000000020000 R14: 0000000000000002 R15: 0000000000000000
> [  113.195953] FS:  00007fa6eee2c580(0000) GS:ffff9ed35eb00000(0000) knlGS:0000000000000000
> [  113.195958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  113.195963] CR2: 000056533197de84 CR3: 00000004425de002 CR4: 00000000001606e0
> [  113.195967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  113.195972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  113.195975] Call Trace:
> [  113.195985]  __mutex_lock+0x47/0x7ac
> [  113.195997]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
> [  113.196005]  ? _raw_spin_unlock_irqrestore+0x39/0x4a
> [  113.196013]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
> [  113.196020]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]hdpvr_submit_buffers
> [  113.196029]  hdpvr_read+0x174/0x450 [hdpvr]
> [  113.196037]  ? wait_woken+0x86/0x86
> [  113.196044]  v4l2_read+0x38/0x7a
> [  113.196052]  vfs_read+0xad/0x136
> [  113.196059]  ksys_read+0x53/0x95
> [  113.196066]  do_syscall_64+0x52/0x155
> [  113.196074]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  113.196079] RIP: 0033:0x7fa6eed52005
> [  113.196085] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
> 25 28 00 00 00
> [  113.196089] RSP: 002b:00007fffd97d5630 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [  113.196096] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fa6eed52005
> [  113.196100] RDX: 0000000000020000 RSI: 00007fa6eee31000 RDI: 0000000000000003
> [  113.196104] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
> [  113.196109] R10: 0000000000000022 R11: 0000000000000246 R12: 00007fa6eee31000
> [  113.196113] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
> [  113.196119] irq event stamp: 1514
> [  113.196128] hardirqs last  enabled at (1513): [<ffffffffb714832b>] console_unlock+0x493/0x4da
> [  113.196134] hardirqs last disabled at (1514): [<ffffffffb7001460>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  113.196140] softirqs last  enabled at (1500): [<ffffffffb7c00376>] __do_softirq+0x376/0x3b2
> [  113.196148] softirqs last disabled at (1491): [<ffffffffb70f1ea2>] irq_exit+0x4e/0x9d
> [  113.196152] ---[ end trace 0881f93401450644 ]---
> 
> 2. A short capture from the device proceeded successfully and lockdep
> reported the following on turning the device off:
> 
> [  227.915296] usb 1-10: USB disconnect, device number 9
> [  227.917973] hdpvr 1-10:1.0: device video0 disconnected
> 
> [  228.024287] ======================================================
> [  228.024291] WARNING: possible circular locking dependency detected
> [  228.024296] 5.1.10-ld+ #4 Tainted: G        W
> [  228.024300] ------------------------------------------------------
> [  228.024304] kworker/3:1/41 is trying to acquire lock:
> [  228.024308] 000000002a90d673 ((work_completion)(&dev->worker)){+.+.}, at: __flush_work+0x3d/0x245
> [  228.024322]
>                but task is already holding lock:
> [  228.024326] 000000009cff49c4 (&dev->io_mutex){+.+.}, at: hdpvr_device_release+0x22/0x89 [hdpvr]
> [  228.024337]
>                which lock already depends on the new lock.
> 
> [  228.024341]
>                the existing dependency chain (in reverse order) is:
> [  228.024344]
>                -> #1 (&dev->io_mutex){+.+.}:
> [  228.024354]        __mutex_lock+0x81/0x7ac
> [  228.024360]        hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
> [  228.024366]        process_one_work+0x2b9/0x4db
> [  228.024372]        worker_thread+0x1d1/0x2a5
> [  228.024377]        kthread+0x11c/0x124
> [  228.024383]        ret_from_fork+0x3a/0x50
> [  228.024386]
>                -> #0 ((work_completion)(&dev->worker)){+.+.}:
> [  228.024395]        lock_acquire+0x14f/0x17a
> [  228.024400]        __flush_work+0x60/0x245
> [  228.024406]        hdpvr_device_release+0x2e/0x89 [hdpvr]
> [  228.024413]        v4l2_device_release+0xaa/0xc0
> [  228.024420]        device_release+0x52/0x7a
> [  228.024427]        kobject_put+0x78/0x8f
> [  228.024433]        hdpvr_disconnect+0xd1/0xdd [hdpvr]
> [  228.024441]        usb_unbind_interface+0x8a/0x1c5
> [  228.024447]        device_release_driver_internal+0xe7/0x198
> [  228.024452]        bus_remove_device+0xf8/0x10d
> [  228.024459]        device_del+0x19e/0x2c4
> [  228.024465]        usb_disable_device+0x7b/0x18d
> [  228.024470]        usb_disconnect+0x90/0x190
> [  228.024475]        hub_event+0x6c4/0xfc4
> [  228.024481]        process_one_work+0x2b9/0x4db
> [  228.024487]        worker_thread+0x1d1/0x2a5
> [  228.024491]        kthread+0x11c/0x124
> [  228.024497]        ret_from_fork+0x3a/0x50
> [  228.024500]
>                other info that might help us debug this:
> 
> [  228.024503]  Possible unsafe locking scenario:
> 
> [  228.024506]        CPU0                    CPU1
> [  228.024509]        ----                    ----
> [  228.024512]   lock(&dev->io_mutex);
> [  228.024516] lock((work_completion)(&dev->worker));
> [  228.024520] lock(&dev->io_mutex);
> [  228.024524]   lock((work_completion)(&dev->worker));
> [  228.024528]
>                 *** DEADLOCK ***
> 
> [  228.024532] 6 locks held by kworker/3:1/41:
> [  228.024535]  #0: 00000000d157af1b ((wq_completion)usb_hub_wq){+.+.}, at: process_one_work+0x187/0x4db
> [  228.024545]  #1: 000000009ea387d9 ((work_completion)(&hub->events)){+.+.}, at: process_one_work+0x187/0x4db
> [  228.024553]  #2: 00000000cf24eaef (&dev->mutex){....}, at: hub_event+0x5b/0xfc4
> [  228.024561]  #3: 0000000096a7bf78 (&dev->mutex){....}, at: usb_disconnect+0x55/0x190
> [  228.024568]  #4: 00000000681cf121 (&dev->mutex){....}, at: device_release_driver_internal+0x15/0x198
> [  228.024576]  #5: 000000009cff49c4 (&dev->io_mutex){+.+.}, at: hdpvr_device_release+0x22/0x89 [hdpvr]
> [  228.024585]
>                stack backtrace:
> [  228.024592] CPU: 3 PID: 41 Comm: kworker/3:1 Tainted: G W         5.1.10-ld+ #4
> [  228.024595] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
> [  228.024602] Workqueue: usb_hub_wq hub_event
> [  228.024606] Call Trace:
> [  228.024614]  dump_stack+0x67/0x8e
> [  228.024622]  print_circular_bug.isra.21+0x265/0x272
> [  228.024629]  check_prev_add.constprop.29+0x479/0xa1c
> [  228.024636]  ? add_lock_to_list.isra.13+0x28/0xc1
> [  228.024642]  ? __lock_acquire+0xe83/0xf65
> [  228.024647]  ? alloc_lock_chain+0x11/0x34
> [  228.024652]  __lock_acquire+0xe83/0xf65
> [  228.024658]  lock_acquire+0x14f/0x17a
> [  228.024664]  ? __flush_work+0x3d/0x245
> [  228.024670]  __flush_work+0x60/0x245
> [  228.024675]  ? __flush_work+0x3d/0x245
> [  228.024682]  ? hdpvr_device_release+0x22/0x89 [hdpvr]
> [  228.024690]  ? kfree+0x196/0x217
> [  228.024696]  ? hdpvr_free_queue+0x68/0x73 [hdpvr]
> [  228.024703]  hdpvr_device_release+0x2e/0x89 [hdpvr]
> [  228.024708]  v4l2_device_release+0xaa/0xc0
> [  228.024716]  device_release+0x52/0x7a
> [  228.024723]  kobject_put+0x78/0x8f
> [  228.024729]  hdpvr_disconnect+0xd1/0xdd [hdpvr]
> [  228.024738]  usb_unbind_interface+0x8a/0x1c5
> [  228.024745]  device_release_driver_internal+0xe7/0x198
> [  228.024750]  bus_remove_device+0xf8/0x10d
> [  228.024758]  device_del+0x19e/0x2c4
> [  228.024764]  ? usb_remove_ep_devs+0x16/0x21
> [  228.024771]  usb_disable_device+0x7b/0x18d
> [  228.024777]  usb_disconnect+0x90/0x190
> [  228.024783]  hub_event+0x6c4/0xfc4
> [  228.024790]  ? process_one_work+0x187/0x4db
> [  228.024796]  process_one_work+0x2b9/0x4db
> [  228.024803]  ? worker_thread+0x22d/0x2a5
> [  228.024809]  worker_thread+0x1d1/0x2a5
> [  228.024815]  ? rescuer_thread+0x278/0x278
> [  228.024820]  kthread+0x11c/0x124
> [  228.024824]  ? kthread_park+0x71/0x71
> [  228.024831]  ret_from_fork+0x3a/0x50
> 
> 3. A longer capture test was then run.  After ~20 minutes, the reading process
> deadlocked and lockdep reported the following.  A reboot was required.
> 
> [ 3196.020009] INFO: task kworker/2:4:412 blocked for more than 122 seconds.
> [ 3196.020020]       Tainted: G        W         5.1.10-ld+ #4
> [ 3196.020025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3196.020032] kworker/2:4     D    0   412      2 0x80000000
> [ 3196.020049] Workqueue: events hdpvr_transmit_buffers [hdpvr]
> [ 3196.020055] Call Trace:
> [ 3196.020069]  ? __schedule+0x8f8/0x959
> [ 3196.020077]  ? __mutex_lock+0x2f7/0x7ac
> [ 3196.020084]  schedule+0x69/0x81
> [ 3196.020092]  schedule_preempt_disabled+0xc/0x14
> [ 3196.020099]  __mutex_lock+0x4d0/0x7ac
> [ 3196.020107]  ? __switch_to_asm+0x31/0x60
> [ 3196.020115]  ? __switch_to_asm+0x25/0x60
> [ 3196.020123]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
> [ 3196.020131]  ? __switch_to_asm+0x25/0x60
> [ 3196.020138]  ? __switch_to_asm+0x25/0x60
> [ 3196.020146]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
> [ 3196.020154]  hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
> [ 3196.020164]  ? process_one_work+0x187/0x4db
> [ 3196.020172]  process_one_work+0x2b9/0x4db
> [ 3196.020181]  ? worker_thread+0x22d/0x2a5
> [ 3196.020188]  worker_thread+0x1d1/0x2a5
> [ 3196.020196]  ? rescuer_thread+0x278/0x278
> [ 3196.020202]  kthread+0x11c/0x124
> [ 3196.020209]  ? kthread_park+0x71/0x71
> [ 3196.020217]  ret_from_fork+0x3a/0x50
> [ 3196.020250] INFO: task cat_hdpvr:3736 blocked for more than 122 seconds.
> [ 3196.020256]       Tainted: G        W         5.1.10-ld+ #4
> [ 3196.020261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3196.020266] cat_hdpvr       D    0  3736   3735 0x80000002
> [ 3196.020273] Call Trace:
> [ 3196.020281]  ? __schedule+0x8f8/0x959
> [ 3196.020287]  ? __mutex_lock+0x2f7/0x7ac
> [ 3196.020294]  schedule+0x69/0x81
> [ 3196.020301]  schedule_preempt_disabled+0xc/0x14
> [ 3196.020307]  __mutex_lock+0x4d0/0x7ac
> [ 3196.020316]  ? hdpvr_release+0x24/0x5b [hdpvr]
> [ 3196.020327]  ? kmem_cache_free+0x13a/0x20c
> [ 3196.020334]  ? hdpvr_release+0x24/0x5b [hdpvr]
> [ 3196.020341]  hdpvr_release+0x24/0x5b [hdpvr]
> [ 3196.020350]  v4l2_release+0x83/0xbe
> [ 3196.020360]  __fput+0xfa/0x19b
> [ 3196.020369]  task_work_run+0x7a/0x9c
> [ 3196.020377]  do_exit+0x4f3/0xa41
> [ 3196.020385]  do_group_exit+0xad/0xad
> [ 3196.020393]  __x64_sys_exit_group+0xf/0xf
> [ 3196.020400]  do_syscall_64+0x52/0x155
> [ 3196.020409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 3196.020416] RIP: 0033:0x7f41505bdcd6
> [ 3196.020427] Code: Bad RIP value.
> [ 3196.020433] RSP: 002b:00007ffe864cdad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> [ 3196.020442] RAX: ffffffffffffffda RBX: 00007f41506b25c0 RCX: 00007f41505bdcd6
> [ 3196.020447] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
> [ 3196.020453] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
> [ 3196.020458] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f41506b25c0
> [ 3196.020463] R13: 0000000000000001 R14: 00007f41506bb268 R15: 0000000000000000
> [ 3196.020495] INFO: lockdep is turned off.
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-17 10:22 ` Hans Verkuil
@ 2019-06-18  4:17   ` Keith Pyle
  2019-06-18  7:16     ` Hans Verkuil
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Pyle @ 2019-06-18  4:17 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List

On 06/17/19 05:22, Hans Verkuil wrote:
> On 6/15/19 10:06 PM, Keith Pyle wrote:
>> We have observed a hard mutex deadlock in the hdpvr driver on both 5.1.8
>> and 5.1.10.  The problem occurs while reading from the HD-PVR device
>> and results in an unkillable user process.  It is readily reproducible.
>>
>> The problem has been reproduced on two different systems and with two
>> different HD-PVR devices using the 0x1e firmware.
>>
>> We're not particularly familiar with the hdpvr code and could use
>> advice/assistance on this problem.  We're certainly willing to test patches.
> Great! :-)
>
> Can you try this first:
>
> Edit drivers/media/usb/hdpvr/hdpvr-video.c, go to function hdpvr_device_release()
> and remove the mutex_lock/unlock around the flush_work(&dev->worker); call. That's
> definitely wrong.
>
> Compile and try again. I expect this will at least solve 1 and 2, but I'm not sure
> about 3 (read deadlock).
>
> Regards,
>
> 	Hans
>
>> As of this writing, origin/master is
>> 0011572c883082a95e02d47f45fc4a42dc0e8634 (a commit in Linus' tree).
>>
>> `git log v5.1.10..origin/master --grep=hdpvr --` returns no output, so
>> there appear to be no post-5.1 commits mentioning the hdpvr directly.
>>
>> `git log v5.1.10..origin/master -- drivers/media/usb/hdpvr/` shows
>> only some comment updates and a strncpy change.  There's nothing that
>> appears locking related.
>>
>> -----
>>
>> Steps taken to characterize and demonstrate the problem:
>>
>> We enabled kernel lock debugging using these options:
>>
>> CONFIG_LOCK_DEBUGGING_SUPPORT=y
>> CONFIG_LOCKDEP_SUPPORT=y
>> CONFIG_LOCKDEP=y
>> CONFIG_DEBUG_RT_MUTEXES=y
>> CONFIG_DEBUG_SPINLOCK=y
>> CONFIG_DEBUG_MUTEXES=y
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
>> CONFIG_DEBUG_RWSEMS=y
>> CONFIG_DEBUG_LOCK_ALLOC=y
>> CONFIG_DEBUG_ATOMIC_SLEEP=y
>>
>> The kernel was built with:
>>
>> gcc version 7.4.0 (Gentoo Hardened 7.4.0-r1 p1.2)
>> GNU ld (Gentoo 2.31.1 p5) 2.31.1
>>
>> Three snippets of lockdep output are included below.
>>
>> 1. On opening the device, lockdep reported the following:
>>
>> [  113.195852] ------------[ cut here ]------------
>> [  113.195869] do not call blocking ops when !TASK_RUNNING; state=1 set at [<00000000e54b9652>] prepare_to_wait_event+0xa7/0xe5
>> [  113.195885] WARNING: CPU: 2 PID: 2616 at __might_sleep+0x52/0x65
>> [  113.195889] Modules linked in: hdpvr x86_pkg_temp_thermal efivarfs
>> [  113.195902] CPU: 2 PID: 2616 Comm: cat Not tainted 5.1.10-ld+ #4
>> [  113.195906] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
>> [  113.195914] RIP: 0010:__might_sleep+0x52/0x65
>> [  113.195920] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 c7 c7 ce 46 11 b8 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd
>> ff ff 83 fe 07
>> [  113.195925] RSP: 0018:ffffb73b0057fd30 EFLAGS: 00010282
>> [  113.195930] RAX: 0000000000000070 RBX: ffffffffb81164a0 RCX: 0000000000000007
>> [  113.195935] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff9ed35eb15610
>> [  113.195939] RBP: 000000000000038c R08: 0000000000000001 R09: 000000000000000c
>> [  113.195943] R10: ffffb73b0057fe18 R11: 0000008391c86b93 R12: 0000000000000000
>> [  113.195947] R13: 0000000000020000 R14: 0000000000000002 R15: 0000000000000000
>> [  113.195953] FS:  00007fa6eee2c580(0000) GS:ffff9ed35eb00000(0000) knlGS:0000000000000000
>> [  113.195958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  113.195963] CR2: 000056533197de84 CR3: 00000004425de002 CR4: 00000000001606e0
>> [  113.195967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  113.195972] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  113.195975] Call Trace:
>> [  113.195985]  __mutex_lock+0x47/0x7ac
>> [  113.195997]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
>> [  113.196005]  ? _raw_spin_unlock_irqrestore+0x39/0x4a
>> [  113.196013]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
>> [  113.196020]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]hdpvr_submit_buffers
>> [  113.196029]  hdpvr_read+0x174/0x450 [hdpvr]
>> [  113.196037]  ? wait_woken+0x86/0x86
>> [  113.196044]  v4l2_read+0x38/0x7a
>> [  113.196052]  vfs_read+0xad/0x136
>> [  113.196059]  ksys_read+0x53/0x95
>> [  113.196066]  do_syscall_64+0x52/0x155
>> [  113.196074]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  113.196079] RIP: 0033:0x7fa6eed52005
>> [  113.196085] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
>> 25 28 00 00 00
>> [  113.196089] RSP: 002b:00007fffd97d5630 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> [  113.196096] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fa6eed52005
>> [  113.196100] RDX: 0000000000020000 RSI: 00007fa6eee31000 RDI: 0000000000000003
>> [  113.196104] RBP: 0000000000020000 R08: 00000000ffffffff R09: 0000000000000000
>> [  113.196109] R10: 0000000000000022 R11: 0000000000000246 R12: 00007fa6eee31000
>> [  113.196113] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000000000
>> [  113.196119] irq event stamp: 1514
>> [  113.196128] hardirqs last  enabled at (1513): [<ffffffffb714832b>] console_unlock+0x493/0x4da
>> [  113.196134] hardirqs last disabled at (1514): [<ffffffffb7001460>] trace_hardirqs_off_thunk+0x1a/0x1c
>> [  113.196140] softirqs last  enabled at (1500): [<ffffffffb7c00376>] __do_softirq+0x376/0x3b2
>> [  113.196148] softirqs last disabled at (1491): [<ffffffffb70f1ea2>] irq_exit+0x4e/0x9d
>> [  113.196152] ---[ end trace 0881f93401450644 ]---
>>
>> 2. A short capture from the device proceeded successfully and lockdep
>> reported the following on turning the device off:
>>
>> [  227.915296] usb 1-10: USB disconnect, device number 9
>> [  227.917973] hdpvr 1-10:1.0: device video0 disconnected
>>
>> [  228.024287] ======================================================
>> [  228.024291] WARNING: possible circular locking dependency detected
>> [  228.024296] 5.1.10-ld+ #4 Tainted: G        W
>> [  228.024300] ------------------------------------------------------
>> [  228.024304] kworker/3:1/41 is trying to acquire lock:
>> [  228.024308] 000000002a90d673 ((work_completion)(&dev->worker)){+.+.}, at: __flush_work+0x3d/0x245
>> [  228.024322]
>>                 but task is already holding lock:
>> [  228.024326] 000000009cff49c4 (&dev->io_mutex){+.+.}, at: hdpvr_device_release+0x22/0x89 [hdpvr]
>> [  228.024337]
>>                 which lock already depends on the new lock.
>>
>> [  228.024341]
>>                 the existing dependency chain (in reverse order) is:
>> [  228.024344]
>>                 -> #1 (&dev->io_mutex){+.+.}:
>> [  228.024354]        __mutex_lock+0x81/0x7ac
>> [  228.024360]        hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
>> [  228.024366]        process_one_work+0x2b9/0x4db
>> [  228.024372]        worker_thread+0x1d1/0x2a5
>> [  228.024377]        kthread+0x11c/0x124
>> [  228.024383]        ret_from_fork+0x3a/0x50
>> [  228.024386]
>>                 -> #0 ((work_completion)(&dev->worker)){+.+.}:
>> [  228.024395]        lock_acquire+0x14f/0x17a
>> [  228.024400]        __flush_work+0x60/0x245
>> [  228.024406]        hdpvr_device_release+0x2e/0x89 [hdpvr]
>> [  228.024413]        v4l2_device_release+0xaa/0xc0
>> [  228.024420]        device_release+0x52/0x7a
>> [  228.024427]        kobject_put+0x78/0x8f
>> [  228.024433]        hdpvr_disconnect+0xd1/0xdd [hdpvr]
>> [  228.024441]        usb_unbind_interface+0x8a/0x1c5
>> [  228.024447]        device_release_driver_internal+0xe7/0x198
>> [  228.024452]        bus_remove_device+0xf8/0x10d
>> [  228.024459]        device_del+0x19e/0x2c4
>> [  228.024465]        usb_disable_device+0x7b/0x18d
>> [  228.024470]        usb_disconnect+0x90/0x190
>> [  228.024475]        hub_event+0x6c4/0xfc4
>> [  228.024481]        process_one_work+0x2b9/0x4db
>> [  228.024487]        worker_thread+0x1d1/0x2a5
>> [  228.024491]        kthread+0x11c/0x124
>> [  228.024497]        ret_from_fork+0x3a/0x50
>> [  228.024500]
>>                 other info that might help us debug this:
>>
>> [  228.024503]  Possible unsafe locking scenario:
>>
>> [  228.024506]        CPU0                    CPU1
>> [  228.024509]        ----                    ----
>> [  228.024512]   lock(&dev->io_mutex);
>> [  228.024516] lock((work_completion)(&dev->worker));
>> [  228.024520] lock(&dev->io_mutex);
>> [  228.024524]   lock((work_completion)(&dev->worker));
>> [  228.024528]
>>                  *** DEADLOCK ***
>>
>> [  228.024532] 6 locks held by kworker/3:1/41:
>> [  228.024535]  #0: 00000000d157af1b ((wq_completion)usb_hub_wq){+.+.}, at: process_one_work+0x187/0x4db
>> [  228.024545]  #1: 000000009ea387d9 ((work_completion)(&hub->events)){+.+.}, at: process_one_work+0x187/0x4db
>> [  228.024553]  #2: 00000000cf24eaef (&dev->mutex){....}, at: hub_event+0x5b/0xfc4
>> [  228.024561]  #3: 0000000096a7bf78 (&dev->mutex){....}, at: usb_disconnect+0x55/0x190
>> [  228.024568]  #4: 00000000681cf121 (&dev->mutex){....}, at: device_release_driver_internal+0x15/0x198
>> [  228.024576]  #5: 000000009cff49c4 (&dev->io_mutex){+.+.}, at: hdpvr_device_release+0x22/0x89 [hdpvr]
>> [  228.024585]
>>                 stack backtrace:
>> [  228.024592] CPU: 3 PID: 41 Comm: kworker/3:1 Tainted: G W         5.1.10-ld+ #4
>> [  228.024595] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
>> [  228.024602] Workqueue: usb_hub_wq hub_event
>> [  228.024606] Call Trace:
>> [  228.024614]  dump_stack+0x67/0x8e
>> [  228.024622]  print_circular_bug.isra.21+0x265/0x272
>> [  228.024629]  check_prev_add.constprop.29+0x479/0xa1c
>> [  228.024636]  ? add_lock_to_list.isra.13+0x28/0xc1
>> [  228.024642]  ? __lock_acquire+0xe83/0xf65
>> [  228.024647]  ? alloc_lock_chain+0x11/0x34
>> [  228.024652]  __lock_acquire+0xe83/0xf65
>> [  228.024658]  lock_acquire+0x14f/0x17a
>> [  228.024664]  ? __flush_work+0x3d/0x245
>> [  228.024670]  __flush_work+0x60/0x245
>> [  228.024675]  ? __flush_work+0x3d/0x245
>> [  228.024682]  ? hdpvr_device_release+0x22/0x89 [hdpvr]
>> [  228.024690]  ? kfree+0x196/0x217
>> [  228.024696]  ? hdpvr_free_queue+0x68/0x73 [hdpvr]
>> [  228.024703]  hdpvr_device_release+0x2e/0x89 [hdpvr]
>> [  228.024708]  v4l2_device_release+0xaa/0xc0
>> [  228.024716]  device_release+0x52/0x7a
>> [  228.024723]  kobject_put+0x78/0x8f
>> [  228.024729]  hdpvr_disconnect+0xd1/0xdd [hdpvr]
>> [  228.024738]  usb_unbind_interface+0x8a/0x1c5
>> [  228.024745]  device_release_driver_internal+0xe7/0x198
>> [  228.024750]  bus_remove_device+0xf8/0x10d
>> [  228.024758]  device_del+0x19e/0x2c4
>> [  228.024764]  ? usb_remove_ep_devs+0x16/0x21
>> [  228.024771]  usb_disable_device+0x7b/0x18d
>> [  228.024777]  usb_disconnect+0x90/0x190
>> [  228.024783]  hub_event+0x6c4/0xfc4
>> [  228.024790]  ? process_one_work+0x187/0x4db
>> [  228.024796]  process_one_work+0x2b9/0x4db
>> [  228.024803]  ? worker_thread+0x22d/0x2a5
>> [  228.024809]  worker_thread+0x1d1/0x2a5
>> [  228.024815]  ? rescuer_thread+0x278/0x278
>> [  228.024820]  kthread+0x11c/0x124
>> [  228.024824]  ? kthread_park+0x71/0x71
>> [  228.024831]  ret_from_fork+0x3a/0x50
>>
>> 3. A longer capture test was then run.  After ~20 minutes, the reading process
>> deadlocked and lockdep reported the following.  A reboot was required.
>>
>> [ 3196.020009] INFO: task kworker/2:4:412 blocked for more than 122 seconds.
>> [ 3196.020020]       Tainted: G        W         5.1.10-ld+ #4
>> [ 3196.020025] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 3196.020032] kworker/2:4     D    0   412      2 0x80000000
>> [ 3196.020049] Workqueue: events hdpvr_transmit_buffers [hdpvr]
>> [ 3196.020055] Call Trace:
>> [ 3196.020069]  ? __schedule+0x8f8/0x959
>> [ 3196.020077]  ? __mutex_lock+0x2f7/0x7ac
>> [ 3196.020084]  schedule+0x69/0x81
>> [ 3196.020092]  schedule_preempt_disabled+0xc/0x14
>> [ 3196.020099]  __mutex_lock+0x4d0/0x7ac
>> [ 3196.020107]  ? __switch_to_asm+0x31/0x60
>> [ 3196.020115]  ? __switch_to_asm+0x25/0x60
>> [ 3196.020123]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
>> [ 3196.020131]  ? __switch_to_asm+0x25/0x60
>> [ 3196.020138]  ? __switch_to_asm+0x25/0x60
>> [ 3196.020146]  ? hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
>> [ 3196.020154]  hdpvr_transmit_buffers+0x3c/0x26c [hdpvr]
>> [ 3196.020164]  ? process_one_work+0x187/0x4db
>> [ 3196.020172]  process_one_work+0x2b9/0x4db
>> [ 3196.020181]  ? worker_thread+0x22d/0x2a5
>> [ 3196.020188]  worker_thread+0x1d1/0x2a5
>> [ 3196.020196]  ? rescuer_thread+0x278/0x278
>> [ 3196.020202]  kthread+0x11c/0x124
>> [ 3196.020209]  ? kthread_park+0x71/0x71
>> [ 3196.020217]  ret_from_fork+0x3a/0x50
>> [ 3196.020250] INFO: task cat_hdpvr:3736 blocked for more than 122 seconds.
>> [ 3196.020256]       Tainted: G        W         5.1.10-ld+ #4
>> [ 3196.020261] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [ 3196.020266] cat_hdpvr       D    0  3736   3735 0x80000002
>> [ 3196.020273] Call Trace:
>> [ 3196.020281]  ? __schedule+0x8f8/0x959
>> [ 3196.020287]  ? __mutex_lock+0x2f7/0x7ac
>> [ 3196.020294]  schedule+0x69/0x81
>> [ 3196.020301]  schedule_preempt_disabled+0xc/0x14
>> [ 3196.020307]  __mutex_lock+0x4d0/0x7ac
>> [ 3196.020316]  ? hdpvr_release+0x24/0x5b [hdpvr]
>> [ 3196.020327]  ? kmem_cache_free+0x13a/0x20c
>> [ 3196.020334]  ? hdpvr_release+0x24/0x5b [hdpvr]
>> [ 3196.020341]  hdpvr_release+0x24/0x5b [hdpvr]
>> [ 3196.020350]  v4l2_release+0x83/0xbe
>> [ 3196.020360]  __fput+0xfa/0x19b
>> [ 3196.020369]  task_work_run+0x7a/0x9c
>> [ 3196.020377]  do_exit+0x4f3/0xa41
>> [ 3196.020385]  do_group_exit+0xad/0xad
>> [ 3196.020393]  __x64_sys_exit_group+0xf/0xf
>> [ 3196.020400]  do_syscall_64+0x52/0x155
>> [ 3196.020409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [ 3196.020416] RIP: 0033:0x7f41505bdcd6
>> [ 3196.020427] Code: Bad RIP value.
>> [ 3196.020433] RSP: 002b:00007ffe864cdad8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
>> [ 3196.020442] RAX: ffffffffffffffda RBX: 00007f41506b25c0 RCX: 00007f41505bdcd6
>> [ 3196.020447] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
>> [ 3196.020453] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
>> [ 3196.020458] R10: 0000000000000002 R11: 0000000000000246 R12: 00007f41506b25c0
>> [ 3196.020463] R13: 0000000000000001 R14: 00007f41506bb268 R15: 0000000000000000
>> [ 3196.020495] INFO: lockdep is turned off.
>>
>
We made the suggested change, compiled, installed, and rebooted. There 
was some progress - test 2 (turning the HD-PVR off) no longer produces a 
splat.  Test 1 (start capture) and test 3 (run capture and trigger 
HD-PVR to stop streaming) both still produce a traceback (see below).  
Test 3 also still results in the unkillable process.

--- test 1 trace

[  286.987001] ------------[ cut here ]------------
[  286.987006] do not call blocking ops when !TASK_RUNNING; state=1 set 
at [<0000000007f37036>] prepare_to_wait_event+0xa7/0xe5
[  286.987009] WARNING: CPU: 1 PID: 2768 at __might_sleep+0x52/0x65
[  286.987010] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
[  286.987013] CPU: 1 PID: 2768 Comm: cat_hdpvr Not tainted 5.1.10-ld+ #4
[  286.987014] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 
1.60 07/11/2017
[  286.987015] RIP: 0010:__might_sleep+0x52/0x65
[  286.987016] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 
c7 c7 ce 46 11 b7 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd 
ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd ff ff 83 fe 07
[  286.987017] RSP: 0018:ffffa24480677d30 EFLAGS: 00010282
[  286.987018] RAX: 0000000000000070 RBX: ffffffffb71164a0 RCX: 
0000000000000007
[  286.987019] RDX: 0000000000000000 RSI: 0000000000000006 RDI: 
ffff9f1c5ea95610
[  286.987020] RBP: 000000000000038c R08: 0000000000000001 R09: 
000000000000000c
[  286.987020] R10: ffffa24480677e18 R11: 0000012dadcb5f34 R12: 
0000000000000000
[  286.987021] R13: 0000000000002000 R14: 0000000000000002 R15: 
0000000000000000
[  286.987022] FS:  00007f8130e71540(0000) GS:ffff9f1c5ea80000(0000) 
knlGS:0000000000000000
[  286.987023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  286.987024] CR2: 00007ffc0fb650a0 CR3: 000000043f3c6002 CR4: 
00000000001606e0
[  286.987025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[  286.987025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000400
[  286.987026] Call Trace:
[  286.987028]  __mutex_lock+0x47/0x7ac
[  286.987031]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  286.987032]  ? find_held_lock+0x2b/0x6e
[  286.987033]  ? prepare_to_wait_event+0xdc/0xe5
[  286.987034]  ? _raw_spin_unlock_irqrestore+0x37/0x4a
[  286.987036]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  286.987037]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
[  286.987038]  hdpvr_read+0x174/0x450 [hdpvr]
[  286.987040]  ? _raw_spin_unlock_irq+0x24/0x35
[  286.987041]  ? lockdep_hardirqs_on+0x168/0x183
[  286.987042]  ? wait_woken+0x86/0x86
[  286.987044]  v4l2_read+0x38/0x7a
[  286.987045]  vfs_read+0xad/0x136
[  286.987047]  ksys_read+0x53/0x95
[  286.987048]  do_syscall_64+0x52/0x155
[  286.987050]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[  286.987051] RIP: 0033:0x7f8130d97005
[  286.987052] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 
00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 
05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
[  286.987052] RSP: 002b:00007ffc0fb2aea0 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
[  286.987053] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 
00007f8130d97005
[  286.987054] RDX: 0000000000002000 RSI: 0000555c13090080 RDI: 
0000000000000003
[  286.987055] RBP: 00007ffc0fb2af70 R08: 0000000000000000 R09: 
0000000000000000
[  286.987056] R10: 0000000000000000 R11: 0000000000000246 R12: 
0000555c13090080
[  286.987056] R13: 0000000000000000 R14: 00007ffc0fb2d165 R15: 
0000000000000000
[  286.987058] irq event stamp: 1644
[  286.987059] hardirqs last  enabled at (1643): [<ffffffffb614832b>] 
console_unlock+0x493/0x4da
[  286.987060] hardirqs last disabled at (1644): [<ffffffffb6001460>] 
trace_hardirqs_off_thunk+0x1a/0x1c
[  286.987062] softirqs last  enabled at (1246): [<ffffffffb6c00376>] 
__do_softirq+0x376/0x3b2
[  286.987063] softirqs last disabled at (1237): [<ffffffffb60f1ea2>] 
irq_exit+0x4e/0x9d
[  286.987064] ---[ end trace 64a925bcd732953e ]---

--- test 3 trace

[ 7933.630215] ------------[ cut here ]------------
[ 7933.630218] DEBUG_LOCKS_WARN_ON(depth <= 0)
[ 7933.630224] WARNING: CPU: 0 PID: 14570 at lock_release+0x147/0x230
[ 7933.630226] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
[ 7933.630229] CPU: 0 PID: 14570 Comm: cat_hdpvr Tainted: G W         
5.1.10-ld+ #4
[ 7933.630230] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 
1.60 07/11/2017
[ 7933.630232] RIP: 0010:lock_release+0x147/0x230
[ 7933.630233] Code: 01 5f 04 75 29 e8 35 e7 25 00 85 c0 74 3b 83 3d 18 
1a 42 01 00 75 32 48 c7 c6 a3 78 11 b7 48 c7 c7 2f 44 10 b7 e8 a9 10 fb 
ff <0f> 0b eb 1b 48 8d 4c 24 04 44 89 fa 48 89 ee 48 89 df e8 87 c4 ff
[ 7933.630234] RSP: 0018:ffffa24481c6fd88 EFLAGS: 00010082
[ 7933.630236] RAX: 000000000000001f RBX: ffff9f1c28ea8000 RCX: 
0000000000000000
[ 7933.630237] RDX: ffffffffb6149356 RSI: 0000000000000001 RDI: 
ffffffffb6149368
[ 7933.630238] RBP: ffff9f1c2c838820 R08: ffffffffba72cee0 R09: 
000000000000001f
[ 7933.630239] R10: 0000000000000001 R11: 00001e633eb23eea R12: 
ffffffffc0366de1
[ 7933.630240] R13: 0000000000000246 R14: ffff9f1c28500700 R15: 
0000000000000000
[ 7933.630241] FS:  00007fd61de88540(0000) GS:ffff9f1c5ea00000(0000) 
knlGS:0000000000000000
[ 7933.630242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7933.630243] CR2: 0000000007af0000 CR3: 0000000428622002 CR4: 
00000000001606f0
[ 7933.630244] DR0: 0000000037e28bcc DR1: 0000000000000000 DR2: 
0000000000000000
[ 7933.630245] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 
0000000000000600
[ 7933.630245] Call Trace:
[ 7933.630249]  __mutex_unlock_slowpath+0x34/0x1e9
[ 7933.630252]  hdpvr_stop_streaming+0x78/0x169 [hdpvr]
[ 7933.630254]  hdpvr_read+0x33c/0x450 [hdpvr]
[ 7933.630256]  ? lockdep_hardirqs_on+0x168/0x183
[ 7933.630258]  ? wait_woken+0x86/0x86
[ 7933.630260]  v4l2_read+0x38/0x7a
[ 7933.630262]  vfs_read+0xad/0x136
[ 7933.630264]  ksys_read+0x53/0x95
[ 7933.630266]  do_syscall_64+0x52/0x155
[ 7933.630268]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 7933.630270] RIP: 0033:0x7fd61ddae005
[ 7933.630271] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 
00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 
05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c 25 28 00 00 00
[ 7933.630272] RSP: 002b:00007fffdb949a60 EFLAGS: 00000246 ORIG_RAX: 
0000000000000000
[ 7933.630274] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 
00007fd61ddae005
[ 7933.630275] RDX: 0000000000002000 RSI: 000055caa2856080 RDI: 
0000000000000003
[ 7933.630275] RBP: 00007fffdb949b30 R08: 0000000000000000 R09: 
0000000000000000
[ 7933.630276] R10: 0000000000000000 R11: 0000000000000246 R12: 
000055caa2856080
[ 7933.630277] R13: 0000000000000000 R14: 00007fffdb94a156 R15: 
0000000000002000
[ 7933.630279] irq event stamp: 30970489
[ 7933.630281] hardirqs last  enabled at (30970489): 
[<ffffffffb620e6eb>] kfree+0x196/0x217
[ 7933.630283] hardirqs last disabled at (30970488): 
[<ffffffffb620e6c8>] kfree+0x173/0x217
[ 7933.630284] softirqs last  enabled at (30970034): 
[<ffffffffb6c00376>] __do_softirq+0x376/0x3b2
[ 7933.630286] softirqs last disabled at (30970027): 
[<ffffffffb60f1ea2>] irq_exit+0x4e/0x9d
[ 7933.630287] ---[ end trace 64a925bcd732953f ]---


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-18  4:17   ` Keith Pyle
@ 2019-06-18  7:16     ` Hans Verkuil
  2019-06-19  2:29       ` Keith Pyle
  0 siblings, 1 reply; 9+ messages in thread
From: Hans Verkuil @ 2019-06-18  7:16 UTC (permalink / raw)
  To: Keith Pyle, Linux Media Mailing List

Hi Keith,

On 6/18/19 6:17 AM, Keith Pyle wrote:
> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.

Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
locking when it should :-) and I think test 1 was caused by locking when it is not allowed.

Let me know if this works!

Regards,

	Hans

Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
---
diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c b/drivers/media/usb/hdpvr/hdpvr-video.c
index 3786ddcc0d18..4ba134a3bc25 100644
--- a/drivers/media/usb/hdpvr/hdpvr-video.c
+++ b/drivers/media/usb/hdpvr/hdpvr-video.c
@@ -435,7 +435,7 @@ static ssize_t hdpvr_read(struct file *file, char __user *buffer, size_t count,
 	/* wait for the first buffer */
 	if (!(file->f_flags & O_NONBLOCK)) {
 		if (wait_event_interruptible(dev->wait_data,
-					     hdpvr_get_next_buffer(dev)))
+					     !list_empty_careful(&dev->rec_buff_list)))
 			return -ERESTARTSYS;
 	}

@@ -463,7 +463,9 @@ static ssize_t hdpvr_read(struct file *file, char __user *buffer, size_t count,
 			if (!err) {
 				v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev,
 					"timeout: restart streaming\n");
+				mutex_lock(&dev->io_mutex);
 				hdpvr_stop_streaming(dev);
+				mutex_unlock(&dev->io_mutex);
 				msecs_to_jiffies(4000);
 				err = hdpvr_start_streaming(dev);
 				if (err) {
@@ -1124,9 +1126,7 @@ static void hdpvr_device_release(struct video_device *vdev)
 	struct hdpvr_device *dev = video_get_drvdata(vdev);

 	hdpvr_delete(dev);
-	mutex_lock(&dev->io_mutex);
 	flush_work(&dev->worker);
-	mutex_unlock(&dev->io_mutex);

 	v4l2_device_unregister(&dev->v4l2_dev);
 	v4l2_ctrl_handler_free(&dev->hdl);



> 
> --- test 1 trace
> 
> [  286.987001] ------------[ cut here ]------------
> [  286.987006] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000007f37036>] prepare_to_wait_event+0xa7/0xe5
> [  286.987009] WARNING: CPU: 1 PID: 2768 at __might_sleep+0x52/0x65
> [  286.987010] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
> [  286.987013] CPU: 1 PID: 2768 Comm: cat_hdpvr Not tainted 5.1.10-ld+ #4
> [  286.987014] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
> [  286.987015] RIP: 0010:__might_sleep+0x52/0x65
> [  286.987016] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 c7 c7 ce 46 11 b7 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd
> ff ff 83 fe 07
> [  286.987017] RSP: 0018:ffffa24480677d30 EFLAGS: 00010282
> [  286.987018] RAX: 0000000000000070 RBX: ffffffffb71164a0 RCX: 0000000000000007
> [  286.987019] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff9f1c5ea95610
> [  286.987020] RBP: 000000000000038c R08: 0000000000000001 R09: 000000000000000c
> [  286.987020] R10: ffffa24480677e18 R11: 0000012dadcb5f34 R12: 0000000000000000
> [  286.987021] R13: 0000000000002000 R14: 0000000000000002 R15: 0000000000000000
> [  286.987022] FS:  00007f8130e71540(0000) GS:ffff9f1c5ea80000(0000) knlGS:0000000000000000
> [  286.987023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  286.987024] CR2: 00007ffc0fb650a0 CR3: 000000043f3c6002 CR4: 00000000001606e0
> [  286.987025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  286.987025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  286.987026] Call Trace:
> [  286.987028]  __mutex_lock+0x47/0x7ac
> [  286.987031]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
> [  286.987032]  ? find_held_lock+0x2b/0x6e
> [  286.987033]  ? prepare_to_wait_event+0xdc/0xe5
> [  286.987034]  ? _raw_spin_unlock_irqrestore+0x37/0x4a
> [  286.987036]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
> [  286.987037]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
> [  286.987038]  hdpvr_read+0x174/0x450 [hdpvr]
> [  286.987040]  ? _raw_spin_unlock_irq+0x24/0x35
> [  286.987041]  ? lockdep_hardirqs_on+0x168/0x183
> [  286.987042]  ? wait_woken+0x86/0x86
> [  286.987044]  v4l2_read+0x38/0x7a
> [  286.987045]  vfs_read+0xad/0x136
> [  286.987047]  ksys_read+0x53/0x95
> [  286.987048]  do_syscall_64+0x52/0x155
> [  286.987050]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [  286.987051] RIP: 0033:0x7f8130d97005
> [  286.987052] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
> 25 28 00 00 00
> [  286.987052] RSP: 002b:00007ffc0fb2aea0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [  286.987053] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8130d97005
> [  286.987054] RDX: 0000000000002000 RSI: 0000555c13090080 RDI: 0000000000000003
> [  286.987055] RBP: 00007ffc0fb2af70 R08: 0000000000000000 R09: 0000000000000000
> [  286.987056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000555c13090080
> [  286.987056] R13: 0000000000000000 R14: 00007ffc0fb2d165 R15: 0000000000000000
> [  286.987058] irq event stamp: 1644
> [  286.987059] hardirqs last  enabled at (1643): [<ffffffffb614832b>] console_unlock+0x493/0x4da
> [  286.987060] hardirqs last disabled at (1644): [<ffffffffb6001460>] trace_hardirqs_off_thunk+0x1a/0x1c
> [  286.987062] softirqs last  enabled at (1246): [<ffffffffb6c00376>] __do_softirq+0x376/0x3b2
> [  286.987063] softirqs last disabled at (1237): [<ffffffffb60f1ea2>] irq_exit+0x4e/0x9d
> [  286.987064] ---[ end trace 64a925bcd732953e ]---
> 
> --- test 3 trace
> 
> [ 7933.630215] ------------[ cut here ]------------
> [ 7933.630218] DEBUG_LOCKS_WARN_ON(depth <= 0)
> [ 7933.630224] WARNING: CPU: 0 PID: 14570 at lock_release+0x147/0x230
> [ 7933.630226] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
> [ 7933.630229] CPU: 0 PID: 14570 Comm: cat_hdpvr Tainted: G W         5.1.10-ld+ #4
> [ 7933.630230] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
> [ 7933.630232] RIP: 0010:lock_release+0x147/0x230
> [ 7933.630233] Code: 01 5f 04 75 29 e8 35 e7 25 00 85 c0 74 3b 83 3d 18 1a 42 01 00 75 32 48 c7 c6 a3 78 11 b7 48 c7 c7 2f 44 10 b7 e8 a9 10 fb ff <0f> 0b eb 1b 48 8d 4c 24 04 44 89 fa 48 89 ee 48 89
> df e8 87 c4 ff
> [ 7933.630234] RSP: 0018:ffffa24481c6fd88 EFLAGS: 00010082
> [ 7933.630236] RAX: 000000000000001f RBX: ffff9f1c28ea8000 RCX: 0000000000000000
> [ 7933.630237] RDX: ffffffffb6149356 RSI: 0000000000000001 RDI: ffffffffb6149368
> [ 7933.630238] RBP: ffff9f1c2c838820 R08: ffffffffba72cee0 R09: 000000000000001f
> [ 7933.630239] R10: 0000000000000001 R11: 00001e633eb23eea R12: ffffffffc0366de1
> [ 7933.630240] R13: 0000000000000246 R14: ffff9f1c28500700 R15: 0000000000000000
> [ 7933.630241] FS:  00007fd61de88540(0000) GS:ffff9f1c5ea00000(0000) knlGS:0000000000000000
> [ 7933.630242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7933.630243] CR2: 0000000007af0000 CR3: 0000000428622002 CR4: 00000000001606f0
> [ 7933.630244] DR0: 0000000037e28bcc DR1: 0000000000000000 DR2: 0000000000000000
> [ 7933.630245] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [ 7933.630245] Call Trace:
> [ 7933.630249]  __mutex_unlock_slowpath+0x34/0x1e9
> [ 7933.630252]  hdpvr_stop_streaming+0x78/0x169 [hdpvr]
> [ 7933.630254]  hdpvr_read+0x33c/0x450 [hdpvr]
> [ 7933.630256]  ? lockdep_hardirqs_on+0x168/0x183
> [ 7933.630258]  ? wait_woken+0x86/0x86
> [ 7933.630260]  v4l2_read+0x38/0x7a
> [ 7933.630262]  vfs_read+0xad/0x136
> [ 7933.630264]  ksys_read+0x53/0x95
> [ 7933.630266]  do_syscall_64+0x52/0x155
> [ 7933.630268]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 7933.630270] RIP: 0033:0x7fd61ddae005
> [ 7933.630271] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
> 25 28 00 00 00
> [ 7933.630272] RSP: 002b:00007fffdb949a60 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 7933.630274] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd61ddae005
> [ 7933.630275] RDX: 0000000000002000 RSI: 000055caa2856080 RDI: 0000000000000003
> [ 7933.630275] RBP: 00007fffdb949b30 R08: 0000000000000000 R09: 0000000000000000
> [ 7933.630276] R10: 0000000000000000 R11: 0000000000000246 R12: 000055caa2856080
> [ 7933.630277] R13: 0000000000000000 R14: 00007fffdb94a156 R15: 0000000000002000
> [ 7933.630279] irq event stamp: 30970489
> [ 7933.630281] hardirqs last  enabled at (30970489): [<ffffffffb620e6eb>] kfree+0x196/0x217
> [ 7933.630283] hardirqs last disabled at (30970488): [<ffffffffb620e6c8>] kfree+0x173/0x217
> [ 7933.630284] softirqs last  enabled at (30970034): [<ffffffffb6c00376>] __do_softirq+0x376/0x3b2
> [ 7933.630286] softirqs last disabled at (30970027): [<ffffffffb60f1ea2>] irq_exit+0x4e/0x9d
> [ 7933.630287] ---[ end trace 64a925bcd732953f ]---
> 


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-18  7:16     ` Hans Verkuil
@ 2019-06-19  2:29       ` Keith Pyle
  2019-06-20 11:33         ` Hans Verkuil
  0 siblings, 1 reply; 9+ messages in thread
From: Keith Pyle @ 2019-06-19  2:29 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List

On 06/18/19 02:16, Hans Verkuil wrote:
> Hi Keith,
>
> On 6/18/19 6:17 AM, Keith Pyle wrote:
>> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
>> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.
> Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
> locking when it should :-) and I think test 1 was caused by locking when it is not allowed.
>
> Let me know if this works!
>
> Regards,
>
> 	Hans
Good news!  With these patches, lockdep does not report any of the prior 
problems and the capture process does not deadlock for my test3.

There is one item I noted: hdpvr_read has the line

msec_to_jiffies(4000);

that doesn't really do anything.  This should be a 4 second sleep, based 
on our discussion back in 2014 
(https://www.mail-archive.com/linux-media@vger.kernel.org/msg75163.html), 
since the restart will certainly fail unless the HD-PVR is given at 
least 3 seconds to reset after the stop.

Keith
>
> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
> ---
> diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c b/drivers/media/usb/hdpvr/hdpvr-video.c
> index 3786ddcc0d18..4ba134a3bc25 100644
> --- a/drivers/media/usb/hdpvr/hdpvr-video.c
> +++ b/drivers/media/usb/hdpvr/hdpvr-video.c
> @@ -435,7 +435,7 @@ static ssize_t hdpvr_read(struct file *file, char __user *buffer, size_t count,
>   	/* wait for the first buffer */
>   	if (!(file->f_flags & O_NONBLOCK)) {
>   		if (wait_event_interruptible(dev->wait_data,
> -					     hdpvr_get_next_buffer(dev)))
> +					     !list_empty_careful(&dev->rec_buff_list)))
>   			return -ERESTARTSYS;
>   	}
>
> @@ -463,7 +463,9 @@ static ssize_t hdpvr_read(struct file *file, char __user *buffer, size_t count,
>   			if (!err) {
>   				v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev,
>   					"timeout: restart streaming\n");
> +				mutex_lock(&dev->io_mutex);
>   				hdpvr_stop_streaming(dev);
> +				mutex_unlock(&dev->io_mutex);
>   				msecs_to_jiffies(4000);
>   				err = hdpvr_start_streaming(dev);
>   				if (err) {
> @@ -1124,9 +1126,7 @@ static void hdpvr_device_release(struct video_device *vdev)
>   	struct hdpvr_device *dev = video_get_drvdata(vdev);
>
>   	hdpvr_delete(dev);
> -	mutex_lock(&dev->io_mutex);
>   	flush_work(&dev->worker);
> -	mutex_unlock(&dev->io_mutex);
>
>   	v4l2_device_unregister(&dev->v4l2_dev);
>   	v4l2_ctrl_handler_free(&dev->hdl);
>
>
>
>> --- test 1 trace
>>
>> [  286.987001] ------------[ cut here ]------------
>> [  286.987006] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0000000007f37036>] prepare_to_wait_event+0xa7/0xe5
>> [  286.987009] WARNING: CPU: 1 PID: 2768 at __might_sleep+0x52/0x65
>> [  286.987010] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
>> [  286.987013] CPU: 1 PID: 2768 Comm: cat_hdpvr Not tainted 5.1.10-ld+ #4
>> [  286.987014] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
>> [  286.987015] RIP: 0010:__might_sleep+0x52/0x65
>> [  286.987016] Code: 80 3d e7 98 43 01 00 75 23 48 8b 90 38 21 00 00 48 c7 c7 ce 46 11 b7 c6 05 d0 98 43 01 01 48 8b 70 10 48 89 d1 e8 56 b2 fd ff <0f> 0b 44 89 e2 89 ee 48 89 df 5b 5d 41 5c e9 de fd
>> ff ff 83 fe 07
>> [  286.987017] RSP: 0018:ffffa24480677d30 EFLAGS: 00010282
>> [  286.987018] RAX: 0000000000000070 RBX: ffffffffb71164a0 RCX: 0000000000000007
>> [  286.987019] RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffff9f1c5ea95610
>> [  286.987020] RBP: 000000000000038c R08: 0000000000000001 R09: 000000000000000c
>> [  286.987020] R10: ffffa24480677e18 R11: 0000012dadcb5f34 R12: 0000000000000000
>> [  286.987021] R13: 0000000000002000 R14: 0000000000000002 R15: 0000000000000000
>> [  286.987022] FS:  00007f8130e71540(0000) GS:ffff9f1c5ea80000(0000) knlGS:0000000000000000
>> [  286.987023] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  286.987024] CR2: 00007ffc0fb650a0 CR3: 000000043f3c6002 CR4: 00000000001606e0
>> [  286.987025] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [  286.987025] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> [  286.987026] Call Trace:
>> [  286.987028]  __mutex_lock+0x47/0x7ac
>> [  286.987031]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
>> [  286.987032]  ? find_held_lock+0x2b/0x6e
>> [  286.987033]  ? prepare_to_wait_event+0xdc/0xe5
>> [  286.987034]  ? _raw_spin_unlock_irqrestore+0x37/0x4a
>> [  286.987036]  ? hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
>> [  286.987037]  hdpvr_get_next_buffer+0x16/0x4a [hdpvr]
>> [  286.987038]  hdpvr_read+0x174/0x450 [hdpvr]
>> [  286.987040]  ? _raw_spin_unlock_irq+0x24/0x35
>> [  286.987041]  ? lockdep_hardirqs_on+0x168/0x183
>> [  286.987042]  ? wait_woken+0x86/0x86
>> [  286.987044]  v4l2_read+0x38/0x7a
>> [  286.987045]  vfs_read+0xad/0x136
>> [  286.987047]  ksys_read+0x53/0x95
>> [  286.987048]  do_syscall_64+0x52/0x155
>> [  286.987050]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [  286.987051] RIP: 0033:0x7f8130d97005
>> [  286.987052] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
>> 25 28 00 00 00
>> [  286.987052] RSP: 002b:00007ffc0fb2aea0 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> [  286.987053] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f8130d97005
>> [  286.987054] RDX: 0000000000002000 RSI: 0000555c13090080 RDI: 0000000000000003
>> [  286.987055] RBP: 00007ffc0fb2af70 R08: 0000000000000000 R09: 0000000000000000
>> [  286.987056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000555c13090080
>> [  286.987056] R13: 0000000000000000 R14: 00007ffc0fb2d165 R15: 0000000000000000
>> [  286.987058] irq event stamp: 1644
>> [  286.987059] hardirqs last  enabled at (1643): [<ffffffffb614832b>] console_unlock+0x493/0x4da
>> [  286.987060] hardirqs last disabled at (1644): [<ffffffffb6001460>] trace_hardirqs_off_thunk+0x1a/0x1c
>> [  286.987062] softirqs last  enabled at (1246): [<ffffffffb6c00376>] __do_softirq+0x376/0x3b2
>> [  286.987063] softirqs last disabled at (1237): [<ffffffffb60f1ea2>] irq_exit+0x4e/0x9d
>> [  286.987064] ---[ end trace 64a925bcd732953e ]---
>>
>> --- test 3 trace
>>
>> [ 7933.630215] ------------[ cut here ]------------
>> [ 7933.630218] DEBUG_LOCKS_WARN_ON(depth <= 0)
>> [ 7933.630224] WARNING: CPU: 0 PID: 14570 at lock_release+0x147/0x230
>> [ 7933.630226] Modules linked in: x86_pkg_temp_thermal hdpvr efivarfs
>> [ 7933.630229] CPU: 0 PID: 14570 Comm: cat_hdpvr Tainted: G W         5.1.10-ld+ #4
>> [ 7933.630230] Hardware name: MSI MS-7A72/Z270 PC MATE (MS-7A72), BIOS 1.60 07/11/2017
>> [ 7933.630232] RIP: 0010:lock_release+0x147/0x230
>> [ 7933.630233] Code: 01 5f 04 75 29 e8 35 e7 25 00 85 c0 74 3b 83 3d 18 1a 42 01 00 75 32 48 c7 c6 a3 78 11 b7 48 c7 c7 2f 44 10 b7 e8 a9 10 fb ff <0f> 0b eb 1b 48 8d 4c 24 04 44 89 fa 48 89 ee 48 89
>> df e8 87 c4 ff
>> [ 7933.630234] RSP: 0018:ffffa24481c6fd88 EFLAGS: 00010082
>> [ 7933.630236] RAX: 000000000000001f RBX: ffff9f1c28ea8000 RCX: 0000000000000000
>> [ 7933.630237] RDX: ffffffffb6149356 RSI: 0000000000000001 RDI: ffffffffb6149368
>> [ 7933.630238] RBP: ffff9f1c2c838820 R08: ffffffffba72cee0 R09: 000000000000001f
>> [ 7933.630239] R10: 0000000000000001 R11: 00001e633eb23eea R12: ffffffffc0366de1
>> [ 7933.630240] R13: 0000000000000246 R14: ffff9f1c28500700 R15: 0000000000000000
>> [ 7933.630241] FS:  00007fd61de88540(0000) GS:ffff9f1c5ea00000(0000) knlGS:0000000000000000
>> [ 7933.630242] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 7933.630243] CR2: 0000000007af0000 CR3: 0000000428622002 CR4: 00000000001606f0
>> [ 7933.630244] DR0: 0000000037e28bcc DR1: 0000000000000000 DR2: 0000000000000000
>> [ 7933.630245] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
>> [ 7933.630245] Call Trace:
>> [ 7933.630249]  __mutex_unlock_slowpath+0x34/0x1e9
>> [ 7933.630252]  hdpvr_stop_streaming+0x78/0x169 [hdpvr]
>> [ 7933.630254]  hdpvr_read+0x33c/0x450 [hdpvr]
>> [ 7933.630256]  ? lockdep_hardirqs_on+0x168/0x183
>> [ 7933.630258]  ? wait_woken+0x86/0x86
>> [ 7933.630260]  v4l2_read+0x38/0x7a
>> [ 7933.630262]  vfs_read+0xad/0x136
>> [ 7933.630264]  ksys_read+0x53/0x95
>> [ 7933.630266]  do_syscall_64+0x52/0x155
>> [ 7933.630268]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [ 7933.630270] RIP: 0033:0x7fd61ddae005
>> [ 7933.630271] Code: 00 00 0f 1f 00 48 83 ec 38 64 48 8b 04 25 28 00 00 00 48 89 44 24 28 31 c0 48 8d 05 35 85 0d 00 8b 00 85 c0 75 27 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 63 48 8b 4c 24 28 64 48 33 0c
>> 25 28 00 00 00
>> [ 7933.630272] RSP: 002b:00007fffdb949a60 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
>> [ 7933.630274] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fd61ddae005
>> [ 7933.630275] RDX: 0000000000002000 RSI: 000055caa2856080 RDI: 0000000000000003
>> [ 7933.630275] RBP: 00007fffdb949b30 R08: 0000000000000000 R09: 0000000000000000
>> [ 7933.630276] R10: 0000000000000000 R11: 0000000000000246 R12: 000055caa2856080
>> [ 7933.630277] R13: 0000000000000000 R14: 00007fffdb94a156 R15: 0000000000002000
>> [ 7933.630279] irq event stamp: 30970489
>> [ 7933.630281] hardirqs last  enabled at (30970489): [<ffffffffb620e6eb>] kfree+0x196/0x217
>> [ 7933.630283] hardirqs last disabled at (30970488): [<ffffffffb620e6c8>] kfree+0x173/0x217
>> [ 7933.630284] softirqs last  enabled at (30970034): [<ffffffffb6c00376>] __do_softirq+0x376/0x3b2
>> [ 7933.630286] softirqs last disabled at (30970027): [<ffffffffb60f1ea2>] irq_exit+0x4e/0x9d
>> [ 7933.630287] ---[ end trace 64a925bcd732953f ]---
>>
>


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-19  2:29       ` Keith Pyle
@ 2019-06-20 11:33         ` Hans Verkuil
  2019-06-21  2:50           ` Keith Pyle
  2019-06-28  1:52           ` Keith Pyle
  0 siblings, 2 replies; 9+ messages in thread
From: Hans Verkuil @ 2019-06-20 11:33 UTC (permalink / raw)
  To: Keith Pyle, Linux Media Mailing List

On 6/19/19 4:29 AM, Keith Pyle wrote:
> On 06/18/19 02:16, Hans Verkuil wrote:
>> Hi Keith,
>>
>> On 6/18/19 6:17 AM, Keith Pyle wrote:
>>> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
>>> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.
>> Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
>> locking when it should :-) and I think test 1 was caused by locking when it is not allowed.
>>
>> Let me know if this works!
>>
>> Regards,
>>
>>     Hans
> Good news!  With these patches, lockdep does not report any of the prior problems and the capture process does not deadlock for my test3.
> 
> There is one item I noted: hdpvr_read has the line
> 
> msec_to_jiffies(4000);

Oops!

> 
> that doesn't really do anything.  This should be a 4 second sleep, based on our discussion back in 2014 (https://www.mail-archive.com/linux-media@vger.kernel.org/msg75163.html), since the restart will
> certainly fail unless the HD-PVR is given at least 3 seconds to reset after the stop.

I think a msleep(4000) at that point is solving only one use-case. I assume
the same problem will occur if you read() from the video device, then close()
it, re-open it and read() again, all within 4 seconds.

The real fix would be to store a timestamp (jiffies) when you stop streaming,
and in start_streaming check if there are less than 4 seconds between the last
stop and new start, and then sleep until 4 seconds have passed.

Is this something you can work on and provide a patch?

For now I'll post a patch fixing the deadlocks etc. so you can develop your
patch for this on top.

Regards,

	Hans

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-20 11:33         ` Hans Verkuil
@ 2019-06-21  2:50           ` Keith Pyle
  2019-06-28  1:52           ` Keith Pyle
  1 sibling, 0 replies; 9+ messages in thread
From: Keith Pyle @ 2019-06-21  2:50 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List

On 06/20/19 06:33, Hans Verkuil wrote:
> On 6/19/19 4:29 AM, Keith Pyle wrote:
>> On 06/18/19 02:16, Hans Verkuil wrote:
>>> Hi Keith,
>>>
>>> On 6/18/19 6:17 AM, Keith Pyle wrote:
>>>> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
>>>> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.
>>> Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
>>> locking when it should :-) and I think test 1 was caused by locking when it is not allowed.
>>>
>>> Let me know if this works!
>>>
>>> Regards,
>>>
>>>      Hans
>> Good news!  With these patches, lockdep does not report any of the prior problems and the capture process does not deadlock for my test3.
>>
>> There is one item I noted: hdpvr_read has the line
>>
>> msec_to_jiffies(4000);
> Oops!
>
>> that doesn't really do anything.  This should be a 4 second sleep, based on our discussion back in 2014 (https://www.mail-archive.com/linux-media@vger.kernel.org/msg75163.html), since the restart will
>> certainly fail unless the HD-PVR is given at least 3 seconds to reset after the stop.
> I think a msleep(4000) at that point is solving only one use-case. I assume
> the same problem will occur if you read() from the video device, then close()
> it, re-open it and read() again, all within 4 seconds.
>
> The real fix would be to store a timestamp (jiffies) when you stop streaming,
> and in start_streaming check if there are less than 4 seconds between the last
> stop and new start, and then sleep until 4 seconds have passed.
>
> Is this something you can work on and provide a patch?
>
> For now I'll post a patch fixing the deadlocks etc. so you can develop your
> patch for this on top.
>
> Regards,
>
> 	Hans
>
I agree with your analysis that it would be better to have every 
start_streaming make the check and sleep if needed.  Yes, I can work on it.

Keith


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-20 11:33         ` Hans Verkuil
  2019-06-21  2:50           ` Keith Pyle
@ 2019-06-28  1:52           ` Keith Pyle
  2019-06-28 11:57             ` Hans Verkuil
  1 sibling, 1 reply; 9+ messages in thread
From: Keith Pyle @ 2019-06-28  1:52 UTC (permalink / raw)
  To: Hans Verkuil, Linux Media Mailing List

On 06/20/19 06:33, Hans Verkuil wrote:
> On 6/19/19 4:29 AM, Keith Pyle wrote:
>> On 06/18/19 02:16, Hans Verkuil wrote:
>>> Hi Keith,
>>>
>>> On 6/18/19 6:17 AM, Keith Pyle wrote:
>>>> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
>>>> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.
>>> Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
>>> locking when it should :-) and I think test 1 was caused by locking when it is not allowed.
>>>
>>> Let me know if this works!
>>>
>>> Regards,
>>>
>>>      Hans
>> Good news!  With these patches, lockdep does not report any of the prior problems and the capture process does not deadlock for my test3.
>>
>> There is one item I noted: hdpvr_read has the line
>>
>> msec_to_jiffies(4000);
> Oops!
>
>> that doesn't really do anything.  This should be a 4 second sleep, based on our discussion back in 2014 (https://www.mail-archive.com/linux-media@vger.kernel.org/msg75163.html), since the restart will
>> certainly fail unless the HD-PVR is given at least 3 seconds to reset after the stop.
> I think a msleep(4000) at that point is solving only one use-case. I assume
> the same problem will occur if you read() from the video device, then close()
> it, re-open it and read() again, all within 4 seconds.
>
> The real fix would be to store a timestamp (jiffies) when you stop streaming,
> and in start_streaming check if there are less than 4 seconds between the last
> stop and new start, and then sleep until 4 seconds have passed.
>
> Is this something you can work on and provide a patch?
>
> For now I'll post a patch fixing the deadlocks etc. so you can develop your
> patch for this on top.
>
> Regards,
>
> 	Hans
>
I've included below a proposed two-part patch for media/usb/hdpvr to be 
added on top of your commit 0fda628d1a97aec51e2120115f1a2adb7c56be5e. 
The proposed patch includes:

- Add the following module parameters:

   - hdpvr_close_to_open_ms_delay: specifies the amount of time that 
must elapse, in milliseconds,
     between stopping streaming and starting streaming since the HD-PVR 
generally takes >3 seconds
     to become ready for reads; defaults to 4000 ms.

   - hdpvr_restart_streaming_max_tries: prevents the driver from getting 
into an out of control
     restart loop; can be set to 0 to emulate the old driver behavior of 
no auto restarts; defaults to 1

   - hdpvr_restart_streaming_ms_delay: after a streaming restart, the 
HD-PVR will not properly
     respond for a brief time; this sets the delay applied after a 
restart; defaults to 100 milliseconds

- hdpvr_stop_streaming saves the stop time (in jiffies).

- hdpvr_start_streaming will sleep as needed to ensure that 
hdpvr_close_to_open_ms_delay has elapsed
   since the last stop streaming.

- Remove the fixed sleep between the hdpvr_stop_streaming and 
hdpvr_start_streaming calls in hdpvr_read
   since hdpvr_start_streaming now includes the necessary sleep for all 
starts.

- Fix bug where restarting streaming could incorrectly cause hdpvr_read 
to return 0 (EOF).

 From 38f265a0563a6aac16aea29f57d96fd2650d93e8 Mon Sep 17 00:00:00 2001
Date: Mon, 24 Jun 2019 22:04:48 -0500
Subject: [PATCH 1/2] Add adaptive sleeping in hdpvr_start_streaming

---
  drivers/media/usb/hdpvr/hdpvr-core.c  |  4 ++++
  drivers/media/usb/hdpvr/hdpvr-video.c | 18 ++++++++++++++++++
  drivers/media/usb/hdpvr/hdpvr.h       |  5 +++++
  3 files changed, 27 insertions(+)

diff --git a/drivers/media/usb/hdpvr/hdpvr-core.c 
b/drivers/media/usb/hdpvr/hdpvr-core.c
index 29ac7fc5b039..5f28174d409c 100644
--- a/drivers/media/usb/hdpvr/hdpvr-core.c
+++ b/drivers/media/usb/hdpvr/hdpvr-core.c
@@ -39,6 +39,10 @@ int hdpvr_debug;
  module_param(hdpvr_debug, int, S_IRUGO|S_IWUSR);
  MODULE_PARM_DESC(hdpvr_debug, "enable debugging output");

+uint hdpvr_close_to_open_ms_delay = 4000;
+module_param(hdpvr_close_to_open_ms_delay, uint, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(hdpvr_close_to_open_ms_delay, "delay restarting 
streaming by the specified number of milliseconds");
+
  static uint default_video_input = HDPVR_VIDEO_INPUTS;
  module_param(default_video_input, uint, S_IRUGO|S_IWUSR);
  MODULE_PARM_DESC(default_video_input, "default video input: 
0=Component / 1=S-Video / 2=Composite");
diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c 
b/drivers/media/usb/hdpvr/hdpvr-video.c
index 693c6169fc01..d114eff06469 100644
--- a/drivers/media/usb/hdpvr/hdpvr-video.c
+++ b/drivers/media/usb/hdpvr/hdpvr-video.c
@@ -276,6 +276,8 @@ static int hdpvr_start_streaming(struct hdpvr_device 
*dev)
  {
      int ret;
      struct hdpvr_video_info vidinf;
+    u64 now_jiffies, delta_jiffies;
+    unsigned msec_to_sleep;

      if (dev->status == STATUS_STREAMING)
          return 0;
@@ -296,6 +298,19 @@ static int hdpvr_start_streaming(struct 
hdpvr_device *dev)
      v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev,
              "video signal: %dx%d@%dhz\n", vidinf.width,
              vidinf.height, vidinf.fps);
+    now_jiffies = get_jiffies_64();
+    /* inline time_after64 since the result of the subtraction is needed
+     * for the sleep
+     */
+    delta_jiffies = dev->jiffies_next_start_streaming - now_jiffies;
+    if ((__s64)delta_jiffies > 0) {
+        /* device firmware may not be ready yet */
+        msec_to_sleep = jiffies_to_msecs(delta_jiffies);
+        v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev, "device 
firmware may not be ready yet, sleeping for %u milliseconds\n", 
msec_to_sleep);
+        msleep(msec_to_sleep);
+    }
+    else
+        v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev, "device 
firmware assumed to be already stable, not sleeping\n");

      /* start streaming 2 request */
      ret = usb_control_msg(dev->udev,
@@ -328,6 +343,7 @@ static int hdpvr_stop_streaming(struct hdpvr_device 
*dev)
      int actual_length;
      uint c = 0;
      u8 *buf;
+    u64 now_jiffies;

      if (dev->status == STATUS_IDLE)
          return 0;
@@ -364,6 +380,8 @@ static int hdpvr_stop_streaming(struct hdpvr_device 
*dev)
      kfree(buf);
      v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev,
           "used %d urbs to empty device buffers\n", c-1);
+    now_jiffies = get_jiffies_64();
+    dev->jiffies_next_start_streaming = now_jiffies + 
msecs_to_jiffies(hdpvr_close_to_open_ms_delay);
      msleep(10);

      dev->status = STATUS_IDLE;
diff --git a/drivers/media/usb/hdpvr/hdpvr.h 
b/drivers/media/usb/hdpvr/hdpvr.h
index fa43e1d45ea9..a9d46fbeef9a 100644
--- a/drivers/media/usb/hdpvr/hdpvr.h
+++ b/drivers/media/usb/hdpvr/hdpvr.h
@@ -43,6 +43,7 @@
  /* #define HDPVR_DEBUG */

  extern int hdpvr_debug;
+extern uint hdpvr_close_to_open_ms_delay;

  #define MSG_INFO    1
  #define MSG_BUFFER    2
@@ -95,6 +96,10 @@ struct hdpvr_device {
      struct v4l2_dv_timings    cur_dv_timings;

      uint            flags;
+    /* earliest jiffies at which the device firmware will be ready to
+     * start streaming
+     */
+    u64             jiffies_next_start_streaming;

      /* synchronize I/O */
      struct mutex        io_mutex;
-- 
2.21.0

 From 83be5b523422fd39222368f977ab0c7981421aec Mon Sep 17 00:00:00 2001
Date: Thu, 27 Jun 2019 20:02:52 -0500
Subject: [PATCH 2/2] Add optional restart, with optional delay, after
  restarting streaming

---
  drivers/media/usb/hdpvr/hdpvr-core.c  |  8 ++++++++
  drivers/media/usb/hdpvr/hdpvr-video.c | 28 +++++++++++++++++++++++++++
  drivers/media/usb/hdpvr/hdpvr.h       |  2 ++
  3 files changed, 38 insertions(+)

diff --git a/drivers/media/usb/hdpvr/hdpvr-core.c 
b/drivers/media/usb/hdpvr/hdpvr-core.c
index 5f28174d409c..7cfd9ca6440b 100644
--- a/drivers/media/usb/hdpvr/hdpvr-core.c
+++ b/drivers/media/usb/hdpvr/hdpvr-core.c
@@ -43,6 +43,14 @@ uint hdpvr_close_to_open_ms_delay = 4000;
  module_param(hdpvr_close_to_open_ms_delay, uint, S_IRUGO|S_IWUSR);
  MODULE_PARM_DESC(hdpvr_close_to_open_ms_delay, "delay restarting 
streaming by the specified number of milliseconds");

+uint hdpvr_restart_streaming_max_tries = 1;
+module_param(hdpvr_restart_streaming_max_tries, uint, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(hdpvr_restart_streaming_max_tries, "restart streaming 
at most this many times within one read");
+
+uint hdpvr_restart_streaming_ms_delay = 100;
+module_param(hdpvr_restart_streaming_ms_delay, uint, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(hdpvr_restart_streaming_ms_delay, "delay continue by 
the specified number of milliseconds after restarting streaming");
+
  static uint default_video_input = HDPVR_VIDEO_INPUTS;
  module_param(default_video_input, uint, S_IRUGO|S_IWUSR);
  MODULE_PARM_DESC(default_video_input, "default video input: 
0=Component / 1=S-Video / 2=Composite");
diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c 
b/drivers/media/usb/hdpvr/hdpvr-video.c
index d114eff06469..4feac0777a6c 100644
--- a/drivers/media/usb/hdpvr/hdpvr-video.c
+++ b/drivers/media/usb/hdpvr/hdpvr-video.c
@@ -433,6 +433,7 @@ static ssize_t hdpvr_read(struct file *file, char 
__user *buffer, size_t count,
      struct hdpvr_buffer *buf = NULL;
      struct urb *urb;
      unsigned int ret = 0;
+    unsigned int restarts_remaining = hdpvr_restart_streaming_max_tries;
      int rem, cnt;

      if (*pos)
@@ -483,6 +484,15 @@ static ssize_t hdpvr_read(struct file *file, char 
__user *buffer, size_t count,
                  goto err;
              }
              if (!err) {
+                if (restarts_remaining == 0) {
+                    v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev, 
"timeout: no further restarts allowed by 
hdpvr_restart_streaming_max_tries; returning to caller with ret=%u", ret);
+                    /* This break will return the count of bytes copied
+                     * so far, which may be 0.  In that situation, the
+                     * user application will get an EOF.
+                     */
+                    break;
+                }
+                -- restarts_remaining;
                  v4l2_info(&dev->v4l2_dev,
                        "timeout: restart streaming\n");
                  mutex_lock(&dev->io_mutex);
@@ -493,6 +503,24 @@ static ssize_t hdpvr_read(struct file *file, char 
__user *buffer, size_t count,
                      ret = err;
                      goto err;
                  }
+                /* hdpvr_start_streaming instructs the device to stream,
+                 * but the device is usually not ready by the time
+                 * hdpvr_start_streaming returns.
+                 *
+                 * Without this continue, the loop would terminate.  If
+                 * no data had been copied by a prior iteration of the
+                 * loop, then hdpvr_read would return 0, closing the
+                 * file descriptor prematurely.  Continue back to the
+                 * top of the loop to avoid that.
+                 *
+                 * The device may not be ready within 1 second, so the
+                 * wait_event_interruptible_timeout would then restart
+                 * streaming a second time.  Delay here to give the
+                 * device time to stabilize first.
+                 */
+                if (hdpvr_restart_streaming_ms_delay)
+                    msleep(hdpvr_restart_streaming_ms_delay);
+                continue;
              }
          }

diff --git a/drivers/media/usb/hdpvr/hdpvr.h 
b/drivers/media/usb/hdpvr/hdpvr.h
index a9d46fbeef9a..2b3d92b2b639 100644
--- a/drivers/media/usb/hdpvr/hdpvr.h
+++ b/drivers/media/usb/hdpvr/hdpvr.h
@@ -44,6 +44,8 @@

  extern int hdpvr_debug;
  extern uint hdpvr_close_to_open_ms_delay;
+extern uint hdpvr_restart_streaming_max_tries;
+extern uint hdpvr_restart_streaming_ms_delay;

  #define MSG_INFO    1
  #define MSG_BUFFER    2
-- 
2.21.0



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: hdpvr mutex deadlock on kernel 5.1.x
  2019-06-28  1:52           ` Keith Pyle
@ 2019-06-28 11:57             ` Hans Verkuil
  0 siblings, 0 replies; 9+ messages in thread
From: Hans Verkuil @ 2019-06-28 11:57 UTC (permalink / raw)
  To: Keith Pyle, Linux Media Mailing List

Hi Keith,

On 6/28/19 3:52 AM, Keith Pyle wrote:
> On 06/20/19 06:33, Hans Verkuil wrote:
>> On 6/19/19 4:29 AM, Keith Pyle wrote:
>>> On 06/18/19 02:16, Hans Verkuil wrote:
>>>> Hi Keith,
>>>>
>>>> On 6/18/19 6:17 AM, Keith Pyle wrote:
>>>>> We made the suggested change, compiled, installed, and rebooted. There was some progress - test 2 (turning the HD-PVR off) no longer produces a splat.  Test 1 (start capture) and test 3 (run capture
>>>>> and trigger HD-PVR to stop streaming) both still produce a traceback (see below).  Test 3 also still results in the unkillable process.
>>>> Try the following patch. Test 2 was caused by locking when it shouldn't, test 3 was caused by not
>>>> locking when it should :-) and I think test 1 was caused by locking when it is not allowed.
>>>>
>>>> Let me know if this works!
>>>>
>>>> Regards,
>>>>
>>>>      Hans
>>> Good news!  With these patches, lockdep does not report any of the prior problems and the capture process does not deadlock for my test3.
>>>
>>> There is one item I noted: hdpvr_read has the line
>>>
>>> msec_to_jiffies(4000);
>> Oops!
>>
>>> that doesn't really do anything.  This should be a 4 second sleep, based on our discussion back in 2014 (https://www.mail-archive.com/linux-media@vger.kernel.org/msg75163.html), since the restart will
>>> certainly fail unless the HD-PVR is given at least 3 seconds to reset after the stop.
>> I think a msleep(4000) at that point is solving only one use-case. I assume
>> the same problem will occur if you read() from the video device, then close()
>> it, re-open it and read() again, all within 4 seconds.
>>
>> The real fix would be to store a timestamp (jiffies) when you stop streaming,
>> and in start_streaming check if there are less than 4 seconds between the last
>> stop and new start, and then sleep until 4 seconds have passed.
>>
>> Is this something you can work on and provide a patch?
>>
>> For now I'll post a patch fixing the deadlocks etc. so you can develop your
>> patch for this on top.
>>
>> Regards,
>>
>> 	Hans
>>
> I've included below a proposed two-part patch for media/usb/hdpvr to be 
> added on top of your commit 0fda628d1a97aec51e2120115f1a2adb7c56be5e. 
> The proposed patch includes:
> 
> - Add the following module parameters:
> 
>    - hdpvr_close_to_open_ms_delay: specifies the amount of time that 
> must elapse, in milliseconds,
>      between stopping streaming and starting streaming since the HD-PVR 
> generally takes >3 seconds
>      to become ready for reads; defaults to 4000 ms.
> 
>    - hdpvr_restart_streaming_max_tries: prevents the driver from getting 
> into an out of control
>      restart loop; can be set to 0 to emulate the old driver behavior of 
> no auto restarts; defaults to 1
> 
>    - hdpvr_restart_streaming_ms_delay: after a streaming restart, the 
> HD-PVR will not properly
>      respond for a brief time; this sets the delay applied after a 
> restart; defaults to 100 milliseconds
> 
> - hdpvr_stop_streaming saves the stop time (in jiffies).
> 
> - hdpvr_start_streaming will sleep as needed to ensure that 
> hdpvr_close_to_open_ms_delay has elapsed
>    since the last stop streaming.
> 
> - Remove the fixed sleep between the hdpvr_stop_streaming and 
> hdpvr_start_streaming calls in hdpvr_read
>    since hdpvr_start_streaming now includes the necessary sleep for all 
> starts.
> 
> - Fix bug where restarting streaming could incorrectly cause hdpvr_read 
> to return 0 (EOF).

A quick review below.

First of all, when you post the next version, just post each patch as a
separate email. That's much easier to review.

> 
>  From 38f265a0563a6aac16aea29f57d96fd2650d93e8 Mon Sep 17 00:00:00 2001
> Date: Mon, 24 Jun 2019 22:04:48 -0500
> Subject: [PATCH 1/2] Add adaptive sleeping in hdpvr_start_streaming

Add a description of what was changed here.

We also need a 'Signed-of-by:' line for each patch.

> 
> ---
>   drivers/media/usb/hdpvr/hdpvr-core.c  |  4 ++++
>   drivers/media/usb/hdpvr/hdpvr-video.c | 18 ++++++++++++++++++
>   drivers/media/usb/hdpvr/hdpvr.h       |  5 +++++
>   3 files changed, 27 insertions(+)
> 
> diff --git a/drivers/media/usb/hdpvr/hdpvr-core.c 
> b/drivers/media/usb/hdpvr/hdpvr-core.c
> index 29ac7fc5b039..5f28174d409c 100644
> --- a/drivers/media/usb/hdpvr/hdpvr-core.c
> +++ b/drivers/media/usb/hdpvr/hdpvr-core.c
> @@ -39,6 +39,10 @@ int hdpvr_debug;
>   module_param(hdpvr_debug, int, S_IRUGO|S_IWUSR);
>   MODULE_PARM_DESC(hdpvr_debug, "enable debugging output");
> 
> +uint hdpvr_close_to_open_ms_delay = 4000;
> +module_param(hdpvr_close_to_open_ms_delay, uint, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(hdpvr_close_to_open_ms_delay, "delay restarting 
> streaming by the specified number of milliseconds");

Turn off line wrapping in your mail client!

> +
>   static uint default_video_input = HDPVR_VIDEO_INPUTS;
>   module_param(default_video_input, uint, S_IRUGO|S_IWUSR);
>   MODULE_PARM_DESC(default_video_input, "default video input: 
> 0=Component / 1=S-Video / 2=Composite");
> diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c 
> b/drivers/media/usb/hdpvr/hdpvr-video.c
> index 693c6169fc01..d114eff06469 100644
> --- a/drivers/media/usb/hdpvr/hdpvr-video.c
> +++ b/drivers/media/usb/hdpvr/hdpvr-video.c
> @@ -276,6 +276,8 @@ static int hdpvr_start_streaming(struct hdpvr_device 
> *dev)
>   {
>       int ret;
>       struct hdpvr_video_info vidinf;
> +    u64 now_jiffies, delta_jiffies;
> +    unsigned msec_to_sleep;
> 
>       if (dev->status == STATUS_STREAMING)
>           return 0;
> @@ -296,6 +298,19 @@ static int hdpvr_start_streaming(struct 
> hdpvr_device *dev)
>       v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev,
>               "video signal: %dx%d@%dhz\n", vidinf.width,
>               vidinf.height, vidinf.fps);
> +    now_jiffies = get_jiffies_64();

No need to use get_jiffies_64();, just use 'jiffies'.

> +    /* inline time_after64 since the result of the subtraction is needed
> +     * for the sleep
> +     */
> +    delta_jiffies = dev->jiffies_next_start_streaming - now_jiffies;
> +    if ((__s64)delta_jiffies > 0) {
> +        /* device firmware may not be ready yet */
> +        msec_to_sleep = jiffies_to_msecs(delta_jiffies);
> +        v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev, "device 
> firmware may not be ready yet, sleeping for %u milliseconds\n", 
> msec_to_sleep);

This line should be split over two lines since it is way too long.

BTW, use hard TABs instead of spaces (just follow the existing coding
style).

You should also run scripts/checkpatch.pl --strict over your patch: it
will check it for style errors.

I've stopped reviewing here. The line wrapping and different coding style
makes this too hard to review.

Please clean it up first.

Regards,

	Hans

> +        msleep(msec_to_sleep);
> +    }
> +    else
> +        v4l2_dbg(MSG_INFO, hdpvr_debug, &dev->v4l2_dev, "device 
> firmware assumed to be already stable, not sleeping\n");
> 
>       /* start streaming 2 request */
>       ret = usb_control_msg(dev->udev,
> @@ -328,6 +343,7 @@ static int hdpvr_stop_streaming(struct hdpvr_device 
> *dev)
>       int actual_length;
>       uint c = 0;
>       u8 *buf;
> +    u64 now_jiffies;
> 
>       if (dev->status == STATUS_IDLE)
>           return 0;
> @@ -364,6 +380,8 @@ static int hdpvr_stop_streaming(struct hdpvr_device 
> *dev)
>       kfree(buf);
>       v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev,
>            "used %d urbs to empty device buffers\n", c-1);
> +    now_jiffies = get_jiffies_64();
> +    dev->jiffies_next_start_streaming = now_jiffies + 
> msecs_to_jiffies(hdpvr_close_to_open_ms_delay);
>       msleep(10);
> 
>       dev->status = STATUS_IDLE;
> diff --git a/drivers/media/usb/hdpvr/hdpvr.h 
> b/drivers/media/usb/hdpvr/hdpvr.h
> index fa43e1d45ea9..a9d46fbeef9a 100644
> --- a/drivers/media/usb/hdpvr/hdpvr.h
> +++ b/drivers/media/usb/hdpvr/hdpvr.h
> @@ -43,6 +43,7 @@
>   /* #define HDPVR_DEBUG */
> 
>   extern int hdpvr_debug;
> +extern uint hdpvr_close_to_open_ms_delay;
> 
>   #define MSG_INFO    1
>   #define MSG_BUFFER    2
> @@ -95,6 +96,10 @@ struct hdpvr_device {
>       struct v4l2_dv_timings    cur_dv_timings;
> 
>       uint            flags;
> +    /* earliest jiffies at which the device firmware will be ready to
> +     * start streaming
> +     */
> +    u64             jiffies_next_start_streaming;
> 
>       /* synchronize I/O */
>       struct mutex        io_mutex;
> 

>  From 83be5b523422fd39222368f977ab0c7981421aec Mon Sep 17 00:00:00 2001
> Date: Thu, 27 Jun 2019 20:02:52 -0500
> Subject: [PATCH 2/2] Add optional restart, with optional delay, after
>   restarting streaming
> 
> ---
>   drivers/media/usb/hdpvr/hdpvr-core.c  |  8 ++++++++
>   drivers/media/usb/hdpvr/hdpvr-video.c | 28 +++++++++++++++++++++++++++
>   drivers/media/usb/hdpvr/hdpvr.h       |  2 ++
>   3 files changed, 38 insertions(+)
> 
> diff --git a/drivers/media/usb/hdpvr/hdpvr-core.c 
> b/drivers/media/usb/hdpvr/hdpvr-core.c
> index 5f28174d409c..7cfd9ca6440b 100644
> --- a/drivers/media/usb/hdpvr/hdpvr-core.c
> +++ b/drivers/media/usb/hdpvr/hdpvr-core.c
> @@ -43,6 +43,14 @@ uint hdpvr_close_to_open_ms_delay = 4000;
>   module_param(hdpvr_close_to_open_ms_delay, uint, S_IRUGO|S_IWUSR);
>   MODULE_PARM_DESC(hdpvr_close_to_open_ms_delay, "delay restarting 
> streaming by the specified number of milliseconds");
> 
> +uint hdpvr_restart_streaming_max_tries = 1;
> +module_param(hdpvr_restart_streaming_max_tries, uint, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(hdpvr_restart_streaming_max_tries, "restart streaming 
> at most this many times within one read");
> +
> +uint hdpvr_restart_streaming_ms_delay = 100;
> +module_param(hdpvr_restart_streaming_ms_delay, uint, S_IRUGO|S_IWUSR);
> +MODULE_PARM_DESC(hdpvr_restart_streaming_ms_delay, "delay continue by 
> the specified number of milliseconds after restarting streaming");
> +
>   static uint default_video_input = HDPVR_VIDEO_INPUTS;
>   module_param(default_video_input, uint, S_IRUGO|S_IWUSR);
>   MODULE_PARM_DESC(default_video_input, "default video input: 
> 0=Component / 1=S-Video / 2=Composite");
> diff --git a/drivers/media/usb/hdpvr/hdpvr-video.c 
> b/drivers/media/usb/hdpvr/hdpvr-video.c
> index d114eff06469..4feac0777a6c 100644
> --- a/drivers/media/usb/hdpvr/hdpvr-video.c
> +++ b/drivers/media/usb/hdpvr/hdpvr-video.c
> @@ -433,6 +433,7 @@ static ssize_t hdpvr_read(struct file *file, char 
> __user *buffer, size_t count,
>       struct hdpvr_buffer *buf = NULL;
>       struct urb *urb;
>       unsigned int ret = 0;
> +    unsigned int restarts_remaining = hdpvr_restart_streaming_max_tries;
>       int rem, cnt;
> 
>       if (*pos)
> @@ -483,6 +484,15 @@ static ssize_t hdpvr_read(struct file *file, char 
> __user *buffer, size_t count,
>                   goto err;
>               }
>               if (!err) {
> +                if (restarts_remaining == 0) {
> +                    v4l2_dbg(MSG_BUFFER, hdpvr_debug, &dev->v4l2_dev, 
> "timeout: no further restarts allowed by 
> hdpvr_restart_streaming_max_tries; returning to caller with ret=%u", ret);
> +                    /* This break will return the count of bytes copied
> +                     * so far, which may be 0.  In that situation, the
> +                     * user application will get an EOF.
> +                     */
> +                    break;
> +                }
> +                -- restarts_remaining;
>                   v4l2_info(&dev->v4l2_dev,
>                         "timeout: restart streaming\n");
>                   mutex_lock(&dev->io_mutex);
> @@ -493,6 +503,24 @@ static ssize_t hdpvr_read(struct file *file, char 
> __user *buffer, size_t count,
>                       ret = err;
>                       goto err;
>                   }
> +                /* hdpvr_start_streaming instructs the device to stream,
> +                 * but the device is usually not ready by the time
> +                 * hdpvr_start_streaming returns.
> +                 *
> +                 * Without this continue, the loop would terminate.  If
> +                 * no data had been copied by a prior iteration of the
> +                 * loop, then hdpvr_read would return 0, closing the
> +                 * file descriptor prematurely.  Continue back to the
> +                 * top of the loop to avoid that.
> +                 *
> +                 * The device may not be ready within 1 second, so the
> +                 * wait_event_interruptible_timeout would then restart
> +                 * streaming a second time.  Delay here to give the
> +                 * device time to stabilize first.
> +                 */
> +                if (hdpvr_restart_streaming_ms_delay)
> +                    msleep(hdpvr_restart_streaming_ms_delay);
> +                continue;
>               }
>           }
> 
> diff --git a/drivers/media/usb/hdpvr/hdpvr.h 
> b/drivers/media/usb/hdpvr/hdpvr.h
> index a9d46fbeef9a..2b3d92b2b639 100644
> --- a/drivers/media/usb/hdpvr/hdpvr.h
> +++ b/drivers/media/usb/hdpvr/hdpvr.h
> @@ -44,6 +44,8 @@
> 
>   extern int hdpvr_debug;
>   extern uint hdpvr_close_to_open_ms_delay;
> +extern uint hdpvr_restart_streaming_max_tries;
> +extern uint hdpvr_restart_streaming_ms_delay;
> 
>   #define MSG_INFO    1
>   #define MSG_BUFFER    2


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-28 11:57 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-15 20:06 hdpvr mutex deadlock on kernel 5.1.x Keith Pyle
2019-06-17 10:22 ` Hans Verkuil
2019-06-18  4:17   ` Keith Pyle
2019-06-18  7:16     ` Hans Verkuil
2019-06-19  2:29       ` Keith Pyle
2019-06-20 11:33         ` Hans Verkuil
2019-06-21  2:50           ` Keith Pyle
2019-06-28  1:52           ` Keith Pyle
2019-06-28 11:57             ` Hans Verkuil

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).