[BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected

* [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected
@ 2012-10-16 12:43 Stanislaw Gruszka
  2012-10-24 11:14 ` Arend van Spriel
  0 siblings, 1 reply; 5+ messages in thread
From: Stanislaw Gruszka @ 2012-10-16 12:43 UTC
  To: dri-devel; +Cc: linux-kernel, Ben Skeggs

I get this lockdep warning on the wireless-testing tree, based
on 3.7-rc1 (no other patches except the wireless bits).

=============================================
Restarting tasks ... done.
[ INFO: possible recursive locking detected ]
3.7.0-rc1-wl+ #2 Not tainted
---------------------------------------------
Xorg/2269 is trying to acquire lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]

but task is already holding lock:
 (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&cli->mutex);
  lock(&cli->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

1 lock held by Xorg/2269:
 #0:  (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]

stack backtrace:
Pid: 2269, comm: Xorg Not tainted 3.7.0-rc1-wl+ #2
Call Trace:
 [<ffffffff810bbc24>] print_deadlock_bug+0xf4/0x100
 [<ffffffff810bdba9>] validate_chain+0x549/0x7e0
 [<ffffffff810be1a7>] __lock_acquire+0x367/0x580
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff810be464>] lock_acquire+0xa4/0x120
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff8156c860>] ? _raw_spin_unlock_irqrestore+0x40/0x80
 [<ffffffff81569217>] __mutex_lock_common+0x47/0x3f0
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffffa011dd61>] ? nv84_graph_tlb_flush+0x291/0x2b0 [nouveau]
 [<ffffffffa00b4be6>] ? _nouveau_gpuobj_wr32+0x26/0x30 [nouveau]
 [<ffffffffa012a27f>] ? nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffff815696e7>] mutex_lock_nested+0x37/0x50
 [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
 [<ffffffffa012a783>] nouveau_bo_move+0xe3/0x330 [nouveau]
 [<ffffffffa009619d>] ttm_bo_handle_move_mem+0x2bd/0x670 [ttm]
 [<ffffffffa0098a1e>] ttm_bo_move_buffer+0x12e/0x150 [ttm]
 [<ffffffffa0098ad9>] ttm_bo_validate+0x99/0x130 [ttm]
 [<ffffffffa012add3>] nouveau_bo_validate+0x23/0x30 [nouveau]
 [<ffffffffa012cd8e>] validate_list+0xae/0x2c0 [nouveau]
 [<ffffffffa012dec2>] nouveau_gem_pushbuf_validate+0xa2/0x1e0 [nouveau]
 [<ffffffffa012e22c>] nouveau_gem_ioctl_pushbuf+0x22c/0x8a0 [nouveau]
 [<ffffffffa002c465>] drm_ioctl+0x355/0x570 [drm]
 [<ffffffff8119349a>] ? do_sync_read+0xaa/0xf0
 [<ffffffffa012e000>] ? nouveau_gem_pushbuf_validate+0x1e0/0x1e0 [nouveau]
 [<ffffffff811a579c>] do_vfs_ioctl+0x8c/0x350
 [<ffffffff81575745>] ? sysret_check+0x22/0x5d
 [<ffffffff811a5b01>] sys_ioctl+0xa1/0xb0
 [<ffffffff81291eee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff81575719>] system_call_fastpath+0x16/0x1b
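
For readers unfamiliar with this class of report, the "lock(A); lock(A)" scenario that lockdep prints above can be modelled in userspace. This is only a minimal sketch, assuming POSIX threads: struct cli_model, cli_model_init(), inner_move_model() and outer_ioctl_model() are hypothetical names that stand in for the nouveau paths in the trace, and they are not the driver's code. An error-checking pthread mutex is used so that the second acquisition by the same thread returns EDEADLK instead of hanging, which is what a plain kernel mutex would do.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the client object; only the mutex matters. */
struct cli_model {
    pthread_mutex_t mutex;
};

/* Set up an error-checking mutex: a relock by the owning thread then
 * fails with EDEADLK rather than blocking forever. Returns 0 on success. */
static int cli_model_init(struct cli_model *cli)
{
    pthread_mutexattr_t attr;
    int ret = pthread_mutexattr_init(&attr);

    if (ret)
        return ret;
    ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!ret)
        ret = pthread_mutex_init(&cli->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}

/* Models the inner path (nouveau_bo_move_m2mf in the trace): it takes
 * the client mutex on its own. */
static int inner_move_model(struct cli_model *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex);

    if (ret)
        return ret; /* EDEADLK if this thread already holds the mutex */
    pthread_mutex_unlock(&cli->mutex);
    return 0;
}

/* Models the outer path (the ioctl after nouveau_abi16_get): it takes
 * the same mutex, then reaches the inner path while still holding it. */
static int outer_ioctl_model(struct cli_model *cli)
{
    int ret = pthread_mutex_lock(&cli->mutex);

    if (ret)
        return ret;
    ret = inner_move_model(cli);
    pthread_mutex_unlock(&cli->mutex);
    return ret;
}
```

Calling outer_ioctl_model() on an initialised struct cli_model reports the nested acquisition as an error, while calling inner_move_model() on its own succeeds. Note that lockdep works on lock classes, so the kernel report by itself does not say whether both acquisitions hit the same mutex instance.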

* Re: [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected
  2012-10-16 12:43 [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected Stanislaw Gruszka
@ 2012-10-24 11:14 ` Arend van Spriel
  2012-10-24 12:45   ` Arend van Spriel
  0 siblings, 1 reply; 5+ messages in thread
From: Arend van Spriel @ 2012-10-24 11:14 UTC
  To: Stanislaw Gruszka; +Cc: dri-devel, linux-kernel, Ben Skeggs

On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
> I have this lockdep warning on wireless-testing tree based
> on 3.7-rc1 (no other patches except wireless bits).
>
> =============================================
> Restarting tasks ... done.
> [ INFO: possible recursive locking detected ]
> 3.7.0-rc1-wl+ #2 Not tainted
> ---------------------------------------------
> Xorg/2269 is trying to acquire lock:
>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
>
> but task is already holding lock:
>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]
>

I have observed the same bug, so I built and tested the v3.7-rc2 tag with
lockdep enabled. It has the same problem, and it results in a failure to
resume after suspend. See the log below.

Gr. AvS

[   76.272795] PM: suspend of devices complete after 2149.188 msecs
[   76.273110] PM: suspend devices took 2.152 seconds
[   76.273354] suspend debug: Waiting for 5 seconds.
[   81.233082] ehci_hcd 0000:00:1a.0: setting latency timer to 64
[   81.233369] ehci_hcd 0000:00:1d.0: setting latency timer to 64
[   81.233422] pci 0000:00:1e.0: setting latency timer to 64
[   81.248934] e1000e 0000:00:19.0: wake-up capability disabled by ACPI
[   81.249398] e1000e 0000:00:19.0: irq 41 for MSI/MSI-X
[   81.249903] ahci 0000:00:1f.2: setting latency timer to 64
[   81.249982] snd_hda_intel 0000:00:1b.0: irq 43 for MSI/MSI-X
[   81.250515] nouveau  [     DRM] re-enabling device...
[   81.250548] nouveau  [     DRM] resuming client object trees...
[   81.250557] nouveau  [   VBIOS][0000:01:00.0] running init tables
[   81.701998] nouveau  [     DRM] resuming display...
[   81.803923] firewire_core 0000:04:00.4: rediscovered device fw0
[   81.823913] dell_wmi: Received unknown WMI event (0x11)
[   81.824521] serial 00:08: activated
[   82.135333] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[   82.187115] ata6: SATA link down (SStatus 0 SControl 300)
[   82.232290] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[   82.284002] ata5: SATA link down (SStatus 0 SControl 300)
[   82.330629] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[   82.408079] ata2.00: configured for UDMA/133
[   84.073571] ata1.00: failed to get Identify Device Data, Emask 0x1
[   84.127965] ata1.00: ACPI cmd 00/00:00:00:00:00:a0 (NOP) rejected by device (Stat=0x51 Err=0x04)
[   84.202292] ata1.00: failed to get Identify Device Data, Emask 0x1
[   84.254039] ata1.00: configured for UDMA/133
[   84.303718] sd 0:0:0:0: [sda] Starting disk
[   84.360186] PM: resume of devices complete after 3132.774 msecs
[   84.410322] PM: resume devices took 3.180 seconds
[   84.449642] PM: Finishing wakeup.
[   84.505964]
[   84.506716] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
[   84.477326] Restarting tasks ... done.
[   84.575294] video LNXVIDEO:00: Restoring backlight state
[   84.623825] =============================================
[   84.623825] [ INFO: possible recursive locking detected ]
[   84.623826] 3.7.0-rc2-testing-lockdep #1 Not tainted
[   84.623827] ---------------------------------------------
[   84.623827] Xorg/1369 is trying to acquire lock:
[   84.623828]  (&cli->mutex){+.+.+.}, at: [<f8974ca8>] nouveau_bo_move_m2mf.isra.13+0x38/0x120 [nouveau]
[   84.623856]
[   84.623856] but task is already holding lock:
[   84.623856]  (&cli->mutex){+.+.+.}, at: [<f8979346>] nouveau_abi16_get+0x26/0x110 [nouveau]
[   84.623871]
[   84.623871] other info that might help us debug this:
[   84.623872]  Possible unsafe locking scenario:
[   84.623872]
[   84.623872]        CPU0
[   84.623872]        ----
[   84.623873]   lock(&cli->mutex);
[   84.623874]   lock(&cli->mutex);
[   84.623874]
[   84.623874]  *** DEADLOCK ***
[   84.623874]
[   84.623874]  May be due to missing lock nesting notation
[   84.623874]
[   84.623875] 1 lock held by Xorg/1369:
[   84.623889]  #0:  (&cli->mutex){+.+.+.}, at: [<f8979346>] nouveau_abi16_get+0x26/0x110 [nouveau]
[   84.623890]



* Re: [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected
  2012-10-24 11:14 ` Arend van Spriel
@ 2012-10-24 12:45   ` Arend van Spriel
  2012-10-25  9:26     ` Arend van Spriel
  2012-10-25 10:14     ` Arend van Spriel
  0 siblings, 2 replies; 5+ messages in thread
From: Arend van Spriel @ 2012-10-24 12:45 UTC
  To: Stanislaw Gruszka; +Cc: dri-devel, linux-kernel, Ben Skeggs

On 10/24/2012 01:14 PM, Arend van Spriel wrote:
> On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
>> I have this lockdep warning on wireless-testing tree based
>> on 3.7-rc1 (no other patches except wireless bits).
>>
>> =============================================
>> Restarting tasks ... done.
>> [ INFO: possible recursive locking detected ]
>> 3.7.0-rc1-wl+ #2 Not tainted
>> ---------------------------------------------
>> Xorg/2269 is trying to acquire lock:
>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
>>
>> but task is already holding lock:
>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]
>>
>
> I have observed the same bug so I built and tested v3.7-rc2 tag with
> lockdep enabled. It has the same problem and it results in a failure to
> resume after suspend. See below.
>
> Gr. AvS

Digging into the trace:

nouveau_gem_ioctl_pushbuf() calls nouveau_abi16_get(), which grabs the
mutex. I assume this is meant to protect the chan variable passed to
nouveau_gem_pushbuf_validate(), which does a bit more than validate, as
it ends up in nouveau_bo_move_m2mf(), which uses drm->chan. However,
it deadlocks before that, when nouveau_bo_move_m2mf() tries to take
cli->mutex again.
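
One common way to avoid this kind of nested acquisition is to give the inner operation a variant that expects the lock to be held already, and to call that variant from paths that took the lock themselves. The sketch below is a userspace illustration of that general pattern only; it is not the fix adopted for nouveau, which this thread does not settle. All names (struct client_model, bo_move_locked_model(), bo_move_model(), pushbuf_model()) are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>

/* Hypothetical client object: a mutex plus some state it protects. */
struct client_model {
    pthread_mutex_t mutex;
    int moves; /* protected by mutex */
};

/* Inner operation. The caller must already hold cli->mutex. */
static void bo_move_locked_model(struct client_model *cli)
{
    cli->moves++;
}

/* Wrapper for callers that do not hold the mutex yet. */
static void bo_move_model(struct client_model *cli)
{
    pthread_mutex_lock(&cli->mutex);
    bo_move_locked_model(cli);
    pthread_mutex_unlock(&cli->mutex);
}

/* ioctl-like path: the mutex is taken once at the top, so the inner
 * operation is reached through the _locked variant and never tries to
 * take the mutex a second time. Returns the move count seen under the lock. */
static int pushbuf_model(struct client_model *cli)
{
    int count;

    pthread_mutex_lock(&cli->mutex);
    bo_move_locked_model(cli);
    count = cli->moves;
    pthread_mutex_unlock(&cli->mutex);
    return count;
}
```

With a statically initialised object (mutex set to PTHREAD_MUTEX_INITIALIZER, moves set to 0), pushbuf_model() and bo_move_model() can both be called without either path locking the mutex twice. Whether such a split fits the driver depends on which client the mutex in nouveau_bo_move_m2mf() really belongs to, which the trace alone does not show.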

Gr. AvS


* Re: [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected
  2012-10-24 12:45   ` Arend van Spriel
@ 2012-10-25  9:26     ` Arend van Spriel
  2012-10-25 10:14     ` Arend van Spriel
  1 sibling, 0 replies; 5+ messages in thread
From: Arend van Spriel @ 2012-10-25  9:26 UTC
  To: Ben Skeggs; +Cc: Stanislaw Gruszka, dri-devel, linux-kernel

On 10/24/2012 02:45 PM, Arend van Spriel wrote:
> On 10/24/2012 01:14 PM, Arend van Spriel wrote:
>> On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
>>> I have this lockdep warning on wireless-testing tree based
>>> on 3.7-rc1 (no other patches except wireless bits).
>>>
>>> =============================================
>>> Restarting tasks ... done.
>>> [ INFO: possible recursive locking detected ]
>>> 3.7.0-rc1-wl+ #2 Not tainted
>>> ---------------------------------------------
>>> Xorg/2269 is trying to acquire lock:
>>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
>>>
>>> but task is already holding lock:
>>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]
>>>
>>
>> I have observed the same bug so I built and tested v3.7-rc2 tag with
>> lockdep enabled. It has the same problem and it results in a failure to
>> resume after suspend. See below.
>>
>> Gr. AvS
>
> digging into the trace:
>
>
> nouveau_gem_ioctl_pushbuf() calls nouveau_abi16_get() which grabs the
> mutex. Assume this should protect the chan variable passed to
> nouveau_gem_pushbuf_validate(), which does a bit more than validate as
> it ends up in nouveau_bo_move_m2mf() which uses the drm->chan. However,
> it deadlocks before that.
>
> Gr. AvS

Maybe this helps: the two locations where the lock is grabbed were both
introduced by the same commit (see below).

Gr. AvS

commit ebb945a94bba2ce8dff7b0942ff2b3f2a52a0a69
Author: Ben Skeggs <bskeggs@redhat.com>
Date:   Fri Jul 20 08:17:34 2012 +1000

     drm/nouveau: port all engines to new engine module format

     This is a HUGE commit, but it's not nearly as bad as it looks - any problems
     can be isolated to a particular chipset and engine combination.  It was
     simply too difficult to port each one at a time, the compat layers are
     *already* ridiculous.

     Most of the changes here are simply to the glue, the process for each of the
     engine modules was to start with a standard skeleton and copy+paste the old
     code into the appropriate places, fixing up variable names etc as needed.

     v2: Marcin Slusarz <marcin.slusarz@gmail.com>
     - fix find/replace bug in license header

     v3: Ben Skeggs <bskeggs@redhat.com>
     - bump indirect pushbuf size to 8KiB, 4KiB barely enough for userspace and
       left no space for kernel's requirements during GEM pushbuf submission.
     - fix duplicate assignments noticed by clang

     v4: Marcin Slusarz <marcin.slusarz@gmail.com>
     - add sparse annotations to nv04_fifo_pause/nv04_fifo_start
     - use ioread32_native/iowrite32_native for fifo control registers

     v5: Ben Skeggs <bskeggs@redhat.com>
     - rebase on v3.6-rc4, modified to keep copy engine fix intact
     - nv10/fence: unmap fence bo before destroying
     - fixed fermi regression when using nvidia gr fuc
     - fixed typo in supported dma_mask checking

     Signed-off-by: Ben Skeggs <bskeggs@redhat.com>




* Re: [BUG 3.7-rc1] nouveau cli->mutex possible recursive locking detected
  2012-10-24 12:45   ` Arend van Spriel
  2012-10-25  9:26     ` Arend van Spriel
@ 2012-10-25 10:14     ` Arend van Spriel
  1 sibling, 0 replies; 5+ messages in thread
From: Arend van Spriel @ 2012-10-25 10:14 UTC
  To: Ben Skeggs; +Cc: Stanislaw Gruszka, dri-devel, linux-kernel

On 10/24/2012 02:45 PM, Arend van Spriel wrote:
> On 10/24/2012 01:14 PM, Arend van Spriel wrote:
>> On 10/16/2012 02:43 PM, Stanislaw Gruszka wrote:
>>> I have this lockdep warning on wireless-testing tree based
>>> on 3.7-rc1 (no other patches except wireless bits).
>>>
>>> =============================================
>>> Restarting tasks ... done.
>>> [ INFO: possible recursive locking detected ]
>>> 3.7.0-rc1-wl+ #2 Not tainted
>>> ---------------------------------------------
>>> Xorg/2269 is trying to acquire lock:
>>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012a27f>] nouveau_bo_move_m2mf+0x5f/0x170 [nouveau]
>>>
>>> but task is already holding lock:
>>>   (&cli->mutex){+.+.+.}, at: [<ffffffffa012f3c4>] nouveau_abi16_get+0x34/0x100 [nouveau]
>>>
>>
>> I have observed the same bug so I built and tested v3.7-rc2 tag with
>> lockdep enabled. It has the same problem and it results in a failure to
>> resume after suspend. See below.
>>
>> Gr. AvS
>
> digging into the trace:
>
>
> nouveau_gem_ioctl_pushbuf() calls nouveau_abi16_get() which grabs the
> mutex. Assume this should protect the chan variable passed to
> nouveau_gem_pushbuf_validate(), which does a bit more than validate as
> it ends up in nouveau_bo_move_m2mf() which uses the drm->chan. However,
> it deadlocks before that.
>
> Gr. AvS

I reverted the two drm merges:

ceb736c Merge branch 'drm-nouveau-fixes' of git://anongit.freedesktop.org/git/no
612a9aa Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux

Not surprisingly, that resolved the deadlock (tested using pm_test).
Unfortunately, suspend/resume still does not work. The system goes to
sleep just fine, but when trying to resume, the BIOS kicks in and the
system boots instead of waking up.

Gr. AvS



