linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Fangjian (Turing)" <f.fangjian@huawei.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Sinan Kaya <Okaya@kernel.org>
Cc: <linux-pci@vger.kernel.org>, Bjorn Helgaas <helgaas@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: Bug report: AER driver deadlock
Date: Thu, 20 Jun 2019 11:14:11 +0800	[thread overview]
Message-ID: <7b98d81c-55bd-1782-f214-7bbf48e54f16@huawei.com> (raw)
In-Reply-To: <a1c90cfb9ce4062b4823c6647d7709baf1c5534f.camel@kernel.crashing.org>

Hi,
Are there any further advice?

On 2019/6/5 7:47, Benjamin Herrenschmidt wrote:
> On Tue, 2019-06-04 at 10:34 -0400, Sinan Kaya wrote:
>> On 6/3/19, Fangjian (Turing) <f.fangjian@huawei.com> wrote:
>>> Hi, We met a deadlock triggered by a NONFATAL AER event during a sysfs
>>> "sriov_numvfs" operation. Any suggestion to fix such deadlock ?
>>>
>>>   enable one VF
>>>   # echo 1 > /sys/devices/pci0000:74/0000:74:00.0/0000:75:00.0/sriov_numvfs
>>>
>>>   The sysfs "sriov_numvfs" side is:
>>>
>>>     sriov_numvfs_store
>>>       device_lock                               # hold the device_lock
>>>         ...
>>>         pci_enable_sriov
>>>           sriov_enable
>>>             ...
>>>             pci_device_add
>>>               down_write(&pci_bus_sem) 	        # wait for
>>> up_read(&pci_bus_sem)
>>>
>>>   The AER side is:
>>>
>>>     pcie_do_recovery
>>>       pci_walk_bus
>>>         down_read(&pci_bus_sem)                 # hold the rw_semaphore
>>>         report_resume
>>
>> Should we replace these device lock with try lock loop with some sleep
>> statements. This could solve the immediate deadlock issues until
>> someone implements granular locking in pci.
> 
> That won't necessarily solve this AB->BA problem. I think the issue
> here is that sriov shouldn't device_lock before doing something that
> can take the pci_bus_sem.
> 
> Ben.
> 
> 
>>>           device_lock                           # wait for device_unlock()
>>>
>>> The calltrace is as below:
>>>
>>> [  258.411464] INFO: task kworker/0:1:13 blocked for more than 120 seconds.
>>> [  258.418139]       Tainted: G         C O      5.1.0-rc1-ge2e3ca0 #1
>>> [  258.424379] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>>> this message.
>>> [  258.432172] kworker/0:1     D    0    13      2 0x00000028
>>> [  258.437640] Workqueue: events aer_recover_work_func
>>> [  258.442496] Call trace:
>>> [  258.444933]  __switch_to+0xb4/0x1b8
>>> [  258.448409]  __schedule+0x1ec/0x720
>>> [  258.451884]  schedule+0x38/0x90
>>> [  258.455012]  schedule_preempt_disabled+0x20/0x38
>>> [  258.459610]  __mutex_lock.isra.1+0x150/0x518
>>> [  258.463861]  __mutex_lock_slowpath+0x10/0x18
>>> [  258.468112]  mutex_lock+0x34/0x40
>>> [  258.471413]  report_resume+0x1c/0x78
>>> [  258.474973]  pci_walk_bus+0x58/0xb0
>>> [  258.478451]  pcie_do_recovery+0x18c/0x248
>>> [  258.482445]  aer_recover_work_func+0xe0/0x118
>>> [  258.486783]  process_one_work+0x1e4/0x468
>>> [  258.490776]  worker_thread+0x40/0x450
>>> [  258.494424]  kthread+0x128/0x130
>>> [  258.497639]  ret_from_fork+0x10/0x1c
>>> [  258.501329] INFO: task flr.sh:4534 blocked for more than 120 seconds.
>>> [  258.507742]       Tainted: G         C O      5.1.0-rc1-ge2e3ca0 #1
>>> [  258.513980] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>>> this message.
>>> [  258.521774] flr.sh          D    0  4534   4504 0x00000000
>>> [  258.527235] Call trace:
>>> [  258.529671]  __switch_to+0xb4/0x1b8
>>> [  258.533146]  __schedule+0x1ec/0x720
>>> [  258.536619]  schedule+0x38/0x90
>>> [  258.539749]  rwsem_down_write_failed+0x14c/0x210
>>> [  258.544347]  down_write+0x48/0x60
>>> [  258.547648]  pci_device_add+0x1a0/0x290
>>> [  258.551469]  pci_iov_add_virtfn+0x190/0x358
>>> [  258.555633]  sriov_enable+0x24c/0x480
>>> [  258.559279]  pci_enable_sriov+0x14/0x28
>>> [  258.563101]  hisi_zip_sriov_configure+0x64/0x100 [hisi_zip]
>>> [  258.568649]  sriov_numvfs_store+0xc4/0x190
>>> [  258.572728]  dev_attr_store+0x18/0x28
>>> [  258.576375]  sysfs_kf_write+0x3c/0x50
>>> [  258.580024]  kernfs_fop_write+0x114/0x1d8
>>> [  258.584018]  __vfs_write+0x18/0x38
>>> [  258.587404]  vfs_write+0xa4/0x1b0
>>> [  258.590705]  ksys_write+0x60/0xd8
>>> [  258.594007]  __arm64_sys_write+0x18/0x20
>>> [  258.597914]  el0_svc_common+0x5c/0x100
>>> [  258.601646]  el0_svc_handler+0x2c/0x80
>>> [  258.605381]  el0_svc+0x8/0xc
>>> [  379.243461] INFO: task kworker/0:1:13 blocked for more than 241 seconds.
>>> [  379.250134]       Tainted: G         C O      5.1.0-rc1-ge2e3ca0 #1
>>> [  379.256373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>>> this message.
>>>
>>>
>>> Thank you,
>>> Jay
>>>
>>>
> 
> 
> .
> 


  parent reply	other threads:[~2019-06-20  3:13 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-04  3:25 Bug report: AER driver deadlock Fangjian (Turing)
2019-06-04 14:34 ` Sinan Kaya
2019-06-04 23:47   ` Benjamin Herrenschmidt
2019-06-05  0:59     ` Sinan Kaya
2019-06-20  3:14     ` Fangjian (Turing) [this message]
2019-06-25 17:16 ` Bjorn Helgaas
2019-08-05 12:43   ` Fangjian (Turing)
2019-08-16  7:11     ` Jay Fang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b98d81c-55bd-1782-f214-7bbf48e54f16@huawei.com \
    --to=f.fangjian@huawei.com \
    --cc=Okaya@kernel.org \
    --cc=benh@kernel.crashing.org \
    --cc=helgaas@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).