From: Lu Baolu <baolu.lu@linux.intel.com>
To: Peter Xu <peterx@redhat.com>,
Linux IOMMU Mailing List <iommu@lists.linux-foundation.org>
Cc: dave.jiang@intel.com
Subject: Re: VT-d deadlock issue on device_domain_lock and iommu lock (5.2-rc5)
Date: Fri, 21 Jun 2019 09:58:28 +0800 [thread overview]
Message-ID: <de430ed1-a9f0-6970-a3b3-682fb3dbd8ac@linux.intel.com> (raw)
In-Reply-To: <20190620104418.GA9657@xz-x1>
Hi Peter,
I agree with you that 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion
between iommu->lock and device_domain_lock") isn't a good fix. There
is also another thread, https://lkml.org/lkml/2019/6/18/996, which
reported this.
I think we can revert this patch now. I will try to reproduce the
original issue and try to find a new fix.
Can you please submit the revert patch?
Best regards,
Baolu
On 6/20/19 6:44 PM, Peter Xu wrote:
> Hi,
>
> With 5.2.0-rc5 I can easily trigger this with lockdep:
>
> --------------------------------------------
> Jun 20 14:37:37 xz-x1 kernel: ======================================================
> Jun 20 14:37:37 xz-x1 kernel: WARNING: possible circular locking dependency detected
> Jun 20 14:37:37 xz-x1 kernel: 5.2.0-rc5 #78 Not tainted
> Jun 20 14:37:37 xz-x1 kernel: ------------------------------------------------------
> Jun 20 14:37:37 xz-x1 kernel: swapper/0/1 is trying to acquire lock:
> Jun 20 14:37:37 xz-x1 kernel: 00000000ea2b3beb (&(&iommu->lock)->rlock){+.+.}, at: domain_context_mapping_one+0xa5/0x4e0
> Jun 20 14:37:37 xz-x1 kernel:
> but task is already holding lock:
> Jun 20 14:37:37 xz-x1 kernel: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
> Jun 20 14:37:37 xz-x1 kernel:
> which lock already depends on the new lock.
> Jun 20 14:37:37 xz-x1 kernel:
> the existing dependency chain (in reverse order) is:
> Jun 20 14:37:37 xz-x1 kernel:
> -> #1 (device_domain_lock){....}:
> Jun 20 14:37:37 xz-x1 kernel: _raw_spin_lock_irqsave+0x3c/0x50
> Jun 20 14:37:37 xz-x1 kernel: dmar_insert_one_dev_info+0xbb/0x510
> Jun 20 14:37:37 xz-x1 kernel: domain_add_dev_info+0x50/0x90
> Jun 20 14:37:37 xz-x1 kernel: dev_prepare_static_identity_mapping+0x30/0x68
> Jun 20 14:37:37 xz-x1 kernel: intel_iommu_init+0xddd/0x1422
> Jun 20 14:37:37 xz-x1 kernel: pci_iommu_init+0x16/0x3f
> Jun 20 14:37:37 xz-x1 kernel: do_one_initcall+0x5d/0x2b4
> Jun 20 14:37:37 xz-x1 kernel: kernel_init_freeable+0x218/0x2c1
> Jun 20 14:37:37 xz-x1 kernel: kernel_init+0xa/0x100
> Jun 20 14:37:37 xz-x1 kernel: ret_from_fork+0x3a/0x50
> Jun 20 14:37:37 xz-x1 kernel:
> -> #0 (&(&iommu->lock)->rlock){+.+.}:
> Jun 20 14:37:37 xz-x1 kernel: lock_acquire+0x9e/0x170
> Jun 20 14:37:37 xz-x1 kernel: _raw_spin_lock+0x25/0x30
> Jun 20 14:37:37 xz-x1 kernel: domain_context_mapping_one+0xa5/0x4e0
> Jun 20 14:37:37 xz-x1 kernel: pci_for_each_dma_alias+0x30/0x140
> Jun 20 14:37:37 xz-x1 kernel: dmar_insert_one_dev_info+0x3b2/0x510
> Jun 20 14:37:37 xz-x1 kernel: domain_add_dev_info+0x50/0x90
> Jun 20 14:37:37 xz-x1 kernel: dev_prepare_static_identity_mapping+0x30/0x68
> Jun 20 14:37:37 xz-x1 kernel: intel_iommu_init+0xddd/0x1422
> Jun 20 14:37:37 xz-x1 kernel: pci_iommu_init+0x16/0x3f
> Jun 20 14:37:37 xz-x1 kernel: do_one_initcall+0x5d/0x2b4
> Jun 20 14:37:37 xz-x1 kernel: kernel_init_freeable+0x218/0x2c1
> Jun 20 14:37:37 xz-x1 kernel: kernel_init+0xa/0x100
> Jun 20 14:37:37 xz-x1 kernel: ret_from_fork+0x3a/0x50
> Jun 20 14:37:37 xz-x1 kernel:
> other info that might help us debug this:
> Jun 20 14:37:37 xz-x1 kernel: Possible unsafe locking scenario:
> Jun 20 14:37:37 xz-x1 kernel: CPU0 CPU1
> Jun 20 14:37:37 xz-x1 kernel: ---- ----
> Jun 20 14:37:37 xz-x1 kernel: lock(device_domain_lock);
> Jun 20 14:37:37 xz-x1 kernel: lock(&(&iommu->lock)->rlock);
> Jun 20 14:37:37 xz-x1 kernel: lock(device_domain_lock);
> Jun 20 14:37:37 xz-x1 kernel: lock(&(&iommu->lock)->rlock);
> Jun 20 14:37:37 xz-x1 kernel:
> *** DEADLOCK ***
> Jun 20 14:37:37 xz-x1 kernel: 2 locks held by swapper/0/1:
> Jun 20 14:37:37 xz-x1 kernel: #0: 00000000033eb13d (dmar_global_lock){++++}, at: intel_iommu_init+0x1e0/0x1422
> Jun 20 14:37:37 xz-x1 kernel: #1: 00000000a681907b (device_domain_lock){....}, at: domain_context_mapping_one+0x8d/0x4e0
> Jun 20 14:37:37 xz-x1 kernel:
> stack backtrace:
> Jun 20 14:37:37 xz-x1 kernel: CPU: 2 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc5 #78
> Jun 20 14:37:37 xz-x1 kernel: Hardware name: LENOVO 20KGS35G01/20KGS35G01, BIOS N23ET50W (1.25 ) 06/25/2018
> Jun 20 14:37:37 xz-x1 kernel: Call Trace:
> Jun 20 14:37:37 xz-x1 kernel: dump_stack+0x85/0xc0
> Jun 20 14:37:37 xz-x1 kernel: print_circular_bug.cold.57+0x15c/0x195
> Jun 20 14:37:37 xz-x1 kernel: __lock_acquire+0x152a/0x1710
> Jun 20 14:37:37 xz-x1 kernel: lock_acquire+0x9e/0x170
> Jun 20 14:37:37 xz-x1 kernel: ? domain_context_mapping_one+0xa5/0x4e0
> Jun 20 14:37:37 xz-x1 kernel: _raw_spin_lock+0x25/0x30
> Jun 20 14:37:37 xz-x1 kernel: ? domain_context_mapping_one+0xa5/0x4e0
> Jun 20 14:37:37 xz-x1 kernel: domain_context_mapping_one+0xa5/0x4e0
> Jun 20 14:37:37 xz-x1 kernel: ? domain_context_mapping_one+0x4e0/0x4e0
> Jun 20 14:37:37 xz-x1 kernel: pci_for_each_dma_alias+0x30/0x140
> Jun 20 14:37:37 xz-x1 kernel: dmar_insert_one_dev_info+0x3b2/0x510
> Jun 20 14:37:37 xz-x1 kernel: domain_add_dev_info+0x50/0x90
> Jun 20 14:37:37 xz-x1 kernel: dev_prepare_static_identity_mapping+0x30/0x68
> Jun 20 14:37:37 xz-x1 kernel: intel_iommu_init+0xddd/0x1422
> Jun 20 14:37:37 xz-x1 kernel: ? printk+0x58/0x6f
> Jun 20 14:37:37 xz-x1 kernel: ? lockdep_hardirqs_on+0xf0/0x180
> Jun 20 14:37:37 xz-x1 kernel: ? do_early_param+0x8e/0x8e
> Jun 20 14:37:37 xz-x1 kernel: ? e820__memblock_setup+0x63/0x63
> Jun 20 14:37:37 xz-x1 kernel: pci_iommu_init+0x16/0x3f
> Jun 20 14:37:37 xz-x1 kernel: do_one_initcall+0x5d/0x2b4
> Jun 20 14:37:37 xz-x1 kernel: ? do_early_param+0x8e/0x8e
> Jun 20 14:37:37 xz-x1 kernel: ? rcu_read_lock_sched_held+0x55/0x60
> Jun 20 14:37:37 xz-x1 kernel: ? do_early_param+0x8e/0x8e
> Jun 20 14:37:37 xz-x1 kernel: kernel_init_freeable+0x218/0x2c1
> Jun 20 14:37:37 xz-x1 kernel: ? rest_init+0x230/0x230
> Jun 20 14:37:37 xz-x1 kernel: kernel_init+0xa/0x100
> Jun 20 14:37:37 xz-x1 kernel: ret_from_fork+0x3a/0x50
> --------------------------------------------
>
> domain_context_mapping_one() is taking device_domain_lock first then
> iommu lock, while dmar_insert_one_dev_info() is doing the reverse.
>
> I digged a bit and I saw this commit which seems suspicous:
>
> 7560cc3ca7d9 ("iommu/vt-d: Fix lock inversion between iommu->lock and
> device_domain_lock", 2019-05-27)
>
> More interestingly, it was trying to fix the inverted deadlock...
>
> The thing is that even if above commit is correct on the ordering (I
> still feel strange that we need to take a per-iommu lock before
> another global lock), that commit seems to be an incomplete fix
> because there's still other places that are using the other way round.
>
> When I read deeper into that commit message, it seems to be telling me
> that before reaching iommu_flush_dev_iotlb() we've got iommu lock
> somewhere but I cannot really understand how it happened because I
> cannot find a path that iommu lock is taken when reaching
> iommu_flush_dev_iotlb(). So I cannot understand how that lockdep
> warning message could trigger.
>
> I reverted that commit and now everything is good here (no long runs
> but at least previous deadlock issue is fixed). And with that, IMHO
> we'll actually have the correct ordering in the whole repository that
> we'll take device_domain_lock before per iommu lock always.
>
> Is there anything I've missed on why we have had 7560cc3ca7d9?
>
> Thanks,
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
next prev parent reply other threads:[~2019-06-21 2:05 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-20 10:44 VT-d deadlock issue on device_domain_lock and iommu lock (5.2-rc5) Peter Xu
2019-06-21 1:58 ` Lu Baolu [this message]
2019-06-21 2:30 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=de430ed1-a9f0-6970-a3b3-682fb3dbd8ac@linux.intel.com \
--to=baolu.lu@linux.intel.com \
--cc=dave.jiang@intel.com \
--cc=iommu@lists.linux-foundation.org \
--cc=peterx@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).