From: David Hildenbrand <david@redhat.com>
To: gregkh@linuxfoundation.org, akpm@linux-foundation.org,
bhe@redhat.com, boqun.feng@gmail.com, dyoung@redhat.com,
josh@joshtriplett.org, paulmck@kernel.org, peterz@infradead.org,
stable@vger.kernel.org, torvalds@linux-foundation.org,
vgoyal@redhat.com
Subject: Re: FAILED: patch "[PATCH] proc/vmcore: fix possible deadlock on concurrent mmap and" failed to apply to 5.16-stable tree
Date: Mon, 4 Apr 2022 12:17:47 +0200
Message-ID: <5a220426-6b83-6a0e-5af0-ee4c76e72c79@redhat.com>
In-Reply-To: <164889941824213@kroah.com>
On 02.04.22 13:36, gregkh@linuxfoundation.org wrote:
>
> The patch below does not apply to the 5.16-stable tree.
> If someone wants it applied there, or to any other stable or longterm
> tree, then please email the backport, including the original git commit
> id to <stable@vger.kernel.org>.
>
I don't think we need that particular patch in -stable. The deadlock
shouldn't happen in practice (concurrent addition/removal of a
callback doesn't really happen in a kdump environment). Thanks.
> thanks,
>
> greg k-h
>
> ------------------ original commit in Linus's tree ------------------
>
> From 5039b170369d22613ebc07e81410891f52280a45 Mon Sep 17 00:00:00 2001
> From: David Hildenbrand <david@redhat.com>
> Date: Wed, 23 Mar 2022 16:05:23 -0700
> Subject: [PATCH] proc/vmcore: fix possible deadlock on concurrent mmap and
> read
>
> Lockdep noticed that there is a chance for a deadlock if we have concurrent
> mmap, concurrent read, and the addition/removal of a callback.
>
> As nicely explained by Boqun:
> "Lockdep warned about the above sequences because rw_semaphore is a
> fair read-write lock, and the following can cause a deadlock:
>
>         TASK 1                  TASK 2                  TASK 3
>         ======                  ======                  ======
>         down_write(mmap_lock);
>                                 down_read(vmcore_cb_rwsem)
>                                                         down_write(vmcore_cb_rwsem); // blocked
>         down_read(vmcore_cb_rwsem); // cannot get the lock because of the fairness
>                                 down_read(mmap_lock); // blocked
>
> IOW, a reader can block another reader if there is a writer queued by
> the second reader and the lock is fair"
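The fairness rule Boqun describes can be illustrated with a toy, single-threaded model. struct toy_rwlock, its fields and the toy_* helpers below are hypothetical and are not the kernel's rw_semaphore implementation; the sketch only encodes the admission rule (a fair lock refuses a new reader while a writer is queued) and replays the three-task sequence above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a reader-writer lock; not the kernel's rw_semaphore. */
struct toy_rwlock {
	bool fair;          /* fair: new readers queue behind waiting writers */
	bool writer;        /* a task holds the lock for write */
	int readers;        /* tasks holding the lock for read */
	int queued_writers; /* writers waiting for the lock */
};

/* Would down_read() be admitted right now? */
static bool toy_can_read(const struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	/* A fair lock refuses new readers while a writer is queued. */
	return !(l->fair && l->queued_writers > 0);
}

/* Would down_write() be admitted right now? Only if the lock is free. */
static bool toy_can_write(const struct toy_rwlock *l)
{
	return !l->writer && l->readers == 0;
}

/*
 * Replay the TASK 1/2/3 sequence. Returns true if TASK 1 and TASK 2 both
 * end up waiting, which (with TASK 3 waiting on TASK 2) is the deadlock.
 */
static bool toy_replay(bool fair)
{
	struct toy_rwlock mmap_lock = { .fair = fair };
	struct toy_rwlock cb_rwsem = { .fair = fair };

	/* TASK 1: down_write(mmap_lock) is admitted. */
	assert(toy_can_write(&mmap_lock));
	mmap_lock.writer = true;

	/* TASK 2: down_read(vmcore_cb_rwsem) is admitted. */
	assert(toy_can_read(&cb_rwsem));
	cb_rwsem.readers++;

	/* TASK 3: down_write(vmcore_cb_rwsem) must wait for TASK 2. */
	assert(!toy_can_write(&cb_rwsem));
	cb_rwsem.queued_writers++;

	/* TASK 1: down_read(vmcore_cb_rwsem); refused only if the lock is fair. */
	bool task1_waits = !toy_can_read(&cb_rwsem);

	/* TASK 2: down_read(mmap_lock); refused because TASK 1 holds it for write. */
	bool task2_waits = !toy_can_read(&mmap_lock);

	return task1_waits && task2_waits;
}
```

With fair set, the waits form the cycle TASK 1 -> TASK 3 -> TASK 2 -> TASK 1. With a reader-preferring lock, TASK 1's down_read() would be admitted, TASK 1 could finish and drop mmap_lock, and no cycle would form.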
>
> To fix this, convert to srcu to make this deadlock impossible. We need
> srcu as our callbacks can sleep. With this change, I cannot trigger any
> lockdep warnings.
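Why SRCU removes the problem can be sketched with another toy model, assuming a simplified two-epoch scheme. struct toy_srcu and its helpers are hypothetical; the real SRCU implementation uses per-CPU counters, memory barriers and a more elaborate grace-period state machine. The property that matters here holds in both: srcu_read_lock() never waits for an updater, and a grace period only waits for readers that were already inside a read-side section when it started.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of SRCU's reader/updater split (single-threaded,
 * two epochs, no per-CPU counters or barriers); not the kernel's SRCU.
 */
struct toy_srcu {
	int idx;        /* epoch new readers are counted in (0 or 1) */
	int readers[2]; /* readers still inside a section, per epoch */
};

/* Like srcu_read_lock(): never waits, just counts the reader. */
static int toy_srcu_read_lock(struct toy_srcu *s)
{
	int idx = s->idx;

	s->readers[idx]++;
	return idx;
}

/* Like srcu_read_unlock(): the caller passes back the returned index. */
static void toy_srcu_read_unlock(struct toy_srcu *s, int idx)
{
	assert(s->readers[idx] > 0);
	s->readers[idx]--;
}

/* Start a grace period: later readers are counted in the other epoch. */
static int toy_srcu_start_gp(struct toy_srcu *s)
{
	int old = s->idx;

	s->idx ^= 1;
	return old;
}

/*
 * The grace period is over once every reader of the old epoch has left;
 * synchronize_srcu() would sleep until this becomes true.
 */
static bool toy_srcu_gp_done(const struct toy_srcu *s, int old)
{
	return s->readers[old] == 0;
}
```

Mapped onto the patch, read_from_oldmem() and vmcore_remap_oldmem_pfn() take the reader role and unregister_vmcore_cb() the updater role: it unlinks the callback under vmcore_cb_lock and then waits in synchronize_srcu() only for pre-existing readers, while a new reader (for example on the mmap path, with mmap_lock held) is never held back by the waiting updater.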
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not tainted
> ------------------------------------------------------
> makedumpfile/542 is trying to acquire lock:
> ffffffff832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: mmap_vmcore+0x340/0x580
>
> but task is already holding lock:
> ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
>        lock_acquire+0xc3/0x1a0
>        __might_fault+0x4e/0x70
>        _copy_to_user+0x1f/0x90
>        __copy_oldmem_page+0x72/0xc0
>        read_from_oldmem+0x77/0x1e0
>        read_vmcore+0x2c2/0x310
>        proc_reg_read+0x47/0xa0
>        vfs_read+0x101/0x340
>        __x64_sys_pread64+0x5d/0xa0
>        do_syscall_64+0x43/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
>        validate_chain+0x9f4/0x2670
>        __lock_acquire+0x8f7/0xbc0
>        lock_acquire+0xc3/0x1a0
>        down_read+0x4a/0x140
>        mmap_vmcore+0x340/0x580
>        proc_reg_mmap+0x3e/0x90
>        mmap_region+0x504/0x880
>        do_mmap+0x38a/0x520
>        vm_mmap_pgoff+0xc1/0x150
>        ksys_mmap_pgoff+0x178/0x200
>        do_syscall_64+0x43/0x90
>        entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&mm->mmap_lock#2);
>                                lock(vmcore_cb_rwsem);
>                                lock(&mm->mmap_lock#2);
>   lock(vmcore_cb_rwsem);
>
> *** DEADLOCK ***
>
> 1 lock held by makedumpfile/542:
> #0: ffff8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: vm_mmap_pgoff+0x84/0x150
>
> stack backtrace:
> CPU: 0 PID: 542 Comm: makedumpfile Not tainted 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Call Trace:
>  __lock_acquire+0x8f7/0xbc0
>  lock_acquire+0xc3/0x1a0
>  down_read+0x4a/0x140
>  mmap_vmcore+0x340/0x580
>  proc_reg_mmap+0x3e/0x90
>  mmap_region+0x504/0x880
>  do_mmap+0x38a/0x520
>  vm_mmap_pgoff+0xc1/0x150
>  ksys_mmap_pgoff+0x178/0x200
>  do_syscall_64+0x43/0x90
>
> Link: https://lkml.kernel.org/r/20220119193417.100385-1-david@redhat.com
> Fixes: cc5f2704c934 ("proc/vmcore: convert oldmem_pfn_is_ram callback to more generic vmcore callbacks")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> Reported-by: Baoquan He <bhe@redhat.com>
> Acked-by: Baoquan He <bhe@redhat.com>
> Cc: Vivek Goyal <vgoyal@redhat.com>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Josh Triplett <josh@joshtriplett.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Boqun Feng <boqun.feng@gmail.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> diff --git a/fs/proc/vmcore.c b/fs/proc/vmcore.c
> index 702754dd1daf..edeb01dfe05d 100644
> --- a/fs/proc/vmcore.c
> +++ b/fs/proc/vmcore.c
> @@ -62,7 +62,8 @@ core_param(novmcoredd, vmcoredd_disabled, bool, 0);
>  /* Device Dump Size */
>  static size_t vmcoredd_orig_sz;
>
> -static DECLARE_RWSEM(vmcore_cb_rwsem);
> +static DEFINE_SPINLOCK(vmcore_cb_lock);
> +DEFINE_STATIC_SRCU(vmcore_cb_srcu);
>  /* List of registered vmcore callbacks. */
>  static LIST_HEAD(vmcore_cb_list);
>  /* Whether the vmcore has been opened once. */
> @@ -70,8 +71,8 @@ static bool vmcore_opened;
>
>  void register_vmcore_cb(struct vmcore_cb *cb)
>  {
> -	down_write(&vmcore_cb_rwsem);
>  	INIT_LIST_HEAD(&cb->next);
> +	spin_lock(&vmcore_cb_lock);
>  	list_add_tail(&cb->next, &vmcore_cb_list);
>  	/*
>  	 * Registering a vmcore callback after the vmcore was opened is
> @@ -79,14 +80,14 @@ void register_vmcore_cb(struct vmcore_cb *cb)
>  	 */
>  	if (vmcore_opened)
>  		pr_warn_once("Unexpected vmcore callback registration\n");
> -	up_write(&vmcore_cb_rwsem);
> +	spin_unlock(&vmcore_cb_lock);
>  }
>  EXPORT_SYMBOL_GPL(register_vmcore_cb);
>
>  void unregister_vmcore_cb(struct vmcore_cb *cb)
>  {
> -	down_write(&vmcore_cb_rwsem);
> -	list_del(&cb->next);
> +	spin_lock(&vmcore_cb_lock);
> +	list_del_rcu(&cb->next);
>  	/*
>  	 * Unregistering a vmcore callback after the vmcore was opened is
>  	 * very unusual (e.g., forced driver removal), but we cannot stop
> @@ -94,7 +95,9 @@ void unregister_vmcore_cb(struct vmcore_cb *cb)
>  	 */
>  	if (vmcore_opened)
>  		pr_warn_once("Unexpected vmcore callback unregistration\n");
> -	up_write(&vmcore_cb_rwsem);
> +	spin_unlock(&vmcore_cb_lock);
> +
> +	synchronize_srcu(&vmcore_cb_srcu);
>  }
>  EXPORT_SYMBOL_GPL(unregister_vmcore_cb);
>
> @@ -103,9 +106,8 @@ static bool pfn_is_ram(unsigned long pfn)
>  	struct vmcore_cb *cb;
>  	bool ret = true;
>
> -	lockdep_assert_held_read(&vmcore_cb_rwsem);
> -
> -	list_for_each_entry(cb, &vmcore_cb_list, next) {
> +	list_for_each_entry_srcu(cb, &vmcore_cb_list, next,
> +				 srcu_read_lock_held(&vmcore_cb_srcu)) {
>  		if (unlikely(!cb->pfn_is_ram))
>  			continue;
>  		ret = cb->pfn_is_ram(cb, pfn);
> @@ -118,9 +120,9 @@ static bool pfn_is_ram(unsigned long pfn)
>
>  static int open_vmcore(struct inode *inode, struct file *file)
>  {
> -	down_read(&vmcore_cb_rwsem);
> +	spin_lock(&vmcore_cb_lock);
>  	vmcore_opened = true;
> -	up_read(&vmcore_cb_rwsem);
> +	spin_unlock(&vmcore_cb_lock);
>
>  	return 0;
>  }
> @@ -133,6 +135,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  	unsigned long pfn, offset;
>  	size_t nr_bytes;
>  	ssize_t read = 0, tmp;
> +	int idx;
>
>  	if (!count)
>  		return 0;
> @@ -140,7 +143,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  	offset = (unsigned long)(*ppos % PAGE_SIZE);
>  	pfn = (unsigned long)(*ppos / PAGE_SIZE);
>
> -	down_read(&vmcore_cb_rwsem);
> +	idx = srcu_read_lock(&vmcore_cb_srcu);
>  	do {
>  		if (count > (PAGE_SIZE - offset))
>  			nr_bytes = PAGE_SIZE - offset;
> @@ -165,7 +168,7 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  						       offset, userbuf);
>  		}
>  		if (tmp < 0) {
> -			up_read(&vmcore_cb_rwsem);
> +			srcu_read_unlock(&vmcore_cb_srcu, idx);
>  			return tmp;
>  		}
>
> @@ -176,8 +179,8 @@ ssize_t read_from_oldmem(char *buf, size_t count,
>  		++pfn;
>  		offset = 0;
>  	} while (count);
> +	srcu_read_unlock(&vmcore_cb_srcu, idx);
>
> -	up_read(&vmcore_cb_rwsem);
>  	return read;
>  }
>
> @@ -568,18 +571,18 @@ static int vmcore_remap_oldmem_pfn(struct vm_area_struct *vma,
>  				   unsigned long from, unsigned long pfn,
>  				   unsigned long size, pgprot_t prot)
>  {
> -	int ret;
> +	int ret, idx;
>
>  	/*
> -	 * Check if oldmem_pfn_is_ram was registered to avoid
> -	 * looping over all pages without a reason.
> +	 * Check if a callback was registered to avoid looping over all
> +	 * pages without a reason.
>  	 */
> -	down_read(&vmcore_cb_rwsem);
> +	idx = srcu_read_lock(&vmcore_cb_srcu);
>  	if (!list_empty(&vmcore_cb_list))
>  		ret = remap_oldmem_pfn_checked(vma, from, pfn, size, prot);
>  	else
>  		ret = remap_oldmem_pfn_range(vma, from, pfn, size, prot);
> -	up_read(&vmcore_cb_rwsem);
> +	srcu_read_unlock(&vmcore_cb_srcu, idx);
>  	return ret;
>  }
>
>
--
Thanks,
David / dhildenb