All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>,
	Miklos Szeredi <miklos@szeredi.hu>
Cc: Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)
Date: Thu, 15 Oct 2020 15:55:26 -0400	[thread overview]
Message-ID: <20201015195526.GC226448@redhat.com> (raw)
In-Reply-To: <20201015151606.GA226448@redhat.com>

On Thu, Oct 15, 2020 at 11:16:06AM -0400, Vivek Goyal wrote:
> On Wed, Oct 14, 2020 at 07:44:16PM -0700, Linus Torvalds wrote:
> > On Wed, Oct 14, 2020 at 6:48 PM Qian Cai <cai@lca.pw> wrote:
> > >
> > > While on this topic, I just want to bring up a bug report that we are chasing an
> > > issue that a process is stuck in the loop of wait_on_page_bit_common() for more
> > > than 10 minutes before I gave up.
> > 
> > Judging by call trace, that looks like a deadlock rather than a missed wakeup.
> > 
> > The trace isn't reliable, but I find it suspicious that the call trace
> > just before the fault contains that
> > "iov_iter_copy_from_user_atomic()".
> > 
> > IOW, I think you're in fuse_fill_write_pages(), which has allocated
> > the page, locked it, and then it takes a page fault.
> > 
> > And the page fault waits on a page that is locked.
> > 
> > This is a classic deadlock.
> > 
> > The *intent* is that iov_iter_copy_from_user_atomic() returns zero,
> > and you retry without the page lock held.
> > 
> > HOWEVER.
> > 
> > That's not what fuse actually does. Fuse will do multiple pages, and
> > it will unlock only the _last_ page. It keeps the other pages locked,
> > and puts them in an array:
> > 
> >                 ap->pages[ap->num_pages] = page;
> > 
> > And after the iov_iter_copy_from_user_atomic() fails, it does that
> > "unlock" and repeat.
> > 
> > But while the _last_ page was unlocked, the *previous* pages are still
> > locked in that array. Deadlock.
> > 
> > I really don't think this has anything at all to do with page locking,
> > and everything to do with fuse_fill_write_pages() having a deadlock if
> > the source of data is a mmap of one of the pages it is trying to write
> > to (just with an offset, so that it's not the last page).
> > 
> > See a similar code sequence in generic_perform_write(), but notice how
> > that code only has *one* page that it locks, and never holds an array
> > of pages around over that iov_iter_fault_in_readable() thing.
> 
> Indeed. This is a deadlock in fuse. Thanks for the analysis. I can now
> trivially reproduce it with following program.

I am wondering how should I fix this issue. Is it enough that I drop
the page lock (but keep the reference) inside the loop. And once copying
from user space is done, acquire page locks for all pages (Attached
a patch below).

Or dropping page lock means that there are no guarantees that this
page did not get written back and removed from address space and
a new page has been placed at same offset. Does that mean I should
instead be looking up page cache again after copying from user
space is done. 

---
 fs/fuse/file.c |   20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

Index: redhat-linux/fs/fuse/file.c
===================================================================
--- redhat-linux.orig/fs/fuse/file.c	2020-10-15 14:11:23.464720742 -0400
+++ redhat-linux/fs/fuse/file.c	2020-10-15 15:23:17.716569698 -0400
@@ -1117,7 +1117,7 @@ static ssize_t fuse_fill_write_pages(str
 	struct fuse_conn *fc = get_fuse_conn(mapping->host);
 	unsigned offset = pos & (PAGE_SIZE - 1);
 	size_t count = 0;
-	int err;
+	int err, i;
 
 	ap->args.in_pages = true;
 	ap->descs[0].offset = offset;
@@ -1155,6 +1155,14 @@ static ssize_t fuse_fill_write_pages(str
 			goto again;
 		}
 
+		/*
+		 * Unlock page but retain reference count. Unlock is requied
+		 * otherwise it is possible that next loop iteration tries
+		 * to fault in page (source address) we have lock on and we
+		 * will deadlock. So drop lock but keep a reference on the
+		 * page. Re-acquire page lock after breaking out of the loop
+		 */
+		unlock_page(page);
 		err = 0;
 		ap->pages[ap->num_pages] = page;
 		ap->descs[ap->num_pages].length = tmp;
@@ -1171,7 +1179,15 @@ static ssize_t fuse_fill_write_pages(str
 	} while (iov_iter_count(ii) && count < fc->max_write &&
 		 ap->num_pages < max_pages && offset == 0);
 
-	return count > 0 ? count : err;
+	if (count <= 0)
+		return err;
+
+	for (i = 0; i < ap->num_pages; i++) {
+		/* Take page lock */
+		lock_page(ap->pages[i]);
+	}
+
+	return count;
 }
 
 static inline unsigned int fuse_wr_pages(loff_t pos, size_t len,


  reply	other threads:[~2020-10-15 19:55 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-13 19:59 [PATCH 0/4] Some more lock_page work Linus Torvalds
2020-10-13 19:59 ` Linus Torvalds
2020-10-13 20:03 ` Linus Torvalds
2020-10-13 20:03   ` Linus Torvalds
2020-10-14 13:05   ` Kirill A. Shutemov
2020-10-14 16:53     ` Linus Torvalds
2020-10-14 16:53       ` Linus Torvalds
2020-10-14 18:15       ` Matthew Wilcox
2020-10-15 10:41         ` Kirill A. Shutemov
2020-10-15  9:43       ` Kirill A. Shutemov
2020-10-15 16:44         ` Linus Torvalds
2020-10-15 16:44           ` Linus Torvalds
2020-10-14  5:50 ` Hugh Dickins
2020-10-14  5:50   ` Hugh Dickins
2020-10-15  1:48 ` Qian Cai
2020-10-15  2:44   ` Linus Torvalds
2020-10-15  2:44     ` Linus Torvalds
2020-10-15 15:16     ` Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..) Vivek Goyal
2020-10-15 19:55       ` Vivek Goyal [this message]
2020-10-15 21:21         ` Linus Torvalds
2020-10-15 21:21           ` Linus Torvalds
2020-10-16 10:02           ` Miklos Szeredi
2020-10-16 10:02             ` Miklos Szeredi
2020-10-16 12:27             ` Matthew Wilcox
2020-10-20 20:42             ` Vivek Goyal
2020-10-21  7:40               ` Miklos Szeredi
2020-10-21  7:40                 ` Miklos Szeredi
2020-10-21 20:12                 ` Vivek Goyal
2020-10-28 20:29                   ` Miklos Szeredi
2020-10-28 20:29                     ` Miklos Szeredi
2021-02-09 10:01                     ` Miklos Szeredi
2021-02-09 19:09                       ` Vivek Goyal
2020-10-16 18:19           ` Vivek Goyal
2020-10-16 18:24             ` Linus Torvalds
2020-10-16 18:24               ` Linus Torvalds
2020-10-16 18:24               ` Linus Torvalds
2020-10-16 18:24                 ` Linus Torvalds
2020-10-16 23:03             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201015195526.GC226448@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=cai@lca.pw \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.