All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Qian Cai <cai@lca.pw>, Hugh Dickins <hughd@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Amir Goldstein <amir73il@gmail.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)
Date: Thu, 15 Oct 2020 11:16:06 -0400	[thread overview]
Message-ID: <20201015151606.GA226448@redhat.com> (raw)
In-Reply-To: <CAHk-=wh9Eu-gNHzqgfvUAAiO=vJ+pWnzxkv+tX55xhGPFy+cOw@mail.gmail.com>

On Wed, Oct 14, 2020 at 07:44:16PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2020 at 6:48 PM Qian Cai <cai@lca.pw> wrote:
> >
> > While on this topic, I just want to bring up a bug report that we are chasing an
> > issue that a process is stuck in the loop of wait_on_page_bit_common() for more
> > than 10 minutes before I gave up.
> 
> Judging by call trace, that looks like a deadlock rather than a missed wakeup.
> 
> The trace isn't reliable, but I find it suspicious that the call trace
> just before the fault contains that
> "iov_iter_copy_from_user_atomic()".
> 
> IOW, I think you're in fuse_fill_write_pages(), which has allocated
> the page, locked it, and then it takes a page fault.
> 
> And the page fault waits on a page that is locked.
> 
> This is a classic deadlock.
> 
> The *intent* is that iov_iter_copy_from_user_atomic() returns zero,
> and you retry without the page lock held.
> 
> HOWEVER.
> 
> That's not what fuse actually does. Fuse will do multiple pages, and
> it will unlock only the _last_ page. It keeps the other pages locked,
> and puts them in an array:
> 
>                 ap->pages[ap->num_pages] = page;
> 
> And after the iov_iter_copy_from_user_atomic() fails, it does that
> "unlock" and repeat.
> 
> But while the _last_ page was unlocked, the *previous* pages are still
> locked in that array. Deadlock.
> 
> I really don't think this has anything at all to do with page locking,
> and everything to do with fuse_fill_write_pages() having a deadlock if
> the source of data is a mmap of one of the pages it is trying to write
> to (just with an offset, so that it's not the last page).
> 
> See a similar code sequence in generic_perform_write(), but notice how
> that code only has *one* page that it locks, and never holds an array
> of pages around over that iov_iter_fault_in_readable() thing.

Indeed. This is a deadlock in fuse. Thanks for the analysis. I can now
trivially reproduce it with following program.

Thanks
Vivek

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/uio.h>

int main(int argc, char *argv[])
{
	int fd, ret;
	void *map_addr;
	size_t map_length = 2 * 4096;
	char *buf_out = "Hello World";
	struct iovec iov[2];

	if (argc != 2 ) {
		printf("Usage:%s <file-to-write> \n", argv[0]);
		exit(1);
	}

	fd = open(argv[1], O_RDWR | O_TRUNC);
	if (fd == -1) {
		fprintf(stderr, "Failed to open file %s:%s, errorno=%d\n", argv[1], strerror(errno), errno);
		exit(1);
	}

	map_addr = mmap(NULL, map_length, PROT_READ | PROT_WRITE, MAP_SHARED,
			fd, 0);
	if (map_addr == MAP_FAILED) {
		fprintf(stderr, "mmap failed %s, errorno=%d\n", strerror(errno),
			errno);
		exit(1);
	}

	/* Write first page and second page */
	pwrite(fd, buf_out, strlen(buf_out), 0);
	pwrite(fd, buf_out, strlen(buf_out), 4096);

	/* Copy from first page and then second page */
	iov[0].iov_base = map_addr;
	iov[0].iov_len = strlen(buf_out) + 1;

	iov[1].iov_base = map_addr + 4096;
	iov[1].iov_len = strlen(buf_out) + 1;

	/*
	 * Write second page offset (4K - 12), reading from first page and
	 * then second page. In first iteration we should fault in first
	 * page and lock second page. And in second iteration we should
	 * try fault in second page which is locked. Deadlock.
	 */
	ret = pwritev(fd, iov, 2, 4096 + 4096 - 12);
	printf("write() returned=%d\n", ret);
	munmap(map_addr, map_length);
	close(fd);
}


  reply	other threads:[~2020-10-15 15:16 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-13 19:59 [PATCH 0/4] Some more lock_page work Linus Torvalds
2020-10-13 19:59 ` Linus Torvalds
2020-10-13 20:03 ` Linus Torvalds
2020-10-13 20:03   ` Linus Torvalds
2020-10-14 13:05   ` Kirill A. Shutemov
2020-10-14 16:53     ` Linus Torvalds
2020-10-14 16:53       ` Linus Torvalds
2020-10-14 18:15       ` Matthew Wilcox
2020-10-15 10:41         ` Kirill A. Shutemov
2020-10-15  9:43       ` Kirill A. Shutemov
2020-10-15 16:44         ` Linus Torvalds
2020-10-15 16:44           ` Linus Torvalds
2020-10-14  5:50 ` Hugh Dickins
2020-10-14  5:50   ` Hugh Dickins
2020-10-15  1:48 ` Qian Cai
2020-10-15  2:44   ` Linus Torvalds
2020-10-15  2:44     ` Linus Torvalds
2020-10-15 15:16     ` Vivek Goyal [this message]
2020-10-15 19:55       ` Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..) Vivek Goyal
2020-10-15 21:21         ` Linus Torvalds
2020-10-15 21:21           ` Linus Torvalds
2020-10-16 10:02           ` Miklos Szeredi
2020-10-16 10:02             ` Miklos Szeredi
2020-10-16 12:27             ` Matthew Wilcox
2020-10-20 20:42             ` Vivek Goyal
2020-10-21  7:40               ` Miklos Szeredi
2020-10-21  7:40                 ` Miklos Szeredi
2020-10-21 20:12                 ` Vivek Goyal
2020-10-28 20:29                   ` Miklos Szeredi
2020-10-28 20:29                     ` Miklos Szeredi
2021-02-09 10:01                     ` Miklos Szeredi
2021-02-09 19:09                       ` Vivek Goyal
2020-10-16 18:19           ` Vivek Goyal
2020-10-16 18:24             ` Linus Torvalds
2020-10-16 18:24               ` Linus Torvalds
2020-10-16 18:24               ` Linus Torvalds
2020-10-16 18:24                 ` Linus Torvalds
2020-10-16 23:03             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201015151606.GA226448@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=cai@lca.pw \
    --cc=hughd@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=miklos@szeredi.hu \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.