Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)

From: Matthew Wilcox <willy@infradead.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Vivek Goyal <vgoyal@redhat.com>, Qian Cai <cai@lca.pw>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	Linux-MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: Possible deadlock in fuse write path (Was: Re: [PATCH 0/4] Some more lock_page work..)
Date: Fri, 16 Oct 2020 13:27:58 +0100	[thread overview]
Message-ID: <20201016122758.GE20115@casper.infradead.org> (raw)
In-Reply-To: <CAJfpegtAstEo+nYgT81swYZWdziaZP_40QGAXcTORqYwgeWNUA@mail.gmail.com>

On Fri, Oct 16, 2020 at 12:02:21PM +0200, Miklos Szeredi wrote:
> This was added by commit ea9b9907b82a ("fuse: implement
> perform_write") in v2.6.26 and remains essentially unchanged, AFAICS.
> So this is an old bug indeed.
> 
> So what is the page lock protecting?   I think not truncation, because
> inode_lock should be sufficient protection.
> 
> What it does after sending a synchronous WRITE and before unlocking
> the pages is set the PG_uptodate flag, but only if the complete page
> was really written, which is what the uptodate flag really says:  this
> page is in sync with the underlying fs.

Not in sync with.  Uptodate means "every byte on this page is at least
as new as the bytes on storage".  Dirty means "at least one byte is
newer than the bytes on storage".

> So I think the page lock here is trying to protect against concurrent
> reads/faults on not uptodate pages.  I.e. until the WRITE request
> completes it is unknown whether the page was really written or not, so
> any reads must block until this state becomes known.  This logic falls
> down on already cached pages, since they start out uptodate and the
> write does not clear this flag.

That's not how the page cache should work -- if a write() has touched
every byte in a page, then the page should be marked as Uptodate, and it
can immediately satisfy read()s without even touching the backing store.

> So keeping the pages locked has dubious value: short writes don't seem
> to work correctly anyway.  Which means that we can probably just set
> the page uptodate right after filling it from the user buffer, and
> unlock the page immediately.
> 
> Am I missing something?

I haven't looked at fuse in detail -- are you handling partial page
writes?  That is, if someone writes to bytes 5-15 of a file, are you
first reading bytes 0-4 and 16-4095 from the backing store?  If so,
you can mark the page Uptodate as soon as you've copied bytes 5-15
from the user.