Re: [LSF/MM/BPF TOPIC] Bcachefs update

From: Jan Kara <jack@suse.cz>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Bcachefs update
Date: Wed, 18 Dec 2019 13:40:52 +0100
Message-ID: <20191218124052.GB19387@quack2.suse.cz>
In-Reply-To: <20191216193852.GA8664@kmo-pixel>

On Mon 16-12-19 14:38:52, Kent Overstreet wrote:
> Pagecache consistency:
> 
> I recently got rid of my pagecache add lock; that added locking to core paths in
> filemap.c and some found my locking scheme to be distasteful (and I never liked
> it enough to argue). I've recently switched to something closer to XFS's locking
> scheme (top of the IO paths); however, I do still need one patch to the
> get_user_pages() path to avoid deadlock via recursive page fault - patch is
> below:
> 
> (This would probably be better done as a new argument to get_user_pages(); I
> didn't do it that way initially because the patch would have been _much_
> bigger.)
> 
> Yee haw.
> 
> commit 20ebb1f34cc9a532a675a43b5bd48d1705477816
> Author: Kent Overstreet <kent.overstreet@gmail.com>
> Date:   Wed Oct 16 15:03:50 2019 -0400
> 
>     mm: Add a mechanism to disable faults for a specific mapping
>     
>     This will be used to prevent a nasty cache coherency issue for O_DIRECT
>     writes; O_DIRECT writes need to shoot down the range of the page cache
>     corresponding to the part of the file being written to - but, if the
>     file is mapped in, userspace can pass in an address in that mapping to
>     pwrite(), causing those pages to be faulted back into the page cache
>     in get_user_pages().
>     
>     Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>

I'm not sure about the exact nature of the deadlock, since the changelog
doesn't explain it. But if you need to take some lockA in your page fault
path and you already hold lockA in your DIO code, then this patch isn't
going to cut it. Just think of a malicious scheme with two tasks: one
doing DIO from fileA (protected by lockA) to buffers mapped from fileB
(protected by lockB), and the other doing it the other way around. Each
task then holds its own lock while faulting on the other file's mapping,
so disabling faults for a single mapping does not break the cycle.
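
To make the scenario concrete, here is a toy userspace model of it, not
kernel code. Everything in it (struct task, enum lock_id,
dio_tasks_deadlock()) is a hypothetical name made up for illustration, and
it assumes each lock is exclusive and held by at most one task. Each task
holds the lock taken at the top of its DIO path and wants the lock its
page fault would need; the function just looks for a cycle in the
resulting wait-for graph:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file locks; lockA protects fileA, lockB protects fileB. */
enum lock_id { LOCK_A, LOCK_B, NR_LOCKS };

/* Hypothetical model of one task stuck inside a DIO write. */
struct task {
	enum lock_id held;	/* taken at the top of the DIO path */
	enum lock_id wanted;	/* needed by the fault on the user buffer */
};

/*
 * Return true if some task waits (directly or transitively) on a lock
 * that it is itself keeping busy, i.e. the wait-for graph has a cycle.
 * Assumes every lock is held by at most one task.
 */
bool dio_tasks_deadlock(const struct task *tasks, size_t n)
{
	int owner[NR_LOCKS];

	assert(tasks != NULL || n == 0);

	/* Record which task, if any, currently owns each lock. */
	for (int l = 0; l < NR_LOCKS; l++)
		owner[l] = -1;
	for (size_t i = 0; i < n; i++)
		owner[tasks[i].held] = (int)i;

	/* From each task, follow "wants a lock owned by" edges. */
	for (size_t start = 0; start < n; start++) {
		size_t cur = start;

		for (size_t steps = 0; steps < n; steps++) {
			int next = owner[tasks[cur].wanted];

			if (next < 0)
				break;		/* lock is free, no wait */
			if ((size_t)next == start)
				return true;	/* back where we began */
			cur = (size_t)next;
		}
	}
	return false;
}
```

Feeding it { held = LOCK_A, wanted = LOCK_B } and { held = LOCK_B,
wanted = LOCK_A } reports a deadlock, as does the single-task recursive
case { held = LOCK_A, wanted = LOCK_A }. A lone task faulting on a file
nobody else has locked does not.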

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR