All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Chinner <dchinner@redhat.com>,
	darrick.wong@oracle.com, tytso@mit.edu,
	linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com,
	viro@zeniv.linux.org.uk, peterz@infradead.org
Subject: Re: [PATCH 01/10] mm: pagecache add lock
Date: Fri, 18 May 2018 06:13:06 -0700	[thread overview]
Message-ID: <20180518131305.GA6361@bombadil.infradead.org> (raw)
In-Reply-To: <20180518074918.13816-3-kent.overstreet@gmail.com>

On Fri, May 18, 2018 at 03:49:00AM -0400, Kent Overstreet wrote:
> Add a per address space lock around adding pages to the pagecache - making it
> possible for fallocate INSERT_RANGE/COLLAPSE_RANGE to work correctly, and also
> hopefully making truncate and dio a bit saner.

(moving this section here from the overall description so I can reply
to it in one place)

>  * pagecache add lock
> 
> This is the only one that touches existing code in nontrivial ways.
> The problem it's solving is that there is no existing general mechanism
> for shooting down pages in the page and keeping them removed, which is a
> real problem if you're doing anything that modifies file data and isn't
> buffered writes.
> 
> Historically, the only problematic case has been direct IO, and people
> have been willing to say "well, if you mix buffered and direct IO you
> get what you deserve", and that's probably not unreasonable. But now we
> have fallocate insert range and collapse range, and those are broken in
> ways I frankly don't want to think about if they can't ensure consistency
> with the page cache.

ext4 manages collapse-vs-pagefault with the ext4-specific i_mmap_sem.
You may get pushback on the grounds that this ought to be a
filesystem-specific lock rather than one embedded in the generic inode.

> Also, the mechanism truncate uses (i_size and sacrificing a goat) has
> historically been rather fragile, IMO it might be a good think if we
> switched it to a more general rigorous mechanism.
> 
> I need this solved for bcachefs because without this mechanism, the page
> cache inconsistencies lead to various assertions popping (primarily when
> we didn't think we need to get a disk reservation going by page cache
> state, but then do the actual write and disk space accounting says oops,
> we did need one). And having to reason about what can happen without
> a locking mechanism for this is not something I care to spend brain
> cycles on.
> 
> That said, my patch is kind of ugly, and it requires filesystem changes
> for other filesystems to take advantage of it. And unfortunately, since
> one of the code paths that needs locking is readahead, I don't see any
> realistic way of implementing the locking within just bcachefs code.
> 
> So I'm hoping someone has an idea for something cleaner (I think I recall
> Matthew Wilcox saying he had an idea for how to use xarray to solve this),
> but if not I'll polish up my pagecache add lock patch and see what I can
> do to make it less ugly, and hopefully other people find it palatable
> or at least useful.

My idea with the XArray is that we have a number of reserved entries which
we can use as blocking entries.  I was originally planning on making this
an XArray feature, but I now believe it's a page-cache-special feature.
We can always revisit that decision if it turns out to be useful to
another user.

API:

int filemap_block_range(struct address_space *mapping, loff_t start,
		loff_t end);
void filemap_remove_block(struct address_space *mapping, loff_t start,
		loff_t end);

 - After removing a block, the pagecache is empty between [start, end].
 - You have to treat the block as a single entity; don't unblock only
   a subrange of the range you originally blocked.
 - Lookups of a page within a blocked range return NULL.
 - Attempts to add a page to a blocked range sleep on one of the
   page_wait_table queues.
 - Attempts to block a blocked range will also sleep on one of the
   page_wait_table queues.  Is this restriction acceptable for your use
   case?  It's clearly not a problem for fallocate insert/collapse.  It
   would only be a problem for Direct I/O if people are doing subpage
   directio from within the same page.  I think that's rare enough to
   not be a problem (but please tell me if I'm wrong!)

  reply	other threads:[~2018-05-18 13:13 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-18  7:48 [PATCH 00/10] RFC: assorted bcachefs patches Kent Overstreet
2018-05-18  7:48 ` [PATCH 01/10] mempool: Add mempool_init()/mempool_exit() Kent Overstreet
2018-05-18  8:21   ` Johannes Thumshirn
2018-05-18  8:21     ` Johannes Thumshirn
2018-05-18  8:21     ` Johannes Thumshirn
2018-05-18  7:49 ` [PATCH 01/10] mm: pagecache add lock Kent Overstreet
2018-05-18 13:13   ` Matthew Wilcox [this message]
2018-05-18 15:53     ` Christoph Hellwig
2018-05-18 17:45       ` Kent Overstreet
2018-05-23 23:55         ` Notes on locking for pagacache consistency (was: [PATCH 01/10] mm: pagecache add lock) Kent Overstreet
2018-05-20 22:45       ` [PATCH 01/10] mm: pagecache add lock Kent Overstreet
2018-05-23 15:22         ` Christoph Hellwig
2018-05-23 17:12           ` Kent Overstreet
2018-05-18  7:49 ` [PATCH 02/10] block: Convert bio_set to mempool_init() Kent Overstreet
2018-05-18  8:23   ` Johannes Thumshirn
2018-05-18  8:23     ` Johannes Thumshirn
2018-05-18  8:23     ` Johannes Thumshirn
2018-05-18  7:49 ` [PATCH 02/10] mm: export find_get_pages() Kent Overstreet
2018-05-18 16:00   ` Christoph Hellwig
2018-05-18  7:49 ` [PATCH 03/10] block: Add bioset_init()/bioset_exit() Kent Overstreet
2018-05-18  8:58   ` Johannes Thumshirn
2018-05-18  8:58     ` Johannes Thumshirn
2018-05-18  8:58     ` Johannes Thumshirn
2018-05-18  7:49 ` [PATCH 03/10] locking: bring back lglocks Kent Overstreet
2018-05-18  9:51   ` Peter Zijlstra
2018-05-18 10:13     ` Kent Overstreet
2018-05-18 11:03       ` Peter Zijlstra
2018-05-18 11:39         ` Kent Overstreet
2018-05-18  7:49 ` [PATCH 04/10] block: Use bioset_init() for fs_bio_set Kent Overstreet
2018-05-18  7:49 ` [PATCH 04/10] locking: export osq_lock()/osq_unlock() Kent Overstreet
2018-05-18  9:52   ` Peter Zijlstra
2018-05-18 10:18     ` Kent Overstreet
2018-05-18 11:08       ` Peter Zijlstra
2018-05-18 11:32         ` Kent Overstreet
2018-05-18 11:40           ` Peter Zijlstra
2018-05-18 12:40             ` Kent Overstreet
2018-05-18  7:49 ` [PATCH 05/10] block: Add bio_copy_data_iter(), zero_fill_bio_iter() Kent Overstreet
2018-05-18  7:49 ` [PATCH 05/10] don't use spin_lock_irqsave() unnecessarily Kent Overstreet
2018-05-18 16:01   ` Christoph Hellwig
2018-05-18  7:49 ` [PATCH 06/10] block: Split out bio_list_copy_data() Kent Overstreet
2018-05-18  7:49 ` [PATCH 06/10] Generic radix trees Kent Overstreet
2018-05-18 16:02   ` Christoph Hellwig
2018-05-18 17:38     ` Kent Overstreet
2018-05-18  7:49 ` [PATCH 07/10] bcache: optimize continue_at_nobarrier() Kent Overstreet
2018-05-18  7:49 ` [PATCH 07/10] block: Add missing flush_dcache_page() call Kent Overstreet
2018-05-18  7:49 ` [PATCH 08/10] bcache: move closures to lib/ Kent Overstreet
2018-05-18 16:02   ` Christoph Hellwig
2018-05-18  7:49 ` [PATCH 08/10] block: Add warning for bi_next not NULL in bio_endio() Kent Overstreet
2018-05-18  9:00   ` Johannes Thumshirn
2018-05-18  9:00     ` Johannes Thumshirn
2018-05-18  9:00     ` Johannes Thumshirn
2018-05-18  7:49 ` [PATCH 09/10] block: Export bio check/set pages_dirty Kent Overstreet
2018-05-18  7:49 ` [PATCH 09/10] closures: closure_wait_event() Kent Overstreet
2018-05-18  7:49 ` [PATCH 10/10] block: Add sysfs entry for fua support Kent Overstreet
2018-05-18  7:49 ` [PATCH 10/10] Dynamic fault injection Kent Overstreet
2018-05-18 16:02   ` Christoph Hellwig
2018-05-18 17:37     ` Kent Overstreet
2018-05-18 19:05   ` Andreas Dilger
2018-05-18 19:10     ` Kent Overstreet
2018-05-18 20:54       ` Andreas Dilger
2018-05-18  7:55 ` [PATCH 00/10] RFC: assorted bcachefs patches Kent Overstreet
2018-05-18 17:45 ` Josef Bacik
2018-05-18 17:49   ` Kent Overstreet
2018-05-18 18:03     ` Josef Bacik
2018-05-18 18:28       ` Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180518131305.GA6361@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=jbacik@fb.com \
    --cc=kent.overstreet@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.