Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, Ted Tso <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock
Date: Thu, 15 Apr 2021 07:57:39 +1000
Message-ID: <20210414215739.GH63242@dread.disaster.area> (raw)
In-Reply-To: <20210414122319.GD31323@quack2.suse.cz>

On Wed, Apr 14, 2021 at 02:23:19PM +0200, Jan Kara wrote:
> On Wed 14-04-21 10:01:13, Dave Chinner wrote:
> > On Tue, Apr 13, 2021 at 01:28:46PM +0200, Jan Kara wrote:
> > > index c5b0457415be..ac5bb50b3a4c 100644
> > > --- a/mm/readahead.c
> > > +++ b/mm/readahead.c
> > > @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> > >  	 */
> > >  	unsigned int nofs = memalloc_nofs_save();
> > >  
> > > +	down_read(&mapping->host->i_mapping_sem);
> > >  	/*
> > >  	 * Preallocate as many pages as we will need.
> > >  	 */
> > 
> > I can't say I'm a great fan of having the mapping reach back up to
> > the host to lock the host. THis seems the wrong way around to me
> > given that most of the locking in the IO path is in "host locks
> > mapping" and "mapping locks internal mapping structures" order...
> > 
> > I also come back to the naming confusion here, in that when we look
> > at this in long hand from the inode perspective, this chain actually
> > looks like:
> > 
> > 	lock(inode->i_mapping->inode->i_mapping_sem)
> > 
> > i.e. the mapping is reaching back up outside it's scope to lock
> > itself against other inode->i_mapping operations. Smells of layering
> > violations to me.
> > 
> > So, next question: should this truncate semanphore actually be part
> > of the address space, not the inode? This patch is actually moving
> > the page fault serialisation from the inode into the address space
> > operations when page faults and page cache operations are done, so
> > maybe the lock should also make that move? That would help clear up
> > the naming problem, because now we can name it based around what it
> > serialises in the address space, not the address space as a whole...
> 
> I think that moving the lock to address_space makes some sence although the
> lock actually protects consistency of inode->i_mapping->i_pages with
> whatever the filesystem has in its file_offset->disk_block mapping
> structures (which are generally associated with the inode).

Well, I look at is as a mechanism that the filesystem uses to ensure
coherency of the page cache accesses w.r.t. physical layout changes.
The layout is a property of the inode, but changes to the physical
layout of the inode are serialised by other inode based mechanisms.
THe page cache isn't part of the inode - it's part of the address
space - but coherency with the inode is required. Hence inode
operations need to be able to ensure coherency of the address space
content and accesses w.r.t. physical layout changes of the inode,
but the address space really knows nothing about the physical layout
of the inode or how it gets changed...

Hence it's valid for the inode operations to lock the address space
to ensure coherency of the page cache when making physical layout
changes, but locking the address space, by itself, is not sufficient
to safely serialise against physical changes to the inode layout.

> So it is not
> only about inode->i_mapping contents but I agree that struct address_space
> is probably a bit more logical place than struct inode.

Yup. Remember that the XFS_MMAPLOCK arose at the inode level because
that was the only way the filesystem could acheive the necessary
serialisation of page cache accesses whilst doing physical layout
changes. So the lock became an "inode property" because of
implementation constraints, not because it was the best way to
implement the necessary coherency hooks.

> Regarding the name: How about i_pages_rwsem? The lock is protecting
> invalidation of mapping->i_pages and needs to be held until insertion of
> pages into i_pages is safe again...

I don't actually have a good name for this right now. :(

The i_pages structure has it's own internal locking, so
i_pages_rwsem implies things that aren't necessarily true, and
taking a read lock for insertion for something that is named like a
structure protection lock creates cognitive dissonance...

I keep wanting to say "lock for invalidation" and "lock to exclude
invalidation" because those are the two actions that we need for
coherency of operations. But they are way too verbose for an actual
API...

So I want to call this an "invalidation lock" of some kind (no need
to encode the type in the name!), but haven't worked out a good
shorthand for "address space invalidation coherency mechanism"...

Naming is hard. :/

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 11:28 [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races Jan Kara
2021-04-13 11:28 ` [PATCH 1/7] mm: Fix comments mentioning i_mutex Jan Kara
2021-04-13 12:38   ` Christoph Hellwig
2021-04-13 11:28 ` [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock Jan Kara
2021-04-13 12:57   ` Christoph Hellwig
2021-04-13 13:56     ` Jan Kara
2021-04-14  0:01   ` Dave Chinner
2021-04-14 12:23     ` Jan Kara
2021-04-14 21:57       ` Dave Chinner [this message]
2021-04-15 13:11         ` Jan Kara
2021-04-14 22:25     ` Matthew Wilcox
2021-04-15  2:05       ` Dave Chinner
2021-04-13 11:28 ` [PATCH 3/7] ext4: Convert to use inode->i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 4/7] ext2: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 5/7] xfs: Convert to use i_mapping_sem Jan Kara
2021-04-13 13:05   ` Christoph Hellwig
2021-04-13 13:42     ` Jan Kara
2021-04-13 11:28 ` [PATCH 6/7] zonefs: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 7/7] fs: Remove i_mapping_sem protection from .page_mkwrite handlers Jan Kara
2021-04-13 13:09 ` [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races Christoph Hellwig
2021-04-13 14:17   ` Jan Kara
2021-04-19 15:20 ` Matthew Wilcox
2021-04-19 16:25   ` Jan Kara
2021-04-20 22:12   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210414215739.GH63242@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=amir73il@gmail.com \
    --cc=hch@infradead.org \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git