Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org, Ted Tso <tytso@mit.edu>,
	Christoph Hellwig <hch@infradead.org>,
	Amir Goldstein <amir73il@gmail.com>
Subject: Re: [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock
Date: Wed, 14 Apr 2021 14:23:19 +0200
Message-ID: <20210414122319.GD31323@quack2.suse.cz> (raw)
In-Reply-To: <20210414000113.GG63242@dread.disaster.area>

On Wed 14-04-21 10:01:13, Dave Chinner wrote:
> On Tue, Apr 13, 2021 at 01:28:46PM +0200, Jan Kara wrote:
> >   *
> >   *  ->mmap_lock
> >   *    ->i_mmap_rwsem
> > @@ -85,7 +86,8 @@
> >   *        ->i_pages lock	(arch-dependent flush_dcache_mmap_lock)
> >   *
> >   *  ->mmap_lock
> > - *    ->lock_page		(access_process_vm)
> > + *    ->i_mapping_sem		(filemap_fault)
> > + *      ->lock_page		(filemap_fault, access_process_vm)
> >   *
> >   *  ->i_rwsem			(generic_perform_write)
> >   *    ->mmap_lock		(fault_in_pages_readable->do_page_fault)
> > @@ -2276,16 +2278,28 @@ static int filemap_update_page(struct kiocb *iocb,
> >  {
> >  	int error;
> >  
> > +	if (iocb->ki_flags & IOCB_NOWAIT) {
> > +		if (!down_read_trylock(&mapping->host->i_mapping_sem))
> > +			return -EAGAIN;
> > +	} else {
> > +		down_read(&mapping->host->i_mapping_sem);
> > +	}
> 
> We really need a lock primitive for this. The number of times this
> exact lock pattern is being replicated all through the IO path is
> getting out of hand.
> 
> static inline bool
> down_read_try_or_lock(struct rwsem *sem, bool try)
> {
> 	if (try) {
> 		if (!down_read_trylock(sem))
> 			return false;
> 	} else {
> 		down_read(&mapping->host->i_mapping_sem);
> 	}
> 	return true;
> }
> 
> and the callers become:
> 
> 	if (!down_read_try_or_lock(sem, (iocb->ki_flags & IOCB_NOWAIT)))
> 		return -EAGAIN;
> 
> We can do the same with mutex_try_or_lock(), down_try_or_lock(), etc
> and we don't need to rely on cargo cult knowledge to propagate this
> pattern anymore. Because I'm betting relatively few people actually
> know why the code is written this way because the only place it is
> documented is in an XFS commit message....
> 
> Doing this is a separate cleanup, though, and not something that
> needs to be done in this patchset.

Yep, good idea but let's do it in a separate patch set.

> > index c5b0457415be..ac5bb50b3a4c 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -192,6 +192,7 @@ void page_cache_ra_unbounded(struct readahead_control *ractl,
> >  	 */
> >  	unsigned int nofs = memalloc_nofs_save();
> >  
> > +	down_read(&mapping->host->i_mapping_sem);
> >  	/*
> >  	 * Preallocate as many pages as we will need.
> >  	 */
> 
> I can't say I'm a great fan of having the mapping reach back up to
> the host to lock the host. THis seems the wrong way around to me
> given that most of the locking in the IO path is in "host locks
> mapping" and "mapping locks internal mapping structures" order...
> 
> I also come back to the naming confusion here, in that when we look
> at this in long hand from the inode perspective, this chain actually
> looks like:
> 
> 	lock(inode->i_mapping->inode->i_mapping_sem)
> 
> i.e. the mapping is reaching back up outside it's scope to lock
> itself against other inode->i_mapping operations. Smells of layering
> violations to me.
> 
> So, next question: should this truncate semanphore actually be part
> of the address space, not the inode? This patch is actually moving
> the page fault serialisation from the inode into the address space
> operations when page faults and page cache operations are done, so
> maybe the lock should also make that move? That would help clear up
> the naming problem, because now we can name it based around what it
> serialises in the address space, not the address space as a whole...

I think that moving the lock to address_space makes some sence although the
lock actually protects consistency of inode->i_mapping->i_pages with
whatever the filesystem has in its file_offset->disk_block mapping
structures (which are generally associated with the inode). So it is not
only about inode->i_mapping contents but I agree that struct address_space
is probably a bit more logical place than struct inode.

Regarding the name: How about i_pages_rwsem? The lock is protecting
invalidation of mapping->i_pages and needs to be held until insertion of
pages into i_pages is safe again...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply index

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-13 11:28 [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races Jan Kara
2021-04-13 11:28 ` [PATCH 1/7] mm: Fix comments mentioning i_mutex Jan Kara
2021-04-13 12:38   ` Christoph Hellwig
2021-04-13 11:28 ` [PATCH 2/7] mm: Protect operations adding pages to page cache with i_mapping_lock Jan Kara
2021-04-13 12:57   ` Christoph Hellwig
2021-04-13 13:56     ` Jan Kara
2021-04-14  0:01   ` Dave Chinner
2021-04-14 12:23     ` Jan Kara [this message]
2021-04-14 21:57       ` Dave Chinner
2021-04-15 13:11         ` Jan Kara
2021-04-14 22:25     ` Matthew Wilcox
2021-04-15  2:05       ` Dave Chinner
2021-04-13 11:28 ` [PATCH 3/7] ext4: Convert to use inode->i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 4/7] ext2: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 5/7] xfs: Convert to use i_mapping_sem Jan Kara
2021-04-13 13:05   ` Christoph Hellwig
2021-04-13 13:42     ` Jan Kara
2021-04-13 11:28 ` [PATCH 6/7] zonefs: Convert to using i_mapping_sem Jan Kara
2021-04-13 11:28 ` [PATCH 7/7] fs: Remove i_mapping_sem protection from .page_mkwrite handlers Jan Kara
2021-04-13 13:09 ` [PATCH 0/7 RFC v3] fs: Hole punch vs page cache filling races Christoph Hellwig
2021-04-13 14:17   ` Jan Kara
2021-04-19 15:20 ` Matthew Wilcox
2021-04-19 16:25   ` Jan Kara
2021-04-20 22:12   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210414122319.GD31323@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=amir73il@gmail.com \
    --cc=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git