From: Jan Kara <jack@suse.cz>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Dave Chinner <david@fromorbit.com>,
	ceph-devel@vger.kernel.org, Chao Yu <yuchao0@huawei.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Jaegeuk Kim <jaegeuk@kernel.org>,
	Jeff Layton <jlayton@kernel.org>,
	Johannes Thumshirn <jth@kernel.org>,
	linux-cifs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-f2fs-devel@lists.sourceforge.net, linux-mm@kvack.org,
	linux-xfs@vger.kernel.org, Miklos Szeredi <miklos@szeredi.hu>,
	Steve French <sfrench@samba.org>, Ted Tso <tytso@mit.edu>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH 03/14] mm: Protect operations adding pages to page cache with invalidate_lock
Date: Tue, 8 Jun 2021 14:19:15 +0200
Message-ID: <20210608121915.GG5562@quack2.suse.cz>
In-Reply-To: <20210607160922.GA2945763@locust>

On Mon 07-06-21 09:09:22, Darrick J. Wong wrote:
> On Mon, Jun 07, 2021 at 04:52:13PM +0200, Jan Kara wrote:
> > Currently, serializing operations such as page fault, read, or readahead
> > against hole punching is rather difficult. The basic race scheme is
> > like:
> > 
> > fallocate(FALLOC_FL_PUNCH_HOLE)			read / fault / ..
> >   truncate_inode_pages_range()
> > 						  <create pages in page
> > 						   cache here>
> >   <update fs block mapping and free blocks>
> > 
> > Now the problem is in this way read / page fault / readahead can
> > instantiate pages in page cache with potentially stale data (if blocks
> > get quickly reused). Avoiding this race is not simple - page locks do
> > not work because we want to make sure there are *no* pages in given
> > range. inode->i_rwsem does not work because page fault happens under
> > mmap_sem which ranks below inode->i_rwsem. Also using it for reads makes
> > the performance for mixed read-write workloads suffer.
> > 
> > So create a new rw_semaphore in the address_space - invalidate_lock -
> > that protects adding of pages to page cache for page faults / reads /
> > readahead.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
...
> > +->fallocate implementation must be really careful to maintain page cache
> > +consistency when punching holes or performing other operations that invalidate
> > +page cache contents. Usually the filesystem needs to call
> > +truncate_inode_pages_range() to invalidate relevant range of the page cache.
> > +However the filesystem usually also needs to update its internal (and on disk)
> > +view of file offset -> disk block mapping. Until this update is finished, the
> > +filesystem needs to block page faults and reads from reloading now-stale page
> > +cache contents from the disk. VFS provides mapping->invalidate_lock for this
> > +and acquires it in shared mode in paths loading pages from disk
> > +(filemap_fault(), filemap_read(), readahead paths). The filesystem is
> > +responsible for taking this lock in its fallocate implementation and generally
> > +whenever the page cache contents needs to be invalidated because a block is
> > +moving from under a page.
> 
> Having a page cache invalidation lock isn't optional anymore, so I think
> these last two sentences could be condensed:
> 
> "...from reloading now-stale page cache contents from disk.  Since VFS
> acquires mapping->invalidate_lock in shared mode when loading pages from
> disk (filemap_fault(), filemap_read(), readahead), the fallocate
> implementation must take the invalidate_lock to prevent reloading."
> 
> > +
> > +->copy_file_range and ->remap_file_range implementations need to serialize
> > +against modifications of file data while the operation is running. For
> > +blocking changes through write(2) and similar operations inode->i_rwsem can be
> > +used. For blocking changes through memory mapping, the filesystem can use
> > +mapping->invalidate_lock provided it also acquires it in its ->page_mkwrite
> > +implementation.
> 
> Following the same line of reasoning, if taking the invalidate_lock is
> no longer optional, then the conditional language in this last sentence
> is incorrect.  How about:
> 
> "To block changes to file contents via a memory mapping during the
> operation, the filesystem must take mapping->invalidate_lock to
> coordinate with ->page_mkwrite."
> 
> The code changes look fine to me, though I'm no mm expert. ;)

OK, I've updated the documentation as you suggested. Thanks for the review.

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
