linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1
@ 2021-06-30 17:25 Jan Kara
  2021-07-01  1:15 ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2021-06-30 17:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, Darrick J. Wong

  Hello Linus,

  could you please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git hole_punch_for_v5.14-rc1

to get patches that fix races leading to possible data corruption or stale
data exposure in multiple filesystems when hole punching races with
operations such as readahead.

Top of the tree is e996ae6bdbd1. The full shortlog is:

Jan Kara (13):
      mm: Fix comments mentioning i_mutex
      documentation: Sync file_operations members with reality
      mm: Protect operations adding pages to page cache with invalidate_lock
      mm: Add functions to lock invalidate_lock for two mappings
      ext4: Convert to use mapping->invalidate_lock
      ext2: Convert to using invalidate_lock
      xfs: Convert to use invalidate_lock
      xfs: Convert double locking of MMAPLOCK to use VFS helpers
      zonefs: Convert to using invalidate_lock
      f2fs: Convert to using invalidate_lock
      fuse: Convert to using invalidate_lock
      ceph: Fix race between hole punch and page fault
      cifs: Fix race between hole punch and page fault

Pavel Reichl (1):
      xfs: Refactor xfs_isilocked()

The diffstat is

 Documentation/filesystems/locking.rst |  77 +++++++++++++++-------
 fs/ceph/addr.c                        |   9 ++-
 fs/ceph/file.c                        |   2 +
 fs/cifs/smb2ops.c                     |   2 +
 fs/ext2/ext2.h                        |  11 ----
 fs/ext2/file.c                        |   7 +-
 fs/ext2/inode.c                       |  12 ++--
 fs/ext2/super.c                       |   3 -
 fs/ext4/ext4.h                        |  10 ---
 fs/ext4/extents.c                     |  25 +++----
 fs/ext4/file.c                        |  13 ++--
 fs/ext4/inode.c                       |  47 +++++--------
 fs/ext4/ioctl.c                       |   4 +-
 fs/ext4/super.c                       |  13 ++--
 fs/ext4/truncate.h                    |   8 ++-
 fs/f2fs/data.c                        |   4 +-
 fs/f2fs/f2fs.h                        |   1 -
 fs/f2fs/file.c                        |  62 +++++++++--------
 fs/f2fs/super.c                       |   1 -
 fs/fuse/dax.c                         |  50 +++++++-------
 fs/fuse/dir.c                         |  11 ++--
 fs/fuse/file.c                        |  10 +--
 fs/fuse/fuse_i.h                      |   7 --
 fs/fuse/inode.c                       |   1 -
 fs/inode.c                            |   2 +
 fs/xfs/xfs_bmap_util.c                |  15 +++--
 fs/xfs/xfs_file.c                     |  13 ++--
 fs/xfs/xfs_inode.c                    | 121 ++++++++++++++++++----------------
 fs/xfs/xfs_inode.h                    |   3 +-
 fs/xfs/xfs_super.c                    |   2 -
 fs/zonefs/super.c                     |  23 ++-----
 fs/zonefs/zonefs.h                    |   7 +-
 include/linux/fs.h                    |  39 +++++++++++
 mm/filemap.c                          | 113 ++++++++++++++++++++++++++-----
 mm/madvise.c                          |   2 +-
 mm/memory-failure.c                   |   2 +-
 mm/readahead.c                        |   2 +
 mm/rmap.c                             |  41 ++++++------
 mm/shmem.c                            |  20 +++---
 mm/truncate.c                         |   9 +--
 40 files changed, 453 insertions(+), 351 deletions(-)

							Thanks
								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1
  2021-06-30 17:25 [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1 Jan Kara
@ 2021-07-01  1:15 ` Linus Torvalds
  2021-07-01 16:19   ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2021-07-01  1:15 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Darrick J. Wong

On Wed, Jun 30, 2021 at 10:25 AM Jan Kara <jack@suse.cz> wrote:
>
>   could you please pull from

No.

There is no way I'll merge something this broken.

Looking up a page in the page cache is just about the most critical
thing there is, and this introduces a completely pointless lock for
that situation.

Does it take the lock only when it creates the page? No. It takes the
lock in filemap_fault() even if it found a valid page in the page
cache.

This is just wrong.

               Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1
  2021-07-01  1:15 ` Linus Torvalds
@ 2021-07-01 16:19   ` Jan Kara
  2021-07-01 18:04     ` Linus Torvalds
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kara @ 2021-07-01 16:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jan Kara, linux-fsdevel, Darrick J. Wong

On Wed 30-06-21 18:15:09, Linus Torvalds wrote:
> On Wed, Jun 30, 2021 at 10:25 AM Jan Kara <jack@suse.cz> wrote:
> >
> >   could you please pull from
> 
> No.
> 
> There is no way I'll merge something this broken.
> 
> Looking up a page in the page cache is just about the most critical
> thing there is, and this introduces a completely pointless lock for
> that situation.
> 
> Does it take the lock only when it creates the page? No. It takes the
> lock in filemap_fault() even if it found a valid page in the page
> cache.

Hum, fair point. I did filemap_fault() the way it is because I was mostly
just lifting fs-private lock into the VFS one in that code path and
ext4/xfs/f2fs and others grabbed this lock unconditionally in their fault
paths (before calling into filemap_fault()). But you are right that now
that we have the lock in VFS, we can actually do better and have a fast
path when everything is cached and uptodate where we can avoid grabbing the
lock. That being said I don't expect the optimization to matter too much
because in do_read_fault() we first call do_fault_around() which will
exactly map pages that are already in cache and uptodate so we usually get
into filemap_fault() only for pages that are not present or not uptodate.
So do you think the optimization is still worth it despite
do_fault_around()? I guess I can try to see how many times I can see a page
that would benefit from this optimization in filemap_fault() on my test
machine - there are also write faults that don't call do_fault_around() -
and if it's noticeable fraction reorganize filemap_fault() so that we don't
take the lock if the page is present and uptodate...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1
  2021-07-01 16:19   ` Jan Kara
@ 2021-07-01 18:04     ` Linus Torvalds
  2021-07-06 20:17       ` Jan Kara
  0 siblings, 1 reply; 5+ messages in thread
From: Linus Torvalds @ 2021-07-01 18:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-fsdevel, Darrick J. Wong

On Thu, Jul 1, 2021 at 9:19 AM Jan Kara <jack@suse.cz> wrote:
>
> That being said I don't expect the optimization to matter too much
> because in do_read_fault() we first call do_fault_around() which will
> exactly map pages that are already in cache and uptodate

Yeah, I think that ends up saving the situation.

> So do you think the optimization is still worth it despite
> do_fault_around()?

I suspect it doesn't matter that much for performance as you say due
to any filesystem that cares about performance having the "map_pages"
function pointing to filemap_map_pages, but I reacted to it just from
looking at the patch, and it just seems conceptually wrong. Taking the
lock in a situation where it's not actually needed will just cause
problems later when somebody decides that the lock protects something
else entirely.

             Linus

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1
  2021-07-01 18:04     ` Linus Torvalds
@ 2021-07-06 20:17       ` Jan Kara
  0 siblings, 0 replies; 5+ messages in thread
From: Jan Kara @ 2021-07-06 20:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, linux-fsdevel, Darrick J. Wong, linux-ext4, linux-mm,
	Christoph Hellwig

[-- Attachment #1: Type: text/plain, Size: 1691 bytes --]

On Thu 01-07-21 11:04:28, Linus Torvalds wrote:
> On Thu, Jul 1, 2021 at 9:19 AM Jan Kara <jack@suse.cz> wrote:
> >
> > That being said I don't expect the optimization to matter too much
> > because in do_read_fault() we first call do_fault_around() which will
> > exactly map pages that are already in cache and uptodate
> 
> Yeah, I think that ends up saving the situation.
> 
> > So do you think the optimization is still worth it despite
> > do_fault_around()?
> 
> I suspect it doesn't matter that much for performance as you say due
> to any filesystem that cares about performance having the "map_pages"
> function pointing to filemap_map_pages, but I reacted to it just from
> looking at the patch, and it just seems conceptually wrong. Taking the
> lock in a situation where it's not actually needed will just cause
> problems later when somebody decides that the lock protects something
> else entirely.

OK, I did some experiments and indeed in my totally unscientific boot and
compile tests I've seen ~90% of page faults to not get to filemap_fault()
(they were either anon memory or satisfied with filemap_map_pages()). From
the remaining 10% which got to filemap_fault() about 95% already had a page
in the page cache (so the optimization would help) - usually because we
were doing a write fault. So the optimization seems worth it.  I've added
attached patch on top of the series which implements the optimization you
suggested. I've also pushed out the complete series to

git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git hole_punch_fixes

so that people can see the whole picture. Review is welcome!

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-mm-Acquire-invalidate_lock-in-filemap_fault-only-whe.patch --]
[-- Type: text/x-patch, Size: 3692 bytes --]

From 347157fb9dfaf5a7d10e723c773affe147cdff34 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 6 Jul 2021 16:45:42 +0200
Subject: [PATCH] mm: Acquire invalidate_lock in filemap_fault() only when
 creating pages

We don't need to acquire invalidate_lock in filemap_fault() when the
page is already in the page cache and uptodate since the existence of
the page itself (and its page lock) protect from races with
invalidation. This is rather common situation especially for write
faults (for read faults filemap_map_pages() generally handles this
situation). So let's avoid the unnecessary invalidate_lock acquisition
for this fast path.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 mm/filemap.c | 37 +++++++++++++++++++++++++++++--------
 1 file changed, 29 insertions(+), 8 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index b8e9bccecd9f..82269aaa715e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3030,6 +3030,7 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	pgoff_t max_off;
 	struct page *page;
 	vm_fault_t ret = 0;
+	bool mapping_locked = false;
 
 	max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
 	if (unlikely(offset >= max_off))
@@ -3053,12 +3054,16 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		fpin = do_sync_mmap_readahead(vmf);
 	}
 
-	/*
-	 * See comment in filemap_create_page() why we need invalidate_lock
-	 */
-	filemap_invalidate_lock_shared(mapping);
 	if (!page) {
 retry_find:
+		/*
+		 * See comment in filemap_create_page() why we need
+		 * invalidate_lock
+		 */
+		if (!mapping_locked) {
+			filemap_invalidate_lock_shared(mapping);
+			mapping_locked = true;
+		}
 		page = pagecache_get_page(mapping, offset,
 					  FGP_CREAT|FGP_FOR_MMAP,
 					  vmf->gfp_mask);
@@ -3068,6 +3073,9 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 			filemap_invalidate_unlock_shared(mapping);
 			return VM_FAULT_OOM;
 		}
+	} else if (unlikely(!PageUptodate(page))) {
+		filemap_invalidate_lock_shared(mapping);
+		mapping_locked = true;
 	}
 
 	if (!lock_page_maybe_drop_mmap(vmf, page, &fpin))
@@ -3085,8 +3093,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 * We have a locked page in the page cache, now we need to check
 	 * that it's up-to-date. If not, it is going to be due to an error.
 	 */
-	if (unlikely(!PageUptodate(page)))
+	if (unlikely(!PageUptodate(page))) {
+		/*
+		 * The page was in cache and uptodate and now it is not.
+		 * Strange but possible since we didn't hold the page lock all
+		 * the time. Let's drop everything get the invalidate lock and
+		 * try again.
+		 */
+		if (!mapping_locked) {
+			unlock_page(page);
+			put_page(page);
+			goto retry_find;
+		}
 		goto page_not_uptodate;
+	}
 
 	/*
 	 * We've made it this far and we had to drop our mmap_lock, now is the
@@ -3097,6 +3117,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 		unlock_page(page);
 		goto out_retry;
 	}
+	if (mapping_locked)
+		filemap_invalidate_unlock_shared(mapping);
 
 	/*
 	 * Found the page and have a reference on it.
@@ -3106,11 +3128,9 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	if (unlikely(offset >= max_off)) {
 		unlock_page(page);
 		put_page(page);
-		filemap_invalidate_unlock_shared(mapping);
 		return VM_FAULT_SIGBUS;
 	}
 
-	filemap_invalidate_unlock_shared(mapping);
 	vmf->page = page;
 	return ret | VM_FAULT_LOCKED;
 
@@ -3141,7 +3161,8 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
 	 */
 	if (page)
 		put_page(page);
-	filemap_invalidate_unlock_shared(mapping);
+	if (mapping_locked)
+		filemap_invalidate_unlock_shared(mapping);
 	if (fpin)
 		fput(fpin);
 	return ret | VM_FAULT_RETRY;
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-07-06 20:17 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-30 17:25 [GIT PULL] Hole puch vs page cache filling races fixes for 5.14-rc1 Jan Kara
2021-07-01  1:15 ` Linus Torvalds
2021-07-01 16:19   ` Jan Kara
2021-07-01 18:04     ` Linus Torvalds
2021-07-06 20:17       ` Jan Kara

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).