linux-cifs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
@ 2023-06-28 10:48 David Howells
  2023-06-28 10:48 ` [PATCH v7 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs David Howells
  2023-06-28 10:48 ` [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
  0 siblings, 2 replies; 12+ messages in thread
From: David Howells @ 2023-06-28 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Matthew Wilcox, Linus Torvalds, Jeff Layton,
	Christoph Hellwig, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-erofs, linux-ext4, linux-cachefs,
	linux-fsdevel

Hi Andrew,

Should this go through the mm tree?

This fixes an optimisation in fscache whereby we don't read from the cache
for a particular file until we know that there's data there that we don't
have in the pagecache.  The problem is that I'm no longer using PG_fscache
(aka PG_private_2) to indicate that the page is cached and so I don't get a
notification when a cached page is dropped from the pagecache.

The first patch merges some folio_has_private() and filemap_release_folio()
pairs and introduces a helper, folio_needs_release(), to indicate if a
release is required.

The second patch is the actual fix.  Following Willy's suggestions[1], it
adds an AS_RELEASE_ALWAYS flag to an address_space that will make
filemap_release_folio() always call ->release_folio(), even if
PG_private/PG_private_2 aren't set.  folio_needs_release() is altered to
add a check for this.

David

Changes:
========
ver #7)
 - Make NFS set AS_RELEASE_ALWAYS.

ver #6)
 - Drop the third patch which removes a duplicate check in vmscan().

ver #5)
 - Rebased on linus/master.  try_to_release_page() has now been entirely
   replaced by filemap_release_folio(), barring one comment.
 - Cleaned up some pairs in ext4.

ver #4)
 - Split has_private/release call pairs into own patch.
 - Moved folio_needs_release() to mm/internal.h and removed open-coded
   version from filemap_release_folio().
 - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
 - Added experimental patch to reduce shrink_folio_list().

ver #3)
 - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
 - Moved a '&&' to the correct line.

ver #2)
 - Rewrote entirely according to Willy's suggestion[1].

Link: https://lore.kernel.org/r/Yk9V/03wgdYi65Lb@casper.infradead.org/ [1]
Link: https://lore.kernel.org/r/164928630577.457102.8519251179327601178.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166844174069.1124521.10890506360974169994.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/166869495238.3720468.4878151409085146764.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/1459152.1669208550@warthog.procyon.org.uk/ # v3 also
Link: https://lore.kernel.org/r/166924370539.1772793.13730698360771821317.stgit@warthog.procyon.org.uk/ # v4
Link: https://lore.kernel.org/r/167172131368.2334525.8569808925687731937.stgit@warthog.procyon.org.uk/ # v5
Link: https://lore.kernel.org/r/20230216150701.3654894-1-dhowells@redhat.com/ # v6

David Howells (2):
  mm: Merge folio_has_private()/filemap_release_folio() call pairs
  mm, netfs, fscache: Stop read optimisation when folio removed from
    pagecache

 fs/9p/cache.c           |  2 ++
 fs/afs/internal.h       |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c         |  2 ++
 fs/ext4/move_extent.c   | 12 ++++--------
 fs/nfs/fscache.c        |  3 +++
 fs/smb/client/fscache.c |  2 ++
 fs/splice.c             |  3 +--
 include/linux/pagemap.h | 16 ++++++++++++++++
 mm/filemap.c            |  2 ++
 mm/huge_memory.c        |  3 +--
 mm/internal.h           | 11 +++++++++++
 mm/khugepaged.c         |  3 +--
 mm/memory-failure.c     |  8 +++-----
 mm/migrate.c            |  3 +--
 mm/truncate.c           |  6 ++----
 mm/vmscan.c             |  8 ++++----
 17 files changed, 59 insertions(+), 29 deletions(-)


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v7 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs
  2023-06-28 10:48 [PATCH v7 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
@ 2023-06-28 10:48 ` David Howells
  2023-06-28 10:48 ` [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
  1 sibling, 0 replies; 12+ messages in thread
From: David Howells @ 2023-06-28 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Matthew Wilcox, Linus Torvalds, Jeff Layton,
	Christoph Hellwig, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-erofs, linux-ext4, linux-cachefs,
	linux-fsdevel, Rohith Surabattula, Steve French, Shyam Prasad N,
	Dave Wysochanski, Dominique Martinet, Ilya Dryomov,
	Theodore Ts'o, Andreas Dilger, linux-mm

Make filemap_release_folio() check folio_has_private().  Then, in most
cases, where a call to folio_has_private() is immediately followed by a
call to filemap_release_folio(), we can get rid of the test in the pair.

There are a couple of sites in mm/vscan.c that this can't so easily be
done.  In shrink_folio_list(), there are actually three cases (something
different is done for incompletely invalidated buffers), but
filemap_release_folio() elides two of them.

In shrink_active_list(), we don't have have the folio lock yet, so the
check allows us to avoid locking the page unnecessarily.

A wrapper function to check if a folio needs release is provided for those
places that still need to do it in the mm/ directory.  This will acquire
additional parts to the condition in a future patch.

After this, the only remaining caller of folio_has_private() outside of mm/
is a check in fuse.

Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: "Theodore Ts'o" <tytso@mit.edu>
cc: Andreas Dilger <adilger.kernel@dilger.ca>
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-afs@lists.infradead.org
cc: v9fs-developer@lists.sourceforge.net
cc: ceph-devel@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-ext4@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #5)
     - Rebased on linus/master.  try_to_release_page() has now been entirely
       replaced by filemap_release_folio(), barring one comment.
     - Cleaned up some pairs in ext4.
    
    ver #4)
     - Split from fscache fix.
     - Moved folio_needs_release() to mm/internal.h and removed open-coded
       version from filemap_release_folio().
    
    ver #3)
     - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
     - Moved a '&&' to the correct line.
    
    ver #2)
     - Rewrote entirely according to Willy's suggestion[1].

 fs/ext4/move_extent.c | 12 ++++--------
 fs/splice.c           |  3 +--
 mm/filemap.c          |  2 ++
 mm/huge_memory.c      |  3 +--
 mm/internal.h         |  8 ++++++++
 mm/khugepaged.c       |  3 +--
 mm/memory-failure.c   |  8 +++-----
 mm/migrate.c          |  3 +--
 mm/truncate.c         |  6 ++----
 mm/vmscan.c           |  8 ++++----
 10 files changed, 27 insertions(+), 29 deletions(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index b5af2fc03b2f..251584a23d05 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -340,10 +340,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 			ext4_double_up_write_data_sem(orig_inode, donor_inode);
 			goto data_copy;
 		}
-		if ((folio_has_private(folio[0]) &&
-		     !filemap_release_folio(folio[0], 0)) ||
-		    (folio_has_private(folio[1]) &&
-		     !filemap_release_folio(folio[1], 0))) {
+		if (!filemap_release_folio(folio[0], 0) ||
+		    !filemap_release_folio(folio[1], 0)) {
 			*err = -EBUSY;
 			goto drop_data_sem;
 		}
@@ -362,10 +360,8 @@ move_extent_per_page(struct file *o_filp, struct inode *donor_inode,
 
 	/* At this point all buffers in range are uptodate, old mapping layout
 	 * is no longer required, try to drop it now. */
-	if ((folio_has_private(folio[0]) &&
-		!filemap_release_folio(folio[0], 0)) ||
-	    (folio_has_private(folio[1]) &&
-		!filemap_release_folio(folio[1], 0))) {
+	if (!filemap_release_folio(folio[0], 0) ||
+	    !filemap_release_folio(folio[1], 0)) {
 		*err = -EBUSY;
 		goto unlock_folios;
 	}
diff --git a/fs/splice.c b/fs/splice.c
index 7a9565d8ec4f..6412848891df 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -82,8 +82,7 @@ static bool page_cache_pipe_buf_try_steal(struct pipe_inode_info *pipe,
 		 */
 		folio_wait_writeback(folio);
 
-		if (folio_has_private(folio) &&
-		    !filemap_release_folio(folio, GFP_KERNEL))
+		if (!filemap_release_folio(folio, GFP_KERNEL))
 			goto out_unlock;
 
 		/*
diff --git a/mm/filemap.c b/mm/filemap.c
index 00f01d8ead47..31d07c2f8d32 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4134,6 +4134,8 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp)
 	struct address_space * const mapping = folio->mapping;
 
 	BUG_ON(!folio_test_locked(folio));
+	if (!folio_needs_release(folio))
+		return true;
 	if (folio_test_writeback(folio))
 		return false;
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 624671aaa60d..a14b3d1af9d7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2688,8 +2688,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		gfp = current_gfp_context(mapping_gfp_mask(mapping) &
 							GFP_RECLAIM_MASK);
 
-		if (folio_test_private(folio) &&
-				!filemap_release_folio(folio, gfp)) {
+		if (!filemap_release_folio(folio, gfp)) {
 			ret = -EBUSY;
 			goto out;
 		}
diff --git a/mm/internal.h b/mm/internal.h
index e6029d94bdb2..a76314764d8c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -170,6 +170,14 @@ static inline void set_page_refcounted(struct page *page)
 	set_page_count(page, 1);
 }
 
+/*
+ * Return true if a folio needs ->release_folio() calling upon it.
+ */
+static inline bool folio_needs_release(struct folio *folio)
+{
+	return folio_has_private(folio);
+}
+
 extern unsigned long highest_memmap_pfn;
 
 /*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 2d0d58fb4e7f..1e6e6a25cd52 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2058,8 +2058,7 @@ static int collapse_file(struct mm_struct *mm, unsigned long addr,
 			goto out_unlock;
 		}
 
-		if (folio_has_private(folio) &&
-		    !filemap_release_folio(folio, GFP_KERNEL)) {
+		if (!filemap_release_folio(folio, GFP_KERNEL)) {
 			result = SCAN_PAGE_HAS_PRIVATE;
 			folio_putback_lru(folio);
 			goto out_unlock;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 5b663eca1f29..38321eb85fb2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -944,14 +944,12 @@ static int truncate_error_page(struct page *p, unsigned long pfn,
 		struct folio *folio = page_folio(p);
 		int err = mapping->a_ops->error_remove_page(mapping, p);
 
-		if (err != 0) {
+		if (err != 0)
 			pr_info("%#lx: Failed to punch page: %d\n", pfn, err);
-		} else if (folio_has_private(folio) &&
-			   !filemap_release_folio(folio, GFP_NOIO)) {
+		else if (!filemap_release_folio(folio, GFP_NOIO))
 			pr_info("%#lx: failed to release buffers\n", pfn);
-		} else {
+		else
 			ret = MF_RECOVERED;
-		}
 	} else {
 		/*
 		 * If the file system doesn't support it just invalidate
diff --git a/mm/migrate.c b/mm/migrate.c
index 01cac26a3127..5fc27d1a7eaf 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -929,8 +929,7 @@ static int fallback_migrate_folio(struct address_space *mapping,
 	 * Buffers may be managed in a filesystem specific way.
 	 * We must have no buffers or drop them.
 	 */
-	if (folio_test_private(src) &&
-	    !filemap_release_folio(src, GFP_KERNEL))
+	if (!filemap_release_folio(src, GFP_KERNEL))
 		return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
 
 	return migrate_folio(mapping, dst, src, mode);
diff --git a/mm/truncate.c b/mm/truncate.c
index 86de31ed4d32..6fb830369829 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -19,7 +19,6 @@
 #include <linux/highmem.h>
 #include <linux/pagevec.h>
 #include <linux/task_io_accounting_ops.h>
-#include <linux/buffer_head.h>	/* grr. try_to_release_page */
 #include <linux/shmem_fs.h>
 #include <linux/rmap.h>
 #include "internal.h"
@@ -276,7 +275,7 @@ static long mapping_evict_folio(struct address_space *mapping,
 	if (folio_ref_count(folio) >
 			folio_nr_pages(folio) + folio_has_private(folio) + 1)
 		return 0;
-	if (folio_has_private(folio) && !filemap_release_folio(folio, 0))
+	if (!filemap_release_folio(folio, 0))
 		return 0;
 
 	return remove_mapping(mapping, folio);
@@ -574,8 +573,7 @@ static int invalidate_complete_folio2(struct address_space *mapping,
 	if (folio->mapping != mapping)
 		return 0;
 
-	if (folio_has_private(folio) &&
-	    !filemap_release_folio(folio, GFP_KERNEL))
+	if (!filemap_release_folio(folio, GFP_KERNEL))
 		return 0;
 
 	spin_lock(&mapping->host->i_lock);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5bf98d0a22c9..49bc412b31a2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2058,7 +2058,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 		 * (refcount == 1) it can be freed.  Otherwise, leave
 		 * the folio on the LRU so it is swappable.
 		 */
-		if (folio_has_private(folio)) {
+		if (folio_needs_release(folio)) {
 			if (!filemap_release_folio(folio, sc->gfp_mask))
 				goto activate_locked;
 			if (!mapping && folio_ref_count(folio) == 1) {
@@ -2703,9 +2703,9 @@ static void shrink_active_list(unsigned long nr_to_scan,
 		}
 
 		if (unlikely(buffer_heads_over_limit)) {
-			if (folio_test_private(folio) && folio_trylock(folio)) {
-				if (folio_test_private(folio))
-					filemap_release_folio(folio, 0);
+			if (folio_needs_release(folio) &&
+			    folio_trylock(folio)) {
+				filemap_release_folio(folio, 0);
 				folio_unlock(folio);
 			}
 		}


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
  2023-06-28 10:48 [PATCH v7 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
  2023-06-28 10:48 ` [PATCH v7 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs David Howells
@ 2023-06-28 10:48 ` David Howells
  2023-06-29  0:39   ` [Linux-cachefs] " Xiubo Li
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: David Howells @ 2023-06-28 10:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Howells, Matthew Wilcox, Linus Torvalds, Jeff Layton,
	Christoph Hellwig, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-erofs, linux-ext4, linux-cachefs,
	linux-fsdevel, Rohith Surabattula, Steve French, Shyam Prasad N,
	Dave Wysochanski, Dominique Martinet, Ilya Dryomov, linux-mm

Fscache has an optimisation by which reads from the cache are skipped until
we know that (a) there's data there to be read and (b) that data isn't
entirely covered by pages resident in the netfs pagecache.  This is done
with two flags manipulated by fscache_note_page_release():

	if (...
	    test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);

where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
that netfslib should download from the server or clear the page instead.

The fscache_note_page_release() function is intended to be called from
->releasepage() - but that only gets called if PG_private or PG_private_2
is set - and currently the former is at the discretion of the network
filesystem and the latter is only set whilst a page is being written to the
cache, so sometimes we miss clearing the optimisation.

Fix this by following Willy's suggestion[1] and adding an address_space
flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
->release_folio() if it's set, even if PG_private or PG_private_2 aren't
set.

Note that this would require folio_test_private() and page_has_private() to
become more complicated.  To avoid that, in the places[*] where these are
used to conditionalise calls to filemap_release_folio() and
try_to_release_page(), the tests are removed the those functions just
jumped to unconditionally and the test is performed there.

[*] There are some exceptions in vmscan.c where the check guards more than
just a call to the releaser.  I've added a function, folio_needs_release()
to wrap all the checks for that.

AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
is called.

Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
and the optimisation cancelled if a cachefiles object already contains data
when we open it.

Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Linus Torvalds <torvalds@linux-foundation.org>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Dave Wysochanski <dwysocha@redhat.com>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: linux-cachefs@redhat.com
cc: linux-cifs@vger.kernel.org
cc: linux-afs@lists.infradead.org
cc: v9fs-developer@lists.sourceforge.net
cc: ceph-devel@vger.kernel.org
cc: linux-nfs@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #7)
     - Make NFS set AS_RELEASE_ALWAYS.
    
    ver #4)
     - Split out merging of folio_has_private()/filemap_release_folio() call
       pairs into a preceding patch.
     - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
    
    ver #3)
     - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
     - Moved a '&&' to the correct line.
    
    ver #2)
     - Rewrote entirely according to Willy's suggestion[1].

 fs/9p/cache.c           |  2 ++
 fs/afs/internal.h       |  2 ++
 fs/cachefiles/namei.c   |  2 ++
 fs/ceph/cache.c         |  2 ++
 fs/nfs/fscache.c        |  3 +++
 fs/smb/client/fscache.c |  2 ++
 include/linux/pagemap.h | 16 ++++++++++++++++
 mm/internal.h           |  5 ++++-
 8 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/fs/9p/cache.c b/fs/9p/cache.c
index cebba4eaa0b5..12c0ae29f185 100644
--- a/fs/9p/cache.c
+++ b/fs/9p/cache.c
@@ -68,6 +68,8 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
 				       &path, sizeof(path),
 				       &version, sizeof(version),
 				       i_size_read(&v9inode->netfs.inode));
+	if (v9inode->netfs.cache)
+		mapping_set_release_always(inode->i_mapping);
 
 	p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
 		 inode, v9fs_inode_cookie(v9inode));
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 9d3d64921106..da73b97e19a9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -681,6 +681,8 @@ static inline void afs_vnode_set_cache(struct afs_vnode *vnode,
 {
 #ifdef CONFIG_AFS_FSCACHE
 	vnode->netfs.cache = cookie;
+	if (cookie)
+		mapping_set_release_always(vnode->netfs.inode.i_mapping);
 #endif
 }
 
diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
index d9d22d0ec38a..7bf7a5fcc045 100644
--- a/fs/cachefiles/namei.c
+++ b/fs/cachefiles/namei.c
@@ -585,6 +585,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
 	if (ret < 0)
 		goto check_failed;
 
+	clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &object->cookie->flags);
+
 	object->file = file;
 
 	/* Always update the atime on an object we've just looked up (this is
diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
index 177d8e8d73fe..de1dee46d3df 100644
--- a/fs/ceph/cache.c
+++ b/fs/ceph/cache.c
@@ -36,6 +36,8 @@ void ceph_fscache_register_inode_cookie(struct inode *inode)
 				       &ci->i_vino, sizeof(ci->i_vino),
 				       &ci->i_version, sizeof(ci->i_version),
 				       i_size_read(inode));
+	if (ci->netfs.cache)
+		mapping_set_release_always(inode->i_mapping);
 }
 
 void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info *ci)
diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
index 8c35d88a84b1..b05717fe0d4e 100644
--- a/fs/nfs/fscache.c
+++ b/fs/nfs/fscache.c
@@ -180,6 +180,9 @@ void nfs_fscache_init_inode(struct inode *inode)
 					       &auxdata,      /* aux_data */
 					       sizeof(auxdata),
 					       i_size_read(inode));
+
+	if (netfs_inode(inode)->cache)
+		mapping_set_release_always(inode->i_mapping);
 }
 
 /*
diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
index 8f6909d633da..3677525ee993 100644
--- a/fs/smb/client/fscache.c
+++ b/fs/smb/client/fscache.c
@@ -108,6 +108,8 @@ void cifs_fscache_get_inode_cookie(struct inode *inode)
 				       &cifsi->uniqueid, sizeof(cifsi->uniqueid),
 				       &cd, sizeof(cd),
 				       i_size_read(&cifsi->netfs.inode));
+	if (cifsi->netfs.cache)
+		mapping_set_release_always(inode->i_mapping);
 }
 
 void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index a56308a9d1a4..a1176ceb4a0c 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -199,6 +199,7 @@ enum mapping_flags {
 	/* writeback related tags are not used */
 	AS_NO_WRITEBACK_TAGS = 5,
 	AS_LARGE_FOLIO_SUPPORT = 6,
+	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
 };
 
 /**
@@ -269,6 +270,21 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
 	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
 }
 
+static inline bool mapping_release_always(const struct address_space *mapping)
+{
+	return test_bit(AS_RELEASE_ALWAYS, &mapping->flags);
+}
+
+static inline void mapping_set_release_always(struct address_space *mapping)
+{
+	set_bit(AS_RELEASE_ALWAYS, &mapping->flags);
+}
+
+static inline void mapping_clear_release_always(struct address_space *mapping)
+{
+	clear_bit(AS_RELEASE_ALWAYS, &mapping->flags);
+}
+
 static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
 {
 	return mapping->gfp_mask;
diff --git a/mm/internal.h b/mm/internal.h
index a76314764d8c..86aef26df905 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -175,7 +175,10 @@ static inline void set_page_refcounted(struct page *page)
  */
 static inline bool folio_needs_release(struct folio *folio)
 {
-	return folio_has_private(folio);
+	struct address_space *mapping = folio->mapping;
+
+	return folio_has_private(folio) ||
+		(mapping && mapping_release_always(mapping));
 }
 
 extern unsigned long highest_memmap_pfn;


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Linux-cachefs] [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
  2023-06-28 10:48 ` [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
@ 2023-06-29  0:39   ` Xiubo Li
  2023-06-30  3:20     ` Jingbo Xu
       [not found]   ` <ZKg/J3OG3kQ9ynSO@fedora>
       [not found]   ` <202307171548.7ab20146-oliver.sang@intel.com>
  2 siblings, 1 reply; 12+ messages in thread
From: Xiubo Li @ 2023-06-29  0:39 UTC (permalink / raw)
  To: David Howells, Andrew Morton
  Cc: Shyam Prasad N, Christoph Hellwig, linux-nfs, Ilya Dryomov,
	linux-cifs, Rohith Surabattula, linux-erofs, Jeff Layton,
	Matthew Wilcox, Steve French, linux-cachefs, linux-mm,
	linux-fsdevel, v9fs-developer, ceph-devel, linux-ext4,
	Linus Torvalds, linux-afs, Dominique Martinet


On 6/28/23 18:48, David Howells wrote:
> Fscache has an optimisation by which reads from the cache are skipped until
> we know that (a) there's data there to be read and (b) that data isn't
> entirely covered by pages resident in the netfs pagecache.  This is done
> with two flags manipulated by fscache_note_page_release():
>
> 	if (...
> 	    test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> 	    test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> 		clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
>
> where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
> that netfslib should download from the server or clear the page instead.
>
> The fscache_note_page_release() function is intended to be called from
> ->releasepage() - but that only gets called if PG_private or PG_private_2
> is set - and currently the former is at the discretion of the network
> filesystem and the latter is only set whilst a page is being written to the
> cache, so sometimes we miss clearing the optimisation.
>
> Fix this by following Willy's suggestion[1] and adding an address_space
> flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
> ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
> set.
>
> Note that this would require folio_test_private() and page_has_private() to
> become more complicated.  To avoid that, in the places[*] where these are
> used to conditionalise calls to filemap_release_folio() and
> try_to_release_page(), the tests are removed the those functions just
> jumped to unconditionally and the test is performed there.
>
> [*] There are some exceptions in vmscan.c where the check guards more than
> just a call to the releaser.  I've added a function, folio_needs_release()
> to wrap all the checks for that.
>
> AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
> fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
> is called.
>
> Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
> and the optimisation cancelled if a cachefiles object already contains data
> when we open it.
>
> Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
> Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
> Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: David Howells <dhowells@redhat.com>
> cc: Matthew Wilcox <willy@infradead.org>
> cc: Linus Torvalds <torvalds@linux-foundation.org>
> cc: Steve French <sfrench@samba.org>
> cc: Shyam Prasad N <nspmangalore@gmail.com>
> cc: Rohith Surabattula <rohiths.msft@gmail.com>
> cc: Dave Wysochanski <dwysocha@redhat.com>
> cc: Dominique Martinet <asmadeus@codewreck.org>
> cc: Ilya Dryomov <idryomov@gmail.com>
> cc: linux-cachefs@redhat.com
> cc: linux-cifs@vger.kernel.org
> cc: linux-afs@lists.infradead.org
> cc: v9fs-developer@lists.sourceforge.net
> cc: ceph-devel@vger.kernel.org
> cc: linux-nfs@vger.kernel.org
> cc: linux-fsdevel@vger.kernel.org
> cc: linux-mm@kvack.org
> ---
>
> Notes:
>      ver #7)
>       - Make NFS set AS_RELEASE_ALWAYS.
>      
>      ver #4)
>       - Split out merging of folio_has_private()/filemap_release_folio() call
>         pairs into a preceding patch.
>       - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
>      
>      ver #3)
>       - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
>       - Moved a '&&' to the correct line.
>      
>      ver #2)
>       - Rewrote entirely according to Willy's suggestion[1].
>
>   fs/9p/cache.c           |  2 ++
>   fs/afs/internal.h       |  2 ++
>   fs/cachefiles/namei.c   |  2 ++
>   fs/ceph/cache.c         |  2 ++
>   fs/nfs/fscache.c        |  3 +++
>   fs/smb/client/fscache.c |  2 ++
>   include/linux/pagemap.h | 16 ++++++++++++++++
>   mm/internal.h           |  5 ++++-
>   8 files changed, 33 insertions(+), 1 deletion(-)

Just one question. Shouldn't do this in 'fs/erofs/fscache.c' too ?

>
> diff --git a/fs/9p/cache.c b/fs/9p/cache.c
> index cebba4eaa0b5..12c0ae29f185 100644
> --- a/fs/9p/cache.c
> +++ b/fs/9p/cache.c
> @@ -68,6 +68,8 @@ void v9fs_cache_inode_get_cookie(struct inode *inode)
>   				       &path, sizeof(path),
>   				       &version, sizeof(version),
>   				       i_size_read(&v9inode->netfs.inode));
> +	if (v9inode->netfs.cache)
> +		mapping_set_release_always(inode->i_mapping);
>   
>   	p9_debug(P9_DEBUG_FSC, "inode %p get cookie %p\n",
>   		 inode, v9fs_inode_cookie(v9inode));
> diff --git a/fs/afs/internal.h b/fs/afs/internal.h
> index 9d3d64921106..da73b97e19a9 100644
> --- a/fs/afs/internal.h
> +++ b/fs/afs/internal.h
> @@ -681,6 +681,8 @@ static inline void afs_vnode_set_cache(struct afs_vnode *vnode,
>   {
>   #ifdef CONFIG_AFS_FSCACHE
>   	vnode->netfs.cache = cookie;
> +	if (cookie)
> +		mapping_set_release_always(vnode->netfs.inode.i_mapping);

If all the cookie need to do the same thing, then how about doing this 
in 'fscache_acquire_cookie()' ?

Thanks

- Xiubo

>   #endif
>   }
>   
> diff --git a/fs/cachefiles/namei.c b/fs/cachefiles/namei.c
> index d9d22d0ec38a..7bf7a5fcc045 100644
> --- a/fs/cachefiles/namei.c
> +++ b/fs/cachefiles/namei.c
> @@ -585,6 +585,8 @@ static bool cachefiles_open_file(struct cachefiles_object *object,
>   	if (ret < 0)
>   		goto check_failed;
>   
> +	clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &object->cookie->flags);
> +
>   	object->file = file;
>   
>   	/* Always update the atime on an object we've just looked up (this is
> diff --git a/fs/ceph/cache.c b/fs/ceph/cache.c
> index 177d8e8d73fe..de1dee46d3df 100644
> --- a/fs/ceph/cache.c
> +++ b/fs/ceph/cache.c
> @@ -36,6 +36,8 @@ void ceph_fscache_register_inode_cookie(struct inode *inode)
>   				       &ci->i_vino, sizeof(ci->i_vino),
>   				       &ci->i_version, sizeof(ci->i_version),
>   				       i_size_read(inode));
> +	if (ci->netfs.cache)
> +		mapping_set_release_always(inode->i_mapping);
>   }
>   
>   void ceph_fscache_unregister_inode_cookie(struct ceph_inode_info *ci)
> diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> index 8c35d88a84b1..b05717fe0d4e 100644
> --- a/fs/nfs/fscache.c
> +++ b/fs/nfs/fscache.c
> @@ -180,6 +180,9 @@ void nfs_fscache_init_inode(struct inode *inode)
>   					       &auxdata,      /* aux_data */
>   					       sizeof(auxdata),
>   					       i_size_read(inode));
> +
> +	if (netfs_inode(inode)->cache)
> +		mapping_set_release_always(inode->i_mapping);
>   }
>   
>   /*
> diff --git a/fs/smb/client/fscache.c b/fs/smb/client/fscache.c
> index 8f6909d633da..3677525ee993 100644
> --- a/fs/smb/client/fscache.c
> +++ b/fs/smb/client/fscache.c
> @@ -108,6 +108,8 @@ void cifs_fscache_get_inode_cookie(struct inode *inode)
>   				       &cifsi->uniqueid, sizeof(cifsi->uniqueid),
>   				       &cd, sizeof(cd),
>   				       i_size_read(&cifsi->netfs.inode));
> +	if (cifsi->netfs.cache)
> +		mapping_set_release_always(inode->i_mapping);
>   }
>   
>   void cifs_fscache_unuse_inode_cookie(struct inode *inode, bool update)
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index a56308a9d1a4..a1176ceb4a0c 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -199,6 +199,7 @@ enum mapping_flags {
>   	/* writeback related tags are not used */
>   	AS_NO_WRITEBACK_TAGS = 5,
>   	AS_LARGE_FOLIO_SUPPORT = 6,
> +	AS_RELEASE_ALWAYS,	/* Call ->release_folio(), even if no private data */
>   };
>   
>   /**
> @@ -269,6 +270,21 @@ static inline int mapping_use_writeback_tags(struct address_space *mapping)
>   	return !test_bit(AS_NO_WRITEBACK_TAGS, &mapping->flags);
>   }
>   
> +static inline bool mapping_release_always(const struct address_space *mapping)
> +{
> +	return test_bit(AS_RELEASE_ALWAYS, &mapping->flags);
> +}
> +
> +static inline void mapping_set_release_always(struct address_space *mapping)
> +{
> +	set_bit(AS_RELEASE_ALWAYS, &mapping->flags);
> +}
> +
> +static inline void mapping_clear_release_always(struct address_space *mapping)
> +{
> +	clear_bit(AS_RELEASE_ALWAYS, &mapping->flags);
> +}
> +
>   static inline gfp_t mapping_gfp_mask(struct address_space * mapping)
>   {
>   	return mapping->gfp_mask;
> diff --git a/mm/internal.h b/mm/internal.h
> index a76314764d8c..86aef26df905 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -175,7 +175,10 @@ static inline void set_page_refcounted(struct page *page)
>    */
>   static inline bool folio_needs_release(struct folio *folio)
>   {
> -	return folio_has_private(folio);
> +	struct address_space *mapping = folio->mapping;
> +
> +	return folio_has_private(folio) ||
> +		(mapping && mapping_release_always(mapping));
>   }
>   
>   extern unsigned long highest_memmap_pfn;
> --
> Linux-cachefs mailing list
> Linux-cachefs@redhat.com
> https://listman.redhat.com/mailman/listinfo/linux-cachefs
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Linux-cachefs] [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
  2023-06-29  0:39   ` [Linux-cachefs] " Xiubo Li
@ 2023-06-30  3:20     ` Jingbo Xu
  0 siblings, 0 replies; 12+ messages in thread
From: Jingbo Xu @ 2023-06-30  3:20 UTC (permalink / raw)
  To: Xiubo Li, David Howells, Andrew Morton
  Cc: Shyam Prasad N, linux-cifs, linux-nfs, linux-mm,
	Rohith Surabattula, Linus Torvalds, Dominique Martinet,
	Jeff Layton, Matthew Wilcox, linux-afs, Christoph Hellwig,
	Steve French, linux-cachefs, linux-fsdevel, v9fs-developer,
	Ilya Dryomov, linux-ext4, linux-erofs, ceph-devel



On 6/29/23 8:39 AM, Xiubo Li wrote:
> 
> On 6/28/23 18:48, David Howells wrote:
>> Fscache has an optimisation by which reads from the cache are skipped
>> until
>> we know that (a) there's data there to be read and (b) that data isn't
>> entirely covered by pages resident in the netfs pagecache.  This is done
>> with two flags manipulated by fscache_note_page_release():
>>
>>     if (...
>>         test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
>>         test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
>>         clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
>>
>> where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to
>> indicate
>> that netfslib should download from the server or clear the page instead.
>>
>> The fscache_note_page_release() function is intended to be called from
>> ->releasepage() - but that only gets called if PG_private or PG_private_2
>> is set - and currently the former is at the discretion of the network
>> filesystem and the latter is only set whilst a page is being written
>> to the
>> cache, so sometimes we miss clearing the optimisation.
>>
>> Fix this by following Willy's suggestion[1] and adding an address_space
>> flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always
>> call
>> ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
>> set.
>>
>> Note that this would require folio_test_private() and
>> page_has_private() to
>> become more complicated.  To avoid that, in the places[*] where these are
>> used to conditionalise calls to filemap_release_folio() and
>> try_to_release_page(), the tests are removed the those functions just
>> jumped to unconditionally and the test is performed there.
>>
>> [*] There are some exceptions in vmscan.c where the check guards more
>> than
>> just a call to the releaser.  I've added a function,
>> folio_needs_release()
>> to wrap all the checks for that.
>>
>> AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
>> fscache and cleared in ->evict_inode() before
>> truncate_inode_pages_final()
>> is called.
>>
>> Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
>> and the optimisation cancelled if a cachefiles object already contains
>> data
>> when we open it.
>>
>> Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release
>> of a page")
>> Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
>> Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
>> Suggested-by: Matthew Wilcox <willy@infradead.org>
>> Signed-off-by: David Howells <dhowells@redhat.com>
>> cc: Matthew Wilcox <willy@infradead.org>
>> cc: Linus Torvalds <torvalds@linux-foundation.org>
>> cc: Steve French <sfrench@samba.org>
>> cc: Shyam Prasad N <nspmangalore@gmail.com>
>> cc: Rohith Surabattula <rohiths.msft@gmail.com>
>> cc: Dave Wysochanski <dwysocha@redhat.com>
>> cc: Dominique Martinet <asmadeus@codewreck.org>
>> cc: Ilya Dryomov <idryomov@gmail.com>
>> cc: linux-cachefs@redhat.com
>> cc: linux-cifs@vger.kernel.org
>> cc: linux-afs@lists.infradead.org
>> cc: v9fs-developer@lists.sourceforge.net
>> cc: ceph-devel@vger.kernel.org
>> cc: linux-nfs@vger.kernel.org
>> cc: linux-fsdevel@vger.kernel.org
>> cc: linux-mm@kvack.org
>> ---
>>
>> Notes:
>>      ver #7)
>>       - Make NFS set AS_RELEASE_ALWAYS.
>>           ver #4)
>>       - Split out merging of
>> folio_has_private()/filemap_release_folio() call
>>         pairs into a preceding patch.
>>       - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
>>           ver #3)
>>       - Fixed mapping_clear_release_always() to use clear_bit() not
>> set_bit().
>>       - Moved a '&&' to the correct line.
>>           ver #2)
>>       - Rewrote entirely according to Willy's suggestion[1].
>>
>>   fs/9p/cache.c           |  2 ++
>>   fs/afs/internal.h       |  2 ++
>>   fs/cachefiles/namei.c   |  2 ++
>>   fs/ceph/cache.c         |  2 ++
>>   fs/nfs/fscache.c        |  3 +++
>>   fs/smb/client/fscache.c |  2 ++
>>   include/linux/pagemap.h | 16 ++++++++++++++++
>>   mm/internal.h           |  5 ++++-
>>   8 files changed, 33 insertions(+), 1 deletion(-)
> 
> Just one question. Shouldn't do this in 'fs/erofs/fscache.c' too ?
> 

Currently the read optimization is not used in fscache ondemand mode
(used by erofs), though it may not be intended...

cachefiles_ondemand_copen
  if (size)
    clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);

The read optimization is disabled as long as the backing file size is
not 0 (which is the most case).  And thus currently erofs doesn't need
to clear FSCACHE_COOKIE_NO_DATA_TO_READ in .release_folio().

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
       [not found]   ` <ZKg/J3OG3kQ9ynSO@fedora>
@ 2023-07-07 16:46     ` Hyeonggon Yoo
  2023-07-07 18:12       ` David Wysochanski
  0 siblings, 1 reply; 12+ messages in thread
From: Hyeonggon Yoo @ 2023-07-07 16:46 UTC (permalink / raw)
  To: David Howells
  Cc: Andrew Morton, Matthew Wilcox, Linus Torvalds, Jeff Layton,
	Christoph Hellwig, linux-afs, linux-nfs, linux-cifs, ceph-devel,
	v9fs-developer, linux-erofs, linux-ext4, linux-cachefs,
	linux-fsdevel, Rohith Surabattula, Steve French, Shyam Prasad N,
	Dave Wysochanski, Dominique Martinet, Ilya Dryomov, linux-mm

On Sat, Jul 8, 2023 at 1:39 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Wed, Jun 28, 2023 at 11:48:52AM +0100, David Howells wrote:
> > Fscache has an optimisation by which reads from the cache are skipped until
> > we know that (a) there's data there to be read and (b) that data isn't
> > entirely covered by pages resident in the netfs pagecache.  This is done
> > with two flags manipulated by fscache_note_page_release():
> >
> >       if (...
> >           test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> >           test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> >               clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> >
> > where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
> > that netfslib should download from the server or clear the page instead.
> >
> > The fscache_note_page_release() function is intended to be called from
> > ->releasepage() - but that only gets called if PG_private or PG_private_2
> > is set - and currently the former is at the discretion of the network
> > filesystem and the latter is only set whilst a page is being written to the
> > cache, so sometimes we miss clearing the optimisation.
> >
> > Fix this by following Willy's suggestion[1] and adding an address_space
> > flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
> > ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
> > set.
> >
> > Note that this would require folio_test_private() and page_has_private() to
> > become more complicated.  To avoid that, in the places[*] where these are
> > used to conditionalise calls to filemap_release_folio() and
> > try_to_release_page(), the tests are removed the those functions just
> > jumped to unconditionally and the test is performed there.
> >
> > [*] There are some exceptions in vmscan.c where the check guards more than
> > just a call to the releaser.  I've added a function, folio_needs_release()
> > to wrap all the checks for that.
> >
> > AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
> > fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
> > is called.
> >
> > Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
> > and the optimisation cancelled if a cachefiles object already contains data
> > when we open it.
> >
> > Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
> > Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
> > Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > Signed-off-by: David Howells <dhowells@redhat.com>
>
> Hi David,
>
> I was bisecting a use-after-free BUG on the latest mm-unstable,
> where HEAD is 347e208de0e4 ("rmap: pass the folio to __page_check_anon_rmap()").
>
> According to my bisection, this is the first bad commit.
> Use-After-Free is triggered on reclamation path when swap is enabled.

This was originally occurred during kernel compilation but
can easily be reproduced via:

stress-ng --bigheap $(nproc)

> (and couldn't trigger without swap enabled)
>
> the config, KASAN splat, bisect log are attached.
> hope this isn't too late :(
>
> > cc: Matthew Wilcox <willy@infradead.org>
> > cc: Linus Torvalds <torvalds@linux-foundation.org>
> > cc: Steve French <sfrench@samba.org>
> > cc: Shyam Prasad N <nspmangalore@gmail.com>
> > cc: Rohith Surabattula <rohiths.msft@gmail.com>
> > cc: Dave Wysochanski <dwysocha@redhat.com>
> > cc: Dominique Martinet <asmadeus@codewreck.org>
> > cc: Ilya Dryomov <idryomov@gmail.com>
> > cc: linux-cachefs@redhat.com
> > cc: linux-cifs@vger.kernel.org
> > cc: linux-afs@lists.infradead.org
> > cc: v9fs-developer@lists.sourceforge.net
> > cc: ceph-devel@vger.kernel.org
> > cc: linux-nfs@vger.kernel.org
> > cc: linux-fsdevel@vger.kernel.org
> > cc: linux-mm@kvack.org
> > ---
> >
> > Notes:
> >     ver #7)
> >      - Make NFS set AS_RELEASE_ALWAYS.
> >
> >     ver #4)
> >      - Split out merging of folio_has_private()/filemap_release_folio() call
> >        pairs into a preceding patch.
> >      - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
> >
> >     ver #3)
> >      - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
> >      - Moved a '&&' to the correct line.
> >
> >     ver #2)
> >      - Rewrote entirely according to Willy's suggestion[1].
> >
> >  fs/9p/cache.c           |  2 ++
> >  fs/afs/internal.h       |  2 ++
> >  fs/cachefiles/namei.c   |  2 ++
> >  fs/ceph/cache.c         |  2 ++
> >  fs/nfs/fscache.c        |  3 +++
> >  fs/smb/client/fscache.c |  2 ++
> >  include/linux/pagemap.h | 16 ++++++++++++++++
> >  mm/internal.h           |  5 ++++-
> >  8 files changed, 33 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
  2023-07-07 16:46     ` [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0 Hyeonggon Yoo
@ 2023-07-07 18:12       ` David Wysochanski
  2023-07-07 18:27         ` Hyeonggon Yoo
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: David Wysochanski @ 2023-07-07 18:12 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Howells, Andrew Morton, Matthew Wilcox, Linus Torvalds,
	Jeff Layton, Christoph Hellwig, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-erofs, linux-ext4,
	linux-cachefs, linux-fsdevel, Rohith Surabattula, Steve French,
	Shyam Prasad N, Dominique Martinet, Ilya Dryomov, linux-mm,
	Daire Byrne

On Fri, Jul 7, 2023 at 12:46 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
>
> On Sat, Jul 8, 2023 at 1:39 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Wed, Jun 28, 2023 at 11:48:52AM +0100, David Howells wrote:
> > > Fscache has an optimisation by which reads from the cache are skipped until
> > > we know that (a) there's data there to be read and (b) that data isn't
> > > entirely covered by pages resident in the netfs pagecache.  This is done
> > > with two flags manipulated by fscache_note_page_release():
> > >
> > >       if (...
> > >           test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> > >           test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> > >               clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> > >
> > > where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
> > > that netfslib should download from the server or clear the page instead.
> > >
> > > The fscache_note_page_release() function is intended to be called from
> > > ->releasepage() - but that only gets called if PG_private or PG_private_2
> > > is set - and currently the former is at the discretion of the network
> > > filesystem and the latter is only set whilst a page is being written to the
> > > cache, so sometimes we miss clearing the optimisation.
> > >
> > > Fix this by following Willy's suggestion[1] and adding an address_space
> > > flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
> > > ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
> > > set.
> > >
> > > Note that this would require folio_test_private() and page_has_private() to
> > > become more complicated.  To avoid that, in the places[*] where these are
> > > used to conditionalise calls to filemap_release_folio() and
> > > try_to_release_page(), the tests are removed the those functions just
> > > jumped to unconditionally and the test is performed there.
> > >
> > > [*] There are some exceptions in vmscan.c where the check guards more than
> > > just a call to the releaser.  I've added a function, folio_needs_release()
> > > to wrap all the checks for that.
> > >
> > > AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
> > > fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
> > > is called.
> > >
> > > Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
> > > and the optimisation cancelled if a cachefiles object already contains data
> > > when we open it.
> > >
> > > Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
> > > Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
> > > Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
> > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > Signed-off-by: David Howells <dhowells@redhat.com>
> >
> > Hi David,
> >
> > I was bisecting a use-after-free BUG on the latest mm-unstable,
> > where HEAD is 347e208de0e4 ("rmap: pass the folio to __page_check_anon_rmap()").
> >
> > According to my bisection, this is the first bad commit.
> > Use-After-Free is triggered on reclamation path when swap is enabled.
>
> This was originally occurred during kernel compilation but
> can easily be reproduced via:
>
> stress-ng --bigheap $(nproc)
>
> > (and couldn't trigger without swap enabled)
> >
> > the config, KASAN splat, bisect log are attached.
> > hope this isn't too late :(
> >
> > > cc: Matthew Wilcox <willy@infradead.org>
> > > cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > cc: Steve French <sfrench@samba.org>
> > > cc: Shyam Prasad N <nspmangalore@gmail.com>
> > > cc: Rohith Surabattula <rohiths.msft@gmail.com>
> > > cc: Dave Wysochanski <dwysocha@redhat.com>
> > > cc: Dominique Martinet <asmadeus@codewreck.org>
> > > cc: Ilya Dryomov <idryomov@gmail.com>
> > > cc: linux-cachefs@redhat.com
> > > cc: linux-cifs@vger.kernel.org
> > > cc: linux-afs@lists.infradead.org
> > > cc: v9fs-developer@lists.sourceforge.net
> > > cc: ceph-devel@vger.kernel.org
> > > cc: linux-nfs@vger.kernel.org
> > > cc: linux-fsdevel@vger.kernel.org
> > > cc: linux-mm@kvack.org
> > > ---
> > >
> > > Notes:
> > >     ver #7)
> > >      - Make NFS set AS_RELEASE_ALWAYS.
> > >
> > >     ver #4)
> > >      - Split out merging of folio_has_private()/filemap_release_folio() call
> > >        pairs into a preceding patch.
> > >      - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
> > >
> > >     ver #3)
> > >      - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
> > >      - Moved a '&&' to the correct line.
> > >
> > >     ver #2)
> > >      - Rewrote entirely according to Willy's suggestion[1].
> > >
> > >  fs/9p/cache.c           |  2 ++
> > >  fs/afs/internal.h       |  2 ++
> > >  fs/cachefiles/namei.c   |  2 ++
> > >  fs/ceph/cache.c         |  2 ++
> > >  fs/nfs/fscache.c        |  3 +++
> > >  fs/smb/client/fscache.c |  2 ++
> > >  include/linux/pagemap.h | 16 ++++++++++++++++
> > >  mm/internal.h           |  5 ++++-
> > >  8 files changed, 33 insertions(+), 1 deletion(-)


I think myself / Daire Byrne may have already tracked this down and I
found a 1-liner that fixed a similar crash in his environment.

Can you try this patch on top and let me know if it still crashes?
https://github.com/DaveWysochanskiRH/kernel/commit/902c990e311120179fa5de99d68364b2947b79ec


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
  2023-07-07 18:12       ` David Wysochanski
@ 2023-07-07 18:27         ` Hyeonggon Yoo
  2023-07-07 18:40           ` Matthew Wilcox
  2023-07-07 18:33         ` Matthew Wilcox
  2023-07-07 19:23         ` SeongJae Park
  2 siblings, 1 reply; 12+ messages in thread
From: Hyeonggon Yoo @ 2023-07-07 18:27 UTC (permalink / raw)
  To: David Wysochanski
  Cc: David Howells, Andrew Morton, Matthew Wilcox, Linus Torvalds,
	Jeff Layton, Christoph Hellwig, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-erofs, linux-ext4,
	linux-cachefs, linux-fsdevel, Rohith Surabattula, Steve French,
	Shyam Prasad N, Dominique Martinet, Ilya Dryomov, linux-mm,
	Daire Byrne

On Fri, Jul 07, 2023 at 02:12:06PM -0400, David Wysochanski wrote:
> On Fri, Jul 7, 2023 at 12:46 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Sat, Jul 8, 2023 at 1:39 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> > >
> > > On Wed, Jun 28, 2023 at 11:48:52AM +0100, David Howells wrote:
> > > > Fscache has an optimisation by which reads from the cache are skipped until
> > > > we know that (a) there's data there to be read and (b) that data isn't
> > > > entirely covered by pages resident in the netfs pagecache.  This is done
> > > > with two flags manipulated by fscache_note_page_release():
> > > >
> > > >       if (...
> > > >           test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> > > >           test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> > > >               clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> > > >
> > > > where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
> > > > that netfslib should download from the server or clear the page instead.
> > > >
> > > > The fscache_note_page_release() function is intended to be called from
> > > > ->releasepage() - but that only gets called if PG_private or PG_private_2
> > > > is set - and currently the former is at the discretion of the network
> > > > filesystem and the latter is only set whilst a page is being written to the
> > > > cache, so sometimes we miss clearing the optimisation.
> > > >
> > > > Fix this by following Willy's suggestion[1] and adding an address_space
> > > > flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
> > > > ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
> > > > set.
> > > >
> > > > Note that this would require folio_test_private() and page_has_private() to
> > > > become more complicated.  To avoid that, in the places[*] where these are
> > > > used to conditionalise calls to filemap_release_folio() and
> > > > try_to_release_page(), the tests are removed the those functions just
> > > > jumped to unconditionally and the test is performed there.
> > > >
> > > > [*] There are some exceptions in vmscan.c where the check guards more than
> > > > just a call to the releaser.  I've added a function, folio_needs_release()
> > > > to wrap all the checks for that.
> > > >
> > > > AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
> > > > fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
> > > > is called.
> > > >
> > > > Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
> > > > and the optimisation cancelled if a cachefiles object already contains data
> > > > when we open it.
> > > >
> > > > Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
> > > > Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
> > > > Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
> > > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > >
> > > Hi David,
> > >
> > > I was bisecting a use-after-free BUG on the latest mm-unstable,
> > > where HEAD is 347e208de0e4 ("rmap: pass the folio to __page_check_anon_rmap()").
> > >
> > > According to my bisection, this is the first bad commit.
> > > Use-After-Free is triggered on reclamation path when swap is enabled.
> >
> > This was originally occurred during kernel compilation but
> > can easily be reproduced via:
> >
> > stress-ng --bigheap $(nproc)
> >
> > > (and couldn't trigger without swap enabled)
> > >
> > > the config, KASAN splat, bisect log are attached.
> > > hope this isn't too late :(
> > >
> > > > cc: Matthew Wilcox <willy@infradead.org>
> > > > cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > > cc: Steve French <sfrench@samba.org>
> > > > cc: Shyam Prasad N <nspmangalore@gmail.com>
> > > > cc: Rohith Surabattula <rohiths.msft@gmail.com>
> > > > cc: Dave Wysochanski <dwysocha@redhat.com>
> > > > cc: Dominique Martinet <asmadeus@codewreck.org>
> > > > cc: Ilya Dryomov <idryomov@gmail.com>
> > > > cc: linux-cachefs@redhat.com
> > > > cc: linux-cifs@vger.kernel.org
> > > > cc: linux-afs@lists.infradead.org
> > > > cc: v9fs-developer@lists.sourceforge.net
> > > > cc: ceph-devel@vger.kernel.org
> > > > cc: linux-nfs@vger.kernel.org
> > > > cc: linux-fsdevel@vger.kernel.org
> > > > cc: linux-mm@kvack.org
> > > > ---
> > > >
> > > > Notes:
> > > >     ver #7)
> > > >      - Make NFS set AS_RELEASE_ALWAYS.
> > > >
> > > >     ver #4)
> > > >      - Split out merging of folio_has_private()/filemap_release_folio() call
> > > >        pairs into a preceding patch.
> > > >      - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
> > > >
> > > >     ver #3)
> > > >      - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
> > > >      - Moved a '&&' to the correct line.
> > > >
> > > >     ver #2)
> > > >      - Rewrote entirely according to Willy's suggestion[1].
> > > >
> > > >  fs/9p/cache.c           |  2 ++
> > > >  fs/afs/internal.h       |  2 ++
> > > >  fs/cachefiles/namei.c   |  2 ++
> > > >  fs/ceph/cache.c         |  2 ++
> > > >  fs/nfs/fscache.c        |  3 +++
> > > >  fs/smb/client/fscache.c |  2 ++
> > > >  include/linux/pagemap.h | 16 ++++++++++++++++
> > > >  mm/internal.h           |  5 ++++-
> > > >  8 files changed, 33 insertions(+), 1 deletion(-)
> 
> 
> I think myself / Daire Byrne may have already tracked this down and I
> found a 1-liner that fixed a similar crash in his environment.
> 
> Can you try this patch on top and let me know if it still crashes?
> https://github.com/DaveWysochanskiRH/kernel/commit/902c990e311120179fa5de99d68364b2947b79ec

Oh, it does not crash with the patch applied.

Hmm, was it UAF because it references wrong field ->mapping,
instead of swapper address space?


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
  2023-07-07 18:12       ` David Wysochanski
  2023-07-07 18:27         ` Hyeonggon Yoo
@ 2023-07-07 18:33         ` Matthew Wilcox
  2023-07-07 19:23         ` SeongJae Park
  2 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2023-07-07 18:33 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Hyeonggon Yoo, David Howells, Andrew Morton, Linus Torvalds,
	Jeff Layton, Christoph Hellwig, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-erofs, linux-ext4,
	linux-cachefs, linux-fsdevel, Rohith Surabattula, Steve French,
	Shyam Prasad N, Dominique Martinet, Ilya Dryomov, linux-mm,
	Daire Byrne

On Fri, Jul 07, 2023 at 02:12:06PM -0400, David Wysochanski wrote:
> I think myself / Daire Byrne may have already tracked this down and I
> found a 1-liner that fixed a similar crash in his environment.
> 
> Can you try this patch on top and let me know if it still crashes?
> https://github.com/DaveWysochanskiRH/kernel/commit/902c990e311120179fa5de99d68364b2947b79ec

Said one-liner:
-	struct address_space *mapping = folio->mapping;
+	struct address_space *mapping = folio_mapping(folio);

This will definitely fix the problem.  shrink_folio_list() sees
anonymous folios as well as file folios.

I wonder if we want to go a step further and introduce ...

+static inline bool __folio_needs_release(struct address_space *mapping,
+               struct folio *folio)
+{
+       return folio_has_private(folio) ||
+               (mapping && mapping_release_always(mapping));
+}
+
 /*
  * Return true if a folio needs ->release_folio() calling upon it.
  */
 static inline bool folio_needs_release(struct folio *folio)
 {
-       struct address_space *mapping = folio->mapping;
-
-       return folio_has_private(folio) ||
-               (mapping && mapping_release_always(mapping));
+       return __folio_needs_release(folio_mapping(folio), folio);
 }

since two of the three callers already have done the necessary dance to
get the mapping (and they're the two which happen regularly; the third
is an unusual situation).


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
  2023-07-07 18:27         ` Hyeonggon Yoo
@ 2023-07-07 18:40           ` Matthew Wilcox
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2023-07-07 18:40 UTC (permalink / raw)
  To: Hyeonggon Yoo
  Cc: David Wysochanski, David Howells, Andrew Morton, Linus Torvalds,
	Jeff Layton, Christoph Hellwig, linux-afs, linux-nfs, linux-cifs,
	ceph-devel, v9fs-developer, linux-erofs, linux-ext4,
	linux-cachefs, linux-fsdevel, Rohith Surabattula, Steve French,
	Shyam Prasad N, Dominique Martinet, Ilya Dryomov, linux-mm,
	Daire Byrne

On Sat, Jul 08, 2023 at 03:27:42AM +0900, Hyeonggon Yoo wrote:
> Hmm, was it UAF because it references wrong field ->mapping,
> instead of swapper address space?

Ooh, I know this one!

When a folio is in use as an anonymous page, ->mapping has the bottom
two bits set to 01b.  The rest of the pointer is actually a pointer
to an anon_vma.  It's entirely plausible that an anon page might have
had its anon_vma freed by the time the folio is on the inactive list,
and on its way to being recycled (eg it was unmapped).  I'm not
terribly familiar with the lifetime rules of the anon_vma, but I doubt
that a folio still being in RAM would pin it if it has been unmapped.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0
  2023-07-07 18:12       ` David Wysochanski
  2023-07-07 18:27         ` Hyeonggon Yoo
  2023-07-07 18:33         ` Matthew Wilcox
@ 2023-07-07 19:23         ` SeongJae Park
  2 siblings, 0 replies; 12+ messages in thread
From: SeongJae Park @ 2023-07-07 19:23 UTC (permalink / raw)
  To: David Wysochanski
  Cc: Hyeonggon Yoo, David Howells, Andrew Morton, Matthew Wilcox,
	Linus Torvalds, Jeff Layton, Christoph Hellwig, linux-afs,
	linux-nfs, linux-cifs, ceph-devel, v9fs-developer, linux-erofs,
	linux-ext4, linux-cachefs, linux-fsdevel, Rohith Surabattula,
	Steve French, Shyam Prasad N, Dominique Martinet, Ilya Dryomov,
	linux-mm, Daire Byrne, SeongJae Park

On Fri, 7 Jul 2023 14:12:06 -0400 David Wysochanski <dwysocha@redhat.com> wrote:

> On Fri, Jul 7, 2023 at 12:46 PM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> >
> > On Sat, Jul 8, 2023 at 1:39 AM Hyeonggon Yoo <42.hyeyoo@gmail.com> wrote:
> > >
> > > On Wed, Jun 28, 2023 at 11:48:52AM +0100, David Howells wrote:
> > > > Fscache has an optimisation by which reads from the cache are skipped until
> > > > we know that (a) there's data there to be read and (b) that data isn't
> > > > entirely covered by pages resident in the netfs pagecache.  This is done
> > > > with two flags manipulated by fscache_note_page_release():
> > > >
> > > >       if (...
> > > >           test_bit(FSCACHE_COOKIE_HAVE_DATA, &cookie->flags) &&
> > > >           test_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags))
> > > >               clear_bit(FSCACHE_COOKIE_NO_DATA_TO_READ, &cookie->flags);
> > > >
> > > > where the NO_DATA_TO_READ flag causes cachefiles_prepare_read() to indicate
> > > > that netfslib should download from the server or clear the page instead.
> > > >
> > > > The fscache_note_page_release() function is intended to be called from
> > > > ->releasepage() - but that only gets called if PG_private or PG_private_2
> > > > is set - and currently the former is at the discretion of the network
> > > > filesystem and the latter is only set whilst a page is being written to the
> > > > cache, so sometimes we miss clearing the optimisation.
> > > >
> > > > Fix this by following Willy's suggestion[1] and adding an address_space
> > > > flag, AS_RELEASE_ALWAYS, that causes filemap_release_folio() to always call
> > > > ->release_folio() if it's set, even if PG_private or PG_private_2 aren't
> > > > set.
> > > >
> > > > Note that this would require folio_test_private() and page_has_private() to
> > > > become more complicated.  To avoid that, in the places[*] where these are
> > > > used to conditionalise calls to filemap_release_folio() and
> > > > try_to_release_page(), the tests are removed the those functions just
> > > > jumped to unconditionally and the test is performed there.
> > > >
> > > > [*] There are some exceptions in vmscan.c where the check guards more than
> > > > just a call to the releaser.  I've added a function, folio_needs_release()
> > > > to wrap all the checks for that.
> > > >
> > > > AS_RELEASE_ALWAYS should be set if a non-NULL cookie is obtained from
> > > > fscache and cleared in ->evict_inode() before truncate_inode_pages_final()
> > > > is called.
> > > >
> > > > Additionally, the FSCACHE_COOKIE_NO_DATA_TO_READ flag needs to be cleared
> > > > and the optimisation cancelled if a cachefiles object already contains data
> > > > when we open it.
> > > >
> > > > Fixes: 1f67e6d0b188 ("fscache: Provide a function to note the release of a page")
> > > > Fixes: 047487c947e8 ("cachefiles: Implement the I/O routines")
> > > > Reported-by: Rohith Surabattula <rohiths.msft@gmail.com>
> > > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > > Signed-off-by: David Howells <dhowells@redhat.com>
> > >
> > > Hi David,
> > >
> > > I was bisecting a use-after-free BUG on the latest mm-unstable,
> > > where HEAD is 347e208de0e4 ("rmap: pass the folio to __page_check_anon_rmap()").
> > >
> > > According to my bisection, this is the first bad commit.
> > > Use-After-Free is triggered on reclamation path when swap is enabled.
> >
> > This was originally occurred during kernel compilation but
> > can easily be reproduced via:
> >
> > stress-ng --bigheap $(nproc)
> >
> > > (and couldn't trigger without swap enabled)
> > >
> > > the config, KASAN splat, bisect log are attached.
> > > hope this isn't too late :(
> > >
> > > > cc: Matthew Wilcox <willy@infradead.org>
> > > > cc: Linus Torvalds <torvalds@linux-foundation.org>
> > > > cc: Steve French <sfrench@samba.org>
> > > > cc: Shyam Prasad N <nspmangalore@gmail.com>
> > > > cc: Rohith Surabattula <rohiths.msft@gmail.com>
> > > > cc: Dave Wysochanski <dwysocha@redhat.com>
> > > > cc: Dominique Martinet <asmadeus@codewreck.org>
> > > > cc: Ilya Dryomov <idryomov@gmail.com>
> > > > cc: linux-cachefs@redhat.com
> > > > cc: linux-cifs@vger.kernel.org
> > > > cc: linux-afs@lists.infradead.org
> > > > cc: v9fs-developer@lists.sourceforge.net
> > > > cc: ceph-devel@vger.kernel.org
> > > > cc: linux-nfs@vger.kernel.org
> > > > cc: linux-fsdevel@vger.kernel.org
> > > > cc: linux-mm@kvack.org
> > > > ---
> > > >
> > > > Notes:
> > > >     ver #7)
> > > >      - Make NFS set AS_RELEASE_ALWAYS.
> > > >
> > > >     ver #4)
> > > >      - Split out merging of folio_has_private()/filemap_release_folio() call
> > > >        pairs into a preceding patch.
> > > >      - Don't need to clear AS_RELEASE_ALWAYS in ->evict_inode().
> > > >
> > > >     ver #3)
> > > >      - Fixed mapping_clear_release_always() to use clear_bit() not set_bit().
> > > >      - Moved a '&&' to the correct line.
> > > >
> > > >     ver #2)
> > > >      - Rewrote entirely according to Willy's suggestion[1].
> > > >
> > > >  fs/9p/cache.c           |  2 ++
> > > >  fs/afs/internal.h       |  2 ++
> > > >  fs/cachefiles/namei.c   |  2 ++
> > > >  fs/ceph/cache.c         |  2 ++
> > > >  fs/nfs/fscache.c        |  3 +++
> > > >  fs/smb/client/fscache.c |  2 ++
> > > >  include/linux/pagemap.h | 16 ++++++++++++++++
> > > >  mm/internal.h           |  5 ++++-
> > > >  8 files changed, 33 insertions(+), 1 deletion(-)
> 
> 
> I think myself / Daire Byrne may have already tracked this down and I
> found a 1-liner that fixed a similar crash in his environment.
> 
> Can you try this patch on top and let me know if it still crashes?
> https://github.com/DaveWysochanskiRH/kernel/commit/902c990e311120179fa5de99d68364b2947b79ec

I also encountered this issue with my DAMON tests, and was trying to find a
time slot for deep dive.  And I confirmed your fix works.  Thank you for this
great work.  Please Cc me when you post the patch if possible.

Tested-by: SeongJae Park <sj@kernel.org>


Thanks,
SJ

> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
       [not found]   ` <202307171548.7ab20146-oliver.sang@intel.com>
@ 2023-07-17 12:43     ` David Wysochanski
  0 siblings, 0 replies; 12+ messages in thread
From: David Wysochanski @ 2023-07-17 12:43 UTC (permalink / raw)
  To: kernel test robot
  Cc: David Howells, oe-lkp, lkp, Rohith Surabattula, Matthew Wilcox,
	Linus Torvalds, Steve French, Shyam Prasad N, Dominique Martinet,
	Ilya Dryomov, v9fs, linux-afs, linux-cachefs, ceph-devel,
	linux-nfs, linux-cifs, samba-technical, linux-fsdevel, linux-mm,
	Andrew Morton, Jeff Layton, Christoph Hellwig, v9fs-developer,
	linux-erofs, linux-ext4

On Mon, Jul 17, 2023 at 3:35 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed "canonical_address#:#[##]" on:
>
> commit: 830503440449014dcf0e4b0b6d905a1b0b2c92ad ("[PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache")
> url: https://github.com/intel-lab-lkp/linux/commits/David-Howells/mm-Merge-folio_has_private-filemap_release_folio-call-pairs/20230628-185100
> base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> patch link: https://lore.kernel.org/all/20230628104852.3391651-3-dhowells@redhat.com/
> patch subject: [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
>
> in testcase: vm-scalability
> version: vm-scalability-x86_64-1.0-0_20220518
> with following parameters:
>
>         runtime: 300
>         thp_enabled: always
>         thp_defrag: always
>         nr_task: 32
>         nr_ssd: 1
>         priority: 1
>         test: swap-w-rand
>         cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
>
> compiler: gcc-12
> test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202307171548.7ab20146-oliver.sang@intel.com
>

This has already been fixed with
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-everything&id=af18573471db5c5c9b96ec95208c340ae7c00e64


>
> [   45.898720][ T1453]
> [   45.907480][ T1453] 2023-07-16 00:36:07  ./case-swap-w-rand
> [   45.907481][ T1453]
> [   45.917873][ T1453] 2023-07-16 00:36:07  ./usemem --runtime 300 -n 32 --random 8048142432
> [   45.917876][ T1453]
> [   47.348632][  T973] general protection fault, probably for non-canonical address 0xf8ff1100207778e6: 0000 [#1] SMP NOPTI
> [   47.359787][  T973] CPU: 123 PID: 973 Comm: kswapd1 Tainted: G S                 6.4.0-rc4-00533-g830503440449 #3
> [   47.370301][  T973] Hardware name: Intel Corporation M50CYP2SB1U/M50CYP2SB1U, BIOS SE5C620.86B.01.01.0003.2104260124 04/26/2021
> [ 47.382201][ T973] RIP: 0010:filemap_release_folio (kbuild/src/x86_64/mm/filemap.c:4063 (discriminator 1))
> [ 47.388172][ T973] Code: 00 48 8b 07 48 8b 57 18 83 e0 01 74 4f 48 f7 07 00 60 00 00 74 22 48 8b 07 f6 c4 80 75 32 48 85 d2 74 34 48 8b 82 90 00 00 00 <48> 8b 40 48 48 85 c0 74 24 ff e0 cc 66 90 48 85 d2 74 15 48 8b 8a
> All code
> ========
>    0:   00 48 8b                add    %cl,-0x75(%rax)
>    3:   07                      (bad)
>    4:   48 8b 57 18             mov    0x18(%rdi),%rdx
>    8:   83 e0 01                and    $0x1,%eax
>    b:   74 4f                   je     0x5c
>    d:   48 f7 07 00 60 00 00    testq  $0x6000,(%rdi)
>   14:   74 22                   je     0x38
>   16:   48 8b 07                mov    (%rdi),%rax
>   19:   f6 c4 80                test   $0x80,%ah
>   1c:   75 32                   jne    0x50
>   1e:   48 85 d2                test   %rdx,%rdx
>   21:   74 34                   je     0x57
>   23:   48 8b 82 90 00 00 00    mov    0x90(%rdx),%rax
>   2a:*  48 8b 40 48             mov    0x48(%rax),%rax          <-- trapping instruction
>   2e:   48 85 c0                test   %rax,%rax
>   31:   74 24                   je     0x57
>   33:   ff e0                   jmpq   *%rax
>   35:   cc                      int3
>   36:   66 90                   xchg   %ax,%ax
>   38:   48 85 d2                test   %rdx,%rdx
>   3b:   74 15                   je     0x52
>   3d:   48                      rex.W
>   3e:   8b                      .byte 0x8b
>   3f:   8a                      .byte 0x8a
>
> Code starting with the faulting instruction
> ===========================================
>    0:   48 8b 40 48             mov    0x48(%rax),%rax
>    4:   48 85 c0                test   %rax,%rax
>    7:   74 24                   je     0x2d
>    9:   ff e0                   jmpq   *%rax
>    b:   cc                      int3
>    c:   66 90                   xchg   %ax,%ax
>    e:   48 85 d2                test   %rdx,%rdx
>   11:   74 15                   je     0x28
>   13:   48                      rex.W
>   14:   8b                      .byte 0x8b
>   15:   8a                      .byte 0x8a
> [   47.408103][  T973] RSP: 0018:ffa00000094f7b28 EFLAGS: 00010246
> [   47.414266][  T973] RAX: f8ff11002077789e RBX: ffa00000094f7c08 RCX: 98ff110020777898
> [   47.422337][  T973] RDX: ff11002077788f71 RSI: 0000000000000cc0 RDI: ffd4000042870d00
> [   47.430417][  T973] RBP: ffa00000094f7b98 R08: ff110001ba106300 R09: 0000000000000028
> [   47.438497][  T973] R10: ff110010846bd080 R11: ff1100207ffd4000 R12: ffd4000042870d00
> [   47.446575][  T973] R13: ffa00000094f7e10 R14: ffa00000094f7c1c R15: ffd4000042870d08
> [   47.454658][  T973] FS:  0000000000000000(0000) GS:ff110020046c0000(0000) knlGS:0000000000000000
> [   47.463703][  T973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   47.470406][  T973] CR2: 00007fddcaf308f8 CR3: 00000001e1a82003 CR4: 0000000000771ee0
> [   47.478496][  T973] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [   47.486594][  T973] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [   47.494696][  T973] PKRU: 55555554
> [   47.498372][  T973] Call Trace:
> [   47.501795][  T973]  <TASK>
> [ 47.504870][ T973] ? die_addr (kbuild/src/x86_64/arch/x86/kernel/dumpstack.c:421 kbuild/src/x86_64/arch/x86/kernel/dumpstack.c:460)
> [ 47.509153][ T973] ? exc_general_protection (kbuild/src/x86_64/arch/x86/kernel/traps.c:783 kbuild/src/x86_64/arch/x86/kernel/traps.c:728)
> [ 47.514826][ T973] ? asm_exc_general_protection (kbuild/src/x86_64/arch/x86/include/asm/idtentry.h:564)
> [ 47.520679][ T973] ? filemap_release_folio (kbuild/src/x86_64/mm/filemap.c:4063 (discriminator 1))
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-07-17 12:45 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-28 10:48 [PATCH v7 0/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
2023-06-28 10:48 ` [PATCH v7 1/2] mm: Merge folio_has_private()/filemap_release_folio() call pairs David Howells
2023-06-28 10:48 ` [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Howells
2023-06-29  0:39   ` [Linux-cachefs] " Xiubo Li
2023-06-30  3:20     ` Jingbo Xu
     [not found]   ` <ZKg/J3OG3kQ9ynSO@fedora>
2023-07-07 16:46     ` [BUG mm-unstable] BUG: KASAN: use-after-free in shrink_folio_list+0x9f4/0x1ae0 Hyeonggon Yoo
2023-07-07 18:12       ` David Wysochanski
2023-07-07 18:27         ` Hyeonggon Yoo
2023-07-07 18:40           ` Matthew Wilcox
2023-07-07 18:33         ` Matthew Wilcox
2023-07-07 19:23         ` SeongJae Park
     [not found]   ` <202307171548.7ab20146-oliver.sang@intel.com>
2023-07-17 12:43     ` [PATCH v7 2/2] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache David Wysochanski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).