All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Hillf Danton <hdanton@sina.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Daniel Golle <daniel@makrotopia.org>,
	Guenter Roeck <groeck7@gmail.com>, Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v17 03/14] shmem: Implement splice-read
Date: Tue, 14 Mar 2023 20:08:48 +0000	[thread overview]
Message-ID: <ZBDUUAGO9HbZG8EW@casper.infradead.org> (raw)
In-Reply-To: <CAHk-=wiO-Z7QdKnA+yeLCROiVVE6dBK=TaE7wz4hMc0gE2SPRw@mail.gmail.com>

On Tue, Mar 14, 2023 at 11:02:40AM -0700, Linus Torvalds wrote:
> On Tue, Mar 14, 2023 at 9:43 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > The problem is that we might have swapped out the shmem folio.  So we
> > don't want to clear the page, but ask swap to fill the page.
> 
> Doesn't shmem_swapin_folio() already basically do all that work?
> 
> The real oddity with shmem - compared to other filesystems - is that
> the xarray has a value entry instead of being a real folio. And yes,
> the current filemap code will then just ignore such entries as
> "doesn't exist", and so the regular read iterators will all fail on
> it.
> 
> But while filemap_get_read_batch() will stop at a value-folio, I feel
> like filemap_create_folio() should be able to turn a value page into a
> "real" page. Right now it already allocates said page, but then I
> think filemap_add_folio() will return -EEXIST when said entry exists
> as a value.
> 
> But *if* instead of -EEXIST we could just replace the value with the
> (already locked) page, and have some sane way to pass that value
> (which is the swap entry data) to readpage(), I think that should just
> do it all.

This was basically what I had in mind:

I don't think this will handle a case like:

Alloc order-0 folio at index 4
Alloc order-0 folio at index 7
Swap out both folios
Alloc order-9 folio at indices 0-511

But I don't see where shmem currently handles that either.  Maybe it
falls back to order-0 folios instead of the crude BUG_ON I put in.
Hugh?

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 82c1262f396f..30f2502314de 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -114,12 +114,6 @@ int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 struct folio *shmem_read_folio_gfp(struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
 
-static inline struct folio *shmem_read_folio(struct address_space *mapping,
-		pgoff_t index)
-{
-	return shmem_read_folio_gfp(mapping, index, mapping_gfp_mask(mapping));
-}
-
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)
 {
diff --git a/mm/filemap.c b/mm/filemap.c
index 57c1b154fb5a..8e4f95c5b65a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -877,6 +877,8 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 					order, gfp);
 		xas_lock_irq(&xas);
 		xas_for_each_conflict(&xas, entry) {
+			if (old)
+				BUG_ON(shmem_mapping(mapping));
 			old = entry;
 			if (!xa_is_value(entry)) {
 				xas_set_err(&xas, -EEXIST);
@@ -885,7 +887,12 @@ noinline int __filemap_add_folio(struct address_space *mapping,
 		}
 
 		if (old) {
-			if (shadowp)
+			if (shmem_mapping(mapping)) {
+				folio_set_swap_entry(folio,
+						radix_to_swp_entry(old));
+				folio_set_swapcache(folio);
+				folio_set_swapbacked(folio);
+			} else if (shadowp)
 				*shadowp = old;
 			/* entry may have been split before we acquired lock */
 			order = xa_get_order(xas.xa, xas.xa_index);
diff --git a/mm/shmem.c b/mm/shmem.c
index 8e60826e4246..ea75c7dcf5ec 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2059,6 +2059,18 @@ int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop,
 			mapping_gfp_mask(inode->i_mapping), NULL, NULL, NULL);
 }
 
+static int shmem_read_folio(struct file *file, struct folio *folio)
+{
+	if (folio_test_swapcache(folio)) {
+		swap_readpage(&folio->page, true, NULL);
+	} else {
+		folio_zero_segment(folio, 0, folio_size(folio));
+		folio_mark_uptodate(folio);
+		folio_unlock(folio);
+	}
+	return 0;
+}
+
 /*
  * This is like autoremove_wake_function, but it removes the wait queue
  * entry unconditionally - even if something else had already woken the
@@ -2396,7 +2408,8 @@ static int shmem_fadvise_willneed(struct address_space *mapping,
 	xa_for_each_range(&mapping->i_pages, index, folio, start, end) {
 		if (!xa_is_value(folio))
 			continue;
-		folio = shmem_read_folio(mapping, index);
+		folio = shmem_read_folio_gfp(mapping, index,
+						mapping_gfp_mask(mapping));
 		if (!IS_ERR(folio))
 			folio_put(folio);
 	}
@@ -4037,6 +4050,7 @@ static int shmem_error_remove_page(struct address_space *mapping,
 }
 
 const struct address_space_operations shmem_aops = {
+	.read_folio	= shmem_read_folio,
 	.writepage	= shmem_writepage,
 	.dirty_folio	= noop_dirty_folio,
 #ifdef CONFIG_TMPFS

  reply	other threads:[~2023-03-14 20:09 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-08 16:52 [PATCH v17 00/14] splice, block: Use page pinning and kill ITER_PIPE David Howells
2023-03-08 16:52 ` [PATCH v17 01/14] splice: Clean up direct_splice_read() a bit David Howells
2023-03-14 17:30   ` Christoph Hellwig
2023-03-08 16:52 ` [PATCH v17 02/14] splice: Make do_splice_to() generic and export it David Howells
2023-03-14 17:31   ` Christoph Hellwig
2023-03-14 21:15   ` David Howells
2023-03-15 16:34   ` [RFC PATCH] splice: Convert longs and some ints into ssize_t David Howells
2023-03-08 16:52 ` [PATCH v17 03/14] shmem: Implement splice-read David Howells
2023-03-08 22:39   ` Linus Torvalds
2023-03-14 16:42     ` Matthew Wilcox
2023-03-14 18:02       ` Linus Torvalds
2023-03-14 20:08         ` Matthew Wilcox [this message]
2023-03-14 18:26       ` David Howells
2023-03-14 19:07         ` Linus Torvalds
2023-03-14 19:09           ` Linus Torvalds
2023-03-14 21:50         ` David Howells
2023-03-08 23:42   ` David Howells
2023-03-08 16:52 ` [PATCH v17 04/14] overlayfs: " David Howells
2023-03-08 16:52 ` [PATCH v17 05/14] coda: " David Howells
2023-03-13 13:28   ` Jan Harkes
2023-03-08 16:52 ` [PATCH v17 06/14] tty, proc, kernfs, random: Use direct_splice_read() David Howells
2023-03-08 16:52 ` [PATCH v17 07/14] splice: Do splice read from a file without using ITER_PIPE David Howells
2023-03-14 17:32   ` Christoph Hellwig
2023-03-14 21:52   ` David Howells
2023-03-08 16:52 ` [PATCH v17 08/14] iov_iter: Kill ITER_PIPE David Howells
2023-03-08 16:52 ` [PATCH v17 09/14] iomap: Don't get an reference on ZERO_PAGE for direct I/O block zeroing David Howells
2023-03-08 21:08   ` Dave Chinner
2023-03-14 17:33   ` Christoph Hellwig
2023-03-08 16:52 ` [PATCH v17 10/14] block: Fix bio_flagged() so that gcc can better optimise it David Howells
2023-03-08 16:52 ` [PATCH v17 11/14] block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic David Howells
2023-03-08 16:52 ` [PATCH v17 12/14] block: Add BIO_PAGE_PINNED and associated infrastructure David Howells
2023-03-08 16:52 ` [PATCH v17 13/14] block: Convert bio_iov_iter_get_pages to use iov_iter_extract_pages David Howells
2023-03-08 16:52 ` [PATCH v17 14/14] block: convert bio_map_user_iov " David Howells

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZBDUUAGO9HbZG8EW@casper.infradead.org \
    --to=willy@infradead.org \
    --cc=axboe@kernel.dk \
    --cc=daniel@makrotopia.org \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=groeck7@gmail.com \
    --cc=hch@infradead.org \
    --cc=hch@lst.de \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jack@suse.cz \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=jlayton@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=logang@deltatee.com \
    --cc=torvalds@linux-foundation.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.