From: Matthew Wilcox <willy@infradead.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christoph Hellwig <hch@infradead.org>, Jan Kara <jack@suse.cz>,
	Jeff Layton <jlayton@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Logan Gunthorpe <logang@deltatee.com>,
	Hillf Danton <hdanton@sina.com>,
	linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Daniel Golle <daniel@makrotopia.org>,
	Guenter Roeck <groeck7@gmail.com>, Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH v17 03/14] shmem: Implement splice-read
Date: Tue, 14 Mar 2023 16:42:54 +0000
Message-ID: <ZBCkDvveAIJENA0G@casper.infradead.org>
In-Reply-To: <CAHk-=wjYR3h5Q-_i3Q2Et=P8WsrjwNA20fYpEQf9nafHwBNALA@mail.gmail.com>

On Wed, Mar 08, 2023 at 02:39:00PM -0800, Linus Torvalds wrote:
> On Wed, Mar 8, 2023 at 8:53 AM David Howells <dhowells@redhat.com> wrote:
> >
> > The new filemap_splice_read() has an implicit expectation via
> > filemap_get_pages() that ->read_folio() exists if ->readahead() doesn't
> > fully populate the pagecache of the file it is reading from[1], potentially
> > leading to a jump to NULL if this doesn't exist.  shmem, however, (and by
> > extension, tmpfs, ramfs and rootfs), doesn't have ->read_folio(),
> 
> This patch is the only one in your series that I went "Ugh, that's
> really ugly" for.
> 
> Do we really want to basically duplicate all of filemap_splice_read()?
> 
> I get the feeling that the zeropage case just isn't so important that
> we'd need to duplicate filemap_splice_read() just for that, and I
> think that the code should either
> 
>  (a) just make a silly "read_folio()" for shmfs that just clears the page.
> 
>      Ugly but maybe simple and not horrid?

The problem is that we might have swapped out the shmem folio.  So we
don't want to clear the page, but ask swap to fill the page.  The way
that currently works (see shmem_get_folio_gfp()) is to fetch the swap
entry from the page cache, allocate a new folio inside the shmem code,
then replace the swap entry with the new folio.
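
To make that sequence concrete, here is a minimal userspace sketch of the
"swap entry in the page cache slot" idea. It is a toy model, not kernel
code: every toy_* name, the fixed-size arrays and the 16-byte "page" are
assumptions made for illustration. The only thing borrowed from the kernel
is the tagging convention of XArray value entries (an integer shifted left
by one with the low bit set), which is reimplemented here rather than
included.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_SLOTS  4

/* Hypothetical stand-in for a folio: just a small data buffer. */
struct toy_folio {
	char data[TOY_PAGE_SIZE];
};

/* Same tagging scheme as XArray value entries: low bit set means "value". */
static void *toy_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

static int toy_is_value(const void *entry)
{
	return (int)((uintptr_t)entry & 1);
}

static unsigned long toy_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

static char toy_swap[TOY_NR_SLOTS][TOY_PAGE_SIZE];	/* pretend swap device */
static void *toy_cache[TOY_NR_SLOTS];			/* pretend page cache */

/*
 * Return the folio at @index.  A slot holds NULL (never written), a
 * value entry naming a swap slot, or a pointer to a resident folio.
 */
static struct toy_folio *toy_get_folio(unsigned long index)
{
	void *entry;
	struct toy_folio *folio;

	assert(index < TOY_NR_SLOTS);
	entry = toy_cache[index];

	if (entry && !toy_is_value(entry))
		return entry;			/* already resident */

	folio = calloc(1, sizeof(*folio));	/* zero-filled by default */
	if (!folio)
		return NULL;

	if (entry) {
		/* Swap entry: fill from swap instead of leaving zeroes. */
		unsigned long slot = toy_to_value(entry);

		assert(slot < TOY_NR_SLOTS);
		memcpy(folio->data, toy_swap[slot], TOY_PAGE_SIZE);
	}

	toy_cache[index] = folio;		/* replace entry with folio */
	return folio;
}
```

The point of the model is the branch on `entry`: a read_folio() that
unconditionally cleared the page would take the zero-fill path even when
the slot named a swap location, and the swapped-out data would be lost.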

What I'd like to see is the generic code say "Ah, this is a shmem
inode, so it's special and the xa_value entry is swap information,
not workingset information, so I'll allocate the folio and restore
the folio->private swap information to let the shmem_read_folio
function do its job correctly".
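
A rough userspace sketch of that proposal, under heavy assumptions: none
of this exists in the kernel in this form, all toy_* names and fields are
hypothetical, and a single slot stands in for the whole page cache. It
only illustrates the dispatch: the same kind of value entry is treated as
swap information for a shmem mapping and as workingset (shadow)
information for any other mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum toy_source { TOY_ZEROED, TOY_FROM_SWAP, TOY_FROM_DISK };

/* Hypothetical folio: 'private' carries the swap slot when swap_backed. */
struct toy_folio {
	unsigned long private;
	bool swap_backed;
	enum toy_source source;
};

/* Hypothetical mapping with a one-slot "page cache". */
struct toy_mapping {
	bool is_shmem;
	void (*read_folio)(struct toy_folio *folio);
	void *slot;
};

/* Value-entry tagging: low bit set, payload in the upper bits. */
static void *toy_mk_value(unsigned long v)
{
	return (void *)(uintptr_t)((v << 1) | 1);
}

static bool toy_is_value(const void *entry)
{
	return ((uintptr_t)entry & 1) != 0;
}

static unsigned long toy_to_value(const void *entry)
{
	return (unsigned long)((uintptr_t)entry >> 1);
}

/* A shmem read_folio would ask swap to fill the folio, else zero it. */
static void toy_shmem_read_folio(struct toy_folio *folio)
{
	folio->source = folio->swap_backed ? TOY_FROM_SWAP : TOY_ZEROED;
}

/* A disk filesystem reads from its backing store. */
static void toy_disk_read_folio(struct toy_folio *folio)
{
	folio->source = TOY_FROM_DISK;
}

/* Generic read path: allocate, insert, then let ->read_folio populate. */
static struct toy_folio *toy_generic_read(struct toy_mapping *m)
{
	void *entry;
	struct toy_folio *folio;

	assert(m && m->read_folio);
	entry = m->slot;

	if (entry && !toy_is_value(entry))
		return entry;			/* already resident */

	folio = calloc(1, sizeof(*folio));
	if (!folio)
		return NULL;

	if (m->is_shmem && toy_is_value(entry)) {
		/* shmem: the value entry is swap info; hand it over. */
		folio->private = toy_to_value(entry);
		folio->swap_backed = true;
	}
	/* Otherwise a value entry is only workingset data; ignore it here. */

	m->slot = folio;
	m->read_folio(folio);
	return folio;
}
```

With this shape, the shmem-specific knowledge in the generic path shrinks
to one `is_shmem` test, and the filesystem callback receives everything
it needs through the folio.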

Either that or we completely overhaul the shmem code to store the
location of its swapped data somewhere that's not the page cache.

>  (b) teach filemap_splice_read() that a NULL 'read_folio' function
> means "use the zero page"

Same problem as (a).

>  (c) go even further, and teach read_folio() in general about file
> holes, and allow *any* filesystem to read zeroes that way in general
> without creating a folio for it.

I've had thoughts along those lines in the past.  It's pretty major
surgery, I think.  At the moment, we allocate the pages and add them
to the page cache in a locked state before asking the filesystem to
populate them.  So the fs doesn't even have the file layout (eg the
get_block or iomap info) that would tell it where the holes are until
the page has already been allocated and inserted.  We could of course
free the page and replace it with a special 'THIS_IS_A_HOLE' entry.
It's just never seemed important enough to me to do this surgery.
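
As a rough illustration of the ordering problem and of the "free the page
and replace it with a marker" idea, here is a toy userspace model. All
names are hypothetical (including TOY_HOLE, which merely stands in for the
'THIS_IS_A_HOLE' entry), the file layout is a hard-coded table, and no
locking is modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 8
#define TOY_NR_SLOTS  4

/* Hypothetical hole marker: an odd address, so never a real pointer. */
#define TOY_HOLE ((void *)(uintptr_t)1)

struct toy_folio {
	char data[TOY_PAGE_SIZE];
};

static void *toy_cache[TOY_NR_SLOTS];	/* pretend page cache */
static int toy_allocs;			/* counts folio allocations */

/* Pretend file layout: which indices have blocks mapped. */
static const bool toy_extent_mapped[TOY_NR_SLOTS] = {
	true, false, true, false
};

/*
 * Filesystem callback.  It only learns that @index is a hole after the
 * folio has been allocated and inserted, as in the current read path.
 */
static bool toy_fs_fill(unsigned long index, struct toy_folio *folio)
{
	if (!toy_extent_mapped[index])
		return false;			/* hole */
	memset(folio->data, 'A' + (int)index, TOY_PAGE_SIZE);
	return true;
}

/* Read one page at @index into @buf (TOY_PAGE_SIZE bytes). */
static void toy_read(unsigned long index, char *buf)
{
	void *entry;
	struct toy_folio *folio;

	if (index >= TOY_NR_SLOTS)
		abort();
	entry = toy_cache[index];

	if (entry == TOY_HOLE) {
		/* Known hole: return zeroes without allocating anything. */
		memset(buf, 0, TOY_PAGE_SIZE);
		return;
	}

	if (!entry) {
		folio = calloc(1, sizeof(*folio));
		if (!folio)
			abort();
		toy_allocs++;
		toy_cache[index] = folio;	/* inserted before the fs runs */

		if (!toy_fs_fill(index, folio)) {
			/* Hole: drop the folio, remember the marker. */
			toy_cache[index] = TOY_HOLE;
			free(folio);
			memset(buf, 0, TOY_PAGE_SIZE);
			return;
		}
		entry = folio;
	}

	memcpy(buf, ((struct toy_folio *)entry)->data, TOY_PAGE_SIZE);
}
```

In this model the first read of a hole still pays for one allocation,
because the layout lookup happens after insertion; only later reads of
the same index avoid it. Avoiding the first allocation as well would mean
consulting the layout before allocating, which is the larger surgery.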

> in a perfect world, if done well I think shmem_file_read_iter() should
> go away, and it could use generic_file_read_iter too.
> 
> I dunno. Maybe shm really is *so* special that this is the right way
> to do things, but I did react quite negatively to this patch. So not a
> complete NAK, but definitely a "do we _really_ have to do this?"

I'd really like to see shmem have a read_folio implementation.  I
don't know how much work it's going to be.
