From: Matthew Wilcox <willy@infradead.org>
To: linux-fsdevel@vger.kernel.org
Cc: Kent Overstreet <kent.overstreet@gmail.com>,
	David Howells <dhowells@redhat.com>,
	Mike Marshall <hubcap@omnibond.com>
Subject: The future of readahead
Date: Wed, 26 Aug 2020 20:31:16 +0100
Message-ID: <20200826193116.GU17456@casper.infradead.org>

Both Kent and David have talked to me this past week about improving the
filesystem interface to readahead, and since I don't have time to write
the code, here's the design.

1. Kent doesn't like it that we do an XArray lookup for each page.
The proposed solution adds a (small) array of page pointers (or a
pagevec) to the struct readahead_control.  It may make sense to move
__readahead_batch() and readahead_page() out of line at that point.
This should be backed up with performance numbers.
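
To make #1 concrete, here's roughly the shape I have in mind.  This is a
sketch, not a patch: the struct and function names below are made up, and
the real version would modify struct readahead_control itself and move
the helpers out of line.

#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/xarray.h>

struct readahead_control_batched {
	struct address_space *mapping;
	pgoff_t _index;			/* next page to hand out */
	unsigned int _nr_pages;		/* pages left in this request */
	struct pagevec _pvec;		/* cached page pointers */
	unsigned int _pvec_pos;		/* next slot of _pvec to consume */
};

/* One walk of the XArray refills the whole batch. */
static void readahead_refill_batch(struct readahead_control_batched *rac)
{
	XA_STATE(xas, &rac->mapping->i_pages, rac->_index);
	struct page *page;
	unsigned int i = 0;

	/* Pages in the readahead window are already present and locked. */
	rcu_read_lock();
	xas_for_each(&xas, page, rac->_index + rac->_nr_pages - 1) {
		rac->_pvec.pages[i++] = page;
		if (i == PAGEVEC_SIZE)
			break;
	}
	rcu_read_unlock();

	rac->_pvec.nr = i;
	rac->_pvec_pos = 0;
}

/* Drop-in for readahead_page(): pop from the batch, refill when empty.
 * Assumes the control structure was zero-initialised before first use. */
struct page *readahead_page_batched(struct readahead_control_batched *rac)
{
	struct page *page;

	if (!rac->_nr_pages)
		return NULL;
	if (rac->_pvec_pos >= rac->_pvec.nr)
		readahead_refill_batch(rac);

	page = rac->_pvec.pages[rac->_pvec_pos++];
	rac->_index++;
	rac->_nr_pages--;
	return page;
}

The point is that this turns one XArray walk per page into one walk per
PAGEVEC_SIZE pages.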

2. David wants to be sure that readahead is aligned to a granule
size (eg 256kB) to support fscache.  When we last talked about it,
I suggested encoding the granule size in the struct address_space.
I no longer think this approach should be pursued, because #3 below
leads to an interface that covers this case as well.

3. Kent wants to be able to expand readahead to encompass an entire fs
extent (if, eg, that extent is compressed or encrypted).  We don't know
the extent boundaries at the right point; the filesystem can't pass that
information through the generic_file_buffered_read() or filemap_fault()
interfaces to the readahead code.  So the right approach here is for the
filesystem to ask the readahead code to expand the readahead batch.

So solving #2 and #3 looks like a new interface for filesystems to call:

void readahead_expand(struct readahead_control *rac, loff_t start, u64 len);
or possibly
void readahead_expand(struct readahead_control *rac, pgoff_t start,
		unsigned int count);

It might not actually expand the readahead attempt at all -- for example,
if there's already a page in the page cache, or if it can't allocate
memory.  But this puts the responsibility for allocating pages in the VFS,
where it belongs.
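
To show how a filesystem would use it, here's a sketch of a ->readahead()
method rounding the window out to a 256kB granule.  readahead_expand() is
only the interface proposed above, and "myfs" and GRANULE_SIZE are
made-up names:

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/pagemap.h>

/* The interface proposed above; it doesn't exist in any tree yet. */
void readahead_expand(struct readahead_control *rac, loff_t start, u64 len);

#define GRANULE_SIZE	(256 * 1024)	/* eg an fscache granule */

static void myfs_readahead(struct readahead_control *rac)
{
	loff_t start = readahead_pos(rac);
	loff_t len = readahead_length(rac);
	loff_t new_start = round_down(start, GRANULE_SIZE);
	loff_t new_len = round_up(start + len, GRANULE_SIZE) - new_start;
	struct page *page;

	/*
	 * Ask the VFS to widen the window to granule boundaries.  It may
	 * decline (a page is already cached, or allocation fails), so just
	 * walk whatever window we end up with.
	 */
	readahead_expand(rac, new_start, new_len);

	while ((page = readahead_page(rac))) {
		/* queue 'page' for read I/O; completion must unlock it */
	}
}

The same call works for #3; the filesystem would look up its extent and
pass the extent boundaries instead of granule-aligned ones.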

4. Mike wants to be able to do 4MB I/Os [1].  That should be covered by
the solution above.  Mike, just to clarify: do you need 4MB pages, or
can you work with some mixture of page sizes, up to 1024 x 4kB pages?
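
To illustrate what I mean by a mixture: readahead_page() hands back
whatever the page cache allocated, so the per-page loop doesn't care
whether a 4MB window is one huge page or 1024 x 4kB pages.  Sketch only;
"myfs" is made up:

#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/printk.h>

static void myfs_readahead_mixed(struct readahead_control *rac)
{
	struct page *page;
	size_t total = 0;

	while ((page = readahead_page(rac))) {
		total += page_size(page);	/* whatever size this page is */
		/* queue 'page' for read I/O; completion must unlock it */
	}

	pr_debug("readahead window was %zu bytes\n", total);
}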

5. I'm allocating larger pages in the readahead code as part of the THP
patch set [2].

[1] https://lore.kernel.org/linux-fsdevel/CAOg9mSSrJp2dqQTNDgucLoeQcE_E_aYPxnRe5xphhdSPYw7QtQ@mail.gmail.com/
[2] http://git.infradead.org/users/willy/pagecache.git/commitdiff/c00bd4082c7bc32a17b0baa29af6974286978e1f
