linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chris Mason <clm@fb.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "linux-xfs@vger.kernel.org" <linux-xfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"jlayton@kernel.org" <jlayton@kernel.org>,
	"hch@infradead.org" <hch@infradead.org>
Subject: Re: [RFC 0/8] Replacing the readpages a_op
Date: Mon, 13 Jan 2020 16:42:10 +0000	[thread overview]
Message-ID: <6CA4CD96-0812-4261-8FF9-CD28AA2EC38A@fb.com> (raw)
In-Reply-To: <20200113153746.26654-1-willy@infradead.org>



On 13 Jan 2020, at 10:37, Matthew Wilcox wrote:

> From: "Matthew Wilcox (Oracle)" <willy@infradead.org>
>
> I think everybody hates the readpages API.  The fundamental problem 
> with
> it is that it passes the pages to be read on a doubly linked list, 
> using
> the ->lru list in the struct page.  That means the filesystems have to
> do the work of calling add_to_page_cache{,_lru,_locked}, and handling
> failures (because another task is also accessing that chunk of the 
> file,
> and so it fails).
>
> This is an attempt to add a ->readahead op to replace ->readpages.  
> I've
> converted two users, iomap/xfs and cifs.  The cifs conversion is 
> lacking
> fscache support, and that's just because I didn't want to do that 
> work;
> I don't believe there's anything fundamental to it.  But I wanted to 
> do
> iomap because it is The Infrastructure Of The Future and cifs because 
> it
> is the sole remaining user of add_to_page_cache_locked(), which 
> enables
> the last two patches in the series.  By the way, that gives CIFS 
> access
> to the workingset shadow infrastructure, which it had to ignore before
> because it couldn't put pages onto the lru list at the right time.

I've always kind of liked the compromise of sending the lists.  It's 
really good at the common case and doesn't have massive problems when 
things break down.   Just glancing through the patches, the old 
readpages is called in bigger chunks, so for massive reads we can do 
more effective readahead on metadata.  I don't think any of us actually 
do, but we could.

With this new operation, our window is constant, and much smaller.

>
> The fundamental question is, how do we indicate to the implementation 
> of
> ->readahead what pages to operate on?  I've gone with passing a 
> pagevec.
> This has the obvious advantage that it's a data structure that already
> exists and is used within filemap for batches of pages.  I had to add 
> a
> bit of new infrastructure to support iterating over the pages in the
> pagevec, but with that done, it's quite nice.
>
> I think the biggest problem is that the size of the pagevec is limited
> to 15 pages (60kB).  So that'll mean that if the readahead window 
> bumps
> all the way up to 256kB, we may end up making 5 BIOs (and merging 
> them)
> instead of one.  I'd kind of like to be able to allocate variable 
> length
> pagevecs while allowing regular pagevecs to be allocated on the stack,
> but I can't figure out a way to do that.  eg this doesn't work:
>
> -       struct page *pages[PAGEVEC_SIZE];
> +       union {
> +               struct page *pages[PAGEVEC_SIZE];
> +               struct page *_pages[];
> +       }
>
> and if we just allocate them, useful and wonderful tools are going to
> point out when pages[16] is accessed that we've overstepped the end of
> the array.
>
> I have considered alternatives to the pagevec like just having the
> ->readahead implementation look up the pages in the i_pages XArray
> directly.  That didn't work out too well.
>

Btrfs basically does this now, honestly iomap isn't that far away.  
Given how sensible iomap is for this, I'd rather see us pile into that 
abstraction than try to pass pagevecs for large ranges.  Otherwise, if 
the lists are awkward we can make some helpers to make it less error 
prone?

-chris


  parent reply	other threads:[~2020-01-13 16:42 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-13 15:37 [RFC 0/8] Replacing the readpages a_op Matthew Wilcox
2020-01-13 15:37 ` [PATCH 1/8] pagevec: Add an iterator Matthew Wilcox
2020-01-13 15:37 ` [PATCH 2/8] mm: Fix the return type of __do_page_cache_readahead Matthew Wilcox
2020-01-13 15:37 ` [PATCH 3/8] mm: Use a pagevec for readahead Matthew Wilcox
2020-01-13 15:37 ` [PATCH 4/8] mm/fs: Add a_ops->readahead Matthew Wilcox
2020-01-13 18:22   ` Daniel Wagner
2020-01-13 19:17     ` Matthew Wilcox
2020-01-13 15:37 ` [PATCH 5/8] iomap,xfs: Convert from readpages to readahead Matthew Wilcox
2020-01-13 15:37 ` [PATCH 6/8] cifs: " Matthew Wilcox
2020-01-13 15:37 ` [PATCH 7/8] mm: Remove add_to_page_cache_locked Matthew Wilcox
2020-01-13 15:37 ` [PATCH 8/8] mm: Unify all add_to_page_cache variants Matthew Wilcox
2020-01-13 16:42 ` Chris Mason [this message]
2020-01-13 17:40   ` [RFC 0/8] Replacing the readpages a_op Matthew Wilcox
2020-01-13 18:00     ` Chris Mason
2020-01-13 21:58       ` Matthew Wilcox
2020-01-13 22:00         ` Jens Axboe
2020-01-13 22:10           ` Matthew Wilcox
2020-01-13 22:14             ` Jens Axboe
2020-01-13 22:27               ` Matthew Wilcox
2020-01-13 22:30                 ` Jens Axboe
2020-01-13 22:34                 ` Chris Mason
2020-01-14  1:01                   ` Matthew Wilcox
2020-01-14  1:07                     ` Chris Mason
2020-01-13 17:54   ` Matthew Wilcox
2020-01-13 22:19     ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6CA4CD96-0812-4261-8FF9-CD28AA2EC38A@fb.com \
    --to=clm@fb.com \
    --cc=hch@infradead.org \
    --cc=jlayton@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).