From: Andrew Morton <akpm@linux-foundation.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk
Subject: Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead
Date: Wed, 20 Jun 2018 16:23:53 -0700 [thread overview]
Message-ID: <20180620162353.58d931d64ebf7400a259030e@linux-foundation.org> (raw)
In-Reply-To: <df80944e-f8ce-198a-910d-54e0185d8d41@kernel.dk>
On Wed, 20 Jun 2018 16:28:54 -0600 Jens Axboe <axboe@kernel.dk> wrote:
> >> Nothing prevents it, but it's always been the case. So we need to
> >> embrace the fact one way or another.
> >
> > All it will take is one simple (and desirable) cleanup to f2fs and this
> > will all break. Take that as an indication of the fragility of this
> > assumption.
>
> The assumption has been true for a decade or more. The only problem with
> it is that we have this ->readpages() interface that is only used for
> readahead. Hence it should be ->readahead() or similar, that avoid the
> fragility argument.
There's no reason why readpages shouldn't be used for non-readahead
purposes.
> >>> And what's happening with the individual ->readpage() calls which
> >>> read_pages() does? Do they still not get REQ_RAHEAD treatment?
> >>
> >> They should, yes.
> >
> > hm, how does that happen.
>
> Maybe we aren't talking about the same thing. Which individual
> ->readpage() exactly?
I forget, never mind ;)
> >>> This is kinda dirty, but we could add a readahead flag to blk_plug, set
> >>> it in read_pages(), test it down in the block layer somewhere....
> >>> That's grubby, but at least it wouldn't increase sizeof(task_struct)?
> >>
> >> That sounds way too dirty for my liking, I'd really not want to mix that
> >> in with plugging.
> >
> > That's the wrong way of thinking about it ;) Rename blk_plug to
> > blk_state or something and view it as a communication package between
> > the VFS level and the block layer and it all fits together pretty well.
>
> We have a perfectly fine established interface for telling the block
> layer what kind of IO we're submitting, and that's setting it in the
> bio. Introducing a secondary and orthogonal interface for this is a
> horrible idea, imho.
->readpages() callers don't have access to the bio(s).
> >> Why not just treat it like it is - readpages() is clearly just
> >> read-ahead. This isn't something new I'm inventing, it's always been so.
> >> Seems to me that we just need to make it explicit.
> >
> > There's no reason at all why ->readpages() is exclusively for use by
> > readahead! I'm rather surprised that fsf2 is the only fs which is
> > (should be) using ->readpages internally.
>
> I totally agree, there's no reason why it is, and everybody is surprised
> that that is the case. But the fact of the matter is that that is the
> case, and always has been so. So I'll repeat what I said before, we need
> to embrace that and make it EXPLICIT instead of side stepping the issue.
My point is that f2fs (for example) *should* be using mpage_readpages()
right now, for its non-readahead purposes.
If we are to remove callers' ability to use readpages for non-readahead
purposes we should rename the address_space field.
> > Can we change blktrace instead? Rename REQ_RAHEAD to REQ_READPAGES?
> > Then everything will fit together OK.
>
> Why do you insist on working around it?! That's just creating another
> assumption in another place, not fixing the actual issue.
All I've been told about is blktrace.
> Besides, this isn't going to be just about tracing. Yes, it'll be
> awesome to actually get the right information from blktrace, since right
> now nobody knows which parts are read-ahead and which ones are explicit
> reads. Might be pretty darn useful to debug read-ahead issues.
>
> The read-ahead information must be reliable, otherwise it's impossible
> to introduce other dependencies on top of that. We're having a lot of
> issues with termination of tasks that are stuck in issuing read-ahead.
> If we can rely on the read-ahead information being there AND being
> correct, then we can terminate reads early. This might not sound like
> huge win, but we're talking tens of minutes for some cases. That's the
> planned functional change behind this, but can't be done before we make
> progress on the topic at hand.
Changelog material right there.
next prev parent reply other threads:[~2018-06-20 23:23 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-30 14:42 [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead Jens Axboe
2018-05-30 14:42 ` [PATCH 1/4] mpage: add argument structure for do_mpage_readpage() Jens Axboe
2018-05-30 14:42 ` [PATCH 2/4] mpage: mpage_readpages() should submit IO as read-ahead Jens Axboe
2018-05-30 14:42 ` [PATCH 3/4] btrfs: readpages() " Jens Axboe
2018-05-30 14:42 ` [PATCH 4/4] ext4: " Jens Axboe
2018-06-19 23:56 ` [PATCHSET v3 0/4] Submit ->readpages() " Andrew Morton
2018-06-20 14:07 ` Jens Axboe
2018-06-20 19:58 ` Andrew Morton
2018-06-20 20:15 ` Chris Mason
2018-06-20 22:28 ` Jens Axboe
2018-06-20 23:23 ` Andrew Morton [this message]
2018-06-20 23:33 ` Jens Axboe
2018-06-20 23:45 ` Andrew Morton
2018-06-21 0:06 ` Jens Axboe
2018-06-21 12:21 ` Christoph Hellwig
2018-06-21 13:52 ` Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180620162353.58d931d64ebf7400a259030e@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).