From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-it0-f66.google.com ([209.85.214.66]:39010 "EHLO mail-it0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933528AbeFTW25 (ORCPT ); Wed, 20 Jun 2018 18:28:57 -0400 Received: by mail-it0-f66.google.com with SMTP id p185-v6so1955520itp.4 for ; Wed, 20 Jun 2018 15:28:57 -0700 (PDT) Subject: Re: [PATCHSET v3 0/4] Submit ->readpages() IO as read-ahead To: Andrew Morton Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk References: <1527691329-2371-1-git-send-email-axboe@kernel.dk> <20180619165640.0c36473a33e20ea6c0fda463@linux-foundation.org> <9b1d2225-104d-865f-d821-8670271d6d4a@kernel.dk> <20180620125801.6e563efc5ed6af2f12c27f07@linux-foundation.org> From: Jens Axboe Message-ID: Date: Wed, 20 Jun 2018 16:28:54 -0600 MIME-Version: 1.0 In-Reply-To: <20180620125801.6e563efc5ed6af2f12c27f07@linux-foundation.org> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 6/20/18 1:58 PM, Andrew Morton wrote: > On Wed, 20 Jun 2018 08:07:06 -0600 Jens Axboe wrote: > >> On 6/19/18 5:56 PM, Andrew Morton wrote: >>> On Wed, 30 May 2018 08:42:05 -0600 Jens Axboe wrote: >>> >>>> The only caller of ->readpages() is from read-ahead, yet we don't >>>> submit IO flagged with REQ_RAHEAD. This means we don't see it in >>>> blktrace, for instance, which is a shame. We already make assumptions >>>> about ->readpages() just being for read-ahead in the mpage >>>> implementation, using readahead_gfp_mask(mapping) as out GFP mask of >>>> choice. >>>> >>>> This small series fixes up mpage_readpages() to submit with >>>> REQ_RAHEAD, which takes care of file systems using mpage_readpages(). >>>> The first patch is a prep patch, that makes do_mpage_readpage() take >>>> an argument structure. >>> >>> Well gee. That sounds like a total fluke and I don't see anything >>> which prevents future code from using readpages for something other >>> than readahead. >> >> Nothing prevents it, but it's always been the case. So we need to >> embrace the fact one way or another. > > All it will take is one simple (and desirable) cleanup to f2fs and this > will all break. Take that as an indication of the fragility of this > assumption. The assumption has been true for a decade or more. The only problem with it is that we have this ->readpages() interface that is only used for readahead. Hence it should be ->readahead() or similar, that avoid the fragility argument. >>> And what's happening with the individual ->readpage() calls which >>> read_pages() does? Do they still not get REQ_RAHEAD treatment? >> >> They should, yes. > > hm, how does that happen. Maybe we aren't talking about the same thing. Which individual ->readpage() exactly? >>> This is kinda dirty, but we could add a readahead flag to blk_plug, set >>> it in read_pages(), test it down in the block layer somewhere.... >>> That's grubby, but at least it wouldn't increase sizeof(task_struct)? >> >> That sounds way too dirty for my liking, I'd really not want to mix that >> in with plugging. > > That's the wrong way of thinking about it ;) Rename blk_plug to > blk_state or something and view it as a communication package between > the VFS level and the block layer and it all fits together pretty well. We have a perfectly fine established interface for telling the block layer what kind of IO we're submitting, and that's setting it in the bio. Introducing a secondary and orthogonal interface for this is a horrible idea, imho. >> Why not just treat it like it is - readpages() is clearly just >> read-ahead. This isn't something new I'm inventing, it's always been so. >> Seems to me that we just need to make it explicit. > > There's no reason at all why ->readpages() is exclusively for use by > readahead! I'm rather surprised that fsf2 is the only fs which is > (should be) using ->readpages internally. I totally agree, there's no reason why it is, and everybody is surprised that that is the case. But the fact of the matter is that that is the case, and always has been so. So I'll repeat what I said before, we need to embrace that and make it EXPLICIT instead of side stepping the issue. > Can we change blktrace instead? Rename REQ_RAHEAD to REQ_READPAGES? > Then everything will fit together OK. Why do you insist on working around it?! That's just creating another assumption in another place, not fixing the actual issue. Besides, this isn't going to be just about tracing. Yes, it'll be awesome to actually get the right information from blktrace, since right now nobody knows which parts are read-ahead and which ones are explicit reads. Might be pretty darn useful to debug read-ahead issues. The read-ahead information must be reliable, otherwise it's impossible to introduce other dependencies on top of that. We're having a lot of issues with termination of tasks that are stuck in issuing read-ahead. If we can rely on the read-ahead information being there AND being correct, then we can terminate reads early. This might not sound like huge win, but we're talking tens of minutes for some cases. That's the planned functional change behind this, but can't be done before we make progress on the topic at hand. -- Jens Axboe