From: Nikolay Borisov <nborisov@suse.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] btrfs: Remove __extent_readpages
Date: Mon, 10 Dec 2018 10:41:27 +0200
Message-ID: <ea27ebe5-d2f5-54e9-4571-061566adbcad@suse.com>
In-Reply-To: <20181205165847.gfmx3jmnpe42ucyt@MacBook-Pro-91.local>



On 5.12.18 at 18:58, Josef Bacik wrote:
> On Mon, Dec 03, 2018 at 12:25:32PM +0200, Nikolay Borisov wrote:
>> When extent_readpages is called from the generic readahead code, it first
>> builds a batch of 16 pages (which might or might not be consecutive,
>> depending on whether add_to_page_cache_lru failed) and submits them to
>> __extent_readpages. The latter ensures that the range of pages (in the
>> batch of 16) that is passed to __do_contiguous_readpages is consecutive.
>>
>> If add_to_page_cache_lru doesn't fail, then __extent_readpages will call
>> __do_contiguous_readpages only once with the whole batch of 16.
>> Alternatively, if add_to_page_cache_lru fails once, say on the 8th page,
>> then the contiguous page read code will be called twice.
>>
>> All of this can be simplified by exploiting the fact that all pages passed
>> to extent_readpages are consecutive, thus when the batch is built in
>> that function it is already consecutive (barring add_to_page_cache_lru
>> failures) and ready to be submitted directly to __do_contiguous_readpages.
>> Also simplify the name of the function to contiguous_readpages.
>>
>> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
>> ---
>>
>> So this patch looks like a very nice cleanup. However, when doing performance
>> measurements with fio I was shocked to see that it is actually detrimental to
>> performance. Here are the results:
>>
>> The command line used for fio: 
>> fio --name=/media/scratch/seqread --rw=read --direct=0 --ioengine=sync --bs=4k
>>  --numjobs=1 --size=1G --runtime=600  --group_reporting --loop 10
>>
>> This was tested on a VM with 4G of RAM, so the size of the test is smaller
>> than the available memory and the pages should have been nicely read ahead.
>>
> 
> What this patch changes is that now we aren't reading all of the pages we are
> passed by the readahead code, so we fall back to per-page reading when we go
> to read those pages, because the readahead window has already moved past them.
> This isn't great behavior; what we have is nice in that it tries to group the
> entire range together as much as possible.  What your patch changes is that as
> soon as we stop having a contiguous range we just bail and submit what we have.
> Testing it in isolation is likely to show very little change, but having
> recently touched the fault-in code, where we definitely do not count major
> faults in some cases, I'd suspect that we're not reflecting this higher fault
> rate in the performance counters properly.  We should preserve the existing
> behavior; what hurts a little bit on a lightly loaded box is going to hurt a
> whole lot more on a heavily loaded box.  Thanks,

Following our recent chat regarding this (and the previous prep patch), I
think neither constitutes a difference in behavior; they just make the code
a bit clearer. Also, Mel was kind enough to run more rigorous performance
tests and the results were identical.

The main concern in your reply stems from the fact that the prep patch
actually introduced a for loop which builds the batch, so the break you see
here exits only this nested loop and not the main loop that reads all pages
passed by the RA code.
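
To make this concrete, here is a rough sketch of the resulting loop shape.
This is illustrative only and not the literal patch: contiguous_readpages()
follows the rename described above but its signature is simplified here, and
readpages_sketch() is just a stand-in for extent_readpages().

/*
 * Illustrative sketch only, not the literal patch.  contiguous_readpages()
 * is the renamed __do_contiguous_readpages() with a simplified signature;
 * readpages_sketch() stands in for extent_readpages().
 */
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/pagemap.h>

/* Stand-in prototype for the renamed __do_contiguous_readpages(). */
void contiguous_readpages(struct page **pages, int nr, u64 start, u64 end);

void readpages_sketch(struct address_space *mapping, struct list_head *pages)
{
	struct page *pagepool[16];

	/* Outer loop: consume every page the readahead code passed in. */
	while (!list_empty(pages)) {
		u64 contig_start = 0, contig_end = 0;
		int nr = 0;

		/*
		 * Inner loop: gather one batch of up to 16 pages.  The RA
		 * pages arrive in consecutive index order, so the batch is
		 * contiguous unless add_to_page_cache_lru() fails.
		 */
		while (nr < ARRAY_SIZE(pagepool) && !list_empty(pages)) {
			struct page *page = lru_to_page(pages);

			list_del(&page->lru);
			if (add_to_page_cache_lru(page, mapping, page->index,
						  readahead_gfp_mask(mapping))) {
				/*
				 * Page is already cached: close the current
				 * batch, but keep processing the rest.
				 */
				put_page(page);
				break;
			}
			if (nr == 0)
				contig_start = page_offset(page);
			contig_end = page_offset(page) + PAGE_SIZE - 1;
			pagepool[nr++] = page;
		}

		if (nr)
			contiguous_readpages(pagepool, nr, contig_start,
					     contig_end);
		/*
		 * The break above ends only the inner loop; the outer loop
		 * still submits every remaining readahead page.
		 */
	}
}

The key point is the placement of the break: an add_to_page_cache_lru()
failure splits the submission into smaller contiguous ranges, but the outer
loop never drops the remaining readahead pages.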

> 
> Josef
> 
