linux-btrfs.vger.kernel.org archive mirror
From: Nikolay Borisov <nborisov@suse.com>
To: linux-btrfs@vger.kernel.org
Subject: Re: [RFC PATCH] btrfs: Remove __extent_readpages
Date: Mon, 11 Feb 2019 09:46:10 +0200	[thread overview]
Message-ID: <50e531dd-1c4d-dc27-332d-58f7f748c5b0@suse.com> (raw)
In-Reply-To: <20181203102532.13945-1-nborisov@suse.com>



On 3.12.18 г. 12:25 ч., Nikolay Borisov wrote:
> When extent_readpages is called from the generic readahead code it first
> builds a batch of 16 pages (which might or might not be consecutive,
> depending on whether add_to_page_cache_lru failed) and submits them to
> __extent_readpages. The latter ensures that the range of pages (in the
> batch of 16) that is passed to __do_contiguous_readpages is consecutive.
> 
> If add_to_page_cache_lru doesn't fail then __extent_readpages will call
> __do_contiguous_readpages only once with the whole batch of 16.
> Alternatively, if add_to_page_cache_lru fails once, say on the 8th page,
> then the contiguous page read code will be called twice.
> 
> All of this can be simplified by exploiting the fact that all pages passed
> to extent_readpages are consecutive, thus when the batch is built in
> that function it is already consecutive (barring add_to_page_cache_lru
> failures) and is ready to be submitted directly to __do_contiguous_readpages.
> Also simplify the name of the function to contiguous_readpages.
> 
> Signed-off-by: Nikolay Borisov <nborisov@suse.com>
> ---
> 
> So this patch looks like a very nice cleanup, however when doing performance
> measurements with fio I was shocked to see that it is actually detrimental
> to performance. Here are the results:
> 
> The command line used for fio: 
> fio --name=/media/scratch/seqread --rw=read --direct=0 --ioengine=sync --bs=4k
>  --numjobs=1 --size=1G --runtime=600  --group_reporting --loop 10
> 
> This was tested on a VM with 4G of RAM, so the 1G test file is smaller than
> memory and its pages should have been nicely read ahead.
> 
> PATCHED: 
> 
> Starting 1 process
> Jobs: 1 (f=1): [R(1)][100.0%][r=519MiB/s][r=133k IOPS][eta 00m:00s] 
> /media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=3722: Mon Dec  3 09:57:17 2018
>   read: IOPS=78.4k, BW=306MiB/s (321MB/s)(10.0GiB/33444msec)
>     clat (nsec): min=1703, max=9042.7k, avg=5463.97, stdev=121068.28
>      lat (usec): min=2, max=9043, avg= 6.00, stdev=121.07
>     clat percentiles (nsec):
>      |  1.00th=[   1848],  5.00th=[   1896], 10.00th=[   1912],
>      | 20.00th=[   1960], 30.00th=[   2024], 40.00th=[   2160],
>      | 50.00th=[   2384], 60.00th=[   2576], 70.00th=[   2800],
>      | 80.00th=[   3120], 90.00th=[   3824], 95.00th=[   4768],
>      | 99.00th=[   7968], 99.50th=[  14912], 99.90th=[  50944],
>      | 99.95th=[ 667648], 99.99th=[5931008]
>    bw (  KiB/s): min= 2768, max=544542, per=100.00%, avg=409912.68, stdev=162333.72, samples=50
>    iops        : min=  692, max=136135, avg=102478.08, stdev=40583.47, samples=50
>   lat (usec)   : 2=25.93%, 4=65.58%, 10=7.69%, 20=0.57%, 50=0.13%
>   lat (usec)   : 100=0.04%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
>   lat (msec)   : 2=0.01%, 4=0.01%, 10=0.05%
>   cpu          : usr=7.20%, sys=92.55%, ctx=396, majf=0, minf=9
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> Run status group 0 (all jobs):
>    READ: bw=306MiB/s (321MB/s), 306MiB/s-306MiB/s (321MB/s-321MB/s), io=10.0GiB (10.7GB), run=33444-33444msec
> 
> 
> UNPATCHED:
> 
> Starting 1 process
> Jobs: 1 (f=1): [R(1)][100.0%][r=568MiB/s][r=145k IOPS][eta 00m:00s] 
> /media/scratch/seqread: (groupid=0, jobs=1): err= 0: pid=640: Mon Dec  3 10:07:38 2018
>   read: IOPS=90.4k, BW=353MiB/s (370MB/s)(10.0GiB/29008msec)
>     clat (nsec): min=1418, max=12374k, avg=4816.38, stdev=109448.00
>      lat (nsec): min=1836, max=12374k, avg=5284.46, stdev=109451.36
>     clat percentiles (nsec):
>      |  1.00th=[   1576],  5.00th=[   1608], 10.00th=[   1640],
>      | 20.00th=[   1672], 30.00th=[   1720], 40.00th=[   1832],
>      | 50.00th=[   2096], 60.00th=[   2288], 70.00th=[   2480],
>      | 80.00th=[   2736], 90.00th=[   3248], 95.00th=[   3952],
>      | 99.00th=[   6368], 99.50th=[  12736], 99.90th=[  43776],
>      | 99.95th=[ 798720], 99.99th=[5341184]
>    bw (  KiB/s): min=34144, max=606208, per=100.00%, avg=465737.56, stdev=177637.57, samples=45
>    iops        : min= 8536, max=151552, avg=116434.33, stdev=44409.46, samples=45
>   lat (usec)   : 2=45.74%, 4=49.50%, 10=4.13%, 20=0.45%, 50=0.08%
>   lat (usec)   : 100=0.03%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
>   lat (msec)   : 2=0.01%, 4=0.01%, 10=0.05%, 20=0.01%
>   cpu          : usr=7.14%, sys=92.39%, ctx=1059, majf=0, minf=9
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued rwts: total=2621440,0,0,0 short=0,0,0,0 dropped=0,0,0,0
>      latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> Run status group 0 (all jobs):
>    READ: bw=353MiB/s (370MB/s), 353MiB/s-353MiB/s (370MB/s-370MB/s), io=10.0GiB (10.7GB), run=29008-29008msec
> 
> Clearly both bandwidth and IOPS are worse with the patch. However, I'm puzzled
> as to why this is the case, given that I don't see how this patch affects the
> submission of readahead IO.
> 
>  fs/btrfs/extent_io.c | 63 +++++++++++++-------------------------------
>  1 file changed, 19 insertions(+), 44 deletions(-)

Revisiting the patch and the results, IMHO it shouldn't be an RFC and is
ready for merging.

So gentle ping :)

> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 233f835dd6f8..c097eec1b73d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3059,7 +3059,7 @@ static int __do_readpage(struct extent_io_tree *tree,
>  	return ret;
>  }
>  
> -static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
> +static inline void contiguous_readpages(struct extent_io_tree *tree,
>  					     struct page *pages[], int nr_pages,
>  					     u64 start, u64 end,
>  					     struct extent_map **em_cached,
> @@ -3090,46 +3090,6 @@ static inline void __do_contiguous_readpages(struct extent_io_tree *tree,
>  	}
>  }
>  
> -static void __extent_readpages(struct extent_io_tree *tree,
> -			       struct page *pages[],
> -			       int nr_pages,
> -			       struct extent_map **em_cached,
> -			       struct bio **bio, unsigned long *bio_flags,
> -			       u64 *prev_em_start)
> -{
> -	u64 start = 0;
> -	u64 end = 0;
> -	u64 page_start;
> -	int index;
> -	int first_index = 0;
> -
> -	for (index = 0; index < nr_pages; index++) {
> -		page_start = page_offset(pages[index]);
> -		if (!end) {
> -			start = page_start;
> -			end = start + PAGE_SIZE - 1;
> -			first_index = index;
> -		} else if (end + 1 == page_start) {
> -			end += PAGE_SIZE;
> -		} else {
> -			__do_contiguous_readpages(tree, &pages[first_index],
> -						  index - first_index, start,
> -						  end, em_cached,
> -						  bio, bio_flags,
> -						  prev_em_start);
> -			start = page_start;
> -			end = start + PAGE_SIZE - 1;
> -			first_index = index;
> -		}
> -	}
> -
> -	if (end)
> -		__do_contiguous_readpages(tree, &pages[first_index],
> -					  index - first_index, start,
> -					  end, em_cached, bio,
> -					  bio_flags, prev_em_start);
> -}
> -
>  static int __extent_read_full_page(struct extent_io_tree *tree,
>  				   struct page *page,
>  				   get_extent_t *get_extent,
> @@ -4104,6 +4064,13 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages,
>  	u64 prev_em_start = (u64)-1;
>  
>  	while (!list_empty(pages)) {
> +		u64 contig_end = 0;
> +
> +		/*
> +		 * Produces a batch of up to 16 contiguous pages, assumes
> +		 * that pages are consecutive in pages list (guaranteed by the
> +		 * generic code)
> +		 **/
>  		for (nr = 0; nr < BTRFS_PAGES_BATCH && !list_empty(pages);) {
>  			struct page *page = lru_to_page(pages);
>  
> @@ -4112,14 +4079,22 @@ int extent_readpages(struct address_space *mapping, struct list_head *pages,
>  			if (add_to_page_cache_lru(page, mapping, page->index,
>  						readahead_gfp_mask(mapping))) {
>  				put_page(page);
> -				continue;
> +				break;
>  			}
>  
>  			pagepool[nr++] = page;
> +			contig_end = page_offset(page) + PAGE_SIZE - 1;
>  		}
>  
> -		__extent_readpages(tree, pagepool, nr, &em_cached, &bio,
> -				   &bio_flags, &prev_em_start);
> +		if (nr) {
> +			u64 contig_start = page_offset(pagepool[0]);
> +
> +			ASSERT(contig_start + (nr*PAGE_SIZE) - 1 == contig_end);
> +
> +			contiguous_readpages(tree, pagepool, nr, contig_start,
> +				     contig_end, &em_cached, &bio, &bio_flags,
> +				     &prev_em_start);
> +		}
>  	}
>  
>  	if (em_cached)
> 


Thread overview: 6+ messages
2018-12-03 10:25 [RFC PATCH] btrfs: Remove __extent_readpages Nikolay Borisov
2018-12-03 14:25 ` Nikolay Borisov
2018-12-05 16:58 ` Josef Bacik
2018-12-10  8:41   ` Nikolay Borisov
2019-02-11  7:46 ` Nikolay Borisov [this message]
2019-02-28 13:41   ` David Sterba
