From: Wu Fengguang
To: Quentin Barnes
Cc: Andi Kleen, linux-kernel, linux-fsdevel, Nick Piggin,
	Andrew Morton <akpm@linux-foundation.org>,
	Steven Whitehouse <swhiteho@redhat.com>
Date: 2009-12-30 05:41 UTC
Subject: Re: [RFC][PATCH] Disabling read-ahead makes I/O of large reads small
In-Reply-To: <87aax18xms.fsf@basil.nowhere.org>

Hi Quentin,

Quentin Barnes <qbarnes+nfs@yahoo-inc.com> writes:

> Adding the posix_fadvise(..., POSIX_FADV_RANDOM) call sets

Have you tried it without POSIX_FADV_RANDOM (i.e. commenting out the
fadvise call)?  It should achieve the same good performance: the
heuristic readahead logic should detect that the application is doing
random reads.
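
For reference, here is a minimal userspace sketch of the call in
question (the file path, buffer size, and error handling are
illustrative, not taken from your application):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* hypothetical path; stands in for whatever file the app reads */
	int fd = open("/mnt/nfs/datafile", O_RDONLY);
	static char buf[98304];		/* 3 x 32 KiB, as in the traces below */

	if (fd < 0)
		return 1;

	/*
	 * This is the call to try commenting out: with it, the kernel
	 * shrinks the file's readahead window (ra_pages) to 0.
	 */
	posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

	if (read(fd, buf, sizeof(buf)) < 0)
		perror("read");

	close(fd);
	return 0;
}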

> ra_pages=0.  This has a very odd side-effect in the kernel.  Once
> read-ahead is disabled, subsequent read(2) calls are serviced in the
> kernel via the ->readpage() callback, doing I/O one page at a time!
> 
> Poring through the code in mm/filemap.c, I see that the kernel has
> commingled the read-ahead and plain read implementations.  The algorithms
> have much in common, so I can see why it was done, but it left this
> anomaly of severely crippling read(2) calls on file descriptors with
> read-ahead disabled.
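
The page-at-a-time behavior falls directly out of the early return in
page_cache_sync_readahead().  Roughly, paraphrased from 2.6.32
mm/readahead.c (a sketch, not a verbatim quote):

void page_cache_sync_readahead(struct address_space *mapping,
			       struct file_ra_state *ra, struct file *filp,
			       pgoff_t offset, unsigned long req_size)
{
	/*
	 * ra_pages == 0 ("no read-ahead"): return without submitting any
	 * I/O, so do_generic_file_read() misses in the page cache and
	 * falls back to ->readpage(), one page per loop iteration.
	 */
	if (!ra->ra_pages)
		return;

	/* do read-ahead */
	ondemand_readahead(mapping, ra, filp, false, offset, req_size);
}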
> 
> 
> For example, with a read(2) of 98,304 bytes (3 x 32 KiB) from a file
> opened with O_DIRECT and accessed over NFSv3 with rsize=32768, I see:
> =========
> V3 ACCESS Call (Reply In 249), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 248)
> V3 READ Call (Reply In 321), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 287), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 356), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 251) Len:32768
> V3 READ Reply (Call In 250) Len:32768
> V3 READ Reply (Call In 252) Len:32768
> =========
> I would expect three READs issued of size 32K, and that's exactly
> what I see.
> 
> 
> For the same file without O_DIRECT but with read-ahead disabled
> (its ra_pages=0), I see:
> =========
> V3 ACCESS Call (Reply In 167), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 166)
> V3 READ Call (Reply In 172), FH:0xf3a8e519 Offset:0 Len:4096 
> V3 READ Reply (Call In 168) Len:4096
> V3 READ Call (Reply In 177), FH:0xf3a8e519 Offset:4096 Len:4096  
> V3 READ Reply (Call In 173) Len:4096 
> V3 READ Call (Reply In 182), FH:0xf3a8e519 Offset:8192 Len:4096
> V3 READ Reply (Call In 178) Len:4096
> [... READ Call/Reply pairs repeated another 21 times ...]
> =========
> Now I see 24 READ calls of 4K each!

Good catch, thank you very much!

> A workaround for this kernel problem is to hack the app to do a
> readahead(2) call prior to the read(2); however, I would think a
> better approach would be to fix the kernel.  I came up with the
> included patch that, once applied, restores the expected read(2)
> behavior.  For the latter test case above (a file with read-ahead
> disabled), with the patch below applied I now see:
> =========
> V3 ACCESS Call (Reply In 1350), FH:0xf3a8e519
> V3 ACCESS Reply (Call In 1349)
> V3 READ Call (Reply In 1387), FH:0xf3a8e519 Offset:0 Len:32768
> V3 READ Call (Reply In 1421), FH:0xf3a8e519 Offset:32768 Len:32768
> V3 READ Call (Reply In 1456), FH:0xf3a8e519 Offset:65536 Len:32768
> V3 READ Reply (Call In 1351) Len:32768
> V3 READ Reply (Call In 1352) Len:32768
> V3 READ Reply (Call In 1353) Len:32768
> =========
> Which is what I would expect -- back to just three 32K READs.
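
As an aside, the readahead(2) workaround you mention would look roughly
like this in userspace (readahead(2) is Linux-specific and needs
_GNU_SOURCE; the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: prime the page cache explicitly before a large
 * pread(), so the request is not split into single-page ->readpage()
 * calls even when ra_pages == 0.
 */
static ssize_t read_with_workaround(int fd, void *buf, size_t len, off_t off)
{
	/*
	 * In 2.6.32, sys_readahead() ends up in
	 * force_page_cache_readahead(), which does not consult ra_pages.
	 */
	readahead(fd, off, len);

	return pread(fd, buf, len, off);
}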
> 
> After this change, the overall performance of the application
> increased by 313%!

And an awesome improvement!

> 
> I have no idea if my patch is the appropriate fix.  I'm well out of
> my area in this part of the kernel.  It solves this one problem, but
> I have no idea how many boundary cases it doesn't cover or even if
> it is the right way to go about addressing this issue.
> 
> Is this behavior of cutting short the I/O of read(2) considered a bug?
> And is this approach to a fix appropriate?

The approach is mostly OK for this bug.  However, it misses one issue:
ra_pages is somewhat overloaded.  I try to address both problems in the
two patches I just posted.  Will they solve your problem?
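
For context on the overload: POSIX_FADV_RANDOM is currently implemented
by zeroing the per-file readahead window, so ra_pages == 0 ends up
meaning both "the backing device wants no readahead" and "the
application asked for random access".  Roughly, paraphrased from 2.6.32
mm/fadvise.c (other advice values omitted; a sketch, not a verbatim
quote):

	switch (advice) {
	case POSIX_FADV_NORMAL:
		file->f_ra.ra_pages = bdi->ra_pages;
		break;
	case POSIX_FADV_SEQUENTIAL:
		file->f_ra.ra_pages = bdi->ra_pages * 2;
		break;
	case POSIX_FADV_RANDOM:
		/* same value a backing device uses to mean "no readahead" */
		file->f_ra.ra_pages = 0;
		break;
	}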

Thanks,
Fengguang

> 
> --- linux-2.6.32.2/mm/filemap.c 2009-12-18 16:27:07.000000000 -0600
> +++ linux-2.6.32.2-rapatch/mm/filemap.c 2009-12-24 13:07:07.000000000 -0600
> @@ -1012,9 +1012,13 @@ static void do_generic_file_read(struct 
>  find_page:
>                 page = find_get_page(mapping, index);
>                 if (!page) {
> -                       page_cache_sync_readahead(mapping,
> -                                       ra, filp,
> -                                       index, last_index - index);
> +                       if (ra->ra_pages)
> +                               page_cache_sync_readahead(mapping,
> +                                               ra, filp,
> +                                               index, last_index - index);
> +                       else
> +                               force_page_cache_readahead(mapping, filp,
> +                                               index, last_index - index);
>                         page = find_get_page(mapping, index);
>                         if (unlikely(page == NULL))
>                                 goto no_cached_page;
