From: Badari Pulavarty <pbadari@us.ibm.com>
To: suparna@in.ibm.com, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org
Subject: Re: [PATCH][2.6-mm] Readahead issues and AIO read speedup
Date: Thu, 7 Aug 2003 09:01:01 -0700	[thread overview]
Message-ID: <200308070901.01119.pbadari@us.ibm.com> (raw)
In-Reply-To: <20030807100120.GA5170@in.ibm.com>

Suparna,

I noticed exactly the same thing while running a database benchmark
on filesystems (without AIO). I added instrumentation in the SCSI
layer to record the I/O pattern and found that we were issuing a huge
number of 4K reads - about 4 million in my benchmark run. Tracing
showed that all of those reads were generated by the slow read path,
because the readahead window was maximally shrunk. When I forced the
readahead code to read 16K (my database page size) whenever the ra
window was closed, I saw a 20% improvement in my benchmark. I have
asked Ramchandra Pai (linuxram@us.ibm.com) to investigate further.
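
For reference, the hack was along these lines - a minimal sketch
against 2.6-era mm/readahead.c, not the actual change; the constant
MIN_FORCED_RA_PAGES and the exact placement are illustrative:

	/*
	 * Sketch only: in page_cache_readahead(), when the window
	 * has shrunk to nothing, still submit a minimum-sized chunk
	 * instead of letting the slow path do one synchronous 4K
	 * read at a time.  4 pages = 16K with 4K pages, i.e. one
	 * database page here.
	 */
	#define MIN_FORCED_RA_PAGES	4	/* illustrative */

	if (ra->size == 0) {		/* ra window maximally shrunk */
		ra->size = MIN_FORCED_RA_PAGES;
		ra->start = offset;
		do_page_cache_readahead(mapping, filp, ra->start,
					ra->size);
	}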

Thanks,
Badari

On Thursday 07 August 2003 03:01 am, Suparna Bhattacharya wrote:
> I noticed a problem with the way do_generic_mapping_read
> and readahead work for the case of large reads, especially
> random reads. This was leading to very inefficient behaviour
> for a stream of AIO reads. (See the results a little later
> in this note.)
>
> 1) We should be reading ahead at least the pages that are
>    required by the current read request (even if the ra window
>    is maximally shrunk). I think we had this in 2.4 - we
>    seem to have lost it in 2.5.
>    The result is that sometimes (with large random reads) we
>    end up reading one page at a time, waiting for each read to
>    complete before reading the next page, and so on, even for
>    a large read (until we build up a readahead window again).
>
> 2) Once the ra window is maximally shrunk, the responsibility
>    for reading the pages and re-building the window shifts
>    to the slow path in read, which breaks down in the case of
>    a stream of AIO reads, where multiple iocbs submit reads
>    to the same file rather than serialising on i/o
>    completion.
>
> So here is a patch that fixes this by making sure we do
> (1), and by pushing the handle_ra_miss calls for the maximally
> shrunk case up before the loop that waits for I/O completion.
>
> Does it make a difference? A lot, actually.
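
The shape of the fix Suparna describes, as a rough sketch of the
slow path in do_generic_mapping_read() from 2.6-mm - the test for
the shrunk window and the loop bounds are illustrative, and the real
hunks are in her patch:

	pgoff_t i;

	/*
	 * Before entering the per-page "find page, wait for it"
	 * loop, report the readahead misses for the whole request
	 * up front when the window is maximally shrunk, so I/O is
	 * started for every page the read needs rather than one
	 * synchronous page at a time.
	 */
	if (ra.size == 0) {		/* illustrative "shrunk" test */
		for (i = index; i < last_index; i++)
			handle_ra_miss(mapping, &ra, i);
	}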


Thread overview: 12+ messages
2003-08-07 10:01 [PATCH][2.6-mm] Readahead issues and AIO read speedup Suparna Bhattacharya
2003-08-07 16:01 ` Badari Pulavarty [this message]
2003-08-07 16:28   ` Andrew Morton
2003-08-07 17:21     ` Badari Pulavarty
2003-08-07 17:39       ` Andrew Morton
2003-08-07 20:41         ` Badari Pulavarty
2003-08-07 20:58           ` Andrew Morton
2003-08-08 13:56             ` Suparna Bhattacharya
2003-08-13 21:06               ` Ram Pai
2003-09-23  0:41             ` Ram Pai
2003-08-07 19:36     ` Joel Becker
2003-08-08  5:42       ` Jens Axboe
