linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* do_generic_mapping_read performance issue
@ 2007-03-09 22:03 Ashif Harji
  2007-03-12 14:20 ` Jan Kara
  0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-09 22:03 UTC (permalink / raw)
  To: linux-kernel


Hi, I am encountering a performance problem, which I have tracked into the 
Linux kernel. The problem occurs with my experimental web server that uses 
sendfile to repeatedly transmit files.  The files are based on the static 
portion of the SPECweb99 fileset and range in size to model a reasonable 
workload.  With this workload, a significant number of the requests are 
for files of size 4 KB or less.

I have determined that the performance problems occurs in the function
do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
Here is the specific code fragment:

         /*
          * When (part of) the same page is read multiple times
          * in succession, only mark it as accessed the first time.
          */
         if (prev_index != index)
                 mark_page_accessed(page);


The implication of this code is that for files of size less than or equal 
to a single page, the page associated with such a file is likely to get 
evicted from the cache regardless of how frequently it is accessed.  The 
reason is that after the first access, prev_index is always zero and index 
can only be zero. Hence, mark_page_accessed is never called after the 
first time the file is requested.  As a result, the page is evicted from 
the cache no matter how frequently it is used.  By changing the kernel to 
always call mark_page_accessed for these files, the server throughput is 
increased by as much as 20%.

I was wondering if anyone could explain why the call to mark_page_accessed 
is conditional? That is, what problem it is trying to solve. It would seem 
that in many scenarios, if the same page is accessed repeatedly, then it 
would be appropriate to keep that page cached.

Please personally CC me on any responses.

thanks,
ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-09 22:03 do_generic_mapping_read performance issue Ashif Harji
@ 2007-03-12 14:20 ` Jan Kara
  2007-03-12 14:39   ` Nick Piggin
  2007-03-12 16:46   ` do_generic_mapping_read performance issue Ashif Harji
  0 siblings, 2 replies; 41+ messages in thread
From: Jan Kara @ 2007-03-12 14:20 UTC (permalink / raw)
  To: Ashif Harji; +Cc: linux-kernel, npiggin

  Hi,

> Hi, I am encountering a performance problem, which I have tracked into the 
> Linux kernel. The problem occurs with my experimental web server that uses 
> sendfile to repeatedly transmit files.  The files are based on the static 
> portion of the SPECweb99 fileset and range in size to model a reasonable 
> workload.  With this workload, a significant number of the requests are 
> for files of size 4 KB or less.
> 
> I have determined that the performance problems occurs in the function
> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> Here is the specific code fragment:
> 
>         /*
>          * When (part of) the same page is read multiple times
>          * in succession, only mark it as accessed the first time.
>          */
>         if (prev_index != index)
>                 mark_page_accessed(page);
  Actually, the code is like that certainly for two years :).

> The implication of this code is that for files of size less than or equal 
> to a single page, the page associated with such a file is likely to get 
> evicted from the cache regardless of how frequently it is accessed.  The 
> reason is that after the first access, prev_index is always zero and index 
> can only be zero. Hence, mark_page_accessed is never called after the 
> first time the file is requested.  As a result, the page is evicted from 
> the cache no matter how frequently it is used.  By changing the kernel to 
> always call mark_page_accessed for these files, the server throughput is 
> increased by as much as 20%.
  Your analysis seems to be right. But to observe this behaviour you have
to have the file open and just always reread it using the same file
descriptor, don't you? That's probably not too common...

> I was wondering if anyone could explain why the call to mark_page_accessed 
> is conditional? That is, what problem it is trying to solve. It would seem 
> that in many scenarios, if the same page is accessed repeatedly, then it 
> would be appropriate to keep that page cached.
  I also don't know why the condition is there but it's there at least
for two years so I'm not sure anybody remembers ;). Nick, do you have
an idea?

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 14:20 ` Jan Kara
@ 2007-03-12 14:39   ` Nick Piggin
  2007-03-12 15:13     ` Jan Kara
  2007-03-12 16:46   ` do_generic_mapping_read performance issue Ashif Harji
  1 sibling, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-12 14:39 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ashif Harji, linux-kernel

On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>   Hi,
> 
> > Hi, I am encountering a performance problem, which I have tracked into the 
> > Linux kernel. The problem occurs with my experimental web server that uses 
> > sendfile to repeatedly transmit files.  The files are based on the static 
> > portion of the SPECweb99 fileset and range in size to model a reasonable 
> > workload.  With this workload, a significant number of the requests are 
> > for files of size 4 KB or less.
> > 
> > I have determined that the performance problems occurs in the function
> > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > Here is the specific code fragment:
> > 
> >         /*
> >          * When (part of) the same page is read multiple times
> >          * in succession, only mark it as accessed the first time.
> >          */
> >         if (prev_index != index)
> >                 mark_page_accessed(page);
>   Actually, the code is like that certainly for two years :).

Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
stage (ie. read from the start of a page -- obviously that also has holes).

> > The implication of this code is that for files of size less than or equal 
> > to a single page, the page associated with such a file is likely to get 
> > evicted from the cache regardless of how frequently it is accessed.  The 
> > reason is that after the first access, prev_index is always zero and index 
> > can only be zero. Hence, mark_page_accessed is never called after the 
> > first time the file is requested.  As a result, the page is evicted from 
> > the cache no matter how frequently it is used.  By changing the kernel to 
> > always call mark_page_accessed for these files, the server throughput is 
> > increased by as much as 20%.
>   Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...
> 
> > I was wondering if anyone could explain why the call to mark_page_accessed 
> > is conditional? That is, what problem it is trying to solve. It would seem 
> > that in many scenarios, if the same page is accessed repeatedly, then it 
> > would be appropriate to keep that page cached.
>   I also don't know why the condition is there but it's there at least
> for two years so I'm not sure anybody remembers ;). Nick, do you have
> an idea?

Yeah it is there because that is basically how our "use once" detection
handles the case where an app does not read in chunks that are PAGE_SIZE
multiples and PAGE_SIZE aligned.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 14:39   ` Nick Piggin
@ 2007-03-12 15:13     ` Jan Kara
  2007-03-12 17:05       ` Ashif Harji
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-12 15:13 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Ashif Harji, linux-kernel

On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> >   Hi,
> > 
> > > Hi, I am encountering a performance problem, which I have tracked into the 
> > > Linux kernel. The problem occurs with my experimental web server that uses 
> > > sendfile to repeatedly transmit files.  The files are based on the static 
> > > portion of the SPECweb99 fileset and range in size to model a reasonable 
> > > workload.  With this workload, a significant number of the requests are 
> > > for files of size 4 KB or less.
> > > 
> > > I have determined that the performance problems occurs in the function
> > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > Here is the specific code fragment:
> > > 
> > >         /*
> > >          * When (part of) the same page is read multiple times
> > >          * in succession, only mark it as accessed the first time.
> > >          */
> > >         if (prev_index != index)
> > >                 mark_page_accessed(page);
> >   Actually, the code is like that certainly for two years :).
> 
> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> stage (ie. read from the start of a page -- obviously that also has holes).
  Yes, at least in 2.6.12-rc5 which is the first one in git :).

> > > I was wondering if anyone could explain why the call to mark_page_accessed 
> > > is conditional? That is, what problem it is trying to solve. It would seem 
> > > that in many scenarios, if the same page is accessed repeatedly, then it 
> > > would be appropriate to keep that page cached.
> >   I also don't know why the condition is there but it's there at least
> > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > an idea?
> 
> Yeah it is there because that is basically how our "use once" detection
> handles the case where an app does not read in chunks that are PAGE_SIZE
> multiples and PAGE_SIZE aligned.
  OK, I see. Then I'm not sure the check does more good than bad. Because
if we happen to reread the same chunk several times, then the check does a
wrong thing...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 14:20 ` Jan Kara
  2007-03-12 14:39   ` Nick Piggin
@ 2007-03-12 16:46   ` Ashif Harji
  1 sibling, 0 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-12 16:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, npiggin


>> The implication of this code is that for files of size less than or equal
>> to a single page, the page associated with such a file is likely to get
>> evicted from the cache regardless of how frequently it is accessed.  The
>> reason is that after the first access, prev_index is always zero and index
>> can only be zero. Hence, mark_page_accessed is never called after the
>> first time the file is requested.  As a result, the page is evicted from
>> the cache no matter how frequently it is used.  By changing the kernel to
>> always call mark_page_accessed for these files, the server throughput is
>> increased by as much as 20%.
>  Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...

Yes, this problem requires the file to be accessed via the same file 
descriptor.  I suspect this behaviour is common for a certain class of 
applications, e.g., web servers.  However, any application that read from 
the same page repeatedly would also experience this problem.  It is more 
likely to occur with small files, but not exculsively.

ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 15:13     ` Jan Kara
@ 2007-03-12 17:05       ` Ashif Harji
  2007-03-12 17:35         ` Jan Kara
  0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-12 17:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: Nick Piggin, linux-kernel


On Mon, 12 Mar 2007, Jan Kara wrote:

> On Mon 12-03-07 15:39:00, Nick Piggin wrote:
>> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>>>   Hi,
>>>
>>>> Hi, I am encountering a performance problem, which I have tracked into the
>>>> Linux kernel. The problem occurs with my experimental web server that uses
>>>> sendfile to repeatedly transmit files.  The files are based on the static
>>>> portion of the SPECweb99 fileset and range in size to model a reasonable
>>>> workload.  With this workload, a significant number of the requests are
>>>> for files of size 4 KB or less.
>>>>
>>>> I have determined that the performance problems occurs in the function
>>>> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
>>>> Here is the specific code fragment:
>>>>
>>>>         /*
>>>>          * When (part of) the same page is read multiple times
>>>>          * in succession, only mark it as accessed the first time.
>>>>          */
>>>>         if (prev_index != index)
>>>>                 mark_page_accessed(page);
>>>   Actually, the code is like that certainly for two years :).
>>
>> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
>> stage (ie. read from the start of a page -- obviously that also has holes).
>  Yes, at least in 2.6.12-rc5 which is the first one in git :).
>
>>>> I was wondering if anyone could explain why the call to mark_page_accessed
>>>> is conditional? That is, what problem it is trying to solve. It would seem
>>>> that in many scenarios, if the same page is accessed repeatedly, then it
>>>> would be appropriate to keep that page cached.
>>>   I also don't know why the condition is there but it's there at least
>>> for two years so I'm not sure anybody remembers ;). Nick, do you have
>>> an idea?
>>
>> Yeah it is there because that is basically how our "use once" detection
>> handles the case where an app does not read in chunks that are PAGE_SIZE
>> multiples and PAGE_SIZE aligned.
>  OK, I see. Then I'm not sure the check does more good than bad. Because
> if we happen to reread the same chunk several times, then the check does a
> wrong thing...

Thanks for providing me with additional information.

I would like to submit a patch to fix the performance problem.  The 
simplest solution is to remove the check.  Even in the situation where an 
application does not read in PAGE_SIZE multiples as described above, if 
the page is accessed frequently it should remain in the cache.  However, I 
am open to suggestions for a more sophisticated scheme.

ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 17:05       ` Ashif Harji
@ 2007-03-12 17:35         ` Jan Kara
  2007-03-13 18:43           ` Ashif Harji
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-12 17:35 UTC (permalink / raw)
  To: Ashif Harji; +Cc: Nick Piggin, linux-kernel

On Mon 12-03-07 13:05:17, Ashif Harji wrote:
> 
> On Mon, 12 Mar 2007, Jan Kara wrote:
> 
> >On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> >>On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> >>>  Hi,
> >>>
> >>>>Hi, I am encountering a performance problem, which I have tracked into 
> >>>>the
> >>>>Linux kernel. The problem occurs with my experimental web server that 
> >>>>uses
> >>>>sendfile to repeatedly transmit files.  The files are based on the 
> >>>>static
> >>>>portion of the SPECweb99 fileset and range in size to model a reasonable
> >>>>workload.  With this workload, a significant number of the requests are
> >>>>for files of size 4 KB or less.
> >>>>
> >>>>I have determined that the performance problems occurs in the function
> >>>>do_generic_mapping_read in file mm/filemap.c for kernel version 
> >>>>2.6.20.1.
> >>>>Here is the specific code fragment:
> >>>>
> >>>>        /*
> >>>>         * When (part of) the same page is read multiple times
> >>>>         * in succession, only mark it as accessed the first time.
> >>>>         */
> >>>>        if (prev_index != index)
> >>>>                mark_page_accessed(page);
> >>>  Actually, the code is like that certainly for two years :).
> >>
> >>Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> >>stage (ie. read from the start of a page -- obviously that also has 
> >>holes).
> > Yes, at least in 2.6.12-rc5 which is the first one in git :).
> >
> >>>>I was wondering if anyone could explain why the call to 
> >>>>mark_page_accessed
> >>>>is conditional? That is, what problem it is trying to solve. It would 
> >>>>seem
> >>>>that in many scenarios, if the same page is accessed repeatedly, then it
> >>>>would be appropriate to keep that page cached.
> >>>  I also don't know why the condition is there but it's there at least
> >>>for two years so I'm not sure anybody remembers ;). Nick, do you have
> >>>an idea?
> >>
> >>Yeah it is there because that is basically how our "use once" detection
> >>handles the case where an app does not read in chunks that are PAGE_SIZE
> >>multiples and PAGE_SIZE aligned.
> > OK, I see. Then I'm not sure the check does more good than bad. Because
> >if we happen to reread the same chunk several times, then the check does a
> >wrong thing...
> 
> I would like to submit a patch to fix the performance problem.  The 
> simplest solution is to remove the check.  Even in the situation where an 
> application does not read in PAGE_SIZE multiples as described above, if 
> the page is accessed frequently it should remain in the cache.  However, I 
> am open to suggestions for a more sophisticated scheme.
  Well, yes, that's an obvious solution. I just wanted to make sure that
such change won't break some other load. But so far it seems it won't...

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-12 17:35         ` Jan Kara
@ 2007-03-13 18:43           ` Ashif Harji
  2007-03-13 18:55             ` Jan Kara
  0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-13 18:43 UTC (permalink / raw)
  To: Jan Kara; +Cc: Nick Piggin, linux-kernel


>> I would like to submit a patch to fix the performance problem.  The
>> simplest solution is to remove the check.  Even in the situation where an
>> application does not read in PAGE_SIZE multiples as described above, if
>> the page is accessed frequently it should remain in the cache.  However, I
>> am open to suggestions for a more sophisticated scheme.
>
>  Well, yes, that's an obvious solution. I just wanted to make sure that
> such change won't break some other load. But so far it seems it won't...

I would like to find out if the change I am suggesting will cause any 
other problems.  Is there someone else I should contact before submitting 
a patch?  Or should I just wait another day or two for comments?

ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: do_generic_mapping_read performance issue
  2007-03-13 18:43           ` Ashif Harji
@ 2007-03-13 18:55             ` Jan Kara
  2007-03-14 19:58               ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
  0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-13 18:55 UTC (permalink / raw)
  To: Ashif Harji; +Cc: Nick Piggin, linux-kernel

On Tue 13-03-07 14:43:47, Ashif Harji wrote:
> 
> >>I would like to submit a patch to fix the performance problem.  The
> >>simplest solution is to remove the check.  Even in the situation where an
> >>application does not read in PAGE_SIZE multiples as described above, if
> >>the page is accessed frequently it should remain in the cache.  However, I
> >>am open to suggestions for a more sophisticated scheme.
> >
> > Well, yes, that's an obvious solution. I just wanted to make sure that
> >such change won't break some other load. But so far it seems it won't...
> 
> I would like to find out if the change I am suggesting will cause any 
> other problems.  Is there someone else I should contact before submitting 
> a patch?  Or should I just wait another day or two for comments?
  I don't think you'll get more feedback now. Just write the patch,
benchmark it with your workload and also some others (basically, there should be
no difference) and then submit the patch together with the performance
numbers and some rationale (if it's your first submission, see
Documentation/SubmittingPatches). Give also CC to Andrew <akpm@linux-foundation.org>
and ask him to put it into -mm kernels for testing.

									Honza
-- 
Jan Kara <jack@suse.cz>
SuSE CR Labs

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-13 18:55             ` Jan Kara
@ 2007-03-14 19:58               ` Ashif Harji
  2007-03-14 20:55                 ` Dave Kleikamp
                                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-14 19:58 UTC (permalink / raw)
  To: linux-mm; +Cc: Nick Piggin, Jan Kara, linux-kernel, akpm


This patch unconditionally calls mark_page_accessed to prevent pages, 
especially for small files, from being evicted from the page cache despite 
frequent access.

Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>

---

If the same page of a file is repeatedly accessed (without accessing other 
pages of that file) via the same file descriptor, mark_page_accessed is 
never called after the first time the page is accessed.

The implication of this code is that for files of size less than or equal 
to a single page, the page associated with such a file is likely to get 
evicted from the cache regardless of how frequently it is accessed. 
However, this behaviour also occurs with files of any size if the same 
page is repeatedly accessed.

As a benchmark, I have an experimental web server that uses sendfile to 
repeatedly transmit files.  The files are based on the static portion of 
the SPECweb99 fileset and range in size to model a reasonable workload. 
With this workload, a significant number of the requests are for files of 
size 4 KB or less.

By changing the kernel to always call mark_page_accessed, the server 
throughput is increased by as much as 20%.  With one test, for example, 
without the change I get throughput of around 868 Mbps.  After making the 
change, performance increases to 1111 Mbps.

Using a configuration that should be unaffected by the change, performance 
was around 855 Mbps without the change and around 851 Mbps with the 
change.  As expected the change had no appreciable effect.

See thread http://lkml.org/lkml/2007/3/9/403 for additional discussion on 
this change.

This patch is for kernel version 2.6.20.1.

Andrew, can you also put this change into the -mm kernels for testing?


--- linux-2.6.20.1/mm/filemap.c.orig	2007-03-14 10:31:58.000000000 -0500
+++ linux-2.6.20.1/mm/filemap.c	2007-03-13 16:11:54.000000000 -0500
@@ -943,12 +943,7 @@ page_ok:
  		if (mapping_writably_mapped(mapping))
  			flush_dcache_page(page);

-		/*
-		 * When (part of) the same page is read multiple times
-		 * in succession, only mark it as accessed the first time.
-		 */
-		if (prev_index != index)
-			mark_page_accessed(page);
+		mark_page_accessed(page);
  		prev_index = index;

  		/*

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 19:58               ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
@ 2007-03-14 20:55                 ` Dave Kleikamp
  2007-03-14 21:33                   ` Andreas Mohr
  2007-03-16 14:20                   ` Anton Blanchard
  2007-03-15 10:39                 ` Peter Zijlstra
  2007-03-15 15:56                 ` Chuck Ebbert
  2 siblings, 2 replies; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-14 20:55 UTC (permalink / raw)
  To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm

On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> This patch unconditionally calls mark_page_accessed to prevent pages, 
> especially for small files, from being evicted from the page cache despite 
> frequent access.

I guess the downside to this is if a reader is reading a large file, or
several files, sequentially with a small read size (smaller than
PAGE_SIZE), the pages will be marked active after just one read pass.
My gut says the benefits of this patch outweigh the cost.  I would
expect real-world backup apps, etc. to read at least PAGE_SIZE.

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 20:55                 ` Dave Kleikamp
@ 2007-03-14 21:33                   ` Andreas Mohr
  2007-03-14 22:08                     ` Dave Kleikamp
                                       ` (2 more replies)
  2007-03-16 14:20                   ` Anton Blanchard
  1 sibling, 3 replies; 41+ messages in thread
From: Andreas Mohr @ 2007-03-14 21:33 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm

Hi,

On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > This patch unconditionally calls mark_page_accessed to prevent pages, 
> > especially for small files, from being evicted from the page cache despite 
> > frequent access.
> 
> I guess the downside to this is if a reader is reading a large file, or
> several files, sequentially with a small read size (smaller than
> PAGE_SIZE), the pages will be marked active after just one read pass.
> My gut says the benefits of this patch outweigh the cost.  I would
> expect real-world backup apps, etc. to read at least PAGE_SIZE.

I also think that the patch is somewhat problematic, since the original
intention seems to have been a reduction of the number of (expensive?)
mark_page_accessed() calls, but this of course falls flat on its face in case
of permanent single-page accesses or accesses with progressing but very small
read size (single-byte reads or so), since the cached page content will expire
eventually due to lack of mark_page_accessed() updates; thus this patch
decided to call mark_page_accessed() unconditionally which may be a large
performance penalty for subsequent tiny-sized reads.

I've been thinking hard how to avoid the mark_page_accessed() starvation in
case of a fixed, (almost) non-changing access state, but this seems hard since
it'd seem we need some kind of state management here to figure out good
intervals of when to call mark_page_accessed() *again* for this page. E.g.
despite non-changing access patterns you could still call mark_page_accessed()
every 32 calls or so to avoid expiry, but this would need extra helper
variables.

A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
with a
	if ((prev_index != index) || (ra.cache_hit % 32 == 0))
		mark_page_accessed(page);
This assumes that ra.cache_hit gets incremented for every access (haven't
checked whether this is the case).
That way (combined with an enhanced comment properly explaining the dilemma)
you would avoid most mark_page_accessed() invocations of subsequent same-page reads
but still do page status updates from time to time to avoid page deprecation.

Does anyone think this would be acceptable? Any better idea?

Andreas Mohr

P.S.: since I'm not too familiar with this area I could be rather wrong after all...

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 21:33                   ` Andreas Mohr
@ 2007-03-14 22:08                     ` Dave Kleikamp
  2007-03-15  1:36                       ` Xiaoning Ding
  2007-03-15 15:00                     ` Rik van Riel
  2007-03-15 17:37                     ` Valdis.Kletnieks
  2 siblings, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-14 22:08 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm

On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
> Hi,
> 
> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
> > On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > > This patch unconditionally calls mark_page_accessed to prevent pages, 
> > > especially for small files, from being evicted from the page cache despite 
> > > frequent access.
> > 
> > I guess the downside to this is if a reader is reading a large file, or
> > several files, sequentially with a small read size (smaller than
> > PAGE_SIZE), the pages will be marked active after just one read pass.
> > My gut says the benefits of this patch outweigh the cost.  I would
> > expect real-world backup apps, etc. to read at least PAGE_SIZE.
> 
> I also think that the patch is somewhat problematic, since the original
> intention seems to have been a reduction of the number of (expensive?)
> mark_page_accessed() calls,

mark_page_accessed() isn't expensive.  If called repeatedly, starting
with the third call, it will check two page flags and return.  The only
real expense is that the page appears busier than it may be and will be
retained in memory longer than it should.

> but this of course falls flat on its face in case
> of permanent single-page accesses or accesses with progressing but very small
> read size (single-byte reads or so), since the cached page content will expire
> eventually due to lack of mark_page_accessed() updates; thus this patch
> decided to call mark_page_accessed() unconditionally which may be a large
> performance penalty for subsequent tiny-sized reads.

Any application doing many tiny-sized reads isn't exactly asking for
great performance.

> I've been thinking hard how to avoid the mark_page_accessed() starvation in
> case of a fixed, (almost) non-changing access state, but this seems hard since
> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> despite non-changing access patterns you could still call mark_page_accessed()
> every 32 calls or so to avoid expiry, but this would need extra helper
> variables.
> 
> A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
> with a
> 	if ((prev_index != index) || (ra.cache_hit % 32 == 0))
> 		mark_page_accessed(page);
> This assumes that ra.cache_hit gets incremented for every access (haven't
> checked whether this is the case).
> That way (combined with an enhanced comment properly explaining the dilemma)
> you would avoid most mark_page_accessed() invocations of subsequent same-page reads
> but still do page status updates from time to time to avoid page deprecation.
> 
> Does anyone think this would be acceptable? Any better idea?

I wouldn't go looking for anything more complicated than Ashif's patch,
unless testing shows it to be harmful in some realistic workload.

> Andreas Mohr
> 
> P.S.: since I'm not too familiar with this area I could be rather wrong after all...

I could be missing something as well.  :-)

Shaggy
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 22:08                     ` Dave Kleikamp
@ 2007-03-15  1:36                       ` Xiaoning Ding
  2007-03-15  5:22                         ` Ashif Harji
  0 siblings, 1 reply; 41+ messages in thread
From: Xiaoning Ding @ 2007-03-15  1:36 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Andreas Mohr, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
	linux-kernel, akpm

Dave Kleikamp wrote:
> On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
>> Hi,
>>
>> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
>>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>>> This patch unconditionally calls mark_page_accessed to prevent pages, 
>>>> especially for small files, from being evicted from the page cache despite 
>>>> frequent access.
>>> I guess the downside to this is if a reader is reading a large file, or
>>> several files, sequentially with a small read size (smaller than
>>> PAGE_SIZE), the pages will be marked active after just one read pass.
>>> My gut says the benefits of this patch outweigh the cost.  I would
>>> expect real-world backup apps, etc. to read at least PAGE_SIZE.
>> I also think that the patch is somewhat problematic, since the original
>> intention seems to have been a reduction of the number of (expensive?)
>> mark_page_accessed() calls,
> 
> mark_page_accessed() isn't expensive.  If called repeatedly, starting
> with the third call, it will check two page flags and return.  The only
> real expense is that the page appears busier than it may be and will be
> retained in memory longer than it should.
> 
If we allow mark_page_accessed() called multiple times for a single page,
a scan of large file with small-size reads would flush the buffer cache.
mark_page_accessed() also requests lru_lock when moving page from
inactive_list to active_list. It may also increase lock contention.

>> but this of course falls flat on its face in case
>> of permanent single-page accesses or accesses with progressing but very small
>> read size (single-byte reads or so), since the cached page content will expire
>> eventually due to lack of mark_page_accessed() updates; thus this patch
>> decided to call mark_page_accessed() unconditionally which may be a large
>> performance penalty for subsequent tiny-sized reads.
> 
> Any application doing many tiny-sized reads isn't exactly asking for
> great performance.
> 
>> I've been thinking hard how to avoid the mark_page_accessed() starvation in
>> case of a fixed, (almost) non-changing access state, but this seems hard since
>> it'd seem we need some kind of state management here to figure out good
>> intervals of when to call mark_page_accessed() *again* for this page. E.g.
>> despite non-changing access patterns you could still call mark_page_accessed()
>> every 32 calls or so to avoid expiry, but this would need extra helper
>> variables.
>>
>> A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
>> with a
>> 	if ((prev_index != index) || (ra.cache_hit % 32 == 0))
>> 		mark_page_accessed(page);
>> This assumes that ra.cache_hit gets incremented for every access (haven't
>> checked whether this is the case).
>> That way (combined with an enhanced comment properly explaining the dilemma)
>> you would avoid most mark_page_accessed() invocations of subsequent same-page reads
>> but still do page status updates from time to time to avoid page deprecation.
>>
>> Does anyone think this would be acceptable? Any better idea?
> 
> I wouldn't go looking for anything more complicated than Ashif's patch,
> unless testing shows it to be harmful in some realistic workload.
> 
>> Andreas Mohr
>>
>> P.S.: since I'm not too familiar with this area I could be rather wrong after all...
> 
> I could be missing something as well.  :-)
> 
> Shaggy


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15  1:36                       ` Xiaoning Ding
@ 2007-03-15  5:22                         ` Ashif Harji
  2007-03-15 12:46                           ` Dave Kleikamp
  2007-03-15 19:07                           ` Andrew Morton
  0 siblings, 2 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-15  5:22 UTC (permalink / raw)
  To: Xiaoning Ding
  Cc: Dave Kleikamp, Andreas Mohr, linux-mm, Nick Piggin, Jan Kara,
	linux-kernel, akpm



On Wed, 14 Mar 2007, Xiaoning Ding wrote:

> Dave Kleikamp wrote:
>> On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
>>> Hi,
>>> 
>>> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
>>>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>>>> This patch unconditionally calls mark_page_accessed to prevent pages, 
>>>>> especially for small files, from being evicted from the page cache 
>>>>> despite frequent access.
>>>> I guess the downside to this is if a reader is reading a large file, or
>>>> several files, sequentially with a small read size (smaller than
>>>> PAGE_SIZE), the pages will be marked active after just one read pass.
>>>> My gut says the benefits of this patch outweigh the cost.  I would
>>>> expect real-world backup apps, etc. to read at least PAGE_SIZE.
>>> I also think that the patch is somewhat problematic, since the original
>>> intention seems to have been a reduction of the number of (expensive?)
>>> mark_page_accessed() calls,
>> 
>> mark_page_accessed() isn't expensive.  If called repeatedly, starting
>> with the third call, it will check two page flags and return.  The only
>> real expense is that the page appears busier than it may be and will be
>> retained in memory longer than it should.
>> 
> If we allow mark_page_accessed() called multiple times for a single page,
> a scan of large file with small-size reads would flush the buffer cache.
> mark_page_accessed() also requests lru_lock when moving page from
> inactive_list to active_list. It may also increase lock contention.

The problem with the existing logic is that it is too coarse.  In trying 
to deal with one usage pattern it is negatively impacting performance for 
other reasonable access patterns.

Further, consider the extreme case of scanning a file 1 byte at a time. 
In this case, you are going to access a page over 4000 times, but that 
page is not going to be marked as active and hence that page is likely to 
be evicted from the cache.  Clearly, there are cases when scanning a file 
that you would like the pages to be kept in the cache.

Finally, the existing code is problematic as there is no reasonable way to 
circumvent the negative impact for small files.

Hence, I think a change is necessary.  The question is whether the 
intent of conditionally calling mark_page_accessed() is still reasonable 
and whether the amount of bookkeeping required to detect that usage 
pattern but not create a problem for other usage patterns is reasonable.

I would tend to agree with David that:  "Any application doing many 
tiny-sized reads isn't exactly asking for great performance."  As well, 
applications concerned with performance and caching problems can read in a 
file in PAGE_SIZE chunks.  I still think the simple fix of removing the 
condition is the best approach, but I'm certainly open to alternatives.

ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 19:58               ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
  2007-03-14 20:55                 ` Dave Kleikamp
@ 2007-03-15 10:39                 ` Peter Zijlstra
  2007-03-15 12:38                   ` Nick Piggin
  2007-03-15 15:56                 ` Chuck Ebbert
  2 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2007-03-15 10:39 UTC (permalink / raw)
  To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm

On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> This patch unconditionally calls mark_page_accessed to prevent pages, 
> especially for small files, from being evicted from the page cache despite 
> frequent access.

Since we're hackling over the use-once stuff again...

/me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
ducks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 10:39                 ` Peter Zijlstra
@ 2007-03-15 12:38                   ` Nick Piggin
  2007-03-15 15:06                     ` Rik van Riel
  0 siblings, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 12:38 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm

On Thu, Mar 15, 2007 at 11:39:14AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > This patch unconditionally calls mark_page_accessed to prevent pages, 
> > especially for small files, from being evicted from the page cache despite 
> > frequent access.
> 
> Since we're hackling over the use-once stuff again...
> 
> /me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
> ducks.

Join the club ;) 

http://groups.google.com.au/group/linux.kernel/msg/7b3237b8e715475b?hl=en&

I can't find the patch where I actually did combine it with a PG_usedonce
bit, but the end result is pretty similar to your patch. And I think one
or two others have also independently invented the same thing.

So it *has* to be good, doesn't it? ;)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15  5:22                         ` Ashif Harji
@ 2007-03-15 12:46                           ` Dave Kleikamp
  2007-03-15 12:50                             ` Nick Piggin
  2007-03-15 19:07                           ` Andrew Morton
  1 sibling, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-15 12:46 UTC (permalink / raw)
  To: Ashif Harji
  Cc: Xiaoning Ding, Andreas Mohr, linux-mm, Nick Piggin, Jan Kara,
	linux-kernel, akpm

On Thu, 2007-03-15 at 01:22 -0400, Ashif Harji wrote:

> I would tend to agree with David that:  "Any application doing many 
> tiny-sized reads isn't exactly asking for great performance."  As well, 
> applications concerned with performance and caching problems can read in a 
> file in PAGE_SIZE chunks.  I still think the simple fix of removing the 
> condition is the best approach, but I'm certainly open to alternatives.

A possible alternative might be to store the offset within the page in
the readahead state, and call mark_page_accessed() when the read offset
is less than or equal to the previous offset.

-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 12:46                           ` Dave Kleikamp
@ 2007-03-15 12:50                             ` Nick Piggin
  0 siblings, 0 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 12:50 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Ashif Harji, Xiaoning Ding, Andreas Mohr, linux-mm, Jan Kara,
	linux-kernel, akpm

On Thu, Mar 15, 2007 at 07:46:56AM -0500, Dave Kleikamp wrote:
> On Thu, 2007-03-15 at 01:22 -0400, Ashif Harji wrote:
> 
> > I would tend to agree with David that:  "Any application doing many 
> > tiny-sized reads isn't exactly asking for great performance."  As well, 
> > applications concerned with performance and caching problems can read in a 
> > file in PAGE_SIZE chunks.  I still think the simple fix of removing the 
> > condition is the best approach, but I'm certainly open to alternatives.
> 
> A possible alternative might be to store the offset within the page in
> the readahead state, and call mark_page_accessed() when the read offset
> is less than or equal to the previous offset.

That could be a good idea.

We definitely want to look at ways to solve with within the existing
approach before any large scale change in behaviour.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 21:33                   ` Andreas Mohr
  2007-03-14 22:08                     ` Dave Kleikamp
@ 2007-03-15 15:00                     ` Rik van Riel
  2007-03-15 17:37                     ` Valdis.Kletnieks
  2 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 15:00 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
	linux-kernel, akpm

Andreas Mohr wrote:

> I've been thinking hard how to avoid the mark_page_accessed() starvation in
> case of a fixed, (almost) non-changing access state, but this seems hard since
> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page. 

Like this? :)

http://surriel.com/patches/clockpro/2.6.12/useonce-cleanup

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 12:38                   ` Nick Piggin
@ 2007-03-15 15:06                     ` Rik van Riel
  0 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 15:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Peter Zijlstra, Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm

Nick Piggin wrote:
> On Thu, Mar 15, 2007 at 11:39:14AM +0100, Peter Zijlstra wrote:
>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>> This patch unconditionally calls mark_page_accessed to prevent pages, 
>>> especially for small files, from being evicted from the page cache despite 
>>> frequent access.
>> Since we're hackling over the use-once stuff again...
>>
>> /me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
>> ducks.
> 
> Join the club ;) 
> 
> http://groups.google.com.au/group/linux.kernel/msg/7b3237b8e715475b?hl=en&
> 
> I can't find the patch where I actually did combine it with a PG_usedonce
> bit, but the end result is pretty similar to your patch.

Except for the bit where vmscan throws away page cache pages
with PageReferenced set, unless they are mapped :)

Also, pages that only got accessed once will have the referenced
bit set too, so your patch gives no way to distinguish between
pages that were accessed once and pages that were accessed multiple
times.

> And I think one
> or two others have also independently invented the same thing.
> 
> So it *has* to be good, doesn't it? ;)

No argument there :)

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 19:58               ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
  2007-03-14 20:55                 ` Dave Kleikamp
  2007-03-15 10:39                 ` Peter Zijlstra
@ 2007-03-15 15:56                 ` Chuck Ebbert
  2007-03-15 16:29                   ` Nick Piggin
  2 siblings, 1 reply; 41+ messages in thread
From: Chuck Ebbert @ 2007-03-15 15:56 UTC (permalink / raw)
  To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 301 bytes --]

Ashif Harji wrote:
> 
> This patch unconditionally calls mark_page_accessed to prevent pages,
> especially for small files, from being evicted from the page cache
> despite frequent access.
> 
> Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
> 

I like mine better -- it leaves the comment:



[-- Attachment #2: linux-2.6-fix_mark_page_accessed.patch --]
[-- Type: text/plain, Size: 1080 bytes --]

From: Chuck Ebbert <cebbert@redhat.com>

Always mark page as accessed when reading multiple times.
Original idea and patch by Ashif Harji <asharji@cs.uwaterloo.ca>

Signed-off-by: Chuck Ebbert <cebbert@redhat.com>

--- 2.6.20.2-t.orig/mm/filemap.c
+++ 2.6.20.2-t/mm/filemap.c
@@ -887,7 +887,6 @@
 	unsigned long offset;
 	unsigned long last_index;
 	unsigned long next_index;
-	unsigned long prev_index;
 	loff_t isize;
 	struct page *cached_page;
 	int error;
@@ -896,7 +895,6 @@
 	cached_page = NULL;
 	index = *ppos >> PAGE_CACHE_SHIFT;
 	next_index = index;
-	prev_index = ra.prev_page;
 	last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
 	offset = *ppos & ~PAGE_CACHE_MASK;
 
@@ -945,11 +943,9 @@
 
 		/*
 		 * When (part of) the same page is read multiple times
-		 * in succession, only mark it as accessed the first time.
+		 * in succession, always mark it accessed.
 		 */
-		if (prev_index != index)
-			mark_page_accessed(page);
-		prev_index = index;
+		mark_page_accessed(page);
 
 		/*
 		 * Ok, we have the page, and it's up-to-date, so

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 15:56                 ` Chuck Ebbert
@ 2007-03-15 16:29                   ` Nick Piggin
  2007-03-15 17:04                     ` Rik van Riel
                                       ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 16:29 UTC (permalink / raw)
  To: Chuck Ebbert; +Cc: Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm

On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> Ashif Harji wrote:
> > 
> > This patch unconditionally calls mark_page_accessed to prevent pages,
> > especially for small files, from being evicted from the page cache
> > despite frequent access.
> > 
> > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
> > 
> 
> I like mine better -- it leaves the comment:

How about this? It also doesn't break the use-once heuristic.

--
A change to make database style random read() workloads perform better, by
calling mark_page_accessed for some non-page-aligned reads broke the case of
< PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
first page.

Combine both heuristics for marking the page accessed.

Signed-off-by: Nick Piggin <npiggin@suse.de>

Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -929,7 +929,7 @@ page_ok:
 		 * When (part of) the same page is read multiple times
 		 * in succession, only mark it as accessed the first time.
 		 */
-		if (prev_index != index)
+		if (prev_index != index || !offset)
 			mark_page_accessed(page);
 		prev_index = index;
 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 16:29                   ` Nick Piggin
@ 2007-03-15 17:04                     ` Rik van Riel
  2007-03-15 17:44                     ` Hugh Dickins
  2007-03-15 19:55                     ` Ashif Harji
  2 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 17:04 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Chuck Ebbert, Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm

Nick Piggin wrote:

> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
> 
> Combine both heuristics for marking the page accessed.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 21:33                   ` Andreas Mohr
  2007-03-14 22:08                     ` Dave Kleikamp
  2007-03-15 15:00                     ` Rik van Riel
@ 2007-03-15 17:37                     ` Valdis.Kletnieks
  2007-03-15 18:35                       ` Rik van Riel
  2 siblings, 1 reply; 41+ messages in thread
From: Valdis.Kletnieks @ 2007-03-15 17:37 UTC (permalink / raw)
  To: Andreas Mohr
  Cc: Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
	linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 548 bytes --]

On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:

> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> despite non-changing access patterns you could still call mark_page_accessed(
)
> every 32 calls or so to avoid expiry, but this would need extra helper
> variables.

What if you did something like

	if (jiffies%32) {...

(Possibly scaling it so the low-order bits change).  No need to lock it, as
"right most of the time" is close enough.


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 16:29                   ` Nick Piggin
  2007-03-15 17:04                     ` Rik van Riel
@ 2007-03-15 17:44                     ` Hugh Dickins
  2007-03-15 20:01                       ` Nick Piggin
  2007-03-15 22:59                       ` Andrea Arcangeli
  2007-03-15 19:55                     ` Ashif Harji
  2 siblings, 2 replies; 41+ messages in thread
From: Hugh Dickins @ 2007-03-15 17:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg, linux-mm,
	Jan Kara, linux-kernel, akpm

On Thu, 15 Mar 2007, Nick Piggin wrote:
> On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> > Ashif Harji wrote:
> > > 
> > > This patch unconditionally calls mark_page_accessed to prevent pages,
> > > especially for small files, from being evicted from the page cache
> > > despite frequent access.
> > > 
> > > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>

Yeah, yeah, I'm not a real mman, I don't have my own patch and
website for this ;) but I'm old, let me mumble some history...

Ashif's patch would take us back to 2.4.10 when mark_page_accessed
was introduced: in 2.4.11 someone (probably Andrea) immediately
added a !offset || !filp->f_reada condition on it there, which
remains in 2.4 to this day.  That _probably_ means that Ashif's
patch is suboptimal, and that your !offset patch is good.

f_reada went away in 2.5.8, and the !offset condition remained
until 2.6.11, when Miquel (CC'ed) replaced it by today's prev_index
condition.  His changelog entry appended below.  Since it's Miquel
who removed the !offset condition, he should be consulted on its
reintroduction.

Hugh

> > 
> > I like mine better -- it leaves the comment:
> 
> How about this? It also doesn't break the use-once heuristic.
> 
> --
> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
> 
> Combine both heuristics for marking the page accessed.
> 
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> 
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -929,7 +929,7 @@ page_ok:
>  		 * When (part of) the same page is read multiple times
>  		 * in succession, only mark it as accessed the first time.
>  		 */
> -		if (prev_index != index)
> +		if (prev_index != index || !offset)
>  			mark_page_accessed(page);
>  		prev_index = index;
>  

Miquel's patch comment from ChangeLog-2.6.11:

[PATCH] mark_page_accessed() for read()s on non-page boundaries

When reading a (partial) page from disk using read(), the kernel only marks
the page as "accessed" if the read started at a page boundary.  This means
that files that are accessed randomly at non-page boundaries (usually
database style files) will not be cached properly.

The patch below uses the readahead state instead.  If a page is read(), it
is marked as "accessed" if the previous read() was for a different page,
whatever the offset in the page.

Testing results:


- Boot kernel with mem=128M

- create a testfile of size 8 MB on a partition. Unmount/mount.

- then generate about 10 MB/sec streaming writes

	for i in `seq 1 1000`
	do
		dd if=/dev/zero of=junkfile.$i bs=1M count=10
		sync
		cat junkfile.$i > /dev/null
		sleep 1
	done

- use an application that reads 128 bytes 64000 times from a
  random offset in the 64 MB testfile.

1. Linux 2.6.10-rc3 vanilla, no streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.22s system 5% cpu 4.456 total

2. Linux 2.6.10-rc3 vanilla, streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.16s system 2% cpu 7.667 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.37s system 1% cpu 23.294 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.99s system 1% cpu 1:11.52 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.21s system 2% cpu 10.273 total

3. Linux 2.6.10-rc3 with read-page-access.patch , streaming writes:

# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.21s system 3% cpu 7.634 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.04s user 0.22s system 2% cpu 9.588 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.02s user 0.12s system 24% cpu 0.563 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile  0.03s user 0.13s system 98% cpu 0.163 total

As expected, with the read-page-access.patch, the kernel keeps the 8 MB
testfile cached as expected, while without it, it doesn't.

So this is useful for workloads where one smallish (wrt RAM) file is read
randomly over and over again (like heavily used database indexes), while
other I/O is going on.  Plain 2.6 caches those files poorly, if the app
uses plain read().

Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 17:37                     ` Valdis.Kletnieks
@ 2007-03-15 18:35                       ` Rik van Riel
  2007-03-16  3:51                         ` Valdis.Kletnieks
  0 siblings, 1 reply; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 18:35 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
	Jan Kara, linux-kernel, akpm

Valdis.Kletnieks@vt.edu wrote:
> On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
> 
>> it'd seem we need some kind of state management here to figure out good
>> intervals of when to call mark_page_accessed() *again* for this page. E.g.
>> despite non-changing access patterns you could still call mark_page_accessed(
> )
>> every 32 calls or so to avoid expiry, but this would need extra helper
>> variables.
> 
> What if you did something like
> 
> 	if (jiffies%32) {...
> 
> (Possibly scaling it so the low-order bits change).  No need to lock it, as
> "right most of the time" is close enough.

Bad idea.  That way you would only count page accesses if the
phase of the moon^Wjiffie is just right.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15  5:22                         ` Ashif Harji
  2007-03-15 12:46                           ` Dave Kleikamp
@ 2007-03-15 19:07                           ` Andrew Morton
  2007-03-15 21:49                             ` Andrea Arcangeli
  1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2007-03-15 19:07 UTC (permalink / raw)
  To: Ashif Harji; +Cc: dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel

> On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> I still think the simple fix of removing the 
> condition is the best approach, but I'm certainly open to alternatives.

Yes, the problem of falsely activating pages when the file is read in small
hunks is worse than the problem which your patch fixes.

We could change it so that if the current read() includes the zeroeth byte
of the page, we run mark_page_accessed() even if this_page==prev_page?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 16:29                   ` Nick Piggin
  2007-03-15 17:04                     ` Rik van Riel
  2007-03-15 17:44                     ` Hugh Dickins
@ 2007-03-15 19:55                     ` Ashif Harji
  2007-03-15 20:07                       ` Nick Piggin
  2 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-15 19:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Chuck Ebbert, linux-mm, Jan Kara, linux-kernel, akpm,
	Andrew Morton, Rik van Riel



On Thu, 15 Mar 2007, Nick Piggin wrote:

> On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
>> Ashif Harji wrote:
>>>
>>> This patch unconditionally calls mark_page_accessed to prevent pages,
>>> especially for small files, from being evicted from the page cache
>>> despite frequent access.
>>>
>>> Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
>>>
>>
>> I like mine better -- it leaves the comment:
>
> How about this? It also doesn't break the use-once heuristic.
>
> --
> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
>
> Combine both heuristics for marking the page accessed.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -929,7 +929,7 @@ page_ok:
> 		 * When (part of) the same page is read multiple times
> 		 * in succession, only mark it as accessed the first time.
> 		 */
> -		if (prev_index != index)
> +		if (prev_index != index || !offset)
> 			mark_page_accessed(page);
> 		prev_index = index;
>
>

It sounds like people are happy with the fix suggested by Nick.  That fix 
is okay with me as it fixes the problem I am having.

I suspect, however, that by not directly detecting the problematic access 
pattern, where the file is accessed sequentially in small hunks, other 
applications may experience performance problems related to caching. For 
example, if an application frequently and non-sequentially reads from the 
same page.  This is especially true for files of size < PAGE_CACHE_SIZE.
But, I'm not sure if such an access pattern likely.

ashif.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 17:44                     ` Hugh Dickins
@ 2007-03-15 20:01                       ` Nick Piggin
  2007-03-15 22:59                       ` Andrea Arcangeli
  1 sibling, 0 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 20:01 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg, linux-mm,
	Jan Kara, linux-kernel, akpm

On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> On Thu, 15 Mar 2007, Nick Piggin wrote:
> > On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> > > Ashif Harji wrote:
> > > > 
> > > > This patch unconditionally calls mark_page_accessed to prevent pages,
> > > > especially for small files, from being evicted from the page cache
> > > > despite frequent access.
> > > > 
> > > > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
> 
> Yeah, yeah, I'm not a real mman, I don't have my own patch and
> website for this ;) but I'm old, let me mumble some history...
> 
> Ashif's patch would take us back to 2.4.10 when mark_page_accessed
> was introduced: in 2.4.11 someone (probably Andrea) immediately
> added a !offset || !filp->f_reada condition on it there, which
> remains in 2.4 to this day.  That _probably_ means that Ashif's
> patch is suboptimal, and that your !offset patch is good.
> 
> f_reada went away in 2.5.8, and the !offset condition remained
> until 2.6.11, when Miquel (CC'ed) replaced it by today's prev_index
> condition.  His changelog entry appended below.  Since it's Miquel
> who removed the !offset condition, he should be consulted on its
> reintroduction.

Yeah I did go back and check up on that changelog, because I knew
we had a !offset check there at one stage, which is immune to this
problem (or at least can handle it a little better).

I suspect that Miquel was probably more interested in _increasing_
mark_page_accessed coverage with his new condition than restricting
it from the !offset cases.

Thanks for digging it up and posting here, though.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 19:55                     ` Ashif Harji
@ 2007-03-15 20:07                       ` Nick Piggin
  2007-03-15 20:31                         ` Andreas Mohr
  0 siblings, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 20:07 UTC (permalink / raw)
  To: Ashif Harji
  Cc: Chuck Ebbert, linux-mm, Jan Kara, linux-kernel, akpm, Rik van Riel

On Thu, Mar 15, 2007 at 03:55:08PM -0400, Ashif Harji wrote:
> 
> It sounds like people are happy with the fix suggested by Nick.  That fix 
> is okay with me as it fixes the problem I am having.
> 
> I suspect, however, that by not directly detecting the problematic access 
> pattern, where the file is accessed sequentially in small hunks, other 
> applications may experience performance problems related to caching. For 
> example, if an application frequently and non-sequentially reads from the 
> same page.  This is especially true for files of size < PAGE_CACHE_SIZE.
> But, I'm not sure if such an access pattern likely.

Well in general we like to help applications that help themselves. It
is actually a good heuristic, surprisingly. If an application randomly
accesses the same page (and there is no write activity going on), then
it would be better off to cache it in userspace, and if it doesn't care
to do that then it won't mind having to read it off disk now and again :)


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 20:07                       ` Nick Piggin
@ 2007-03-15 20:31                         ` Andreas Mohr
  0 siblings, 0 replies; 41+ messages in thread
From: Andreas Mohr @ 2007-03-15 20:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ashif Harji, Chuck Ebbert, linux-mm, Jan Kara, linux-kernel,
	akpm, Rik van Riel

Hi,

On Thu, Mar 15, 2007 at 09:07:39PM +0100, Nick Piggin wrote:
> Well in general we like to help applications that help themselves. It
> is actually a good heuristic, surprisingly. If an application randomly
> accesses the same page (and there is no write activity going on), then
> it would be better off to cache it in userspace, and if it doesn't care
> to do that then it won't mind having to read it off disk now and again :)

Sounds like a good plan since this probably is a nice way to make stupid apps
doing stupid things sit up and take notice, and maybe the authors will then go
so far as fixing up a few more things that are hurting them once they actually
recognize that something is weird due to overly bad performance.

Why go to great lengths to support stupid apps when there are still so many things
which could be done to help well-behaving ones? ;)

Andreas Mohr

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 19:07                           ` Andrew Morton
@ 2007-03-15 21:49                             ` Andrea Arcangeli
  2007-03-15 22:06                               ` Andrew Morton
  0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 21:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel

On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > I still think the simple fix of removing the 
> > condition is the best approach, but I'm certainly open to alternatives.
> 
> Yes, the problem of falsely activating pages when the file is read in small
> hunks is worse than the problem which your patch fixes.

Really? I would have expected all performance sensitive apps to read
in >=PAGE_SIZE chunks. And if they don't because they split their
dataset in blocks (like some database), it may not be so wrong to
activate those pages that have two "hot" blocks more aggressively than
those pages with a single hot block.

So I've an hard time to advocate to prefer the current behavior, but
certainly this can be "fixed" by caching the last_offset like others
pointed out ;)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 21:49                             ` Andrea Arcangeli
@ 2007-03-15 22:06                               ` Andrew Morton
  2007-03-15 23:15                                 ` Andrea Arcangeli
  0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2007-03-15 22:06 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel

On Thu, 15 Mar 2007 22:49:23 +0100
Andrea Arcangeli <andrea@suse.de> wrote:

> On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > > I still think the simple fix of removing the 
> > > condition is the best approach, but I'm certainly open to alternatives.
> > 
> > Yes, the problem of falsely activating pages when the file is read in small
> > hunks is worse than the problem which your patch fixes.
> 
> Really? I would have expected all performance sensitive apps to read
> in >=PAGE_SIZE chunks. And if they don't because they split their
> dataset in blocks (like some database), it may not be so wrong to
> activate those pages that have two "hot" blocks more aggressively than
> those pages with a single hot block.

But the problem which is being fixed here is really obscure: an application
repeatedly reading the first page and only the first page of a file, always
via the same fd.

I'd expect that the sub-page-size read scenarion happens heaps more often
than that, especially when dealing with larger PAGE_SIZEs.



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 17:44                     ` Hugh Dickins
  2007-03-15 20:01                       ` Nick Piggin
@ 2007-03-15 22:59                       ` Andrea Arcangeli
  2007-03-15 23:15                         ` Dave Kleikamp
  1 sibling, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 22:59 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Nick Piggin, Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg,
	linux-mm, Jan Kara, linux-kernel, akpm

On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> who removed the !offset condition, he should be consulted on its
> reintroduction.

the !offset check looks a pretty broken heuristic indeed, it would
break random I/O. The real fix is to add a ra.prev_offset along with
ra.prev_page, and if who implements it wants to be stylish he can as
well use a ra.last_contiguous_read structure that has a page and
offset fields (and then of course remove ra.prev_page).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 22:06                               ` Andrew Morton
@ 2007-03-15 23:15                                 ` Andrea Arcangeli
  0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 23:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel

On Thu, Mar 15, 2007 at 03:06:01PM -0700, Andrew Morton wrote:
> On Thu, 15 Mar 2007 22:49:23 +0100
> Andrea Arcangeli <andrea@suse.de> wrote:
> 
> > On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > > > I still think the simple fix of removing the 
> > > > condition is the best approach, but I'm certainly open to alternatives.
> > > 
> > > Yes, the problem of falsely activating pages when the file is read in small
> > > hunks is worse than the problem which your patch fixes.
> > 
> > Really? I would have expected all performance sensitive apps to read
> > in >=PAGE_SIZE chunks. And if they don't because they split their
> > dataset in blocks (like some database), it may not be so wrong to
> > activate those pages that have two "hot" blocks more aggressively than
> > those pages with a single hot block.
> 
> But the problem which is being fixed here is really obscure: an application
> repeatedly reading the first page and only the first page of a file, always
> via the same fd.
>
> I'd expect that the sub-page-size read scenarion happens heaps more often
> than that, especially when dealing with larger PAGE_SIZEs.

Whatever that app is doing, clearly we have to keep those 4k in cache!
Like obviously the specweb demonstrated that as long as you are
_repeating_ the same read, it's correct to activate the page even if
it was reading from the same page as before.

What is wrong is to activate the page more aggressively if it's
_different_ parts of the page that are being read in a contiguous
way. I thought that the whole point of the ra.prev_page was to detect
_contiguous_ (not random) I/O made with a small buffer, anything else
doesn't make much sense to me.

In short I think taking a ra.prev_offset into account as suggested by
Dave Kleikamp is the best, it may actually benefit the obscure app too ;)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 22:59                       ` Andrea Arcangeli
@ 2007-03-15 23:15                         ` Dave Kleikamp
  2007-03-15 23:28                           ` Andrea Arcangeli
  0 siblings, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-15 23:15 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Nick Piggin, Chuck Ebbert, Ashif Harji,
	Miquel van Smoorenburg, linux-mm, Jan Kara, linux-kernel, akpm

On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> > who removed the !offset condition, he should be consulted on its
> > reintroduction.
> 
> the !offset check looks a pretty broken heuristic indeed, it would
> break random I/O.

I wouldn't call it broken.  At worst, I'd say it's imperfect.  But
that's the nature of a heuristic.  It most likely works in a huge
majority of cases.

> The real fix is to add a ra.prev_offset along with
> ra.prev_page, and if who implements it wants to be stylish he can as
> well use a ra.last_contiguous_read structure that has a page and
> offset fields (and then of course remove ra.prev_page).

I suggested something along these lines, but I wonder if it's overkill.
The !offset check is simple and appears to be a decent improvement over
the current code.
-- 
David Kleikamp
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 23:15                         ` Dave Kleikamp
@ 2007-03-15 23:28                           ` Andrea Arcangeli
  0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 23:28 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Hugh Dickins, Nick Piggin, Chuck Ebbert, Ashif Harji,
	Miquel van Smoorenburg, linux-mm, Jan Kara, linux-kernel, akpm

On Thu, Mar 15, 2007 at 06:15:45PM -0500, Dave Kleikamp wrote:
> On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> > On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> > > who removed the !offset condition, he should be consulted on its
> > > reintroduction.
> > 
> > the !offset check looks a pretty broken heuristic indeed, it would
> > break random I/O.
> 
> I wouldn't call it broken.  At worst, I'd say it's imperfect.  But
> that's the nature of a heuristic.  It most likely works in a huge
> majority of cases.

well, IMHO in the huge majority of cases the prev_page check isn't
necessary in the first place (and IMHO it hurts a lot more than it can
help, as demonstrated by specweb, since we'll bite on the good guys to
help the bad guys).

The only case where I can imagine the prev_page to make sense is to
handle contiguous I/O made with a small buffer, so clearly an
inefficient code in the first place. But if this guy is reading with
<PAGE_SIZE buffer there's no guarantee that he's reading f_pos aligned
either, hence the need of taking last_offset into account too so at
least it's a "perfect" heuristic that will reliably detect contiguous
I/O no matter in what shape or form you execute it, as long as it is
contiguous I/O.

Any other variation of behavior besides the autodetection of
contiguous I/O run in whatever buffer/aligned form, should be mandated
by userland through fadvise/madvise IMHO or we run into the toes of
the good guys.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-15 18:35                       ` Rik van Riel
@ 2007-03-16  3:51                         ` Valdis.Kletnieks
  2007-03-16  4:09                           ` Rik van Riel
  0 siblings, 1 reply; 41+ messages in thread
From: Valdis.Kletnieks @ 2007-03-16  3:51 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
	Jan Kara, linux-kernel, akpm

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

On Thu, 15 Mar 2007 14:35:17 EDT, Rik van Riel said:
> Valdis.Kletnieks@vt.edu wrote:
> > On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
> > 
> >> it'd seem we need some kind of state management here to figure out good
> >> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> >> despite non-changing access patterns you could still call mark_page_accessed()
> >> every 32 calls or so to avoid expiry, but this would need extra helper
> >> variables.
> > 
> > What if you did something like
> > 
> > 	if (jiffies%32) {...
> > 
> > (Possibly scaling it so the low-order bits change).  No need to lock it, as
> > "right most of the time" is close enough.
> 
> Bad idea.  That way you would only count page accesses if the
> phase of the moon^Wjiffie is just right.

On the other hand, Andreas suggested only marking it once every 32 calls,
but that required a helper variable.  Statistically, jiffies%32 should
end up about the same as a helper variable %32.

This of course, if just calling mark_page_accessed() is actually expensive
enough that we don't want to do it unconditionally.


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-16  3:51                         ` Valdis.Kletnieks
@ 2007-03-16  4:09                           ` Rik van Riel
  0 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-16  4:09 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
	Jan Kara, linux-kernel, akpm

Valdis.Kletnieks@vt.edu wrote:

> On the other hand, Andreas suggested only marking it once every 32 calls,
> but that required a helper variable.  Statistically, jiffies%32 should
> end up about the same as a helper variable %32.
> 
> This of course, if just calling mark_page_accessed() is actually expensive
> enough that we don't want to do it unconditionally.

Not caching a needed page and having to wait for a disk seek
to complete will be *way* more expensive than any call to
mark_page_accessed().

A modern CPU can do somewhere on the order of 50 million
instructions in the time it takes to bring one page in from
disk.

However, this does not mean we should unconditionally call
mark_page_accessed(), since that could cause use to push
wanted data out of the cache because of one program that
does its streaming accesses in a strange way...

This is a situation where getting it right almost certainly
matters.

-- 
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is.  Each group
calls the other unpatriotic.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
  2007-03-14 20:55                 ` Dave Kleikamp
  2007-03-14 21:33                   ` Andreas Mohr
@ 2007-03-16 14:20                   ` Anton Blanchard
  1 sibling, 0 replies; 41+ messages in thread
From: Anton Blanchard @ 2007-03-16 14:20 UTC (permalink / raw)
  To: Dave Kleikamp
  Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm


Hi,

> I guess the downside to this is if a reader is reading a large file, or
> several files, sequentially with a small read size (smaller than
> PAGE_SIZE), the pages will be marked active after just one read pass.
> My gut says the benefits of this patch outweigh the cost.  I would
> expect real-world backup apps, etc. to read at least PAGE_SIZE.

PAGE_SIZE being 8k on sparc, 16-64k on ia64 and potentially 64kb on
powerpc :)

Id expect a large percentage of files to be below that size.

Anton

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2007-03-16 14:23 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-09 22:03 do_generic_mapping_read performance issue Ashif Harji
2007-03-12 14:20 ` Jan Kara
2007-03-12 14:39   ` Nick Piggin
2007-03-12 15:13     ` Jan Kara
2007-03-12 17:05       ` Ashif Harji
2007-03-12 17:35         ` Jan Kara
2007-03-13 18:43           ` Ashif Harji
2007-03-13 18:55             ` Jan Kara
2007-03-14 19:58               ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
2007-03-14 20:55                 ` Dave Kleikamp
2007-03-14 21:33                   ` Andreas Mohr
2007-03-14 22:08                     ` Dave Kleikamp
2007-03-15  1:36                       ` Xiaoning Ding
2007-03-15  5:22                         ` Ashif Harji
2007-03-15 12:46                           ` Dave Kleikamp
2007-03-15 12:50                             ` Nick Piggin
2007-03-15 19:07                           ` Andrew Morton
2007-03-15 21:49                             ` Andrea Arcangeli
2007-03-15 22:06                               ` Andrew Morton
2007-03-15 23:15                                 ` Andrea Arcangeli
2007-03-15 15:00                     ` Rik van Riel
2007-03-15 17:37                     ` Valdis.Kletnieks
2007-03-15 18:35                       ` Rik van Riel
2007-03-16  3:51                         ` Valdis.Kletnieks
2007-03-16  4:09                           ` Rik van Riel
2007-03-16 14:20                   ` Anton Blanchard
2007-03-15 10:39                 ` Peter Zijlstra
2007-03-15 12:38                   ` Nick Piggin
2007-03-15 15:06                     ` Rik van Riel
2007-03-15 15:56                 ` Chuck Ebbert
2007-03-15 16:29                   ` Nick Piggin
2007-03-15 17:04                     ` Rik van Riel
2007-03-15 17:44                     ` Hugh Dickins
2007-03-15 20:01                       ` Nick Piggin
2007-03-15 22:59                       ` Andrea Arcangeli
2007-03-15 23:15                         ` Dave Kleikamp
2007-03-15 23:28                           ` Andrea Arcangeli
2007-03-15 19:55                     ` Ashif Harji
2007-03-15 20:07                       ` Nick Piggin
2007-03-15 20:31                         ` Andreas Mohr
2007-03-12 16:46   ` do_generic_mapping_read performance issue Ashif Harji

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).