* do_generic_mapping_read performance issue
@ 2007-03-09 22:03 Ashif Harji
2007-03-12 14:20 ` Jan Kara
0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-09 22:03 UTC (permalink / raw)
To: linux-kernel
Hi, I am encountering a performance problem, which I have tracked into the
Linux kernel. The problem occurs with my experimental web server that uses
sendfile to repeatedly transmit files. The files are based on the static
portion of the SPECweb99 fileset and range in size to model a reasonable
workload. With this workload, a significant number of the requests are
for files of size 4 KB or less.
I have determined that the performance problems occurs in the function
do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
Here is the specific code fragment:
/*
* When (part of) the same page is read multiple times
* in succession, only mark it as accessed the first time.
*/
if (prev_index != index)
mark_page_accessed(page);
The implication of this code is that for files of size less than or equal
to a single page, the page associated with such a file is likely to get
evicted from the cache regardless of how frequently it is accessed. The
reason is that after the first access, prev_index is always zero and index
can only be zero. Hence, mark_page_accessed is never called after the
first time the file is requested. As a result, the page is evicted from
the cache no matter how frequently it is used. By changing the kernel to
always call mark_page_accessed for these files, the server throughput is
increased by as much as 20%.
I was wondering if anyone could explain why the call to mark_page_accessed
is conditional? That is, what problem it is trying to solve. It would seem
that in many scenarios, if the same page is accessed repeatedly, then it
would be appropriate to keep that page cached.
Please personally CC me on any responses.
thanks,
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-09 22:03 do_generic_mapping_read performance issue Ashif Harji
@ 2007-03-12 14:20 ` Jan Kara
2007-03-12 14:39 ` Nick Piggin
2007-03-12 16:46 ` do_generic_mapping_read performance issue Ashif Harji
0 siblings, 2 replies; 41+ messages in thread
From: Jan Kara @ 2007-03-12 14:20 UTC (permalink / raw)
To: Ashif Harji; +Cc: linux-kernel, npiggin
Hi,
> Hi, I am encountering a performance problem, which I have tracked into the
> Linux kernel. The problem occurs with my experimental web server that uses
> sendfile to repeatedly transmit files. The files are based on the static
> portion of the SPECweb99 fileset and range in size to model a reasonable
> workload. With this workload, a significant number of the requests are
> for files of size 4 KB or less.
>
> I have determined that the performance problems occurs in the function
> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> Here is the specific code fragment:
>
> /*
> * When (part of) the same page is read multiple times
> * in succession, only mark it as accessed the first time.
> */
> if (prev_index != index)
> mark_page_accessed(page);
Actually, the code is like that certainly for two years :).
> The implication of this code is that for files of size less than or equal
> to a single page, the page associated with such a file is likely to get
> evicted from the cache regardless of how frequently it is accessed. The
> reason is that after the first access, prev_index is always zero and index
> can only be zero. Hence, mark_page_accessed is never called after the
> first time the file is requested. As a result, the page is evicted from
> the cache no matter how frequently it is used. By changing the kernel to
> always call mark_page_accessed for these files, the server throughput is
> increased by as much as 20%.
Your analysis seems to be right. But to observe this behaviour you have
to have the file open and just always reread it using the same file
descriptor, don't you? That's probably not too common...
> I was wondering if anyone could explain why the call to mark_page_accessed
> is conditional? That is, what problem it is trying to solve. It would seem
> that in many scenarios, if the same page is accessed repeatedly, then it
> would be appropriate to keep that page cached.
I also don't know why the condition is there but it's there at least
for two years so I'm not sure anybody remembers ;). Nick, do you have
an idea?
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 14:20 ` Jan Kara
@ 2007-03-12 14:39 ` Nick Piggin
2007-03-12 15:13 ` Jan Kara
2007-03-12 16:46 ` do_generic_mapping_read performance issue Ashif Harji
1 sibling, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-12 14:39 UTC (permalink / raw)
To: Jan Kara; +Cc: Ashif Harji, linux-kernel
On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> Hi,
>
> > Hi, I am encountering a performance problem, which I have tracked into the
> > Linux kernel. The problem occurs with my experimental web server that uses
> > sendfile to repeatedly transmit files. The files are based on the static
> > portion of the SPECweb99 fileset and range in size to model a reasonable
> > workload. With this workload, a significant number of the requests are
> > for files of size 4 KB or less.
> >
> > I have determined that the performance problems occurs in the function
> > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > Here is the specific code fragment:
> >
> > /*
> > * When (part of) the same page is read multiple times
> > * in succession, only mark it as accessed the first time.
> > */
> > if (prev_index != index)
> > mark_page_accessed(page);
> Actually, the code is like that certainly for two years :).
Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
stage (ie. read from the start of a page -- obviously that also has holes).
> > The implication of this code is that for files of size less than or equal
> > to a single page, the page associated with such a file is likely to get
> > evicted from the cache regardless of how frequently it is accessed. The
> > reason is that after the first access, prev_index is always zero and index
> > can only be zero. Hence, mark_page_accessed is never called after the
> > first time the file is requested. As a result, the page is evicted from
> > the cache no matter how frequently it is used. By changing the kernel to
> > always call mark_page_accessed for these files, the server throughput is
> > increased by as much as 20%.
> Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...
>
> > I was wondering if anyone could explain why the call to mark_page_accessed
> > is conditional? That is, what problem it is trying to solve. It would seem
> > that in many scenarios, if the same page is accessed repeatedly, then it
> > would be appropriate to keep that page cached.
> I also don't know why the condition is there but it's there at least
> for two years so I'm not sure anybody remembers ;). Nick, do you have
> an idea?
Yeah it is there because that is basically how our "use once" detection
handles the case where an app does not read in chunks that are PAGE_SIZE
multiples and PAGE_SIZE aligned.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 14:39 ` Nick Piggin
@ 2007-03-12 15:13 ` Jan Kara
2007-03-12 17:05 ` Ashif Harji
0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-12 15:13 UTC (permalink / raw)
To: Nick Piggin; +Cc: Ashif Harji, linux-kernel
On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> > Hi,
> >
> > > Hi, I am encountering a performance problem, which I have tracked into the
> > > Linux kernel. The problem occurs with my experimental web server that uses
> > > sendfile to repeatedly transmit files. The files are based on the static
> > > portion of the SPECweb99 fileset and range in size to model a reasonable
> > > workload. With this workload, a significant number of the requests are
> > > for files of size 4 KB or less.
> > >
> > > I have determined that the performance problems occurs in the function
> > > do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
> > > Here is the specific code fragment:
> > >
> > > /*
> > > * When (part of) the same page is read multiple times
> > > * in succession, only mark it as accessed the first time.
> > > */
> > > if (prev_index != index)
> > > mark_page_accessed(page);
> > Actually, the code is like that certainly for two years :).
>
> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> stage (ie. read from the start of a page -- obviously that also has holes).
Yes, at least in 2.6.12-rc5 which is the first one in git :).
> > > I was wondering if anyone could explain why the call to mark_page_accessed
> > > is conditional? That is, what problem it is trying to solve. It would seem
> > > that in many scenarios, if the same page is accessed repeatedly, then it
> > > would be appropriate to keep that page cached.
> > I also don't know why the condition is there but it's there at least
> > for two years so I'm not sure anybody remembers ;). Nick, do you have
> > an idea?
>
> Yeah it is there because that is basically how our "use once" detection
> handles the case where an app does not read in chunks that are PAGE_SIZE
> multiples and PAGE_SIZE aligned.
OK, I see. Then I'm not sure the check does more good than bad. Because
if we happen to reread the same chunk several times, then the check does a
wrong thing...
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 14:20 ` Jan Kara
2007-03-12 14:39 ` Nick Piggin
@ 2007-03-12 16:46 ` Ashif Harji
1 sibling, 0 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-12 16:46 UTC (permalink / raw)
To: Jan Kara; +Cc: linux-kernel, npiggin
>> The implication of this code is that for files of size less than or equal
>> to a single page, the page associated with such a file is likely to get
>> evicted from the cache regardless of how frequently it is accessed. The
>> reason is that after the first access, prev_index is always zero and index
>> can only be zero. Hence, mark_page_accessed is never called after the
>> first time the file is requested. As a result, the page is evicted from
>> the cache no matter how frequently it is used. By changing the kernel to
>> always call mark_page_accessed for these files, the server throughput is
>> increased by as much as 20%.
> Your analysis seems to be right. But to observe this behaviour you have
> to have the file open and just always reread it using the same file
> descriptor, don't you? That's probably not too common...
Yes, this problem requires the file to be accessed via the same file
descriptor. I suspect this behaviour is common for a certain class of
applications, e.g., web servers. However, any application that read from
the same page repeatedly would also experience this problem. It is more
likely to occur with small files, but not exculsively.
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 15:13 ` Jan Kara
@ 2007-03-12 17:05 ` Ashif Harji
2007-03-12 17:35 ` Jan Kara
0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-12 17:05 UTC (permalink / raw)
To: Jan Kara; +Cc: Nick Piggin, linux-kernel
On Mon, 12 Mar 2007, Jan Kara wrote:
> On Mon 12-03-07 15:39:00, Nick Piggin wrote:
>> On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
>>> Hi,
>>>
>>>> Hi, I am encountering a performance problem, which I have tracked into the
>>>> Linux kernel. The problem occurs with my experimental web server that uses
>>>> sendfile to repeatedly transmit files. The files are based on the static
>>>> portion of the SPECweb99 fileset and range in size to model a reasonable
>>>> workload. With this workload, a significant number of the requests are
>>>> for files of size 4 KB or less.
>>>>
>>>> I have determined that the performance problems occurs in the function
>>>> do_generic_mapping_read in file mm/filemap.c for kernel version 2.6.20.1.
>>>> Here is the specific code fragment:
>>>>
>>>> /*
>>>> * When (part of) the same page is read multiple times
>>>> * in succession, only mark it as accessed the first time.
>>>> */
>>>> if (prev_index != index)
>>>> mark_page_accessed(page);
>>> Actually, the code is like that certainly for two years :).
>>
>> Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
>> stage (ie. read from the start of a page -- obviously that also has holes).
> Yes, at least in 2.6.12-rc5 which is the first one in git :).
>
>>>> I was wondering if anyone could explain why the call to mark_page_accessed
>>>> is conditional? That is, what problem it is trying to solve. It would seem
>>>> that in many scenarios, if the same page is accessed repeatedly, then it
>>>> would be appropriate to keep that page cached.
>>> I also don't know why the condition is there but it's there at least
>>> for two years so I'm not sure anybody remembers ;). Nick, do you have
>>> an idea?
>>
>> Yeah it is there because that is basically how our "use once" detection
>> handles the case where an app does not read in chunks that are PAGE_SIZE
>> multiples and PAGE_SIZE aligned.
> OK, I see. Then I'm not sure the check does more good than bad. Because
> if we happen to reread the same chunk several times, then the check does a
> wrong thing...
Thanks for providing me with additional information.
I would like to submit a patch to fix the performance problem. The
simplest solution is to remove the check. Even in the situation where an
application does not read in PAGE_SIZE multiples as described above, if
the page is accessed frequently it should remain in the cache. However, I
am open to suggestions for a more sophisticated scheme.
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 17:05 ` Ashif Harji
@ 2007-03-12 17:35 ` Jan Kara
2007-03-13 18:43 ` Ashif Harji
0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-12 17:35 UTC (permalink / raw)
To: Ashif Harji; +Cc: Nick Piggin, linux-kernel
On Mon 12-03-07 13:05:17, Ashif Harji wrote:
>
> On Mon, 12 Mar 2007, Jan Kara wrote:
>
> >On Mon 12-03-07 15:39:00, Nick Piggin wrote:
> >>On Mon, Mar 12, 2007 at 03:20:12PM +0100, Jan Kara wrote:
> >>> Hi,
> >>>
> >>>>Hi, I am encountering a performance problem, which I have tracked into
> >>>>the
> >>>>Linux kernel. The problem occurs with my experimental web server that
> >>>>uses
> >>>>sendfile to repeatedly transmit files. The files are based on the
> >>>>static
> >>>>portion of the SPECweb99 fileset and range in size to model a reasonable
> >>>>workload. With this workload, a significant number of the requests are
> >>>>for files of size 4 KB or less.
> >>>>
> >>>>I have determined that the performance problems occurs in the function
> >>>>do_generic_mapping_read in file mm/filemap.c for kernel version
> >>>>2.6.20.1.
> >>>>Here is the specific code fragment:
> >>>>
> >>>> /*
> >>>> * When (part of) the same page is read multiple times
> >>>> * in succession, only mark it as accessed the first time.
> >>>> */
> >>>> if (prev_index != index)
> >>>> mark_page_accessed(page);
> >>> Actually, the code is like that certainly for two years :).
> >>
> >>Did it always use ra->prev_page? ISTR it using pos%PAGE_SIZE == 0 at some
> >>stage (ie. read from the start of a page -- obviously that also has
> >>holes).
> > Yes, at least in 2.6.12-rc5 which is the first one in git :).
> >
> >>>>I was wondering if anyone could explain why the call to
> >>>>mark_page_accessed
> >>>>is conditional? That is, what problem it is trying to solve. It would
> >>>>seem
> >>>>that in many scenarios, if the same page is accessed repeatedly, then it
> >>>>would be appropriate to keep that page cached.
> >>> I also don't know why the condition is there but it's there at least
> >>>for two years so I'm not sure anybody remembers ;). Nick, do you have
> >>>an idea?
> >>
> >>Yeah it is there because that is basically how our "use once" detection
> >>handles the case where an app does not read in chunks that are PAGE_SIZE
> >>multiples and PAGE_SIZE aligned.
> > OK, I see. Then I'm not sure the check does more good than bad. Because
> >if we happen to reread the same chunk several times, then the check does a
> >wrong thing...
>
> I would like to submit a patch to fix the performance problem. The
> simplest solution is to remove the check. Even in the situation where an
> application does not read in PAGE_SIZE multiples as described above, if
> the page is accessed frequently it should remain in the cache. However, I
> am open to suggestions for a more sophisticated scheme.
Well, yes, that's an obvious solution. I just wanted to make sure that
such change won't break some other load. But so far it seems it won't...
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-12 17:35 ` Jan Kara
@ 2007-03-13 18:43 ` Ashif Harji
2007-03-13 18:55 ` Jan Kara
0 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-13 18:43 UTC (permalink / raw)
To: Jan Kara; +Cc: Nick Piggin, linux-kernel
>> I would like to submit a patch to fix the performance problem. The
>> simplest solution is to remove the check. Even in the situation where an
>> application does not read in PAGE_SIZE multiples as described above, if
>> the page is accessed frequently it should remain in the cache. However, I
>> am open to suggestions for a more sophisticated scheme.
>
> Well, yes, that's an obvious solution. I just wanted to make sure that
> such change won't break some other load. But so far it seems it won't...
I would like to find out if the change I am suggesting will cause any
other problems. Is there someone else I should contact before submitting
a patch? Or should I just wait another day or two for comments?
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: do_generic_mapping_read performance issue
2007-03-13 18:43 ` Ashif Harji
@ 2007-03-13 18:55 ` Jan Kara
2007-03-14 19:58 ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
0 siblings, 1 reply; 41+ messages in thread
From: Jan Kara @ 2007-03-13 18:55 UTC (permalink / raw)
To: Ashif Harji; +Cc: Nick Piggin, linux-kernel
On Tue 13-03-07 14:43:47, Ashif Harji wrote:
>
> >>I would like to submit a patch to fix the performance problem. The
> >>simplest solution is to remove the check. Even in the situation where an
> >>application does not read in PAGE_SIZE multiples as described above, if
> >>the page is accessed frequently it should remain in the cache. However, I
> >>am open to suggestions for a more sophisticated scheme.
> >
> > Well, yes, that's an obvious solution. I just wanted to make sure that
> >such change won't break some other load. But so far it seems it won't...
>
> I would like to find out if the change I am suggesting will cause any
> other problems. Is there someone else I should contact before submitting
> a patch? Or should I just wait another day or two for comments?
I don't think you'll get more feedback now. Just write the patch,
benchmark it with your workload and also some others (basically, there should be
no difference) and then submit the patch together with the performance
numbers and some rationale (if it's your first submission, see
Documentation/SubmittingPatches). Give also CC to Andrew <akpm@linux-foundation.org>
and ask him to put it into -mm kernels for testing.
Honza
--
Jan Kara <jack@suse.cz>
SuSE CR Labs
^ permalink raw reply [flat|nested] 41+ messages in thread
* [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-13 18:55 ` Jan Kara
@ 2007-03-14 19:58 ` Ashif Harji
2007-03-14 20:55 ` Dave Kleikamp
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-14 19:58 UTC (permalink / raw)
To: linux-mm; +Cc: Nick Piggin, Jan Kara, linux-kernel, akpm
This patch unconditionally calls mark_page_accessed to prevent pages,
especially for small files, from being evicted from the page cache despite
frequent access.
Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
---
If the same page of a file is repeatedly accessed (without accessing other
pages of that file) via the same file descriptor, mark_page_accessed is
never called after the first time the page is accessed.
The implication of this code is that for files of size less than or equal
to a single page, the page associated with such a file is likely to get
evicted from the cache regardless of how frequently it is accessed.
However, this behaviour also occurs with files of any size if the same
page is repeatedly accessed.
As a benchmark, I have an experimental web server that uses sendfile to
repeatedly transmit files. The files are based on the static portion of
the SPECweb99 fileset and range in size to model a reasonable workload.
With this workload, a significant number of the requests are for files of
size 4 KB or less.
By changing the kernel to always call mark_page_accessed, the server
throughput is increased by as much as 20%. With one test, for example,
without the change I get throughput of around 868 Mbps. After making the
change, performance increases to 1111 Mbps.
Using a configuration that should be unaffected by the change, performance
was around 855 Mbps without the change and around 851 Mbps with the
change. As expected the change had no appreciable effect.
See thread http://lkml.org/lkml/2007/3/9/403 for additional discussion on
this change.
This patch is for kernel version 2.6.20.1.
Andrew, can you also put this change into the -mm kernels for testing?
--- linux-2.6.20.1/mm/filemap.c.orig 2007-03-14 10:31:58.000000000 -0500
+++ linux-2.6.20.1/mm/filemap.c 2007-03-13 16:11:54.000000000 -0500
@@ -943,12 +943,7 @@ page_ok:
if (mapping_writably_mapped(mapping))
flush_dcache_page(page);
- /*
- * When (part of) the same page is read multiple times
- * in succession, only mark it as accessed the first time.
- */
- if (prev_index != index)
- mark_page_accessed(page);
+ mark_page_accessed(page);
prev_index = index;
/*
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 19:58 ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
@ 2007-03-14 20:55 ` Dave Kleikamp
2007-03-14 21:33 ` Andreas Mohr
2007-03-16 14:20 ` Anton Blanchard
2007-03-15 10:39 ` Peter Zijlstra
2007-03-15 15:56 ` Chuck Ebbert
2 siblings, 2 replies; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-14 20:55 UTC (permalink / raw)
To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> This patch unconditionally calls mark_page_accessed to prevent pages,
> especially for small files, from being evicted from the page cache despite
> frequent access.
I guess the downside to this is if a reader is reading a large file, or
several files, sequentially with a small read size (smaller than
PAGE_SIZE), the pages will be marked active after just one read pass.
My gut says the benefits of this patch outweigh the cost. I would
expect real-world backup apps, etc. to read at least PAGE_SIZE.
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 20:55 ` Dave Kleikamp
@ 2007-03-14 21:33 ` Andreas Mohr
2007-03-14 22:08 ` Dave Kleikamp
` (2 more replies)
2007-03-16 14:20 ` Anton Blanchard
1 sibling, 3 replies; 41+ messages in thread
From: Andreas Mohr @ 2007-03-14 21:33 UTC (permalink / raw)
To: Dave Kleikamp
Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
Hi,
On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > This patch unconditionally calls mark_page_accessed to prevent pages,
> > especially for small files, from being evicted from the page cache despite
> > frequent access.
>
> I guess the downside to this is if a reader is reading a large file, or
> several files, sequentially with a small read size (smaller than
> PAGE_SIZE), the pages will be marked active after just one read pass.
> My gut says the benefits of this patch outweigh the cost. I would
> expect real-world backup apps, etc. to read at least PAGE_SIZE.
I also think that the patch is somewhat problematic, since the original
intention seems to have been a reduction of the number of (expensive?)
mark_page_accessed() calls, but this of course falls flat on its face in case
of permanent single-page accesses or accesses with progressing but very small
read size (single-byte reads or so), since the cached page content will expire
eventually due to lack of mark_page_accessed() updates; thus this patch
decided to call mark_page_accessed() unconditionally which may be a large
performance penalty for subsequent tiny-sized reads.
I've been thinking hard how to avoid the mark_page_accessed() starvation in
case of a fixed, (almost) non-changing access state, but this seems hard since
it'd seem we need some kind of state management here to figure out good
intervals of when to call mark_page_accessed() *again* for this page. E.g.
despite non-changing access patterns you could still call mark_page_accessed()
every 32 calls or so to avoid expiry, but this would need extra helper
variables.
A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
with a
if ((prev_index != index) || (ra.cache_hit % 32 == 0))
mark_page_accessed(page);
This assumes that ra.cache_hit gets incremented for every access (haven't
checked whether this is the case).
That way (combined with an enhanced comment properly explaining the dilemma)
you would avoid most mark_page_accessed() invocations of subsequent same-page reads
but still do page status updates from time to time to avoid page deprecation.
Does anyone think this would be acceptable? Any better idea?
Andreas Mohr
P.S.: since I'm not too familiar with this area I could be rather wrong after all...
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 21:33 ` Andreas Mohr
@ 2007-03-14 22:08 ` Dave Kleikamp
2007-03-15 1:36 ` Xiaoning Ding
2007-03-15 15:00 ` Rik van Riel
2007-03-15 17:37 ` Valdis.Kletnieks
2 siblings, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-14 22:08 UTC (permalink / raw)
To: Andreas Mohr
Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
> Hi,
>
> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
> > On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > > This patch unconditionally calls mark_page_accessed to prevent pages,
> > > especially for small files, from being evicted from the page cache despite
> > > frequent access.
> >
> > I guess the downside to this is if a reader is reading a large file, or
> > several files, sequentially with a small read size (smaller than
> > PAGE_SIZE), the pages will be marked active after just one read pass.
> > My gut says the benefits of this patch outweigh the cost. I would
> > expect real-world backup apps, etc. to read at least PAGE_SIZE.
>
> I also think that the patch is somewhat problematic, since the original
> intention seems to have been a reduction of the number of (expensive?)
> mark_page_accessed() calls,
mark_page_accessed() isn't expensive. If called repeatedly, starting
with the third call, it will check two page flags and return. The only
real expense is that the page appears busier than it may be and will be
retained in memory longer than it should.
> but this of course falls flat on its face in case
> of permanent single-page accesses or accesses with progressing but very small
> read size (single-byte reads or so), since the cached page content will expire
> eventually due to lack of mark_page_accessed() updates; thus this patch
> decided to call mark_page_accessed() unconditionally which may be a large
> performance penalty for subsequent tiny-sized reads.
Any application doing many tiny-sized reads isn't exactly asking for
great performance.
> I've been thinking hard how to avoid the mark_page_accessed() starvation in
> case of a fixed, (almost) non-changing access state, but this seems hard since
> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> despite non-changing access patterns you could still call mark_page_accessed()
> every 32 calls or so to avoid expiry, but this would need extra helper
> variables.
>
> A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
> with a
> if ((prev_index != index) || (ra.cache_hit % 32 == 0))
> mark_page_accessed(page);
> This assumes that ra.cache_hit gets incremented for every access (haven't
> checked whether this is the case).
> That way (combined with an enhanced comment properly explaining the dilemma)
> you would avoid most mark_page_accessed() invocations of subsequent same-page reads
> but still do page status updates from time to time to avoid page deprecation.
>
> Does anyone think this would be acceptable? Any better idea?
I wouldn't go looking for anything more complicated than Ashif's patch,
unless testing shows it to be harmful in some realistic workload.
> Andreas Mohr
>
> P.S.: since I'm not too familiar with this area I could be rather wrong after all...
I could be missing something as well. :-)
Shaggy
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 22:08 ` Dave Kleikamp
@ 2007-03-15 1:36 ` Xiaoning Ding
2007-03-15 5:22 ` Ashif Harji
0 siblings, 1 reply; 41+ messages in thread
From: Xiaoning Ding @ 2007-03-15 1:36 UTC (permalink / raw)
To: Dave Kleikamp
Cc: Andreas Mohr, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
linux-kernel, akpm
Dave Kleikamp wrote:
> On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
>> Hi,
>>
>> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
>>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>>> This patch unconditionally calls mark_page_accessed to prevent pages,
>>>> especially for small files, from being evicted from the page cache despite
>>>> frequent access.
>>> I guess the downside to this is if a reader is reading a large file, or
>>> several files, sequentially with a small read size (smaller than
>>> PAGE_SIZE), the pages will be marked active after just one read pass.
>>> My gut says the benefits of this patch outweigh the cost. I would
>>> expect real-world backup apps, etc. to read at least PAGE_SIZE.
>> I also think that the patch is somewhat problematic, since the original
>> intention seems to have been a reduction of the number of (expensive?)
>> mark_page_accessed() calls,
>
> mark_page_accessed() isn't expensive. If called repeatedly, starting
> with the third call, it will check two page flags and return. The only
> real expense is that the page appears busier than it may be and will be
> retained in memory longer than it should.
>
If we allow mark_page_accessed() called multiple times for a single page,
a scan of large file with small-size reads would flush the buffer cache.
mark_page_accessed() also requests lru_lock when moving page from
inactive_list to active_list. It may also increase lock contention.
>> but this of course falls flat on its face in case
>> of permanent single-page accesses or accesses with progressing but very small
>> read size (single-byte reads or so), since the cached page content will expire
>> eventually due to lack of mark_page_accessed() updates; thus this patch
>> decided to call mark_page_accessed() unconditionally which may be a large
>> performance penalty for subsequent tiny-sized reads.
>
> Any application doing many tiny-sized reads isn't exactly asking for
> great performance.
>
>> I've been thinking hard how to avoid the mark_page_accessed() starvation in
>> case of a fixed, (almost) non-changing access state, but this seems hard since
>> it'd seem we need some kind of state management here to figure out good
>> intervals of when to call mark_page_accessed() *again* for this page. E.g.
>> despite non-changing access patterns you could still call mark_page_accessed()
>> every 32 calls or so to avoid expiry, but this would need extra helper
>> variables.
>>
>> A rather ugly way to do it may be to abuse ra.cache_hit or ra.mmap_hit content
>> with a
>> if ((prev_index != index) || (ra.cache_hit % 32 == 0))
>> mark_page_accessed(page);
>> This assumes that ra.cache_hit gets incremented for every access (haven't
>> checked whether this is the case).
>> That way (combined with an enhanced comment properly explaining the dilemma)
>> you would avoid most mark_page_accessed() invocations of subsequent same-page reads
>> but still do page status updates from time to time to avoid page deprecation.
>>
>> Does anyone think this would be acceptable? Any better idea?
>
> I wouldn't go looking for anything more complicated than Ashif's patch,
> unless testing shows it to be harmful in some realistic workload.
>
>> Andreas Mohr
>>
>> P.S.: since I'm not too familiar with this area I could be rather wrong after all...
>
> I could be missing something as well. :-)
>
> Shaggy
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 1:36 ` Xiaoning Ding
@ 2007-03-15 5:22 ` Ashif Harji
2007-03-15 12:46 ` Dave Kleikamp
2007-03-15 19:07 ` Andrew Morton
0 siblings, 2 replies; 41+ messages in thread
From: Ashif Harji @ 2007-03-15 5:22 UTC (permalink / raw)
To: Xiaoning Ding
Cc: Dave Kleikamp, Andreas Mohr, linux-mm, Nick Piggin, Jan Kara,
linux-kernel, akpm
On Wed, 14 Mar 2007, Xiaoning Ding wrote:
> Dave Kleikamp wrote:
>> On Wed, 2007-03-14 at 22:33 +0100, Andreas Mohr wrote:
>>> Hi,
>>>
>>> On Wed, Mar 14, 2007 at 03:55:41PM -0500, Dave Kleikamp wrote:
>>>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>>>> This patch unconditionally calls mark_page_accessed to prevent pages,
>>>>> especially for small files, from being evicted from the page cache
>>>>> despite frequent access.
>>>> I guess the downside to this is if a reader is reading a large file, or
>>>> several files, sequentially with a small read size (smaller than
>>>> PAGE_SIZE), the pages will be marked active after just one read pass.
>>>> My gut says the benefits of this patch outweigh the cost. I would
>>>> expect real-world backup apps, etc. to read at least PAGE_SIZE.
>>> I also think that the patch is somewhat problematic, since the original
>>> intention seems to have been a reduction of the number of (expensive?)
>>> mark_page_accessed() calls,
>>
>> mark_page_accessed() isn't expensive. If called repeatedly, starting
>> with the third call, it will check two page flags and return. The only
>> real expense is that the page appears busier than it may be and will be
>> retained in memory longer than it should.
>>
> If we allow mark_page_accessed() called multiple times for a single page,
> a scan of large file with small-size reads would flush the buffer cache.
> mark_page_accessed() also requests lru_lock when moving page from
> inactive_list to active_list. It may also increase lock contention.
The problem with the existing logic is that it is too coarse. In trying
to deal with one usage pattern it is negatively impacting performance for
other reasonable access patterns.
Further, consider the extreme case of scanning a file 1 byte at a time.
In this case, you are going to access a page over 4000 times, but that
page is not going to be marked as active and hence that page is likely to
be evicted from the cache. Clearly, there are cases when scanning a file
that you would like the pages to be kept in the cache.
Finally, the existing code is problematic as there is no reasonable way to
circumvent the negative impact for small files.
Hence, I think a change is necessary. The question is whether the
intent of conditionally calling mark_page_accessed() is still reasonable
and whether the amount of bookkeeping required to detect that usage
pattern but not create a problem for other usage patterns is reasonable.
I would tend to agree with David that: "Any application doing many
tiny-sized reads isn't exactly asking for great performance." As well,
applications concerned with performance and caching problems can read in a
file in PAGE_SIZE chunks. I still think the simple fix of removing the
condition is the best approach, but I'm certainly open to alternatives.
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 19:58 ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
2007-03-14 20:55 ` Dave Kleikamp
@ 2007-03-15 10:39 ` Peter Zijlstra
2007-03-15 12:38 ` Nick Piggin
2007-03-15 15:56 ` Chuck Ebbert
2 siblings, 1 reply; 41+ messages in thread
From: Peter Zijlstra @ 2007-03-15 10:39 UTC (permalink / raw)
To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> This patch unconditionally calls mark_page_accessed to prevent pages,
> especially for small files, from being evicted from the page cache despite
> frequent access.
Since we're hackling over the use-once stuff again...
/me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
ducks.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 10:39 ` Peter Zijlstra
@ 2007-03-15 12:38 ` Nick Piggin
2007-03-15 15:06 ` Rik van Riel
0 siblings, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 12:38 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm
On Thu, Mar 15, 2007 at 11:39:14AM +0100, Peter Zijlstra wrote:
> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
> > This patch unconditionally calls mark_page_accessed to prevent pages,
> > especially for small files, from being evicted from the page cache despite
> > frequent access.
>
> Since we're hackling over the use-once stuff again...
>
> /me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
> ducks.
Join the club ;)
http://groups.google.com.au/group/linux.kernel/msg/7b3237b8e715475b?hl=en&
I can't find the patch where I actually did combine it with a PG_usedonce
bit, but the end result is pretty similar to your patch. And I think one
or two others have also independently invented the same thing.
So it *has* to be good, doesn't it? ;)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 5:22 ` Ashif Harji
@ 2007-03-15 12:46 ` Dave Kleikamp
2007-03-15 12:50 ` Nick Piggin
2007-03-15 19:07 ` Andrew Morton
1 sibling, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-15 12:46 UTC (permalink / raw)
To: Ashif Harji
Cc: Xiaoning Ding, Andreas Mohr, linux-mm, Nick Piggin, Jan Kara,
linux-kernel, akpm
On Thu, 2007-03-15 at 01:22 -0400, Ashif Harji wrote:
> I would tend to agree with David that: "Any application doing many
> tiny-sized reads isn't exactly asking for great performance." As well,
> applications concerned with performance and caching problems can read in a
> file in PAGE_SIZE chunks. I still think the simple fix of removing the
> condition is the best approach, but I'm certainly open to alternatives.
A possible alternative might be to store the offset within the page in
the readahead state, and call mark_page_accessed() when the read offset
is less than or equal to the previous offset.
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 12:46 ` Dave Kleikamp
@ 2007-03-15 12:50 ` Nick Piggin
0 siblings, 0 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 12:50 UTC (permalink / raw)
To: Dave Kleikamp
Cc: Ashif Harji, Xiaoning Ding, Andreas Mohr, linux-mm, Jan Kara,
linux-kernel, akpm
On Thu, Mar 15, 2007 at 07:46:56AM -0500, Dave Kleikamp wrote:
> On Thu, 2007-03-15 at 01:22 -0400, Ashif Harji wrote:
>
> > I would tend to agree with David that: "Any application doing many
> > tiny-sized reads isn't exactly asking for great performance." As well,
> > applications concerned with performance and caching problems can read in a
> > file in PAGE_SIZE chunks. I still think the simple fix of removing the
> > condition is the best approach, but I'm certainly open to alternatives.
>
> A possible alternative might be to store the offset within the page in
> the readahead state, and call mark_page_accessed() when the read offset
> is less than or equal to the previous offset.
That could be a good idea.
We definitely want to look at ways to solve with within the existing
approach before any large scale change in behaviour.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 21:33 ` Andreas Mohr
2007-03-14 22:08 ` Dave Kleikamp
@ 2007-03-15 15:00 ` Rik van Riel
2007-03-15 17:37 ` Valdis.Kletnieks
2 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 15:00 UTC (permalink / raw)
To: Andreas Mohr
Cc: Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
linux-kernel, akpm
Andreas Mohr wrote:
> I've been thinking hard how to avoid the mark_page_accessed() starvation in
> case of a fixed, (almost) non-changing access state, but this seems hard since
> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page.
Like this? :)
http://surriel.com/patches/clockpro/2.6.12/useonce-cleanup
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 12:38 ` Nick Piggin
@ 2007-03-15 15:06 ` Rik van Riel
0 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 15:06 UTC (permalink / raw)
To: Nick Piggin
Cc: Peter Zijlstra, Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm
Nick Piggin wrote:
> On Thu, Mar 15, 2007 at 11:39:14AM +0100, Peter Zijlstra wrote:
>> On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote:
>>> This patch unconditionally calls mark_page_accessed to prevent pages,
>>> especially for small files, from being evicted from the page cache despite
>>> frequent access.
>> Since we're hackling over the use-once stuff again...
>>
>> /me brings up: http://marc.info/?l=linux-mm&m=115316894804385&w=2 and
>> ducks.
>
> Join the club ;)
>
> http://groups.google.com.au/group/linux.kernel/msg/7b3237b8e715475b?hl=en&
>
> I can't find the patch where I actually did combine it with a PG_usedonce
> bit, but the end result is pretty similar to your patch.
Except for the bit where vmscan throws away page cache pages
with PageReferenced set, unless they are mapped :)
Also, pages that only got accessed once will have the referenced
bit set too, so your patch gives no way to distinguish between
pages that were accessed once and pages that were accessed multiple
times.
> And I think one
> or two others have also independently invented the same thing.
>
> So it *has* to be good, doesn't it? ;)
No argument there :)
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 19:58 ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
2007-03-14 20:55 ` Dave Kleikamp
2007-03-15 10:39 ` Peter Zijlstra
@ 2007-03-15 15:56 ` Chuck Ebbert
2007-03-15 16:29 ` Nick Piggin
2 siblings, 1 reply; 41+ messages in thread
From: Chuck Ebbert @ 2007-03-15 15:56 UTC (permalink / raw)
To: Ashif Harji; +Cc: linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 301 bytes --]
Ashif Harji wrote:
>
> This patch unconditionally calls mark_page_accessed to prevent pages,
> especially for small files, from being evicted from the page cache
> despite frequent access.
>
> Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
>
I like mine better -- it leaves the comment:
[-- Attachment #2: linux-2.6-fix_mark_page_accessed.patch --]
[-- Type: text/plain, Size: 1080 bytes --]
From: Chuck Ebbert <cebbert@redhat.com>
Always mark page as accessed when reading multiple times.
Original idea and patch by Ashif Harji <asharji@cs.uwaterloo.ca>
Signed-off-by: Chuck Ebbert <cebbert@redhat.com>
--- 2.6.20.2-t.orig/mm/filemap.c
+++ 2.6.20.2-t/mm/filemap.c
@@ -887,7 +887,6 @@
unsigned long offset;
unsigned long last_index;
unsigned long next_index;
- unsigned long prev_index;
loff_t isize;
struct page *cached_page;
int error;
@@ -896,7 +895,6 @@
cached_page = NULL;
index = *ppos >> PAGE_CACHE_SHIFT;
next_index = index;
- prev_index = ra.prev_page;
last_index = (*ppos + desc->count + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT;
offset = *ppos & ~PAGE_CACHE_MASK;
@@ -945,11 +943,9 @@
/*
* When (part of) the same page is read multiple times
- * in succession, only mark it as accessed the first time.
+ * in succession, always mark it accessed.
*/
- if (prev_index != index)
- mark_page_accessed(page);
- prev_index = index;
+ mark_page_accessed(page);
/*
* Ok, we have the page, and it's up-to-date, so
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 15:56 ` Chuck Ebbert
@ 2007-03-15 16:29 ` Nick Piggin
2007-03-15 17:04 ` Rik van Riel
` (2 more replies)
0 siblings, 3 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 16:29 UTC (permalink / raw)
To: Chuck Ebbert; +Cc: Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm
On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> Ashif Harji wrote:
> >
> > This patch unconditionally calls mark_page_accessed to prevent pages,
> > especially for small files, from being evicted from the page cache
> > despite frequent access.
> >
> > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
> >
>
> I like mine better -- it leaves the comment:
How about this? It also doesn't break the use-once heuristic.
--
A change to make database style random read() workloads perform better, by
calling mark_page_accessed for some non-page-aligned reads broke the case of
< PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
first page.
Combine both heuristics for marking the page accessed.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -929,7 +929,7 @@ page_ok:
* When (part of) the same page is read multiple times
* in succession, only mark it as accessed the first time.
*/
- if (prev_index != index)
+ if (prev_index != index || !offset)
mark_page_accessed(page);
prev_index = index;
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 16:29 ` Nick Piggin
@ 2007-03-15 17:04 ` Rik van Riel
2007-03-15 17:44 ` Hugh Dickins
2007-03-15 19:55 ` Ashif Harji
2 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 17:04 UTC (permalink / raw)
To: Nick Piggin
Cc: Chuck Ebbert, Ashif Harji, linux-mm, Jan Kara, linux-kernel, akpm
Nick Piggin wrote:
> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
>
> Combine both heuristics for marking the page accessed.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 21:33 ` Andreas Mohr
2007-03-14 22:08 ` Dave Kleikamp
2007-03-15 15:00 ` Rik van Riel
@ 2007-03-15 17:37 ` Valdis.Kletnieks
2007-03-15 18:35 ` Rik van Riel
2 siblings, 1 reply; 41+ messages in thread
From: Valdis.Kletnieks @ 2007-03-15 17:37 UTC (permalink / raw)
To: Andreas Mohr
Cc: Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin, Jan Kara,
linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 548 bytes --]
On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
> it'd seem we need some kind of state management here to figure out good
> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> despite non-changing access patterns you could still call mark_page_accessed(
)
> every 32 calls or so to avoid expiry, but this would need extra helper
> variables.
What if you did something like
if (jiffies%32) {...
(Possibly scaling it so the low-order bits change). No need to lock it, as
"right most of the time" is close enough.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 16:29 ` Nick Piggin
2007-03-15 17:04 ` Rik van Riel
@ 2007-03-15 17:44 ` Hugh Dickins
2007-03-15 20:01 ` Nick Piggin
2007-03-15 22:59 ` Andrea Arcangeli
2007-03-15 19:55 ` Ashif Harji
2 siblings, 2 replies; 41+ messages in thread
From: Hugh Dickins @ 2007-03-15 17:44 UTC (permalink / raw)
To: Nick Piggin
Cc: Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg, linux-mm,
Jan Kara, linux-kernel, akpm
On Thu, 15 Mar 2007, Nick Piggin wrote:
> On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> > Ashif Harji wrote:
> > >
> > > This patch unconditionally calls mark_page_accessed to prevent pages,
> > > especially for small files, from being evicted from the page cache
> > > despite frequent access.
> > >
> > > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
Yeah, yeah, I'm not a real mman, I don't have my own patch and
website for this ;) but I'm old, let me mumble some history...
Ashif's patch would take us back to 2.4.10 when mark_page_accessed
was introduced: in 2.4.11 someone (probably Andrea) immediately
added a !offset || !filp->f_reada condition on it there, which
remains in 2.4 to this day. That _probably_ means that Ashif's
patch is suboptimal, and that your !offset patch is good.
f_reada went away in 2.5.8, and the !offset condition remained
until 2.6.11, when Miquel (CC'ed) replaced it by today's prev_index
condition. His changelog entry appended below. Since it's Miquel
who removed the !offset condition, he should be consulted on its
reintroduction.
Hugh
> >
> > I like mine better -- it leaves the comment:
>
> How about this? It also doesn't break the use-once heuristic.
>
> --
> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
>
> Combine both heuristics for marking the page accessed.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -929,7 +929,7 @@ page_ok:
> * When (part of) the same page is read multiple times
> * in succession, only mark it as accessed the first time.
> */
> - if (prev_index != index)
> + if (prev_index != index || !offset)
> mark_page_accessed(page);
> prev_index = index;
>
Miquel's patch comment from ChangeLog-2.6.11:
[PATCH] mark_page_accessed() for read()s on non-page boundaries
When reading a (partial) page from disk using read(), the kernel only marks
the page as "accessed" if the read started at a page boundary. This means
that files that are accessed randomly at non-page boundaries (usually
database style files) will not be cached properly.
The patch below uses the readahead state instead. If a page is read(), it
is marked as "accessed" if the previous read() was for a different page,
whatever the offset in the page.
Testing results:
- Boot kernel with mem=128M
- create a testfile of size 8 MB on a partition. Unmount/mount.
- then generate about 10 MB/sec streaming writes
for i in `seq 1 1000`
do
dd if=/dev/zero of=junkfile.$i bs=1M count=10
sync
cat junkfile.$i > /dev/null
sleep 1
done
- use an application that reads 128 bytes 64000 times from a
random offset in the 64 MB testfile.
1. Linux 2.6.10-rc3 vanilla, no streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.22s system 5% cpu 4.456 total
2. Linux 2.6.10-rc3 vanilla, streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.16s system 2% cpu 7.667 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.37s system 1% cpu 23.294 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.99s system 1% cpu 1:11.52 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.21s system 2% cpu 10.273 total
3. Linux 2.6.10-rc3 with read-page-access.patch , streaming writes:
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.21s system 3% cpu 7.634 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.04s user 0.22s system 2% cpu 9.588 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.02s user 0.12s system 24% cpu 0.563 total
# time ~/rr testfile
Read 128 bytes 64000 times
~/rr testfile 0.03s user 0.13s system 98% cpu 0.163 total
As expected, with the read-page-access.patch, the kernel keeps the 8 MB
testfile cached as expected, while without it, it doesn't.
So this is useful for workloads where one smallish (wrt RAM) file is read
randomly over and over again (like heavily used database indexes), while
other I/O is going on. Plain 2.6 caches those files poorly, if the app
uses plain read().
Signed-Off-By: Miquel van Smoorenburg <miquels@cistron.nl>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 17:37 ` Valdis.Kletnieks
@ 2007-03-15 18:35 ` Rik van Riel
2007-03-16 3:51 ` Valdis.Kletnieks
0 siblings, 1 reply; 41+ messages in thread
From: Rik van Riel @ 2007-03-15 18:35 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
Jan Kara, linux-kernel, akpm
Valdis.Kletnieks@vt.edu wrote:
> On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
>
>> it'd seem we need some kind of state management here to figure out good
>> intervals of when to call mark_page_accessed() *again* for this page. E.g.
>> despite non-changing access patterns you could still call mark_page_accessed(
> )
>> every 32 calls or so to avoid expiry, but this would need extra helper
>> variables.
>
> What if you did something like
>
> if (jiffies%32) {...
>
> (Possibly scaling it so the low-order bits change). No need to lock it, as
> "right most of the time" is close enough.
Bad idea. That way you would only count page accesses if the
phase of the moon^Wjiffie is just right.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 5:22 ` Ashif Harji
2007-03-15 12:46 ` Dave Kleikamp
@ 2007-03-15 19:07 ` Andrew Morton
2007-03-15 21:49 ` Andrea Arcangeli
1 sibling, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2007-03-15 19:07 UTC (permalink / raw)
To: Ashif Harji; +Cc: dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel
> On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> I still think the simple fix of removing the
> condition is the best approach, but I'm certainly open to alternatives.
Yes, the problem of falsely activating pages when the file is read in small
hunks is worse than the problem which your patch fixes.
We could change it so that if the current read() includes the zeroeth byte
of the page, we run mark_page_accessed() even if this_page==prev_page?
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 16:29 ` Nick Piggin
2007-03-15 17:04 ` Rik van Riel
2007-03-15 17:44 ` Hugh Dickins
@ 2007-03-15 19:55 ` Ashif Harji
2007-03-15 20:07 ` Nick Piggin
2 siblings, 1 reply; 41+ messages in thread
From: Ashif Harji @ 2007-03-15 19:55 UTC (permalink / raw)
To: Nick Piggin
Cc: Chuck Ebbert, linux-mm, Jan Kara, linux-kernel, akpm,
Andrew Morton, Rik van Riel
On Thu, 15 Mar 2007, Nick Piggin wrote:
> On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
>> Ashif Harji wrote:
>>>
>>> This patch unconditionally calls mark_page_accessed to prevent pages,
>>> especially for small files, from being evicted from the page cache
>>> despite frequent access.
>>>
>>> Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
>>>
>>
>> I like mine better -- it leaves the comment:
>
> How about this? It also doesn't break the use-once heuristic.
>
> --
> A change to make database style random read() workloads perform better, by
> calling mark_page_accessed for some non-page-aligned reads broke the case of
> < PAGE_CACHE_SIZE files, which will not get their prev_index moved past the
> first page.
>
> Combine both heuristics for marking the page accessed.
>
> Signed-off-by: Nick Piggin <npiggin@suse.de>
>
> Index: linux-2.6/mm/filemap.c
> ===================================================================
> --- linux-2.6.orig/mm/filemap.c
> +++ linux-2.6/mm/filemap.c
> @@ -929,7 +929,7 @@ page_ok:
> * When (part of) the same page is read multiple times
> * in succession, only mark it as accessed the first time.
> */
> - if (prev_index != index)
> + if (prev_index != index || !offset)
> mark_page_accessed(page);
> prev_index = index;
>
>
It sounds like people are happy with the fix suggested by Nick. That fix
is okay with me as it fixes the problem I am having.
I suspect, however, that by not directly detecting the problematic access
pattern, where the file is accessed sequentially in small hunks, other
applications may experience performance problems related to caching. For
example, if an application frequently and non-sequentially reads from the
same page. This is especially true for files of size < PAGE_CACHE_SIZE.
But, I'm not sure if such an access pattern likely.
ashif.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 17:44 ` Hugh Dickins
@ 2007-03-15 20:01 ` Nick Piggin
2007-03-15 22:59 ` Andrea Arcangeli
1 sibling, 0 replies; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 20:01 UTC (permalink / raw)
To: Hugh Dickins
Cc: Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg, linux-mm,
Jan Kara, linux-kernel, akpm
On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> On Thu, 15 Mar 2007, Nick Piggin wrote:
> > On Thu, Mar 15, 2007 at 11:56:59AM -0400, Chuck Ebbert wrote:
> > > Ashif Harji wrote:
> > > >
> > > > This patch unconditionally calls mark_page_accessed to prevent pages,
> > > > especially for small files, from being evicted from the page cache
> > > > despite frequent access.
> > > >
> > > > Signed-off-by: Ashif Harji <asharji@beta.uwaterloo.ca>
>
> Yeah, yeah, I'm not a real mman, I don't have my own patch and
> website for this ;) but I'm old, let me mumble some history...
>
> Ashif's patch would take us back to 2.4.10 when mark_page_accessed
> was introduced: in 2.4.11 someone (probably Andrea) immediately
> added a !offset || !filp->f_reada condition on it there, which
> remains in 2.4 to this day. That _probably_ means that Ashif's
> patch is suboptimal, and that your !offset patch is good.
>
> f_reada went away in 2.5.8, and the !offset condition remained
> until 2.6.11, when Miquel (CC'ed) replaced it by today's prev_index
> condition. His changelog entry appended below. Since it's Miquel
> who removed the !offset condition, he should be consulted on its
> reintroduction.
Yeah I did go back and check up on that changelog, because I knew
we had a !offset check there at one stage, which is immune to this
problem (or at least can handle it a little better).
I suspect that Miquel was probably more interested in _increasing_
mark_page_accessed coverage with his new condition than restricting
it from the !offset cases.
Thanks for digging it up and posting here, though.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 19:55 ` Ashif Harji
@ 2007-03-15 20:07 ` Nick Piggin
2007-03-15 20:31 ` Andreas Mohr
0 siblings, 1 reply; 41+ messages in thread
From: Nick Piggin @ 2007-03-15 20:07 UTC (permalink / raw)
To: Ashif Harji
Cc: Chuck Ebbert, linux-mm, Jan Kara, linux-kernel, akpm, Rik van Riel
On Thu, Mar 15, 2007 at 03:55:08PM -0400, Ashif Harji wrote:
>
> It sounds like people are happy with the fix suggested by Nick. That fix
> is okay with me as it fixes the problem I am having.
>
> I suspect, however, that by not directly detecting the problematic access
> pattern, where the file is accessed sequentially in small hunks, other
> applications may experience performance problems related to caching. For
> example, if an application frequently and non-sequentially reads from the
> same page. This is especially true for files of size < PAGE_CACHE_SIZE.
> But, I'm not sure if such an access pattern likely.
Well in general we like to help applications that help themselves. It
is actually a good heuristic, surprisingly. If an application randomly
accesses the same page (and there is no write activity going on), then
it would be better off to cache it in userspace, and if it doesn't care
to do that then it won't mind having to read it off disk now and again :)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 20:07 ` Nick Piggin
@ 2007-03-15 20:31 ` Andreas Mohr
0 siblings, 0 replies; 41+ messages in thread
From: Andreas Mohr @ 2007-03-15 20:31 UTC (permalink / raw)
To: Nick Piggin
Cc: Ashif Harji, Chuck Ebbert, linux-mm, Jan Kara, linux-kernel,
akpm, Rik van Riel
Hi,
On Thu, Mar 15, 2007 at 09:07:39PM +0100, Nick Piggin wrote:
> Well in general we like to help applications that help themselves. It
> is actually a good heuristic, surprisingly. If an application randomly
> accesses the same page (and there is no write activity going on), then
> it would be better off to cache it in userspace, and if it doesn't care
> to do that then it won't mind having to read it off disk now and again :)
Sounds like a good plan since this probably is a nice way to make stupid apps
doing stupid things sit up and take notice, and maybe the authors will then go
so far as fixing up a few more things that are hurting them once they actually
recognize that something is weird due to overly bad performance.
Why go to great lengths to support stupid apps when there are still so many things
which could be done to help well-behaving ones? ;)
Andreas Mohr
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 19:07 ` Andrew Morton
@ 2007-03-15 21:49 ` Andrea Arcangeli
2007-03-15 22:06 ` Andrew Morton
0 siblings, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 21:49 UTC (permalink / raw)
To: Andrew Morton
Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel
On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > I still think the simple fix of removing the
> > condition is the best approach, but I'm certainly open to alternatives.
>
> Yes, the problem of falsely activating pages when the file is read in small
> hunks is worse than the problem which your patch fixes.
Really? I would have expected all performance sensitive apps to read
in >=PAGE_SIZE chunks. And if they don't because they split their
dataset in blocks (like some database), it may not be so wrong to
activate those pages that have two "hot" blocks more aggressively than
those pages with a single hot block.
So I've an hard time to advocate to prefer the current behavior, but
certainly this can be "fixed" by caching the last_offset like others
pointed out ;)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 21:49 ` Andrea Arcangeli
@ 2007-03-15 22:06 ` Andrew Morton
2007-03-15 23:15 ` Andrea Arcangeli
0 siblings, 1 reply; 41+ messages in thread
From: Andrew Morton @ 2007-03-15 22:06 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel
On Thu, 15 Mar 2007 22:49:23 +0100
Andrea Arcangeli <andrea@suse.de> wrote:
> On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > > I still think the simple fix of removing the
> > > condition is the best approach, but I'm certainly open to alternatives.
> >
> > Yes, the problem of falsely activating pages when the file is read in small
> > hunks is worse than the problem which your patch fixes.
>
> Really? I would have expected all performance sensitive apps to read
> in >=PAGE_SIZE chunks. And if they don't because they split their
> dataset in blocks (like some database), it may not be so wrong to
> activate those pages that have two "hot" blocks more aggressively than
> those pages with a single hot block.
But the problem which is being fixed here is really obscure: an application
repeatedly reading the first page and only the first page of a file, always
via the same fd.
I'd expect that the sub-page-size read scenarion happens heaps more often
than that, especially when dealing with larger PAGE_SIZEs.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 17:44 ` Hugh Dickins
2007-03-15 20:01 ` Nick Piggin
@ 2007-03-15 22:59 ` Andrea Arcangeli
2007-03-15 23:15 ` Dave Kleikamp
1 sibling, 1 reply; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 22:59 UTC (permalink / raw)
To: Hugh Dickins
Cc: Nick Piggin, Chuck Ebbert, Ashif Harji, Miquel van Smoorenburg,
linux-mm, Jan Kara, linux-kernel, akpm
On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> who removed the !offset condition, he should be consulted on its
> reintroduction.
the !offset check looks a pretty broken heuristic indeed, it would
break random I/O. The real fix is to add a ra.prev_offset along with
ra.prev_page, and if who implements it wants to be stylish he can as
well use a ra.last_contiguous_read structure that has a page and
offset fields (and then of course remove ra.prev_page).
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 22:06 ` Andrew Morton
@ 2007-03-15 23:15 ` Andrea Arcangeli
0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 23:15 UTC (permalink / raw)
To: Andrew Morton
Cc: Ashif Harji, dingxn, shaggy, andi, linux-mm, npiggin, jack, linux-kernel
On Thu, Mar 15, 2007 at 03:06:01PM -0700, Andrew Morton wrote:
> On Thu, 15 Mar 2007 22:49:23 +0100
> Andrea Arcangeli <andrea@suse.de> wrote:
>
> > On Thu, Mar 15, 2007 at 11:07:35AM -0800, Andrew Morton wrote:
> > > > On Thu, 15 Mar 2007 01:22:45 -0400 (EDT) Ashif Harji <asharji@cs.uwaterloo.ca> wrote:
> > > > I still think the simple fix of removing the
> > > > condition is the best approach, but I'm certainly open to alternatives.
> > >
> > > Yes, the problem of falsely activating pages when the file is read in small
> > > hunks is worse than the problem which your patch fixes.
> >
> > Really? I would have expected all performance sensitive apps to read
> > in >=PAGE_SIZE chunks. And if they don't because they split their
> > dataset in blocks (like some database), it may not be so wrong to
> > activate those pages that have two "hot" blocks more aggressively than
> > those pages with a single hot block.
>
> But the problem which is being fixed here is really obscure: an application
> repeatedly reading the first page and only the first page of a file, always
> via the same fd.
>
> I'd expect that the sub-page-size read scenarion happens heaps more often
> than that, especially when dealing with larger PAGE_SIZEs.
Whatever that app is doing, clearly we have to keep those 4k in cache!
Like obviously the specweb demonstrated that as long as you are
_repeating_ the same read, it's correct to activate the page even if
it was reading from the same page as before.
What is wrong is to activate the page more aggressively if it's
_different_ parts of the page that are being read in a contiguous
way. I thought that the whole point of the ra.prev_page was to detect
_contiguous_ (not random) I/O made with a small buffer, anything else
doesn't make much sense to me.
In short I think taking a ra.prev_offset into account as suggested by
Dave Kleikamp is the best, it may actually benefit the obscure app too ;)
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 22:59 ` Andrea Arcangeli
@ 2007-03-15 23:15 ` Dave Kleikamp
2007-03-15 23:28 ` Andrea Arcangeli
0 siblings, 1 reply; 41+ messages in thread
From: Dave Kleikamp @ 2007-03-15 23:15 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Hugh Dickins, Nick Piggin, Chuck Ebbert, Ashif Harji,
Miquel van Smoorenburg, linux-mm, Jan Kara, linux-kernel, akpm
On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> > who removed the !offset condition, he should be consulted on its
> > reintroduction.
>
> the !offset check looks a pretty broken heuristic indeed, it would
> break random I/O.
I wouldn't call it broken. At worst, I'd say it's imperfect. But
that's the nature of a heuristic. It most likely works in a huge
majority of cases.
> The real fix is to add a ra.prev_offset along with
> ra.prev_page, and if who implements it wants to be stylish he can as
> well use a ra.last_contiguous_read structure that has a page and
> offset fields (and then of course remove ra.prev_page).
I suggested something along these lines, but I wonder if it's overkill.
The !offset check is simple and appears to be a decent improvement over
the current code.
--
David Kleikamp
IBM Linux Technology Center
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 23:15 ` Dave Kleikamp
@ 2007-03-15 23:28 ` Andrea Arcangeli
0 siblings, 0 replies; 41+ messages in thread
From: Andrea Arcangeli @ 2007-03-15 23:28 UTC (permalink / raw)
To: Dave Kleikamp
Cc: Hugh Dickins, Nick Piggin, Chuck Ebbert, Ashif Harji,
Miquel van Smoorenburg, linux-mm, Jan Kara, linux-kernel, akpm
On Thu, Mar 15, 2007 at 06:15:45PM -0500, Dave Kleikamp wrote:
> On Thu, 2007-03-15 at 23:59 +0100, Andrea Arcangeli wrote:
> > On Thu, Mar 15, 2007 at 05:44:01PM +0000, Hugh Dickins wrote:
> > > who removed the !offset condition, he should be consulted on its
> > > reintroduction.
> >
> > the !offset check looks a pretty broken heuristic indeed, it would
> > break random I/O.
>
> I wouldn't call it broken. At worst, I'd say it's imperfect. But
> that's the nature of a heuristic. It most likely works in a huge
> majority of cases.
well, IMHO in the huge majority of cases the prev_page check isn't
necessary in the first place (and IMHO it hurts a lot more than it can
help, as demonstrated by specweb, since we'll bite on the good guys to
help the bad guys).
The only case where I can imagine the prev_page to make sense is to
handle contiguous I/O made with a small buffer, so clearly an
inefficient code in the first place. But if this guy is reading with
<PAGE_SIZE buffer there's no guarantee that he's reading f_pos aligned
either, hence the need of taking last_offset into account too so at
least it's a "perfect" heuristic that will reliably detect contiguous
I/O no matter in what shape or form you execute it, as long as it is
contiguous I/O.
Any other variation of behavior besides the autodetection of
contiguous I/O run in whatever buffer/aligned form, should be mandated
by userland through fadvise/madvise IMHO or we run into the toes of
the good guys.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-15 18:35 ` Rik van Riel
@ 2007-03-16 3:51 ` Valdis.Kletnieks
2007-03-16 4:09 ` Rik van Riel
0 siblings, 1 reply; 41+ messages in thread
From: Valdis.Kletnieks @ 2007-03-16 3:51 UTC (permalink / raw)
To: Rik van Riel
Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
Jan Kara, linux-kernel, akpm
[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]
On Thu, 15 Mar 2007 14:35:17 EDT, Rik van Riel said:
> Valdis.Kletnieks@vt.edu wrote:
> > On Wed, 14 Mar 2007 22:33:17 BST, Andreas Mohr said:
> >
> >> it'd seem we need some kind of state management here to figure out good
> >> intervals of when to call mark_page_accessed() *again* for this page. E.g.
> >> despite non-changing access patterns you could still call mark_page_accessed()
> >> every 32 calls or so to avoid expiry, but this would need extra helper
> >> variables.
> >
> > What if you did something like
> >
> > if (jiffies%32) {...
> >
> > (Possibly scaling it so the low-order bits change). No need to lock it, as
> > "right most of the time" is close enough.
>
> Bad idea. That way you would only count page accesses if the
> phase of the moon^Wjiffie is just right.
On the other hand, Andreas suggested only marking it once every 32 calls,
but that required a helper variable. Statistically, jiffies%32 should
end up about the same as a helper variable %32.
This of course, if just calling mark_page_accessed() is actually expensive
enough that we don't want to do it unconditionally.
[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-16 3:51 ` Valdis.Kletnieks
@ 2007-03-16 4:09 ` Rik van Riel
0 siblings, 0 replies; 41+ messages in thread
From: Rik van Riel @ 2007-03-16 4:09 UTC (permalink / raw)
To: Valdis.Kletnieks
Cc: Andreas Mohr, Dave Kleikamp, Ashif Harji, linux-mm, Nick Piggin,
Jan Kara, linux-kernel, akpm
Valdis.Kletnieks@vt.edu wrote:
> On the other hand, Andreas suggested only marking it once every 32 calls,
> but that required a helper variable. Statistically, jiffies%32 should
> end up about the same as a helper variable %32.
>
> This of course, if just calling mark_page_accessed() is actually expensive
> enough that we don't want to do it unconditionally.
Not caching a needed page and having to wait for a disk seek
to complete will be *way* more expensive than any call to
mark_page_accessed().
A modern CPU can do somewhere on the order of 50 million
instructions in the time it takes to bring one page in from
disk.
However, this does not mean we should unconditionally call
mark_page_accessed(), since that could cause use to push
wanted data out of the cache because of one program that
does its streaming accesses in a strange way...
This is a situation where getting it right almost certainly
matters.
--
Politics is the struggle between those who want to make their country
the best in the world, and those who believe it already is. Each group
calls the other unpatriotic.
^ permalink raw reply [flat|nested] 41+ messages in thread
* Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed
2007-03-14 20:55 ` Dave Kleikamp
2007-03-14 21:33 ` Andreas Mohr
@ 2007-03-16 14:20 ` Anton Blanchard
1 sibling, 0 replies; 41+ messages in thread
From: Anton Blanchard @ 2007-03-16 14:20 UTC (permalink / raw)
To: Dave Kleikamp
Cc: Ashif Harji, linux-mm, Nick Piggin, Jan Kara, linux-kernel, akpm
Hi,
> I guess the downside to this is if a reader is reading a large file, or
> several files, sequentially with a small read size (smaller than
> PAGE_SIZE), the pages will be marked active after just one read pass.
> My gut says the benefits of this patch outweigh the cost. I would
> expect real-world backup apps, etc. to read at least PAGE_SIZE.
PAGE_SIZE being 8k on sparc, 16-64k on ia64 and potentially 64kb on
powerpc :)
Id expect a large percentage of files to be below that size.
Anton
^ permalink raw reply [flat|nested] 41+ messages in thread
end of thread, other threads:[~2007-03-16 14:23 UTC | newest]
Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-09 22:03 do_generic_mapping_read performance issue Ashif Harji
2007-03-12 14:20 ` Jan Kara
2007-03-12 14:39 ` Nick Piggin
2007-03-12 15:13 ` Jan Kara
2007-03-12 17:05 ` Ashif Harji
2007-03-12 17:35 ` Jan Kara
2007-03-13 18:43 ` Ashif Harji
2007-03-13 18:55 ` Jan Kara
2007-03-14 19:58 ` [PATCH] mm/filemap.c: unconditionally call mark_page_accessed Ashif Harji
2007-03-14 20:55 ` Dave Kleikamp
2007-03-14 21:33 ` Andreas Mohr
2007-03-14 22:08 ` Dave Kleikamp
2007-03-15 1:36 ` Xiaoning Ding
2007-03-15 5:22 ` Ashif Harji
2007-03-15 12:46 ` Dave Kleikamp
2007-03-15 12:50 ` Nick Piggin
2007-03-15 19:07 ` Andrew Morton
2007-03-15 21:49 ` Andrea Arcangeli
2007-03-15 22:06 ` Andrew Morton
2007-03-15 23:15 ` Andrea Arcangeli
2007-03-15 15:00 ` Rik van Riel
2007-03-15 17:37 ` Valdis.Kletnieks
2007-03-15 18:35 ` Rik van Riel
2007-03-16 3:51 ` Valdis.Kletnieks
2007-03-16 4:09 ` Rik van Riel
2007-03-16 14:20 ` Anton Blanchard
2007-03-15 10:39 ` Peter Zijlstra
2007-03-15 12:38 ` Nick Piggin
2007-03-15 15:06 ` Rik van Riel
2007-03-15 15:56 ` Chuck Ebbert
2007-03-15 16:29 ` Nick Piggin
2007-03-15 17:04 ` Rik van Riel
2007-03-15 17:44 ` Hugh Dickins
2007-03-15 20:01 ` Nick Piggin
2007-03-15 22:59 ` Andrea Arcangeli
2007-03-15 23:15 ` Dave Kleikamp
2007-03-15 23:28 ` Andrea Arcangeli
2007-03-15 19:55 ` Ashif Harji
2007-03-15 20:07 ` Nick Piggin
2007-03-15 20:31 ` Andreas Mohr
2007-03-12 16:46 ` do_generic_mapping_read performance issue Ashif Harji
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).