* Swappiness vs. mmap() and interactive response
@ 2009-04-28  4:44 Elladan
  2009-04-28  5:35   ` KOSAKI Motohiro
  2009-04-28 23:29   ` Rik van Riel
  0 siblings, 2 replies; 336+ messages in thread
From: Elladan @ 2009-04-28  4:44 UTC (permalink / raw)
  To: linux-kernel

Hi,

So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
and then I did the following (with XFS over LVM):

mv /500gig/of/data/on/disk/one /disk/two

This quickly caused the system to. grind.. to... a.... complete..... halt.
Basically every UI operation, including the mouse in Xorg, started experiencing
multiple second lag and delays.  This made the system essentially unusable --
for example, just flipping to the window where the "mv" command was running
took 10 seconds on more than one occasion.  Basically a "click and get coffee"
interface.

There was no particular kernel CPU load -- the SATA DMA seemed fine.

If I actively used the GUI, then the pieces I was using would work better, but
they'd start experiencing astonishing latency again if I just let the UI sit
for a little while.  From this, I diagnosed that the problem was probably
related to the VM paging out my GUI.

Next, I set the following:

echo 0 > /proc/sys/vm/swappiness

... hoping it would prevent paging out of the UI in favor of file data that's
only used once.  It did appear to help to a small degree, but not much.  The
system is still effectively unusable while a file copy is going on.
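
(Equivalently, through the sysctl interface: "sysctl -w vm.swappiness=0",
or a vm.swappiness=0 line in /etc/sysctl.conf to persist it across
reboots.)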

From this, I diagnosed that most likely, the kernel was paging out all my
application file mmap() data (such as my executables and shared libraries) in
favor of total garbage VM load from the file copy.

I don't know how to verify that this is true definitively.  Are there some
magic numbers in /proc I can look at?  However, I did run latencytop, and it
showed massive 2000+ msec latency in the page fault handler, as well as in
various operations such as XFS read.  
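
One rough way to check, assuming the 2.6.28-era counter names in
/proc/vmstat and /proc/meminfo (a sketch, not a definitive diagnostic):

  # Sample reclaim/fault counters during the copy: rising pgmajfault
  # together with a shrinking Active(file) suggests mapped executables
  # are being evicted in favor of the copy's once-used page cache.
  while sleep 5; do
      grep -E 'pgmajfault|pgsteal|pgscan' /proc/vmstat
      grep -E '^(Active|Inactive)' /proc/meminfo
      echo ---
  done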

Could this be something else?  There were some long delays in latencytop from
various apps doing fsync as well, but it seems unlikely that this would destroy
latency in Xorg, and again, latency improved whenever I touched an app, for
that app.

Is there any way to fix this, short of rewriting the VM myself?  For example,
is there some way I could convince this VM that pages with active mappings are
valuable?

Thanks.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  4:44 Swappiness vs. mmap() and interactive response Elladan
@ 2009-04-28  5:35   ` KOSAKI Motohiro
  2009-04-28 23:29   ` Rik van Riel
  1 sibling, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-28  5:35 UTC (permalink / raw)
  To: Elladan; +Cc: kosaki.motohiro, linux-kernel, linux-mm, Rik van Riel

(cc to linux-mm and Rik)


> Hi,
> 
> So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> and then I did the following (with XFS over LVM):
> 
> mv /500gig/of/data/on/disk/one /disk/two
> 
> This quickly caused the system to. grind.. to... a.... complete..... halt.
> Basically every UI operation, including the mouse in Xorg, started experiencing
> multiple second lag and delays.  This made the system essentially unusable --
> for example, just flipping to the window where the "mv" command was running
> took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> interface.

I have some questions and a request.

1. Please post your /proc/meminfo.
2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
3. Does the cache limitation of memcgroup solve this problem?
4. Which disk holds your /bin and /usr/bin?
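
For #2, a quick way to watch swap-out as it happens is plain procps
vmstat; the "si"/"so" columns show memory swapped in and out per second:

  vmstat 5    # non-zero "so" during the copy means swap-out is happening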



> 
> There was no particular kernel CPU load -- the SATA DMA seemed fine.
> 
> If I actively used the GUI, then the pieces I was using would work better, but
> they'd start experiencing astonishing latency again if I just let the UI sit
> for a little while.  From this, I diagnosed that the problem was probably
> related to the VM paging out my GUI.
> 
> Next, I set the following:
> 
> echo 0 > /proc/sys/vm/swappiness
> 
> ... hoping it would prevent paging out of the UI in favor of file data that's
> only used once.  It did appear to help to a small degree, but not much.  The
> system is still effectively unusable while a file copy is going on.
> 
> From this, I diagnosed that most likely, the kernel was paging out all my
> application file mmap() data (such as my executables and shared libraries) in
> favor of total garbage VM load from the file copy.
> 
> I don't know how to verify that this is true definitively.  Are there some
> magic numbers in /proc I can look at?  However, I did run latencytop, and it
> showed massive 2000+ msec latency in the page fault handler, as well as in
> various operations such as XFS read.  
> 
> Could this be something else?  There were some long delays in latencytop from
> various apps doing fsync as well, but it seems unlikely that this would destroy
> latency in Xorg, and again, latency improved whenever I touched an app, for
> that app.
> 
> Is there any way to fix this, short of rewriting the VM myself?  For example,
> is there some way I could convince this VM that pages with active mappings are
> valuable?
> 
> Thanks.




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  5:35   ` KOSAKI Motohiro
@ 2009-04-28  6:36     ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-28  6:36 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 02:35:29PM +0900, KOSAKI Motohiro wrote:
> (cc to linux-mm and Rik)
> 
> > Hi,
> > 
> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > and then I did the following (with XFS over LVM):
> > 
> > mv /500gig/of/data/on/disk/one /disk/two
> > 
> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > Basically every UI operation, including the mouse in Xorg, started experiencing
> > multiple second lag and delays.  This made the system essentially unusable --
> > for example, just flipping to the window where the "mv" command was running
> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> > interface.
> 
> I have some questions and a request.
> 
> 1. Please post your /proc/meminfo.
> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> 3. Does the cache limitation of memcgroup solve this problem?
> 4. Which disk holds your /bin and /usr/bin?

I'll answer these out of order if you don't mind.

2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?

The disks should be roughly similar.  However:

sda is the read disk, sdb is the write disk.  Here are a few snippets from iostat -xm 10:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              67.70     0.00  373.10    0.20    48.47     0.00   265.90     1.94    5.21   2.10  78.32
sdb               0.00  1889.60    0.00  139.80     0.00    52.52   769.34    35.01  250.45   5.17  72.28
---
sda               5.30     0.00  483.80    0.30    60.65     0.00   256.59     1.59    3.28   1.65  79.72
sdb               0.00  3632.70    0.00  171.10     0.00    61.10   731.39   117.09  709.66   5.84 100.00
---
sda              51.20     0.00  478.10    1.00    65.79     0.01   281.27     2.48    5.18   1.96  93.72
sdb               0.00  2104.60    0.00  174.80     0.00    62.84   736.28   108.50  613.64   5.72 100.00
--
sda             153.20     0.00  349.40    0.20    60.99     0.00   357.30     4.47   13.19   2.85  99.80
sdb               0.00  1766.50    0.00  158.60     0.00    59.89   773.34   110.07  672.25   6.30  99.96

This data seems to indicate that the IO performance varies, but the reader is usually faster.

4. Which disk holds your /bin and /usr/bin?

sda, the reader.

3. Does the cache limitation of memcgroup solve this problem?

I was unable to get this to work -- do you have some documentation handy?

1. Please post your /proc/meminfo.

$ cat /proc/meminfo 
MemTotal:        3467668 kB
MemFree:           20164 kB
Buffers:             204 kB
Cached:          2295232 kB
SwapCached:         4012 kB
Active:           639608 kB
Inactive:        2620880 kB
Active(anon):     608104 kB
Inactive(anon):   360812 kB
Active(file):      31504 kB
Inactive(file):  2260068 kB
Unevictable:           8 kB
Mlocked:               8 kB
SwapTotal:       4194296 kB
SwapFree:        4186968 kB
Dirty:            147280 kB
Writeback:          8424 kB
AnonPages:        961280 kB
Mapped:            39016 kB
Slab:              81904 kB
SReclaimable:      59044 kB
SUnreclaim:        22860 kB
PageTables:        20548 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     5928128 kB
Committed_AS:    1770348 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      281908 kB
VmallocChunk:   34359449059 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       44928 kB
DirectMap2M:     3622912 kB


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  6:36     ` Elladan
@ 2009-04-28  6:52       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-28  6:52 UTC (permalink / raw)
  To: Elladan; +Cc: kosaki.motohiro, linux-kernel, linux-mm, Rik van Riel

Hi

> 3. Does the cache limitation of memcgroup solve this problem?
> 
> I was unable to get this to work -- do you have some documentation handy?

Do you have the kernel source tarball?
Documentation/cgroups/memory.txt explains the usage nicely.
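
A minimal sketch of the setup, assuming the cgroup-v1 style memory
controller of 2.6.28 (the mount point and the 512M limit are arbitrary
values for illustration):

  mount -t cgroup -o memory none /cgroups
  mkdir /cgroups/copy
  echo 512M > /cgroups/copy/memory.limit_in_bytes
  echo $$ > /cgroups/copy/tasks    # move this shell into the group
  mv /500gig/of/data/on/disk/one /disk/two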





^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  6:52       ` KOSAKI Motohiro
@ 2009-04-28  7:26         ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-28  7:26 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 03:52:29PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > 3. Does the cache limitation of memcgroup solve this problem?
> > 
> > I was unable to get this to work -- do you have some documentation handy?
> 
> Do you have the kernel source tarball?
> Documentation/cgroups/memory.txt explains the usage nicely.

Thank you.  My documentation was out of date.

I created a cgroup with limited memory and placed a copy command in it, and the
latency problem seems to essentially go away.  However, I'm also a bit
suspicious that my test might have become invalid, since my IO performance
seems to have dropped somewhat too.

So, am I right in concluding that this more or less implicates bad page
replacement as the culprit?  After I dropped vm caches and let my working set
re-form, the memory cgroup seems to be effective at keeping a large pool of
memory free from file pressure.
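
For reference, dropping the VM caches is typically done through the
standard drop_caches knob (1 = page cache, 2 = slab caches, 3 = both):

  sync                                 # write out dirty pages first
  echo 3 > /proc/sys/vm/drop_caches    # drop clean page cache and slab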


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  7:26         ` Elladan
@ 2009-04-28  7:44           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-28  7:44 UTC (permalink / raw)
  To: Elladan; +Cc: kosaki.motohiro, linux-kernel, linux-mm, Rik van Riel

> On Tue, Apr 28, 2009 at 03:52:29PM +0900, KOSAKI Motohiro wrote:
> > Hi
> > 
> > > 3. Does the cache limitation of memcgroup solve this problem?
> > > 
> > > I was unable to get this to work -- do you have some documentation handy?
> > 
> > Do you have the kernel source tarball?
> > Documentation/cgroups/memory.txt explains the usage nicely.
> 
> Thank you.  My documentation was out of date.
> 
> I created a cgroup with limited memory and placed a copy command in it, and the
> latency problem seems to essentially go away.  However, I'm also a bit
> suspicious that my test might have become invalid, since my IO performance
> seems to have dropped somewhat too.
> 
> So, am I right in concluding that this more or less implicates bad page
> replacement as the culprit?  After I dropped vm caches and let my working set
> re-form, the memory cgroup seems to be effective at keeping a large pool of
> memory free from file pressure.

Hmm...
It seems your result means that bad page replacement is occurring, but
I actually haven't seen such a result in my environment.

Hmm, I think I need to build an environment that reproduces your
trouble.

Thanks.




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  5:35   ` KOSAKI Motohiro
@ 2009-04-28  7:48     ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-04-28  7:48 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> (cc to linux-mm and Rik)
> 
> 
> > Hi,
> > 
> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > and then I did the following (with XFS over LVM):
> > 
> > mv /500gig/of/data/on/disk/one /disk/two
> > 
> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > Basically every UI operation, including the mouse in Xorg, started experiencing
> > multiple second lag and delays.  This made the system essentially unusable --
> > for example, just flipping to the window where the "mv" command was running
> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> > interface.
> 
> I have some questions and a request.
> 
> 1. Please post your /proc/meminfo.
> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> 3. Does the cache limitation of memcgroup solve this problem?
> 4. Which disk holds your /bin and /usr/bin?
> 

FWIW I fundamentally object to 3 as being a solution.

I still think the idea of read-ahead driven drop-behind is a good one,
alas last time we brought that up people thought differently.
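
Drop-behind proper needs kernel support, but for comparison a copy can
sidestep the page cache entirely with O_DIRECT -- e.g., assuming a
coreutils dd with iflag/oflag support, and modulo O_DIRECT alignment
caveats (the paths are only illustrative):

  dd if=/disk/one/bigfile of=/disk/two/bigfile bs=8M iflag=direct oflag=direct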

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  7:48     ` Peter Zijlstra
@ 2009-04-28  7:58       ` Balbir Singh
  -1 siblings, 0 replies; 336+ messages in thread
From: Balbir Singh @ 2009-04-28  7:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 1:18 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
>> (cc to linux-mm and Rik)
>>
>>
>> > Hi,
>> >
>> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
>> > and then I did the following (with XFS over LVM):
>> >
>> > mv /500gig/of/data/on/disk/one /disk/two
>> >
>> > This quickly caused the system to. grind.. to... a.... complete..... halt.
>> > Basically every UI operation, including the mouse in Xorg, started experiencing
>> > multiple second lag and delays.  This made the system essentially unusable --
>> > for example, just flipping to the window where the "mv" command was running
>> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
>> > interface.
>>
>> I have some questions and a request.
>>
>> 1. Please post your /proc/meminfo.
>> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
>> 3. Does the cache limitation of memcgroup solve this problem?
>> 4. Which disk holds your /bin and /usr/bin?
>>
>
> FWIW I fundamentally object to 3 as being a solution.
>

Memory cgroups were not created to solve latency problems, but they do
isolate memory, and if that helps latency, I don't see why that is a
problem. I don't think isolating applications that we consider
unimportant, or that interfere or consume more resources than desired,
is a bad solution.

> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

I vaguely remember the patches, but can't recollect the details.

Balbir

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  7:48     ` Peter Zijlstra
@ 2009-04-28  8:03       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-28  8:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kosaki.motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

> > 1. Please post your /proc/meminfo.
> > 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> > 3. Does the cache limitation of memcgroup solve this problem?
> > 4. Which disk holds your /bin and /usr/bin?
> > 
> 
> FWIW I fundamentally object to 3 as being a solution.

Yes, I also think so.


> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

Hmm, sorry, I can't recall this patch. Do you have a pointer or URL?




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  7:58       ` Balbir Singh
@ 2009-04-28  8:11         ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-04-28  8:11 UTC (permalink / raw)
  To: Balbir Singh
  Cc: KOSAKI Motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, 2009-04-28 at 13:28 +0530, Balbir Singh wrote:
> On Tue, Apr 28, 2009 at 1:18 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> >> (cc to linux-mm and Rik)
> >>
> >>
> >> > Hi,
> >> >
> >> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> >> > and then I did the following (with XFS over LVM):
> >> >
> >> > mv /500gig/of/data/on/disk/one /disk/two
> >> >
> >> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> >> > Basically every UI operation, including the mouse in Xorg, started experiencing
> >> > multiple second lag and delays.  This made the system essentially unusable --
> >> > for example, just flipping to the window where the "mv" command was running
> >> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> >> > interface.
> >>
> >> I have some questions and a request.
> >>
> >> 1. Please post your /proc/meminfo.
> >> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> >> 3. Does the cache limitation of memcgroup solve this problem?
> >> 4. Which disk holds your /bin and /usr/bin?
> >>
> >
> > FWIW I fundamentally object to 3 as being a solution.
> >
> 
> Memory cgroups were not created to solve latency problems, but they do
> isolate memory, and if that helps latency, I don't see why that is a
> problem. I don't think isolating applications that we consider
> unimportant, or that interfere or consume more resources than desired,
> is a bad solution.

So being able to isolate is a good excuse for poor replacement these
days?

Also, exactly because it's isolated/limited, it's sub-optimal.


> > I still think the idea of read-ahead driven drop-behind is a good one,
> > alas last time we brought that up people thought differently.
> 
> I vaguely remember the patches, but can't recollect the details.

A quick google gave me this:

  http://lkml.org/lkml/2007/7/21/219



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  8:11         ` Peter Zijlstra
@ 2009-04-28  8:23           ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 336+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-04-28  8:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Balbir Singh, KOSAKI Motohiro, Elladan, linux-kernel, linux-mm,
	Rik van Riel

On Tue, 28 Apr 2009 10:11:32 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2009-04-28 at 13:28 +0530, Balbir Singh wrote:
> > On Tue, Apr 28, 2009 at 1:18 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> > >> (cc to linux-mm and Rik)
> > >>
> > >>
> > >> > Hi,
> > >> >
> > >> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > >> > and then I did the following (with XFS over LVM):
> > >> >
> > >> > mv /500gig/of/data/on/disk/one /disk/two
> > >> >
> > >> > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > >> > Basically every UI operation, including the mouse in Xorg, started experiencing
> > >> > multiple second lag and delays.  This made the system essentially unusable --
> > >> > for example, just flipping to the window where the "mv" command was running
> > >> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> > >> > interface.
> > >>
> > >> I have some questions and a request.
> > >>
> > >> 1. Please post your /proc/meminfo.
> > >> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> > >> 3. Does the cache limitation of memcgroup solve this problem?
> > >> 4. Which disk holds your /bin and /usr/bin?
> > >>
> > >
> > > FWIW I fundamentally object to 3 as being a solution.
> > >
> > 
> > Memory cgroups were not created to solve latency problems, but they do
> > isolate memory, and if that helps latency, I don't see why that is a
> > problem. I don't think isolating applications that we consider
> > unimportant, or that interfere or consume more resources than desired,
> > is a bad solution.
> 
> So being able to isolate is a good excuse for poor replacement these
> days?
> 
Only while the kernel can't tell what's going on and what's wanted.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  8:11         ` Peter Zijlstra
@ 2009-04-28  8:25           ` Balbir Singh
  -1 siblings, 0 replies; 336+ messages in thread
From: Balbir Singh @ 2009-04-28  8:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 1:41 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2009-04-28 at 13:28 +0530, Balbir Singh wrote:
>> On Tue, Apr 28, 2009 at 1:18 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
>> >> (cc to linux-mm and Rik)
>> >>
>> >>
>> >> > Hi,
>> >> >
>> >> > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
>> >> > and then I did the following (with XFS over LVM):
>> >> >
>> >> > mv /500gig/of/data/on/disk/one /disk/two
>> >> >
>> >> > This quickly caused the system to. grind.. to... a.... complete..... halt.
>> >> > Basically every UI operation, including the mouse in Xorg, started experiencing
>> >> > multiple second lag and delays.  This made the system essentially unusable --
>> >> > for example, just flipping to the window where the "mv" command was running
>> >> > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
>> >> > interface.
>> >>
>> >> I have some questions and a request.
>> >>
>> >> 1. Please post your /proc/meminfo.
>> >> 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
>> >> 3. Does the cache limitation of memcgroup solve this problem?
>> >> 4. Which disk holds your /bin and /usr/bin?
>> >>
>> >
>> > FWIW I fundamentally object to 3 as being a solution.
>> >
>>
>> Memory cgroups were not created to solve latency problems, but they do
>> isolate memory, and if that helps latency, I don't see why that is a
>> problem. I don't think isolating applications that we consider
>> unimportant, or that interfere or consume more resources than desired,
>> is a bad solution.
>
> So being able to isolate is a good excuse for poor replacement these
> days?
>

Nope, I am not saying that. Poor replacement needs to be fixed, but
unfortunately that is very dependent on the nature of the workload:
what is poor for one might be good for another, though there is always
a middle ground based on our understanding of the desired behaviour.
Having said that, isolating unimportant tasks might be a trade-off that
works; it *does not* replace the good default algorithms we need, but
it provides manual control of an otherwise auto-piloted system. With
virtualization, mixed workloads are becoming more common on systems.

Providing the swappiness knob for example is needed because sometimes
the user does know what he/she needs.

> Also, exactly because it's isolated/limited, it's sub-optimal.
>
>
>> > I still think the idea of read-ahead driven drop-behind is a good one,
>> > alas last time we brought that up people thought differently.
>>
>> I vaguely remember the patches, but can't recollect the details.
>
> A quick google gave me this:
>
>  http://lkml.org/lkml/2007/7/21/219

Thanks! That was quick

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  7:48     ` Peter Zijlstra
@ 2009-04-28  9:09       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-04-28  9:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 09:48:39AM +0200, Peter Zijlstra wrote:
> On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> > (cc to linux-mm and Rik)
> > 
> > 
> > > Hi,
> > > 
> > > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > > and then I did the following (with XFS over LVM):
> > > 
> > > mv /500gig/of/data/on/disk/one /disk/two
> > > 
> > > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > > Basically every UI operation, including the mouse in Xorg, started experiencing
> > > multiple second lag and delays.  This made the system essentially unusable --
> > > for example, just flipping to the window where the "mv" command was running
> > > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> > > interface.
> > 
> > I have some questions and a request.
> > 
> > 1. Please post your /proc/meminfo.
> > 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> > 3. Does the cache limitation of memcgroup solve this problem?
> > 4. Which disk holds your /bin and /usr/bin?
> > 
> 
> FWIW I fundamentally object to 3 as being a solution.
> 
> I still think the idea of read-ahead driven drop-behind is a good one,
> alas last time we brought that up people thought differently.

The semi-drop-behind is a great idea for the desktop - putting
just-accessed pages at the end of the LRU. However, I'm still afraid
it vastly changes the caching behavior and won't work as well as
expected in server workloads - shall we verify this?

Back to this big-cp-hurts-responsiveness issue. Background write
requests can easily pass the io scheduler's obstacles and fill up
the disk queue. Now every read request will have to wait for 10+
writes - leading to a 10x slowdown of major page faults.

I reached this conclusion based on recent CFQ code reviews. I will
bring up a queue-depth-limiting patch for further experiments.
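
A user-side mitigation along these lines is to lower the copy's IO
priority under CFQ with ionice from util-linux (class 3 = idle). Note
that this mainly helps on the read side, since background writeback is
not attributed to the originating process:

  ionice -c3 mv /500gig/of/data/on/disk/one /disk/two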

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  9:09       ` Wu Fengguang
@ 2009-04-28  9:26         ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-04-28  9:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> On Tue, Apr 28, 2009 at 09:48:39AM +0200, Peter Zijlstra wrote:
> > On Tue, 2009-04-28 at 14:35 +0900, KOSAKI Motohiro wrote:
> > > (cc to linux-mm and Rik)
> > >
> > >
> > > > Hi,
> > > >
> > > > So, I just set up Ubuntu Jaunty (using Linux 2.6.28) on a quad core phenom box,
> > > > and then I did the following (with XFS over LVM):
> > > >
> > > > mv /500gig/of/data/on/disk/one /disk/two
> > > >
> > > > This quickly caused the system to. grind.. to... a.... complete..... halt.
> > > > Basically every UI operation, including the mouse in Xorg, started experiencing
> > > > multiple second lag and delays.  This made the system essentially unusable --
> > > > for example, just flipping to the window where the "mv" command was running
> > > > took 10 seconds on more than one occasion.  Basically a "click and get coffee"
> > > > interface.
> > >
> > > I have some questions and a request.
> > > 
> > > 1. Please post your /proc/meminfo.
> > > 2. Does the above copy cause tons of swap-out? IOW, is your disk read much faster than your write?
> > > 3. Does the cache limitation of memcgroup solve this problem?
> > > 4. Which disk holds your /bin and /usr/bin?
> > >
> >
> > FWIW I fundamentally object to 3 as being a solution.
> >
> > I still think the idea of read-ahead driven drop-behind is a good one,
> > alas last time we brought that up people thought differently.
>
> The semi-drop-behind is a great idea for the desktop - putting
> just-accessed pages at the end of the LRU. However, I'm still afraid
> it vastly changes the caching behavior and won't work as well as
> expected in server workloads - shall we verify this?
>
> Back to this big-cp-hurts-responsiveness issue. Background write
> requests can easily pass the io scheduler's obstacles and fill up
> the disk queue. Now every read request will have to wait for 10+
> writes - leading to a 10x slowdown of major page faults.
>
> I reached this conclusion based on recent CFQ code reviews. I will
> bring up a queue-depth-limiting patch for further experiments.

Sorry - I just realized that Elladan's root fs lies on sda - the read side.

Then why would a single read stream cause 2000ms major fault delays?
The 'await' value for sda is <10ms, nowhere near 2000ms:

> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda              67.70     0.00  373.10    0.20    48.47     0.00   265.90     1.94    5.21   2.10  78.32
> sdb               0.00  1889.60    0.00  139.80     0.00    52.52   769.34    35.01  250.45   5.17  72.28
> ---
> sda               5.30     0.00  483.80    0.30    60.65     0.00   256.59     1.59    3.28   1.65  79.72
> sdb               0.00  3632.70    0.00  171.10     0.00    61.10   731.39   117.09  709.66   5.84 100.00
> ---
> sda              51.20     0.00  478.10    1.00    65.79     0.01   281.27     2.48    5.18   1.96  93.72
> sdb               0.00  2104.60    0.00  174.80     0.00    62.84   736.28   108.50  613.64   5.72 100.00
> --
> sda             153.20     0.00  349.40    0.20    60.99     0.00   357.30     4.47   13.19   2.85  99.80
> sdb               0.00  1766.50    0.00  158.60     0.00    59.89   773.34   110.07  672.25   6.30  99.96
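
A back-of-the-envelope check of those numbers (my arithmetic, via
Little's law: time in queue ~= queue length x per-request service time):

  sdb: avgqu-sz 117.09 x svctm 5.84ms ~= 684ms  (observed await: 709.66ms)
  sda: avgqu-sz   1.94 x svctm 2.10ms ~=   4ms  (observed await:   5.21ms)

So a request queued behind sdb's writes waits hundreds of milliseconds,
while sda's own queue cannot account for a 2000ms fault.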


Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  9:09       ` Wu Fengguang
@ 2009-04-28 12:08         ` Theodore Tso
  -1 siblings, 0 replies; 336+ messages in thread
From: Theodore Tso @ 2009-04-28 12:08 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, KOSAKI Motohiro, Elladan, linux-kernel, linux-mm,
	Rik van Riel

On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> The semi-drop-behind is a great idea for the desktop - to put
> just-accessed pages at the end of the LRU. However, I'm still afraid it
> vastly changes the caching behavior and won't work as well as expected
> in server workloads - shall we verify this?
> 
> Back to this big-cp-hurts-responsiveness issue. Background write
> requests can easily pass the io scheduler's obstacles and fill up
> the disk queue. Now every read request will have to wait for 10+ writes
> - leading to a 10x slowdown of major page faults.
> 
> I reached this conclusion based on recent CFQ code reviews. I will
> bring up a queue-depth-limiting patch for further experiments.

We can muck with the I/O scheduler, but another thing to consider is
whether the VM should be throttling writes more aggressively in this
case; it sounds like the big cp may be dirtying pages so aggressively
that it's driving other (more useful) pages out of the page cache ---
if the target disk is slower than the source disk (for example,
backing up a SATA primary disk to a USB-attached backup disk), no
amount of drop-behind is going to help the situation.
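
As rough arithmetic (illustrative numbers, not measurements): dirty
memory grows at (read rate - write rate), so if the cp reads 10 MB/s
faster than the target disk can retire writes, and the dirty limit is
400 MB, the copier runs unthrottled for about 400 / 10 = 40 seconds,
displacing more useful page cache the whole time.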

So that leaves three areas for exploration:

* Write-throttling
* Drop-behind
* background writes pushing aside foreground reads

Hmm, note that although the original bug reporter is running Ubuntu
Jaunty, and hence 2.6.28, this problem is going to get *worse* with
2.6.30, since we have the ext3 data=ordered latency fixes which will
write out any journal activity, and worse, any synchronous commits
(i.e., those caused by fsync) will force out all of the dirty pages with
WRITE_SYNC priority.  So with a heavy load, I suspect this is going to
be more of a VM issue, and figuring out how to tune more aggressive
write-throttling may be key here.

					- Ted

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28  5:35   ` KOSAKI Motohiro
                     ` (2 preceding siblings ...)
  (?)
@ 2009-04-28 15:28   ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-28 15:28 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Elladan, linux-kernel, linux-mm, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1434 bytes --]

KOSAKI Motohiro wrote:

>> Next, I set the following:
>>
>> echo 0 > /proc/sys/vm/swappiness
>>
>> ... hoping it would prevent paging out of the UI in favor of file data that's
>> only used once.  It did appear to help to a small degree, but not much.  The
>> system is still effectively unusable while a file copy is going on.
>>
>> From this, I diagnosed that most likely, the kernel was paging out all my
>> application file mmap() data (such as my executables and shared libraries) in
>> favor of total garbage VM load from the file copy.

I believe your analysis is correct.

When the split LRU code was merged upstream, some code was changed
(for scalability reasons) in a way that results in active file pages
being moved to the inactive list any time we evict inactive file pages.

Even if the active file pages are referenced, they are not
protected from the streaming IO.

However, the use-once policy in the VM depends on the active
pages being protected from streaming IO.
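
As a reminder, here is a toy sketch (not the kernel's code; the names
are made up) of the use-once rule that is being defeated: the first
access leaves a page on the inactive list with its referenced bit set,
and only a second access while still inactive promotes it.

#include <stdio.h>

enum lru { INACTIVE_FILE, ACTIVE_FILE };

struct page_meta {
	int referenced;
	enum lru list;
};

static void mark_accessed(struct page_meta *p)
{
	if (p->list == INACTIVE_FILE && p->referenced)
		p->list = ACTIVE_FILE;	/* second touch: working set */
	else
		p->referenced = 1;	/* first touch: just remember it */
}

int main(void)
{
	struct page_meta stream = { 0, INACTIVE_FILE };	/* streaming IO page */
	struct page_meta libc   = { 0, INACTIVE_FILE };	/* repeatedly used page */

	mark_accessed(&stream);		/* read once, never again */
	mark_accessed(&libc);
	mark_accessed(&libc);		/* used again: promoted */

	printf("stream page: %s\n", stream.list == ACTIVE_FILE ? "active" : "inactive");
	printf("libc page:   %s\n", libc.list == ACTIVE_FILE ? "active" : "inactive");
	return 0;
}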

A little before the decision was made to no longer honor the
referenced bit on active file pages, we dropped an ugly patch (by
me) after deciding it was just too much of a hack.  However, now
that we have _no_ protection of active file pages against large
amounts of streaming IO, we may want to reinstate something like
it.  Hopefully in a prettier way...

The old patch is attached for inspiration, discussion and maybe
testing :)

-- 
All rights reversed.

[-- Attachment #2: evict-cache-first.patch --]
[-- Type: text/plain, Size: 2986 bytes --]

When there is a lot of streaming IO going on, we do not want
to scan or evict pages from the working set.  The old VM used
to skip any mapped page, but still evict indirect blocks and
other data that is useful to cache.

This patch adds logic to skip scanning the anon lists and
the active file list if most of the file pages are on the
inactive file list (where streaming IO pages live), while
at the lowest scanning priority.

If the system is not doing a lot of streaming IO, e.g. the
system is running a database workload, then the more frequently
used file pages will be on the active file list and this logic
is automatically disabled.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 include/linux/mmzone.h |    1 +
 mm/vmscan.c            |   18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 2 deletions(-)

Index: linux-2.6.26-rc8-mm1/include/linux/mmzone.h
===================================================================
--- linux-2.6.26-rc8-mm1.orig/include/linux/mmzone.h	2008-07-07 15:41:32.000000000 -0400
+++ linux-2.6.26-rc8-mm1/include/linux/mmzone.h	2008-07-15 14:58:50.000000000 -0400
@@ -453,6 +453,7 @@ static inline int zone_is_oom_locked(con
  * queues ("queue_length >> 12") during an aging round.
  */
 #define DEF_PRIORITY 12
+#define PRIO_CACHE_ONLY (DEF_PRIORITY + 1)
 
 /* Maximum number of zones on a zonelist */
 #define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)
Index: linux-2.6.26-rc8-mm1/mm/vmscan.c
===================================================================
--- linux-2.6.26-rc8-mm1.orig/mm/vmscan.c	2008-07-07 15:41:33.000000000 -0400
+++ linux-2.6.26-rc8-mm1/mm/vmscan.c	2008-07-15 15:10:05.000000000 -0400
@@ -1481,6 +1481,20 @@ static unsigned long shrink_zone(int pri
 		}
 	}
 
+	/*
+	 * If there is a lot of sequential IO going on, most of the
+	 * file pages will be on the inactive file list.  We start
+	 * out by reclaiming those pages, without putting pressure on
+	 * the working set.  We only do this if the bulk of the file pages
+	 * are not in the working set (on the active file list).
+	 */
+	if (priority == PRIO_CACHE_ONLY &&
+			(nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]))
+		for_each_evictable_lru(l)
+			/* Scan only the inactive_file list. */
+			if (l != LRU_INACTIVE_FILE)
+				nr[l] = 0;
+
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {
@@ -1609,7 +1623,7 @@ static unsigned long do_try_to_free_page
 		}
 	}
 
-	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
 		sc->nr_scanned = 0;
 		if (!priority)
 			disable_swap_token();
@@ -1771,7 +1785,7 @@ loop_again:
 	for (i = 0; i < pgdat->nr_zones; i++)
 		temp_priority[i] = DEF_PRIORITY;
 
-	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
+	for (priority = PRIO_CACHE_ONLY; priority >= 0; priority--) {
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
 		unsigned long lru_pages = 0;
 

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH] vmscan: evict use-once pages first
  2009-04-28  4:44 Swappiness vs. mmap() and interactive response Elladan
@ 2009-04-28 23:29   ` Rik van Riel
  2009-04-28 23:29   ` Rik van Riel
  1 sibling, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-28 23:29 UTC (permalink / raw)
  To: Elladan; +Cc: linux-kernel, peterz, tytso, kosaki.motohiro, linux-mm

When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <riel@redhat.com>

--- 
Elladan, does this patch fix the issue you are seeing?

Peter, Kosaki, Ted, does this patch look good to you?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..4c0304e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
 			nr[l] = scan;
 	}
 
+	/*
+	 * When the system is doing streaming IO, memory pressure here
+	 * ensures that active file pages get deactivated, until more
+	 * than half of the file pages are on the inactive list.
+	 *
+	 * Once we get to that situation, protect the system's working
+	 * set from being evicted by disabling active file page aging
+	 * and swapping of swap backed pages.  We still do background
+	 * aging of anonymous pages.
+	 */
+	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
+		nr[LRU_ACTIVE_FILE] = 0;
+		nr[LRU_INACTIVE_ANON] = 0;
+	}
+
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {

^ permalink raw reply related	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first
  2009-04-28 23:29   ` Rik van Riel
@ 2009-04-29  3:36     ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-29  3:36 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Elladan, linux-kernel, peterz, tytso, kosaki.motohiro, linux-mm

Rik,

This patch appears to significantly improve application latency while a large
file copy runs.  I'm not seeing behavior that implies continuous bad page
replacement.

I'm still seeing some general lag, which I attribute to filesystem
slowness.  For example, latencytop sees many events like these:

down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 1475.8 msec          5.9 %

xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 1740.9 msec          2.6 %

Writing a page to disk                            1042.9 msec         43.7 %

It also occasionally sees long page faults:

Page fault                                        2068.3 msec         21.3 %

I guess XFS (and the elevator) is just doing a poor job of managing latency
(particularly poor since all the IO for /usr/bin is on the reader disk).
Notable:

Creating block layer request                      451.4 msec         14.4 %

Thank you,
Elladan

On Tue, Apr 28, 2009 at 07:29:07PM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> 
> --- 
> Elladan, does this patch fix the issue you are seeing?
> 
> Peter, Kosaki, Ted, does this patch look good to you?
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4c0304e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
>  			nr[l] = scan;
>  	}
>  
> +	/*
> +	 * When the system is doing streaming IO, memory pressure here
> +	 * ensures that active file pages get deactivated, until more
> +	 * than half of the file pages are on the inactive list.
> +	 *
> +	 * Once we get to that situation, protect the system's working
> +	 * set from being evicted by disabling active file page aging
> +	 * and swapping of swap backed pages.  We still do background
> +	 * aging of anonymous pages.
> +	 */
> +	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
> +		nr[LRU_ACTIVE_FILE] = 0;
> +		nr[LRU_INACTIVE_ANON] = 0;
> +	}
> +
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
>  		for_each_evictable_lru(l) {

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-28 12:08         ` Theodore Tso
@ 2009-04-29  5:51           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-29  5:51 UTC (permalink / raw)
  To: Theodore Tso, Wu Fengguang, Peter Zijlstra, KOSAKI Motohiro,
	Elladan, linux-kernel, linux-mm, Rik van Riel
  Cc: kosaki.motohiro

Hi

> On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> > The semi-drop-behind is a great idea for the desktop - to put
> > just-accessed pages at the end of the LRU. However, I'm still afraid it
> > vastly changes the caching behavior and won't work as well as expected
> > in server workloads - shall we verify this?
> > 
> > Back to this big-cp-hurts-responsiveness issue. Background write
> > requests can easily pass the io scheduler's obstacles and fill up
> > the disk queue. Now every read request will have to wait for 10+ writes
> > - leading to a 10x slowdown of major page faults.
> > 
> > I reached this conclusion based on recent CFQ code reviews. I will
> > bring up a queue-depth-limiting patch for further experiments.
> 
> We can muck with the I/O scheduler, but another thing to consider is
> whether the VM should be throttling writes more aggressively in this
> case; it sounds like the big cp may be dirtying pages so aggressively
> that it's driving other (more useful) pages out of the page cache ---
> if the target disk is slower than the source disk (for example,
> backing up a SATA primary disk to a USB-attached backup disk), no
> amount of drop-behind is going to help the situation.
> 
> So that leaves three areas for exploration:
> 
> * Write-throttling
> * Drop-behind
> * background writes pushing aside foreground reads
> 
> Hmm, note that although the original bug reporter is running Ubuntu
> Jaunty, and hence 2.6.28, this problem is going to get *worse* with
> 2.6.30, since we have the ext3 data=ordered latency fixes which will
> write out any journal activity, and worse, any synchronous commits
> (i.e., those caused by fsync) will force out all of the dirty pages with
> WRITE_SYNC priority.  So with a heavy load, I suspect this is going to
> be more of a VM issue, and figuring out how to tune more aggressive
> write-throttling may be key here.

First, I'd like to report my reproduction test results.

test environment: no LVM, copy from ext3 to ext3 (not mv), swappiness unchanged,
                  CFQ is used, userland is Fedora 10, mmotm (2.6.30-rc1 + mm patches),
                  CPU: Opteron x4, mem: 4G

mouse move lag:               did not happen
window move lag:              did not happen
Mapped page decrease rapidly: did not happen (I guess these pages stay in
                                          the active list on my system)
page fault large latency:     happened (latencytop displays >200ms)


So, I don't doubt the VM replacement logic now,
but I need to investigate more.
I plan to try the following things today and tomorrow.

 - XFS
 - LVM
 - another io scheduler (thanks Ted, good view point)
 - Rik's new patch





^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-29  5:51           ` KOSAKI Motohiro
@ 2009-04-29  6:34             ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-04-29  6:34 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Theodore Tso, Wu Fengguang, Peter Zijlstra, Elladan,
	linux-kernel, linux-mm, Rik van Riel

On Wed, 29 Apr 2009 14:51:07 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> Hi
> 
> > On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> > > The semi-drop-behind is a great idea for the desktop - to put
> > > just-accessed pages at the end of the LRU. However, I'm still afraid it
> > > vastly changes the caching behavior and won't work as well as expected
> > > in server workloads - shall we verify this?
> > > 
> > > Back to this big-cp-hurts-responsiveness issue. Background write
> > > requests can easily pass the io scheduler's obstacles and fill up
> > > the disk queue. Now every read request will have to wait for 10+ writes
> > > - leading to a 10x slowdown of major page faults.
> > > 
> > > I reached this conclusion based on recent CFQ code reviews. I will
> > > bring up a queue-depth-limiting patch for further experiments.
> > > a queue depth limiting patch for more exercises..
> > 
> > We can muck with the I/O scheduler, but another thing to consider is
> > whether the VM should be throttling writes more aggressively in this
> > case; it sounds like the big cp may be dirtying pages so aggressively
> > that it's driving other (more useful) pages out of the page cache ---
> > if the target disk is slower than the source disk (for example,
> > backing up a SATA primary disk to a USB-attached backup disk), no
> > amount of drop-behind is going to help the situation.
> > 
> > So that leaves three areas for exploration:
> > 
> > * Write-throttling
> > * Drop-behind
> > * background writes pushing aside foreground reads
> > 
> > Hmm, note that although the original bug reporter is running Ubuntu
> > Jaunty, and hence 2.6.28, this problem is going to get *worse* with
> > 2.6.30, since we have the ext3 data=ordered latency fixes which will
> > write out any journal activity, and worse, any synchronous commits
> > (i.e., those caused by fsync) will force out all of the dirty pages with
> > WRITE_SYNC priority.  So with a heavy load, I suspect this is going to
> > be more of a VM issue, and figuring out how to tune more aggressive
> > write-throttling may be key here.
> 
> First, I'd like to report my reproduction test results.
> 
> test environment: no LVM, copy from ext3 to ext3 (not mv), swappiness unchanged,
>                   CFQ is used, userland is Fedora 10, mmotm (2.6.30-rc1 + mm patches),
>                   CPU: Opteron x4, mem: 4G
> 
> mouse move lag:               did not happen
> window move lag:              did not happen
> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>                                           the active list on my system)
> page fault large latency:     happened (latencytop displays >200ms)

hm.  The last two observations appear to be inconsistent.

Elladan, have you checked to see whether the Mapped: number in
/proc/meminfo is decreasing?
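
Something like this minimal watcher (illustrative only, one sample per
second) would show it:

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[256];

	for (;;) {
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "Mapped:", 7))
				fputs(line, stdout);	/* print the Mapped: line */
		fclose(f);
		sleep(1);
	}
}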

> 
> So, I don't doubt the VM replacement logic now,
> but I need to investigate more.
> I plan to try the following things today and tomorrow.
> 
>  - XFS
>  - LVM
>  - another io scheduler (thanks Ted, good view point)
>  - Rik's new patch

It's not clear that we know what's happening yet, is it?  It's such a
gross problem that you'd think that even our testing would have found
it by now :(

Elladan, do you know if earlier kernels (2.6.26 or thereabouts) had
this severe a problem?

(notes that we _still_ haven't unbusted prev_priority)

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first
  2009-04-28 23:29   ` Rik van Riel
@ 2009-04-29  6:42     ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-04-29  6:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Tue, 2009-04-28 at 19:29 -0400, Rik van Riel wrote:

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4c0304e 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
>  			nr[l] = scan;
>  	}
>  
> +	/*
> +	 * When the system is doing streaming IO, memory pressure here
> +	 * ensures that active file pages get deactivated, until more
> +	 * than half of the file pages are on the inactive list.
> +	 *
> +	 * Once we get to that situation, protect the system's working
> +	 * set from being evicted by disabling active file page aging
> +	 * and swapping of swap backed pages.  We still do background
> +	 * aging of anonymous pages.
> +	 */
> +	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
> +		nr[LRU_ACTIVE_FILE] = 0;
> +		nr[LRU_INACTIVE_ANON] = 0;
> +	}
> +

Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
shrinking INACTIVE_ANON even though it makes sense to?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-29  6:34             ` Andrew Morton
@ 2009-04-29  7:47               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-29  7:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Theodore Tso, Wu Fengguang, Peter Zijlstra, Elladan,
	linux-kernel, linux-mm, Rik van Riel

>> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>>                                           the active list on my system)
>> page fault large latency:     happened (latencytop displays >200ms)
>
> hm.  The last two observations appear to be inconsistent.

It means existing processes don't slow down, but new process creation is very slow.


> Elladan, have you checked to see whether the Mapped: number in
> /proc/meminfo is decreasing?
>
>>
>> So, I don't doubt the VM replacement logic now,
>> but I need to investigate more.
>> I plan to try the following things today and tomorrow.
>>
>>  - XFS
>>  - LVM
>>  - another io scheduler (thanks Ted, good view point)
>>  - Rik's new patch
>
> It's not clear that we know what's happening yet, is it?  It's such a
> gross problem that you'd think that even our testing would have found
> it by now :(

Yes, it's unclear, but various tests can drill down to the reason, I think.


> Elladan, do you know if earlier kernels (2.6.26 or thereabouts) had
> this severe a problem?
>
> (notes that we _still_ haven't unbusted prev_priority)

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-29  5:51           ` KOSAKI Motohiro
@ 2009-04-29  7:48             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-29  7:48 UTC (permalink / raw)
  To: Theodore Tso, Wu Fengguang, Peter Zijlstra, KOSAKI Motohiro,
	Elladan, linux-kernel, linux-mm, Rik van Riel

One mistake:

> mouse move lag:               did not happen
> window move lag:              did not happen
> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>                                           the active list on my system)
> page fault large latency:     happened (latencytop displays >200ms)

                                                              ^^^^^^

                                                              >1200ms

sorry.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first
  2009-04-29  6:42     ` Peter Zijlstra
@ 2009-04-29 13:30       ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-29 13:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

Peter Zijlstra wrote:
> On Tue, 2009-04-28 at 19:29 -0400, Rik van Riel wrote:
> 
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index eac9577..4c0304e 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1489,6 +1489,21 @@ static void shrink_zone(int priority, struct zone *zone,
>>  			nr[l] = scan;
>>  	}
>>  
>> +	/*
>> +	 * When the system is doing streaming IO, memory pressure here
>> +	 * ensures that active file pages get deactivated, until more
>> +	 * than half of the file pages are on the inactive list.
>> +	 *
>> +	 * Once we get to that situation, protect the system's working
>> +	 * set from being evicted by disabling active file page aging
>> +	 * and swapping of swap backed pages.  We still do background
>> +	 * aging of anonymous pages.
>> +	 */
>> +	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE]) {
>> +		nr[LRU_ACTIVE_FILE] = 0;
>> +		nr[LRU_INACTIVE_ANON] = 0;
>> +	}
>> +
> 
> Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> shrinking INACTIVE_ANON even though it makes sense to.

Only temporarily, until the number of active file pages
is larger than the number of inactive ones.

Think of it as reducing the frequency of shrinking anonymous
pages while the system is near the threshold.
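
A toy model (invented numbers, not kernel code) of that dynamic: once
inactive file pages outnumber active ones, the gate pauses anon
scanning; promotions then grow the active list until the gate flips
back, so near the threshold the system oscillates and anon pages are
simply scanned less often.

#include <stdio.h>

int main(void)
{
	long active = 900, inactive = 100, anon_scanned = 0;
	int round;

	for (round = 0; round < 12; round++) {
		int gate = inactive > active;	/* the patch's condition */

		if (gate) {
			/* only inactive file pages are scanned; some get
			 * referenced twice and are promoted (made-up rate) */
			active += 50;
			inactive -= 50;
		} else {
			/* normal aging: deactivate active file pages and
			 * put some pressure on the anon lists too */
			active -= 100;
			inactive += 100;
			anon_scanned += 10;
		}
		printf("round %2d: active=%4ld inactive=%4ld gate=%-3s anon_scanned=%ld\n",
		       round, active, inactive, gate ? "on" : "off", anon_scanned);
	}
	return 0;
}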

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-29  6:42     ` Peter Zijlstra
@ 2009-04-29 15:47       ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-29 15:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
On Wed, 29 Apr 2009 08:42:29 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> shrinking INACTIVE_ANON even though it makes sense to.

Peter, after looking at this again, I believe that the get_scan_ratio
logic should take care of protecting the anonymous pages, so we can
get away with the following, less intrusive patch.
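
For reference, a simplified sketch of the get_scan_ratio idea (the
real function in mm/vmscan.c also weighs recent scan/rotate statistics
per zone, omitted here): scan pressure is split between the anon and
file lists based on swappiness, so at swappiness 0 the anon share
drops to nothing.

#include <stdio.h>

static void scan_ratio(unsigned int swappiness, unsigned long percent[2])
{
	unsigned long anon_prio = swappiness;		/* 0..100 */
	unsigned long file_prio = 200 - swappiness;

	percent[0] = anon_prio * 100 / (anon_prio + file_prio);	/* anon share */
	percent[1] = 100 - percent[0];				/* file share */
}

int main(void)
{
	unsigned int s[] = { 0, 60, 100 };
	unsigned long p[2];
	int i;

	for (i = 0; i < 3; i++) {
		scan_ratio(s[i], p);
		printf("swappiness=%3u -> anon %lu%%, file %lu%%\n",
		       s[i], p[0], p[1]);
	}
	return 0;
}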

Elladan, does this smaller patch still work as expected?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..4471dcb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
 			nr[l] = scan;
 	}
 
+	/*
+	 * When the system is doing streaming IO, memory pressure here
+	 * ensures that active file pages get deactivated, until more
+	 * than half of the file pages are on the inactive list.
+	 *
+	 * Once we get to that situation, protect the system's working
+	 * set from being evicted by disabling active file page aging.
+	 * The logic in get_scan_ratio protects anonymous pages.
+	 */
+	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
+		nr[LRU_ACTIVE_FILE] = 0;
+
 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
 		for_each_evictable_lru(l) {


^ permalink raw reply related	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-29 15:47       ` Rik van Riel
@ 2009-04-29 16:07         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-29 16:07 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

Hi

This looks better than the previous version, but I have one question.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4471dcb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
>                        nr[l] = scan;
>        }
>
> +       /*
> +        * When the system is doing streaming IO, memory pressure here
> +        * ensures that active file pages get deactivated, until more
> +        * than half of the file pages are on the inactive list.
> +        *
> +        * Once we get to that situation, protect the system's working
> +        * set from being evicted by disabling active file page aging.
> +        * The logic in get_scan_ratio protects anonymous pages.
> +        */
> +       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
> +               nr[LRU_ACTIVE_FILE] = 0;
> +
>        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>                                        nr[LRU_INACTIVE_FILE]) {
>                for_each_evictable_lru(l) {

We handle the active_anon vs inactive_anon ratio in shrink_list().
Why did you insert this logic into shrink_zone()?

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-29 15:47       ` Rik van Riel
@ 2009-04-29 16:10         ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-04-29 16:10 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Wed, 2009-04-29 at 11:47 -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> On Wed, 29 Apr 2009 08:42:29 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> > shrinking INACTIVE_ANON even though it makes sense to.
> 
> Peter, after looking at this again, I believe that the get_scan_ratio
> logic should take care of protecting the anonymous pages, so we can
> get away with the following, less intrusive patch.
> 
> Elladan, does this smaller patch still work as expected?

Provided of course that it actually fixes Elladan's issue, this looks
good to me.

Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eac9577..4471dcb 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
>  			nr[l] = scan;
>  	}
>  
> +	/*
> +	 * When the system is doing streaming IO, memory pressure here
> +	 * ensures that active file pages get deactivated, until more
> +	 * than half of the file pages are on the inactive list.
> +	 *
> +	 * Once we get to that situation, protect the system's working
> +	 * set from being evicted by disabling active file page aging.
> +	 * The logic in get_scan_ratio protects anonymous pages.
> +	 */
> +	if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
> +		nr[LRU_ACTIVE_FILE] = 0;
> +
>  	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>  					nr[LRU_INACTIVE_FILE]) {
>  		for_each_evictable_lru(l) {
> 


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-29 16:07         ` KOSAKI Motohiro
@ 2009-04-29 16:18           ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-29 16:18 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

KOSAKI Motohiro wrote:
> Hi
>
> This looks better than the previous version, but I have one question.
>
>   
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index eac9577..4471dcb 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1489,6 +1489,18 @@ static void shrink_zone(int priority, struct zone *zone,
>>                        nr[l] = scan;
>>        }
>>
>> +       /*
>> +        * When the system is doing streaming IO, memory pressure here
>> +        * ensures that active file pages get deactivated, until more
>> +        * than half of the file pages are on the inactive list.
>> +        *
>> +        * Once we get to that situation, protect the system's working
>> +        * set from being evicted by disabling active file page aging.
>> +        * The logic in get_scan_ratio protects anonymous pages.
>> +        */
>> +       if (nr[LRU_INACTIVE_FILE] > nr[LRU_ACTIVE_FILE])
>> +               nr[LRU_ACTIVE_FILE] = 0;
>> +
>>        while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
>>                                        nr[LRU_INACTIVE_FILE]) {
>>                for_each_evictable_lru(l) {
>>     
>
> We handle the active_anon vs inactive_anon ratio in shrink_list().
> Why did you insert this logic into shrink_zone()?
>   
Good question.  I guess that at lower priority levels, we get to scan
a lot more pages and we could go from having too many inactive
file pages to not having enough in one invocation of shrink_zone().

That makes shrink_list() the better place to implement this, even if
it means doing this comparison more often.

I'll send a new patch this afternoon.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first
  2009-04-29  3:36     ` Elladan
@ 2009-04-29 17:06       ` Christoph Hellwig
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Hellwig @ 2009-04-29 17:06 UTC (permalink / raw)
  To: Elladan
  Cc: Rik van Riel, linux-kernel, peterz, tytso, kosaki.motohiro, linux-mm

On Tue, Apr 28, 2009 at 08:36:51PM -0700, Elladan wrote:
> Rik,
> 
> This patch appears to significantly improve application latency while a large
> file copy runs.  I'm not seeing behavior that implies continuous bad page
> replacement.
> 
> I'm still seeing some general lag, which I attribute to general filesystem
> slowness.  For example, latencytop sees many events like these:
> 
> down xfs_buf_lock _xfs_buf_find xfs_buf_get_flags 1475.8 msec          5.9 %

This actually is contention on the buffer lock, and most likely
happens because it's trying to access a buffer that is currently
being read in.

> 
> xfs_buf_iowait xfs_buf_iostart xfs_buf_read_flags 1740.9 msec          2.6 %

That's an actual metadata read.

> Writing a page to disk                            1042.9 msec         43.7 %
> 
> It also occasionally sees long page faults:
> 
> Page fault                                        2068.3 msec         21.3 %
> 
> I guess XFS (and the elevator) is just doing a poor job managing latency
> (particularly poor since all the IO on /usr/bin is on the reader disk).

The filesystem doesn't really decide which priorities to use, except
for some use of WRITE_SYNC, which is used rather minimally in XFS in
2.6.28.

> Creating block layer request                      451.4 msec         14.4 %

I guess that's a wait in get_request because we're above nr_requests.
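
As a rough illustration (not from this thread; the device name sda and
the standalone C helper are assumptions), that queue depth can be read
back from sysfs:

/* Read the block queue depth that get_request() throttles against.
 * The device name "sda" is an assumption; adjust as needed. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/block/sda/queue/nr_requests", "r");
	unsigned long nr;

	if (!f)
		return 1;
	if (fscanf(f, "%lu", &nr) == 1)
		printf("nr_requests = %lu\n", nr);
	fclose(f);
	return 0;
}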

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH] vmscan: evict use-once pages first (v3)
  2009-04-29 16:07         ` KOSAKI Motohiro
@ 2009-04-29 17:14           ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-29 17:14 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

When the file LRU lists are dominated by streaming IO pages,
evict those pages first, before considering evicting other
pages.

This should be safe from deadlocks or performance problems
because only three things can happen to an inactive file page:
1) referenced twice and promoted to the active list
2) evicted by the pageout code
3) under IO, after which it will get evicted or promoted

The pages freed in this way can either be reused for streaming
IO, or allocated for something else. If the pages are used for
streaming IO, this pageout pattern continues. Otherwise, we will
fall back to the normal pageout pattern.

Signed-off-by: Rik van Riel <riel@redhat.com>

---
On Thu, 30 Apr 2009 01:07:51 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:

> We handle the active_anon vs inactive_anon ratio in shrink_list().
> Why did you insert this logic into shrink_zone()?

Kosaki, this implementation mirrors the anon side of things precisely.
Does this look good?

Elladan, this patch should work just like the second version. Please
let me know how it works for you.

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a9e3b76..dbfe7ba 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -94,6 +94,7 @@ extern void mem_cgroup_note_reclaim_priority(struct mem_cgroup *mem,
 extern void mem_cgroup_record_reclaim_priority(struct mem_cgroup *mem,
 							int priority);
 int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg);
+int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg);
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
 				       struct zone *zone,
 				       enum lru_list lru);
@@ -239,6 +240,12 @@ mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg)
 	return 1;
 }
 
+static inline int
+mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
+{
+	return 1;
+}
+
 static inline unsigned long
 mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg, struct zone *zone,
 			 enum lru_list lru)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e44fb0f..026cb5a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -578,6 +578,17 @@ int mem_cgroup_inactive_anon_is_low(struct mem_cgroup *memcg)
 	return 0;
 }
 
+int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
+{
+	unsigned long active;
+	unsigned long inactive;
+
+	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
+	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
+
+	return (active > inactive);
+}
+
 unsigned long mem_cgroup_zone_nr_pages(struct mem_cgroup *memcg,
 				       struct zone *zone,
 				       enum lru_list lru)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index eac9577..a73f675 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1348,12 +1348,48 @@ static int inactive_anon_is_low(struct zone *zone, struct scan_control *sc)
 	return low;
 }
 
+static int inactive_file_is_low_global(struct zone *zone)
+{
+	unsigned long active, inactive;
+
+	active = zone_page_state(zone, NR_ACTIVE_FILE);
+	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
+
+	return (active > inactive);
+}
+
+/**
+ * inactive_file_is_low - check if file pages need to be deactivated
+ * @zone: zone to check
+ * @sc:   scan control of this context
+ *
+ * When the system is doing streaming IO, memory pressure here
+ * ensures that active file pages get deactivated, until more
+ * than half of the file pages are on the inactive list.
+ *
+ * Once we get to that situation, protect the system's working
+ * set from being evicted by disabling active file page aging.
+ *
+ * This uses a different ratio than the anonymous pages, because
+ * the page cache uses a use-once replacement algorithm.
+ */
+static int inactive_file_is_low(struct zone *zone, struct scan_control *sc)
+{
+	int low;
+
+	if (scanning_global_lru(sc))
+		low = inactive_file_is_low_global(zone);
+	else
+		low = mem_cgroup_inactive_file_is_low(sc->mem_cgroup);
+	return low;
+}
+
 static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 	struct zone *zone, struct scan_control *sc, int priority)
 {
 	int file = is_file_lru(lru);
 
-	if (lru == LRU_ACTIVE_FILE) {
+	if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
 		shrink_active_list(nr_to_scan, zone, sc, priority, file);
 		return 0;
 	}


^ permalink raw reply related	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-04-29 17:14           ` Rik van Riel
@ 2009-04-30  0:39             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-30  0:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kosaki.motohiro, Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> 
> ---
> On Thu, 30 Apr 2009 01:07:51 +0900
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > We handle the active_anon vs inactive_anon ratio in shrink_list().
> > Why did you insert this logic into shrink_zone()?
> 
> Kosaki, this implementation mirrors the anon side of things precisely.
> Does this look good?
> 
> Elladan, this patch should work just like the second version. Please
> let me know how it works for you.

Looks good to me, thanks.
But I don't hit the issue Rik explained, so I hope Elladan reports his
test results.




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-29  6:34             ` Andrew Morton
@ 2009-04-30  4:14               ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-30  4:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Theodore Tso, Wu Fengguang, Peter Zijlstra,
	Elladan, linux-kernel, linux-mm, Rik van Riel

On Tue, Apr 28, 2009 at 11:34:55PM -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 14:51:07 +0900 (JST) KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> 
> > Hi
> > 
> > > On Tue, Apr 28, 2009 at 05:09:16PM +0800, Wu Fengguang wrote:
> > > > The semi-drop-behind is a great idea for the desktop - to put
> > > > just-accessed pages at the end of the LRU. However, I'm still afraid
> > > > it vastly changes the caching behavior and won't work as well as
> > > > expected in server workloads - shall we verify this?
> > > > 
> > > > Back to this big-cp-hurts-responsiveness issue. Background write
> > > > requests can easily pass the IO scheduler's obstacles and fill up
> > > > the disk queue. Now every read request will have to wait for 10+
> > > > writes - leading to a 10x slowdown of major page faults.
> > > > 
> > > > I reached this conclusion based on recent CFQ code reviews. I will
> > > > bring up a queue-depth-limiting patch for further testing.
> > > 
> > > We can muck with the I/O scheduler, but another thing to consider is
> > > whether the VM should be more aggressively throttling writes in this
> > > case; it sounds like the big cp in this case may be dirtying pages so
> > > aggressively that it's driving other (more useful) pages out of the
> > > page cache --- if the target disk is slower than the source disk (for
> > > example, backing up a SATA primary disk to a USB-attached backup disk)
> > > no amount of drop-behind is going to help the situation.
> > > 
> > > So that leaves three areas for exploration:
> > > 
> > > * Write-throttling
> > > * Drop-behind
> > > * background writes pushing aside foreground reads
> > > 
> > > Hmm, note that although the original bug reporter is running Ubuntu
> > > Jaunty, and hence 2.6.28, this problem is going to get *worse* with
> > > 2.6.30, since we have the ext3 data=ordered latency fixes which will
> > > write out any journal activity, and worse, any synchronous commits
> > > (i.e., caused by fsync) will force out all of the dirty pages with
> > > WRITE_SYNC priority.  So with a heavy load, I suspect this is going to
> > > be more of a VM issue, and especially figuring out how to tune more
> > > aggressive write-throttling may be key here.
> > 
> > Firstly, I'd like to report my reproduction test results.
> > 
> > test environment: no lvm, copy ext3 to ext3 (not mv), swappiness unchanged,
> >                   CFQ is used, userland is Fedora10, mmotm(2.6.30-rc1 + mm patch),
> >                   CPU opteronx4, mem 4G
> > 
> > mouse move lag:               did not happen
> > window move lag:              did not happen
> > Mapped page decrease rapidly: did not happen (I guess these pages stay in
> >                                           the active list on my system)
> > page fault large latency:     happened (latencytop displays >200ms)
> 
> hm.  The last two observations appear to be inconsistent.
> 
> Elladan, have you checked to see whether the Mapped: number in
> /proc/meminfo is decreasing?

Yes, Mapped decreases while a large file copy is ongoing.  It increases again
if I use the GUI.
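
For reference, a minimal sketch of one way to watch this (illustration
only, not a tool posted in this thread; it assumes nothing beyond
/proc/meminfo):

/* Print the Mapped: line of /proc/meminfo every five seconds. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char line[128];

	for (;;) {
		FILE *f = fopen("/proc/meminfo", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f)) {
			if (!strncmp(line, "Mapped:", 7))
				fputs(line, stdout);
		}
		fclose(f);
		sleep(5);
	}
}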

> > So I don't doubt the VM replacement logic now,
> > but I need to investigate more.
> > I plan to try the following things today and tomorrow.
> > 
> >  - XFS
> >  - LVM
> >  - another io scheduler (thanks Ted, good view point)
> >  - Rik's new patch
> 
> It's not clear that we know what's happening yet, is it?  It's such a
> gross problem that you'd think that even our testing would have found
> it by now :(
> 
> Elladan, do you know if earlier kernels (2.6.26 or thereabouts) had
> this severe a problem?

No, I don't know about older kernels.

Also, just to add a bit: I'm having some difficulty reproducing the extremely
severe latency I was seeing right off.  It's not difficult for me to reproduce
latencies that are painful, but not on the order of 10 second response.  Maybe
3 or 4 seconds at most.  I didn't have a stopwatch handy originally though, so
it's somewhat subjective, but I wonder if there's some element of the load that
I'm missing.

I had a theory about why this might be: my original repro was copying data
which I believe had been written once, but never read.  Plus, I was using
relatime.  However, on second thought this doesn't work -- there's only 8000
files, and a re-test with atime turned on isn't much different than with
relatime.

The other possibility is that there was some other background IO load spike,
which I didn't notice at the time.  I don't know what that would be though,
unless it was one of gnome's indexing jobs (I didn't see one, though).

-Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-30  4:14               ` Elladan
@ 2009-04-30  4:43                 ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-04-30  4:43 UTC (permalink / raw)
  To: Elladan
  Cc: KOSAKI Motohiro, Theodore Tso, Wu Fengguang, Peter Zijlstra,
	linux-kernel, linux-mm, Rik van Riel

On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <elladan@eskimo.com> wrote:

> > Elladan, have you checked to see whether the Mapped: number in
> > /proc/meminfo is decreasing?
> 
> Yes, Mapped decreases while a large file copy is ongoing.  It increases again
> if I use the GUI.

OK.  If that's still happening to an appreciable extent after you've
increased /proc/sys/vm/swappiness then I'd wager that we have a
bug/regression in that area.

Local variable `scan' in shrink_zone() is vulnerable to multiplicative
overflows on large zones, but I doubt if you have enough memory to
trigger that bug.
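
To make the arithmetic concrete, a user-space sketch of the wraparound
(illustration only: the names mirror shrink_zone() but this is not
kernel code, and it assumes an LP64 target so that `long' is 64 bits):

/* A zone bigger than ~80GB has more than INT_MAX/100 4k pages, so
 * `scan * percent' no longer fits in 32 bits.  Signed overflow is
 * formally undefined in C; typical compilers wrap as shown. */
#include <stdio.h>

int main(void)
{
	int percent = 100;	/* worst case from get_scan_ratio() */
	int scan = 22000000;	/* ~84GB of 4k pages; 22e6 * 100 > INT_MAX */
	int bad = scan * percent / 100;		/* 32-bit: wraps negative */
	long ok = (long)scan * percent / 100;	/* 64-bit: correct */

	printf("int:  %d\nlong: %ld\n", bad, ok);
	return 0;
}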


From: Andrew Morton <akpm@linux-foundation.org>

Local variable `scan' can overflow on zones which are larger than

	(2G * 4k) / 100 = 80GB.

Making it 64-bit on 64-bit will fix that up.

Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff -puN mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone mm/vmscan.c
--- a/mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone
+++ a/mm/vmscan.c
@@ -1479,7 +1479,7 @@ static void shrink_zone(int priority, st
 
 	for_each_evictable_lru(l) {
 		int file = is_file_lru(l);
-		int scan;
+		unsigned long scan;
 
 		scan = zone_nr_pages(zone, sc, l);
 		if (priority) {
_



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-30  4:43                 ` Andrew Morton
@ 2009-04-30  4:55                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-30  4:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, Elladan, Theodore Tso, Wu Fengguang,
	Peter Zijlstra, linux-kernel, linux-mm, Rik van Riel

> On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <elladan@eskimo.com> wrote:
> 
> > > Elladan, have you checked to see whether the Mapped: number in
> > > /proc/meminfo is decreasing?
> > 
> > Yes, Mapped decreases while a large file copy is ongoing.  It increases again
> > if I use the GUI.
> 
> OK.  If that's still happening to an appreciable extent after you've
> increased /proc/sys/vm/swappiness then I'd wager that we have a
> bug/regression in that area.
> 
> Local variable `scan' in shrink_zone() is vulnerable to multiplicative
> overflows on large zones, but I doubt if you have enough memory to
> trigger that bug.
> 
> 
> From: Andrew Morton <akpm@linux-foundation.org>
> 
> Local variable `scan' can overflow on zones which are larger than
> 
> 	(2G * 4k) / 100 = 80GB.
> 
> Making it 64-bit on 64-bit will fix that up.

Agghh, thanks for the bugfix.

Note: his meminfo indicates his machine has 3.5GB of RAM, so this
patch doesn't fix his problem.



> 
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff -puN mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone mm/vmscan.c
> --- a/mm/vmscan.c~vmscan-avoid-multiplication-overflow-in-shrink_zone
> +++ a/mm/vmscan.c
> @@ -1479,7 +1479,7 @@ static void shrink_zone(int priority, st
>  
>  	for_each_evictable_lru(l) {
>  		int file = is_file_lru(l);
> -		int scan;
> +		unsigned long scan;
>  
>  		scan = zone_nr_pages(zone, sc, l);
>  		if (priority) {
> _
> 
> 




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-30  4:43                 ` Andrew Morton
@ 2009-04-30  4:55                   ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-30  4:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Elladan, KOSAKI Motohiro, Theodore Tso, Wu Fengguang,
	Peter Zijlstra, linux-kernel, linux-mm, Rik van Riel

On Wed, Apr 29, 2009 at 09:43:32PM -0700, Andrew Morton wrote:
> On Wed, 29 Apr 2009 21:14:39 -0700 Elladan <elladan@eskimo.com> wrote:
> 
> > > Elladan, have you checked to see whether the Mapped: number in
> > > /proc/meminfo is decreasing?
> > 
> > Yes, Mapped decreases while a large file copy is ongoing.  It increases again
> > if I use the GUI.
> 
> OK.  If that's still happening to an appreciable extent after you've
> increased /proc/sys/vm/swappiness then I'd wager that we have a
> bug/regression in that area.
> 
> Local variable `scan' in shrink_zone() is vulnerable to multiplicative
> overflows on large zones, but I doubt if you have enough memory to
> trigger that bug.

No, I only have 4GB.

This appears to happen with swappiness set to 0 or 60.
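
(For reference, a minimal sketch of reading that knob from C -- 
illustration only, equivalent to `cat /proc/sys/vm/swappiness':)

/* Read the current vm.swappiness setting. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/swappiness", "r");
	int val;

	if (!f)
		return 1;
	if (fscanf(f, "%d", &val) == 1)
		printf("vm.swappiness = %d\n", val);
	fclose(f);
	return 0;
}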

-Elladan

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-29 15:47       ` Rik van Riel
@ 2009-04-30  7:20         ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-30  7:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Peter Zijlstra, Elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Wed, Apr 29, 2009 at 11:47:08AM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> On Wed, 29 Apr 2009 08:42:29 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > Isn't there a hole where LRU_*_FILE << LRU_*_ANON and we now stop
> > shrinking INACTIVE_ANON even though it makes sense to.
> 
> Peter, after looking at this again, I believe that the get_scan_ratio
> logic should take care of protecting the anonymous pages, so we can
> get away with the following, less intrusive patch.
> 
> Elladan, does this smaller patch still work as expected?

Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
code), I went ahead and tested this patch.

The system does seem relatively responsive with this patch for the most part,
with occasional lag.  I don't see much evidence at least over the course of a
few minutes that it pages out applications significantly.  It seems about
equivalent to the first patch.

Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
I went ahead and did that with this patch built into a kernel.  Compared to the
standard Ubuntu kernel, this patch keeps significantly more Mapped memory
around, and it shrinks at a slower rate after the test runs for a while.
Eventually, it seems to reach a steady state.

For example, with your patch, Mapped will often go for 30 seconds without
changing significantly.  Without your patch, it continuously lost about
500-1000K every 5 seconds, and then jumped up again significantly when I
touched Firefox or other applications.  I do see some of that behavior with
your patch too, but it's much less significant.

When I first initiated the background load, Mapped did rapidly decrease from
about 85000K to 47000K.  It seems to have reached a fairly steady state since
then.  I would guess this implies that the VM paged out parts of my executable
set that aren't touched very often, but isn't applying further pressure to my
active pages?  Also for example, after letting the test run for a while, I
scrolled around some tabs in firefox I hadn't used since the test began, and
experienced significant lag.

This seems ok (not disastrous, anyway).  I suspect desktop users would
generally prefer the VM to be extremely aggressive about keeping their
executables paged in, though -- much more so than this patch provides (and
note how popular swappiness=0 seems to be).  Paging applications back in
seems to introduce a large amount of UI latency, even if the VM keeps it
to a sane level as with this patch.  Also, I don't see many desktop
workloads where paging out applications to grow the data cache is ever
helpful -- practically all desktop workloads that generate a lot of IO
involve streaming, not data that might plausibly fit in RAM.  If I'm just
copying a bunch of files around, I'd prefer that even "worthless" pages,
such as the parts of Firefox that are only used at load time or during
rare config requests (and would thus not appear to be part of my
short-term working set), stay in cache, so I can get the maximum
interactive performance from my application.

Thank you,
Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-04-29 17:14           ` Rik van Riel
@ 2009-04-30  8:10             ` Johannes Weiner
  -1 siblings, 0 replies; 336+ messages in thread
From: Johannes Weiner @ 2009-04-30  8:10 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

On Wed, Apr 29, 2009 at 01:14:36PM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>

Although Elladan didn't test this exact patch, he reported on v2 that
the general idea of scanning active files only when they exceed the
inactive set works.
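
(For reference, the acked heuristic boils down to a check of roughly this
shape -- a sketch of the idea, not the literal patch:)

/* Deactivate active file pages only once the active file list
 * outgrows the inactive one; until then, reclaim feeds purely on
 * the use-once pages sitting on the inactive list. */
static int inactive_file_is_low(struct zone *zone)
{
        unsigned long active   = zone_page_state(zone, NR_ACTIVE_FILE);
        unsigned long inactive = zone_page_state(zone, NR_INACTIVE_FILE);

        return active > inactive;
}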

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-29  5:51           ` KOSAKI Motohiro
@ 2009-04-30 11:59             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-04-30 11:59 UTC (permalink / raw)
  To: Theodore Tso, Wu Fengguang, Peter Zijlstra, KOSAKI Motohiro,
	Elladan, linux-kernel, linux-mm, Rik van Riel

> test environment: no LVM, copy ext3 to ext3 (not mv), swappiness unchanged,
>                  CFQ is used, userland is Fedora 10, mmotm (2.6.30-rc1 + mm patches),
>                  CPU: quad-core Opteron, mem: 4G
>
> mouse move lag:               did not happen
> window move lag:              did not happen
> Mapped page decrease rapidly: did not happen (I guess these pages stay in
>                                          the active list on my system)
> page fault large latency:     happened (latencytop displays >1200ms)
>
>
> So I don't doubt the VM replacement logic right now,
> but I need to investigate more.
> I plan to try the following things today and tomorrow.
>
>  - XFS
>  - LVM
>  - another io scheduler (thanks Ted, good view point)
>  - Rik's new patch

Hm, the AS I/O scheduler doesn't produce such large latency in my environment.
Elladan, can you try the AS scheduler? (add the boot option "elevator=as")

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-30  7:20         ` Elladan
@ 2009-04-30 13:08           ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-04-30 13:08 UTC (permalink / raw)
  To: Elladan; +Cc: Peter Zijlstra, linux-kernel, tytso, kosaki.motohiro, linux-mm

Elladan wrote:

>> Elladan, does this smaller patch still work as expected?

> The system does seem relatively responsive with this patch for the most part,
> with occasional lag.  I don't see much evidence at least over the course of a
> few minutes that it pages out applications significantly.  It seems about
> equivalent to the first patch.

OK, good to hear that.

> This seems ok (not disastrous, anyway).  I suspect desktop users would
> generally prefer the VM were extremely aggressive about keeping their
> executables paged in though, 

I agree that desktop users would probably prefer something even
more aggressive.  However, we do need to balance this against
other workloads, where inactive file pages need to be given a
fair chance to be referenced twice and promoted to the active
file list.

Because of that, I have chosen a patch with a minimal risk of
regressions on any workload.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-30 11:59             ` KOSAKI Motohiro
@ 2009-04-30 13:46               ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-30 13:46 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Theodore Tso, Wu Fengguang, Peter Zijlstra, Elladan,
	linux-kernel, linux-mm, Rik van Riel

On Thu, Apr 30, 2009 at 08:59:59PM +0900, KOSAKI Motohiro wrote:
> > test environment: no LVM, copy ext3 to ext3 (not mv), swappiness unchanged,
> >                  CFQ is used, userland is Fedora 10, mmotm (2.6.30-rc1 + mm patches),
> >                  CPU: quad-core Opteron, mem: 4G
> >
> > mouse move lag:               did not happen
> > window move lag:              did not happen
> > Mapped page decrease rapidly: did not happen (I guess these pages stay in
> >                                          the active list on my system)
> > page fault large latency:     happened (latencytop displays >1200ms)
> >
> >
> > So I don't doubt the VM replacement logic right now,
> > but I need to investigate more.
> > I plan to try the following things today and tomorrow.
> >
> >  - XFS
> >  - LVM
> >  - another io scheduler (thanks Ted, good view point)
> >  - Rik's new patch
> 
> Hm, the AS I/O scheduler doesn't produce such large latency in my environment.
> Elladan, can you try the AS scheduler? (add the boot option "elevator=as")

I switched at runtime with /sys/block/sd[ab]/queue/scheduler, using Rik's
second patch for page replacement.  It was hard to tell if this made much
difference in latency, as reported by latencytop.  Both schedulers sometimes
show outliers up to 1400msec or so, and the average latency looks like it may
be similar.

Thanks,
Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-30 13:08           ` Rik van Riel
@ 2009-04-30 14:00             ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-04-30 14:00 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Elladan, Peter Zijlstra, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, Apr 30, 2009 at 09:08:06AM -0400, Rik van Riel wrote:
> Elladan wrote:
>
>>> Elladan, does this smaller patch still work as expected?
>
>> The system does seem relatively responsive with this patch for the most part,
>> with occasional lag.  I don't see much evidence at least over the course of a
>> few minutes that it pages out applications significantly.  It seems about
>> equivalent to the first patch.
>
> OK, good to hear that.
>
>> This seems ok (not disastrous, anyway).  I suspect desktop users would
>> generally prefer the VM were extremely aggressive about keeping their
>> executables paged in though, 
>
> I agree that desktop users would probably prefer something even
> more aggressive.  However, we do need to balance this against
> other workloads, where inactive file pages need to be given a
> fair chance to be referenced twice and promoted to the active
> file list.
>
> Because of that, I have chosen a patch with a minimal risk of
> regressions on any workload.

I agree, this seems to work well as a bugfix, for a general purpose system.

I'm just not sure that a general-purpose page replacement algorithm actually
serves most desktop users well.  I remember occasionally resorting to kludges
back in the 2.2/2.4 days to force eviction of application pages when my system
was low on RAM, but for desktop use that naive VM actually seemed to have
fewer latency problems overall.

Also, since hard disks haven't been getting faster (the surge in SSDs aside)
while RAM and CPU have been improving dramatically, any paging or swapping
activity only becomes more and more noticeable.

Thanks,
Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-04-30  7:20         ` Elladan
@ 2009-05-01  0:45           ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01  0:45 UTC (permalink / raw)
  To: Elladan
  Cc: riel, peterz, elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, 30 Apr 2009 00:20:58 -0700
Elladan <elladan@eskimo.com> wrote:

> > Elladan, does this smaller patch still work as expected?
> 
> Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
> code), I went ahead and tested this patch.
> 
> The system does seem relatively responsive with this patch for the most part,
> with occasional lag.  I don't see much evidence at least over the course of a
> few minutes that it pages out applications significantly.  It seems about
> equivalent to the first patch.
> 
> Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
> I went ahead and did that with this patch built into a kernel.  Compared to the
> standard Ubuntu kernel, this patch keeps significantly more Mapped memory
> around, and it shrinks at a slower rate after the test runs for a while.
> Eventually, it seems to reach a steady state.
> 
> For example, with your patch, Mapped will often go for 30 seconds without
> changing significantly.  Without your patch, it continuously lost about
> 500-1000K every 5 seconds, and then jumped up again significantly when I
> touched Firefox or other applications.  I do see some of that behavior with
> your patch too, but it's much less significant.

Were you able to tell whether altering /proc/sys/vm/swappiness appropriately
regulated the rate at which the mapped page count decreased?

Thanks.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  0:45           ` Andrew Morton
@ 2009-05-01  0:59             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01  0:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, 30 Apr 2009 17:45:36 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> Were you able to tell whether altering /proc/sys/vm/swappiness
> appropriately regulated the rate at which the mapped page count
> decreased?

That should not make a difference at all for mapped file
pages, after the change was merged that makes the VM ignore
the referenced bit on mapped active file pages.

Ever since the split LRU code was merged, all that the
swappiness controls is the aggressiveness of file vs
anonymous LRU scanning.
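
(To illustrate -- the exact code also weights by recent scan/rotate
ratios, so take this as a sketch of the knob alone -- swappiness just
sets the relative scan priorities of the two LRU sets:)

/* Sketch: how swappiness biases the anon vs file scan split.
 * 0   => scan file pages only, unless there is no other choice;
 * 100 => scan anon about as aggressively as file. */
static void scan_split(int swappiness, int *anon_pct, int *file_pct)
{
        int anon_prio = swappiness;             /* 0..100 */
        int file_prio = 200 - swappiness;       /* 100..200 */

        *anon_pct = 100 * anon_prio / (anon_prio + file_prio);
        *file_pct = 100 - *anon_pct;
}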

Currently the kernel has no effective code to protect the 
page cache working set from streaming IO.  Elladan's bug
report shows that we do need some kind of protection...

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  0:59             ` Rik van Riel
@ 2009-05-01  1:13               ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01  1:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, 30 Apr 2009 20:59:36 -0400
Rik van Riel <riel@redhat.com> wrote:

> On Thu, 30 Apr 2009 17:45:36 -0700
> Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > Were you able to tell whether altering /proc/sys/vm/swappiness
> > appropriately regulated the rate at which the mapped page count
> > decreased?
> 
> That should not make a difference at all for mapped file
> pages, after the change was merged that makes the VM ignore
> the referenced bit on mapped active file pages.
> 
> Ever since the split LRU code was merged, all that the
> swappiness controls is the aggressiveness of file vs
> anonymous LRU scanning.

Which would cause exactly the problem Elladan saw?

> Currently the kernel has no effective code to protect the 
> page cache working set from streaming IO.  Elladan's bug
> report shows that we do need some kind of protection...

Seems to me that reclaim should treat swapcache-backed mapped pages in
a similar fashion to file-backed mapped pages?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  1:13               ` Andrew Morton
@ 2009-05-01  1:50                 ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01  1:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, 30 Apr 2009 18:13:40 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Thu, 30 Apr 2009 20:59:36 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > On Thu, 30 Apr 2009 17:45:36 -0700
> > Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > Were you able to tell whether altering /proc/sys/vm/swappiness
> > > appropriately regulated the rate at which the mapped page count
> > > decreased?
> > 
> > That should not make a difference at all for mapped file
> > pages, after the change was merged that makes the VM ignore
> > the referenced bit on mapped active file pages.
> > 
> > Ever since the split LRU code was merged, all that the
> > swappiness controls is the aggressiveness of file vs
> > anonymous LRU scanning.
> 
> Which would cause exactly the problem Elladan saw?

Yes.  It was not noticeable in the initial split LRU code,
but after we decided to ignore the referenced bit on active
file pages and deactivate pages regardless, the problem got
worse.

That change was very good for scalability, so we should not
undo it.  However, we do need to put something in place to
protect the working set from streaming IO.

> > Currently the kernel has no effective code to protect the 
> > page cache working set from streaming IO.  Elladan's bug
> > report shows that we do need some kind of protection...
> 
> Seems to me that reclaim should treat swapcache-backed mapped mages in
> a similar fashion to file-backed mapped pages?

Swapcache-backed pages are not on the same set of LRUs as
file-backed mapped pages.
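
(For those following along, the split LRU sets look like this --
abbreviated from include/linux/mmzone.h as of 2.6.28:)

enum lru_list {
        LRU_INACTIVE_ANON,      /* swap-backed: anon, shmem, swapcache */
        LRU_ACTIVE_ANON,
        LRU_INACTIVE_FILE,      /* file-backed: page cache, mapped files */
        LRU_ACTIVE_FILE,
        LRU_UNEVICTABLE,        /* mlocked and similar */
        NR_LRU_LISTS
};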

Furthermore, there is no streaming IO on the anon LRUs like
there is on the file LRUs. Only the file LRUs need (and want)
use-once replacement, which means that we only need special
protection of the working set for file-backed pages.

When we implement working set protection, we might as well
do it for frequently accessed unmapped pages too.  There is
no reason to restrict this protection to mapped pages.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  1:50                 ` Rik van Riel
@ 2009-05-01  2:54                   ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01  2:54 UTC (permalink / raw)
  To: Rik van Riel
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, 30 Apr 2009 21:50:34 -0400 Rik van Riel <riel@redhat.com> wrote:

> > Which would cause exactly the problem Elladan saw?
> 
> Yes.  It was not noticeable in the initial split LRU code,
> but after we decided to ignore the referenced bit on active
> file pages and deactivate pages regardless, the problem got
> worse.
> 
> That change was very good for scalability, so we should not
> undo it.  However, we do need to put something in place to
> protect the working set from streaming IO.
> 
> > > Currently the kernel has no effective code to protect the 
> > > page cache working set from streaming IO.  Elladan's bug
> > > report shows that we do need some kind of protection...
> > 
> > Seems to me that reclaim should treat swapcache-backed mapped pages in
> > a similar fashion to file-backed mapped pages?
> 
> Swapcache-backed pages are not on the same set of LRUs as
> file-backed mapped pages.

yup.

> Furthermore, there is no streaming IO on the anon LRUs like
> there is on the file LRUs. Only the file LRUs need (and want)
> use-once replacement, which means that we only need special
> protection of the working set for file-backed pages.

OK.

> When we implement working set protection, we might as well
> do it for frequently accessed unmapped pages too.  There is
> no reason to restrict this protection to mapped pages.

Well.  Except for empirical observation, which tells us that biasing
reclaim to prefer to retain mapped memory produces a better result.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  0:45           ` Andrew Morton
@ 2009-05-01  3:09             ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-05-01  3:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Elladan, riel, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Thu, Apr 30, 2009 at 05:45:36PM -0700, Andrew Morton wrote:
> On Thu, 30 Apr 2009 00:20:58 -0700
> Elladan <elladan@eskimo.com> wrote:
> 
> > > Elladan, does this smaller patch still work as expected?
> > 
> > Rik, since the third patch doesn't work on 2.6.28 (without disabling a lot of
> > code), I went ahead and tested this patch.
> > 
> > The system does seem relatively responsive with this patch for the most part,
> > with occasional lag.  I don't see much evidence at least over the course of a
> > few minutes that it pages out applications significantly.  It seems about
> > equivalent to the first patch.
> > 
> > Given Andrew Morton's request that I track the Mapped: field in /proc/meminfo,
> > I went ahead and did that with this patch built into a kernel.  Compared to the
> > standard Ubuntu kernel, this patch keeps significantly more Mapped memory
> > around, and it shrinks at a slower rate after the test runs for a while.
> > Eventually, it seems to reach a steady state.
> > 
> > For example, with your patch, Mapped will often go for 30 seconds without
> > changing significantly.  Without your patch, it continuously lost about
> > 500-1000K every 5 seconds, and then jumped up again significantly when I
> > touched Firefox or other applications.  I do see some of that behavior with
> > your patch too, but it's much less significant.
> 
> Were you able to tell whether altering /proc/sys/vm/swappiness appropriately
> regulated the rate at which the mapped page count decreased?

I don't believe so.  I tested with swappiness=0 and =60, and in each case the
mapped pages continued to decrease.  I don't know at what rate though.  If
you'd like more precise data, I can rerun the test with appropriate logging.  I
admit my "Hey, latency is terrible and mapped pages is decreasing" testing is
somewhat unscientific.
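
(If it helps, a quick sketch of the logging loop I'd run -- it just samples
the Mapped: line from /proc/meminfo every 5 seconds:)

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
        char line[128];
        unsigned long kb;

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");
                if (!f)
                        return 1;
                while (fgets(line, sizeof(line), f))
                        if (sscanf(line, "Mapped: %lu kB", &kb) == 1)
                                printf("%ld %lu\n", (long)time(NULL), kb);
                fclose(f);
                fflush(stdout);
                sleep(5);
        }
}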

I get the impression that VM regressions happen fairly regularly.  Does anyone
have good unit tests for this?  It seems like a difficult problem, since it's
partly about access patterns and partly about timing.

-J

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01  2:54                   ` Andrew Morton
@ 2009-05-01 14:05                     ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 14:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

Andrew Morton wrote:

>> When we implement working set protection, we might as well
>> do it for frequently accessed unmapped pages too.  There is
>> no reason to restrict this protection to mapped pages.
> 
> Well.  Except for empirical observation, which tells us that biasing
> reclaim to prefer to retain mapped memory produces a better result.

That used to be the case because file-backed and
swap-backed pages shared the same set of LRUs,
while each followed a different page reclaim
heuristic!

Today:
1) file-backed and swap-backed pages are separated,
2) the majority of mapped pages are on the swap-backed LRUs,
3) the accessed bit on active pages no longer means much,
    for good scalability reasons, and
4) because of (3), we cannot really provide special treatment
    to any individual page any more.

This means we need to provide our working set protection
on a per-list basis, by tweaking the scan rate or avoiding
scanning of the active file list altogether under certain
conditions.
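
(Concretely, such a gate can sit at the top of shrink_list() -- a sketch,
give or take the exact helpers, building on an inactive_file_is_low()
style check:)

if (lru == LRU_ACTIVE_FILE) {
        if (inactive_file_is_low(zone))
                /* active list has outgrown the inactive one:
                 * deactivate some pages to refill it */
                shrink_active_list(nr_to_scan, zone, sc, priority, 1);
        /* otherwise leave the working set alone; reclaim keeps
         * consuming the use-once pages on the inactive list */
        return 0;
}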

As a side effect, this will help protect frequently accessed
file pages (good for ftp and nfs servers), indirect blocks,
inode buffers and other frequently used metadata.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 14:05                     ` Rik van Riel
@ 2009-05-01 18:04                       ` Ray Lee
  -1 siblings, 0 replies; 336+ messages in thread
From: Ray Lee @ 2009-05-01 18:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Fri, May 1, 2009 at 7:05 AM, Rik van Riel <riel@redhat.com> wrote:
>
> Andrew Morton wrote:
>
>>> When we implement working set protection, we might as well
>>> do it for frequently accessed unmapped pages too.  There is
>>> no reason to restrict this protection to mapped pages.
>>
>> Well.  Except for empirical observation, which tells us that biasing
>> reclaim to prefer to retain mapped memory produces a better result.
>
> That used to be the case because file-backed and
> swap-backed pages shared the same set of LRUs,
> while each followed a different page reclaim
> heuristic!
>
> Today:
> 1) file-backed and swap-backed pages are separated,
> 2) the majority of mapped pages are on the swap-backed LRUs,
> 3) the accessed bit on active pages no longer means much,
>   for good scalability reasons, and
> 4) because of (3), we cannot really provide special treatment
>   to any individual page any more.
>
> This means we need to provide our working set protection
> on a per-list basis, by tweaking the scan rate or avoiding
> scanning of the active file list altogether under certain
> conditions.
>
> As a side effect, this will help protect frequently accessed
> file pages (good for ftp and nfs servers), indirect blocks,
> inode buffers and other frequently used metadata.

Just an honest question: who does #3 help?  All normal Linux users, or
large systems for some definition of large?  (Helping large systems is
good; historically it eventually helps everyone.  But the point I'm
driving at is that the minority of systems which tend to use one
kernel for a while and stick with it -- i.e., embedded or big iron --
can be, and are, tuned for specific workloads.  The majority of systems
that upgrade the kernel frequently, such as desktop systems needing
support for new hardware, tend to rely more upon the kernel defaults.)

Also, not all the above items are equal from a latency point of view.
The latency impact of an inode needing to be fetched from disk is
already budgeted for in most userspace designs.  Opening a file can be
slow, news at 11.  Try not to open as many files, solution at 11:01.

The latency impact of jumping to a different part of your own
executable, however, is something most userspace programmers likely
never think about.  This hurts even more in this modern age of web
browsers, where Firefox has to act as a layout engine, video player,
parser and compiler, etc.  Not every web page uses every feature, which
means clicking a random URL can suddenly stop the whole shebang while
a previously-unreferenced page is swapped back in.  With executables,
past usage doesn't presage future need.

Said a different way, executables are not equivalent to a random
collection of mapped pages. A collection of inodes may or may not have
any causal links between them. A collection of pages for an executable
are linked via function calls, and the compiler and linker already
took a first pass at evicting unnecessary baggage.

Said way #3: We desktop users really want a way to say "Please don't
page my executables out when I'm running a system with 3gig of RAM." I
hate knobs, but I'm willing to beg for one in this case. 'cause
mlock()ing my entire working set into RAM seems pretty silly.
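
(For the record, the mlock() route would look roughly like this -- a
sketch; it pins everything, and needs CAP_IPC_LOCK or a big enough
RLIMIT_MEMLOCK:)

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
        /* Lock all current and future mappings into RAM. */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
                perror("mlockall");
                return 1;
        }
        /* ... run the actual application from here ... */
        return 0;
}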

Does any of that make sense, or am I talking out of an inappropriate orifice?

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 18:04                       ` Ray Lee
@ 2009-05-01 19:34                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 19:34 UTC (permalink / raw)
  To: Ray Lee
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

Ray Lee wrote:

> Said way #3: We desktop users really want a way to say "Please don't
> page my executables out when I'm running a system with 3gig of RAM." I
> hate knobs, but I'm willing to beg for one in this case. 'cause
> mlock()ing my entire working set into RAM seems pretty silly.
> 
> Does any of that make sense, or am I talking out of an inappropriate orifice?

The "don't page my executables out" part makes sense.

However, I believe that kind of behaviour should be the
default.  Desktops and servers alike have a few different
kinds of data in the page cache:
1) pages that have been frequently accessed at some point
    in the past and got promoted to the active list
2) streaming IO

I believe that we want to give (1) absolute protection from
(2), provided there are not too many pages on the active file
list.  That way we will protect executables, cached indirect
and inode blocks, etc. from streaming IO.

Pages that are new to the page cache start on the inactive
list.  Only if they get accessed twice while on that list,
they get promoted to the active list.
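
(That two-touch rule is essentially what mark_page_accessed() does --
quoted from memory, so treat it as a sketch:)

/* First touch only sets the referenced bit; a second touch while
 * the page is still on the (inactive) LRU promotes it.  Streaming
 * pages never get the second touch, so they stay inactive. */
void mark_page_accessed(struct page *page)
{
        if (!PageActive(page) && !PageUnevictable(page) &&
                        PageReferenced(page) && PageLRU(page)) {
                activate_page(page);
                ClearPageReferenced(page);
        } else if (!PageReferenced(page)) {
                SetPageReferenced(page);
        }
}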

Streaming IO should normally be evicted from memory before
it can get accessed again.  This means those pages do not
get promoted to the active list and the working set is
protected.

Does this make sense?

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 14:05                     ` Rik van Riel
@ 2009-05-01 19:35                       ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01 19:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Fri, 01 May 2009 10:05:53 -0400
Rik van Riel <riel@redhat.com> wrote:

> Andrew Morton wrote:
> 
> >> When we implement working set protection, we might as well
> >> do it for frequently accessed unmapped pages too.  There is
> >> no reason to restrict this protection to mapped pages.
> > 
> > Well.  Except for empirical observation, which tells us that biasing
> > reclaim to prefer to retain mapped memory produces a better result.
> 
> That used to be the case because file-backed and
> swap-backed pages shared the same set of LRUs,
> while each following a different page reclaim
> heuristic!

No, I think it still _is_ the case.  When reclaim is treating mapped
and non-mapped pages equally, the end result sucks.  Applications get
all laggy and humans get irritated.  It may be that the system was
optimised from an overall throughput POV, but the result was
*irritating*.

Which led us to prefer to retain mapped pages.  This had nothing at all
to do with internal implementation details - it was a design objective
based upon empirical observation of system behaviour.

> Today:
> 1) file-backed and swap-backed pages are separated,
> 2) the majority of mapped pages are on the swap-backed LRUs
> 3) the accessed bit on active pages no longer means much,
>     for good scalability reasons, and
> 4) because of (3), we cannot really provide special treatment
>     to any individual page any more, however
> 
> This means we need to provide our working set protection
> on a per-list basis, by tweaking the scan rate or avoiding
> scanning of the active file list altogether under certain
> conditions.
> 
> As a side effect, this will help protect frequently accessed
> file pages (good for ftp and nfs servers), indirect blocks,
> inode buffers and other frequently used metadata.

Yeah, but that's all internal-implementation-of-the-day details.  It
just doesn't matter how the sausages are made.  What we have learned is
that the policy of retaining mapped pages over unmapped pages, *all
other things being equal* leads to a more pleasing system.



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:34                         ` Rik van Riel
@ 2009-05-01 19:44                           ` Ray Lee
  -1 siblings, 0 replies; 336+ messages in thread
From: Ray Lee @ 2009-05-01 19:44 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Fri, May 1, 2009 at 12:34 PM, Rik van Riel <riel@redhat.com> wrote:
> Ray Lee wrote:
>
>> Said way #3: We desktop users really want a way to say "Please don't
>> page my executables out when I'm running a system with 3gig of RAM." I
>> hate knobs, but I'm willing to beg for one in this case. 'cause
>> mlock()ing my entire working set into RAM seems pretty silly.
>>
>> Does any of that make sense, or am I talking out of an inappropriate
>> orifice?
>
> The "don't page my executables out" part makes sense.
>
> However, I believe that kind of behaviour should be the
> default.  Desktops and servers alike have a few different
> kinds of data in the page cache:
> 1) pages that have been frequently accessed at some point
>   in the past and got promoted to the active list
> 2) streaming IO
>
> I believe that we want to give (1) absolute protection from
> (2), provided there are not too many pages on the active file
> list.  That way we will protect executables, cached indirect
> and inode blocks, etc. from streaming IO.
>
> Pages that are new to the page cache start on the inactive
> list.  Only if they get accessed twice while on that list,
> they get promoted to the active list.
>
> Streaming IO should normally be evicted from memory before
> it can get accessed again.  This means those pages do not
> get promoted to the active list and the working set is
> protected.
>
> Does this make sense?

Streaming IO should always be at the bottom of the list as it's nearly
always use-once. That's not the interesting case. (I'm glad you're
protecting everything from streaming IO, it's a good thing. And if it's
a media server and serving the same stream to many clients, if I
understood you correctly those streams will no longer be use-once, and
therefore be a normal citizen with the rest of the cache. That's great
too.)

The interesting case is an updatedb running in the background, paging
out firefox, or worse, parts of X. That sucks.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:35                       ` Andrew Morton
@ 2009-05-01 20:05                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 20:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

Andrew Morton wrote:
> On Fri, 01 May 2009 10:05:53 -0400
> Rik van Riel <riel@redhat.com> wrote:

>> This means we need to provide our working set protection
>> on a per-list basis, by tweaking the scan rate or avoiding
>> scanning of the active file list altogether under certain
>> conditions.
>>
>> As a side effect, this will help protect frequently accessed
>> file pages (good for ftp and nfs servers), indirect blocks,
>> inode buffers and other frequently used metadata.
> 
> Yeah, but that's all internal-implementation-of-the-day details.  It
> just doesn't matter how the sausages are made.  What we have learned is
> that the policy of retaining mapped pages over unmapped pages, *all
> other things being equal* leads to a more pleasing system.

Well, retaining mapped pages is one of the implementations
that lead to a more pleasing system.

I suspect that a fully scan resistant active file list will
show the same behaviour, as well as a few other desired
behaviours that come in very handy in various server loads.

Are you open to evaluating other methods that could lead, on
desktop systems, to a behaviour similar to the one achieved
by the preserve-mapped-pages mechanism?

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:44                           ` Ray Lee
@ 2009-05-01 20:08                             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 20:08 UTC (permalink / raw)
  To: Ray Lee
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

Ray Lee wrote:

> Streaming IO should always be at the bottom of the list as it's nearly
> always use-once. That's not the interesting case.

Unfortunately, on current 2.6.28 through 2.6.30 that is broken.

Streaming IO will eventually eat away all of the pages on the
active file list, causing the binaries and libraries that programs
use to be kicked out of memory.

Not interesting?

> The interesting case is an updatedb running in the background, paging
> out firefox, or worse, parts of X. That sucks.

This is a combination of use-once IO and VFS metadata.

The used-once pages can be reclaimed fairly easily.

The growing metadata needs to be addressed by putting pressure
on it via the slab/slub/slob shrinker functions.
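
For illustration, a minimal sketch of the 2.6.28-era shrinker interface
referred to here; my_cache_shrink(), my_cache_prune() and
my_cache_count() are hypothetical names:

static int my_cache_shrink(int nr_to_scan, gfp_t gfp_mask)
{
	if (nr_to_scan)
		my_cache_prune(nr_to_scan);	/* drop some objects */
	return my_cache_count();	/* report how many remain */
}

static struct shrinker my_cache_shrinker = {
	.shrink	= my_cache_shrink,
	.seeks	= DEFAULT_SEEKS,
};

Registering it with register_shrinker(&my_cache_shrinker) lets the VM
apply pressure to the cache in proportion to its LRU scanning.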

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:34                         ` Rik van Riel
@ 2009-05-01 20:17                           ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-05-01 20:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ray Lee, Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Fri, May 01, 2009 at 03:34:19PM -0400, Rik van Riel wrote:
> Ray Lee wrote:
>
>> Said way #3: We desktop users really want a way to say "Please don't
>> page my executables out when I'm running a system with 3gig of RAM." I
>> hate knobs, but I'm willing to beg for one in this case. 'cause
>> mlock()ing my entire working set into RAM seems pretty silly.
>>
>> Does any of that make sense, or am I talking out of an inappropriate orifice?
>
> The "don't page my executables out" part makes sense.
>
> However, I believe that kind of behaviour should be the
> default.  Desktops and servers alike have a few different
> kinds of data in the page cache:
> 1) pages that have been frequently accessed at some point
>    in the past and got promoted to the active list
> 2) streaming IO
>
> I believe that we want to give (1) absolute protection from
> (2), provided there are not too many pages on the active file
> list.  That way we will protect executables, cached indirect
> and inode blocks, etc. from streaming IO.
>
> Pages that are new to the page cache start on the inactive
> list.  Only if they get accessed twice while on that list,
> they get promoted to the active list.
>
> Streaming IO should normally be evicted from memory before
> it can get accessed again.  This means those pages do not
> get promoted to the active list and the working set is
> protected.
>
> Does this make sense?

I think this is a simplistic view of things.

Keep in mind that the goal of a VM is: "load each page before it's needed."
LRU, use-once heuristics, and the like are ways of trying to guess when a page
is needed and when it isn't, because you don't know the future.

For high throughput, treating all pages equally (or with some simple weighting)
is often appropriate, because it allows you to balance various sorts of working
sets dynamically.

But user interfaces are a realtime problem.  When the user presses a button,
you have a deadline to respond before it's annoying, and another deadline
before the user will hit the power button.  With this in mind, the user's
application UI has essentially infinite priority for memory -- it's either
paged into ram before the user presses a button, or you fail.

Very often, this is just a case of streaming IO vs. everything else, in which
case detecting streaming IO (because of the usage pattern) will help.  That's a
pretty simple case.  But imagine I start up a big compute job in the background
-- for example, I run a video encoder or something similar, and this program
touches the source data many times, such that it does not appear to be
"streaming" by a simple heuristic.

Particularly if I walk away from the computer, any algorithm based purely on
recent usage will see the background job as the only thing worth keeping in
memory, so the UI will be paged out.  And of course, when I walk back to the
computer
and press a button, the UI will not respond, and will have shocking latency
until I've touched every bit of it that I use again.

That's a bad outcome.  User interactivity is a real-time problem, and your
deadline is less than 30 disk seeks (at a typical ~10ms per seek, a budget
of roughly 300ms).

Of course, if the bulk job completes dramatically faster with some extra
memory, then the alternative (pinning the entire UI in RAM) is also a bad outcome.
There's no perfect solution here, and I suspect a really functional system
ultimately needs all sorts of weird hints from the UI.  Or alternatively, a
naive VM (which pins the UI), and enough RAM to keep the user and any bulk jobs
happy.

-Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 20:05                         ` Rik van Riel
@ 2009-05-01 20:45                           ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01 20:45 UTC (permalink / raw)
  To: Rik van Riel
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Fri, 01 May 2009 16:05:55 -0400
Rik van Riel <riel@redhat.com> wrote:

> Are you open to evaluating other methods that could lead, on
> desktop systems, to a behaviour similar to the one achieved
> by the preserve-mapped-pages mechanism?

Well..  it's more a matter of retaining what we've learnt (unless we
feel it's wrong, or technology change broke it) and carefully listening
to and responding to what's happening in out-there land.

The number of problem reports we're seeing from the LRU changes is
pretty low.  Hopefully that's because the number of problems _is_
low.

Given the low level of problem reports, the relative immaturity of the
code and our difficulty with determining what effect our changes will
have upon everyone, I'd have thought that sit-tight-and-wait-and-see
would be the prudent approach for the next few months.

otoh if you have a change and it proves good in your testing then sure,
sooner rather than later.

There, that was nice and waffly.

I still haven't forgotten prev_priority tho!

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 20:45                           ` Andrew Morton
@ 2009-05-01 21:46                             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 21:46 UTC (permalink / raw)
  To: Andrew Morton
  Cc: elladan, peterz, linux-kernel, tytso, kosaki.motohiro, linux-mm

Andrew Morton wrote:
> On Fri, 01 May 2009 16:05:55 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
>> Are you open to evaluating other methods that could lead, on
>> desktop systems, to a behaviour similar to the one achieved
>> by the preserve-mapped-pages mechanism?
> 
> Well..  it's more a matter of retaining what we've learnt (unless we
> feel it's wrong, or technology change broke it) and carefully listening
> to and responding to what's happening in out-there land.

Treating mapped pages specially is a bad implementation,
because it does not scale.  The reason is the same reason
we dropped "treat referenced active file pages special"
right before the split LRU code was merged by Linus.

Also, it does not help workloads that have a large number
of unmapped pages, where we want to protect the frequently
used ones from a giant stream of used-once pages.  NFS and
FTP servers would be a typical example of this, but so
would a database server with postgres or mysql in a default
setup.

> The number of problem reports we're seeing from the LRU changes is
> pretty low.  Hopefully that's because the number of problems _is_
> low.

I believe the number of problems is low.  However, the
severity of this particular problem means that we'll
probably want to do something about it.

> Given the low level of problem reports, the relative immaturity of the
> code and our difficulty with determining what effect our changes will
> have upon everyone, I'd have thought that sit-tight-and-wait-and-see
> would be the prudent approach for the next few months.
> 
> otoh if you have a change and it proves good in your testing then sure,
> sooner rather than later.

I believe the patch I submitted in this thread should fix
the problem.  I have experimented with the patch before
and Elladan's results show that the situation is resolved
for him.

Furthermore, Peter and I believe the patch has a minimal
risk of side effects.

Of course, there may be better ideas yet.  It would be
nice if people could try to shoot holes in the concept
of the patch - if anybody can even think of a way in
which it could break, we can try to come up with a way
of fixing it.

> I still haven't forgotten prev_priority tho!

The whole priority thing could be(come) a problem too,
with us scanning WAY too many pages at once in a gigantic
memory zone.  Scanning a million pages at once will
probably lead to unacceptable latencies somewhere :)

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-04-29 17:14           ` Rik van Riel
@ 2009-05-01 22:32             ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01 22:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kosaki.motohiro, peterz, elladan, linux-kernel, tytso, linux-mm

On Wed, 29 Apr 2009 13:14:36 -0400
Rik van Riel <riel@redhat.com> wrote:

> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> ..
>
> +int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
> +{
> +	unsigned long active;
> +	unsigned long inactive;
> +
> +	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
> +	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
> +
> +	return (active > inactive);
> +}

This function could trivially be made significantly more efficient by
changing it to do a single pass over all the zones of all the nodes,
rather than two passes.

>  static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>  	struct zone *zone, struct scan_control *sc, int priority)
>  {
>  	int file = is_file_lru(lru);
>  
> -	if (lru == LRU_ACTIVE_FILE) {
> +	if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
>  		shrink_active_list(nr_to_scan, zone, sc, priority, file);
>  		return 0;
>  	}

And it does get called rather often.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-05-01 22:32             ` Andrew Morton
@ 2009-05-01 23:05               ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-01 23:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, peterz, elladan, linux-kernel, tytso, linux-mm

Andrew Morton wrote:
> On Wed, 29 Apr 2009 13:14:36 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
>> When the file LRU lists are dominated by streaming IO pages,
>> evict those pages first, before considering evicting other
>> pages.
>>
>> This should be safe from deadlocks or performance problems
>> because only three things can happen to an inactive file page:
>> 1) referenced twice and promoted to the active list
>> 2) evicted by the pageout code
>> 3) under IO, after which it will get evicted or promoted
>>
>> The pages freed in this way can either be reused for streaming
>> IO, or allocated for something else. If the pages are used for
>> streaming IO, this pageout pattern continues. Otherwise, we will
>> fall back to the normal pageout pattern.
>>
>> ..
>>
>> +int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
>> +{
>> +	unsigned long active;
>> +	unsigned long inactive;
>> +
>> +	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
>> +	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
>> +
>> +	return (active > inactive);
>> +}
> 
> This function could trivially be made significantly more efficient by
> changing it to do a single pass over all the zones of all the nodes,
> rather than two passes.

How would I do that in a clean way?

The function mem_cgroup_inactive_anon_is_low and
the global versions all do the same.  It would be
nice to make all four of them go fast :)

If there is no standardized infrastructure for
getting multiple statistics yet, I can probably
whip something up.

>>  static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>>  	struct zone *zone, struct scan_control *sc, int priority)
>>  {
>>  	int file = is_file_lru(lru);
>>  
>> -	if (lru == LRU_ACTIVE_FILE) {
>> +	if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
>>  		shrink_active_list(nr_to_scan, zone, sc, priority, file);
>>  		return 0;
>>  	}
> 
> And it does get called rather often.

Same as inactive_anon_is_low.

Optimizing them might make sense if it turns out to
use a significant amount of CPU.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-05-01 23:05               ` Rik van Riel
@ 2009-05-01 23:25                 ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-01 23:25 UTC (permalink / raw)
  To: Rik van Riel
  Cc: kosaki.motohiro, peterz, elladan, linux-kernel, tytso, linux-mm

On Fri, 01 May 2009 19:05:21 -0400
Rik van Riel <riel@redhat.com> wrote:

> Andrew Morton wrote:
> > On Wed, 29 Apr 2009 13:14:36 -0400
> > Rik van Riel <riel@redhat.com> wrote:
> > 
> >> When the file LRU lists are dominated by streaming IO pages,
> >> evict those pages first, before considering evicting other
> >> pages.
> >>
> >> This should be safe from deadlocks or performance problems
> >> because only three things can happen to an inactive file page:
> >> 1) referenced twice and promoted to the active list
> >> 2) evicted by the pageout code
> >> 3) under IO, after which it will get evicted or promoted
> >>
> >> The pages freed in this way can either be reused for streaming
> >> IO, or allocated for something else. If the pages are used for
> >> streaming IO, this pageout pattern continues. Otherwise, we will
> >> fall back to the normal pageout pattern.
> >>
> >> ..
> >>
> >> +int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
> >> +{
> >> +	unsigned long active;
> >> +	unsigned long inactive;
> >> +
> >> +	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
> >> +	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
> >> +
> >> +	return (active > inactive);
> >> +}
> > 
> > This function could trivially be made significantly more efficient by
> > changing it to do a single pass over all the zones of all the nodes,
> > rather than two passes.
> 
> How would I do that in a clean way?

copy-n-paste :(

static unsigned long foo(struct mem_cgroup *mem,
			enum lru_list idx1, enum lru_list idx2)
{
	int nid, zid;
	struct mem_cgroup_per_zone *mz;
	u64 total = 0;

	for_each_online_node(nid)
		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
			mz = mem_cgroup_zoneinfo(mem, nid, zid);
			total += MEM_CGROUP_ZSTAT(mz, idx1);
			total += MEM_CGROUP_ZSTAT(mz, idx2);
		}
	return total;
}

dunno if that's justifiable.

> The function mem_cgroup_inactive_anon_is_low and
> the global versions all do the same.  It would be
> nice to make all four of them go fast :)
> 
> If there is no standardized infrastructure for
> getting multiple statistics yet, I can probably
> whip something up.

It depends how often it would be called for, I guess.

One approach would be pass in a variable-length array of `enum
lru_list's, get returned a same-lengthed array of totals.

Or perhaps all we need to return is the sum of those totals.

I'd let the memcg guys worry about this if I were you ;)
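
(For concreteness, a hedged sketch of that array idea, reusing only the
helpers from the snippet above; the function name is invented:

static void mem_cgroup_get_zonestats(struct mem_cgroup *mem,
			const enum lru_list *idx,
			unsigned long *totals, int n)
{
	int nid, zid, i;
	struct mem_cgroup_per_zone *mz;

	for (i = 0; i < n; i++)
		totals[i] = 0;

	for_each_online_node(nid)
		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
			mz = mem_cgroup_zoneinfo(mem, nid, zid);
			for (i = 0; i < n; i++)
				totals[i] += MEM_CGROUP_ZSTAT(mz, idx[i]);
		}
}

One pass over all zones of all nodes, for however many counters the
caller asks for.)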

> Optimizing them might make sense if it turns out to
> use a significant amount of CPU.

Yeah.  By then it's often too late though.  The sort of people for whom
(num_online_nodes*MAX_NR_ZONES) is nuttily large tend not to run
kernel.org kernels.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-04-29 17:14           ` Rik van Riel
@ 2009-05-03  1:15             ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-03  1:15 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

On Wed, Apr 29, 2009 at 01:14:36PM -0400, Rik van Riel wrote:
> When the file LRU lists are dominated by streaming IO pages,
> evict those pages first, before considering evicting other
> pages.
> 
> This should be safe from deadlocks or performance problems
> because only three things can happen to an inactive file page:
> 1) referenced twice and promoted to the active list
> 2) evicted by the pageout code
> 3) under IO, after which it will get evicted or promoted
> 
> The pages freed in this way can either be reused for streaming
> IO, or allocated for something else. If the pages are used for
> streaming IO, this pageout pattern continues. Otherwise, we will
> fall back to the normal pageout pattern.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> 
[snip]
> +static int inactive_file_is_low_global(struct zone *zone)
> +{
> +	unsigned long active, inactive;
> +
> +	active = zone_page_state(zone, NR_ACTIVE_FILE);
> +	inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> +
> +	return (active > inactive);
> +}
[snip]
>  static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>  	struct zone *zone, struct scan_control *sc, int priority)
>  {
>  	int file = is_file_lru(lru);
>  
> -	if (lru == LRU_ACTIVE_FILE) {
> +	if (lru == LRU_ACTIVE_FILE && inactive_file_is_low(zone, sc)) {
>  		shrink_active_list(nr_to_scan, zone, sc, priority, file);
>  		return 0;
>  	}

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

I like this idea - it's simple and sound, and is expected to work well
for the majority of workloads. Sure the arbitrary 1:1 active:inactive ratio
may be suboptimal for many workloads, but it is mostly safe.

In the worst-case scenario, it could waste half the memory that could
otherwise be used as readahead buffer and to prevent thrashing, on a
server that serves large, hardly-reused datasets yet still slowly
builds up its active list over a long uptime (think of a slow
performance degradation that can be fixed by a crude dropcache action).
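
(For reference, a crude dropcache action would be:

echo 3 > /proc/sys/vm/drop_caches

which drops the page cache plus the reclaimable dentries and inodes;
writing 1 drops only the page cache, 2 only the dentries and inodes.)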

That said, the actual performance degradation could be much smaller -
say 15% - since not all memory is equally valuable.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-05-01 23:25                 ` Andrew Morton
@ 2009-05-03  1:28                   ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-03  1:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, kosaki.motohiro, peterz, elladan, linux-kernel,
	tytso, linux-mm

On Fri, May 01, 2009 at 04:25:06PM -0700, Andrew Morton wrote:
> On Fri, 01 May 2009 19:05:21 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > Andrew Morton wrote:
> > > On Wed, 29 Apr 2009 13:14:36 -0400
> > > Rik van Riel <riel@redhat.com> wrote:
> > > 
> > >> When the file LRU lists are dominated by streaming IO pages,
> > >> evict those pages first, before considering evicting other
> > >> pages.
> > >>
> > >> This should be safe from deadlocks or performance problems
> > >> because only three things can happen to an inactive file page:
> > >> 1) referenced twice and promoted to the active list
> > >> 2) evicted by the pageout code
> > >> 3) under IO, after which it will get evicted or promoted
> > >>
> > >> The pages freed in this way can either be reused for streaming
> > >> IO, or allocated for something else. If the pages are used for
> > >> streaming IO, this pageout pattern continues. Otherwise, we will
> > >> fall back to the normal pageout pattern.
> > >>
> > >> ..
> > >>
> > >> +int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
> > >> +{
> > >> +	unsigned long active;
> > >> +	unsigned long inactive;
> > >> +
> > >> +	inactive = mem_cgroup_get_local_zonestat(memcg, LRU_INACTIVE_FILE);
> > >> +	active = mem_cgroup_get_local_zonestat(memcg, LRU_ACTIVE_FILE);
> > >> +
> > >> +	return (active > inactive);
> > >> +}
> > > 
> > > This function could trivially be made significantly more efficient by
> > > changing it to do a single pass over all the zones of all the nodes,
> > > rather than two passes.
> > 
> > How would I do that in a clean way?
> 
> copy-n-paste :(
> 
> static unsigned long foo(struct mem_cgroup *mem,
> 			enum lru_list idx1, enum lru_list idx2)
> {
> 	int nid, zid;
> 	struct mem_cgroup_per_zone *mz;
> 	u64 total = 0;
> 
> 	for_each_online_node(nid)
> 		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
> 			mz = mem_cgroup_zoneinfo(mem, nid, zid);
> 			total += MEM_CGROUP_ZSTAT(mz, idx1);
> 			total += MEM_CGROUP_ZSTAT(mz, idx2);
> 		}
> 	return total;
> }
> 
> dunno if that's justifiable.
> 
> > The function mem_cgroup_inactive_anon_is_low and
> > the global versions all do the same.  It would be
> > nice to make all four of them go fast :)
> > 
> > If there is no standardized infrastructure for
> > getting multiple statistics yet, I can probably
> > whip something up.
> 
> It depends how often it would be called for, I guess.
> 
> One approach would be to pass in a variable-length array of `enum
> lru_list's and get back a same-length array of totals.
> 
> Or perhaps all we need to return is the sum of those totals.
> 
> I'd let the memcg guys worry about this if I were you ;)
> 
> > Optimizing them might make sense if it turns out to
> > use a significant amount of CPU.
> 
> Yeah.  By then it's often too late though.  The sort of people for whom
> (num_online_nodes*MAX_NR_ZONES) is nuttily large tend not to run
> kernel.org kernels.

Good point. We could add a flag that is tested frequently in shrink_list()
and updated less frequently in shrink_zone() (or whatever).
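
For concreteness, a minimal, untested sketch of the variable-length
array idea - mem_cgroup_get_zonestats() is an invented name, and only
the accessors already shown in the snippet above are assumed:

static void mem_cgroup_get_zonestats(struct mem_cgroup *mem,
				     const enum lru_list *idx,
				     unsigned long *totals, int n)
{
	int nid, zid, i;
	struct mem_cgroup_per_zone *mz;

	for (i = 0; i < n; i++)
		totals[i] = 0;

	/* One walk over all nodes/zones, however many counters we need. */
	for_each_online_node(nid)
		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
			mz = mem_cgroup_zoneinfo(mem, nid, zid);
			for (i = 0; i < n; i++)
				totals[i] += MEM_CGROUP_ZSTAT(mz, idx[i]);
		}
}

static int mem_cgroup_inactive_file_is_low(struct mem_cgroup *memcg)
{
	enum lru_list idx[2] = { LRU_ACTIVE_FILE, LRU_INACTIVE_FILE };
	unsigned long totals[2];

	mem_cgroup_get_zonestats(memcg, idx, totals, 2);
	return totals[0] > totals[1];
}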

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-05-03  1:15             ` Wu Fengguang
@ 2009-05-03  1:33               ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-03  1:33 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KOSAKI Motohiro, Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

On Sun, 3 May 2009 09:15:40 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> In the worst-case scenario, it could waste half the memory that could
> otherwise be used as readahead buffer and to prevent thrashing - say in
> a server serving large datasets that are hardly ever reused, but which
> still slowly builds up its active list over a long uptime (think of a
> slow performance degradation that can be fixed by a crude drop_caches
> action).

In the best case, the active list ends up containing all the
indirect blocks for the files that are occasionally reused,
and the system ends up being able to serve its clients with
less disk IO.

For systems like ftp.kernel.org, the files that are most
popular will end up on the active list, without being kicked
out by the files that are less popular.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v3)
  2009-05-03  1:33               ` Rik van Riel
@ 2009-05-03  1:46                 ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-03  1:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Peter Zijlstra, Elladan, linux-kernel, tytso, linux-mm

On Sun, May 03, 2009 at 09:33:56AM +0800, Rik van Riel wrote:
> On Sun, 3 May 2009 09:15:40 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > In the worst-case scenario, it could waste half the memory that could
> > otherwise be used as readahead buffer and to prevent thrashing - say in
> > a server serving large datasets that are hardly ever reused, but which
> > still slowly builds up its active list over a long uptime (think of a
> > slow performance degradation that can be fixed by a crude drop_caches
> > action).
> 
> In the best case, the active list ends up containing all the
> indirect blocks for the files that are occasionally reused,
> and the system ends up being able to serve its clients with
> less disk IO.
> 
> For systems like ftp.kernel.org, the files that are most
> popular will end up on the active list, without being kicked
> out by the files that are less popular.

Sure, such good cases tend to be prevalent - so obvious that I didn't
bother to mention them ;-)

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:35                       ` Andrew Morton
@ 2009-05-03  3:15                         ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-03  3:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Fri, May 01, 2009 at 12:35:41PM -0700, Andrew Morton wrote:
> On Fri, 01 May 2009 10:05:53 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > Andrew Morton wrote:
> > 
> > >> When we implement working set protection, we might as well
> > >> do it for frequently accessed unmapped pages too.  There is
> > >> no reason to restrict this protection to mapped pages.
> > > 
> > > Well.  Except for empirical observation, which tells us that biasing
> > > reclaim to prefer to retain mapped memory produces a better result.
> > 
> > That used to be the case because file-backed and
> > swap-backed pages shared the same set of LRUs,
> > while each following a different page reclaim
> > heuristic!
> 
> No, I think it still _is_ the case.  When reclaim is treating mapped
> and non-mapped pages equally, the end result sucks.  Applications get
> all laggy and humans get irritated.  It may be that the system was
> optimised from an overall throughput POV, but the result was
> *irritating*.
> 
> Which led us to prefer to retain mapped pages.  This had nothing at all
> to do with internal implementation details - it was a design objective
> based upon empirical observation of system behaviour.

Heartily agreed. We shall try hard to protect the running applications.

Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit check")
tries to address the scalability problem that arises when every page gets
mapped and referenced, so that this logic (which lowered the priority of
mapped pages) could be enabled only under conditions like
(priority < DEF_PRIORITY).
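
(A sketch of what such a gate looks like - hypothetical code, not the
commit's literal change; reclaim_mapped is an invented flag name:)

	int reclaim_mapped = 0;

	/*
	 * Only deprioritize mapped pages once reclaim has escalated
	 * past the default scan priority, i.e. under real pressure.
	 */
	if (priority < DEF_PRIORITY)
		reclaim_mapped = 1;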

Or preferably we can explicitly protect the mapped executables,
as illustrated by this patch (a quick prototype).

Thanks,
Fengguang
---
 include/linux/pagemap.h |    1 +
 mm/mmap.c               |    2 ++
 mm/nommu.c              |    2 ++
 mm/vmscan.c             |   37 +++++++++++++++++++++++++++++++++++--
 4 files changed, 40 insertions(+), 2 deletions(-)

--- linux.orig/include/linux/pagemap.h
+++ linux/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
 #ifdef CONFIG_UNEVICTABLE_LRU
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
 #endif
+	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
--- linux.orig/mm/mmap.c
+++ linux/mm/mmap.c
@@ -1198,6 +1198,8 @@ munmap_back:
 			goto unmap_and_free_vma;
 		if (vm_flags & VM_EXECUTABLE)
 			added_exe_file_vma(mm);
+		if (vm_flags & VM_EXEC)
+			set_bit(AS_EXEC, &file->f_mapping->flags);
 	} else if (vm_flags & VM_SHARED) {
 		error = shmem_zero_setup(vma);
 		if (error)
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1220,6 +1220,7 @@ static void shrink_active_list(unsigned 
 	int pgdeactivate = 0;
 	unsigned long pgscanned;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1259,8 +1260,15 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup))
+		    page_referenced(page, 0, sc->mem_cgroup)) {
+			struct address_space *mapping = page_mapping(page);
+
 			pgmoved++;
+			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}
 
 		list_add(&page->lru, &l_inactive);
 	}
@@ -1269,7 +1277,6 @@ static void shrink_active_list(unsigned 
 	 * Move the pages to the [file or anon] inactive list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
@@ -1281,6 +1288,7 @@ static void shrink_active_list(unsigned 
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
 	pgmoved = 0;
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1305,6 +1313,31 @@ static void shrink_active_list(unsigned 
 	}
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	pgdeactivate += pgmoved;
+
+	pgmoved = 0;
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+			pgmoved = 0;
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgdeactivate);
 	spin_unlock_irq(&zone->lru_lock);
--- linux.orig/mm/nommu.c
+++ linux/mm/nommu.c
@@ -1220,6 +1220,8 @@ unsigned long do_mmap_pgoff(struct file 
 			added_exe_file_vma(current->mm);
 			vma->vm_mm = current->mm;
 		}
+		if (vm_flags & VM_EXEC)
+			set_bit(AS_EXEC, &file->f_mapping->flags);
 	}
 
 	down_write(&nommu_region_sem);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-03  3:15                         ` Wu Fengguang
@ 2009-05-03  3:24                           ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-03  3:24 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Sun, 3 May 2009 11:15:39 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit
> check") tries to address the scalability problem that arises when
> every page gets mapped and referenced, so that this logic (which
> lowered the priority of mapped pages) could be enabled only under
> conditions like (priority < DEF_PRIORITY).
> 
> Or preferably we can explicitly protect the mapped executables,
> as illustrated by this patch (a quick prototype).

Over time, given enough streaming IO and idle applications,
executables will still be evicted with just this patch.

However, a combination of your patch and mine might do the
trick.  I suspect that executables are never a very big
part of memory, except on small memory systems, so protecting
just the mapped executables should not be a scalability
problem.

My patch in combination with your patch should make sure
that if something gets evicted from the active list, it's
not executables - meanwhile, lots of the time streaming
IO will completely leave the active file list alone.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-03  3:24                           ` Rik van Riel
@ 2009-05-03  3:43                             ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-03  3:43 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, elladan, peterz, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Sun, May 03, 2009 at 11:24:03AM +0800, Rik van Riel wrote:
> On Sun, 3 May 2009 11:15:39 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit
> > check") tries to address the scalability problem that arises when
> > every page gets mapped and referenced, so that this logic (which
> > lowered the priority of mapped pages) could be enabled only under
> > conditions like (priority < DEF_PRIORITY).
> > 
> > Or preferably we can explicitly protect the mapped executables,
> > as illustrated by this patch (a quick prototype).
> 
> Over time, given enough streaming IO and idle applications,
> executables will still be evicted with just this patch.
> 
> However, a combination of your patch and mine might do the
> trick.  I suspect that executables are never a very big
> part of memory, except on small memory systems, so protecting
> just the mapped executables should not be a scalability
> problem.

Yes, my intent is exactly to take advantage of your patch :-)

There may be programs that embed large amounts of static data -
think of self-decompressing data - but that's fine: this patch does
not behave in an overly persistent way.  Plus we could apply a size
limit (say 100MB) if necessary.
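
(A sketch of that size limit - mapping_exec_protected() and the 100MB
constant are invented for illustration:)

#define EXEC_PROTECT_LIMIT	((100 << 20) >> PAGE_SHIFT)	/* 100MB in pages */

static inline int mapping_exec_protected(struct address_space *mapping)
{
	/*
	 * Protect PROT_EXEC-backed mappings, but not giant ones that
	 * mostly carry embedded static data.
	 */
	return test_bit(AS_EXEC, &mapping->flags) &&
	       mapping->nrpages <= EXEC_PROTECT_LIMIT;
}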

> My patch in combination with your patch should make sure
> that if something gets evicted from the active list, it's
> not executables - meanwhile, lots of the time streaming
> IO will completely leave the active file list alone.
 
Together, the two patches make:
- mapped executable pages first-class citizens;
- streaming IO minimally intrusive.

I think that would make most desktop users and server administrators
contented and comfortable :-)

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-01 19:35                       ` Andrew Morton
@ 2009-05-04  8:04                         ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-04  8:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, elladan, linux-kernel, tytso, kosaki.motohiro, linux-mm

On Fri, 2009-05-01 at 12:35 -0700, Andrew Morton wrote:

> No, I think it still _is_ the case.  When reclaim is treating mapped
> and non-mapped pages equally, the end result sucks.  Applications get
> all laggy and humans get irritated.  It may be that the system was
> optimised from an overall throughput POV, but the result was
> *irritating*.
> 
> Which led us to prefer to retain mapped pages.  This had nothing at all
> to do with internal implementation details - it was a design objective
> based upon empirical observation of system behaviour.

Shouldn't we make a distinction between PROT_EXEC and other mappings in
this? Because as soon as you're running an application that uses gobs
and gobs of mmap'ed memory, the mapped vs non-mapped thing breaks down.
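
To make the scenario concrete, a standalone illustration (not code from
this thread): a process that streams through a big file via mmap()
makes every use-once page "mapped", so a plain mapped-vs-unmapped bias
stops protecting the real working set.

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	volatile char sum = 0;
	char *p;
	off_t off;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return 1;
	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	/* Touch each page exactly once: use-once data, yet mapped. */
	for (off = 0; off < st.st_size; off += 4096)
		sum += p[off];

	munmap(p, st.st_size);
	close(fd);
	return 0;
}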

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH] vmscan: evict use-once pages first (v2)
  2009-05-03  3:15                         ` Wu Fengguang
@ 2009-05-04 10:23                           ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-04 10:23 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Rik van Riel, elladan, linux-kernel, tytso,
	kosaki.motohiro, linux-mm

On Sun, 2009-05-03 at 11:15 +0800, Wu Fengguang wrote:
> On Fri, May 01, 2009 at 12:35:41PM -0700, Andrew Morton wrote:
> > On Fri, 01 May 2009 10:05:53 -0400
> > Rik van Riel <riel@redhat.com> wrote:
> > 
> > > Andrew Morton wrote:
> > > 
> > > >> When we implement working set protection, we might as well
> > > >> do it for frequently accessed unmapped pages too.  There is
> > > >> no reason to restrict this protection to mapped pages.
> > > > 
> > > > Well.  Except for empirical observation, which tells us that biasing
> > > > reclaim to prefer to retain mapped memory produces a better result.
> > > 
> > > That used to be the case because file-backed and
> > > swap-backed pages shared the same set of LRUs,
> > > while each following a different page reclaim
> > > heuristic!
> > 
> > No, I think it still _is_ the case.  When reclaim is treating mapped
> > and non-mapped pages equally, the end result sucks.  Applications get
> > all laggy and humans get irritated.  It may be that the system was
> > optimised from an overall throughput POV, but the result was
> > *irritating*.
> > 
> > Which led us to prefer to retain mapped pages.  This had nothing at all
> > to do with internal implementation details - it was a design objective
> > based upon empirical observation of system behaviour.
> 
> Heartily agreed. We shall try hard to protect the running applications.
> 
> Commit 7e9cd484204f ("vmscan: fix pagecache reclaim referenced bit check")
> tries to address the scalability problem that arises when every page gets
> mapped and referenced, so that this logic (which lowered the priority of
> mapped pages) could be enabled only under conditions like
> (priority < DEF_PRIORITY).
> 
> Or preferably we can explicitly protect the mapped executables,
> as illustrated by this patch (a quick prototype).

Ah, nice, this reinstates the young bit for PROT_EXEC pages.
I very much like this.


> Thanks,
> Fengguang
> ---
>  include/linux/pagemap.h |    1 +
>  mm/mmap.c               |    2 ++
>  mm/nommu.c              |    2 ++
>  mm/vmscan.c             |   37 +++++++++++++++++++++++++++++++++++--
>  4 files changed, 40 insertions(+), 2 deletions(-)
> 
> --- linux.orig/include/linux/pagemap.h
> +++ linux/include/linux/pagemap.h
> @@ -25,6 +25,7 @@ enum mapping_flags {
>  #ifdef CONFIG_UNEVICTABLE_LRU
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>  #endif
> +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> --- linux.orig/mm/mmap.c
> +++ linux/mm/mmap.c
> @@ -1198,6 +1198,8 @@ munmap_back:
>  			goto unmap_and_free_vma;
>  		if (vm_flags & VM_EXECUTABLE)
>  			added_exe_file_vma(mm);
> +		if (vm_flags & VM_EXEC)
> +			set_bit(AS_EXEC, &file->f_mapping->flags);
>  	} else if (vm_flags & VM_SHARED) {
>  		error = shmem_zero_setup(vma);
>  		if (error)
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1220,6 +1220,7 @@ static void shrink_active_list(unsigned 
>  	int pgdeactivate = 0;
>  	unsigned long pgscanned;
>  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> +	LIST_HEAD(l_active);
>  	LIST_HEAD(l_inactive);
>  	struct page *page;
>  	struct pagevec pvec;
> @@ -1259,8 +1260,15 @@ static void shrink_active_list(unsigned 
>  
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
> -		    page_referenced(page, 0, sc->mem_cgroup))
> +		    page_referenced(page, 0, sc->mem_cgroup)) {
> +			struct address_space *mapping = page_mapping(page);
> +
>  			pgmoved++;
> +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> +				list_add(&page->lru, &l_active);
> +				continue;
> +			}
> +		}
>  
>  		list_add(&page->lru, &l_inactive);
>  	}
> @@ -1269,7 +1277,6 @@ static void shrink_active_list(unsigned 
>  	 * Move the pages to the [file or anon] inactive list.
>  	 */
>  	pagevec_init(&pvec, 1);
> -	lru = LRU_BASE + file * LRU_FILE;
>  
>  	spin_lock_irq(&zone->lru_lock);
>  	/*
> @@ -1281,6 +1288,7 @@ static void shrink_active_list(unsigned 
>  	reclaim_stat->recent_rotated[!!file] += pgmoved;
>  
>  	pgmoved = 0;
> +	lru = LRU_BASE + file * LRU_FILE;
>  	while (!list_empty(&l_inactive)) {
>  		page = lru_to_page(&l_inactive);
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);
> @@ -1305,6 +1313,31 @@ static void shrink_active_list(unsigned 
>  	}
>  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
>  	pgdeactivate += pgmoved;
> +
> +	pgmoved = 0;
> +	lru = LRU_ACTIVE + file * LRU_FILE;
> +	while (!list_empty(&l_active)) {
> +		page = lru_to_page(&l_active);
> +		prefetchw_prev_lru_page(page, &l_active, flags);
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +		VM_BUG_ON(!PageActive(page));
> +
> +		list_move(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_add_lru_list(page, lru);
> +		pgmoved++;
> +		if (!pagevec_add(&pvec, page)) {
> +			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +			pgmoved = 0;
> +			spin_unlock_irq(&zone->lru_lock);
> +			if (buffer_heads_over_limit)
> +				pagevec_strip(&pvec);
> +			__pagevec_release(&pvec);
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +	}
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +
>  	__count_zone_vm_events(PGREFILL, zone, pgscanned);
>  	__count_vm_events(PGDEACTIVATE, pgdeactivate);
>  	spin_unlock_irq(&zone->lru_lock);
> --- linux.orig/mm/nommu.c
> +++ linux/mm/nommu.c
> @@ -1220,6 +1220,8 @@ unsigned long do_mmap_pgoff(struct file 
>  			added_exe_file_vma(current->mm);
>  			vma->vm_mm = current->mm;
>  		}
> +		if (vm_flags & VM_EXEC)
> +			set_bit(AS_EXEC, &file->f_mapping->flags);
>  	}
>  
>  	down_write(&nommu_region_sem);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: Swappiness vs. mmap() and interactive response
  2009-04-30 11:59             ` KOSAKI Motohiro
@ 2009-05-06 11:04               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-06 11:04 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: kosaki.motohiro, Theodore Tso, Wu Fengguang, Peter Zijlstra,
	Elladan, linux-kernel, linux-mm, Rik van Riel

> > test environment: no lvm, copy ext3 to ext3 (not mv), no change swappiness,
> >         CFQ is used, userland is Fedora10, mmotm(2.6.30-rc1 + mm patch),
> >         CPU opteronx4, mem 4G
> >
> > mouse move lag:        did not happen
> > window move lag:       did not happen
> > Mapped pages decrease rapidly: did not happen (I guess these pages stay
> >                     in the active list on my system)
> > page fault large latency:   happened (latencytop displays >1200ms)
> >
> >
> > So I don't suspect the VM replacement logic now,
> > but I need to investigate further.
> > I plan to try the following things today and tomorrow.
> >
> > - XFS
> > - LVM
> > - another io scheduler (thanks Ted, good view point)
> > - Rik's new patch
> 
> hm, the AS io-scheduler doesn't produce such large latency in my
> environment.  Elladan, can you try the AS scheduler? (add the boot
> option "elevator=as")

second test result:
read dev(sda): SSD, lvm+XFS
write dev(sdb): HDD, lvm+XFS

The result is the same as ext3 without lvm, so I think
XFS isn't the culprit.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-04 10:23                           ` Peter Zijlstra
@ 2009-05-07 12:11                             ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-07 12:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Peter Zijlstra, Rik van Riel, linux-kernel, tytso, linux-mm,
	Elladan, Nick Piggin, Johannes Weiner, Christoph Lameter,
	KOSAKI Motohiro

Introduce AS_EXEC to mark executables and their linked libraries, and to
protect their referenced active pages from being deactivated.

CC: Elladan <elladan@eskimo.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/pagemap.h |    1 +
 mm/mmap.c               |    2 ++
 mm/nommu.c              |    2 ++
 mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
 4 files changed, 38 insertions(+), 2 deletions(-)

--- linux.orig/include/linux/pagemap.h
+++ linux/include/linux/pagemap.h
@@ -25,6 +25,7 @@ enum mapping_flags {
 #ifdef CONFIG_UNEVICTABLE_LRU
 	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
 #endif
+	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
 };
 
 static inline void mapping_set_error(struct address_space *mapping, int error)
--- linux.orig/mm/mmap.c
+++ linux/mm/mmap.c
@@ -1194,6 +1194,8 @@ munmap_back:
 			goto unmap_and_free_vma;
 		if (vm_flags & VM_EXECUTABLE)
 			added_exe_file_vma(mm);
+		if (vm_flags & VM_EXEC)
+			set_bit(AS_EXEC, &file->f_mapping->flags);
 	} else if (vm_flags & VM_SHARED) {
 		error = shmem_zero_setup(vma);
 		if (error)
--- linux.orig/mm/nommu.c
+++ linux/mm/nommu.c
@@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
 			added_exe_file_vma(current->mm);
 			vma->vm_mm = current->mm;
 		}
+		if (vm_flags & VM_EXEC)
+			set_bit(AS_EXEC, &file->f_mapping->flags);
 	}
 
 	down_write(&nommu_region_sem);
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
 	unsigned long pgmoved;
 	unsigned long pgscanned;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup))
+		    page_referenced(page, 0, sc->mem_cgroup)) {
+			struct address_space *mapping = page_mapping(page);
+
 			pgmoved++;
+			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}
 
 		list_add(&page->lru, &l_inactive);
 	}
@@ -1279,7 +1287,6 @@ static void shrink_active_list(unsigned 
 	 * Move the pages to the [file or anon] inactive list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
@@ -1291,6 +1298,7 @@ static void shrink_active_list(unsigned 
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
 	pgmoved = 0;  /* count pages moved to inactive list */
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1313,6 +1321,29 @@ static void shrink_active_list(unsigned 
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	pgmoved = 0;  /* count pages moved back to active list */
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);
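
Not part of the patch - a tiny userspace illustration of what now gets
protected: any PROT_EXEC file mapping tags the backing address_space
via the new AS_EXEC bit, so executables and shared libraries qualify
automatically.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	void *p;
	int fd = open("/bin/true", O_RDONLY);

	if (fd < 0)
		return 1;

	/* PROT_EXEC is what makes do_mmap() set AS_EXEC on f_mapping. */
	p = mmap(NULL, 4096, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		return 1;

	pause();	/* sit idle; these pages should now survive streaming IO */
	return 0;
}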

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 12:11                             ` Wu Fengguang
@ 2009-05-07 13:39                               ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-07 13:39 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, 7 May 2009, Wu Fengguang wrote:

> Introduce AS_EXEC to mark executables and their linked libraries, and to
> protect their referenced active pages from being deactivated.


We already have support for mlock(). How is this an improvement? This is
worse since the AS_EXEC pages stay on the active list and are continually
rescanned.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 13:39                               ` Christoph Lameter
@ 2009-05-07 14:15                                 ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-07 14:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, 2009-05-07 at 09:39 -0400, Christoph Lameter wrote:
> On Thu, 7 May 2009, Wu Fengguang wrote:
> 
> > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > protect their referenced active pages from being deactivated.
> 
> 
> We already have support for mlock(). How is this an improvement? This is
> worse since the AS_EXEC pages stay on the active list and are continually
> rescanned.

It re-instates the young bit for PROT_EXEC pages, so that they will only
be paged when they are really cold, or there is severe pressure.

This simply gives them an edge over regular data. I don't think the
extra scanning is a problem, since you rarely have huge amounts of
executable pages around.

mlock()'ing all code just doesn't sound like a good alternative.
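
For the record, the nearest thing userspace has today is mlockall(), and
it is a much blunter instrument than an LRU hint: it pins every mapping,
not just text.  A minimal sketch of what "mlock all code" would actually
amount to (illustrative only):

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	/* pins *all* current and future mappings -- text, data and
	 * heap alike; there is no text-only variant */
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		perror("mlockall");
		return 1;
	}
	puts("every mapping pinned, not just text");
	return 0;
}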

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 14:15                                 ` Peter Zijlstra
@ 2009-05-07 14:18                                   ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-07 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, 7 May 2009, Peter Zijlstra wrote:

> It re-instates the young bit for PROT_EXEC pages, so that they will only
> be paged when they are really cold, or there is severe pressure.

But they are rescanned until then. Really cold means what exactly? I do a
backup of a few hundred gigabytes and do not use firefox while the backup
is ongoing. Will the firefox pages still be in memory or not?

> This simply gives them an edge over regular data. I don't think the
> extra scanning is a problem, since you rarely have huge amounts of
> executable pages around.
>
> mlock()'ing all code just doesn't sound like a good alternative.

Another possibility may be to put the exec pages on the mlock list
and scan the list if under extreme duress?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 14:18                                   ` Christoph Lameter
@ 2009-05-07 14:38                                     ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-07 14:38 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, 2009-05-07 at 10:18 -0400, Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
> 
> > It re-instates the young bit for PROT_EXEC pages, so that they will only
> > be paged when they are really cold, or there is severe pressure.
> 
> But they are rescanned until then. Really cold means what exactly? I do a
> backup of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?

Likely not.

What this patch does is check the young bit on active_file scan: if it's
found to be set and the page is PROT_EXEC, put the page back on the
active_file list, otherwise drop it to the inactive_file list.

So if you haven't run any firefox code, it should be gone from the
active list after 2 full cycles, and from the inactive list on the first
full inactive cycle after that.
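
To make the cycle count concrete, here is a toy userspace model of that
aging (a sketch of the logic only, not kernel code):

#include <stdbool.h>
#include <stdio.h>

struct page { bool referenced, exec, active, present; };

static void scan_active(struct page *p)
{
	if (p->referenced) {
		p->referenced = false;	/* young bit cleared */
		if (p->exec)
			return;		/* rotated back to active */
	}
	p->active = false;		/* demoted to inactive */
}

static void scan_inactive(struct page *p)
{
	if (!p->referenced)
		p->present = false;	/* reclaimed */
}

int main(void)
{
	/* an exec page referenced once, then never touched again */
	struct page firefox = { true, true, true, true };
	int cycles = 0;

	while (firefox.present) {
		cycles++;
		if (firefox.active)
			scan_active(&firefox);
		else
			scan_inactive(&firefox);
	}
	printf("reclaimed after %d scan cycles\n", cycles);	/* 3 */
	return 0;
}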

If you don't understand the patch, what are you complaining about?  What's
your point?

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 14:18                                   ` Christoph Lameter
@ 2009-05-07 15:06                                     ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-07 15:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Wu Fengguang, Andrew Morton, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
>
>   
>> It re-instates the young bit for PROT_EXEC pages, so that they will only
>> be paged when they are really cold, or there is severe pressure.
>>     
>
> But they are rescanned until then. Really cold means what exactly? I do a
> backup of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?
>   
The patch with the subject "[PATCH] vmscan: evict use-once pages first (v3)"
together with this patch should make sure that it stays in memory.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 12:11                             ` Wu Fengguang
@ 2009-05-07 15:10                               ` Johannes Weiner
  -1 siblings, 0 replies; 336+ messages in thread
From: Johannes Weiner @ 2009-05-07 15:10 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, Peter Zijlstra, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

On Thu, May 07, 2009 at 08:11:01PM +0800, Wu Fengguang wrote:
> Introduce AS_EXEC to mark executables and their linked libraries, and to
> protect their referenced active pages from being deactivated.
> 
> CC: Elladan <elladan@eskimo.com>
> CC: Nick Piggin <npiggin@suse.de>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Christoph Lameter <cl@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  include/linux/pagemap.h |    1 +
>  mm/mmap.c               |    2 ++
>  mm/nommu.c              |    2 ++
>  mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
>  4 files changed, 38 insertions(+), 2 deletions(-)
> 
> --- linux.orig/include/linux/pagemap.h
> +++ linux/include/linux/pagemap.h
> @@ -25,6 +25,7 @@ enum mapping_flags {
>  #ifdef CONFIG_UNEVICTABLE_LRU
>  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
>  #endif
> +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
>  };
>  
>  static inline void mapping_set_error(struct address_space *mapping, int error)
> --- linux.orig/mm/mmap.c
> +++ linux/mm/mmap.c
> @@ -1194,6 +1194,8 @@ munmap_back:
>  			goto unmap_and_free_vma;
>  		if (vm_flags & VM_EXECUTABLE)
>  			added_exe_file_vma(mm);
> +		if (vm_flags & VM_EXEC)
> +			set_bit(AS_EXEC, &file->f_mapping->flags);
>  	} else if (vm_flags & VM_SHARED) {
>  		error = shmem_zero_setup(vma);
>  		if (error)
> --- linux.orig/mm/nommu.c
> +++ linux/mm/nommu.c
> @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
>  			added_exe_file_vma(current->mm);
>  			vma->vm_mm = current->mm;
>  		}
> +		if (vm_flags & VM_EXEC)
> +			set_bit(AS_EXEC, &file->f_mapping->flags);
>  	}

I find it a bit ugly that it applies an attribute of the memory area
(per mm) to the page cache mapping (shared).  Because this in turn
means that the reference through a non-executable vma might get the
pages rotated just because there is/was an executable mmap around.

>  	down_write(&nommu_region_sem);
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
>  	unsigned long pgmoved;
>  	unsigned long pgscanned;
>  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> +	LIST_HEAD(l_active);
>  	LIST_HEAD(l_inactive);
>  	struct page *page;
>  	struct pagevec pvec;
> @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
>  
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
> -		    page_referenced(page, 0, sc->mem_cgroup))
> +		    page_referenced(page, 0, sc->mem_cgroup)) {
> +			struct address_space *mapping = page_mapping(page);
> +
>  			pgmoved++;
> +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> +				list_add(&page->lru, &l_active);
> +				continue;
> +			}
> +		}

Since we walk the VMAs in page_referenced anyway, wouldn't it be
better to check if one of them is executable?  This would even work
for executable anon pages.  After all, there are applications that cow
executable mappings (sbcl and other language environments that use an
executable, run-time modified core image come to mind).

	Hannes

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:10                               ` Johannes Weiner
@ 2009-05-07 15:17                                 ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-07 15:17 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:

> > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> >  
> >  		/* page_referenced clears PageReferenced */
> >  		if (page_mapping_inuse(page) &&
> > -		    page_referenced(page, 0, sc->mem_cgroup))
> > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > +			struct address_space *mapping = page_mapping(page);
> > +
> >  			pgmoved++;
> > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > +				list_add(&page->lru, &l_active);
> > +				continue;
> > +			}
> > +		}
> 
> Since we walk the VMAs in page_referenced anyway, wouldn't it be
> better to check if one of them is executable?  This would even work
> for executable anon pages.  After all, there are applications that cow
> executable mappings (sbcl and other language environments that use an
> executable, run-time modified core image come to mind).

Hmm, like provide a vm_flags mask along to page_referenced() to only
account matching vmas... seems like a sensible idea.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:17                                 ` Peter Zijlstra
@ 2009-05-07 15:21                                   ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-07 15:21 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Johannes Weiner, Wu Fengguang, Andrew Morton, linux-kernel,
	tytso, linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> 
>>> @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
>>>  
>>>  		/* page_referenced clears PageReferenced */
>>>  		if (page_mapping_inuse(page) &&
>>> -		    page_referenced(page, 0, sc->mem_cgroup))
>>> +		    page_referenced(page, 0, sc->mem_cgroup)) {
>>> +			struct address_space *mapping = page_mapping(page);
>>> +
>>>  			pgmoved++;
>>> +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
>>> +				list_add(&page->lru, &l_active);
>>> +				continue;
>>> +			}
>>> +		}
>> Since we walk the VMAs in page_referenced anyway, wouldn't it be
>> better to check if one of them is executable?  This would even work
>> for executable anon pages.  After all, there are applications that cow
>> executable mappings (sbcl and other language environments that use an
>> executable, run-time modified core image come to mind).
> 
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

Not for anon pages, though, because JVMs could have way too many
executable anonymous segments, which would make us run into the
scalability problems again.

Let's leave this just to the file side of the LRUs, because that
is where we have the streaming IO problem.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 14:38                                     ` Peter Zijlstra
@ 2009-05-07 15:36                                       ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-07 15:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wu Fengguang, Andrew Morton, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, 7 May 2009, Peter Zijlstra wrote:

> So if you haven't run any firefox code, it should be gone from the
> active list after 2 full cycles, and from the inactive list on the first
> full inactive cycle after that.

So some incremental changes. I still want to use firefox after my backup
without having to wait 5 minutes while its paging exec pages back in.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:36                                       ` Christoph Lameter
@ 2009-05-07 15:59                                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-07 15:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Wu Fengguang, Andrew Morton, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
> 
>> So if you haven't run any firefox code, it should be gone from the
>> active list after 2 full cycles, and from the inactive list on the first
>> full inactive cycle after that.
> 
> So some incremental changes. I still want to use firefox after my backup
> without having to wait 5 minutes while its paging exec pages back in.

Please try to read and understand the patches, before
imagining that they might not be enough.

The active file list is kept at least as large as
the inactive file list.  Your backup is one large
streaming IO.  This means the files touched by
your backup should go onto the inactive file list
and get reclaimed, without putting pressure on
the active file list.

If you are still not convinced that these (small)
changes are enough, please test the patches and
show us the results, so we can tweak things further.
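
For reference, the balancing comes from that use-once patch; its core
test is roughly the following (paraphrased, details may differ from the
posted version):

static int inactive_file_is_low(struct zone *zone)
{
	unsigned long active, inactive;

	active = zone_page_state(zone, NR_ACTIVE_FILE);
	inactive = zone_page_state(zone, NR_INACTIVE_FILE);

	/* only deactivate file pages while the active list outweighs
	 * the inactive one, so streaming IO cannot shrink the active
	 * list below half of the file pages */
	return active > inactive;
}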

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 14:18                                   ` Christoph Lameter
@ 2009-05-07 16:00                                     ` Lee Schermerhorn
  -1 siblings, 0 replies; 336+ messages in thread
From: Lee Schermerhorn @ 2009-05-07 16:00 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Peter Zijlstra, Wu Fengguang, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Johannes Weiner, KOSAKI Motohiro

On Thu, 2009-05-07 at 10:18 -0400, Christoph Lameter wrote:
> On Thu, 7 May 2009, Peter Zijlstra wrote:
> 
> > It re-instates the young bit for PROT_EXEC pages, so that they will only
> > be paged when they are really cold, or there is severe pressure.
> 
> But they are rescanned until then. Really cold means what exactly? I do a
> backup of a few hundred gigabytes and do not use firefox while the backup
> is ongoing. Will the firefox pages still be in memory or not?
> 
> > This simply gives them an edge over regular data. I don't think the
> > extra scanning is a problem, since you rarely have huge amounts of
> > executable pages around.
> >
> > mlock()'ing all code just doesn't sound like a good alternative.
> 
> Another possibility may be to put the exec pages on the mlock list
> and scan the list if under extreme duress?

Actually, you don't need to go through the overhead of mucking with the
PG_mlocked flag, which incurs the rmap walk on unlock, etc.  If one sets
the AS_UNEVICTABLE flag, the pages will be shuffled off to the
unevictable LRU iff we ever try to reclaim them.  And, we do have the
function to scan the unevictable lru to "rescue" pages in a given
mapping should we want to bring them back under extreme load.  We'd need
to remove the AS_UNEVICTABLE flag, first.  This is how
SHM_LOCK/SHM_UNLOCK works.
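
The pattern, roughly, is this simplified fragment (from memory, so the
call sites may differ in detail):

	/* SHM_LOCK: flag the whole mapping; reclaim then diverts its
	 * pages to the unevictable LRU, no per-page PG_mlocked games */
	mapping_set_unevictable(shp->shm_file->f_mapping);

	/* SHM_UNLOCK: clear the flag, then rescue pages already
	 * stranded on the unevictable list */
	mapping_clear_unevictable(shp->shm_file->f_mapping);
	scan_mapping_unevictable_pages(shp->shm_file->f_mapping);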

Lee




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 16:00                                     ` Lee Schermerhorn
@ 2009-05-07 16:32                                       ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-07 16:32 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: Peter Zijlstra, Wu Fengguang, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Johannes Weiner, KOSAKI Motohiro

On Thu, 7 May 2009, Lee Schermerhorn wrote:

> > Another possibility may be to put the exec pages on the mlock list
> > and scan the list if under extreme duress?
>
> Actually, you don't need to go through the overhead of mucking with the
> PG_mlocked flag, which incurs the rmap walk on unlock, etc.  If one sets
> the AS_UNEVICTABLE flag, the pages will be shuffled off to the
> unevictable LRU iff we ever try to reclaim them.  And, we do have the
> function to scan the unevictable lru to "rescue" pages in a given
> mapping should we want to bring them back under extreme load.  We'd need
> to remove the AS_UNEVICTABLE flag, first.  This is how
> SHM_LOCK/SHM_UNLOCK works.

We need some way to control this. If there were a way to simply switch
off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
so), I'd use it.
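
No such knob exists today; sketched against the current sysctl table it
might look like this (name and semantics purely hypothetical, taken from
the wish above):

int sysctl_never_reclaim_exec_pages __read_mostly;

static struct ctl_table vm_table[] = {
	/* ... existing vm entries ... */
	{
		.ctl_name	= CTL_UNNUMBERED,
		.procname	= "never_reclaim_exec_pages",
		.data		= &sysctl_never_reclaim_exec_pages,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	/* ... */
};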



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 16:32                                       ` Christoph Lameter
@ 2009-05-07 17:11                                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-07 17:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Lee Schermerhorn, Peter Zijlstra, Wu Fengguang, Andrew Morton,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Johannes Weiner, KOSAKI Motohiro

Christoph Lameter wrote:

> We need some way to control this. If there were a way to simply switch
> off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
> so), I'd use it.

Nobody (except you) is proposing that we completely disable
the eviction of executable pages.  I believe that your idea
could easily lead to a denial of service attack, with a user
creating a very large executable file and mmaping it.

Giving executable pages some priority over other file cache
pages is nowhere near as dangerous wrt. unexpected side effects
and should work just as well.
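
The attack would be trivial to mount; a minimal sketch, assuming such a
never-reclaim knob were enabled (file name arbitrary):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	volatile char sum = 0;
	char *p;
	int fd = open(argc > 1 ? argv[1] : "hugefile", O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror("open/fstat");
		return 1;
	}
	/* no code ever runs from the mapping; PROT_EXEC alone marks
	 * the file's page cache as executable */
	p = mmap(NULL, st.st_size, PROT_READ | PROT_EXEC,
		 MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* fault every page in so it actually fills the page cache */
	for (off_t i = 0; i < st.st_size; i += 4096)
		sum += p[i];
	pause();	/* hold the mapping open */
	return 0;
}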

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:10                               ` Johannes Weiner
@ 2009-05-07 20:44                                 ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-07 20:44 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: fengguang.wu, peterz, riel, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro

On Thu, 7 May 2009 17:10:39 +0200
Johannes Weiner <hannes@cmpxchg.org> wrote:

> > +++ linux/mm/nommu.c
> > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> >  			added_exe_file_vma(current->mm);
> >  			vma->vm_mm = current->mm;
> >  		}
> > +		if (vm_flags & VM_EXEC)
> > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> >  	}
> 
> I find it a bit ugly that it applies an attribute of the memory area
> (per mm) to the page cache mapping (shared).  Because this in turn
> means that the reference through a non-executable vma might get the
> pages rotated just because there is/was an executable mmap around.

Yes, it's not good.  That AS_EXEC bit will hang around for arbitrarily
long periods in the inode cache.  So we'll have AS_EXEC set on an
entire file because someone mapped some of it with PROT_EXEC half an
hour ago.  Where's the sense in that?



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:10                               ` Johannes Weiner
@ 2009-05-08  3:02                                 ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  3:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Peter Zijlstra, Rik van Riel, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

On Thu, May 07, 2009 at 11:10:39PM +0800, Johannes Weiner wrote:
> On Thu, May 07, 2009 at 08:11:01PM +0800, Wu Fengguang wrote:
> > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > protect their referenced active pages from being deactivated.
> > 
> > CC: Elladan <elladan@eskimo.com>
> > CC: Nick Piggin <npiggin@suse.de>
> > CC: Johannes Weiner <hannes@cmpxchg.org>
> > CC: Christoph Lameter <cl@linux-foundation.org>
> > CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > Acked-by: Rik van Riel <riel@redhat.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  include/linux/pagemap.h |    1 +
> >  mm/mmap.c               |    2 ++
> >  mm/nommu.c              |    2 ++
> >  mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
> >  4 files changed, 38 insertions(+), 2 deletions(-)
> > 
> > --- linux.orig/include/linux/pagemap.h
> > +++ linux/include/linux/pagemap.h
> > @@ -25,6 +25,7 @@ enum mapping_flags {
> >  #ifdef CONFIG_UNEVICTABLE_LRU
> >  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> >  #endif
> > +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
> >  };
> >  
> >  static inline void mapping_set_error(struct address_space *mapping, int error)
> > --- linux.orig/mm/mmap.c
> > +++ linux/mm/mmap.c
> > @@ -1194,6 +1194,8 @@ munmap_back:
> >  			goto unmap_and_free_vma;
> >  		if (vm_flags & VM_EXECUTABLE)
> >  			added_exe_file_vma(mm);
> > +		if (vm_flags & VM_EXEC)
> > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> >  	} else if (vm_flags & VM_SHARED) {
> >  		error = shmem_zero_setup(vma);
> >  		if (error)
> > --- linux.orig/mm/nommu.c
> > +++ linux/mm/nommu.c
> > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> >  			added_exe_file_vma(current->mm);
> >  			vma->vm_mm = current->mm;
> >  		}
> > +		if (vm_flags & VM_EXEC)
> > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> >  	}
> 
> I find it a bit ugly that it applies an attribute of the memory area
> (per mm) to the page cache mapping (shared).  Because this in turn
> means that the reference through a non-executable vma might get the
> pages rotated just because there is/was an executable mmap around.

Right, the intention was to identify a whole executable/library file,
e.g. /bin/bash or /lib/libc-2.9.so, covering both _text_ and _data_
sections.

> >  	down_write(&nommu_region_sem);
> > --- linux.orig/mm/vmscan.c
> > +++ linux/mm/vmscan.c
> > @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
> >  	unsigned long pgmoved;
> >  	unsigned long pgscanned;
> >  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> > +	LIST_HEAD(l_active);
> >  	LIST_HEAD(l_inactive);
> >  	struct page *page;
> >  	struct pagevec pvec;
> > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> >  
> >  		/* page_referenced clears PageReferenced */
> >  		if (page_mapping_inuse(page) &&
> > -		    page_referenced(page, 0, sc->mem_cgroup))
> > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > +			struct address_space *mapping = page_mapping(page);
> > +
> >  			pgmoved++;
> > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > +				list_add(&page->lru, &l_active);
> > +				continue;
> > +			}
> > +		}
> 
> Since we walk the VMAs in page_referenced anyway, wouldn't it be
> better to check if one of them is executable?  This would even work
> for executable anon pages.  After all, there are applications that cow
> executable mappings (sbcl and other language environments that use an
> executable, run-time modified core image come to mind).

The page_referenced() path will only cover the _text_ section.  But
yeah, the _data_ section is more likely to grow huge in some rare cases.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 15:17                                 ` Peter Zijlstra
@ 2009-05-08  3:30                                   ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  3:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, linux-kernel,
	tytso, linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> 
> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> > >  
> > >  		/* page_referenced clears PageReferenced */
> > >  		if (page_mapping_inuse(page) &&
> > > -		    page_referenced(page, 0, sc->mem_cgroup))
> > > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > > +			struct address_space *mapping = page_mapping(page);
> > > +
> > >  			pgmoved++;
> > > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > +				list_add(&page->lru, &l_active);
> > > +				continue;
> > > +			}
> > > +		}
> > 
> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > better to check if one of them is executable?  This would even work
> > for executable anon pages.  After all, there are applications that cow
> > executable mappings (sbcl and other language environments that use an
> > executable, run-time modified core image come to mind).
> 
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

I'd prefer to make vm_flags an out-param, like this:

-       int page_referenced(struct page *page, int is_locked,
+       int page_referenced(struct page *page, int is_locked, unsigned long *vm_flags,
                                struct mem_cgroup *mem_cont)

which allows reporting more versatile flags and status bits :) 
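
On the caller side that would let shrink_active_list() test vma flags
instead of the mapping bit, roughly (a sketch of the idea, not a final
interface):

	unsigned long vm_flags = 0;

	/* page_referenced() reports the union of the vm_flags of the
	 * vmas the page was referenced through */
	if (page_mapping_inuse(page) &&
	    page_referenced(page, 0, &vm_flags, sc->mem_cgroup)) {
		pgmoved++;
		if (vm_flags & VM_EXEC) {
			/* referenced via an executable vma */
			list_add(&page->lru, &l_active);
			continue;
		}
	}
	list_add(&page->lru, &l_inactive);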

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 17:11                                         ` Rik van Riel
@ 2009-05-08  3:40                                           ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-05-08  3:40 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Christoph Lameter, Lee Schermerhorn, Peter Zijlstra,
	Wu Fengguang, Andrew Morton, linux-kernel, tytso, linux-mm,
	Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Thu, May 07, 2009 at 01:11:41PM -0400, Rik van Riel wrote:
> Christoph Lameter wrote:
>
>> We need some way to control this. If there would be a way to simply switch
>> off eviction of exec pages (via /proc/sys/vm/never_reclaim_exec_pages or
>> so) I'd use it.
>
> Nobody (except you) is proposing that we completely disable
> the eviction of executable pages.  I believe that your idea
> could easily lead to a denial of service attack, with a user
> creating a very large executable file and mmaping it.
>
> Giving executable pages some priority over other file cache
> pages is nowhere near as dangerous wrt. unexpected side effects
> and should work just as well.

I don't think this sort of DoS is relevant for a single-user or trusted-user
system.

I don't know of any distro that applies default ulimits, so desktops are
already susceptible to the far more trivial "call malloc a lot" or "fork bomb"
attacks.  Plus, ulimits don't help, since they only apply per process - you'd
need a default mem cgroup before this mattered, I think.
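For what it's worth, the large-executable attack Rik describes above is
trivial to sketch (a hedged illustration only -- the file name and size are
made up, and all error handling is omitted):

	/* Map a huge sparse file PROT_EXEC and fault it in, so that its
	 * page cache pages all look like "executable" pages to the VM. */
	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t len = 1UL << 30;		/* 1GB; could be far larger */
		int fd = open("big", O_CREAT | O_RDWR, 0755);
		volatile char *p;
		size_t i;

		ftruncate(fd, len);
		p = mmap(NULL, len, PROT_READ | PROT_EXEC, MAP_PRIVATE, fd, 0);
		for (i = 0; i < len; i += 4096)
			(void)p[i];	/* populate the page cache */
		pause();		/* keep the mapping alive */
		return 0;
	}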

Thanks,
Elladan


^ permalink raw reply	[flat|nested] 336+ messages in thread

* [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-07 15:17                                 ` Peter Zijlstra
@ 2009-05-08  4:17                                   ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  4:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, linux-kernel,
	tytso, linux-mm, Elladan, Nick Piggin, Christoph Lameter,
	KOSAKI Motohiro

On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> 
> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> > >  
> > >  		/* page_referenced clears PageReferenced */
> > >  		if (page_mapping_inuse(page) &&
> > > -		    page_referenced(page, 0, sc->mem_cgroup))
> > > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > > +			struct address_space *mapping = page_mapping(page);
> > > +
> > >  			pgmoved++;
> > > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > +				list_add(&page->lru, &l_active);
> > > +				continue;
> > > +			}
> > > +		}
> > 
> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > better to check if one of them is executable?  This would even work
> > for executable anon pages.  After all, there are applications that cow
> > executable mappings (sbcl and other language environments that use an
> > executable, run-time modified core image come to mind).
> 
> Hmm, like provide a vm_flags mask along to page_referenced() to only
> account matching vmas... seems like a sensible idea.

Here is a quick patch for your opinions. Compile-tested.

With the added vm_flags reporting, the mlock=>unevictable logic can
possibly be made more straightforward.
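For example (a hypothetical follow-up, not part of this patch -- the
VM_LOCKED handling is only sketched), the caller could cull mlocked pages
straight from the reported flags:

	unsigned long vm_flags;

	if (page_referenced(page, 0, sc->mem_cgroup, &vm_flags) &&
	    (vm_flags & VM_LOCKED)) {
		/* mapped by an mlocked vma: this page belongs on the
		 * unevictable list rather than back on the inactive list */
	}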

Thanks,
Fengguang
---
vmscan: report vm_flags in page_referenced()

This enables more informed reclaim heuristics, e.g. to protect executable
file pages more aggressively.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/rmap.h |    5 +++--
 mm/rmap.c            |   30 +++++++++++++++++++++---------
 mm/vmscan.c          |    7 +++++--
 3 files changed, 29 insertions(+), 13 deletions(-)

--- linux.orig/include/linux/rmap.h
+++ linux/include/linux/rmap.h
@@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct 
 /*
  * Called from mm/vmscan.c to handle paging out
  */
-int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
+int page_referenced(struct page *, int is_locked,
+			struct mem_cgroup *cnt, unsigned long *vm_flags);
 int try_to_unmap(struct page *, int ignore_refs);
 
 /*
@@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
 #define anon_vma_prepare(vma)	(0)
 #define anon_vma_link(vma)	do {} while (0)
 
-#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
+#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
 #define try_to_unmap(page, refs) SWAP_FAIL
 
 static inline int page_mkclean(struct page *page)
--- linux.orig/mm/rmap.c
+++ linux/mm/rmap.c
@@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
  * repeatedly from either page_referenced_anon or page_referenced_file.
  */
 static int page_referenced_one(struct page *page,
-	struct vm_area_struct *vma, unsigned int *mapcount)
+			       struct vm_area_struct *vma,
+			       unsigned int *mapcount)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -385,7 +386,8 @@ out:
 }
 
 static int page_referenced_anon(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
 		referenced += page_referenced_one(page, vma, &mapcount);
+		*vm_flags |= vma->vm_flags;
 		if (!mapcount)
 			break;
 	}
@@ -418,6 +421,7 @@ static int page_referenced_anon(struct p
  * page_referenced_file - referenced check for object-based rmap
  * @page: the page we're checking references on.
  * @mem_cont: target memory controller
+ * @vm_flags: collect the encountered vma->vm_flags
  *
  * For an object-based mapped page, find all the places it is mapped and
  * check/clear the referenced flag.  This is done by following the page->mapping
@@ -427,7 +431,8 @@ static int page_referenced_anon(struct p
  * This function is only called from page_referenced for object-based pages.
  */
 static int page_referenced_file(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -468,6 +473,7 @@ static int page_referenced_file(struct p
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
 		referenced += page_referenced_one(page, vma, &mapcount);
+		*vm_flags |= vma->vm_flags;
 		if (!mapcount)
 			break;
 	}
@@ -481,29 +487,35 @@ static int page_referenced_file(struct p
  * @page: the page to test
  * @is_locked: caller holds lock on the page
  * @mem_cont: target memory controller
+ * @vm_flags: collect the encountered vma->vm_flags
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
  */
-int page_referenced(struct page *page, int is_locked,
-			struct mem_cgroup *mem_cont)
+int page_referenced(struct page *page,
+		    int is_locked,
+		    struct mem_cgroup *mem_cont,
+		    unsigned long *vm_flags)
 {
 	int referenced = 0;
 
 	if (TestClearPageReferenced(page))
 		referenced++;
 
+	*vm_flags = 0;
 	if (page_mapped(page) && page->mapping) {
 		if (PageAnon(page))
-			referenced += page_referenced_anon(page, mem_cont);
+			referenced += page_referenced_anon(page, mem_cont,
+								vm_flags);
 		else if (is_locked)
-			referenced += page_referenced_file(page, mem_cont);
+			referenced += page_referenced_file(page, mem_cont,
+								vm_flags);
 		else if (!trylock_page(page))
 			referenced++;
 		else {
 			if (page->mapping)
-				referenced +=
-					page_referenced_file(page, mem_cont);
+				referenced += page_referenced_file(page,
+							mem_cont, vm_flags);
 			unlock_page(page);
 		}
 	}
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -598,6 +598,7 @@ static unsigned long shrink_page_list(st
 	struct pagevec freed_pvec;
 	int pgactivate = 0;
 	unsigned long nr_reclaimed = 0;
+	unsigned long vm_flags;
 
 	cond_resched();
 
@@ -648,7 +649,8 @@ static unsigned long shrink_page_list(st
 				goto keep_locked;
 		}
 
-		referenced = page_referenced(page, 1, sc->mem_cgroup);
+		referenced = page_referenced(page, 1,
+						sc->mem_cgroup, &vm_flags);
 		/* In active use or really unfreeable?  Activate it. */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
 					referenced && page_mapping_inuse(page))
@@ -1229,6 +1231,7 @@ static void shrink_active_list(unsigned 
 {
 	unsigned long pgmoved;
 	unsigned long pgscanned;
+	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_inactive);
 	struct page *page;
@@ -1269,7 +1272,7 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
 			pgmoved++;
 
 		list_add(&page->lru, &l_inactive);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  3:02                                 ` Wu Fengguang
@ 2009-05-08  7:30                                   ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-08  7:30 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Johannes Weiner, Andrew Morton, Peter Zijlstra, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

Hi, let me ask a question.

On Fri, 8 May 2009 11:02:09 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Thu, May 07, 2009 at 11:10:39PM +0800, Johannes Weiner wrote:
> > On Thu, May 07, 2009 at 08:11:01PM +0800, Wu Fengguang wrote:
> > > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > > protect their referenced active pages from being deactivated.
> > > 
> > > CC: Elladan <elladan@eskimo.com>
> > > CC: Nick Piggin <npiggin@suse.de>
> > > CC: Johannes Weiner <hannes@cmpxchg.org>
> > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > > Acked-by: Rik van Riel <riel@redhat.com>
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > >  include/linux/pagemap.h |    1 +
> > >  mm/mmap.c               |    2 ++
> > >  mm/nommu.c              |    2 ++
> > >  mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
> > >  4 files changed, 38 insertions(+), 2 deletions(-)
> > > 
> > > --- linux.orig/include/linux/pagemap.h
> > > +++ linux/include/linux/pagemap.h
> > > @@ -25,6 +25,7 @@ enum mapping_flags {
> > >  #ifdef CONFIG_UNEVICTABLE_LRU
> > >  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> > >  #endif
> > > +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
> > >  };
> > >  
> > >  static inline void mapping_set_error(struct address_space *mapping, int error)
> > > --- linux.orig/mm/mmap.c
> > > +++ linux/mm/mmap.c
> > > @@ -1194,6 +1194,8 @@ munmap_back:
> > >  			goto unmap_and_free_vma;
> > >  		if (vm_flags & VM_EXECUTABLE)
> > >  			added_exe_file_vma(mm);
> > > +		if (vm_flags & VM_EXEC)
> > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > >  	} else if (vm_flags & VM_SHARED) {
> > >  		error = shmem_zero_setup(vma);
> > >  		if (error)
> > > --- linux.orig/mm/nommu.c
> > > +++ linux/mm/nommu.c
> > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> > >  			added_exe_file_vma(current->mm);
> > >  			vma->vm_mm = current->mm;
> > >  		}
> > > +		if (vm_flags & VM_EXEC)
> > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > >  	}
> > 
> > I find it a bit ugly that it applies an attribute of the memory area
> > (per mm) to the page cache mapping (shared).  Because this in turn
> > means that the reference through a non-executable vma might get the
> > pages rotated just because there is/was an executable mmap around.
> 
> Right, the intention was to identify a whole executable/library file,
> eg. /bin/bash or /lib/libc-2.9.so, covering both _text_ and _data_
> sections.

But your patch cares about just the text section.
Am I missing something?

> > >  	down_write(&nommu_region_sem);
> > > --- linux.orig/mm/vmscan.c
> > > +++ linux/mm/vmscan.c
> > > @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
> > >  	unsigned long pgmoved;
> > >  	unsigned long pgscanned;
> > >  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> > > +	LIST_HEAD(l_active);
> > >  	LIST_HEAD(l_inactive);
> > >  	struct page *page;
> > >  	struct pagevec pvec;
> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> > >  
> > >  		/* page_referenced clears PageReferenced */
> > >  		if (page_mapping_inuse(page) &&
> > > -		    page_referenced(page, 0, sc->mem_cgroup))
> > > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > > +			struct address_space *mapping = page_mapping(page);
> > > +
> > >  			pgmoved++;
> > > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > +				list_add(&page->lru, &l_active);
> > > +				continue;
> > > +			}
> > > +		}
> > 
> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > better to check if one of them is executable?  This would even work
> > for executable anon pages.  After all, there are applications that cow
> > executable mappings (sbcl and other language environments that use an
> > executable, run-time modified core image come to mind).
> 
> The page_referenced() path will only cover the _text_ section.  But

Why did you say that "the page_referenced() path will only cover the _text_ section"?
Could you elaborate, please?

> yeah, the _data_ section is more likely to grow huge in some rare cases.
>
> Thanks,
> Fengguang
> 


-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  7:30                                   ` Minchan Kim
@ 2009-05-08  8:09                                     ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  8:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, Peter Zijlstra, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

On Fri, May 08, 2009 at 03:30:42PM +0800, Minchan Kim wrote:
> Hi, let me ask a question.
> 
> On Fri, 8 May 2009 11:02:09 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > On Thu, May 07, 2009 at 11:10:39PM +0800, Johannes Weiner wrote:
> > > On Thu, May 07, 2009 at 08:11:01PM +0800, Wu Fengguang wrote:
> > > > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > > > protect their referenced active pages from being deactivated.
> > > > 
> > > > CC: Elladan <elladan@eskimo.com>
> > > > CC: Nick Piggin <npiggin@suse.de>
> > > > CC: Johannes Weiner <hannes@cmpxchg.org>
> > > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > > CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > > > Acked-by: Rik van Riel <riel@redhat.com>
> > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > ---
> > > >  include/linux/pagemap.h |    1 +
> > > >  mm/mmap.c               |    2 ++
> > > >  mm/nommu.c              |    2 ++
> > > >  mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
> > > >  4 files changed, 38 insertions(+), 2 deletions(-)
> > > > 
> > > > --- linux.orig/include/linux/pagemap.h
> > > > +++ linux/include/linux/pagemap.h
> > > > @@ -25,6 +25,7 @@ enum mapping_flags {
> > > >  #ifdef CONFIG_UNEVICTABLE_LRU
> > > >  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> > > >  #endif
> > > > +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
> > > >  };
> > > >  
> > > >  static inline void mapping_set_error(struct address_space *mapping, int error)
> > > > --- linux.orig/mm/mmap.c
> > > > +++ linux/mm/mmap.c
> > > > @@ -1194,6 +1194,8 @@ munmap_back:
> > > >  			goto unmap_and_free_vma;
> > > >  		if (vm_flags & VM_EXECUTABLE)
> > > >  			added_exe_file_vma(mm);
> > > > +		if (vm_flags & VM_EXEC)
> > > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > > >  	} else if (vm_flags & VM_SHARED) {
> > > >  		error = shmem_zero_setup(vma);
> > > >  		if (error)
> > > > --- linux.orig/mm/nommu.c
> > > > +++ linux/mm/nommu.c
> > > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> > > >  			added_exe_file_vma(current->mm);
> > > >  			vma->vm_mm = current->mm;
> > > >  		}
> > > > +		if (vm_flags & VM_EXEC)
> > > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > > >  	}
> > > 
> > > I find it a bit ugly that it applies an attribute of the memory area
> > > (per mm) to the page cache mapping (shared).  Because this in turn
> > > means that the reference through a non-executable vma might get the
> > > pages rotated just because there is/was an executable mmap around.
> > 
> > Right, the intention was to identify a whole executable/library file,
> > eg. /bin/bash or /lib/libc-2.9.so, covering both _text_ and _data_
> > sections.
> 
> But your patch cares about just the text section.
> Am I missing something?

This patch actually protects the mapped pages in the whole executable
file.  Sorry, the title was a bit misleading.

> > > >  	down_write(&nommu_region_sem);
> > > > --- linux.orig/mm/vmscan.c
> > > > +++ linux/mm/vmscan.c
> > > > @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
> > > >  	unsigned long pgmoved;
> > > >  	unsigned long pgscanned;
> > > >  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> > > > +	LIST_HEAD(l_active);
> > > >  	LIST_HEAD(l_inactive);
> > > >  	struct page *page;
> > > >  	struct pagevec pvec;
> > > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> > > >  
> > > >  		/* page_referenced clears PageReferenced */
> > > >  		if (page_mapping_inuse(page) &&
> > > > -		    page_referenced(page, 0, sc->mem_cgroup))
> > > > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > > > +			struct address_space *mapping = page_mapping(page);
> > > > +
> > > >  			pgmoved++;
> > > > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > > +				list_add(&page->lru, &l_active);
> > > > +				continue;
> > > > +			}
> > > > +		}
> > > 
> > > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > > better to check if one of them is executable?  This would even work
> > > for executable anon pages.  After all, there are applications that cow
> > > executable mappings (sbcl and other language environments that use an
> > > executable, run-time modified core image come to mind).
> > 
> > The page_referenced() path will only cover the _text_ section.  But
> 
> Why did you say that "the page_referenced() path will only cover the _text_ section"?
> Could you elaborate, please?

I was under the wild assumption that only the _text_ section would be
PROT_EXEC mapped.  No?
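(For reference, that matches what /proc/pid/maps typically shows for an ELF
binary -- only the text segment carries the x bit; the addresses and inode
below are illustrative:)

	08048000-080b0000 r-xp 00000000 08:01 1234   /bin/bash   (text)
	080b0000-080b5000 rw-p 00067000 08:01 1234   /bin/bash   (data)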

Thanks,
Fengguang

> > yeah, the _data_ section is more likely to grow huge in some rare cases.
> >
> > Thanks,
> > Fengguang
> > 
> 
> 
> -- 
> Kinds Regards
> Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-07 20:44                                 ` Andrew Morton
@ 2009-05-08  8:16                                   ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  8:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, peterz, riel, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro, Minchan Kim

On Fri, May 08, 2009 at 04:44:10AM +0800, Andrew Morton wrote:
> On Thu, 7 May 2009 17:10:39 +0200
> Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > > +++ linux/mm/nommu.c
> > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> > >  			added_exe_file_vma(current->mm);
> > >  			vma->vm_mm = current->mm;
> > >  		}
> > > +		if (vm_flags & VM_EXEC)
> > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > >  	}
> > 
> > I find it a bit ugly that it applies an attribute of the memory area
> > (per mm) to the page cache mapping (shared).  Because this in turn
> > means that the reference through a non-executable vma might get the
> > pages rotated just because there is/was an executable mmap around.
> 
> Yes, it's not good.  That AS_EXEC bit will hang around for arbitrarily
> long periods in the inode cache.  So we'll have AS_EXEC set on an
> entire file because someone mapped some of it with PROT_EXEC half an
> hour ago.  Where's the sense in that?

Yes, that nonsense case is possible, but it should be rare.

AS_EXEC means "this is (likely) an executable file".
It has broader coverage in both space and time:

- it protects the whole file instead of only the text section
- it allows to further protect the many executables/libraries that
  typically runs short in time but maybe frequently, eg. ls, cat,
  git, gcc, perl, python, ...

But none of the above cases is as important to the user experience as the
currently running code, so here goes the new patch (which applies on top of
"vmscan: report vm_flags in page_referenced()").
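To make the difference concrete (a sketch distilled from the two versions of
this patch; protect() is a made-up placeholder for "move the page back to
the active list"):

	/* v1: AS_EXEC is per-file and sticky -- set once any part of the
	 * file was ever mapped PROT_EXEC, by any process */
	if (mapping && test_bit(AS_EXEC, &mapping->flags))
		protect(page);

	/* v2: VM_EXEC is per-reference -- only pages currently mapped by
	 * an executable vma (and not anon) are protected */
	if ((vm_flags & VM_EXEC) && !PageAnon(page))
		protect(page);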

Thanks,
Fengguang
---
vmscan: make mapped executable pages the first class citizen

Protect referenced PROT_EXEC mapped pages from being deactivated.

PROT_EXEC (or its internal representation, VM_EXEC) pages normally belong to
currently running executables and their linked libraries; they really should
be cached aggressively to provide a good user experience.

CC: Elladan <elladan@eskimo.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned 
 	unsigned long pgscanned;
 	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1272,8 +1273,13 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
 			pgmoved++;
+			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}
 
 		list_add(&page->lru, &l_inactive);
 	}
@@ -1282,7 +1288,6 @@ static void shrink_active_list(unsigned 
 	 * Move the pages to the [file or anon] inactive list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
@@ -1294,6 +1299,7 @@ static void shrink_active_list(unsigned 
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
 	pgmoved = 0;  /* count pages moved to inactive list */
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1316,6 +1322,29 @@ static void shrink_active_list(unsigned 
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	pgmoved = 0;  /* count pages moved back to active list */
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  8:16                                   ` Wu Fengguang
@ 2009-05-08  8:28                                     ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08  8:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Johannes Weiner, peterz, riel, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro, Minchan Kim

On Fri, May 08, 2009 at 04:16:08PM +0800, Wu Fengguang wrote:
> ---
> vmscan: make mapped executable pages the first class citizen
> 
> Protect referenced PROT_EXEC mapped pages from being deactivated.
> 
> PROT_EXEC (or its internal representation, VM_EXEC) pages normally belong to
> currently running executables and their linked libraries; they really should
> be cached aggressively to provide a good user experience.

I can verify that it actually works :)

Thanks,
Fengguang
---
printk("rescued %s 0x%lx\n", dname, page->index);

[  929.047700] rescued ld-2.9.so 0x0
[  929.051295] rescued libc-2.9.so 0x0
[  929.054984] rescued init 0x0
[  929.058086] rescued libc-2.9.so 0x1
[  929.061810] rescued libc-2.9.so 0x2
[  929.065557] rescued libc-2.9.so 0x3
[  929.069279] rescued libc-2.9.so 0x7
[  929.072978] rescued libc-2.9.so 0x8
[  929.076697] rescued libc-2.9.so 0x9
[  929.080413] rescued libc-2.9.so 0xb
[  929.084127] rescued libc-2.9.so 0xf
[  929.087849] rescued libc-2.9.so 0x10
[  929.091667] rescued libc-2.9.so 0x12
[  929.095426] rescued libc-2.9.so 0x13
[  929.099235] rescued libc-2.9.so 0x14
[  929.103055] rescued libc-2.9.so 0x15
[  929.106868] rescued libc-2.9.so 0x16
[  929.110661] rescued libc-2.9.so 0x1e
[  929.114468] rescued libc-2.9.so 0x42
[  929.118259] rescued libc-2.9.so 0x43
[  929.122063] rescued libc-2.9.so 0x44
[  929.125863] rescued libc-2.9.so 0x45
[  929.129666] rescued libc-2.9.so 0x46
[  929.133469] rescued libc-2.9.so 0x4c
[  929.137258] rescued libc-2.9.so 0x6b
[  929.141050] rescued libc-2.9.so 0x6f
[  929.144916] rescued libc-2.9.so 0x70
[  929.148695] rescued libc-2.9.so 0x71
[  929.152495] rescued libc-2.9.so 0x74
[  929.156272] rescued libc-2.9.so 0x76
[  929.160095] rescued libc-2.9.so 0x79
[  929.163904] rescued libc-2.9.so 0x7b
[  929.168007] rescued libc-2.9.so 0x7c
[  929.171800] rescued libc-2.9.so 0x7d
[  929.176518] rescued libnss_compat-2.9.so 0x0
[  929.180362] rescued libc-2.9.so 0x105
[  929.184617] rescued libc-2.9.so 0x4
[  929.188173] rescued libc-2.9.so 0x106
[  929.188191] rescued libc-2.9.so 0x5
[  929.195487] rescued libc-2.9.so 0x6
[  929.199042] rescued libc-2.9.so 0x116
[  929.202805] rescued libc-2.9.so 0x7f
[  929.202818] rescued libc-2.9.so 0x118
[  929.202838] rescued libc-2.9.so 0x11a
[  929.202863] rescued ld-2.9.so 0x1
[  929.202878] rescued ld-2.9.so 0x2
[  929.202892] rescued ld-2.9.so 0x3
[  929.202909] rescued ld-2.9.so 0x4
[  929.202925] rescued ld-2.9.so 0x5
[  929.202940] rescued ld-2.9.so 0x6
[  929.202956] rescued ld-2.9.so 0x7
[  929.202973] rescued ld-2.9.so 0xa
[  929.202989] rescued ld-2.9.so 0xb
[  929.203005] rescued ld-2.9.so 0xc
[  929.203021] rescued ld-2.9.so 0xf
[  929.203037] rescued ld-2.9.so 0x10
[  929.203052] rescued ld-2.9.so 0x14
[  929.203068] rescued ld-2.9.so 0x16
[  929.203084] rescued ld-2.9.so 0x18
[  929.203100] rescued ld-2.9.so 0x1a
[  929.203381] rescued libc-2.9.so 0x28
[  929.203392] rescued libc-2.9.so 0x29
[  929.203405] rescued libc-2.9.so 0x2a
[  929.203423] rescued libc-2.9.so 0x2b
[  929.203434] rescued libc-2.9.so 0x2f
[  929.203457] rescued libc-2.9.so 0x72
[  929.203477] rescued libc-2.9.so 0x73
[  929.203497] rescued libc-2.9.so 0x75
[  929.203516] rescued libc-2.9.so 0x77
[  929.203528] rescued libc-2.9.so 0xb7
[  929.203541] rescued libc-2.9.so 0xb8
[  929.203560] rescued libc-2.9.so 0xd9
[  929.203577] rescued libc-2.9.so 0x119
[  929.203590] rescued libc-2.9.so 0x11f
[  929.319976] rescued libc-2.9.so 0xa
[  929.323657] rescued libc-2.9.so 0xc
[  929.327374] rescued libc-2.9.so 0xd
[  929.331028] rescued libc-2.9.so 0xe
[  929.331575] rescued libc-2.9.so 0x24
[  929.331591] rescued libc-2.9.so 0x6a
[  929.331599] rescued libc-2.9.so 0x84
[  929.331609] rescued libc-2.9.so 0x6c
[  929.331627] rescued libc-2.9.so 0x6d
[  929.331642] rescued libc-2.9.so 0x64
[  929.331645] rescued libc-2.9.so 0xa6
[  929.331648] rescued libc-2.9.so 0xa7
[  929.331651] rescued libc-2.9.so 0xa8
[  929.331654] rescued libc-2.9.so 0xa9
[  929.331657] rescued libc-2.9.so 0xaa
[  929.331660] rescued libc-2.9.so 0xab
[  929.331663] rescued libc-2.9.so 0xae
[  929.331666] rescued libc-2.9.so 0xaf
[  929.331669] rescued libc-2.9.so 0xb0
[  929.331672] rescued libc-2.9.so 0xb1
[  929.331674] rescued libc-2.9.so 0xb2
[  929.331677] rescued libc-2.9.so 0xb3
[  929.331680] rescued libc-2.9.so 0xb4
[  929.331683] rescued libc-2.9.so 0xb5
[  929.331686] rescued libc-2.9.so 0xb6
[  929.331707] rescued libnss_files-2.9.so 0x1
[  929.331716] rescued libnss_files-2.9.so 0x2
[  929.331724] rescued libnss_files-2.9.so 0x8
[  929.424155] rescued libc-2.9.so 0x9e
[  929.426448] rescued libnss_nis-2.9.so 0x1
[  929.426457] rescued libnss_nis-2.9.so 0x2
[  929.426467] rescued libnss_nis-2.9.so 0x5
[  929.426475] rescued libnss_nis-2.9.so 0x4
[  929.426484] rescued libnss_nis-2.9.so 0x7
[  929.426500] rescued libnsl-2.9.so 0x1
[  929.426511] rescued libnsl-2.9.so 0x2
[  929.426520] rescued libnsl-2.9.so 0x3
[  929.426530] rescued libnsl-2.9.so 0x4
[  929.426540] rescued libnsl-2.9.so 0xf
[  929.426556] rescued libnss_compat-2.9.so 0x1
[  929.426565] rescued libnss_compat-2.9.so 0x3
[  929.426574] rescued libnss_compat-2.9.so 0x2
[  929.426584] rescued libnss_compat-2.9.so 0x5
[  929.480343] rescued libc-2.9.so 0x35
[  929.487805] rescued libc-2.9.so 0xc1
[  929.488119] rescued libc-2.9.so 0x111
[  929.488136] rescued libc-2.9.so 0x60
[  929.488157] rescued libc-2.9.so 0x8f
[  929.488170] rescued libc-2.9.so 0x9d
[  929.488185] rescued libc-2.9.so 0xa0
[  929.488193] rescued libc-2.9.so 0xcd
[  929.488209] rescued libc-2.9.so 0xcf
[  929.488221] rescued libc-2.9.so 0xde
[  929.488506] rescued libc-2.9.so 0xfd
[  929.488517] rescued libc-2.9.so 0xff
[  929.488532] rescued libc-2.9.so 0x100
[  929.488554] rescued libc-2.9.so 0x112
[  929.488567] rescued libc-2.9.so 0x11b
[  929.488584] rescued ld-2.9.so 0x11
[  929.488588] rescued libblkid.so.1.0 0x0
[  929.488599] rescued libpthread-2.9.so 0x0
[  929.488607] rescued librt-2.9.so 0x0
[  929.488616] rescued libselinux.so.1 0x1
[  929.488621] rescued libselinux.so.1 0x2
[  929.488626] rescued libselinux.so.1 0x3
[  929.488632] rescued libselinux.so.1 0x4
[  929.488637] rescued libselinux.so.1 0x5
[  929.488642] rescued libselinux.so.1 0xa
[  929.488647] rescued libselinux.so.1 0xc
[  929.488651] rescued libselinux.so.1 0x11
[  929.488656] rescued libselinux.so.1 0x12
[  929.488660] rescued libselinux.so.1 0x13
[  929.488665] rescued libselinux.so.1 0x14
[  929.488675] rescued libc-2.9.so 0xc3
[  929.488688] rescued libc-2.9.so 0xd1
[  929.488692] rescued libc-2.9.so 0x107
[  929.488703] rescued libc-2.9.so 0x9a
[  929.488716] rescued libc-2.9.so 0x9b
[  929.488720] rescued udevd 0x0
[  929.488729] rescued libc-2.9.so 0xca
[  929.489302] rescued libc-2.9.so 0x36
[  929.489314] rescued libdl-2.9.so 0x1
[  929.489324] rescued libc-2.9.so 0xda
[  929.489335] rescued libc-2.9.so 0xdd
[  929.489346] rescued libc-2.9.so 0xdc
[  929.489350] rescued libc-2.9.so 0xdb
[  929.489641] rescued udevd 0x2
[  929.489644] rescued udevd 0xa
[  929.489652] rescued libpthread-2.9.so 0x9
[  929.489659] rescued libpthread-2.9.so 0x8
[  929.489673] rescued udevd 0x1
[  929.661734] rescued libc-2.9.so 0xc2
[  929.665491] rescued libc-2.9.so 0xc6
[  929.669228] rescued libc-2.9.so 0xc7
[  929.673267] rescued libc-2.9.so 0x113
[  929.677241] rescued libc-2.9.so 0x114
[  929.681087] rescued libc-2.9.so 0x115
[  929.685510] rescued libselinux.so.1 0x0
[  929.689720] rescued libsepol.so.1 0x0
[  929.693586] rescued ld-2.9.so 0x8
[  929.697072] rescued ld-2.9.so 0x9
[  929.700527] rescued ld-2.9.so 0xd
[  929.703982] rescued ld-2.9.so 0xe
[  929.707459] rescued ld-2.9.so 0x13
[  929.711017] rescued ld-2.9.so 0x15
[  929.714551] rescued ld-2.9.so 0x17
[  929.718089] rescued libdl-2.9.so 0x0
[  929.720395] rescued libc-2.9.so 0x11
[  929.720412] rescued libc-2.9.so 0x17
[  929.720428] rescued libc-2.9.so 0x18
[  929.720443] rescued libc-2.9.so 0x19
[  929.720457] rescued libc-2.9.so 0x1a
[  929.720473] rescued libc-2.9.so 0x1b
[  929.720488] rescued libc-2.9.so 0x1c
[  929.720502] rescued libc-2.9.so 0x1d
[  929.720521] rescued libc-2.9.so 0x31
[  929.720541] rescued libc-2.9.so 0x32
[  929.720558] rescued libc-2.9.so 0x33
[  929.720578] rescued libc-2.9.so 0x34
[  929.720598] rescued libc-2.9.so 0x62
[  929.720614] rescued libc-2.9.so 0x63
[  929.720628] rescued libc-2.9.so 0x65
[  929.720646] rescued libc-2.9.so 0x6e
[  929.720668] rescued libc-2.9.so 0x7a
[  929.720685] rescued libc-2.9.so 0x82
[  929.720700] rescued libc-2.9.so 0x83
[  929.720721] rescued libc-2.9.so 0x9f
[  929.720742] rescued libc-2.9.so 0xcb
[  929.720754] rescued libc-2.9.so 0xcc
[  929.720775] rescued libc-2.9.so 0xce
[  929.720798] rescued libc-2.9.so 0x103
[  929.721080] rescued udevd 0x15
[  929.721083] rescued udevd 0x5
[  929.721085] rescued udevd 0x7
[  929.721088] rescued udevd 0xb
[  929.721091] rescued udevd 0xc
[  929.721093] rescued udevd 0xf
[  929.721096] rescued udevd 0x11
[  929.721098] rescued udevd 0x12
[  929.721101] rescued udevd 0x14
[  929.721104] rescued udevd 0x16
[  929.721106] rescued udevd 0x3
[  929.721109] rescued udevd 0xd
[  929.721111] rescued udevd 0xe
[  929.721125] rescued libc-2.9.so 0x52
[  929.721134] rescued libc-2.9.so 0x53
[  929.721145] rescued libc-2.9.so 0x54
[  929.721148] rescued udevd 0x4
[  929.721153] rescued libc-2.9.so 0x3d
[  929.721162] rescued libc-2.9.so 0x2c
[  929.721169] rescued libc-2.9.so 0x2d
[  929.721175] rescued libc-2.9.so 0x80
[  929.721181] rescued libc-2.9.so 0x81
[  929.721185] rescued libc-2.9.so 0x38
[  929.721190] rescued libc-2.9.so 0x39
[  929.736569] rescued libc-2.9.so 0x3a
[  929.736576] rescued libc-2.9.so 0x40
[  929.736581] rescued libc-2.9.so 0x41
[  929.736597] rescued libc-2.9.so 0x55
[  929.736601] rescued libc-2.9.so 0x56
[  929.736615] rescued libpthread-2.9.so 0x1
[  929.736622] rescued libpthread-2.9.so 0x2
[  929.736629] rescued libpthread-2.9.so 0x3
[  929.736636] rescued libpthread-2.9.so 0x4
[  929.736646] rescued libpthread-2.9.so 0x5
[  929.736654] rescued libpthread-2.9.so 0xa
[  929.736662] rescued libpthread-2.9.so 0xc
[  929.736669] rescued libpthread-2.9.so 0xe
[  929.736677] rescued libpthread-2.9.so 0x10
[  929.736685] rescued librt-2.9.so 0x1
[  929.736691] rescued librt-2.9.so 0x2
[  929.736696] rescued librt-2.9.so 0x5
[  929.736971] rescued libc-2.9.so 0x57
[  929.736984] rescued libc-2.9.so 0x8e
[  929.736992] rescued libc-2.9.so 0x90
[  929.736999] rescued libc-2.9.so 0x91
[  929.737007] rescued libc-2.9.so 0x95
[  929.737014] rescued libc-2.9.so 0x96
[  929.737021] rescued libc-2.9.so 0x97
[  929.737027] rescued librt-2.9.so 0x3
[  929.737042] rescued libc-2.9.so 0x47
[  929.737047] rescued libc-2.9.so 0x59
[  929.737055] rescued libc-2.9.so 0xc8
[  929.737059] rescued libc-2.9.so 0xc9
[  929.737067] rescued libm-2.9.so 0x0
[  929.737073] rescued libm-2.9.so 0x1
[  930.005563] rescued libnss_files-2.9.so 0x0
[  930.005830] rescued libm-2.9.so 0x2
[  930.005836] rescued libm-2.9.so 0x3
[  930.005841] rescued libm-2.9.so 0x44
[  930.005845] rescued libm-2.9.so 0x28
[  930.005849] rescued portmap 0x0
[  930.005861] rescued libwrap.so.0.7.6 0x0
[  930.005865] rescued rpc.statd 0x0
[  930.005868] rescued rpc.statd 0x1
[  930.005870] rescued rpc.statd 0x2
[  930.005873] rescued rpc.statd 0x4
[  930.005876] rescued rpc.statd 0x8
[  930.005881] rescued libwrap.so.0.7.6 0x1
[  930.005885] rescued libwrap.so.0.7.6 0x2
[  930.005889] rescued libwrap.so.0.7.6 0x3
[  930.005893] rescued libwrap.so.0.7.6 0x6
[  930.005897] rescued rpc.idmapd 0x0
[  930.006158] rescued libresolv-2.9.so 0x0
[  930.006171] rescued libc-2.9.so 0x9c
[  930.006182] rescued libc-2.9.so 0xfe
[  930.006186] rescued libnfsidmap.so.0.3.0 0x0
[  930.006189] rescued libevent-1.3e.so.1.0.3 0x0
[  930.006196] rescued libpthread-2.9.so 0xb
[  930.006202] rescued libattr.so.1.1.0 0x0
[  930.006208] rescued libc-2.9.so 0x61
[  930.006211] rescued libattr.so.1.1.0 0x1
[  930.006215] rescued libattr.so.1.1.0 0x3
[  930.006219] rescued init 0x1
[  930.006221] rescued init 0x2
[  930.006224] rescued init 0x5
[  930.006226] rescued init 0x7
[  930.006229] rescued init 0x6
[  930.006233] rescued libsepol.so.1 0x2
[  930.006236] rescued libsepol.so.1 0x3
[  930.006238] rescued libsepol.so.1 0x4
[  930.006241] rescued libsepol.so.1 0x2d
[  930.006244] rescued init 0x3
[  930.006246] rescued init 0x4
[  930.006250] rescued acpid 0x0
[  930.006259] rescued libdbus-1.so.3.4.0 0x0
[  930.006262] rescued dbus-daemon 0x0
[  930.006275] rescued libpthread-2.9.so 0xd
[  930.006279] rescued libexpat.so.1.5.2 0x0
[  930.006287] rescued sshd 0x0
[  930.006292] rescued libkeyutils-1.2.so 0x0
[  930.006297] rescued libkrb5support.so.0.1 0x0
[  930.006304] rescued libcom_err.so.2.1 0x0
[  930.006311] rescued libk5crypto.so.3.1 0x0
[  930.006318] rescued libgssapi_krb5.so.2.2 0x0
[  930.006322] rescued libresolv-2.9.so 0x1
[  930.006325] rescued libresolv-2.9.so 0x2
[  930.006329] rescued libresolv-2.9.so 0x3
[  930.006333] rescued libresolv-2.9.so 0xf
[  930.006340] rescued libcrypt-2.9.so 0x0
[  930.207054] rescued libnss_nis-2.9.so 0x0
[  930.211539] rescued libnsl-2.9.so 0x0
[  930.215805] rescued libz.so.1.2.3.3 0x0
[  930.215968] rescued libkrb5.so.3.3 0x8
[  930.215972] rescued libkrb5.so.3.3 0x9
[  930.215976] rescued libkrb5.so.3.3 0xa
[  930.215979] rescued libkrb5.so.3.3 0xb
[  930.215983] rescued libkrb5.so.3.3 0xc
[  930.215986] rescued libkrb5.so.3.3 0xd
[  930.215989] rescued libkrb5.so.3.3 0xe
[  930.215993] rescued libkrb5.so.3.3 0xf
[  930.215996] rescued libkrb5.so.3.3 0x10
[  930.216000] rescued libkrb5.so.3.3 0x11
[  930.216003] rescued libkrb5.so.3.3 0x12
[  930.216006] rescued libkrb5.so.3.3 0x13
[  930.216009] rescued libkrb5.so.3.3 0x14
[  930.216013] rescued libkrb5.so.3.3 0x15
[  930.216016] rescued libkrb5.so.3.3 0x16
[  930.216019] rescued libkrb5.so.3.3 0x19
[  930.216023] rescued libkrb5.so.3.3 0x88
[  930.216028] rescued libgssapi_krb5.so.2.2 0x1
[  930.216031] rescued libgssapi_krb5.so.2.2 0x2
[  930.216035] rescued libgssapi_krb5.so.2.2 0x3
[  930.216038] rescued libgssapi_krb5.so.2.2 0x4
[  930.216041] rescued libgssapi_krb5.so.2.2 0x5
[  930.216045] rescued libgssapi_krb5.so.2.2 0x6
[  930.216048] rescued libgssapi_krb5.so.2.2 0x25
[  930.216053] rescued libcrypt-2.9.so 0x6
[  930.216057] rescued libz.so.1.2.3.3 0x1
[  930.216060] rescued libz.so.1.2.3.3 0x2
[  930.216064] rescued libz.so.1.2.3.3 0xe
[  930.216068] rescued libutil-2.9.so 0x1
[  930.216075] rescued libcrypto.so.0.9.8 0x5
[  930.216080] rescued libcrypto.so.0.9.8 0x6
[  930.216085] rescued libcrypto.so.0.9.8 0x7
[  930.216368] rescued libcrypto.so.0.9.8 0x8
[  930.216373] rescued libcrypto.so.0.9.8 0x9
[  930.216378] rescued libcrypto.so.0.9.8 0xa
[  930.216383] rescued libcrypto.so.0.9.8 0xb
[  930.216386] rescued libcrypto.so.0.9.8 0xc
[  930.216391] rescued libcrypto.so.0.9.8 0xd
[  930.216395] rescued libcrypto.so.0.9.8 0xe
[  930.216399] rescued libcrypto.so.0.9.8 0xf
[  930.216404] rescued libcrypto.so.0.9.8 0x10
[  930.216408] rescued libcrypto.so.0.9.8 0x11
[  930.216412] rescued libcrypto.so.0.9.8 0x12
[  930.216415] rescued libcrypto.so.0.9.8 0x13
[  930.216418] rescued libcrypto.so.0.9.8 0x14
[  930.216423] rescued libcrypto.so.0.9.8 0x15
[  930.216426] rescued libcrypto.so.0.9.8 0x16
[  930.216431] rescued libcrypto.so.0.9.8 0x17
[  930.216434] rescued libcrypto.so.0.9.8 0x18
[  930.216439] rescued libcrypto.so.0.9.8 0x19
[  930.216443] rescued libcrypto.so.0.9.8 0x1a
[  930.216447] rescued libcrypto.so.0.9.8 0x1b
[  930.216450] rescued libcrypto.so.0.9.8 0x1c
[  930.216455] rescued libcrypto.so.0.9.8 0x1d
[  930.216460] rescued libcrypto.so.0.9.8 0x1e
[  930.216464] rescued libcrypto.so.0.9.8 0x1f
[  930.216469] rescued libcrypto.so.0.9.8 0x20
[  930.216473] rescued libcrypto.so.0.9.8 0x21
[  930.216476] rescued libcrypto.so.0.9.8 0x22
[  930.216480] rescued libcrypto.so.0.9.8 0x23
[  930.216483] rescued libcrypto.so.0.9.8 0x24
[  930.216487] rescued libcrypto.so.0.9.8 0x25
[  930.216491] rescued libcrypto.so.0.9.8 0x26
[  930.216494] rescued libcrypto.so.0.9.8 0x27
[  930.216503] rescued libcrypto.so.0.9.8 0x28
[  930.216506] rescued libcrypto.so.0.9.8 0x29
[  930.216510] rescued libcrypto.so.0.9.8 0x2a
[  930.216513] rescued libcrypto.so.0.9.8 0x2b
[  930.216516] rescued libcrypto.so.0.9.8 0x2c
[  930.216520] rescued libcrypto.so.0.9.8 0x2d
[  930.216525] rescued libcrypto.so.0.9.8 0x2e
[  930.216530] rescued libcrypto.so.0.9.8 0x2f
[  930.216535] rescued libcrypto.so.0.9.8 0x30
[  930.216538] rescued libcrypto.so.0.9.8 0x31
[  930.216541] rescued libcrypto.so.0.9.8 0x32
[  930.216544] rescued libcrypto.so.0.9.8 0x33
[  930.216548] rescued libcrypto.so.0.9.8 0x34
[  930.216551] rescued libcrypto.so.0.9.8 0x35
[  930.217015] rescued libcrypto.so.0.9.8 0x36
[  930.217019] rescued libcrypto.so.0.9.8 0x37
[  930.217022] rescued libcrypto.so.0.9.8 0x38
[  930.217025] rescued libcrypto.so.0.9.8 0x39
[  930.217029] rescued libcrypto.so.0.9.8 0x3a
[  930.217032] rescued libcrypto.so.0.9.8 0x3b
[  930.217036] rescued libcrypto.so.0.9.8 0x3c
[  930.217039] rescued libcrypto.so.0.9.8 0x3d
[  930.217043] rescued libcrypto.so.0.9.8 0x3e
[  930.217046] rescued libcrypto.so.0.9.8 0x3f
[  930.217049] rescued libcrypto.so.0.9.8 0x40
[  930.217053] rescued libcrypto.so.0.9.8 0x41
[  930.217056] rescued libcrypto.so.0.9.8 0x42
[  930.217060] rescued libcrypto.so.0.9.8 0x43
[  930.217063] rescued libcrypto.so.0.9.8 0x44
[  930.217066] rescued libcrypto.so.0.9.8 0x45
[  930.217070] rescued libcrypto.so.0.9.8 0x46
[  930.217073] rescued libcrypto.so.0.9.8 0x47
[  930.217076] rescued libcrypto.so.0.9.8 0x48
[  930.217080] rescued libcrypto.so.0.9.8 0x49
[  930.217083] rescued libcrypto.so.0.9.8 0x4a
[  930.217086] rescued libcrypto.so.0.9.8 0x4b
[  930.217090] rescued libcrypto.so.0.9.8 0x4c
[  930.217093] rescued libcrypto.so.0.9.8 0x4d
[  930.217096] rescued libcrypto.so.0.9.8 0x4e
[  930.217099] rescued libcrypto.so.0.9.8 0x4f
[  930.217103] rescued libcrypto.so.0.9.8 0x50
[  930.217106] rescued libcrypto.so.0.9.8 0x51
[  930.217109] rescued libcrypto.so.0.9.8 0x52
[  930.217113] rescued libcrypto.so.0.9.8 0x53
[  930.217116] rescued libcrypto.so.0.9.8 0x54
[  930.217119] rescued libcrypto.so.0.9.8 0x55
[  930.217384] rescued libcrypto.so.0.9.8 0x56
[  930.217387] rescued libcrypto.so.0.9.8 0x57
[  930.217391] rescued libcrypto.so.0.9.8 0x58
[  930.217394] rescued libcrypto.so.0.9.8 0x59
[  930.217397] rescued libcrypto.so.0.9.8 0x5a
[  930.217401] rescued libcrypto.so.0.9.8 0x5b
[  930.217404] rescued libcrypto.so.0.9.8 0x5c
[  930.217408] rescued libcrypto.so.0.9.8 0x5d
[  930.217411] rescued libcrypto.so.0.9.8 0x5e
[  930.217414] rescued libcrypto.so.0.9.8 0x5f
[  930.217418] rescued libcrypto.so.0.9.8 0x60
[  930.217421] rescued libcrypto.so.0.9.8 0x61
[  930.217425] rescued libcrypto.so.0.9.8 0x62
[  930.217428] rescued libcrypto.so.0.9.8 0x63
[  930.217431] rescued libcrypto.so.0.9.8 0x64
[  930.217436] rescued libcrypto.so.0.9.8 0x65
[  930.217440] rescued libcrypto.so.0.9.8 0x66
[  930.217443] rescued libcrypto.so.0.9.8 0x67
[  930.217448] rescued libcrypto.so.0.9.8 0x68
[  930.217452] rescued libcrypto.so.0.9.8 0x69
[  930.217456] rescued libcrypto.so.0.9.8 0x6a
[  930.217460] rescued libcrypto.so.0.9.8 0x6b
[  930.217464] rescued libcrypto.so.0.9.8 0x6c
[  930.217468] rescued libcrypto.so.0.9.8 0x6d
[  930.217472] rescued libcrypto.so.0.9.8 0x6e
[  930.217477] rescued libcrypto.so.0.9.8 0x70
[  930.217482] rescued libcrypto.so.0.9.8 0x71
[  930.217487] rescued libcrypto.so.0.9.8 0x73
[  930.217492] rescued libcrypto.so.0.9.8 0x74
[  930.217497] rescued libcrypto.so.0.9.8 0x75
[  930.217501] rescued libcrypto.so.0.9.8 0x77
[  930.217505] rescued libcrypto.so.0.9.8 0x78
[  930.217515] rescued libcrypto.so.0.9.8 0x79
[  930.217519] rescued libcrypto.so.0.9.8 0x12e
[  930.217523] rescued libcrypto.so.0.9.8 0x12f
[  930.217528] rescued libpam.so.0.81.12 0x1
[  930.217533] rescued libpam.so.0.81.12 0x2
[  930.217537] rescued libpam.so.0.81.12 0x8
[  930.217544] rescued sshd 0x1
[  930.217549] rescued sshd 0x2
[  930.217554] rescued sshd 0x3
[  930.217559] rescued sshd 0x4
[  930.217564] rescued sshd 0x5
[  930.217567] rescued sshd 0x6
[  930.217571] rescued sshd 0x7
[  930.217575] rescued sshd 0x8
[  930.865111] rescued libutil-2.9.so 0x0
[  930.865493] rescued sshd 0x9
[  930.865499] rescued sshd 0xa
[  930.865503] rescued sshd 0xb
[  930.865508] rescued sshd 0xc
[  930.865512] rescued sshd 0x12
[  930.865517] rescued sshd 0x48
[  930.865523] rescued sshd 0x4d
[  930.865527] rescued sshd 0x51
[  930.865532] rescued sshd 0x53
[  930.865535] rescued sshd 0x54
[  930.865540] rescued sshd 0x55
[  930.865545] rescued sshd 0x56
[  930.865550] rescued sshd 0x57
[  930.865554] rescued sshd 0x58
[  930.865558] rescued sshd 0x5e
[  930.865562] rescued sshd 0x67
[  930.865571] rescued libnss_files-2.9.so 0x3
[  930.865598] rescued libc-2.9.so 0xc5
[  930.865607] rescued libc-2.9.so 0xe6
[  930.865618] rescued ld-2.9.so 0x19
[  930.865622] rescued rpc.mountd 0x0
[  930.865625] rescued libnss_files-2.9.so 0x6
[  930.865628] rescued libnss_files-2.9.so 0x9
[  930.865631] rescued libc-2.9.so 0xe7
[  930.865635] rescued libc-2.9.so 0xef
[  930.865638] rescued libc-2.9.so 0xf0
[  930.865641] rescued libc-2.9.so 0xf1
[  930.865644] rescued libc-2.9.so 0xf2
[  930.865648] rescued libc-2.9.so 0xf3
[  930.865921] rescued libc-2.9.so 0xf4
[  930.865925] rescued libc-2.9.so 0xf6
[  930.865928] rescued libc-2.9.so 0xf7
[  930.865932] rescued hald 0x0
[  930.865936] rescued hald-runner 0x0
[  930.865942] rescued libdbus-glib-1.so.2.1.0 0x0
[  930.865952] rescued libpcre.so.3.12.1 0x0
[  930.865961] rescued libglib-2.0.so.0.2000.1 0x2
[  930.865967] rescued libglib-2.0.so.0.2000.1 0x4
[  930.865973] rescued libglib-2.0.so.0.2000.1 0x5
[  930.865978] rescued libglib-2.0.so.0.2000.1 0xc
[  930.865984] rescued libglib-2.0.so.0.2000.1 0xd
[  930.865990] rescued libglib-2.0.so.0.2000.1 0x10
[  930.865995] rescued libglib-2.0.so.0.2000.1 0x11
[  930.866001] rescued libglib-2.0.so.0.2000.1 0x12
[  930.866006] rescued libglib-2.0.so.0.2000.1 0x13
[  930.866012] rescued libglib-2.0.so.0.2000.1 0x14
[  930.866018] rescued libglib-2.0.so.0.2000.1 0x6f
[  930.866022] rescued libglib-2.0.so.0.2000.1 0x70
[  930.866027] rescued libglib-2.0.so.0.2000.1 0xb3
[  930.866033] rescued libhal.so.1.0.0 0x0
[  930.866036] rescued libglib-2.0.so.0.2000.1 0x7f
[  930.866042] rescued libglib-2.0.so.0.2000.1 0xe
[  930.866057] rescued libdbus-1.so.3.4.0 0x1
[  930.866063] rescued libdbus-1.so.3.4.0 0x2
[  930.866069] rescued libdbus-1.so.3.4.0 0x3
[  930.866074] rescued libdbus-1.so.3.4.0 0x4
[  930.866080] rescued libdbus-1.so.3.4.0 0x5
[  930.866085] rescued libdbus-1.so.3.4.0 0x6
[  930.866091] rescued libdbus-1.so.3.4.0 0x7
[  930.866097] rescued libdbus-1.so.3.4.0 0x11
[  930.866103] rescued libdbus-1.so.3.4.0 0x24
[  930.866108] rescued libdbus-1.so.3.4.0 0x25
[  930.866114] rescued libdbus-1.so.3.4.0 0x27
[  930.866120] rescued libdbus-1.so.3.4.0 0x28
[  930.866125] rescued libdbus-1.so.3.4.0 0x29
[  930.866131] rescued libdbus-1.so.3.4.0 0x2a
[  931.125223] rescued libpam.so.0.81.12 0x0
[  931.129347] rescued libkeyutils-1.2.so 0x1
[  931.132618] rescued libdbus-1.so.3.4.0 0x2c
[  931.132624] rescued libdbus-1.so.3.4.0 0x2d
[  931.132630] rescued libdbus-1.so.3.4.0 0x32
[  931.132636] rescued libglib-2.0.so.0.2000.1 0x42
[  931.132644] rescued libgobject-2.0.so.0.2000.1 0x1
[  931.132650] rescued libgobject-2.0.so.0.2000.1 0x2
[  931.132653] rescued hald 0x41
[  931.132655] rescued hald 0x42
[  931.132661] rescued libc-2.9.so 0x66
[  931.132665] rescued hald-addon-input 0x0
[  931.132671] rescued libdbus-1.so.3.4.0 0x8
[  931.132676] rescued libdbus-1.so.3.4.0 0x9
[  931.132682] rescued libdbus-1.so.3.4.0 0xa
[  931.132687] rescued libdbus-1.so.3.4.0 0xb
[  931.132693] rescued libdbus-1.so.3.4.0 0xc
[  931.132698] rescued libdbus-1.so.3.4.0 0xd
[  931.132704] rescued libdbus-1.so.3.4.0 0xe
[  931.132709] rescued libdbus-1.so.3.4.0 0xf
[  931.132715] rescued libdbus-1.so.3.4.0 0x10
[  931.132720] rescued libdbus-1.so.3.4.0 0x12
[  931.132726] rescued libdbus-1.so.3.4.0 0x13
[  931.132731] rescued libdbus-1.so.3.4.0 0x14
[  931.132737] rescued libdbus-1.so.3.4.0 0x15
[  931.132742] rescued libdbus-1.so.3.4.0 0x16
[  931.132748] rescued libdbus-1.so.3.4.0 0x17
[  931.132753] rescued libdbus-1.so.3.4.0 0x18
[  931.132759] rescued libdbus-1.so.3.4.0 0x19
[  931.132764] rescued libdbus-1.so.3.4.0 0x1a
[  931.132770] rescued libdbus-1.so.3.4.0 0x1b
[  931.132775] rescued libdbus-1.so.3.4.0 0x1c
[  931.132781] rescued libdbus-1.so.3.4.0 0x1d
[  931.132786] rescued libdbus-1.so.3.4.0 0x20
[  931.133058] rescued libdbus-1.so.3.4.0 0x21
[  931.133064] rescued libdbus-1.so.3.4.0 0x22
[  931.133070] rescued libdbus-1.so.3.4.0 0x23
[  931.133075] rescued libdbus-1.so.3.4.0 0x26
[  931.133081] rescued libdbus-1.so.3.4.0 0x2b
[  931.133086] rescued libdbus-1.so.3.4.0 0x2e
[  931.133092] rescued libdbus-1.so.3.4.0 0x2f
[  931.133097] rescued libdbus-1.so.3.4.0 0x31
[  931.133102] rescued libhal.so.1.0.0 0x1
[  931.133106] rescued libhal.so.1.0.0 0x2
[  931.133111] rescued libhal.so.1.0.0 0x3
[  931.133114] rescued libhal.so.1.0.0 0x4
[  931.133119] rescued libhal.so.1.0.0 0xb
[  931.133123] rescued libhal.so.1.0.0 0xc
[  931.133127] rescued libhal.so.1.0.0 0xd
[  931.133133] rescued libpcre.so.3.12.1 0x1
[  931.133138] rescued libpcre.so.3.12.1 0x1d
[  931.133144] rescued libglib-2.0.so.0.2000.1 0x3
[  931.133150] rescued libglib-2.0.so.0.2000.1 0x6
[  931.133155] rescued libglib-2.0.so.0.2000.1 0x7
[  931.133161] rescued libglib-2.0.so.0.2000.1 0x9
[  931.133167] rescued libglib-2.0.so.0.2000.1 0xa
[  931.133172] rescued libglib-2.0.so.0.2000.1 0xb
[  931.133178] rescued libglib-2.0.so.0.2000.1 0xf
[  931.133183] rescued libglib-2.0.so.0.2000.1 0x15
[  931.133189] rescued libglib-2.0.so.0.2000.1 0x71
[  931.133193] rescued hald-addon-cpufreq 0x0
[  931.133198] rescued libglib-2.0.so.0.2000.1 0x8
[  931.133204] rescued libglib-2.0.so.0.2000.1 0x3a
[  931.133210] rescued libglib-2.0.so.0.2000.1 0x56
[  931.133215] rescued libglib-2.0.so.0.2000.1 0x57
[  931.133221] rescued libglib-2.0.so.0.2000.1 0x58
[  931.133234] rescued libglib-2.0.so.0.2000.1 0x59
[  931.133239] rescued libglib-2.0.so.0.2000.1 0x6e
[  931.133245] rescued libglib-2.0.so.0.2000.1 0x7a
[  931.133251] rescued libglib-2.0.so.0.2000.1 0x7e
[  931.133254] rescued libhal.so.1.0.0 0xa
[  931.133260] rescued libglib-2.0.so.0.2000.1 0x5a
[  931.133266] rescued libglib-2.0.so.0.2000.1 0x41
[  931.133271] rescued libglib-2.0.so.0.2000.1 0x5b
[  931.133727] rescued hald 0x34
[  931.133732] rescued libglib-2.0.so.0.2000.1 0x5c
[  931.133737] rescued libglib-2.0.so.0.2000.1 0x5d
[  931.133742] rescued libglib-2.0.so.0.2000.1 0x6c
[  931.133746] rescued pulseaudio 0x0
[  931.133750] rescued libogg.so.0.5.3 0x0
[  931.133755] rescued libFLAC.so.8.2.0 0x0
[  931.133759] rescued libsndfile.so.1.0.17 0x0
[  931.133764] rescued libsamplerate.so.0.1.4 0x0
[  931.133768] rescued libltdl.so.3.1.6 0x0
[  931.133773] rescued libcap.so.1.10 0x0
[  931.134047] rescued gconf-helper 0x0
[  931.134051] rescued libpulsecore.so.5.0.1 0x12
[  931.134053] rescued libpulsecore.so.5.0.1 0x13
[  931.134057] rescued libpulsecore.so.5.0.1 0x14
[  931.134063] rescued libpthread-2.9.so 0xf
[  931.134067] rescued gconfd-2 0x0
[  931.134072] rescued libgmodule-2.0.so.0.2000.1 0x0
[  931.134076] rescued libgthread-2.0.so.0.2000.1 0x0
[  931.134080] rescued pulseaudio 0x1
[  931.134082] rescued pulseaudio 0x2
[  931.134085] rescued pulseaudio 0x3
[  931.134087] rescued pulseaudio 0x7
[  931.134090] rescued pulseaudio 0x8
[  931.134092] rescued pulseaudio 0x9
[  931.134095] rescued pulseaudio 0xc
[  931.134099] rescued liboil-0.3.so.0.3.0 0x1
[  931.134102] rescued liboil-0.3.so.0.3.0 0x2
[  931.134105] rescued liboil-0.3.so.0.3.0 0x3
[  931.134107] rescued liboil-0.3.so.0.3.0 0x4
[  931.134110] rescued liboil-0.3.so.0.3.0 0x5
[  931.134113] rescued liboil-0.3.so.0.3.0 0x6
[  931.134115] rescued liboil-0.3.so.0.3.0 0x7
[  931.134118] rescued liboil-0.3.so.0.3.0 0x8
[  931.134121] rescued liboil-0.3.so.0.3.0 0x9
[  931.134131] rescued liboil-0.3.so.0.3.0 0xa
[  931.134134] rescued liboil-0.3.so.0.3.0 0xb
[  931.134137] rescued liboil-0.3.so.0.3.0 0xc
[  931.134139] rescued liboil-0.3.so.0.3.0 0xd
[  931.134142] rescued liboil-0.3.so.0.3.0 0xe
[  931.134145] rescued liboil-0.3.so.0.3.0 0xf
[  931.134147] rescued liboil-0.3.so.0.3.0 0x10
[  931.134150] rescued liboil-0.3.so.0.3.0 0x11
[  931.134153] rescued liboil-0.3.so.0.3.0 0x12
[  931.134156] rescued liboil-0.3.so.0.3.0 0x13
[  931.134158] rescued liboil-0.3.so.0.3.0 0x14
[  931.134161] rescued liboil-0.3.so.0.3.0 0x15
[  931.134164] rescued liboil-0.3.so.0.3.0 0x16
[  931.134166] rescued liboil-0.3.so.0.3.0 0x17
[  931.644692] rescued libkrb5support.so.0.1 0x1
[  931.645060] rescued liboil-0.3.so.0.3.0 0x18
[  931.645063] rescued liboil-0.3.so.0.3.0 0x19
[  931.645066] rescued liboil-0.3.so.0.3.0 0x1a
[  931.645069] rescued liboil-0.3.so.0.3.0 0x1b
[  931.645072] rescued liboil-0.3.so.0.3.0 0x1c
[  931.645075] rescued liboil-0.3.so.0.3.0 0x1d
[  931.645078] rescued liboil-0.3.so.0.3.0 0x1e
[  931.645080] rescued liboil-0.3.so.0.3.0 0x1f
[  931.645083] rescued liboil-0.3.so.0.3.0 0x20
[  931.645086] rescued liboil-0.3.so.0.3.0 0x21
[  931.645089] rescued liboil-0.3.so.0.3.0 0x22
[  931.645091] rescued liboil-0.3.so.0.3.0 0x23
[  931.645094] rescued liboil-0.3.so.0.3.0 0x24
[  931.645097] rescued liboil-0.3.so.0.3.0 0x25
[  931.645100] rescued liboil-0.3.so.0.3.0 0x26
[  931.645102] rescued liboil-0.3.so.0.3.0 0x27
[  931.645105] rescued liboil-0.3.so.0.3.0 0x28
[  931.645108] rescued liboil-0.3.so.0.3.0 0x29
[  931.645111] rescued liboil-0.3.so.0.3.0 0x2a
[  931.645113] rescued liboil-0.3.so.0.3.0 0x2b
[  931.645116] rescued liboil-0.3.so.0.3.0 0x2c
[  931.645119] rescued liboil-0.3.so.0.3.0 0x2d
[  931.645123] rescued liboil-0.3.so.0.3.0 0x5a
[  931.645126] rescued libogg.so.0.5.3 0x1
[  931.645129] rescued libogg.so.0.5.3 0x3
[  931.645133] rescued libFLAC.so.8.2.0 0x1
[  931.645135] rescued libFLAC.so.8.2.0 0x2
[  931.645138] rescued libFLAC.so.8.2.0 0x3
[  931.645141] rescued libFLAC.so.8.2.0 0x4
[  931.645143] rescued libFLAC.so.8.2.0 0x5
[  931.645146] rescued libFLAC.so.8.2.0 0x6
[  931.645149] rescued libFLAC.so.8.2.0 0x7
[  931.645421] rescued libFLAC.so.8.2.0 0x8
[  931.645424] rescued libFLAC.so.8.2.0 0x9
[  931.645426] rescued libFLAC.so.8.2.0 0xa
[  931.645429] rescued libFLAC.so.8.2.0 0xb
[  931.645432] rescued libFLAC.so.8.2.0 0xc
[  931.645435] rescued libFLAC.so.8.2.0 0x43
[  931.645439] rescued libsndfile.so.1.0.17 0x1
[  931.645441] rescued libsndfile.so.1.0.17 0x2
[  931.645444] rescued libsndfile.so.1.0.17 0x3
[  931.645447] rescued libsndfile.so.1.0.17 0x4
[  931.645450] rescued libsndfile.so.1.0.17 0x3e
[  931.645453] rescued libsamplerate.so.0.1.4 0x2
[  931.645457] rescued libltdl.so.3.1.6 0x1
[  931.645460] rescued libltdl.so.3.1.6 0x5
[  931.645464] rescued libpulsecore.so.5.0.1 0x1
[  931.645468] rescued libpulsecore.so.5.0.1 0x2
[  931.645471] rescued libpulsecore.so.5.0.1 0x3
[  931.645474] rescued libpulsecore.so.5.0.1 0x4
[  931.645478] rescued libpulsecore.so.5.0.1 0x5
[  931.645481] rescued libpulsecore.so.5.0.1 0x6
[  931.645484] rescued libpulsecore.so.5.0.1 0x7
[  931.645487] rescued libpulsecore.so.5.0.1 0x8
[  931.645490] rescued libpulsecore.so.5.0.1 0x9
[  931.645494] rescued libpulsecore.so.5.0.1 0xa
[  931.645497] rescued libpulsecore.so.5.0.1 0xb
[  931.645500] rescued libpulsecore.so.5.0.1 0xc
[  931.645502] rescued libpulsecore.so.5.0.1 0xd
[  931.645505] rescued libpulsecore.so.5.0.1 0xe
[  931.645508] rescued libpulsecore.so.5.0.1 0xf
[  931.645511] rescued libpulsecore.so.5.0.1 0x10
[  931.645514] rescued libpulsecore.so.5.0.1 0x11
[  931.645517] rescued libpulsecore.so.5.0.1 0x1a
[  931.645527] rescued libpulsecore.so.5.0.1 0x29
[  931.645530] rescued libpulsecore.so.5.0.1 0x5f
[  931.645534] rescued libcap.so.1.10 0x2
[  931.645548] rescued libc-2.9.so 0xb9
[  931.645554] rescued libc-2.9.so 0xba
[  931.645560] rescued libc-2.9.so 0xbb
[  931.645568] rescued libc-2.9.so 0xec
[  931.645574] rescued libc-2.9.so 0xee
[  931.645580] rescued libc-2.9.so 0xeb
[  931.645591] rescued getty 0x0
[  931.645596] rescued sshd 0xd
[  931.645601] rescued sshd 0xe
[  931.967139] rescued libkrb5support.so.0.1 0x5
[  931.971614] rescued libc-2.9.so 0xbc
[  931.972606] rescued sshd 0x40
[  931.972611] rescued sshd 0x41
[  931.972615] rescued sshd 0x63
[  931.972624] rescued libcrypto.so.0.9.8 0x6f
[  931.972629] rescued libcrypto.so.0.9.8 0x72
[  931.972634] rescued libcrypto.so.0.9.8 0x76
[  931.972639] rescued libcrypto.so.0.9.8 0x7f
[  931.972644] rescued libcrypto.so.0.9.8 0x80
[  931.972649] rescued libcrypto.so.0.9.8 0x82
[  931.972654] rescued libcrypto.so.0.9.8 0x83
[  931.972659] rescued libcrypto.so.0.9.8 0x84
[  931.972924] rescued libcrypto.so.0.9.8 0x85
[  931.972928] rescued libcrypto.so.0.9.8 0x9e
[  931.972931] rescued libcrypto.so.0.9.8 0x9f
[  931.972934] rescued libcrypto.so.0.9.8 0xa0
[  931.972939] rescued libcrypto.so.0.9.8 0xa1
[  931.972943] rescued libcrypto.so.0.9.8 0xa2
[  931.972947] rescued libcrypto.so.0.9.8 0xa3
[  931.972950] rescued libcrypto.so.0.9.8 0xa4
[  931.972954] rescued libcrypto.so.0.9.8 0xa6
[  931.972958] rescued libcrypto.so.0.9.8 0xa7
[  931.972962] rescued libcrypto.so.0.9.8 0xa8
[  931.972965] rescued libcrypto.so.0.9.8 0xa9
[  931.972969] rescued libcrypto.so.0.9.8 0xaa
[  931.972973] rescued libcrypto.so.0.9.8 0xab
[  931.972976] rescued libcrypto.so.0.9.8 0xac
[  931.972981] rescued libcrypto.so.0.9.8 0xbc
[  931.972986] rescued libcrypto.so.0.9.8 0xbf
[  931.972990] rescued libcrypto.so.0.9.8 0xc3
[  931.972995] rescued libcrypto.so.0.9.8 0xc9
[  931.972999] rescued libcrypto.so.0.9.8 0xca
[  931.973004] rescued libcrypto.so.0.9.8 0xcb
[  931.973009] rescued libcrypto.so.0.9.8 0xd6
[  931.973013] rescued libcrypto.so.0.9.8 0xd7
[  931.973018] rescued libcrypto.so.0.9.8 0xd8
[  931.973023] rescued libcrypto.so.0.9.8 0xd9
[  931.973028] rescued libcrypto.so.0.9.8 0xdd
[  931.973033] rescued libcrypto.so.0.9.8 0xde
[  931.973037] rescued libcrypto.so.0.9.8 0xdf
[  931.973042] rescued libcrypto.so.0.9.8 0xe0
[  931.973047] rescued libcrypto.so.0.9.8 0xe3
[  931.973052] rescued libcrypto.so.0.9.8 0x133
[  931.973056] rescued libcrypto.so.0.9.8 0x13c
[  931.973068] rescued libcrypto.so.0.9.8 0x147
[  931.973072] rescued libcrypto.so.0.9.8 0x148
[  931.973077] rescued sshd 0xf
[  931.973080] rescued sshd 0x11
[  931.973084] rescued sshd 0x17
[  931.973088] rescued sshd 0x2f
[  931.973093] rescued sshd 0x34
[  931.973097] rescued sshd 0x35
[  931.973102] rescued sshd 0x36
[  931.973105] rescued sshd 0x3f
[  931.973110] rescued sshd 0x49
[  931.973113] rescued sshd 0x4a
[  931.973118] rescued sshd 0x5c
[  931.973121] rescued sshd 0x60
[  931.973570] rescued sshd 0x61
[  931.973575] rescued sshd 0x65
[  931.973581] rescued pam_env.so 0x0
[  931.973586] rescued pam_unix.so 0x0
[  931.973591] rescued pam_nologin.so 0x0
[  931.973597] rescued pam_motd.so 0x0
[  931.973601] rescued pam_mail.so 0x0
[  931.973606] rescued pam_limits.so 0x0
[  931.973612] rescued libcrypto.so.0.9.8 0x9a
[  931.973615] rescued libcrypto.so.0.9.8 0xc2
[  931.973618] rescued libcrypto.so.0.9.8 0xc5
[  931.973622] rescued libcrypto.so.0.9.8 0xc6
[  931.973625] rescued libcrypto.so.0.9.8 0xc8
[  931.973628] rescued libcrypto.so.0.9.8 0xcc
[  931.973631] rescued libcrypto.so.0.9.8 0xcd
[  931.973635] rescued libcrypto.so.0.9.8 0xce
[  931.973638] rescued libcrypto.so.0.9.8 0xda
[  931.973641] rescued libcrypto.so.0.9.8 0xdb
[  931.973644] rescued libcrypto.so.0.9.8 0xdc
[  931.973648] rescued libcrypto.so.0.9.8 0xe1
[  931.973651] rescued libcrypto.so.0.9.8 0xe5
[  931.973654] rescued libcrypto.so.0.9.8 0xe6
[  931.973657] rescued libcrypto.so.0.9.8 0xee
[  931.973660] rescued libcrypto.so.0.9.8 0xef
[  931.973664] rescued libcrypto.so.0.9.8 0xf3
[  931.973667] rescued libcrypto.so.0.9.8 0xf5
[  931.973670] rescued libcrypto.so.0.9.8 0xf6
[  931.973673] rescued libcrypto.so.0.9.8 0xf7
[  931.973677] rescued libcrypto.so.0.9.8 0xfb
[  931.973680] rescued libcrypto.so.0.9.8 0xfe
[  931.973683] rescued libcrypto.so.0.9.8 0xff
[  931.973687] rescued libcrypto.so.0.9.8 0x100
[  931.973948] rescued libcrypto.so.0.9.8 0x102
[  931.973952] rescued libcrypto.so.0.9.8 0x121
[  931.973955] rescued libcrypto.so.0.9.8 0x130
[  931.973959] rescued libcrypto.so.0.9.8 0x131
[  931.973962] rescued libcrypto.so.0.9.8 0x132
[  931.973967] rescued libcrypto.so.0.9.8 0x137
[  931.973970] rescued libcrypto.so.0.9.8 0x149
[  931.973974] rescued libcrypto.so.0.9.8 0x14a
[  931.973985] rescued ld-2.9.so 0x12
[  931.973988] rescued sshd 0x13
[  931.973993] rescued sshd 0x14
[  931.973996] rescued sshd 0x18
[  931.974000] rescued sshd 0x19
[  931.974004] rescued sshd 0x1b
[  931.974008] rescued sshd 0x20
[  931.974012] rescued sshd 0x26
[  931.974016] rescued sshd 0x27
[  931.974020] rescued sshd 0x33
[  931.974024] rescued sshd 0x3e
[  931.974028] rescued sshd 0x43
[  931.974032] rescued sshd 0x45
[  931.974036] rescued sshd 0x4c
[  931.974041] rescued sshd 0x4e
[  931.974045] rescued sshd 0x4f
[  931.974048] rescued sshd 0x50
[  931.974052] rescued sshd 0x59
[  931.974056] rescued sshd 0x5a
[  931.974059] rescued sshd 0x5d
[  931.974064] rescued sshd 0x5f
[  931.974069] rescued sshd 0x66
[  931.974080] rescued libcrypto.so.0.9.8 0x88
[  931.974084] rescued libcrypto.so.0.9.8 0x89
[  931.974087] rescued libcrypto.so.0.9.8 0x97
[  931.974091] rescued libcrypto.so.0.9.8 0x98
[  932.448253] rescued libc-2.9.so 0x117
[  932.448616] rescued libcrypto.so.0.9.8 0x99
[  932.448620] rescued libcrypto.so.0.9.8 0x135
[  932.448623] rescued libcrypto.so.0.9.8 0x136
[  932.448627] rescued sshd 0x1e
[  932.448630] rescued sshd 0x28
[  932.448634] rescued sshd 0x44
[  932.448637] rescued sshd 0x4b
[  932.448642] rescued sshd 0x5b
[  932.448651] rescued libc-2.9.so 0x20
[  932.448657] rescued libc-2.9.so 0x21
[  932.448663] rescued libc-2.9.so 0x22
[  932.448670] rescued libc-2.9.so 0x27
[  932.448677] rescued libc-2.9.so 0x30
[  932.448690] rescued libc-2.9.so 0x11c
[  932.448711] rescued zsh4 0x0
[  932.448715] rescued libpam.so.0.81.12 0x3
[  932.448719] rescued sshd 0x2d
[  932.448723] rescued sshd 0x46
[  932.448995] rescued libcap.so.2.11 0x0
[  932.449032] rescued zsh4 0x2
[  932.449035] rescued zsh4 0x4
[  932.449039] rescued zsh4 0xa
[  932.449042] rescued zsh4 0xd
[  932.449045] rescued zsh4 0xe
[  932.449048] rescued zsh4 0xf
[  932.449058] rescued zsh4 0x25
[  932.449062] rescued zsh4 0x27
[  932.449065] rescued zsh4 0x28
[  932.449068] rescued zsh4 0x29
[  932.449071] rescued zsh4 0x2a
[  932.449074] rescued zsh4 0x2b
[  932.449077] rescued zsh4 0x2c
[  932.449080] rescued zsh4 0x2d
[  932.449084] rescued zsh4 0x2f
[  932.449087] rescued zsh4 0x35
[  932.449090] rescued zsh4 0x41
[  932.449093] rescued zsh4 0x43
[  932.449096] rescued zsh4 0x44
[  932.449099] rescued zsh4 0x4c
[  932.580004] rescued libcom_err.so.2.1 0x1
[  932.584145] rescued libk5crypto.so.3.1 0x1
[  932.584561] rescued zsh4 0x50
[  932.584565] rescued zsh4 0x51
[  932.584568] rescued zsh4 0x52
[  932.584571] rescued zsh4 0x5b
[  932.584575] rescued zsh4 0x5d
[  932.584578] rescued zsh4 0x60
[  932.584581] rescued zsh4 0x61
[  932.584584] rescued zsh4 0x64
[  932.584587] rescued zsh4 0x65
[  932.584590] rescued zsh4 0x66
[  932.584593] rescued zsh4 0x70
[  932.584596] rescued zsh4 0x74
[  932.584599] rescued zsh4 0x75
[  932.584602] rescued zsh4 0x76
[  932.584605] rescued zsh4 0x77
[  932.584608] rescued zsh4 0x81
[  932.584611] rescued zsh4 0x82
[  932.584614] rescued zsh4 0x83
[  932.584617] rescued zsh4 0x86
[  932.584620] rescued zsh4 0x8d
[  932.584623] rescued zsh4 0x91
[  932.584627] rescued zsh4 0x92
[  932.584632] rescued libncursesw.so.5.7 0x1
[  932.584644] rescued libc-2.9.so 0x8d
[  932.584647] rescued zsh4 0x3
[  932.584650] rescued zsh4 0x88
[  932.585372] rescued zsh4 0x33
[  932.585376] rescued zsh4 0x78
[  932.585382] rescued zsh4 0xc
[  932.585386] rescued terminfo.so 0x0
[  932.585389] rescued zsh4 0x26
[  932.585393] rescued zsh4 0x31
[  932.585396] rescued zsh4 0x4b
[  932.585399] rescued zsh4 0x79
[  932.585402] rescued zsh4 0x7e
[  932.585405] rescued zsh4 0x85
[  932.585408] rescued zsh4 0x8c
[  932.585412] rescued zsh4 0x23
[  932.585415] rescued zsh4 0x46
[  932.585418] rescued zsh4 0x14
[  932.585422] rescued zsh4 0x15
[  932.585425] rescued zsh4 0x16
[  932.585428] rescued zsh4 0x17
[  932.585431] rescued zsh4 0x5e
[  932.585434] rescued zsh4 0x7a
[  932.585698] rescued zsh4 0x7b
[  932.585708] rescued libc-2.9.so 0x8b
[  932.585717] rescued libc-2.9.so 0x120
[  932.585723] rescued libpthread-2.9.so 0x7
[  932.585736] rescued zsh4 0x37
[  932.585740] rescued zsh4 0x5c
[  932.585743] rescued zsh4 0x7c
[  932.585747] rescued libc-2.9.so 0x3f
[  932.754456] rescued libk5crypto.so.3.1 0x2
[  932.754832] rescued libc-2.9.so 0x23
[  932.754848] rescued libc-2.9.so 0x101
[  932.755122] rescued libc-2.9.so 0x102
[  932.755135] rescued libc-2.9.so 0xe5
[  932.755146] rescued libc-2.9.so 0xa1
[  932.755154] rescued udevd 0x8
[  932.755157] rescued udevd 0x9
[  932.755160] rescued udevd 0x13
[  932.755212] rescued libwrap.so.0.7.6 0x7
[  932.755215] rescued libc-2.9.so 0xed
[  932.794167] rescued libk5crypto.so.3.1 0x3
[  932.798372] rescued libk5crypto.so.3.1 0x4
[  932.802589] rescued libk5crypto.so.3.1 0x5
[  932.806787] rescued libk5crypto.so.3.1 0x19
[  932.811122] rescued libkrb5.so.3.3 0x1
[  932.815223] rescued libkrb5.so.3.3 0x2
[  932.819126] rescued libkrb5.so.3.3 0x3
[  932.819740] rescued sshd 0x64
[  932.819746] rescued libpam.so.0.81.12 0x4
[  932.819752] rescued libpam.so.0.81.12 0x5
[  932.819763] rescued libpam.so.0.81.12 0x6
[  932.819766] rescued libpam.so.0.81.12 0x7
[  932.819771] rescued libpam.so.0.81.12 0x9
[  932.819776] rescued sshd 0x10
[  932.819780] rescued sshd 0x15
[  932.819783] rescued sshd 0x16
[  932.819787] rescued sshd 0x1a
[  932.819790] rescued sshd 0x1c
[  932.819793] rescued sshd 0x1d
[  932.819796] rescued sshd 0x21
[  932.819800] rescued sshd 0x22
[  932.819804] rescued sshd 0x23
[  932.819807] rescued sshd 0x24
[  932.819810] rescued sshd 0x25
[  932.880415] rescued libkrb5.so.3.3 0x4
[  932.884264] rescued libkrb5.so.3.3 0x5
[  932.888127] rescued libkrb5.so.3.3 0x6
[  932.892060] rescued libkrb5.so.3.3 0x7
[  932.892504] rescued sshd 0x2c
[  932.892510] rescued sshd 0x2e
[  932.892514] rescued sshd 0x37
[  932.892517] rescued sshd 0x38
[  932.892521] rescued sshd 0x39
[  932.892524] rescued sshd 0x3a
[  932.892528] rescued sshd 0x3b
[  932.892531] rescued sshd 0x3c
[  932.892534] rescued sshd 0x3d
[  932.892538] rescued sshd 0x42
[  932.892541] rescued sshd 0x47
[  932.892546] rescued libcrypto.so.0.9.8 0xa5
[  932.892550] rescued libcrypto.so.0.9.8 0xbd
[  932.892553] rescued libcrypto.so.0.9.8 0xbe
[  932.892557] rescued libcrypto.so.0.9.8 0xc0
[  932.892560] rescued libcrypto.so.0.9.8 0xed
[  932.892564] rescued libcrypto.so.0.9.8 0xf4
[  932.892850] rescued hald 0x30
[  932.892853] rescued hald 0x31
[  932.892857] rescued hald 0x35
[  932.892859] rescued hald 0x36
[  932.892871] rescued hald 0x43
[  932.892875] rescued hald 0x45
[  932.892877] rescued hald 0x46
[  932.892904] rescued libdbus-glib-1.so.2.1.0 0x8
[  932.892909] rescued libdbus-glib-1.so.2.1.0 0x9
[  932.892920] rescued libdbus-glib-1.so.2.1.0 0xa
[  932.892938] rescued libgobject-2.0.so.0.2000.1 0x27
[  932.892949] rescued libgobject-2.0.so.0.2000.1 0x2b
[  932.893495] rescued libgobject-2.0.so.0.2000.1 0x9
[  932.893510] rescued libgobject-2.0.so.0.2000.1 0xe
[  932.893514] rescued libgobject-2.0.so.0.2000.1 0xf
[  932.893837] rescued libglib-2.0.so.0.2000.1 0x16
[  932.893848] rescued libglib-2.0.so.0.2000.1 0x24
[  932.893861] rescued libglib-2.0.so.0.2000.1 0x2b
[  932.893866] rescued libglib-2.0.so.0.2000.1 0x2c
[  932.893872] rescued libglib-2.0.so.0.2000.1 0x2e
[  932.893884] rescued libglib-2.0.so.0.2000.1 0x37
[  932.893889] rescued libglib-2.0.so.0.2000.1 0x38
[  932.893901] rescued libglib-2.0.so.0.2000.1 0x39
[  932.893907] rescued libglib-2.0.so.0.2000.1 0x3b
[  932.893912] rescued hald 0x6
[  932.893915] rescued hald 0x7
[  932.893918] rescued hald 0x8
[  932.893920] rescued hald 0x9
[  932.893924] rescued hald 0xb
[  932.893926] rescued hald 0xc
[  932.893929] rescued hald 0xd
[  933.077526] rescued hald 0x10
[  933.077579] rescued libglib-2.0.so.0.2000.1 0x62
[  933.077625] rescued libglib-2.0.so.0.2000.1 0x3c
[  933.077631] rescued libglib-2.0.so.0.2000.1 0x3d
[  933.077641] rescued libglib-2.0.so.0.2000.1 0x44
[  933.077645] rescued libglib-2.0.so.0.2000.1 0x45
[  933.077652] rescued libglib-2.0.so.0.2000.1 0x4a
[  933.109022] rescued hald 0x11
[  933.112194] rescued hald 0x12
[  933.115274] rescued hald 0x13
[  933.118380] rescued hald 0x18
[  933.121476] rescued hald 0x23
[  933.124563] rescued hald 0x24
[  933.128902] rescued libpulsecore.so.5.0.1 0x61
[  933.133460] rescued libpulsecore.so.5.0.1 0x64
[  933.136587] rescued libpulsecore.so.5.0.1 0x34
[  933.137423] rescued libpulsecore.so.5.0.1 0x55
[  933.137426] rescued libpulsecore.so.5.0.1 0x56
[  933.151551] rescued libpulsecore.so.5.0.1 0x68
[  933.152274] rescued libgconf-2.so.4.1.5 0x10
[  933.152278] rescued libgconf-2.so.4.1.5 0x11
[  933.152283] rescued libgconf-2.so.4.1.5 0x14
[  933.152289] rescued libgconf-2.so.4.1.5 0x18
[  933.152293] rescued libgconf-2.so.4.1.5 0x19
[  933.178388] rescued libpulsecore.so.5.0.1 0x15
[  933.182934] rescued libpulsecore.so.5.0.1 0x16
[  933.184590] rescued libORBit-2.so.0.1.0 0x53
[  933.184930] rescued libORBit-2.so.0.1.0 0x27
[  933.184934] rescued libORBit-2.so.0.1.0 0x28
[  933.184938] rescued libORBit-2.so.0.1.0 0x29
[  933.184949] rescued libORBit-2.so.0.1.0 0x2f
[  933.184957] rescued libORBit-2.so.0.1.0 0x33
[  933.184961] rescued libORBit-2.so.0.1.0 0x34
[  933.184964] rescued libORBit-2.so.0.1.0 0x35
[  933.184975] rescued libgthread-2.0.so.0.2000.1 0x1
[  933.184979] rescued libgthread-2.0.so.0.2000.1 0x2
[  933.185004] rescued libORBit-2.so.0.1.0 0x49
[  933.185007] rescued libORBit-2.so.0.1.0 0x4a
[  933.185011] rescued libORBit-2.so.0.1.0 0x4b
[  933.185015] rescued libORBit-2.so.0.1.0 0x4d
[  933.185471] rescued gconfd-2 0x4
[  933.185475] rescued gconfd-2 0x6
[  933.185479] rescued gconfd-2 0x8
[  933.185482] rescued gconfd-2 0x9
[  933.185484] rescued gconfd-2 0xa
[  933.185494] rescued libgconfbackend-xml.so 0x4
[  933.185810] rescued pam_env.so 0x1
[  933.185815] rescued pam_env.so 0x2
[  933.185820] rescued pam_unix.so 0x1
[  933.185825] rescued pam_unix.so 0x2
[  933.185830] rescued pam_unix.so 0x3
[  933.185833] rescued pam_unix.so 0x4
[  933.185837] rescued pam_unix.so 0x5
[  933.185840] rescued pam_unix.so 0x6
[  933.185843] rescued pam_unix.so 0x7
[  933.185848] rescued pam_unix.so 0xa
[  933.185851] rescued pam_unix.so 0xb
[  933.185856] rescued pam_mail.so 0x1
[  933.185860] rescued pam_limits.so 0x1
[  933.185864] rescued pam_limits.so 0x2
[  933.185876] rescued zsh4 0x87
[  933.185879] rescued zsh4 0x89
[  933.185882] rescued zsh4 0x8a
[  933.185885] rescued zsh4 0x8b
[  933.185888] rescued zsh4 0x8e
[  933.185891] rescued zsh4 0x8f
[  933.185895] rescued zsh4 0x90
[  933.185898] rescued zsh4 0x93
[  933.185901] rescued zsh4 0x94
[  933.185904] rescued zsh4 0x95
[  933.185914] rescued zsh4 0x5
[  933.185917] rescued zsh4 0x6
[  933.185920] rescued zsh4 0x7
[  933.185923] rescued zsh4 0x8
[  933.185926] rescued zsh4 0x9
[  933.185929] rescued zsh4 0xb
[  933.185932] rescued zsh4 0x10
[  933.185935] rescued zsh4 0x11
[  933.185938] rescued zsh4 0x12
[  933.185941] rescued zsh4 0x13
[  933.185944] rescued zsh4 0x18
[  933.185947] rescued zsh4 0x1a
[  933.185950] rescued zsh4 0x1b
[  933.185954] rescued zsh4 0x1c
[  933.392565] rescued libpulsecore.so.5.0.1 0x18
[  933.392918] rescued zsh4 0x1d
[  933.392922] rescued zsh4 0x1e
[  933.392927] rescued zsh4 0x22
[  933.392930] rescued zsh4 0x24
[  933.392933] rescued zsh4 0x2e
[  933.392936] rescued zsh4 0x30
[  933.392939] rescued zsh4 0x32
[  933.392942] rescued zsh4 0x34
[  933.392945] rescued zsh4 0x36
[  933.392948] rescued zsh4 0x38
[  933.392951] rescued zsh4 0x39
[  933.392954] rescued zsh4 0x3a
[  933.392957] rescued zsh4 0x3b
[  933.392961] rescued libcap.so.2.11 0x1
[  933.392965] rescued libcap.so.2.11 0x2
[  933.392970] rescued libncursesw.so.5.7 0x2
[  933.392973] rescued libncursesw.so.5.7 0x3
[  933.392976] rescued libncursesw.so.5.7 0x32
[  933.392980] rescued libncursesw.so.5.7 0x33
[  933.392983] rescued libncursesw.so.5.7 0x34
[  933.392986] rescued libncursesw.so.5.7 0x35
[  933.392990] rescued libncursesw.so.5.7 0x36
[  933.392993] rescued libncursesw.so.5.7 0x37
[  933.392996] rescued libncursesw.so.5.7 0x38
[  933.392999] rescued libncursesw.so.5.7 0x39
[  933.393002] rescued libncursesw.so.5.7 0x3a
[  933.393006] rescued libncursesw.so.5.7 0x3b
[  933.393009] rescued libncursesw.so.5.7 0x3c
[  933.393012] rescued libncursesw.so.5.7 0x3d
[  933.393015] rescued libncursesw.so.5.7 0x3e
[  933.393286] rescued libncursesw.so.5.7 0x3f
[  933.393289] rescued libncursesw.so.5.7 0x4
[  933.393293] rescued libncursesw.so.5.7 0x5
[  933.393296] rescued libncursesw.so.5.7 0x6
[  933.393299] rescued libncursesw.so.5.7 0x7
[  933.393303] rescued libncursesw.so.5.7 0x8
[  933.393306] rescued libncursesw.so.5.7 0x9
[  933.393309] rescued libncursesw.so.5.7 0xa
[  933.393312] rescued libncursesw.so.5.7 0xb
[  933.393315] rescued libncursesw.so.5.7 0xc
[  933.393319] rescued libncursesw.so.5.7 0xd
[  933.393322] rescued libncursesw.so.5.7 0xe
[  933.393325] rescued libncursesw.so.5.7 0xf
[  933.393329] rescued libncursesw.so.5.7 0x10
[  933.393332] rescued libncursesw.so.5.7 0x11
[  933.393335] rescued libncursesw.so.5.7 0x12
[  933.393338] rescued libncursesw.so.5.7 0x13
[  933.393341] rescued libncursesw.so.5.7 0x14
[  933.393345] rescued zsh4 0x3c
[  933.393348] rescued zsh4 0x3e
[  933.393351] rescued zsh4 0x3f
[  933.393354] rescued zsh4 0x40
[  933.393357] rescued zsh4 0x42
[  933.393360] rescued zsh4 0x45
[  933.393363] rescued zsh4 0x48
[  933.393366] rescued zsh4 0x49
[  933.393369] rescued zsh4 0x4a
[  933.393372] rescued zsh4 0x4d
[  933.393375] rescued zsh4 0x4e
[  933.393378] rescued zsh4 0x4f
[  933.393381] rescued zsh4 0x53
[  933.393384] rescued zsh4 0x54
[  933.393394] rescued zsh4 0x55
[  933.393397] rescued zsh4 0x56
[  933.393400] rescued zsh4 0x57
[  933.393403] rescued zsh4 0x58
[  933.393406] rescued zsh4 0x59
[  933.393409] rescued zsh4 0x5a
[  933.393412] rescued zsh4 0x67
[  933.393415] rescued zsh4 0x68
[  933.393418] rescued zsh4 0x69
[  933.393421] rescued zsh4 0x6a
[  933.393424] rescued zsh4 0x6b
[  933.393427] rescued zsh4 0x6c
[  933.393430] rescued zsh4 0x6d
[  933.393433] rescued zsh4 0x6e
[  933.668221] rescued libpulsecore.so.5.0.1 0x19
[  933.672781] rescued libpulsecore.so.5.0.1 0x24
[  933.676581] rescued zsh4 0x6f
[  933.676585] rescued zsh4 0x71
[  933.676588] rescued zsh4 0x72
[  933.676591] rescued zsh4 0x73
[  933.676594] rescued zsh4 0x7d
[  933.676597] rescued zsh4 0x7f
[  933.676600] rescued zsh4 0x80
[  933.676603] rescued zsh4 0x84
[  933.676606] rescued zsh4 0x5f
[  933.676609] rescued zsh4 0x62
[  933.676612] rescued zsh4 0x63
[  933.676618] rescued terminfo.so 0x1
[  933.676622] rescued zle.so 0x1
[  933.676625] rescued zle.so 0x2
[  933.676628] rescued zle.so 0x3
[  933.676632] rescued zle.so 0x27
[  933.676635] rescued zle.so 0x28
[  933.676638] rescued zle.so 0x29
[  933.676641] rescued zle.so 0x2a
[  933.676644] rescued zle.so 0x2b
[  933.676647] rescued zle.so 0x2c
[  933.676650] rescued zle.so 0x2d
[  933.676653] rescued zle.so 0x2e
[  933.676656] rescued zle.so 0x2f
[  933.676659] rescued zle.so 0x30
[  933.676662] rescued zle.so 0x31
[  933.676665] rescued zle.so 0x32
[  933.676668] rescued zle.so 0x33
[  933.676671] rescued zle.so 0x34
[  933.676674] rescued zle.so 0x36
[  933.676677] rescued zle.so 0x38
[  933.676950] rescued zle.so 0x39
[  933.676953] rescued zle.so 0x4
[  933.676956] rescued zle.so 0x5
[  933.676960] rescued zle.so 0x6
[  933.676963] rescued zle.so 0x7
[  933.676966] rescued zle.so 0x8
[  933.676969] rescued zle.so 0x9
[  933.676972] rescued zle.so 0xa
[  933.676975] rescued zle.so 0xb
[  933.676978] rescued zle.so 0xc
[  933.676981] rescued zle.so 0xd
[  933.676983] rescued zle.so 0xe
[  933.676987] rescued zle.so 0xf
[  933.676990] rescued zle.so 0x10
[  933.676993] rescued zle.so 0x11
[  933.676996] rescued zle.so 0x12
[  933.676999] rescued zle.so 0x13
[  933.677002] rescued zle.so 0x14
[  933.677005] rescued zle.so 0x15
[  933.677008] rescued zle.so 0x16
[  933.677011] rescued zle.so 0x17
[  933.677014] rescued zle.so 0x1a
[  933.677017] rescued zle.so 0x1b
[  933.677021] rescued zle.so 0x1c
[  933.677024] rescued zle.so 0x1d
[  933.677027] rescued zle.so 0x1e
[  933.677030] rescued zle.so 0x20
[  933.677033] rescued zle.so 0x21
[  933.677036] rescued zle.so 0x23
[  933.677039] rescued zle.so 0x24
[  933.677042] rescued zle.so 0x25
[  933.677046] rescued zle.so 0x26
[  933.677056] rescued complete.so 0x0
[  933.677060] rescued complete.so 0x1
[  933.677063] rescued complete.so 0x2
[  933.677066] rescued complete.so 0x3
[  933.677069] rescued complete.so 0x5
[  933.677072] rescued complete.so 0x6
[  933.677075] rescued complete.so 0x7
[  933.677078] rescued complete.so 0x8
[  933.677081] rescued complete.so 0x9
[  933.677084] rescued complete.so 0xa
[  933.677087] rescued complete.so 0xb
[  933.677090] rescued complete.so 0xc
[  933.677093] rescued complete.so 0xd
[  933.677096] rescued complete.so 0xe
[  933.692562] rescued complete.so 0xf
[  933.692567] rescued complete.so 0x10
[  933.692570] rescued complete.so 0x11
[  933.692574] rescued complete.so 0x12
[  933.692577] rescued complete.so 0x13
[  933.692580] rescued complete.so 0x14
[  933.692583] rescued complete.so 0x15
[  933.692586] rescued complete.so 0x16
[  933.692589] rescued complete.so 0x17
[  933.692593] rescued complete.so 0x18
[  933.692596] rescued complete.so 0x19
[  933.692599] rescued complete.so 0x1a
[  933.692602] rescued complete.so 0x1b
[  933.692605] rescued complete.so 0x1c
[  933.692608] rescued complete.so 0x1d
[  933.692612] rescued complete.so 0x1e
[  933.692615] rescued complete.so 0x1f
[  933.692618] rescued complete.so 0x20
[  933.692621] rescued complete.so 0x4
[  933.692625] rescued zutil.so 0x0
[  933.692628] rescued zutil.so 0x1
[  933.692632] rescued zutil.so 0x2
[  933.692635] rescued zutil.so 0x3
[  933.692638] rescued zutil.so 0x4
[  933.692641] rescued zutil.so 0x5
[  933.692645] rescued rlimits.so 0x0
[  933.692649] rescued rlimits.so 0x1
[  933.692652] rescued rlimits.so 0x2
[  933.692656] rescued complist.so 0x0
[  933.692659] rescued complist.so 0x1
[  933.692662] rescued complist.so 0x2
[  933.692666] rescued complist.so 0x3
[  933.692942] rescued complist.so 0xc
[  933.692945] rescued complist.so 0xd
[  933.692950] rescued parameter.so 0x0
[  933.692954] rescued parameter.so 0x1
[  933.692957] rescued parameter.so 0x2
[  933.692960] rescued parameter.so 0x3
[  933.692963] rescued parameter.so 0x4
[  933.692966] rescued parameter.so 0x6
[  934.070241] rescued libpulsecore.so.5.0.1 0x25
[  934.074982] rescued libpulsecore.so.5.0.1 0x2b
[  934.079663] rescued libpulsecore.so.5.0.1 0x2c
[  934.084835] rescued computil.so 0x1
[  934.084864] rescued computil.so 0x0
[  934.092002] rescued computil.so 0x2
[  934.095601] rescued computil.so 0x4
[  934.099196] rescued computil.so 0x5
[  934.102795] rescued computil.so 0x6
[  934.106413] rescued computil.so 0x7
[  934.110022] rescued computil.so 0x8
[  934.113623] rescued computil.so 0xe
[  934.397565] rescued zle.so 0x35
[  934.553110] rescued parameter.so 0x5
[  934.557012] rescued zsh4 0x47


^ permalink raw reply	[flat|nested] 336+ messages in thread

[  929.489659] rescued libpthread-2.9.so 0x8
[  929.489673] rescued udevd 0x1
[  929.661734] rescued libc-2.9.so 0xc2
[  929.665491] rescued libc-2.9.so 0xc6
[  929.669228] rescued libc-2.9.so 0xc7
[  929.673267] rescued libc-2.9.so 0x113
[  929.677241] rescued libc-2.9.so 0x114
[  929.681087] rescued libc-2.9.so 0x115
[  929.685510] rescued libselinux.so.1 0x0
[  929.689720] rescued libsepol.so.1 0x0
[  929.693586] rescued ld-2.9.so 0x8
[  929.697072] rescued ld-2.9.so 0x9
[  929.700527] rescued ld-2.9.so 0xd
[  929.703982] rescued ld-2.9.so 0xe
[  929.707459] rescued ld-2.9.so 0x13
[  929.711017] rescued ld-2.9.so 0x15
[  929.714551] rescued ld-2.9.so 0x17
[  929.718089] rescued libdl-2.9.so 0x0
[  929.720395] rescued libc-2.9.so 0x11
[  929.720412] rescued libc-2.9.so 0x17
[  929.720428] rescued libc-2.9.so 0x18
[  929.720443] rescued libc-2.9.so 0x19
[  929.720457] rescued libc-2.9.so 0x1a
[  929.720473] rescued libc-2.9.so 0x1b
[  929.720488] rescued libc-2.9.so 0x1c
[  929.720502] rescued libc-2.9.so 0x1d
[  929.720521] rescued libc-2.9.so 0x31
[  929.720541] rescued libc-2.9.so 0x32
[  929.720558] rescued libc-2.9.so 0x33
[  929.720578] rescued libc-2.9.so 0x34
[  929.720598] rescued libc-2.9.so 0x62
[  929.720614] rescued libc-2.9.so 0x63
[  929.720628] rescued libc-2.9.so 0x65
[  929.720646] rescued libc-2.9.so 0x6e
[  929.720668] rescued libc-2.9.so 0x7a
[  929.720685] rescued libc-2.9.so 0x82
[  929.720700] rescued libc-2.9.so 0x83
[  929.720721] rescued libc-2.9.so 0x9f
[  929.720742] rescued libc-2.9.so 0xcb
[  929.720754] rescued libc-2.9.so 0xcc
[  929.720775] rescued libc-2.9.so 0xce
[  929.720798] rescued libc-2.9.so 0x103
[  929.721080] rescued udevd 0x15
[  929.721083] rescued udevd 0x5
[  929.721085] rescued udevd 0x7
[  929.721088] rescued udevd 0xb
[  929.721091] rescued udevd 0xc
[  929.721093] rescued udevd 0xf
[  929.721096] rescued udevd 0x11
[  929.721098] rescued udevd 0x12
[  929.721101] rescued udevd 0x14
[  929.721104] rescued udevd 0x16
[  929.721106] rescued udevd 0x3
[  929.721109] rescued udevd 0xd
[  929.721111] rescued udevd 0xe
[  929.721125] rescued libc-2.9.so 0x52
[  929.721134] rescued libc-2.9.so 0x53
[  929.721145] rescued libc-2.9.so 0x54
[  929.721148] rescued udevd 0x4
[  929.721153] rescued libc-2.9.so 0x3d
[  929.721162] rescued libc-2.9.so 0x2c
[  929.721169] rescued libc-2.9.so 0x2d
[  929.721175] rescued libc-2.9.so 0x80
[  929.721181] rescued libc-2.9.so 0x81
[  929.721185] rescued libc-2.9.so 0x38
[  929.721190] rescued libc-2.9.so 0x39
[  929.736569] rescued libc-2.9.so 0x3a
[  929.736576] rescued libc-2.9.so 0x40
[  929.736581] rescued libc-2.9.so 0x41
[  929.736597] rescued libc-2.9.so 0x55
[  929.736601] rescued libc-2.9.so 0x56
[  929.736615] rescued libpthread-2.9.so 0x1
[  929.736622] rescued libpthread-2.9.so 0x2
[  929.736629] rescued libpthread-2.9.so 0x3
[  929.736636] rescued libpthread-2.9.so 0x4
[  929.736646] rescued libpthread-2.9.so 0x5
[  929.736654] rescued libpthread-2.9.so 0xa
[  929.736662] rescued libpthread-2.9.so 0xc
[  929.736669] rescued libpthread-2.9.so 0xe
[  929.736677] rescued libpthread-2.9.so 0x10
[  929.736685] rescued librt-2.9.so 0x1
[  929.736691] rescued librt-2.9.so 0x2
[  929.736696] rescued librt-2.9.so 0x5
[  929.736971] rescued libc-2.9.so 0x57
[  929.736984] rescued libc-2.9.so 0x8e
[  929.736992] rescued libc-2.9.so 0x90
[  929.736999] rescued libc-2.9.so 0x91
[  929.737007] rescued libc-2.9.so 0x95
[  929.737014] rescued libc-2.9.so 0x96
[  929.737021] rescued libc-2.9.so 0x97
[  929.737027] rescued librt-2.9.so 0x3
[  929.737042] rescued libc-2.9.so 0x47
[  929.737047] rescued libc-2.9.so 0x59
[  929.737055] rescued libc-2.9.so 0xc8
[  929.737059] rescued libc-2.9.so 0xc9
[  929.737067] rescued libm-2.9.so 0x0
[  929.737073] rescued libm-2.9.so 0x1
[  930.005563] rescued libnss_files-2.9.so 0x0
[  930.005830] rescued libm-2.9.so 0x2
[  930.005836] rescued libm-2.9.so 0x3
[  930.005841] rescued libm-2.9.so 0x44
[  930.005845] rescued libm-2.9.so 0x28
[  930.005849] rescued portmap 0x0
[  930.005861] rescued libwrap.so.0.7.6 0x0
[  930.005865] rescued rpc.statd 0x0
[  930.005868] rescued rpc.statd 0x1
[  930.005870] rescued rpc.statd 0x2
[  930.005873] rescued rpc.statd 0x4
[  930.005876] rescued rpc.statd 0x8
[  930.005881] rescued libwrap.so.0.7.6 0x1
[  930.005885] rescued libwrap.so.0.7.6 0x2
[  930.005889] rescued libwrap.so.0.7.6 0x3
[  930.005893] rescued libwrap.so.0.7.6 0x6
[  930.005897] rescued rpc.idmapd 0x0
[  930.006158] rescued libresolv-2.9.so 0x0
[  930.006171] rescued libc-2.9.so 0x9c
[  930.006182] rescued libc-2.9.so 0xfe
[  930.006186] rescued libnfsidmap.so.0.3.0 0x0
[  930.006189] rescued libevent-1.3e.so.1.0.3 0x0
[  930.006196] rescued libpthread-2.9.so 0xb
[  930.006202] rescued libattr.so.1.1.0 0x0
[  930.006208] rescued libc-2.9.so 0x61
[  930.006211] rescued libattr.so.1.1.0 0x1
[  930.006215] rescued libattr.so.1.1.0 0x3
[  930.006219] rescued init 0x1
[  930.006221] rescued init 0x2
[  930.006224] rescued init 0x5
[  930.006226] rescued init 0x7
[  930.006229] rescued init 0x6
[  930.006233] rescued libsepol.so.1 0x2
[  930.006236] rescued libsepol.so.1 0x3
[  930.006238] rescued libsepol.so.1 0x4
[  930.006241] rescued libsepol.so.1 0x2d
[  930.006244] rescued init 0x3
[  930.006246] rescued init 0x4
[  930.006250] rescued acpid 0x0
[  930.006259] rescued libdbus-1.so.3.4.0 0x0
[  930.006262] rescued dbus-daemon 0x0
[  930.006275] rescued libpthread-2.9.so 0xd
[  930.006279] rescued libexpat.so.1.5.2 0x0
[  930.006287] rescued sshd 0x0
[  930.006292] rescued libkeyutils-1.2.so 0x0
[  930.006297] rescued libkrb5support.so.0.1 0x0
[  930.006304] rescued libcom_err.so.2.1 0x0
[  930.006311] rescued libk5crypto.so.3.1 0x0
[  930.006318] rescued libgssapi_krb5.so.2.2 0x0
[  930.006322] rescued libresolv-2.9.so 0x1
[  930.006325] rescued libresolv-2.9.so 0x2
[  930.006329] rescued libresolv-2.9.so 0x3
[  930.006333] rescued libresolv-2.9.so 0xf
[  930.006340] rescued libcrypt-2.9.so 0x0
[  930.207054] rescued libnss_nis-2.9.so 0x0
[  930.211539] rescued libnsl-2.9.so 0x0
[  930.215805] rescued libz.so.1.2.3.3 0x0
[  930.215968] rescued libkrb5.so.3.3 0x8
[  930.215972] rescued libkrb5.so.3.3 0x9
[  930.215976] rescued libkrb5.so.3.3 0xa
[  930.215979] rescued libkrb5.so.3.3 0xb
[  930.215983] rescued libkrb5.so.3.3 0xc
[  930.215986] rescued libkrb5.so.3.3 0xd
[  930.215989] rescued libkrb5.so.3.3 0xe
[  930.215993] rescued libkrb5.so.3.3 0xf
[  930.215996] rescued libkrb5.so.3.3 0x10
[  930.216000] rescued libkrb5.so.3.3 0x11
[  930.216003] rescued libkrb5.so.3.3 0x12
[  930.216006] rescued libkrb5.so.3.3 0x13
[  930.216009] rescued libkrb5.so.3.3 0x14
[  930.216013] rescued libkrb5.so.3.3 0x15
[  930.216016] rescued libkrb5.so.3.3 0x16
[  930.216019] rescued libkrb5.so.3.3 0x19
[  930.216023] rescued libkrb5.so.3.3 0x88
[  930.216028] rescued libgssapi_krb5.so.2.2 0x1
[  930.216031] rescued libgssapi_krb5.so.2.2 0x2
[  930.216035] rescued libgssapi_krb5.so.2.2 0x3
[  930.216038] rescued libgssapi_krb5.so.2.2 0x4
[  930.216041] rescued libgssapi_krb5.so.2.2 0x5
[  930.216045] rescued libgssapi_krb5.so.2.2 0x6
[  930.216048] rescued libgssapi_krb5.so.2.2 0x25
[  930.216053] rescued libcrypt-2.9.so 0x6
[  930.216057] rescued libz.so.1.2.3.3 0x1
[  930.216060] rescued libz.so.1.2.3.3 0x2
[  930.216064] rescued libz.so.1.2.3.3 0xe
[  930.216068] rescued libutil-2.9.so 0x1
[  930.216075] rescued libcrypto.so.0.9.8 0x5
[  930.216080] rescued libcrypto.so.0.9.8 0x6
[  930.216085] rescued libcrypto.so.0.9.8 0x7
[  930.216368] rescued libcrypto.so.0.9.8 0x8
[  930.216373] rescued libcrypto.so.0.9.8 0x9
[  930.216378] rescued libcrypto.so.0.9.8 0xa
[  930.216383] rescued libcrypto.so.0.9.8 0xb
[  930.216386] rescued libcrypto.so.0.9.8 0xc
[  930.216391] rescued libcrypto.so.0.9.8 0xd
[  930.216395] rescued libcrypto.so.0.9.8 0xe
[  930.216399] rescued libcrypto.so.0.9.8 0xf
[  930.216404] rescued libcrypto.so.0.9.8 0x10
[  930.216408] rescued libcrypto.so.0.9.8 0x11
[  930.216412] rescued libcrypto.so.0.9.8 0x12
[  930.216415] rescued libcrypto.so.0.9.8 0x13
[  930.216418] rescued libcrypto.so.0.9.8 0x14
[  930.216423] rescued libcrypto.so.0.9.8 0x15
[  930.216426] rescued libcrypto.so.0.9.8 0x16
[  930.216431] rescued libcrypto.so.0.9.8 0x17
[  930.216434] rescued libcrypto.so.0.9.8 0x18
[  930.216439] rescued libcrypto.so.0.9.8 0x19
[  930.216443] rescued libcrypto.so.0.9.8 0x1a
[  930.216447] rescued libcrypto.so.0.9.8 0x1b
[  930.216450] rescued libcrypto.so.0.9.8 0x1c
[  930.216455] rescued libcrypto.so.0.9.8 0x1d
[  930.216460] rescued libcrypto.so.0.9.8 0x1e
[  930.216464] rescued libcrypto.so.0.9.8 0x1f
[  930.216469] rescued libcrypto.so.0.9.8 0x20
[  930.216473] rescued libcrypto.so.0.9.8 0x21
[  930.216476] rescued libcrypto.so.0.9.8 0x22
[  930.216480] rescued libcrypto.so.0.9.8 0x23
[  930.216483] rescued libcrypto.so.0.9.8 0x24
[  930.216487] rescued libcrypto.so.0.9.8 0x25
[  930.216491] rescued libcrypto.so.0.9.8 0x26
[  930.216494] rescued libcrypto.so.0.9.8 0x27
[  930.216503] rescued libcrypto.so.0.9.8 0x28
[  930.216506] rescued libcrypto.so.0.9.8 0x29
[  930.216510] rescued libcrypto.so.0.9.8 0x2a
[  930.216513] rescued libcrypto.so.0.9.8 0x2b
[  930.216516] rescued libcrypto.so.0.9.8 0x2c
[  930.216520] rescued libcrypto.so.0.9.8 0x2d
[  930.216525] rescued libcrypto.so.0.9.8 0x2e
[  930.216530] rescued libcrypto.so.0.9.8 0x2f
[  930.216535] rescued libcrypto.so.0.9.8 0x30
[  930.216538] rescued libcrypto.so.0.9.8 0x31
[  930.216541] rescued libcrypto.so.0.9.8 0x32
[  930.216544] rescued libcrypto.so.0.9.8 0x33
[  930.216548] rescued libcrypto.so.0.9.8 0x34
[  930.216551] rescued libcrypto.so.0.9.8 0x35
[  930.217015] rescued libcrypto.so.0.9.8 0x36
[  930.217019] rescued libcrypto.so.0.9.8 0x37
[  930.217022] rescued libcrypto.so.0.9.8 0x38
[  930.217025] rescued libcrypto.so.0.9.8 0x39
[  930.217029] rescued libcrypto.so.0.9.8 0x3a
[  930.217032] rescued libcrypto.so.0.9.8 0x3b
[  930.217036] rescued libcrypto.so.0.9.8 0x3c
[  930.217039] rescued libcrypto.so.0.9.8 0x3d
[  930.217043] rescued libcrypto.so.0.9.8 0x3e
[  930.217046] rescued libcrypto.so.0.9.8 0x3f
[  930.217049] rescued libcrypto.so.0.9.8 0x40
[  930.217053] rescued libcrypto.so.0.9.8 0x41
[  930.217056] rescued libcrypto.so.0.9.8 0x42
[  930.217060] rescued libcrypto.so.0.9.8 0x43
[  930.217063] rescued libcrypto.so.0.9.8 0x44
[  930.217066] rescued libcrypto.so.0.9.8 0x45
[  930.217070] rescued libcrypto.so.0.9.8 0x46
[  930.217073] rescued libcrypto.so.0.9.8 0x47
[  930.217076] rescued libcrypto.so.0.9.8 0x48
[  930.217080] rescued libcrypto.so.0.9.8 0x49
[  930.217083] rescued libcrypto.so.0.9.8 0x4a
[  930.217086] rescued libcrypto.so.0.9.8 0x4b
[  930.217090] rescued libcrypto.so.0.9.8 0x4c
[  930.217093] rescued libcrypto.so.0.9.8 0x4d
[  930.217096] rescued libcrypto.so.0.9.8 0x4e
[  930.217099] rescued libcrypto.so.0.9.8 0x4f
[  930.217103] rescued libcrypto.so.0.9.8 0x50
[  930.217106] rescued libcrypto.so.0.9.8 0x51
[  930.217109] rescued libcrypto.so.0.9.8 0x52
[  930.217113] rescued libcrypto.so.0.9.8 0x53
[  930.217116] rescued libcrypto.so.0.9.8 0x54
[  930.217119] rescued libcrypto.so.0.9.8 0x55
[  930.217384] rescued libcrypto.so.0.9.8 0x56
[  930.217387] rescued libcrypto.so.0.9.8 0x57
[  930.217391] rescued libcrypto.so.0.9.8 0x58
[  930.217394] rescued libcrypto.so.0.9.8 0x59
[  930.217397] rescued libcrypto.so.0.9.8 0x5a
[  930.217401] rescued libcrypto.so.0.9.8 0x5b
[  930.217404] rescued libcrypto.so.0.9.8 0x5c
[  930.217408] rescued libcrypto.so.0.9.8 0x5d
[  930.217411] rescued libcrypto.so.0.9.8 0x5e
[  930.217414] rescued libcrypto.so.0.9.8 0x5f
[  930.217418] rescued libcrypto.so.0.9.8 0x60
[  930.217421] rescued libcrypto.so.0.9.8 0x61
[  930.217425] rescued libcrypto.so.0.9.8 0x62
[  930.217428] rescued libcrypto.so.0.9.8 0x63
[  930.217431] rescued libcrypto.so.0.9.8 0x64
[  930.217436] rescued libcrypto.so.0.9.8 0x65
[  930.217440] rescued libcrypto.so.0.9.8 0x66
[  930.217443] rescued libcrypto.so.0.9.8 0x67
[  930.217448] rescued libcrypto.so.0.9.8 0x68
[  930.217452] rescued libcrypto.so.0.9.8 0x69
[  930.217456] rescued libcrypto.so.0.9.8 0x6a
[  930.217460] rescued libcrypto.so.0.9.8 0x6b
[  930.217464] rescued libcrypto.so.0.9.8 0x6c
[  930.217468] rescued libcrypto.so.0.9.8 0x6d
[  930.217472] rescued libcrypto.so.0.9.8 0x6e
[  930.217477] rescued libcrypto.so.0.9.8 0x70
[  930.217482] rescued libcrypto.so.0.9.8 0x71
[  930.217487] rescued libcrypto.so.0.9.8 0x73
[  930.217492] rescued libcrypto.so.0.9.8 0x74
[  930.217497] rescued libcrypto.so.0.9.8 0x75
[  930.217501] rescued libcrypto.so.0.9.8 0x77
[  930.217505] rescued libcrypto.so.0.9.8 0x78
[  930.217515] rescued libcrypto.so.0.9.8 0x79
[  930.217519] rescued libcrypto.so.0.9.8 0x12e
[  930.217523] rescued libcrypto.so.0.9.8 0x12f
[  930.217528] rescued libpam.so.0.81.12 0x1
[  930.217533] rescued libpam.so.0.81.12 0x2
[  930.217537] rescued libpam.so.0.81.12 0x8
[  930.217544] rescued sshd 0x1
[  930.217549] rescued sshd 0x2
[  930.217554] rescued sshd 0x3
[  930.217559] rescued sshd 0x4
[  930.217564] rescued sshd 0x5
[  930.217567] rescued sshd 0x6
[  930.217571] rescued sshd 0x7
[  930.217575] rescued sshd 0x8
[  930.865111] rescued libutil-2.9.so 0x0
[  930.865493] rescued sshd 0x9
[  930.865499] rescued sshd 0xa
[  930.865503] rescued sshd 0xb
[  930.865508] rescued sshd 0xc
[  930.865512] rescued sshd 0x12
[  930.865517] rescued sshd 0x48
[  930.865523] rescued sshd 0x4d
[  930.865527] rescued sshd 0x51
[  930.865532] rescued sshd 0x53
[  930.865535] rescued sshd 0x54
[  930.865540] rescued sshd 0x55
[  930.865545] rescued sshd 0x56
[  930.865550] rescued sshd 0x57
[  930.865554] rescued sshd 0x58
[  930.865558] rescued sshd 0x5e
[  930.865562] rescued sshd 0x67
[  930.865571] rescued libnss_files-2.9.so 0x3
[  930.865598] rescued libc-2.9.so 0xc5
[  930.865607] rescued libc-2.9.so 0xe6
[  930.865618] rescued ld-2.9.so 0x19
[  930.865622] rescued rpc.mountd 0x0
[  930.865625] rescued libnss_files-2.9.so 0x6
[  930.865628] rescued libnss_files-2.9.so 0x9
[  930.865631] rescued libc-2.9.so 0xe7
[  930.865635] rescued libc-2.9.so 0xef
[  930.865638] rescued libc-2.9.so 0xf0
[  930.865641] rescued libc-2.9.so 0xf1
[  930.865644] rescued libc-2.9.so 0xf2
[  930.865648] rescued libc-2.9.so 0xf3
[  930.865921] rescued libc-2.9.so 0xf4
[  930.865925] rescued libc-2.9.so 0xf6
[  930.865928] rescued libc-2.9.so 0xf7
[  930.865932] rescued hald 0x0
[  930.865936] rescued hald-runner 0x0
[  930.865942] rescued libdbus-glib-1.so.2.1.0 0x0
[  930.865952] rescued libpcre.so.3.12.1 0x0
[  930.865961] rescued libglib-2.0.so.0.2000.1 0x2
[  930.865967] rescued libglib-2.0.so.0.2000.1 0x4
[  930.865973] rescued libglib-2.0.so.0.2000.1 0x5
[  930.865978] rescued libglib-2.0.so.0.2000.1 0xc
[  930.865984] rescued libglib-2.0.so.0.2000.1 0xd
[  930.865990] rescued libglib-2.0.so.0.2000.1 0x10
[  930.865995] rescued libglib-2.0.so.0.2000.1 0x11
[  930.866001] rescued libglib-2.0.so.0.2000.1 0x12
[  930.866006] rescued libglib-2.0.so.0.2000.1 0x13
[  930.866012] rescued libglib-2.0.so.0.2000.1 0x14
[  930.866018] rescued libglib-2.0.so.0.2000.1 0x6f
[  930.866022] rescued libglib-2.0.so.0.2000.1 0x70
[  930.866027] rescued libglib-2.0.so.0.2000.1 0xb3
[  930.866033] rescued libhal.so.1.0.0 0x0
[  930.866036] rescued libglib-2.0.so.0.2000.1 0x7f
[  930.866042] rescued libglib-2.0.so.0.2000.1 0xe
[  930.866057] rescued libdbus-1.so.3.4.0 0x1
[  930.866063] rescued libdbus-1.so.3.4.0 0x2
[  930.866069] rescued libdbus-1.so.3.4.0 0x3
[  930.866074] rescued libdbus-1.so.3.4.0 0x4
[  930.866080] rescued libdbus-1.so.3.4.0 0x5
[  930.866085] rescued libdbus-1.so.3.4.0 0x6
[  930.866091] rescued libdbus-1.so.3.4.0 0x7
[  930.866097] rescued libdbus-1.so.3.4.0 0x11
[  930.866103] rescued libdbus-1.so.3.4.0 0x24
[  930.866108] rescued libdbus-1.so.3.4.0 0x25
[  930.866114] rescued libdbus-1.so.3.4.0 0x27
[  930.866120] rescued libdbus-1.so.3.4.0 0x28
[  930.866125] rescued libdbus-1.so.3.4.0 0x29
[  930.866131] rescued libdbus-1.so.3.4.0 0x2a
[  931.125223] rescued libpam.so.0.81.12 0x0
[  931.129347] rescued libkeyutils-1.2.so 0x1
[  931.132618] rescued libdbus-1.so.3.4.0 0x2c
[  931.132624] rescued libdbus-1.so.3.4.0 0x2d
[  931.132630] rescued libdbus-1.so.3.4.0 0x32
[  931.132636] rescued libglib-2.0.so.0.2000.1 0x42
[  931.132644] rescued libgobject-2.0.so.0.2000.1 0x1
[  931.132650] rescued libgobject-2.0.so.0.2000.1 0x2
[  931.132653] rescued hald 0x41
[  931.132655] rescued hald 0x42
[  931.132661] rescued libc-2.9.so 0x66
[  931.132665] rescued hald-addon-input 0x0
[  931.132671] rescued libdbus-1.so.3.4.0 0x8
[  931.132676] rescued libdbus-1.so.3.4.0 0x9
[  931.132682] rescued libdbus-1.so.3.4.0 0xa
[  931.132687] rescued libdbus-1.so.3.4.0 0xb
[  931.132693] rescued libdbus-1.so.3.4.0 0xc
[  931.132698] rescued libdbus-1.so.3.4.0 0xd
[  931.132704] rescued libdbus-1.so.3.4.0 0xe
[  931.132709] rescued libdbus-1.so.3.4.0 0xf
[  931.132715] rescued libdbus-1.so.3.4.0 0x10
[  931.132720] rescued libdbus-1.so.3.4.0 0x12
[  931.132726] rescued libdbus-1.so.3.4.0 0x13
[  931.132731] rescued libdbus-1.so.3.4.0 0x14
[  931.132737] rescued libdbus-1.so.3.4.0 0x15
[  931.132742] rescued libdbus-1.so.3.4.0 0x16
[  931.132748] rescued libdbus-1.so.3.4.0 0x17
[  931.132753] rescued libdbus-1.so.3.4.0 0x18
[  931.132759] rescued libdbus-1.so.3.4.0 0x19
[  931.132764] rescued libdbus-1.so.3.4.0 0x1a
[  931.132770] rescued libdbus-1.so.3.4.0 0x1b
[  931.132775] rescued libdbus-1.so.3.4.0 0x1c
[  931.132781] rescued libdbus-1.so.3.4.0 0x1d
[  931.132786] rescued libdbus-1.so.3.4.0 0x20
[  931.133058] rescued libdbus-1.so.3.4.0 0x21
[  931.133064] rescued libdbus-1.so.3.4.0 0x22
[  931.133070] rescued libdbus-1.so.3.4.0 0x23
[  931.133075] rescued libdbus-1.so.3.4.0 0x26
[  931.133081] rescued libdbus-1.so.3.4.0 0x2b
[  931.133086] rescued libdbus-1.so.3.4.0 0x2e
[  931.133092] rescued libdbus-1.so.3.4.0 0x2f
[  931.133097] rescued libdbus-1.so.3.4.0 0x31
[  931.133102] rescued libhal.so.1.0.0 0x1
[  931.133106] rescued libhal.so.1.0.0 0x2
[  931.133111] rescued libhal.so.1.0.0 0x3
[  931.133114] rescued libhal.so.1.0.0 0x4
[  931.133119] rescued libhal.so.1.0.0 0xb
[  931.133123] rescued libhal.so.1.0.0 0xc
[  931.133127] rescued libhal.so.1.0.0 0xd
[  931.133133] rescued libpcre.so.3.12.1 0x1
[  931.133138] rescued libpcre.so.3.12.1 0x1d
[  931.133144] rescued libglib-2.0.so.0.2000.1 0x3
[  931.133150] rescued libglib-2.0.so.0.2000.1 0x6
[  931.133155] rescued libglib-2.0.so.0.2000.1 0x7
[  931.133161] rescued libglib-2.0.so.0.2000.1 0x9
[  931.133167] rescued libglib-2.0.so.0.2000.1 0xa
[  931.133172] rescued libglib-2.0.so.0.2000.1 0xb
[  931.133178] rescued libglib-2.0.so.0.2000.1 0xf
[  931.133183] rescued libglib-2.0.so.0.2000.1 0x15
[  931.133189] rescued libglib-2.0.so.0.2000.1 0x71
[  931.133193] rescued hald-addon-cpufreq 0x0
[  931.133198] rescued libglib-2.0.so.0.2000.1 0x8
[  931.133204] rescued libglib-2.0.so.0.2000.1 0x3a
[  931.133210] rescued libglib-2.0.so.0.2000.1 0x56
[  931.133215] rescued libglib-2.0.so.0.2000.1 0x57
[  931.133221] rescued libglib-2.0.so.0.2000.1 0x58
[  931.133234] rescued libglib-2.0.so.0.2000.1 0x59
[  931.133239] rescued libglib-2.0.so.0.2000.1 0x6e
[  931.133245] rescued libglib-2.0.so.0.2000.1 0x7a
[  931.133251] rescued libglib-2.0.so.0.2000.1 0x7e
[  931.133254] rescued libhal.so.1.0.0 0xa
[  931.133260] rescued libglib-2.0.so.0.2000.1 0x5a
[  931.133266] rescued libglib-2.0.so.0.2000.1 0x41
[  931.133271] rescued libglib-2.0.so.0.2000.1 0x5b
[  931.133727] rescued hald 0x34
[  931.133732] rescued libglib-2.0.so.0.2000.1 0x5c
[  931.133737] rescued libglib-2.0.so.0.2000.1 0x5d
[  931.133742] rescued libglib-2.0.so.0.2000.1 0x6c
[  931.133746] rescued pulseaudio 0x0
[  931.133750] rescued libogg.so.0.5.3 0x0
[  931.133755] rescued libFLAC.so.8.2.0 0x0
[  931.133759] rescued libsndfile.so.1.0.17 0x0
[  931.133764] rescued libsamplerate.so.0.1.4 0x0
[  931.133768] rescued libltdl.so.3.1.6 0x0
[  931.133773] rescued libcap.so.1.10 0x0
[  931.134047] rescued gconf-helper 0x0
[  931.134051] rescued libpulsecore.so.5.0.1 0x12
[  931.134053] rescued libpulsecore.so.5.0.1 0x13
[  931.134057] rescued libpulsecore.so.5.0.1 0x14
[  931.134063] rescued libpthread-2.9.so 0xf
[  931.134067] rescued gconfd-2 0x0
[  931.134072] rescued libgmodule-2.0.so.0.2000.1 0x0
[  931.134076] rescued libgthread-2.0.so.0.2000.1 0x0
[  931.134080] rescued pulseaudio 0x1
[  931.134082] rescued pulseaudio 0x2
[  931.134085] rescued pulseaudio 0x3
[  931.134087] rescued pulseaudio 0x7
[  931.134090] rescued pulseaudio 0x8
[  931.134092] rescued pulseaudio 0x9
[  931.134095] rescued pulseaudio 0xc
[  931.134099] rescued liboil-0.3.so.0.3.0 0x1
[  931.134102] rescued liboil-0.3.so.0.3.0 0x2
[  931.134105] rescued liboil-0.3.so.0.3.0 0x3
[  931.134107] rescued liboil-0.3.so.0.3.0 0x4
[  931.134110] rescued liboil-0.3.so.0.3.0 0x5
[  931.134113] rescued liboil-0.3.so.0.3.0 0x6
[  931.134115] rescued liboil-0.3.so.0.3.0 0x7
[  931.134118] rescued liboil-0.3.so.0.3.0 0x8
[  931.134121] rescued liboil-0.3.so.0.3.0 0x9
[  931.134131] rescued liboil-0.3.so.0.3.0 0xa
[  931.134134] rescued liboil-0.3.so.0.3.0 0xb
[  931.134137] rescued liboil-0.3.so.0.3.0 0xc
[  931.134139] rescued liboil-0.3.so.0.3.0 0xd
[  931.134142] rescued liboil-0.3.so.0.3.0 0xe
[  931.134145] rescued liboil-0.3.so.0.3.0 0xf
[  931.134147] rescued liboil-0.3.so.0.3.0 0x10
[  931.134150] rescued liboil-0.3.so.0.3.0 0x11
[  931.134153] rescued liboil-0.3.so.0.3.0 0x12
[  931.134156] rescued liboil-0.3.so.0.3.0 0x13
[  931.134158] rescued liboil-0.3.so.0.3.0 0x14
[  931.134161] rescued liboil-0.3.so.0.3.0 0x15
[  931.134164] rescued liboil-0.3.so.0.3.0 0x16
[  931.134166] rescued liboil-0.3.so.0.3.0 0x17
[  931.644692] rescued libkrb5support.so.0.1 0x1
[  931.645060] rescued liboil-0.3.so.0.3.0 0x18
[  931.645063] rescued liboil-0.3.so.0.3.0 0x19
[  931.645066] rescued liboil-0.3.so.0.3.0 0x1a
[  931.645069] rescued liboil-0.3.so.0.3.0 0x1b
[  931.645072] rescued liboil-0.3.so.0.3.0 0x1c
[  931.645075] rescued liboil-0.3.so.0.3.0 0x1d
[  931.645078] rescued liboil-0.3.so.0.3.0 0x1e
[  931.645080] rescued liboil-0.3.so.0.3.0 0x1f
[  931.645083] rescued liboil-0.3.so.0.3.0 0x20
[  931.645086] rescued liboil-0.3.so.0.3.0 0x21
[  931.645089] rescued liboil-0.3.so.0.3.0 0x22
[  931.645091] rescued liboil-0.3.so.0.3.0 0x23
[  931.645094] rescued liboil-0.3.so.0.3.0 0x24
[  931.645097] rescued liboil-0.3.so.0.3.0 0x25
[  931.645100] rescued liboil-0.3.so.0.3.0 0x26
[  931.645102] rescued liboil-0.3.so.0.3.0 0x27
[  931.645105] rescued liboil-0.3.so.0.3.0 0x28
[  931.645108] rescued liboil-0.3.so.0.3.0 0x29
[  931.645111] rescued liboil-0.3.so.0.3.0 0x2a
[  931.645113] rescued liboil-0.3.so.0.3.0 0x2b
[  931.645116] rescued liboil-0.3.so.0.3.0 0x2c
[  931.645119] rescued liboil-0.3.so.0.3.0 0x2d
[  931.645123] rescued liboil-0.3.so.0.3.0 0x5a
[  931.645126] rescued libogg.so.0.5.3 0x1
[  931.645129] rescued libogg.so.0.5.3 0x3
[  931.645133] rescued libFLAC.so.8.2.0 0x1
[  931.645135] rescued libFLAC.so.8.2.0 0x2
[  931.645138] rescued libFLAC.so.8.2.0 0x3
[  931.645141] rescued libFLAC.so.8.2.0 0x4
[  931.645143] rescued libFLAC.so.8.2.0 0x5
[  931.645146] rescued libFLAC.so.8.2.0 0x6
[  931.645149] rescued libFLAC.so.8.2.0 0x7
[  931.645421] rescued libFLAC.so.8.2.0 0x8
[  931.645424] rescued libFLAC.so.8.2.0 0x9
[  931.645426] rescued libFLAC.so.8.2.0 0xa
[  931.645429] rescued libFLAC.so.8.2.0 0xb
[  931.645432] rescued libFLAC.so.8.2.0 0xc
[  931.645435] rescued libFLAC.so.8.2.0 0x43
[  931.645439] rescued libsndfile.so.1.0.17 0x1
[  931.645441] rescued libsndfile.so.1.0.17 0x2
[  931.645444] rescued libsndfile.so.1.0.17 0x3
[  931.645447] rescued libsndfile.so.1.0.17 0x4
[  931.645450] rescued libsndfile.so.1.0.17 0x3e
[  931.645453] rescued libsamplerate.so.0.1.4 0x2
[  931.645457] rescued libltdl.so.3.1.6 0x1
[  931.645460] rescued libltdl.so.3.1.6 0x5
[  931.645464] rescued libpulsecore.so.5.0.1 0x1
[  931.645468] rescued libpulsecore.so.5.0.1 0x2
[  931.645471] rescued libpulsecore.so.5.0.1 0x3
[  931.645474] rescued libpulsecore.so.5.0.1 0x4
[  931.645478] rescued libpulsecore.so.5.0.1 0x5
[  931.645481] rescued libpulsecore.so.5.0.1 0x6
[  931.645484] rescued libpulsecore.so.5.0.1 0x7
[  931.645487] rescued libpulsecore.so.5.0.1 0x8
[  931.645490] rescued libpulsecore.so.5.0.1 0x9
[  931.645494] rescued libpulsecore.so.5.0.1 0xa
[  931.645497] rescued libpulsecore.so.5.0.1 0xb
[  931.645500] rescued libpulsecore.so.5.0.1 0xc
[  931.645502] rescued libpulsecore.so.5.0.1 0xd
[  931.645505] rescued libpulsecore.so.5.0.1 0xe
[  931.645508] rescued libpulsecore.so.5.0.1 0xf
[  931.645511] rescued libpulsecore.so.5.0.1 0x10
[  931.645514] rescued libpulsecore.so.5.0.1 0x11
[  931.645517] rescued libpulsecore.so.5.0.1 0x1a
[  931.645527] rescued libpulsecore.so.5.0.1 0x29
[  931.645530] rescued libpulsecore.so.5.0.1 0x5f
[  931.645534] rescued libcap.so.1.10 0x2
[  931.645548] rescued libc-2.9.so 0xb9
[  931.645554] rescued libc-2.9.so 0xba
[  931.645560] rescued libc-2.9.so 0xbb
[  931.645568] rescued libc-2.9.so 0xec
[  931.645574] rescued libc-2.9.so 0xee
[  931.645580] rescued libc-2.9.so 0xeb
[  931.645591] rescued getty 0x0
[  931.645596] rescued sshd 0xd
[  931.645601] rescued sshd 0xe
[  931.967139] rescued libkrb5support.so.0.1 0x5
[  931.971614] rescued libc-2.9.so 0xbc
[  931.972606] rescued sshd 0x40
[  931.972611] rescued sshd 0x41
[  931.972615] rescued sshd 0x63
[  931.972624] rescued libcrypto.so.0.9.8 0x6f
[  931.972629] rescued libcrypto.so.0.9.8 0x72
[  931.972634] rescued libcrypto.so.0.9.8 0x76
[  931.972639] rescued libcrypto.so.0.9.8 0x7f
[  931.972644] rescued libcrypto.so.0.9.8 0x80
[  931.972649] rescued libcrypto.so.0.9.8 0x82
[  931.972654] rescued libcrypto.so.0.9.8 0x83
[  931.972659] rescued libcrypto.so.0.9.8 0x84
[  931.972924] rescued libcrypto.so.0.9.8 0x85
[  931.972928] rescued libcrypto.so.0.9.8 0x9e
[  931.972931] rescued libcrypto.so.0.9.8 0x9f
[  931.972934] rescued libcrypto.so.0.9.8 0xa0
[  931.972939] rescued libcrypto.so.0.9.8 0xa1
[  931.972943] rescued libcrypto.so.0.9.8 0xa2
[  931.972947] rescued libcrypto.so.0.9.8 0xa3
[  931.972950] rescued libcrypto.so.0.9.8 0xa4
[  931.972954] rescued libcrypto.so.0.9.8 0xa6
[  931.972958] rescued libcrypto.so.0.9.8 0xa7
[  931.972962] rescued libcrypto.so.0.9.8 0xa8
[  931.972965] rescued libcrypto.so.0.9.8 0xa9
[  931.972969] rescued libcrypto.so.0.9.8 0xaa
[  931.972973] rescued libcrypto.so.0.9.8 0xab
[  931.972976] rescued libcrypto.so.0.9.8 0xac
[  931.972981] rescued libcrypto.so.0.9.8 0xbc
[  931.972986] rescued libcrypto.so.0.9.8 0xbf
[  931.972990] rescued libcrypto.so.0.9.8 0xc3
[  931.972995] rescued libcrypto.so.0.9.8 0xc9
[  931.972999] rescued libcrypto.so.0.9.8 0xca
[  931.973004] rescued libcrypto.so.0.9.8 0xcb
[  931.973009] rescued libcrypto.so.0.9.8 0xd6
[  931.973013] rescued libcrypto.so.0.9.8 0xd7
[  931.973018] rescued libcrypto.so.0.9.8 0xd8
[  931.973023] rescued libcrypto.so.0.9.8 0xd9
[  931.973028] rescued libcrypto.so.0.9.8 0xdd
[  931.973033] rescued libcrypto.so.0.9.8 0xde
[  931.973037] rescued libcrypto.so.0.9.8 0xdf
[  931.973042] rescued libcrypto.so.0.9.8 0xe0
[  931.973047] rescued libcrypto.so.0.9.8 0xe3
[  931.973052] rescued libcrypto.so.0.9.8 0x133
[  931.973056] rescued libcrypto.so.0.9.8 0x13c
[  931.973068] rescued libcrypto.so.0.9.8 0x147
[  931.973072] rescued libcrypto.so.0.9.8 0x148
[  931.973077] rescued sshd 0xf
[  931.973080] rescued sshd 0x11
[  931.973084] rescued sshd 0x17
[  931.973088] rescued sshd 0x2f
[  931.973093] rescued sshd 0x34
[  931.973097] rescued sshd 0x35
[  931.973102] rescued sshd 0x36
[  931.973105] rescued sshd 0x3f
[  931.973110] rescued sshd 0x49
[  931.973113] rescued sshd 0x4a
[  931.973118] rescued sshd 0x5c
[  931.973121] rescued sshd 0x60
[  931.973570] rescued sshd 0x61
[  931.973575] rescued sshd 0x65
[  931.973581] rescued pam_env.so 0x0
[  931.973586] rescued pam_unix.so 0x0
[  931.973591] rescued pam_nologin.so 0x0
[  931.973597] rescued pam_motd.so 0x0
[  931.973601] rescued pam_mail.so 0x0
[  931.973606] rescued pam_limits.so 0x0
[  931.973612] rescued libcrypto.so.0.9.8 0x9a
[  931.973615] rescued libcrypto.so.0.9.8 0xc2
[  931.973618] rescued libcrypto.so.0.9.8 0xc5
[  931.973622] rescued libcrypto.so.0.9.8 0xc6
[  931.973625] rescued libcrypto.so.0.9.8 0xc8
[  931.973628] rescued libcrypto.so.0.9.8 0xcc
[  931.973631] rescued libcrypto.so.0.9.8 0xcd
[  931.973635] rescued libcrypto.so.0.9.8 0xce
[  931.973638] rescued libcrypto.so.0.9.8 0xda
[  931.973641] rescued libcrypto.so.0.9.8 0xdb
[  931.973644] rescued libcrypto.so.0.9.8 0xdc
[  931.973648] rescued libcrypto.so.0.9.8 0xe1
[  931.973651] rescued libcrypto.so.0.9.8 0xe5
[  931.973654] rescued libcrypto.so.0.9.8 0xe6
[  931.973657] rescued libcrypto.so.0.9.8 0xee
[  931.973660] rescued libcrypto.so.0.9.8 0xef
[  931.973664] rescued libcrypto.so.0.9.8 0xf3
[  931.973667] rescued libcrypto.so.0.9.8 0xf5
[  931.973670] rescued libcrypto.so.0.9.8 0xf6
[  931.973673] rescued libcrypto.so.0.9.8 0xf7
[  931.973677] rescued libcrypto.so.0.9.8 0xfb
[  931.973680] rescued libcrypto.so.0.9.8 0xfe
[  931.973683] rescued libcrypto.so.0.9.8 0xff
[  931.973687] rescued libcrypto.so.0.9.8 0x100
[  931.973948] rescued libcrypto.so.0.9.8 0x102
[  931.973952] rescued libcrypto.so.0.9.8 0x121
[  931.973955] rescued libcrypto.so.0.9.8 0x130
[  931.973959] rescued libcrypto.so.0.9.8 0x131
[  931.973962] rescued libcrypto.so.0.9.8 0x132
[  931.973967] rescued libcrypto.so.0.9.8 0x137
[  931.973970] rescued libcrypto.so.0.9.8 0x149
[  931.973974] rescued libcrypto.so.0.9.8 0x14a
[  931.973985] rescued ld-2.9.so 0x12
[  931.973988] rescued sshd 0x13
[  931.973993] rescued sshd 0x14
[  931.973996] rescued sshd 0x18
[  931.974000] rescued sshd 0x19
[  931.974004] rescued sshd 0x1b
[  931.974008] rescued sshd 0x20
[  931.974012] rescued sshd 0x26
[  931.974016] rescued sshd 0x27
[  931.974020] rescued sshd 0x33
[  931.974024] rescued sshd 0x3e
[  931.974028] rescued sshd 0x43
[  931.974032] rescued sshd 0x45
[  931.974036] rescued sshd 0x4c
[  931.974041] rescued sshd 0x4e
[  931.974045] rescued sshd 0x4f
[  931.974048] rescued sshd 0x50
[  931.974052] rescued sshd 0x59
[  931.974056] rescued sshd 0x5a
[  931.974059] rescued sshd 0x5d
[  931.974064] rescued sshd 0x5f
[  931.974069] rescued sshd 0x66
[  931.974080] rescued libcrypto.so.0.9.8 0x88
[  931.974084] rescued libcrypto.so.0.9.8 0x89
[  931.974087] rescued libcrypto.so.0.9.8 0x97
[  931.974091] rescued libcrypto.so.0.9.8 0x98
[  932.448253] rescued libc-2.9.so 0x117
[  932.448616] rescued libcrypto.so.0.9.8 0x99
[  932.448620] rescued libcrypto.so.0.9.8 0x135
[  932.448623] rescued libcrypto.so.0.9.8 0x136
[  932.448627] rescued sshd 0x1e
[  932.448630] rescued sshd 0x28
[  932.448634] rescued sshd 0x44
[  932.448637] rescued sshd 0x4b
[  932.448642] rescued sshd 0x5b
[  932.448651] rescued libc-2.9.so 0x20
[  932.448657] rescued libc-2.9.so 0x21
[  932.448663] rescued libc-2.9.so 0x22
[  932.448670] rescued libc-2.9.so 0x27
[  932.448677] rescued libc-2.9.so 0x30
[  932.448690] rescued libc-2.9.so 0x11c
[  932.448711] rescued zsh4 0x0
[  932.448715] rescued libpam.so.0.81.12 0x3
[  932.448719] rescued sshd 0x2d
[  932.448723] rescued sshd 0x46
[  932.448995] rescued libcap.so.2.11 0x0
[  932.449032] rescued zsh4 0x2
[  932.449035] rescued zsh4 0x4
[  932.449039] rescued zsh4 0xa
[  932.449042] rescued zsh4 0xd
[  932.449045] rescued zsh4 0xe
[  932.449048] rescued zsh4 0xf
[  932.449058] rescued zsh4 0x25
[  932.449062] rescued zsh4 0x27
[  932.449065] rescued zsh4 0x28
[  932.449068] rescued zsh4 0x29
[  932.449071] rescued zsh4 0x2a
[  932.449074] rescued zsh4 0x2b
[  932.449077] rescued zsh4 0x2c
[  932.449080] rescued zsh4 0x2d
[  932.449084] rescued zsh4 0x2f
[  932.449087] rescued zsh4 0x35
[  932.449090] rescued zsh4 0x41
[  932.449093] rescued zsh4 0x43
[  932.449096] rescued zsh4 0x44
[  932.449099] rescued zsh4 0x4c
[  932.580004] rescued libcom_err.so.2.1 0x1
[  932.584145] rescued libk5crypto.so.3.1 0x1
[  932.584561] rescued zsh4 0x50
[  932.584565] rescued zsh4 0x51
[  932.584568] rescued zsh4 0x52
[  932.584571] rescued zsh4 0x5b
[  932.584575] rescued zsh4 0x5d
[  932.584578] rescued zsh4 0x60
[  932.584581] rescued zsh4 0x61
[  932.584584] rescued zsh4 0x64
[  932.584587] rescued zsh4 0x65
[  932.584590] rescued zsh4 0x66
[  932.584593] rescued zsh4 0x70
[  932.584596] rescued zsh4 0x74
[  932.584599] rescued zsh4 0x75
[  932.584602] rescued zsh4 0x76
[  932.584605] rescued zsh4 0x77
[  932.584608] rescued zsh4 0x81
[  932.584611] rescued zsh4 0x82
[  932.584614] rescued zsh4 0x83
[  932.584617] rescued zsh4 0x86
[  932.584620] rescued zsh4 0x8d
[  932.584623] rescued zsh4 0x91
[  932.584627] rescued zsh4 0x92
[  932.584632] rescued libncursesw.so.5.7 0x1
[  932.584644] rescued libc-2.9.so 0x8d
[  932.584647] rescued zsh4 0x3
[  932.584650] rescued zsh4 0x88
[  932.585372] rescued zsh4 0x33
[  932.585376] rescued zsh4 0x78
[  932.585382] rescued zsh4 0xc
[  932.585386] rescued terminfo.so 0x0
[  932.585389] rescued zsh4 0x26
[  932.585393] rescued zsh4 0x31
[  932.585396] rescued zsh4 0x4b
[  932.585399] rescued zsh4 0x79
[  932.585402] rescued zsh4 0x7e
[  932.585405] rescued zsh4 0x85
[  932.585408] rescued zsh4 0x8c
[  932.585412] rescued zsh4 0x23
[  932.585415] rescued zsh4 0x46
[  932.585418] rescued zsh4 0x14
[  932.585422] rescued zsh4 0x15
[  932.585425] rescued zsh4 0x16
[  932.585428] rescued zsh4 0x17
[  932.585431] rescued zsh4 0x5e
[  932.585434] rescued zsh4 0x7a
[  932.585698] rescued zsh4 0x7b
[  932.585708] rescued libc-2.9.so 0x8b
[  932.585717] rescued libc-2.9.so 0x120
[  932.585723] rescued libpthread-2.9.so 0x7
[  932.585736] rescued zsh4 0x37
[  932.585740] rescued zsh4 0x5c
[  932.585743] rescued zsh4 0x7c
[  932.585747] rescued libc-2.9.so 0x3f
[  932.754456] rescued libk5crypto.so.3.1 0x2
[  932.754832] rescued libc-2.9.so 0x23
[  932.754848] rescued libc-2.9.so 0x101
[  932.755122] rescued libc-2.9.so 0x102
[  932.755135] rescued libc-2.9.so 0xe5
[  932.755146] rescued libc-2.9.so 0xa1
[  932.755154] rescued udevd 0x8
[  932.755157] rescued udevd 0x9
[  932.755160] rescued udevd 0x13
[  932.755212] rescued libwrap.so.0.7.6 0x7
[  932.755215] rescued libc-2.9.so 0xed
[  932.794167] rescued libk5crypto.so.3.1 0x3
[  932.798372] rescued libk5crypto.so.3.1 0x4
[  932.802589] rescued libk5crypto.so.3.1 0x5
[  932.806787] rescued libk5crypto.so.3.1 0x19
[  932.811122] rescued libkrb5.so.3.3 0x1
[  932.815223] rescued libkrb5.so.3.3 0x2
[  932.819126] rescued libkrb5.so.3.3 0x3
[  932.819740] rescued sshd 0x64
[  932.819746] rescued libpam.so.0.81.12 0x4
[  932.819752] rescued libpam.so.0.81.12 0x5
[  932.819763] rescued libpam.so.0.81.12 0x6
[  932.819766] rescued libpam.so.0.81.12 0x7
[  932.819771] rescued libpam.so.0.81.12 0x9
[  932.819776] rescued sshd 0x10
[  932.819780] rescued sshd 0x15
[  932.819783] rescued sshd 0x16
[  932.819787] rescued sshd 0x1a
[  932.819790] rescued sshd 0x1c
[  932.819793] rescued sshd 0x1d
[  932.819796] rescued sshd 0x21
[  932.819800] rescued sshd 0x22
[  932.819804] rescued sshd 0x23
[  932.819807] rescued sshd 0x24
[  932.819810] rescued sshd 0x25
[  932.880415] rescued libkrb5.so.3.3 0x4
[  932.884264] rescued libkrb5.so.3.3 0x5
[  932.888127] rescued libkrb5.so.3.3 0x6
[  932.892060] rescued libkrb5.so.3.3 0x7
[  932.892504] rescued sshd 0x2c
[  932.892510] rescued sshd 0x2e
[  932.892514] rescued sshd 0x37
[  932.892517] rescued sshd 0x38
[  932.892521] rescued sshd 0x39
[  932.892524] rescued sshd 0x3a
[  932.892528] rescued sshd 0x3b
[  932.892531] rescued sshd 0x3c
[  932.892534] rescued sshd 0x3d
[  932.892538] rescued sshd 0x42
[  932.892541] rescued sshd 0x47
[  932.892546] rescued libcrypto.so.0.9.8 0xa5
[  932.892550] rescued libcrypto.so.0.9.8 0xbd
[  932.892553] rescued libcrypto.so.0.9.8 0xbe
[  932.892557] rescued libcrypto.so.0.9.8 0xc0
[  932.892560] rescued libcrypto.so.0.9.8 0xed
[  932.892564] rescued libcrypto.so.0.9.8 0xf4
[  932.892850] rescued hald 0x30
[  932.892853] rescued hald 0x31
[  932.892857] rescued hald 0x35
[  932.892859] rescued hald 0x36
[  932.892871] rescued hald 0x43
[  932.892875] rescued hald 0x45
[  932.892877] rescued hald 0x46
[  932.892904] rescued libdbus-glib-1.so.2.1.0 0x8
[  932.892909] rescued libdbus-glib-1.so.2.1.0 0x9
[  932.892920] rescued libdbus-glib-1.so.2.1.0 0xa
[  932.892938] rescued libgobject-2.0.so.0.2000.1 0x27
[  932.892949] rescued libgobject-2.0.so.0.2000.1 0x2b
[  932.893495] rescued libgobject-2.0.so.0.2000.1 0x9
[  932.893510] rescued libgobject-2.0.so.0.2000.1 0xe
[  932.893514] rescued libgobject-2.0.so.0.2000.1 0xf
[  932.893837] rescued libglib-2.0.so.0.2000.1 0x16
[  932.893848] rescued libglib-2.0.so.0.2000.1 0x24
[  932.893861] rescued libglib-2.0.so.0.2000.1 0x2b
[  932.893866] rescued libglib-2.0.so.0.2000.1 0x2c
[  932.893872] rescued libglib-2.0.so.0.2000.1 0x2e
[  932.893884] rescued libglib-2.0.so.0.2000.1 0x37
[  932.893889] rescued libglib-2.0.so.0.2000.1 0x38
[  932.893901] rescued libglib-2.0.so.0.2000.1 0x39
[  932.893907] rescued libglib-2.0.so.0.2000.1 0x3b
[  932.893912] rescued hald 0x6
[  932.893915] rescued hald 0x7
[  932.893918] rescued hald 0x8
[  932.893920] rescued hald 0x9
[  932.893924] rescued hald 0xb
[  932.893926] rescued hald 0xc
[  932.893929] rescued hald 0xd
[  933.077526] rescued hald 0x10
[  933.077579] rescued libglib-2.0.so.0.2000.1 0x62
[  933.077625] rescued libglib-2.0.so.0.2000.1 0x3c
[  933.077631] rescued libglib-2.0.so.0.2000.1 0x3d
[  933.077641] rescued libglib-2.0.so.0.2000.1 0x44
[  933.077645] rescued libglib-2.0.so.0.2000.1 0x45
[  933.077652] rescued libglib-2.0.so.0.2000.1 0x4a
[  933.109022] rescued hald 0x11
[  933.112194] rescued hald 0x12
[  933.115274] rescued hald 0x13
[  933.118380] rescued hald 0x18
[  933.121476] rescued hald 0x23
[  933.124563] rescued hald 0x24
[  933.128902] rescued libpulsecore.so.5.0.1 0x61
[  933.133460] rescued libpulsecore.so.5.0.1 0x64
[  933.136587] rescued libpulsecore.so.5.0.1 0x34
[  933.137423] rescued libpulsecore.so.5.0.1 0x55
[  933.137426] rescued libpulsecore.so.5.0.1 0x56
[  933.151551] rescued libpulsecore.so.5.0.1 0x68
[  933.152274] rescued libgconf-2.so.4.1.5 0x10
[  933.152278] rescued libgconf-2.so.4.1.5 0x11
[  933.152283] rescued libgconf-2.so.4.1.5 0x14
[  933.152289] rescued libgconf-2.so.4.1.5 0x18
[  933.152293] rescued libgconf-2.so.4.1.5 0x19
[  933.178388] rescued libpulsecore.so.5.0.1 0x15
[  933.182934] rescued libpulsecore.so.5.0.1 0x16
[  933.184590] rescued libORBit-2.so.0.1.0 0x53
[  933.184930] rescued libORBit-2.so.0.1.0 0x27
[  933.184934] rescued libORBit-2.so.0.1.0 0x28
[  933.184938] rescued libORBit-2.so.0.1.0 0x29
[  933.184949] rescued libORBit-2.so.0.1.0 0x2f
[  933.184957] rescued libORBit-2.so.0.1.0 0x33
[  933.184961] rescued libORBit-2.so.0.1.0 0x34
[  933.184964] rescued libORBit-2.so.0.1.0 0x35
[  933.184975] rescued libgthread-2.0.so.0.2000.1 0x1
[  933.184979] rescued libgthread-2.0.so.0.2000.1 0x2
[  933.185004] rescued libORBit-2.so.0.1.0 0x49
[  933.185007] rescued libORBit-2.so.0.1.0 0x4a
[  933.185011] rescued libORBit-2.so.0.1.0 0x4b
[  933.185015] rescued libORBit-2.so.0.1.0 0x4d
[  933.185471] rescued gconfd-2 0x4
[  933.185475] rescued gconfd-2 0x6
[  933.185479] rescued gconfd-2 0x8
[  933.185482] rescued gconfd-2 0x9
[  933.185484] rescued gconfd-2 0xa
[  933.185494] rescued libgconfbackend-xml.so 0x4
[  933.185810] rescued pam_env.so 0x1
[  933.185815] rescued pam_env.so 0x2
[  933.185820] rescued pam_unix.so 0x1
[  933.185825] rescued pam_unix.so 0x2
[  933.185830] rescued pam_unix.so 0x3
[  933.185833] rescued pam_unix.so 0x4
[  933.185837] rescued pam_unix.so 0x5
[  933.185840] rescued pam_unix.so 0x6
[  933.185843] rescued pam_unix.so 0x7
[  933.185848] rescued pam_unix.so 0xa
[  933.185851] rescued pam_unix.so 0xb
[  933.185856] rescued pam_mail.so 0x1
[  933.185860] rescued pam_limits.so 0x1
[  933.185864] rescued pam_limits.so 0x2
[  933.185876] rescued zsh4 0x87
[  933.185879] rescued zsh4 0x89
[  933.185882] rescued zsh4 0x8a
[  933.185885] rescued zsh4 0x8b
[  933.185888] rescued zsh4 0x8e
[  933.185891] rescued zsh4 0x8f
[  933.185895] rescued zsh4 0x90
[  933.185898] rescued zsh4 0x93
[  933.185901] rescued zsh4 0x94
[  933.185904] rescued zsh4 0x95
[  933.185914] rescued zsh4 0x5
[  933.185917] rescued zsh4 0x6
[  933.185920] rescued zsh4 0x7
[  933.185923] rescued zsh4 0x8
[  933.185926] rescued zsh4 0x9
[  933.185929] rescued zsh4 0xb
[  933.185932] rescued zsh4 0x10
[  933.185935] rescued zsh4 0x11
[  933.185938] rescued zsh4 0x12
[  933.185941] rescued zsh4 0x13
[  933.185944] rescued zsh4 0x18
[  933.185947] rescued zsh4 0x1a
[  933.185950] rescued zsh4 0x1b
[  933.185954] rescued zsh4 0x1c
[  933.392565] rescued libpulsecore.so.5.0.1 0x18
[  933.392918] rescued zsh4 0x1d
[  933.392922] rescued zsh4 0x1e
[  933.392927] rescued zsh4 0x22
[  933.392930] rescued zsh4 0x24
[  933.392933] rescued zsh4 0x2e
[  933.392936] rescued zsh4 0x30
[  933.392939] rescued zsh4 0x32
[  933.392942] rescued zsh4 0x34
[  933.392945] rescued zsh4 0x36
[  933.392948] rescued zsh4 0x38
[  933.392951] rescued zsh4 0x39
[  933.392954] rescued zsh4 0x3a
[  933.392957] rescued zsh4 0x3b
[  933.392961] rescued libcap.so.2.11 0x1
[  933.392965] rescued libcap.so.2.11 0x2
[  933.392970] rescued libncursesw.so.5.7 0x2
[  933.392973] rescued libncursesw.so.5.7 0x3
[  933.392976] rescued libncursesw.so.5.7 0x32
[  933.392980] rescued libncursesw.so.5.7 0x33
[  933.392983] rescued libncursesw.so.5.7 0x34
[  933.392986] rescued libncursesw.so.5.7 0x35
[  933.392990] rescued libncursesw.so.5.7 0x36
[  933.392993] rescued libncursesw.so.5.7 0x37
[  933.392996] rescued libncursesw.so.5.7 0x38
[  933.392999] rescued libncursesw.so.5.7 0x39
[  933.393002] rescued libncursesw.so.5.7 0x3a
[  933.393006] rescued libncursesw.so.5.7 0x3b
[  933.393009] rescued libncursesw.so.5.7 0x3c
[  933.393012] rescued libncursesw.so.5.7 0x3d
[  933.393015] rescued libncursesw.so.5.7 0x3e
[  933.393286] rescued libncursesw.so.5.7 0x3f
[  933.393289] rescued libncursesw.so.5.7 0x4
[  933.393293] rescued libncursesw.so.5.7 0x5
[  933.393296] rescued libncursesw.so.5.7 0x6
[  933.393299] rescued libncursesw.so.5.7 0x7
[  933.393303] rescued libncursesw.so.5.7 0x8
[  933.393306] rescued libncursesw.so.5.7 0x9
[  933.393309] rescued libncursesw.so.5.7 0xa
[  933.393312] rescued libncursesw.so.5.7 0xb
[  933.393315] rescued libncursesw.so.5.7 0xc
[  933.393319] rescued libncursesw.so.5.7 0xd
[  933.393322] rescued libncursesw.so.5.7 0xe
[  933.393325] rescued libncursesw.so.5.7 0xf
[  933.393329] rescued libncursesw.so.5.7 0x10
[  933.393332] rescued libncursesw.so.5.7 0x11
[  933.393335] rescued libncursesw.so.5.7 0x12
[  933.393338] rescued libncursesw.so.5.7 0x13
[  933.393341] rescued libncursesw.so.5.7 0x14
[  933.393345] rescued zsh4 0x3c
[  933.393348] rescued zsh4 0x3e
[  933.393351] rescued zsh4 0x3f
[  933.393354] rescued zsh4 0x40
[  933.393357] rescued zsh4 0x42
[  933.393360] rescued zsh4 0x45
[  933.393363] rescued zsh4 0x48
[  933.393366] rescued zsh4 0x49
[  933.393369] rescued zsh4 0x4a
[  933.393372] rescued zsh4 0x4d
[  933.393375] rescued zsh4 0x4e
[  933.393378] rescued zsh4 0x4f
[  933.393381] rescued zsh4 0x53
[  933.393384] rescued zsh4 0x54
[  933.393394] rescued zsh4 0x55
[  933.393397] rescued zsh4 0x56
[  933.393400] rescued zsh4 0x57
[  933.393403] rescued zsh4 0x58
[  933.393406] rescued zsh4 0x59
[  933.393409] rescued zsh4 0x5a
[  933.393412] rescued zsh4 0x67
[  933.393415] rescued zsh4 0x68
[  933.393418] rescued zsh4 0x69
[  933.393421] rescued zsh4 0x6a
[  933.393424] rescued zsh4 0x6b
[  933.393427] rescued zsh4 0x6c
[  933.393430] rescued zsh4 0x6d
[  933.393433] rescued zsh4 0x6e
[  933.668221] rescued libpulsecore.so.5.0.1 0x19
[  933.672781] rescued libpulsecore.so.5.0.1 0x24
[  933.676581] rescued zsh4 0x6f
[  933.676585] rescued zsh4 0x71
[  933.676588] rescued zsh4 0x72
[  933.676591] rescued zsh4 0x73
[  933.676594] rescued zsh4 0x7d
[  933.676597] rescued zsh4 0x7f
[  933.676600] rescued zsh4 0x80
[  933.676603] rescued zsh4 0x84
[  933.676606] rescued zsh4 0x5f
[  933.676609] rescued zsh4 0x62
[  933.676612] rescued zsh4 0x63
[  933.676618] rescued terminfo.so 0x1
[  933.676622] rescued zle.so 0x1
[  933.676625] rescued zle.so 0x2
[  933.676628] rescued zle.so 0x3
[  933.676632] rescued zle.so 0x27
[  933.676635] rescued zle.so 0x28
[  933.676638] rescued zle.so 0x29
[  933.676641] rescued zle.so 0x2a
[  933.676644] rescued zle.so 0x2b
[  933.676647] rescued zle.so 0x2c
[  933.676650] rescued zle.so 0x2d
[  933.676653] rescued zle.so 0x2e
[  933.676656] rescued zle.so 0x2f
[  933.676659] rescued zle.so 0x30
[  933.676662] rescued zle.so 0x31
[  933.676665] rescued zle.so 0x32
[  933.676668] rescued zle.so 0x33
[  933.676671] rescued zle.so 0x34
[  933.676674] rescued zle.so 0x36
[  933.676677] rescued zle.so 0x38
[  933.676950] rescued zle.so 0x39
[  933.676953] rescued zle.so 0x4
[  933.676956] rescued zle.so 0x5
[  933.676960] rescued zle.so 0x6
[  933.676963] rescued zle.so 0x7
[  933.676966] rescued zle.so 0x8
[  933.676969] rescued zle.so 0x9
[  933.676972] rescued zle.so 0xa
[  933.676975] rescued zle.so 0xb
[  933.676978] rescued zle.so 0xc
[  933.676981] rescued zle.so 0xd
[  933.676983] rescued zle.so 0xe
[  933.676987] rescued zle.so 0xf
[  933.676990] rescued zle.so 0x10
[  933.676993] rescued zle.so 0x11
[  933.676996] rescued zle.so 0x12
[  933.676999] rescued zle.so 0x13
[  933.677002] rescued zle.so 0x14
[  933.677005] rescued zle.so 0x15
[  933.677008] rescued zle.so 0x16
[  933.677011] rescued zle.so 0x17
[  933.677014] rescued zle.so 0x1a
[  933.677017] rescued zle.so 0x1b
[  933.677021] rescued zle.so 0x1c
[  933.677024] rescued zle.so 0x1d
[  933.677027] rescued zle.so 0x1e
[  933.677030] rescued zle.so 0x20
[  933.677033] rescued zle.so 0x21
[  933.677036] rescued zle.so 0x23
[  933.677039] rescued zle.so 0x24
[  933.677042] rescued zle.so 0x25
[  933.677046] rescued zle.so 0x26
[  933.677056] rescued complete.so 0x0
[  933.677060] rescued complete.so 0x1
[  933.677063] rescued complete.so 0x2
[  933.677066] rescued complete.so 0x3
[  933.677069] rescued complete.so 0x5
[  933.677072] rescued complete.so 0x6
[  933.677075] rescued complete.so 0x7
[  933.677078] rescued complete.so 0x8
[  933.677081] rescued complete.so 0x9
[  933.677084] rescued complete.so 0xa
[  933.677087] rescued complete.so 0xb
[  933.677090] rescued complete.so 0xc
[  933.677093] rescued complete.so 0xd
[  933.677096] rescued complete.so 0xe
[  933.692562] rescued complete.so 0xf
[  933.692567] rescued complete.so 0x10
[  933.692570] rescued complete.so 0x11
[  933.692574] rescued complete.so 0x12
[  933.692577] rescued complete.so 0x13
[  933.692580] rescued complete.so 0x14
[  933.692583] rescued complete.so 0x15
[  933.692586] rescued complete.so 0x16
[  933.692589] rescued complete.so 0x17
[  933.692593] rescued complete.so 0x18
[  933.692596] rescued complete.so 0x19
[  933.692599] rescued complete.so 0x1a
[  933.692602] rescued complete.so 0x1b
[  933.692605] rescued complete.so 0x1c
[  933.692608] rescued complete.so 0x1d
[  933.692612] rescued complete.so 0x1e
[  933.692615] rescued complete.so 0x1f
[  933.692618] rescued complete.so 0x20
[  933.692621] rescued complete.so 0x4
[  933.692625] rescued zutil.so 0x0
[  933.692628] rescued zutil.so 0x1
[  933.692632] rescued zutil.so 0x2
[  933.692635] rescued zutil.so 0x3
[  933.692638] rescued zutil.so 0x4
[  933.692641] rescued zutil.so 0x5
[  933.692645] rescued rlimits.so 0x0
[  933.692649] rescued rlimits.so 0x1
[  933.692652] rescued rlimits.so 0x2
[  933.692656] rescued complist.so 0x0
[  933.692659] rescued complist.so 0x1
[  933.692662] rescued complist.so 0x2
[  933.692666] rescued complist.so 0x3
[  933.692942] rescued complist.so 0xc
[  933.692945] rescued complist.so 0xd
[  933.692950] rescued parameter.so 0x0
[  933.692954] rescued parameter.so 0x1
[  933.692957] rescued parameter.so 0x2
[  933.692960] rescued parameter.so 0x3
[  933.692963] rescued parameter.so 0x4
[  933.692966] rescued parameter.so 0x6
[  934.070241] rescued libpulsecore.so.5.0.1 0x25
[  934.074982] rescued libpulsecore.so.5.0.1 0x2b
[  934.079663] rescued libpulsecore.so.5.0.1 0x2c
[  934.084835] rescued computil.so 0x1
[  934.084864] rescued computil.so 0x0
[  934.092002] rescued computil.so 0x2
[  934.095601] rescued computil.so 0x4
[  934.099196] rescued computil.so 0x5
[  934.102795] rescued computil.so 0x6
[  934.106413] rescued computil.so 0x7
[  934.110022] rescued computil.so 0x8
[  934.113623] rescued computil.so 0xe
[  934.397565] rescued zle.so 0x35
[  934.553110] rescued parameter.so 0x5
[  934.557012] rescued zsh4 0x47

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  8:09                                     ` Wu Fengguang
@ 2009-05-08  9:34                                       ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-08  9:34 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Minchan Kim, Johannes Weiner, Andrew Morton, Peter Zijlstra,
	Rik van Riel, linux-kernel, tytso, linux-mm, Elladan,
	Nick Piggin, Christoph Lameter, KOSAKI Motohiro

On Fri, 8 May 2009 16:09:21 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Fri, May 08, 2009 at 03:30:42PM +0800, Minchan Kim wrote:
> > Hi, let me ask a question.
> > 
> > On Fri, 8 May 2009 11:02:09 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > 
> > > On Thu, May 07, 2009 at 11:10:39PM +0800, Johannes Weiner wrote:
> > > > On Thu, May 07, 2009 at 08:11:01PM +0800, Wu Fengguang wrote:
> > > > > Introduce AS_EXEC to mark executables and their linked libraries, and to
> > > > > protect their referenced active pages from being deactivated.
> > > > > 
> > > > > CC: Elladan <elladan@eskimo.com>
> > > > > CC: Nick Piggin <npiggin@suse.de>
> > > > > CC: Johannes Weiner <hannes@cmpxchg.org>
> > > > > CC: Christoph Lameter <cl@linux-foundation.org>
> > > > > CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > > > > Acked-by: Peter Zijlstra <peterz@infradead.org>
> > > > > Acked-by: Rik van Riel <riel@redhat.com>
> > > > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > > > ---
> > > > >  include/linux/pagemap.h |    1 +
> > > > >  mm/mmap.c               |    2 ++
> > > > >  mm/nommu.c              |    2 ++
> > > > >  mm/vmscan.c             |   35 +++++++++++++++++++++++++++++++++--
> > > > >  4 files changed, 38 insertions(+), 2 deletions(-)
> > > > > 
> > > > > --- linux.orig/include/linux/pagemap.h
> > > > > +++ linux/include/linux/pagemap.h
> > > > > @@ -25,6 +25,7 @@ enum mapping_flags {
> > > > >  #ifdef CONFIG_UNEVICTABLE_LRU
> > > > >  	AS_UNEVICTABLE	= __GFP_BITS_SHIFT + 3,	/* e.g., ramdisk, SHM_LOCK */
> > > > >  #endif
> > > > > +	AS_EXEC		= __GFP_BITS_SHIFT + 4,	/* mapped PROT_EXEC somewhere */
> > > > >  };
> > > > >  
> > > > >  static inline void mapping_set_error(struct address_space *mapping, int error)
> > > > > --- linux.orig/mm/mmap.c
> > > > > +++ linux/mm/mmap.c
> > > > > @@ -1194,6 +1194,8 @@ munmap_back:
> > > > >  			goto unmap_and_free_vma;
> > > > >  		if (vm_flags & VM_EXECUTABLE)
> > > > >  			added_exe_file_vma(mm);
> > > > > +		if (vm_flags & VM_EXEC)
> > > > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > > > >  	} else if (vm_flags & VM_SHARED) {
> > > > >  		error = shmem_zero_setup(vma);
> > > > >  		if (error)
> > > > > --- linux.orig/mm/nommu.c
> > > > > +++ linux/mm/nommu.c
> > > > > @@ -1224,6 +1224,8 @@ unsigned long do_mmap_pgoff(struct file 
> > > > >  			added_exe_file_vma(current->mm);
> > > > >  			vma->vm_mm = current->mm;
> > > > >  		}
> > > > > +		if (vm_flags & VM_EXEC)
> > > > > +			set_bit(AS_EXEC, &file->f_mapping->flags);
> > > > >  	}
> > > > 
> > > > I find it a bit ugly that it applies an attribute of the memory area
> > > > (per mm) to the page cache mapping (shared).  Because this in turn
> > > > means that the reference through a non-executable vma might get the
> > > > pages rotated just because there is/was an executable mmap around.
> > > 
> > > Right, the intention was to identify a whole executable/library file,
> > > e.g. /bin/bash or /lib/libc-2.9.so, covering both _text_ and _data_
> > > sections.
> > 
> > But your patch only takes care of the text section.
> > Am I missing something?
> 
> This patch actually protects the mapped pages in the whole executable
> file.  Sorry, the title was a bit misleading.

Yeah, I was confused by the title.
Thanks for the quick reply. :)

> > > > >  	down_write(&nommu_region_sem);
> > > > > --- linux.orig/mm/vmscan.c
> > > > > +++ linux/mm/vmscan.c
> > > > > @@ -1230,6 +1230,7 @@ static void shrink_active_list(unsigned 
> > > > >  	unsigned long pgmoved;
> > > > >  	unsigned long pgscanned;
> > > > >  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> > > > > +	LIST_HEAD(l_active);
> > > > >  	LIST_HEAD(l_inactive);
> > > > >  	struct page *page;
> > > > >  	struct pagevec pvec;
> > > > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned 
> > > > >  
> > > > >  		/* page_referenced clears PageReferenced */
> > > > >  		if (page_mapping_inuse(page) &&
> > > > > -		    page_referenced(page, 0, sc->mem_cgroup))
> > > > > +		    page_referenced(page, 0, sc->mem_cgroup)) {
> > > > > +			struct address_space *mapping = page_mapping(page);
> > > > > +
> > > > >  			pgmoved++;
> > > > > +			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > > > > +				list_add(&page->lru, &l_active);
> > > > > +				continue;
> > > > > +			}
> > > > > +		}
> > > > 
> > > > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > > > better to check if one of them is executable?  This would even work
> > > > for executable anon pages.  After all, there are applications that cow
> > > > executable mappings (sbcl and other language environments that use an
> > > > executable, run-time modified core image come to mind).
> > > 
> > > The page_referenced() path will only cover the _text_ section.  But
> > 
> > Why did you say that "The page_referenced() path will only cover the ""_text_"" section"? 
> > Could you elaborate please?
> 
> I was under the wild assumption that only the _text_ section will be
> PROT_EXEC mapped.  No?

Yes. I support your idea. 

> Thanks,
> Fengguang

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-08  4:17                                   ` Wu Fengguang
@ 2009-05-08 12:09                                     ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-08 12:09 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

On Fri, May 8, 2009 at 1:17 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
>> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
>>
>> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
>> > >
>> > >           /* page_referenced clears PageReferenced */
>> > >           if (page_mapping_inuse(page) &&
>> > > -             page_referenced(page, 0, sc->mem_cgroup))
>> > > +             page_referenced(page, 0, sc->mem_cgroup)) {
>> > > +                 struct address_space *mapping = page_mapping(page);
>> > > +
>> > >                   pgmoved++;
>> > > +                 if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
>> > > +                         list_add(&page->lru, &l_active);
>> > > +                         continue;
>> > > +                 }
>> > > +         }
>> >
>> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
>> > better to check if one of them is executable?  This would even work
>> > for executable anon pages.  After all, there are applications that cow
>> > executable mappings (sbcl and other language environments that use an
>> > executable, run-time modified core image come to mind).
>>
>> Hmm, like provide a vm_flags mask along to page_referenced() to only
>> account matching vmas... seems like a sensible idea.
>
> Here is a quick patch for your opinions. Compile tested.
>
> With the added vm_flags reporting, the mlock=>unevictable logic can
> possibly be made more straightforward.
>
> Thanks,
> Fengguang
> ---
> vmscan: report vm_flags in page_referenced()
>
> This enables more informed reclaim heuristics, eg. to protect executable
> file pages more aggressively.
>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  include/linux/rmap.h |    5 +++--
>  mm/rmap.c            |   30 +++++++++++++++++++++---------
>  mm/vmscan.c          |    7 +++++--
>  3 files changed, 29 insertions(+), 13 deletions(-)
>
> --- linux.orig/include/linux/rmap.h
> +++ linux/include/linux/rmap.h
> @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
>  /*
>  * Called from mm/vmscan.c to handle paging out
>  */
> -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
> +int page_referenced(struct page *, int is_locked,
> +                       struct mem_cgroup *cnt, unsigned long *vm_flags);
>  int try_to_unmap(struct page *, int ignore_refs);
>
>  /*
> @@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
>  #define anon_vma_prepare(vma)  (0)
>  #define anon_vma_link(vma)     do {} while (0)
>
> -#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
> +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
>  #define try_to_unmap(page, refs) SWAP_FAIL
>
>  static inline int page_mkclean(struct page *page)
> --- linux.orig/mm/rmap.c
> +++ linux/mm/rmap.c
> @@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
>  * repeatedly from either page_referenced_anon or page_referenced_file.
>  */
>  static int page_referenced_one(struct page *page,
> -       struct vm_area_struct *vma, unsigned int *mapcount)
> +                              struct vm_area_struct *vma,
> +                              unsigned int *mapcount)
>  {
>        struct mm_struct *mm = vma->vm_mm;
>        unsigned long address;
> @@ -385,7 +386,8 @@ out:
>  }
>
>  static int page_referenced_anon(struct page *page,
> -                               struct mem_cgroup *mem_cont)
> +                               struct mem_cgroup *mem_cont,
> +                               unsigned long *vm_flags)
>  {
>        unsigned int mapcount;
>        struct anon_vma *anon_vma;
> @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
>                if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
>                        continue;
>                referenced += page_referenced_one(page, vma, &mapcount);
> +               *vm_flags |= vma->vm_flags;

Sometimes this vma doesn't contain the anon page.
That's why we need page_check_address.
In such a case, a wrong *vm_flags could be harmful to reclaim.
It can happen in your first-class-citizen patch, I think.
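
A minimal sketch of the gating this seems to ask for (illustrative
only, against the quick patch above; ref > 0 is a conservative
stand-in for "a pte was really found", so it still misses vmas that
map the page without referencing it):

	list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
		int ref;

		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
			continue;
		ref = page_referenced_one(page, vma, &mapcount);
		referenced += ref;
		if (ref)	/* this vma really maps the page */
			*vm_flags |= vma->vm_flags;
	}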



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-08 12:09                                     ` Minchan Kim
@ 2009-05-08 12:15                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-08 12:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

On Fri, May 08, 2009 at 08:09:24PM +0800, Minchan Kim wrote:
> On Fri, May 8, 2009 at 1:17 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> >> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> >>
> >> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> >> > >
> >> > >           /* page_referenced clears PageReferenced */
> >> > >           if (page_mapping_inuse(page) &&
> >> > > -             page_referenced(page, 0, sc->mem_cgroup))
> >> > > +             page_referenced(page, 0, sc->mem_cgroup)) {
> >> > > +                 struct address_space *mapping = page_mapping(page);
> >> > > +
> >> > >                   pgmoved++;
> >> > > +                 if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> >> > > +                         list_add(&page->lru, &l_active);
> >> > > +                         continue;
> >> > > +                 }
> >> > > +         }
> >> >
> >> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> >> > better to check if one of them is executable?  This would even work
> >> > for executable anon pages.  After all, there are applications that cow
> >> > executable mappings (sbcl and other language environments that use an
> >> > executable, run-time modified core image come to mind).
> >>
> >> Hmm, like provide a vm_flags mask along to page_referenced() to only
> >> account matching vmas... seems like a sensible idea.
> >
> > Here is a quick patch for your opinions. Compile tested.
> >
> > With the added vm_flags reporting, the mlock=>unevictable logic can
> > possibly be made more straightforward.
> >
> > Thanks,
> > Fengguang
> > ---
> > vmscan: report vm_flags in page_referenced()
> >
> > This enables more informed reclaim heuristics, eg. to protect executable
> > file pages more aggressively.
> >
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  include/linux/rmap.h |    5 +++--
> >  mm/rmap.c            |   30 +++++++++++++++++++++---------
> >  mm/vmscan.c          |    7 +++++--
> >  3 files changed, 29 insertions(+), 13 deletions(-)
> >
> > --- linux.orig/include/linux/rmap.h
> > +++ linux/include/linux/rmap.h
> > @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
> >  /*
> >  * Called from mm/vmscan.c to handle paging out
> >  */
> > -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
> > +int page_referenced(struct page *, int is_locked,
> > +                       struct mem_cgroup *cnt, unsigned long *vm_flags);
> >  int try_to_unmap(struct page *, int ignore_refs);
> >
> >  /*
> > @@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
> >  #define anon_vma_prepare(vma)  (0)
> >  #define anon_vma_link(vma)     do {} while (0)
> >
> > -#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
> > +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
> >  #define try_to_unmap(page, refs) SWAP_FAIL
> >
> >  static inline int page_mkclean(struct page *page)
> > --- linux.orig/mm/rmap.c
> > +++ linux/mm/rmap.c
> > @@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
> >  * repeatedly from either page_referenced_anon or page_referenced_file.
> >  */
> >  static int page_referenced_one(struct page *page,
> > -       struct vm_area_struct *vma, unsigned int *mapcount)
> > +                              struct vm_area_struct *vma,
> > +                              unsigned int *mapcount)
> >  {
> >        struct mm_struct *mm = vma->vm_mm;
> >        unsigned long address;
> > @@ -385,7 +386,8 @@ out:
> >  }
> >
> >  static int page_referenced_anon(struct page *page,
> > -                               struct mem_cgroup *mem_cont)
> > +                               struct mem_cgroup *mem_cont,
> > +                               unsigned long *vm_flags)
> >  {
> >        unsigned int mapcount;
> >        struct anon_vma *anon_vma;
> > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
> >                if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> >                        continue;
> >                referenced += page_referenced_one(page, vma, &mapcount);
> > +               *vm_flags |= vma->vm_flags;
> 
> Sometimes this vma doesn't contain the anon page.
> That's why we need page_check_address.
> In such a case, a wrong *vm_flags could be harmful to reclaim.
> It can happen in your first-class-citizen patch, I think.

Yes, I'm aware of that - the VMA covers that page, but has no pte
actually installed for it. That should be OK - the presence of such
a VMA is a good indication that the page is some executable text.
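
For reference, a simplified sketch of the check in question (modeled
on mm/rmap.c of this era):

	/*
	 * page_referenced_one() bails out when the page has no pte
	 * in this vma's page tables; the quick patch above ORs in
	 * vma->vm_flags before this check ever runs.
	 */
	pte = page_check_address(page, mm, address, &ptl, 0);
	if (!pte)
		goto out;	/* vma covers the page, nothing mapped */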

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-08 12:15                                       ` Wu Fengguang
@ 2009-05-08 14:01                                         ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-08 14:01 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

On Fri, May 8, 2009 at 9:15 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Fri, May 08, 2009 at 08:09:24PM +0800, Minchan Kim wrote:
>> On Fri, May 8, 2009 at 1:17 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
>> >> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
>> >>
>> >> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
>> >> > >
>> >> > >           /* page_referenced clears PageReferenced */
>> >> > >           if (page_mapping_inuse(page) &&
>> >> > > -             page_referenced(page, 0, sc->mem_cgroup))
>> >> > > +             page_referenced(page, 0, sc->mem_cgroup)) {
>> >> > > +                 struct address_space *mapping = page_mapping(page);
>> >> > > +
>> >> > >                   pgmoved++;
>> >> > > +                 if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
>> >> > > +                         list_add(&page->lru, &l_active);
>> >> > > +                         continue;
>> >> > > +                 }
>> >> > > +         }
>> >> >
>> >> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
>> >> > better to check if one of them is executable?  This would even work
>> >> > for executable anon pages.  After all, there are applications that cow
>> >> > executable mappings (sbcl and other language environments that use an
>> >> > executable, run-time modified core image come to mind).
>> >>
>> >> Hmm, like provide a vm_flags mask along to page_referenced() to only
>> >> account matching vmas... seems like a sensible idea.
>> >
>> > Here is a quick patch for your opinions. Compile tested.
>> >
>> > With the added vm_flags reporting, the mlock=>unevictable logic can
>> > possibly be made more straightforward.
>> >
>> > Thanks,
>> > Fengguang
>> > ---
>> > vmscan: report vm_flags in page_referenced()
>> >
>> > This enables more informed reclaim heuristics, eg. to protect executable
>> > file pages more aggressively.
>> >
>> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>> > ---
>> >  include/linux/rmap.h |    5 +++--
>> >  mm/rmap.c            |   30 +++++++++++++++++++++---------
>> >  mm/vmscan.c          |    7 +++++--
>> >  3 files changed, 29 insertions(+), 13 deletions(-)
>> >
>> > --- linux.orig/include/linux/rmap.h
>> > +++ linux/include/linux/rmap.h
>> > @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
>> >  /*
>> >  * Called from mm/vmscan.c to handle paging out
>> >  */
>> > -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
>> > +int page_referenced(struct page *, int is_locked,
>> > +                       struct mem_cgroup *cnt, unsigned long *vm_flags);
>> >  int try_to_unmap(struct page *, int ignore_refs);
>> >
>> >  /*
>> > @@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
>> >  #define anon_vma_prepare(vma)  (0)
>> >  #define anon_vma_link(vma)     do {} while (0)
>> >
>> > -#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
>> > +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
>> >  #define try_to_unmap(page, refs) SWAP_FAIL
>> >
>> >  static inline int page_mkclean(struct page *page)
>> > --- linux.orig/mm/rmap.c
>> > +++ linux/mm/rmap.c
>> > @@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
>> >  * repeatedly from either page_referenced_anon or page_referenced_file.
>> >  */
>> >  static int page_referenced_one(struct page *page,
>> > -       struct vm_area_struct *vma, unsigned int *mapcount)
>> > +                              struct vm_area_struct *vma,
>> > +                              unsigned int *mapcount)
>> >  {
>> >        struct mm_struct *mm = vma->vm_mm;
>> >        unsigned long address;
>> > @@ -385,7 +386,8 @@ out:
>> >  }
>> >
>> >  static int page_referenced_anon(struct page *page,
>> > -                               struct mem_cgroup *mem_cont)
>> > +                               struct mem_cgroup *mem_cont,
>> > +                               unsigned long *vm_flags)
>> >  {
>> >        unsigned int mapcount;
>> >        struct anon_vma *anon_vma;
>> > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
>> >                if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
>> >                        continue;
>> >                referenced += page_referenced_one(page, vma, &mapcount);
>> > +               *vm_flags |= vma->vm_flags;
>>
>> Sometimes this vma doesn't contain the anon page.
>> That's why we need page_check_address.
>> In such a case, a wrong *vm_flags could be harmful to reclaim.
>> It can happen in your first-class-citizen patch, I think.
>
> Yes, I'm aware of that - the VMA covers that page, but has no pte
> actually installed for it. That should be OK - the presence of such
> a VMA is a good indication that the page is some executable text.
>

Sorry, but I can't understand your point.

This is a general interface, not only for executable text.
Sometimes the information of a vma which doesn't really contain the
page can be passed to the caller, e.g. via COW, mremap, or non-linear
mappings - but I am not sure.
I doubt the vm_flags information is useful.
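
A tiny userspace illustration of the non-linear case (hypothetical
fd and sizes; remap_file_pages(2) was still supported at the time):

	#define _GNU_SOURCE
	#include <sys/mman.h>

	static void make_nonlinear(int fd)
	{
		/* Map four file pages, then shuffle them: afterwards,
		 * "this vma covers file page 3" no longer tells you
		 * where, or whether, page 3 is actually installed. */
		char *p = mmap(NULL, 4 * 4096, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		remap_file_pages(p, 4096, 0, 3, 0); /* page 3 -> slot 0 */
	}
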
-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  9:34                                       ` Minchan Kim
@ 2009-05-08 14:25                                         ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-08 14:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Wu Fengguang, Johannes Weiner, Andrew Morton, Peter Zijlstra,
	Rik van Riel, linux-kernel, tytso, linux-mm, Elladan,
	Nick Piggin, KOSAKI Motohiro

On Fri, 8 May 2009, Minchan Kim wrote:

> > > Why did you say that "The page_referenced() path will only cover the ""_text_"" section"?
> > > Could you elaborate please?
> >
> > I was under the wild assumption that only the _text_ section will be
> > PROT_EXEC mapped.  No?
>
> Yes. I support your idea.

Why do PROT_EXEC mapped segments deserve special treatment? What about the
other memory segments of the process? Essentials like stack, heap and
data segments of the libraries?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 14:25                                         ` Christoph Lameter
@ 2009-05-08 14:34                                           ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-08 14:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Minchan Kim, Wu Fengguang, Johannes Weiner, Andrew Morton,
	Peter Zijlstra, linux-kernel, tytso, linux-mm, Elladan,
	Nick Piggin, KOSAKI Motohiro

Christoph Lameter wrote:
> On Fri, 8 May 2009, Minchan Kim wrote:
> 
>>>> Why did you say that "The page_referenced() path will only cover the ""_text_"" section"?
>>>> Could you elaborate please?
>>> I was under the wild assumption that only the _text_ section will be
>>> PROT_EXEC mapped.  No?
>> Yes. I support your idea.
> 
> Why do PROT_EXEC mapped segments deserve special treatment? What about the
> other memory segments of the process? Essentials like stack, heap and
> data segments of the libraries?

Christopher, please look at what changed in the VM
since 2.6.29 and you will understand how the stack,
heap and data segments already get special treatment.

Please stop pretending you're an idiot.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  3:40                                           ` Elladan
@ 2009-05-08 16:04                                             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-08 16:04 UTC (permalink / raw)
  To: Elladan
  Cc: Christoph Lameter, Lee Schermerhorn, Peter Zijlstra,
	Wu Fengguang, Andrew Morton, linux-kernel, tytso, linux-mm,
	Nick Piggin, Johannes Weiner, KOSAKI Motohiro

Elladan wrote:

>> Nobody (except you) is proposing that we completely disable
>> the eviction of executable pages.  I believe that your idea
>> could easily lead to a denial of service attack, with a user
>> creating a very large executable file and mmaping it.
>>
>> Giving executable pages some priority over other file cache
>> pages is nowhere near as dangerous wrt. unexpected side effects
>> and should work just as well.
> 
> I don't think this sort of DOS is relevant for a single user or trusted user
> system.  

Which not all systems are, meaning that the mechanism
Christoph proposes can never be enabled by default and
would have to be tweaked by the user.

I prefer code that should work just as well 99% of the
time, but can be enabled by default for everybody.
That way people automatically get the benefit.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  3:40                                           ` Elladan
@ 2009-05-08 17:18                                             ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-08 17:18 UTC (permalink / raw)
  To: Elladan
  Cc: Rik van Riel, Lee Schermerhorn, Peter Zijlstra, Wu Fengguang,
	Andrew Morton, linux-kernel, tytso, linux-mm, Nick Piggin,
	Johannes Weiner, KOSAKI Motohiro

On Thu, 7 May 2009, Elladan wrote:

> > Nobody (except you) is proposing that we completely disable
> > the eviction of executable pages.  I believe that your idea
> > could easily lead to a denial of service attack, with a user
> > creating a very large executable file and mmaping it.

The amount of mlockable pages is limited via ulimit. We can already make
the pages unreclaimable through mlock().
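
(A userspace sketch of that existing mechanism, for contrast; the
per-process cap is RLIMIT_MEMLOCK, error handling omitted:)

	#include <sys/mman.h>
	#include <sys/resource.h>

	static int pin(void *addr, size_t len)
	{
		struct rlimit rl;

		getrlimit(RLIMIT_MEMLOCK, &rl);
		if (len > rl.rlim_cur)
			return -1;	/* mlock() would fail: ENOMEM */
		return mlock(addr, len); /* pages become unevictable */
	}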

> I don't know of any distro that applies default ulimits, so desktops are
> already susceptible to the far more trivial "call malloc a lot" or "fork bomb"
> attacks.  Plus, ulimits don't help, since they only apply per process - you'd
> need a default mem cgroup before this mattered, I think.

The point remains that the proposed patch does not solve the general
problem that we encounter with exec pages of critical components of the
user interface being evicted from memory.

Do we have test data that shows a benefit? The description is minimal. Rik
claimed on IRC that tests have been done. If so, the patch description
should include the tests. Which loads benefit from this patch?

A significant change to the reclaim algorithm also needs a clear
description of its effects on reclaim behavior, and that is also
lacking.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  3:40                                           ` Elladan
@ 2009-05-08 17:37                                             ` Alan Cox
  -1 siblings, 0 replies; 336+ messages in thread
From: Alan Cox @ 2009-05-08 17:37 UTC (permalink / raw)
  To: Elladan
  Cc: Rik van Riel, Christoph Lameter, Lee Schermerhorn,
	Peter Zijlstra, Wu Fengguang, Andrew Morton, linux-kernel, tytso,
	linux-mm, Elladan, Nick Piggin, Johannes Weiner, KOSAKI Motohiro


> I don't think this sort of DOS is relevant for a single user or trusted user
> system.  
> 
> I don't know of any distro that applies default ulimits, so desktops are

A lot of people turn on the vm overcommit protection. In fact, if you run
some of the standard desktop apps today it's practically essential to deal
with them quietly leaking the box into oblivion or just going mad at
random intervals.

> already susceptible to the far more trivial "call malloc a lot" or "fork bomb"
> attacks.  Plus, ulimits don't help, since they only apply per process - you'd
> need a default mem cgroup before this mattered, I think.

We have a system-wide one in effect via the vm overcommit stuff, and have
had for years. It works, it's relevant, and even if it didn't, "everything
else sucks" isn't an excuse for more suckage but a call for better things.

If you want any kind of tunable, user-controllable vm priority then the
obvious thing to do would be to borrow the nice() values or implement a
vmnice() for VMAs, so users can only say "flog me harder".

Not that it matters much, I fear - until you fix the two problems of
obscenely bloated leaky apps and bad I/O performance it's really an
"everything louder than everything else" kind of argument.

Alan

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-08 14:25                                         ` Christoph Lameter
@ 2009-05-08 17:41                                           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-08 17:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Minchan Kim, Wu Fengguang, Johannes Weiner, Andrew Morton,
	Peter Zijlstra, Rik van Riel, linux-kernel, tytso, linux-mm,
	Elladan, Nick Piggin

>> > > Why did you say that "The page_referenced() path will only cover the ""_text_"" section"?
>> > > Could you elaborate please?
>> >
>> > I was under the wild assumption that only the _text_ section will be
>> > PROT_EXEC mapped.  No?
>>
>> Yes. I support your idea.
>
> Why do PROT_EXEC mapped segments deserve special treatment? What about the
> other memory segments of the process? Essentials like stack, heap and
> data segments of the libraries?

Currently, file-backed pages and swap-backed pages live on separate LRU lists.

text section: file
stack: anon
heap: anon
data segment: anon

And the streaming IO problem doesn't affect the swap-backed LRU; it's
only a file-backed LRU problem.
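
A simplified sketch of that routing (names approximate the 2.6.28+
split-LRU code):

	/* Streaming file IO only churns the file LRUs; the anon
	 * LRUs (stack, heap, data segments) are untouched by it. */
	if (page_is_file_cache(page))
		lru = LRU_INACTIVE_FILE;	/* text, page cache */
	else
		lru = LRU_INACTIVE_ANON;	/* stack, heap, data */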

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08  8:16                                   ` Wu Fengguang
@ 2009-05-08 19:58                                     ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-08 19:58 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

On Fri, 8 May 2009 16:16:08 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> vmscan: make mapped executable pages the first class citizen
> 
> Protect referenced PROT_EXEC mapped pages from being deactivated.
> 
> PROT_EXEC(or its internal presentation VM_EXEC) pages normally belong to some
> currently running executables and their linked libraries, they shall really be
> cached aggressively to provide good user experiences.
> 

The patch seems reasonable but the changelog and the (non-existent)
design documentation could do with a touch-up.

> 
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned 
>  	unsigned long pgscanned;
>  	unsigned long vm_flags;
>  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> +	LIST_HEAD(l_active);
>  	LIST_HEAD(l_inactive);
>  	struct page *page;
>  	struct pagevec pvec;
> @@ -1272,8 +1273,13 @@ static void shrink_active_list(unsigned 
>  
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
> -		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
> +		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
>  			pgmoved++;
> +			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> +				list_add(&page->lru, &l_active);
> +				continue;
> +			}
> +		}

What we're doing here is to identify referenced, file-backed active
pages.  We clear their referenced bit and give them another trip around
the active list.  So if they aren't referenced during that additional
pass, they will get deactivated next time they are scanned, yes?  It's
a fairly high-level design/heuristic thing which needs careful
commenting, please.
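
Something like this, say (a sketch of the requested comment; the wording
is illustrative, only the code is from the patch):

	/*
	 * Identify referenced, file-backed, PROT_EXEC-mapped active pages.
	 * page_referenced() above has already cleared the referenced bit,
	 * so a page rotated back here gets exactly one extra trip around
	 * the active list: if it is not referenced again before the next
	 * scan, it will be deactivated then.
	 */
	if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
		list_add(&page->lru, &l_active);
		continue;
	}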


Also, the change makes this comment:

	spin_lock_irq(&zone->lru_lock);
	/*
	 * Count referenced pages from currently used mappings as
	 * rotated, even though they are moved to the inactive list.
	 * This helps balance scan pressure between file and anonymous
	 * pages in get_scan_ratio.
	 */
	reclaim_stat->recent_rotated[!!file] += pgmoved;

inaccurate.
								
>  		list_add(&page->lru, &l_inactive);
>  	}
> @@ -1282,7 +1288,6 @@ static void shrink_active_list(unsigned 
>  	 * Move the pages to the [file or anon] inactive list.
>  	 */
>  	pagevec_init(&pvec, 1);
> -	lru = LRU_BASE + file * LRU_FILE;
>  
>  	spin_lock_irq(&zone->lru_lock);
>  	/*
> @@ -1294,6 +1299,7 @@ static void shrink_active_list(unsigned 
>  	reclaim_stat->recent_rotated[!!file] += pgmoved;
>  
>  	pgmoved = 0;  /* count pages moved to inactive list */
> +	lru = LRU_BASE + file * LRU_FILE;
>  	while (!list_empty(&l_inactive)) {
>  		page = lru_to_page(&l_inactive);
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);
> @@ -1316,6 +1322,29 @@ static void shrink_active_list(unsigned 
>  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
>  	__count_zone_vm_events(PGREFILL, zone, pgscanned);
>  	__count_vm_events(PGDEACTIVATE, pgmoved);
> +
> +	pgmoved = 0;  /* count pages moved back to active list */
> +	lru = LRU_ACTIVE + file * LRU_FILE;
> +	while (!list_empty(&l_active)) {
> +		page = lru_to_page(&l_active);
> +		prefetchw_prev_lru_page(page, &l_active, flags);
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +		VM_BUG_ON(!PageActive(page));
> +
> +		list_move(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_add_lru_list(page, lru);
> +		pgmoved++;
> +		if (!pagevec_add(&pvec, page)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			if (buffer_heads_over_limit)
> +				pagevec_strip(&pvec);
> +			__pagevec_release(&pvec);
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +	}

The copy-n-pasting here is unfortunate.  But I expect that if we redid
this as a loop, the result would be a bit ugly - the PageActive
handling gets in the way.
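
For what it's worth, the drain could be factored roughly like this (an
untested sketch, helper name made up; the PageActive set/clear that
differs between the two callers would still need a flag argument, which
is the ugliness in question):

	static unsigned long move_pages_to_lru(struct zone *zone,
					       enum lru_list lru,
					       struct list_head *list,
					       struct pagevec *pvec)
	{
		unsigned long moved = 0;
		struct page *page;

		while (!list_empty(list)) {
			page = lru_to_page(list);
			VM_BUG_ON(PageLRU(page));
			SetPageLRU(page);
			list_move(&page->lru, &zone->lru[lru].list);
			mem_cgroup_add_lru_list(page, lru);
			moved++;
			if (!pagevec_add(pvec, page)) {
				spin_unlock_irq(&zone->lru_lock);
				if (buffer_heads_over_limit)
					pagevec_strip(pvec);
				__pagevec_release(pvec);
				spin_lock_irq(&zone->lru_lock);
			}
		}
		return moved;
	}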

> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);

Is it just me, or is all this stuff:

	lru = LRU_ACTIVE + file * LRU_FILE;
	...
	foo(NR_LRU_BASE + lru);

really hard to read?
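
Naming the arithmetic would help; one possible shape (hypothetical
helper, not part of the patch):

	static inline enum lru_list lru_index(int active, int file)
	{
		return LRU_BASE + (active ? LRU_ACTIVE : 0)
				+ (file ? LRU_FILE : 0);
	}

so that call sites read as lru = lru_index(1, file) instead of a bare
sum of offsets.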



Now.  How do we know that this patch improves Linux?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 19:58                                     ` Andrew Morton
@ 2009-05-08 22:00                                       ` Alan Cox
  -1 siblings, 0 replies; 336+ messages in thread
From: Alan Cox @ 2009-05-08 22:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Wu Fengguang, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

> The patch seems reasonable but the changelog and the (non-existent)
> design documentation could do with a touch-up.

Is it right that I as a user can do things like mmap my database
PROT_EXEC to get better database numbers by making other
stuff swap first ?

You seem to be giving everyone a "nice my process up" hack.
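
To make the concern concrete, the "hack" is a one-liner - a hypothetical
sketch, not code from the thread (and, as Rik notes further down, an
SELinux policy may refuse the writable+executable combination):

	#include <stddef.h>
	#include <sys/mman.h>

	/* Map plain data PROT_EXEC so reclaim treats it like program text. */
	static void *map_db_favoured(int fd, size_t len)
	{
		return mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
			    MAP_SHARED, fd, 0);
	}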

Alan

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 22:00                                       ` Alan Cox
@ 2009-05-08 22:15                                         ` Andrew Morton
  -1 siblings, 0 replies; 336+ messages in thread
From: Andrew Morton @ 2009-05-08 22:15 UTC (permalink / raw)
  To: Alan Cox
  Cc: fengguang.wu, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Fri, 8 May 2009 23:00:45 +0100
Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> > The patch seems reasonable but the changelog and the (non-existent)
> > design documentation could do with a touch-up.
> 
> Is it right that I as a user can do things like mmap my database
> PROT_EXEC to get better database numbers by making other
> stuff swap first ?
>
> You seem to be giving everyone a "nice my process up" hack.

Yep.

But prior to 2.6.27(?) the same effect could be had by mmap()ing the
file with or without PROT_EXEC.  The patch restores a
probably-beneficial heuristic which got lost in the LRU rewrite.

So we're no worse than pre-2.6.27 kernels here.  Plus there are
probably more effective ways of getting that sort of boost, such as
having a process running which simply touches your favoured pages
at a suitable (and fairly low) frequency.
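
A sketch of such a "toucher" (hypothetical userspace code): read one
byte per page at a low frequency so the favoured pages always look
recently referenced:

	#include <stddef.h>
	#include <stdint.h>
	#include <unistd.h>

	static void keep_referenced(const volatile uint8_t *buf, size_t len,
				    unsigned int interval_sec)
	{
		size_t page = (size_t)sysconf(_SC_PAGESIZE);

		for (;;) {
			for (size_t off = 0; off < len; off += page)
				(void)buf[off];	/* sets the referenced bit */
			sleep(interval_sec);
		}
	}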

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 22:00                                       ` Alan Cox
@ 2009-05-08 22:20                                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-08 22:20 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Wu Fengguang, hannes, peterz, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

Alan Cox wrote:
>> The patch seems reasonable but the changelog and the (non-existent)
>> design documentation could do with a touch-up.
> 
> Is it right that I as a user can do things like mmap my database
> PROT_EXEC to get better database numbers by making other
> stuff swap first ?

Yes, but only if your SELinux policy allows you to
mmap something that's both executable and writable
at the same time.

> You seem to be giving everyone a "nice my process up" hack.

A user who wants to slow the system down has always
been able to do so.

I am not convinced that the potential disadvantages
of giving mapped referenced executable file pages an
extra round trip on the active file list outweigh
the advantages of doing so for normal workloads.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 22:15                                         ` Andrew Morton
@ 2009-05-08 22:53                                           ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-05-08 22:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alan Cox, fengguang.wu, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, kosaki.motohiro,
	minchan.kim

On Fri, May 08, 2009 at 03:15:32PM -0700, Andrew Morton wrote:
> On Fri, 8 May 2009 23:00:45 +0100
> Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> 
> > > The patch seems reasonable but the changelog and the (non-existent)
> > > design documentation could do with a touch-up.
> > 
> > Is it right that I as a user can do things like mmap my database
> > PROT_EXEC to get better database numbers by making other
> > stuff swap first ?
> >
> > You seem to be giving everyone a "nice my process up" hack.
> 
> Yep.
> 
> But prior to 2.6.27(?) the same effect could be had by mmap()ing the
> file with or without PROT_EXEC.  The patch restores a
> probably-beneficial heuristic which got lost in the LRU rewrite.
> 
> So we're no worse than pre-2.6.27 kernels here.  Plus there are
> probably more effective ways of getting that sort of boost, such as
> having a process running which simply touches your favoured pages
> at a suitable (and fairly low) frequency.

An example of a process which does this automatically is the Java virtual
machine (and probably other runtimes which use a mark and sweep type GC).

You can see this in practice pretty easily -- a jvm process will automatically
keep its memory paged in, even under strong VM pressure.

-E

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 16:04                                             ` Rik van Riel
@ 2009-05-09  4:04                                               ` Elladan
  -1 siblings, 0 replies; 336+ messages in thread
From: Elladan @ 2009-05-09  4:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Elladan, Christoph Lameter, Lee Schermerhorn, Peter Zijlstra,
	Wu Fengguang, Andrew Morton, linux-kernel, tytso, linux-mm,
	Nick Piggin, Johannes Weiner, KOSAKI Motohiro

On Fri, May 08, 2009 at 12:04:27PM -0400, Rik van Riel wrote:
> Elladan wrote:
>
>>> Nobody (except you) is proposing that we completely disable
>>> the eviction of executable pages.  I believe that your idea
>>> could easily lead to a denial of service attack, with a user
>>> creating a very large executable file and mmaping it.
>>>
>>> Giving executable pages some priority over other file cache
>>> pages is nowhere near as dangerous wrt. unexpected side effects
>>> and should work just as well.
>>
>> I don't think this sort of DOS is relevant for a single user or trusted user
>> system.  
>
> Which not all systems are, meaning that the mechanism
> Christoph proposes can never be enabled by default and
> would have to be tweaked by the user.
>
> I prefer code that should work just as well 99% of the
> time, but can be enabled by default for everybody.
> That way people automatically get the benefit.

I read Christoph's proposal as essentially, "have a desktop switch which
won't evict executable pages unless they're using more than some huge
percentage of RAM" (presumably, he wants anonymous pages to get special
treatment too) -- this would essentially be similar to mlocking all your
executables, only with a safety net if you go above x% and without affecting
non-executable file maps.

Given that, the DOS possibility you proposed seemed to just be one where a user
could push a lot of unprotected pages out quickly and make the system run slow.

I don't see how that's any different from just asking malloc() for a lot of
RAM and then touching it a lot to make it appear very hot to the VM.  Any
user can trivially do that already, and some apps (e.g. a JVM) happily do
that for you.  The pathology is the same, and if anything an executable
mmap is harder.
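
For concreteness, that trivial attack looks like this (a hypothetical
sketch, not code from the thread):

	#include <stdlib.h>
	#include <string.h>

	int main(void)
	{
		size_t len = 1UL << 30;		/* ask malloc() for a lot of RAM */
		char *p = malloc(len);

		if (!p)
			return 1;
		for (;;)			/* touch it a lot ... */
			memset(p, 0xaa, len);	/* ... so it looks hot to the VM */
	}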

-E

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-08 14:01                                         ` Minchan Kim
@ 2009-05-09  6:56                                           ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-09  6:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro

On Fri, May 08, 2009 at 10:01:19PM +0800, Minchan Kim wrote:
> On Fri, May 8, 2009 at 9:15 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Fri, May 08, 2009 at 08:09:24PM +0800, Minchan Kim wrote:
> >> On Fri, May 8, 2009 at 1:17 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> > On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> >> >> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> >> >>
> >> >> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> >> >> > >
> >> >> > >           /* page_referenced clears PageReferenced */
> >> >> > >           if (page_mapping_inuse(page) &&
> >> >> > > -             page_referenced(page, 0, sc->mem_cgroup))
> >> >> > > +             page_referenced(page, 0, sc->mem_cgroup)) {
> >> >> > > +                 struct address_space *mapping = page_mapping(page);
> >> >> > > +
> >> >> > >                   pgmoved++;
> >> >> > > +                 if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> >> >> > > +                         list_add(&page->lru, &l_active);
> >> >> > > +                         continue;
> >> >> > > +                 }
> >> >> > > +         }
> >> >> >
> >> >> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> >> >> > better to check if one of them is executable?  This would even work
> >> >> > for executable anon pages.  After all, there are applications that cow
> >> >> > executable mappings (sbcl and other language environments that use an
> >> >> > executable, run-time modified core image come to mind).
> >> >>
> >> >> Hmm, like provide a vm_flags mask along to page_referenced() to only
> >> >> account matching vmas... seems like a sensible idea.
> >> >
> >> > Here is a quick patch for your opinions. Compile tested.
> >> >
> >> > With the added vm_flags reporting, the mlock=>unevictable logic can
> >> > possibly be made more straightforward.
> >> >
> >> > Thanks,
> >> > Fengguang
> >> > ---
> >> > vmscan: report vm_flags in page_referenced()
> >> >
> >> > This enables more informed reclaim heuristics, eg. to protect executable
> >> > file pages more aggressively.
> >> >
> >> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> >> > ---
> >> >  include/linux/rmap.h |    5 +++--
> >> >  mm/rmap.c            |   30 +++++++++++++++++++++---------
> >> >  mm/vmscan.c          |    7 +++++--
> >> >  3 files changed, 29 insertions(+), 13 deletions(-)
> >> >
> >> > --- linux.orig/include/linux/rmap.h
> >> > +++ linux/include/linux/rmap.h
> >> > @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
> >> >  /*
> >> >  * Called from mm/vmscan.c to handle paging out
> >> >  */
> >> > -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
> >> > +int page_referenced(struct page *, int is_locked,
> >> > +                       struct mem_cgroup *cnt, unsigned long *vm_flags);
> >> >  int try_to_unmap(struct page *, int ignore_refs);
> >> >
> >> >  /*
> >> > @@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
> >> >  #define anon_vma_prepare(vma)  (0)
> >> >  #define anon_vma_link(vma)     do {} while (0)
> >> >
> >> > -#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
> >> > +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
> >> >  #define try_to_unmap(page, refs) SWAP_FAIL
> >> >
> >> >  static inline int page_mkclean(struct page *page)
> >> > --- linux.orig/mm/rmap.c
> >> > +++ linux/mm/rmap.c
> >> > @@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
> >> >  * repeatedly from either page_referenced_anon or page_referenced_file.
> >> >  */
> >> >  static int page_referenced_one(struct page *page,
> >> > -       struct vm_area_struct *vma, unsigned int *mapcount)
> >> > +                              struct vm_area_struct *vma,
> >> > +                              unsigned int *mapcount)
> >> >  {
> >> >        struct mm_struct *mm = vma->vm_mm;
> >> >        unsigned long address;
> >> > @@ -385,7 +386,8 @@ out:
> >> >  }
> >> >
> >> >  static int page_referenced_anon(struct page *page,
> >> > -                               struct mem_cgroup *mem_cont)
> >> > +                               struct mem_cgroup *mem_cont,
> >> > +                               unsigned long *vm_flags)
> >> >  {
> >> >        unsigned int mapcount;
> >> >        struct anon_vma *anon_vma;
> >> > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
> >> >                if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> >> >                        continue;
> >> >                referenced += page_referenced_one(page, vma, &mapcount);
> >> > +               *vm_flags |= vma->vm_flags;
> >>
> >> Sometimes this vma doesn't contain the anon page.
> >> That's why we need page_check_address.
> >> For such a case, a wrong *vm_flags can be harmful to reclaim.
> >> It can happen in your first class citizen patch, I think.
> >
> > Yes I'm aware of that - the VMA area covers that page, but has no pte
> > actually installed for that page. That should be OK - the presence
> > of such a VMA is a good indication of it being some executable text.
> >
> 
> Sorry but I can't understand your point.
> 
> This is a general interface, not only for executable text.
> Sometimes the information of a vma which doesn't really contain the
> page can be passed to the caller.

Right. But if the caller doesn't care, why bother passing the vm_flags
parameter down to page_referenced_one()? We can do that when the need
arises; otherwise it sounds more like unnecessary overhead.

> ex) It can happen with COW, mremap, non-linear mapping and so on,
> but I am not sure.

Hmm, this reminded me of the mlocked page protection logic in
page_referenced_one(). Why should the "if (vma->vm_flags & VM_LOCKED)"
check be placed *after* the page_check_address() check? Is there a
case where an *existing* page frame is not mapped into the VM_LOCKED vma?
And why not protect the page in such a case?

> I doubt the vm_flags information is useful.

Me too :) I don't expect many of the other flags to be useful.
Just that passing them out blindly could be more convenient than doing

        if (vma->vm_flags & VM_EXEC)
                *vm_flags |= VM_EXEC;

But I do suspect passing out VM_LOCKED could help somehow.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 17:18                                             ` Christoph Lameter
@ 2009-05-09 10:20                                               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-09 10:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, Elladan, Rik van Riel, Lee Schermerhorn,
	Peter Zijlstra, Wu Fengguang, Andrew Morton, linux-kernel, tytso,
	linux-mm, Nick Piggin, Johannes Weiner

> On Thu, 7 May 2009, Elladan wrote:
> 
> > > Nobody (except you) is proposing that we completely disable
> > > the eviction of executable pages.  I believe that your idea
> > > could easily lead to a denial of service attack, with a user
> > > creating a very large executable file and mmaping it.
> 
> The amount of mlockable pages is limited via ulimit. We can already make
> the pages unreclaimable through mlock().
> 
> > I don't know of any distro that applies default ulimits, so desktops are
> > already susceptible to the far more trivial "call malloc a lot" or "fork bomb"
> > attacks.  Plus, ulimits don't help, since they only apply per process - you'd
> > need a default mem cgroup before this mattered, I think.
> 
> The point remains that the proposed patch does not solve the general
> problem that we encounter with exec pages of critical components of the
> user interface being evicted from memory.
> 
> Do we have test data that shows a benefit? The description is minimal. Rik
> claimed on IRC that tests have been done. If so then the patch description
> should include the tests. Which loads benefit from this patch?
> 
> A significant change to the reclaim algorithm also needs to
> have a clear description of the effects on reclaim behavior which is also
> lacking.

btw,

This is very good news to me.
Recently I've spent quite some time trying to reproduce this issue, but
with no luck. I'm interested in its test case.

Thanks.



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-08 22:00                                       ` Alan Cox
@ 2009-05-10  8:59                                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10  8:59 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrew Morton, Wu Fengguang, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

2009/5/9 Alan Cox <alan@lxorguk.ukuu.org.uk>:
>> The patch seems reasonable but the changelog and the (non-existent)
>> design documentation could do with a touch-up.
>
> Is it right that I as a user can do things like mmap my database
> PROT_EXEC to get better database numbers by making other
> stuff swap first ?
>
> You seem to be giving everyone a "nice my process up" hack.

How about this?
If priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
shrink_inactive_list() already reclaims active pages forcibly.
So this patch doesn't change the kernel reclaim policy.

Anyway, preventing the "nice my process up" hack in a way that user
processes can't influence seems to make sense to me.

test result:

echo 100 > /proc/sys/vm/dirty_ratio
echo 100 > /proc/sys/vm/dirty_background_ratio
run modified qsbench (using mmap(PROT_EXEC) instead of malloc())

          active2active vs active2inactive ratio
before    5:5
after     1:9

please don't ask for performance numbers. I haven't reproduced Wu's patch
improvement ;)

Wu, What do you think?

---
 mm/vmscan.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: b/mm/vmscan.c
===================================================================
--- a/mm/vmscan.c	2009-05-10 02:40:01.000000000 +0900
+++ b/mm/vmscan.c	2009-05-10 03:33:30.000000000 +0900
@@ -1275,7 +1275,8 @@ static void shrink_active_list(unsigned
 			struct address_space *mapping = page_mapping(page);

 			pgmoved++;
-			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
+			if (mapping && (priority >= DEF_PRIORITY - 2) &&
+			    test_bit(AS_EXEC, &mapping->flags)) {
 				pga2a++;
 				list_add(&page->lru, &l_active);
 				continue;

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  8:59                                         ` KOSAKI Motohiro
@ 2009-05-10  9:07                                           ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-10  9:07 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan Cox, Andrew Morton, Wu Fengguang, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, 2009-05-10 at 17:59 +0900, KOSAKI Motohiro wrote:
> 2009/5/9 Alan Cox <alan@lxorguk.ukuu.org.uk>:
> >> The patch seems reasonable but the changelog and the (non-existent)
> >> design documentation could do with a touch-up.
> >
> > Is it right that I as a user can do things like mmap my database
> > PROT_EXEC to get better database numbers by making other
> > stuff swap first ?
> >
> > You seem to be giving everyone a "nice my process up" hack.
> 
> How about this?
> If priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
> shrink_inactive_list() already reclaims active pages forcibly.
> So this patch doesn't change the kernel reclaim policy.
> 
> Anyway, preventing the "nice my process up" hack in a way that user
> processes can't influence seems to make sense to me.
> 
> test result:
> 
> echo 100 > /proc/sys/vm/dirty_ratio
> echo 100 > /proc/sys/vm/dirty_background_ratio
> run modified qsbench (using mmap(PROT_EXEC) instead of malloc())
> 
>           active2active vs active2inactive ratio
> before    5:5
> after     1:9
> 
> please don't ask for performance numbers. I haven't reproduced Wu's patch
> improvement ;)
> 
> Wu, What do you think?

I don't think this is desirable. Like Andrew already said, there are tons
of ways to defeat any of this, and we've so far always prioritized mappings
over !mappings. Limiting this to only PROT_EXEC mappings is already less
than it used to be.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  8:59                                         ` KOSAKI Motohiro
@ 2009-05-10  9:20                                           ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-10  9:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 04:59:17PM +0800, KOSAKI Motohiro wrote:
> 2009/5/9 Alan Cox <alan@lxorguk.ukuu.org.uk>:
> >> The patch seems reasonable but the changelog and the (non-existent)
> >> design documentation could do with a touch-up.
> >
> > Is it right that I as a user can do things like mmap my database
> > PROT_EXEC to get better database numbers by making other
> > stuff swap first ?
> >
> > You seem to be giving everyone a "nice my process up" hack.
> 
> How about this?

Why does it deserve more tricks? PROT_EXEC pages are rare.
If user space wants to abuse PROT_EXEC, let them go for it ;-)

> If priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
> shrink_inactive_list() already reclaims active pages forcibly.

Isn't lumpy reclaim now enabled by (and only by) non-zero order?

> So this patch doesn't change the kernel reclaim policy.
> 
> Anyway, preventing the "nice my process up" hack in a way that user
> processes can't influence seems to make sense to me.
> 
> test result:
> 
> echo 100 > /proc/sys/vm/dirty_ratio
> echo 100 > /proc/sys/vm/dirty_background_ratio
> run modified qsbench (using mmap(PROT_EXEC) instead of malloc())
> 
>           active2active vs active2inactive ratio
> before    5:5
> after     1:9

Do you have scripts for producing such numbers? I'm dreaming of having
such tools :-)

> please don't ask for performance numbers. I haven't reproduced Wu's patch
> improvement ;)

That's why I decided to "explain" instead of "benchmark" the benefits
of my patch, hehe.

Thanks,
Fengguang

> ---
>  mm/vmscan.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: b/mm/vmscan.c
> ===================================================================
> --- a/mm/vmscan.c	2009-05-10 02:40:01.000000000 +0900
> +++ b/mm/vmscan.c	2009-05-10 03:33:30.000000000 +0900
> @@ -1275,7 +1275,8 @@ static void shrink_active_list(unsigned
>  			struct address_space *mapping = page_mapping(page);
> 
>  			pgmoved++;
> -			if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> +			if (mapping && (priority >= DEF_PRIORITY - 2) &&
> +			    test_bit(AS_EXEC, &mapping->flags)) {
>  				pga2a++;
>  				list_add(&page->lru, &l_active);
>  				continue;

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:20                                           ` Wu Fengguang
@ 2009-05-10  9:29                                             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10  9:29 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> >> The patch seems reasonable but the changelog and the (non-existent)
>> >> design documentation could do with a touch-up.
>> >
>> > Is it right that I as a user can do things like mmap my database
>> > PROT_EXEC to get better database numbers by making other
>> > stuff swap first ?
>> >
>> > You seem to be giving everyone a "nice my process up" hack.
>>
>> How about this?
>
> Why does it deserve more tricks? PROT_EXEC pages are rare.
> If user space wants to abuse PROT_EXEC, let them live with it ;-)

yes, typically rare.
the problem is, a user program _can_ use PROT_EXEC to get higher priority
for non-executable memory.

In general, a static priority mechanism has one weakness: if all objects
claim the higher priority, it breaks the priority mechanism.


>> if priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
>> shrink_inactive_list() already reclaims active pages forcibly.
>
> Isn't lumpy reclaim now enabled by (and only by) non-zero order?

you are right. but I'm only saying the kernel already has a policy-changing
threshold for preventing the worst case.


>> then, this patch doesn't change the kernel reclaim policy.
>>
>> anyway, preventing a user-process-controlled "nice my process up"
>> hack seems to make sense to me.
>>
>> test result:
>>
>> echo 100 > /proc/sys/vm/dirty_ratio
>> echo 100 > /proc/sys/vm/dirty_background_ratio
>> run modified qsbench (using mmap(PROT_EXEC) instead of malloc)
>>
>>            active2active vs active2inactive ratio
>> before    5:5
>> after       1:9
>
> Do you have scripts for producing such numbers? I've been dreaming of
> having such tools :-)

I made a statistics-reporting patch for testing, hehe :)

---
 include/linux/vmstat.h |    1 +
 mm/vmstat.c            |    1 +
 2 files changed, 2 insertions(+)

Index: b/include/linux/vmstat.h
===================================================================
--- a/include/linux/vmstat.h    2009-02-17 07:34:38.000000000 +0900
+++ b/include/linux/vmstat.h    2009-05-10 02:36:37.000000000 +0900
@@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
                UNEVICTABLE_PGSTRANDED, /* unable to isolate on unlock */
                UNEVICTABLE_MLOCKFREED,
 #endif
+               FOR_ALL_ZONES(PGA2A),
                NR_VM_EVENT_ITEMS
 };

Index: b/mm/vmstat.c
===================================================================
--- a/mm/vmstat.c       2009-05-10 01:08:36.000000000 +0900
+++ b/mm/vmstat.c       2009-05-10 02:37:18.000000000 +0900
@@ -708,6 +708,7 @@ static const char * const vmstat_text[]
        "unevictable_pgs_stranded",
        "unevictable_pgs_mlockfreed",
 #endif
+       TEXTS_FOR_ZONES("pga2a")
 #endif
 };
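
(A minimal user-space sketch of how the counters could be read back from
/proc/vmstat - assuming the TEXTS_FOR_ZONES() expansion above yields per-zone
names sharing the "pga2a" prefix, and taking the existing pgdeactivate
counter as the active2inactive side of the ratio:)

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char name[64];
		unsigned long long val, a2a = 0, a2i = 0;
		FILE *f = fopen("/proc/vmstat", "r");

		if (!f)
			return 1;
		/* /proc/vmstat is one "name value" pair per line */
		while (fscanf(f, "%63s %llu", name, &val) == 2) {
			if (!strncmp(name, "pga2a", 5))
				a2a += val;	/* pga2a_dma, pga2a_normal, ... */
			else if (!strcmp(name, "pgdeactivate"))
				a2i = val;
		}
		fclose(f);
		printf("active2active=%llu active2inactive=%llu\n", a2a, a2i);
		return 0;
	}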



>> please don't ask for performance numbers. I haven't reproduced Wu's patch
>> improvement ;)
>
> That's why I decided to "explain" instead of "benchmark" the benefits
> of my patch, hehe.

okay, I see.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:07                                           ` Peter Zijlstra
@ 2009-05-10  9:35                                             ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-10  9:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Alan Cox, Andrew Morton, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 05:07:26PM +0800, Peter Zijlstra wrote:
> On Sun, 2009-05-10 at 17:59 +0900, KOSAKI Motohiro wrote:
> > 2009/5/9 Alan Cox <alan@lxorguk.ukuu.org.uk>:
> > >> The patch seems reasonable but the changelog and the (non-existent)
> > >> design documentation could do with a touch-up.
> > >
> > > Is it right that I as a user can do things like mmap my database
> > > PROT_EXEC to get better database numbers by making other
> > > stuff swap first ?
> > >
> > > You seem to be giving everyone a "nice my process up" hack.
> > 
> > How about this?
> > if priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
> > shrink_inactive_list() already reclaims active pages forcibly.
> > then, this patch doesn't change the kernel reclaim policy.
> >
> > anyway, preventing a user-process-controlled "nice my process up"
> > hack seems to make sense to me.
> > 
> > test result:
> > 
> > echo 100 > /proc/sys/vm/dirty_ratio
> > echo 100 > /proc/sys/vm/dirty_background_ratio
> > run modified qsbench (using mmap(PROT_EXEC) instead of malloc)
> > 
> >            active2active vs active2inactive ratio
> > before    5:5
> > after       1:9
> > 
> > please don't ask for performance numbers. I haven't reproduced Wu's patch
> > improvement ;)
> > 
> > Wu, What do you think?
> 
> I don't think this is desirable, like Andrew already said, there's tons
> of ways to defeat any of this and we've so far always prioritized mappings
> over !mappings. Limiting this to only PROT_EXEC mappings is already less
> than it used to be.

Yeah. One thing I realized in readahead is that *anything* can happen.
When it comes to caching, app/user behaviors are *far more* unpredictable.
We can make the heuristics as large as 1000LOC (and leave users and
ourselves lost in the mist) or as simple as 100LOC (and leave it open
to hacking or even abuse).

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:07                                           ` Peter Zijlstra
@ 2009-05-10  9:36                                             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10  9:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alan Cox, Andrew Morton, Wu Fengguang, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> How about this?
>> if priority < DEF_PRIORITY-2, the aggressive lumpy reclaim in
>> shrink_inactive_list() already reclaims active pages forcibly.
>> then, this patch doesn't change the kernel reclaim policy.
>>
>> anyway, preventing a user-process-controlled "nice my process up"
>> hack seems to make sense to me.
>>
>> test result:
>>
>> echo 100 > /proc/sys/vm/dirty_ratio
>> echo 100 > /proc/sys/vm/dirty_background_ratio
>> run modified qsbench (using mmap(PROT_EXEC) instead of malloc)
>>
>>            active2active vs active2inactive ratio
>> before    5:5
>> after       1:9
>>
>> please don't ask for performance numbers. I haven't reproduced Wu's patch
>> improvement ;)
>>
>> Wu, What do you think?
>
> I don't think this is desirable, like Andrew already said, there's tons
> of ways to defeat any of this and we've so far always prioritized mappings
> over !mappings. Limiting this to only PROT_EXEC mappings is already less
> than it used to be.

I don't oppose this policy. PROT_EXEC seems like a good viewpoint.
The problem is that PROT_EXEC'ed pages aren't guaranteed to be rare.

If all pages claim "Hey, I'm a higher priority page, please don't
reclaim me", the end-user easily gets poor results.

Kernels before 2.6.27 had a similar problem: many mapped pages caused
bad latency easily. I don't want to reproduce this.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:29                                             ` KOSAKI Motohiro
@ 2009-05-10 10:03                                               ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-10 10:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 05:29:43PM +0800, KOSAKI Motohiro wrote:
> >> >> The patch seems reasonable but the changelog and the (non-existent)
> >> >> design documentation could do with a touch-up.
> >> >
> >> > Is it right that I as a user can do things like mmap my database
> >> > PROT_EXEC to get better database numbers by making other
> >> > stuff swap first ?
> >> >
> >> > You seem to be giving everyone a "nice my process up" hack.
> >>
> >> How about this?
> >
> > Why does it deserve more tricks? PROT_EXEC pages are rare.
> > If user space wants to abuse PROT_EXEC, let them live with it ;-)
> 
> yes, typically rare.
> the problem is, a user program _can_ use PROT_EXEC to get higher priority
> for non-executable memory.

- abuses should be rare
- large-scale abuses will be even rarer
- the resulting vmscan overheads are the *expected* side effect
- the side effects are still safe

So if that's what they want, let them have it to their heart's content.

You know it's normal for many users/apps to care only about the result.
When they want something but cannot get it from the smarter version of
PROT_EXEC heuristics, they will go on to devise more complicated tricks.

In the end both sides lose.

If the abused case is important enough, then let's introduce a feature
to explicitly prioritize the pages. But let's leave the PROT_EXEC case
simple.

> In general, a static priority mechanism has one weakness: if all objects
> claim the higher priority, it breaks the priority mechanism.

Yup.

> >> then, this patch doesn't change the kernel reclaim policy.
> >>
> >> anyway, preventing a user-process-controlled "nice my process up"
> >> hack seems to make sense to me.
> >>
> >> test result:
> >>
> >> echo 100 > /proc/sys/vm/dirty_ratio
> >> echo 100 > /proc/sys/vm/dirty_background_ratio
> >> run modified qsbench (using mmap(PROT_EXEC) instead of malloc)
> >>
> >>            active2active vs active2inactive ratio
> >> before    5:5
> >> after       1:9
> >
> > Do you have scripts for producing such numbers? I've been dreaming of
> > having such tools :-)
> 
> I made a statistics-reporting patch for testing, hehe :)

I see :)

Thanks,
Fengguang

> ---
>  include/linux/vmstat.h |    1 +
>  mm/vmstat.c            |    1 +
>  2 files changed, 2 insertions(+)
> 
> Index: b/include/linux/vmstat.h
> ===================================================================
> --- a/include/linux/vmstat.h    2009-02-17 07:34:38.000000000 +0900
> +++ b/include/linux/vmstat.h    2009-05-10 02:36:37.000000000 +0900
> @@ -51,6 +51,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
>                 UNEVICTABLE_PGSTRANDED, /* unable to isolate on unlock */
>                 UNEVICTABLE_MLOCKFREED,
>  #endif
> +               FOR_ALL_ZONES(PGA2A),
>                 NR_VM_EVENT_ITEMS
>  };
> 
> Index: b/mm/vmstat.c
> ===================================================================
> --- a/mm/vmstat.c       2009-05-10 01:08:36.000000000 +0900
> +++ b/mm/vmstat.c       2009-05-10 02:37:18.000000000 +0900
> @@ -708,6 +708,7 @@ static const char * const vmstat_text[]
>         "unevictable_pgs_stranded",
>         "unevictable_pgs_mlockfreed",
>  #endif
> +       TEXTS_FOR_ZONES("pga2a")
>  #endif
>  };
> 
> 
> 
> >> please don't ask for performance numbers. I haven't reproduced Wu's patch
> >> improvement ;)
> >
> > That's why I decided to "explain" instead of "benchmark" the benefits
> > of my patch, hehe.
> 
> okay, I see.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:35                                             ` Wu Fengguang
@ 2009-05-10 10:06                                               ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 10:06 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Peter Zijlstra, Alan Cox, Andrew Morton, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> I don't think this is desirable, like Andrew already said, there's tons
>> of ways to defeat any of this and we've so far always prioritized mappings
>> over !mappings. Limiting this to only PROT_EXEC mappings is already less
>> than it used to be.
>
> Yeah. One thing I realized in readahead is that *anything* can happen.
> When it comes to caching, app/user behaviors are *far more* unpredictable.
> We can make the heuristics as large as 1000LOC (and leave users and
> ourselves lost in the mist) or as simple as 100LOC (and leave it open
> to hacking or even abuse).

umm. I don't think that's a good example.
Please see the recent_scanned/recent_rotated statistics; they take less
than 100LOC.

Plus, I don't think the statistics approach is wrong.
If a page can claim "I'm high priority", that's risky: a bad userland
program might exploit the rule.
But if a page merely claims "I think PROT_EXEC is important, maybe", it
isn't risky: if a user program tries to exploit the rule, the kernel can
ignore the claim.
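
(For reference, roughly how those statistics feed back into reclaim in the
2.6.30-era get_scan_ratio() in mm/vmscan.c - a condensed sketch from memory,
not the verbatim source. The more of a list's scanned pages get rotated back
to the active list, the less that list is scanned:)

	anon_prio = sc->swappiness;
	file_prio = 200 - sc->swappiness;

	/* recent_scanned/recent_rotated indices: [0] = anon, [1] = file */
	ap = (anon_prio + 1) * (reclaim_stat->recent_scanned[0] + 1);
	ap /= reclaim_stat->recent_rotated[0] + 1;

	fp = (file_prio + 1) * (reclaim_stat->recent_scanned[1] + 1);
	fp /= reclaim_stat->recent_rotated[1] + 1;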

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 10:03                                               ` Wu Fengguang
@ 2009-05-10 10:15                                                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 10:15 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> >> >> The patch seems reasonable but the changelog and the (non-existent)
>> >> >> design documentation could do with a touch-up.
>> >> >
>> >> > Is it right that I as a user can do things like mmap my database
>> >> > PROT_EXEC to get better database numbers by making other
>> >> > stuff swap first ?
>> >> >
>> >> > You seem to be giving everyone a "nice my process up" hack.
>> >>
>> >> How about this?
>> >
>> > Why does it deserve more tricks? PROT_EXEC pages are rare.
>> > If user space wants to abuse PROT_EXEC, let them live with it ;-)
>>
>> yes, typically rare.
>> the problem is, a user program _can_ use PROT_EXEC to get higher priority
>> for non-executable memory.
>
> - abuses should be rare
> - large-scale abuses will be even rarer
> - the resulting vmscan overheads are the *expected* side effect
> - the side effects are still safe

Expected by whom?
The fact is, the application developer decides to use PROT_EXEC, but the
side effect hits the end-user, not the application developer.

In general, if a side effect only hits the person who made the mistake,
that's no problem; they take the risk themselves.
But we know the application developer and the administrator are often
different people.


> So if that's what they want, let them have it to their heart's content.
>
> You know it's normal for many users/apps to care only about the result.
> When they want something but cannot get it from the smarter version of
> PROT_EXEC heuristics, they will go on to devise more complicated tricks.
>
> In the end both sides lose.
>
> If the abused case is important enough, then let's introduce a feature
> to explicitly prioritize the pages. But let's leave the PROT_EXEC case
> simple.

No.
An explicit prioritization mechanism doesn't solve the problem anyway. The
application developer doesn't know the end-user's environment,
so they can't mark proper page priorities.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 10:15                                                 ` KOSAKI Motohiro
@ 2009-05-10 11:21                                                   ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-10 11:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 06:15:02PM +0800, KOSAKI Motohiro wrote:
> >> >> >> The patch seems reasonable but the changelog and the (non-existent)
> >> >> >> design documentation could do with a touch-up.
> >> >> >
> >> >> > Is it right that I as a user can do things like mmap my database
> >> >> > PROT_EXEC to get better database numbers by making other
> >> >> > stuff swap first ?
> >> >> >
> >> >> > You seem to be giving everyone a "nice my process up" hack.
> >> >>
> >> >> How about this?
> >> >
> >> > Why does it deserve more tricks? PROT_EXEC pages are rare.
> >> > If user space wants to abuse PROT_EXEC, let them live with it ;-)
> >>
> >> yes, typically rare.
> >> the problem is, a user program _can_ use PROT_EXEC to get higher priority
> >> for non-executable memory.
> >
> > - abuses should be rare
> > - large-scale abuses will be even rarer
> > - the resulting vmscan overheads are the *expected* side effect
> > - the side effects are still safe
> 
> Expected by whom?
> The fact is, the application developer decides to use PROT_EXEC, but the
> side effect hits the end-user, not the application developer.
> 
> In general, if a side effect only hits the person who made the mistake,
> that's no problem; they take the risk themselves.
> But we know the application developer and the administrator are often
> different people.
> 
> 
> > So if that's what they want, let them have it to their heart's content.
> >
> > You know it's normal for many users/apps to care only about the result.
> > When they want something but cannot get it from the smarter version of
> > PROT_EXEC heuristics, they will go on to devise more complicated tricks.
> >
> > In the end both sides lose.
> >
> > If the abused case is important enough, then let's introduce a feature
> > to explicitly prioritize the pages. But let's leave the PROT_EXEC case
> > simple.
> 
> No.
> An explicit prioritization mechanism doesn't solve the problem anyway. The
> application developer doesn't know the end-user's environment,
> so they can't mark proper page priorities.

So it's simply wrong for an application to prioritize itself; that is
not fair gaming and hence should be blamed. I doubt any application
aimed at a wide audience will do this insane hack. But specifically
targeted applications are more likely to do whatever tricks fit
their needs & environment, and likely they are doing so for good reasons
and are aware of the consequences.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 11:21                                                   ` Wu Fengguang
@ 2009-05-10 11:39                                                     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 11:39 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> > So if that's what they want, let them have it to their heart's content.
>> >
>> > You know it's normal for many users/apps to care only about the result.
>> > When they want something but cannot get it from the smarter version of
>> > PROT_EXEC heuristics, they will go on to devise more complicated tricks.
>> >
>> > In the end both sides lose.
>> >
>> > If the abused case is important enough, then let's introduce a feature
>> > to explicitly prioritize the pages. But let's leave the PROT_EXEC case
>> > simple.
>>
>> No.
>> An explicit prioritization mechanism doesn't solve the problem anyway. The
>> application developer doesn't know the end-user's environment,
>> so they can't mark proper page priorities.
>
> So it's simply wrong for an application to prioritize itself; that is
> not fair gaming and hence should be blamed. I doubt any application
> aimed at a wide audience will do this insane hack.

There already are.
Some applications take no interest in strict PROT_ settings.

They always use mmap(PROT_READ | PROT_WRITE | PROT_EXEC) in any case.
Please google it; you can find various examples.
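
(A minimal, hypothetical sketch of that pattern - nothing in the mapping is
ever executed, but PROT_EXEC is requested anyway, which is exactly what would
earn these data pages the AS_EXEC bonus:)

	#include <fcntl.h>
	#include <stddef.h>
	#include <sys/mman.h>
	#include <sys/stat.h>
	#include <unistd.h>

	/* map a plain data file with a sloppy "all permissions" request */
	static void *map_data_sloppily(const char *path, size_t *len)
	{
		struct stat st;
		void *p;
		int fd = open(path, O_RDWR);

		if (fd < 0)
			return MAP_FAILED;
		if (fstat(fd, &st) < 0) {
			close(fd);
			return MAP_FAILED;
		}
		*len = st.st_size;
		/* PROT_EXEC is not needed for plain data access */
		p = mmap(NULL, *len, PROT_READ | PROT_WRITE | PROT_EXEC,
			 MAP_SHARED, fd, 0);
		close(fd);
		return p;
	}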


> But specifically
> targeted applications are more likely to do whatever tricks fit
> their needs & environment, and likely they are doing so for good reasons
> and are aware of the consequences.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 11:39                                                     ` KOSAKI Motohiro
@ 2009-05-10 11:44                                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-10 11:44 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Alan Cox, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 07:39:12PM +0800, KOSAKI Motohiro wrote:
> >> > So if that's what they want, let them have it to their heart's content.
> >> >
> >> > You know it's normal for many users/apps to care only about the result.
> >> > When they want something but cannot get it from the smarter version of
> >> > PROT_EXEC heuristics, they will go on to devise more complicated tricks.
> >> >
> >> > In the end both sides lose.
> >> >
> >> > If the abused case is important enough, then let's introduce a feature
> >> > to explicitly prioritize the pages. But let's leave the PROT_EXEC case
> >> > simple.
> >>
> >> No.
> >> An explicit prioritization mechanism doesn't solve the problem anyway. The
> >> application developer doesn't know the end-user's environment,
> >> so they can't mark proper page priorities.
> >
> > So it's simply wrong for an application to prioritize itself; that is
> > not fair gaming and hence should be blamed. I doubt any application
> > aimed at a wide audience will do this insane hack.
> 
> There already are.
> Some applications take no interest in strict PROT_ settings.

> They always use mmap(PROT_READ | PROT_WRITE | PROT_EXEC) in any case.
> Please google it; you can find various examples.
 
How widely is PROT_EXEC abused? Would you share some of your google results?

Thanks,
Fengguang

> 
> > But specifically
> > targeted applications are more likely to do whatever tricks fit
> > their needs & environment, and likely they are doing so for good reasons
> > and are aware of the consequences.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 11:44                                                       ` Wu Fengguang
@ 2009-05-10 12:19                                                         ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-10 12:19 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: KOSAKI Motohiro, Alan Cox, Andrew Morton, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, 2009-05-10 at 19:44 +0800, Wu Fengguang wrote:
> 
> > They always use mmap(PROT_READ | PROT_WRITE | PROT_EXEC) in any case.
> > Please google it; you can find various examples.
>  
> How widely is PROT_EXEC abused? Would you share some of your google results?

That's a security bug right there and should be fixed regardless of our
heuristics.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 12:19                                                         ` Peter Zijlstra
@ 2009-05-10 12:39                                                           ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wu Fengguang, Alan Cox, Andrew Morton, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

>> > They always use mmap(PROT_READ | PROT_WRITE | PROT_EXEC) in any case.
>> > Please google it; you can find various examples.
>>
>> How widely is PROT_EXEC abused? Would you share some of your google results?
>
> That's a security bug right there and should be fixed regardless of our
> heuristics.

Yes, it should be. But it's not a security issue; it doesn't open any
security hole.
Plus, this claim doesn't help solve the end-user's problems.

I think the basic concept of the patch is right:
  - executable mappings are important for good latency
  - executable files are relatively small

The last problem is, the patch assumes executable mappings are rare, but
that isn't guaranteed.
How do we separate truly executable mappings from mis-used PROT_EXEC?

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 12:39                                                           ` KOSAKI Motohiro
@ 2009-05-10 13:17                                                             ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-10 13:17 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Alan Cox, Andrew Morton, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, 2009-05-10 at 21:39 +0900, KOSAKI Motohiro wrote:
> >> > They always use mmap(PROT_READ | PROT_WRITE | PROT_EXEC) in any case.
> >> > Please google it; you can find various examples.
> >>
> >> How widely is PROT_EXEC abused? Would you share some of your google results?
> >
> > That's a security bug right there and should be fixed regardless of our
> > heuristics.
> 
> Yes, it should be. But it's not a security issue; it doesn't open any
> security hole.
> Plus, this claim doesn't help solve the end-user's problems.

Having more stuff executable than absolutely needed is always a security
issue.

> I think the basic concept of the patch is right:
>   - executable mappings are important for good latency
>   - executable files are relatively small
> 
> The last problem is, the patch assumes executable mappings are rare, but
> that isn't guaranteed.
> How do we separate truly executable mappings from mis-used PROT_EXEC?

One could possibly limit the size, but I don't think it pays to bother
about this until we really run into it; again, as Andrew already said,
there are more ways to screw reclaim if you really want to.




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10  9:36                                             ` KOSAKI Motohiro
@ 2009-05-10 13:45                                               ` Alan Cox
  -1 siblings, 0 replies; 336+ messages in thread
From: Alan Cox @ 2009-05-10 13:45 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, Andrew Morton, Wu Fengguang, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

On Sun, 10 May 2009 18:36:19 +0900
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
> I don't oppose this policy. PROT_EXEC seems good viewpoint.

I don't think it is that simple

Not only can it be abused but some systems such as java have large
PROT_EXEC mapped environments, as do many other JIT based languages.

Secondly it moves the pressure from the storage volume holding the system
binaries and libraries to the swap device which already has to deal with
a lot of random (and thus expensive) I/O, as well as the user's filestore
for mapped objects there - which may even be on a USB thumbdrive.

I still think the focus is on the wrong thing. We shouldn't be trying to
micro-optimise page replacement guesswork - we should be macro-optimising
the resulting I/O performance. My disks each do 50MBytes/second and even with the
Gnome developers' finest creations that ought to be enough if the rest of
the system was working properly.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 13:45                                               ` Alan Cox
@ 2009-05-10 13:56                                                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 13:56 UTC (permalink / raw)
  To: Alan Cox
  Cc: Peter Zijlstra, Andrew Morton, Wu Fengguang, hannes, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

Hi


>> I don't oppose this policy. PROT_EXEC seems good viewpoint.
>
> I don't think it is that simple
>
> Not only can it be abused but some systems such as java have large
> PROT_EXEC mapped environments, as do many other JIT based languages.

Hmm,
I don't think this patch changes JIT behavior.
JITs make large executable _anon_ pages, but page_mapping(anon-page)
returns NULL.

Thus, the logic does nothing for them.
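
To spell that out - this is the check in question (the same hunk is
quoted later in this thread), with comments added here:

	struct address_space *mapping = page_mapping(page);

	/* Anonymous pages carry PAGE_MAPPING_ANON in page->mapping, so
	 * page_mapping() returns NULL for them and the test below is
	 * skipped: JIT-generated executable anon pages never take this
	 * branch. */
	if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
		list_add(&page->lru, &l_active);
		continue;
	}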


> Secondly it moves the pressure from the storage volume holding the system
> binaries and libraries to the swap device which already has to deal with
> a lot of random (and thus expensive) I/O, as well as the user's filestore
> for mapped objects there - which may even be on a USB thumbdrive.

True.
My SSD has high-speed random read characteristics.

> I still think the focus is on the wrong thing. We shouldn't be trying to
> micro-optimise page replacement guesswork - we should be macro-optimising
> the resulting I/O performance. My disks each do 50MBytes/second and even with the
> Gnome developers' finest creations that ought to be enough if the rest of
> the system was working properly.

Yes, measurement is essential.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-10 13:45                                               ` Alan Cox
@ 2009-05-10 14:51                                                 ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-10 14:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: KOSAKI Motohiro, Peter Zijlstra, Andrew Morton, Wu Fengguang,
	hannes, linux-kernel, tytso, linux-mm, elladan, npiggin, cl,
	minchan.kim

Alan Cox wrote:
> On Sun, 10 May 2009 18:36:19 +0900
> KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote:
>> I don't oppose this policy. PROT_EXEC seems good viewpoint.
> 
> I don't think it is that simple
> 
> Not only can it be abused but some systems such as java have large
> PROT_EXEC mapped environments, as do many other JIT based languages.

On the file LRU side, or on the anon LRU side?

> Secondly it moves the pressure from the storage volume holding the system
> binaries and libraries to the swap device which already has to deal with
> a lot of random (and thus expensive) I/O, as well as the user's filestore
> for mapped objects there - which may even be on a USB thumbdrive.

Preserving the PROT_EXEC pages over streaming IO should not
move much (if any) pressure from the file LRUs onto the
swap-backed (anon) LRUs.

> I still think the focus is on the wrong thing. We shouldn't be trying to
> micro-optimise page replacement guesswork - we should be macro-optimising
> the resulting I/O performance.

Any ideas on how to achieve that? :)

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 14:51                                                 ` Rik van Riel
@ 2009-05-10 14:59                                                   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-10 14:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, Peter Zijlstra, Andrew Morton, Wu Fengguang, hannes,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

Hi

>> Secondly it moves the pressure from the storage volume holding the system
>> binaries and libraries to the swap device which already has to deal with
>> a lot of random (and thus expensive) I/O, as well as the user's filestore
>> for mapped objects there - which may even be on a USB thumbdrive.
>
> Preserving the PROT_EXEC pages over streaming IO should not
> move much (if any) pressure from the file LRUs onto the
> swap-backed (anon) LRUs.

I don't think this is a good example.
That issue is already solved by your patch, so this patch doesn't
improve the streaming IO case.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 14:51                                                 ` Rik van Riel
@ 2009-05-10 20:13                                                   ` Alan Cox
  -1 siblings, 0 replies; 336+ messages in thread
From: Alan Cox @ 2009-05-10 20:13 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Peter Zijlstra, Andrew Morton, Wu Fengguang,
	hannes, linux-kernel, tytso, linux-mm, elladan, npiggin, cl,
	minchan.kim

> > Not only can it be abused but some systems such as java have large
> > PROT_EXEC mapped environments, as do many other JIT based languages.
> 
> On the file LRU side, or on the anon LRU side?

Generally anonymous so it would indeed be ok.

> > I still think the focus is on the wrong thing. We shouldn't be trying to
> > micro-optimise page replacement guesswork - we should be macro-optimising
> > the resulting I/O performance.
> 
> Any ideas on how to achieve that? :)

I know - vm is hard, and page out consists of making the best wrong
decision without having the facts.

Make your swap decisions depend upon I/O load on storage devices. Make
your paging decisions based upon writing and reading large contiguous
chunks (512K costs the same as 8K pretty much) - but you already know
that .

Historically BSD tackled some of this by actually swapping processes out
once pressure got very high - because even way back it actually became
cheaper at some point than spewing randomness at the disk drive. Plus it
also avoids the death by thrashing problem. Possibly however that means
the chunk size should relate to the paging rate ?

I get to watch what comes down the pipe from the vm, and it's not pretty,
especially when today's disk drive is more like swapping to a tape loop. I
can see how to fix anonymous page out (log structured swap) but I'm not
sure what that would do to anonymous page-in even with a cleaner.

At the block level it may be worth having a look at what is going on in more
detail - the bigger queues and I/O sizes on modern disks (plus the
cache flushing) also mean that the time it takes a command to get to
the head and back to the OS has probably jumped a lot with newer SATA
devices - even if the block layer is getting them queued at the head of
the queue and promptly. I can give a disk 15 seconds of work quite easily
and possibly stuffing the disk stupid isn't the right algorithm when
paging is considered.

rpm -e gnome* and Arjan's ioprio hacks seem to fix my box but that's not a
generally useful approach. I need to re-test the ioprio hacks with a
current kernel and see if the other I/O changes have helped.

Alan

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-10 20:13                                                   ` Alan Cox
@ 2009-05-10 20:37                                                     ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-10 20:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: KOSAKI Motohiro, Peter Zijlstra, Andrew Morton, Wu Fengguang,
	hannes, linux-kernel, tytso, linux-mm, elladan, npiggin, cl,
	minchan.kim

Alan Cox wrote:

> Make your swap decisions depend upon I/O load on storage devices. Make
> your paging decisions based upon writing and reading large contiguous
> chunks (512K costs the same as 8K pretty much) - but you already know
> that .

Even a 2MB chunk only takes 3x as much time to write to
or read from disk as a 4kB page.
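
(Illustrative arithmetic, using Alan's 50MBytes/second figure and an
assumed ~15ms of seek plus rotational latency: a 4kB page costs about
15ms + 0.08ms of transfer, a 2MB chunk about 15ms + 40ms = 55ms -
roughly 3-4x, with positioning time dominating either way.)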

> Historically BSD tackled some of this by actually swapping processes out
> once pressure got very high 

Our big problem today usually isn't throughput though,
but latency - the time it takes to bring a previously
inactive application back to life.

If we have any throughput related memory problems,
they often seem to be due to TLB miss penalties.

I believe it is time to start looking into transparent
use of 2MB superpages for anonymous memory (and tmpfs?)
in Linux on x86-64.

I realize the utter horror of all the different corner
cases one can have with those. However, with a careful
design the problems should be manageable and the
advantages are many.

With a reservation based system, populating a 2MB area
4kB at a time until most of the area is in use by one
process (or not), waste can be kept to a minimum.
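
Roughly the shape such a reservation could take - a sketch only, every
name below is invented for illustration:

	/* One reserved 2MB area, promoted to a superpage only once most
	 * of its 512 subpages have been populated by a single process. */
	struct huge_reservation {
		unsigned long	  pfn_base;	/* first 4kB frame of the area */
		DECLARE_BITMAP(populated, 512);	/* which 4kB slots are in use */
		struct mm_struct *owner;	/* who the area is reserved for */
	};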

I guess I'll start with this the same way I started
with the split LRU code - think of all the ways things
could possibly go wrong and come up with a design that
seems mostly impervious to the downsides.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 20:37                                                     ` Rik van Riel
@ 2009-05-10 21:23                                                       ` Arjan van de Ven
  -1 siblings, 0 replies; 336+ messages in thread
From: Arjan van de Ven @ 2009-05-10 21:23 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Alan Cox, KOSAKI Motohiro, Peter Zijlstra, Andrew Morton,
	Wu Fengguang, hannes, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, minchan.kim

On Sun, 10 May 2009 16:37:33 -0400
Rik van Riel <riel@redhat.com> wrote:

> Alan Cox wrote:
> 
> > Make your swap decisions depend upon I/O load on storage devices.
> > Make your paging decisions based upon writing and reading large
> > contiguous chunks (512K costs the same as 8K pretty much) - but you
> > already know that .
> 
> Even a 2MB chunk only takes 3x as much time to write to
> or read from disk as a 4kB page.

... if your disk rotates.
If instead it's a voltage level in a transistor... the opposite is
true... it starts to approach linear-with-size then ;-)
 
At least we know for the block device which of the two types it is
inside the kernel (ok, there's a few false positives towards rotating,
but those we could/should quirk away)
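
For reference, a sketch of how that distinction is already visible in
the kernel (blk_queue_nonrot() exists in kernels of this era; the policy
branches are made up here):

	struct request_queue *q = bdev_get_queue(bdev);

	if (q && blk_queue_nonrot(q)) {
		/* flash: random reads are cheap, small paging I/O is fine */
	} else {
		/* rotating disk: batch into large contiguous chunks */
	}

Userspace sees the same hint in /sys/block/<dev>/queue/rotational.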

> 
> > Historically BSD tackled some of this by actually swapping
> > processes out once pressure got very high 
> 
> Our big problem today usually isn't throughput though,
> but latency - the time it takes to bring a previously
> inactive application back to life.

Could we do a chain? E.g. store which page we paged out next (for the
vma) as part of the first pageout, and then page them just right back
in? Or even have a (bitmap?) of pages that have been in memory for the
vma, and on a re-fault, look for other pages "nearby" that used to be
in but are now out ?

> 
> If we have any throughput related memory problems,
> they often seem to be due to TLB miss penalties.

TLB misses are cheap on x86. For most non-HPC workloads they
tend to be hidden by out-of-order execution...

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 20:37                                                     ` Rik van Riel
@ 2009-05-10 21:29                                                       ` Alan Cox
  -1 siblings, 0 replies; 336+ messages in thread
From: Alan Cox @ 2009-05-10 21:29 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Peter Zijlstra, Andrew Morton, Wu Fengguang,
	hannes, linux-kernel, tytso, linux-mm, elladan, npiggin, cl,
	minchan.kim

> Our big problem today usually isn't throughput though,
> but latency - the time it takes to bring a previously
> inactive application back to life.

But if you page back in in 2MB chunks that is faster too. The initial "oh
dear we guessed wrong and he's clicked on OpenOffice again" we can't
really speed up (barring not paging out those bits and a little bit of
potential gain from not ramming stuff down the disk's throat at full pelt)
but the amount of time it takes after that first "run for the disk"
moment is a lot shorter. 

One question I have no idea as to the answer or any research on is "if I
take a 2MB chunk of an app's pages and toss them out together is there
sufficient statistical correlation that makes it useful to pull them back
in together"

Clearly working in 512K/2MB chunks reduces the efficiency that we get
from memory (which we have lots of) as well as improving our I/O
efficiency dramatically (which we are very short of), the question is
which dominates under load.






^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-09  6:56                                           ` Wu Fengguang
@ 2009-05-10 23:45                                             ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-10 23:45 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Minchan Kim, Peter Zijlstra, Johannes Weiner, Andrew Morton,
	Rik van Riel, linux-kernel, tytso, linux-mm, Elladan,
	Nick Piggin, Christoph Lameter, KOSAKI Motohiro,
	Lee Schermerhorn

Sorry for the late reply.

On Sat, 9 May 2009 14:56:40 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Fri, May 08, 2009 at 10:01:19PM +0800, Minchan Kim wrote:
> > On Fri, May 8, 2009 at 9:15 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > > On Fri, May 08, 2009 at 08:09:24PM +0800, Minchan Kim wrote:
> > >> On Fri, May 8, 2009 at 1:17 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > >> > On Thu, May 07, 2009 at 11:17:46PM +0800, Peter Zijlstra wrote:
> > >> >> On Thu, 2009-05-07 at 17:10 +0200, Johannes Weiner wrote:
> > >> >>
> > >> >> > > @@ -1269,8 +1270,15 @@ static void shrink_active_list(unsigned
> > >> >> > >
> > >> >> > >           /* page_referenced clears PageReferenced */
> > >> >> > >           if (page_mapping_inuse(page) &&
> > >> >> > > -             page_referenced(page, 0, sc->mem_cgroup))
> > >> >> > > +             page_referenced(page, 0, sc->mem_cgroup)) {
> > >> >> > > +                 struct address_space *mapping = page_mapping(page);
> > >> >> > > +
> > >> >> > >                   pgmoved++;
> > >> >> > > +                 if (mapping && test_bit(AS_EXEC, &mapping->flags)) {
> > >> >> > > +                         list_add(&page->lru, &l_active);
> > >> >> > > +                         continue;
> > >> >> > > +                 }
> > >> >> > > +         }
> > >> >> >
> > >> >> > Since we walk the VMAs in page_referenced anyway, wouldn't it be
> > >> >> > better to check if one of them is executable?  This would even work
> > >> >> > for executable anon pages.  After all, there are applications that cow
> > >> >> > executable mappings (sbcl and other language environments that use an
> > >> >> > executable, run-time modified core image come to mind).
> > >> >>
> > >> >> Hmm, like provide a vm_flags mask along to page_referenced() to only
> > >> >> account matching vmas... seems like a sensible idea.
> > >> >
> > >> > Here is a quick patch for your opinions. Compile tested.
> > >> >
> > >> > With the added vm_flags reporting, the mlock=>unevictable logic can
> > >> > possibly be made more straightforward.
> > >> >
> > >> > Thanks,
> > >> > Fengguang
> > >> > ---
> > >> > vmscan: report vm_flags in page_referenced()
> > >> >
> > >> > This enables more informed reclaim heuristics, eg. to protect executable
> > >> > file pages more aggressively.
> > >> >
> > >> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > >> > ---
> > >> >  include/linux/rmap.h |    5 +++--
> > >> >  mm/rmap.c            |   30 +++++++++++++++++++++---------
> > >> >  mm/vmscan.c          |    7 +++++--
> > >> >  3 files changed, 29 insertions(+), 13 deletions(-)
> > >> >
> > >> > --- linux.orig/include/linux/rmap.h
> > >> > +++ linux/include/linux/rmap.h
> > >> > @@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct
> > >> >  /*
> > >> >  * Called from mm/vmscan.c to handle paging out
> > >> >  */
> > >> > -int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
> > >> > +int page_referenced(struct page *, int is_locked,
> > >> > +                       struct mem_cgroup *cnt, unsigned long *vm_flags);
> > >> >  int try_to_unmap(struct page *, int ignore_refs);
> > >> >
> > >> >  /*
> > >> > @@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
> > >> >  #define anon_vma_prepare(vma)  (0)
> > >> >  #define anon_vma_link(vma)     do {} while (0)
> > >> >
> > >> > -#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
> > >> > +#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
> > >> >  #define try_to_unmap(page, refs) SWAP_FAIL
> > >> >
> > >> >  static inline int page_mkclean(struct page *page)
> > >> > --- linux.orig/mm/rmap.c
> > >> > +++ linux/mm/rmap.c
> > >> > @@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
> > >> >  * repeatedly from either page_referenced_anon or page_referenced_file.
> > >> >  */
> > >> >  static int page_referenced_one(struct page *page,
> > >> > -       struct vm_area_struct *vma, unsigned int *mapcount)
> > >> > +                              struct vm_area_struct *vma,
> > >> > +                              unsigned int *mapcount)
> > >> >  {
> > >> >        struct mm_struct *mm = vma->vm_mm;
> > >> >        unsigned long address;
> > >> > @@ -385,7 +386,8 @@ out:
> > >> >  }
> > >> >
> > >> >  static int page_referenced_anon(struct page *page,
> > >> > -                               struct mem_cgroup *mem_cont)
> > >> > +                               struct mem_cgroup *mem_cont,
> > >> > +                               unsigned long *vm_flags)
> > >> >  {
> > >> >        unsigned int mapcount;
> > >> >        struct anon_vma *anon_vma;
> > >> > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
> > >> >                if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> > >> >                        continue;
> > >> >                referenced += page_referenced_one(page, vma, &mapcount);
> > >> > +               *vm_flags |= vma->vm_flags;
> > >>
> > >> Sometime this vma don't contain the anon page.
> > >> That's why we need page_check_address.
> > >> For such a case, wrong *vm_flag cause be harmful to reclaim.
> > >> It can be happen in your first class citizen patch, I think.
> > >
> > > Yes I'm aware of that - the VMA area covers that page, but have no pte
> > > actually installed for that page. That should be OK - the presentation
> > > of such VMA is a good indication of it being some executable text.
> > >
> > 
> > Sorry but I can't understand your point.
> > 
> > This is general interface but not only executable text.
> > Sometime, The information of vma which don't really have the page can
> > be passed to caller.
> 
> Right. But if the caller don't care, why bother passing the vm_flags
> parameter down to page_referenced_one()? We can do that when there
> comes a need, otherwise it sounds more like unnecessary overheads.
> 
> > ex) It can be happen by COW, mremap, non-linear mapping and so on.
> > but I am not sure.
> 
> Hmm, this reminded me of the mlocked page protection logic in
> page_referenced_one(). Why shall the "if (vma->vm_flags & VM_LOCKED)"
> check be placed *after* the page_check_address() check? Is there a
> case that an *existing* page frame is not mapped to the VM_LOCKED vma?
> And why not to protect the page in such a case?


I have also been wondering about that routine.
As the annotation says, it seems meant to avoid increasing the referenced
counter for an mlocked page, so that the page moves to the unevictable
list ASAP.  Is that right?

But now page_referenced() uses the referenced variable as just a flag,
not a count, so I think the counting in the referenced variable is
meaningless.

What do you think?
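
The ordering Wu is asking about looks roughly like this (a paraphrase of
the page_referenced_one() flow, not a verbatim quote of rmap.c):

	pte = page_check_address(page, mm, address, &ptl, 0);
	if (!pte)
		goto out;	/* not mapped here: VM_LOCKED is never tested */

	if (vma->vm_flags & VM_LOCKED) {
		*mapcount = 1;	/* break early from the vma walk; referenced
				 * is deliberately not bumped, so the page
				 * can head for the unevictable list sooner */
		goto out_unmap;
	}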


-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class  citizen
  2009-05-10 21:23                                                       ` Arjan van de Ven
@ 2009-05-11 10:03                                                         ` Johannes Weiner
  -1 siblings, 0 replies; 336+ messages in thread
From: Johannes Weiner @ 2009-05-11 10:03 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Rik van Riel, Alan Cox, KOSAKI Motohiro, Peter Zijlstra,
	Andrew Morton, Wu Fengguang, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, minchan.kim

On Sun, May 10, 2009 at 02:23:22PM -0700, Arjan van de Ven wrote:
> On Sun, 10 May 2009 16:37:33 -0400
> Rik van Riel <riel@redhat.com> wrote:
> 
> > Alan Cox wrote:
> > > Historically BSD tackled some of this by actually swapping
> > > processes out once pressure got very high 
> > 
> > Our big problem today usually isn't throughput though,
> > but latency - the time it takes to bring a previously
> > inactive application back to life.
> 
> Could we do a chain? E.g. store which page we paged out next (for the
> vma) as part of the first pageout, and then page them just right back
> in? Or even have a (bitmap?) of pages that have been in memory for the
> vma, and on a re-fault, look for other pages "nearby" that used to be
> in but are now out ?

I'm not sure I understand your chaining idea.

As to the virtually-related pages, I hacked up a clustering idea for
swap-out once (and swap-in readahead should then get virtually related
pages grouped together as well) but it didn't work out as expected.

The LRU order is perhaps a better hint for access patterns than the
relationship on a virtual address level, but at the moment we fail to
keep the LRU order intact on swap so bets are off again...

I have only black-box-benchmarked performance numbers and didn't look
too close at it at the time, though.  If somebody wants to play with
it, patch is attached.

	Hannes

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3b58602..ba11dee 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1020,6 +1020,101 @@ int isolate_lru_page(struct page *page)
 	return ret;
 }
 
+static unsigned long cluster_inactive_anon_vma(struct vm_area_struct *vma,
+					struct page *page,
+					unsigned long *scanned,
+					struct list_head *cluster)
+{
+	pte_t *pte;
+	spinlock_t *ptl;
+	unsigned long va, area, start, end, nr_taken = 0, nr_scanned = 0;
+
+	va = page_address_in_vma(page, vma);
+	if (va == -EFAULT)
+		return 0;
+
+	pte = page_check_address(page, vma->vm_mm, va, &ptl, 0);
+	if (!pte)
+		return 0;
+	pte_unmap_unlock(pte, ptl);
+
+	area = page_cluster << PAGE_SHIFT;
+	start = va - area;
+	if (start < vma->vm_start)
+		start = vma->vm_start;
+	end = va + area;
+	if (end > vma->vm_end)
+		end = vma->vm_end;
+
+	for (va = start; va < end; va += PAGE_SIZE, nr_scanned++) {
+		pgd_t *pgd;
+		pud_t *pud;
+		pmd_t *pmd;
+		struct zone *zone;
+		struct page *cursor;
+
+		pgd = pgd_offset(vma->vm_mm, va);
+		if (!pgd_present(*pgd))
+			continue;
+		pud = pud_offset(pgd, va);
+		if (!pud_present(*pud))
+			continue;
+		pmd = pmd_offset(pud, va);
+		if (!pmd_present(*pmd))
+			continue;
+		pte = pte_offset_map_lock(vma->vm_mm, pmd, va, &ptl);
+		if (!pte_present(*pte)) {
+			pte_unmap_unlock(pte, ptl);
+			continue;
+		}
+		cursor = vm_normal_page(vma, va, *pte);
+		pte_unmap_unlock(pte, ptl);
+
+		if (!cursor || cursor == page)
+			continue;
+
+		zone = page_zone(cursor);
+		if (zone != page_zone(page))
+			continue;
+
+		spin_lock_irq(&zone->lru_lock);
+		if (!__isolate_lru_page(cursor, ISOLATE_INACTIVE, 0)) {
+			list_move_tail(&cursor->lru, cluster);
+			nr_taken++;
+		}
+		spin_unlock_irq(&zone->lru_lock);
+	}
+	*scanned += nr_scanned;
+	return nr_taken;
+}
+
+static unsigned long cluster_inactive_anon(struct list_head *list,
+					unsigned long *scanned)
+{
+	LIST_HEAD(cluster);
+	unsigned long nr_taken = 0, nr_scanned = 0;
+
+	while (!list_empty(list)) {
+		struct page *page;
+		struct anon_vma *anon_vma;
+		struct vm_area_struct *vma;
+
+		page = list_entry(list->next, struct page, lru);
+		list_move(&page->lru, &cluster);
+
+		anon_vma = page_lock_anon_vma(page);
+		if (!anon_vma)
+			continue;
+		list_for_each_entry(vma, &anon_vma->head, anon_vma_node)
+			nr_taken += cluster_inactive_anon_vma(vma, page,
+							&nr_scanned, &cluster);
+		page_unlock_anon_vma(anon_vma);
+	}
+	list_replace(&cluster, list);
+	*scanned += nr_scanned;
+	return nr_taken;
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
@@ -1061,6 +1156,11 @@ static unsigned long shrink_inactive_list(unsigned long max_scan,
 		nr_taken = sc->isolate_pages(sc->swap_cluster_max,
 			     &page_list, &nr_scan, sc->order, mode,
 				zone, sc->mem_cgroup, 0, file);
+		if (!file && mode == ISOLATE_INACTIVE) {
+			spin_unlock_irq(&zone->lru_lock);
+			nr_taken += cluster_inactive_anon(&page_list, &nr_scan);
+			spin_lock_irq(&zone->lru_lock);
+		}
 		nr_active = clear_active_flags(&page_list, count);
 		__count_vm_events(PGDEACTIVATE, nr_active);
 

^ permalink raw reply related	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 19:58                                     ` Andrew Morton
@ 2009-05-12  2:50                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  2:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

On Sat, May 09, 2009 at 03:58:59AM +0800, Andrew Morton wrote:
> On Fri, 8 May 2009 16:16:08 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > vmscan: make mapped executable pages the first class citizen
> > 
> > Protect referenced PROT_EXEC mapped pages from being deactivated.
> > 
> > PROT_EXEC(or its internal presentation VM_EXEC) pages normally belong to some
> > currently running executables and their linked libraries, they shall really be
> > cached aggressively to provide good user experiences.
> > 
> 
> The patch seems reasonable but the changelog and the (non-existent)
> design documentation could do with a touch-up.

Sure, I expanded the changelog a lot :-)

> > 
> > --- linux.orig/mm/vmscan.c
> > +++ linux/mm/vmscan.c
> > @@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned 
> >  	unsigned long pgscanned;
> >  	unsigned long vm_flags;
> >  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> > +	LIST_HEAD(l_active);
> >  	LIST_HEAD(l_inactive);
> >  	struct page *page;
> >  	struct pagevec pvec;
> > @@ -1272,8 +1273,13 @@ static void shrink_active_list(unsigned 
> >  
> >  		/* page_referenced clears PageReferenced */
> >  		if (page_mapping_inuse(page) &&
> > -		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
> > +		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
> >  			pgmoved++;
> > +			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> > +				list_add(&page->lru, &l_active);
> > +				continue;
> > +			}
> > +		}
> 
> What we're doing here is to identify referenced, file-backed active
> pages.  We clear their referenced bit and give them another trip around
> the active list.  So if they aren't referenced during that additional
> pass, they will get deactivated next time they are scanned, yes?  It's
> a fairly high-level design/heuristic thing which needs careful
> commenting, please.

OK. I tried to explain the logic behind the code with the following comments:

+                       /*
+                        * Identify referenced, file-backed active pages and
+                        * give them one more trip around the active list, so
+                        * that executable code gets a better chance to stay
+                        * in memory under moderate memory pressure.  Anon
+                        * pages are ignored, since a JVM can create lots of
+                        * anon VM_EXEC pages.
+                        */


> 
> Also, the change makes this comment:
> 
> 	spin_lock_irq(&zone->lru_lock);
> 	/*
> 	 * Count referenced pages from currently used mappings as
> 	 * rotated, even though they are moved to the inactive list.
> 	 * This helps balance scan pressure between file and anonymous
> 	 * pages in get_scan_ratio.
> 	 */
> 	reclaim_stat->recent_rotated[!!file] += pgmoved;
> 
> inaccurate.

Good catch, I'll just remove the stale "even though they are moved to
the inactive list".
 								
> >  		list_add(&page->lru, &l_inactive);
> >  	}
> > @@ -1282,7 +1288,6 @@ static void shrink_active_list(unsigned 
> >  	 * Move the pages to the [file or anon] inactive list.
> >  	 */
> >  	pagevec_init(&pvec, 1);
> > -	lru = LRU_BASE + file * LRU_FILE;
> >  
> >  	spin_lock_irq(&zone->lru_lock);
> >  	/*
> > @@ -1294,6 +1299,7 @@ static void shrink_active_list(unsigned 
> >  	reclaim_stat->recent_rotated[!!file] += pgmoved;
> >  
> >  	pgmoved = 0;  /* count pages moved to inactive list */
> > +	lru = LRU_BASE + file * LRU_FILE;
> >  	while (!list_empty(&l_inactive)) {
> >  		page = lru_to_page(&l_inactive);
> >  		prefetchw_prev_lru_page(page, &l_inactive, flags);
> > @@ -1316,6 +1322,29 @@ static void shrink_active_list(unsigned 
> >  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> >  	__count_zone_vm_events(PGREFILL, zone, pgscanned);
> >  	__count_vm_events(PGDEACTIVATE, pgmoved);
> > +
> > +	pgmoved = 0;  /* count pages moved back to active list */
> > +	lru = LRU_ACTIVE + file * LRU_FILE;
> > +	while (!list_empty(&l_active)) {
> > +		page = lru_to_page(&l_active);
> > +		prefetchw_prev_lru_page(page, &l_active, flags);
> > +		VM_BUG_ON(PageLRU(page));
> > +		SetPageLRU(page);
> > +		VM_BUG_ON(!PageActive(page));
> > +
> > +		list_move(&page->lru, &zone->lru[lru].list);
> > +		mem_cgroup_add_lru_list(page, lru);
> > +		pgmoved++;
> > +		if (!pagevec_add(&pvec, page)) {
> > +			spin_unlock_irq(&zone->lru_lock);
> > +			if (buffer_heads_over_limit)
> > +				pagevec_strip(&pvec);
> > +			__pagevec_release(&pvec);
> > +			spin_lock_irq(&zone->lru_lock);
> > +		}
> > +	}
> 
> The copy-n-pasting here is unfortunate.  But I expect that if we redid
> this as a loop, the result would be a bit ugly - the pageActive
> handling gets in the way.

Yup. I introduced a function for the two mostly duplicated code blocks.
 
> > +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> 
> Is it just me, or is all this stuff:
> 
> 	lru = LRU_ACTIVE + file * LRU_FILE;
> 	...
> 	foo(NR_LRU_BASE + lru);
> 
> really hard to read?

Yes, it seems hacky, but it can hardly be reduced, because the full code is

  	lru = LRU_ACTIVE + file * LRU_FILE;
  	...
        foo(lru);
        ...
  	bar(NR_LRU_BASE + lru);

> 
> Now.  How do we know that this patch improves Linux?

Hmm, it seems hard to get measurable performance numbers.

But we know that running executable code is precious and should be
protected, and the patch protects it in this way:

        before patch: will be reclaimed if not referenced in I
        after  patch: will be reclaimed if not referenced in I+A
where
        A = time to fully scan the active   file LRU
        I = time to fully scan the inactive file LRU

Note that normally A >> I.

Therefore this patch greatly prolongs the in-cache time of executable code
under moderate memory pressure.
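
To make that concrete, here is a rough worked example (the sizes and scan
rates are hypothetical, picked only to illustrate the ratio):

        I = 500MB inactive file LRU / 50MB/s scan rate   ~  10s
        A = active file LRU, scanned roughly an order of
            magnitude more slowly                        ~ 100s

so an executable page now has ~110s instead of ~10s to get re-referenced
before it becomes eligible for eviction.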


The three updated patches follow.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH -mm] vmscan: report vm_flags in page_referenced()
  2009-05-08 19:58                                     ` Andrew Morton
@ 2009-05-12  2:51                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  2:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

Collect vma->vm_flags from every VMA that page_referenced() walks.
Note that the walked VMAs
- may not actually have the page installed in their PTEs
- may not be the complete set of VMAs that map the page

This is a preparation for more informed reclaim heuristics, e.g. to
protect executable file pages more aggressively.

For now only VM_EXEC is used by the caller, in which case we don't care
whether the VMAs actually have the page installed in their PTEs. The fact
that the page is covered by some VM_EXEC VMA, and therefore is likely part
of an executable's text section, is enough information.
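
As a minimal sketch of the intended caller-side usage (keep_executable()
is a hypothetical placeholder; the real consumer is the VM_EXEC patch
elsewhere in this series):

	unsigned long vm_flags;
	int referenced;

	/* walks the rmap, clears referenced bits, ORs vma->vm_flags in */
	referenced = page_referenced(page, 0, sc->mem_cgroup, &vm_flags);
	if (referenced && (vm_flags & VM_EXEC) && !PageAnon(page))
		keep_executable(page);	/* e.g. keep it on the active list */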

CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 include/linux/rmap.h |    5 +++--
 mm/rmap.c            |   30 +++++++++++++++++++++---------
 mm/vmscan.c          |    7 +++++--
 3 files changed, 29 insertions(+), 13 deletions(-)

--- linux.orig/include/linux/rmap.h
+++ linux/include/linux/rmap.h
@@ -83,7 +83,8 @@ static inline void page_dup_rmap(struct 
 /*
  * Called from mm/vmscan.c to handle paging out
  */
-int page_referenced(struct page *, int is_locked, struct mem_cgroup *cnt);
+int page_referenced(struct page *, int is_locked,
+			struct mem_cgroup *cnt, unsigned long *vm_flags);
 int try_to_unmap(struct page *, int ignore_refs);
 
 /*
@@ -128,7 +129,7 @@ int page_wrprotect(struct page *page, in
 #define anon_vma_prepare(vma)	(0)
 #define anon_vma_link(vma)	do {} while (0)
 
-#define page_referenced(page,l,cnt) TestClearPageReferenced(page)
+#define page_referenced(page, locked, cnt, flags) TestClearPageReferenced(page)
 #define try_to_unmap(page, refs) SWAP_FAIL
 
 static inline int page_mkclean(struct page *page)
--- linux.orig/mm/rmap.c
+++ linux/mm/rmap.c
@@ -333,7 +333,8 @@ static int page_mapped_in_vma(struct pag
  * repeatedly from either page_referenced_anon or page_referenced_file.
  */
 static int page_referenced_one(struct page *page,
-	struct vm_area_struct *vma, unsigned int *mapcount)
+			       struct vm_area_struct *vma,
+			       unsigned int *mapcount)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -385,7 +386,8 @@ out:
 }
 
 static int page_referenced_anon(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct anon_vma *anon_vma;
@@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
 		referenced += page_referenced_one(page, vma, &mapcount);
+		*vm_flags |= vma->vm_flags;
 		if (!mapcount)
 			break;
 	}
@@ -418,6 +421,7 @@ static int page_referenced_anon(struct p
  * page_referenced_file - referenced check for object-based rmap
  * @page: the page we're checking references on.
  * @mem_cont: target memory controller
+ * @vm_flags: collect encountered vma->vm_flags
  *
  * For an object-based mapped page, find all the places it is mapped and
  * check/clear the referenced flag.  This is done by following the page->mapping
@@ -427,7 +431,8 @@ static int page_referenced_anon(struct p
  * This function is only called from page_referenced for object-based pages.
  */
 static int page_referenced_file(struct page *page,
-				struct mem_cgroup *mem_cont)
+				struct mem_cgroup *mem_cont,
+				unsigned long *vm_flags)
 {
 	unsigned int mapcount;
 	struct address_space *mapping = page->mapping;
@@ -468,6 +473,7 @@ static int page_referenced_file(struct p
 		if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
 			continue;
 		referenced += page_referenced_one(page, vma, &mapcount);
+		*vm_flags |= vma->vm_flags;
 		if (!mapcount)
 			break;
 	}
@@ -481,29 +487,35 @@ static int page_referenced_file(struct p
  * @page: the page to test
  * @is_locked: caller holds lock on the page
  * @mem_cont: target memory controller
+ * @vm_flags: collect encountered vma->vm_flags
  *
  * Quick test_and_clear_referenced for all mappings to a page,
  * returns the number of ptes which referenced the page.
  */
-int page_referenced(struct page *page, int is_locked,
-			struct mem_cgroup *mem_cont)
+int page_referenced(struct page *page,
+		    int is_locked,
+		    struct mem_cgroup *mem_cont,
+		    unsigned long *vm_flags)
 {
 	int referenced = 0;
 
 	if (TestClearPageReferenced(page))
 		referenced++;
 
+	*vm_flags = 0;
 	if (page_mapped(page) && page->mapping) {
 		if (PageAnon(page))
-			referenced += page_referenced_anon(page, mem_cont);
+			referenced += page_referenced_anon(page, mem_cont,
+								vm_flags);
 		else if (is_locked)
-			referenced += page_referenced_file(page, mem_cont);
+			referenced += page_referenced_file(page, mem_cont,
+								vm_flags);
 		else if (!trylock_page(page))
 			referenced++;
 		else {
 			if (page->mapping)
-				referenced +=
-					page_referenced_file(page, mem_cont);
+				referenced += page_referenced_file(page,
+							mem_cont, vm_flags);
 			unlock_page(page);
 		}
 	}
--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -598,6 +598,7 @@ static unsigned long shrink_page_list(st
 	struct pagevec freed_pvec;
 	int pgactivate = 0;
 	unsigned long nr_reclaimed = 0;
+	unsigned long vm_flags;
 
 	cond_resched();
 
@@ -648,7 +649,8 @@ static unsigned long shrink_page_list(st
 				goto keep_locked;
 		}
 
-		referenced = page_referenced(page, 1, sc->mem_cgroup);
+		referenced = page_referenced(page, 1,
+						sc->mem_cgroup, &vm_flags);
 		/* In active use or really unfreeable?  Activate it. */
 		if (sc->order <= PAGE_ALLOC_COSTLY_ORDER &&
 					referenced && page_mapping_inuse(page))
@@ -1229,6 +1231,7 @@ static void shrink_active_list(unsigned 
 {
 	unsigned long pgmoved;
 	unsigned long pgscanned;
+	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
 	LIST_HEAD(l_inactive);
 	struct page *page;
@@ -1269,7 +1272,7 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
 			pgmoved++;
 
 		list_add(&page->lru, &l_inactive);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-08 19:58                                     ` Andrew Morton
@ 2009-05-12  2:52                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  2:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

Protect referenced PROT_EXEC mapped pages from being deactivated.

PROT_EXEC (or its internal representation VM_EXEC) pages normally belong to
currently running executables and their linked libraries; they should be
cached aggressively to provide a good user experience.

Thanks to Johannes Weiner for the advice to reuse the VMA walk in
page_referenced() to get the PROT_EXEC bit.


[more details]

( The consequences of this patch will have to be discussed together with
  Rik van Riel's recent patch "vmscan: evict use-once pages first". )

( Some of the good points and insights are taken into this changelog.
  Thanks to all the involved people for the great LKML discussions. )

the problem
-----------

For a typical desktop, the most precious working set is composed of
*actively accessed*
	(1) memory mapped executables
	(2) and their anonymous pages
	(3) and other files
	(4) and the dcache/icache/.. slabs
while the least important data are
	(5) infrequently used or use-once files

For a typical desktop, one major problem is bursty, large amounts of (5)
use-once files flushing out the working set.

Inside the working set, (4) dcache/icache have already been too sticky ;-)
So we only have to care about (2) anonymous pages and (1)(3) file pages.

anonymous pages
---------------
Anonymous pages are effectively immune to the streaming IO attack, because we
now have separate file/anon LRU lists. When the use-once files crowd into the
file LRU, the list's "quality" is significantly lowered. Therefore the scan
balance policy in get_scan_ratio() will choose to scan the (low quality) file
LRU much more frequently than the anon LRU.
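
A simplified sketch of that balancing idea (this is a paraphrase, not the
exact get_scan_ratio() code, which also folds in swappiness and smoothing):

	/*
	 * Scan pressure goes to the list whose pages were rarely
	 * re-referenced (rotated) when last scanned.  A file LRU full
	 * of use-once pages rotates rarely, so it draws the pressure.
	 */
	ap = anon_prio * (recent_scanned[anon] + 1) /
			 (recent_rotated[anon] + 1);
	fp = file_prio * (recent_scanned[file] + 1) /
			 (recent_rotated[file] + 1);
	/* then scan the two lists in proportion ap : fp */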

file pages
----------
Rik proposed to *not* scan the active file LRU when the inactive list grows
larger than the active list. This guarantees that when there is use-once
streaming IO, and the working set is not too large (so that active_size <
inactive_size), the active file LRU will *not* be scanned at all. So the
not-too-large working set can be well protected; the sketch below shows the
gist of that check.
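
The gist of that check, as a sketch (a simplified illustration; the real
test lives in Rik's patch, not this one):

	/*
	 * Give the active file list a pass as long as the inactive
	 * file list is still the larger of the two.
	 */
	if (zone_page_state(zone, NR_INACTIVE_FILE) >=
	    zone_page_state(zone, NR_ACTIVE_FILE))
		return;		/* don't deactivate more file pages yet */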

But there are also situations where the file working set is a bit larger
(so that active_size >= inactive_size), or the streaming IOs are not purely
use-once. In these cases the active list will be scanned, slowly, because
the current shrink_active_list() policy is to deactivate active pages
regardless of their referenced bits. The deactivated pages then become
susceptible to the streaming IO attack: the inactive list could be scanned
fast (500MB / 50MBps = 10s), so the deactivated pages don't get enough time
to be re-referenced, because a user tends to switch between windows at
intervals ranging from seconds to minutes.

This patch holds mapped executable pages in the active list as long as they
are referenced during each full scan of the active list.  Because the active
list is normally scanned much more slowly, they get a longer grace time
(e.g. 100s) for further references, which better matches the pace of user
operations.

side effects
------------

This patch is safe in general; it restores the pre-2.6.28 mmap() behavior,
but in a much smaller and better targeted scope.

One may worry that someone could abuse the PROT_EXEC heuristic.  But as
Andrew Morton stated, there are other tricks for getting that sort of boost.

Another concern is the PROT_EXEC mapped pages growing large in rare cases,
and therefore hurting reclaim efficiency.  But a sane application targeted
at a large audience will never use PROT_EXEC for data mappings.  If some
home-made application tries to abuse that bit, it should be aware of the
consequences, which won't be disastrous even in the worst case.
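
For reference, the kind of mapping the heuristic protects vs. the kind that
would abuse it, as a userspace sketch (plain mmap(2) usage, not code from
this patch; len and fd are assumed to be set up elsewhere):

	/* what the dynamic linker does for library text: protected */
	void *text = mmap(NULL, len, PROT_READ | PROT_EXEC,
			  MAP_PRIVATE, fd, 0);

	/* a data mapping abusing the heuristic: don't do this */
	void *data = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
			  MAP_PRIVATE, fd, 0);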

CC: Elladan <elladan@eskimo.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned 
 	unsigned long pgscanned;
 	unsigned long vm_flags;
 	LIST_HEAD(l_hold);	/* The pages which were snipped off */
+	LIST_HEAD(l_active);
 	LIST_HEAD(l_inactive);
 	struct page *page;
 	struct pagevec pvec;
@@ -1272,8 +1273,21 @@ static void shrink_active_list(unsigned 
 
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
-		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
+		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
 			pgmoved++;
+			/*
+			 * Identify referenced, file-backed active pages and
+			 * give them one more trip around the active list, so
+			 * that executable code gets a better chance to stay
+			 * in memory under moderate memory pressure.  Anon
+			 * pages are ignored, since a JVM can create lots of
+			 * anon VM_EXEC pages.
+			 */
+			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
+				list_add(&page->lru, &l_active);
+				continue;
+			}
+		}
 
 		list_add(&page->lru, &l_inactive);
 	}
@@ -1282,7 +1296,6 @@ static void shrink_active_list(unsigned 
 	 * Move the pages to the [file or anon] inactive list.
 	 */
 	pagevec_init(&pvec, 1);
-	lru = LRU_BASE + file * LRU_FILE;
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
@@ -1294,6 +1307,7 @@ static void shrink_active_list(unsigned 
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
 	pgmoved = 0;  /* count pages moved to inactive list */
+	lru = LRU_BASE + file * LRU_FILE;
 	while (!list_empty(&l_inactive)) {
 		page = lru_to_page(&l_inactive);
 		prefetchw_prev_lru_page(page, &l_inactive, flags);
@@ -1316,6 +1330,29 @@ static void shrink_active_list(unsigned 
 	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgmoved);
+
+	pgmoved = 0;  /* count pages moved back to active list */
+	lru = LRU_ACTIVE + file * LRU_FILE;
+	while (!list_empty(&l_active)) {
+		page = lru_to_page(&l_active);
+		prefetchw_prev_lru_page(page, &l_active, flags);
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+		VM_BUG_ON(!PageActive(page));
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(&pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(&pvec);
+			__pagevec_release(&pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
 		pagevec_strip(&pvec);

^ permalink raw reply	[flat|nested] 336+ messages in thread

* [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-08 19:58                                     ` Andrew Morton
@ 2009-05-12  2:53                                       ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  2:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 49 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+void move_active_pages_to_lru(struct zone *zone, struct list_head *list,
+			      struct pagevec *pvec, enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct page *page;
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (!is_active_lru(lru))
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(pvec);
+			__pagevec_release(pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (!is_active_lru(lru))
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1254,6 +1284,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1293,65 +1324,20 @@ static void shrink_active_list(unsigned 
 	}
 
 	/*
-	 * Move the pages to the [file or anon] inactive list.
+	 * Move pages back to the lru list.
 	 */
 	pagevec_init(&pvec, 1);
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
-	 * Count referenced pages from currently used mappings as
-	 * rotated, even though they are moved to the inactive list.
+	 * Count referenced pages from currently used mappings as rotated.
 	 * This helps balance scan pressure between file and anonymous
 	 * pages in get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(zone, &l_active, &pvec,
+				 LRU_ACTIVE + file * LRU_FILE);
+	move_active_pages_to_lru(zone, &l_inactive, &pvec,
+				 LRU_BASE + file * LRU_FILE);
 
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12  2:53                                       ` Wu Fengguang
@ 2009-05-12  2:58                                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12  2:58 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Andrew Morton, hannes, peterz, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

> +void move_active_pages_to_lru(struct zone *zone, struct list_head *list,
> +			      struct pagevec *pvec, enum lru_list lru)

it can be static?

> +{
> +	unsigned long pgmoved = 0;
> +	struct page *page;
> +
> +	while (!list_empty(list)) {
> +		page = lru_to_page(list);
> +		prefetchw_prev_lru_page(page, list, flags);
> +
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +
> +		VM_BUG_ON(!PageActive(page));
> +		if (!is_active_lru(lru))
> +			ClearPageActive(page);	/* we are de-activating */
> +
> +		list_move(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_add_lru_list(page, lru);
> +		pgmoved++;
> +		if (!pagevec_add(pvec, page)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			if (buffer_heads_over_limit)
> +				pagevec_strip(pvec);
> +			__pagevec_release(pvec);
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +	}
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +	if (!is_active_lru(lru))
> +		__count_vm_events(PGDEACTIVATE, pgmoved);
> +}



^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-12  2:52                                       ` Wu Fengguang
@ 2009-05-12  3:00                                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-12  3:00 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: kosaki.motohiro, Andrew Morton, hannes, peterz, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, cl, minchan.kim

> side effects
> ------------
> 
> This patch is safe in general, it restores the pre-2.6.28 mmap() behavior
> but in a much smaller and well targeted scope.
> 
> One may worry about some one to abuse the PROT_EXEC heuristic.  But as
> Andrew Morton stated, there are other tricks to getting that sort of boost.
> 
> Another concern is the PROT_EXEC mapped pages growing large in rare cases,
> and therefore hurting reclaim efficiency. But a sane application targeted for
> large audience will never use PROT_EXEC for data mappings. If some home made
> application tries to abuse that bit, it shall be aware of the consequences,
> which won't be disastrous even in the worst case.

ok, I lose.
if you can post good measurement results, I'll ack this.




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12  2:58                                         ` KOSAKI Motohiro
@ 2009-05-12  3:03                                           ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  3:03 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, minchan.kim

On Tue, May 12, 2009 at 10:58:28AM +0800, KOSAKI Motohiro wrote:
> > +void move_active_pages_to_lru(struct zone *zone, struct list_head *list,
> > +			      struct pagevec *pvec, enum lru_list lru)
> 
> it can be static?

Thanks, here's the updated patch.

---
vmscan: merge duplicate code in shrink_active_list()

The "move pages to active list" and "move pages to inactive list"
code blocks are mostly identical and can be served by a function.

Thanks to Andrew Morton for pointing this out.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
 1 file changed, 35 insertions(+), 49 deletions(-)

--- linux.orig/mm/vmscan.c
+++ linux/mm/vmscan.c
@@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
  * But we had to alter page->flags anyway.
  */
 
+static void move_active_pages_to_lru(struct zone *zone,
+				     struct list_head *list,
+				     struct pagevec *pvec,
+				     enum lru_list lru)
+{
+	unsigned long pgmoved = 0;
+	struct page *page;
+
+	while (!list_empty(list)) {
+		page = lru_to_page(list);
+		prefetchw_prev_lru_page(page, list, flags);
+
+		VM_BUG_ON(PageLRU(page));
+		SetPageLRU(page);
+
+		VM_BUG_ON(!PageActive(page));
+		if (!is_active_lru(lru))
+			ClearPageActive(page);	/* we are de-activating */
+
+		list_move(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_add_lru_list(page, lru);
+		pgmoved++;
+		if (!pagevec_add(pvec, page)) {
+			spin_unlock_irq(&zone->lru_lock);
+			if (buffer_heads_over_limit)
+				pagevec_strip(pvec);
+			__pagevec_release(pvec);
+			spin_lock_irq(&zone->lru_lock);
+		}
+	}
+	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	if (!is_active_lru(lru))
+		__count_vm_events(PGDEACTIVATE, pgmoved);
+}
 
 static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 			struct scan_control *sc, int priority, int file)
@@ -1254,6 +1284,7 @@ static void shrink_active_list(unsigned 
 	}
 	reclaim_stat->recent_scanned[!!file] += pgmoved;
 
+	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (file)
 		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
 	else
@@ -1293,65 +1324,20 @@ static void shrink_active_list(unsigned 
 	}
 
 	/*
-	 * Move the pages to the [file or anon] inactive list.
+	 * Move pages back to the lru list.
 	 */
 	pagevec_init(&pvec, 1);
 
 	spin_lock_irq(&zone->lru_lock);
 	/*
-	 * Count referenced pages from currently used mappings as
-	 * rotated, even though they are moved to the inactive list.
+	 * Count referenced pages from currently used mappings as rotated.
 	 * This helps balance scan pressure between file and anonymous
 	 * pages in get_scan_ratio.
 	 */
 	reclaim_stat->recent_rotated[!!file] += pgmoved;
 
-	pgmoved = 0;  /* count pages moved to inactive list */
-	lru = LRU_BASE + file * LRU_FILE;
-	while (!list_empty(&l_inactive)) {
-		page = lru_to_page(&l_inactive);
-		prefetchw_prev_lru_page(page, &l_inactive, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-		ClearPageActive(page);
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
-	__count_zone_vm_events(PGREFILL, zone, pgscanned);
-	__count_vm_events(PGDEACTIVATE, pgmoved);
-
-	pgmoved = 0;  /* count pages moved back to active list */
-	lru = LRU_ACTIVE + file * LRU_FILE;
-	while (!list_empty(&l_active)) {
-		page = lru_to_page(&l_active);
-		prefetchw_prev_lru_page(page, &l_active, flags);
-		VM_BUG_ON(PageLRU(page));
-		SetPageLRU(page);
-		VM_BUG_ON(!PageActive(page));
-
-		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_add_lru_list(page, lru);
-		pgmoved++;
-		if (!pagevec_add(&pvec, page)) {
-			spin_unlock_irq(&zone->lru_lock);
-			if (buffer_heads_over_limit)
-				pagevec_strip(&pvec);
-			__pagevec_release(&pvec);
-			spin_lock_irq(&zone->lru_lock);
-		}
-	}
-	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
+	move_active_pages_to_lru(LRU_ACTIVE + file * LRU_FILE, &l_active);
+	move_active_pages_to_lru(LRU_BASE   + file * LRU_FILE, &l_inactive);
 
 	spin_unlock_irq(&zone->lru_lock);
 	if (buffer_heads_over_limit)
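
As posted, the helper above will not build stand-alone: it still picks
up page, zone and pvec from its old surroundings, and list_empty(&list)
/ lru_to_page(&list) take the address of a parameter that is already a
pointer.  A self-contained sketch of the intended shape -- with the
static from the review folded in, zone passed explicitly, locals
declared, and is_active_lru() from mm_inline.h standing in for the
lru < LRU_ACTIVE test (era-typical APIs assumed, so treat this as an
illustration rather than the merged patch):

  static void move_active_pages_to_lru(struct zone *zone,
  				     struct list_head *list,
  				     enum lru_list lru)
  {
  	unsigned long pgmoved = 0;
  	struct pagevec pvec;
  	struct page *page;
  
  	pagevec_init(&pvec, 1);
  
  	while (!list_empty(list)) {
  		page = lru_to_page(list);
  		prefetchw_prev_lru_page(page, list, flags);
  
  		VM_BUG_ON(PageLRU(page));
  		SetPageLRU(page);
  
  		VM_BUG_ON(!PageActive(page));
  		if (!is_active_lru(lru))
  			ClearPageActive(page);	/* deactivating */
  
  		list_move(&page->lru, &zone->lru[lru].list);
  		mem_cgroup_add_lru_list(page, lru);
  		pgmoved++;
  
  		/* flush on overflow and once the list is drained */
  		if (!pagevec_add(&pvec, page) || list_empty(list)) {
  			spin_unlock_irq(&zone->lru_lock);
  			if (buffer_heads_over_limit)
  				pagevec_strip(&pvec);
  			__pagevec_release(&pvec);
  			spin_lock_irq(&zone->lru_lock);
  		}
  	}
  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
  	if (!is_active_lru(lru))
  		__count_vm_events(PGDEACTIVATE, pgmoved);
  }

Flushing the pagevec when the list drains (the || list_empty(list)
term) lets the helper own its pagevec instead of sharing the caller's.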

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-12  2:50                                       ` Wu Fengguang
@ 2009-05-12  4:35                                         ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12  4:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: hannes, peterz, riel, linux-kernel, tytso, linux-mm, elladan,
	npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, May 12, 2009 at 10:50:58AM +0800, Wu Fengguang wrote:
> > Now.  How do we know that this patch improves Linux?
> 
> Hmm, it seems hard to get measurable performance numbers.
> 
> But we know that the running executable code is precious and shall be
> protected, and the patch protects it in this way:
> 
>         before patch: will be reclaimed if not referenced in I
>         after  patch: will be reclaimed if not referenced in I+A

s/will/may/, to be more exact.

> where
>         A = time to fully scan the active   file LRU
>         I = time to fully scan the inactive file LRU
> 
> Note that normally A >> I.
> 
> Therefore this patch greatly prolongs the in-cache time of executable code,
> when there are moderate memory pressures.
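
Putting the thread's own numbers on this (the 500MB / 50MB/s figures
are from the changelog; the active-list figure below is purely assumed
for illustration):

	I     = 500MB inactive list / 50MB/s        ~=  10s
	A     = active list scan, assumed ~9x I     ~=  90s
	I + A                                       ~= 100s

which is consistent with the changelog's grace time of "eg. 100s": on
the order of how often a user revisits an idle window, where 10s is not.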

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: report vm_flags in page_referenced()
  2009-05-12  2:51                                       ` Wu Fengguang
@ 2009-05-12  6:23                                         ` Peter Zijlstra
  -1 siblings, 0 replies; 336+ messages in thread
From: Peter Zijlstra @ 2009-05-12  6:23 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, hannes, riel, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, 2009-05-12 at 10:51 +0800, Wu Fengguang wrote:
> @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
>                 if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
>                         continue;
>                 referenced += page_referenced_one(page, vma, &mapcount);
> +               *vm_flags |= vma->vm_flags;
>                 if (!mapcount)
>                         break;
>         }

Shouldn't that read:

  if (page_referenced_one(page, vma, &mapcount)) {
    referenced++;
    *vm_flags |= vma->vm_flags;
  }

So that we only add the vma-flags of those vmas that actually have a
young bit set?

In which case it'd be more at home in page_referenced_one():

@@ -381,6 +381,8 @@ out_unmap:
 	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
 out:
+	if (referenced)
+		*vm_flags |= vma->vm_flags;
 	return referenced;
 }
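
If the |= moves into page_referenced_one(), that function presumably
grows a vm_flags argument, and the anon walker stays a plain
accumulation -- a sketch of the resulting loop (loop shape per the
era's page_referenced_anon(); the extra argument is an assumption):

  list_for_each_entry(vma, &anon_vma->head, anon_vma_node) {
  	if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
  		continue;
  	/* page_referenced_one() ORs vma->vm_flags into *vm_flags
  	 * only when it actually found the page referenced here */
  	referenced += page_referenced_one(page, vma, &mapcount, vm_flags);
  	if (!mapcount)
  		break;
  }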
 

Otherwise seems like a nice series, but please post the next version in
a new thread, this one is getting a bit unwieldy ;-)

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: report vm_flags in page_referenced()
  2009-05-12  6:23                                         ` Peter Zijlstra
@ 2009-05-12  6:44                                           ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-12  6:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Wu Fengguang, Andrew Morton, hannes, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, 12 May 2009 08:23:09 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, 2009-05-12 at 10:51 +0800, Wu Fengguang wrote:
> > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
> >                 if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> >                         continue;
> >                 referenced += page_referenced_one(page, vma, &mapcount);
> > +               *vm_flags |= vma->vm_flags;
> >                 if (!mapcount)
> >                         break;
> >         }
> 
> Shouldn't that read:
> 
>   if (page_referenced_one(page, vma, &mapcount)) {
>     referenced++;
>     *vm_flags |= vma->vm_flags;
>   }
> 
> So that we only add the vma-flags of those vmas that actually have a
> young bit set?
> 
> In which case it'd be more at home in page_referenced_one():
> 
> @@ -381,6 +381,8 @@ out_unmap:
>  	(*mapcount)--;
>  	pte_unmap_unlock(pte, ptl);
>  out:
> +	if (referenced)
> +		*vm_flags |= vma->vm_flags;
>  	return referenced;
>  }

Good. I ACK Peter's suggestion.
It prevents picking up vm_flags from the wrong VMA, one that doesn't actually have the page.

-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12  2:53                                       ` Wu Fengguang
@ 2009-05-12  7:26                                         ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-12  7:26 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, 12 May 2009 10:53:19 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> The "move pages to active list" and "move pages to inactive list"
> code blocks are mostly identical and can be served by a function.
> 
> Thanks to Andrew Morton for pointing this out.
> 
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
>  1 file changed, 35 insertions(+), 49 deletions(-)
> 
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
>   * But we had to alter page->flags anyway.
>   */
>  
> +void move_active_pages_to_lru(enum lru_list lru, struct list_head *list)
> +{
> +	unsigned long pgmoved = 0;
> +
> +	while (!list_empty(&list)) {
> +		page = lru_to_page(&list);
> +		prefetchw_prev_lru_page(page, &list, flags);
> +
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +
> +		VM_BUG_ON(!PageActive(page));
> +		if (lru < LRU_ACTIVE)
> +			ClearPageActive(page);

Arithmetic on the LRU list index is not good for readability, I think.
How about adding a comment?

if (lru < LRU_ACTIVE) /* moving from the active list to the inactive one */

Ignore me if you think this is trivial.
Either way is okay with me.
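
For reference, the index arithmetic leans on the split-LRU layout -- a
sketch of the usual definitions of this era from include/linux/mmzone.h
(worth double-checking against the exact tree):

  #define LRU_BASE	0
  #define LRU_ACTIVE	1
  #define LRU_FILE	2

  enum lru_list {
  	LRU_INACTIVE_ANON = LRU_BASE,
  	LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,
  	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
  	LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE,
  	LRU_UNEVICTABLE,
  	NR_LRU_LISTS
  };

If these are the definitions in play, lru < LRU_ACTIVE holds only for
LRU_INACTIVE_ANON (0): LRU_INACTIVE_FILE is 2, so the ClearPageActive()
branch above would be skipped for inactive file pages.  An explicit
is_active_lru() test from mm_inline.h would settle both the readability
and that worry.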

> +
> +		list_move(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_add_lru_list(page, lru);
> +		pgmoved++;
> +		if (!pagevec_add(&pvec, page)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			if (buffer_heads_over_limit)
> +				pagevec_strip(&pvec);
> +			__pagevec_release(&pvec);
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +	}
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +	if (lru < LRU_ACTIVE)
> +		__count_vm_events(PGDEACTIVATE, pgmoved);
> +}
>  
>  static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
>  			struct scan_control *sc, int priority, int file)
> @@ -1254,6 +1284,7 @@ static void shrink_active_list(unsigned 
>  	}
>  	reclaim_stat->recent_scanned[!!file] += pgmoved;
>  
> +	__count_zone_vm_events(PGREFILL, zone, pgscanned);
>  	if (file)
>  		__mod_zone_page_state(zone, NR_ACTIVE_FILE, -pgmoved);
>  	else
> @@ -1293,65 +1324,20 @@ static void shrink_active_list(unsigned 
>  	}
>  
>  	/*
> -	 * Move the pages to the [file or anon] inactive list.
> +	 * Move pages back to the lru list.
>  	 */
>  	pagevec_init(&pvec, 1);
>  
>  	spin_lock_irq(&zone->lru_lock);
>  	/*
> -	 * Count referenced pages from currently used mappings as
> -	 * rotated, even though they are moved to the inactive list.
> +	 * Count referenced pages from currently used mappings as rotated.
>  	 * This helps balance scan pressure between file and anonymous
>  	 * pages in get_scan_ratio.
>  	 */
>  	reclaim_stat->recent_rotated[!!file] += pgmoved;
>  
> -	pgmoved = 0;  /* count pages moved to inactive list */
> -	lru = LRU_BASE + file * LRU_FILE;
> -	while (!list_empty(&l_inactive)) {
> -		page = lru_to_page(&l_inactive);
> -		prefetchw_prev_lru_page(page, &l_inactive, flags);
> -		VM_BUG_ON(PageLRU(page));
> -		SetPageLRU(page);
> -		VM_BUG_ON(!PageActive(page));
> -		ClearPageActive(page);
> -
> -		list_move(&page->lru, &zone->lru[lru].list);
> -		mem_cgroup_add_lru_list(page, lru);
> -		pgmoved++;
> -		if (!pagevec_add(&pvec, page)) {
> -			spin_unlock_irq(&zone->lru_lock);
> -			if (buffer_heads_over_limit)
> -				pagevec_strip(&pvec);
> -			__pagevec_release(&pvec);
> -			spin_lock_irq(&zone->lru_lock);
> -		}
> -	}
> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> -	__count_zone_vm_events(PGREFILL, zone, pgscanned);
> -	__count_vm_events(PGDEACTIVATE, pgmoved);
> -
> -	pgmoved = 0;  /* count pages moved back to active list */
> -	lru = LRU_ACTIVE + file * LRU_FILE;
> -	while (!list_empty(&l_active)) {
> -		page = lru_to_page(&l_active);
> -		prefetchw_prev_lru_page(page, &l_active, flags);
> -		VM_BUG_ON(PageLRU(page));
> -		SetPageLRU(page);
> -		VM_BUG_ON(!PageActive(page));
> -
> -		list_move(&page->lru, &zone->lru[lru].list);
> -		mem_cgroup_add_lru_list(page, lru);
> -		pgmoved++;
> -		if (!pagevec_add(&pvec, page)) {
> -			spin_unlock_irq(&zone->lru_lock);
> -			if (buffer_heads_over_limit)
> -				pagevec_strip(&pvec);
> -			__pagevec_release(&pvec);
> -			spin_lock_irq(&zone->lru_lock);
> -		}
> -	}
> -	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +	move_active_pages_to_lru(LRU_ACTIVE + file * LRU_FILE, &l_active);
> +	move_active_pages_to_lru(LRU_BASE   + file * LRU_FILE, &l_inactive);
>  
>  	spin_unlock_irq(&zone->lru_lock);
>  	if (buffer_heads_over_limit)


-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-12  2:52                                       ` Wu Fengguang
@ 2009-05-12  8:17                                         ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-12  8:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, 12 May 2009 10:52:46 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

That is a great explanation. :)

Now we just need some numbers.
But, as you know, it is difficult to get numbers for all the various workloads.

I don't know whether that is your job or ours.
The MM tree is never entirely stable anyway; it is the place to test freely
against various workloads.

If we can at least justify the patch, we can merge it once and test it there.
(Of course, that depends on Andrew.)
After long testing without any regressions, we can merge this into Linus' tree.

I think this patch is enough.

Wu Fengguang,
please resend this patch series with the 'merge duplicate code in
shrink_active_list' patch updated.

Thanks for your great effort. :)

> Protect referenced PROT_EXEC mapped pages from being deactivated.
> 
> PROT_EXEC (or its internal representation VM_EXEC) pages normally belong to
> currently running executables and their linked libraries; they really should
> be cached aggressively to provide a good user experience.
> 
> Thanks to Johannes Weiner for the advice to reuse the VMA walk in
> page_referenced() to get the PROT_EXEC bit.
> 
> 
> [more details]
> 
> ( The consequences of this patch will have to be discussed together with
>   Rik van Riel's recent patch "vmscan: evict use-once pages first". )
> 
> ( Some of the good points and insights are taken into this changelog.
>   Thanks to all the involved people for the great LKML discussions. )
> 
> the problem
> -----------
> 
> For a typical desktop, the most precious working set is composed of
> *actively accessed*
> 	(1) memory mapped executables
> 	(2) and their anonymous pages
> 	(3) and other files
> 	(4) and the dcache/icache/.. slabs
> while the least important data are
> 	(5) infrequently used or use-once files
> 
> For a typical desktop, one major problem is bursty and large amounts of (5)
> use-once files flushing out the working set.
> 
> Inside the working set, (4) dcache/icache have already been too sticky ;-)
> So we only have to care about (2) anonymous and (1)(3) file pages.
> 
> anonymous pages
> ---------------
> Anonymous pages are effectively immune to the streaming IO attack, because we
> now have separate file/anon LRU lists. When the use-once files crowd into the
> file LRU, the list's "quality" is significantly lowered. Therefore the scan
> balance policy in get_scan_ratio() will choose to scan the (low quality) file
> LRU much more frequently than the anon LRU.
> 
> file pages
> ----------
> Rik proposed to *not* scan the active file LRU when the inactive list grows
> larger than active list. This guarantees that when there are use-once streaming
> IO, and the working set is not too large(so that active_size < inactive_size),
> the active file LRU will *not* be scanned at all. So the not-too-large working
> set can be well protected.
> 
> But there are also situations where the file working set is a bit large so that
> (active_size >= inactive_size), or the streaming IOs are not purely use-once.
> In these cases, the active list will be scanned slowly, because the current
> shrink_active_list() policy is to deactivate active pages regardless of their
> referenced bits. The deactivated pages become susceptible to the streaming IO
> attack: the inactive list could be scanned fast (500MB / 50MBps = 10s), so the
> deactivated pages don't have enough time to get re-referenced, because a
> user tends to switch between windows at intervals of seconds to minutes.
> 
> This patch holds mapped executable pages in the active list as long as they
> are referenced during each full scan of the active list.  Because the active
> list is normally scanned much more slowly, they get a longer grace time (e.g. 100s)
> for further references, which better matches the pace of user operations.
> 
> side effects
> ------------
> 
> This patch is safe in general, it restores the pre-2.6.28 mmap() behavior
> but in a much smaller and well targeted scope.
> 
> One may worry about someone abusing the PROT_EXEC heuristic.  But as
> Andrew Morton stated, there are other tricks for getting that sort of boost.
> 
> Another concern is the PROT_EXEC mapped pages growing large in rare cases,
> and therefore hurting reclaim efficiency. But a sane application targeted at
> a large audience will never use PROT_EXEC for data mappings. If some home-made
> application tries to abuse that bit, it shall be aware of the consequences,
> which won't be disastrous even in the worst case.
> 
> CC: Elladan <elladan@eskimo.com>
> CC: Nick Piggin <npiggin@suse.de>
> CC: Johannes Weiner <hannes@cmpxchg.org>
> CC: Christoph Lameter <cl@linux-foundation.org>
> CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Acked-by: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Rik van Riel <riel@redhat.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/vmscan.c |   41 +++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 39 insertions(+), 2 deletions(-)
> 
> --- linux.orig/mm/vmscan.c
> +++ linux/mm/vmscan.c
> @@ -1233,6 +1233,7 @@ static void shrink_active_list(unsigned 
>  	unsigned long pgscanned;
>  	unsigned long vm_flags;
>  	LIST_HEAD(l_hold);	/* The pages which were snipped off */
> +	LIST_HEAD(l_active);
>  	LIST_HEAD(l_inactive);
>  	struct page *page;
>  	struct pagevec pvec;
> @@ -1272,8 +1273,21 @@ static void shrink_active_list(unsigned 
>  
>  		/* page_referenced clears PageReferenced */
>  		if (page_mapping_inuse(page) &&
> -		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags))
> +		    page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
>  			pgmoved++;
> +			/*
> +			 * Identify referenced, file-backed active pages and
> +			 * give them one more trip around the active list. So
> +			 * that executable code get better chances to stay in
> +			 * memory under moderate memory pressure.  Anon pages
> +			 * are ignored, since JVM can create lots of anon
> +			 * VM_EXEC pages.
> +			 */
> +			if ((vm_flags & VM_EXEC) && !PageAnon(page)) {
> +				list_add(&page->lru, &l_active);
> +				continue;
> +			}
> +		}
>  
>  		list_add(&page->lru, &l_inactive);
>  	}
> @@ -1282,7 +1296,6 @@ static void shrink_active_list(unsigned 
>  	 * Move the pages to the [file or anon] inactive list.
>  	 */
>  	pagevec_init(&pvec, 1);
> -	lru = LRU_BASE + file * LRU_FILE;
>  
>  	spin_lock_irq(&zone->lru_lock);
>  	/*
> @@ -1294,6 +1307,7 @@ static void shrink_active_list(unsigned 
>  	reclaim_stat->recent_rotated[!!file] += pgmoved;
>  
>  	pgmoved = 0;  /* count pages moved to inactive list */
> +	lru = LRU_BASE + file * LRU_FILE;
>  	while (!list_empty(&l_inactive)) {
>  		page = lru_to_page(&l_inactive);
>  		prefetchw_prev_lru_page(page, &l_inactive, flags);
> @@ -1316,6 +1330,29 @@ static void shrink_active_list(unsigned 
>  	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
>  	__count_zone_vm_events(PGREFILL, zone, pgscanned);
>  	__count_vm_events(PGDEACTIVATE, pgmoved);
> +
> +	pgmoved = 0;  /* count pages moved back to active list */
> +	lru = LRU_ACTIVE + file * LRU_FILE;
> +	while (!list_empty(&l_active)) {
> +		page = lru_to_page(&l_active);
> +		prefetchw_prev_lru_page(page, &l_active, flags);
> +		VM_BUG_ON(PageLRU(page));
> +		SetPageLRU(page);
> +		VM_BUG_ON(!PageActive(page));
> +
> +		list_move(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_add_lru_list(page, lru);
> +		pgmoved++;
> +		if (!pagevec_add(&pvec, page)) {
> +			spin_unlock_irq(&zone->lru_lock);
> +			if (buffer_heads_over_limit)
> +				pagevec_strip(&pvec);
> +			__pagevec_release(&pvec);
> +			spin_lock_irq(&zone->lru_lock);
> +		}
> +	}
> +	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
> +
>  	spin_unlock_irq(&zone->lru_lock);
>  	if (buffer_heads_over_limit)
>  		pagevec_strip(&pvec);
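
For readers landing here: the &vm_flags out-parameter used in the hunk
above comes from the companion patch "vmscan: report vm_flags in
page_referenced()" elsewhere in this thread.  Judging from the quoted
call site, the signature would become something like:

  int page_referenced(struct page *page, int is_locked,
  		      struct mem_cgroup *mem_cont,
  		      unsigned long *vm_flags);

so the VM_EXEC test sees the union of vm_flags over the VMAs in which
the page was actually found referenced (per Peter's refinement in this
thread).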


-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: report vm_flags in page_referenced()
  2009-05-12  6:44                                           ` Minchan Kim
@ 2009-05-12 11:44                                             ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12 11:44 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Peter Zijlstra, Andrew Morton, hannes, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro

On Tue, May 12, 2009 at 02:44:13PM +0800, Minchan Kim wrote:
> On Tue, 12 May 2009 08:23:09 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Tue, 2009-05-12 at 10:51 +0800, Wu Fengguang wrote:
> > > @@ -406,6 +408,7 @@ static int page_referenced_anon(struct p
> > >                 if (mem_cont && !mm_match_cgroup(vma->vm_mm, mem_cont))
> > >                         continue;
> > >                 referenced += page_referenced_one(page, vma, &mapcount);
> > > +               *vm_flags |= vma->vm_flags;
> > >                 if (!mapcount)
> > >                         break;
> > >         }
> > 
> > Shouldn't that read:
> > 
> >   if (page_referenced_one(page, vma, &mapcount)) {
> >     referenced++;
> >     *vm_flags |= vma->vm_flags;
> >   }
> > 
> > So that we only add the vma-flags of those vmas that actually have a
> > young bit set?
> > 
> > In which case it'd be more at home in page_referenced_one():
> > 
> > @@ -381,6 +381,8 @@ out_unmap:
> >  	(*mapcount)--;
> >  	pte_unmap_unlock(pte, ptl);
> >  out:
> > +	if (referenced)
> > +		*vm_flags |= vma->vm_flags;
> >  	return referenced;
> >  }
> 
> Good. I ACK Peter's suggestion.
> It prevents picking up vm_flags from the wrong VMA, one that doesn't actually have the page.

Good suggestions!  I realize now it was a flaky idea not to do that in
page_referenced_one(), hehe.

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12  7:26                                         ` Minchan Kim
@ 2009-05-12 11:48                                           ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-12 11:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, hannes, peterz, riel, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro

On Tue, May 12, 2009 at 03:26:33PM +0800, Minchan Kim wrote:
> On Tue, 12 May 2009 10:53:19 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > The "move pages to active list" and "move pages to inactive list"
> > code blocks are mostly identical and can be served by a function.
> > 
> > Thanks to Andrew Morton for pointing this out.
> > 
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
> >  1 file changed, 35 insertions(+), 49 deletions(-)
> > 
> > --- linux.orig/mm/vmscan.c
> > +++ linux/mm/vmscan.c
> > @@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
> >   * But we had to alter page->flags anyway.
> >   */
> >  
> > +void move_active_pages_to_lru(enum lru_list lru, struct list_head *list)
> > +{
> > +	unsigned long pgmoved = 0;
> > +
> > +	while (!list_empty(&list)) {
> > +		page = lru_to_page(&list);
> > +		prefetchw_prev_lru_page(page, &list, flags);
> > +
> > +		VM_BUG_ON(PageLRU(page));
> > +		SetPageLRU(page);
> > +
> > +		VM_BUG_ON(!PageActive(page));
> > +		if (lru < LRU_ACTIVE)
> > +			ClearPageActive(page);
> 
> Arithmetic on the LRU list index is not good for readability, I think.
> How about adding a comment?
> 
> if (lru < LRU_ACTIVE) /* moving from the active list to the inactive one */
> 
> Ignore me if you think this is trivial.

Good suggestion. Or this simple one: "we are de-activating"?

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12 11:48                                           ` Wu Fengguang
@ 2009-05-12 11:57                                             ` Minchan Kim
  -1 siblings, 0 replies; 336+ messages in thread
From: Minchan Kim @ 2009-05-12 11:57 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Minchan Kim, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, cl, kosaki.motohiro

On Tue, 12 May 2009 19:48:07 +0800
Wu Fengguang <fengguang.wu@intel.com> wrote:

> On Tue, May 12, 2009 at 03:26:33PM +0800, Minchan Kim wrote:
> > On Tue, 12 May 2009 10:53:19 +0800
> > Wu Fengguang <fengguang.wu@intel.com> wrote:
> > 
> > > The "move pages to active list" and "move pages to inactive list"
> > > code blocks are mostly identical and can be served by a function.
> > > 
> > > Thanks to Andrew Morton for pointing this out.
> > > 
> > > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > > ---
> > >  mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
> > >  1 file changed, 35 insertions(+), 49 deletions(-)
> > > 
> > > --- linux.orig/mm/vmscan.c
> > > +++ linux/mm/vmscan.c
> > > @@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
> > >   * But we had to alter page->flags anyway.
> > >   */
> > >  
> > > +void move_active_pages_to_lru(enum lru_list lru, struct list_head *list)
> > > +{
> > > +	unsigned long pgmoved = 0;
> > > +
> > > +	while (!list_empty(&list)) {
> > > +		page = lru_to_page(&list);
> > > +		prefetchw_prev_lru_page(page, &list, flags);
> > > +
> > > +		VM_BUG_ON(PageLRU(page));
> > > +		SetPageLRU(page);
> > > +
> > > +		VM_BUG_ON(!PageActive(page));
> > > +		if (lru < LRU_ACTIVE)
> > > +			ClearPageActive(page);
> > 
> > Arithmetic on the LRU list index is not good for readability, I think.
> > How about adding a comment?
> > 
> > if (lru < LRU_ACTIVE) /* moving from the active list to the inactive one */
> > 
> > Ignore me if you think this is trivial.
> 
> Good suggestion. Or this simple one: "we are de-activating"?
> 

Looks good to me :)

> Thanks,
> Fengguang


-- 
Kind Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 336+ messages in thread


* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-12  2:50                                       ` Wu Fengguang
@ 2009-05-12 13:20                                         ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-12 13:20 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, hannes, peterz, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro, minchan.kim

Wu Fengguang wrote:

>> Also, the change makes this comment:
>>
>> 	spin_lock_irq(&zone->lru_lock);
>> 	/*
>> 	 * Count referenced pages from currently used mappings as
>> 	 * rotated, even though they are moved to the inactive list.
>> 	 * This helps balance scan pressure between file and anonymous
>> 	 * pages in get_scan_ratio.
>> 	 */
>> 	reclaim_stat->recent_rotated[!!file] += pgmoved;
>>
>> inaccurate.
> 
> Good catch, I'll just remove the stale "even though they are moved to
> the inactive list".

Well, it is still true for !VM_EXEC pages.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12 11:48                                           ` Wu Fengguang
@ 2009-05-12 13:32                                             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-12 13:32 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Minchan Kim, Andrew Morton, hannes, peterz, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro

Wu Fengguang wrote:
> On Tue, May 12, 2009 at 03:26:33PM +0800, Minchan Kim wrote:
>> On Tue, 12 May 2009 10:53:19 +0800
>> Wu Fengguang <fengguang.wu@intel.com> wrote:
>>
>>> The "move pages to active list" and "move pages to inactive list"
>>> code blocks are mostly identical and can be served by a function.
>>>
>>> Thanks to Andrew Morton for pointing this out.
>>>
>>> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>>> ---
>>>  mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
>>>  1 file changed, 35 insertions(+), 49 deletions(-)
>>>
>>> --- linux.orig/mm/vmscan.c
>>> +++ linux/mm/vmscan.c
>>> @@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
>>>   * But we had to alter page->flags anyway.
>>>   */
>>>  
>>> +void move_active_pages_to_lru(enum lru_list lru, struct list_head *list)
>>> +{
>>> +	unsigned long pgmoved = 0;
>>> +
>>> +	while (!list_empty(&list)) {
>>> +		page = lru_to_page(&list);
>>> +		prefetchw_prev_lru_page(page, &list, flags);
>>> +
>>> +		VM_BUG_ON(PageLRU(page));
>>> +		SetPageLRU(page);
>>> +
>>> +		VM_BUG_ON(!PageActive(page));
>>> +		if (lru < LRU_ACTIVE)
>>> +			ClearPageActive(page);
>> Arithmetic on the LRU list is not good code for readability, I think. 
>> How about adding comment? 
>>
>> if (lru < LRU_ACTIVE) /* In case of moving from active list to inactive */
>>
>> Ignore me if you think this is trivial. 
> 
> Good suggestion. Or this simple one: "we are de-activating"?

lru < LRU_ACTIVE will never be true for file pages,
either active or inactive.
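
For reference, a minimal sketch of the enum layout behind that (modeled
on the 2.6.30-era include/linux/mmzone.h; an illustration, not a
verbatim quote of the header):

#define LRU_BASE   0
#define LRU_ACTIVE 1
#define LRU_FILE   2

enum lru_list {
	LRU_INACTIVE_ANON = LRU_BASE,				/* 0 */
	LRU_ACTIVE_ANON   = LRU_BASE + LRU_ACTIVE,		/* 1 */
	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,		/* 2 */
	LRU_ACTIVE_FILE   = LRU_BASE + LRU_FILE + LRU_ACTIVE,	/* 3 */
	LRU_UNEVICTABLE,					/* 4 */
	NR_LRU_LISTS
};

/* so "lru < LRU_ACTIVE" can only be true for LRU_INACTIVE_ANON */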

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 20:54                                           ` Christoph Lameter
@ 2009-05-12 17:06                                             ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-12 17:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

Christoph Lameter wrote:
> All these expiration modifications do not take into account that a desktop
> may sit idle for hours while some other things run in the background (like
> backups at night or updatedb and other maintenance things). The desktop
> still needs to be usable in the morning.

New file pages start on the inactive list and will get reclaimed
after one access. Only file pages that get accessed multiple times
get promoted to the active file LRU.

The patch that only allows active file pages to be deactivated
if the active file LRU is larger than the inactive file LRU should
protect the working set from being evicted due to streaming IO.

Even if the working set is currently idle.
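
A minimal sketch of that check (the helper name follows 2.6.30-era
mm/vmscan.c and should be read as an assumption, not a quote of the
final patch):

static int inactive_file_is_low(struct zone *zone)
{
	unsigned long active   = zone_page_state(zone, NR_ACTIVE_FILE);
	unsigned long inactive = zone_page_state(zone, NR_INACTIVE_FILE);

	/*
	 * Deactivate only while the active file list outweighs the
	 * inactive one; otherwise leave the working set alone.
	 */
	return inactive < active;
}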

> I have had some success with a patch that protects pages in the file
> cache from being unmapped if the mapped pages are below a certain
> percentage of the file cache. It's another VM knob to define the percentage
> though.

That is another way of protecting mapped file pages, but it does
not have the side effect of protecting the page cache working set
on e.g. file servers or mysql or postgresql installs with default
tunables (which rely heavily on the page cache to cache the right
things).

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 21:20                                               ` Christoph Lameter
@ 2009-05-12 17:39                                                 ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-12 17:39 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

Christoph Lameter wrote:
> On Tue, 12 May 2009, Rik van Riel wrote:
> 
>> The patch that only allows active file pages to be deactivated
>> if the active file LRU is larger than the inactive file LRU should
>> protect the working set from being evicted due to streaming IO.
> 
> Streaming I/O means access once? 

Yeah, "used-once pages" would be a better criteria, since
you could go through a gigantic set of used-once pages without
doing linear IO.

I expect that some databases might do that.

> What exactly are the criteria for a page
> to be part of streaming I/O? AFAICT the definition is more dependent on
> the software running than on a certain usage pattern discernible to the
> VM. Software may after all perform multiple scans over a stream of data or
> go back to prior locations in the file.


-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 22:02                                                   ` Christoph Lameter
@ 2009-05-12 20:17                                                     ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-12 20:17 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

Christoph Lameter wrote:
> On Tue, 12 May 2009, Rik van Riel wrote:
> 
>>> Streaming I/O means access once?
>> Yeah, "used-once pages" would be a better criteria, since
>> you could go through a gigantic set of used-once pages without
>> doing linear IO.
> 
> Can we see some load for which this patch has a beneficial effect?
> With some numbers?

How many do you want before you're satisfied that this
benefits a significant number of workloads?

How many numbers do you want to feel safe that no workloads
suffer badly from this patch?

Also, how would you measure a concept as nebulous as desktop
interactivity?

Btw, the patch has gone into the Fedora kernel RPM to get
a good amount of user testing.  I'll let you know what the
users say (if anything).

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 20:17                                                     ` Rik van Riel
@ 2009-05-12 20:26                                                       ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-12 20:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

On Tue, 12 May 2009, Rik van Riel wrote:

> How many do you want before you're satisfied that this
> benefits a significant number of workloads?

One would be a good starter.

> How many numbers do you want to feel safe that no workloads
> suffer badly from this patch?
>
> Also, how would you measure a concept as nebulous as desktop
> interactivity?

Measure the response to desktop clicks? I.e. retrieve a URL with a
web browser that was running when the other load started.

> Btw, the patch has gone into the Fedora kernel RPM to get
> a good amount of user testing.  I'll let you know what the
> users say (if anything).

I have not seen a single reference to a measurement taken with this patch.

Maybe that is because I have not looked at the threads that discuss
measurements with this patch. Where are they?

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12  3:00                                         ` KOSAKI Motohiro
@ 2009-05-12 20:54                                           ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-12 20:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, minchan.kim

All these expiration modifications do not take into account that a desktop
may sit idle for hours while some other things run in the background (like
backups at night or updatedb and other maintenance things). The desktop
still needs to be usable in the morning.

I have had some success with a patch that protects pages in the file
cache from being unmapped if the mapped pages are below a certain
percentage of the file cache. It's another VM knob to define the percentage
though.


Subject: Do not evict mapped pages

It is quite annoying when important executable pages of the user interface
are evicted from memory because backup or some other function runs and no one
is clicking any buttons for a while. Once you get back to the desktop and
try to click a link, you are in for a surprise. It can take quite a long time
for the desktop to recover from the swap outs.

This patch ensures that mapped pages in the file cache are not evicted if there
are a sufficient number of unmapped pages present. A similar technique is
already in use under NUMA for zone reclaim. The same method can be used to
protect mapped pages from reclaim.

The percentage of file backed pages protected is set via
/proc/sys/vm/file_mapped_ratio. This defaults to 20%.
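
As a worked example (numbers invented for illustration): with the default
ratio of 20 and a zone holding 300,000 free plus file-LRU pages, reclaim
is only allowed to unmap file pages once NR_FILE_MAPPED exceeds 60,000
pages, i.e. about 234 MB with 4 KB pages.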

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>

---
 Documentation/sysctl/vm.txt |   14 ++++++++++++++
 include/linux/swap.h        |    1 +
 kernel/sysctl.c             |   13 ++++++++++++-
 mm/vmscan.c                 |   32 ++++++++++++++++++++++++++++----
 4 files changed, 55 insertions(+), 5 deletions(-)

Index: linux-2.6/mm/vmscan.c
===================================================================
--- linux-2.6.orig/mm/vmscan.c	2009-05-11 21:37:15.397876418 -0500
+++ linux-2.6/mm/vmscan.c	2009-05-11 21:37:23.287875742 -0500
@@ -585,7 +585,8 @@ void putback_lru_page(struct page *page)
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
 					struct scan_control *sc,
-					enum pageout_io sync_writeback)
+					enum pageout_io sync_writeback,
+					int unmap_mapped)
 {
 	LIST_HEAD(ret_pages);
 	struct pagevec freed_pvec;
@@ -616,7 +617,7 @@ static unsigned long shrink_page_list(st
 		if (unlikely(!page_evictable(page, NULL)))
 			goto cull_mlocked;

-		if (!sc->may_unmap && page_mapped(page))
+		if (!unmap_mapped && page_mapped(page))
 			goto keep_locked;

 		/* Double the slab pressure for mapped and swapcache pages */
@@ -1047,6 +1048,12 @@ int isolate_lru_page(struct page *page)
 }

 /*
+ * Percentage of pages of the file lru necessary for unmapping of
+ * pages to occur during reclaim.
+ */
+int sysctl_file_unmap_ratio = 20;
+
+/*
  * shrink_inactive_list() is a helper for shrink_zone().  It returns the number
  * of reclaimed pages
  */
@@ -1059,10 +1066,26 @@ static unsigned long shrink_inactive_lis
 	unsigned long nr_scanned = 0;
 	unsigned long nr_reclaimed = 0;
 	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);
+	int unmap_mapped = 0;

 	pagevec_init(&pvec, 1);

 	lru_add_drain();
+
+	/*
+	 * Only allow unmapping of file backed pages if the number of file
+	 * mapped pages becomes greater than a certain percentage of the file
+	 * lru (+ free memory in order to avoid useless unmaps before memory
+	 * fills up).
+	 */
+	if (sc->may_unmap && (!file ||
+		zone_page_state(zone, NR_FILE_MAPPED) * 100 >
+			(zone_page_state(zone, NR_FREE_PAGES) +
+			zone_page_state(zone, NR_ACTIVE_FILE) +
+			zone_page_state(zone, NR_INACTIVE_FILE))
+				* sysctl_file_unmap_ratio))
+					unmap_mapped = 1;
+
 	spin_lock_irq(&zone->lru_lock);
 	do {
 		struct page *page;
@@ -1111,7 +1134,8 @@ static unsigned long shrink_inactive_lis
 		spin_unlock_irq(&zone->lru_lock);

 		nr_scanned += nr_scan;
-		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC);
+		nr_freed = shrink_page_list(&page_list, sc, PAGEOUT_IO_ASYNC,
+									unmap_mapped);

 		/*
 		 * If we are direct reclaiming for contiguous pages and we do
@@ -1131,7 +1155,7 @@ static unsigned long shrink_inactive_lis
 			count_vm_events(PGDEACTIVATE, nr_active);

 			nr_freed += shrink_page_list(&page_list, sc,
-							PAGEOUT_IO_SYNC);
+						PAGEOUT_IO_SYNC, unmap_mapped);
 		}

 		nr_reclaimed += nr_freed;
Index: linux-2.6/include/linux/swap.h
===================================================================
--- linux-2.6.orig/include/linux/swap.h	2009-05-11 21:37:15.417879047 -0500
+++ linux-2.6/include/linux/swap.h	2009-05-11 21:37:23.287875742 -0500
@@ -221,6 +221,7 @@ extern unsigned long shrink_all_memory(u
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern long vm_total_pages;
+extern int sysctl_file_unmap_ratio;

 #ifdef CONFIG_NUMA
 extern int zone_reclaim_mode;
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c	2009-05-11 21:37:15.467877848 -0500
+++ linux-2.6/kernel/sysctl.c	2009-05-11 21:37:23.307877270 -0500
@@ -92,7 +92,6 @@ extern int rcutorture_runnable;

 /* Constants used for minimum and  maximum */
 #ifdef CONFIG_DETECT_SOFTLOCKUP
-static int sixty = 60;
 static int neg_one = -1;
 #endif

@@ -100,6 +99,7 @@ static int zero;
 static int __maybe_unused one = 1;
 static int __maybe_unused two = 2;
 static unsigned long one_ul = 1;
+static int sixty = 60;
 static int one_hundred = 100;
 static int one_thousand = 1000;

@@ -1141,6 +1141,17 @@ static struct ctl_table vm_table[] = {
 		.strategy	= &sysctl_intvec,
 		.extra1		= &min_percpu_pagelist_fract,
 	},
+	{
+		.ctl_name	= CTL_UNNUMBERED,
+		.procname	= "file_mapped_ratio",
+		.data		= &sysctl_file_unmap_ratio,
+		.maxlen		= sizeof(sysctl_file_unmap_ratio),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.strategy	= &sysctl_intvec,
+		.extra1		= &zero,
+		.extra2		= &sixty,
+	},
 #ifdef CONFIG_MMU
 	{
 		.ctl_name	= VM_MAX_MAP_COUNT,
Index: linux-2.6/Documentation/sysctl/vm.txt
===================================================================
--- linux-2.6.orig/Documentation/sysctl/vm.txt	2009-05-11 21:45:43.937878597 -0500
+++ linux-2.6/Documentation/sysctl/vm.txt	2009-05-11 21:52:57.217874275 -0500
@@ -26,6 +26,7 @@ Currently, these files are in /proc/sys/
 - dirty_ratio
 - dirty_writeback_centisecs
 - drop_caches
+- file_mapped_ratio
 - hugepages_treat_as_movable
 - hugetlb_shm_group
 - laptop_mode
@@ -140,6 +141,19 @@ user should run `sync' first.

 ==============================================================

+file_mapped_ratio
+
+A percentage of the file backed pages in memory. If there are more
+mapped pages than this percentage then reclaim will unmap pages
+from the memory of processes.
+
+The main function of this ratio is to protect pages in use
+by processes from streaming I/O and other operations that
+put a lot of churn on the page cache and would usually evict
+most pages.
+
+==============================================================
+
 hugepages_treat_as_movable

 This parameter is only useful when kernelcore= is specified at boot time to

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 17:06                                             ` Rik van Riel
@ 2009-05-12 21:20                                               ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-12 21:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

On Tue, 12 May 2009, Rik van Riel wrote:

> The patch that only allows active file pages to be deactivated
> if the active file LRU is larger than the inactive file LRU should
> protect the working set from being evicted due to streaming IO.

Streaming I/O means access once? What exactly are the criteria for a page
to be part of streaming I/O? AFAICT the definition is more dependent on
the software running than on a certain usage pattern discernible to the
VM. Software may after all perform multiple scans over a stream of data or
go back to prior locations in the file.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 17:39                                                 ` Rik van Riel
@ 2009-05-12 22:02                                                   ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-12 22:02 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KOSAKI Motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

On Tue, 12 May 2009, Rik van Riel wrote:

> > Streaming I/O means access once?
>
> Yeah, "used-once pages" would be a better criteria, since
> you could go through a gigantic set of used-once pages without
> doing linear IO.

Can we see some load for which this patch has a beneficial effect?
With some numbers?


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-12 20:54                                           ` Christoph Lameter
@ 2009-05-13  0:45                                             ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-13  0:45 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	riel, linux-kernel, tytso, linux-mm, elladan, npiggin,
	minchan.kim

> All these expiration modifications do not take into account that a desktop
> may sit idle for hours while some other things run in the background (like
> backups at night or updatedb and other maintenance things). The desktop
> still needs to be usable in the morning.

Have you seen this phenomenon?
I always use a Linux desktop for development, but I haven't seen it;
perhaps I have just been lucky. I really want to know how to reproduce it.

Please let me know how to reproduce it.


> I have had some success with a patch that protects pages in the file
> cache from being unmapped if the mapped pages are below a certain
> percentage of the file cache. It's another VM knob to define the percentage
> though.
> 
> 
> Subject: Do not evict mapped pages
> 
> It is quite annoying when important executable pages of the user interface
> are evicted from memory because backup or some other function runs and no one
> is clicking any buttons for a while. Once you get back to the desktop and
> try to click a link, you are in for a surprise. It can take quite a long time
> for the desktop to recover from the swap outs.
> 
> This patch ensures that mapped pages in the file cache are not evicted if there
> are a sufficient number of unmapped pages present. A similar technique is
> already in use under NUMA for zone reclaim. The same method can be used to
> protect mapped pages from reclaim.

note: (a bit offtopic)

some Nehalem machines have long node distances and enable zone reclaim
mode, but it causes terrible results.

It only works well on large NUMA systems.

> 
> The percentage of file backed pages protected is set via
> /proc/sys/vm/file_mapped_ratio. This defaults to 20%.

Why do you think the typical mapped ratio is less than 20% on a desktop machine?

Some desktop components (e.g. V4L, GEM, some games) use tons of mapped
pages, but on the other hand, some desktop users only use a browser.
So we can't assume a typical mapped ratio on the desktop, IMHO.

Plus, typical desktop users don't set any sysctl values.

The key point is access-once vs. access-many.
I don't think the mapped ratio is a good approximation for it.





^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-13  0:45                                             ` KOSAKI Motohiro
@ 2009-05-14 20:14                                               ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-14 20:14 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, minchan.kim

On Wed, 13 May 2009, KOSAKI Motohiro wrote:

> > All these expiration modifications do not take into account that a desktop
> > may sit idle for hours while some other things run in the background (like
> > backups at night or updatedb and other maintenance things). The desktop
> > still needs to be usable in the morning.
>
> Have you seen this phenomenon?
> I always use a Linux desktop for development, but I haven't seen it;
> perhaps I have just been lucky. I really want to know how to reproduce it.
>
> Please let me know how to reproduce it.

Run a backup (or rsync) over a few hundred GB.

> > The percentage of file backed pages protected is set via
> > /proc/sys/vm/file_mapped_ratio. This defaults to 20%.
>
> Why do you think the typical mapped ratio is less than 20% on a desktop machine?

Observation of the typical mapped size of Firefox under KDE.

> The key point is access-once vs. access-many.

Nothing against it if it works.


^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-14 20:14                                               ` Christoph Lameter
@ 2009-05-14 23:28                                                 ` KOSAKI Motohiro
  -1 siblings, 0 replies; 336+ messages in thread
From: KOSAKI Motohiro @ 2009-05-14 23:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: kosaki.motohiro, Wu Fengguang, Andrew Morton, hannes, peterz,
	riel, linux-kernel, tytso, linux-mm, elladan, npiggin,
	minchan.kim

> On Wed, 13 May 2009, KOSAKI Motohiro wrote:
> 
> > > All these expiration modifications do not take into account that a desktop
> > > may sit idle for hours while some other things run in the background (like
> > > backups at night or updatedb and other maintenance things). The desktop
> > > still needs to be usable in the morning.
> >
> > Have you seen this phenomenon?
> > I always use a Linux desktop for development, but I haven't seen it;
> > perhaps I have just been lucky. I really want to know how to reproduce it.
> >
> > Please let me know how to reproduce it.
> 
> Run a backup (or rsync) over a few hundred GB.

-ENOTREPRODUCED

umm.
May I ask about the detailed operation?


> > > The percentage of file backed pages protected is set via
> > > /proc/sys/vm/file_mapped_ratio. This defaults to 20%.
> >
> > Why do you think the typical mapped ratio is less than 20% on a desktop machine?
> 
> Observation of the typical mapped size of Firefox under KDE.

My point is, desktop users have widely varying mapped ratios.
Do you disagree?


> > The key point is access-once vs. access-many.
> 
> Nothing against it if it works.
> 




^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-14 23:28                                                 ` KOSAKI Motohiro
@ 2009-05-14 23:42                                                   ` Rik van Riel
  -1 siblings, 0 replies; 336+ messages in thread
From: Rik van Riel @ 2009-05-14 23:42 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Wu Fengguang, Andrew Morton, hannes, peterz,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

KOSAKI Motohiro wrote:

>>>> The percentage of file backed pages protected is set via
>>>> /proc/sys/vm/file_mapped_ratio. This defaults to 20%.
>>> Why do you think the typical mapped ratio is less than 20% on a desktop machine?
>> Observation of the typical mapped size of Firefox under KDE.
> 
> My point is, desktop users have widely varying mapped ratios.
> Do you disagree?

I suspect that the mapped ratio could be much higher
on my system.  I have only 2GB of RAM dedicated to my
dom0 (which is also my desktop) and the amount of page
cache often goes down to about 150MB.

At the moment nr_mapped is 26400 (about 103 MB with 4 KB pages) and
the amount of memory taken up by buffer and page cache together is
a little over 300MB.  That's roughly a third.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-14 23:28                                                 ` KOSAKI Motohiro
@ 2009-05-15 18:09                                                   ` Christoph Lameter
  -1 siblings, 0 replies; 336+ messages in thread
From: Christoph Lameter @ 2009-05-15 18:09 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Wu Fengguang, Andrew Morton, hannes, peterz, riel, linux-kernel,
	tytso, linux-mm, elladan, npiggin, minchan.kim

On Fri, 15 May 2009, KOSAKI Motohiro wrote:

> May I ask about the detailed operation?

Detailed operation? Well no. More of an experience.

Browse the web in the evening. Let the backup run overnight. Try to access
the web in the morning. Pretty unscientific.

> > Observation of the typical mapped size of Firefox under KDE.
>
> My point is, desktop users have widely varying mapped ratios.
> Do you disagree?

No, of course not. Loads may have different mapped ratios. That is why
there is a /proc/sys/vm tunable in my patch (which is not ideal, as
mentioned in the patch). If Rik's solution works without it, great.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: protect a fraction of file backed mapped pages from reclaim
  2009-05-14 20:14                                               ` Christoph Lameter
@ 2009-05-16  8:54                                                 ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-16  8:54 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: KOSAKI Motohiro, Andrew Morton, hannes, peterz, riel,
	linux-kernel, tytso, linux-mm, elladan, npiggin, minchan.kim

Hi Christoph,

On Fri, May 15, 2009 at 04:14:31AM +0800, Christoph Lameter wrote:
> On Wed, 13 May 2009, KOSAKI Motohiro wrote:
> 
> > > All these expiration modifications do not take into account that a desktop
> > > may sit idle for hours while some other things run in the background (like
> > > backups at night or updatedb and other maintenance things). The desktop
> > > should still be usable in the morning.
> >
> > Have you seen this phenomenon?
> > I always use a Linux desktop for development, but I haven't seen it.
> > Perhaps I have no luck. I really want to know how to reproduce it.
> >
> > Please let me know how to reproduce it.
> 
> Run a backup (or rsync) over a few hundred GB.

Simple experiments show that rsync is a use-once workload (a small
residency check follows the two runs below):

1) fresh run (full backup): the source file pages in the logo/ dir are cached and
   referenced *once*:

        rsync -a logo localhost:/tmp/

2) second run (incremental backup): only the updated files are read, and
   read only once:

        rsync -a logo localhost:/tmp/
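
One way to confirm this is to count the cached pages of a source file
before and after each run. A minimal sketch using mmap() + mincore()
(an illustrative check, not part of the measurements above):

/* residency.c - count how many pages of FILE are resident in the
 * page cache; build with "cc -o residency residency.c" and run it
 * against a file in logo/ before and after each rsync run. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 2) {
		fprintf(stderr, "usage: %s FILE\n", argv[0]);
		return 1;
	}

	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	struct stat st;
	if (fstat(fd, &st) < 0 || st.st_size == 0)
		return 1;

	long psize = sysconf(_SC_PAGESIZE);
	size_t pages = (st.st_size + psize - 1) / psize;

	/* mapping does not fault pages in; mincore() just reports
	 * which of them are already in the page cache */
	void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	unsigned char *vec = malloc(pages);
	if (!vec || mincore(map, st.st_size, vec) < 0) {
		perror("mincore");
		return 1;
	}

	size_t resident = 0;
	for (size_t i = 0; i < pages; i++)
		resident += vec[i] & 1;

	printf("%zu/%zu pages resident\n", resident, pages);
	return 0;
}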

> > > The percentage of file backed pages protected is set via
> > > /proc/sys/vm/file_mapped_ratio. This defaults to 20%.
> >
> > Why do you think typical mapped ratio is less than 20% on desktop machine?
> 
> Observation of the typical mapped size of Firefox under KDE.

Since the explicit PROT_EXEC-targeted mmap page protection plus Rik's
use-once patch works just fine for rsync - a typical backup scenario -
and does so without an extra sysctl tunable, I tend to continue
pushing the PROT_EXEC approach :-)
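
(The shape of that approach in shrink_active_list(), roughly - a
hedged sketch of the idea, not the exact patch; sc, vm_flags and
l_active are the local names in that function, and it relies on the
vm_flags-reporting page_referenced() from earlier in this thread:)

	if (page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
		/*
		 * Give referenced, file-backed executable pages one
		 * more trip around the active list, so program text
		 * survives streaming I/O.  Anon VM_EXEC pages (e.g.
		 * from JVMs) are deliberately not special-cased.
		 */
		if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
			list_add(&page->lru, &l_active);
			continue;
		}
	}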

Thanks,
Fengguang

> > key point is access-once vs access-many.
> 
> Nothing against it if it works.

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: make mapped executable pages the first class citizen
  2009-05-12 13:20                                         ` Rik van Riel
@ 2009-05-16  9:26                                           ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-16  9:26 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, hannes, peterz, linux-kernel, tytso, linux-mm,
	elladan, npiggin, cl, kosaki.motohiro, minchan.kim

On Tue, May 12, 2009 at 09:20:11PM +0800, Rik van Riel wrote:
> Wu Fengguang wrote:
> 
> >> Also, the change makes this comment:
> >>
> >> 	spin_lock_irq(&zone->lru_lock);
> >> 	/*
> >> 	 * Count referenced pages from currently used mappings as
> >> 	 * rotated, even though they are moved to the inactive list.
> >> 	 * This helps balance scan pressure between file and anonymous
> >> 	 * pages in get_scan_ratio.
> >> 	 */
> >> 	reclaim_stat->recent_rotated[!!file] += pgmoved;
> >>
> >> inaccurate.
> > 
> > Good catch, I'll just remove the stale "even though they are moved to
> > the inactive list".
> 
> Well, it is still true for !VM_EXEC pages.

This comment?

        Count referenced pages from currently used mappings as rotated, even
        though only some of them are actually re-activated. This helps...

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [PATCH -mm] vmscan: merge duplicate code in shrink_active_list()
  2009-05-12 13:32                                             ` Rik van Riel
@ 2009-05-16  9:30                                               ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-16  9:30 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Minchan Kim, Andrew Morton, hannes, peterz, linux-kernel, tytso,
	linux-mm, elladan, npiggin, cl, kosaki.motohiro

On Tue, May 12, 2009 at 09:32:48PM +0800, Rik van Riel wrote:
> Wu Fengguang wrote:
> > On Tue, May 12, 2009 at 03:26:33PM +0800, Minchan Kim wrote:
> >> On Tue, 12 May 2009 10:53:19 +0800
> >> Wu Fengguang <fengguang.wu@intel.com> wrote:
> >>
> >>> The "move pages to active list" and "move pages to inactive list"
> >>> code blocks are mostly identical and can be served by a function.
> >>>
> >>> Thanks to Andrew Morton for pointing this out.
> >>>
> >>> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> >>> ---
> >>>  mm/vmscan.c |   84 ++++++++++++++++++++------------------------------
> >>>  1 file changed, 35 insertions(+), 49 deletions(-)
> >>>
> >>> --- linux.orig/mm/vmscan.c
> >>> +++ linux/mm/vmscan.c
> >>> @@ -1225,6 +1225,36 @@ static inline void note_zone_scanning_pr
> >>>   * But we had to alter page->flags anyway.
> >>>   */
> >>>  
> >>> +void move_active_pages_to_lru(enum lru_list lru, struct list_head *list)
> >>> +{
> >>> +	unsigned long pgmoved = 0;
> >>> +
> >>> +	while (!list_empty(&list)) {
> >>> +		page = lru_to_page(&list);
> >>> +		prefetchw_prev_lru_page(page, &list, flags);
> >>> +
> >>> +		VM_BUG_ON(PageLRU(page));
> >>> +		SetPageLRU(page);
> >>> +
> >>> +		VM_BUG_ON(!PageActive(page));
> >>> +		if (lru < LRU_ACTIVE)
> >>> +			ClearPageActive(page);
> >> Arithmetic on the LRU list is not good for readability, I think.
> >> How about adding a comment?
> >>
> >> if (lru < LRU_ACTIVE) /* In case of moving from active list to inactive */
> >>
> >> Ignore me if you think this is trivial. 
> > 
> > Good suggestion. Or this simple one: "we are de-activating"?
> 
> lru < LRU_ACTIVE will never be true for file pages,
> either active or inactive.

Thanks - that old patch version was really broken.
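
(For reference, a hedged sketch of what a fixed-up helper might look
like - simplified, with pagevec batching elided; not necessarily the
version that was merged:)

static void move_active_pages_to_lru(struct zone *zone,
				     struct list_head *list,
				     enum lru_list lru)
{
	unsigned long pgmoved = 0;
	struct page *page;

	while (!list_empty(list)) {		/* list is a pointer now */
		page = lru_to_page(list);

		VM_BUG_ON(PageLRU(page));
		SetPageLRU(page);

		VM_BUG_ON(!PageActive(page));
		if (!is_active_lru(lru))
			ClearPageActive(page);	/* we are de-activating */

		list_move(&page->lru, &zone->lru[lru].list);
		pgmoved++;
	}
	/* account the pages moved onto this LRU */
	__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
}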

^ permalink raw reply	[flat|nested] 336+ messages in thread

* Re: [RFC][PATCH] vmscan: report vm_flags in page_referenced()
  2009-05-10 23:45                                             ` Minchan Kim
@ 2009-05-17 11:25                                               ` Wu Fengguang
  -1 siblings, 0 replies; 336+ messages in thread
From: Wu Fengguang @ 2009-05-17 11:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Rik van Riel,
	linux-kernel, tytso, linux-mm, Elladan, Nick Piggin,
	Christoph Lameter, KOSAKI Motohiro, Lee Schermerhorn

On Mon, May 11, 2009 at 07:45:00AM +0800, Minchan Kim wrote:
> Sorry for the late reply.
> 
> On Sat, 9 May 2009 14:56:40 +0800
> Wu Fengguang <fengguang.wu@intel.com> wrote:
> 
> > Hmm, this reminded me of the mlocked page protection logic in
> > page_referenced_one(). Why shall the "if (vma->vm_flags & VM_LOCKED)"
> > check be placed *after* the page_check_address() check? Is there a
> > case that an *existing* page frame is not mapped to the VM_LOCKED vma?
> > And why not to protect the page in such a case?
> 
> 
> I have also been wondering about that routine.
> As the annotation says, it seems to avoid increasing the referenced
> counter for an mlocked page, so the page can move to the unevictable
> list ASAP.
> Is that right?

That's right. And it only suppresses the reference count from the
current VMA - if the count has already been elevated by another VMA,
the mlocked page will continue to float in the [in]active lists.
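
(Paraphrasing the shape of the code in question in
page_referenced_one() - a sketch, not the exact source:)

	/* is the page actually mapped at this address in this mm? */
	pte = page_check_address(page, mm, address, &ptl, 0);
	if (!pte)
		goto out;

	if (vma->vm_flags & VM_LOCKED) {
		/*
		 * Don't count references from an mlocked VMA, so the
		 * page is not re-activated, reaches try_to_unmap()
		 * and gets moved to the unevictable list sooner.
		 */
		referenced = 0;
		goto out_unmap;
	}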

> But now, page_referenced uses the referenced variable as just a flag,
> not a count.
> So I think counting the referenced variable is meaningless.

Yes kind of, but anyway it costs nothing :-)

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 336+ messages in thread
