* drop_caches ...
@ 2009-03-04  9:57 Markus
  2009-03-04 10:04 ` drop_caches Wu Fengguang
  0 siblings, 1 reply; 49+ messages in thread
From: Markus @ 2009-03-04  9:57 UTC (permalink / raw)
  To: lkml

Hello!

I have a small problem. Maybe it's just a misunderstanding, but I can't
solve it.

I think that writing "3" to drop_caches should drop all buffers and
caches that have already been written back. So it's recommended to put
a "sync" in front of it.
So I did "free -m ; sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m"
And it gave me:
             total       used       free     shared    buffers     cached
Mem:          3950       3922         28          0          1        879
-/+ buffers/cache:       3041        909
Swap:         5342        205       5136
             total       used       free     shared    buffers     cached
Mem:          3950       3907         43          0          0        864
-/+ buffers/cache:       3041        908
Swap:         5341        206       5135

So buffers went from 1 to 0. But cached was at 879 MB before and is
still 864 MB (!!!) afterwards!

I am at swappiness=0, and when I remove and re-add one swap partition
after another (so there is always swap available), it keeps the cache
and just moves the swapped-out memory to the other swap partitions?!

I _think_ that's not the way it should go?

It would be really kind if someone could explain this issue and
what "cached" actually is!

Have a nice day...
Markus

PS: Please CC me as I am not subscribed.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04  9:57 drop_caches Markus
@ 2009-03-04 10:04 ` Wu Fengguang
  2009-03-04 10:32   ` drop_caches Markus
  0 siblings, 1 reply; 49+ messages in thread
From: Wu Fengguang @ 2009-03-04 10:04 UTC (permalink / raw)
  To: Markus; +Cc: lkml

Hi Markus,

On Wed, Mar 04, 2009 at 10:57:33AM +0100, Markus wrote:
> Hello!
> 
> I have a small problem. Maybe its just a misunderstanding but I cant 
> solve it.
> 
> I think that writing "3" to drop_caches should drop all buffers and 
> caches which are already written. So its recommended to put a "sync" 
> infront of it.
> So I did "free -m ; sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m"
> And it gave me:
>              total       used       free     shared    buffers     cached
> Mem:          3950       3922         28          0          1        879
> -/+ buffers/cache:       3041        909
> Swap:         5342        205       5136
>              total       used       free     shared    buffers     cached
> Mem:          3950       3907         43          0          0        864
> -/+ buffers/cache:       3041        908
> Swap:         5341        206       5135
> 
> So the buffer was 1 and is 0 afterthat. But cached is at 879 MB before 
> and is still 864 MB (!!!) after that!
> 
> I am at swappiness=0 and when I remove and readd one swap-partition 
> after another (so there is always swap). It will keep the cached and 
> put the swapped memory on other swaps?!
> 
> I _think_ thats not the way it should go?
> 
> It would be really kind if someone could explain that issue and 
> what "cached" is at all!

Memory-mapped pages won't be dropped this way.
"cat /proc/meminfo" will show you the number of mapped pages.
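One way to watch this effect (a sketch, not from the original mail; the
drop_caches write needs root, and the field names are those printed by
/proc/meminfo):

```shell
# Compare the relevant counters around a drop. Mapped pages survive
# the drop; only clean, unmapped page cache is reclaimed.
sync
grep -E '^(Buffers|Cached|Mapped):' /proc/meminfo    # before
echo 3 > /proc/sys/vm/drop_caches                    # needs root
grep -E '^(Buffers|Cached|Mapped):' /proc/meminfo    # after
```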

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 10:04 ` drop_caches Wu Fengguang
@ 2009-03-04 10:32   ` Markus
  2009-03-04 11:05     ` drop_caches Wu Fengguang
  0 siblings, 1 reply; 49+ messages in thread
From: Markus @ 2009-03-04 10:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang

Hello Fengguang!

> Hi Markus,
> 
> On Wed, Mar 04, 2009 at 10:57:33AM +0100, Markus wrote:
> > Hello!
> > 
> > I have a small problem. Maybe its just a misunderstanding but I cant 
> > solve it.
> > 
> > I think that writing "3" to drop_caches should drop all buffers and 
> > caches which are already written. So its recommended to put a "sync" 
> > infront of it.
> > So I did "free -m ; sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m"
> > And it gave me:
> >              total       used       free     shared    buffers     cached
> > Mem:          3950       3922         28          0          1        879
> > -/+ buffers/cache:       3041        909
> > Swap:         5342        205       5136
> >              total       used       free     shared    buffers     cached
> > Mem:          3950       3907         43          0          0        864
> > -/+ buffers/cache:       3041        908
> > Swap:         5341        206       5135
> >
> > So the buffer was 1 and is 0 afterthat. But cached is at 879 MB before
> > and is still 864 MB (!!!) after that!
> > 
> > I am at swappiness=0 and when I remove and readd one swap-partition 
> > after another (so there is always swap). It will keep the cached and 
> > put the swapped memory on other swaps?!
> > 
> > I _think_ thats not the way it should go?
> > 
> > It would be really kind if someone could explain that issue and 
> > what "cached" is at all!
> 
> The memory mapped pages won't be dropped in this way.
> "cat /proc/meminfo" will show you the number of mapped pages.

# sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
             total       used       free     shared    buffers     cached
Mem:          3950       3262        688          0          0        359
-/+ buffers/cache:       2902       1047
Swap:         5890       1509       4381
MemTotal:        4045500 kB
MemFree:          705180 kB
Buffers:             508 kB
Cached:           367748 kB
SwapCached:       880744 kB
Active:          1555032 kB
Inactive:        1634868 kB
Active(anon):    1527100 kB
Inactive(anon):  1607328 kB
Active(file):      27932 kB
Inactive(file):    27540 kB
Unevictable:         816 kB
Mlocked:               0 kB
SwapTotal:       6032344 kB
SwapFree:        4486496 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:       2378112 kB
Mapped:            52196 kB
Slab:              65640 kB
SReclaimable:      46192 kB
SUnreclaim:        19448 kB
PageTables:        28200 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8055092 kB
Committed_AS:    4915636 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       44580 kB
VmallocChunk:   34359677239 kB
DirectMap4k:     3182528 kB
DirectMap2M:     1011712 kB

Cached went down to 359 MB (after the drop).
I don't know where to read the "number of mapped pages".
"Mapped" is about 51 MB.

thanks in advance

Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 10:32   ` drop_caches Markus
@ 2009-03-04 11:05     ` Wu Fengguang
  2009-03-04 11:29       ` drop_caches Markus
  0 siblings, 1 reply; 49+ messages in thread
From: Wu Fengguang @ 2009-03-04 11:05 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel

On Wed, Mar 04, 2009 at 12:32:02PM +0200, Markus wrote:
> Hello Fengguang!
> 
> > Hi Markus,
> > 
> > On Wed, Mar 04, 2009 at 10:57:33AM +0100, Markus wrote:
> > > Hello!
> > > 
> > > I have a small problem. Maybe its just a misunderstanding but I cant 
> > > solve it.
> > > 
> > > I think that writing "3" to drop_caches should drop all buffers and 
> > > caches which are already written. So its recommended to put a "sync" 
> > > infront of it.
> > > So I did "free -m ; sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m"
> > > And it gave me:
> > >              total       used       free     shared    buffers     cached
> > > Mem:          3950       3922         28          0          1        879
> > > -/+ buffers/cache:       3041        909
> > > Swap:         5342        205       5136
> > >              total       used       free     shared    buffers     cached
> > > Mem:          3950       3907         43          0          0        864
> > > -/+ buffers/cache:       3041        908
> > > Swap:         5341        206       5135
> > >
> > > So the buffer was 1 and is 0 afterthat. But cached is at 879 MB before
> > > and is still 864 MB (!!!) after that!
> > > 
> > > I am at swappiness=0 and when I remove and readd one swap-partition 
> > > after another (so there is always swap). It will keep the cached and 
> > > put the swapped memory on other swaps?!
> > > 
> > > I _think_ thats not the way it should go?
> > > 
> > > It would be really kind if someone could explain that issue and 
> > > what "cached" is at all!
> > 
> > The memory mapped pages won't be dropped in this way.
> > "cat /proc/meminfo" will show you the number of mapped pages.
> 
> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
>              total       used       free     shared    buffers     cached
> Mem:          3950       3262        688          0          0        359
> -/+ buffers/cache:       2902       1047
> Swap:         5890       1509       4381
> MemTotal:        4045500 kB
> MemFree:          705180 kB
> Buffers:             508 kB
> Cached:           367748 kB
> SwapCached:       880744 kB
> Active:          1555032 kB
> Inactive:        1634868 kB
> Active(anon):    1527100 kB
> Inactive(anon):  1607328 kB
> Active(file):      27932 kB
> Inactive(file):    27540 kB
> Unevictable:         816 kB
> Mlocked:               0 kB
> SwapTotal:       6032344 kB
> SwapFree:        4486496 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:       2378112 kB
> Mapped:            52196 kB
> Slab:              65640 kB
> SReclaimable:      46192 kB
> SUnreclaim:        19448 kB
> PageTables:        28200 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     8055092 kB
> Committed_AS:    4915636 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:       44580 kB
> VmallocChunk:   34359677239 kB
> DirectMap4k:     3182528 kB
> DirectMap2M:     1011712 kB
> 
> The cached reduced to 359 MB (after the dropping).
> I dont know where to read the "number of mapped pages".
> "Mapped" is about 51 MB.

Does your tmpfs store lots of files?
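A quick way to answer that (a sketch; assumes coreutils df/du, and the
mount points are just examples):

```shell
# List every tmpfs instance and how much data is actually stored in it.
# tmpfs file contents are counted as page cache, so full tmpfs mounts
# show up as "Cached" that drop_caches cannot free.
df -h -t tmpfs                 # size and usage of each tmpfs mount
du -sh /dev/shm 2>/dev/null    # bytes of files actually kept in shm
```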

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 11:05     ` drop_caches Wu Fengguang
@ 2009-03-04 11:29       ` Markus
  2009-03-04 11:57           ` drop_caches Wu Fengguang
  0 siblings, 1 reply; 49+ messages in thread
From: Markus @ 2009-03-04 11:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang

> > > The memory mapped pages won't be dropped in this way.
> > > "cat /proc/meminfo" will show you the number of mapped pages.
> > 
> > # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> >              total       used       free     shared    buffers     cached
> > Mem:          3950       3262        688          0          0        359
> > -/+ buffers/cache:       2902       1047
> > Swap:         5890       1509       4381
> > MemTotal:        4045500 kB
> > MemFree:          705180 kB
> > Buffers:             508 kB
> > Cached:           367748 kB
> > SwapCached:       880744 kB
> > Active:          1555032 kB
> > Inactive:        1634868 kB
> > Active(anon):    1527100 kB
> > Inactive(anon):  1607328 kB
> > Active(file):      27932 kB
> > Inactive(file):    27540 kB
> > Unevictable:         816 kB
> > Mlocked:               0 kB
> > SwapTotal:       6032344 kB
> > SwapFree:        4486496 kB
> > Dirty:                 0 kB
> > Writeback:             0 kB
> > AnonPages:       2378112 kB
> > Mapped:            52196 kB
> > Slab:              65640 kB
> > SReclaimable:      46192 kB
> > SUnreclaim:        19448 kB
> > PageTables:        28200 kB
> > NFS_Unstable:          0 kB
> > Bounce:                0 kB
> > WritebackTmp:          0 kB
> > CommitLimit:     8055092 kB
> > Committed_AS:    4915636 kB
> > VmallocTotal:   34359738367 kB
> > VmallocUsed:       44580 kB
> > VmallocChunk:   34359677239 kB
> > DirectMap4k:     3182528 kB
> > DirectMap2M:     1011712 kB
> > 
> > The cached reduced to 359 MB (after the dropping).
> > I dont know where to read the "number of mapped pages".
> > "Mapped" is about 51 MB.
> 
> Does your tmpfs store lots of files?

I don't think so:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6               14G  8.2G  5.6G  60% /
udev                   10M  304K  9.8M   3% /dev
cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
/dev/md4               19G   15G  3.1G  83% /home
/dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
shm                   2.0G     0  2.0G   0% /dev/shm
/dev/md1               99M   19M   76M  20% /boot

# mount
/dev/md6 on / type ext3 (rw,noatime,nodiratime,barrier=0)
/proc on /proc type proc (rw,noexec,nosuid,noatime,nodiratime)
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
cachedir on /lib64/splash/cache type tmpfs (rw,size=4096k,mode=644)
/dev/md4 on /home type ext3 (rw,noatime,nodiratime,barrier=0)
/dev/md3 on /usr/portage type ext4 (rw,noatime,nodiratime,barrier=0)
shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
automount(pid6507) on /mnt/.autofs/misc type autofs (rw,fd=4,pgrp=6507,minproto=2,maxproto=4)
automount(pid6521) on /mnt/.autofs/usb type autofs (rw,fd=4,pgrp=6521,minproto=2,maxproto=4)
/dev/md1 on /boot type ext2 (rw,noatime,nodiratime)

I don't know what exactly all that memory is used for. It varies from
about 300 MB up to one GB.
Tell me where to look and I will!

Thanks!

Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 11:29       ` drop_caches Markus
@ 2009-03-04 11:57           ` Wu Fengguang
  0 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-04 11:57 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, linux-mm

On Wed, Mar 04, 2009 at 01:29:40PM +0200, Markus wrote:
> > > > The memory mapped pages won't be dropped in this way.
> > > > "cat /proc/meminfo" will show you the number of mapped pages.
> > > 
> > > # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> > >              total       used       free     shared    buffers     cached
> > > Mem:          3950       3262        688          0          0        359
> > > -/+ buffers/cache:       2902       1047
> > > Swap:         5890       1509       4381
> > > MemTotal:        4045500 kB
> > > MemFree:          705180 kB
> > > Buffers:             508 kB
> > > Cached:           367748 kB
> > > SwapCached:       880744 kB
> > > Active:          1555032 kB
> > > Inactive:        1634868 kB
> > > Active(anon):    1527100 kB
> > > Inactive(anon):  1607328 kB
> > > Active(file):      27932 kB
> > > Inactive(file):    27540 kB
> > > Unevictable:         816 kB
> > > Mlocked:               0 kB
> > > SwapTotal:       6032344 kB
> > > SwapFree:        4486496 kB
> > > Dirty:                 0 kB
> > > Writeback:             0 kB
> > > AnonPages:       2378112 kB
> > > Mapped:            52196 kB
> > > Slab:              65640 kB
> > > SReclaimable:      46192 kB
> > > SUnreclaim:        19448 kB
> > > PageTables:        28200 kB
> > > NFS_Unstable:          0 kB
> > > Bounce:                0 kB
> > > WritebackTmp:          0 kB
> > > CommitLimit:     8055092 kB
> > > Committed_AS:    4915636 kB
> > > VmallocTotal:   34359738367 kB
> > > VmallocUsed:       44580 kB
> > > VmallocChunk:   34359677239 kB
> > > DirectMap4k:     3182528 kB
> > > DirectMap2M:     1011712 kB
> > > 
> > > The cached reduced to 359 MB (after the dropping).
> > > I dont know where to read the "number of mapped pages".
> > > "Mapped" is about 51 MB.
> > 
> > Does your tmpfs store lots of files?
> 
> Dont think so:
> 
> # df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/md6               14G  8.2G  5.6G  60% /
> udev                   10M  304K  9.8M   3% /dev
> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> /dev/md4               19G   15G  3.1G  83% /home
> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> shm                   2.0G     0  2.0G   0% /dev/shm
> /dev/md1               99M   19M   76M  20% /boot
> 
> # mount
> /dev/md6 on / type ext3 (rw,noatime,nodiratime,barrier=0)
> /proc on /proc type proc (rw,noexec,nosuid,noatime,nodiratime)
> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
> udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
> devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
> cachedir on /lib64/splash/cache type tmpfs (rw,size=4096k,mode=644)
> /dev/md4 on /home type ext3 (rw,noatime,nodiratime,barrier=0)
> /dev/md3 on /usr/portage type ext4 (rw,noatime,nodiratime,barrier=0)
> shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
> usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
> automount(pid6507) on /mnt/.autofs/misc type autofs (rw,fd=4,pgrp=6507,minproto=2,maxproto=4)
> automount(pid6521) on /mnt/.autofs/usb type autofs (rw,fd=4,pgrp=6521,minproto=2,maxproto=4)
> /dev/md1 on /boot type ext2 (rw,noatime,nodiratime)
> 
> I dont know what exactly all that memory is used for. It varies from 
> about 300 MB to up to one GB.
> Tell me where to look and I will!

So you don't have lots of mapped pages (Mapped=51M) or tmpfs files. It's
strange to me that there are so many undroppable cached pages (Cached=359M),
and that most of them lie outside the file LRU lists (Active(file)+Inactive(file) ≈ 54M)...

Anyone have better clues on these 'hidden' pages?
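The size of that gap can be read straight off /proc/meminfo (a sketch;
all values are in kB, and on the numbers quoted above it works out to
312276 kB unaccounted):

```shell
# How much of Cached is not accounted to the file LRU lists.
# The remainder is page cache that the reclaim scanner never sees
# on the file lists (e.g. shmem/driver-pinned pages).
awk '/^Cached:/          { c = $2 }
     /^Active\(file\)/   { a = $2 }
     /^Inactive\(file\)/ { i = $2 }
     END { printf "Cached=%d  fileLRU=%d  unaccounted=%d\n", c, a + i, c - (a + i) }' \
    /proc/meminfo
```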

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 11:57           ` drop_caches Wu Fengguang
@ 2009-03-04 12:32             ` Zdenek Kabelac
  -1 siblings, 0 replies; 49+ messages in thread
From: Zdenek Kabelac @ 2009-03-04 12:32 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Markus, linux-kernel, linux-mm

Wu Fengguang napsal(a):
> On Wed, Mar 04, 2009 at 01:29:40PM +0200, Markus wrote:
>>>>> The memory mapped pages won't be dropped in this way.
>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
>>>>              total       used       free     shared    buffers     cached
>>>> Mem:          3950       3262        688          0          0        359
>>>> -/+ buffers/cache:       2902       1047
>>>> Swap:         5890       1509       4381
>>>> MemTotal:        4045500 kB
>>>> MemFree:          705180 kB
>>>> Buffers:             508 kB
>>>> Cached:           367748 kB
>>>> SwapCached:       880744 kB
>>>> Active:          1555032 kB
>>>> Inactive:        1634868 kB
>>>> Active(anon):    1527100 kB
>>>> Inactive(anon):  1607328 kB
>>>> Active(file):      27932 kB
>>>> Inactive(file):    27540 kB
>>>> Unevictable:         816 kB
>>>> Mlocked:               0 kB
>>>> SwapTotal:       6032344 kB
>>>> SwapFree:        4486496 kB
>>>> Dirty:                 0 kB
>>>> Writeback:             0 kB
>>>> AnonPages:       2378112 kB
>>>> Mapped:            52196 kB
>>>> Slab:              65640 kB
>>>> SReclaimable:      46192 kB
>>>> SUnreclaim:        19448 kB
>>>> PageTables:        28200 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:     8055092 kB
>>>> Committed_AS:    4915636 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:       44580 kB
>>>> VmallocChunk:   34359677239 kB
>>>> DirectMap4k:     3182528 kB
>>>> DirectMap2M:     1011712 kB
>>>>
>>>> The cached reduced to 359 MB (after the dropping).
>>>> I dont know where to read the "number of mapped pages".
>>>> "Mapped" is about 51 MB.
>>> Does your tmpfs store lots of files?
>> Dont think so:
>>
>> # df -h
>> Filesystem            Size  Used Avail Use% Mounted on
>> /dev/md6               14G  8.2G  5.6G  60% /
>> udev                   10M  304K  9.8M   3% /dev
>> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
>> /dev/md4               19G   15G  3.1G  83% /home
>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
>> shm                   2.0G     0  2.0G   0% /dev/shm
>> /dev/md1               99M   19M   76M  20% /boot
>>
>> # mount
>> /dev/md6 on / type ext3 (rw,noatime,nodiratime,barrier=0)
>> /proc on /proc type proc (rw,noexec,nosuid,noatime,nodiratime)
>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
>> udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
>> devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
>> cachedir on /lib64/splash/cache type tmpfs (rw,size=4096k,mode=644)
>> /dev/md4 on /home type ext3 (rw,noatime,nodiratime,barrier=0)
>> /dev/md3 on /usr/portage type ext4 (rw,noatime,nodiratime,barrier=0)
>> shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
>> usbfs on /proc/bus/usb type usbfs (rw,noexec,nosuid,devmode=0664,devgid=85)
>> automount(pid6507) on /mnt/.autofs/misc type autofs (rw,fd=4,pgrp=6507,minproto=2,maxproto=4)
>> automount(pid6521) on /mnt/.autofs/usb type autofs (rw,fd=4,pgrp=6521,minproto=2,maxproto=4)
>> /dev/md1 on /boot type ext2 (rw,noatime,nodiratime)
>>
>> I dont know what exactly all that memory is used for. It varies from 
>> about 300 MB to up to one GB.
>> Tell me where to look and I will!
> 
> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  It's
> strange to me that there are so many undroppable cached pages(Cached=359M),
> and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> 
> Anyone have better clues on these 'hidden' pages?

Maybe try this:

cat /proc/`pidof X`/smaps | grep drm | wc -l

You will likely see that number growing over time.

Also check "cat /proc/dri/0/gem_objects"; it reports an "object bytes"
figure, which should be close to your missing cached pages.

If you are using the Intel GEM driver, there is a known unbounded
caching issue; see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
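The checks above can be wrapped so they fail gracefully on machines
without X or without the DRM proc interface (a sketch; both the X
process name and the /proc/dri/0/gem_objects path are driver-dependent
assumptions):

```shell
# Run the DRM checks only where the interfaces actually exist.
if pid=$(pidof X | awk '{print $1}') && [ -n "$pid" ]; then
    grep -c drm "/proc/$pid/smaps"    # number of drm-related mappings
fi
if [ -r /proc/dri/0/gem_objects ]; then
    cat /proc/dri/0/gem_objects       # includes the "object bytes" total
fi
```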


Zdenek







^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 12:32             ` drop_caches Zdenek Kabelac
@ 2009-03-04 13:47               ` Markus
  -1 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-04 13:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Zdenek Kabelac, Wu Fengguang, linux-mm

> >>>>> The memory mapped pages won't be dropped in this way.
> >>>>> "cat /proc/meminfo" will show you the number of mapped pages.
> >>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> >>>>              total       used       free     shared    buffers     cached
> >>>> Mem:          3950       3262        688          0          0        359
> >>>> -/+ buffers/cache:       2902       1047
> >>>> Swap:         5890       1509       4381
> >>>> MemTotal:        4045500 kB
> >>>> MemFree:          705180 kB
> >>>> Buffers:             508 kB
> >>>> Cached:           367748 kB
> >>>> SwapCached:       880744 kB
> >>>> Active:          1555032 kB
> >>>> Inactive:        1634868 kB
> >>>> Active(anon):    1527100 kB
> >>>> Inactive(anon):  1607328 kB
> >>>> Active(file):      27932 kB
> >>>> Inactive(file):    27540 kB
> >>>> Unevictable:         816 kB
> >>>> Mlocked:               0 kB
> >>>> SwapTotal:       6032344 kB
> >>>> SwapFree:        4486496 kB
> >>>> Dirty:                 0 kB
> >>>> Writeback:             0 kB
> >>>> AnonPages:       2378112 kB
> >>>> Mapped:            52196 kB
> >>>> Slab:              65640 kB
> >>>> SReclaimable:      46192 kB
> >>>> SUnreclaim:        19448 kB
> >>>> PageTables:        28200 kB
> >>>> NFS_Unstable:          0 kB
> >>>> Bounce:                0 kB
> >>>> WritebackTmp:          0 kB
> >>>> CommitLimit:     8055092 kB
> >>>> Committed_AS:    4915636 kB
> >>>> VmallocTotal:   34359738367 kB
> >>>> VmallocUsed:       44580 kB
> >>>> VmallocChunk:   34359677239 kB
> >>>> DirectMap4k:     3182528 kB
> >>>> DirectMap2M:     1011712 kB
> >>>>
> >>>> The cached reduced to 359 MB (after the dropping).
> >>>> I dont know where to read the "number of mapped pages".
> >>>> "Mapped" is about 51 MB.
> >>> Does your tmpfs store lots of files?
> >> Dont think so:
> >>
> >> # df -h
> >> Filesystem            Size  Used Avail Use% Mounted on
> >> /dev/md6               14G  8.2G  5.6G  60% /
> >> udev                   10M  304K  9.8M   3% /dev
> >> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> >> /dev/md4               19G   15G  3.1G  83% /home
> >> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> >> shm                   2.0G     0  2.0G   0% /dev/shm
> >> /dev/md1               99M   19M   76M  20% /boot
> >>
> >> # mount
> >> /dev/md6 on / type ext3 (rw,noatime,nodiratime,barrier=0)
> >> /proc on /proc type proc (rw,noexec,nosuid,noatime,nodiratime)
> >> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
> >> udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
> >> devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
> >> cachedir on /lib64/splash/cache type tmpfs (rw,size=4096k,mode=644)
> >> /dev/md4 on /home type ext3 (rw,noatime,nodiratime,barrier=0)
> >> /dev/md3 on /usr/portage type ext4 (rw,noatime,nodiratime,barrier=0)
> >> shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
> >> usbfs on /proc/bus/usb type usbfs 
> >> (rw,noexec,nosuid,devmode=0664,devgid=85)
> >> automount(pid6507) on /mnt/.autofs/misc type autofs 
> >> (rw,fd=4,pgrp=6507,minproto=2,maxproto=4)
> >> automount(pid6521) on /mnt/.autofs/usb type autofs 
> >> (rw,fd=4,pgrp=6521,minproto=2,maxproto=4)
> >> /dev/md1 on /boot type ext2 (rw,noatime,nodiratime)
> >>
> >> I dont know what exactly all that memory is used for. It varies from
> >> about 300 MB to up to one GB.
> >> Tell me where to look and I will!
> > 
> > So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  It's
> > strange to me that there are so many undroppable cached pages(Cached=359M),
> > and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> > 
> > Anyone have better clues on these 'hidden' pages?
> 
> Maybe try this:
> 
> cat /proc/`pidof X`/smaps | grep drm | wc -l
> 
> you will see some growing numbers.
> 
> Also check  cat /proc/dri/0/gem_objects
> there should be some number  # object bytes - which should be close to your
> missing cached pages.
> 
> 
> If you are using Intel GEM driver - there is some unlimited caching issue
> 
> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
> 
# cat /proc/`pidof X`/smaps | grep drm | wc -l
0
# cat /proc/dri/0/gem_objects
cat: /proc/dri/0/gem_objects: No such file or directory

I use Xorg 1.3 with an nvidia GPU, so I don't know whether the Intel GEM
driver applies here.

Btw, I am running kernel 2.6.28.2.

Thanks.
Markus
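As a cross-check, Wu's arithmetic can be reproduced from the meminfo snapshot quoted above. This is a sketch only: the here-document repeats the thread's numbers, and on a live system you would feed /proc/meminfo directly instead.

```shell
# Cached pages that are not on the file LRU, from the quoted snapshot (kB).
# On a live system, replace the here-document with: < /proc/meminfo
awk '/^(Cached|Active\(file\)|Inactive\(file\)):/ { v[$1] = $2 }
     END { print v["Cached:"] - v["Active(file):"] - v["Inactive(file):"], "kB unaccounted" }' <<'EOF'
Cached:           367748 kB
Active(file):      27932 kB
Inactive(file):    27540 kB
EOF
# → 312276 kB unaccounted
```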

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 13:47               ` drop_caches Markus
@ 2009-03-04 14:09                 ` Zdenek Kabelac
  -1 siblings, 0 replies; 49+ messages in thread
From: Zdenek Kabelac @ 2009-03-04 14:09 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Wu Fengguang, linux-mm

Markus napsal(a):
>>>>>>> The memory mapped pages won't be dropped in this way.
>>>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
>>>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
>>>>>>              total       used       free     shared    buffers     cached
>>>>>> Mem:          3950       3262        688          0          0        359
>>>>>> -/+ buffers/cache:       2902       1047
>>>>>> Swap:         5890       1509       4381
>>>>>> MemTotal:        4045500 kB
>>>>>> MemFree:          705180 kB
>>>>>> Buffers:             508 kB
>>>>>> Cached:           367748 kB
>>>>>> SwapCached:       880744 kB
>>>>>> Active:          1555032 kB
>>>>>> Inactive:        1634868 kB
>>>>>> Active(anon):    1527100 kB
>>>>>> Inactive(anon):  1607328 kB
>>>>>> Active(file):      27932 kB
>>>>>> Inactive(file):    27540 kB
>>>>>> Unevictable:         816 kB
>>>>>> Mlocked:               0 kB
>>>>>> SwapTotal:       6032344 kB
>>>>>> SwapFree:        4486496 kB
>>>>>> Dirty:                 0 kB
>>>>>> Writeback:             0 kB
>>>>>> AnonPages:       2378112 kB
>>>>>> Mapped:            52196 kB
>>>>>> Slab:              65640 kB
>>>>>> SReclaimable:      46192 kB
>>>>>> SUnreclaim:        19448 kB
>>>>>> PageTables:        28200 kB
>>>>>> NFS_Unstable:          0 kB
>>>>>> Bounce:                0 kB
>>>>>> WritebackTmp:          0 kB
>>>>>> CommitLimit:     8055092 kB
>>>>>> Committed_AS:    4915636 kB
>>>>>> VmallocTotal:   34359738367 kB
>>>>>> VmallocUsed:       44580 kB
>>>>>> VmallocChunk:   34359677239 kB
>>>>>> DirectMap4k:     3182528 kB
>>>>>> DirectMap2M:     1011712 kB
>>>>>>
>>>>>> The cached reduced to 359 MB (after the dropping).
>>>>>> I dont know where to read the "number of mapped pages".
>>>>>> "Mapped" is about 51 MB.
>>>>> Does your tmpfs store lots of files?
>>>> Dont think so:
>>>>
>>>> # df -h
>>>> Filesystem            Size  Used Avail Use% Mounted on
>>>> /dev/md6               14G  8.2G  5.6G  60% /
>>>> udev                   10M  304K  9.8M   3% /dev
>>>> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
>>>> /dev/md4               19G   15G  3.1G  83% /home
>>>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
>>>> shm                   2.0G     0  2.0G   0% /dev/shm
>>>> /dev/md1               99M   19M   76M  20% /boot
>>>>
>>>> # mount
>>>> /dev/md6 on / type ext3 (rw,noatime,nodiratime,barrier=0)
>>>> /proc on /proc type proc (rw,noexec,nosuid,noatime,nodiratime)
>>>> sysfs on /sys type sysfs (rw,nosuid,nodev,noexec)
>>>> udev on /dev type tmpfs (rw,nosuid,size=10240k,mode=755)
>>>> devpts on /dev/pts type devpts (rw,nosuid,noexec,gid=5,mode=620)
>>>> cachedir on /lib64/splash/cache type tmpfs (rw,size=4096k,mode=644)
>>>> /dev/md4 on /home type ext3 (rw,noatime,nodiratime,barrier=0)
>>>> /dev/md3 on /usr/portage type ext4 (rw,noatime,nodiratime,barrier=0)
>>>> shm on /dev/shm type tmpfs (rw,noexec,nosuid,nodev)
>>>> usbfs on /proc/bus/usb type usbfs 
>>>> (rw,noexec,nosuid,devmode=0664,devgid=85)
>>>> automount(pid6507) on /mnt/.autofs/misc type autofs 
>>>> (rw,fd=4,pgrp=6507,minproto=2,maxproto=4)
>>>> automount(pid6521) on /mnt/.autofs/usb type autofs 
>>>> (rw,fd=4,pgrp=6521,minproto=2,maxproto=4)
>>>> /dev/md1 on /boot type ext2 (rw,noatime,nodiratime)
>>>>
>>>> I dont know what exactly all that memory is used for. It varies from
>>>> about 300 MB to up to one GB.
>>>> Tell me where to look and I will!
>>> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  It's
>>> strange to me that there are so many undroppable cached pages(Cached=359M),
>>> and most of them lie out of the LRU queue(Active+Inactive file=53M)...
>>> Anyone have better clues on these 'hidden' pages?
>> Maybe try this:
>>
>> cat /proc/`pidof X`/smaps | grep drm | wc -l
>>
>> you will see some growing numbers.
>>
>> Also check  cat /proc/dri/0/gem_objects
>> there should be some number  # object bytes - which should be close to your
>> missing cached pages.
>>
>>
>> If you are using Intel GEM driver - there is some unlimited caching issue
>> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
>>
> # cat /proc/`pidof X`/smaps | grep drm | wc -l
> 0
> # cat /proc/dri/0/gem_objects
> cat: /proc/dri/0/gem_objects: No such file or directory
> 
> I use Xorg 1.3 with an nvidia gpu. Dont know if I use a "Intel GEM 
> driver".
> 


Are you using the binary driver from NVidia?
Maybe you should ask the authors of that binary blob.

Could you try the Vesa driver for a while, to see whether you get the
same strange results?

Zdenek

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 14:09                 ` drop_caches Zdenek Kabelac
@ 2009-03-04 18:47                   ` Markus
  -1 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-04 18:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Zdenek Kabelac, Wu Fengguang, linux-mm

Am Mittwoch, 4. März 2009 schrieb Zdenek Kabelac:
> Markus napsal(a):
> >>>>>>> The memory mapped pages won't be dropped in this way.
> >>>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
> >>>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> >>>>>>              total       used       free     shared    buffers     cached
> >>>>>> Mem:          3950       3262        688          0          0        359
> >>>>>> -/+ buffers/cache:       2902       1047
> >>>>>> Swap:         5890       1509       4381
> >>>>>> MemTotal:        4045500 kB
> >>>>>> MemFree:          705180 kB
> >>>>>> Buffers:             508 kB
> >>>>>> Cached:           367748 kB
> >>>>>> SwapCached:       880744 kB
> >>>>>> Active:          1555032 kB
> >>>>>> Inactive:        1634868 kB
> >>>>>> Active(anon):    1527100 kB
> >>>>>> Inactive(anon):  1607328 kB
> >>>>>> Active(file):      27932 kB
> >>>>>> Inactive(file):    27540 kB
> >>>>>> Unevictable:         816 kB
> >>>>>> Mlocked:               0 kB
> >>>>>> SwapTotal:       6032344 kB
> >>>>>> SwapFree:        4486496 kB
> >>>>>> Dirty:                 0 kB
> >>>>>> Writeback:             0 kB
> >>>>>> AnonPages:       2378112 kB
> >>>>>> Mapped:            52196 kB
> >>>>>> Slab:              65640 kB
> >>>>>> SReclaimable:      46192 kB
> >>>>>> SUnreclaim:        19448 kB
> >>>>>> PageTables:        28200 kB
> >>>>>> NFS_Unstable:          0 kB
> >>>>>> Bounce:                0 kB
> >>>>>> WritebackTmp:          0 kB
> >>>>>> CommitLimit:     8055092 kB
> >>>>>> Committed_AS:    4915636 kB
> >>>>>> VmallocTotal:   34359738367 kB
> >>>>>> VmallocUsed:       44580 kB
> >>>>>> VmallocChunk:   34359677239 kB
> >>>>>> DirectMap4k:     3182528 kB
> >>>>>> DirectMap2M:     1011712 kB
> >>>>>>
> >>>>>> The cached reduced to 359 MB (after the dropping).
> >>>>>> I dont know where to read the "number of mapped pages".
> >>>>>> "Mapped" is about 51 MB.
> >>>>> Does your tmpfs store lots of files?
> >>>> Dont think so:
> >>>>
> >>>> # df -h
> >>>> Filesystem            Size  Used Avail Use% Mounted on
> >>>> /dev/md6               14G  8.2G  5.6G  60% /
> >>>> udev                   10M  304K  9.8M   3% /dev
> >>>> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> >>>> /dev/md4               19G   15G  3.1G  83% /home
> >>>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> >>>> shm                   2.0G     0  2.0G   0% /dev/shm
> >>>> /dev/md1               99M   19M   76M  20% /boot
> >>>>
> >>>> I dont know what exactly all that memory is used for. It varies from
> >>>> about 300 MB to up to one GB.
> >>>> Tell me where to look and I will!
> >>> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  It's
> >>> strange to me that there are so many undroppable cached pages(Cached=359M),
> >>> and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> >>> Anyone have better clues on these 'hidden' pages?
> >> Maybe try this:
> >>
> >> cat /proc/`pidof X`/smaps | grep drm | wc -l
> >>
> >> you will see some growing numbers.
> >>
> >> Also check  cat /proc/dri/0/gem_objects
> >> there should be some number  # object bytes - which should be close to your
> >> missing cached pages.
> >>
> >>
> >> If you are using Intel GEM driver - there is some unlimited caching issue
> >> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
> >>
> > # cat /proc/`pidof X`/smaps | grep drm | wc -l
> > 0
> > # cat /proc/dri/0/gem_objects
> > cat: /proc/dri/0/gem_objects: No such file or directory
> > 
> > I use Xorg 1.3 with an nvidia gpu. Dont know if I use a "Intel GEM 
> > driver".
> > 
> 
> 
> Are you using binary  driver from NVidia ??
> Maybe you should ask authors of this binary blob ?
> 
> Could you try to use for a while Vesa driver to see, if you are able to get
> same strange results ?

I rebooted into the console without the nvidia module loaded and get the same
results (updated to 2.6.28.7, btw):
# sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
             total       used       free     shared    buffers     cached
Mem:          3950       1647       2303          0          0        924
-/+ buffers/cache:        722       3228
Swap:         5890          0       5890
MemTotal:        4045444 kB
MemFree:         2358944 kB
Buffers:             544 kB
Cached:           946624 kB
SwapCached:            0 kB
Active:          1614756 kB
Inactive:           7632 kB
Active(anon):    1602476 kB
Inactive(anon):        0 kB
Active(file):      12280 kB
Inactive(file):     7632 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6032344 kB
SwapFree:        6032344 kB
Dirty:                72 kB
Writeback:            32 kB
AnonPages:        675224 kB
Mapped:            17756 kB
Slab:              19936 kB
SReclaimable:       9652 kB
SUnreclaim:        10284 kB
PageTables:         8296 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8055064 kB
Committed_AS:    3648088 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       10616 kB
VmallocChunk:   34359716459 kB
DirectMap4k:        6080 kB
DirectMap2M:     4188160 kB

Thanks!
Markus
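One thing that may be worth ruling out: to my understanding of kernels of this era (which have no Shmem: field in /proc/meminfo yet), shmem/tmpfs pages are counted in Cached but kept on the anon LRU, so they would look exactly like "hidden" cached pages. A rough proxy is to sum the Used column of all tmpfs mounts; the sketch below uses the df output quoted earlier in the thread as sample input, and on a live system you would pipe `df -k -t tmpfs` in instead.

```shell
# Sum the "Used" column (kB) across tmpfs mounts. Sample input is the
# df output quoted earlier; on a live system use:
#   df -k -t tmpfs | awk 'NR > 1 { used += $3 } END { print used, "kB in tmpfs" }'
awk 'NR > 1 { used += $3 } END { print used, "kB in tmpfs" }' <<'EOF'
Filesystem     1K-blocks  Used Available Use% Mounted on
udev               10240   304      9936   3% /dev
cachedir            4096   100      3996   3% /lib64/splash/cache
shm              2097152     0   2097152   0% /dev/shm
EOF
# → 404 kB in tmpfs (far too small to explain ~900 MB of cached pages)
```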

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 18:47                   ` drop_caches Markus
@ 2009-03-05  0:48                     ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05  0:48 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

[-- Attachment #1: Type: text/plain, Size: 5982 bytes --]

On Wed, Mar 04, 2009 at 08:47:41PM +0200, Markus wrote:
> Am Mittwoch, 4. März 2009 schrieb Zdenek Kabelac:
> > Markus napsal(a):
> > >>>>>>> The memory mapped pages won't be dropped in this way.
> > >>>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
> > >>>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> > >>>>>>              total       used       free     shared    buffers     cached
> > >>>>>> Mem:          3950       3262        688          0          0        359
> > >>>>>> -/+ buffers/cache:       2902       1047
> > >>>>>> Swap:         5890       1509       4381
> > >>>>>> MemTotal:        4045500 kB
> > >>>>>> MemFree:          705180 kB
> > >>>>>> Buffers:             508 kB
> > >>>>>> Cached:           367748 kB
> > >>>>>> SwapCached:       880744 kB
> > >>>>>> Active:          1555032 kB
> > >>>>>> Inactive:        1634868 kB
> > >>>>>> Active(anon):    1527100 kB
> > >>>>>> Inactive(anon):  1607328 kB
> > >>>>>> Active(file):      27932 kB
> > >>>>>> Inactive(file):    27540 kB
> > >>>>>> Unevictable:         816 kB
> > >>>>>> Mlocked:               0 kB
> > >>>>>> SwapTotal:       6032344 kB
> > >>>>>> SwapFree:        4486496 kB
> > >>>>>> Dirty:                 0 kB
> > >>>>>> Writeback:             0 kB
> > >>>>>> AnonPages:       2378112 kB
> > >>>>>> Mapped:            52196 kB
> > >>>>>> Slab:              65640 kB
> > >>>>>> SReclaimable:      46192 kB
> > >>>>>> SUnreclaim:        19448 kB
> > >>>>>> PageTables:        28200 kB
> > >>>>>> NFS_Unstable:          0 kB
> > >>>>>> Bounce:                0 kB
> > >>>>>> WritebackTmp:          0 kB
> > >>>>>> CommitLimit:     8055092 kB
> > >>>>>> Committed_AS:    4915636 kB
> > >>>>>> VmallocTotal:   34359738367 kB
> > >>>>>> VmallocUsed:       44580 kB
> > >>>>>> VmallocChunk:   34359677239 kB
> > >>>>>> DirectMap4k:     3182528 kB
> > >>>>>> DirectMap2M:     1011712 kB
> > >>>>>>
> > >>>>>> The cached reduced to 359 MB (after the dropping).
> > >>>>>> I dont know where to read the "number of mapped pages".
> > >>>>>> "Mapped" is about 51 MB.
> > >>>>> Does your tmpfs store lots of files?
> > >>>> Dont think so:
> > >>>>
> > >>>> # df -h
> > >>>> Filesystem            Size  Used Avail Use% Mounted on
> > >>>> /dev/md6               14G  8.2G  5.6G  60% /
> > >>>> udev                   10M  304K  9.8M   3% /dev
> > >>>> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> > >>>> /dev/md4               19G   15G  3.1G  83% /home
> > >>>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> > >>>> shm                   2.0G     0  2.0G   0% /dev/shm
> > >>>> /dev/md1               99M   19M   76M  20% /boot
> > >>>>
> > >>>> I dont know what exactly all that memory is used for. It varies 
> > >>>> from about 300 MB to up to one GB.
> > >>>> Tell me where to look and I will!
> > >>> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  
> > > It's strange to me that there are so many undroppable cached pages(Cached=359M),
> > > and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> > >>> Anyone have better clues on these 'hidden' pages?
> > >> Maybe try this:
> > >>
> > >> cat /proc/`pidof X`/smaps | grep drm | wc -l
> > >>
> > >> you will see some growing numbers.
> > >>
> > >> Also check  cat /proc/dri/0/gem_objects
> > >> there should be some number  # object bytes - which should be close 
> to 
> > > your 
> > >> missing cached pages.
> > >>
> > >>
> > >> If you are using Intel GEM driver - there is some unlimited caching 
> > > issue
> > >> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
> > >>
> > > # cat /proc/`pidof X`/smaps | grep drm | wc -l
> > > 0
> > > # cat /proc/dri/0/gem_objects
> > > cat: /proc/dri/0/gem_objects: No such file or directory
> > > 
> > > I use Xorg 1.3 with an nvidia gpu. Dont know if I use a "Intel GEM 
> > > driver".
> > > 
> > 
> > 
> > Are you using binary  driver from NVidia ??
> > Maybe you should ask authors of this binary blob ?
> > 
> > Could you try to use for a while Vesa driver to see, if you are able 
> to get 
> > same strange results ?
> 
> I rebooted in console without the nvidia-module loaded and have the same 
> results (updated to 2.6.28.7 btw):
> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
>              total       used       free     shared    buffers     
> cached
> Mem:          3950       1647       2303          0          0        
> 924
> -/+ buffers/cache:        722       3228
> Swap:         5890          0       5890
> MemTotal:        4045444 kB
> MemFree:         2358944 kB
> Buffers:             544 kB
> Cached:           946624 kB
> SwapCached:            0 kB
> Active:          1614756 kB
> Inactive:           7632 kB
> Active(anon):    1602476 kB
> Inactive(anon):        0 kB
> Active(file):      12280 kB
> Inactive(file):     7632 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:       6032344 kB
> SwapFree:        6032344 kB
> Dirty:                72 kB
> Writeback:            32 kB
> AnonPages:        675224 kB
> Mapped:            17756 kB
> Slab:              19936 kB
> SReclaimable:       9652 kB
> SUnreclaim:        10284 kB
> PageTables:         8296 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     8055064 kB
> Committed_AS:    3648088 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:       10616 kB
> VmallocChunk:   34359716459 kB
> DirectMap4k:        6080 kB
> DirectMap2M:     4188160 kB

Markus, you may want to try this patch; it should have a better chance of
tracking down the hidden file pages.

1) apply the patch and recompile the kernel with CONFIG_PROC_FILECACHE=m
2) after booting:
        modprobe filecache
        cp /proc/filecache filecache-`date +'%F'`
3) send us the copied file; it lists all cached files, including
   the normally hidden ones.

Thanks,
Fengguang

[-- Attachment #2: filecache-2.6.28.patch --]
[-- Type: text/x-diff, Size: 31992 bytes --]

--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -27,6 +27,7 @@ extern unsigned long max_mapnr;
 extern unsigned long num_physpages;
 extern void * high_memory;
 extern int page_cluster;
+extern char * const zone_names[];
 
 #ifdef CONFIG_SYSCTL
 extern int sysctl_legacy_va_layout;
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -104,7 +104,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 
 EXPORT_SYMBOL(totalram_pages);
 
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
 	 "DMA",
 #endif
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1943,7 +1943,10 @@ char *__d_path(const struct path *path, 
 
 		if (dentry == root->dentry && vfsmnt == root->mnt)
 			break;
-		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+		if (unlikely(!vfsmnt)) {
+			if (IS_ROOT(dentry))
+				break;
+		} else if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
 			/* Global root? */
 			if (vfsmnt->mnt_parent == vfsmnt) {
 				goto global_root;
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -564,7 +564,6 @@ out:
 }
 EXPORT_SYMBOL(radix_tree_tag_clear);
 
-#ifndef __KERNEL__	/* Only the test harness uses this at present */
 /**
  * radix_tree_tag_get - get a tag on a radix tree node
  * @root:		radix tree root
@@ -627,7 +626,6 @@ int radix_tree_tag_get(struct radix_tree
 	}
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
 
 /**
  *	radix_tree_next_hole    -    find the next hole (not-present entry)
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -82,6 +82,10 @@ static struct hlist_head *inode_hashtabl
  */
 DEFINE_SPINLOCK(inode_lock);
 
+EXPORT_SYMBOL(inode_in_use);
+EXPORT_SYMBOL(inode_unused);
+EXPORT_SYMBOL(inode_lock);
+
 /*
  * iprune_mutex provides exclusion between the kswapd or try_to_free_pages
  * icache shrinking path, and the umount path.  Without this exclusion,
@@ -247,6 +251,8 @@ void __iget(struct inode * inode)
 	inodes_stat.nr_unused--;
 }
 
+EXPORT_SYMBOL(__iget);
+
 /**
  * clear_inode - clear an inode
  * @inode: inode to clear
@@ -1353,6 +1359,16 @@ void inode_double_unlock(struct inode *i
 }
 EXPORT_SYMBOL(inode_double_unlock);
 
+
+struct hlist_head * get_inode_hash_budget(unsigned long index)
+{
+	if (index >= (1 << i_hash_shift))
+		return NULL;
+
+	return inode_hashtable + index;
+}
+EXPORT_SYMBOL_GPL(get_inode_hash_budget);
+
 static __initdata unsigned long ihash_entries;
 static int __init set_ihash_entries(char *str)
 {
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -45,6 +45,9 @@
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
+EXPORT_SYMBOL(super_blocks);
+EXPORT_SYMBOL(sb_lock);
+
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -230,6 +230,7 @@ unsigned long shrink_slab(unsigned long 
 	up_read(&shrinker_rwsem);
 	return ret;
 }
+EXPORT_SYMBOL(shrink_slab);
 
 /* Called without lock on whether page is mapped, so answer is unstable */
 static inline int page_mapping_inuse(struct page *page)
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -44,6 +44,7 @@ struct address_space swapper_space = {
 	.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
 	.backing_dev_info = &swap_backing_dev_info,
 };
+EXPORT_SYMBOL_GPL(swapper_space);
 
 #define INC_CACHE_INFO(x)	do { swap_cache_info.x++; } while (0)
 
--- linux-2.6.orig/Documentation/filesystems/proc.txt
+++ linux-2.6/Documentation/filesystems/proc.txt
@@ -266,6 +266,7 @@ Table 1-4: Kernel info in /proc
  driver	     Various drivers grouped here, currently rtc (2.4)
  execdomains Execdomains, related to security			(2.4)
  fb	     Frame Buffer devices				(2.4)
+ filecache   Query/drop in-memory file cache
  fs	     File system parameters, currently nfs/exports	(2.4)
  ide         Directory containing info about the IDE subsystem 
  interrupts  Interrupt usage                                   
@@ -456,6 +457,88 @@ varies by architecture and compile optio
 
 > cat /proc/meminfo
 
+..............................................................................
+
+filecache:
+
+Provides access to the in-memory file cache.
+
+To list an index of all cached files:
+
+    echo ls > /proc/filecache
+    cat /proc/filecache
+
+The output looks like:
+
+    # filecache 1.0
+    #      ino       size   cached cached%  state   refcnt  dev             file
+       1026334         91       92    100   --      66      03:02(hda2)     /lib/ld-2.3.6.so
+        233608       1242      972     78   --      66      03:02(hda2)     /lib/tls/libc-2.3.6.so
+         65203        651      476     73   --      1       03:02(hda2)     /bin/bash
+       1026445        261      160     61   --      10      03:02(hda2)     /lib/libncurses.so.5.5
+        235427         10       12    100   --      44      03:02(hda2)     /lib/tls/libdl-2.3.6.so
+
+FIELD	INTRO
+---------------------------------------------------------------------------
+ino	inode number
+size	inode size in KB
+cached	cached size in KB
+cached%	percent of file data cached
+state1	'-' clean; 'd' metadata dirty; 'D' data dirty
+state2	'-' unlocked; 'L' locked, normally indicates file being written out
+refcnt	file reference count; an in-kernel count, not exactly the open count
+dev	major:minor numbers in hex, followed by a descriptive device name
+file	file path _inside_ the filesystem. There are several special names:
+	'(noname)':	the file name is not available
+	'(03:02)':	the file is a block device file of major:minor
+	'...(deleted)': the named file has been deleted from the disk
+
+To list the cached pages of a particular file:
+
+    echo /bin/bash > /proc/filecache
+    cat /proc/filecache
+
+    # file /bin/bash
+    # flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
+    # idx   len     state   refcnt
+    0       36      RAU__M  3
+    36      1       RAU__M  2
+    37      8       RAU__M  3
+    45      2       RAU___  1
+    47      6       RAU__M  3
+    53      3       RAU__M  2
+    56      2       RAU__M  3
+
+FIELD	INTRO
+----------------------------------------------------------------------------
+idx	page index
+len	number of pages which are cached and share the same state
+state	page state of the flags listed in line two
+refcnt	page reference count
+
+Careful users may notice that the file name to be queried is remembered between
+commands. Internally, the module has a global variable to store the file name
+parameter, so that it can be inherited by a newly opened /proc/filecache file.
+However, this can lead to interference between multiple queriers. The solution
+here is to obey a rule: only root may interactively change the file name
+parameter; normal users must use scripts to access the interface. Scripts
+should do so by following the code example below:
+
+    filecache = open("/proc/filecache", "rw");
+    # avoid polluting the global parameter filename
+    filecache.write("set private");
+
+To instruct the kernel to drop clean caches, dentries and inodes from memory,
+causing that memory to become free:
+
+    # drop clean file data cache (i.e. file backed pagecache)
+    echo drop pagecache > /proc/filecache
+
+    # drop clean file metadata cache (i.e. dentries and inodes)
+    echo drop slabcache > /proc/filecache
+
+Note that the drop commands are non-destructive operations; since dirty
+objects are not freeable, the user should run `sync' first.
 
 MemTotal:     16344972 kB
 MemFree:      13634064 kB
--- /dev/null
+++ linux-2.6/fs/proc/filecache.c
@@ -0,0 +1,1035 @@
+/*
+ * fs/proc/filecache.c
+ *
+ * Copyright (C) 2006, 2007 Fengguang Wu <wfg@mail.ustc.edu.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+#include <linux/parser.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+
+/*
+ * Increase minor version when new columns are added;
+ * Increase major version when existing columns are changed.
+ */
+#define FILECACHE_VERSION	"1.0"
+
+/* Internal buffer sizes. The larger, the more efficient. */
+#define SBUF_SIZE	(128<<10)
+#define IWIN_PAGE_ORDER	3
+#define IWIN_SIZE	((PAGE_SIZE<<IWIN_PAGE_ORDER) / sizeof(struct inode *))
+
+/*
+ * Session management.
+ *
+ * Each opened /proc/filecache file is associated with a session object.
+ * Also there is a global_session that maintains status across open()/close()
+ * (i.e. the lifetime of an opened file), so that a casual user can query the
+ * filecache via _multiple_ simple shell commands like
+ * 'echo cat /bin/bash > /proc/filecache; cat /proc/filecache'.
+ *
+ * session.query_file is the file whose cache info is to be queried.
+ * Its value determines what we get on read():
+ * 	- NULL: ii_*() called to show the inode index
+ * 	- filp: pg_*() called to show the page groups of a filp
+ *
+ * session.query_file is
+ * 	- cloned from global_session.query_file on open();
+ * 	- updated on write("cat filename");
+ * 	  note that the new file will also be saved in global_session.query_file if
+ * 	  session.private_session is false.
+ */
+
+struct session {
+	/* options */
+	int		private_session;
+	unsigned long	ls_options;
+	dev_t		ls_dev;
+
+	/* parameters */
+	struct file	*query_file;
+
+	/* seqfile pos */
+	pgoff_t		start_offset;
+	pgoff_t		next_offset;
+
+	/* inode at last pos */
+	struct {
+		unsigned long pos;
+		unsigned long state;
+		struct inode *inode;
+		struct inode *pinned_inode;
+	} ipos;
+
+	/* inode window */
+	struct {
+		unsigned long cursor;
+		unsigned long origin;
+		unsigned long size;
+		struct inode **inodes;
+	} iwin;
+};
+
+static struct session global_session;
+
+/*
+ * Session address is stored in proc_file->f_ra.start:
+ * we assume that there will be no readahead for proc_file.
+ */
+static struct session *get_session(struct file *proc_file)
+{
+	return (struct session *)proc_file->f_ra.start;
+}
+
+static void set_session(struct file *proc_file, struct session *s)
+{
+	BUG_ON(proc_file->f_ra.start);
+	proc_file->f_ra.start = (unsigned long)s;
+}
+
+static void update_global_file(struct session *s)
+{
+	if (s->private_session)
+		return;
+
+	if (global_session.query_file)
+		fput(global_session.query_file);
+
+	global_session.query_file = s->query_file;
+
+	if (global_session.query_file)
+		get_file(global_session.query_file);
+}
+
+/*
+ * Cases of the name:
+ * 1) NULL                (new session)
+ * 	s->query_file = global_session.query_file = 0;
+ * 2) ""                  (ls/la)
+ * 	s->query_file = global_session.query_file;
+ * 3) a regular file name (cat newfile)
+ * 	s->query_file = global_session.query_file = newfile;
+ */
+static int session_update_file(struct session *s, char *name)
+{
+	static DEFINE_MUTEX(mutex); /* protects global_session.query_file */
+	int err = 0;
+
+	mutex_lock(&mutex);
+
+	/*
+	 * We are to quit, or to list the cached files.
+	 * Reset *.query_file.
+	 */
+	if (!name) {
+		if (s->query_file) {
+			fput(s->query_file);
+			s->query_file = NULL;
+		}
+		update_global_file(s);
+		goto out;
+	}
+
+	/*
+	 * This is a new session.
+	 * Inherit options/parameters from global ones.
+	 */
+	if (name[0] == '\0') {
+		*s = global_session;
+		if (s->query_file)
+			get_file(s->query_file);
+		goto out;
+	}
+
+	/*
+	 * Open the named file.
+	 */
+	if (s->query_file)
+		fput(s->query_file);
+	s->query_file = filp_open(name, O_RDONLY|O_LARGEFILE, 0);
+	if (IS_ERR(s->query_file)) {
+		err = PTR_ERR(s->query_file);
+		s->query_file = NULL;
+	} else
+		update_global_file(s);
+
+out:
+	mutex_unlock(&mutex);
+
+	return err;
+}
+
+static struct session *session_create(void)
+{
+	struct session *s;
+	int err = 0;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (s)
+		err = session_update_file(s, "");
+	else
+		err = -ENOMEM;
+
+	return err ? ERR_PTR(err) : s;
+}
+
+static void session_release(struct session *s)
+{
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	if (s->query_file)
+		fput(s->query_file);
+	kfree(s);
+}
+
+
+/*
+ * Listing of cached files.
+ *
+ * Usage:
+ * 		echo > /proc/filecache  # enter listing mode
+ * 		cat /proc/filecache     # get the file listing
+ */
+
+/* code style borrowed from ib_srp.c */
+enum {
+	LS_OPT_ERR	=	0,
+	LS_OPT_NOCLEAN	=	1 << 0,
+	LS_OPT_NODIRTY	=	1 << 1,
+	LS_OPT_NOUNUSED	=	1 << 2,
+	LS_OPT_EMPTY	=	1 << 3,
+	LS_OPT_ALL	=	1 << 4,
+	LS_OPT_DEV	=	1 << 5,
+};
+
+static match_table_t ls_opt_tokens = {
+	{ LS_OPT_NOCLEAN,	"noclean" 	},
+	{ LS_OPT_NODIRTY,	"nodirty" 	},
+	{ LS_OPT_NOUNUSED,	"nounused" 	},
+	{ LS_OPT_EMPTY,		"empty"		},
+	{ LS_OPT_ALL,		"all" 		},
+	{ LS_OPT_DEV,		"dev=%s"	},
+	{ LS_OPT_ERR,		NULL 		}
+};
+
+static int ls_parse_options(const char *buf, struct session *s)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *options, *sep_opt;
+	char *p;
+	int token;
+	int ret = 0;
+
+	if (!buf)
+		return 0;
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	s->ls_options = 0;
+	sep_opt = options;
+	while ((p = strsep(&sep_opt, " ")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, ls_opt_tokens, args);
+
+		switch (token) {
+		case LS_OPT_NOCLEAN:
+		case LS_OPT_NODIRTY:
+		case LS_OPT_NOUNUSED:
+		case LS_OPT_EMPTY:
+		case LS_OPT_ALL:
+			s->ls_options |= token;
+			break;
+		case LS_OPT_DEV:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (*p == '/') {
+				struct kstat stat;
+				struct nameidata nd;
+				ret = path_lookup(p, LOOKUP_FOLLOW, &nd);
+				if (!ret)
+					ret = vfs_getattr(nd.path.mnt,
+							  nd.path.dentry, &stat);
+				if (!ret)
+					s->ls_dev = stat.rdev;
+			} else
+				s->ls_dev = simple_strtoul(p, NULL, 0);
+			/* printk("%lx %s\n", (long)s->ls_dev, p); */
+			kfree(p);
+			break;
+
+		default:
+			printk(KERN_WARNING "unknown parameter or missing value "
+			       "'%s' in ls command\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	kfree(options);
+	return ret;
+}
+
+/*
+ * Add possible filters here.
+ * No permission check: we cannot verify the path's permission anyway.
+ * We simply demand root privilege for accessing /proc/filecache.
+ */
+static int may_show_inode(struct session *s, struct inode *inode)
+{
+	if (!atomic_read(&inode->i_count))
+		return 0;
+	if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+		return 0;
+	if (!inode->i_mapping)
+		return 0;
+
+	if (s->ls_dev && s->ls_dev != inode->i_sb->s_dev)
+		return 0;
+
+	if (s->ls_options & LS_OPT_ALL)
+		return 1;
+
+	if (!(s->ls_options & LS_OPT_EMPTY) && !inode->i_mapping->nrpages)
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NOCLEAN) && !(inode->i_state & I_DIRTY))
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NODIRTY) && (inode->i_state & I_DIRTY))
+		return 0;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	      S_ISLNK(inode->i_mode) || S_ISBLK(inode->i_mode)))
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Full: there are more data following.
+ */
+static int iwin_full(struct session *s)
+{
+	return !s->iwin.cursor ||
+		s->iwin.cursor > s->iwin.origin + s->iwin.size;
+}
+
+static int iwin_push(struct session *s, struct inode *inode)
+{
+	if (!may_show_inode(s, inode))
+		return 0;
+
+	s->iwin.cursor++;
+
+	if (s->iwin.size >= IWIN_SIZE)
+		return 1;
+
+	if (s->iwin.cursor > s->iwin.origin)
+		s->iwin.inodes[s->iwin.size++] = inode;
+	return 0;
+}
+
+/*
+ * Traverse the inode lists in order, newest first.
+ * And fill @s->iwin.inodes with inodes positioned in [@pos, @pos+IWIN_SIZE).
+ */
+static int iwin_fill(struct session *s, unsigned long pos)
+{
+	struct inode *inode;
+	struct super_block *sb;
+
+	s->iwin.origin = pos;
+	s->iwin.cursor = 0;
+	s->iwin.size = 0;
+
+	/*
+	 * We have a cursor inode, clean and expected to be unchanged.
+	 */
+	if (s->ipos.inode && pos >= s->ipos.pos &&
+			!(s->ipos.state & I_DIRTY) &&
+			s->ipos.state == s->ipos.inode->i_state) {
+		inode = s->ipos.inode;
+		s->iwin.cursor = s->ipos.pos;
+		goto continue_from_saved;
+	}
+
+	if (s->ls_options & LS_OPT_NODIRTY)
+		goto clean_inodes;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (s->ls_dev && s->ls_dev != sb->s_dev)
+			continue;
+
+		list_for_each_entry(inode, &sb->s_dirty, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+		list_for_each_entry(inode, &sb->s_io, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+	}
+	spin_unlock(&sb_lock);
+
+clean_inodes:
+	list_for_each_entry(inode, &inode_in_use, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+continue_from_saved:
+		;
+	}
+
+	if (s->ls_options & LS_OPT_NOUNUSED)
+		return 0;
+
+	list_for_each_entry(inode, &inode_unused, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+	}
+
+	return 0;
+
+out_full_unlock:
+	spin_unlock(&sb_lock);
+out_full:
+	return 1;
+}
+
+static struct inode *iwin_inode(struct session *s, unsigned long pos)
+{
+	if ((iwin_full(s) && pos >= s->iwin.origin + s->iwin.size)
+			  || pos < s->iwin.origin)
+		iwin_fill(s, pos);
+
+	if (pos >= s->iwin.cursor)
+		return NULL;
+
+	s->ipos.pos = pos;
+	s->ipos.inode = s->iwin.inodes[pos - s->iwin.origin];
+	BUG_ON(!s->ipos.inode);
+	return s->ipos.inode;
+}
+
+static void show_inode(struct seq_file *m, struct inode *inode)
+{
+	char state[] = "--"; /* dirty, locked */
+	struct dentry *dentry;
+	loff_t size = i_size_read(inode);
+	unsigned long nrpages;
+	int percent;
+	int refcnt;
+	int shift;
+
+	if (!size)
+		size++;
+
+	if (inode->i_mapping)
+		nrpages = inode->i_mapping->nrpages;
+	else {
+		nrpages = 0;
+		WARN_ON(1);
+	}
+
+	for (shift = 0; (size >> shift) > ULONG_MAX / 128; shift += 12)
+		;
+	percent = min(100UL, (((100 * nrpages) >> shift) << PAGE_CACHE_SHIFT) /
+						(unsigned long)(size >> shift));
+
+	if (inode->i_state & (I_DIRTY_DATASYNC|I_DIRTY_PAGES))
+		state[0] = 'D';
+	else if (inode->i_state & I_DIRTY_SYNC)
+		state[0] = 'd';
+
+	if (inode->i_state & I_LOCK)
+		state[1] = 'L';
+
+	refcnt = 0;
+	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+		refcnt += atomic_read(&dentry->d_count);
+	}
+
+	seq_printf(m, "%10lu %10llu %8lu %7d ",
+			inode->i_ino,
+			DIV_ROUND_UP(size, 1024),
+			nrpages << (PAGE_CACHE_SHIFT - 10),
+			percent);
+
+	seq_printf(m, "%6d %5s ",
+			refcnt,
+			state);
+
+	seq_printf(m, "%02x:%02x(%s)\t",
+			MAJOR(inode->i_sb->s_dev),
+			MINOR(inode->i_sb->s_dev),
+			inode->i_sb->s_id);
+
+	if (list_empty(&inode->i_dentry)) {
+		if (!atomic_read(&inode->i_count))
+			seq_puts(m, "(noname)\n");
+		else
+			seq_printf(m, "(%02x:%02x)\n",
+					imajor(inode), iminor(inode));
+	} else {
+		struct path path = {
+			.mnt = NULL,
+			.dentry = list_entry(inode->i_dentry.next,
+					     struct dentry, d_alias)
+		};
+
+		seq_path(m, &path, " \t\n\\");
+		seq_putc(m, '\n');
+	}
+}
+
+static int ii_show(struct seq_file *m, void *v)
+{
+	unsigned long index = *(loff_t *) v;
+	struct session *s = m->private;
+	struct inode *inode;
+
+	if (index == 0) {
+		seq_puts(m, "# filecache " FILECACHE_VERSION "\n");
+		seq_puts(m, "#      ino       size   cached cached% "
+				"refcnt state "
+				"dev\t\tfile\n");
+	}
+
+	inode = iwin_inode(s, index);
+	show_inode(m, inode);
+
+	return 0;
+}
+
+static void *ii_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	s->iwin.size = 0;
+	s->iwin.inodes = (struct inode **)
+				__get_free_pages( GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!s->iwin.inodes)
+		return NULL;
+
+	spin_lock(&inode_lock);
+
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void *ii_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	(*pos)++;
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void ii_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct inode *inode = s->ipos.inode;
+
+	if (!s->iwin.inodes)
+		return;
+
+	if (inode) {
+		__iget(inode);
+		s->ipos.state = inode->i_state;
+	}
+	spin_unlock(&inode_lock);
+
+	free_pages((unsigned long) s->iwin.inodes, IWIN_PAGE_ORDER);
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	s->ipos.pinned_inode = inode;
+}
+
+/*
+ * Listing of cached page ranges of a file.
+ *
+ * Usage:
+ * 		echo 'file name' > /proc/filecache
+ * 		cat /proc/filecache
+ */
+
+unsigned long page_mask;
+#define PG_MMAP		PG_lru		/* reuse any non-relevant flag */
+#define PG_BUFFER	PG_swapcache	/* ditto */
+#define PG_DIRTY	PG_error	/* ditto */
+#define PG_WRITEBACK	PG_buddy	/* ditto */
+
+/*
+ * Page state names, prefixed by their abbreviations.
+ */
+struct {
+	unsigned long	mask;
+	const char     *name;
+	int		faked;
+} page_flag [] = {
+	{1 << PG_referenced,	"R:referenced",	0},
+	{1 << PG_active,	"A:active",	0},
+	{1 << PG_MMAP,		"M:mmap",	1},
+
+	{1 << PG_uptodate,	"U:uptodate",	0},
+	{1 << PG_dirty,		"D:dirty",	0},
+	{1 << PG_writeback,	"W:writeback",	0},
+	{1 << PG_reclaim,	"X:readahead",	0},
+
+	{1 << PG_private,	"P:private",	0},
+	{1 << PG_owner_priv_1,	"O:owner",	0},
+
+	{1 << PG_BUFFER,	"b:buffer",	1},
+	{1 << PG_DIRTY,		"d:dirty",	1},
+	{1 << PG_WRITEBACK,	"w:writeback",	1},
+};
+
+static unsigned long page_flags(struct page* page)
+{
+	unsigned long flags;
+	struct address_space *mapping = page_mapping(page);
+
+	flags = page->flags & page_mask;
+
+	if (page_mapped(page))
+		flags |= (1 << PG_MMAP);
+
+	if (page_has_buffers(page))
+		flags |= (1 << PG_BUFFER);
+
+	if (mapping) {
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_WRITEBACK))
+			flags |= (1 << PG_WRITEBACK);
+
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY))
+			flags |= (1 << PG_DIRTY);
+	}
+
+	return flags;
+}
+
+static int pages_similiar(struct page* page0, struct page* page)
+{
+	if (page_count(page0) != page_count(page))
+		return 0;
+
+	if (page_flags(page0) != page_flags(page))
+		return 0;
+
+	return 1;
+}
+
+static void show_range(struct seq_file *m, struct page* page, unsigned long len)
+{
+	int i;
+	unsigned long flags;
+
+	if (!m || !page)
+		return;
+
+	seq_printf(m, "%lu\t%lu\t", page->index, len);
+
+	flags = page_flags(page);
+	for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+		seq_putc(m, (flags & page_flag[i].mask) ?
+					page_flag[i].name[0] : '_');
+
+	seq_printf(m, "\t%d\n", page_count(page));
+}
+
+#define BATCH_LINES	100
+static pgoff_t show_file_cache(struct seq_file *m,
+				struct address_space *mapping, pgoff_t start)
+{
+	int i;
+	int lines = 0;
+	pgoff_t len = 0;
+	struct pagevec pvec;
+	struct page *page;
+	struct page *page0 = NULL;
+
+	for (;;) {
+		pagevec_init(&pvec, 0);
+		pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
+				(void **)pvec.pages, start + len, PAGEVEC_SIZE);
+
+		if (pvec.nr == 0) {
+			show_range(m, page0, len);
+			start = ULONG_MAX;
+			goto out;
+		}
+
+		if (!page0)
+			page0 = pvec.pages[0];
+
+		for (i = 0; i < pvec.nr; i++) {
+			page = pvec.pages[i];
+
+			if (page->index == start + len &&
+					pages_similiar(page0, page))
+				len++;
+			else {
+				show_range(m, page0, len);
+				page0 = page;
+				start = page->index;
+				len = 1;
+				if (++lines > BATCH_LINES)
+					goto out;
+			}
+		}
+	}
+
+out:
+	return start;
+}
+
+static int pg_show(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset;
+
+	if (!file)
+		return ii_show(m, v);
+
+	offset = *(loff_t *) v;
+
+	if (!offset) { /* print header */
+		int i;
+
+		seq_puts(m, "# file ");
+		seq_path(m, &file->f_path, " \t\n\\");
+
+		seq_puts(m, "\n# flags");
+		for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+			seq_printf(m, " %s", page_flag[i].name);
+
+		seq_puts(m, "\n# idx\tlen\tstate\t\trefcnt\n");
+	}
+
+	s->start_offset = offset;
+	s->next_offset = show_file_cache(m, file->f_mapping, offset);
+
+	return 0;
+}
+
+static void *file_pos(struct file *file, loff_t *pos)
+{
+	loff_t size = i_size_read(file->f_mapping->host);
+	pgoff_t end = DIV_ROUND_UP(size, PAGE_CACHE_SIZE);
+	pgoff_t offset = *pos;
+
+	return offset < end ? pos : NULL;
+}
+
+static void *pg_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset = *pos;
+
+	if (!file)
+		return ii_start(m, pos);
+
+	rcu_read_lock();
+
+	if (offset - s->start_offset == 1)
+		*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void *pg_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_next(m, v, pos);
+
+	*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void pg_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_stop(m, v);
+
+	rcu_read_unlock();
+}
+
+struct seq_operations seq_filecache_op = {
+	.start	= pg_start,
+	.next	= pg_next,
+	.stop	= pg_stop,
+	.show	= pg_show,
+};
+
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#define MAX_INODES	(PAGE_SIZE / sizeof(struct inode *))
+static int drop_pagecache(void)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct inode *inode;
+	struct inode **inodes;
+	unsigned long i, j, k;
+	int err = 0;
+
+	inodes = (struct inode **)__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!inodes)
+		return -ENOMEM;
+
+	for (i = 0; (head = get_inode_hash_budget(i)); i++) {
+		if (hlist_empty(head))
+			continue;
+
+		j = 0;
+		cond_resched();
+
+		/*
+		 * Grab some inodes.
+		 */
+		spin_lock(&inode_lock);
+		hlist_for_each (node, head) {
+			inode = hlist_entry(node, struct inode, i_hash);
+			if (!atomic_read(&inode->i_count))
+				continue;
+			if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+				continue;
+			if (!inode->i_mapping || !inode->i_mapping->nrpages)
+				continue;
+			__iget(inode);
+			inodes[j++] = inode;
+			if (j >= MAX_INODES)
+				break;
+		}
+		spin_unlock(&inode_lock);
+
+		/*
+		 * Free clean pages.
+		 */
+		for (k = 0; k < j; k++) {
+			inode = inodes[k];
+			invalidate_mapping_pages(inode->i_mapping, 0, ~1);
+			iput(inode);
+		}
+
+		/*
+		 * Simply ignore the remaining inodes.
+		 */
+		if (j >= MAX_INODES && !err) {
+			printk(KERN_WARNING
+				"Too many collisions in the inode hash table.\n"
+				"Please boot with a larger ihash_entries=XXX.\n");
+			err = -EAGAIN;
+		}
+	}
+
+	free_pages((unsigned long) inodes, IWIN_PAGE_ORDER);
+	return err;
+}
+
+static void drop_slabcache(void)
+{
+	int nr_objects;
+
+	do {
+		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+	} while (nr_objects > 10);
+}
+
+/*
+ * Proc file operations.
+ */
+
+static int filecache_open(struct inode *inode, struct file *proc_file)
+{
+	struct seq_file *m;
+	struct session *s;
+	unsigned size;
+	char *buf = NULL;
+	int ret;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENOENT;
+
+	s = session_create();
+	if (IS_ERR(s)) {
+		ret = PTR_ERR(s);
+		goto out;
+	}
+	set_session(proc_file, s);
+
+	size = SBUF_SIZE;
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = seq_open(proc_file, &seq_filecache_op);
+	if (!ret) {
+		m = proc_file->private_data;
+		m->private = s;
+		m->buf = buf;
+		m->size = size;
+	}
+
+out:
+	if (ret) {
+		if (!IS_ERR(s)) kfree(s);
+		kfree(buf);
+		module_put(THIS_MODULE);
+	}
+	return ret;
+}
+
+static int filecache_release(struct inode *inode, struct file *proc_file)
+{
+	struct session *s = get_session(proc_file);
+	int ret;
+
+	session_release(s);
+	ret = seq_release(inode, proc_file);
+	module_put(THIS_MODULE);
+	return ret;
+}
+
+ssize_t filecache_write(struct file *proc_file, const char __user * buffer,
+			size_t count, loff_t *ppos)
+{
+	struct session *s;
+	char *name;
+	int err = 0;
+
+	if (count >= PATH_MAX + 5)
+		return -ENAMETOOLONG;
+
+	name = kmalloc(count+1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	if (copy_from_user(name, buffer, count)) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	/* strip the optional newline */
+	if (count && name[count-1] == '\n')
+		name[count-1] = '\0';
+	else
+		name[count] = '\0';
+
+	s = get_session(proc_file);
+	if (!strcmp(name, "set private")) {
+		s->private_session = 1;
+		goto out;
+	}
+
+	if (!strncmp(name, "cat ", 4)) {
+		err = session_update_file(s, name+4);
+		goto out;
+	}
+
+	if (!strncmp(name, "ls", 2)) {
+		err = session_update_file(s, NULL);
+		if (!err)
+			err = ls_parse_options(name+2, s);
+		if (!err && !s->private_session) {
+			global_session.ls_dev = s->ls_dev;
+			global_session.ls_options = s->ls_options;
+		}
+		goto out;
+	}
+
+	if (!strncmp(name, "drop pagecache", 14)) {
+		err = drop_pagecache();
+		goto out;
+	}
+
+	if (!strncmp(name, "drop slabcache", 14)) {
+		drop_slabcache();
+		goto out;
+	}
+
+	/* err = -EINVAL; */
+	err = session_update_file(s, name);
+
+out:
+	kfree(name);
+
+	return err ? err : count;
+}
+
+static struct file_operations proc_filecache_fops = {
+	.owner		= THIS_MODULE,
+	.open		= filecache_open,
+	.release	= filecache_release,
+	.write		= filecache_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+};
+
+
+static __init int filecache_init(void)
+{
+	int i;
+	struct proc_dir_entry *entry;
+
+	entry = create_proc_entry("filecache", 0600, NULL);
+	if (entry)
+		entry->proc_fops = &proc_filecache_fops;
+
+	for (page_mask = i = 0; i < ARRAY_SIZE(page_flag); i++)
+		if (!page_flag[i].faked)
+			page_mask |= page_flag[i].mask;
+
+	return 0;
+}
+
+static void filecache_exit(void)
+{
+	remove_proc_entry("filecache", NULL);
+	if (global_session.query_file)
+		fput(global_session.query_file);
+}
+
+MODULE_AUTHOR("Fengguang Wu <wfg@mail.ustc.edu.cn>");
+MODULE_LICENSE("GPL");
+
+module_init(filecache_init);
+module_exit(filecache_exit);
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -750,6 +750,36 @@ config CONFIGFS_FS
 	  Both sysfs and configfs can and should exist together on the
 	  same system. One is not a replacement for the other.
 
+config PROC_FILECACHE
+	tristate "/proc/filecache support"
+	default m
+	depends on PROC_FS
+	help
+	  This option creates a file /proc/filecache which enables one to
+	  query/drop the cached files in memory.
+
+	  A quick start guide:
+
+	  # echo 'ls' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'cat /bin/bash' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'drop pagecache' > /proc/filecache
+	  # echo 'drop slabcache' > /proc/filecache
+
+	  For more details, please check Documentation/filesystems/proc.txt .
+
+	  It can be a handy tool for sysadmins and desktop users.
+
+config PROC_FILECACHE_EXTRAS
+	bool "track extra states"
+	default y
+	depends on PROC_FILECACHE
+	help
+	  Track extra states, which costs a little more time/space.
+
 endmenu
 
 menu "Miscellaneous filesystems"
--- linux-2.6.orig/fs/proc/Makefile
+++ linux-2.6/fs/proc/Makefile
@@ -2,7 +2,8 @@
 # Makefile for the Linux proc filesystem routines.
 #
 
-obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FS)		+= proc.o
+obj-$(CONFIG_PROC_FILECACHE)	+= filecache.o
 
 proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -1907,6 +1907,7 @@ extern void remove_inode_hash(struct ino
 static inline void insert_inode_hash(struct inode *inode) {
 	__insert_inode_hash(inode, inode->i_ino);
 }
+struct hlist_head * get_inode_hash_budget(unsigned long index);
 
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
@ 2009-03-05  0:48                     ` Wu Fengguang
  0 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05  0:48 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

[-- Attachment #1: Type: text/plain, Size: 5982 bytes --]

On Wed, Mar 04, 2009 at 08:47:41PM +0200, Markus wrote:
> On Wednesday, 4 March 2009, Zdenek Kabelac wrote:
> > Markus wrote:
> > >>>>>>> The memory mapped pages won't be dropped in this way.
> > >>>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
> > >>>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> > >>>>>>              total       used       free     shared    buffers     cached
> > >>>>>> Mem:          3950       3262        688          0          0        359
> > >>>>>> -/+ buffers/cache:       2902       1047
> > >>>>>> Swap:         5890       1509       4381
> > >>>>>> MemTotal:        4045500 kB
> > >>>>>> MemFree:          705180 kB
> > >>>>>> Buffers:             508 kB
> > >>>>>> Cached:           367748 kB
> > >>>>>> SwapCached:       880744 kB
> > >>>>>> Active:          1555032 kB
> > >>>>>> Inactive:        1634868 kB
> > >>>>>> Active(anon):    1527100 kB
> > >>>>>> Inactive(anon):  1607328 kB
> > >>>>>> Active(file):      27932 kB
> > >>>>>> Inactive(file):    27540 kB
> > >>>>>> Unevictable:         816 kB
> > >>>>>> Mlocked:               0 kB
> > >>>>>> SwapTotal:       6032344 kB
> > >>>>>> SwapFree:        4486496 kB
> > >>>>>> Dirty:                 0 kB
> > >>>>>> Writeback:             0 kB
> > >>>>>> AnonPages:       2378112 kB
> > >>>>>> Mapped:            52196 kB
> > >>>>>> Slab:              65640 kB
> > >>>>>> SReclaimable:      46192 kB
> > >>>>>> SUnreclaim:        19448 kB
> > >>>>>> PageTables:        28200 kB
> > >>>>>> NFS_Unstable:          0 kB
> > >>>>>> Bounce:                0 kB
> > >>>>>> WritebackTmp:          0 kB
> > >>>>>> CommitLimit:     8055092 kB
> > >>>>>> Committed_AS:    4915636 kB
> > >>>>>> VmallocTotal:   34359738367 kB
> > >>>>>> VmallocUsed:       44580 kB
> > >>>>>> VmallocChunk:   34359677239 kB
> > >>>>>> DirectMap4k:     3182528 kB
> > >>>>>> DirectMap2M:     1011712 kB
> > >>>>>>
> > >>>>>> The cached reduced to 359 MB (after the dropping).
> > >>>>>> I dont know where to read the "number of mapped pages".
> > >>>>>> "Mapped" is about 51 MB.
> > >>>>> Does your tmpfs store lots of files?
> > >>>> Dont think so:
> > >>>>
> > >>>> # df -h
> > >>>> Filesystem            Size  Used Avail Use% Mounted on
> > >>>> /dev/md6               14G  8.2G  5.6G  60% /
> > >>>> udev                   10M  304K  9.8M   3% /dev
> > >>>> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> > >>>> /dev/md4               19G   15G  3.1G  83% /home
> > >>>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> > >>>> shm                   2.0G     0  2.0G   0% /dev/shm
> > >>>> /dev/md1               99M   19M   76M  20% /boot
> > >>>>
> > >>>> I dont know what exactly all that memory is used for. It varies 
> > >>>> from about 300 MB to up to one GB.
> > >>>> Tell me where to look and I will!
> > >>> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files.  
> > > It's strange to me that there are so many undroppable cached pages(Cached=359M),
> > > and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> > >>> Anyone have better clues on these 'hidden' pages?
> > >> Maybe try this:
> > >>
> > >> cat /proc/`pidof X`/smaps | grep drm | wc -l
> > >>
> > >> you will see some growing numbers.
> > >>
> > >> Also check  cat /proc/dri/0/gem_objects
> > >> there should be some number  # object bytes - which should be close to
> > >> your missing cached pages.
> > >>
> > >>
> > >> If you are using Intel GEM driver - there is some unlimited caching issue
> > >> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
> > >>
> > > # cat /proc/`pidof X`/smaps | grep drm | wc -l
> > > 0
> > > # cat /proc/dri/0/gem_objects
> > > cat: /proc/dri/0/gem_objects: No such file or directory
> > > 
> > > I use Xorg 1.3 with an nvidia gpu. Dont know if I use a "Intel GEM 
> > > driver".
> > > 
> > 
> > 
> > Are you using binary  driver from NVidia ??
> > Maybe you should ask authors of this binary blob ?
> > 
> > Could you try to use for a while Vesa driver to see, if you are able to get
> > same strange results ?
> 
> I rebooted in console without the nvidia-module loaded and have the same 
> results (updated to 2.6.28.7 btw):
> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
>              total       used       free     shared    buffers     cached
> Mem:          3950       1647       2303          0          0        924
> -/+ buffers/cache:        722       3228
> Swap:         5890          0       5890
> MemTotal:        4045444 kB
> MemFree:         2358944 kB
> Buffers:             544 kB
> Cached:           946624 kB
> SwapCached:            0 kB
> Active:          1614756 kB
> Inactive:           7632 kB
> Active(anon):    1602476 kB
> Inactive(anon):        0 kB
> Active(file):      12280 kB
> Inactive(file):     7632 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:       6032344 kB
> SwapFree:        6032344 kB
> Dirty:                72 kB
> Writeback:            32 kB
> AnonPages:        675224 kB
> Mapped:            17756 kB
> Slab:              19936 kB
> SReclaimable:       9652 kB
> SUnreclaim:        10284 kB
> PageTables:         8296 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     8055064 kB
> Committed_AS:    3648088 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:       10616 kB
> VmallocChunk:   34359716459 kB
> DirectMap4k:        6080 kB
> DirectMap2M:     4188160 kB

Markus, you may want to try this patch; it has a better chance of finding
the hidden file pages.

1) apply the patch and recompile the kernel with CONFIG_PROC_FILECACHE=m
2) after booting:
        modprobe filecache
        cp /proc/filecache filecache-`date +'%F'`
3) send us the copied file; it will list all cached files, including
   the normally hidden ones.

Thanks,
Fengguang

[-- Attachment #2: filecache-2.6.28.patch --]
[-- Type: text/x-diff, Size: 31992 bytes --]

--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -27,6 +27,7 @@ extern unsigned long max_mapnr;
 extern unsigned long num_physpages;
 extern void * high_memory;
 extern int page_cluster;
+extern char * const zone_names[];
 
 #ifdef CONFIG_SYSCTL
 extern int sysctl_legacy_va_layout;
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -104,7 +104,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 
 EXPORT_SYMBOL(totalram_pages);
 
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
 	 "DMA",
 #endif
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1943,7 +1943,10 @@ char *__d_path(const struct path *path, 
 
 		if (dentry == root->dentry && vfsmnt == root->mnt)
 			break;
-		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+		if (unlikely(!vfsmnt)) {
+			if (IS_ROOT(dentry))
+				break;
+		} else if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
 			/* Global root? */
 			if (vfsmnt->mnt_parent == vfsmnt) {
 				goto global_root;
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -564,7 +564,6 @@ out:
 }
 EXPORT_SYMBOL(radix_tree_tag_clear);
 
-#ifndef __KERNEL__	/* Only the test harness uses this at present */
 /**
  * radix_tree_tag_get - get a tag on a radix tree node
  * @root:		radix tree root
@@ -627,7 +626,6 @@ int radix_tree_tag_get(struct radix_tree
 	}
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
 
 /**
  *	radix_tree_next_hole    -    find the next hole (not-present entry)
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -82,6 +82,10 @@ static struct hlist_head *inode_hashtabl
  */
 DEFINE_SPINLOCK(inode_lock);
 
+EXPORT_SYMBOL(inode_in_use);
+EXPORT_SYMBOL(inode_unused);
+EXPORT_SYMBOL(inode_lock);
+
 /*
  * iprune_mutex provides exclusion between the kswapd or try_to_free_pages
  * icache shrinking path, and the umount path.  Without this exclusion,
@@ -247,6 +251,8 @@ void __iget(struct inode * inode)
 	inodes_stat.nr_unused--;
 }
 
+EXPORT_SYMBOL(__iget);
+
 /**
  * clear_inode - clear an inode
  * @inode: inode to clear
@@ -1353,6 +1359,16 @@ void inode_double_unlock(struct inode *i
 }
 EXPORT_SYMBOL(inode_double_unlock);
 
+
+struct hlist_head * get_inode_hash_budget(unsigned long index)
+{
+	if (index >= (1 << i_hash_shift))
+		return NULL;
+
+	return inode_hashtable + index;
+}
+EXPORT_SYMBOL_GPL(get_inode_hash_budget);
+
 static __initdata unsigned long ihash_entries;
 static int __init set_ihash_entries(char *str)
 {
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -45,6 +45,9 @@
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
+EXPORT_SYMBOL(super_blocks);
+EXPORT_SYMBOL(sb_lock);
+
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -230,6 +230,7 @@ unsigned long shrink_slab(unsigned long 
 	up_read(&shrinker_rwsem);
 	return ret;
 }
+EXPORT_SYMBOL(shrink_slab);
 
 /* Called without lock on whether page is mapped, so answer is unstable */
 static inline int page_mapping_inuse(struct page *page)
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -44,6 +44,7 @@ struct address_space swapper_space = {
 	.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
 	.backing_dev_info = &swap_backing_dev_info,
 };
+EXPORT_SYMBOL_GPL(swapper_space);
 
 #define INC_CACHE_INFO(x)	do { swap_cache_info.x++; } while (0)
 
--- linux-2.6.orig/Documentation/filesystems/proc.txt
+++ linux-2.6/Documentation/filesystems/proc.txt
@@ -266,6 +266,7 @@ Table 1-4: Kernel info in /proc
  driver	     Various drivers grouped here, currently rtc (2.4)
  execdomains Execdomains, related to security			(2.4)
  fb	     Frame Buffer devices				(2.4)
+ filecache   Query/drop in-memory file cache
  fs	     File system parameters, currently nfs/exports	(2.4)
  ide         Directory containing info about the IDE subsystem 
  interrupts  Interrupt usage                                   
@@ -456,6 +457,88 @@ varies by architecture and compile optio
 
 > cat /proc/meminfo
 
+..............................................................................
+
+filecache:
+
+Provides access to the in-memory file cache.
+
+To list an index of all cached files:
+
+    echo ls > /proc/filecache
+    cat /proc/filecache
+
+The output looks like:
+
+    # filecache 1.0
+    #      ino       size   cached cached%  state   refcnt  dev             file
+       1026334         91       92    100   --      66      03:02(hda2)     /lib/ld-2.3.6.so
+        233608       1242      972     78   --      66      03:02(hda2)     /lib/tls/libc-2.3.6.so
+         65203        651      476     73   --      1       03:02(hda2)     /bin/bash
+       1026445        261      160     61   --      10      03:02(hda2)     /lib/libncurses.so.5.5
+        235427         10       12    100   --      44      03:02(hda2)     /lib/tls/libdl-2.3.6.so
+
+FIELD	INTRO
+---------------------------------------------------------------------------
+ino	inode number
+size	inode size in KB
+cached	cached size in KB
+cached%	percent of file data cached
+state1	'-' clean; 'd' metadata dirty; 'D' data dirty
+state2	'-' unlocked; 'L' locked, normally indicates file being written out
+refcnt	file reference count, it's an in-kernel one, not exactly open count
+dev	major:minor numbers in hex, followed by a descriptive device name
+file	file path _inside_ the filesystem. There are several special names:
+	'(noname)':	the file name is not available
+	'(03:02)':	the file is a block device file of major:minor
+	'...(deleted)': the named file has been deleted from the disk
+
+To list the cached pages of a particular file:
+
+    echo /bin/bash > /proc/filecache
+    cat /proc/filecache
+
+    # file /bin/bash
+    # flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
+    # idx   len     state   refcnt
+    0       36      RAU__M  3
+    36      1       RAU__M  2
+    37      8       RAU__M  3
+    45      2       RAU___  1
+    47      6       RAU__M  3
+    53      3       RAU__M  2
+    56      2       RAU__M  3
+
+FIELD	INTRO
+----------------------------------------------------------------------------
+idx	page index
+len	number of pages which are cached and share the same state
+state	page state of the flags listed in line two
+refcnt	page reference count
+
+Careful users may notice that the file name to be queried is remembered
+between commands. Internally, the module keeps a global variable holding the
+file name parameter, so that it can be inherited by a newly opened
+/proc/filecache file. However, this can lead to interference between multiple
+queriers. The rule here is: only root may change the file name parameter
+interactively; normal users should access the interface through scripts,
+following the code example below:
+
+    filecache = open("/proc/filecache", "rw");
+    # avoid polluting the global parameter filename
+    filecache.write("set private");
+
+To instruct the kernel to drop clean caches, dentries and inodes from memory,
+causing that memory to become free:
+
+    # drop clean file data cache (i.e. file backed pagecache)
+    echo drop pagecache > /proc/filecache
+
+    # drop clean file metadata cache (i.e. dentries and inodes)
+    echo drop slabcache > /proc/filecache
+
+Note that the drop commands are non-destructive operations. Since dirty
+objects are not freeable, the user should run `sync' first.
 
 MemTotal:     16344972 kB
 MemFree:      13634064 kB
--- /dev/null
+++ linux-2.6/fs/proc/filecache.c
@@ -0,0 +1,1035 @@
+/*
+ * fs/proc/filecache.c
+ *
+ * Copyright (C) 2006, 2007 Fengguang Wu <wfg@mail.ustc.edu.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+#include <linux/parser.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+
+/*
+ * Increase minor version when new columns are added;
+ * Increase major version when existing columns are changed.
+ */
+#define FILECACHE_VERSION	"1.0"
+
+/* Internal buffer sizes. The larger, the more efficient. */
+#define SBUF_SIZE	(128<<10)
+#define IWIN_PAGE_ORDER	3
+#define IWIN_SIZE	((PAGE_SIZE<<IWIN_PAGE_ORDER) / sizeof(struct inode *))
+
+/*
+ * Session management.
+ *
+ * Each opened /proc/filecache file is associated with a session object.
+ * Also there is a global_session that maintains status across open()/close()
+ * (i.e. the lifetime of an opened file), so that a casual user can query the
+ * filecache via _multiple_ simple shell commands like
+ * 'echo cat /bin/bash > /proc/filecache; cat /proc/filecache'.
+ *
+ * session.query_file is the file whose cache info is to be queried.
+ * Its value determines what we get on read():
+ * 	- NULL: ii_*() called to show the inode index
+ * 	- filp: pg_*() called to show the page groups of a filp
+ *
+ * session.query_file is
+ * 	- cloned from global_session.query_file on open();
+ * 	- updated on write("cat filename");
+ * 	  note that the new file will also be saved in global_session.query_file if
+ * 	  session.private_session is false.
+ */
+
+struct session {
+	/* options */
+	int		private_session;
+	unsigned long	ls_options;
+	dev_t		ls_dev;
+
+	/* parameters */
+	struct file	*query_file;
+
+	/* seqfile pos */
+	pgoff_t		start_offset;
+	pgoff_t		next_offset;
+
+	/* inode at last pos */
+	struct {
+		unsigned long pos;
+		unsigned long state;
+		struct inode *inode;
+		struct inode *pinned_inode;
+	} ipos;
+
+	/* inode window */
+	struct {
+		unsigned long cursor;
+		unsigned long origin;
+		unsigned long size;
+		struct inode **inodes;
+	} iwin;
+};
+
+static struct session global_session;
+
+/*
+ * Session address is stored in proc_file->f_ra.start:
+ * we assume that there will be no readahead for proc_file.
+ */
+static struct session *get_session(struct file *proc_file)
+{
+	return (struct session *)proc_file->f_ra.start;
+}
+
+static void set_session(struct file *proc_file, struct session *s)
+{
+	BUG_ON(proc_file->f_ra.start);
+	proc_file->f_ra.start = (unsigned long)s;
+}
+
+static void update_global_file(struct session *s)
+{
+	if (s->private_session)
+		return;
+
+	if (global_session.query_file)
+		fput(global_session.query_file);
+
+	global_session.query_file = s->query_file;
+
+	if (global_session.query_file)
+		get_file(global_session.query_file);
+}
+
+/*
+ * Cases of the name:
+ * 1) NULL                (new session)
+ * 	s->query_file = global_session.query_file = 0;
+ * 2) ""                  (ls/la)
+ * 	s->query_file = global_session.query_file;
+ * 3) a regular file name (cat newfile)
+ * 	s->query_file = global_session.query_file = newfile;
+ */
+static int session_update_file(struct session *s, char *name)
+{
+	static DEFINE_MUTEX(mutex); /* protects global_session.query_file */
+	int err = 0;
+
+	mutex_lock(&mutex);
+
+	/*
+	 * We are to quit, or to list the cached files.
+	 * Reset *.query_file.
+	 */
+	if (!name) {
+		if (s->query_file) {
+			fput(s->query_file);
+			s->query_file = NULL;
+		}
+		update_global_file(s);
+		goto out;
+	}
+
+	/*
+	 * This is a new session.
+	 * Inherit options/parameters from global ones.
+	 */
+	if (name[0] == '\0') {
+		*s = global_session;
+		if (s->query_file)
+			get_file(s->query_file);
+		goto out;
+	}
+
+	/*
+	 * Open the named file.
+	 */
+	if (s->query_file)
+		fput(s->query_file);
+	s->query_file = filp_open(name, O_RDONLY|O_LARGEFILE, 0);
+	if (IS_ERR(s->query_file)) {
+		err = PTR_ERR(s->query_file);
+		s->query_file = NULL;
+	} else
+		update_global_file(s);
+
+out:
+	mutex_unlock(&mutex);
+
+	return err;
+}
+
+static struct session *session_create(void)
+{
+	struct session *s;
+	int err = 0;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (s)
+		err = session_update_file(s, "");
+	else
+		err = -ENOMEM;
+
+	return err ? ERR_PTR(err) : s;
+}
+
+static void session_release(struct session *s)
+{
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	if (s->query_file)
+		fput(s->query_file);
+	kfree(s);
+}
+
+
+/*
+ * Listing of cached files.
+ *
+ * Usage:
+ * 		echo > /proc/filecache  # enter listing mode
+ * 		cat /proc/filecache     # get the file listing
+ */
+
+/* code style borrowed from ib_srp.c */
+enum {
+	LS_OPT_ERR	=	0,
+	LS_OPT_NOCLEAN	=	1 << 0,
+	LS_OPT_NODIRTY	=	1 << 1,
+	LS_OPT_NOUNUSED	=	1 << 2,
+	LS_OPT_EMPTY	=	1 << 3,
+	LS_OPT_ALL	=	1 << 4,
+	LS_OPT_DEV	=	1 << 5,
+};
+
+static match_table_t ls_opt_tokens = {
+	{ LS_OPT_NOCLEAN,	"noclean" 	},
+	{ LS_OPT_NODIRTY,	"nodirty" 	},
+	{ LS_OPT_NOUNUSED,	"nounused" 	},
+	{ LS_OPT_EMPTY,		"empty"		},
+	{ LS_OPT_ALL,		"all" 		},
+	{ LS_OPT_DEV,		"dev=%s"	},
+	{ LS_OPT_ERR,		NULL 		}
+};
+
+static int ls_parse_options(const char *buf, struct session *s)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *options, *sep_opt;
+	char *p;
+	int token;
+	int ret = 0;
+
+	if (!buf)
+		return 0;
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	s->ls_options = 0;
+	sep_opt = options;
+	while ((p = strsep(&sep_opt, " ")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, ls_opt_tokens, args);
+
+		switch (token) {
+		case LS_OPT_NOCLEAN:
+		case LS_OPT_NODIRTY:
+		case LS_OPT_NOUNUSED:
+		case LS_OPT_EMPTY:
+		case LS_OPT_ALL:
+			s->ls_options |= token;
+			break;
+		case LS_OPT_DEV:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (*p == '/') {
+				struct kstat stat;
+				struct nameidata nd;
+				ret = path_lookup(p, LOOKUP_FOLLOW, &nd);
+				if (!ret)
+					ret = vfs_getattr(nd.path.mnt,
+							  nd.path.dentry, &stat);
+				if (!ret)
+					s->ls_dev = stat.rdev;
+			} else
+				s->ls_dev = simple_strtoul(p, NULL, 0);
+			/* printk("%lx %s\n", (long)s->ls_dev, p); */
+			kfree(p);
+			break;
+
+		default:
+			printk(KERN_WARNING "unknown parameter or missing value "
+			       "'%s' in ls command\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	kfree(options);
+	return ret;
+}
+
+/*
+ * Add possible filters here.
+ * No permission check: we cannot verify the path's permission anyway.
+ * We simply demand root privilege for accessing /proc/filecache.
+ */
+static int may_show_inode(struct session *s, struct inode *inode)
+{
+	if (!atomic_read(&inode->i_count))
+		return 0;
+	if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+		return 0;
+	if (!inode->i_mapping)
+		return 0;
+
+	if (s->ls_dev && s->ls_dev != inode->i_sb->s_dev)
+		return 0;
+
+	if (s->ls_options & LS_OPT_ALL)
+		return 1;
+
+	if (!(s->ls_options & LS_OPT_EMPTY) && !inode->i_mapping->nrpages)
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NOCLEAN) && !(inode->i_state & I_DIRTY))
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NODIRTY) && (inode->i_state & I_DIRTY))
+		return 0;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	      S_ISLNK(inode->i_mode) || S_ISBLK(inode->i_mode)))
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Full: there are more data following.
+ */
+static int iwin_full(struct session *s)
+{
+	return !s->iwin.cursor ||
+		s->iwin.cursor > s->iwin.origin + s->iwin.size;
+}
+
+static int iwin_push(struct session *s, struct inode *inode)
+{
+	if (!may_show_inode(s, inode))
+		return 0;
+
+	s->iwin.cursor++;
+
+	if (s->iwin.size >= IWIN_SIZE)
+		return 1;
+
+	if (s->iwin.cursor > s->iwin.origin)
+		s->iwin.inodes[s->iwin.size++] = inode;
+	return 0;
+}
+
+/*
+ * Traverse the inode lists in order - newest first.
+ * And fill @s->iwin.inodes with inodes positioned in [@pos, @pos+IWIN_SIZE).
+ */
+static int iwin_fill(struct session *s, unsigned long pos)
+{
+	struct inode *inode;
+	struct super_block *sb;
+
+	s->iwin.origin = pos;
+	s->iwin.cursor = 0;
+	s->iwin.size = 0;
+
+	/*
+	 * We have a cursor inode, clean and expected to be unchanged.
+	 */
+	if (s->ipos.inode && pos >= s->ipos.pos &&
+			!(s->ipos.state & I_DIRTY) &&
+			s->ipos.state == s->ipos.inode->i_state) {
+		inode = s->ipos.inode;
+		s->iwin.cursor = s->ipos.pos;
+		goto continue_from_saved;
+	}
+
+	if (s->ls_options & LS_OPT_NODIRTY)
+		goto clean_inodes;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (s->ls_dev && s->ls_dev != sb->s_dev)
+			continue;
+
+		list_for_each_entry(inode, &sb->s_dirty, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+		list_for_each_entry(inode, &sb->s_io, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+	}
+	spin_unlock(&sb_lock);
+
+clean_inodes:
+	list_for_each_entry(inode, &inode_in_use, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+continue_from_saved:
+		;
+	}
+
+	if (s->ls_options & LS_OPT_NOUNUSED)
+		return 0;
+
+	list_for_each_entry(inode, &inode_unused, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+	}
+
+	return 0;
+
+out_full_unlock:
+	spin_unlock(&sb_lock);
+out_full:
+	return 1;
+}
+
+static struct inode *iwin_inode(struct session *s, unsigned long pos)
+{
+	if ((iwin_full(s) && pos >= s->iwin.origin + s->iwin.size)
+			  || pos < s->iwin.origin)
+		iwin_fill(s, pos);
+
+	if (pos >= s->iwin.cursor)
+		return NULL;
+
+	s->ipos.pos = pos;
+	s->ipos.inode = s->iwin.inodes[pos - s->iwin.origin];
+	BUG_ON(!s->ipos.inode);
+	return s->ipos.inode;
+}
+
+static void show_inode(struct seq_file *m, struct inode *inode)
+{
+	char state[] = "--"; /* dirty, locked */
+	struct dentry *dentry;
+	loff_t size = i_size_read(inode);
+	unsigned long nrpages;
+	int percent;
+	int refcnt;
+	int shift;
+
+	if (!size)
+		size++;
+
+	if (inode->i_mapping)
+		nrpages = inode->i_mapping->nrpages;
+	else {
+		nrpages = 0;
+		WARN_ON(1);
+	}
+
+	for (shift = 0; (size >> shift) > ULONG_MAX / 128; shift += 12)
+		;
+	percent = min(100UL, (((100 * nrpages) >> shift) << PAGE_CACHE_SHIFT) /
+						(unsigned long)(size >> shift));
+
+	if (inode->i_state & (I_DIRTY_DATASYNC|I_DIRTY_PAGES))
+		state[0] = 'D';
+	else if (inode->i_state & I_DIRTY_SYNC)
+		state[0] = 'd';
+
+	if (inode->i_state & I_LOCK)
+		state[0] = 'L';
+
+	refcnt = 0;
+	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+		refcnt += atomic_read(&dentry->d_count);
+	}
+
+	seq_printf(m, "%10lu %10llu %8lu %7d ",
+			inode->i_ino,
+			DIV_ROUND_UP(size, 1024),
+			nrpages << (PAGE_CACHE_SHIFT - 10),
+			percent);
+
+	seq_printf(m, "%6d %5s ",
+			refcnt,
+			state);
+
+	seq_printf(m, "%02x:%02x(%s)\t",
+			MAJOR(inode->i_sb->s_dev),
+			MINOR(inode->i_sb->s_dev),
+			inode->i_sb->s_id);
+
+	if (list_empty(&inode->i_dentry)) {
+		if (!atomic_read(&inode->i_count))
+			seq_puts(m, "(noname)\n");
+		else
+			seq_printf(m, "(%02x:%02x)\n",
+					imajor(inode), iminor(inode));
+	} else {
+		struct path path = {
+			.mnt = NULL,
+			.dentry = list_entry(inode->i_dentry.next,
+					     struct dentry, d_alias)
+		};
+
+		seq_path(m, &path, " \t\n\\");
+		seq_putc(m, '\n');
+	}
+}
+
+static int ii_show(struct seq_file *m, void *v)
+{
+	unsigned long index = *(loff_t *) v;
+	struct session *s = m->private;
+	struct inode *inode;
+
+	if (index == 0) {
+		seq_puts(m, "# filecache " FILECACHE_VERSION "\n");
+		seq_puts(m, "#      ino       size   cached cached% "
+				"refcnt state "
+				"dev\t\tfile\n");
+	}
+
+	inode = iwin_inode(s, index);
+	show_inode(m, inode);
+
+	return 0;
+}
+
+static void *ii_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	s->iwin.size = 0;
+	s->iwin.inodes = (struct inode **)
+				__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!s->iwin.inodes)
+		return NULL;
+
+	spin_lock(&inode_lock);
+
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void *ii_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	(*pos)++;
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void ii_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct inode *inode = s->ipos.inode;
+
+	if (!s->iwin.inodes)
+		return;
+
+	if (inode) {
+		__iget(inode);
+		s->ipos.state = inode->i_state;
+	}
+	spin_unlock(&inode_lock);
+
+	free_pages((unsigned long) s->iwin.inodes, IWIN_PAGE_ORDER);
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	s->ipos.pinned_inode = inode;
+}
+
+/*
+ * Listing of cached page ranges of a file.
+ *
+ * Usage:
+ * 		echo 'file name' > /proc/filecache
+ * 		cat /proc/filecache
+ */
+
+static unsigned long page_mask;
+#define PG_MMAP		PG_lru		/* reuse any non-relevant flag */
+#define PG_BUFFER	PG_swapcache	/* ditto */
+#define PG_DIRTY	PG_error	/* ditto */
+#define PG_WRITEBACK	PG_buddy	/* ditto */
+
+/*
+ * Page state names, prefixed by their abbreviations.
+ */
+static struct {
+	unsigned long	mask;
+	const char     *name;
+	int		faked;
+} page_flag[] = {
+	{1 << PG_referenced,	"R:referenced",	0},
+	{1 << PG_active,	"A:active",	0},
+	{1 << PG_MMAP,		"M:mmap",	1},
+
+	{1 << PG_uptodate,	"U:uptodate",	0},
+	{1 << PG_dirty,		"D:dirty",	0},
+	{1 << PG_writeback,	"W:writeback",	0},
+	{1 << PG_reclaim,	"X:readahead",	0},
+
+	{1 << PG_private,	"P:private",	0},
+	{1 << PG_owner_priv_1,	"O:owner",	0},
+
+	{1 << PG_BUFFER,	"b:buffer",	1},
+	{1 << PG_DIRTY,		"d:dirty",	1},
+	{1 << PG_WRITEBACK,	"w:writeback",	1},
+};
+
+static unsigned long page_flags(struct page *page)
+{
+	unsigned long flags;
+	struct address_space *mapping = page_mapping(page);
+
+	flags = page->flags & page_mask;
+
+	if (page_mapped(page))
+		flags |= (1 << PG_MMAP);
+
+	if (page_has_buffers(page))
+		flags |= (1 << PG_BUFFER);
+
+	if (mapping) {
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_WRITEBACK))
+			flags |= (1 << PG_WRITEBACK);
+
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY))
+			flags |= (1 << PG_DIRTY);
+	}
+
+	return flags;
+}
+
+static int pages_similiar(struct page *page0, struct page *page)
+{
+	if (page_count(page0) != page_count(page))
+		return 0;
+
+	if (page_flags(page0) != page_flags(page))
+		return 0;
+
+	return 1;
+}
+
+static void show_range(struct seq_file *m, struct page *page, unsigned long len)
+{
+	int i;
+	unsigned long flags;
+
+	if (!m || !page)
+		return;
+
+	seq_printf(m, "%lu\t%lu\t", page->index, len);
+
+	flags = page_flags(page);
+	for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+		seq_putc(m, (flags & page_flag[i].mask) ?
+					page_flag[i].name[0] : '_');
+
+	seq_printf(m, "\t%d\n", page_count(page));
+}
+
+#define BATCH_LINES	100
+static pgoff_t show_file_cache(struct seq_file *m,
+				struct address_space *mapping, pgoff_t start)
+{
+	int i;
+	int lines = 0;
+	pgoff_t len = 0;
+	struct pagevec pvec;
+	struct page *page;
+	struct page *page0 = NULL;
+
+	for (;;) {
+		pagevec_init(&pvec, 0);
+		pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
+				(void **)pvec.pages, start + len, PAGEVEC_SIZE);
+
+		if (pvec.nr == 0) {
+			show_range(m, page0, len);
+			start = ULONG_MAX;
+			goto out;
+		}
+
+		if (!page0)
+			page0 = pvec.pages[0];
+
+		for (i = 0; i < pvec.nr; i++) {
+			page = pvec.pages[i];
+
+			if (page->index == start + len &&
+					pages_similiar(page0, page))
+				len++;
+			else {
+				show_range(m, page0, len);
+				page0 = page;
+				start = page->index;
+				len = 1;
+				if (++lines > BATCH_LINES)
+					goto out;
+			}
+		}
+	}
+
+out:
+	return start;
+}
+
+static int pg_show(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset;
+
+	if (!file)
+		return ii_show(m, v);
+
+	offset = *(loff_t *) v;
+
+	if (!offset) { /* print header */
+		int i;
+
+		seq_puts(m, "# file ");
+		seq_path(m, &file->f_path, " \t\n\\");
+
+		seq_puts(m, "\n# flags");
+		for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+			seq_printf(m, " %s", page_flag[i].name);
+
+		seq_puts(m, "\n# idx\tlen\tstate\t\trefcnt\n");
+	}
+
+	s->start_offset = offset;
+	s->next_offset = show_file_cache(m, file->f_mapping, offset);
+
+	return 0;
+}
+
+static void *file_pos(struct file *file, loff_t *pos)
+{
+	loff_t size = i_size_read(file->f_mapping->host);
+	pgoff_t end = DIV_ROUND_UP(size, PAGE_CACHE_SIZE);
+	pgoff_t offset = *pos;
+
+	return offset < end ? pos : NULL;
+}
+
+static void *pg_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset = *pos;
+
+	if (!file)
+		return ii_start(m, pos);
+
+	rcu_read_lock();
+
+	if (offset - s->start_offset == 1)
+		*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void *pg_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_next(m, v, pos);
+
+	*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void pg_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_stop(m, v);
+
+	rcu_read_unlock();
+}
+
+static struct seq_operations seq_filecache_op = {
+	.start	= pg_start,
+	.next	= pg_next,
+	.stop	= pg_stop,
+	.show	= pg_show,
+};
+
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#define MAX_INODES	(PAGE_SIZE / sizeof(struct inode *))
+static int drop_pagecache(void)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct inode *inode;
+	struct inode **inodes;
+	unsigned long i, j, k;
+	int err = 0;
+
+	inodes = (struct inode **)__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!inodes)
+		return -ENOMEM;
+
+	for (i = 0; (head = get_inode_hash_budget(i)); i++) {
+		if (hlist_empty(head))
+			continue;
+
+		j = 0;
+		cond_resched();
+
+		/*
+		 * Grab some inodes.
+		 */
+		spin_lock(&inode_lock);
+		hlist_for_each (node, head) {
+			inode = hlist_entry(node, struct inode, i_hash);
+			if (!atomic_read(&inode->i_count))
+				continue;
+			if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+				continue;
+			if (!inode->i_mapping || !inode->i_mapping->nrpages)
+				continue;
+			__iget(inode);
+			inodes[j++] = inode;
+			if (j >= MAX_INODES)
+				break;
+		}
+		spin_unlock(&inode_lock);
+
+		/*
+		 * Free clean pages.
+		 */
+		for (k = 0; k < j; k++) {
+			inode = inodes[k];
+			invalidate_mapping_pages(inode->i_mapping, 0, ~1);
+			iput(inode);
+		}
+
+		/*
+		 * Simply ignore the remaining inodes.
+		 */
+		if (j >= MAX_INODES && !err) {
+			printk(KERN_WARNING
+				"Too many collisions in the inode hash table.\n"
+				"Please boot with a larger ihash_entries=XXX.\n");
+			err = -EAGAIN;
+		}
+	}
+
+	free_pages((unsigned long) inodes, IWIN_PAGE_ORDER);
+	return err;
+}
+
+static void drop_slabcache(void)
+{
+	int nr_objects;
+
+	do {
+		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+	} while (nr_objects > 10);
+}
+
+/*
+ * Proc file operations.
+ */
+
+static int filecache_open(struct inode *inode, struct file *proc_file)
+{
+	struct seq_file *m;
+	struct session *s;
+	unsigned size;
+	char *buf = NULL;
+	int ret;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENOENT;
+
+	s = session_create();
+	if (IS_ERR(s)) {
+		ret = PTR_ERR(s);
+		s = NULL;	/* don't kfree() an ERR_PTR in the cleanup path */
+		goto out;
+	}
+	set_session(proc_file, s);
+
+	size = SBUF_SIZE;
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = seq_open(proc_file, &seq_filecache_op);
+	if (!ret) {
+		m = proc_file->private_data;
+		m->private = s;
+		m->buf = buf;
+		m->size = size;
+	}
+
+out:
+	if (ret) {
+		kfree(s);
+		kfree(buf);
+		module_put(THIS_MODULE);
+	}
+	return ret;
+}
+
+static int filecache_release(struct inode *inode, struct file *proc_file)
+{
+	struct session *s = get_session(proc_file);
+	int ret;
+
+	session_release(s);
+	ret = seq_release(inode, proc_file);
+	module_put(THIS_MODULE);
+	return ret;
+}
+
+static ssize_t filecache_write(struct file *proc_file, const char __user *buffer,
+			size_t count, loff_t *ppos)
+{
+	struct session *s;
+	char *name;
+	int err = 0;
+
+	if (count >= PATH_MAX + 5)	/* room for a "cat " prefix plus NUL */
+		return -ENAMETOOLONG;
+
+	name = kmalloc(count+1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	if (copy_from_user(name, buffer, count)) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	/* strip the optional newline */
+	if (count && name[count-1] == '\n')
+		name[count-1] = '\0';
+	else
+		name[count] = '\0';
+
+	s = get_session(proc_file);
+	if (!strcmp(name, "set private")) {
+		s->private_session = 1;
+		goto out;
+	}
+
+	if (!strncmp(name, "cat ", 4)) {
+		err = session_update_file(s, name+4);
+		goto out;
+	}
+
+	if (!strncmp(name, "ls", 2)) {
+		err = session_update_file(s, NULL);
+		if (!err)
+			err = ls_parse_options(name+2, s);
+		if (!err && !s->private_session) {
+			global_session.ls_dev = s->ls_dev;
+			global_session.ls_options = s->ls_options;
+		}
+		goto out;
+	}
+
+	if (!strncmp(name, "drop pagecache", 14)) {
+		err = drop_pagecache();
+		goto out;
+	}
+
+	if (!strncmp(name, "drop slabcache", 14)) {
+		drop_slabcache();
+		goto out;
+	}
+
+	/* err = -EINVAL; */
+	err = session_update_file(s, name);
+
+out:
+	kfree(name);
+
+	return err ? err : count;
+}
+
+static struct file_operations proc_filecache_fops = {
+	.owner		= THIS_MODULE,
+	.open		= filecache_open,
+	.release	= filecache_release,
+	.write		= filecache_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+};
+
+
+static __init int filecache_init(void)
+{
+	int i;
+	struct proc_dir_entry *entry;
+
+	entry = create_proc_entry("filecache", 0600, NULL);
+	if (entry)
+		entry->proc_fops = &proc_filecache_fops;
+
+	for (page_mask = i = 0; i < ARRAY_SIZE(page_flag); i++)
+		if (!page_flag[i].faked)
+			page_mask |= page_flag[i].mask;
+
+	return 0;
+}
+
+static void filecache_exit(void)
+{
+	remove_proc_entry("filecache", NULL);
+	if (global_session.query_file)
+		fput(global_session.query_file);
+}
+
+MODULE_AUTHOR("Fengguang Wu <wfg@mail.ustc.edu.cn>");
+MODULE_LICENSE("GPL");
+
+module_init(filecache_init);
+module_exit(filecache_exit);
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -750,6 +750,36 @@ config CONFIGFS_FS
 	  Both sysfs and configfs can and should exist together on the
 	  same system. One is not a replacement for the other.
 
+config PROC_FILECACHE
+	tristate "/proc/filecache support"
+	default m
+	depends on PROC_FS
+	help
+	  This option creates a file /proc/filecache which enables one to
+	  query/drop the cached files in memory.
+
+	  A quick start guide:
+
+	  # echo 'ls' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'cat /bin/bash' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'drop pagecache' > /proc/filecache
+	  # echo 'drop slabcache' > /proc/filecache
+
+	  For more details, please check Documentation/filesystems/proc.txt .
+
+	  It can be a handy tool for sysadmins and desktop users.
+
+config PROC_FILECACHE_EXTRAS
+	bool "track extra states"
+	default y
+	depends on PROC_FILECACHE
+	help
+	  Track extra states that cost a little more time/space.
+
 endmenu
 
 menu "Miscellaneous filesystems"
--- linux-2.6.orig/fs/proc/Makefile
+++ linux-2.6/fs/proc/Makefile
@@ -2,7 +2,8 @@
 # Makefile for the Linux proc filesystem routines.
 #
 
-obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FS)		+= proc.o
+obj-$(CONFIG_PROC_FILECACHE)	+= filecache.o
 
 proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -1907,6 +1907,7 @@ extern void remove_inode_hash(struct ino
 static inline void insert_inode_hash(struct inode *inode) {
 	__insert_inode_hash(inode, inode->i_ino);
 }
+struct hlist_head * get_inode_hash_budget(unsigned long index);
 
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05  0:48                     ` drop_caches Wu Fengguang
@ 2009-03-05  9:06                       ` Lukas Hejtmanek
  -1 siblings, 0 replies; 49+ messages in thread
From: Lukas Hejtmanek @ 2009-03-05  9:06 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Markus, linux-kernel, Zdenek Kabelac, linux-mm

Hello,

On Thu, Mar 05, 2009 at 08:48:50AM +0800, Wu Fengguang wrote:
> Markus, you may want to try this patch, it will have better chance to figure
> out the hidden file pages.

Just out of curiosity, would it be possible to print the name of the process
which caused the file to be loaded into the cache?

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05  9:06                       ` drop_caches Lukas Hejtmanek
@ 2009-03-05  9:14                         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 49+ messages in thread
From: KOSAKI Motohiro @ 2009-03-05  9:14 UTC (permalink / raw)
  To: Lukas Hejtmanek
  Cc: kosaki.motohiro, Wu Fengguang, Markus, linux-kernel,
	Zdenek Kabelac, linux-mm

> Hello,
> 
> On Thu, Mar 05, 2009 at 08:48:50AM +0800, Wu Fengguang wrote:
> > Markus, you may want to try this patch, it will have better chance to figure
> > out the hidden file pages.
> 
> just for curiosity, would it be possible to print process name which caused
> the file to be loaded into caches?

Impossible. The kernel doesn't track which process populated the page cache.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05  9:14                         ` drop_caches KOSAKI Motohiro
@ 2009-03-05 11:11                           ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 11:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Lukas Hejtmanek, Markus, linux-kernel, Zdenek Kabelac, linux-mm

On Thu, Mar 05, 2009 at 11:14:33AM +0200, KOSAKI Motohiro wrote:
> > Hello,
> > 
> > On Thu, Mar 05, 2009 at 08:48:50AM +0800, Wu Fengguang wrote:
> > > Markus, you may want to try this patch, it will have better chance to figure
> > > out the hidden file pages.
> > 
> > just for curiosity, would it be possible to print process name which caused
> > the file to be loaded into caches?
 
Yes, the code exists; it just was not included in the patch I sent you.

When enabled by the following option, the kernel saves the short name of the
current process into every _newly_ allocated inode structure, which is then
displayed in the filecache listing.

+config PROC_FILECACHE_EXTRAS
+       bool "track extra states"
+       default y
+       depends on PROC_FILECACHE
+       help
+         Track extra states that cost a little more time/space.

However, it adds runtime overhead, and the information is not always reliable.
So not everyone will like this idea, and I'm not maintaining the feature at
the moment.

But I do have an interesting old copy that shows the process names:

#      ino       size   cached cached% refcnt state accessed   uid process         dev          file
   1221729          1        4     100      0    --       27     0 rc              08:01(sda1)	/etc/default/rcS
   1058788         32       32     100      0    --       92     0 udevd           08:01(sda1)	/sbin/modprobe
   1221859          2        4     100      0    --        2     0 rc              08:01(sda1)	/etc/init.d/module-init-tools
   1400967          2        4     100      0    --       65     0 tput            08:01(sda1)	/lib/terminfo/l/linux
    195578         90       92     100      0    --       10     0 S03udev         08:01(sda1)	/usr/bin/expr
    196704         12       12     100      0    --       60     0 S03udev         08:01(sda1)	/usr/bin/tput
   1221849          1        4     100      0    --        2     0 S18ifupdown-cle 08:01(sda1)	/etc/default/ifupdown
   1221847          2        4     100      0    --        2     0 rc              08:01(sda1)	/etc/init.d/ifupdown-clean
   1726534          1        4     100      0    --       56     0 alsa-utils      08:01(sda1)	/bin/which
   1726549          7        8     100      0    --       25     0 sh              08:01(sda1)	/bin/mountpoint
   1221998          3        4     100      0    --       30     0 sh              08:01(sda1)	/etc/fstab
   1727533        100      100     100      0    --      306     0 sh              08:01(sda1)	/bin/grep
   1221653          3        4     100      0    --        3     0 rc              08:01(sda1)	/etc/init.d/mountdevsubfs.sh
   1400773          3        4     100      0    --        9     0 sh              08:01(sda1)	/lib/init/mount-functions.sh
   1400851          8        8     100      0    --       48     0 rc              08:01(sda1)	/lib/lsb/init-functions
   1727381         19       20     100      0    --       34     0 sh              08:01(sda1)	/bin/uname
   1221672          1        4     100      0    --        3     0 sh              08:01(sda1)	/etc/default/tmpfs
   1221669          1        4     100      0    --        3     0 sh              08:01(sda1)	/etc/default/devpts
   1224261          2        4     100      1    --      975     0 rcS             08:01(sda1)	/etc/passwd
   1221725          1        4     100      0    --      492     0 rcS             08:01(sda1)	/etc/nsswitch.conf
   1221659          4        4     100      0    --        1     0 rc              08:01(sda1)	/etc/init.d/mtab.sh
   1726557         50       52     100      0    --      186     0 rc              08:01(sda1)	/bin/sed
   1222991          2        4     100      0    --       25     0 mount           08:01(sda1)	/etc/blkid.tab
   1222681          1        4     100      0    --      207     0 init            08:01(sda1)	/etc/selinux/config
   1727379         40       40     100      0    --      251     0 sh              08:01(sda1)	/bin/rm
   1564027         35       36     100      9    --      142     0 touch           08:01(sda1)	/lib/librt-2.6.so
   1727368         40       40     100      0    --       70     0 sh              08:01(sda1)	/bin/touch
   1223550         97      100     100      0    --     4479     0 init            08:01(sda1)	/etc/ld.so.cache
   1400771         10       12     100      0    --        2     0 sh              08:01(sda1)	/lib/init/readlink
   1065053          8        8     100      0    --        2     0 sh              08:01(sda1)	/sbin/logsave
   1221665         10       12     100      0    --        1     0 rc              08:01(sda1)	/etc/init.d/checkroot.sh
     12661          1        4     100      1    d-       10     0 udevd           00:0e(tmpfs)	/.udev/db/block@sr0
     12320          1        4     100      1    D-       11     0 udevd           00:0e(tmpfs)	/.udev/db/md0
     12661          1        4     100      1    d-       10     0 udevd           00:0e(tmpfs)	/.udev/db/block@sr0
     12320          1        4     100      1    D-       11     0 udevd           00:0e(tmpfs)	/.udev/db/md0
     12316          1        4     100      1    D-       11     0 udevd           00:0e(tmpfs)	/.udev/db/md2
     12289          1        4     100      1    d-       10     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@input2@event2
   1726532         19       20     100      0    --       42     0 net.agent       08:01(sda1)	/bin/sleep
     11918          1        4     100      1    d-       10     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@input0@event0
     11912          1        4     100      1    d-       10     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@input1@event1
   1058730         60       60     100      1    --        1     0 S03udev         08:01(sda1)	/sbin/udevd
   1564011        123      124     100     16    --      220     0 mount           08:01(sda1)	/lib/libpthread-2.6.so
   1400830         70       72     100      0    --       27     0 mount           08:01(sda1)	/lib/libdevmapper.so.1.02
   1400847         11       12     100      0    --       27     0 mount           08:01(sda1)	/lib/libuuid.so.1.2
   1400881         39       40     100      0    --       27     0 mount           08:01(sda1)	/lib/libblkid.so.1.0
   1726538         87       88     100      0    --       17     0 sh              08:01(sda1)	/bin/mount
   1221817          8        8     100      0    --        4     0 rcS             08:01(sda1)	/etc/init.d/rc
   1564018         43       44     100     50    --      492     0 rcS             08:01(sda1)	/lib/libnss_files-2.6.so
   1564012         43       44     100     43    --      473     0 rcS             08:01(sda1)	/lib/libnss_nis-2.6.so
   1564010         87       88     100     47    --      513     0 rcS             08:01(sda1)	/lib/libnsl-2.6.so
   1564020         35       36     100     43    --      473     0 rcS             08:01(sda1)	/lib/libnss_compat-2.6.so
   1661561        359      360     100     13    --      384     0 rcS             08:01(sda1)	/lib/libncurses.so.5.6
   1727359        752      752     100      2    --      291     0 init            08:01(sda1)	/bin/bash
   1564016         15       16     100     52    --      801     0 init            08:01(sda1)	/lib/libdl-2.6.so
   1564015       1352     1352     100     82    --     3338     0 init            08:01(sda1)	/lib/libc-2.6.so
   1402884         91       92     100      7    --      206     0 init            08:01(sda1)	/lib/libselinux.so.1
   1401085        236      236     100      7    --      206     0 init            08:01(sda1)	/lib/libsepol.so.1
   1564007        121      124     100     82    --     3338     0 run-init        08:01(sda1)	/lib/ld-2.6.so
   1058733         40       40     100      1    --        1     0 busybox         08:01(sda1)	/sbin/init
         0  160836480      308       0      0    --        0     0 mdadm           00:02(bdev)	(08:00)
         0   32226390        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:02)
         0     128489        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:07)
         0  160836480      308       0      0    --        0     0 mdadm           00:02(bdev)	(08:10)
         0   32226390        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:12)
         0     313236        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:18)
      7976          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda4
      7970          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda8
      7964          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb6
      7957          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda7
      7951          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda6
      7944          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb8
      7938          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb7
      7931          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda5
      7924          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb5
      7918          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda3
      7911          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb4
      7905          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb3
      7898          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda2
      7892          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda1
      7885          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb2
      7851          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb1
      7823          1        4     100      1    D-       32     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda
      7769          1        4     100      1    D-       28     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb
      7472          1        4     100      1    D-        4     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@input1@mouse0
      7068          1        4     100      1    D-        4     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@mice
      2227          1        4     100      1    D-     1790     0 udevd           00:0e(tmpfs)	/.udev/uevent_seqnum
      2127          1        4     100      1    D-       11     0 init            00:0e(tmpfs)	/.initramfs/progress_state

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 49+ messages in thread

   1401085        236      236     100      7    --      206     0 init            08:01(sda1)	/lib/libsepol.so.1
   1564007        121      124     100     82    --     3338     0 run-init        08:01(sda1)	/lib/ld-2.6.so
   1058733         40       40     100      1    --        1     0 busybox         08:01(sda1)	/sbin/init
         0  160836480      308       0      0    --        0     0 mdadm           00:02(bdev)	(08:00)
         0   32226390        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:02)
         0     128489        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:07)
         0  160836480      308       0      0    --        0     0 mdadm           00:02(bdev)	(08:10)
         0   32226390        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:12)
         0     313236        4       0      0    --        0     0 mdadm           00:02(bdev)	(08:18)
      7976          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda4
      7970          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda8
      7964          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb6
      7957          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda7
      7951          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda6
      7944          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb8
      7938          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb7
      7931          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda5
      7924          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb5
      7918          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda3
      7911          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb4
      7905          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb3
      7898          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda2
      7892          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda@sda1
      7885          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb2
      7851          1        4     100      1    D-       12     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb@sdb1
      7823          1        4     100      1    D-       32     0 udevd           00:0e(tmpfs)	/.udev/db/block@sda
      7769          1        4     100      1    D-       28     0 udevd           00:0e(tmpfs)	/.udev/db/block@sdb
      7472          1        4     100      1    D-        4     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@input1@mouse0
      7068          1        4     100      1    D-        4     0 udevd           00:0e(tmpfs)	/.udev/db/class@input@mice
      2227          1        4     100      1    D-     1790     0 udevd           00:0e(tmpfs)	/.udev/uevent_seqnum
      2127          1        4     100      1    D-       11     0 init            00:0e(tmpfs)	/.initramfs/progress_state

Thanks,
Fengguang

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05  0:48                     ` drop_caches Wu Fengguang
@ 2009-03-05 11:55                       ` Markus
  -1 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-05 11:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

On Thursday, 5 March 2009, Wu Fengguang wrote:
> On Wed, Mar 04, 2009 at 08:47:41PM +0200, Markus wrote:
> > On Wednesday, 4 March 2009, Zdenek Kabelac wrote:
> > > Markus wrote:
> > > >>>>>>> The memory mapped pages won't be dropped in this way.
> > > >>>>>>> "cat /proc/meminfo" will show you the number of mapped pages.
> > > >>>>>> # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> > > >>>>>>              total       used       free     shared    buffers     cached
> > > >>>>>> Mem:          3950       3262        688          0          0        359
> > > >>>>>> -/+ buffers/cache:       2902       1047
> > > >>>>>> Swap:         5890       1509       4381
> > > >>>>>> MemTotal:        4045500 kB
> > > >>>>>> MemFree:          705180 kB
> > > >>>>>> Buffers:             508 kB
> > > >>>>>> Cached:           367748 kB
> > > >>>>>> SwapCached:       880744 kB
> > > >>>>>> Active:          1555032 kB
> > > >>>>>> Inactive:        1634868 kB
> > > >>>>>> Active(anon):    1527100 kB
> > > >>>>>> Inactive(anon):  1607328 kB
> > > >>>>>> Active(file):      27932 kB
> > > >>>>>> Inactive(file):    27540 kB
> > > >>>>>> Unevictable:         816 kB
> > > >>>>>> Mlocked:               0 kB
> > > >>>>>> SwapTotal:       6032344 kB
> > > >>>>>> SwapFree:        4486496 kB
> > > >>>>>> Dirty:                 0 kB
> > > >>>>>> Writeback:             0 kB
> > > >>>>>> AnonPages:       2378112 kB
> > > >>>>>> Mapped:            52196 kB
> > > >>>>>> Slab:              65640 kB
> > > >>>>>> SReclaimable:      46192 kB
> > > >>>>>> SUnreclaim:        19448 kB
> > > >>>>>> PageTables:        28200 kB
> > > >>>>>> NFS_Unstable:          0 kB
> > > >>>>>> Bounce:                0 kB
> > > >>>>>> WritebackTmp:          0 kB
> > > >>>>>> CommitLimit:     8055092 kB
> > > >>>>>> Committed_AS:    4915636 kB
> > > >>>>>> VmallocTotal:   34359738367 kB
> > > >>>>>> VmallocUsed:       44580 kB
> > > >>>>>> VmallocChunk:   34359677239 kB
> > > >>>>>> DirectMap4k:     3182528 kB
> > > >>>>>> DirectMap2M:     1011712 kB
> > > >>>>>>
> > > >>>>>> The cached memory reduced to 359 MB (after the dropping).
> > > >>>>>> I don't know where to read the "number of mapped pages".
> > > >>>>>> "Mapped" is about 51 MB.
> > > >>>>> Does your tmpfs store lots of files?
> > > >>>> Don't think so:
> > > >>>>
> > > >>>> # df -h
> > > >>>> Filesystem            Size  Used Avail Use% Mounted on
> > > >>>> /dev/md6               14G  8.2G  5.6G  60% /
> > > >>>> udev                   10M  304K  9.8M   3% /dev
> > > >>>> cachedir              4.0M  100K  4.0M    3% /lib64/splash/cache
> > > >>>> /dev/md4               19G   15G  3.1G  83% /home
> > > >>>> /dev/md3              8.3G  4.5G  3.9G  55% /usr/portage
> > > >>>> shm                   2.0G     0  2.0G   0% /dev/shm
> > > >>>> /dev/md1               99M   19M   76M  20% /boot
> > > >>>>
> > > >>>> I don't know what exactly all that memory is used for. It varies
> > > >>>> from about 300 MB to up to one GB.
> > > >>>> Tell me where to look and I will!
> > > >>> So you don't have lots of mapped pages (Mapped=51M) or tmpfs files.
> > > > It's strange to me that there are so many undroppable cached pages (Cached=359M),
> > > > and most of them lie outside the LRU queue (Active+Inactive file=53M)...
> > > >>> Anyone have better clues on these 'hidden' pages?
> > > >> Maybe try this:
> > > >>
> > > >> cat /proc/`pidof X`/smaps | grep drm | wc -l
> > > >>
> > > >> you will see some growing numbers.
> > > >>
> > > >> Also check  cat /proc/dri/0/gem_objects
> > > >> there should be some number "# object bytes" which should be close
> > > >> to your missing cached pages.
> > > >>
> > > >>
> > > >> If you are using the Intel GEM driver - there is some unlimited
> > > >> caching issue
> > > >> see: http://bugs.freedesktop.org/show_bug.cgi?id=20404
> > > >>
> > > > # cat /proc/`pidof X`/smaps | grep drm | wc -l
> > > > 0
> > > > # cat /proc/dri/0/gem_objects
> > > > cat: /proc/dri/0/gem_objects: No such file or directory
> > > > 
> > > > I use Xorg 1.3 with an nvidia gpu. Don't know if I use an "Intel
> > > > GEM driver".
> > > > 
> > > 
> > > 
> > > Are you using the binary driver from NVidia?
> > > Maybe you should ask the authors of this binary blob?
> > > 
> > > Could you try to use the Vesa driver for a while to see if you are
> > > able to get the same strange results?
> > 
> > I rebooted into the console without the nvidia module loaded and have
> > the same results (updated to 2.6.28.7 btw):
> > # sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo
> >              total       used       free     shared    buffers     cached
> > Mem:          3950       1647       2303          0          0        924
> > -/+ buffers/cache:        722       3228
> > Swap:         5890          0       5890
> > MemTotal:        4045444 kB
> > MemFree:         2358944 kB
> > Buffers:             544 kB
> > Cached:           946624 kB
> > SwapCached:            0 kB
> > Active:          1614756 kB
> > Inactive:           7632 kB
> > Active(anon):    1602476 kB
> > Inactive(anon):        0 kB
> > Active(file):      12280 kB
> > Inactive(file):     7632 kB
> > Unevictable:           0 kB
> > Mlocked:               0 kB
> > SwapTotal:       6032344 kB
> > SwapFree:        6032344 kB
> > Dirty:                72 kB
> > Writeback:            32 kB
> > AnonPages:        675224 kB
> > Mapped:            17756 kB
> > Slab:              19936 kB
> > SReclaimable:       9652 kB
> > SUnreclaim:        10284 kB
> > PageTables:         8296 kB
> > NFS_Unstable:          0 kB
> > Bounce:                0 kB
> > WritebackTmp:          0 kB
> > CommitLimit:     8055064 kB
> > Committed_AS:    3648088 kB
> > VmallocTotal:   34359738367 kB
> > VmallocUsed:       10616 kB
> > VmallocChunk:   34359716459 kB
> > DirectMap4k:        6080 kB
> > DirectMap2M:     4188160 kB
> 
> Markus, you may want to try this patch; it will have a better chance of
> figuring out the hidden file pages.
> 
> 1) apply the patch and recompile kernel with CONFIG_PROC_FILECACHE=m
> 2) after booting:
>         modprobe filecache
>         cp /proc/filecache filecache-`date +'%F'`
> 3) send us the copied file, it will list all cached files, including
>    the normally hidden ones.

The file consists of 674 lines. If I interpret it right, "size" is the
file size and "cached" is the amount of the file that is in cache (why
can this be bigger than the file?!).

# sync ; echo 3 > /proc/sys/vm/drop_caches ; free -m ; cat /proc/meminfo ; cp /proc/filecache filecache-$(date +"%F")
             total       used       free     shared    buffers     cached
Mem:          3950       1935       2015          0          0       1009
-/+ buffers/cache:        925       3025
Swap:         5890          0       5890
MemTotal:        4045436 kB
MemFree:         2063976 kB
Buffers:             480 kB
Cached:          1033724 kB
SwapCached:            0 kB
Active:          1846000 kB
Inactive:          48552 kB
Active(anon):    1790892 kB
Inactive(anon):        8 kB
Active(file):      55108 kB
Inactive(file):    48544 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       6032344 kB
SwapFree:        6032344 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        860380 kB
Mapped:           101908 kB
Slab:              25772 kB
SReclaimable:      12560 kB
SUnreclaim:        13212 kB
PageTables:        16476 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8055060 kB
Committed_AS:    4132748 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       42256 kB
VmallocChunk:   34359683067 kB
DirectMap4k:       14272 kB
DirectMap2M:     4179968 kB

# sort -n -k 3 filecache-2009-03-05 | tail -n 5
     15886       7112     7112     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
     16209      35708    35708     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
     16212      82128    82128     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
     15887     340024   340024     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
     15884     455008   455008     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)

The sum of the third column is 1013 MB.
These are just the biggest ones (or do you want the whole file?)... and that's
after a sync and a drop_caches! (As can be seen in the commands given.)
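For reference, the column sum mentioned above can be reproduced with a
one-liner like the following sketch; it assumes the third field of the
/proc/filecache dump is the cached size in KB, as in the listing shown:

```shell
# Sum the "cached" column (field 3, in KB) of the saved /proc/filecache
# dump and report the total in MB. Non-numeric fields (e.g. header
# lines) are skipped.
awk '$3 ~ /^[0-9]+$/ { sum += $3 } END { printf "%d MB\n", int(sum/1024) }' filecache-2009-03-05
```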

Thanks!
Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread


* Re: drop_caches ...
  2009-03-05 11:55                       ` drop_caches Markus
  (?)
@ 2009-03-05 13:29                       ` Wu Fengguang
  2009-03-05 14:05                           ` drop_caches Markus
  -1 siblings, 1 reply; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 13:29 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

[-- Attachment #1: Type: text/plain, Size: 1034 bytes --]

Hi Markus,

Could you please try the attached patch which will also show the
user and process that opened these files? It adds three more fields
when CONFIG_PROC_FILECACHE_EXTRAS is selected.
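Once a dump with the extra fields is in hand (like the listing earlier in
this thread), the largest tmpfs-backed entries can be pulled out with
something like this sketch; "filecache-dump" is a placeholder name for
the copied file:

```shell
# List the five largest cached tmpfs entries from a saved /proc/filecache
# dump ("filecache-dump" is a placeholder name), sorted numerically on
# the cached-KB column (field 3).
grep 'tmpfs' filecache-dump | sort -n -k 3 | tail -n 5
```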

Thanks,
Fengguang
 
On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> 
> # sort -n -k 3 filecache-2009-03-05 | tail -n 5
>      15886       7112     7112     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
>      16209      35708    35708     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
>      16212      82128    82128     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
>      15887     340024   340024     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
>      15884     455008   455008     100      1    d- 00:08(tmpfs)        /dev/zero\040(deleted)
> 
> The sum of the third column is 1013 MB.
> These are just the biggest ones (or do you want the whole file?)... and that's
> after a sync and a drop_caches! (As can be seen in the commands given.)
> 
> Thanks!
> Markus

[-- Attachment #2: filecache-2.6.28.patch --]
[-- Type: text/x-diff, Size: 33812 bytes --]

--- linux-2.6.orig/include/linux/mm.h
+++ linux-2.6/include/linux/mm.h
@@ -27,6 +27,7 @@ extern unsigned long max_mapnr;
 extern unsigned long num_physpages;
 extern void * high_memory;
 extern int page_cluster;
+extern char * const zone_names[];
 
 #ifdef CONFIG_SYSCTL
 extern int sysctl_legacy_va_layout;
--- linux-2.6.orig/mm/page_alloc.c
+++ linux-2.6/mm/page_alloc.c
@@ -104,7 +104,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_Z
 
 EXPORT_SYMBOL(totalram_pages);
 
-static char * const zone_names[MAX_NR_ZONES] = {
+char * const zone_names[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
 	 "DMA",
 #endif
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -1943,7 +1943,10 @@ char *__d_path(const struct path *path, 
 
 		if (dentry == root->dentry && vfsmnt == root->mnt)
 			break;
-		if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
+		if (unlikely(!vfsmnt)) {
+			if (IS_ROOT(dentry))
+				break;
+		} else if (dentry == vfsmnt->mnt_root || IS_ROOT(dentry)) {
 			/* Global root? */
 			if (vfsmnt->mnt_parent == vfsmnt) {
 				goto global_root;
--- linux-2.6.orig/lib/radix-tree.c
+++ linux-2.6/lib/radix-tree.c
@@ -564,7 +564,6 @@ out:
 }
 EXPORT_SYMBOL(radix_tree_tag_clear);
 
-#ifndef __KERNEL__	/* Only the test harness uses this at present */
 /**
  * radix_tree_tag_get - get a tag on a radix tree node
  * @root:		radix tree root
@@ -627,7 +626,6 @@ int radix_tree_tag_get(struct radix_tree
 	}
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
-#endif
 
 /**
  *	radix_tree_next_hole    -    find the next hole (not-present entry)
--- linux-2.6.orig/fs/inode.c
+++ linux-2.6/fs/inode.c
@@ -82,6 +82,10 @@ static struct hlist_head *inode_hashtabl
  */
 DEFINE_SPINLOCK(inode_lock);
 
+EXPORT_SYMBOL(inode_in_use);
+EXPORT_SYMBOL(inode_unused);
+EXPORT_SYMBOL(inode_lock);
+
 /*
  * iprune_mutex provides exclusion between the kswapd or try_to_free_pages
  * icache shrinking path, and the umount path.  Without this exclusion,
@@ -108,6 +112,14 @@ static void wake_up_inode(struct inode *
 	wake_up_bit(&inode->i_state, __I_LOCK);
 }
 
+static inline void inode_created_by(struct inode *inode, struct task_struct *task)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+	inode->i_cuid = task->uid;
+	memcpy(inode->i_comm, task->comm, sizeof(task->comm));
+#endif
+}
+
 static struct inode *alloc_inode(struct super_block *sb)
 {
 	static const struct address_space_operations empty_aops;
@@ -183,6 +195,7 @@ static struct inode *alloc_inode(struct 
 		}
 		inode->i_private = NULL;
 		inode->i_mapping = mapping;
+		inode_created_by(inode, current);
 	}
 	return inode;
 }
@@ -247,6 +260,8 @@ void __iget(struct inode * inode)
 	inodes_stat.nr_unused--;
 }
 
+EXPORT_SYMBOL(__iget);
+
 /**
  * clear_inode - clear an inode
  * @inode: inode to clear
@@ -1353,6 +1368,16 @@ void inode_double_unlock(struct inode *i
 }
 EXPORT_SYMBOL(inode_double_unlock);
 
+
+struct hlist_head * get_inode_hash_budget(unsigned long index)
+{
+       if (index >= (1 << i_hash_shift))
+               return NULL;
+
+       return inode_hashtable + index;
+}
+EXPORT_SYMBOL_GPL(get_inode_hash_budget);
+
 static __initdata unsigned long ihash_entries;
 static int __init set_ihash_entries(char *str)
 {
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -45,6 +45,9 @@
 LIST_HEAD(super_blocks);
 DEFINE_SPINLOCK(sb_lock);
 
+EXPORT_SYMBOL(super_blocks);
+EXPORT_SYMBOL(sb_lock);
+
 /**
  *	alloc_super	-	create new superblock
  *	@type:	filesystem type superblock should belong to
--- linux-2.6.orig/mm/vmscan.c
+++ linux-2.6/mm/vmscan.c
@@ -230,6 +230,7 @@ unsigned long shrink_slab(unsigned long 
 	up_read(&shrinker_rwsem);
 	return ret;
 }
+EXPORT_SYMBOL(shrink_slab);
 
 /* Called without lock on whether page is mapped, so answer is unstable */
 static inline int page_mapping_inuse(struct page *page)
--- linux-2.6.orig/mm/swap_state.c
+++ linux-2.6/mm/swap_state.c
@@ -44,6 +44,7 @@ struct address_space swapper_space = {
 	.i_mmap_nonlinear = LIST_HEAD_INIT(swapper_space.i_mmap_nonlinear),
 	.backing_dev_info = &swap_backing_dev_info,
 };
+EXPORT_SYMBOL_GPL(swapper_space);
 
 #define INC_CACHE_INFO(x)	do { swap_cache_info.x++; } while (0)
 
--- linux-2.6.orig/Documentation/filesystems/proc.txt
+++ linux-2.6/Documentation/filesystems/proc.txt
@@ -266,6 +266,7 @@ Table 1-4: Kernel info in /proc
  driver	     Various drivers grouped here, currently rtc (2.4)
  execdomains Execdomains, related to security			(2.4)
  fb	     Frame Buffer devices				(2.4)
+ filecache   Query/drop in-memory file cache
  fs	     File system parameters, currently nfs/exports	(2.4)
  ide         Directory containing info about the IDE subsystem 
  interrupts  Interrupt usage                                   
@@ -456,6 +457,88 @@ varies by architecture and compile optio
 
 > cat /proc/meminfo
 
+..............................................................................
+
+filecache:
+
+Provides access to the in-memory file cache.
+
+To list an index of all cached files:
+
+    echo ls > /proc/filecache
+    cat /proc/filecache
+
+The output looks like:
+
+    # filecache 1.0
+    #      ino       size   cached cached%  state   refcnt  dev             file
+       1026334         91       92    100   --      66      03:02(hda2)     /lib/ld-2.3.6.so
+        233608       1242      972     78   --      66      03:02(hda2)     /lib/tls/libc-2.3.6.so
+         65203        651      476     73   --      1       03:02(hda2)     /bin/bash
+       1026445        261      160     61   --      10      03:02(hda2)     /lib/libncurses.so.5.5
+        235427         10       12    100   --      44      03:02(hda2)     /lib/tls/libdl-2.3.6.so
+
+FIELD	INTRO
+---------------------------------------------------------------------------
+ino	inode number
+size	inode size in KB
+cached	cached size in KB
+cached%	percent of file data cached
+state1	'-' clean; 'd' metadata dirty; 'D' data dirty
+state2	'-' unlocked; 'L' locked, normally indicates file being written out
+refcnt	file reference count, it's an in-kernel one, not exactly open count
+dev	major:minor numbers in hex, followed by a descriptive device name
+file	file path _inside_ the filesystem. There are several special names:
+	'(noname)':	the file name is not available
+	'(03:02)':	the file is a block device file of major:minor
+	'...(deleted)': the named file has been deleted from the disk
+
+To list the cached pages of a particular file:
+
+    echo /bin/bash > /proc/filecache
+    cat /proc/filecache
+
+    # file /bin/bash
+    # flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
+    # idx   len     state   refcnt
+    0       36      RAU__M  3
+    36      1       RAU__M  2
+    37      8       RAU__M  3
+    45      2       RAU___  1
+    47      6       RAU__M  3
+    53      3       RAU__M  2
+    56      2       RAU__M  3
+
+FIELD	INTRO
+----------------------------------------------------------------------------
+idx	page index
+len	number of pages which are cached and share the same state
+state	page state of the flags listed in line two
+refcnt	page reference count
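As with the index listing, the per-file output can be post-processed. A sketch (assuming 4 KB pages) that totals the `len` column of the /bin/bash sample above to get the number of cached pages:

```shell
# Sum the "len" column (second field) of the page-range listing to
# count cached pages; multiply by 4 for KB on 4 KB-page systems.
awk '!/^#/ { pages += $2 } END { print pages " pages (" pages*4 " KB)" }' <<'EOF'
# file /bin/bash
# flags R:referenced A:active U:uptodate D:dirty W:writeback M:mmap
# idx   len     state   refcnt
0       36      RAU__M  3
36      1       RAU__M  2
37      8       RAU__M  3
45      2       RAU___  1
47      6       RAU__M  3
53      3       RAU__M  2
56      2       RAU__M  3
EOF
# prints: 58 pages (232 KB)
```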
+
+Careful users may notice that the file name being queried is remembered
+between commands: the module stores it in a global variable so that it is
+inherited by newly opened /proc/filecache files. This can lead to
+interference between concurrent queriers. The rule to follow is: only root
+may change the file name parameter interactively; normal users must access
+the interface from scripts, which should first switch to a private session,
+for example in bash:
+
+    # keep one file descriptor open for the whole session, and switch to
+    # a private session to avoid polluting the global file name parameter
+    exec 3<>/proc/filecache
+    echo "set private" >&3
+
+To instruct the kernel to drop clean caches, dentries and inodes from memory,
+causing that memory to become free:
+
+    # drop clean file data cache (i.e. file backed pagecache)
+    echo drop pagecache > /proc/filecache
+
+    # drop clean file metadata cache (i.e. dentries and inodes)
+    echo drop slabcache > /proc/filecache
+
+Note that the drop commands are non-destructive: dirty objects are not
+freeable, so the user should run `sync' first to flush them out.
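Putting the drop commands together, a cautious sequence might look like the sketch below. The `FILECACHE` variable is only an illustration hook so the script can be dry-run against an ordinary file; on a real system it is `/proc/filecache` and root is required:

```shell
#!/bin/sh
# Flush dirty data first, then ask the kernel to drop clean caches.
# FILECACHE defaults to the real proc file; override it for dry runs.
FILECACHE="${FILECACHE:-/proc/filecache}"

sync                                     # dirty objects are not freeable
echo "drop pagecache" > "$FILECACHE"     # clean file data cache
echo "drop slabcache" > "$FILECACHE"     # clean dentries and inodes
```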
 
 MemTotal:     16344972 kB
 MemFree:      13634064 kB
--- /dev/null
+++ linux-2.6/fs/proc/filecache.c
@@ -0,0 +1,1045 @@
+/*
+ * fs/proc/filecache.c
+ *
+ * Copyright (C) 2006, 2007 Fengguang Wu <wfg@mail.ustc.edu.cn>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/radix-tree.h>
+#include <linux/page-flags.h>
+#include <linux/pagevec.h>
+#include <linux/pagemap.h>
+#include <linux/vmalloc.h>
+#include <linux/writeback.h>
+#include <linux/buffer_head.h>
+#include <linux/parser.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <asm/uaccess.h>
+
+/*
+ * Increase minor version when new columns are added;
+ * Increase major version when existing columns are changed.
+ */
+#define FILECACHE_VERSION	"1.0"
+
+/* Internal buffer sizes. The larger, the more efficient. */
+#define SBUF_SIZE	(128<<10)
+#define IWIN_PAGE_ORDER	3
+#define IWIN_SIZE	((PAGE_SIZE<<IWIN_PAGE_ORDER) / sizeof(struct inode *))
+
+/*
+ * Session management.
+ *
+ * Each opened /proc/filecache file is associated with a session object.
+ * Also there is a global_session that maintains status across open()/close()
+ * (i.e. the lifetime of an opened file), so that a casual user can query the
+ * filecache via _multiple_ simple shell commands like
+ * 'echo cat /bin/bash > /proc/filecache; cat /proc/filecache'.
+ *
+ * session.query_file is the file whose cache info is to be queried.
+ * Its value determines what we get on read():
+ * 	- NULL: ii_*() called to show the inode index
+ * 	- filp: pg_*() called to show the page groups of a filp
+ *
+ * session.query_file is
+ * 	- cloned from global_session.query_file on open();
+ * 	- updated on write("cat filename");
+ * 	  note that the new file will also be saved in global_session.query_file if
+ * 	  session.private_session is false.
+ */
+
+struct session {
+	/* options */
+	int		private_session;
+	unsigned long	ls_options;
+	dev_t		ls_dev;
+
+	/* parameters */
+	struct file	*query_file;
+
+	/* seqfile pos */
+	pgoff_t		start_offset;
+	pgoff_t		next_offset;
+
+	/* inode at last pos */
+	struct {
+		unsigned long pos;
+		unsigned long state;
+		struct inode *inode;
+		struct inode *pinned_inode;
+	} ipos;
+
+	/* inode window */
+	struct {
+		unsigned long cursor;
+		unsigned long origin;
+		unsigned long size;
+		struct inode **inodes;
+	} iwin;
+};
+
+static struct session global_session;
+
+/*
+ * Session address is stored in proc_file->f_ra.start:
+ * we assume that there will be no readahead for proc_file.
+ */
+static struct session *get_session(struct file *proc_file)
+{
+	return (struct session *)proc_file->f_ra.start;
+}
+
+static void set_session(struct file *proc_file, struct session *s)
+{
+	BUG_ON(proc_file->f_ra.start);
+	proc_file->f_ra.start = (unsigned long)s;
+}
+
+static void update_global_file(struct session *s)
+{
+	if (s->private_session)
+		return;
+
+	if (global_session.query_file)
+		fput(global_session.query_file);
+
+	global_session.query_file = s->query_file;
+
+	if (global_session.query_file)
+		get_file(global_session.query_file);
+}
+
+/*
+ * Cases of the name:
+ * 1) NULL                (new session)
+ * 	s->query_file = global_session.query_file = 0;
+ * 2) ""                  (ls/la)
+ * 	s->query_file = global_session.query_file;
+ * 3) a regular file name (cat newfile)
+ * 	s->query_file = global_session.query_file = newfile;
+ */
+static int session_update_file(struct session *s, char *name)
+{
+	static DEFINE_MUTEX(mutex); /* protects global_session.query_file */
+	int err = 0;
+
+	mutex_lock(&mutex);
+
+	/*
+	 * We are to quit, or to list the cached files.
+	 * Reset *.query_file.
+	 */
+	if (!name) {
+		if (s->query_file) {
+			fput(s->query_file);
+			s->query_file = NULL;
+		}
+		update_global_file(s);
+		goto out;
+	}
+
+	/*
+	 * This is a new session.
+	 * Inherit options/parameters from global ones.
+	 */
+	if (name[0] == '\0') {
+		*s = global_session;
+		if (s->query_file)
+			get_file(s->query_file);
+		goto out;
+	}
+
+	/*
+	 * Open the named file.
+	 */
+	if (s->query_file)
+		fput(s->query_file);
+	s->query_file = filp_open(name, O_RDONLY|O_LARGEFILE, 0);
+	if (IS_ERR(s->query_file)) {
+		err = PTR_ERR(s->query_file);
+		s->query_file = NULL;
+	} else
+		update_global_file(s);
+
+out:
+	mutex_unlock(&mutex);
+
+	return err;
+}
+
+static struct session *session_create(void)
+{
+	struct session *s;
+	int err = 0;
+
+	s = kmalloc(sizeof(*s), GFP_KERNEL);
+	if (s)
+		err = session_update_file(s, "");
+	else
+		err = -ENOMEM;
+
+	return err ? ERR_PTR(err) : s;
+}
+
+static void session_release(struct session *s)
+{
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	if (s->query_file)
+		fput(s->query_file);
+	kfree(s);
+}
+
+
+/*
+ * Listing of cached files.
+ *
+ * Usage:
+ * 		echo > /proc/filecache  # enter listing mode
+ * 		cat /proc/filecache     # get the file listing
+ */
+
+/* code style borrowed from ib_srp.c */
+enum {
+	LS_OPT_ERR	=	0,
+	LS_OPT_NOCLEAN	=	1 << 0,
+	LS_OPT_NODIRTY	=	1 << 1,
+	LS_OPT_NOUNUSED	=	1 << 2,
+	LS_OPT_EMPTY	=	1 << 3,
+	LS_OPT_ALL	=	1 << 4,
+	LS_OPT_DEV	=	1 << 5,
+};
+
+static match_table_t ls_opt_tokens = {
+	{ LS_OPT_NOCLEAN,	"noclean" 	},
+	{ LS_OPT_NODIRTY,	"nodirty" 	},
+	{ LS_OPT_NOUNUSED,	"nounused" 	},
+	{ LS_OPT_EMPTY,		"empty"		},
+	{ LS_OPT_ALL,		"all" 		},
+	{ LS_OPT_DEV,		"dev=%s"	},
+	{ LS_OPT_ERR,		NULL 		}
+};
+
+static int ls_parse_options(const char *buf, struct session *s)
+{
+	substring_t args[MAX_OPT_ARGS];
+	char *options, *sep_opt;
+	char *p;
+	int token;
+	int ret = 0;
+
+	if (!buf)
+		return 0;
+	options = kstrdup(buf, GFP_KERNEL);
+	if (!options)
+		return -ENOMEM;
+
+	s->ls_options = 0;
+	sep_opt = options;
+	while ((p = strsep(&sep_opt, " ")) != NULL) {
+		if (!*p)
+			continue;
+
+		token = match_token(p, ls_opt_tokens, args);
+
+		switch (token) {
+		case LS_OPT_NOCLEAN:
+		case LS_OPT_NODIRTY:
+		case LS_OPT_NOUNUSED:
+		case LS_OPT_EMPTY:
+		case LS_OPT_ALL:
+			s->ls_options |= token;
+			break;
+		case LS_OPT_DEV:
+			p = match_strdup(args);
+			if (!p) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			if (*p == '/') {
+				struct kstat stat;
+				struct nameidata nd;
+				ret = path_lookup(p, LOOKUP_FOLLOW, &nd);
+				if (!ret)
+					ret = vfs_getattr(nd.path.mnt,
+							  nd.path.dentry, &stat);
+				if (!ret)
+					s->ls_dev = stat.rdev;
+			} else
+				s->ls_dev = simple_strtoul(p, NULL, 0);
+			/* printk("%lx %s\n", (long)s->ls_dev, p); */
+			kfree(p);
+			break;
+
+		default:
+			printk(KERN_WARNING "unknown parameter or missing value "
+			       "'%s' in ls command\n", p);
+			ret = -EINVAL;
+			goto out;
+		}
+	}
+
+out:
+	kfree(options);
+	return ret;
+}
+
+/*
+ * Add possible filters here.
+ * No permission check: we cannot verify the path's permission anyway.
+ * We simply demand root privilege for accessing /proc/filecache.
+ */
+static int may_show_inode(struct session *s, struct inode *inode)
+{
+	if (!atomic_read(&inode->i_count))
+		return 0;
+	if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+		return 0;
+	if (!inode->i_mapping)
+		return 0;
+
+	if (s->ls_dev && s->ls_dev != inode->i_sb->s_dev)
+		return 0;
+
+	if (s->ls_options & LS_OPT_ALL)
+		return 1;
+
+	if (!(s->ls_options & LS_OPT_EMPTY) && !inode->i_mapping->nrpages)
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NOCLEAN) && !(inode->i_state & I_DIRTY))
+		return 0;
+
+	if ((s->ls_options & LS_OPT_NODIRTY) && (inode->i_state & I_DIRTY))
+		return 0;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	      S_ISLNK(inode->i_mode) || S_ISBLK(inode->i_mode)))
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Full: there are more data following.
+ */
+static int iwin_full(struct session *s)
+{
+	return !s->iwin.cursor ||
+		s->iwin.cursor > s->iwin.origin + s->iwin.size;
+}
+
+static int iwin_push(struct session *s, struct inode *inode)
+{
+	if (!may_show_inode(s, inode))
+		return 0;
+
+	s->iwin.cursor++;
+
+	if (s->iwin.size >= IWIN_SIZE)
+		return 1;
+
+	if (s->iwin.cursor > s->iwin.origin)
+		s->iwin.inodes[s->iwin.size++] = inode;
+	return 0;
+}
+
+/*
+ * Traverse the inode lists in order - newest first.
+ * And fill @s->iwin.inodes with inodes positioned in [@pos, @pos+IWIN_SIZE).
+ */
+static int iwin_fill(struct session *s, unsigned long pos)
+{
+	struct inode *inode;
+	struct super_block *sb;
+
+	s->iwin.origin = pos;
+	s->iwin.cursor = 0;
+	s->iwin.size = 0;
+
+	/*
+	 * We have a cursor inode, clean and expected to be unchanged.
+	 */
+	if (s->ipos.inode && pos >= s->ipos.pos &&
+			!(s->ipos.state & I_DIRTY) &&
+			s->ipos.state == s->ipos.inode->i_state) {
+		inode = s->ipos.inode;
+		s->iwin.cursor = s->ipos.pos;
+		goto continue_from_saved;
+	}
+
+	if (s->ls_options & LS_OPT_NODIRTY)
+		goto clean_inodes;
+
+	spin_lock(&sb_lock);
+	list_for_each_entry(sb, &super_blocks, s_list) {
+		if (s->ls_dev && s->ls_dev != sb->s_dev)
+			continue;
+
+		list_for_each_entry(inode, &sb->s_dirty, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+		list_for_each_entry(inode, &sb->s_io, i_list) {
+			if (iwin_push(s, inode))
+				goto out_full_unlock;
+		}
+	}
+	spin_unlock(&sb_lock);
+
+clean_inodes:
+	list_for_each_entry(inode, &inode_in_use, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+continue_from_saved:
+		;
+	}
+
+	if (s->ls_options & LS_OPT_NOUNUSED)
+		return 0;
+
+	list_for_each_entry(inode, &inode_unused, i_list) {
+		if (iwin_push(s, inode))
+			goto out_full;
+	}
+
+	return 0;
+
+out_full_unlock:
+	spin_unlock(&sb_lock);
+out_full:
+	return 1;
+}
+
+static struct inode *iwin_inode(struct session *s, unsigned long pos)
+{
+	if ((iwin_full(s) && pos >= s->iwin.origin + s->iwin.size)
+			  || pos < s->iwin.origin)
+		iwin_fill(s, pos);
+
+	if (pos >= s->iwin.cursor)
+		return NULL;
+
+	s->ipos.pos = pos;
+	s->ipos.inode = s->iwin.inodes[pos - s->iwin.origin];
+	BUG_ON(!s->ipos.inode);
+	return s->ipos.inode;
+}
+
+static void show_inode(struct seq_file *m, struct inode *inode)
+{
+	char state[] = "--"; /* dirty, locked */
+	struct dentry *dentry;
+	loff_t size = i_size_read(inode);
+	unsigned long nrpages;
+	int percent;
+	int refcnt;
+	int shift;
+
+	if (!size)
+		size++;
+
+	if (inode->i_mapping)
+		nrpages = inode->i_mapping->nrpages;
+	else {
+		nrpages = 0;
+		WARN_ON(1);
+	}
+
+	for (shift = 0; (size >> shift) > ULONG_MAX / 128; shift += 12)
+		;
+	percent = min(100UL, (((100 * nrpages) >> shift) << PAGE_CACHE_SHIFT) /
+						(unsigned long)(size >> shift));
+
+	if (inode->i_state & (I_DIRTY_DATASYNC|I_DIRTY_PAGES))
+		state[0] = 'D';
+	else if (inode->i_state & I_DIRTY_SYNC)
+		state[0] = 'd';
+
+	if (inode->i_state & I_LOCK)
+		state[1] = 'L';
+
+	refcnt = 0;
+	list_for_each_entry(dentry, &inode->i_dentry, d_alias) {
+		refcnt += atomic_read(&dentry->d_count);
+	}
+
+	seq_printf(m, "%10lu %10llu %8lu %7d ",
+			inode->i_ino,
+			DIV_ROUND_UP(size, 1024),
+			nrpages << (PAGE_CACHE_SHIFT - 10),
+			percent);
+
+	seq_printf(m, "%6d %5s ",
+			refcnt,
+			state);
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+	seq_printf(m, "%8u %5u %-16s",
+			inode->i_access_count,
+			inode->i_cuid,
+			inode->i_comm);
+#endif
+
+	seq_printf(m, "%02x:%02x(%s)\t",
+			MAJOR(inode->i_sb->s_dev),
+			MINOR(inode->i_sb->s_dev),
+			inode->i_sb->s_id);
+
+	if (list_empty(&inode->i_dentry)) {
+		if (!atomic_read(&inode->i_count))
+			seq_puts(m, "(noname)\n");
+		else
+			seq_printf(m, "(%02x:%02x)\n",
+					imajor(inode), iminor(inode));
+	} else {
+		struct path path = {
+			.mnt = NULL,
+			.dentry = list_entry(inode->i_dentry.next,
+					     struct dentry, d_alias)
+		};
+
+		seq_path(m, &path, " \t\n\\");
+		seq_putc(m, '\n');
+	}
+}
+
+static int ii_show(struct seq_file *m, void *v)
+{
+	unsigned long index = *(loff_t *) v;
+	struct session *s = m->private;
+	struct inode *inode;
+
+	if (index == 0) {
+		seq_puts(m, "# filecache " FILECACHE_VERSION "\n");
+		seq_puts(m, "#      ino       size   cached cached% "
+				"refcnt state "
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+				"accessed   uid process         "
+#endif
+				"dev\t\tfile\n");
+	}
+
+	inode = iwin_inode(s, index);
+	show_inode(m, inode);
+
+	return 0;
+}
+
+static void *ii_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	s->iwin.size = 0;
+	s->iwin.inodes = (struct inode **)
+				__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!s->iwin.inodes)
+		return NULL;
+
+	spin_lock(&inode_lock);
+
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void *ii_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+
+	(*pos)++;
+	return iwin_inode(s, *pos) ? pos : NULL;
+}
+
+static void ii_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct inode *inode = s->ipos.inode;
+
+	if (!s->iwin.inodes)
+		return;
+
+	if (inode) {
+		__iget(inode);
+		s->ipos.state = inode->i_state;
+	}
+	spin_unlock(&inode_lock);
+
+	free_pages((unsigned long) s->iwin.inodes, IWIN_PAGE_ORDER);
+	if (s->ipos.pinned_inode)
+		iput(s->ipos.pinned_inode);
+	s->ipos.pinned_inode = inode;
+}
+
+/*
+ * Listing of cached page ranges of a file.
+ *
+ * Usage:
+ * 		echo 'file name' > /proc/filecache
+ * 		cat /proc/filecache
+ */
+
+static unsigned long page_mask;
+#define PG_MMAP		PG_lru		/* reuse any non-relevant flag */
+#define PG_BUFFER	PG_swapcache	/* ditto */
+#define PG_DIRTY	PG_error	/* ditto */
+#define PG_WRITEBACK	PG_buddy	/* ditto */
+
+/*
+ * Page state names, prefixed by their abbreviations.
+ */
+struct {
+	unsigned long	mask;
+	const char     *name;
+	int		faked;
+} page_flag [] = {
+	{1 << PG_referenced,	"R:referenced",	0},
+	{1 << PG_active,	"A:active",	0},
+	{1 << PG_MMAP,		"M:mmap",	1},
+
+	{1 << PG_uptodate,	"U:uptodate",	0},
+	{1 << PG_dirty,		"D:dirty",	0},
+	{1 << PG_writeback,	"W:writeback",	0},
+	{1 << PG_reclaim,	"X:readahead",	0},
+
+	{1 << PG_private,	"P:private",	0},
+	{1 << PG_owner_priv_1,	"O:owner",	0},
+
+	{1 << PG_BUFFER,	"b:buffer",	1},
+	{1 << PG_DIRTY,		"d:dirty",	1},
+	{1 << PG_WRITEBACK,	"w:writeback",	1},
+};
+
+static unsigned long page_flags(struct page* page)
+{
+	unsigned long flags;
+	struct address_space *mapping = page_mapping(page);
+
+	flags = page->flags & page_mask;
+
+	if (page_mapped(page))
+		flags |= (1 << PG_MMAP);
+
+	if (page_has_buffers(page))
+		flags |= (1 << PG_BUFFER);
+
+	if (mapping) {
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_WRITEBACK))
+			flags |= (1 << PG_WRITEBACK);
+
+		if (radix_tree_tag_get(&mapping->page_tree,
+					page_index(page),
+					PAGECACHE_TAG_DIRTY))
+			flags |= (1 << PG_DIRTY);
+	}
+
+	return flags;
+}
+
+static int pages_similiar(struct page* page0, struct page* page)
+{
+	if (page_count(page0) != page_count(page))
+		return 0;
+
+	if (page_flags(page0) != page_flags(page))
+		return 0;
+
+	return 1;
+}
+
+static void show_range(struct seq_file *m, struct page* page, unsigned long len)
+{
+	int i;
+	unsigned long flags;
+
+	if (!m || !page)
+		return;
+
+	seq_printf(m, "%lu\t%lu\t", page->index, len);
+
+	flags = page_flags(page);
+	for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+		seq_putc(m, (flags & page_flag[i].mask) ?
+					page_flag[i].name[0] : '_');
+
+	seq_printf(m, "\t%d\n", page_count(page));
+}
+
+#define BATCH_LINES	100
+static pgoff_t show_file_cache(struct seq_file *m,
+				struct address_space *mapping, pgoff_t start)
+{
+	int i;
+	int lines = 0;
+	pgoff_t len = 0;
+	struct pagevec pvec;
+	struct page *page;
+	struct page *page0 = NULL;
+
+	for (;;) {
+		pagevec_init(&pvec, 0);
+		pvec.nr = radix_tree_gang_lookup(&mapping->page_tree,
+				(void **)pvec.pages, start + len, PAGEVEC_SIZE);
+
+		if (pvec.nr == 0) {
+			show_range(m, page0, len);
+			start = ULONG_MAX;
+			goto out;
+		}
+
+		if (!page0)
+			page0 = pvec.pages[0];
+
+		for (i = 0; i < pvec.nr; i++) {
+			page = pvec.pages[i];
+
+			if (page->index == start + len &&
+					pages_similiar(page0, page))
+				len++;
+			else {
+				show_range(m, page0, len);
+				page0 = page;
+				start = page->index;
+				len = 1;
+				if (++lines > BATCH_LINES)
+					goto out;
+			}
+		}
+	}
+
+out:
+	return start;
+}
+
+static int pg_show(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset;
+
+	if (!file)
+		return ii_show(m, v);
+
+	offset = *(loff_t *) v;
+
+	if (!offset) { /* print header */
+		int i;
+
+		seq_puts(m, "# file ");
+		seq_path(m, &file->f_path, " \t\n\\");
+
+		seq_puts(m, "\n# flags");
+		for (i = 0; i < ARRAY_SIZE(page_flag); i++)
+			seq_printf(m, " %s", page_flag[i].name);
+
+		seq_puts(m, "\n# idx\tlen\tstate\t\trefcnt\n");
+	}
+
+	s->start_offset = offset;
+	s->next_offset = show_file_cache(m, file->f_mapping, offset);
+
+	return 0;
+}
+
+static void *file_pos(struct file *file, loff_t *pos)
+{
+	loff_t size = i_size_read(file->f_mapping->host);
+	pgoff_t end = DIV_ROUND_UP(size, PAGE_CACHE_SIZE);
+	pgoff_t offset = *pos;
+
+	return offset < end ? pos : NULL;
+}
+
+static void *pg_start(struct seq_file *m, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+	pgoff_t offset = *pos;
+
+	if (!file)
+		return ii_start(m, pos);
+
+	rcu_read_lock();
+
+	if (offset - s->start_offset == 1)
+		*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void *pg_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_next(m, v, pos);
+
+	*pos = s->next_offset;
+	return file_pos(file, pos);
+}
+
+static void pg_stop(struct seq_file *m, void *v)
+{
+	struct session *s = m->private;
+	struct file *file = s->query_file;
+
+	if (!file)
+		return ii_stop(m, v);
+
+	rcu_read_unlock();
+}
+
+struct seq_operations seq_filecache_op = {
+	.start	= pg_start,
+	.next	= pg_next,
+	.stop	= pg_stop,
+	.show	= pg_show,
+};
+
+/*
+ * Implement the manual drop-all-pagecache function
+ */
+
+#define MAX_INODES	(PAGE_SIZE / sizeof(struct inode *))
+static int drop_pagecache(void)
+{
+	struct hlist_head *head;
+	struct hlist_node *node;
+	struct inode *inode;
+	struct inode **inodes;
+	unsigned long i, j, k;
+	int err = 0;
+
+	inodes = (struct inode **)__get_free_pages(GFP_KERNEL, IWIN_PAGE_ORDER);
+	if (!inodes)
+		return -ENOMEM;
+
+	for (i = 0; (head = get_inode_hash_budget(i)); i++) {
+		if (hlist_empty(head))
+			continue;
+
+		j = 0;
+		cond_resched();
+
+		/*
+		 * Grab some inodes.
+		 */
+		spin_lock(&inode_lock);
+		hlist_for_each (node, head) {
+			inode = hlist_entry(node, struct inode, i_hash);
+			if (!atomic_read(&inode->i_count))
+				continue;
+			if (inode->i_state & (I_FREEING|I_CLEAR|I_WILL_FREE))
+				continue;
+			if (!inode->i_mapping || !inode->i_mapping->nrpages)
+				continue;
+			__iget(inode);
+			inodes[j++] = inode;
+			if (j >= MAX_INODES)
+				break;
+		}
+		spin_unlock(&inode_lock);
+
+		/*
+		 * Free clean pages.
+		 */
+		for (k = 0; k < j; k++) {
+			inode = inodes[k];
+			invalidate_mapping_pages(inode->i_mapping, 0, ~1);
+			iput(inode);
+		}
+
+		/*
+		 * Simply ignore the remaining inodes.
+		 */
+		if (j >= MAX_INODES && !err) {
+			printk(KERN_WARNING
+				"Too many collisions in inode hash table.\n"
+				"Please boot with a larger ihash_entries=XXX.\n");
+			err = -EAGAIN;
+		}
+	}
+
+	free_pages((unsigned long) inodes, IWIN_PAGE_ORDER);
+	return err;
+}
+
+static void drop_slabcache(void)
+{
+	int nr_objects;
+
+	do {
+		nr_objects = shrink_slab(1000, GFP_KERNEL, 1000);
+	} while (nr_objects > 10);
+}
+
+/*
+ * Proc file operations.
+ */
+
+static int filecache_open(struct inode *inode, struct file *proc_file)
+{
+	struct seq_file *m;
+	struct session *s;
+	unsigned size;
+	char *buf = NULL;
+	int ret;
+
+	if (!try_module_get(THIS_MODULE))
+		return -ENOENT;
+
+	s = session_create();
+	if (IS_ERR(s)) {
+		ret = PTR_ERR(s);
+		goto out;
+	}
+	set_session(proc_file, s);
+
+	size = SBUF_SIZE;
+	buf = kmalloc(size, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ret = seq_open(proc_file, &seq_filecache_op);
+	if (!ret) {
+		m = proc_file->private_data;
+		m->private = s;
+		m->buf = buf;
+		m->size = size;
+	}
+
+out:
+	if (ret) {
+		/* s may be an ERR_PTR; release only valid sessions,
+		 * dropping their query_file reference as well. */
+		if (!IS_ERR(s))
+			session_release(s);
+		kfree(buf);
+		module_put(THIS_MODULE);
+	}
+	return ret;
+}
+
+static int filecache_release(struct inode *inode, struct file *proc_file)
+{
+	struct session *s = get_session(proc_file);
+	int ret;
+
+	session_release(s);
+	ret = seq_release(inode, proc_file);
+	module_put(THIS_MODULE);
+	return ret;
+}
+
+ssize_t filecache_write(struct file *proc_file, const char __user * buffer,
+			size_t count, loff_t *ppos)
+{
+	struct session *s;
+	char *name;
+	int err = 0;
+
+	if (count >= PATH_MAX + 5)
+		return -ENAMETOOLONG;
+
+	name = kmalloc(count+1, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	if (copy_from_user(name, buffer, count)) {
+		err = -EFAULT;
+		goto out;
+	}
+
+	/* strip the optional newline */
+	if (count && name[count-1] == '\n')
+		name[count-1] = '\0';
+	else
+		name[count] = '\0';
+
+	s = get_session(proc_file);
+	if (!strcmp(name, "set private")) {
+		s->private_session = 1;
+		goto out;
+	}
+
+	if (!strncmp(name, "cat ", 4)) {
+		err = session_update_file(s, name+4);
+		goto out;
+	}
+
+	if (!strncmp(name, "ls", 2)) {
+		err = session_update_file(s, NULL);
+		if (!err)
+			err = ls_parse_options(name+2, s);
+		if (!err && !s->private_session) {
+			global_session.ls_dev = s->ls_dev;
+			global_session.ls_options = s->ls_options;
+		}
+		goto out;
+	}
+
+	if (!strncmp(name, "drop pagecache", 14)) {
+		err = drop_pagecache();
+		goto out;
+	}
+
+	if (!strncmp(name, "drop slabcache", 14)) {
+		drop_slabcache();
+		goto out;
+	}
+
+	/* err = -EINVAL; */
+	err = session_update_file(s, name);
+
+out:
+	kfree(name);
+
+	return err ? err : count;
+}
+
+static struct file_operations proc_filecache_fops = {
+	.owner		= THIS_MODULE,
+	.open		= filecache_open,
+	.release	= filecache_release,
+	.write		= filecache_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+};
+
+
+static __init int filecache_init(void)
+{
+	int i;
+	struct proc_dir_entry *entry;
+
+	entry = create_proc_entry("filecache", 0600, NULL);
+	if (entry)
+		entry->proc_fops = &proc_filecache_fops;
+
+	for (page_mask = i = 0; i < ARRAY_SIZE(page_flag); i++)
+		if (!page_flag[i].faked)
+			page_mask |= page_flag[i].mask;
+
+	return 0;
+}
+
+static void filecache_exit(void)
+{
+	remove_proc_entry("filecache", NULL);
+	if (global_session.query_file)
+		fput(global_session.query_file);
+}
+
+MODULE_AUTHOR("Fengguang Wu <wfg@mail.ustc.edu.cn>");
+MODULE_LICENSE("GPL");
+
+module_init(filecache_init);
+module_exit(filecache_exit);
--- linux-2.6.orig/include/linux/fs.h
+++ linux-2.6/include/linux/fs.h
@@ -685,6 +685,12 @@ struct inode {
 	void			*i_security;
 #endif
 	void			*i_private; /* fs or device private pointer */
+
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+	unsigned int		i_access_count;	/* opened how many times? */
+	uid_t			i_cuid;		/* opened first by which user? */
+	char			i_comm[16];	/* opened first by which app? */
+#endif
 };
 
 /*
@@ -773,6 +779,13 @@ static inline unsigned imajor(const stru
 	return MAJOR(inode->i_rdev);
 }
 
+static inline void inode_accessed(struct inode *inode)
+{
+#ifdef CONFIG_PROC_FILECACHE_EXTRAS
+	inode->i_access_count++;
+#endif
+}
+
 extern struct block_device *I_BDEV(struct inode *inode);
 
 struct fown_struct {
@@ -1907,6 +1920,7 @@ extern void remove_inode_hash(struct ino
 static inline void insert_inode_hash(struct inode *inode) {
 	__insert_inode_hash(inode, inode->i_ino);
 }
+struct hlist_head * get_inode_hash_budget(unsigned long index);
 
 extern struct file * get_empty_filp(void);
 extern void file_move(struct file *f, struct list_head *list);
--- linux-2.6.orig/fs/open.c
+++ linux-2.6/fs/open.c
@@ -828,6 +828,7 @@ static struct file *__dentry_open(struct
 			goto cleanup_all;
 	}
 
+	inode_accessed(inode);
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
--- linux-2.6.orig/fs/Kconfig
+++ linux-2.6/fs/Kconfig
@@ -750,6 +750,36 @@ config CONFIGFS_FS
 	  Both sysfs and configfs can and should exist together on the
 	  same system. One is not a replacement for the other.
 
+config PROC_FILECACHE
+	tristate "/proc/filecache support"
+	default m
+	depends on PROC_FS
+	help
+	  This option creates a file /proc/filecache which enables one to
+	  query/drop the cached files in memory.
+
+	  A quick start guide:
+
+	  # echo 'ls' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'cat /bin/bash' > /proc/filecache
+	  # head /proc/filecache
+
+	  # echo 'drop pagecache' > /proc/filecache
+	  # echo 'drop slabcache' > /proc/filecache
+
+	  For more details, please check Documentation/filesystems/proc.txt .
+
+	  It can be a handy tool for sysadmins and desktop users.
+
+config PROC_FILECACHE_EXTRAS
+	bool "track extra states"
+	default y
+	depends on PROC_FILECACHE
+	help
+	  Track extra states that cost a little more time/space.
+
 endmenu
 
 menu "Miscellaneous filesystems"
--- linux-2.6.orig/fs/proc/Makefile
+++ linux-2.6/fs/proc/Makefile
@@ -2,7 +2,8 @@
 # Makefile for the Linux proc filesystem routines.
 #
 
-obj-$(CONFIG_PROC_FS) += proc.o
+obj-$(CONFIG_PROC_FS)		+= proc.o
+obj-$(CONFIG_PROC_FILECACHE)	+= filecache.o
 
 proc-y			:= nommu.o task_nommu.o
 proc-$(CONFIG_MMU)	:= mmu.o task_mmu.o

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 11:55                       ` drop_caches Markus
@ 2009-03-05 13:36                         ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 13:36 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

Hi Markus,

On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > Markus, you may want to try this patch, it will have better chance to figure out
> > the hidden file pages.
> > 
> > 1) apply the patch and recompile kernel with CONFIG_PROC_FILECACHE=m
> > 2) after booting:
> >         modprobe filecache
> >         cp /proc/filecache filecache-`date +'%F'`
> > 3) send us the copied file, it will list all cached files, including
> >    the normally hidden ones.
> 
> The file consists of 674 lines. If I interpret it right, "size" is the 
> filesize and "cached" the amount of the file being in cache (why can 
> this be bigger than the file?!).

          size = file size in bytes
        cached = cached pages

So it's normal that (size > cached).

Thanks,
Fengguang


* Re: drop_caches ...
  2009-03-05 13:36                         ` drop_caches Wu Fengguang
@ 2009-03-05 13:45                           ` Lukas Hejtmanek
  -1 siblings, 0 replies; 49+ messages in thread
From: Lukas Hejtmanek @ 2009-03-05 13:45 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Markus, linux-kernel, Zdenek Kabelac, linux-mm

On Thu, Mar 05, 2009 at 09:36:03PM +0800, Wu Fengguang wrote:
>           size = file size in bytes
>         cached = cached pages
> 
> So it's normal that (size > cached).

I guess the question was how it can happen that (cached > size).

-- 
Lukáš Hejtmánek


* Re: drop_caches ...
  2009-03-05 13:45                           ` drop_caches Lukas Hejtmanek
@ 2009-03-05 13:48                             ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 13:48 UTC (permalink / raw)
  To: Lukas Hejtmanek; +Cc: Markus, linux-kernel, Zdenek Kabelac, linux-mm

On Thu, Mar 05, 2009 at 03:45:28PM +0200, Lukas Hejtmanek wrote:
> On Thu, Mar 05, 2009 at 09:36:03PM +0800, Wu Fengguang wrote:
> >           size = file size in bytes
> >         cached = cached pages
> > 
> > So it's normal that (size > cached).
> 
> I guess, the question was how it can happen that (cached > size).

Ah, because cached size is rounded up to page boundaries,
so a 1K sized file will have 4K cached size.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 13:36                         ` drop_caches Wu Fengguang
@ 2009-03-05 13:50                           ` Markus
  -1 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-05 13:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

Am Donnerstag, 5. März 2009 schrieb Wu Fengguang:
> Hi Markus,
> 
> On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > > Markus, you may want to try this patch, it will have better chance to figure out
> > > the hidden file pages.
> > > 
> > > 1) apply the patch and recompile kernel with CONFIG_PROC_FILECACHE=m
> > > 2) after booting:
> > >         modprobe filecache
> > >         cp /proc/filecache filecache-`date +'%F'`
> > > 3) send us the copied file, it will list all cached files, including
> > >    the normally hidden ones.
> > 
> > The file consists of 674 lines. If I interpret it right, "size" is the 
> > filesize and "cached" the amount of the file being in cache (why can 
> > this be bigger than the file?!).
> 
>           size = file size in bytes
>         cached = cached pages
> 
> So it's normal that (size > cached).

Yeah, I just wondered because sometimes size < cached, but that's 
because cached must obviously be a multiple of 4 KB. So no problem 
here ;)

Thanks,
Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 13:36                         ` drop_caches Wu Fengguang
@ 2009-03-05 14:01                           ` Lukas Hejtmanek
  -1 siblings, 0 replies; 49+ messages in thread
From: Lukas Hejtmanek @ 2009-03-05 14:01 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Markus, linux-kernel, Zdenek Kabelac, linux-mm

On Thu, Mar 05, 2009 at 09:36:03PM +0800, Wu Fengguang wrote:
> > filesize and "cached" the amount of the file being in cache (why can 
> > this be bigger than the file?!).
> 
>           size = file size in bytes
>         cached = cached pages
> 
> So it's normal that (size > cached).

And one more thing. It seems that at least in the version of filecache I have,
the size and cached are in kB rather than in B.

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 13:29                       ` drop_caches Wu Fengguang
@ 2009-03-05 14:05                           ` Markus
  0 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-05 14:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

> Could you please try the attached patch which will also show the
> user and process that opened these files? It adds three more fields
> when CONFIG_PROC_FILECACHE_EXTRAS is selected.
> 
> Thanks,
> Fengguang
>  
> On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > 
> > # sort -n -k 3 filecache-2009-03-05 | tail -n 5
> >      15886       7112     7112     100      1    d- 00:08
> > (tmpfs)        /dev/zero\040(deleted)
> >      16209      35708    35708     100      1    d- 00:08
> > (tmpfs)        /dev/zero\040(deleted)
> >      16212      82128    82128     100      1    d- 00:08
> > (tmpfs)        /dev/zero\040(deleted)
> >      15887     340024   340024     100      1    d- 00:08
> > (tmpfs)        /dev/zero\040(deleted)
> >      15884     455008   455008     100      1    d- 00:08
> > (tmpfs)        /dev/zero\040(deleted)
> > 
> > The sum of the third column is 1013 MB.
> > To note the biggest ones (or do you want the whole file?)... and thats 
> > after a sync and a drop_caches! (Can be seen in the commands given.)

I could, but I know where these things belong. It's from sphinx (a 
MySQL indexer) searchd, which loads parts of the index into memory.
The sizes looked well-known, and killing the searchd reduces "cached" 
to a normal amount ;)

I just don't know why it's in "cached" (can that be swapped out, btw?).
But I think that's not a problem of the kernel, but of anonymous 
mmap-ing.

I think it's resolved, thanks to everybody and Fengguang in particular!

Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 14:01                           ` drop_caches Lukas Hejtmanek
@ 2009-03-05 14:07                             ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 14:07 UTC (permalink / raw)
  To: Lukas Hejtmanek; +Cc: Markus, linux-kernel, Zdenek Kabelac, linux-mm

On Thu, Mar 05, 2009 at 04:01:25PM +0200, Lukas Hejtmanek wrote:
> On Thu, Mar 05, 2009 at 09:36:03PM +0800, Wu Fengguang wrote:
> > > filesize and "cached" the amount of the file being in cache (why can 
> > > this be bigger than the file?!).
> > 
> >           size = file size in bytes
> >         cached = cached pages
> > 
> > So it's normal that (size > cached).
> 
> and one more thing. It seems that at least in the version of filecache I have,
> the size and cached are in kB rather than in B.

Ah sorry for the confusion, it is in KB: DIV_ROUND_UP(size, 1024).
It may be better to simply use bytes though.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 14:05                           ` drop_caches Markus
@ 2009-03-05 14:22                             ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 14:22 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

Hi Markus,

On Thu, Mar 05, 2009 at 04:05:26PM +0200, Markus wrote:
> > Could you please try the attached patch which will also show the
> > user and process that opened these files? It adds three more fields
> > when CONFIG_PROC_FILECACHE_EXTRAS is selected.
> > 
> > Thanks,
> > Fengguang
> >  
> > On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > > 
> > > # sort -n -k 3 filecache-2009-03-05 | tail -n 5
> > >      15886       7112     7112     100      1    d- 00:08
> > > (tmpfs)        /dev/zero\040(deleted)
> > >      16209      35708    35708     100      1    d- 00:08
> > > (tmpfs)        /dev/zero\040(deleted)
> > >      16212      82128    82128     100      1    d- 00:08
> > > (tmpfs)        /dev/zero\040(deleted)
> > >      15887     340024   340024     100      1    d- 00:08
> > > (tmpfs)        /dev/zero\040(deleted)
> > >      15884     455008   455008     100      1    d- 00:08
> > > (tmpfs)        /dev/zero\040(deleted)
> > > 
> > > The sum of the third column is 1013 MB.
> > > To note the biggest ones (or do you want the whole file?)... and thats 
> > > after a sync and a drop_caches! (Can be seen in the commands given.)
> 
> I could, but I know where these things belong to. Its from sphinx (a 
> mysql indexer) searchd. It loads parts of the index into memory.
> The sizes looked well-known and killing the searchd will reduce "cached" 
> to a normal amount ;)

And the file name is weird: /dev/zero.  I wonder how it
managed to create that file, and then delete it, inside a tmpfs!

Just out of curiosity, are they shm objects? Can you show us the
output of 'df', at your convenience?

> I just dont know why its in "cached" (can that be swapped out btw?).
> But I think thats not a problem of the kernel, but of anonymous 
> mmap-ing.

That's because the file is created in tmpfs, which is swap-backed.
By definition the pages here cannot be dropped by a third party.

> I think its resolved, thanks to everybody and Fengguang in particular!

You are welcome :-)

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 14:22                             ` drop_caches Wu Fengguang
@ 2009-03-05 14:43                               ` Markus
  -1 siblings, 0 replies; 49+ messages in thread
From: Markus @ 2009-03-05 14:43 UTC (permalink / raw)
  To: linux-kernel; +Cc: Wu Fengguang, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

> > > Could you please try the attached patch which will also show the
> > > user and process that opened these files? It adds three more fields
> > > when CONFIG_PROC_FILECACHE_EXTRAS is selected.
> > > 
> > > Thanks,
> > > Fengguang
> > >  
> > > On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > > > 
> > > > # sort -n -k 3 filecache-2009-03-05 | tail -n 5
> > > >      15886       7112     7112     100      1    d- 00:08
> > > > (tmpfs)        /dev/zero\040(deleted)
> > > >      16209      35708    35708     100      1    d- 00:08
> > > > (tmpfs)        /dev/zero\040(deleted)
> > > >      16212      82128    82128     100      1    d- 00:08
> > > > (tmpfs)        /dev/zero\040(deleted)
> > > >      15887     340024   340024     100      1    d- 00:08
> > > > (tmpfs)        /dev/zero\040(deleted)
> > > >      15884     455008   455008     100      1    d- 00:08
> > > > (tmpfs)        /dev/zero\040(deleted)
> > > > 
> > > > The sum of the third column is 1013 MB.
> > > > To note the biggest ones (or do you want the whole file?)... and thats 
> > > > after a sync and a drop_caches! (Can be seen in the commands given.)
> > 
> > I could, but I know where these things belong to. Its from sphinx (a 
> > mysql indexer) searchd. It loads parts of the index into memory.
> > The sizes looked well-known and killing the searchd will reduce "cached" 
> > to a normal amount ;)
> 
> And it's weird about the file name: /dev/zero.  I wonder how it
> managed to create that file, and then delete it, inside a tmpfs!

I don't know exactly. But in the source it's just a:
... mmap ( NULL, m_iLength, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANON, -1, 0 );
Perhaps that's the way shared anonymous memory is handled?!


> Just out of curiosity, are they shm objects? Can you show us the
> output of 'df'? In your convenient time.

That's all:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md6               14G  7.9G  5.9G  58% /
udev                   10M  304K  9.8M   3% /dev
cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
/dev/md4               19G   15G  3.2G  82% /home
/dev/md3              8.3G  4.5G  3.8G  55% /usr/portage
shm                   2.0G     0  2.0G   0% /dev/shm


> > I just dont know why its in "cached" (can that be swapped out btw?).
> > But I think thats not a problem of the kernel, but of anonymous 
> > mmap-ing.
> 
> You know, because the file is created in tmpfs, which is swap-backed.
> By definition the pages here cannot be dropped by third-party.

Hm, ok.


> > I think its resolved, thanks to everybody and Fengguang in 
particular!
> 
> You are welcome :-)
;)

Have a nice day.
Markus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-05 14:43                               ` drop_caches Markus
@ 2009-03-05 14:52                                 ` Wu Fengguang
  -1 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-05 14:52 UTC (permalink / raw)
  To: Markus; +Cc: linux-kernel, Zdenek Kabelac, linux-mm, Lukas Hejtmanek

On Thu, Mar 05, 2009 at 04:43:22PM +0200, Markus wrote:
> > > > Could you please try the attached patch which will also show the
> > > > user and process that opened these files? It adds three more fields
> > > > when CONFIG_PROC_FILECACHE_EXTRAS is selected.
> > > > 
> > > > Thanks,
> > > > Fengguang
> > > >  
> > > > On Thu, Mar 05, 2009 at 01:55:35PM +0200, Markus wrote:
> > > > > 
> > > > > # sort -n -k 3 filecache-2009-03-05 | tail -n 5
> > > > >      15886       7112     7112     100      1    d- 00:08
> > > > > (tmpfs)        /dev/zero\040(deleted)
> > > > >      16209      35708    35708     100      1    d- 00:08
> > > > > (tmpfs)        /dev/zero\040(deleted)
> > > > >      16212      82128    82128     100      1    d- 00:08
> > > > > (tmpfs)        /dev/zero\040(deleted)
> > > > >      15887     340024   340024     100      1    d- 00:08
> > > > > (tmpfs)        /dev/zero\040(deleted)
> > > > >      15884     455008   455008     100      1    d- 00:08
> > > > > (tmpfs)        /dev/zero\040(deleted)
> > > > > 
> > > > > The sum of the third column is 1013 MB.
> > > > > To note the biggest ones (or do you want the whole file?)... and thats 
> > > > > after a sync and a drop_caches! (Can be seen in the commands given.)
> > > 
> > > I could, but I know where these things belong to. Its from sphinx (a 
> > > mysql indexer) searchd. It loads parts of the index into memory.
> > > The sizes looked well-known and killing the searchd will reduce "cached" 
> > > to a normal amount ;)
> > 
> > And it's weird about the file name: /dev/zero.  I wonder how it
> > managed to create that file, and then delete it, inside a tmpfs!
> 
> I dont know exactly. But in the source its just a:
> ... mmap ( NULL, m_iLength, PROT_READ | PROT_WRITE, MAP_SHARED | 
> MAP_ANON, -1, 0 );
> Perhaps thats the way shared anonymous memory is handled?!

Good to know this. The corresponding kernel function is:

        /**     
         * shmem_zero_setup - setup a shared anonymous mapping
         * @vma: the vma to be mmapped is prepared by do_mmap_pgoff
         */
        int shmem_zero_setup(struct vm_area_struct *vma)
        {       
                struct file *file;
                loff_t size = vma->vm_end - vma->vm_start;
                
                file = shmem_file_setup("dev/zero", size, vma->vm_flags);

Here goes the /dev/zero ^_^

> > Just out of curiosity, are they shm objects? Can you show us the
> > output of 'df'? In your convenient time.
> 
> Thats all:
> # df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/md6               14G  7.9G  5.9G  58% /
> udev                   10M  304K  9.8M   3% /dev
> cachedir              4.0M  100K  4.0M   3% /lib64/splash/cache
> /dev/md4               19G   15G  3.2G  82% /home
> /dev/md3              8.3G  4.5G  3.8G  55% /usr/portage

> shm                   2.0G     0  2.0G   0% /dev/shm

shm objects will be accounted here. It's clean.

Thanks,
Fengguang


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
  2009-03-04 12:38 drop_caches Lukas Hejtmanek
@ 2009-03-04 12:54   ` Wu Fengguang
  0 siblings, 0 replies; 49+ messages in thread
From: Wu Fengguang @ 2009-03-04 12:54 UTC (permalink / raw)
  To: Lukas Hejtmanek; +Cc: linux-kernel, linux-mm, Zdenek Kabelac

On Wed, Mar 04, 2009 at 02:38:36PM +0200, Lukas Hejtmanek wrote:
> Hello,
> 
> > So you don't have lots of mapped pages(Mapped=51M) or tmpfs files. It's
> > strange to me that there are so many undroppable cached pages(Cached=359M),
> > and most of them lie out of the LRU queue(Active+Inactive file=53M)...
> 
> > Anyone have better clues on these 'hidden' pages?
> 
> I think he is simply using Intel driver + GEM + UXA = TONS of drm mm objects
> in tmpfs which is 'hidden' unless you have /proc/filecache to see them.

Ah, I was about to ask him to try filecache before you and Zdenek chimed in.

I was expecting the shm pages to be accounted in /dev/shm, however the
GEM shm pages are allocated from an in-kernel tmpfs mount...

And I noticed in filecache that you are compiling your own
/usr/local/drm/lib/libdrm_intel.so.1.0.0, so now I know what you are doing ;-)

Good job, Lukas!

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: drop_caches ...
@ 2009-03-04 12:38 Lukas Hejtmanek
  2009-03-04 12:54   ` drop_caches Wu Fengguang
  0 siblings, 1 reply; 49+ messages in thread
From: Lukas Hejtmanek @ 2009-03-04 12:38 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: linux-kernel

Hello,

> So you don't have lots of mapped pages(Mapped=51M) or tmpfs files. It's
> strange to me that there are so many undroppable cached pages(Cached=359M),
> and most of them lie out of the LRU queue(Active+Inactive file=53M)...

> Anyone have better clues on these 'hidden' pages?

I think he is simply using Intel driver + GEM + UXA = TONS of drm mm objects
in tmpfs which is 'hidden' unless you have /proc/filecache to see them.

-- 
Lukáš Hejtmánek

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2009-03-05 14:54 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-04  9:57 drop_caches Markus
2009-03-04 10:04 ` drop_caches Wu Fengguang
2009-03-04 10:32   ` drop_caches Markus
2009-03-04 11:05     ` drop_caches Wu Fengguang
2009-03-04 11:29       ` drop_caches Markus
2009-03-04 11:57         ` drop_caches Wu Fengguang
2009-03-04 11:57           ` drop_caches Wu Fengguang
2009-03-04 12:32           ` drop_caches Zdenek Kabelac
2009-03-04 12:32             ` drop_caches Zdenek Kabelac
2009-03-04 13:47             ` drop_caches Markus
2009-03-04 13:47               ` drop_caches Markus
2009-03-04 14:09               ` drop_caches Zdenek Kabelac
2009-03-04 14:09                 ` drop_caches Zdenek Kabelac
2009-03-04 18:47                 ` drop_caches Markus
2009-03-04 18:47                   ` drop_caches Markus
2009-03-05  0:48                   ` drop_caches Wu Fengguang
2009-03-05  0:48                     ` drop_caches Wu Fengguang
2009-03-05  9:06                     ` drop_caches Lukas Hejtmanek
2009-03-05  9:06                       ` drop_caches Lukas Hejtmanek
2009-03-05  9:14                       ` drop_caches KOSAKI Motohiro
2009-03-05  9:14                         ` drop_caches KOSAKI Motohiro
2009-03-05 11:11                         ` drop_caches Wu Fengguang
2009-03-05 11:11                           ` drop_caches Wu Fengguang
2009-03-05 11:55                     ` drop_caches Markus
2009-03-05 11:55                       ` drop_caches Markus
2009-03-05 13:29                       ` drop_caches Wu Fengguang
2009-03-05 14:05                         ` drop_caches Markus
2009-03-05 14:05                           ` drop_caches Markus
2009-03-05 14:22                           ` drop_caches Wu Fengguang
2009-03-05 14:22                             ` drop_caches Wu Fengguang
2009-03-05 14:43                             ` drop_caches Markus
2009-03-05 14:43                               ` drop_caches Markus
2009-03-05 14:52                               ` drop_caches Wu Fengguang
2009-03-05 14:52                                 ` drop_caches Wu Fengguang
2009-03-05 13:36                       ` drop_caches Wu Fengguang
2009-03-05 13:36                         ` drop_caches Wu Fengguang
2009-03-05 13:45                         ` drop_caches Lukas Hejtmanek
2009-03-05 13:45                           ` drop_caches Lukas Hejtmanek
2009-03-05 13:48                           ` drop_caches Wu Fengguang
2009-03-05 13:48                             ` drop_caches Wu Fengguang
2009-03-05 13:50                         ` drop_caches Markus
2009-03-05 13:50                           ` drop_caches Markus
2009-03-05 14:01                         ` drop_caches Lukas Hejtmanek
2009-03-05 14:01                           ` drop_caches Lukas Hejtmanek
2009-03-05 14:07                           ` drop_caches Wu Fengguang
2009-03-05 14:07                             ` drop_caches Wu Fengguang
2009-03-04 12:38 drop_caches Lukas Hejtmanek
2009-03-04 12:54 ` drop_caches Wu Fengguang
2009-03-04 12:54   ` drop_caches Wu Fengguang
