* VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
@ 2005-09-11 10:57 Theodore Ts'o
2005-09-11 12:00 ` Dipankar Sarma
0 siblings, 1 reply; 32+ messages in thread
From: Theodore Ts'o @ 2005-09-11 10:57 UTC (permalink / raw)
To: linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]
I've been noticing this for a while (probably since at least 2.6.11
or so, but I haven't been paying close attention), and until now I
haven't had the time to gather proof that this was the cause and to
write it up.
I have a T40 laptop (Pentium M processor) with 2 gigs of memory, and
from time to time, after the system has been up for a while, the
dentry cache grows huge, as does the ext3_inode_cache:
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
dentry_cache 434515 514112 136 29 1 : tunables 120 60 0 : slabdata 17728 17728 0
ext3_inode_cache 587635 589992 464 8 1 : tunables 54 27 0 : slabdata 73748 73749 0
Leading to an impending shortage in low memory:
LowFree: 9268 kB
... and if I don't take corrective measures, very shortly thereafter
the system will become unresponsive and will end up thrashing itself
to death, with symptoms that are identical to a case of 2.4 lowmem
exhaustion --- except this is on a 2.6.13 kernel, where all of these
problems were supposed to be solved.
It turns out I can head off the system lockup by requesting the
formation of hugepages, which will immediately cause a dramatic
reduction of memory usage in both high and low memory as various
caches are flushed:
echo 100 > /proc/sys/vm/nr_hugepages
echo 0 > /proc/sys/vm/nr_hugepages
The question is why isn't the kernel able to figure out how to
release dentry cache entries automatically when it starts thrashing due
to a lack of low memory? Clearly it can, since requesting hugepages
does shrink the dentry cache:
dentry_cache 20097 20097 136 29 1 : tunables 120 60 0 : slabdata 693 693 0
ext3_inode_cache 17782 17784 464 8 1 : tunables 54 27 0 : slabdata 2223 2223 0
LowFree: 835916 kB
Has anyone else seen this, or have some ideas about how to fix it?
Thanks, regards,
- Ted
[-- Attachment #2: slabinfo --]
[-- Type: text/plain, Size: 13055 bytes --]
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_read_data 32 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_inode_cache 69 72 592 6 1 : tunables 54 27 0 : slabdata 12 12 0
nfs_page 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 20 192 20 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 8 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
uhci_urb_priv 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
fib6_nodes 7 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
ip6_dst_cache 7 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
ndisc_cache 1 20 192 20 1 : tunables 120 60 0 : slabdata 1 1 0
RAWv6 3 6 640 6 1 : tunables 54 27 0 : slabdata 1 1 0
UDPv6 2 7 576 7 1 : tunables 54 27 0 : slabdata 1 1 0
request_sock_TCPv6 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
TCPv6 12 14 1088 7 2 : tunables 24 12 0 : slabdata 2 2 0
ip_fib_alias 11 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
ip_fib_hash 11 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
UNIX 343 350 384 10 1 : tunables 54 27 0 : slabdata 35 35 0
tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_bind_bucket 29 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
ip_dst_cache 29 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
arp_cache 4 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
RAW 2 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
UDP 28 28 512 7 1 : tunables 54 27 0 : slabdata 4 4 0
request_sock_TCP 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
TCP 144 148 960 4 1 : tunables 54 27 0 : slabdata 37 37 0
flow_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
cfq_ioc_pool 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
cfq_pool 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
crq_pool 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 65 130 60 65 1 : tunables 120 60 0 : slabdata 2 2 0
mqueue_inode_cache 1 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
hugetlbfs_inode_cache 1 12 316 12 1 : tunables 54 27 0 : slabdata 1 1 0
ext2_inode_cache 0 0 444 9 1 : tunables 54 27 0 : slabdata 0 0 0
ext2_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
journal_handle 8 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 2985 3000 52 75 1 : tunables 120 60 0 : slabdata 40 40 0
revoke_table 6 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 587635 589992 464 8 1 : tunables 54 27 0 : slabdata 73748 73749 0
ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 5 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_event_cache 0 0 28 135 1 : tunables 120 60 0 : slabdata 0 0 0
inotify_watch_cache 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
fasync_cache 3 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 963 963 408 9 1 : tunables 54 27 0 : slabdata 107 107 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 10 61 64 61 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_ioc 95 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 25 30 380 10 1 : tunables 54 27 0 : slabdata 3 3 0
blkdev_requests 78 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 258 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 340 904 16 226 1 : tunables 120 60 0 : slabdata 4 4 0
bio 374 465 128 31 1 : tunables 120 60 0 : slabdata 14 15 0
file_lock_cache 45 45 88 45 1 : tunables 120 60 0 : slabdata 1 1 0
sock_inode_cache 570 570 384 10 1 : tunables 54 27 0 : slabdata 57 57 0
skbuff_head_cache 880 1160 192 20 1 : tunables 120 60 0 : slabdata 58 58 0
proc_inode_cache 672 672 332 12 1 : tunables 54 27 0 : slabdata 56 56 0
sigqueue 75 108 148 27 1 : tunables 120 60 0 : slabdata 4 4 0
radix_tree_node 27827 29162 276 14 1 : tunables 54 27 0 : slabdata 2083 2083 0
bdev_cache 7 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
sysfs_dir_cache 3540 3552 40 96 1 : tunables 120 60 0 : slabdata 37 37 0
mnt_cache 28 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 1251 1404 316 12 1 : tunables 54 27 0 : slabdata 117 117 0
dentry_cache 434515 514112 136 29 1 : tunables 120 60 0 : slabdata 17728 17728 0
filp 4500 4660 192 20 1 : tunables 120 60 0 : slabdata 233 233 0
names_cache 7 7 4096 1 1 : tunables 24 12 0 : slabdata 7 7 0
key_jar 20 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
idr_layer_cache 91 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 153510 162891 48 81 1 : tunables 120 60 0 : slabdata 2011 2011 0
mm_struct 119 119 576 7 1 : tunables 54 27 0 : slabdata 17 17 0
vm_area_struct 8115 8640 88 45 1 : tunables 120 60 0 : slabdata 192 192 0
fs_cache 113 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 114 117 448 9 1 : tunables 54 27 0 : slabdata 13 13 0
signal_cache 135 140 384 10 1 : tunables 54 27 0 : slabdata 14 14 0
sighand_cache 132 135 1344 3 1 : tunables 24 12 0 : slabdata 45 45 0
task_struct 150 153 1328 3 1 : tunables 24 12 0 : slabdata 51 51 0
anon_vma 3535 3663 8 407 1 : tunables 120 60 0 : slabdata 9 9 0
pgd 115 115 4096 1 1 : tunables 24 12 0 : slabdata 115 115 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 18 18 32768 1 8 : tunables 8 4 0 : slabdata 18 18 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 1 1 16384 1 4 : tunables 8 4 0 : slabdata 1 1 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 158 158 8192 1 2 : tunables 8 4 0 : slabdata 158 158 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 385 387 4096 1 1 : tunables 24 12 0 : slabdata 385 387 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 75 76 2048 2 1 : tunables 24 12 0 : slabdata 38 38 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 212 212 1024 4 1 : tunables 54 27 0 : slabdata 53 53 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 375 456 512 8 1 : tunables 54 27 0 : slabdata 57 57 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 645 750 256 15 1 : tunables 120 60 0 : slabdata 50 50 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0
size-192 100 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 4259 4557 128 31 1 : tunables 120 60 0 : slabdata 147 147 0
size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 150913 150914 64 61 1 : tunables 120 60 0 : slabdata 2474 2474 0
size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
size-32 3273 3332 32 119 1 : tunables 120 60 0 : slabdata 28 28 0
kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0
[-- Attachment #3: meminfo --]
[-- Type: text/plain, Size: 670 bytes --]
MemTotal: 2074880 kB
MemFree: 15220 kB
Buffers: 339900 kB
Cached: 798368 kB
SwapCached: 18252 kB
Active: 1025436 kB
Inactive: 603900 kB
HighTotal: 1178944 kB
HighFree: 5952 kB
LowTotal: 895936 kB
LowFree: 9268 kB
SwapTotal: 2124352 kB
SwapFree: 2060040 kB
Dirty: 9356 kB
Writeback: 0 kB
Mapped: 691788 kB
Slab: 405400 kB
CommitLimit: 3161792 kB
Committed_AS: 1206060 kB
PageTables: 5276 kB
VmallocTotal: 114680 kB
VmallocUsed: 24256 kB
VmallocChunk: 89588 kB
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 4096 kB
[-- Attachment #4: config.gz --]
[-- Type: application/octet-stream, Size: 11644 bytes --]
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-11 10:57 VM balancing issues on 2.6.13: dentry cache not getting shrunk enough Theodore Ts'o
@ 2005-09-11 12:00 ` Dipankar Sarma
2005-09-12 3:16 ` Theodore Ts'o
0 siblings, 1 reply; 32+ messages in thread
From: Dipankar Sarma @ 2005-09-11 12:00 UTC (permalink / raw)
To: Theodore Ts'o, linux-mm, linux-kernel; +Cc: Bharata B. Rao
Hi Ted,
On Sun, Sep 11, 2005 at 06:57:09AM -0400, Theodore Ts'o wrote:
>
> I have a T40 laptop (Pentium M processor) with 2 gigs of memory, and
> from time to time, after the system has been up for a while, the
> dentry cache grows huge, as does the ext3_inode_cache:
>
> slabinfo - version: 2.1
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> dentry_cache 434515 514112 136 29 1 : tunables 120 60 0 : slabdata 17728 17728 0
> ext3_inode_cache 587635 589992 464 8 1 : tunables 54 27 0 : slabdata 73748 73749 0
>
> Leading to an impending shortage in low memory:
>
> LowFree: 9268 kB
Do you have the /proc/sys/fs/dentry-state output when such a lowmem
shortage happens?
>
> It turns out I can head off the system lockup by requesting the
> formation of hugepages, which will immediately cause a dramatic
> reduction of memory usage in both high- and low- memory as various
> caches are flushed:
>
> echo 100 > /proc/sys/vm/nr_hugepages
> echo 0 > /proc/sys/vm/nr_hugepages
>
> The question is why isn't the kernel able to figure out how to
> release dentry cache entries automatically when it starts thrashing due
> to a lack of low memory? Clearly it can, since requesting hugepages
> does shrink the dentry cache:
This is a problem that Bharata has been investigating at the moment.
But he hasn't seen anything that can't be cured by a small amount of
memory pressure - IOW, dentries do get freed under memory pressure.
So your case might be very useful. Bharata is maintaining an
instrumentation patch to collect more information and an alternative
dentry aging patch (using an rbtree). Perhaps you could try with those.
Thanks
Dipankar
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-11 12:00 ` Dipankar Sarma
@ 2005-09-12 3:16 ` Theodore Ts'o
2005-09-12 6:16 ` Martin J. Bligh
2005-09-13 8:47 ` Bharata B Rao
0 siblings, 2 replies; 32+ messages in thread
From: Theodore Ts'o @ 2005-09-12 3:16 UTC (permalink / raw)
To: Dipankar Sarma; +Cc: linux-mm, linux-kernel, Bharata B. Rao
On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> Do you have the /proc/sys/fs/dentry-state output when such a lowmem
> shortage happens?
Not yet, but the situation occurs on my laptop about 2 or 3 times
(when I'm not travelling and so it doesn't get rebooted). So
reproducing it isn't utterly trivial, but it does happen often
enough that it should be possible to get the necessary data.
> This is a problem that Bharata has been investigating at the moment.
> But he hasn't seen anything that can't be cured by a small amount of
> memory pressure - IOW, dentries do get freed under memory pressure.
> So your case might be very useful. Bharata is maintaining an
> instrumentation patch to collect more information and an alternative
> dentry aging patch (using an rbtree). Perhaps you could try with those.
Send it to me, and I'd be happy to try either the instrumentation
patch or the dentry aging patch.
Thanks, regards,
- Ted
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-12 3:16 ` Theodore Ts'o
@ 2005-09-12 6:16 ` Martin J. Bligh
2005-09-12 12:53 ` Bharata B Rao
2005-09-13 8:47 ` Bharata B Rao
1 sibling, 1 reply; 32+ messages in thread
From: Martin J. Bligh @ 2005-09-12 6:16 UTC (permalink / raw)
To: Theodore Ts'o, Dipankar Sarma; +Cc: linux-mm, linux-kernel, Bharata B. Rao
--Theodore Ts'o <tytso@mit.edu> wrote (on Sunday, September 11, 2005 23:16:36 -0400):
> On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
>> Do you have the /proc/sys/fs/dentry-state output when such a lowmem
>> shortage happens?
>
> Not yet, but the situation occurs on my laptop about 2 or 3 times
> (when I'm not travelling and so it doesn't get rebooted). So
> reproducing it isn't utterly trivial, but it does happen often
> enough that it should be possible to get the necessary data.
>
>> This is a problem that Bharata has been investigating at the moment.
>> But he hasn't seen anything that can't be cured by a small amount of
>> memory pressure - IOW, dentries do get freed under memory pressure.
>> So your case might be very useful. Bharata is maintaining an
>> instrumentation patch to collect more information and an alternative
>> dentry aging patch (using an rbtree). Perhaps you could try with those.
>
> Send it to me, and I'd be happy to try either the instrumentation
> patch or the dentry aging patch.
The other thing that might be helpful is to shove a printk into
prune_dcache so we can see when it's getting called, and how successful
it is, if the more sophisticated stuff doesn't help ;-)
M.
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-12 6:16 ` Martin J. Bligh
@ 2005-09-12 12:53 ` Bharata B Rao
0 siblings, 0 replies; 32+ messages in thread
From: Bharata B Rao @ 2005-09-12 12:53 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Sun, Sep 11, 2005 at 11:16:30PM -0700, Martin J. Bligh wrote:
>
>
> --Theodore Ts'o <tytso@mit.edu> wrote (on Sunday, September 11, 2005 23:16:36 -0400):
>
> > On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> >> Do you have the /proc/sys/fs/dentry-state output when such a lowmem
> >> shortage happens?
> >
> > Not yet, but the situation occurs on my laptop about 2 or 3 times
> > (when I'm not travelling and so it doesn't get rebooted). So
> > reproducing it isn't utterly trivial, but it does happen often
> > enough that it should be possible to get the necessary data.
> >
> >> This is a problem that Bharata has been investigating at the moment.
> >> But he hasn't seen anything that can't be cured by a small amount of
> >> memory pressure - IOW, dentries do get freed under memory pressure.
> >> So your case might be very useful. Bharata is maintaining an
> >> instrumentation patch to collect more information and an alternative
> >> dentry aging patch (using an rbtree). Perhaps you could try with those.
> >
> > Send it to me, and I'd be happy to try either the instrumentation
> > patch or the dentry aging patch.
>
> The other thing that might be helpful is to shove a printk into
> prune_dcache so we can see when it's getting called, and how successful
> it is, if the more sophisticated stuff doesn't help ;-)
>
I have incorporated this into the dcache stats patch I have. I will
post it tomorrow after adding some more instrumentation data
(the number of in-use and free dentries in the LRU list) and after
a bit of cleanup and testing.
Regards,
Bharata.
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-12 3:16 ` Theodore Ts'o
2005-09-12 6:16 ` Martin J. Bligh
@ 2005-09-13 8:47 ` Bharata B Rao
2005-09-13 21:59 ` David Chinner
` (2 more replies)
1 sibling, 3 replies; 32+ messages in thread
From: Bharata B Rao @ 2005-09-13 8:47 UTC (permalink / raw)
To: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]
On Sun, Sep 11, 2005 at 11:16:36PM -0400, Theodore Ts'o wrote:
> On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> > Do you have the /proc/sys/fs/dentry-state output when such a lowmem
> > shortage happens?
>
> Not yet, but the situation occurs on my laptop about 2 or 3 times
> (when I'm not travelling and so it doesn't get rebooted). So
> reproducing it isn't utterly trivial, but it does happen often
> enough that it should be possible to get the necessary data.
>
> > This is a problem that Bharata has been investigating at the moment.
> > But he hasn't seen anything that can't be cured by a small amount of
> > memory pressure - IOW, dentries do get freed under memory pressure.
> > So your case might be very useful. Bharata is maintaining an
> > instrumentation patch to collect more information and an alternative
> > dentry aging patch (using an rbtree). Perhaps you could try with those.
>
> Send it to me, and I'd be happy to try either the instrumentation
> patch or the dentry aging patch.
>
Ted,
I am sending two patches here.
The first is the dentry_stats patch, which collects some dcache
statistics and exports them through /proc/meminfo. It shows how
dentries are distributed across dcache slab pages, how many free and
in-use dentries are present in the dentry_unused LRU list, and how
prune_dcache() performs with respect to freeing the requested
number of dentries.
The second is Sonny Rao's rbtree dentry reclaim patch, which attempts
to address this dcache fragmentation problem. Both patches apply
cleanly to 2.6.13-rc7 and 2.6.13.
Could you please apply the dcache_stats patch and check whether the
problem can be reproduced? When it happens, please capture
/proc/meminfo, /proc/sys/fs/dentry-state and /proc/slabinfo.
It would be nice if you could also try the rbtree patch to check
whether it improves the situation; the rbtree patch applies on top
of the stats patch.
Regards,
Bharata.
[-- Attachment #2: dcache_stats.patch --]
[-- Type: text/plain, Size: 9875 bytes --]
This patch collects some statistics about the dcache and exports
them as part of /proc/meminfo.
The following data is collected:
1. A count of pages with 1,2,3,... dentries.
2. The number of dentries requested for freeing and the actual number
of dentries freed during the last invocation of prune_dcache.
3. Information about the dcache LRU list: the number of in-use, free,
referenced and total dentries.
Original Author: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
---
arch/i386/mm/init.c | 8 +++++++
fs/dcache.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++
fs/proc/proc_misc.c | 27 +++++++++++++++++++++++
include/linux/dcache.h | 11 +++++++++
include/linux/mm.h | 3 ++
mm/bootmem.c | 4 +++
6 files changed, 109 insertions(+)
diff -puN include/linux/mm.h~dcache_stats include/linux/mm.h
--- linux-2.6.13-rc7/include/linux/mm.h~dcache_stats 2005-09-12 10:57:52.000000000 +0530
+++ linux-2.6.13-rc7-bharata/include/linux/mm.h 2005-09-13 11:21:52.601920944 +0530
@@ -225,6 +225,9 @@ struct page {
* to show when page is mapped
* & limit reverse map searches.
*/
+ int nr_dentry; /* Number of dentries in this page */
+ spinlock_t nr_dentry_lock;
+
unsigned long private; /* Mapping-private opaque data:
* usually used for buffer_heads
* if PagePrivate set; used for
diff -puN arch/i386/mm/init.c~dcache_stats arch/i386/mm/init.c
--- linux-2.6.13-rc7/arch/i386/mm/init.c~dcache_stats 2005-09-12 10:57:52.000000000 +0530
+++ linux-2.6.13-rc7-bharata/arch/i386/mm/init.c 2005-09-13 11:22:29.357333272 +0530
@@ -272,6 +272,7 @@ void __init one_highpage_init(struct pag
set_page_count(page, 1);
__free_page(page);
totalhigh_pages++;
+ spin_lock_init(&page->nr_dentry_lock);
} else
SetPageReserved(page);
}
@@ -669,6 +670,7 @@ static int noinline do_test_wp_bit(void)
void free_initmem(void)
{
unsigned long addr;
+ struct page *page;
addr = (unsigned long)(&__init_begin);
for (; addr < (unsigned long)(&__init_end); addr += PAGE_SIZE) {
@@ -676,6 +678,8 @@ void free_initmem(void)
set_page_count(virt_to_page(addr), 1);
memset((void *)addr, 0xcc, PAGE_SIZE);
free_page(addr);
+ page = virt_to_page(addr);
+ spin_lock_init(&page->nr_dentry_lock);
totalram_pages++;
}
printk (KERN_INFO "Freeing unused kernel memory: %dk freed\n", (__init_end - __init_begin) >> 10);
@@ -684,12 +688,16 @@ void free_initmem(void)
#ifdef CONFIG_BLK_DEV_INITRD
void free_initrd_mem(unsigned long start, unsigned long end)
{
+ struct page *page;
+
if (start < end)
printk (KERN_INFO "Freeing initrd memory: %ldk freed\n", (end - start) >> 10);
for (; start < end; start += PAGE_SIZE) {
ClearPageReserved(virt_to_page(start));
set_page_count(virt_to_page(start), 1);
free_page(start);
+ page = virt_to_page(start);
+ spin_lock_init(&page->nr_dentry_lock);
totalram_pages++;
}
}
diff -puN fs/dcache.c~dcache_stats fs/dcache.c
--- linux-2.6.13-rc7/fs/dcache.c~dcache_stats 2005-09-12 10:57:52.000000000 +0530
+++ linux-2.6.13-rc7-bharata/fs/dcache.c 2005-09-13 12:27:07.079829848 +0530
@@ -33,6 +33,7 @@
#include <linux/seqlock.h>
#include <linux/swap.h>
#include <linux/bootmem.h>
+#include <linux/pagemap.h>
/* #define DCACHE_DEBUG 1 */
@@ -69,12 +70,48 @@ struct dentry_stat_t dentry_stat = {
.age_limit = 45,
};
+atomic_t nr_dentry[30]; /* I have seen a max of 27 dentries in a page */
+struct lru_dentry_stat lru_dentry_stat;
+DEFINE_SPINLOCK(prune_dcache_lock);
+
+void get_dstat_info(void)
+{
+ struct dentry *dentry;
+
+ lru_dentry_stat.nr_total = lru_dentry_stat.nr_inuse = 0;
+ lru_dentry_stat.nr_ref = lru_dentry_stat.nr_free = 0;
+
+ spin_lock(&dcache_lock);
+ list_for_each_entry(dentry, &dentry_unused, d_lru) {
+ if (atomic_read(&dentry->d_count))
+ lru_dentry_stat.nr_inuse++;
+ if (dentry->d_flags & DCACHE_REFERENCED)
+ lru_dentry_stat.nr_ref++;
+ }
+ lru_dentry_stat.nr_total = dentry_stat.nr_unused;
+ lru_dentry_stat.nr_free = lru_dentry_stat.nr_total -
+ lru_dentry_stat.nr_inuse;
+ spin_unlock(&dcache_lock);
+}
+
static void d_callback(struct rcu_head *head)
{
struct dentry * dentry = container_of(head, struct dentry, d_rcu);
+ unsigned long flags;
+ struct page *page;
if (dname_external(dentry))
kfree(dentry->d_name.name);
+
+ page = virt_to_page(dentry);
+ spin_lock_irqsave(&page->nr_dentry_lock, flags);
+ atomic_dec(&nr_dentry[page->nr_dentry]);
+ if (--page->nr_dentry != 0)
+ atomic_inc(&nr_dentry[page->nr_dentry]);
+ BUG_ON(atomic_read(&nr_dentry[page->nr_dentry]) < 0);
+ BUG_ON(page->nr_dentry > 29);
+ spin_unlock_irqrestore(&page->nr_dentry_lock, flags);
+
kmem_cache_free(dentry_cache, dentry);
}
@@ -393,6 +430,9 @@ static inline void prune_one_dentry(stru
static void prune_dcache(int count)
{
+ int nr_requested = count;
+ int nr_freed = 0;
+
spin_lock(&dcache_lock);
for (; count ; count--) {
struct dentry *dentry;
@@ -427,8 +467,13 @@ static void prune_dcache(int count)
continue;
}
prune_one_dentry(dentry);
+ nr_freed++;
}
spin_unlock(&dcache_lock);
+ spin_lock(&prune_dcache_lock);
+ lru_dentry_stat.dprune_req = nr_requested;
+ lru_dentry_stat.dprune_freed = nr_freed;
+ spin_unlock(&prune_dcache_lock);
}
/*
@@ -720,6 +765,8 @@ struct dentry *d_alloc(struct dentry * p
{
struct dentry *dentry;
char *dname;
+ unsigned long flags;
+ struct page *page;
dentry = kmem_cache_alloc(dentry_cache, GFP_KERNEL);
if (!dentry)
@@ -769,6 +816,15 @@ struct dentry *d_alloc(struct dentry * p
dentry_stat.nr_dentry++;
spin_unlock(&dcache_lock);
+ page = virt_to_page(dentry);
+ spin_lock_irqsave(&page->nr_dentry_lock, flags);
+ if (page->nr_dentry != 0)
+ atomic_dec(&nr_dentry[page->nr_dentry]);
+ atomic_inc(&nr_dentry[++page->nr_dentry]);
+ BUG_ON(atomic_read(&nr_dentry[page->nr_dentry]) < 0);
+ BUG_ON(page->nr_dentry > 29);
+ spin_unlock_irqrestore(&page->nr_dentry_lock, flags);
+
return dentry;
}
diff -puN fs/proc/proc_misc.c~dcache_stats fs/proc/proc_misc.c
--- linux-2.6.13-rc7/fs/proc/proc_misc.c~dcache_stats 2005-09-12 10:57:52.000000000 +0530
+++ linux-2.6.13-rc7-bharata/fs/proc/proc_misc.c 2005-09-13 11:49:43.460911768 +0530
@@ -45,6 +45,7 @@
#include <linux/sysrq.h>
#include <linux/vmalloc.h>
#include <linux/crash_dump.h>
+#include <linux/dcache.h>
#include <asm/uaccess.h>
#include <asm/pgtable.h>
#include <asm/io.h>
@@ -115,6 +116,9 @@ static int uptime_read_proc(char *page,
return proc_calc_metrics(page, start, off, count, eof, len);
}
+extern atomic_t nr_dentry[];
+extern spinlock_t prune_dcache_lock;
+
static int meminfo_read_proc(char *page, char **start, off_t off,
int count, int *eof, void *data)
{
@@ -128,6 +132,7 @@ static int meminfo_read_proc(char *page,
unsigned long allowed;
struct vmalloc_info vmi;
long cached;
+ int j, total_dcache_pages = 0;
get_page_state(&ps);
get_zone_counts(&active, &inactive, &free);
@@ -200,6 +205,28 @@ static int meminfo_read_proc(char *page,
vmi.largest_chunk >> 10
);
+ for (j =0; j < 30; j++) {
+ len += sprintf(page + len, "pages_with_[%2d]_dentries: %d\n",
+ j, atomic_read(&nr_dentry[j]));
+ total_dcache_pages += atomic_read(&nr_dentry[j]);
+ }
+ len += sprintf(page + len, "dcache_pages total: %d\n",
+ total_dcache_pages);
+
+ spin_lock(&prune_dcache_lock);
+ len += sprintf(page + len, "prune_dcache: requested %d freed %d\n",
+ lru_dentry_stat.dprune_req, lru_dentry_stat.dprune_freed);
+ spin_unlock(&prune_dcache_lock);
+
+ get_dstat_info();
+ len += sprintf(page + len, "dcache lru list data:\n"
+ "dentries total: %d\n"
+ "dentries in_use: %d\n"
+ "dentries free: %d\n"
+ "dentries referenced: %d\n",
+ lru_dentry_stat.nr_total, lru_dentry_stat.nr_inuse,
+ lru_dentry_stat.nr_free, lru_dentry_stat.nr_ref);
+
len += hugetlb_report_meminfo(page + len);
return proc_calc_metrics(page, start, off, count, eof, len);
diff -puN mm/bootmem.c~dcache_stats mm/bootmem.c
--- linux-2.6.13-rc7/mm/bootmem.c~dcache_stats 2005-09-12 10:57:52.000000000 +0530
+++ linux-2.6.13-rc7-bharata/mm/bootmem.c 2005-09-13 11:26:31.358543496 +0530
@@ -291,12 +291,14 @@ static unsigned long __init free_all_boo
page = pfn_to_page(pfn);
count += BITS_PER_LONG;
__ClearPageReserved(page);
+ spin_lock_init(&page->nr_dentry_lock);
order = ffs(BITS_PER_LONG) - 1;
set_page_refs(page, order);
for (j = 1; j < BITS_PER_LONG; j++) {
if (j + 16 < BITS_PER_LONG)
prefetchw(page + j + 16);
__ClearPageReserved(page + j);
+ spin_lock_init(&((page + j)->nr_dentry_lock));
}
__free_pages(page, order);
i += BITS_PER_LONG;
@@ -311,6 +313,7 @@ static unsigned long __init free_all_boo
__ClearPageReserved(page);
set_page_refs(page, 0);
__free_page(page);
+ spin_lock_init(&page->nr_dentry_lock);
}
}
} else {
@@ -331,6 +334,7 @@ static unsigned long __init free_all_boo
__ClearPageReserved(page);
set_page_count(page, 1);
__free_page(page);
+ spin_lock_init(&page->nr_dentry_lock);
}
total += count;
bdata->node_bootmem_map = NULL;
diff -puN include/linux/dcache.h~dcache_stats include/linux/dcache.h
--- linux-2.6.13-rc7/include/linux/dcache.h~dcache_stats 2005-09-12 17:30:01.000000000 +0530
+++ linux-2.6.13-rc7-bharata/include/linux/dcache.h 2005-09-13 12:27:07.080829696 +0530
@@ -46,6 +46,17 @@ struct dentry_stat_t {
};
extern struct dentry_stat_t dentry_stat;
+struct lru_dentry_stat {
+ int nr_total;
+ int nr_inuse;
+ int nr_ref;
+ int nr_free;
+ int dprune_req;
+ int dprune_freed;
+};
+extern struct lru_dentry_stat lru_dentry_stat;
+extern void get_dstat_info(void);
+
/* Name hashing routines. Initial hash value */
/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
#define init_name_hash() 0
_
[-- Attachment #3: rbtree_dcache_reclaim.patch --]
[-- Type: text/plain, Size: 6971 bytes --]
This patch maintains the dentries in a red-black tree. The RB tree is
scanned in order and dentries are put at the end of the LRU list
to increase the chances of freeing all the dentries on a given page.
Original Author: Santhosh Rao <raosanth@us.ibm.com>
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
---
fs/dcache.c | 143 +++++++++++++++++++++++++++++++++++++++++++++++--
include/linux/dcache.h | 2
2 files changed, 141 insertions(+), 4 deletions(-)
diff -puN fs/dcache.c~rbtree_dcache_reclaim fs/dcache.c
--- linux-2.6.13-rc7/fs/dcache.c~rbtree_dcache_reclaim 2005-09-13 12:11:11.279133640 +0530
+++ linux-2.6.13-rc7-bharata/fs/dcache.c 2005-09-13 12:15:02.732947312 +0530
@@ -34,6 +34,7 @@
#include <linux/swap.h>
#include <linux/bootmem.h>
#include <linux/pagemap.h>
+#include <linux/rbtree.h>
/* #define DCACHE_DEBUG 1 */
@@ -70,6 +71,50 @@ struct dentry_stat_t dentry_stat = {
.age_limit = 45,
};
+static struct rb_root dentry_tree = RB_ROOT;
+
+#define RB_NONE (2)
+#define ON_RB(node) ((node)->rb_color != RB_NONE)
+#define RB_CLEAR(node) ((node)->rb_color = RB_NONE )
+
+
+/* take a dentry safely off the rbtree */
+static void drb_delete(struct dentry* dentry)
+{
+ if (ON_RB(&dentry->d_rb)) {
+ rb_erase(&dentry->d_rb, &dentry_tree);
+ RB_CLEAR(&dentry->d_rb);
+ } else {
+ /* All allocated dentry objs should be in the tree */
+ BUG_ON(1);
+ }
+}
+
+static struct dentry * drb_insert(struct dentry * dentry)
+{
+ struct rb_node ** p = &dentry_tree.rb_node;
+ struct rb_node * parent = NULL;
+ struct rb_node * node = &dentry->d_rb;
+ struct dentry * cur = NULL;
+
+ while (*p) {
+ parent = *p;
+ cur = rb_entry(parent, struct dentry, d_rb);
+
+ if (dentry < cur)
+ p = &(*p)->rb_left;
+ else if (dentry > cur)
+ p = &(*p)->rb_right;
+ else {
+ return cur;
+ }
+ }
+
+ rb_link_node(node, parent, p);
+ rb_insert_color(node,&dentry_tree);
+ return NULL;
+}
+
atomic_t nr_dentry[30]; /* I have seen a max of 27 dentries in a page */
struct lru_dentry_stat lru_dentry_stat;
DEFINE_SPINLOCK(prune_dcache_lock);
@@ -232,6 +277,7 @@ kill_it: {
list_del(&dentry->d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
/*drops the locks, at that point nobody can reach this dentry */
+ drb_delete(dentry);
dentry_iput(dentry);
parent = dentry->d_parent;
d_free(dentry);
@@ -407,6 +453,7 @@ static inline void prune_one_dentry(stru
__d_drop(dentry);
list_del(&dentry->d_child);
dentry_stat.nr_dentry--; /* For d_free, below */
+ drb_delete(dentry);
dentry_iput(dentry);
parent = dentry->d_parent;
d_free(dentry);
@@ -416,7 +463,7 @@ static inline void prune_one_dentry(stru
}
/**
- * prune_dcache - shrink the dcache
+ * prune_lru - shrink the lru list
* @count: number of entries to try and free
*
* Shrink the dcache. This is done when we need
@@ -428,7 +475,7 @@ static inline void prune_one_dentry(stru
* all the dentries are in use.
*/
-static void prune_dcache(int count)
+static void prune_lru(int count)
{
int nr_requested = count;
int nr_freed = 0;
@@ -476,6 +523,93 @@ static void prune_dcache(int count)
spin_unlock(&prune_dcache_lock);
}
+/**
+ * prune_dcache - try to "intelligently" shrink the dcache
+ * @requested: number of dentries to try to free
+ *
+ * The basic strategy here is to scan through our tree of dentries
+ * in-order and put them at the end of the LRU free list.
+ * Why in-order? Because we want a chance of actually freeing
+ * all 15-27 (depending on arch) dentries on a given page, instead
+ * of pruning in random LRU order, which tends to lower dcache
+ * utilization and not free many pages.
+ */
+static void prune_dcache(unsigned requested)
+{
+ /* ------ debug --------- */
+ //static int mod = 0;
+ //int flag = 0, removed = 0;
+ /* ------ debug --------- */
+
+ unsigned found = 0;
+ unsigned count;
+ struct rb_node * next;
+ struct dentry *dentry;
+#define NUM_LRU_PTRS 8
+ struct rb_node *lru_ptrs[NUM_LRU_PTRS];
+ struct list_head *cur;
+ int i;
+
+ spin_lock(&dcache_lock);
+
+ cur = dentry_unused.prev;
+
+ /* grab NUM_LRU_PTRS entries off the end of the lru list */
+ /* we'll use these as pseudo-random starting points in the tree */
+ for (i = 0 ; i < NUM_LRU_PTRS ; i++ ){
+ if ( cur == &dentry_unused ) {
+ /* if there aren't NUM_LRU_PTRS entries, we probably
+ can't even free a page now, give up */
+ spin_unlock(&dcache_lock);
+ return;
+ }
+ lru_ptrs[i] = &(list_entry(cur,struct dentry, d_lru)->d_rb);
+ cur = cur->prev;
+ }
+
+ i = 0;
+
+ do {
+ count = 4 * PAGE_SIZE / sizeof(struct dentry); /* arbitrary heuristic */
+ next = lru_ptrs[i];
+ for (; count ; count--) {
+ if( ! next ) {
+ //flag = 1; /* ------ debug --------- */
+ break;
+ }
+ dentry = list_entry(next, struct dentry, d_rb);
+ next = rb_next(next);
+ prefetch(next);
+ if( ! list_empty( &dentry->d_lru) ) {
+ list_del_init(&dentry->d_lru);
+ dentry_stat.nr_unused--;
+ }
+ if (atomic_read(&dentry->d_count)) {
+ //removed++; /* ------ debug --------- */
+ continue;
+ } else {
+ list_add_tail(&dentry->d_lru, &dentry_unused);
+ dentry_stat.nr_unused++;
+ found++;
+ }
+ }
+ i++;
+ } while ( (found < requested / 2) && (i < NUM_LRU_PTRS ) );
+#undef NUM_LRU_PTRS
+
+ spin_unlock(&dcache_lock);
+
+ /* ------ debug --------- */
+ //mod++;
+ //if ( ! (mod & 64) ) {
+ // mod = 0;
+ // printk("prune_dcache: i %d flag %d, found %d removed %d\n",i,flag,found,removed);
+ //}
+ /* ------ debug --------- */
+
+ prune_lru(found);
+}
+
/*
* Shrink the dcache for the specified super block.
* This allows us to unmount a device without disturbing
@@ -687,7 +821,7 @@ void shrink_dcache_parent(struct dentry
int found;
while ((found = select_parent(parent)) != 0)
- prune_dcache(found);
+ prune_lru(found);
}
/**
@@ -725,7 +859,7 @@ void shrink_dcache_anon(struct hlist_hea
}
}
spin_unlock(&dcache_lock);
- prune_dcache(found);
+ prune_lru(found);
} while(found);
}
@@ -814,6 +948,7 @@ struct dentry *d_alloc(struct dentry * p
if (parent)
list_add(&dentry->d_child, &parent->d_subdirs);
dentry_stat.nr_dentry++;
+ drb_insert(dentry);
spin_unlock(&dcache_lock);
page = virt_to_page(dentry);
diff -puN include/linux/dcache.h~rbtree_dcache_reclaim include/linux/dcache.h
--- linux-2.6.13-rc7/include/linux/dcache.h~rbtree_dcache_reclaim 2005-09-13 12:11:11.284132880 +0530
+++ linux-2.6.13-rc7-bharata/include/linux/dcache.h 2005-09-13 12:11:11.306129536 +0530
@@ -9,6 +9,7 @@
#include <linux/cache.h>
#include <linux/rcupdate.h>
#include <asm/bug.h>
+#include <linux/rbtree.h>
struct nameidata;
struct vfsmount;
@@ -104,6 +105,7 @@ struct dentry {
struct dentry *d_parent; /* parent directory */
struct qstr d_name;
+ struct rb_node d_rb;
struct list_head d_lru; /* LRU list */
struct list_head d_child; /* child of parent list */
struct list_head d_subdirs; /* our children */
_
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-13 8:47 ` Bharata B Rao
@ 2005-09-13 21:59 ` David Chinner
2005-09-14 9:01 ` Andi Kleen
2005-09-14 15:48 ` Sonny Rao
2005-09-14 21:34 ` Marcelo Tosatti
2005-09-14 23:08 ` Marcelo Tosatti
2 siblings, 2 replies; 32+ messages in thread
From: David Chinner @ 2005-09-13 21:59 UTC (permalink / raw)
To: Bharata B Rao; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
>
> Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> to improve this dcache fragmentation problem.
FYI, in the past I've tried this patch to reduce dcache fragmentation on
an Altix (16k pages, 62 dentries to a slab page) under heavy
fileserver workloads and it had no measurable effect. It appeared
that there was almost always at least one active dentry on each page
in the slab. The story may very well be different on 4k page
machines, however.
Typically, fragmentation was bad enough that reclaim removed ~90% of
the working set of dentries to free about 1% of the memory in the
dentry slab. We had to get down to freeing > 95% of the dentry cache
before fragmentation started to reduce and the system stopped trying to
reclaim the dcache which we then spent the next 10 minutes
repopulating......
We also tried separating out directory dentries into a separate slab
so that (potentially) longer lived dentries were clustered together
rather than sparsely distributed around the slab cache. Once again,
it had no measurable effect on the level of fragmentation (with or
without the rbtree patch).
FWIW, the inode cache was showing very similar levels of fragmentation
under reclaim as well.
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-13 21:59 ` David Chinner
@ 2005-09-14 9:01 ` Andi Kleen
2005-09-14 9:16 ` Manfred Spraul
` (3 more replies)
2005-09-14 15:48 ` Sonny Rao
1 sibling, 4 replies; 32+ messages in thread
From: Andi Kleen @ 2005-09-14 9:01 UTC (permalink / raw)
To: David Chinner
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm,
linux-kernel, manfred
On Tuesday 13 September 2005 23:59, David Chinner wrote:
> On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > to improve this dcache fragmentation problem.
>
> FYI, in the past I've tried this patch to reduce dcache fragmentation on
> an Altix (16k pages, 62 dentries to a slab page) under heavy
> fileserver workloads and it had no measurable effect. It appeared
> that there was almost always at least one active dentry on each page
> in the slab. The story may very well be different on 4k page
> machines, however.
I always thought dentry freeing would work much better if it
was turned upside down.
Instead of starting from the high level dcache lists it could
be driven by slab: on memory pressure slab tries to return pages with unused
cache objects. In that case it should check if there are only
a small number of pinned objects on the page set left, and if
yes use a new callback to the higher level user (=dcache) and ask them
to free the object.
The slab datastructures are not completely suited for this right now,
but it could be done by using one more of the list_heads in struct page
for slab backing pages.
It would probably not be very LRU but a simple hack of having slowly
increasing dcache generations. Each dentry use updates the generation.
First slab memory freeing pass only frees objects with older generations.
Using slowly increasing generations has the advantage of timestamps
that you can avoid dirtying cache lines in the common case when
the generation doesn't change on access (= no additional cache line bouncing)
and it would easily allow to tune the aging rate under stress by changing the
length of the generation.
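The generation scheme described above can be sketched in a few lines of plain C
(a toy model with invented names, not kernel code):

```c
/* Sketch of the generation idea: a global counter advances slowly
 * (e.g. from a timer); each object records the generation of its last
 * use, but the field is rewritten only when the generation has
 * actually changed, so repeated accesses within one generation dirty
 * no cache line. All names here are invented for illustration. */

static unsigned long cache_generation;	/* advanced slowly elsewhere */

struct aged_obj {
	unsigned long last_gen;		/* generation of last use */
};

static void obj_touch(struct aged_obj *obj)
{
	/* common case: same generation, no write, no cacheline bouncing */
	if (obj->last_gen != cache_generation)
		obj->last_gen = cache_generation;
}

/* first freeing pass only takes objects at least min_age generations old */
static int obj_is_old(const struct aged_obj *obj, unsigned long min_age)
{
	return cache_generation - obj->last_gen >= min_age;
}
```

Lengthening the generation period ages objects more slowly; shortening it under
stress makes more of the cache eligible for the first freeing pass.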
-Andi
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:01 ` Andi Kleen
@ 2005-09-14 9:16 ` Manfred Spraul
2005-09-14 9:43 ` Andrew Morton
2005-09-14 9:35 ` Andrew Morton
` (2 subsequent siblings)
3 siblings, 1 reply; 32+ messages in thread
From: Manfred Spraul @ 2005-09-14 9:16 UTC (permalink / raw)
To: Andi Kleen
Cc: David Chinner, Bharata B Rao, Theodore Ts'o, Dipankar Sarma,
linux-mm, linux-kernel
Andi Kleen wrote:
>The slab datastructures are not completely suited for this right now,
>but it could be done by using one more of the list_heads in struct page
>for slab backing pages.
>
>
>
I agree, I even started prototyping something a year ago, but ran out of
time.
One tricky point is directory dentries: as far as I see, they are
pinned and unfreeable if a (freeable) directory entry is in the cache.
--
Manfred
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:01 ` Andi Kleen
2005-09-14 9:16 ` Manfred Spraul
@ 2005-09-14 9:35 ` Andrew Morton
2005-09-14 13:57 ` Martin J. Bligh
2005-09-14 22:48 ` David Chinner
3 siblings, 0 replies; 32+ messages in thread
From: Andrew Morton @ 2005-09-14 9:35 UTC (permalink / raw)
To: Andi Kleen; +Cc: dgc, bharata, tytso, dipankar, linux-mm, linux-kernel, manfred
Andi Kleen <ak@suse.de> wrote:
>
> On Tuesday 13 September 2005 23:59, David Chinner wrote:
> > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > > to improve this dcache fragmentation problem.
> >
> > FYI, in the past I've tried this patch to reduce dcache fragmentation on
> > an Altix (16k pages, 62 dentries to a slab page) under heavy
> > fileserver workloads and it had no measurable effect. It appeared
> > that there was almost always at least one active dentry on each page
> > in the slab. The story may very well be different on 4k page
> > machines, however.
>
> I always thought dentry freeing would work much better if it
> was turned upside down.
>
> Instead of starting from the high level dcache lists it could
> be driven by slab: on memory pressure slab tries to return pages with unused
> cache objects. In that case it should check if there are only
> a small number of pinned objects on the page set left, and if
> yes use a new callback to the higher level user (=dcache) and ask them
> to free the object.
Considered doing that with buffer_heads a few years ago. It's impossible
unless you have a global lock, which bh's don't have. dentries _do_ have a
global lock, and we'd be tied to having it for ever more.
The shrinking code would have to be able to deal with a dentry which is going
through destruction by other call paths, so dcache_lock coverage would have
to be extended considerably - it would have to cover the kmem_cache_free(),
for example. Or we put some i_am_alive flag into the dentry.
> The slab datastructures are not completely suited for this right now,
> but it could be done by using one more of the list_heads in struct page
> for slab backing pages.
Yes, some help would be needed in the slab code.
There's only one list_head in struct page and slab is already using it.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:16 ` Manfred Spraul
@ 2005-09-14 9:43 ` Andrew Morton
2005-09-14 9:52 ` Dipankar Sarma
2005-09-14 22:44 ` Theodore Ts'o
0 siblings, 2 replies; 32+ messages in thread
From: Andrew Morton @ 2005-09-14 9:43 UTC (permalink / raw)
To: Manfred Spraul; +Cc: ak, dgc, bharata, tytso, dipankar, linux-mm, linux-kernel
Manfred Spraul <manfred@colorfullife.com> wrote:
>
> One tricky point is directory dentries: as far as I see, they are
> pinned and unfreeable if a (freeable) directory entry is in the cache.
>
Well. That's the whole problem.
I don't think it's been demonstrated that Ted's problem was caused by
internal fragmentation, btw. Ted, could you run slabtop, see what the
dcache occupancy is? Monitor it as you start to manually apply pressure?
If the occupancy falls to 10% and not many slab pages are freed up yet then
yup, it's internal fragmentation.
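The occupancy Andrew asks about can be read straight off /proc/slabinfo. As a
minimal sketch (a hypothetical helper, not an existing tool), this parses the
first fields of a slabinfo version-2.1 data line like the dentry_cache line
quoted at the top of the thread and reports active objects as a percentage of
allocated objects:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: pull <active_objs> and <num_objs> out of a
 * /proc/slabinfo (version 2.1) data line and compute occupancy.
 * Occupancy that falls low while few slab pages get freed is the
 * signature of internal fragmentation. */

static int parse_slabinfo_line(const char *line, char name[64],
			       unsigned long *active, unsigned long *num)
{
	/* name, active_objs and num_objs are the first three fields */
	return sscanf(line, "%63s %lu %lu", name, active, num) == 3;
}

static int slab_occupancy_pct(unsigned long active, unsigned long num)
{
	return num ? (int)(active * 100 / num) : 0;
}
```

Fed the dentry_cache line from the start of this thread, this reports roughly
84% occupancy; under Andrew's test you would watch that number drop as pressure
is applied manually.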
I've found that internal fragmentation due to pinned directory dentries can
be very high if you're running silly benchmarks which create some
regular-shaped directory tree which can easily create pathological
patterns. For real-world things with irregular creation and access
patterns and irregular directory sizes the fragmentation isn't as easy to
demonstrate.
Another approach would be to do an aging round on a directory's children
when an unfreeable dentry is encountered on the LRU. Something like that.
If internal fragmentation is indeed the problem.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:43 ` Andrew Morton
@ 2005-09-14 9:52 ` Dipankar Sarma
2005-09-14 22:44 ` Theodore Ts'o
1 sibling, 0 replies; 32+ messages in thread
From: Dipankar Sarma @ 2005-09-14 9:52 UTC (permalink / raw)
To: Andrew Morton
Cc: Manfred Spraul, ak, dgc, bharata, tytso, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 02:43:13AM -0700, Andrew Morton wrote:
> Manfred Spraul <manfred@colorfullife.com> wrote:
> >
> > One tricky point is directory dentries: as far as I see, they are
> > pinned and unfreeable if a (freeable) directory entry is in the cache.
> >
> I don't think it's been demonstrated that Ted's problem was caused by
> internal fragmentation, btw. Ted, could you run slabtop, see what the
> dcache occupancy is? Monitor it as you start to manually apply pressure?
> If the occupancy falls to 10% and not many slab pages are freed up yet then
> yup, it's internal fragmentation.
>
> I've found that internal fragmentation due to pinned directory dentries can
> be very high if you're running silly benchmarks which create some
> regular-shaped directory tree which can easily create pathological
> patterns. For real-world things with irregular creation and access
> patterns and irregular directory sizes the fragmentation isn't as easy to
> demonstrate.
>
> Another approach would be to do an aging round on a directory's children
> when an unfreeable dentry is encountered on the LRU. Something like that.
> If internal fragmentation is indeed the problem.
One other point to look at is whether fragmentation is due to pinned
dentries or not. We can get that information only from dcache itself.
That is what we need to ascertain first using the instrumentation
patch. Solving the problem of large # of pinned dentries and large # of LRU
free dentries will likely require different approaches. Even the
LRU dentries are sometimes pinned due to the lazy-lru stuff that
we did for lock-free dcache. Let us get some accurate dentry
stats first from the instrumentation patch.
Thanks
Dipankar
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:01 ` Andi Kleen
2005-09-14 9:16 ` Manfred Spraul
2005-09-14 9:35 ` Andrew Morton
@ 2005-09-14 13:57 ` Martin J. Bligh
2005-09-14 15:37 ` Sonny Rao
2005-09-15 7:21 ` Helge Hafting
2005-09-14 22:48 ` David Chinner
3 siblings, 2 replies; 32+ messages in thread
From: Martin J. Bligh @ 2005-09-14 13:57 UTC (permalink / raw)
To: Andi Kleen, David Chinner
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm,
linux-kernel, manfred
>> > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
>> > to improve this dcache fragmentation problem.
>>
>> FYI, in the past I've tried this patch to reduce dcache fragmentation on
>> an Altix (16k pages, 62 dentries to a slab page) under heavy
>> fileserver workloads and it had no measurable effect. It appeared
>> that there was almost always at least one active dentry on each page
>> in the slab. The story may very well be different on 4k page
>> machines, however.
>
> I always thought dentry freeing would work much better if it
> was turned upside down.
>
> Instead of starting from the high level dcache lists it could
> be driven by slab: on memory pressure slab tries to return pages with unused
> cache objects. In that case it should check if there are only
> a small number of pinned objects on the page set left, and if
> yes use a new callback to the higher level user (=dcache) and ask them
> to free the object.
>
> The slab datastructures are not completely suited for this right now,
> but it could be done by using one more of the list_heads in struct page
> for slab backing pages.
>
> It would probably not be very LRU but a simple hack of having slowly
> increasing dcache generations. Each dentry use updates the generation.
> First slab memory freeing pass only frees objects with older generations.
If they're freeable, we should easily be able to move them, and therefore
compact a fragmented slab. That way we can preserve the LRU'ness of it.
Stage 1: free the oldest entries. Stage 2: compact the slab into whole
pages. Stage 3: free whole pages back to the page allocator.
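Stage 2 hinges on being able to relocate a live-but-unpinned object. A bare
sketch of what the move primitive might look like (all names here are invented
for illustration; real dentries would need their hash, LRU and child links
rewritten under dcache_lock by the callback):

```c
#include <string.h>

/* Invented illustration of slab compaction's move step: copy an
 * unpinned object from a sparsely used page into a slot on a fuller
 * page, then let the owning cache fix up any pointers into the old
 * location. Freeing the oldest entries (stage 1) and returning the
 * emptied pages (stage 3) would bracket this. */

struct movable_cache {
	size_t obj_size;
	/* owner rewrites inbound pointers (lists, hash chains, ...) */
	void (*relocate)(void *old_obj, void *new_obj);
};

static void *cache_move_object(const struct movable_cache *cache,
			       void *old_obj, void *dest_slot)
{
	memcpy(dest_slot, old_obj, cache->obj_size);
	if (cache->relocate)
		cache->relocate(old_obj, dest_slot);
	return dest_slot;
}
```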
> Using slowly increasing generations has the advantage of timestamps
> that you can avoid dirtying cache lines in the common case when
> the generation doesn't change on access (= no additional cache line bouncing)
> and it would easily allow to tune the aging rate under stress by changing the
> length of the generation.
LRU algorithm may need general tweaking like this anyway ... strict LRU
is expensive to keep.
M.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 13:57 ` Martin J. Bligh
@ 2005-09-14 15:37 ` Sonny Rao
2005-09-15 7:21 ` Helge Hafting
1 sibling, 0 replies; 32+ messages in thread
From: Sonny Rao @ 2005-09-14 15:37 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andi Kleen, David Chinner, Bharata B Rao, Theodore Ts'o,
Dipankar Sarma, linux-mm, linux-kernel, manfred
On Wed, Sep 14, 2005 at 06:57:56AM -0700, Martin J. Bligh wrote:
> >> > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> >> > to improve this dcache fragmentation problem.
> >>
> >> FYI, in the past I've tried this patch to reduce dcache fragmentation on
> >> an Altix (16k pages, 62 dentries to a slab page) under heavy
> >> fileserver workloads and it had no measurable effect. It appeared
> >> that there was almost always at least one active dentry on each page
> >> in the slab. The story may very well be different on 4k page
> >> machines, however.
> >
> > I always thought dentry freeing would work much better if it
> > was turned upside down.
> >
> > Instead of starting from the high level dcache lists it could
> > be driven by slab: on memory pressure slab tries to return pages with unused
> > cache objects. In that case it should check if there are only
> > a small number of pinned objects on the page set left, and if
> > yes use a new callback to the higher level user (=dcache) and ask them
> > to free the object.
> >
> > The slab datastructures are not completely suited for this right now,
> > but it could be done by using one more of the list_heads in struct page
> > for slab backing pages.
> >
> > It would probably not be very LRU but a simple hack of having slowly
> > increasing dcache generations. Each dentry use updates the generation.
> > First slab memory freeing pass only frees objects with older generations.
>
> If they're freeable, we should easily be able to move them, and therefore
> compact a fragmented slab. That way we can preserve the LRU'ness of it.
> Stage 1: free the oldest entries. Stage 2: compact the slab into whole
> pages. Stage 3: free whole pages back to teh page allocator.
>
> > Using slowly increasing generations has the advantage of timestamps
> > that you can avoid dirtying cache lines in the common case when
> > the generation doesn't change on access (= no additional cache line bouncing)
> > and it would easily allow to tune the aging rate under stress by changing the
> > length of the generation.
>
> LRU algorithm may need general tweaking like this anyway ... strict LRU
> is expensive to keep.
Based on what I remember, I'd contend it isn't really LRU today, so I
wouldn't try and stick to something that we aren't doing. :)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-13 21:59 ` David Chinner
2005-09-14 9:01 ` Andi Kleen
@ 2005-09-14 15:48 ` Sonny Rao
2005-09-14 22:02 ` David Chinner
1 sibling, 1 reply; 32+ messages in thread
From: Sonny Rao @ 2005-09-14 15:48 UTC (permalink / raw)
To: David Chinner
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 07:59:32AM +1000, David Chinner wrote:
> On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> >
> > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > to improve this dcache fragmentation problem.
>
> FYI, in the past I've tried this patch to reduce dcache fragmentation on
> an Altix (16k pages, 62 dentries to a slab page) under heavy
> fileserver workloads and it had no measurable effect. It appeared
> that there was almost always at least one active dentry on each page
> in the slab. The story may very well be different on 4k page
> machines, however.
>
> Typically, fragmentation was bad enough that reclaim removed ~90% of
> the working set of dentries to free about 1% of the memory in the
> dentry slab. We had to get down to freeing > 95% of the dentry cache
> before fragmentation started to reduce and the system stopped trying to
> reclaim the dcache which we then spent the next 10 minutes
> repopulating......
>
> We also tried separating out directory dentries into a separate slab
> so that (potentially) longer lived dentries were clustered together
> rather than sparsely distributed around the slab cache. Once again,
> it had no measurable effect on the level of fragmentation (with or
> without the rbtree patch).
I'm not surprised... With 62 dentries per page, the likelihood of
success is very small, and in fact performance could degrade since we
are holding the dcache lock more often and doing less useful work.
It has been over a year and my memory is hazy, but I think I did see
about a 10% improvement on my workload (some sort of SFS simulation
with millions of files being randomly accessed) on an x86 machine but CPU
utilization also went way up which I think was the dcache lock.
Whatever happened to the vfs_cache_pressure band-aid/sledgehammer?
Is it not considered an option?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-13 8:47 ` Bharata B Rao
2005-09-13 21:59 ` David Chinner
@ 2005-09-14 21:34 ` Marcelo Tosatti
2005-09-14 21:43 ` Dipankar Sarma
2005-09-15 4:28 ` Bharata B Rao
2005-09-14 23:08 ` Marcelo Tosatti
2 siblings, 2 replies; 32+ messages in thread
From: Marcelo Tosatti @ 2005-09-14 21:34 UTC (permalink / raw)
To: Bharata B Rao; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> On Sun, Sep 11, 2005 at 11:16:36PM -0400, Theodore Ts'o wrote:
> > On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> > > Do you have the /proc/sys/fs/dentry-state output when such lowmem
> > > shortage happens ?
> >
> > Not yet, but the situation occurs on my laptop about 2 or 3 times
> > (when I'm not travelling and so it doesn't get rebooted). So
> > reproducing it isn't utterly trivial, but it does happen often
> > enough that it should be possible to get the necessary data.
> >
> > > This is a problem that Bharata has been investigating at the moment.
> > > But he hasn't seen anything that can't be cured by a small memory
> > > pressure - IOW, dentries do get freed under memory pressure. So
> > > your case might be very useful. Bharata is maintaing an instrumentation
> > > patch to collect more information and an alternative dentry aging patch
> > > (using rbtree). Perhaps you could try with those.
> >
> > Send it to me, and I'd be happy to try either the instrumentation
> > patch or the dentry aging patch.
> >
>
> Ted,
>
> I am sending two patches here.
>
> First is dentry_stats patch which collects some dcache statistics
> and puts it into /proc/meminfo. This patch provides information
> about how dentries are distributed in dcache slab pages, how many
> free and in use dentries are present in dentry_unused lru list and
> how prune_dcache() performs with respect to freeing the requested
> number of dentries.
Hi Bharata,
+void get_dstat_info(void)
+{
+ struct dentry *dentry;
+
+ lru_dentry_stat.nr_total = lru_dentry_stat.nr_inuse = 0;
+ lru_dentry_stat.nr_ref = lru_dentry_stat.nr_free = 0;
+
+ spin_lock(&dcache_lock);
+ list_for_each_entry(dentry, &dentry_unused, d_lru) {
+ if (atomic_read(&dentry->d_count))
+ lru_dentry_stat.nr_inuse++;
Dentries on dentry_unused list with d_count positive? Is that possible
at all? As far as my limited understanding goes, only dentries with zero
count can be part of the dentry_unused list.
+ if (dentry->d_flags & DCACHE_REFERENCED)
+ lru_dentry_stat.nr_ref++;
+ }
@@ -393,6 +430,9 @@ static inline void prune_one_dentry(stru
static void prune_dcache(int count)
{
+ int nr_requested = count;
+ int nr_freed = 0;
+
spin_lock(&dcache_lock);
for (; count ; count--) {
struct dentry *dentry;
@@ -427,8 +467,13 @@ static void prune_dcache(int count)
continue;
}
prune_one_dentry(dentry);
+ nr_freed++;
}
spin_unlock(&dcache_lock);
+ spin_lock(&prune_dcache_lock);
+ lru_dentry_stat.dprune_req = nr_requested;
+ lru_dentry_stat.dprune_freed = nr_freed;
Don't you mean "+=" ?
+ spin_unlock(&prune_dcache_lock);
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 21:34 ` Marcelo Tosatti
@ 2005-09-14 21:43 ` Dipankar Sarma
2005-09-15 4:28 ` Bharata B Rao
1 sibling, 0 replies; 32+ messages in thread
From: Dipankar Sarma @ 2005-09-14 21:43 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Bharata B Rao, Theodore Ts'o, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 06:34:04PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > On Sun, Sep 11, 2005 at 11:16:36PM -0400, Theodore Ts'o wrote:
> >
> > Ted,
> >
> > I am sending two patches here.
> >
> > First is dentry_stats patch which collects some dcache statistics
> > and puts it into /proc/meminfo. This patch provides information
> > about how dentries are distributed in dcache slab pages, how many
> > free and in use dentries are present in dentry_unused lru list and
> > how prune_dcache() performs with respect to freeing the requested
> > number of dentries.
>
> Hi Bharata,
>
> +void get_dstat_info(void)
> +{
> + struct dentry *dentry;
> +
> + lru_dentry_stat.nr_total = lru_dentry_stat.nr_inuse = 0;
> + lru_dentry_stat.nr_ref = lru_dentry_stat.nr_free = 0;
> +
> + spin_lock(&dcache_lock);
> + list_for_each_entry(dentry, &dentry_unused, d_lru) {
> + if (atomic_read(&dentry->d_count))
> + lru_dentry_stat.nr_inuse++;
>
> Dentries on dentry_unused list with d_count positive? Is that possible
> at all? As far as my limited understanding goes, only dentries with zero
> count can be part of the dentry_unused list.
That changed during the lock-free dcache implementation during
2.5. If we strictly update the lru list, we will have to acquire
the dcache_lock in __d_lookup() on a successful lookup. So we
did lazy-lru, leave the dentries with non-zero refcounts
and clean them up later when we acquire dcache_lock for other
purposes.
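The lazy-lru behaviour described above can be modelled in a few lines (a toy
model with invented names, not the kernel's dcache): a lookup takes a reference
without unlinking from the unused list, so the pruner must recognize and unlink
such lazily pinned entries rather than free them.

```c
/* Toy model of lazy-lru. */

struct toy_dentry {
	int d_count;	/* references held */
	int on_lru;	/* still linked on the unused list */
};

/* lookup fast path: bump the count, do NOT touch the LRU list
 * (unlinking would require taking dcache_lock) */
static void toy_dget(struct toy_dentry *d)
{
	d->d_count++;
}

/* prune path, run with the lock held: entries that picked up a
 * reference since being queued are unlinked but not freed */
static int toy_prune(struct toy_dentry *lru, int n)
{
	int freed = 0;

	for (int i = 0; i < n; i++) {
		if (!lru[i].on_lru)
			continue;
		lru[i].on_lru = 0;
		if (lru[i].d_count)
			continue;	/* lazily pinned: clean up, skip */
		freed++;		/* really free it */
	}
	return freed;
}
```

This is also why Bharata's instrumentation counts in-use dentries on the unused
list: with lazy-lru they are expected, and they inflate the list without being
freeable.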
Thanks
Dipankar
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 15:48 ` Sonny Rao
@ 2005-09-14 22:02 ` David Chinner
2005-09-14 22:40 ` Sonny Rao
0 siblings, 1 reply; 32+ messages in thread
From: David Chinner @ 2005-09-14 22:02 UTC (permalink / raw)
To: Sonny Rao
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 11:48:52AM -0400, Sonny Rao wrote:
> On Wed, Sep 14, 2005 at 07:59:32AM +1000, David Chinner wrote:
> > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > >
> > > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > > to improve this dcache fragmentation problem.
> >
> > FYI, in the past I've tried this patch to reduce dcache fragmentation on
> > an Altix (16k pages, 62 dentries to a slab page) under heavy
> > fileserver workloads and it had no measurable effect. It appeared
> > that there was almost always at least one active dentry on each page
> > in the slab. The story may very well be different on 4k page
> > machines, however.
....
> I'm not surprised... With 62 dentries per page, the likelihood of
> success is very small, and in fact performance could degrade since we
> are holding the dcache lock more often and doing less useful work.
>
> It has been over a year and my memory is hazy, but I think I did see
> about a 10% improvement on my workload (some sort of SFS simulation
> with millions of files being randomly accessed) on an x86 machine but CPU
> utilization also went way up which I think was the dcache lock.
Hmmm - can't say that I've had the same experience. I did not notice
any decrease in fragmentation or increase in CPU usage...
FWIW, SFS is just one workload that produces fragmentation. Any
load that mixes or switches repeatedly between filesystem traversals
to producing memory pressure via the page cache tends to result in
fragmentation of the inode and dentry slabs...
> Whatever happened to the vfs_cache_pressure band-aid/sledgehammer?
> Is it not considered an option?
All that did was increase the fragmentation levels. Instead of
seeing a 4-5:1 free/used ratio in the dcache, it would push out to
10-15:1 if vfs_cache_pressure was used to prefer reclaiming dentries
over page cache pages. Going the other way and preferring reclaim of
page cache pages did nothing to change the level of fragmentation.
Reclaim still freed most of the dentries in the working set but it
took a little longer to do it.
Right now our only solution to prevent fragmentation on reclaim is
to throw more memory at the machine to prevent reclaim from
happening as the workload changes.
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 22:02 ` David Chinner
@ 2005-09-14 22:40 ` Sonny Rao
2005-09-15 1:14 ` David Chinner
0 siblings, 1 reply; 32+ messages in thread
From: Sonny Rao @ 2005-09-14 22:40 UTC (permalink / raw)
To: David Chinner
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Thu, Sep 15, 2005 at 08:02:22AM +1000, David Chinner wrote:
> On Wed, Sep 14, 2005 at 11:48:52AM -0400, Sonny Rao wrote:
> > On Wed, Sep 14, 2005 at 07:59:32AM +1000, David Chinner wrote:
> > > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > > >
> > > > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > > > to improve this dcache fragmentation problem.
> > >
> > > FYI, in the past I've tried this patch to reduce dcache fragmentation on
> > > an Altix (16k pages, 62 dentries to a slab page) under heavy
> > > fileserver workloads and it had no measurable effect. It appeared
> > > that there was almost always at least one active dentry on each page
> > > in the slab. The story may very well be different on 4k page
> > > machines, however.
>
> ....
>
> > I'm not surprised... With 62 dentries per page, the likelihood of
> > success is very small, and in fact performance could degrade since we
> > are holding the dcache lock more often and doing less useful work.
> >
> > It has been over a year and my memory is hazy, but I think I did see
> > about a 10% improvement on my workload (some sort of SFS simulation
> > with millions of files being randomly accessed) on an x86 machine but CPU
> > utilization also went way up which I think was the dcache lock.
>
> Hmmm - can't say that I've had the same experience. I did not notice
> any decrease in fragmentation or increase in CPU usage...
Well, this was on an x86 machine with 8 cores but relatively poor
scalability and horrific memory latencies ... i.e. it tends to
exaggerate the effects of bad locks compared to what I would see on a
more scalable POWER machine. We actually ran SFS on a 4-way POWER-5
machine with the patch and didn't see any real change in throughput,
and fragmentation was a little better. I can go dig out the data if
someone is really interested.
In your case with 62 dentry objects per page (which is only going to
get much worse as we bump up base page sizes), I think the chances of
success of this approach or anything similar are horrible because we
aren't really solving any of the fundamental issues.
For me, the patch was mostly an experiment to see if the "blunderbuss"
effect (to quote mjb) could be controlled any better than we do
today. Mostly, it didn't seem worth it to me -- especially since we
wanted the global dcache lock to go away.
> FWIW, SFS is just one workload that produces fragmentation. Any
> load that mixes or switches repeatedly between filesystem traversals
> and producing memory pressure via the page cache tends to result in
> fragmentation of the inode and dentry slabs...
Yep, and that's more or less how I "simulated" SFS, just had tons of
small files. I wasn't trying to really simulate the networking part
or op mixture etc -- just the slab fragmentation as a "worst-case".
> > Whatever happened to the vfs_cache_pressure band-aid/sledgehammer?
> > Is it not considered an option?
>
> All that did was increase the fragmentation levels. Instead of
> seeing a 4-5:1 free/used ratio in the dcache, it would push out to
> 10-15:1 if vfs_cache_pressure was used to prefer reclaiming dentries
> over page cache pages. Going the other way and preferring reclaim of
> page cache pages did nothing to change the level of fragmentation.
> Reclaim still freed most of the dentries in the working set but it
> took a little longer to do it.
Yes, but on systems with smaller pages it does seem to have some
positive effect. I don't really know how well this has been
quantified.
> Right now our only solution to prevent fragmentation on reclaim is
> to throw more memory at the machine to prevent reclaim from
> happening as the workload changes.
That is unfortunate, but interesting, because some have contended that
this is not a "real problem". I know SPEC SFS is a somewhat questionable
workload (really, what isn't?), so the evidence gathered from it didn't
seem to convince many people.
What kind of (real) workload are you seeing this on?
Thanks,
Sonny
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:43 ` Andrew Morton
2005-09-14 9:52 ` Dipankar Sarma
@ 2005-09-14 22:44 ` Theodore Ts'o
1 sibling, 0 replies; 32+ messages in thread
From: Theodore Ts'o @ 2005-09-14 22:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Manfred Spraul, ak, dgc, bharata, dipankar, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 02:43:13AM -0700, Andrew Morton wrote:
> Manfred Spraul <manfred@colorfullife.com> wrote:
> >
> > One tricky point are directory dentries: As far as I see, they are
> > pinned and unfreeable if a (freeable) directory entry is in the cache.
> >
>
> Well. That's the whole problem.
>
> I don't think it's been demonstrated that Ted's problem was caused by
> internal fragmentation, btw. Ted, could you run slabtop, see what the
> dcache occupancy is? Monitor it as you start to manually apply pressure?
> If the occupancy falls to 10% and not many slab pages are freed up yet then
> yup, it's internal fragmentation.
The next time I can get my machine into that state, sure, I'll try it.
I used to be able to reproduce it using normal laptop usage patterns
(Lotus Notes running under wine, kernel builds, apt-get upgrades,
openoffice, firefox, etc.) about twice a week with 2.6.13-rc5. With
2.6.13 it happened once or twice, but since then I haven't been
able to trigger it. (Predictably, not after I posted about it on
LKML. :-/)
I've been trying a few things in the hopes of deliberately triggering
it, but so far, no luck. Maybe I should go back to 2.6.13-rc5 and see
if I have an easier time of reproducing the failure case.
- Ted
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 9:01 ` Andi Kleen
` (2 preceding siblings ...)
2005-09-14 13:57 ` Martin J. Bligh
@ 2005-09-14 22:48 ` David Chinner
3 siblings, 0 replies; 32+ messages in thread
From: David Chinner @ 2005-09-14 22:48 UTC (permalink / raw)
To: Andi Kleen
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm,
linux-kernel, manfred
On Wed, Sep 14, 2005 at 11:01:15AM +0200, Andi Kleen wrote:
> On Tuesday 13 September 2005 23:59, David Chinner wrote:
> > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > > Second is Sonny Rao's rbtree dentry reclaim patch which is an attempt
> > > to improve this dcache fragmentation problem.
> >
> > FYI, in the past I've tried this patch to reduce dcache fragmentation on
> > an Altix (16k pages, 62 dentries to a slab page) under heavy
> > fileserver workloads and it had no measurable effect. It appeared
> > that there was almost always at least one active dentry on each page
> > in the slab. The story may very well be different on 4k page
> > machines, however.
>
> I always thought dentry freeing would work much better if it
> were turned upside down.
>
> Instead of starting from the high-level dcache lists, it could
> be driven by slab: under memory pressure, slab tries to return pages
> with unused cache objects. In that case it should check whether only
> a small number of pinned objects is left on a page, and if so,
> use a new callback to the higher-level user (= dcache) asking it
> to free those objects.
If you add a slab free object callback, then you have the beginnings
of a more flexible solution to memory reclaim from the slabs.
For example, you can easily implement a reclaim-not-allocate method
for new slab allocations for when there is no memory available or the
size of the slab has passed some configurable high water mark...
Right now there is no way to control the size of a slab cache. Part
of the reason for the fragmentation I have seen is the massive
changes in size of the caches due to the OS making wrong decisions
about memory reclaim when small changes in the workload occur. We
currently have no way to provide hints to help the OS make the right
decision for a given workload....
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-13 8:47 ` Bharata B Rao
2005-09-13 21:59 ` David Chinner
2005-09-14 21:34 ` Marcelo Tosatti
@ 2005-09-14 23:08 ` Marcelo Tosatti
2005-09-15 9:39 ` Bharata B Rao
2 siblings, 1 reply; 32+ messages in thread
From: Marcelo Tosatti @ 2005-09-14 23:08 UTC (permalink / raw)
To: Bharata B Rao; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> On Sun, Sep 11, 2005 at 11:16:36PM -0400, Theodore Ts'o wrote:
> > On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> > > Do you have the /proc/sys/fs/dentry-state output when such lowmem
> > > shortage happens ?
> >
> > Not yet, but the situation occurs on my laptop about 2 or 3 times
> > (when I'm not travelling and so it doesn't get rebooted). So
> reproducing it isn't utterly trivial, but it does happen often
> > enough that it should be possible to get the necessary data.
> >
> > > This is a problem that Bharata has been investigating at the moment.
> > > But he hasn't seen anything that can't be cured by a small memory
> > > pressure - IOW, dentries do get freed under memory pressure. So
> > your case might be very useful. Bharata is maintaining an instrumentation
> > > patch to collect more information and an alternative dentry aging patch
> > > (using rbtree). Perhaps you could try with those.
> >
> > Send it to me, and I'd be happy to try either the instrumentation
> > patch or the dentry aging patch.
> >
>
> Ted,
>
> I am sending two patches here.
>
> First is dentry_stats patch which collects some dcache statistics
> and puts it into /proc/meminfo. This patch provides information
> about how dentries are distributed in dcache slab pages, how many
> free and in use dentries are present in dentry_unused lru list and
> how prune_dcache() performs with respect to freeing the requested
> number of dentries.
Bharata,
Ideally one should move the "nr_requested/nr_freed" counters from your
stats patch into "struct shrinker" (or somewhere else more appropriate
in which per-shrinkable-cache stats are maintained), and use the
"mod_page_state" infrastructure to do lockless per-CPU accounting. ie.
break /proc/vmstat's "slabs_scanned" apart into meaningful pieces.
IMO something along that line should be merged into mainline to walk
away from the "what the fuck is going on" state of things.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 22:40 ` Sonny Rao
@ 2005-09-15 1:14 ` David Chinner
0 siblings, 0 replies; 32+ messages in thread
From: David Chinner @ 2005-09-15 1:14 UTC (permalink / raw)
To: Sonny Rao
Cc: Bharata B Rao, Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 06:40:40PM -0400, Sonny Rao wrote:
> On Thu, Sep 15, 2005 at 08:02:22AM +1000, David Chinner wrote:
> > Right now our only solution to prevent fragmentation on reclaim is
> > to throw more memory at the machine to prevent reclaim from
> > happening as the workload changes.
>
> That is unfortunate, but interesting because I didn't know if this was
> not a "real-problem" as some have contended. I know SPEC SFS is a
> somewhat questionable workload (really, what isn't though?), so the
> evidence gathered from that didn't seem to convince many people.
>
> What kind of (real) workload are you seeing this on?
Nothing special. Here's an example from a local altix build
server (8p, 12GiB RAM):
linvfs_icache 3376574 3891360 672 24 1 : tunables 54 27 8 : slabdata 162140 162140 0
dentry_cache 2632811 3007186 256 62 1 : tunables 120 60 8 : slabdata 48503 48503 0
I just copied and untarred some stuff I need to look at (~2GiB
data) and when that completed we now have:
linvfs_icache 590840 2813328 672 24 1 : tunables 54 27 8 : slabdata 117222 117222
dentry_cache 491984 2717708 256 62 1 : tunables 120 60 8 : slabdata 43834 43834
A few minutes later, with ppl doing normal work (rsync, kernel and
userspace package builds, tar, etc), a bit more had been reclaimed:
linvfs_icache 580589 2797992 672 24 1 : tunables 54 27 8 : slabdata 116583 116583 0
dentry_cache 412009 2418558 256 62 1 : tunables 120 60 8 : slabdata 39009 39009 0
We started with ~2.9GiB of active slab objects in ~210k pages
(3.3GiB RAM) in these two slabs. We've trimmed their active size
down to ~500MiB, but we still have 155k pages (2.5GiB) allocated to
the slabs.
I've seen much worse than this on build servers with more memory and
larger filesystems, especially after the filesystems have been
crawled by a backup program over night and we've ended up with > 10
million objects in each of these caches.
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 21:34 ` Marcelo Tosatti
2005-09-14 21:43 ` Dipankar Sarma
@ 2005-09-15 4:28 ` Bharata B Rao
1 sibling, 0 replies; 32+ messages in thread
From: Bharata B Rao @ 2005-09-15 4:28 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 06:34:04PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > On Sun, Sep 11, 2005 at 11:16:36PM -0400, Theodore Ts'o wrote:
> > > On Sun, Sep 11, 2005 at 05:30:46PM +0530, Dipankar Sarma wrote:
> > > > Do you have the /proc/sys/fs/dentry-state output when such lowmem
> > > > shortage happens ?
> > >
> > > Not yet, but the situation occurs on my laptop about 2 or 3 times
> > > (when I'm not travelling and so it doesn't get rebooted). So
> > > reproducing it isn't utterly trivial, but it does happen often
> > > enough that it should be possible to get the necessary data.
> > >
> > > > This is a problem that Bharata has been investigating at the moment.
> > > > But he hasn't seen anything that can't be cured by a small memory
> > > > pressure - IOW, dentries do get freed under memory pressure. So
> > > > your case might be very useful. Bharata is maintaining an instrumentation
> > > > patch to collect more information and an alternative dentry aging patch
> > > > (using rbtree). Perhaps you could try with those.
> > >
> > > Send it to me, and I'd be happy to try either the instrumentation
> > > patch or the dentry aging patch.
> > >
> >
> > Ted,
> >
> > I am sending two patches here.
> >
> > First is dentry_stats patch which collects some dcache statistics
> > and puts it into /proc/meminfo. This patch provides information
> > about how dentries are distributed in dcache slab pages, how many
> > free and in use dentries are present in dentry_unused lru list and
> > how prune_dcache() performs with respect to freeing the requested
> > number of dentries.
>
> Hi Bharata,
>
> +void get_dstat_info(void)
> +{
> + struct dentry *dentry;
> +
> + lru_dentry_stat.nr_total = lru_dentry_stat.nr_inuse = 0;
> + lru_dentry_stat.nr_ref = lru_dentry_stat.nr_free = 0;
> +
> + spin_lock(&dcache_lock);
> + list_for_each_entry(dentry, &dentry_unused, d_lru) {
> + if (atomic_read(&dentry->d_count))
> + lru_dentry_stat.nr_inuse++;
>
> Dentries on dentry_unused list with d_count positive? Is that possible
> at all? As far as my limited understanding goes, only dentries with zero
> count can be part of the dentry_unused list.
As Dipankar mentioned, it's now possible to have positive-d_count dentries
on the unused list. BTW I think we need a better way to get this data than
walking the entire unused_list linearly, which might not be
scalable with a huge number of dentries.
>
> + if (dentry->d_flags & DCACHE_REFERENCED)
> + lru_dentry_stat.nr_ref++;
> + }
>
>
> @@ -393,6 +430,9 @@ static inline void prune_one_dentry(stru
>
> static void prune_dcache(int count)
> {
> + int nr_requested = count;
> + int nr_freed = 0;
> +
> spin_lock(&dcache_lock);
> for (; count ; count--) {
> struct dentry *dentry;
> @@ -427,8 +467,13 @@ static void prune_dcache(int count)
> continue;
> }
> prune_one_dentry(dentry);
> + nr_freed++;
> }
> spin_unlock(&dcache_lock);
> + spin_lock(&prune_dcache_lock);
> + lru_dentry_stat.dprune_req = nr_requested;
> + lru_dentry_stat.dprune_freed = nr_freed;
>
> Don't you mean "+=" ?
No. Actually here I am capturing the number of dentries freed
per invocation of prune_dcache.
Regards,
Bharata.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 13:57 ` Martin J. Bligh
2005-09-14 15:37 ` Sonny Rao
@ 2005-09-15 7:21 ` Helge Hafting
1 sibling, 0 replies; 32+ messages in thread
From: Helge Hafting @ 2005-09-15 7:21 UTC (permalink / raw)
To: Martin J. Bligh
Cc: Andi Kleen, David Chinner, Bharata B Rao, Theodore Ts'o,
Dipankar Sarma, linux-mm, linux-kernel, manfred
Martin J. Bligh wrote:
>
>If they're freeable, we should easily be able to move them, and therefore
>compact a fragmented slab. That way we can preserve the LRU'ness of it.
>Stage 1: free the oldest entries. Stage 2: compact the slab into whole
>pages. Stage 3: free whole pages back to the page allocator.
>
>
That seems like the perfect solution to me. Freeing up 95% or more
gives us clean pages - and moving instead of actually freeing
everything avoids the cost of repopulating the cache later. :-)
Helge Hafting
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-14 23:08 ` Marcelo Tosatti
@ 2005-09-15 9:39 ` Bharata B Rao
2005-09-15 13:29 ` Marcelo Tosatti
0 siblings, 1 reply; 32+ messages in thread
From: Bharata B Rao @ 2005-09-15 9:39 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Sep 14, 2005 at 08:08:43PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> >
<snip>
> > First is dentry_stats patch which collects some dcache statistics
> > and puts it into /proc/meminfo. This patch provides information
> > about how dentries are distributed in dcache slab pages, how many
> > free and in use dentries are present in dentry_unused lru list and
> > how prune_dcache() performs with respect to freeing the requested
> > number of dentries.
>
> Bharata,
>
> Ideally one should move the "nr_requested/nr_freed" counters from your
> stats patch into "struct shrinker" (or somewhere else more appropriate
> in which per-shrinkable-cache stats are maintained), and use the
> "mod_page_state" infrastructure to do lockless per-CPU accounting. ie.
> break /proc/vmstat's "slabs_scanned" apart into meaningful pieces.
Yes, I agree that we should have the nr_requested and nr_freed type of
counters in an appropriate place. And "struct shrinker" is probably the
right place for them.
Essentially you are suggesting that we maintain per cpu statistics
of 'requested to free'(scanned) slab objects and actual freed objects.
And this should be on per shrinkable cache basis.
Is it ok to maintain these requested/freed counters as ever-growing
counters, or would it make more sense to have them reflect the statistics
from the latest/last attempt at shrinking the cache? And where would be
the right place to export this information? (/proc/slabinfo?, since it
already gives details of all caches)
If I understand correctly, "slabs_scanned" is the sum total number
of objects from all shrinkable caches scanned for possible freeing.
I didn't get why this is part of page_state, which mostly includes
page-related statistics.
Regards,
Bharata.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-15 9:39 ` Bharata B Rao
@ 2005-09-15 13:29 ` Marcelo Tosatti
2005-10-02 16:32 ` Bharata B Rao
0 siblings, 1 reply; 32+ messages in thread
From: Marcelo Tosatti @ 2005-09-15 13:29 UTC (permalink / raw)
To: Bharata B Rao; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Thu, Sep 15, 2005 at 03:09:45PM +0530, Bharata B Rao wrote:
> On Wed, Sep 14, 2005 at 08:08:43PM -0300, Marcelo Tosatti wrote:
> > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > >
> <snip>
> > > First is dentry_stats patch which collects some dcache statistics
> > > and puts it into /proc/meminfo. This patch provides information
> > > about how dentries are distributed in dcache slab pages, how many
> > > free and in use dentries are present in dentry_unused lru list and
> > > how prune_dcache() performs with respect to freeing the requested
> > > number of dentries.
> >
> > Bharata,
> >
> > Ideally one should move the "nr_requested/nr_freed" counters from your
> > stats patch into "struct shrinker" (or somewhere else more appropriate
> > in which per-shrinkable-cache stats are maintained), and use the
> > "mod_page_state" infrastructure to do lockless per-CPU accounting. ie.
> > break /proc/vmstat's "slabs_scanned" apart into meaningful pieces.
>
> Yes, I agree that we should have the nr_requested and nr_freed type of
> counters in appropriate place. And "struct shrinker" is probably right
> place for it.
>
> Essentially you are suggesting that we maintain per cpu statistics
> of 'requested to free'(scanned) slab objects and actual freed objects.
> And this should be on per shrinkable cache basis.
Yep.
> Is it ok to maintain this requested/freed counters as growing counters
> or would it make more sense to have them reflect the statistics from
> the latest/last attempt of cache shrink ?
It makes a lot more sense to account for all shrink attempts: it is necessary
to know how the reclaiming process is behaving over time. That's why I wondered
about using "=" instead of "+=" in your patch.
> And where would be right place to export this information ?
> (/proc/slabinfo ?, since it already gives details of all caches)
My feeling is that changing /proc/slabinfo format might break userspace
applications.
> If I understand correctly, "slabs_scanned" is the sum total number
> of objects from all shrinkable caches scanned for possible freeing.
Yep.
> I didn't get why this is part of page_state which mostly includes
> page related statistics.
Well, page_state contains most of the reclaiming statistics - its scope
is broader than "struct page" information.
To me it seems like the best place.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-09-15 13:29 ` Marcelo Tosatti
@ 2005-10-02 16:32 ` Bharata B Rao
2005-10-02 20:06 ` Marcelo
0 siblings, 1 reply; 32+ messages in thread
From: Bharata B Rao @ 2005-10-02 16:32 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3630 bytes --]
On Thu, Sep 15, 2005 at 10:29:10AM -0300, Marcelo Tosatti wrote:
> On Thu, Sep 15, 2005 at 03:09:45PM +0530, Bharata B Rao wrote:
> > On Wed, Sep 14, 2005 at 08:08:43PM -0300, Marcelo Tosatti wrote:
> > > On Tue, Sep 13, 2005 at 02:17:52PM +0530, Bharata B Rao wrote:
> > > >
> > <snip>
> > > > First is dentry_stats patch which collects some dcache statistics
> > > > and puts it into /proc/meminfo. This patch provides information
> > > > about how dentries are distributed in dcache slab pages, how many
> > > > free and in use dentries are present in dentry_unused lru list and
> > > > how prune_dcache() performs with respect to freeing the requested
> > > > number of dentries.
> > >
> > > Bharata,
> > >
> > > Ideally one should move the "nr_requested/nr_freed" counters from your
> > > stats patch into "struct shrinker" (or somewhere else more appropriate
> > > in which per-shrinkable-cache stats are maintained), and use the
> > > "mod_page_state" infrastructure to do lockless per-CPU accounting. ie.
> > > break /proc/vmstat's "slabs_scanned" apart into meaningful pieces.
> >
> > Yes, I agree that we should have the nr_requested and nr_freed type of
> > counters in appropriate place. And "struct shrinker" is probably right
> > place for it.
> >
> > Essentially you are suggesting that we maintain per cpu statistics
> > of 'requested to free'(scanned) slab objects and actual freed objects.
> > And this should be on per shrinkable cache basis.
>
> Yep.
>
> > Is it ok to maintain this requested/freed counters as growing counters
> > or would it make more sense to have them reflect the statistics from
> > the latest/last attempt of cache shrink ?
>
> It makes a lot more sense to account for all shrink attempts: it is necessary
> to know how the reclaiming process is behaving over time. That's why I wondered
> about using "=" instead of "+=" in your patch.
>
> > And where would be right place to export this information ?
> > (/proc/slabinfo ?, since it already gives details of all caches)
>
> My feeling is that changing /proc/slabinfo format might break userspace
> applications.
>
> > If I understand correctly, "slabs_scanned" is the sum total number
> > of objects from all shrinkable caches scanned for possible freeing.
>
> Yep.
>
> > I didn't get why this is part of page_state which mostly includes
> > page related statistics.
>
> Well, page_state contains most of the reclaiming statistics - its scope
> is broader than "struct page" information.
>
> To me it seems like the best place.
>
Marcelo,
The attached patch is an attempt to break the "slabs_scanned" into
meaningful pieces as you suggested.
But I couldn't do this cleanly because kmem_cache_t isn't defined
in a .h file and I didn't want to touch too many files in the first
attempt.
What I am doing here is making the "requested to free" and
"actual freed" counters as part of struct shrinker. With this I can
update these statistics seamlessly from shrink_slab().
I don't have these as per-CPU counters because I wasn't sure whether
shrink_slab() would see enough concurrent execution to warrant lockless
per-CPU counters for them.
I am displaying this information as part of /proc/slabinfo and I have
verified that it at least isn't breaking slabtop.
I thought about having this as part of /proc/vmstat and using
mod_page_state infrastructure as you suggested, but having the
"requested to free" and "actual freed" counters in struct page_state
for only those caches which set the shrinker function didn't look
good.
If you think that all this can be done in a better way, please
let me know.
Regards,
Bharata.
[-- Attachment #2: cache_shrink_stats.patch --]
[-- Type: text/plain, Size: 6764 bytes --]
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
---
fs/dcache.c | 4 +++-
fs/dquot.c | 4 +++-
fs/inode.c | 4 +++-
include/linux/mm.h | 15 ++++++++++++++-
include/linux/slab.h | 3 +++
mm/slab.c | 14 ++++++++++++++
mm/vmscan.c | 19 +++++++------------
7 files changed, 47 insertions(+), 16 deletions(-)
diff -puN mm/vmscan.c~cache_shrink_stats mm/vmscan.c
--- linux-2.6.14-rc2-shrink/mm/vmscan.c~cache_shrink_stats 2005-09-28 11:17:01.508944136 +0530
+++ linux-2.6.14-rc2-shrink-bharata/mm/vmscan.c 2005-09-28 17:18:57.799566152 +0530
@@ -84,17 +84,6 @@ struct scan_control {
int swap_cluster_max;
};
-/*
- * The list of shrinker callbacks used by to apply pressure to
- * ageable caches.
- */
-struct shrinker {
- shrinker_t shrinker;
- struct list_head list;
- int seeks; /* seeks to recreate an obj */
- long nr; /* objs pending delete */
-};
-
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
#ifdef ARCH_HAS_PREFETCH
@@ -146,6 +135,8 @@ struct shrinker *set_shrinker(int seeks,
shrinker->shrinker = theshrinker;
shrinker->seeks = seeks;
shrinker->nr = 0;
+ atomic_set(&shrinker->nr_req, 0);
+ atomic_set(&shrinker->nr_freed, 0);
down_write(&shrinker_rwsem);
list_add_tail(&shrinker->list, &shrinker_list);
up_write(&shrinker_rwsem);
@@ -221,9 +212,13 @@ static int shrink_slab(unsigned long sca
shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
if (shrink_ret == -1)
break;
- if (shrink_ret < nr_before)
+ if (shrink_ret < nr_before) {
ret += nr_before - shrink_ret;
+ atomic_add(nr_before - shrink_ret,
+ &shrinker->nr_freed);
+ }
mod_page_state(slabs_scanned, this_scan);
+ atomic_add(this_scan, &shrinker->nr_req);
total_scan -= this_scan;
cond_resched();
diff -puN fs/inode.c~cache_shrink_stats fs/inode.c
--- linux-2.6.14-rc2-shrink/fs/inode.c~cache_shrink_stats 2005-09-28 11:25:58.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/inode.c 2005-09-28 14:02:24.422431992 +0530
@@ -1357,11 +1357,13 @@ void __init inode_init_early(void)
void __init inode_init(unsigned long mempages)
{
int loop;
+ struct shrinker *shrinker;
/* inode slab cache */
inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode),
0, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC, init_once, NULL);
- set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+ kmem_set_shrinker(inode_cachep, shrinker);
/* Hash may have been set up in inode_init_early */
if (!hashdist)
diff -puN fs/dquot.c~cache_shrink_stats fs/dquot.c
--- linux-2.6.14-rc2-shrink/fs/dquot.c~cache_shrink_stats 2005-09-28 11:28:51.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/dquot.c 2005-09-28 14:06:13.197652872 +0530
@@ -1793,6 +1793,7 @@ static int __init dquot_init(void)
{
int i;
unsigned long nr_hash, order;
+ struct shrinker *shrinker;
printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);
@@ -1824,7 +1825,8 @@ static int __init dquot_init(void)
printk("Dquot-cache hash table entries: %ld (order %ld, %ld bytes)\n",
nr_hash, order, (PAGE_SIZE << order));
- set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+ kmem_set_shrinker(dquot_cachep, shrinker);
return 0;
}
diff -puN fs/dcache.c~cache_shrink_stats fs/dcache.c
--- linux-2.6.14-rc2-shrink/fs/dcache.c~cache_shrink_stats 2005-09-28 11:31:35.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/dcache.c 2005-09-28 13:47:46.507895288 +0530
@@ -1668,6 +1668,7 @@ static void __init dcache_init_early(voi
static void __init dcache_init(unsigned long mempages)
{
int loop;
+ struct shrinker *shrinker;
/*
* A constructor could be added for stable state like the lists,
@@ -1680,7 +1681,8 @@ static void __init dcache_init(unsigned
SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
NULL, NULL);
- set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+ kmem_set_shrinker(dentry_cache, shrinker);
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff -puN mm/slab.c~cache_shrink_stats mm/slab.c
--- linux-2.6.14-rc2-shrink/mm/slab.c~cache_shrink_stats 2005-09-28 11:40:00.285338264 +0530
+++ linux-2.6.14-rc2-shrink-bharata/mm/slab.c 2005-09-28 14:26:52.187297816 +0530
@@ -400,6 +400,9 @@ struct kmem_cache_s {
/* de-constructor func */
void (*dtor)(void *, kmem_cache_t *, unsigned long);
+ /* shrinker data for this cache */
+ struct shrinker *shrinker;
+
/* 4) cache creation/removal */
const char *name;
struct list_head next;
@@ -3483,6 +3486,12 @@ static int s_show(struct seq_file *m, vo
allochit, allocmiss, freehit, freemiss);
}
#endif
+ /* shrinker stats */
+ if (cachep->shrinker) {
+ seq_printf(m, " : shrinker stat %7lu %7lu",
+ atomic_read(&cachep->shrinker->nr_req),
+ atomic_read(&cachep->shrinker->nr_freed));
+ }
seq_putc(m, '\n');
spin_unlock_irq(&cachep->spinlock);
return 0;
@@ -3606,3 +3615,8 @@ char *kstrdup(const char *s, unsigned in
return buf;
}
EXPORT_SYMBOL(kstrdup);
+
+void kmem_set_shrinker(kmem_cache_t *cachep, struct shrinker *shrinker)
+{
+ cachep->shrinker = shrinker;
+}
diff -puN include/linux/mm.h~cache_shrink_stats include/linux/mm.h
--- linux-2.6.14-rc2-shrink/include/linux/mm.h~cache_shrink_stats 2005-09-28 12:41:09.664507840 +0530
+++ linux-2.6.14-rc2-shrink-bharata/include/linux/mm.h 2005-09-28 12:41:46.014981728 +0530
@@ -755,7 +755,20 @@ typedef int (*shrinker_t)(int nr_to_scan
*/
#define DEFAULT_SEEKS 2
-struct shrinker;
+
+/*
+ * The list of shrinker callbacks used to apply pressure to
+ * ageable caches.
+ */
+struct shrinker {
+ shrinker_t shrinker;
+ struct list_head list;
+ int seeks; /* seeks to recreate an obj */
+ long nr; /* objs pending delete */
+ atomic_t nr_req; /* objs scanned for possible freeing */
+ atomic_t nr_freed; /* actual number of objects freed */
+};
+
extern struct shrinker *set_shrinker(int, shrinker_t);
extern void remove_shrinker(struct shrinker *shrinker);
diff -puN include/linux/slab.h~cache_shrink_stats include/linux/slab.h
--- linux-2.6.14-rc2-shrink/include/linux/slab.h~cache_shrink_stats 2005-09-28 13:52:53.852171856 +0530
+++ linux-2.6.14-rc2-shrink-bharata/include/linux/slab.h 2005-09-28 14:07:42.127133536 +0530
@@ -147,6 +147,9 @@ extern kmem_cache_t *bio_cachep;
extern atomic_t slab_reclaim_pages;
+struct shrinker;
+extern void kmem_set_shrinker(kmem_cache_t *cachep, struct shrinker *shrinker);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SLAB_H */
_
* Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough
2005-10-02 16:32 ` Bharata B Rao
@ 2005-10-02 20:06 ` Marcelo
2005-10-04 13:36 ` shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough] Bharata B Rao
0 siblings, 1 reply; 32+ messages in thread
From: Marcelo @ 2005-10-02 20:06 UTC (permalink / raw)
To: Bharata B Rao
Cc: Marcelo Tosatti, Theodore Ts'o, Dipankar Sarma, linux-mm,
linux-kernel
Bharata,
On Sun, Oct 02, 2005 at 10:02:29PM +0530, Bharata B Rao wrote:
>
> Marcelo,
>
> The attached patch is an attempt to break the "slabs_scanned" into
> meaningful pieces as you suggested.
>
> But I couldn't do this cleanly because kmem_cache_t isn't defined
> in a .h file and I didn't want to touch too many files in the first
> attempt.
>
> What I am doing here is making the "requested to free" and
> "actual freed" counters as part of struct shrinker. With this I can
> update these statistics seamlessly from shrink_slab().
>
> I don't have these as per-cpu counters because I wasn't sure whether
> shrink_slab() would have enough concurrent executions to warrant
> lockless percpu counters for them.
Per-CPU counters are interesting because they avoid the atomic
operation _and_ potential cacheline bouncing. Given that less
commonly used counters in the reclaim path are already per-CPU,
I think it might be worth doing here too.
> I am displaying this information as part of /proc/slabinfo and I have
> verified that it at least isn't breaking slabtop.
>
> I thought about having this as part of /proc/vmstat and using
> mod_page_state infrastructure as you suggested, but having the
> "requested to free" and "actual freed" counters in struct page_state
> for only those caches which set the shrinker function didn't look
> good.
OK... You could change the atomic counters to per-CPU variables
in "struct shrinker".
> If you think that all this can be done in a better way, please
> let me know.
* shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough]
2005-10-02 20:06 ` Marcelo
@ 2005-10-04 13:36 ` Bharata B Rao
2005-10-05 21:25 ` Marcelo Tosatti
0 siblings, 1 reply; 32+ messages in thread
From: Bharata B Rao @ 2005-10-04 13:36 UTC (permalink / raw)
To: Marcelo; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4574 bytes --]
Marcelo,
Here's my next attempt at breaking the "slabs_scanned" count from
/proc/vmstat into meaningful per-cache statistics. The statistics counters
are now percpu. [One remaining issue: mbcache contains more than one
cache, and they all share a common shrinker routine, so I am
displaying the collective shrinker stats on each of them in
/proc/slabinfo ==> some duplication.]
With this patch (and my earlier dcache stats patch) I observed some
interesting results with the following test scenario on a 8cpu p3 box:
- Ran an application which consumes 40% of the total memory.
- Ran dbench on tmpfs with 128 clients twice (serially).
- Ran a find on a ext3 partition having ~9.5million entries (files and
directories included)
At the end of this run, I have the following results:
[root@llm09 bharata]# cat /proc/meminfo
MemTotal: 3872528 kB
MemFree: 1420940 kB
Buffers: 714068 kB
Cached: 21536 kB
SwapCached: 2264 kB
Active: 1672680 kB
Inactive: 637460 kB
HighTotal: 3014616 kB
HighFree: 1411740 kB
LowTotal: 857912 kB
LowFree: 9200 kB
SwapTotal: 2096472 kB
SwapFree: 2051408 kB
Dirty: 172 kB
Writeback: 0 kB
Mapped: 1583680 kB
Slab: 119564 kB
CommitLimit: 4032736 kB
Committed_AS: 1647260 kB
PageTables: 2248 kB
VmallocTotal: 114680 kB
VmallocUsed: 1264 kB
VmallocChunk: 113384 kB
nr_dentries/page nr_pages nr_inuse
0 0 0
1 5 2
2 12 4
3 26 9
4 46 18
5 76 40
6 82 47
7 91 59
8 122 93
9 114 102
10 142 136
11 138 185
12 118 164
13 128 206
14 126 208
15 120 219
16 136 261
17 159 315
18 145 311
19 179 379
20 192 407
21 256 631
22 286 741
23 316 816
24 342 934
25 381 1177
26 664 2813
27 0 0
28 0 0
29 0 0
Total: 4402 10277
dcache lru: total 75369 inuse 3599
[Here,
nr_dentries/page - Number of dentries per page
nr_pages - Number of pages with given number of dentries
nr_inuse - Number of inuse dentries in those pages.
Eg: From the above data, there are 26 pages with 3 dentries each
and out of 78 total dentries in these 26 pages, 9 dentries are in use.]
[root@llm09 bharata]# grep shrinker /proc/slabinfo
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> : shrinker stat <nr requested> <nr freed>
ext3_xattr 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
dquot 0 0 160 24 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
inode_cache 1301 1390 400 10 1 : tunables 54 27 8 : slabdata 139 139 0 : shrinker stat 682752 681900
dentry_cache 82110 114452 152 26 1 : tunables 120 60 8 : slabdata 4402 4402 0 : shrinker stat 1557760 760100
[root@llm09 bharata]# grep slabs_scanned /proc/vmstat
slabs_scanned 2240512
[root@llm09 bharata]# cat /proc/sys/fs/dentry-state
82046 75369 45 0 3599 0
[The dentry-state output fields are, in order:
total dentries in the dentry hash list, total dentries in the lru list,
age limit, want_pages, inuse dentries in the lru list, dummy]
So we can see that under low memory pressure, even though the
shrinker runs on the dcache repeatedly, not many dentries are freed,
and the dcache lru list still holds a huge number of free
dentries.
Regards,
Bharata.
[-- Attachment #2: cache_shrink_stats.patch --]
[-- Type: text/plain, Size: 8730 bytes --]
This patch adds two more fields to each shrinkable cache's entry
in /proc/slabinfo: the number of objects scanned for possible freeing and
the number of objects actually freed.
Signed-off-by: Bharata B Rao <bharata@in.ibm.com>
---
fs/dcache.c | 4 +++-
fs/dquot.c | 4 +++-
fs/inode.c | 4 +++-
fs/mbcache.c | 2 ++
include/linux/mm.h | 39 ++++++++++++++++++++++++++++++++++++++-
include/linux/slab.h | 3 +++
mm/slab.c | 15 +++++++++++++++
mm/vmscan.c | 23 +++++++++++------------
8 files changed, 78 insertions(+), 16 deletions(-)
diff -puN mm/vmscan.c~cache_shrink_stats mm/vmscan.c
--- linux-2.6.14-rc2-shrink/mm/vmscan.c~cache_shrink_stats 2005-09-28 11:17:01.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/mm/vmscan.c 2005-10-04 15:27:52.000000000 +0530
@@ -84,17 +84,6 @@ struct scan_control {
int swap_cluster_max;
};
-/*
- * The list of shrinker callbacks used by to apply pressure to
- * ageable caches.
- */
-struct shrinker {
- shrinker_t shrinker;
- struct list_head list;
- int seeks; /* seeks to recreate an obj */
- long nr; /* objs pending delete */
-};
-
#define lru_to_page(_head) (list_entry((_head)->prev, struct page, lru))
#ifdef ARCH_HAS_PREFETCH
@@ -146,6 +135,11 @@ struct shrinker *set_shrinker(int seeks,
shrinker->shrinker = theshrinker;
shrinker->seeks = seeks;
shrinker->nr = 0;
+ shrinker->s_stats = alloc_percpu(struct shrinker_stats);
+ if (!shrinker->s_stats) {
+ kfree(shrinker);
+ return NULL;
+ }
down_write(&shrinker_rwsem);
list_add_tail(&shrinker->list, &shrinker_list);
up_write(&shrinker_rwsem);
@@ -162,6 +156,7 @@ void remove_shrinker(struct shrinker *sh
down_write(&shrinker_rwsem);
list_del(&shrinker->list);
up_write(&shrinker_rwsem);
+ free_percpu(shrinker->s_stats);
kfree(shrinker);
}
EXPORT_SYMBOL(remove_shrinker);
@@ -221,8 +216,12 @@ static int shrink_slab(unsigned long sca
shrink_ret = (*shrinker->shrinker)(this_scan, gfp_mask);
if (shrink_ret == -1)
break;
- if (shrink_ret < nr_before)
+ if (shrink_ret < nr_before) {
ret += nr_before - shrink_ret;
+ shrinker_stat_add(shrinker, nr_freed,
+ (nr_before - shrink_ret));
+ }
+ shrinker_stat_add(shrinker, nr_req, this_scan);
mod_page_state(slabs_scanned, this_scan);
total_scan -= this_scan;
diff -puN fs/inode.c~cache_shrink_stats fs/inode.c
--- linux-2.6.14-rc2-shrink/fs/inode.c~cache_shrink_stats 2005-09-28 11:25:58.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/inode.c 2005-09-28 14:02:24.000000000 +0530
@@ -1357,11 +1357,13 @@ void __init inode_init_early(void)
void __init inode_init(unsigned long mempages)
{
int loop;
+ struct shrinker *shrinker;
/* inode slab cache */
inode_cachep = kmem_cache_create("inode_cache", sizeof(struct inode),
0, SLAB_RECLAIM_ACCOUNT|SLAB_PANIC, init_once, NULL);
- set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_icache_memory);
+ kmem_set_shrinker(inode_cachep, shrinker);
/* Hash may have been set up in inode_init_early */
if (!hashdist)
diff -puN fs/dquot.c~cache_shrink_stats fs/dquot.c
--- linux-2.6.14-rc2-shrink/fs/dquot.c~cache_shrink_stats 2005-09-28 11:28:51.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/dquot.c 2005-09-28 14:06:13.000000000 +0530
@@ -1793,6 +1793,7 @@ static int __init dquot_init(void)
{
int i;
unsigned long nr_hash, order;
+ struct shrinker *shrinker;
printk(KERN_NOTICE "VFS: Disk quotas %s\n", __DQUOT_VERSION__);
@@ -1824,7 +1825,8 @@ static int __init dquot_init(void)
printk("Dquot-cache hash table entries: %ld (order %ld, %ld bytes)\n",
nr_hash, order, (PAGE_SIZE << order));
- set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_dqcache_memory);
+ kmem_set_shrinker(dquot_cachep, shrinker);
return 0;
}
diff -puN fs/dcache.c~cache_shrink_stats fs/dcache.c
--- linux-2.6.14-rc2-shrink/fs/dcache.c~cache_shrink_stats 2005-09-28 11:31:35.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/dcache.c 2005-09-28 13:47:46.000000000 +0530
@@ -1668,6 +1668,7 @@ static void __init dcache_init_early(voi
static void __init dcache_init(unsigned long mempages)
{
int loop;
+ struct shrinker *shrinker;
/*
* A constructor could be added for stable state like the lists,
@@ -1680,7 +1681,8 @@ static void __init dcache_init(unsigned
SLAB_RECLAIM_ACCOUNT|SLAB_PANIC,
NULL, NULL);
- set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+ shrinker = set_shrinker(DEFAULT_SEEKS, shrink_dcache_memory);
+ kmem_set_shrinker(dentry_cache, shrinker);
/* Hash may have been set up in dcache_init_early */
if (!hashdist)
diff -puN mm/slab.c~cache_shrink_stats mm/slab.c
--- linux-2.6.14-rc2-shrink/mm/slab.c~cache_shrink_stats 2005-09-28 11:40:00.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/mm/slab.c 2005-10-04 14:09:53.000000000 +0530
@@ -400,6 +400,9 @@ struct kmem_cache_s {
/* de-constructor func */
void (*dtor)(void *, kmem_cache_t *, unsigned long);
+ /* shrinker data for this cache */
+ struct shrinker *shrinker;
+
/* 4) cache creation/removal */
const char *name;
struct list_head next;
@@ -3363,6 +3366,7 @@ static void *s_start(struct seq_file *m,
" <error> <maxfreeable> <nodeallocs> <remotefrees>");
seq_puts(m, " : cpustat <allochit> <allocmiss> <freehit> <freemiss>");
#endif
+ seq_puts(m, " : shrinker stat <nr requested> <nr freed>");
seq_putc(m, '\n');
}
p = cache_chain.next;
@@ -3483,6 +3487,12 @@ static int s_show(struct seq_file *m, vo
allochit, allocmiss, freehit, freemiss);
}
#endif
+ /* shrinker stats */
+ if (cachep->shrinker) {
+ seq_printf(m, " : shrinker stat %7lu %7lu",
+ shrinker_stat_read(cachep->shrinker, nr_req),
+ shrinker_stat_read(cachep->shrinker, nr_freed));
+ }
seq_putc(m, '\n');
spin_unlock_irq(&cachep->spinlock);
return 0;
@@ -3606,3 +3616,8 @@ char *kstrdup(const char *s, unsigned in
return buf;
}
EXPORT_SYMBOL(kstrdup);
+
+void kmem_set_shrinker(kmem_cache_t *cachep, struct shrinker *shrinker)
+{
+ cachep->shrinker = shrinker;
+}
diff -puN include/linux/mm.h~cache_shrink_stats include/linux/mm.h
--- linux-2.6.14-rc2-shrink/include/linux/mm.h~cache_shrink_stats 2005-09-28 12:41:09.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/include/linux/mm.h 2005-10-04 12:29:22.000000000 +0530
@@ -755,7 +755,44 @@ typedef int (*shrinker_t)(int nr_to_scan
*/
#define DEFAULT_SEEKS 2
-struct shrinker;
+
+struct shrinker_stats {
+ unsigned long nr_req; /* objs scanned for possible freeing */
+ unsigned long nr_freed; /* actual number of objects freed */
+};
+
+/*
+ * The list of shrinker callbacks used to apply pressure to
+ * ageable caches.
+ */
+struct shrinker {
+ shrinker_t shrinker;
+ struct list_head list;
+ int seeks; /* seeks to recreate an obj */
+ long nr; /* objs pending delete */
+ struct shrinker_stats *s_stats;
+};
+
+#define shrinker_stat_add(shrinker, field, addnd) \
+ do { \
+ preempt_disable(); \
+ (per_cpu_ptr(shrinker->s_stats, \
+ smp_processor_id())->field += addnd); \
+ preempt_enable(); \
+ } while (0)
+
+#define shrinker_stat_read(shrinker, field) \
+({ \
+ typeof(shrinker->s_stats->field) res = 0; \
+ int i; \
+ for (i=0; i < NR_CPUS; i++) { \
+ if (!cpu_possible(i)) \
+ continue; \
+ res += per_cpu_ptr(shrinker->s_stats, i)->field; \
+ } \
+ res; \
+})
+
extern struct shrinker *set_shrinker(int, shrinker_t);
extern void remove_shrinker(struct shrinker *shrinker);
diff -puN include/linux/slab.h~cache_shrink_stats include/linux/slab.h
--- linux-2.6.14-rc2-shrink/include/linux/slab.h~cache_shrink_stats 2005-09-28 13:52:53.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/include/linux/slab.h 2005-09-28 14:07:42.000000000 +0530
@@ -147,6 +147,9 @@ extern kmem_cache_t *bio_cachep;
extern atomic_t slab_reclaim_pages;
+struct shrinker;
+extern void kmem_set_shrinker(kmem_cache_t *cachep, struct shrinker *shrinker);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_SLAB_H */
diff -puN fs/mbcache.c~cache_shrink_stats fs/mbcache.c
--- linux-2.6.14-rc2-shrink/fs/mbcache.c~cache_shrink_stats 2005-10-04 13:47:35.000000000 +0530
+++ linux-2.6.14-rc2-shrink-bharata/fs/mbcache.c 2005-10-04 13:48:34.000000000 +0530
@@ -292,6 +292,8 @@ mb_cache_create(const char *name, struct
if (!cache->c_entry_cache)
goto fail;
+ kmem_set_shrinker(cache->c_entry_cache, mb_shrinker);
+
spin_lock(&mb_cache_spinlock);
list_add(&cache->c_cache_list, &mb_cache_list);
spin_unlock(&mb_cache_spinlock);
_
* Re: shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough]
2005-10-04 13:36 ` shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough] Bharata B Rao
@ 2005-10-05 21:25 ` Marcelo Tosatti
2005-10-07 8:12 ` Bharata B Rao
0 siblings, 1 reply; 32+ messages in thread
From: Marcelo Tosatti @ 2005-10-05 21:25 UTC (permalink / raw)
To: Bharata B Rao; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
Hi Bharata,
On Tue, Oct 04, 2005 at 07:06:35PM +0530, Bharata B Rao wrote:
> Marcelo,
>
> Here's my next attempt in breaking the "slabs_scanned" from /proc/vmstat
> into meaningful per cache statistics. Now I have the statistics counters
> as percpu. [an issue remaining is that there are more than one cache as
> part of mbcache and they all have a common shrinker routine and I am
> displaying the collective shrinker stats info on each of them in
> /proc/slabinfo ==> some kind of duplication]
Looks good to me! IMO it should be a candidate for -mm/mainline.
Nothing useful to suggest on the mbcache issue... sorry.
> With this patch (and my earlier dcache stats patch) I observed some
> interesting results with the following test scenario on a 8cpu p3 box:
>
> - Ran an application which consumes 40% of the total memory.
> - Ran dbench on tmpfs with 128 clients twice (serially).
> - Ran a find on a ext3 partition having ~9.5million entries (files and
> directories included)
>
> At the end of this run, I have the following results:
>
> [root@llm09 bharata]# cat /proc/meminfo
> MemTotal: 3872528 kB
> MemFree: 1420940 kB
> Buffers: 714068 kB
> Cached: 21536 kB
> SwapCached: 2264 kB
> Active: 1672680 kB
> Inactive: 637460 kB
> HighTotal: 3014616 kB
> HighFree: 1411740 kB
> LowTotal: 857912 kB
> LowFree: 9200 kB
> SwapTotal: 2096472 kB
> SwapFree: 2051408 kB
> Dirty: 172 kB
> Writeback: 0 kB
> Mapped: 1583680 kB
> Slab: 119564 kB
> CommitLimit: 4032736 kB
> Committed_AS: 1647260 kB
> PageTables: 2248 kB
> VmallocTotal: 114680 kB
> VmallocUsed: 1264 kB
> VmallocChunk: 113384 kB
> nr_dentries/page nr_pages nr_inuse
> 0 0 0
> 1 5 2
> 2 12 4
> 3 26 9
> 4 46 18
> 5 76 40
> 6 82 47
> 7 91 59
> 8 122 93
> 9 114 102
> 10 142 136
> 11 138 185
> 12 118 164
> 13 128 206
> 14 126 208
> 15 120 219
> 16 136 261
> 17 159 315
> 18 145 311
> 19 179 379
> 20 192 407
> 21 256 631
> 22 286 741
> 23 316 816
> 24 342 934
> 25 381 1177
> 26 664 2813
> 27 0 0
> 28 0 0
> 29 0 0
> Total: 4402 10277
> dcache lru: total 75369 inuse 3599
>
> [Here,
> nr_dentries/page - Number of dentries per page
> nr_pages - Number of pages with given number of dentries
> nr_inuse - Number of inuse dentries in those pages.
> > Eg: From the above data, there are 26 pages with 3 dentries each
> > and out of 78 total dentries in these 26 pages, 9 dentries are in use.]
>
> [root@llm09 bharata]# grep shrinker /proc/slabinfo
> # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> : shrinker stat <nr requested> <nr freed>
> ext3_xattr 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
> dquot 0 0 160 24 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
> inode_cache 1301 1390 400 10 1 : tunables 54 27 8 : slabdata 139 139 0 : shrinker stat 682752 681900
> dentry_cache 82110 114452 152 26 1 : tunables 120 60 8 : slabdata 4402 4402 0 : shrinker stat 1557760 760100
>
> [root@llm09 bharata]# grep slabs_scanned /proc/vmstat
> slabs_scanned 2240512
>
> [root@llm09 bharata]# cat /proc/sys/fs/dentry-state
> 82046 75369 45 0 3599 0
> [The order of dentry-state o/p is like this:
> total dentries in dentry hash list, total dentries in lru list, age limit,
> want_pages, inuse dentries in lru list, dummy]
>
> So, we can see that with low memory pressure, even though the
> shrinker runs on dcache repeatedly, not many dentries are freed
> by dcache. And dcache lru list still has huge number of free
> dentries.
The success/attempt ratio is about 1/2, which seems alright?
* Re: shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough]
2005-10-05 21:25 ` Marcelo Tosatti
@ 2005-10-07 8:12 ` Bharata B Rao
0 siblings, 0 replies; 32+ messages in thread
From: Bharata B Rao @ 2005-10-07 8:12 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Theodore Ts'o, Dipankar Sarma, linux-mm, linux-kernel
On Wed, Oct 05, 2005 at 06:25:51PM -0300, Marcelo Tosatti wrote:
> Hi Bharata,
>
> On Tue, Oct 04, 2005 at 07:06:35PM +0530, Bharata B Rao wrote:
> > Marcelo,
> >
> > Here's my next attempt in breaking the "slabs_scanned" from /proc/vmstat
> > into meaningful per cache statistics. Now I have the statistics counters
> > as percpu. [an issue remaining is that there are more than one cache as
> > part of mbcache and they all have a common shrinker routine and I am
> > displaying the collective shrinker stats info on each of them in
> > /proc/slabinfo ==> some kind of duplication]
>
> Looks good to me! IMO it should be a candidate for -mm/mainline.
>
> Nothing useful to suggest on the mbcache issue... sorry.
Thanks Marcelo for reviewing.
<snip>
> >
> > [root@llm09 bharata]# grep shrinker /proc/slabinfo
> > # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> : shrinker stat <nr requested> <nr freed>
> > ext3_xattr 0 0 48 78 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
> > dquot 0 0 160 24 1 : tunables 120 60 8 : slabdata 0 0 0 : shrinker stat 0 0
> > inode_cache 1301 1390 400 10 1 : tunables 54 27 8 : slabdata 139 139 0 : shrinker stat 682752 681900
> > dentry_cache 82110 114452 152 26 1 : tunables 120 60 8 : slabdata 4402 4402 0 : shrinker stat 1557760 760100
> >
> > [root@llm09 bharata]# grep slabs_scanned /proc/vmstat
> > slabs_scanned 2240512
> >
> > [root@llm09 bharata]# cat /proc/sys/fs/dentry-state
> > 82046 75369 45 0 3599 0
> > [The order of dentry-state o/p is like this:
> > total dentries in dentry hash list, total dentries in lru list, age limit,
> > want_pages, inuse dentries in lru list, dummy]
> >
> > So, we can see that with low memory pressure, even though the
> > shrinker runs on dcache repeatedly, not many dentries are freed
> > by dcache. And dcache lru list still has huge number of free
> > dentries.
>
> The success/attempt ratio is about 1/2, which seems alright?
>
Hmm... when compared to inode_cache, I felt the dcache shrinker wasn't
doing a good job. Anyway, I will analyze further to see if things
can be made better with the existing shrinker.
Regards,
Bharata.
Thread overview: 32+ messages
2005-09-11 10:57 VM balancing issues on 2.6.13: dentry cache not getting shrunk enough Theodore Ts'o
2005-09-11 12:00 ` Dipankar Sarma
2005-09-12 3:16 ` Theodore Ts'o
2005-09-12 6:16 ` Martin J. Bligh
2005-09-12 12:53 ` Bharata B Rao
2005-09-13 8:47 ` Bharata B Rao
2005-09-13 21:59 ` David Chinner
2005-09-14 9:01 ` Andi Kleen
2005-09-14 9:16 ` Manfred Spraul
2005-09-14 9:43 ` Andrew Morton
2005-09-14 9:52 ` Dipankar Sarma
2005-09-14 22:44 ` Theodore Ts'o
2005-09-14 9:35 ` Andrew Morton
2005-09-14 13:57 ` Martin J. Bligh
2005-09-14 15:37 ` Sonny Rao
2005-09-15 7:21 ` Helge Hafting
2005-09-14 22:48 ` David Chinner
2005-09-14 15:48 ` Sonny Rao
2005-09-14 22:02 ` David Chinner
2005-09-14 22:40 ` Sonny Rao
2005-09-15 1:14 ` David Chinner
2005-09-14 21:34 ` Marcelo Tosatti
2005-09-14 21:43 ` Dipankar Sarma
2005-09-15 4:28 ` Bharata B Rao
2005-09-14 23:08 ` Marcelo Tosatti
2005-09-15 9:39 ` Bharata B Rao
2005-09-15 13:29 ` Marcelo Tosatti
2005-10-02 16:32 ` Bharata B Rao
2005-10-02 20:06 ` Marcelo
2005-10-04 13:36 ` shrinkable cache statistics [was Re: VM balancing issues on 2.6.13: dentry cache not getting shrunk enough] Bharata B Rao
2005-10-05 21:25 ` Marcelo Tosatti
2005-10-07 8:12 ` Bharata B Rao