* Re: 2.6.8.1 mempool subsystem sickness
[not found] <091420042058.15928.41475B8000002BA100003E382200763704970A059D0A0306@comcast.net>
@ 2004-09-14 20:32 ` Jeff V. Merkey
2004-09-14 22:59 ` Nick Piggin
0 siblings, 1 reply; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-14 20:32 UTC (permalink / raw)
To: Nick Piggin, linux-kernel, jmerkey
Hi Jeff,
> Can you give us a few more details please? Post the allocation failure
> messages in full, and post /proc/meminfo, etc. Thanks.
> -
>
Here you go.
Jeff
Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure. order:3, mode:0x20
Sep 14 14:18:59 datascout4 kernel: [<80106c7e>] dump_stack+0x1e/0x30
Sep 14 14:18:59 datascout4 kernel: [<80134aac>] __alloc_pages+0x2dc/0x350
Sep 14 14:18:59 datascout4 kernel: [<80134b42>] __get_free_pages+0x22/0x50
Sep 14 14:18:59 datascout4 kernel: [<80137d9f>] kmem_getpages+0x1f/0xd0
Sep 14 14:18:59 datascout4 kernel: [<8013897a>] cache_grow+0x9a/0x130
Sep 14 14:18:59 datascout4 kernel: [<80138b4b>] cache_alloc_refill+0x13b/0x1e0
Sep 14 14:18:59 datascout4 kernel: [<80138fa4>] __kmalloc+0x74/0x80
Sep 14 14:18:59 datascout4 kernel: [<80299298>] alloc_skb+0x48/0xf0
Sep 14 14:18:59 datascout4 kernel: [<f8972e67>] create_xmit_packet+0x57/0x100 [dsfs]
Sep 14 14:18:59 datascout4 kernel: [<f8973150>] regen_data+0x60/0x1d0 [dsfs]
Sep 14 14:18:59 datascout4 kernel: [<80104355>] kernel_thread_helper+0x5/0x10
Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed.
MemTotal: 1944860 kB
MemFree: 200008 kB
Buffers: 133772 kB
Cached: 678268 kB
SwapCached: 208 kB
Active: 435712 kB
Inactive: 436756 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 1944860 kB
LowFree: 200008 kB
SwapTotal: 1052248 kB
SwapFree: 1051976 kB
Dirty: 7960 kB
Writeback: 0 kB
Mapped: 68160 kB
Slab: 863556 kB
Committed_AS: 92220 kB
PageTables: 1344 kB
VmallocTotal: 122804 kB
VmallocUsed: 2872 kB
VmallocChunk: 119856 kB
evdev 7552 0 - Live 0xf8aa3000
ipx 24364 0 - Live 0xf89f5000
appletalk 29748 0 - Live 0xf8ad8000
parport_pc 22464 1 - Live 0xf89e3000
lp 9644 0 - Live 0xf880f000
parport 34376 2 parport_pc,lp, Live 0xf8aab000
autofs4 15492 0 - Live 0xf89f0000
rfcomm 32412 0 - Live 0xf8a96000
l2cap 19460 5 rfcomm, Live 0xf89ea000
bluetooth 39812 4 rfcomm,l2cap, Live 0xf89ce000
sunrpc 126436 1 - Live 0xf8ab8000
e1000 83460 0 - Live 0xf89fc000
sg 33568 0 - Live 0xf89d9000
microcode 5536 0 - Live 0xf8851000
ide_cd 36512 0 - Live 0xf890c000
cdrom 35868 1 ide_cd, Live 0xf8867000
dsfs 269912 1 - Live 0xf896d000
sd_mod 17280 2 - Live 0xf8861000
3w_xxxx 35108 2 - Live 0xf8822000
scsi_mod 103244 3 sg,sd_mod,3w_xxxx, Live 0xf8923000
dm_mod 49788 0 - Live 0xf8871000
orinoco_usb 22440 0 - Live 0xf8847000
firmware_class 7424 1 orinoco_usb, Live 0xf8813000
orinoco 44048 1 orinoco_usb, Live 0xf8855000
uhci_hcd 28944 0 - Live 0xf8819000
usbcore 99300 4 orinoco_usb,uhci_hcd, Live 0xf882d000
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bt_sock 3 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 6 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
ip_fib_hash 11 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
scsi_cmd_cache 160 160 384 10 1 : tunables 54 27 0 : slabdata 16 16 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 217 217 128 31 1 : tunables 120 60 0 : slabdata 7 7 0
dm_tio 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
dm_io 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 1 88 44 88 1 : tunables 120 60 0 : slabdata 1 1 0
dn_fib_info_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
dn_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
dn_neigh_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
decnet_socket_cache 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0
clip_arp_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
fib6_nodes 15 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
ip6_dst_cache 60 105 256 15 1 : tunables 120 60 0 : slabdata 7 7 0
ndisc_cache 5 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
udp6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
tcp6_sock 7 7 1152 7 2 : tunables 24 12 0 : slabdata 1 1 0
unix_sock 50 50 384 10 1 : tunables 54 27 0 : slabdata 5 5 0
ip_mrt_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_bind_bucket 8 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
ip_dst_cache 20 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0
arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 3 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 16 16 1024 4 1 : tunables 54 27 0 : slabdata 4 4 0
flow_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
isofs_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0
ext2_inode_cache 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0
journal_handle 8 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 456 2349 48 81 1 : tunables 120 60 0 : slabdata 29 29 0
revoke_table 4 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 336400 340655 512 7 1 : tunables 54 27 0 : slabdata 48665 48665 0
ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
reiser_inode_cache 0 0 384 10 1 : tunables 54 27 0 : slabdata 0 0 0
dquot 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 1 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
file_lock_cache 2 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0
fasync_cache 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
shmem_inode_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 9 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 200 260 60 65 1 : tunables 120 60 0 : slabdata 4 4 0
blkdev_ioc 35 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 21 27 448 9 1 : tunables 54 27 0 : slabdata 3 3 0
blkdev_requests 252 286 152 26 1 : tunables 120 60 0 : slabdata 11 11 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 131328 131340 256 15 1 : tunables 120 60 0 : slabdata 8756 8756 0
biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 404 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0
bio 131457 131577 64 61 1 : tunables 120 60 0 : slabdata 2157 2157 0
sock_inode_cache 60 60 384 10 1 : tunables 54 27 0 : slabdata 6 6 0
skbuff_head_cache 2130 2370 256 15 1 : tunables 120 60 0 : slabdata 158 158 0
sock 4 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
proc_inode_cache 409 470 384 10 1 : tunables 54 27 0 : slabdata 47 47 0
sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 53400 57358 276 14 1 : tunables 54 27 0 : slabdata 4097 4097 0
bdev_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0
mnt_cache 21 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 3299 3310 384 10 1 : tunables 54 27 0 : slabdata 331 331 0
dentry_cache 198368 254576 140 28 1 : tunables 120 60 0 : slabdata 9092 9092 0
filp 675 675 256 15 1 : tunables 120 60 0 : slabdata 45 45 0
names_cache 4 4 4096 1 1 : tunables 24 12 0 : slabdata 4 4 0
idr_layer_cache 88 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 193153 230931 48 81 1 : tunables 120 60 0 : slabdata 2851 2851 0
mm_struct 70 70 512 7 1 : tunables 54 27 0 : slabdata 10 10 0
vm_area_struct 1726 1786 84 47 1 : tunables 120 60 0 : slabdata 38 38 0
fs_cache 106 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 70 70 512 7 1 : tunables 54 27 0 : slabdata 10 10 0
signal_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0
sighand_cache 90 90 1408 5 2 : tunables 24 12 0 : slabdata 18 18 0
task_struct 95 95 1424 5 2 : tunables 24 12 0 : slabdata 19 19 0
anon_vma 814 814 8 407 1 : tunables 120 60 0 : slabdata 2 2 0
pgd 65 65 4096 1 1 : tunables 24 12 0 : slabdata 65 65 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 2 2 131072 1 32 : tunables 8 4 0 : slabdata 2 2 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 8234 8234 65536 1 16 : tunables 8 4 0 : slabdata 8234 8234 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 7 16 32768 1 8 : tunables 8 4 0 : slabdata 7 16 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 6 6 16384 1 4 : tunables 8 4 0 : slabdata 6 6 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 105 109 8192 1 2 : tunables 8 4 0 : slabdata 105 109 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 2099 2111 4096 1 1 : tunables 24 12 0 : slabdata 2099 2111 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 8276 8276 2048 2 1 : tunables 24 12 0 : slabdata 4138 4138 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 140 140 1024 4 1 : tunables 54 27 0 : slabdata 35 35 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 176 560 512 8 1 : tunables 54 27 0 : slabdata 70 70 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 222 480 256 15 1 : tunables 120 60 0 : slabdata 32 32 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1807 1829 128 31 1 : tunables 120 60 0 : slabdata 59 59 0
size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 7762 8662 64 61 1 : tunables 120 60 0 : slabdata 142 142 0
size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
size-32 15764 16184 32 119 1 : tunables 120 60 0 : slabdata 136 136 0
kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0
nr_dirty 1316
nr_writeback 0
nr_unstable 0
nr_page_table_pages 336
nr_mapped 18227
nr_slab 215890
pgpgin 1496713
pgpgout 1020309568
pswpin 365
pswpout 3752
pgalloc_high 0
pgalloc_normal 1578465026
pgalloc_dma 319286
pgfree 1578828064
pgactivate 592523
pgdeactivate 198222
pgfault 118936465
pgmajfault 1721
pgrefill_high 0
pgrefill_normal 199636
pgrefill_dma 50800
pgsteal_high 0
pgsteal_normal 172811
pgsteal_dma 26364
pgscan_kswapd_high 0
pgscan_kswapd_normal 79497
pgscan_kswapd_dma 140076
pgscan_direct_high 0
pgscan_direct_normal 99231
pgscan_direct_dma 3795
pginodesteal 3837
slabs_scanned 847283
kswapd_steal 101698
kswapd_inodesteal 35700
pageoutrun 282
allocstall 2977
pgrotated 20310
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-14 20:32 ` 2.6.8.1 mempool subsystem sickness Jeff V. Merkey
@ 2004-09-14 22:59 ` Nick Piggin
[not found] ` <20040914223122.GA3325@galt.devicelogics.com>
0 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-14 22:59 UTC (permalink / raw)
To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey
Jeff V. Merkey wrote:
>
> Hi Jeff,
>
>> Can you give us a few more details please? Post the allocation failure
>> messages in full, and post /proc/meminfo, etc. Thanks.
>> -
>>
> Here you go.
>
Thanks.
> Jeff
>
> Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure. order:3, mode:0x20
> Sep 14 14:18:59 datascout4 kernel: [<80106c7e>] dump_stack+0x1e/0x30
> Sep 14 14:18:59 datascout4 kernel: [<80134aac>] __alloc_pages+0x2dc/0x350
> Sep 14 14:18:59 datascout4 kernel: [<80134b42>] __get_free_pages+0x22/0x50
> Sep 14 14:18:59 datascout4 kernel: [<80137d9f>] kmem_getpages+0x1f/0xd0
> Sep 14 14:18:59 datascout4 kernel: [<8013897a>] cache_grow+0x9a/0x130
> Sep 14 14:18:59 datascout4 kernel: [<80138b4b>] cache_alloc_refill+0x13b/0x1e0
> Sep 14 14:18:59 datascout4 kernel: [<80138fa4>] __kmalloc+0x74/0x80
> Sep 14 14:18:59 datascout4 kernel: [<80299298>] alloc_skb+0x48/0xf0
> Sep 14 14:18:59 datascout4 kernel: [<f8972e67>] create_xmit_packet+0x57/0x100 [dsfs]
> Sep 14 14:18:59 datascout4 kernel: [<f8973150>] regen_data+0x60/0x1d0 [dsfs]
> Sep 14 14:18:59 datascout4 kernel: [<80104355>] kernel_thread_helper+0x5/0x10
> Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed.
>
> MemTotal: 1944860 kB
> MemFree: 200008 kB
Wow. You have 200MB free, and can't satisfy an order 3 allocation (although it
is atomic).
Now it just so happens that I have a couple of patches that are supposed to fix
exactly this. Unfortunately I haven't had the hardware to properly test them
(they're pretty stable though). Want to give them a try? :)
* Re: 2.6.8.1 mempool subsystem sickness
[not found] ` <20040914223122.GA3325@galt.devicelogics.com>
@ 2004-09-14 23:51 ` Nick Piggin
2004-09-15 0:51 ` Gene Heskett
2004-09-15 17:27 ` Jeff V. Merkey
0 siblings, 2 replies; 10+ messages in thread
From: Nick Piggin @ 2004-09-14 23:51 UTC (permalink / raw)
To: jmerkey; +Cc: Jeff V. Merkey, linux-kernel, jmerkey
[-- Attachment #1: Type: text/plain, Size: 199 bytes --]
jmerkey@galt.devicelogics.com wrote:
> You bet. Send them to me. For some reason I am not able to post
> to LKML again.
>
> Jeff
>
OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
[-- Attachment #2: vm-rollup.patch --]
[-- Type: text/x-patch, Size: 9996 bytes --]
---
linux-2.6-npiggin/include/linux/mmzone.h | 8 ++
linux-2.6-npiggin/mm/page_alloc.c | 83 ++++++++++++++++++-------------
linux-2.6-npiggin/mm/vmscan.c | 34 +++++++++---
3 files changed, 81 insertions(+), 44 deletions(-)
diff -puN mm/page_alloc.c~vm-rollup mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-rollup 2004-09-15 09:48:12.000000000 +1000
+++ linux-2.6-npiggin/mm/page_alloc.c 2004-09-15 09:48:59.000000000 +1000
@@ -206,6 +206,7 @@ static inline void __free_pages_bulk (st
BUG_ON(bad_range(zone, buddy1));
BUG_ON(bad_range(zone, buddy2));
list_del(&buddy1->lru);
+ area->nr_free--;
mask <<= 1;
order++;
area++;
@@ -213,6 +214,7 @@ static inline void __free_pages_bulk (st
page_idx &= mask;
}
list_add(&(base + page_idx)->lru, &area->free_list);
+ area->nr_free++;
}
static inline void free_pages_check(const char *function, struct page *page)
@@ -314,6 +316,7 @@ expand(struct zone *zone, struct page *p
size >>= 1;
BUG_ON(bad_range(zone, &page[size]));
list_add(&page[size].lru, &area->free_list);
+ area->nr_free++;
MARK_USED(index + size, high, area);
}
return page;
@@ -377,6 +380,7 @@ static struct page *__rmqueue(struct zon
page = list_entry(area->free_list.next, struct page, lru);
list_del(&page->lru);
+ area->nr_free--;
index = page - zone->zone_mem_map;
if (current_order != MAX_ORDER-1)
MARK_USED(index, current_order, area);
@@ -579,6 +583,36 @@ buffered_rmqueue(struct zone *zone, int
}
/*
+ * Return 1 if free pages are above 'mark'. This takes into account the order
+ * of the allocation.
+ */
+int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+ int alloc_type, int can_try_harder, int gfp_high)
+{
+ unsigned long min = mark, free_pages = z->free_pages;
+ int o;
+
+ if (gfp_high)
+ min -= min / 2;
+ if (can_try_harder)
+ min -= min / 4;
+
+ if (free_pages < min + z->protection[alloc_type])
+ return 0;
+ for (o = 0; o < order; o++) {
+ /* At the next order, this order's pages become unavailable */
+ free_pages -= z->free_area[o].nr_free << o;
+
+ /* Require fewer higher order pages to be free */
+ min >>= 1;
+
+ if (free_pages < min + (1 << order) - 1)
+ return 0;
+ }
+ return 1;
+}
+
+/*
* This is the 'heart' of the zoned buddy allocator.
*
* Herein lies the mysterious "incremental min". That's the
@@ -599,7 +633,6 @@ __alloc_pages(unsigned int gfp_mask, uns
struct zonelist *zonelist)
{
const int wait = gfp_mask & __GFP_WAIT;
- unsigned long min;
struct zone **zones, *z;
struct page *page;
struct reclaim_state reclaim_state;
@@ -629,9 +662,9 @@ __alloc_pages(unsigned int gfp_mask, uns
/* Go through the zonelist once, looking for a zone with enough free */
for (i = 0; (z = zones[i]) != NULL; i++) {
- min = z->pages_low + (1<<order) + z->protection[alloc_type];
- if (z->free_pages < min)
+ if (!zone_watermark_ok(z, order, z->pages_low,
+ alloc_type, 0, 0))
continue;
page = buffered_rmqueue(z, order, gfp_mask);
@@ -640,21 +673,16 @@ __alloc_pages(unsigned int gfp_mask, uns
}
for (i = 0; (z = zones[i]) != NULL; i++)
- wakeup_kswapd(z);
+ wakeup_kswapd(z, order);
/*
* Go through the zonelist again. Let __GFP_HIGH and allocations
* coming from realtime tasks to go deeper into reserves
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
- min = z->pages_min;
- if (gfp_mask & __GFP_HIGH)
- min /= 2;
- if (can_try_harder)
- min -= min / 4;
- min += (1<<order) + z->protection[alloc_type];
-
- if (z->free_pages < min)
+ if (!zone_watermark_ok(z, order, z->pages_min,
+ alloc_type, can_try_harder,
+ gfp_mask & __GFP_HIGH))
continue;
page = buffered_rmqueue(z, order, gfp_mask);
@@ -690,14 +718,9 @@ rebalance:
/* go through the zonelist yet one more time */
for (i = 0; (z = zones[i]) != NULL; i++) {
- min = z->pages_min;
- if (gfp_mask & __GFP_HIGH)
- min /= 2;
- if (can_try_harder)
- min -= min / 4;
- min += (1<<order) + z->protection[alloc_type];
-
- if (z->free_pages < min)
+ if (!zone_watermark_ok(z, order, z->pages_min,
+ alloc_type, can_try_harder,
+ gfp_mask & __GFP_HIGH))
continue;
page = buffered_rmqueue(z, order, gfp_mask);
@@ -1117,7 +1140,6 @@ void show_free_areas(void)
}
for_each_zone(zone) {
- struct list_head *elem;
unsigned long nr, flags, order, total = 0;
show_node(zone);
@@ -1129,9 +1151,7 @@ void show_free_areas(void)
spin_lock_irqsave(&zone->lock, flags);
for (order = 0; order < MAX_ORDER; order++) {
- nr = 0;
- list_for_each(elem, &zone->free_area[order].free_list)
- ++nr;
+ nr = zone->free_area[order].nr_free;
total += nr << order;
printk("%lu*%lukB ", nr, K(1UL) << order);
}
@@ -1457,6 +1477,7 @@ void zone_init_free_lists(struct pglist_
bitmap_size = pages_to_bitmap_size(order, size);
zone->free_area[order].map =
(unsigned long *) alloc_bootmem_node(pgdat, bitmap_size);
+ zone->free_area[order].nr_free = 0;
}
}
@@ -1481,6 +1502,7 @@ static void __init free_area_init_core(s
pgdat->nr_zones = 0;
init_waitqueue_head(&pgdat->kswapd_wait);
+ pgdat->kswapd_max_order = 0;
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
@@ -1644,8 +1666,7 @@ static void frag_stop(struct seq_file *m
}
/*
- * This walks the freelist for each zone. Whilst this is slow, I'd rather
- * be slow here than slow down the fast path by keeping stats - mjbligh
+ * This walks the free areas for each zone.
*/
static int frag_show(struct seq_file *m, void *arg)
{
@@ -1661,14 +1682,8 @@ static int frag_show(struct seq_file *m,
spin_lock_irqsave(&zone->lock, flags);
seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
- for (order = 0; order < MAX_ORDER; ++order) {
- unsigned long nr_bufs = 0;
- struct list_head *elem;
-
- list_for_each(elem, &(zone->free_area[order].free_list))
- ++nr_bufs;
- seq_printf(m, "%6lu ", nr_bufs);
- }
+ for (order = 0; order < MAX_ORDER; ++order)
+ seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
spin_unlock_irqrestore(&zone->lock, flags);
seq_putc(m, '\n');
}
diff -puN include/linux/mmzone.h~vm-rollup include/linux/mmzone.h
--- linux-2.6/include/linux/mmzone.h~vm-rollup 2004-09-15 09:48:16.000000000 +1000
+++ linux-2.6-npiggin/include/linux/mmzone.h 2004-09-15 09:48:59.000000000 +1000
@@ -23,6 +23,7 @@
struct free_area {
struct list_head free_list;
unsigned long *map;
+ unsigned long nr_free;
};
struct pglist_data;
@@ -262,8 +263,9 @@ typedef struct pglist_data {
range, including holes */
int node_id;
struct pglist_data *pgdat_next;
- wait_queue_head_t kswapd_wait;
+ wait_queue_head_t kswapd_wait;
struct task_struct *kswapd;
+ int kswapd_max_order;
} pg_data_t;
#define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages)
@@ -277,7 +279,9 @@ void __get_zone_counts(unsigned long *ac
void get_zone_counts(unsigned long *active, unsigned long *inactive,
unsigned long *free);
void build_all_zonelists(void);
-void wakeup_kswapd(struct zone *zone);
+void wakeup_kswapd(struct zone *zone, int order);
+int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+ int alloc_type, int can_try_harder, int gfp_high);
/*
* zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
diff -puN mm/vmscan.c~vm-rollup mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-rollup 2004-09-15 09:48:18.000000000 +1000
+++ linux-2.6-npiggin/mm/vmscan.c 2004-09-15 09:49:31.000000000 +1000
@@ -965,7 +965,7 @@ out:
* the page allocator fallback scheme to ensure that aging of pages is balanced
* across the zones.
*/
-static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
+static int balance_pgdat(pg_data_t *pgdat, int nr_pages, int order)
{
int to_free = nr_pages;
int priority;
@@ -1003,7 +1003,8 @@ static int balance_pgdat(pg_data_t *pgda
priority != DEF_PRIORITY)
continue;
- if (zone->free_pages <= zone->pages_high) {
+ if (!zone_watermark_ok(zone, order,
+ zone->pages_high, 0, 0, 0)) {
end_zone = i;
goto scan;
}
@@ -1035,7 +1036,8 @@ scan:
continue;
if (nr_pages == 0) { /* Not software suspend */
- if (zone->free_pages <= zone->pages_high)
+ if (!zone_watermark_ok(zone, order,
+ zone->pages_high, end_zone, 0, 0))
all_zones_ok = 0;
}
zone->temp_priority = priority;
@@ -1126,13 +1128,26 @@ static int kswapd(void *p)
tsk->flags |= PF_MEMALLOC|PF_KSWAPD;
for ( ; ; ) {
+ unsigned long order = 0, new_order;
if (current->flags & PF_FREEZE)
refrigerator(PF_FREEZE);
+
prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
- schedule();
+ new_order = pgdat->kswapd_max_order;
+ pgdat->kswapd_max_order = 0;
+ if (order < new_order) {
+ /*
+ * Don't sleep if someone wants a larger 'order'
+ * allocation
+ */
+ order = new_order;
+ } else {
+ schedule();
+ order = pgdat->kswapd_max_order;
+ }
finish_wait(&pgdat->kswapd_wait, &wait);
- balance_pgdat(pgdat, 0);
+ balance_pgdat(pgdat, 0, order);
}
return 0;
}
@@ -1140,10 +1155,13 @@ static int kswapd(void *p)
/*
* A zone is low on free memory, so wake its kswapd task to service it.
*/
-void wakeup_kswapd(struct zone *zone)
+void wakeup_kswapd(struct zone *zone, int order)
{
- if (zone->free_pages > zone->pages_low)
+ pg_data_t *pgdat = zone->zone_pgdat;
+
+ if (pgdat->kswapd_max_order < order)
return;
+ pgdat->kswapd_max_order = order;
if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
return;
wake_up_interruptible(&zone->zone_pgdat->kswapd_wait);
@@ -1166,7 +1184,7 @@ int shrink_all_memory(int nr_pages)
current->reclaim_state = &reclaim_state;
for_each_pgdat(pgdat) {
int freed;
- freed = balance_pgdat(pgdat, nr_to_free);
+ freed = balance_pgdat(pgdat, nr_to_free, 0);
ret += freed;
nr_to_free -= freed;
if (nr_to_free <= 0)
_
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-14 23:51 ` Nick Piggin
@ 2004-09-15 0:51 ` Gene Heskett
2004-09-15 17:27 ` Jeff V. Merkey
1 sibling, 0 replies; 10+ messages in thread
From: Gene Heskett @ 2004-09-15 0:51 UTC (permalink / raw)
To: linux-kernel; +Cc: Nick Piggin, jmerkey, Jeff V. Merkey, jmerkey
On Tuesday 14 September 2004 19:51, Nick Piggin wrote:
>jmerkey@galt.devicelogics.com wrote:
>> You bet. Send them to me. For some reason I am not able to post
>> to LKML again.
>>
>> Jeff
>
>OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
Humm, it came thru the list just fine, Nick.
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-14 23:51 ` Nick Piggin
2004-09-15 0:51 ` Gene Heskett
@ 2004-09-15 17:27 ` Jeff V. Merkey
2004-09-15 17:33 ` Jeff V. Merkey
2004-09-16 1:46 ` Nick Piggin
1 sibling, 2 replies; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 17:27 UTC (permalink / raw)
To: Nick Piggin; +Cc: jmerkey, linux-kernel, jmerkey
[-- Attachment #1: Type: text/plain, Size: 767 bytes --]
Nick Piggin wrote:
> jmerkey@galt.devicelogics.com wrote:
>
>> You bet. Send them to me. For some reason I am not able to post to
>> LKML again.
>>
>> Jeff
>>
> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>
>
>
Nick,
The problem is corrected with this patch. I am running with 3GB of kernel
memory and 1GB of user space (with the user-space splitting patch) under very
heavy swapping and user-space app activity, and have seen no failed
allocations. This patch should be rolled into 2.6.9-rc2, since it fixes the
problem. With the standard 3GB user / 1GB kernel address space, it also fixes
the problems with the X server running out of memory and apps crashing.
Jeff
Here are the stats from the test of the patch against 2.6.9-rc2:
[-- Attachment #2: proc.meminfo --]
[-- Type: text/plain, Size: 0 bytes --]
[-- Attachment #3: proc.vmstat --]
[-- Type: text/plain, Size: 0 bytes --]
[-- Attachment #4: proc.slabinfo --]
[-- Type: text/plain, Size: 0 bytes --]
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-15 17:27 ` Jeff V. Merkey
@ 2004-09-15 17:33 ` Jeff V. Merkey
2004-09-16 1:46 ` Nick Piggin
1 sibling, 0 replies; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 17:33 UTC (permalink / raw)
To: Jeff V. Merkey; +Cc: Nick Piggin, jmerkey, linux-kernel, jmerkey
[-- Attachment #1: Type: text/plain, Size: 871 bytes --]
Jeff V. Merkey wrote:
> Nick Piggin wrote:
>
>> jmerkey@galt.devicelogics.com wrote:
>>
>>> You bet. Send them to me. For some reason I am not able to post to
>>> LKML again.
>>>
>>> Jeff
>>>
>> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>>
>>
>>
>
> Nick,
>
> The problem is corrected with this patch. I am running with 3GB of
> kernel memory
> and 1GB user space with the userspace splitting patch with very heavy
> swapping
> and user space app activity and no failed allocations. This patch
> should be rolled
> into 2.6.9-rc2 since it fixes the problem. With standard 3GB User/1GB
> kernel
> address space, it also fixes the problems with X server running out of
> memory
> and the apps crashing.
>
> Jeff
>
> Here's the stats from the test of the patch against 2.6.8-rc2 with the
> patch applied
>
>
Attachments included.
Jeff
[-- Attachment #2: proc.meminfo --]
[-- Type: text/plain, Size: 572 bytes --]
MemTotal: 2983616 kB
MemFree: 576608 kB
Buffers: 42116 kB
Cached: 86000 kB
SwapCached: 2364 kB
Active: 133756 kB
Inactive: 25340 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 2983616 kB
LowFree: 576608 kB
SwapTotal: 1052248 kB
SwapFree: 1011856 kB
Dirty: 4136 kB
Writeback: 0 kB
Mapped: 35872 kB
Slab: 2239264 kB
Committed_AS: 97816 kB
PageTables: 1076 kB
VmallocTotal: 122824 kB
VmallocUsed: 2896 kB
VmallocChunk: 119604 kB
[-- Attachment #3: proc.vmstat --]
[-- Type: text/plain, Size: 704 bytes --]
nr_dirty 1070
nr_writeback 0
nr_unstable 0
nr_page_table_pages 257
nr_mapped 5452
nr_slab 559846
pgpgin 259093865
pgpgout 68682338
pswpin 59363
pswpout 292378
pgalloc_high 0
pgalloc_normal 30770083
pgalloc_dma 2951033
pgfree 33868082
pgactivate 1505831
pgdeactivate 1709234
pgfault 64727816
pgmajfault 15099
pgrefill_high 0
pgrefill_normal 1685663
pgrefill_dma 1153475
pgsteal_high 0
pgsteal_normal 1043923
pgsteal_dma 424170
pgscan_kswapd_high 0
pgscan_kswapd_normal 1209615
pgscan_kswapd_dma 1983944
pgscan_direct_high 0
pgscan_direct_normal 206712
pgscan_direct_dma 9603
pginodesteal 11
slabs_scanned 342016
kswapd_steal 1298184
kswapd_inodesteal 9310
pageoutrun 2291
allocstall 4830
pgrotated 446422
[-- Attachment #4: proc.slabinfo --]
[-- Type: text/plain, Size: 13583 bytes --]
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bt_sock 3 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 6 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
ip_fib_hash 11 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
scsi_cmd_cache 52 110 384 10 1 : tunables 54 27 0 : slabdata 8 11 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 97 217 128 31 1 : tunables 120 60 0 : slabdata 6 7 0
dm_tio 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
dm_io 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
dn_fib_info_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
dn_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
dn_neigh_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
decnet_socket_cache 0 0 768 5 1 : tunables 54 27 0 : slabdata 0 0 0
clip_arp_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
fib6_nodes 13 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
ip6_dst_cache 51 90 256 15 1 : tunables 120 60 0 : slabdata 6 6 0
ndisc_cache 5 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
udp6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
tcp6_sock 5 7 1152 7 2 : tunables 24 12 0 : slabdata 1 1 0
unix_sock 50 50 384 10 1 : tunables 54 27 0 : slabdata 5 5 0
ip_mrt_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_bind_bucket 8 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 1 61 64 61 1 : tunables 120 60 0 : slabdata 1 1 0
secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
ip_dst_cache 23 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0
arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 512 7 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 3 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 20 20 1024 4 1 : tunables 54 27 0 : slabdata 5 5 0
flow_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 0 0 340 11 1 : tunables 54 27 0 : slabdata 0 0 0
ext2_inode_cache 0 0 400 10 1 : tunables 54 27 0 : slabdata 0 0 0
journal_handle 16 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 521 891 48 81 1 : tunables 120 60 0 : slabdata 11 11 0
revoke_table 4 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 7263 7263 440 9 1 : tunables 54 27 0 : slabdata 807 807 0
ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
reiser_inode_cache 0 0 368 11 1 : tunables 54 27 0 : slabdata 0 0 0
dquot 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 1 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
fasync_cache 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
shmem_inode_cache 4 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 9 61 64 61 1 : tunables 120 60 0 : slabdata 1 1 0
cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 135 195 60 65 1 : tunables 120 60 0 : slabdata 3 3 0
blkdev_ioc 109 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 21 24 452 8 1 : tunables 54 27 0 : slabdata 3 3 0
blkdev_requests 122 182 152 26 1 : tunables 120 60 0 : slabdata 7 7 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 257 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 131340 131340 256 15 1 : tunables 120 60 0 : slabdata 8756 8756 0
biovec-4 305 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 368 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0
bio 131396 131516 64 61 1 : tunables 120 60 0 : slabdata 2156 2156 0
file_lock_cache 6 45 88 45 1 : tunables 120 60 0 : slabdata 1 1 0
sock_inode_cache 100 100 384 10 1 : tunables 54 27 0 : slabdata 10 10 0
skbuff_head_cache 2130 2130 256 15 1 : tunables 120 60 0 : slabdata 142 142 0
sock 30 30 384 10 1 : tunables 54 27 0 : slabdata 3 3 0
proc_inode_cache 435 481 308 13 1 : tunables 54 27 0 : slabdata 37 37 0
sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 7378 7378 276 14 1 : tunables 54 27 0 : slabdata 527 527 0
bdev_cache 8 14 512 7 1 : tunables 54 27 0 : slabdata 2 2 0
mnt_cache 21 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 3406 3406 292 13 1 : tunables 54 27 0 : slabdata 262 262 0
dentry_cache 15148 15148 140 28 1 : tunables 120 60 0 : slabdata 541 541 0
filp 705 945 256 15 1 : tunables 120 60 0 : slabdata 63 63 0
names_cache 19 19 4096 1 1 : tunables 24 12 0 : slabdata 19 19 0
idr_layer_cache 100 116 136 29 1 : tunables 120 60 0 : slabdata 4 4 0
buffer_head 18041 67230 48 81 1 : tunables 120 60 0 : slabdata 830 830 0
mm_struct 102 102 640 6 1 : tunables 54 27 0 : slabdata 17 17 0
vm_area_struct 1831 2256 84 47 1 : tunables 120 60 0 : slabdata 48 48 0
fs_cache 119 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0
signal_cache 155 155 128 31 1 : tunables 120 60 0 : slabdata 5 5 0
sighand_cache 89 115 1408 5 2 : tunables 24 12 0 : slabdata 23 23 0
task_struct 93 114 1360 3 1 : tunables 24 12 0 : slabdata 38 38 0
anon_vma 921 1221 8 407 1 : tunables 120 60 0 : slabdata 3 3 0
pgd 73 73 4096 1 1 : tunables 24 12 0 : slabdata 73 73 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 2 2 131072 1 32 : tunables 8 4 0 : slabdata 2 2 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 32834 32834 65536 1 16 : tunables 8 4 0 : slabdata 32834 32834 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 1 1 32768 1 8 : tunables 8 4 0 : slabdata 1 1 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 2 2 16384 1 4 : tunables 8 4 0 : slabdata 2 2 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 124 128 8192 1 2 : tunables 8 4 0 : slabdata 124 128 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 2095 2095 4096 1 1 : tunables 24 12 0 : slabdata 2095 2095 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 32867 32868 2048 2 1 : tunables 24 12 0 : slabdata 16434 16434 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 141 144 1024 4 1 : tunables 54 27 0 : slabdata 36 36 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 200 560 512 8 1 : tunables 54 27 0 : slabdata 70 70 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 189 480 256 15 1 : tunables 120 60 0 : slabdata 32 32 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 2181 2263 128 31 1 : tunables 120 60 0 : slabdata 73 73 0
size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 921 1281 64 61 1 : tunables 120 60 0 : slabdata 21 21 0
size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
size-32 52715 52955 32 119 1 : tunables 120 60 0 : slabdata 445 445 0
kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-15 17:27 ` Jeff V. Merkey
2004-09-15 17:33 ` Jeff V. Merkey
@ 2004-09-16 1:46 ` Nick Piggin
2004-09-16 5:56 ` Jens Axboe
1 sibling, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-16 1:46 UTC (permalink / raw)
To: Jeff V. Merkey; +Cc: jmerkey, linux-kernel, jmerkey
Jeff V. Merkey wrote:
> Nick Piggin wrote:
>> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>>
>>
>>
>
> Nick,
>
> The problem is corrected with this patch. I am running with 3GB of
> kernel memory and 1GB of user space with the userspace splitting
> patch, with very heavy swapping and user space app activity, and no
> failed allocations. This patch should be rolled into 2.6.9-rc2 since
> it fixes the problem. With the standard 3GB user/1GB kernel address
> space split, it also fixes the problems with the X server running out
> of memory and apps crashing.
>
Hi Jeff,
Thanks, that is very cool. The memory problems you're seeing aren't
actually a regression (it's always been like that), and I still haven't
got hold of any gigabit networking hardware to test this thoroughly, so
it may be difficult to get it into 2.6.9. Hopefully soon, though.
I can provide you (or anyone) with up to date patches on request though,
so just let me know.
> Jeff
>
> Here are the stats from the test against 2.6.8-rc2 with the patch
> applied.
>
>
Scanning stats look good at a quick glance. kswapd doesn't seem to be
going crazy.
However,
size-65536 32834 32834 65536 1 16
This slab entry is taking up about 2GB of unreclaimable memory (order-4,
no less). This must be a leak... does the number continue to rise as
your system runs?
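Nick's 2GB figure can be checked directly against the dump earlier in
the thread. In the 2.6-era /proc/slabinfo layout, each data line gives
the cache name, active/total object counts, object size, objects per
slab, and pages per slab, with the total slab count as the second-to-last
field; the footprint is num_slabs * pages_per_slab * page size. A
minimal sketch (assuming 4 KiB pages):

```python
# Rough slabinfo footprint calculator (assumes 4 KiB pages and the
# 2.6-era /proc/slabinfo column layout described above).
PAGE_SIZE = 4096

def cache_bytes(line):
    """Return (name, total bytes) for one /proc/slabinfo data line."""
    fields = line.split()
    name = fields[0]
    pages_per_slab = int(fields[5])   # pages backing each slab
    num_slabs = int(fields[-2])       # total slabs in the cache
    return name, num_slabs * pages_per_slab * PAGE_SIZE

line = ("size-65536 32834 32834 65536 1 16 : tunables 8 4 0 "
        ": slabdata 32834 32834 0")
name, nbytes = cache_bytes(line)
print(name, nbytes // (1024 * 1024), "MiB")  # size-65536 2052 MiB
```

That is roughly 2 GiB pinned in a single cache of order-4 allocations,
matching the estimate above.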
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-16 1:46 ` Nick Piggin
@ 2004-09-16 5:56 ` Jens Axboe
0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2004-09-16 5:56 UTC (permalink / raw)
To: Nick Piggin; +Cc: Jeff V. Merkey, jmerkey, linux-kernel, jmerkey
On Thu, Sep 16 2004, Nick Piggin wrote:
> Jeff V. Merkey wrote:
> >Jeff
> >
> >Here's the stats from the test of the patch against 2.6.8-rc2 with the
> >patch applied
> >
> >
>
> Scanning stats look good at a quick glance. kswapd doesn't seem to be
> going crazy.
>
> However,
> size-65536 32834 32834 65536 1 16
>
> This slab entry is taking up about 2GB of unreclaimable memory (order-4,
> no less). This must be a leak... does the number continue to rise as
> your system runs?
There's also a huge number of 16-page bio + vecs in flight:
biovec-16 131340
That would point to a leak as well, most likely.
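A back-of-envelope check of those counts (using the object sizes from
the slabinfo dump earlier in the thread, and assuming 4 KiB pages) shows
why the count, more than the slab footprint itself, is alarming:

```python
# Figures taken from the slabinfo dump earlier in the thread.
biovec16_active, biovec16_objsize = 131340, 256  # biovec-16 cache
bio_active, bio_objsize = 131396, 64             # bio cache

# Slab memory held by the objects themselves is modest:
print(biovec16_active * biovec16_objsize // 2**20, "MiB in biovec-16")  # 32
print(bio_active * bio_objsize // 2**20, "MiB in bio")                  # 8

# But each biovec-16 can map up to 16 pages of data nominally in flight:
print(biovec16_active * 16 * 4096 // 2**30, "GiB of I/O 'in flight'")   # 8
```

Over 130,000 bios simultaneously in flight is not a plausible steady
state, which is why it reads as a leak rather than load.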
--
Jens Axboe
* Re: 2.6.8.1 mempool subsystem sickness
2004-09-08 16:48 jmerkey
@ 2004-09-08 23:05 ` Nick Piggin
0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2004-09-08 23:05 UTC (permalink / raw)
To: jmerkey; +Cc: linux-kernel, jmerkey
jmerkey@comcast.net wrote:
> On a system with 4GB of memory, and without the user space patch that
> splits user space, just a stock kernel, I am seeing memory allocation
> failures with the X server and simple apps on a machine with a
> Pentium 4 processor and 500MB of memory.
>
> If you load large apps and do a lot of skb traffic, the mempool and
> slab caches start gobbling up pages and don't seem to balance them
> very well, resulting in memory allocation failures over time if the
> system stays up for a week or more.
>
> I am also seeing the same behavior on another system which has been
> running for almost 30 days with an skb-based traffic regeneration
> test calling and sending skb's in kernel between two interfaces.
>
> The pages over time get stuck in the slab allocator and user space
> apps start to fail on alloc requests.
>
> Rebooting the system clears the problem, which slowly comes back over
> time. I am seeing this with stock kernels from kernel.org and on
> kernels I have patched, so the problem seems to be in the base code.
> I have spent the last two weeks observing the problem to verify I can
> reproduce it, and it keeps happening.
>
> Jeff
>
Hi Jeff,
Can you give us a few more details please? Post the allocation failure
messages in full, and post /proc/meminfo, etc. Thanks.
* 2.6.8.1 mempool subsystem sickness
@ 2004-09-08 16:48 jmerkey
2004-09-08 23:05 ` Nick Piggin
0 siblings, 1 reply; 10+ messages in thread
From: jmerkey @ 2004-09-08 16:48 UTC (permalink / raw)
To: linux-kernel; +Cc: jmerkey
On a system with 4GB of memory, and without the user space patch that
splits user space, just a stock kernel, I am seeing memory allocation
failures with the X server and simple apps on a machine with a Pentium 4
processor and 500MB of memory.

If you load large apps and do a lot of skb traffic, the mempool and slab
caches start gobbling up pages and don't seem to balance them very well,
resulting in memory allocation failures over time if the system stays up
for a week or more.

I am also seeing the same behavior on another system which has been
running for almost 30 days with an skb-based traffic regeneration test
calling and sending skb's in kernel between two interfaces.

The pages over time get stuck in the slab allocator and user space apps
start to fail on alloc requests.

Rebooting the system clears the problem, which slowly comes back over
time. I am seeing this with stock kernels from kernel.org and on kernels
I have patched, so the problem seems to be in the base code. I have
spent the last two weeks observing the problem to verify I can reproduce
it, and it keeps happening.

Jeff
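The slow-growth symptom described above can be confirmed numerically by
diffing /proc/slabinfo snapshots taken some time apart. A minimal sketch
(hypothetical helper, not from the thread; it assumes the cache name is
the first field and the active object count the second, as in the
2.6-era format):

```python
# Hypothetical leak-spotting sketch: report caches whose active-object
# count grew between two /proc/slabinfo snapshots.
def active_objs(snapshot):
    """Map cache name -> active object count from slabinfo-style text."""
    counts = {}
    for line in snapshot.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            counts[fields[0]] = int(fields[1])
    return counts

def growing_caches(before, after, threshold=1000):
    """Return caches that grew by at least `threshold` active objects."""
    b, a = active_objs(before), active_objs(after)
    return {name: a[name] - b.get(name, 0)
            for name in a if a[name] - b.get(name, 0) >= threshold}

snap1 = ("size-65536 120 120 65536 1 16 : tunables 8 4 0 "
         ": slabdata 120 120 0")
snap2 = ("size-65536 32834 32834 65536 1 16 : tunables 8 4 0 "
         ": slabdata 32834 32834 0")
print(growing_caches(snap1, snap2))  # {'size-65536': 32714}
```

In practice you would capture open('/proc/slabinfo').read() an hour or a
day apart; a cache whose active count only ever rises, as size-65536
does here, is the leak candidate.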