* Re: 2.6.8.1 mempool subsystem sickness
       [not found] <091420042058.15928.41475B8000002BA100003E382200763704970A059D0A0306@comcast.net>
@ 2004-09-14 20:32 ` Jeff V. Merkey
  2004-09-14 22:59   ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-14 20:32 UTC (permalink / raw)
  To: Nick Piggin, linux-kernel, jmerkey


Hi Jeff,

> Can you give us a few more details please? Post the allocation failure
> messages in full, and post /proc/meminfo, etc. Thanks.
> -
>
Here you go.

Jeff

Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure. order:3, mode:0x20
Sep 14 14:18:59 datascout4 kernel:  [<80106c7e>] dump_stack+0x1e/0x30
Sep 14 14:18:59 datascout4 kernel:  [<80134aac>] __alloc_pages+0x2dc/0x350
Sep 14 14:18:59 datascout4 kernel:  [<80134b42>] __get_free_pages+0x22/0x50
Sep 14 14:18:59 datascout4 kernel:  [<80137d9f>] kmem_getpages+0x1f/0xd0
Sep 14 14:18:59 datascout4 kernel:  [<8013897a>] cache_grow+0x9a/0x130
Sep 14 14:18:59 datascout4 kernel:  [<80138b4b>] cache_alloc_refill+0x13b/0x1e0
Sep 14 14:18:59 datascout4 kernel:  [<80138fa4>] __kmalloc+0x74/0x80
Sep 14 14:18:59 datascout4 kernel:  [<80299298>] alloc_skb+0x48/0xf0
Sep 14 14:18:59 datascout4 kernel:  [<f8972e67>] create_xmit_packet+0x57/0x100 [dsfs]
Sep 14 14:18:59 datascout4 kernel:  [<f8973150>] regen_data+0x60/0x1d0 [dsfs]
Sep 14 14:18:59 datascout4 kernel:  [<80104355>] kernel_thread_helper+0x5/0x10
Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed.

MemTotal:      1944860 kB
MemFree:        200008 kB
Buffers:        133772 kB
Cached:         678268 kB
SwapCached:        208 kB
Active:         435712 kB
Inactive:       436756 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      1944860 kB
LowFree:        200008 kB
SwapTotal:     1052248 kB
SwapFree:      1051976 kB
Dirty:            7960 kB
Writeback:           0 kB
Mapped:          68160 kB
Slab:           863556 kB
Committed_AS:    92220 kB
PageTables:       1344 kB
VmallocTotal:   122804 kB
VmallocUsed:      2872 kB
VmallocChunk:   119856 kB

evdev 7552 0 - Live 0xf8aa3000
ipx 24364 0 - Live 0xf89f5000
appletalk 29748 0 - Live 0xf8ad8000
parport_pc 22464 1 - Live 0xf89e3000
lp 9644 0 - Live 0xf880f000
parport 34376 2 parport_pc,lp, Live 0xf8aab000
autofs4 15492 0 - Live 0xf89f0000
rfcomm 32412 0 - Live 0xf8a96000
l2cap 19460 5 rfcomm, Live 0xf89ea000
bluetooth 39812 4 rfcomm,l2cap, Live 0xf89ce000
sunrpc 126436 1 - Live 0xf8ab8000
e1000 83460 0 - Live 0xf89fc000
sg 33568 0 - Live 0xf89d9000
microcode 5536 0 - Live 0xf8851000
ide_cd 36512 0 - Live 0xf890c000
cdrom 35868 1 ide_cd, Live 0xf8867000
dsfs 269912 1 - Live 0xf896d000
sd_mod 17280 2 - Live 0xf8861000
3w_xxxx 35108 2 - Live 0xf8822000
scsi_mod 103244 3 sg,sd_mod,3w_xxxx, Live 0xf8923000
dm_mod 49788 0 - Live 0xf8871000
orinoco_usb 22440 0 - Live 0xf8847000
firmware_class 7424 1 orinoco_usb, Live 0xf8813000
orinoco 44048 1 orinoco_usb, Live 0xf8855000
uhci_hcd 28944 0 - Live 0xf8819000
usbcore 99300 4 orinoco_usb,uhci_hcd, Live 0xf882d000

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> 
: tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> 
<num_slabs> <sharedavail>
bt_sock                3     10    384   10    1 : tunables   54   27    0 : 
slabdata      1      1      0
rpc_buffers            8      8   2048    2    1 : tunables   24   12    0 : 
slabdata      4      4      0
rpc_tasks              8     15    256   15    1 : tunables  120   60    0 : 
slabdata      1      1      0
rpc_inode_cache        6      7    512    7    1 : tunables   54   27    0 : 
slabdata      1      1      0
ip_fib_hash           11    226     16  226    1 : tunables  120   60    0 : 
slabdata      1      1      0
scsi_cmd_cache       160    160    384   10    1 : tunables   54   27    0 : 
slabdata     16     16      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : 
slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : 
slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : 
slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : 
slabdata      3      3      0
sgpool-8             217    217    128   31    1 : tunables  120   60    0 : 
slabdata      7      7      0
dm_tio                 0      0     16  226    1 : tunables  120   60    0 : 
slabdata      0      0      0
dm_io                  0      0     16  226    1 : tunables  120   60    0 : 
slabdata      0      0      0
uhci_urb_priv          1     88     44   88    1 : tunables  120   60    0 : 
slabdata      1      1      0
dn_fib_info_cache      0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
dn_dst_cache           0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
dn_neigh_cache         0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
decnet_socket_cache      0      0    768    5    1 : tunables   54   27    0 : 
slabdata      0      0      0
clip_arp_cache         0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
xfrm6_tunnel_spi       0      0     64   61    1 : tunables  120   60    0 : 
slabdata      0      0      0
fib6_nodes            15    119     32  119    1 : tunables  120   60    0 : 
slabdata      1      1      0
ip6_dst_cache         60    105    256   15    1 : tunables  120   60    0 : 
slabdata      7      7      0
ndisc_cache            5     15    256   15    1 : tunables  120   60    0 : 
slabdata      1      1      0
raw6_sock              0      0    640    6    1 : tunables   54   27    0 : 
slabdata      0      0      0
udp6_sock              0      0    640    6    1 : tunables   54   27    0 : 
slabdata      0      0      0
tcp6_sock              7      7   1152    7    2 : tunables   24   12    0 : 
slabdata      1      1      0
unix_sock             50     50    384   10    1 : tunables   54   27    0 : 
slabdata      5      5      0
ip_mrt_cache           0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
tcp_tw_bucket          0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
tcp_bind_bucket        8    226     16  226    1 : tunables  120   60    0 : 
slabdata      1      1      0
tcp_open_request       0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : 
slabdata      0      0      0
secpath_cache          0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
ip_dst_cache          20     30    256   15    1 : tunables  120   60    0 : 
slabdata      2      2      0
arp_cache              1     31    128   31    1 : tunables  120   60    0 : 
slabdata      1      1      0
raw4_sock              0      0    512    7    1 : tunables   54   27    0 : 
slabdata      0      0      0
udp_sock               3      7    512    7    1 : tunables   54   27    0 : 
slabdata      1      1      0
tcp_sock              16     16   1024    4    1 : tunables   54   27    0 : 
slabdata      4      4      0
flow_cache             0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
isofs_inode_cache      0      0    384   10    1 : tunables   54   27    0 : 
slabdata      0      0      0
fat_inode_cache        0      0    384   10    1 : tunables   54   27    0 : 
slabdata      0      0      0
ext2_inode_cache       0      0    512    7    1 : tunables   54   27    0 : 
slabdata      0      0      0
journal_handle         8    135     28  135    1 : tunables  120   60    0 : 
slabdata      1      1      0
journal_head         456   2349     48   81    1 : tunables  120   60    0 : 
slabdata     29     29      0
revoke_table           4    290     12  290    1 : tunables  120   60    0 : 
slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : 
slabdata      0      0      0
ext3_inode_cache  336400 340655    512    7    1 : tunables   54   27    0 : 
slabdata  48665  48665      0
ext3_xattr             0      0     44   88    1 : tunables  120   60    0 : 
slabdata      0      0      0
reiser_inode_cache      0      0    384   10    1 : tunables   54   27    0 : 
slabdata      0      0      0
dquot                  0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : 
slabdata      0      0      0
eventpoll_epi          0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
kioctx                 0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
kiocb                  0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
dnotify_cache          1    185     20  185    1 : tunables  120   60    0 : 
slabdata      1      1      0
file_lock_cache        2     43     92   43    1 : tunables  120   60    0 : 
slabdata      1      1      0
fasync_cache           0      0     16  226    1 : tunables  120   60    0 : 
slabdata      0      0      0
shmem_inode_cache      8     14    512    7    1 : tunables   54   27    0 : 
slabdata      2      2      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : 
slabdata      0      0      0
uid_cache              9    119     32  119    1 : tunables  120   60    0 : 
slabdata      1      1      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : 
slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : 
slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : 
slabdata      0      0      0
as_arq               200    260     60   65    1 : tunables  120   60    0 : 
slabdata      4      4      0
blkdev_ioc            35    185     20  185    1 : tunables  120   60    0 : 
slabdata      1      1      0
blkdev_queue          21     27    448    9    1 : tunables   54   27    0 : 
slabdata      3      3      0
blkdev_requests      252    286    152   26    1 : tunables  120   60    0 : 
slabdata     11     11      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : 
slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : 
slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : 
slabdata     52     52      0
biovec-16         131328 131340    256   15    1 : tunables  120   60    0 : 
slabdata   8756   8756      0
biovec-4             256    305     64   61    1 : tunables  120   60    0 : 
slabdata      5      5      0
biovec-1             404    678     16  226    1 : tunables  120   60    0 : 
slabdata      3      3      0
bio               131457 131577     64   61    1 : tunables  120   60    0 : 
slabdata   2157   2157      0
sock_inode_cache      60     60    384   10    1 : tunables   54   27    0 : 
slabdata      6      6      0
skbuff_head_cache   2130   2370    256   15    1 : tunables  120   60    0 : 
slabdata    158    158      0
sock                   4     10    384   10    1 : tunables   54   27    0 : 
slabdata      1      1      0
proc_inode_cache     409    470    384   10    1 : tunables   54   27    0 : 
slabdata     47     47      0
sigqueue              27     27    148   27    1 : tunables  120   60    0 : 
slabdata      1      1      0
radix_tree_node    53400  57358    276   14    1 : tunables   54   27    0 : 
slabdata   4097   4097      0
bdev_cache             8     14    512    7    1 : tunables   54   27    0 : 
slabdata      2      2      0
mnt_cache             21     31    128   31    1 : tunables  120   60    0 : 
slabdata      1      1      0
inode_cache         3299   3310    384   10    1 : tunables   54   27    0 : 
slabdata    331    331      0
dentry_cache      198368 254576    140   28    1 : tunables  120   60    0 : 
slabdata   9092   9092      0
filp                 675    675    256   15    1 : tunables  120   60    0 : 
slabdata     45     45      0
names_cache            4      4   4096    1    1 : tunables   24   12    0 : 
slabdata      4      4      0
idr_layer_cache       88    116    136   29    1 : tunables  120   60    0 : 
slabdata      4      4      0
buffer_head       193153 230931     48   81    1 : tunables  120   60    0 : 
slabdata   2851   2851      0
mm_struct             70     70    512    7    1 : tunables   54   27    0 : 
slabdata     10     10      0
vm_area_struct      1726   1786     84   47    1 : tunables  120   60    0 : 
slabdata     38     38      0
fs_cache             106    119     32  119    1 : tunables  120   60    0 : 
slabdata      1      1      0
files_cache           70     70    512    7    1 : tunables   54   27    0 : 
slabdata     10     10      0
signal_cache         124    124    128   31    1 : tunables  120   60    0 : 
slabdata      4      4      0
sighand_cache         90     90   1408    5    2 : tunables   24   12    0 : 
slabdata     18     18      0
task_struct           95     95   1424    5    2 : tunables   24   12    0 : 
slabdata     19     19      0
anon_vma             814    814      8  407    1 : tunables  120   60    0 : 
slabdata      2      2      0
pgd                   65     65   4096    1    1 : tunables   24   12    0 : 
slabdata     65     65      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : 
slabdata      0      0      0
size-131072            2      2 131072    1   32 : tunables    8    4    0 : 
slabdata      2      2      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : 
slabdata      0      0      0
size-65536          8234   8234  65536    1   16 : tunables    8    4    0 : 
slabdata   8234   8234      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : 
slabdata      0      0      0
size-32768             7     16  32768    1    8 : tunables    8    4    0 : 
slabdata      7     16      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : 
slabdata      0      0      0
size-16384             6      6  16384    1    4 : tunables    8    4    0 : 
slabdata      6      6      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : 
slabdata      0      0      0
size-8192            105    109   8192    1    2 : tunables    8    4    0 : 
slabdata    105    109      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : 
slabdata      0      0      0
size-4096           2099   2111   4096    1    1 : tunables   24   12    0 : 
slabdata   2099   2111      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : 
slabdata      0      0      0
size-2048           8276   8276   2048    2    1 : tunables   24   12    0 : 
slabdata   4138   4138      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : 
slabdata      0      0      0
size-1024            140    140   1024    4    1 : tunables   54   27    0 : 
slabdata     35     35      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : 
slabdata      0      0      0
size-512             176    560    512    8    1 : tunables   54   27    0 : 
slabdata     70     70      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : 
slabdata      0      0      0
size-256             222    480    256   15    1 : tunables  120   60    0 : 
slabdata     32     32      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : 
slabdata      0      0      0
size-128            1807   1829    128   31    1 : tunables  120   60    0 : 
slabdata     59     59      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : 
slabdata      0      0      0
size-64             7762   8662     64   61    1 : tunables  120   60    0 : 
slabdata    142    142      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : 
slabdata      0      0      0
size-32            15764  16184     32  119    1 : tunables  120   60    0 : 
slabdata    136    136      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : 
slabdata      4      4      0

nr_dirty 1316
nr_writeback 0
nr_unstable 0
nr_page_table_pages 336
nr_mapped 18227
nr_slab 215890
pgpgin 1496713
pgpgout 1020309568
pswpin 365
pswpout 3752
pgalloc_high 0
pgalloc_normal 1578465026
pgalloc_dma 319286
pgfree 1578828064
pgactivate 592523
pgdeactivate 198222
pgfault 118936465
pgmajfault 1721
pgrefill_high 0
pgrefill_normal 199636
pgrefill_dma 50800
pgsteal_high 0
pgsteal_normal 172811
pgsteal_dma 26364
pgscan_kswapd_high 0
pgscan_kswapd_normal 79497
pgscan_kswapd_dma 140076
pgscan_direct_high 0
pgscan_direct_normal 99231
pgscan_direct_dma 3795
pginodesteal 3837
slabs_scanned 847283
kswapd_steal 101698
kswapd_inodesteal 35700
pageoutrun 282
allocstall 2977
pgrotated 20310



* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 20:32 ` 2.6.8.1 mempool subsystem sickness Jeff V. Merkey
@ 2004-09-14 22:59   ` Nick Piggin
       [not found]     ` <20040914223122.GA3325@galt.devicelogics.com>
  0 siblings, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-14 22:59 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: linux-kernel, jmerkey

Jeff V. Merkey wrote:
> 
> Hi Jeff,
> 
>> Can you give us a few more details please? Post the allocation failure
>> messages in full, and post /proc/meminfo, etc. Thanks.
>> -
>>
> Here you go.
> 

Thanks.

> Jeff
> 
> Sep 14 14:18:59 datascout4 kernel: if_regen2: page allocation failure. 
> order:3, mode:0x20
> Sep 14 14:18:59 datascout4 kernel:  [<80106c7e>] dump_stack+0x1e/0x30
> Sep 14 14:18:59 datascout4 kernel:  [<80134aac>] __alloc_pages+0x2dc/0x350
> Sep 14 14:18:59 datascout4 kernel:  [<80134b42>] __get_free_pages+0x22/0x50
> Sep 14 14:18:59 datascout4 kernel:  [<80137d9f>] kmem_getpages+0x1f/0xd0
> Sep 14 14:18:59 datascout4 kernel:  [<8013897a>] cache_grow+0x9a/0x130
> Sep 14 14:18:59 datascout4 kernel:  [<80138b4b>] 
> cache_alloc_refill+0x13b/0x1e0
> Sep 14 14:18:59 datascout4 kernel:  [<80138fa4>] __kmalloc+0x74/0x80
> Sep 14 14:18:59 datascout4 kernel:  [<80299298>] alloc_skb+0x48/0xf0
> Sep 14 14:18:59 datascout4 kernel:  [<f8972e67>] 
> create_xmit_packet+0x57/0x100 [dsfs]
> Sep 14 14:18:59 datascout4 kernel:  [<f8973150>] regen_data+0x60/0x1d0 
> [dsfs]
> Sep 14 14:18:59 datascout4 kernel:  [<80104355>] 
> kernel_thread_helper+0x5/0x10
> Sep 14 14:18:59 datascout4 kernel: printk: 12 messages suppressed.
> 
> MemTotal:      1944860 kB
> MemFree:        200008 kB

Wow. You have 200MB free, and can't satisfy an order 3 allocation (although it
is atomic).
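
For context, a minimal sketch (editorial illustration; the actual dsfs call sites are not shown in this thread) of the kind of request that produces the failure above: on 2.6.8, mode 0x20 is GFP_ATOMIC, and an skb whose kmalloc'd data area is larger than 16 KiB needs an order-3 (8 page, 32 KiB) contiguous block, so under fragmentation the allocation fails no matter how much total memory is free:

#include <linux/gfp.h>
#include <linux/skbuff.h>

/*
 * Hypothetical example, not the real dsfs code: alloc_skb() backs the
 * data area with kmalloc(), so anything over 16 KiB lands in the
 * size-32768 cache, whose slabs are 8 contiguous pages (order 3).
 * GFP_ATOMIC forbids sleeping and reclaim, so this fails as soon as no
 * 8-page block is free, logging
 * "page allocation failure. order:3, mode:0x20".
 */
static struct sk_buff *build_big_packet(void)
{
	struct sk_buff *skb = alloc_skb(20 * 1024, GFP_ATOMIC);

	if (!skb)
		return NULL;	/* caller must tolerate atomic failure */
	return skb;
}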

Now it just so happens that I have a couple of patches that are supposed to fix
exactly this. Unfortunately I haven't had the hardware to properly test them
(they're pretty stable though). Want to give them a try? :)


* Re: 2.6.8.1 mempool subsystem sickness
       [not found]     ` <20040914223122.GA3325@galt.devicelogics.com>
@ 2004-09-14 23:51       ` Nick Piggin
  2004-09-15  0:51         ` Gene Heskett
  2004-09-15 17:27         ` Jeff V. Merkey
  0 siblings, 2 replies; 10+ messages in thread
From: Nick Piggin @ 2004-09-14 23:51 UTC (permalink / raw)
  To: jmerkey; +Cc: Jeff V. Merkey, linux-kernel, jmerkey

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

jmerkey@galt.devicelogics.com wrote:
> You bet.  Send them to me.  For some reason I am not able to post 
> to LKML again.
> 
> Jeff
> 

OK, this is against 2.6.9-rc2. Let me know how you go. Thanks

[-- Attachment #2: vm-rollup.patch --]
[-- Type: text/x-patch, Size: 9996 bytes --]




---

 linux-2.6-npiggin/include/linux/mmzone.h |    8 ++
 linux-2.6-npiggin/mm/page_alloc.c        |   83 ++++++++++++++++++-------------
 linux-2.6-npiggin/mm/vmscan.c            |   34 +++++++++---
 3 files changed, 81 insertions(+), 44 deletions(-)

diff -puN mm/page_alloc.c~vm-rollup mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-rollup	2004-09-15 09:48:12.000000000 +1000
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-09-15 09:48:59.000000000 +1000
@@ -206,6 +206,7 @@ static inline void __free_pages_bulk (st
 		BUG_ON(bad_range(zone, buddy1));
 		BUG_ON(bad_range(zone, buddy2));
 		list_del(&buddy1->lru);
+		area->nr_free--;
 		mask <<= 1;
 		order++;
 		area++;
@@ -213,6 +214,7 @@ static inline void __free_pages_bulk (st
 		page_idx &= mask;
 	}
 	list_add(&(base + page_idx)->lru, &area->free_list);
+	area->nr_free++;
 }
 
 static inline void free_pages_check(const char *function, struct page *page)
@@ -314,6 +316,7 @@ expand(struct zone *zone, struct page *p
 		size >>= 1;
 		BUG_ON(bad_range(zone, &page[size]));
 		list_add(&page[size].lru, &area->free_list);
+		area->nr_free++;
 		MARK_USED(index + size, high, area);
 	}
 	return page;
@@ -377,6 +380,7 @@ static struct page *__rmqueue(struct zon
 
 		page = list_entry(area->free_list.next, struct page, lru);
 		list_del(&page->lru);
+		area->nr_free--;
 		index = page - zone->zone_mem_map;
 		if (current_order != MAX_ORDER-1)
 			MARK_USED(index, current_order, area);
@@ -579,6 +583,36 @@ buffered_rmqueue(struct zone *zone, int 
 }
 
 /*
+ * Return 1 if free pages are above 'mark'. This takes into account the order
+ * of the allocation.
+ */
+int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+		int alloc_type, int can_try_harder, int gfp_high)
+{
+	unsigned long min = mark, free_pages = z->free_pages;
+	int o;
+
+	if (gfp_high)
+		min -= min / 2;
+	if (can_try_harder)
+		min -= min / 4;
+
+	if (free_pages < min + z->protection[alloc_type])
+		return 0;
+	for (o = 0; o < order; o++) {
+		/* At the next order, this order's pages become unavailable */
+		free_pages -= z->free_area[order].nr_free << o;
+
+		/* Require fewer higher order pages to be free */
+		min >>= 1;
+
+		if (free_pages < min + (1 << order) - 1)
+			return 0;
+	}
+	return 1;
+}
+
+/*
  * This is the 'heart' of the zoned buddy allocator.
  *
  * Herein lies the mysterious "incremental min".  That's the
@@ -599,7 +633,6 @@ __alloc_pages(unsigned int gfp_mask, uns
 		struct zonelist *zonelist)
 {
 	const int wait = gfp_mask & __GFP_WAIT;
-	unsigned long min;
 	struct zone **zones, *z;
 	struct page *page;
 	struct reclaim_state reclaim_state;
@@ -629,9 +662,9 @@ __alloc_pages(unsigned int gfp_mask, uns
 
 	/* Go through the zonelist once, looking for a zone with enough free */
 	for (i = 0; (z = zones[i]) != NULL; i++) {
-		min = z->pages_low + (1<<order) + z->protection[alloc_type];
 
-		if (z->free_pages < min)
+		if (!zone_watermark_ok(z, order, z->pages_low,
+				alloc_type, 0, 0))
 			continue;
 
 		page = buffered_rmqueue(z, order, gfp_mask);
@@ -640,21 +673,16 @@ __alloc_pages(unsigned int gfp_mask, uns
 	}
 
 	for (i = 0; (z = zones[i]) != NULL; i++)
-		wakeup_kswapd(z);
+		wakeup_kswapd(z, order);
 
 	/*
 	 * Go through the zonelist again. Let __GFP_HIGH and allocations
 	 * coming from realtime tasks to go deeper into reserves
 	 */
 	for (i = 0; (z = zones[i]) != NULL; i++) {
-		min = z->pages_min;
-		if (gfp_mask & __GFP_HIGH)
-			min /= 2;
-		if (can_try_harder)
-			min -= min / 4;
-		min += (1<<order) + z->protection[alloc_type];
-
-		if (z->free_pages < min)
+		if (!zone_watermark_ok(z, order, z->pages_min,
+				alloc_type, can_try_harder,
+				gfp_mask & __GFP_HIGH))
 			continue;
 
 		page = buffered_rmqueue(z, order, gfp_mask);
@@ -690,14 +718,9 @@ rebalance:
 
 	/* go through the zonelist yet one more time */
 	for (i = 0; (z = zones[i]) != NULL; i++) {
-		min = z->pages_min;
-		if (gfp_mask & __GFP_HIGH)
-			min /= 2;
-		if (can_try_harder)
-			min -= min / 4;
-		min += (1<<order) + z->protection[alloc_type];
-
-		if (z->free_pages < min)
+		if (!zone_watermark_ok(z, order, z->pages_min,
+				alloc_type, can_try_harder,
+				gfp_mask & __GFP_HIGH))
 			continue;
 
 		page = buffered_rmqueue(z, order, gfp_mask);
@@ -1117,7 +1140,6 @@ void show_free_areas(void)
 	}
 
 	for_each_zone(zone) {
-		struct list_head *elem;
  		unsigned long nr, flags, order, total = 0;
 
 		show_node(zone);
@@ -1129,9 +1151,7 @@ void show_free_areas(void)
 
 		spin_lock_irqsave(&zone->lock, flags);
 		for (order = 0; order < MAX_ORDER; order++) {
-			nr = 0;
-			list_for_each(elem, &zone->free_area[order].free_list)
-				++nr;
+			nr = zone->free_area[order].nr_free;
 			total += nr << order;
 			printk("%lu*%lukB ", nr, K(1UL) << order);
 		}
@@ -1457,6 +1477,7 @@ void zone_init_free_lists(struct pglist_
 		bitmap_size = pages_to_bitmap_size(order, size);
 		zone->free_area[order].map =
 		  (unsigned long *) alloc_bootmem_node(pgdat, bitmap_size);
+		zone->free_area[order].nr_free = 0;
 	}
 }
 
@@ -1481,6 +1502,7 @@ static void __init free_area_init_core(s
 
 	pgdat->nr_zones = 0;
 	init_waitqueue_head(&pgdat->kswapd_wait);
+	pgdat->kswapd_max_order = 0;
 	
 	for (j = 0; j < MAX_NR_ZONES; j++) {
 		struct zone *zone = pgdat->node_zones + j;
@@ -1644,8 +1666,7 @@ static void frag_stop(struct seq_file *m
 }
 
 /* 
- * This walks the freelist for each zone. Whilst this is slow, I'd rather 
- * be slow here than slow down the fast path by keeping stats - mjbligh
+ * This walks the free areas for each zone.
  */
 static int frag_show(struct seq_file *m, void *arg)
 {
@@ -1661,14 +1682,8 @@ static int frag_show(struct seq_file *m,
 
 		spin_lock_irqsave(&zone->lock, flags);
 		seq_printf(m, "Node %d, zone %8s ", pgdat->node_id, zone->name);
-		for (order = 0; order < MAX_ORDER; ++order) {
-			unsigned long nr_bufs = 0;
-			struct list_head *elem;
-
-			list_for_each(elem, &(zone->free_area[order].free_list))
-				++nr_bufs;
-			seq_printf(m, "%6lu ", nr_bufs);
-		}
+		for (order = 0; order < MAX_ORDER; ++order)
+			seq_printf(m, "%6lu ", zone->free_area[order].nr_free);
 		spin_unlock_irqrestore(&zone->lock, flags);
 		seq_putc(m, '\n');
 	}
diff -puN include/linux/mmzone.h~vm-rollup include/linux/mmzone.h
--- linux-2.6/include/linux/mmzone.h~vm-rollup	2004-09-15 09:48:16.000000000 +1000
+++ linux-2.6-npiggin/include/linux/mmzone.h	2004-09-15 09:48:59.000000000 +1000
@@ -23,6 +23,7 @@
 struct free_area {
 	struct list_head	free_list;
 	unsigned long		*map;
+	unsigned long		nr_free;
 };
 
 struct pglist_data;
@@ -262,8 +263,9 @@ typedef struct pglist_data {
 					     range, including holes */
 	int node_id;
 	struct pglist_data *pgdat_next;
-	wait_queue_head_t       kswapd_wait;
+	wait_queue_head_t kswapd_wait;
 	struct task_struct *kswapd;
+	int kswapd_max_order;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
@@ -277,7 +279,9 @@ void __get_zone_counts(unsigned long *ac
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
-void wakeup_kswapd(struct zone *zone);
+void wakeup_kswapd(struct zone *zone, int order);
+int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+		int alloc_type, int can_try_harder, int gfp_high);
 
 /*
  * zone_idx() returns 0 for the ZONE_DMA zone, 1 for the ZONE_NORMAL zone, etc.
diff -puN mm/vmscan.c~vm-rollup mm/vmscan.c
--- linux-2.6/mm/vmscan.c~vm-rollup	2004-09-15 09:48:18.000000000 +1000
+++ linux-2.6-npiggin/mm/vmscan.c	2004-09-15 09:49:31.000000000 +1000
@@ -965,7 +965,7 @@ out:
  * the page allocator fallback scheme to ensure that aging of pages is balanced
  * across the zones.
  */
-static int balance_pgdat(pg_data_t *pgdat, int nr_pages)
+static int balance_pgdat(pg_data_t *pgdat, int nr_pages, int order)
 {
 	int to_free = nr_pages;
 	int priority;
@@ -1003,7 +1003,8 @@ static int balance_pgdat(pg_data_t *pgda
 						priority != DEF_PRIORITY)
 					continue;
 
-				if (zone->free_pages <= zone->pages_high) {
+				if (!zone_watermark_ok(zone, order,
+						zone->pages_high, 0, 0, 0)) {
 					end_zone = i;
 					goto scan;
 				}
@@ -1035,7 +1036,8 @@ scan:
 				continue;
 
 			if (nr_pages == 0) {	/* Not software suspend */
-				if (zone->free_pages <= zone->pages_high)
+				if (!zone_watermark_ok(zone, order,
+						zone->pages_high, end_zone, 0, 0))
 					all_zones_ok = 0;
 			}
 			zone->temp_priority = priority;
@@ -1126,13 +1128,26 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC|PF_KSWAPD;
 
 	for ( ; ; ) {
+		unsigned long order = 0, new_order;
 		if (current->flags & PF_FREEZE)
 			refrigerator(PF_FREEZE);
+
 		prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
-		schedule();
+		new_order = pgdat->kswapd_max_order;
+		pgdat->kswapd_max_order = 0;
+		if (order < new_order) {
+			/*
+			 * Don't sleep if someone wants a larger 'order'
+			 * allocation
+			 */
+			order = new_order;
+		} else {
+			schedule();
+			order = pgdat->kswapd_max_order;
+		}
 		finish_wait(&pgdat->kswapd_wait, &wait);
 
-		balance_pgdat(pgdat, 0);
+		balance_pgdat(pgdat, 0, order);
 	}
 	return 0;
 }
@@ -1140,10 +1155,13 @@ static int kswapd(void *p)
 /*
  * A zone is low on free memory, so wake its kswapd task to service it.
  */
-void wakeup_kswapd(struct zone *zone)
+void wakeup_kswapd(struct zone *zone, int order)
 {
-	if (zone->free_pages > zone->pages_low)
+	pg_data_t *pgdat = zone->zone_pgdat;
+
+	if (pgdat->kswapd_max_order < order)
 		return;
+	pgdat->kswapd_max_order = order;
 	if (!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
 		return;
 	wake_up_interruptible(&zone->zone_pgdat->kswapd_wait);
@@ -1166,7 +1184,7 @@ int shrink_all_memory(int nr_pages)
 	current->reclaim_state = &reclaim_state;
 	for_each_pgdat(pgdat) {
 		int freed;
-		freed = balance_pgdat(pgdat, nr_to_free);
+		freed = balance_pgdat(pgdat, nr_to_free, 0);
 		ret += freed;
 		nr_to_free -= freed;
 		if (nr_to_free <= 0)

_
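
To make the new check concrete, here is a small standalone sketch (editorial, a plain userspace program with made-up numbers; it drops the gfp_high, can_try_harder and protection terms, and discounts free_area[o] per loop iteration as the comment describes, whereas the hunk above reads free_area[order]). The idea is that each order below the request discounts pages tied up in blocks too small to help and halves the remaining requirement, so a zone with plenty of order-0 pages but no large blocks fails an order-3 check even though free_pages is far above the watermark:

#include <stdio.h>

#define MAX_ORDER 11

/*
 * Simplified model of zone_watermark_ok(): nr_free[o] is the number of
 * free blocks of 2^o pages, free_pages the total free page count.
 */
static int watermark_ok_sketch(unsigned long free_pages,
			       const unsigned long nr_free[MAX_ORDER],
			       int order, unsigned long min)
{
	int o;

	if (free_pages < min)
		return 0;
	for (o = 0; o < order; o++) {
		/* blocks of order o cannot satisfy a larger request */
		free_pages -= nr_free[o] << o;
		/* require fewer of the higher-order pages to be free */
		min >>= 1;
		if (free_pages < min + (1UL << order) - 1)
			return 0;
	}
	return 1;
}

int main(void)
{
	/* ~195 MB free, but nearly all of it in single pages */
	unsigned long nr_free[MAX_ORDER] = { 49900, 40, 5 };
	unsigned long free_pages = 49900 + 40 * 2 + 5 * 4;

	printf("order-0 ok: %d\n", watermark_ok_sketch(free_pages, nr_free, 0, 1000));
	printf("order-3 ok: %d\n", watermark_ok_sketch(free_pages, nr_free, 3, 1000));
	return 0;
}

The old code compared only free_pages against pages_low/pages_min, which is why a box can report 200 MB free and still fail order-3 atomic requests; the companion kswapd_max_order change makes kswapd keep reclaiming until the order-aware high watermark is met again.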


* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 23:51       ` Nick Piggin
@ 2004-09-15  0:51         ` Gene Heskett
  2004-09-15 17:27         ` Jeff V. Merkey
  1 sibling, 0 replies; 10+ messages in thread
From: Gene Heskett @ 2004-09-15  0:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Nick Piggin, jmerkey, Jeff V. Merkey, jmerkey

On Tuesday 14 September 2004 19:51, Nick Piggin wrote:
>jmerkey@galt.devicelogics.com wrote:
>> You bet.  Send them to me.  For some reason I am not able to post
>> to LKML again.
>>
>> Jeff
>
>OK, this is against 2.6.9-rc2. Let me know how you go. Thanks

Humm, it came thru the list just fine, Nick.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.26% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-14 23:51       ` Nick Piggin
  2004-09-15  0:51         ` Gene Heskett
@ 2004-09-15 17:27         ` Jeff V. Merkey
  2004-09-15 17:33           ` Jeff V. Merkey
  2004-09-16  1:46           ` Nick Piggin
  1 sibling, 2 replies; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 17:27 UTC (permalink / raw)
  To: Nick Piggin; +Cc: jmerkey, linux-kernel, jmerkey

[-- Attachment #1: Type: text/plain, Size: 767 bytes --]

Nick Piggin wrote:

> jmerkey@galt.devicelogics.com wrote:
>
>> You bet.  Send them to me.  For some reason I am not able to post to 
>> LKML again.
>>
>> Jeff
>>
> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>
>  
>

Nick,

The problem is corrected with this patch.  I am running with 3GB of kernel
memory and 1GB user space with the userspace splitting patch, with very heavy
swapping and user space app activity, and no failed allocations.  This patch
should be rolled into 2.6.9-rc2 since it fixes the problem.  With the standard
3GB user/1GB kernel address space, it also fixes the problems with the X
server running out of memory and the apps crashing.

Jeff

Here are the stats from the test against 2.6.8-rc2 with the patch applied.



[-- Attachment #2: proc.meminfo --]
[-- Type: text/plain, Size: 0 bytes --]



[-- Attachment #3: proc.vmstat --]
[-- Type: text/plain, Size: 0 bytes --]



[-- Attachment #4: proc.slabinfo --]
[-- Type: text/plain, Size: 0 bytes --]




* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-15 17:27         ` Jeff V. Merkey
@ 2004-09-15 17:33           ` Jeff V. Merkey
  2004-09-16  1:46           ` Nick Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Jeff V. Merkey @ 2004-09-15 17:33 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: Nick Piggin, jmerkey, linux-kernel, jmerkey

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

Jeff V. Merkey wrote:

> Nick Piggin wrote:
>
>> jmerkey@galt.devicelogics.com wrote:
>>
>>> You bet.  Send them to me.  For some reason I am not able to post to 
>>> LKML again.
>>>
>>> Jeff
>>>
>> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>>
>>  
>>
>
> Nick,
>
> The problem is corrected with this patch.  I am running with 3GB of 
> kernel memory
> and 1GB user space with the userspace splitting patch with very heavy 
> swapping
> and user space app activity and no failed allocations.  This patch 
> should be rolled
> into 2.6.9-rc2 since it fixes the problem.  With standard 3GB User/1GB 
> kernel
> address space, it also fixes the problems with X server running out of 
> memory
> and the apps crashing.
>
> Jeff
>
> Here's the stats from the test of the patch against 2.6.8-rc2 with the 
> patch applied
>
>


Attachments included.

Jeff


[-- Attachment #2: proc.meminfo --]
[-- Type: text/plain, Size: 572 bytes --]

MemTotal:      2983616 kB
MemFree:        576608 kB
Buffers:         42116 kB
Cached:          86000 kB
SwapCached:       2364 kB
Active:         133756 kB
Inactive:        25340 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      2983616 kB
LowFree:        576608 kB
SwapTotal:     1052248 kB
SwapFree:      1011856 kB
Dirty:            4136 kB
Writeback:           0 kB
Mapped:          35872 kB
Slab:          2239264 kB
Committed_AS:    97816 kB
PageTables:       1076 kB
VmallocTotal:   122824 kB
VmallocUsed:      2896 kB
VmallocChunk:   119604 kB

[-- Attachment #3: proc.vmstat --]
[-- Type: text/plain, Size: 704 bytes --]

nr_dirty 1070
nr_writeback 0
nr_unstable 0
nr_page_table_pages 257
nr_mapped 5452
nr_slab 559846
pgpgin 259093865
pgpgout 68682338
pswpin 59363
pswpout 292378
pgalloc_high 0
pgalloc_normal 30770083
pgalloc_dma 2951033
pgfree 33868082
pgactivate 1505831
pgdeactivate 1709234
pgfault 64727816
pgmajfault 15099
pgrefill_high 0
pgrefill_normal 1685663
pgrefill_dma 1153475
pgsteal_high 0
pgsteal_normal 1043923
pgsteal_dma 424170
pgscan_kswapd_high 0
pgscan_kswapd_normal 1209615
pgscan_kswapd_dma 1983944
pgscan_direct_high 0
pgscan_direct_normal 206712
pgscan_direct_dma 9603
pginodesteal 11
slabs_scanned 342016
kswapd_steal 1298184
kswapd_inodesteal 9310
pageoutrun 2291
allocstall 4830
pgrotated 446422

[-- Attachment #4: proc.slabinfo --]
[-- Type: text/plain, Size: 13583 bytes --]

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
bt_sock                3     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
rpc_buffers            8      8   2048    2    1 : tunables   24   12    0 : slabdata      4      4      0
rpc_tasks              8     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
rpc_inode_cache        6      7    512    7    1 : tunables   54   27    0 : slabdata      1      1      0
ip_fib_hash           11    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
scsi_cmd_cache        52    110    384   10    1 : tunables   54   27    0 : slabdata      8     11      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              97    217    128   31    1 : tunables  120   60    0 : slabdata      6      7      0
dm_tio                 0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
dm_io                  0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
uhci_urb_priv          0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
dn_fib_info_cache      0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
dn_dst_cache           0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
dn_neigh_cache         0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
decnet_socket_cache      0      0    768    5    1 : tunables   54   27    0 : slabdata      0      0      0
clip_arp_cache         0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
xfrm6_tunnel_spi       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
fib6_nodes            13    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
ip6_dst_cache         51     90    256   15    1 : tunables  120   60    0 : slabdata      6      6      0
ndisc_cache            5     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
raw6_sock              0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
udp6_sock              0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
tcp6_sock              5      7   1152    7    2 : tunables   24   12    0 : slabdata      1      1      0
unix_sock             50     50    384   10    1 : tunables   54   27    0 : slabdata      5      5      0
ip_mrt_cache           0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_tw_bucket          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket        8    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        1     61     64   61    1 : tunables  120   60    0 : slabdata      1      1      0
secpath_cache          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
ip_dst_cache          23     30    256   15    1 : tunables  120   60    0 : slabdata      2      2      0
arp_cache              1     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    512    7    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               3      7    512    7    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              20     20   1024    4    1 : tunables   54   27    0 : slabdata      5      5      0
flow_cache             0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        0      0    340   11    1 : tunables   54   27    0 : slabdata      0      0      0
ext2_inode_cache       0      0    400   10    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle        16    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head         521    891     48   81    1 : tunables  120   60    0 : slabdata     11     11      0
revoke_table           4    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache    7263   7263    440    9    1 : tunables   54   27    0 : slabdata    807    807      0
ext3_xattr             0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
reiser_inode_cache      0      0    368   11    1 : tunables   54   27    0 : slabdata      0      0      0
dquot                  0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache          1    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
shmem_inode_cache      4     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              9     61     64   61    1 : tunables  120   60    0 : slabdata      1      1      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq               135    195     60   65    1 : tunables  120   60    0 : slabdata      3      3      0
blkdev_ioc           109    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          21     24    452    8    1 : tunables   54   27    0 : slabdata      3      3      0
blkdev_requests      122    182    152   26    1 : tunables  120   60    0 : slabdata      7      7      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            257    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16         131340 131340    256   15    1 : tunables  120   60    0 : slabdata   8756   8756      0
biovec-4             305    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             368    678     16  226    1 : tunables  120   60    0 : slabdata      3      3      0
bio               131396 131516     64   61    1 : tunables  120   60    0 : slabdata   2156   2156      0
file_lock_cache        6     45     88   45    1 : tunables  120   60    0 : slabdata      1      1      0
sock_inode_cache     100    100    384   10    1 : tunables   54   27    0 : slabdata     10     10      0
skbuff_head_cache   2130   2130    256   15    1 : tunables  120   60    0 : slabdata    142    142      0
sock                  30     30    384   10    1 : tunables   54   27    0 : slabdata      3      3      0
proc_inode_cache     435    481    308   13    1 : tunables   54   27    0 : slabdata     37     37      0
sigqueue              27     27    148   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node     7378   7378    276   14    1 : tunables   54   27    0 : slabdata    527    527      0
bdev_cache             8     14    512    7    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             21     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         3406   3406    292   13    1 : tunables   54   27    0 : slabdata    262    262      0
dentry_cache       15148  15148    140   28    1 : tunables  120   60    0 : slabdata    541    541      0
filp                 705    945    256   15    1 : tunables  120   60    0 : slabdata     63     63      0
names_cache           19     19   4096    1    1 : tunables   24   12    0 : slabdata     19     19      0
idr_layer_cache      100    116    136   29    1 : tunables  120   60    0 : slabdata      4      4      0
buffer_head        18041  67230     48   81    1 : tunables  120   60    0 : slabdata    830    830      0
mm_struct            102    102    640    6    1 : tunables   54   27    0 : slabdata     17     17      0
vm_area_struct      1831   2256     84   47    1 : tunables  120   60    0 : slabdata     48     48      0
fs_cache             119    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           98     98    512    7    1 : tunables   54   27    0 : slabdata     14     14      0
signal_cache         155    155    128   31    1 : tunables  120   60    0 : slabdata      5      5      0
sighand_cache         89    115   1408    5    2 : tunables   24   12    0 : slabdata     23     23      0
task_struct           93    114   1360    3    1 : tunables   24   12    0 : slabdata     38     38      0
anon_vma             921   1221      8  407    1 : tunables  120   60    0 : slabdata      3      3      0
pgd                   73     73   4096    1    1 : tunables   24   12    0 : slabdata     73     73      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            2      2 131072    1   32 : tunables    8    4    0 : slabdata      2      2      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536         32834  32834  65536    1   16 : tunables    8    4    0 : slabdata  32834  32834      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             1      1  32768    1    8 : tunables    8    4    0 : slabdata      1      1      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             2      2  16384    1    4 : tunables    8    4    0 : slabdata      2      2      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192            124    128   8192    1    2 : tunables    8    4    0 : slabdata    124    128      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096           2095   2095   4096    1    1 : tunables   24   12    0 : slabdata   2095   2095      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048          32867  32868   2048    2    1 : tunables   24   12    0 : slabdata  16434  16434      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            141    144   1024    4    1 : tunables   54   27    0 : slabdata     36     36      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             200    560    512    8    1 : tunables   54   27    0 : slabdata     70     70      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             189    480    256   15    1 : tunables  120   60    0 : slabdata     32     32      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            2181   2263    128   31    1 : tunables  120   60    0 : slabdata     73     73      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64              921   1281     64   61    1 : tunables  120   60    0 : slabdata     21     21      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32            52715  52955     32  119    1 : tunables  120   60    0 : slabdata    445    445      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0


* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-15 17:27         ` Jeff V. Merkey
  2004-09-15 17:33           ` Jeff V. Merkey
@ 2004-09-16  1:46           ` Nick Piggin
  2004-09-16  5:56             ` Jens Axboe
  1 sibling, 1 reply; 10+ messages in thread
From: Nick Piggin @ 2004-09-16  1:46 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: jmerkey, linux-kernel, jmerkey

Jeff V. Merkey wrote:
> Nick Piggin wrote:

>> OK, this is against 2.6.9-rc2. Let me know how you go. Thanks
>>
>>  
>>
> 
> Nick,
> 
> The problem is corrected with this patch.  I am running with 3GB of 
> kernel memory
> and 1GB user space with the userspace splitting patch with very heavy 
> swapping
> and user space app activity and no failed allocations.  This patch 
> should be rolled
> into 2.6.9-rc2 since it fixes the problem.  With standard 3GB User/1GB 
> kernel
> address space, it also fixes the problems with X server running out of 
> memory
> and the apps crashing.
> 

Hi Jeff,
Thanks, that is very cool. The memory problems you're seeing aren't
actually a regression (it's always been like that), and I still haven't
got hold of some gigabit networking hardware to test it thoroughly, so it
may be difficult to get this into 2.6.9. Hopefully soon
though.

I can provide you (or anyone) with up to date patches on request though,
so just let me know.

> Jeff
> 
> Here's the stats from the test of the patch against 2.6.8-rc2 with the 
> patch applied
> 
> 

Scanning stats look good at a quick glance. kswapd doesn't seem to be
going crazy.

However,
size-65536         32834  32834  65536    1   16

This slab entry is taking up about 2GB of unreclaimable memory (order-4,
no less). This must be a leak... does the number continue to rise as
your system runs?
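
For scale (editorial arithmetic): 32834 objects x 65536 bytes is roughly
2.0 GiB, with each object occupying its own 16-page (order-4) slab, which
accounts for most of the "Slab: 2239264 kB" (about 2.1 GiB) in the attached
proc.meminfo.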


* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-16  1:46           ` Nick Piggin
@ 2004-09-16  5:56             ` Jens Axboe
  0 siblings, 0 replies; 10+ messages in thread
From: Jens Axboe @ 2004-09-16  5:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jeff V. Merkey, jmerkey, linux-kernel, jmerkey

On Thu, Sep 16 2004, Nick Piggin wrote:
> Jeff V. Merkey wrote:
> >Jeff
> >
> >Here's the stats from the test of the patch against 2.6.8-rc2 with the 
> >patch applied
> >
> >
> 
> Scanning stats look good at a quick glance. kswapd doesn't seem to be
> going crazy.
> 
> However,
> size-65536         32834  32834  65536    1   16
> 
> This slab entry is taking up about 2GB of unreclaimable memory (order-4,
> no less). This must be a leak... does the number continue to rise as
> your system runs?

There's also a huge amount of 16-page bio + vecs in flight:

biovec-16         131340

That would point to a leak as well, most likely.
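
(For scale, from the attached slabinfo: 131340 biovec-16 objects at 256 bytes
each is about 32 MiB, plus roughly 131000 bio structures at 64 bytes each,
about 8 MiB.  The slab memory itself is modest, but each biovec-16 describes
up to 16 pages (64 KiB) of in-flight I/O, which is why a count that large
reads as a leak rather than normal queue depth.)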

-- 
Jens Axboe



* Re: 2.6.8.1 mempool subsystem sickness
  2004-09-08 16:48 jmerkey
@ 2004-09-08 23:05 ` Nick Piggin
  0 siblings, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2004-09-08 23:05 UTC (permalink / raw)
  To: jmerkey; +Cc: linux-kernel, jmerkey

jmerkey@comcast.net wrote:
> On a system with 4GB of memory, and without 
> the user space patch that spilts user space
> just a stock kernel, I am seeing memory 
> allocation failures with X server and simple
> apps on a machine with a Pentium 4 
> processor and 500MB of memory.  
> 
> If you load large apps and do a lot of 
> skb traffic, the mempool and slab 
> caches start gobbling up pages
> and don't seem to balance them 
> very well, resulting in memory 
> allocation failures over time if
> the system stays up for a week 
> or more.  
> 
> I am also seeing the same behavior 
> on another system which has been
> running for almost 30 days with 
> an skb based traffic regeneration 
> test calling and sending skb's
> in kernel between two interfaces.
> 
> The pages over time get stuck 
> in the slab allocator and user
> space apps start to fail on alloc
> requests.  
> 
> Rebooting the system clears
> the problem, which slowly over time
> comes back.  I am seeing this with
> stock kernels from kernel.org 
> and on kernels I have patched,
> so the problem seems to be
> in the base code.  I have spent
> the last two weeks observing 
> the problem to verify I can
> reproduce it and it keeps 
> happening.  
> 
> Jeff
> 

Hi Jeff,
Can you give us a few more details please? Post the allocation failure
messages in full, and post /proc/meminfo, etc. Thanks.


* 2.6.8.1 mempool subsystem sickness
@ 2004-09-08 16:48 jmerkey
  2004-09-08 23:05 ` Nick Piggin
  0 siblings, 1 reply; 10+ messages in thread
From: jmerkey @ 2004-09-08 16:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: jmerkey

On a system with 4GB of memory, and without 
the user space patch that splits user space,
just a stock kernel, I am seeing memory 
allocation failures with X server and simple
apps on a machine with a Pentium 4 
processor and 500MB of memory.  

If you load large apps and do a lot of 
skb traffic, the mempool and slab 
caches start gobbling up pages
and don't seem to balance them 
very well, resulting in memory 
allocation failures over time if
the system stays up for a week 
or more.  

I am also seeing the same behavior 
on another system which has been
running for almost 30 days with 
an skb based traffic regeneration 
test calling and sending skb's
in kernel between two interfaces.

The pages over time get stuck 
in the slab allocator and user
space apps start to fail on alloc
requests.  

Rebooting the system clears
the problem, which slowly over time
comes back.  I am seeing this with
stock kernels from kernel.org 
and on kernels I have patched,
so the problem seems to be
in the base code.  I have spent
the last two weeks observing 
the problem to verify I can
reproduce it and it keeps 
happening.  

Jeff


