* 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
@ 2007-01-21 19:27 Justin Piszcz
  2007-01-22 13:37 ` Pavel Machek
  2007-01-22 19:57   ` Andrew Morton
  0 siblings, 2 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-21 19:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-raid, xfs

Why does copying an 18GB file on a 74GB raptor raid1 cause the kernel to
invoke the OOM killer and kill all of my processes?

Doing this on a single disk with 2.6.19.2 is OK, no issues.  However, on the
raid1 this happens every time!

Anything to try?  Any other output needed?  Can someone shed some light on 
this situation?

Thanks.


The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 29 62
 0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 48 40
 0  6    764  39608     12 1237420    0    0 52880 94696 1891 7424  1 12 47 39
 0  6    764  44264     12 1226064    0    0 42496 29723 1908 6035  1  9 31 58
 0  6    764  26672     12 1214180    0    0 43520 117472 1944 6189  1 13  0 87
 0  7    764   9132     12 1211732    0    0 22016 80400 1570 3304  1  8  0 92
 1  8    764   9512     12 1200388    0    0 33288 62212 1687 4843  1  9  0 91
 0  5    764  13980     12 1197096    0    0  5012   161 1619 2115  1  4 42 54
 0  5    764  29604     12 1197220    0    0     0   112 1548 1602  0  3 50 48
 0  5    764  49692     12 1197396    0    0     0   152 1484 1438  1  3 50 47
 0  5    764  73128     12 1197644    0    0     0   120 1463 1392  1  3 49 47
 0  4    764  99460     12 1197704    0    0    24   168 1545 1803  1  3 39 57
 0  4    764 100088     12 1219296    0    0 11672 75450 1614 1371  0  5 73 22
 0  6    764  50404     12 1269072    0    0 53632   145 1989 3871  1  9 34 56
 0  6    764  51500     12 1267684    0    0 53632   608 1834 4437  1  8 21 71
 4  5    764  51424     12 1266792    0    0 53504  7584 1847 4393  2  9 48 42
 0  6    764  51456     12 1263736    0    0 53636  9804 1880 4326  1 10  9 81
 0  6    764  50640     12 1263060    0    0 53504  4392 1929 4430  1  8 28 63
 0  6    764  50956     12 1257884   24    0 50724 17214 1858 4755  1 11 35 54
 0  6    764  48360     12 1247692    0    0 50840 48880 1871 6242  1 10  0 89
 0  6    764  40028     12 1225860    0    0 42512 93346 1770 5599  2 11  0 87
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  6    764  20372     12 1211744    0    0 21512 123664 1747 4378  0  9  3 88
 0  6    764  12140     12 1188584    0    0 20224 111244 1628 4244  1  9 15 76
 0  7    764  11280     12 1171144    0    0 22300 80936 1669 5314  1  8 11 80
 0  6    764  12168     12 1162840    0    0 28168 44072 1808 5065  1  8  0 92
 1  7   3132 123740     12 1051848    0 2368  3852 34246 2097 2376  0  5  0 94
 1  6   3132  19996     12 1155664    0    0 51752   290 1999 2136  2  8  0 91


The last lines of iostat (right before it kill -9'd my shell/ssh)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
           0.51    0.00    7.65   91.84    0.00    0.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              87.63  3873.20 254.64 65.98 21905.15 24202.06   287.61   144.37  209.54   3.22 103.30
sdb               0.00  3873.20  1.03 132.99    12.37 60045.36   896.25    41.19  398.53   6.93  92.89
sdc               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               0.00    30.93  0.00  9.28     0.00   157.73    34.00     0.01    1.11   1.11   1.03
sdh               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi              12.37     0.00  7.22  1.03    78.35     1.03    19.25     0.04    5.38   5.38   4.43
sdj              12.37     0.00  6.19  1.03    74.23     1.03    20.86     0.02    2.43   2.43   1.75
sdk               0.00    30.93  0.00  9.28     0.00   157.73    34.00     0.01    0.56   0.56   0.52
md0               0.00     0.00  0.00 610.31     0.00  2441.24     8.00     0.00    0.00   0.00   0.00
md2               0.00     0.00 347.42 996.91 22181.44  7917.53    44.78     0.00    0.00   0.00   0.00
md1               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00  1.03  9.28     4.12   164.95    32.80     0.00    0.00   0.00   0.00
md4               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle  
           1.02    0.00   12.24   86.73    0.00    0.00

Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda             142.00  2497.00 403.00 38.00 34828.00 15112.00   226.49   142.77  181.95   2.27 100.10
sdb               0.00  2498.00  8.00 27.00    36.00 12844.00   736.00     3.63   45.31   6.06  21.20
sdc               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sde               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg               7.00    25.00  1.00 11.00    32.00   144.00    29.33     0.04    3.58   2.58   3.10
sdh               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi              10.00     7.00  4.00  1.00    56.00    32.00    35.20     0.01    2.40   2.40   1.20
sdj              10.00    22.00  4.00  8.00    56.00   120.00    29.33     0.03    2.50   2.50   3.00
sdk               7.00    10.00  1.00  4.00    32.00    56.00    35.20     0.01    2.40   2.40   1.20
md0               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md2               0.00     0.00 558.00 7028.00 34664.00 56076.00    23.92     0.00    0.00   0.00   0.00
md1               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
md3               0.00     0.00  0.00 11.00     0.00   168.00    30.55     0.00    0.00   0.00   0.00
md4               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00


The last lines of dmesg:
[ 5947.199985] lowmem_reserve[]: 0 0 0
[ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
[ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
[ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
[ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
[ 5947.200055] Free swap  = 2197628kB
[ 5947.200058] Total swap = 2200760kB
[ 5947.200060] Free swap:       2197628kB
[ 5947.205664] 517888 pages of RAM 
[ 5947.205671] 288512 pages of HIGHMEM
[ 5947.205673] 5666 reserved pages 
[ 5947.205675] 257163 pages shared
[ 5947.205678] 600 pages swap cached 
[ 5947.205680] 88876 pages dirty
[ 5947.205682] 115111 pages writeback
[ 5947.205684] 5608 pages mapped
[ 5947.205686] 49367 pages slab
[ 5947.205688] 541 pages pagetables
[ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
[ 5947.205801] Killed process 1853 (named)
[ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
[ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
[ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
[ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
[ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
[ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
[ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
[ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
[ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
[ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
[ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
[ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
[ 5947.206698]  ======================= 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-21 19:27 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2 Justin Piszcz
@ 2007-01-22 13:37 ` Pavel Machek
  2007-01-22 18:48   ` Justin Piszcz
  2007-01-22 19:57   ` Andrew Morton
  1 sibling, 1 reply; 24+ messages in thread
From: Pavel Machek @ 2007-01-22 13:37 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs

On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> the OOM killer and kill all of my processes?
> 
> Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> happens every time!
> 
> Anything to try?  Any other output needed?  Can someone shed some light on 
> this situation?

Is it highmem-related? Can you try it with mem=256M?
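
(For reference, a minimal sketch of passing that limit at boot with GRUB
legacy; the menu.lst path and kernel line below are illustrative assumptions,
not taken from this machine:)

    # /boot/grub/menu.lst: append mem=256M to the kernel line, e.g.
    #   kernel /boot/vmlinuz-2.6.20-rc5 root=/dev/md1 ro mem=256M
    # or edit the kernel line interactively from the GRUB menu for a
    # one-off test, so a bad limit does not persist across reboots.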

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 13:37 ` Pavel Machek
@ 2007-01-22 18:48   ` Justin Piszcz
  2007-01-22 23:47     ` Pavel Machek
  2007-01-24 23:39     ` Justin Piszcz
  0 siblings, 2 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-22 18:48 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs



On Mon, 22 Jan 2007, Pavel Machek wrote:

> On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> > 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> 
> Is it highmem-related? Can you try it with mem=256M?
> 
> 							Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

I will give this a try later or tomorrow; I cannot have my machine crash 
at the moment.

Also, the onboard video on the Intel 965 chipset uses 128MB.  Not sure if 
that has anything to do with it, because after the system kill -9's all the 
processes etc., my terminal looks like garbage.

Justin.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-21 19:27 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2 Justin Piszcz
@ 2007-01-22 19:57   ` Andrew Morton
  2007-01-22 19:57   ` Andrew Morton
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2007-01-22 19:57 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs

> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> the OOM killer and kill all of my processes?

What's that?   Software raid or hardware raid?  If the latter, which driver?

> Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> happens every time!
> 
> Anything to try?  Any other output needed?  Can someone shed some light on 
> this situation?
> 
> Thanks.
> 
> 
> The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- 
> ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id 
> wa
>  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 
> 29 62
>  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 
> 48 40

The wordwrapping is painful :(

> 
> The last lines of dmesg:
> [ 5947.199985] lowmem_reserve[]: 0 0 0
> [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
> 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
> 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
> 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> [ 5947.200055] Free swap  = 2197628kB
> [ 5947.200058] Total swap = 2200760kB
> [ 5947.200060] Free swap:       2197628kB
> [ 5947.205664] 517888 pages of RAM 
> [ 5947.205671] 288512 pages of HIGHMEM
> [ 5947.205673] 5666 reserved pages 
> [ 5947.205675] 257163 pages shared
> [ 5947.205678] 600 pages swap cached 
> [ 5947.205680] 88876 pages dirty
> [ 5947.205682] 115111 pages writeback
> [ 5947.205684] 5608 pages mapped
> [ 5947.205686] 49367 pages slab
> [ 5947.205688] 541 pages pagetables
> [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
> child
> [ 5947.205801] Killed process 1853 (named)
> [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
> oomkilladj=0
> [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> [ 5947.206698]  ======================= 

Important information from the oom-killing event is missing.  Please send
it all.

From your earlier reports we have several hundred MB of ZONE_NORMAL memory
which has gone awol.

Please include /proc/meminfo from after the oom-killing.

Please work out what is using all that slab memory, via /proc/slabinfo.

After the oom-killing, please see if you can free up the ZONE_NORMAL memory
via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
work out what happened to the missing couple-of-hundred MB from
ZONE_NORMAL.
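
For reference, one way to capture all of the above in a single pass right
after the OOM kill (a sketch; the output file names are arbitrary, and the
awk field positions assume the /proc/slabinfo 2.x format):

    cat /proc/meminfo  > meminfo_after_oom.txt
    cat /proc/slabinfo > slabinfo_after_oom.txt
    # biggest slab consumers first: cache name, approx bytes (num_objs * objsize)
    awk 'NR > 2 { print $1, $3 * $4 }' /proc/slabinfo | sort -k2 -nr | head -20
    echo 3 > /proc/sys/vm/drop_caches
    echo 3 > /proc/sys/vm/drop_caches
    echo 3 > /proc/sys/vm/drop_caches
    cat /proc/meminfo  > meminfo_after_drop_caches.txt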



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 19:57   ` Andrew Morton
  (?)
@ 2007-01-22 20:20   ` Justin Piszcz
  -1 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-22 20:20 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-raid, xfs

> What's that?   Software raid or hardware raid?  If the latter, which driver?

Software RAID (md)
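
(For reference, a quick way to show the md layout when this question comes
up; a sketch, with /dev/md2 as an illustrative array name:)

    cat /proc/mdstat
    mdadm --detail /dev/md2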

On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- -system-- 
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id 
> > wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 
> > 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 
> > 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
> > 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
> > 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
> > 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:       2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
> > child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
> > oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  ======================= 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

I believe this is the first part of it (hopefully):

2908kB active:86104kB inactive:1061904kB present:1145032kB pages_scanned:0 all_unreclaimable? no
[ 5947.199985] lowmem_reserve[]: 0 0 0
[ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
[ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
[ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
[ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
[ 5947.200055] Free swap  = 2197628kB
[ 5947.200058] Total swap = 2200760kB
[ 5947.200060] Free swap:       2197628kB
[ 5947.205664] 517888 pages of RAM
[ 5947.205671] 288512 pages of HIGHMEM
[ 5947.205673] 5666 reserved pages
[ 5947.205675] 257163 pages shared
[ 5947.205678] 600 pages swap cached
[ 5947.205680] 88876 pages dirty
[ 5947.205682] 115111 pages writeback
[ 5947.205684] 5608 pages mapped
[ 5947.205686] 49367 pages slab
[ 5947.205688] 541 pages pagetables
[ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a child
[ 5947.205801] Killed process 1853 (named)
[ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, oomkilladj=0
[ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
[ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
[ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
[ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
[ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
[ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
[ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
[ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
[ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
[ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
[ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
[ 5947.206698]  =======================

I will have to include the other parts when I am near the machine and can 
reboot it locally :)

Justin.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 18:48   ` Justin Piszcz
@ 2007-01-22 23:47     ` Pavel Machek
  2007-01-24 23:39     ` Justin Piszcz
  1 sibling, 0 replies; 24+ messages in thread
From: Pavel Machek @ 2007-01-22 23:47 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs

On Mon 2007-01-22 13:48:44, Justin Piszcz wrote:
> 
> 
> On Mon, 22 Jan 2007, Pavel Machek wrote:
> 
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > > the OOM killer and kill all of my processes?
> > > 
> > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > happens every time!
> > > 
> > > Anything to try?  Any other output needed?  Can someone shed some light on 
> > > this situation?
> > 
> > Is it highmem-related? Can you try it with mem=256M?
> > 
> > 							Pavel
> > -- 
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> I will give this a try later or tomorrow, I cannot have my machine crash 
> at the moment.
> 
> Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> that has anything to do with it because after the system kill -9's all the 
> processes etc, my terminal looks like garbage.

That looks like separate problem. Switch to text mode console (vgacon,
not fbcon) for tests.
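
(A minimal sketch of checking which console drivers the kernel was built
with; the /boot/config-$(uname -r) location is an assumption about where the
distro keeps the kernel config:)

    grep -E 'CONFIG_VGA_CONSOLE|CONFIG_FRAMEBUFFER_CONSOLE|CONFIG_FB=' /boot/config-$(uname -r)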
								Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 19:57   ` Andrew Morton
  (?)
  (?)
@ 2007-01-23  0:37   ` Donald Douwsma
  2007-01-23  1:12     ` Andrew Morton
  -1 siblings, 1 reply; 24+ messages in thread
From: Donald Douwsma @ 2007-01-23  0:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Justin Piszcz, linux-kernel, linux-raid, xfs

Andrew Morton wrote:
>> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
>> the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?

I've hit this using local disk while testing xfs built against 2.6.20-rc4 (SMP x86_64)

dmesg follows; I'm not sure if anything in this is useful after the first event, as our automated tests continued on after the failure.

> Please include /proc/meminfo from after the oom-killing.
>
> Please work out what is using all that slab memory, via /proc/slabinfo.

Sorry I didn't pick this up either.
I'll try to reproduce this and gather some more detailed info for a single event.

Donald


...
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
XFS mounting filesystem sdb5
Ending clean XFS mount for filesystem: sdb5
hald invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff80257367>] out_of_memory+0x70/0x25d
 [<ffffffff80258f6b>] __alloc_pages+0x22c/0x2b5
 [<ffffffff8026d889>] alloc_page_vma+0x71/0x76
 [<ffffffff8026937b>] read_swap_cache_async+0x45/0xd8
 [<ffffffff8025f2e0>] swapin_readahead+0x60/0xd3
 [<ffffffff80260ece>] __handle_mm_fault+0x703/0x9d8
 [<ffffffff80532bf7>] do_page_fault+0x42b/0x7b3
 [<ffffffff80278adf>] do_readv_writev+0x176/0x18b
 [<ffffffff8052efde>] thread_return+0x0/0xed
 [<ffffffff8034d7f5>] __const_udelay+0x2c/0x2d
 [<ffffffff803f4e0b>] scsi_done+0x0/0x17
 [<ffffffff8053109d>] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU    1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU    2: Hot: hi:  186, btch:  31 usd:  20   Cold: hi:   62, btch:  15 usd:  47
CPU    3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  56
Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB pages_scanned:4343329 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 273*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 6684kB
Swap cache: add 741048, delete 244661, find 84826/143198, race 680+239
Free swap  = 1088524kB
Total swap = 3140668kB
Free swap:       1088524kB
524224 pages of RAM
9619 reserved pages
259 pages shared
496388 pages swap cached
No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
Killed process 3626 (hald-addon-acpi)
top invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff80257367>] out_of_memory+0x70/0x25d
 [<ffffffff80258f6b>] __alloc_pages+0x22c/0x2b5
 [<ffffffff8026e6a3>] alloc_pages_current+0x74/0x79
 [<ffffffff802548c8>] __page_cache_alloc+0xb/0xe
 [<ffffffff8025a65f>] __do_page_cache_readahead+0xa1/0x217
 [<ffffffff8052f776>] io_schedule+0x28/0x33
 [<ffffffff8052f9e7>] __wait_on_bit_lock+0x5b/0x66
 [<ffffffff802546de>] __lock_page+0x72/0x78
 [<ffffffff8025ab22>] do_page_cache_readahead+0x4e/0x5a
 [<ffffffff80256714>] filemap_nopage+0x140/0x30c
 [<ffffffff802609c6>] __handle_mm_fault+0x1fb/0x9d8
 [<ffffffff80532bf7>] do_page_fault+0x42b/0x7b3
 [<ffffffff80228273>] __wake_up+0x43/0x50
 [<ffffffff80380bd5>] tty_ldisc_deref+0x71/0x76
 [<ffffffff8053109d>] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
CPU    1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU    2: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  10
CPU    3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  26
Active:90 inactive:496233 dirty:0 writeback:0 unstable:0 free:3485 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5904kB min:5712kB low:7140kB high:8568kB active:360kB inactive:1983092kB present:2052068kB pages_scanned:4587649 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 78*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5904kB
Swap cache: add 741067, delete 244673, find 84826/143210, race 680+239
Free swap  = 1088572kB
Total swap = 3140668kB
Free swap:       1088572kB
524224 pages of RAM
9619 reserved pages
290 pages shared
496396 pages swap cached
No available memory (MPOL_BIND): kill process 7914 (top) score 0 or a child
Killed process 7914 (top)
nscd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff80257367>] out_of_memory+0x70/0x25d
 [<ffffffff80258f6b>] __alloc_pages+0x22c/0x2b5
 [<ffffffff8026d889>] alloc_page_vma+0x71/0x76
 [<ffffffff8026937b>] read_swap_cache_async+0x45/0xd8
 [<ffffffff80260ede>] __handle_mm_fault+0x713/0x9d8
 [<ffffffff80532bf7>] do_page_fault+0x42b/0x7b3
 [<ffffffff80238e16>] try_to_del_timer_sync+0x51/0x5a
 [<ffffffff80238e2b>] del_timer_sync+0xc/0x16
 [<ffffffff8052f939>] schedule_timeout+0x92/0xad
 [<ffffffff80238a40>] process_timeout+0x0/0xb
 [<ffffffff8029e563>] sys_epoll_wait+0x3e0/0x421
 [<ffffffff8022a96b>] default_wake_function+0x0/0xf
 [<ffffffff8053109d>] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  53
CPU    1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU    2: Hot: hi:  186, btch:  31 usd:   0   Cold: hi:   62, btch:  15 usd:  14
CPU    3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  26
Active:91 inactive:496325 dirty:0 writeback:0 unstable:0 free:3425 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5664kB min:5712kB low:7140kB high:8568kB active:364kB inactive:1983372kB present:2052068kB pages_scanned:4610273 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 18*4kB 29*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5664kB
Swap cache: add 741069, delete 244674, find 84826/143212, race 680+239
Free swap  = 1088576kB
Total swap = 3140668kB
Free swap:       1088576kB
524224 pages of RAM
9619 reserved pages
293 pages shared
496396 pages swap cached
No available memory (MPOL_BIND): kill process 4166 (nscd) score 0 or a child
Killed process 4166 (nscd)
xfs_repair invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff80257367>] out_of_memory+0x70/0x25d
 [<ffffffff80258f6b>] __alloc_pages+0x22c/0x2b5
 [<ffffffff8026e6a3>] alloc_pages_current+0x74/0x79
 [<ffffffff802548c8>] __page_cache_alloc+0xb/0xe
 [<ffffffff8025a65f>] __do_page_cache_readahead+0xa1/0x217
 [<ffffffff8025ab22>] do_page_cache_readahead+0x4e/0x5a
 [<ffffffff80256714>] filemap_nopage+0x140/0x30c
 [<ffffffff802609c6>] __handle_mm_fault+0x1fb/0x9d8
 [<ffffffff80532bf7>] do_page_fault+0x42b/0x7b3
 [<ffffffff802426eb>] autoremove_wake_function+0x0/0x2e
 [<ffffffff80244ea4>] up_write+0x9/0xb
 [<ffffffff8026602d>] sys_mprotect+0x645/0x764
 [<ffffffff8053109d>] error_exit+0x0/0x84

Mem-info:
Node 0 DMA per-cpu:
CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  53
CPU    1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
CPU    2: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  14
CPU    3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  26
Active:91 inactive:496247 dirty:0 writeback:0 unstable:0 free:3394 slab:9119 mapped:32 pagetables:637
Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3328 all_unreclaimable? yes
lowmem_reserve[]: 0 2003 2003
Node 0 DMA32 free:5540kB min:5712kB low:7140kB high:8568kB active:364kB inactive:1983300kB present:2052068kB pages_scanned:4631841 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 0 DMA: 1*4kB 0*8kB 0*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8036kB
Node 0 DMA32: 1*4kB 22*8kB 1*16kB 1*32kB 1*64kB 1*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 5540kB
Swap cache: add 741070, delete 244674, find 84826/143212, race 680+239
Free swap  = 1088576kB
Total swap = 3140668kB
Free swap:       1088576kB
524224 pages of RAM
9619 reserved pages
293 pages shared
496397 pages swap cached
No available memory (MPOL_BIND): kill process 17869 (xfs_repair) score 0 or a child
Killed process 17869 (xfs_repair)
klogd invoked oom-killer: gfp_mask=0x200d2, order=0, oomkilladj=0

Call Trace:
 [<ffffffff80257367>] out_of_memory+0x70/0x25d
 [<ffffffff80258f6b>] __alloc_pages+0x22c/0x2b5
 [<ffffffff8025afc2>] __pagevec_lru_add_active+0xce/0xde
 [<ffffffff8026d889>] alloc_page_vma+0x71/0x76
 [<ffffffff8026937b>] read_swap_cache_async+0x45/0xd8
 [<ffffffff80260ede>] __handle_mm_fault+0x713/0x9d8
 [<ffffffff80532bf7>] do_page_fault+0x42b/0x7b3
 [<ffffffff802426eb>] autoremove_wake_function+0x0/0x2e
 [<ffffffff8053109d>] error_exit+0x0/0x84
...


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-23  0:37   ` Donald Douwsma
@ 2007-01-23  1:12     ` Andrew Morton
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2007-01-23  1:12 UTC (permalink / raw)
  To: Donald Douwsma; +Cc: Justin Piszcz, linux-kernel, linux-raid, xfs

On Tue, 23 Jan 2007 11:37:09 +1100
Donald Douwsma <donaldd@sgi.com> wrote:

> Andrew Morton wrote:
> >> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> >> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> >> the OOM killer and kill all of my processes?
> > 
> > What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> I've hit this using local disk while testing xfs built against 2.6.20-rc4 (SMP x86_64)
> 
> dmesg follows, I'm not sure if anything in this is useful after the first event as our automated tests continued on
> after the failure.

This looks different.

> ...
>
> Mem-info:
> Node 0 DMA per-cpu:
> CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU    2: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> CPU    3: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
> Node 0 DMA32 per-cpu:
> CPU    0: Hot: hi:  186, btch:  31 usd:  31   Cold: hi:   62, btch:  15 usd:  53
> CPU    1: Hot: hi:  186, btch:  31 usd:   2   Cold: hi:   62, btch:  15 usd:  60
> CPU    2: Hot: hi:  186, btch:  31 usd:  20   Cold: hi:   62, btch:  15 usd:  47
> CPU    3: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:  56
> Active:76 inactive:495856 dirty:0 writeback:0 unstable:0 free:3680 slab:9119 mapped:32 pagetables:637

No dirty pages, no pages under writeback.

> Node 0 DMA free:8036kB min:24kB low:28kB high:36kB active:0kB inactive:1856kB present:9376kB pages_scanned:3296
> all_unreclaimable? yes
> lowmem_reserve[]: 0 2003 2003
> Node 0 DMA32 free:6684kB min:5712kB low:7140kB high:8568kB active:304kB inactive:1981624kB present:2052068kB

Inactive list is filled.

> pages_scanned:4343329 all_unreclaimable? yes

We scanned our guts out and decided that nothing was reclaimable.

> No available memory (MPOL_BIND): kill process 3492 (hald) score 0 or a child
> No available memory (MPOL_BIND): kill process 7914 (top) score 0 or a child
> No available memory (MPOL_BIND): kill process 4166 (nscd) score 0 or a child
> No available memory (MPOL_BIND): kill process 17869 (xfs_repair) score 0 or a child

But in all cases a constrained memory policy was in use.
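
(For reference: MPOL_BIND restricts a task's allocations to a specific set
of nodes, so it can hit the OOM path even while memory is free elsewhere.
A sketch of how such a policy is typically imposed from userspace; the node
number and command name are illustrative:)

    numactl --membind=0 ./some_test   # allocations limited to node 0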

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 18:48   ` Justin Piszcz
  2007-01-22 23:47     ` Pavel Machek
@ 2007-01-24 23:39     ` Justin Piszcz
  2007-01-24 23:42       ` Justin Piszcz
  2007-01-25  0:32       ` Pavel Machek
  1 sibling, 2 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-24 23:39 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs

> Is it highmem-related? Can you try it with mem=256M?

Bad idea: the kernel crashes & burns when I use mem=256.  I had to boot 
2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember, I 
use an onboard graphics controller that has 128MB of RAM allocated to it, 
and I believe the ICH8 chipset also uses some memory.  In any event, mem=256 
causes the machine to lock up before it can even get to the boot/init 
processes; the two LEDs on the keyboard (caps lock and scroll lock) were 
blinking and I saw no console at all!

Justin.

On Mon, 22 Jan 2007, Justin Piszcz wrote:

> 
> 
> On Mon, 22 Jan 2007, Pavel Machek wrote:
> 
> > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > > the OOM killer and kill all of my processes?
> > > 
> > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > happens every time!
> > > 
> > > Anything to try?  Any other output needed?  Can someone shed some light on 
> > > this situation?
> > 
> > Is it highmem-related? Can you try it with mem=256M?
> > 
> > 							Pavel
> > -- 
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> I will give this a try later or tomorrow, I cannot have my machine crash 
> at the moment.
> 
> Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> that has anything to do with it because after the system kill -9's all the 
> processes etc, my terminal looks like garbage.
> 
> Justin.
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 19:57   ` Andrew Morton
                     ` (2 preceding siblings ...)
  (?)
@ 2007-01-24 23:40   ` Justin Piszcz
  -1 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-24 23:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-raid, xfs



On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- -system-- 
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id 
> > wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 
> > 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 
> > 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
> > 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
> > 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
> > 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:       2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
> > child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
> > oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  ======================= 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

Trying this now.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-24 23:39     ` Justin Piszcz
@ 2007-01-24 23:42       ` Justin Piszcz
  2007-01-25  0:32       ` Pavel Machek
  1 sibling, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-24 23:42 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs

And FYI yes I used mem=256M just as you said, not mem=256.

Justin.

On Wed, 24 Jan 2007, Justin Piszcz wrote:

> > Is it highmem-related? Can you try it with mem=256M?
> 
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> use an onboard graphics controller that has 128MB of RAM allocated to it 
> and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> causes the machine to lockup before it can even get to the boot/init 
> processes, the two leds on the keyboard were blinking, caps lock and 
> scroll lock and I saw no console at all!
> 
> Justin.
> 
> On Mon, 22 Jan 2007, Justin Piszcz wrote:
> 
> > 
> > 
> > On Mon, 22 Jan 2007, Pavel Machek wrote:
> > 
> > > On Sun 2007-01-21 14:27:34, Justin Piszcz wrote:
> > > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > > > the OOM killer and kill all of my processes?
> > > > 
> > > > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > > > happens every time!
> > > > 
> > > > Anything to try?  Any other output needed?  Can someone shed some light on 
> > > > this situation?
> > > 
> > > Is it highmem-related? Can you try it with mem=256M?
> > > 
> > > 							Pavel
> > > -- 
> > > (english) http://www.livejournal.com/~pavelmachek
> > > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> > > -
> > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > > 
> > 
> > I will give this a try later or tomorrow, I cannot have my machine crash 
> > at the moment.
> > 
> > Also, the onboard video on the Intel 965 chipset uses 128MB, not sure if 
> > that has anything to do with it because after the system kill -9's all the 
> > processes etc, my terminal looks like garbage.
> > 
> > Justin.
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 19:57   ` Andrew Morton
                     ` (3 preceding siblings ...)
  (?)
@ 2007-01-25  0:10   ` Justin Piszcz
  2007-01-25  0:36     ` Nick Piggin
  2007-01-25  1:21     ` Bill Cizek
  -1 siblings, 2 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25  0:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-raid, xfs



On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- -system-- 
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id 
> > wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 
> > 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 
> > 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
> > 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
> > 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
> > 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:       2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
> > child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
> > oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  ======================= 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
occasionally every 5-30-60 seconds or so VERY BADLY, we're talking 5-10 
seconds of lag, but hey, it does not crash!!  I will boot the older kernel 
with preempt on and see if I can get you that information you requested.
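
(For reference, a sketch of confirming which preemption model a kernel was
built with; the config file location is an assumption about the distro:)

    grep 'CONFIG_PREEMPT' /boot/config-$(uname -r)
    # 2.6.20-era choices: CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT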

Justin.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-24 23:39     ` Justin Piszcz
  2007-01-24 23:42       ` Justin Piszcz
@ 2007-01-25  0:32       ` Pavel Machek
  2007-01-25  0:36         ` Justin Piszcz
                           ` (2 more replies)
  1 sibling, 3 replies; 24+ messages in thread
From: Pavel Machek @ 2007-01-25  0:32 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs

Hi!

> > Is it highmem-related? Can you try it with mem=256M?
> 
> Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> use an onboard graphics controller that has 128MB of RAM allocated to it 
> and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> causes the machine to lockup before it can even get to the boot/init 
> processes, the two leds on the keyboard were blinking, caps lock and 
> scroll lock and I saw no console at all!

Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
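
(For reference, a sketch of the relevant i386 config choice on 2.6.20, found
under "Processor type and features -> High Memory Support"; exact menu
wording may differ:)

    # CONFIG_NOHIGHMEM=y     highmem off (what "disable CONFIG_HIGHMEM" amounts to)
    # CONFIG_HIGHMEM4G=y     up to 4GB of RAM
    # CONFIG_HIGHMEM64G=y    PAE, above 4GB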
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-22 19:57   ` Andrew Morton
                     ` (4 preceding siblings ...)
  (?)
@ 2007-01-25  0:34   ` Justin Piszcz
  -1 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25  0:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-raid, xfs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5044 bytes --]

There is some XFS stuff in the dmesg too; that is why I am continuing to 
include the XFS mailing list.  Scroll down to read more.

On Mon, 22 Jan 2007, Andrew Morton wrote:

> > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
> > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
> > the OOM killer and kill all of my processes?
> 
> What's that?   Software raid or hardware raid?  If the latter, which driver?
> 
> > Doing this on a single disk 2.6.19.2 is OK, no issues.  However, this 
> > happens every time!
> > 
> > Anything to try?  Any other output needed?  Can someone shed some light on 
> > this situation?
> > 
> > Thanks.
> > 
> > 
> > The last lines of vmstat 1 (right before it kill -9'd my shell/ssh)
> > 
> > procs -----------memory---------- ---swap-- -----io---- -system-- 
> > ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id 
> > wa
> >  0  7    764  50348     12 1269988    0    0 53632   172 1902 4600  1  8 
> > 29 62
> >  0  7    764  49420     12 1260004    0    0 53632 34368 1871 6357  2 11 
> > 48 40
> 
> The wordwrapping is painful :(
> 
> > 
> > The last lines of dmesg:
> > [ 5947.199985] lowmem_reserve[]: 0 0 0
> > [ 5947.199992] DMA: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 1*128kB 1*256kB 
> > 0*512kB 1*1024kB 1*2048kB 0*4096kB = 3544kB
> > [ 5947.200010] Normal: 1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 0*256kB 
> > 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2740kB
> > [ 5947.200035] HighMem: 98*4kB 35*8kB 9*16kB 69*32kB 4*64kB 1*128kB 
> > 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3664kB
> > [ 5947.200052] Swap cache: add 789, delete 189, find 16/17, race 0+0
> > [ 5947.200055] Free swap  = 2197628kB
> > [ 5947.200058] Total swap = 2200760kB
> > [ 5947.200060] Free swap:       2197628kB
> > [ 5947.205664] 517888 pages of RAM 
> > [ 5947.205671] 288512 pages of HIGHMEM
> > [ 5947.205673] 5666 reserved pages 
> > [ 5947.205675] 257163 pages shared
> > [ 5947.205678] 600 pages swap cached 
> > [ 5947.205680] 88876 pages dirty
> > [ 5947.205682] 115111 pages writeback
> > [ 5947.205684] 5608 pages mapped
> > [ 5947.205686] 49367 pages slab
> > [ 5947.205688] 541 pages pagetables
> > [ 5947.205795] Out of memory: kill process 1853 (named) score 9937 or a 
> > child
> > [ 5947.205801] Killed process 1853 (named)
> > [ 5947.206616] bash invoked oom-killer: gfp_mask=0x84d0, order=0, 
> > oomkilladj=0
> > [ 5947.206621]  [<c013e33b>] out_of_memory+0x17b/0x1b0
> > [ 5947.206631]  [<c013fcac>] __alloc_pages+0x29c/0x2f0
> > [ 5947.206636]  [<c01479ad>] __pte_alloc+0x1d/0x90
> > [ 5947.206643]  [<c0148bf7>] copy_page_range+0x357/0x380
> > [ 5947.206649]  [<c0119d75>] copy_process+0x765/0xfc0
> > [ 5947.206655]  [<c012c3f9>] alloc_pid+0x1b9/0x280
> > [ 5947.206662]  [<c011a839>] do_fork+0x79/0x1e0
> > [ 5947.206674]  [<c015f91f>] do_pipe+0x5f/0xc0
> > [ 5947.206680]  [<c0101176>] sys_clone+0x36/0x40
> > [ 5947.206686]  [<c0103138>] syscall_call+0x7/0xb
> > [ 5947.206691]  [<c0420033>] __sched_text_start+0x853/0x950
> > [ 5947.206698]  ======================= 
> 
> Important information from the oom-killing event is missing.  Please send
> it all.
> 
> From your earlier reports we have several hundred MB of ZONE_NORMAL memory
> which has gone awol.
> 
> Please include /proc/meminfo from after the oom-killing.
> 
> Please work out what is using all that slab memory, via /proc/slabinfo.
> 
> After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> work out what happened to the missing couple-of-hundred MB from
> ZONE_NORMAL.
> 
> 

I have done everything you asked.  I ran a constant loop of vmstat and cat 
/proc/slabinfo; toward the end of the file(s) (_after_oom_killer) is when I 
ran:

p34:~# echo 3 > /proc/sys/vm/drop_caches
p34:~# echo 3 > /proc/sys/vm/drop_caches
p34:~# echo 3 > /proc/sys/vm/drop_caches
p34:~# 
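For reference, the capture loop was nothing more elaborate than something 
along these lines (a rough sketch; the interval and exact redirections are 
approximate):

    while true; do
        vmstat 1 2 | tail -n 1 >> _vmstat_after_oom_killer.txt
        cat /proc/slabinfo     >> _slabinfo_after_oom_killer.txt
        sleep 5
    done

And to see which slab cache is eating the memory, something like this over 
/proc/slabinfo does the job (skip the two header lines; fields 3 and 4 are 
num_objs and objsize):

    tail -n +3 /proc/slabinfo \
      | awk '{printf "%-24s %8d KB\n", $1, $3*$4/1024}' \
      | sort -rn -k2 | head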

The tarball will yield the following files you requested:

4.0K    _proc_meminfo_after_oom_killing.txt
976K    _slabinfo_after_oom_killer.txt
460K    _slabinfo.txt
8.0K    _vmstat_after_oom_killer.txt
4.0K    _vmstat.txt

I am going back to 2.6.20-rc5-6 w/ NO-PREEMPT, as this was about 10x more 
stable for copying operations and everything else.  If you need any more 
diagnostics or crashes of this kind, let me know, because I can make it 
happen every single time with preempt on.
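For clarity, the preempt on/off difference is just the preemption model 
chosen in the kernel config; roughly, with the 2.6.20 option names:

    # preempt off ("No Forced Preemption"):
    CONFIG_PREEMPT_NONE=y
    # CONFIG_PREEMPT is not set

    # preempt on ("Preemptible Kernel"):
    # CONFIG_PREEMPT_NONE is not set
    CONFIG_PREEMPT=y

grep PREEMPT /boot/config-$(uname -r) (if the config file is installed 
there) shows which one a given kernel was built with.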

Not sure if it matters, but when I copy 18gb to 18gb.2 I have seen it make 
it to various points before the OOM kill.

Size that 18gb.2 got to, per attempt:

1. 7.8G
2. 4.3G
3. 1.2G

Back to 2.6.20-rc5 with no preemption; until then, or if you need anything 
else, let me know.  Also, I ran echo t > /proc/sysrq-trigger.
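For the record, the usual sequence for that kind of task dump is roughly 
the following (sysrq has to be enabled; the output filename here is made 
up):

    echo 1 > /proc/sys/kernel/sysrq     # make sure the trigger is enabled
    echo t > /proc/sysrq-trigger        # dump all task states to the kernel log
    dmesg | tail -n 300 > sysrq_task_dump.txt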

I have attached the entire syslog/kernel ring buffer, so you will see my 
system booting with no problems or errors up to the point where I am going 
back to the old kernel.  It is attached as kernel_ring_buffer.txt.bz2.

Andrew,

Please let me know if any of this helps!

Justin.


[-- Attachment #2: Type: APPLICATION/octet-stream, Size: 33223 bytes --]

[-- Attachment #3: Type: APPLICATION/octet-stream, Size: 20488 bytes --]


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:10   ` Justin Piszcz
@ 2007-01-25  0:36     ` Nick Piggin
  2007-01-25 11:11       ` Justin Piszcz
  2007-01-25  1:21     ` Bill Cizek
  1 sibling, 1 reply; 24+ messages in thread
From: Nick Piggin @ 2007-01-25  0:36 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Andrew Morton, linux-kernel, linux-raid, xfs

Justin Piszcz wrote:
> 
> On Mon, 22 Jan 2007, Andrew Morton wrote:

>>After the oom-killing, please see if you can free up the ZONE_NORMAL memory
>>via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
>>work out what happened to the missing couple-of-hundred MB from
>>ZONE_NORMAL.
>>
> 
> Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
> of lag, but hey, it does not crash!! I will boot the older kernel with 
> preempt on and see if I can get you that information you requested.

It wouldn't be a bad idea to recompile the new kernel with preempt on
and get the info from there.

It is usually best to be working with the most recent kernels. We can
always backport any important fixes if we need to.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:32       ` Pavel Machek
@ 2007-01-25  0:36         ` Justin Piszcz
  2007-01-25  0:58         ` Justin Piszcz
  2007-01-25  9:08         ` Justin Piszcz
  2 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25  0:36 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs



On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
> 									Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Ok, this will be my last test for tonight, trying now.
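(For the record, that just means adding the option to the boot entry; the 
image path and label below are only placeholders.  In lilo.conf:

    image=/boot/vmlinuz-2.6.20-rc5
        label=linux-700m
        append="mem=700M"

or, for grub, mem=700M tacked onto the kernel line in menu.lst.)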

Justin.


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:32       ` Pavel Machek
  2007-01-25  0:36         ` Justin Piszcz
@ 2007-01-25  0:58         ` Justin Piszcz
  2007-01-25  9:08         ` Justin Piszcz
  2 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25  0:58 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs



On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
> 									Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Looks like you may be onto something, Pavel.  It has not invoked the OOM 
killer yet with mem=700M.  I see swap use increasing, and the copy runs at 
only 14-15MB/s; with no mem= limit (all memory, preempt on) I get 
45-65MB/s.

With append="mem=700M", as seen below:

top - 19:38:46 up 1 min,  3 users,  load average: 1.24, 0.38, 0.13
Tasks: 172 total,   1 running, 171 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.6%us,  4.6%sy,  1.3%ni, 69.6%id, 13.9%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:    705512k total,   699268k used,     6244k free,       12k buffers
Swap:  2200760k total,    18520k used,  2182240k free,    34968k cached


(with mem=700M):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0  40276   7456     12  23184    0    0 13196 14904 1204 4756  1  2 50 47
 0  2  40276   6156     12  24264    0    0 20096 18352 1293 6400  1  4 49 46
 0  2  40276   6264     12  24264    0    0 15504 17836 1219 5202  0  2 50 48
 0  2  40276   6144     12  24376    0    0 14348 14453 1190 4815  0  3 49 48
 0  2  40276   6156     12  24504    0    0 11396 12724 1169 3724  1  2 50 48
 0  1  40276   7532     12  23272    0    0 11412 13121 1183 4017  0  2 50 48
 0  1  40276   7944     12  22644    0    0 19084 19144 1234 6548  0  4 50 46


Almost there; looks like it's going to make it.
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 15399329792 2007-01-24 19:55 18gb.copy

Hrmm, so it works with preempt off, and it also works with preempt ON plus 
mem=700M.  I guess I will run with preempt off, as I'd prefer to have the 
memory available, and tolerate the large lag bursts that come with no 
preemption.


Yup it worked.

-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb
-rw-r--r-- 1 user group 18630127104 2007-01-21 12:41 18gb.copy
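(The byte counts match; for extra confidence a byte-compare or checksum 
would confirm the contents as well, e.g.:

    cmp 18gb 18gb.copy && echo identical
    md5sum 18gb 18gb.copy
)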

Justin.


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:10   ` Justin Piszcz
  2007-01-25  0:36     ` Nick Piggin
@ 2007-01-25  1:21     ` Bill Cizek
  2007-01-25 11:13       ` Justin Piszcz
  1 sibling, 1 reply; 24+ messages in thread
From: Bill Cizek @ 2007-01-25  1:21 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: linux-kernel, linux-raid, xfs

Justin Piszcz wrote:
> On Mon, 22 Jan 2007, Andrew Morton wrote:
>   
>>> On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz <jpiszcz@lucidpixels.com> wrote:
>>> Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to invoke 
>>> the OOM killer and kill all of my processes?
>>>       
> Running with PREEMPT OFF lets me copy the file!!  The machine LAGS 
> occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds 
> of lag, but hey, it does not crash!! I will boot the older kernel with 
> preempt on and see if I can get you that information you requested.
>   
Justin,

According to your kernel_ring_buffer.txt (attached to another email), 
you are using "anticipatory" as your io scheduler:
   289  Jan 24 18:35:25 p34 kernel: [    0.142130] io scheduler noop registered
   290  Jan 24 18:35:25 p34 kernel: [    0.142194] io scheduler anticipatory registered (default)

I had a problem with this scheduler where my system would occasionally 
lock up during heavy I/O.  Sometimes it would fix itself, and sometimes I 
had to reboot.  I changed to the CFQ io scheduler and my system has worked 
fine since then.

CFQ has to be built into the kernel (under BlockLayer/IOSchedulers).  It 
can be selected as default or you can set it during runtime:

echo cfq > /sys/block/<disk>/queue/scheduler
...
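A concrete runtime example, with sda standing in for each raid1 member 
(device name is illustrative):

    cat /sys/block/sda/queue/scheduler          # current one shown in [brackets]
    echo cfq > /sys/block/sda/queue/scheduler   # switch on the fly

To make it the default at boot without rebuilding, the elevator=cfq kernel 
parameter can also be appended to the boot command line.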

Hope this helps,
Bill


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:32       ` Pavel Machek
  2007-01-25  0:36         ` Justin Piszcz
  2007-01-25  0:58         ` Justin Piszcz
@ 2007-01-25  9:08         ` Justin Piszcz
  2007-01-25 22:34           ` Mark Hahn
  2 siblings, 1 reply; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25  9:08 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, linux-raid, xfs



On Thu, 25 Jan 2007, Pavel Machek wrote:

> Hi!
> 
> > > Is it highmem-related? Can you try it with mem=256M?
> > 
> > Bad idea, the kernel crashes & burns when I use mem=256, I had to boot 
> > 2.6.20-rc5-6 single to get back into my machine, very nasty.  Remember I 
> > use an onboard graphics controller that has 128MB of RAM allocated to it 
> > and I believe the ICH8 chipset also uses some memory, in any event mem=256 
> > causes the machine to lockup before it can even get to the boot/init 
> > processes, the two leds on the keyboard were blinking, caps lock and 
> > scroll lock and I saw no console at all!
> 
> Okay, so try mem=700M or disable CONFIG_HIGHMEM or something.
> 									Pavel
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

I forgot to remove mem=700M on the PREEMPT-off kernel, and this is what I 
saw this morning:

[18005.261875] Killed process 4768 (screen)
[18005.262343] Out of memory: kill process 4793 (screen) score 385 or a 
child
[18005.262350] Killed process 4793 (screen)
[18005.378536] Out of memory: kill process 4825 (screen) score 385 or a 
child
[18005.378542] Killed process 4825 (screen)
[18005.378547] Out of memory: kill process 4825 (screen) score 385 or a 
child
[18005.378553] Killed process 4825 (screen)
[18005.413072] Out of memory: kill process 4875 (screen) score 385 or a 
child
[18005.413079] Killed process 4875 (screen)
[18005.423735] Out of memory: kill process 4970 (screen) score 385 or a 
child
[18005.423742] Killed process 4970 (screen)
[18005.431391] Out of memory: kill process 21365 (xfs_fsr) score 286 or a 
child
[18005.431398] Killed process 21365 (xfs_fsr)
 
$ screen -ls
  There are screens on:
        2532.pts-0.p34  (Dead ???)
        3776.pts-2.p34  (Dead ???)
        4768.pts-7.p34  (Dead ???)
        4793.pts-9.p34  (Dead ???)
        4825.pts-11.p34 (Dead ???)
        4875.pts-13.p34 (Dead ???)
        4970.pts-15.p34 (Dead ???)
 

Lovely...

$ uname -r
2.6.20-rc5

Something is seriously wrong with that OOM killer.
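(Side note, not a fix: 2.6 does let you exempt individual processes from 
the OOM killer through /proc/<pid>/oom_adj, where -17 means never kill; 
sshd below is just an example:

    echo -17 > /proc/$(pidof -s sshd)/oom_adj
    cat /proc/$(pidof -s sshd)/oom_score     # current badness score

but that only shifts the killing onto some other process.)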

Justin.



* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  0:36     ` Nick Piggin
@ 2007-01-25 11:11       ` Justin Piszcz
  0 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25 11:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, linux-kernel, linux-raid, xfs



On Thu, 25 Jan 2007, Nick Piggin wrote:

> Justin Piszcz wrote:
> > 
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> 
> > >After the oom-killing, please see if you can free up the ZONE_NORMAL memory
> > >via a few `echo 3 > /proc/sys/vm/drop_caches' commands.  See if you can
> > >work out what happened to the missing couple-of-hundred MB from
> > >ZONE_NORMAL.
> > >
> > 
> > Running with PREEMPT OFF lets me copy the file!!  The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds of
> > lag, but hey, it does not crash!! I will boot the older kernel with preempt
> > on and see if I can get you that information you requested.
> 
> It wouldn't be a bad idea to recompile the new kernel with preempt on
> and get the info from there.
> 
> It is usually best to be working with the most recent kernels. We can
> always backport any important fixes if we need to.
> 
> Thanks,
> Nick
> 
> -- 
> SUSE Labs, Novell Inc.
> Send instant messages to your online friends http://au.messenger.yahoo.com

In my tests for the most part I am using the latest kernels.

Justin.



* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  1:21     ` Bill Cizek
@ 2007-01-25 11:13       ` Justin Piszcz
  0 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-25 11:13 UTC (permalink / raw)
  To: Bill Cizek; +Cc: linux-kernel, linux-raid, xfs, Alan Piszcz



On Wed, 24 Jan 2007, Bill Cizek wrote:

> Justin Piszcz wrote:
> > On Mon, 22 Jan 2007, Andrew Morton wrote:
> >   
> > > > On Sun, 21 Jan 2007 14:27:34 -0500 (EST) Justin Piszcz
> > > > <jpiszcz@lucidpixels.com> wrote:
> > > > Why does copying an 18GB on a 74GB raptor raid1 cause the kernel to
> > > > invoke the OOM killer and kill all of my processes?
> > > >       
> > Running with PREEMPT OFF lets me copy the file!!  The machine LAGS
> > occasionally every 5-30-60 seconds or so VERY BADLY, talking 5-10 seconds of
> > lag, but hey, it does not crash!! I will boot the older kernel with preempt
> > on and see if I can get you that information you requested.
> >   
> Justin,
> 
> According to your kernel_ring_buffer.txt (attached to another email), you are
> using "anticipatory" as your io scheduler:
>   289  Jan 24 18:35:25 p34 kernel: [    0.142130] io scheduler noop registered
>   290  Jan 24 18:35:25 p34 kernel: [    0.142194] io scheduler anticipatory registered (default)
> 
> I had a problem with this scheduler where my system would occasionally lockup
> during heavy I/O.  Sometimes it would fix itself, sometimes I had to reboot.
> I changed to the "CFQ" io scheduler and my system has worked fine since then.
> 
> CFQ has to be built into the kernel (under BlockLayer/IOSchedulers).  It can
> be selected as default or you can set it during runtime:
> 
> echo cfq > /sys/block/<disk>/queue/scheduler
> ...
> 
> Hope this helps,
> Bill
> 
>

I used to run CFQ a while back, but I switched over to AS as it has better 
performance for my workloads.  Currently I am running with PREEMPT off; if 
I see any additional issues, I will switch to the CFQ scheduler.

Right now it's the OOM killer that is going crazy.

Justin.


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25  9:08         ` Justin Piszcz
@ 2007-01-25 22:34           ` Mark Hahn
  2007-01-26  0:22             ` Justin Piszcz
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Hahn @ 2007-01-25 22:34 UTC (permalink / raw)
  To: Justin Piszcz; +Cc: Pavel Machek, linux-kernel, linux-raid, xfs

> Something is seriously wrong with that OOM killer.

do you know you don't have to operate in OOM-slaughter mode?

"vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you 
into a mode where the kernel tracks your "committed" memory 
needs, and will eventually cause some allocations to fail.
this is often much nicer than the default random OOM slaughter.
(you probably also need to adjust vm.overcommit_ratio with 
some knowledge of your MemTotal and SwapTotal.)
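Illustrative only, and the numbers should be tuned to the machine; with 
mode 2 the commit limit works out to roughly SwapTotal + MemTotal * 
overcommit_ratio / 100:

    sysctl -w vm.overcommit_memory=2
    sysctl -w vm.overcommit_ratio=50
    grep -i commit /proc/meminfo        # CommitLimit / Committed_AS

(or the equivalent two lines in /etc/sysctl.conf to make it persistent.)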

regards, mark hahn.


* Re: 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2
  2007-01-25 22:34           ` Mark Hahn
@ 2007-01-26  0:22             ` Justin Piszcz
  0 siblings, 0 replies; 24+ messages in thread
From: Justin Piszcz @ 2007-01-26  0:22 UTC (permalink / raw)
  To: Mark Hahn; +Cc: Pavel Machek, linux-kernel, linux-raid, xfs



On Thu, 25 Jan 2007, Mark Hahn wrote:

> > Something is seriously wrong with that OOM killer.
> 
> do you know you don't have to operate in OOM-slaughter mode?
> 
> "vm.overcommit_memory = 2" in your /etc/sysctl.conf puts you into a mode where
> the kernel tracks your "committed" memory needs, and will eventually cause
> some allocations to fail.
> this is often much nicer than the default random OOM slaughter.
> (you probably also need to adjust vm.overcommit_ratio with some knowledge of
> your MemTotal and SwapTotal.)
> 
> regards, mark hahn.
> 

# sysctl -a | grep vm.over
vm.overcommit_ratio = 50
vm.overcommit_memory = 0

I'll have to experiment with these options, thanks for the info!

Justin.


end of thread, other threads:[~2007-01-26  0:22 UTC | newest]

Thread overview: 24+ messages
-- links below jump to the message on this page --
2007-01-21 19:27 2.6.20-rc5: cp 18gb 18gb.2 = OOM killer, reproducible just like 2.16.19.2 Justin Piszcz
2007-01-22 13:37 ` Pavel Machek
2007-01-22 18:48   ` Justin Piszcz
2007-01-22 23:47     ` Pavel Machek
2007-01-24 23:39     ` Justin Piszcz
2007-01-24 23:42       ` Justin Piszcz
2007-01-25  0:32       ` Pavel Machek
2007-01-25  0:36         ` Justin Piszcz
2007-01-25  0:58         ` Justin Piszcz
2007-01-25  9:08         ` Justin Piszcz
2007-01-25 22:34           ` Mark Hahn
2007-01-26  0:22             ` Justin Piszcz
2007-01-22 19:57 ` Andrew Morton
2007-01-22 19:57   ` Andrew Morton
2007-01-22 20:20   ` Justin Piszcz
2007-01-23  0:37   ` Donald Douwsma
2007-01-23  1:12     ` Andrew Morton
2007-01-24 23:40   ` Justin Piszcz
2007-01-25  0:10   ` Justin Piszcz
2007-01-25  0:36     ` Nick Piggin
2007-01-25 11:11       ` Justin Piszcz
2007-01-25  1:21     ` Bill Cizek
2007-01-25 11:13       ` Justin Piszcz
2007-01-25  0:34   ` Justin Piszcz
