linux-kernel.vger.kernel.org archive mirror
* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-23 16:10 Karl Vogel
  2004-08-23 17:00 ` Jens Axboe
  2004-08-24 10:03 ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Karl Vogel @ 2004-08-23 16:10 UTC (permalink / raw)
  To: 'Jens Axboe', Marcelo Tosatti; +Cc: linux-kernel

> > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> 
> Nope, it's not by design :-)
> 
> A test case would be nice; then I'll fix it as soon as possible. But
> please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> fix to ll_rw_blk that can easily cause this. The first report is for
> 2.6.8.1, so I'm more puzzled by that.

I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
is anything extra I need to try/record, just shoot!

Original post with testcase + stats:
  http://article.gmane.org/gmane.linux.kernel/228156



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-23 16:10 Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
@ 2004-08-23 17:00 ` Jens Axboe
  2004-08-24 10:03 ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-23 17:00 UTC (permalink / raw)
  To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel

On Mon, Aug 23 2004, Karl Vogel wrote:
> > > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> > 
> > Nope, it's not by design :-)
> > 
> > A test case would be nice; then I'll fix it as soon as possible. But
> > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > fix to ll_rw_blk that can easily cause this. The first report is for
> > 2.6.8.1, so I'm more puzzled by that.
> 
> I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
> is anything extra I need to try/record, just shoot!
> 
> Original post with testcase + stats:
>   http://article.gmane.org/gmane.linux.kernel/228156

Good report, I'll reproduce it here tomorrow. Thanks!

-- 
Jens Axboe



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-24 10:03 ` Jens Axboe
@ 2004-08-24  9:18   ` Marcelo Tosatti
  2004-08-24 10:52     ` Jens Axboe
  2004-08-24 10:13   ` Jens Axboe
  1 sibling, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2004-08-24  9:18 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Karl Vogel, linux-kernel, Ingo Molnar

On Tue, Aug 24, 2004 at 12:03:43PM +0200, Jens Axboe wrote:
> On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> > > 
> > > Nope, it's not by design :-)
> > > 
> > > A test case would be nice; then I'll fix it as soon as possible. But
> > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > 2.6.8.1, so I'm more puzzled by that.
> > 
> > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
> > is anything extra I need to try/record, just shoot!
> > 
> > Original post with testcase + stats:
> >   http://article.gmane.org/gmane.linux.kernel/228156
> 
> > A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid, since the problem behind it has already
> > been fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> >
> > So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug could
> > be related to that.

Jens,

You are right, I've been unable to reproduce the problem I was seeing
(huge amounts of bio/biovec allocations causing major swapouts) with
2.6.8.1.

With this kernel, the 512MB system swaps out around 50MB and recovers
perfectly; I can't see any odd behaviour with CFQ.




* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-23 16:10 Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
  2004-08-23 17:00 ` Jens Axboe
@ 2004-08-24 10:03 ` Jens Axboe
  2004-08-24  9:18   ` Marcelo Tosatti
  2004-08-24 10:13   ` Jens Axboe
  1 sibling, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-24 10:03 UTC (permalink / raw)
  To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

On Mon, Aug 23 2004, Karl Vogel wrote:
> > > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> > 
> > Nope, it's not by design :-)
> > 
> > A test case would be nice; then I'll fix it as soon as possible. But
> > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > fix to ll_rw_blk that can easily cause this. The first report is for
> > 2.6.8.1, so I'm more puzzled by that.
> 
> I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
> is anything extra I need to try/record, just shoot!
> 
> Original post with testcase + stats:
>   http://article.gmane.org/gmane.linux.kernel/228156

A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
2.6.8-rc4 report is not valid, since the problem behind it has already
been fixed in CFQ. I'd still like you to retest with 2.6.8.1.

So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug could
be related to that.

-- 
Jens Axboe



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-24 10:03 ` Jens Axboe
  2004-08-24  9:18   ` Marcelo Tosatti
@ 2004-08-24 10:13   ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-24 10:13 UTC (permalink / raw)
  To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

On Tue, Aug 24 2004, Jens Axboe wrote:
> On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> > > 
> > > Nope, it's not by design :-)
> > > 
> > > A test case would be nice; then I'll fix it as soon as possible. But
> > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > 2.6.8.1, so I'm more puzzled by that.
> > 
> > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
> > is anything extra I need to try/record, just shoot!
> > 
> > Original post with testcase + stats:
> >   http://article.gmane.org/gmane.linux.kernel/228156
> 
> A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
> 2.6.8-rc4 report is not valid, since the problem behind it has already
> been fixed in CFQ. I'd still like you to retest with 2.6.8.1.
>
> So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug could
> be related to that.

Oh, and please also do a sysrq-t from a hung box and save the output.
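
For reference, if the box still has a usable shell the same trace can be
grabbed without the console keyboard, assuming sysrq is enabled in the
kernel config (the filename is just an example):

  echo 1 > /proc/sys/kernel/sysrq    # make sure sysrq is enabled
  echo t > /proc/sysrq-trigger       # same as alt-sysrq-t: dump task states to the log
  dmesg > sysrq-t.txt                # save the kernel log holding the trace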

-- 
Jens Axboe



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-24  9:18   ` Marcelo Tosatti
@ 2004-08-24 10:52     ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-24 10:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Karl Vogel, linux-kernel, Ingo Molnar

On Tue, Aug 24 2004, Marcelo Tosatti wrote:
> On Tue, Aug 24, 2004 at 12:03:43PM +0200, Jens Axboe wrote:
> > On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > > Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
> > > > 
> > > > Nope, it's not by design :-)
> > > > 
> > > > A test case would be nice; then I'll fix it as soon as possible. But
> > > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > > 2.6.8.1, so I'm more puzzled by that.
> > > 
> > > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there 
> > > is anything extra I need to try/record, just shoot!
> > > 
> > > Original post with testcase + stats:
> > >   http://article.gmane.org/gmane.linux.kernel/228156
> > 
> > A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid, since the problem behind it has already
> > been fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> >
> > So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug could
> > be related to that.
> 
> Jens,
> 
> You are right, I've been unable to reproduce the problem I was seeing
> (huge amounts of bio/biovec allocations causing major swapouts) with
> 2.6.8.1.
>
> With this kernel, the 512MB system swaps out around 50MB and recovers
> perfectly; I can't see any odd behaviour with CFQ.

Great, thanks for verifying. So that just leaves this other problem;
once traces of the hung processes are generated we'll know more. Currently I
cannot reproduce it with 2.6.8.1-mm4 at all; enabling preempt did
nothing to help it.

-- 
Jens Axboe



* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-24 10:35 Karl Vogel
  0 siblings, 0 replies; 12+ messages in thread
From: Karl Vogel @ 2004-08-24 10:35 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

> > The tests of yesterday evening did recover. So I'm guessing if I had
> > waited long enough the box would have recovered on the previous
> > tests. Looking at the vmstat from my previous tests shows that the
> > box was low on memory (free/buff/cache are all very low):
> > 
> >   http://users.telenet.be/kvogel/vmstat-after-kill.txt
> > 
> > That was probably why it was swapping like mad. 
> 
> Ok, so now I'm confused - tests on which kernel recovered?

2.6.8.1 with voluntary-preempt-P7

The same kernel as the one that didn't recover (waited 10 minutes,
after which it was still swapping like mad).

Of course, the test where it recovered was when nothing else was
running on the box (no X session, no KDE, just plain 'init 3').

Karl.


* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-24 10:28 Karl Vogel
@ 2004-08-24 10:29 ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-24 10:29 UTC (permalink / raw)
  To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

On Tue, Aug 24 2004, Karl Vogel wrote:
> > > > Original post with testcase + stats:
> > > >   http://article.gmane.org/gmane.linux.kernel/228156
> > > 
> > > A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
> > > 2.6.8-rc4 report is not valid, since the problem behind it has
> > > already been fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> > > 
> 
> Did some extra testing yesterday. When not running X or anything
> substantial, I'm able to trigger it after running the expunge 2 or
> 3 times in a row. 
> If I increase the calloc size, it triggers faster (tried with 1Gb
> calloc on a 512Mb box with 1Gb swap partition). 

I'll try increasing the size.

> The first expunge run completes fine. The ones after that get
> OOM killed, and I get a printk about an order-0 page allocation failure.
> 
> The 2.6.8.1-mm4 was a clean version, but I will double-check this
> this evening.
> 
> I also tried with deadline, but was unable to trigger it.

I'm adding preempt to the mix; maybe that'll help provoke it.

> > Oh, and please also do a sysrq-t from a hung box and save
> > the output.
> 
> Note: the box doesn't hang completely. Just some processes get stuck
> in 'D' and the machine swaps heavily.

That's fine, I'd like a sysrq-t of that.

> The tests of yesterday evening did recover. So I'm guessing if I had
> waited long enough the box would have recovered on the previous
> tests. Looking at the vmstat from my previous tests shows that the
> box was low on memory (free/buff/cache are all very low):
> 
>   http://users.telenet.be/kvogel/vmstat-after-kill.txt
> 
> That was probably why it was swapping like mad. 

Ok, so now I'm confused - tests on which kernel recovered?

> Will provide you with that sysrq-t this evening.

Great.

-- 
Jens Axboe



* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
@ 2004-08-24 10:28 Karl Vogel
  2004-08-24 10:29 ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Vogel @ 2004-08-24 10:28 UTC (permalink / raw)
  To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar

> > > Original post with testcase + stats:
> > >   http://article.gmane.org/gmane.linux.kernel/228156
> > 
> > A clean 2.6.8.1-mm4 does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid, since the problem behind it has
> > already been fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> > 

Did some extra testing yesterday. When not running X or anything
substantial, I'm able to trigger it after running the expunge 2 or
3 times in a row. 
If I increase the calloc size, it triggers faster (tried with 1Gb
calloc on a 512Mb box with 1Gb swap partition). 
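
For anyone wanting to reproduce this, the calloc part boils down to
something like the sketch below - a rough approximation only, the exact
testcase is in the gmane post linked above, and 'hog' is just a made-up
name:

cat > hog.c << 'EOF'
#include <stdlib.h>
#include <string.h>

/*
 * Rough stand-in for the calloc part of the testcase: allocate more
 * than RAM and touch every page so it all has to be pushed to swap.
 */
int main(void)
{
	size_t sz = 1024UL * 1024 * 1024;	/* 1Gb on a 512Mb box */
	char *p = calloc(1, sz);

	if (!p)
		return 1;
	memset(p, 1, sz);
	return 0;
}
EOF
gcc -O2 -o hog hog.c && ./hog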

The first expunge run completes fine. The ones after that get
OOM killed, and I get a printk about an order-0 page allocation failure.

The 2.6.8.1-mm4 was a clean version, but I will double-check this
this evening.

I also tried with deadline, but was unable to trigger it.

> Oh, and please also do a sysrq-t from a hung box and save
> the output.

Note: the box doesn't hang completely. Just some processes get stuck
in 'D' and the machine swaps heavily.

The tests of yesterday evening did recover. So I'm guessing if I had
waited long enough the box would have recovered on the previous
tests. Looking at the vmstat from my previous tests shows that the
box was low on memory (free/buff/cache are all very low):

  http://users.telenet.be/kvogel/vmstat-after-kill.txt

That was probably why it was swapping like mad. 


Will provide you with that sysrq-t this evening.

Karl.


* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-23 14:12     ` Marcelo Tosatti
@ 2004-08-23 15:41       ` Jens Axboe
  0 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2004-08-23 15:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Karl Vogel, linux-kernel

On Mon, Aug 23 2004, Marcelo Tosatti wrote:
> On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> > When using elevator=as I'm unable to trigger the swap of death, so it seems
> > that the CFQ scheduler is to blame here.
> > 
> > With the AS scheduler, the system recovers in +-10 seconds; vmstat output during
> > that time:
> > 
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
> >  0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
> >  0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
> >  2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
> >  1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
> >  0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
> >  0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
> >  2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
> >  4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
> >  0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
> >  0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
> >  0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
> >  1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
> >  1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
> >  0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
> >  0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
> >  0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
> > 
> > With CFQ, processes got stuck in 'D' and never left that state. See URLs in my
> > initial post for diagnostics.
> 
> I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ the machine swaps out
> 400 megs; with AS it swaps out 30M.
> 
> That leads to allocation failures, etc.
> 
> CFQ allocates a huge number of bio/biovecs:
> 
>  cat /proc/slabinfo | grep bio
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata 52     52      0
> biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata 53     53      0
> biovec-16            260    260    192   20    1 : tunables  120   60    0 : slabdata 13     13      0
> biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata  5      5      0
> biovec-1          121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
> bio               121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0
> 
> 
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata 128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata  52     52      0
> biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata  53     53      0
> biovec-16            258    260    192   20    1 : tunables  120   60    0 : slabdata  13     13      0
> biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata   5      5      0
> biovec-1           66390  68026     16  226    1 : tunables  120   60    0 : slabdata 301    301      0
> bio                66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0
> 
> (which are freed later on, but are the cause of the thrashing during the swap IO).
> 
> While AS does:
> 
> [marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
> biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
> biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
> biovec-64            260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
> biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
> biovec-4             264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
> biovec-1            4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
> bio                 4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0
> 
> 
> The odd thing is that the 400M swapped out is not reclaimed after exp (the 512MB callocator) exits. With AS,
> almost all swapped-out memory is reclaimed on exit.
> 
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  0  0 492828  13308    320   3716    0    0     0     0 1002     5  0  0 100  0
> 
> 
> Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.

Nope, it's not by design :-)

A test case would be nice; then I'll fix it as soon as possible. But
please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
fix to ll_rw_blk that can easily cause this. The first report is for
2.6.8.1, so I'm more puzzled by that.

-- 
Jens Axboe



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-22 19:18   ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
@ 2004-08-23 14:12     ` Marcelo Tosatti
  2004-08-23 15:41       ` Jens Axboe
  0 siblings, 1 reply; 12+ messages in thread
From: Marcelo Tosatti @ 2004-08-23 14:12 UTC (permalink / raw)
  To: Karl Vogel, axboe; +Cc: linux-kernel

On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap of death, so it seems
> that the CFQ scheduler is to blame here.
> 
> With the AS scheduler, the system recovers in +-10 seconds; vmstat output during
> that time:
> 
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
>  0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
>  0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
>  2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
>  1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
>  0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
>  0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
>  2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
>  4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
>  0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
>  0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
>  0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
>  1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
>  1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
>  0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
>  0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
>  0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
> 
> With CFQ, processes got stuck in 'D' and never left that state. See URLs in my
> initial post for diagnostics.

I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). Using CFQ the machine swaps out
400 megs; with AS it swaps out 30M.

That leads to allocation failures, etc.

CFQ allocates a huge number of bio/biovecs:

 cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata 52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata 53     53      0
biovec-16            260    260    192   20    1 : tunables  120   60    0 : slabdata 13     13      0
biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata  5      5      0
biovec-1          121088 122040     16  226    1 : tunables  120   60    0 : slabdata    540    540      0
bio               121131 121573     64   61    1 : tunables  120   60    0 : slabdata   1992   1993      0


biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata 128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata  52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata  53     53      0
biovec-16            258    260    192   20    1 : tunables  120   60    0 : slabdata  13     13      0
biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata   5      5      0
biovec-1           66390  68026     16  226    1 : tunables  120   60    0 : slabdata 301    301      0
bio                66389  67222     64   61    1 : tunables  120   60    0 : slabdata   1102   1102      0

(which are freed later on, but are the cause of the thrashing during the swap IO).

While AS does:

[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            260    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
biovec-4             264    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1            4478   5424     16  226    1 : tunables  120   60    0 : slabdata     24     24      0
bio                 4525   5002     64   61    1 : tunables  120   60    0 : slabdata     81     82      0


The odd thing is that the 400M swapped out is not reclaimed after exp (the 512MB callocator) exits. With AS,
almost all swapped-out memory is reclaimed on exit.

 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 492828  13308    320   3716    0    0     0     0 1002     5  0  0 100  0


Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
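
A small loop like the one below - just a convenience, not part of the
testcase - makes this kind of slab growth easy to watch while the test
runs:

  while true; do
          cat /proc/slabinfo | grep bio    # same grep as above
          sleep 5
  done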



* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
  2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
@ 2004-08-22 19:18   ` Karl Vogel
  2004-08-23 14:12     ` Marcelo Tosatti
  0 siblings, 1 reply; 12+ messages in thread
From: Karl Vogel @ 2004-08-22 19:18 UTC (permalink / raw)
  To: linux-kernel

When using elevator=as I'm unable to trigger the swap of death, so it seems
that the CFQ scheduler is to blame here.

With the AS scheduler, the system recovers in +-10 seconds; vmstat output during
that time:

procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0      0 295632  40372  49400   87  278   324   303 1424   784  7  2 78 13
 0  0      0 295632  40372  49400    0    0     0     0 1210   648  3  1 96  0
 0  0      0 295632  40372  49400    0    0     0     0 1209   652  4  0 96  0
 2  0      0 112784  40372  49400    0    0     0     0 1204   630 23 34 43  0
 1  9 156236    788    264   8128   28 156220  3012 156228 3748  3655 11 31  0 59
 0 15 176656   2196    280   8664    0 20420   556 20436 1108   374  2  5  0 93
 0 17 205320    724    232   7960   28 28664   396 28664 1118   503  7 12  0 81
 2 12 217892   1812    252   8556  248 12584   864 12584 1495   318  2  7  0 91
 4 14 253268   2500    268   8728  188 35392   432 35392 1844   399  3  7  0 90
 0 13 255692   1188    288   9152  960 2424  1408  2424 1173  2215 10  5  0 85
 0  7 266140   2288    312   9276  604 10468   752 10468 1248   644  5  5  0 90
 0  7 190516 340636    348   9860 1400    0  2016     0 1294   817  4  8  0 88
 1  8 190516 339460    384  10844  552    0  1556     4 1241   642  3  1  0 96
 1  3 190516 337084    404  11968 1432    0  2576     4 1292   788  3  1  0 96
 0  6 190516 333892    420  13612 1844    0  3500     0 1343   850  5  2  0 93
 0  1 190516 333700    424  13848  480    0   720     0 1250   654  3  2  0 95
 0  1 190516 334468    424  13848  188    0   188     0 1224   589  3  2  0 95
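
For reference, the table is plain vmstat sampling; the interval and
filename below are only examples:

  vmstat 1 > vmstat-as.txt    # one sample per second, saved for the report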

With CFQ, processes got stuck in 'D' and never left that state. See URLs in my
initial post for diagnostics.
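
For reference, the scheduler here is selected with the elevator= boot
parameter (valid values in 2.6.8 are as, deadline, cfq and noop); the
grub lines below are only illustrative, the real kernel image and root
device will differ:

  kernel /boot/vmlinuz-2.6.8.1 ro root=/dev/hda1 elevator=as
  kernel /boot/vmlinuz-2.6.8.1 ro root=/dev/hda1 elevator=deadline
  kernel /boot/vmlinuz-2.6.8.1 ro root=/dev/hda1 elevator=cfq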



Thread overview: 12+ messages
2004-08-23 16:10 Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 17:00 ` Jens Axboe
2004-08-24 10:03 ` Jens Axboe
2004-08-24  9:18   ` Marcelo Tosatti
2004-08-24 10:52     ` Jens Axboe
2004-08-24 10:13   ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2004-08-24 10:35 Karl Vogel
2004-08-24 10:28 Karl Vogel
2004-08-24 10:29 ` Jens Axboe
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18   ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12     ` Marcelo Tosatti
2004-08-23 15:41       ` Jens Axboe
