* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Karl Vogel @ 2004-08-23 16:10 UTC (permalink / raw)
To: 'Jens Axboe', Marcelo Tosatti; +Cc: linux-kernel
> > Jens, is this huge amount of bio/biovec allocations
> > expected with CFQ? It's really, really bad.
>
> Nope, it's not by design :-)
>
> A test case would be nice; then I'll fix it as soon as possible. But
> please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> fix to ll_rw_blk that can easily cause this. The first report is for
> 2.6.8.1, so I'm more puzzled by that.
I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
is anything extra I need to try/record, just shoot!
Original post with testcase + stats:
http://article.gmane.org/gmane.linux.kernel/228156
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-23 17:00 UTC (permalink / raw)
To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel
On Mon, Aug 23 2004, Karl Vogel wrote:
> > > Jens, is this huge amount of bio/biovec allocations
> > > expected with CFQ? It's really, really bad.
> >
> > Nope, it's not by design :-)
> >
> > A test case would be nice; then I'll fix it as soon as possible. But
> > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > fix to ll_rw_blk that can easily cause this. The first report is for
> > 2.6.8.1, so I'm more puzzled by that.
>
> I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
> is anything extra I need to try/record, just shoot!
>
> Original post with testcase + stats:
> http://article.gmane.org/gmane.linux.kernel/228156
Good report, I'll reproduce it here tomorrow. Thanks!
--
Jens Axboe
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Marcelo Tosatti @ 2004-08-24 9:18 UTC (permalink / raw)
To: Jens Axboe; +Cc: Karl Vogel, linux-kernel, Ingo Molnar
On Tue, Aug 24, 2004 at 12:03:43PM +0200, Jens Axboe wrote:
> On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > Jens, is this huge amount of bio/biovec allocations
> > > > expected with CFQ? It's really, really bad.
> > >
> > > Nope, it's not by design :-)
> > >
> > > A test case would be nice; then I'll fix it as soon as possible. But
> > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > 2.6.8.1, so I'm more puzzled by that.
> >
> > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
> > is anything extra I need to try/record, just shoot!
> >
> > Original post with testcase + stats:
> > http://article.gmane.org/gmane.linux.kernel/228156
>
> 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> 2.6.8-rc4 report is not valid, since the problem related to that
> was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
>
> So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug
> could be related to that.
Jens,
You are right; I've been unable to reproduce the problem I was seeing
(a huge amount of bio/biovec allocations causing major swapouts) with
2.6.8.1.
With this kernel, the 512MB system swaps out around 50MB and recovers
perfectly; I can't see any odd behaviour with CFQ.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-24 10:03 UTC (permalink / raw)
To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar
On Mon, Aug 23 2004, Karl Vogel wrote:
> > > Jens, is this huge amount of bio/biovec allocations
> > > expected with CFQ? It's really, really bad.
> >
> > Nope, it's not by design :-)
> >
> > A test case would be nice; then I'll fix it as soon as possible. But
> > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > fix to ll_rw_blk that can easily cause this. The first report is for
> > 2.6.8.1, so I'm more puzzled by that.
>
> I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
> is anything extra I need to try/record, just shoot!
>
> Original post with testcase + stats:
> http://article.gmane.org/gmane.linux.kernel/228156
2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
2.6.8-rc4 report is not valid, since the problem related to that
was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug could
be related to that.
--
Jens Axboe
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-24 10:13 UTC (permalink / raw)
To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar
On Tue, Aug 24 2004, Jens Axboe wrote:
> On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > Jens, is this huge amount of bio/biovec allocations
> > > > expected with CFQ? It's really, really bad.
> > >
> > > Nope, it's not by design :-)
> > >
> > > A test case would be nice; then I'll fix it as soon as possible. But
> > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > 2.6.8.1, so I'm more puzzled by that.
> >
> > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
> > is anything extra I need to try/record, just shoot!
> >
> > Original post with testcase + stats:
> > http://article.gmane.org/gmane.linux.kernel/228156
>
> 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> 2.6.8-rc4 report is not valid, since the problem related to that
> was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
>
> So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug
> could be related to that.
Oh, and please also do a sysrq-t on a hung box and save the output.
--
Jens Axboe
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-24 10:52 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Karl Vogel, linux-kernel, Ingo Molnar
On Tue, Aug 24 2004, Marcelo Tosatti wrote:
> On Tue, Aug 24, 2004 at 12:03:43PM +0200, Jens Axboe wrote:
> > On Mon, Aug 23 2004, Karl Vogel wrote:
> > > > > Jens, is this huge amount of bio/biovec allocations
> > > > > expected with CFQ? It's really, really bad.
> > > >
> > > > Nope, it's not by design :-)
> > > >
> > > > A test case would be nice; then I'll fix it as soon as possible. But
> > > > please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
> > > > fix to ll_rw_blk that can easily cause this. The first report is for
> > > > 2.6.8.1, so I'm more puzzled by that.
> > >
> > > I tried with 2.6.8.1 and 2.6.8.1-mm4, both had the problem. If there
> > > is anything extra I need to try/record, just shoot!
> > >
> > > Original post with testcase + stats:
> > > http://article.gmane.org/gmane.linux.kernel/228156
> >
> > 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid, since the problem related to that
> > was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> >
> > So I'm trying 2.6.8.1 with voluntary preempt applied now; the bug
> > could be related to that.
>
> Jens,
>
> You are right; I've been unable to reproduce the problem I was seeing
> (a huge amount of bio/biovec allocations causing major swapouts) with
> 2.6.8.1.
>
> With this kernel, the 512MB system swaps out around 50MB and recovers
> perfectly; I can't see any odd behaviour with CFQ.
Great, thanks for verifying. So that just leaves this other problem;
once traces of hung processes are generated we'll know more. Currently I
cannot reproduce it with 2.6.8.1-mm4 at all; enabling preempt did
nothing to help.
--
Jens Axboe
* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Karl Vogel @ 2004-08-24 10:35 UTC (permalink / raw)
To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar
> > The tests of yesterday evening did recover, so I'm guessing that if I
> > had waited long enough the box would have recovered in the previous
> > tests too. The vmstat from my previous tests shows that the box was
> > low on memory (free/buff/cache are all very low):
> >
> > http://users.telenet.be/kvogel/vmstat-after-kill.txt
> >
> > That was probably why it was swapping like mad.
>
> OK, so now I'm confused: the tests on which kernel recovered?
2.6.8.1 with voluntary-preempt-P7
The same kernel as the one that didn't recover (waited 10 minutes,
after which it was still swapping like mad).
Of course, the test where it recovered was when nothing else was
running on the box (no X session, no KDE, just plain 'init 3').
Karl.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-24 10:29 UTC (permalink / raw)
To: Karl Vogel; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar
On Tue, Aug 24 2004, Karl Vogel wrote:
> > > > Original post with testcase + stats:
> > > > http://article.gmane.org/gmane.linux.kernel/228156
> > >
> > > 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> > > 2.6.8-rc4 report is not valid, since the problem related to that
> > > was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> > >
>
> Did some extra testing yesterday. When not running X or anything
> substantial, I'm able to trigger it after running the expunge testcase
> 2 or 3 times in a row.
> If I increase the calloc size, it triggers faster (tried with a 1GB
> calloc on a 512MB box with a 1GB swap partition).
I'll try increasing the size.
> The first expunge run completes fine. The ones after that get
> OOM-killed, and I get a printk about an order-0 page allocation failure.
>
> The 2.6.8.1-mm4 was a clean version, but I will double-check this
> evening.
>
> I also tried with deadline, but was unable to trigger it.
I'm adding preempt to the mix; maybe that'll help provoke it.
> > Oh, and please also do a sysrq-t on a hung box and save
> > the output.
>
> Note: the box doesn't hang completely. Just some processes get stuck
> in 'D' and the machine swaps heavily.
That's fine, I'd like a sysrq-t of that.
> The tests of yesterday evening did recover, so I'm guessing that if I
> had waited long enough the box would have recovered in the previous
> tests too. The vmstat from my previous tests shows that the box was
> low on memory (free/buff/cache are all very low):
>
> http://users.telenet.be/kvogel/vmstat-after-kill.txt
>
> That was probably why it was swapping like mad.
OK, so now I'm confused: the tests on which kernel recovered?
> Will provide you with that sysrq-t this evening.
Great.
--
Jens Axboe
* RE: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Karl Vogel @ 2004-08-24 10:28 UTC (permalink / raw)
To: 'Jens Axboe'; +Cc: Marcelo Tosatti, linux-kernel, Ingo Molnar
> > > Original post with testcase + stats:
> > > http://article.gmane.org/gmane.linux.kernel/228156
> >
> > 2.6.8.1-mm4 clean does not reproduce the problem. Marcelo, your
> > 2.6.8-rc4 report is not valid, since the problem related to that
> > was already fixed in CFQ. I'd still like you to retest with 2.6.8.1.
> >
Did some extra testing yesterday. When not running X or anything
substantial, I'm able to trigger it after running the expunge testcase
2 or 3 times in a row.
If I increase the calloc size, it triggers faster (tried with a 1GB
calloc on a 512MB box with a 1GB swap partition).
The first expunge run completes fine. The ones after that get
OOM-killed, and I get a printk about an order-0 page allocation failure.
The 2.6.8.1-mm4 was a clean version, but I will double-check this
evening.
I also tried with deadline, but was unable to trigger it.
> Oh, and please also do a sysrq-t on a hung box and save
> the output.
Note: the box doesn't hang completely. Just some processes get stuck
in 'D' and the machine swaps heavily.
The tests of yesterday evening did recover, so I'm guessing that if I
had waited long enough the box would have recovered in the previous
tests too. The vmstat from my previous tests shows that the box was
low on memory (free/buff/cache are all very low):
http://users.telenet.be/kvogel/vmstat-after-kill.txt
That was probably why it was swapping like mad.
Will provide you with that sysrq-t this evening.
Karl.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Jens Axboe @ 2004-08-23 15:41 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Karl Vogel, linux-kernel
On Mon, Aug 23 2004, Marcelo Tosatti wrote:
> On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> > When using elevator=as I'm unable to trigger the swap storm of death,
> > so it seems that the CFQ scheduler is to blame here.
> >
> > With the AS scheduler, the system recovers in about 10 seconds; vmstat
> > output during that time:
> >
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> > r b swpd free buff cache si so bi bo in cs us sy id wa
> > 1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
> > 0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
> > 0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
> > 2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
> > 1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
> > 0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
> > 0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
> > 2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
> > 4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
> > 0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
> > 0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
> > 0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
> > 1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
> > 1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
> > 0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
> > 0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
> > 0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
> >
> > With CFQ, processes got stuck in 'D' and never left that state. See the
> > URLs in my initial post for diagnostics.
>
> I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). With CFQ
> the machine swaps out ~400MB; with AS it swaps out ~30MB.
>
> That leads to allocation failures, etc.
>
> CFQ allocates a huge number of bio/biovecs:
>
> cat /proc/slabinfo | grep bio
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
> biovec-16 260 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
> biovec-4 272 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 121088 122040 16 226 1 : tunables 120 60 0 : slabdata 540 540 0
> bio 121131 121573 64 61 1 : tunables 120 60 0 : slabdata 1992 1993 0
>
>
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
> biovec-16 258 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
> biovec-4 257 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 66390 68026 16 226 1 : tunables 120 60 0 : slabdata 301 301 0
> bio 66389 67222 64 61 1 : tunables 120 60 0 : slabdata 1102 1102 0
>
> (These are freed later on, but they are the cause of the thrashing during the swap I/O.)
>
> While AS does:
>
> [marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
> biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
> biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
> biovec-64 260 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
> biovec-16 280 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
> biovec-4 264 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
> biovec-1 4478 5424 16 226 1 : tunables 120 60 0 : slabdata 24 24 0
> bio 4525 5002 64 61 1 : tunables 120 60 0 : slabdata 81 82 0
>
>
> The odd thing is that the ~400MB swapped out is not reclaimed after exp (the
> 512MB callocator) exits. With AS, almost all swapped-out memory is reclaimed on exit.
>
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 0 0 492828 13308 320 3716 0 0 0 0 1002 5 0 0 100 0
>
>
> Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
Nope, it's not by design :-)
A test case would be nice; then I'll fix it as soon as possible. But
please retest with 2.6.8.1, Marcelo; 2.6.8-rc4 is missing an important
fix to ll_rw_blk that can easily cause this. The first report is for
2.6.8.1, so I'm more puzzled by that.
--
Jens Axboe
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Marcelo Tosatti @ 2004-08-23 14:12 UTC (permalink / raw)
To: Karl Vogel, axboe; +Cc: linux-kernel
On Sun, Aug 22, 2004 at 09:18:51PM +0200, Karl Vogel wrote:
> When using elevator=as I'm unable to trigger the swap storm of death,
> so it seems that the CFQ scheduler is to blame here.
>
> With the AS scheduler, the system recovers in about 10 seconds; vmstat
> output during that time:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
> 0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
> 0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
> 2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
> 1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
> 0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
> 0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
> 2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
> 4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
> 0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
> 0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
> 0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
> 1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
> 1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
> 0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
> 0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
> 0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
>
> With CFQ, processes got stuck in 'D' and never left that state. See the
> URLs in my initial post for diagnostics.
I can confirm this on a 512MB box with 512MB swap (2.6.8-rc4). With CFQ
the machine swaps out ~400MB; with AS it swaps out ~30MB.
That leads to allocation failures, etc.
CFQ allocates a huge number of bio/biovecs:
cat /proc/slabinfo | grep bio
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
biovec-16 260 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 272 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 121088 122040 16 226 1 : tunables 120 60 0 : slabdata 540 540 0
bio 121131 121573 64 61 1 : tunables 120 60 0 : slabdata 1992 1993 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
biovec-16 258 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 257 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 66390 68026 16 226 1 : tunables 120 60 0 : slabdata 301 301 0
bio 66389 67222 64 61 1 : tunables 120 60 0 : slabdata 1102 1102 0
(These are freed later on, but they are the cause of the thrashing during the swap I/O.)
While AS does:
[marcelo@yage marcelo]$ cat /proc/slabinfo | grep bio
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 260 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 280 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 264 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 4478 5424 16 226 1 : tunables 120 60 0 : slabdata 24 24 0
bio 4525 5002 64 61 1 : tunables 120 60 0 : slabdata 81 82 0
The odd thing is that the ~400MB swapped out is not reclaimed after exp (the
512MB callocator) exits. With AS, almost all swapped-out memory is reclaimed on exit.
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 492828 13308 320 3716 0 0 0 0 1002 5 0 0 100 0
Jens, is this huge amount of bio/biovec allocations expected with CFQ? It's really, really bad.
* Re: Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit
From: Karl Vogel @ 2004-08-22 19:18 UTC (permalink / raw)
To: linux-kernel
When using elevator=as I'm unable to trigger the swap storm of death,
so it seems that the CFQ scheduler is to blame here.
With the AS scheduler, the system recovers in about 10 seconds; vmstat
output during that time:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 295632 40372 49400 87 278 324 303 1424 784 7 2 78 13
0 0 0 295632 40372 49400 0 0 0 0 1210 648 3 1 96 0
0 0 0 295632 40372 49400 0 0 0 0 1209 652 4 0 96 0
2 0 0 112784 40372 49400 0 0 0 0 1204 630 23 34 43 0
1 9 156236 788 264 8128 28 156220 3012 156228 3748 3655 11 31 0 59
0 15 176656 2196 280 8664 0 20420 556 20436 1108 374 2 5 0 93
0 17 205320 724 232 7960 28 28664 396 28664 1118 503 7 12 0 81
2 12 217892 1812 252 8556 248 12584 864 12584 1495 318 2 7 0 91
4 14 253268 2500 268 8728 188 35392 432 35392 1844 399 3 7 0 90
0 13 255692 1188 288 9152 960 2424 1408 2424 1173 2215 10 5 0 85
0 7 266140 2288 312 9276 604 10468 752 10468 1248 644 5 5 0 90
0 7 190516 340636 348 9860 1400 0 2016 0 1294 817 4 8 0 88
1 8 190516 339460 384 10844 552 0 1556 4 1241 642 3 1 0 96
1 3 190516 337084 404 11968 1432 0 2576 4 1292 788 3 1 0 96
0 6 190516 333892 420 13612 1844 0 3500 0 1343 850 5 2 0 93
0 1 190516 333700 424 13848 480 0 720 0 1250 654 3 2 0 95
0 1 190516 334468 424 13848 188 0 188 0 1224 589 3 2 0 95
With CFQ, processes got stuck in 'D' and never left that state. See the
URLs in my initial post for diagnostics.
2004-08-23 16:10 Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 17:00 ` Jens Axboe
2004-08-24 10:03 ` Jens Axboe
2004-08-24 9:18 ` Marcelo Tosatti
2004-08-24 10:52 ` Jens Axboe
2004-08-24 10:13 ` Jens Axboe
-- strict thread matches above, loose matches on Subject: below --
2004-08-24 10:35 Karl Vogel
2004-08-24 10:28 Karl Vogel
2004-08-24 10:29 ` Jens Axboe
2004-08-22 13:27 Kernel 2.6.8.1: swap storm of death Karl Vogel
2004-08-22 18:49 ` Kernel 2.6.8.1: swap storm of death - 2.6.8.1-mm4 also karl.vogel
2004-08-22 19:18 ` Kernel 2.6.8.1: swap storm of death - CFQ scheduler=culprit Karl Vogel
2004-08-23 14:12 ` Marcelo Tosatti
2004-08-23 15:41 ` Jens Axboe