linux-kernel.vger.kernel.org archive mirror
* [BUG] 2.4 VM sucks. Again
@ 2002-05-23 13:11 Roy Sigurd Karlsbakk
  2002-05-23 14:54 ` Martin J. Bligh
                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-23 13:11 UTC (permalink / raw)
  To: linux-kernel

hi all

I've been here complaining about the 2.4 VM before, and here I am, back again.

PROBLEM:
----------------------
Starting up 30 downloads from a custom HTTP server (or Tux - or Apache - 
doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After some 
time the kernel (a) goes bOOM (out of memory) if it has no swap, or (b) 
goes gong, swapping out anything it can.

The custom HTTP server processes each have a static buffer of two megabytes, 
no malloc()s, and are written in < 1000 lines of C.

Theory: The buffer fills up, as the clients can't read as fast as the kernel is 
reading from disk, and the server goes boom.
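
Each server process does roughly this (a simplified sketch under the
constraints above, not the actual code; the names are made up):

#include <sys/types.h>
#include <unistd.h>

#define BUFSZ (2 * 1024 * 1024)         /* the static 2MB buffer */

static char buf[BUFSZ];

/* read 2MB from disk, write 2MB to the client socket */
static void serve(int disk_fd, int sock_fd)
{
        ssize_t n;

        while ((n = read(disk_fd, buf, BUFSZ)) > 0) {
                ssize_t off = 0;

                /* a slow client makes us block here while the kernel
                 * keeps the file data it read for us cached behind us */
                while (off < n) {
                        ssize_t w = write(sock_fd, buf + off, n - off);

                        if (w <= 0)
                                return;
                        off += w;
                }
        }
}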

thanks for any help

roy


-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 13:11 [BUG] 2.4 VM sucks. Again Roy Sigurd Karlsbakk
@ 2002-05-23 14:54 ` Martin J. Bligh
  2002-05-23 16:29   ` Roy Sigurd Karlsbakk
  2002-05-23 16:03 ` Johannes Erdfelt
  2002-05-23 18:12 ` jlnance
  2 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-23 14:54 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, linux-kernel

> PROBLEM:
> ----------------------
> Starting up 30 downloads from a custom HTTP server (or Tux - or Apache - 
> doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After some 
> time the kernel (a) goes bOOM (out of memory) if it has no swap, or (b) 
> goes gong, swapping out anything it can.

How much RAM do you have, and what does /proc/meminfo
and /proc/slabinfo say just before the explosion point?
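
(If the numbers are hard to catch by hand, a throwaway logger like this,
with stdout redirected to a file or another box, will capture them; just a
sketch, not a standard tool:)

#include <stdio.h>
#include <unistd.h>

/* append the contents of a /proc file to stdout */
static void dump(const char *path)
{
        char line[256];
        FILE *f = fopen(path, "r");

        if (!f)
                return;
        while (fgets(line, sizeof(line), f))
                fputs(line, stdout);
        fclose(f);
}

int main(void)
{
        /* snapshot once a second until killed */
        for (;;) {
                puts("--- snapshot ---");
                dump("/proc/meminfo");
                dump("/proc/slabinfo");
                fflush(stdout);
                sleep(1);
        }
        return 0;
}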

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 13:11 [BUG] 2.4 VM sucks. Again Roy Sigurd Karlsbakk
  2002-05-23 14:54 ` Martin J. Bligh
@ 2002-05-23 16:03 ` Johannes Erdfelt
  2002-05-23 16:33   ` Roy Sigurd Karlsbakk
  2002-05-23 18:12 ` jlnance
  2 siblings, 1 reply; 48+ messages in thread
From: Johannes Erdfelt @ 2002-05-23 16:03 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: linux-kernel

On Thu, May 23, 2002, Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:
> I've been here complaining about the 2.4 VM before, and here I am, back again.
> 
> PROBLEM:
> ----------------------
> Starting up 30 downloads from a custom HTTP server (or Tux - or Apache - 
> doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After some 
> time the kernel (a) goes bOOM (out of memory) if it has no swap, or (b) 
> goes gong, swapping out anything it can.
> 
> The custom HTTP server processes each have a static buffer of two megabytes, 
> no malloc()s, and are written in < 1000 lines of C.
> 
> Theory: The buffer fills up, as the clients can't read as fast as the kernel is 
> reading from disk, and the server goes boom.
> 
> thanks for any help

What kernel is this?

JE


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 14:54 ` Martin J. Bligh
@ 2002-05-23 16:29   ` Roy Sigurd Karlsbakk
  2002-05-23 16:46     ` Martin J. Bligh
  2002-05-24 15:11     ` [BUG] 2.4 VM sucks. Again Alan Cox
  0 siblings, 2 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-23 16:29 UTC (permalink / raw)
  To: Martin J. Bligh, linux-kernel

> > Starting up 30 downloads from a custom HTTP server (or Tux - or Apache -
> > doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After
> > some time the kernel (a) goes bOOM (out of memory) if it has no
> > swap, or (b) goes gong, swapping out anything it can.
>
> How much RAM do you have, and what does /proc/meminfo
> and /proc/slabinfo say just before the explosion point?

I have 1 gig - highmem (not enabled) - 900 megs.
From what I can see, the kernel can't reclaim buffers fast enough.
It looks better on -aa.

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 16:03 ` Johannes Erdfelt
@ 2002-05-23 16:33   ` Roy Sigurd Karlsbakk
  2002-05-23 22:50     ` Luigi Genoni
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-23 16:33 UTC (permalink / raw)
  To: Johannes Erdfelt; +Cc: linux-kernel

> What kernel is this?

Sorry, forgot to say.

It's 2.4.18-ac? and 2.4.19pre-several. I believe it's the same stuff I've 
seen on earlier kernels as well.

-aa seems to solve or reduce the problem.
-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 16:29   ` Roy Sigurd Karlsbakk
@ 2002-05-23 16:46     ` Martin J. Bligh
  2002-05-24 10:04       ` Roy Sigurd Karlsbakk
  2002-05-24 15:11     ` [BUG] 2.4 VM sucks. Again Alan Cox
  1 sibling, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-23 16:46 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, linux-kernel

>> > Starting up 30 downloads from a custom HTTP server (or Tux - or Apache -
>> > doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After
>> > some time the kernel (a) goes bOOM (out of memory) if it has no
>> > swap, or (b) goes gong, swapping out anything it can.
>> 
>> How much RAM do you have, and what does /proc/meminfo
>> and /proc/slabinfo say just before the explosion point?
> 
> I have 1 gig - highmem (not enabled) - 900 megs.
> From what I can see, the kernel can't reclaim buffers fast enough.
> It looks better on -aa.

Sounds like exactly the same problem we were having. There are two
approaches to solving this - Andrea has a patch that tries to free them
under memory pressure, akpm has a patch that hacks them down as soon
as you've finished with them (posted to the lse-tech mailing list). Both approaches
seemed to work for me, but the performance of the fixes still has to be established.

I've seen over 1Gb of buffer_heads ;-)

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 13:11 [BUG] 2.4 VM sucks. Again Roy Sigurd Karlsbakk
  2002-05-23 14:54 ` Martin J. Bligh
  2002-05-23 16:03 ` Johannes Erdfelt
@ 2002-05-23 18:12 ` jlnance
  2002-05-24 10:36   ` Roy Sigurd Karlsbakk
  2 siblings, 1 reply; 48+ messages in thread
From: jlnance @ 2002-05-23 18:12 UTC (permalink / raw)
  To: roy, linux-kernel

On Thu, May 23, 2002 at 03:11:24PM +0200, Roy Sigurd Karlsbakk wrote:

> Starting up 30 downloads from a custom HTTP server (or Tux - or Apache - 
> doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After some 
> time the kernel (a) goes bOOM (out of memory) if it has no swap, or (b) 
> goes gong, swapping out anything it can.

Does this work if the client and the server are on the same machine?  It
would make reproducing this a lot easier if it only required 1 machine.

Thanks,

Jim

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 16:33   ` Roy Sigurd Karlsbakk
@ 2002-05-23 22:50     ` Luigi Genoni
  2002-05-24 11:53       ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Luigi Genoni @ 2002-05-23 22:50 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Johannes Erdfelt, linux-kernel

Have you tried the latest -aa versions?
They are quite interesting.
I am playing with 2.4.19-pre8aa3 right now...

On Thu, 23 May 2002, Roy Sigurd Karlsbakk wrote:

> > What kernel is this?
>
> Sorry, forgot to say.
>
> It's 2.4.18-ac? and 2.4.19pre-several. I believe it's the same stuff I've
> seen on earlier kernels as well.
>
> -aa seems to solve or reduce the problem.
> --
> Roy Sigurd Karlsbakk, Datavaktmester
>
> Computers are like air conditioners.
> They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 16:46     ` Martin J. Bligh
@ 2002-05-24 10:04       ` Roy Sigurd Karlsbakk
  2002-05-24 14:35         ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-24 10:04 UTC (permalink / raw)
  To: Martin J. Bligh, linux-kernel

> > I have 1 gig - highmem (not enabled) - 900 megs.
> > From what I can see, the kernel can't reclaim buffers fast enough.
> > It looks better on -aa.
>
> Sounds like exactly the same problem we were having. There are two
> approaches to solving this - Andrea has a patch that tries to free them
> under memory pressure, akpm has a patch that hacks them down as soon
> as you've finished with them (posted to the lse-tech mailing list). Both
> approaches seemed to work for me, but the performance of the fixes still
> has to be established.

Where can I find the akpm patch?

Any plans to merge this into the main kernel, giving a choice (in config or 
/proc) to enable this?

> I've seen over 1Gb of buffer_heads ;-)
>
> M.

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 18:12 ` jlnance
@ 2002-05-24 10:36   ` Roy Sigurd Karlsbakk
  2002-05-31 21:21     ` Andrea Arcangeli
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-24 10:36 UTC (permalink / raw)
  To: jlnance, linux-kernel

On Thursday 23 May 2002 20:12, jlnance@intrex.net wrote:
> On Thu, May 23, 2002 at 03:11:24PM +0200, Roy Sigurd Karlsbakk wrote:
> > Starting up 30 downloads from a custom HTTP server (or Tux - or Apache -
> > doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After
> > some time the kernel (a) goes bOOM (out of memory) if it has no
> > swap, or (b) goes gong, swapping out anything it can.
>
> Does this work if the client and the server are on the same machine?  It
> would make reproducing this a lot easier if it only required 1 machine.

I guess it'd work fine with only one machine, as, IMO, the problem must be the 
kernel not releasing buffers.
-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 22:50     ` Luigi Genoni
@ 2002-05-24 11:53       ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-24 11:53 UTC (permalink / raw)
  To: Luigi Genoni; +Cc: Johannes Erdfelt, linux-kernel

On Friday 24 May 2002 00:50, Luigi Genoni wrote:
> Have you tried the latest -aa versions?
> They are quite interesting.
> I am playing with 2.4.19-pre8aa3 right now...

I just tried it. It's better, but not good enough. It still fucks up.

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 10:04       ` Roy Sigurd Karlsbakk
@ 2002-05-24 14:35         ` Martin J. Bligh
  2002-05-24 19:32           ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-24 14:35 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, linux-kernel

>> Sounds like exactly the same problem we were having. There are two
>> approaches to solving this - Andrea has a patch that tries to free them
>> under memory pressure, akpm has a patch that hacks them down as soon
>> as you've finished with them (posted to the lse-tech mailing list). Both
>> approaches seemed to work for me, but the performance of the fixes still
>> has to be established.
> 
> Where can I find the akpm patch?

http://marc.theaimsgroup.com/?l=lse-tech&m=102083525007877&w=2

> Any plans to merge this into the main kernel, giving a choice 
> (in config or /proc) to enable this?

I don't think Andrew is ready to submit this yet ... before anything
gets merged back, it'd be very worthwhile testing the relative
performance of both solutions ... the more testers we have the
better ;-)

Thanks,

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-23 16:29   ` Roy Sigurd Karlsbakk
  2002-05-23 16:46     ` Martin J. Bligh
@ 2002-05-24 15:11     ` Alan Cox
  2002-05-24 15:53       ` Martin J. Bligh
  2002-05-27 11:12       ` Roy Sigurd Karlsbakk
  1 sibling, 2 replies; 48+ messages in thread
From: Alan Cox @ 2002-05-24 15:11 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Martin J. Bligh, linux-kernel

> > How much RAM do you have, and what does /proc/meminfo
> > and /proc/slabinfo say just before the explosion point?
> 
> I have 1 gig - highmem (not enabled) - 900 megs.
> From what I can see, the kernel can't reclaim buffers fast enough.
> It looks better on -aa.
> 

What sort of setup? I can't duplicate the problem here.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 15:11     ` [BUG] 2.4 VM sucks. Again Alan Cox
@ 2002-05-24 15:53       ` Martin J. Bligh
  2002-05-24 16:14         ` Alan Cox
  2002-05-27 11:12       ` Roy Sigurd Karlsbakk
  1 sibling, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-24 15:53 UTC (permalink / raw)
  To: Alan Cox, Roy Sigurd Karlsbakk; +Cc: linux-kernel

>> > How much RAM do you have, and what does /proc/meminfo
>> > and /proc/slabinfo say just before the explosion point?
>> 
>> I have 1 gig - highmem (not enabled) - 900 megs.
>> From what I can see, the kernel can't reclaim buffers fast enough.
>> It looks better on -aa.
>> 
> 
> What sort of setup? I can't duplicate the problem here.

I'm not sure exactly what Roy was doing, but we were taking a machine
with 16Gb of RAM, and reading files into the page cache - I think we built up
8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
on a P3 96 bytes.
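
(For scale, taking those numbers at face value: 8,000,000 buffer_heads at
128 bytes each is roughly 1GB, which matches the "over 1Gb of buffer_heads"
figure earlier in the thread.)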

M.



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 15:53       ` Martin J. Bligh
@ 2002-05-24 16:14         ` Alan Cox
  2002-05-24 16:31           ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Cox @ 2002-05-24 16:14 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Alan Cox, Roy Sigurd Karlsbakk, linux-kernel

> > What sort of setup? I can't duplicate the problem here.
> 
> I'm not sure exactly what Roy was doing, but we were taking a machine
> with 16Gb of RAM, and reading files into the page cache - I think we built up
> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
> on a P3 96 bytes.

The buffer heads one would make sense. I only test on realistically sized systems.
Once you pass 4Gb there are so many problems it's not worth using x86 in the
long run.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 16:14         ` Alan Cox
@ 2002-05-24 16:31           ` Martin J. Bligh
  2002-05-24 17:30             ` Austin Gonyou
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-24 16:31 UTC (permalink / raw)
  To: Alan Cox; +Cc: Roy Sigurd Karlsbakk, linux-kernel

>> I'm not sure exactly what Roy was doing, but we were taking a machine
>> with 16Gb of RAM, and reading files into the page cache - I think we built up
>> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
>> on a P3 96 bytes.
> 
> The buffer heads one would make sense. I only test on realistically sized systems.

Well, it'll still waste valuable memory there too, though you may not totally kill it.

> Once you pass 4Gb there are so many problems it's not worth using x86 in the
> long run.

Nah, we just haven't fixed them yet ;-)

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 16:31           ` Martin J. Bligh
@ 2002-05-24 17:30             ` Austin Gonyou
  2002-05-24 17:43               ` Martin J. Bligh
  2002-05-27  9:24               ` [BUG] 2.4 VM sucks. Again Marco Colombo
  0 siblings, 2 replies; 48+ messages in thread
From: Austin Gonyou @ 2002-05-24 17:30 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Alan Cox, Roy Sigurd Karlsbakk, linux-kernel

On Fri, 2002-05-24 at 11:31, Martin J. Bligh wrote:
> >> I'm not sure exactly what Roy was doing, but we were taking a machine
> >> with 16Gb of RAM, and reading files into the page cache - I think we built up
> >> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
> >> on a P3 96 bytes.
> > 
> > The buffer heads one would make sense. I only test on realistically sized systems.
> 
> Well, it'll still waste valuable memory there too, though you may not totally kill it.
> 
> > Once you pass 4Gb there are so many problems it's not worth using x86 in the
> > long run.
> 
I assume that by "not worth using x86" you're referring to, say,
degraded performance relative to other platforms? Well... if you talk
price/performance, x86 is perfect in those terms, since you can buy
more boxes and have a more fluid architecture rather than building a
monolithic system. Monolithic systems aren't always the best. Just look
at Fermilab!

> Nah, we just haven't fixed them yet ;-)
> 
> M.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 17:30             ` Austin Gonyou
@ 2002-05-24 17:43               ` Martin J. Bligh
  2002-05-24 18:03                 ` Austin Gonyou
  2002-05-27  9:24               ` [BUG] 2.4 VM sucks. Again Marco Colombo
  1 sibling, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-24 17:43 UTC (permalink / raw)
  To: Austin Gonyou; +Cc: Alan Cox, Roy Sigurd Karlsbakk, linux-kernel

> I assume that by "not worth using x86" you're referring to, say,
> degraded performance relative to other platforms? Well... if you talk
> price/performance, x86 is perfect in those terms, since you can buy
> more boxes and have a more fluid architecture rather than building a
> monolithic system. Monolithic systems aren't always the best. Just look
> at Fermilab!

Well, to be honest, with the current mainline kernel on >4Gb x86 machines,
we're not talking about slow performance, we're talking about "falls flat
on its face in a gibbering heap" (if you actually stress the machine with
real workloads). If we apply a bunch of patches, we can get the ostrich to
just about fly (most of the time), but we're working towards good
performance too ... it's not that far off.

Of course, this means that we actually have to get these patches accepted
for them to be of much use ;-). The -aa kernel works best in this area, on
the workloads I've been looking at so far ... this area is very much "under
active development" at the moment.

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 17:43               ` Martin J. Bligh
@ 2002-05-24 18:03                 ` Austin Gonyou
  2002-05-24 18:10                   ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Austin Gonyou @ 2002-05-24 18:03 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Alan Cox, Roy Sigurd Karlsbakk, linux-kernel

On Fri, 2002-05-24 at 12:43, Martin J. Bligh wrote:
> > I assume that by "not worth using x86" you're referring to, say,
> > degraded performance relative to other platforms? Well... if you talk
> > price/performance, x86 is perfect in those terms, since you can buy
> > more boxes and have a more fluid architecture rather than building a
> > monolithic system. Monolithic systems aren't always the best. Just look
> > at Fermilab!
> 
> Well, to be honest, with the current mainline kernel on >4Gb x86 machines,
> we're not talking about slow performance, we're talking about "falls flat
> on its face in a gibbering heap" (if you actually stress the machine with
> real workloads). If we apply a bunch of patches, we can get the ostrich to
> just about fly (most of the time), but we're working towards good
> performance too ... it's not that far off.

Understood, I think that's everyone's goal in the end anyway.

> Of course, this means that we actually have to get these patches accepted
> for them to be of much use ;-). The -aa kernel works best in this area, on
> the workloads I've been looking at so far ... this area is very much "under
> active development" at the moment.
> 
> M.

Yes. After using an -aa series, then recompiling glibc with some
optimizations, kind of re-purifying the system a few times, then applying
some Oracle patches (to fix some Oracle bugs in our environment) - voila!
We can have a *very* fast Linux box on 4P or 8P with 4-8GB RAM with an
uptime of >60 days. I've never had a box longer than that to prove
otherwise, but it was stable from a *production* point of view.

Also, I've found that adjusting the bdflush parms greatly increases
stability in this respect. On top of all of that, though, using XFS with
increased logbuffers, and LVM or EVMS to do striping, really improved IO
performance too.

Problem is, my tests are *unofficial* but I plan to do something perhaps
at OSDL and see what we can show in a max single-box config with real
hardware, etc. 

Anyway, I digress. 

Austin

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 18:03                 ` Austin Gonyou
@ 2002-05-24 18:10                   ` Martin J. Bligh
  2002-05-24 18:29                     ` 2.4 Kernel Perf discussion [Was Re: [BUG] 2.4 VM sucks. Again] Austin Gonyou
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-05-24 18:10 UTC (permalink / raw)
  To: Austin Gonyou; +Cc: Alan Cox, Roy Sigurd Karlsbakk, linux-kernel

> Also, I've found that adjusting the bdflush parms greatly increases
> stability in this respect.

What exactly did you do to them? Can you specify what you're set to
at the moment (and anything you found along the way in tuning)?

> Problem is, my tests are *unofficial* but I plan to do something perhaps
> at OSDL and see what we can show in a max single-box config with real
> hardware, etc. 

Great stuff, I'm very interested in knowing about any problems you find.
We're doing very similar things here, anywhere from 8-32 procs, and
4-32Gb of RAM, both NUMA and SMP.

Thanks,

Martin.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* 2.4 Kernel Perf discussion [Was Re: [BUG] 2.4 VM sucks. Again]
  2002-05-24 18:10                   ` Martin J. Bligh
@ 2002-05-24 18:29                     ` Austin Gonyou
  2002-05-24 19:01                       ` Stephen Frost
  0 siblings, 1 reply; 48+ messages in thread
From: Austin Gonyou @ 2002-05-24 18:29 UTC (permalink / raw)
  To: linux-kernel

On Fri, 2002-05-24 at 13:10, Martin J. Bligh wrote:
> > Also, I've found that adjusting the bdflush parms greatly increases
> > stability in this respect.
> 
> What exactly did you do to them? Can you specify what you're set to
> at the moment (and anything you found along the way in tuning)?

I actually changed the defaults of the bdflush parms before compiling. I
don't have that info right now because I had to dismantle my system in a
hurry; it was a try-and-buy from Dell at the time, and we weren't
authorized to buy yet.

At any rate, I found at the time (2.4.17-pre5-aa2-xfs, I think) that
the defaults for bdflush when running dbench would just *destroy* the
system. Changing the bdflush parms to about 60% full, flushing down to
30%, while potentially wasteful, was indeed an improvement.
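
(For reference, the knob in question is /proc/sys/vm/bdflush; it takes nine
integers whose exact meanings vary between 2.4 versions, so check
Documentation/sysctl/vm.txt for your tree before copying anything. A hedged
sketch of bumping the first field, nfract, the dirty-buffer percentage at
which bdflush kicks in; the 60 is just an example value, not a
recommendation:)

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/proc/sys/vm/bdflush", "r");
        int v[9] = {0};
        int i;

        if (!f)
                return 1;
        /* read the current nine tunables */
        for (i = 0; i < 9; i++)
                if (fscanf(f, "%d", &v[i]) != 1)
                        break;
        fclose(f);

        v[0] = 60;      /* nfract: start flushing at 60% dirty (example) */

        /* write the full set back */
        f = fopen("/proc/sys/vm/bdflush", "w");
        if (!f)
                return 1;
        for (i = 0; i < 9; i++)
                fprintf(f, "%d ", v[i]);
        fclose(f);
        return 0;
}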

IOzone benchmarks also show distinct improvements in this regard, but I
never had such terrible kswapd/bdflush issues with that test as I did
with dbench to begin with.

The test system was a Dell 6450 with 8GB RAM and P3 Xeon 700MHz 2MB
cache procs. I expect far greater performance from the P4 Xeon 1.6GHz 1MB
cache procs, though. In that scenario, we will probably only be using 4GB
RAM. That test will be internal to us and should start in the next couple
of weeks (I hope). I'll be charged with making the system testing as
immaculate as possible so we have crisp information to use in our
decision-making process as we move from Sun to x86.

> > Problem is, my tests are *unofficial* but I plan to do something perhaps
> > at OSDL and see what we can show in a max single-box config with real
> > hardware, etc. 
> 
> Great stuff, I'm very interested in knowing about any problems you find.
> We're doing very similar things here, anywhere from 8-32 procs, and
> 4-32Gb of RAM, both NUMA and SMP.

As soon as I can get time on their systems to do 4/8-way testing, I'll
make my benches available. Should be good stuff. :)

> Thanks,
> 
> Martin.
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: 2.4 Kernel Perf discussion [Was Re: [BUG] 2.4 VM sucks. Again]
  2002-05-24 18:29                     ` 2.4 Kernel Perf discussion [Was Re: [BUG] 2.4 VM sucks. Again] Austin Gonyou
@ 2002-05-24 19:01                       ` Stephen Frost
  0 siblings, 0 replies; 48+ messages in thread
From: Stephen Frost @ 2002-05-24 19:01 UTC (permalink / raw)
  To: Austin Gonyou; +Cc: linux-kernel


* Austin Gonyou (austin@digitalroadkill.net) wrote:
> As soon as I can get time on their systems to do 4/8-way testing, I'll
> make my benches available. Should be good stuff. :)

  I may be getting an opportunity in the next weeks/months to play with
  a 16-way SparcCenter 2000 w/ 85MHz procs and 3GB of RAM.  I realize
  this machine is rather pokey, but I was wondering if it might be useful
  to help test the kernel with a large number of processors.  So, if you
  or anyone else has some tests you'd like me to run (assuming I get the
  machine all set up and running Linux), let me know and I'd be happy to
  try some things.

  	Stephen


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 14:35         ` Martin J. Bligh
@ 2002-05-24 19:32           ` Andrew Morton
  2002-05-30 10:29             ` Roy Sigurd Karlsbakk
                               ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Andrew Morton @ 2002-05-24 19:32 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Roy Sigurd Karlsbakk, linux-kernel

"Martin J. Bligh" wrote:
> 
> >> Sounds like exactly the same problem we were having. There are two
> >> approaches to solving this - Andrea has a patch that tries to free them
> >> under memory pressure, akpm has a patch that hacks them down as soon
> >> as you've finished with them (posted to the lse-tech mailing list). Both
> >> approaches seemed to work for me, but the performance of the fixes still
> >> has to be established.
> >
> > Where can I find the akpm patch?
> 
> http://marc.theaimsgroup.com/?l=lse-tech&m=102083525007877&w=2
> 
> > Any plans to merge this into the main kernel, giving a choice
> > (in config or /proc) to enable this?
> 
> I don't think Andrew is ready to submit this yet ... before anything
> gets merged back, it'd be very worthwhile testing the relative
> performance of both solutions ... the more testers we have the
> better ;-)
> 

Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
version is below.

It's possible that keeping the number of buffers as low as possible
will give improved performance over Andrea's approach because it
leaves more ZONE_NORMAL for other things.  It's also possible that
it'll give worse performance because more get_block()s need to be
done for file overwriting.
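
(Concretely: a page that has lost its buffer_heads has also lost its
cached block mappings, so a later overwrite has to call the filesystem's
get_block() again to rediscover which disk blocks back it; that's the
extra cost being weighed here.)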


--- 2.4.19-pre8/include/linux/pagemap.h~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/include/linux/pagemap.h	Fri May 24 12:26:30 2002
@@ -89,13 +89,7 @@ extern void add_to_page_cache(struct pag
 extern void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index);
 extern int add_to_page_cache_unique(struct page * page, struct address_space *mapping, unsigned long index, struct page **hash);
 
-extern void ___wait_on_page(struct page *);
-
-static inline void wait_on_page(struct page * page)
-{
-	if (PageLocked(page))
-		___wait_on_page(page);
-}
+extern void wait_on_page(struct page *);
 
 extern struct page * grab_cache_page (struct address_space *, unsigned long);
 extern struct page * grab_cache_page_nowait (struct address_space *, unsigned long);
--- 2.4.19-pre8/mm/filemap.c~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/mm/filemap.c	Fri May 24 12:24:56 2002
@@ -608,7 +608,7 @@ int filemap_fdatawait(struct address_spa
 		page_cache_get(page);
 		spin_unlock(&pagecache_lock);
 
-		___wait_on_page(page);
+		wait_on_page(page);
 		if (PageError(page))
 			ret = -EIO;
 
@@ -805,33 +805,29 @@ static inline wait_queue_head_t *page_wa
 	return &wait[hash];
 }
 
-/* 
- * Wait for a page to get unlocked.
+static void kill_buffers(struct page *page)
+{
+	if (!PageLocked(page))
+		BUG();
+	if (page->buffers)
+		try_to_release_page(page, GFP_NOIO);
+}
+
+/*
+ * Wait for a page to come unlocked.  Then try to ditch its buffer_heads.
  *
- * This must be called with the caller "holding" the page,
- * ie with increased "page->count" so that the page won't
- * go away during the wait..
+ * FIXME: Make the ditching dependent on CONFIG_MONSTER_BOX or something.
  */
-void ___wait_on_page(struct page *page)
+void wait_on_page(struct page *page)
 {
-	wait_queue_head_t *waitqueue = page_waitqueue(page);
-	struct task_struct *tsk = current;
-	DECLARE_WAITQUEUE(wait, tsk);
-
-	add_wait_queue(waitqueue, &wait);
-	do {
-		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
-		if (!PageLocked(page))
-			break;
-		sync_page(page);
-		schedule();
-	} while (PageLocked(page));
-	__set_task_state(tsk, TASK_RUNNING);
-	remove_wait_queue(waitqueue, &wait);
+	lock_page(page);
+	kill_buffers(page);
+	unlock_page(page);
 }
+EXPORT_SYMBOL(wait_on_page);
 
 /*
- * Unlock the page and wake up sleepers in ___wait_on_page.
+ * Unlock the page and wake up sleepers in lock_page.
  */
 void unlock_page(struct page *page)
 {
@@ -1400,6 +1396,11 @@ found_page:
 
 		if (!Page_Uptodate(page))
 			goto page_not_up_to_date;
+		if (page->buffers) {
+			lock_page(page);
+			kill_buffers(page);
+			unlock_page(page);
+		}
 		generic_file_readahead(reada_ok, filp, inode, page);
 page_ok:
 		/* If users can be writing to this page using arbitrary
@@ -1457,6 +1458,7 @@ page_not_up_to_date:
 
 		/* Did somebody else fill it already? */
 		if (Page_Uptodate(page)) {
+			kill_buffers(page);
 			UnlockPage(page);
 			goto page_ok;
 		}
@@ -1948,6 +1950,11 @@ retry_find:
 	 */
 	if (!Page_Uptodate(page))
 		goto page_not_uptodate;
+	if (page->buffers) {
+		lock_page(page);
+		kill_buffers(page);
+		unlock_page(page);
+	}
 
 success:
  	/*
@@ -2006,6 +2013,7 @@ page_not_uptodate:
 
 	/* Did somebody else get it up-to-date? */
 	if (Page_Uptodate(page)) {
+		kill_buffers(page);
 		UnlockPage(page);
 		goto success;
 	}
@@ -2033,6 +2041,7 @@ page_not_uptodate:
 
 	/* Somebody else successfully read it in? */
 	if (Page_Uptodate(page)) {
+		kill_buffers(page);
 		UnlockPage(page);
 		goto success;
 	}
@@ -2850,6 +2859,7 @@ retry:
 		goto retry;
 	}
 	if (Page_Uptodate(page)) {
+		kill_buffers(page);
 		UnlockPage(page);
 		goto out;
 	}
--- 2.4.19-pre8/kernel/ksyms.c~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/kernel/ksyms.c	Fri May 24 12:24:56 2002
@@ -202,7 +202,6 @@ EXPORT_SYMBOL(ll_rw_block);
 EXPORT_SYMBOL(submit_bh);
 EXPORT_SYMBOL(unlock_buffer);
 EXPORT_SYMBOL(__wait_on_buffer);
-EXPORT_SYMBOL(___wait_on_page);
 EXPORT_SYMBOL(generic_direct_IO);
 EXPORT_SYMBOL(discard_bh_page);
 EXPORT_SYMBOL(block_write_full_page);
--- 2.4.19-pre8/mm/vmscan.c~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/mm/vmscan.c	Fri May 24 12:24:56 2002
@@ -365,8 +365,13 @@ static int shrink_cache(int nr_pages, zo
 		if (unlikely(!page_count(page)))
 			continue;
 
-		if (!memclass(page_zone(page), classzone))
+		if (!memclass(page_zone(page), classzone)) {
+			if (page->buffers && !TryLockPage(page)) {
+				try_to_release_page(page, GFP_NOIO);
+				unlock_page(page);
+			}
 			continue;
+		}
 
 		/* Racy check to avoid trylocking when not worthwhile */
 		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
@@ -562,6 +567,11 @@ static int shrink_caches(zone_t * classz
 	nr_pages -= kmem_cache_reap(gfp_mask);
 	if (nr_pages <= 0)
 		return 0;
+	if ((gfp_mask & __GFP_WAIT) && (shrink_buffer_cache() > 16)) {
+		nr_pages -= kmem_cache_reap(gfp_mask);
+		if (nr_pages <= 0)
+			return 0;
+	}
 
 	nr_pages = chunk_size;
 	/* try to keep the active list 2/3 of the size of the cache */
--- 2.4.19-pre8/fs/buffer.c~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/fs/buffer.c	Fri May 24 12:26:28 2002
@@ -1500,6 +1500,10 @@ static int __block_write_full_page(struc
 	/* Stage 3: submit the IO */
 	do {
 		struct buffer_head *next = bh->b_this_page;
+		/*
+		 * Stick it on BUF_LOCKED so shrink_buffer_cache() can nail it.
+		 */
+		refile_buffer(bh);
 		submit_bh(WRITE, bh);
 		bh = next;
 	} while (bh != head);
@@ -2615,6 +2619,25 @@ static int sync_page_buffers(struct buff
 int try_to_free_buffers(struct page * page, unsigned int gfp_mask)
 {
 	struct buffer_head * tmp, * bh = page->buffers;
+	int was_uptodate = 1;
+
+	if (!PageLocked(page))
+		BUG();
+
+	if (!bh)
+		return 1;
+	/*
+	 * Quick check for freeable buffers before we go take three
+	 * global locks.
+	 */
+	if (!(gfp_mask & __GFP_IO)) {
+		tmp = bh;
+		do {
+			if (buffer_busy(tmp))
+				return 0;
+			tmp = tmp->b_this_page;
+		} while (tmp != bh);
+	}
 
 cleaned_buffers_try_again:
 	spin_lock(&lru_list_lock);
@@ -2637,7 +2660,8 @@ cleaned_buffers_try_again:
 		tmp = tmp->b_this_page;
 
 		if (p->b_dev == B_FREE) BUG();
-
+		if (!buffer_uptodate(p))
+			was_uptodate = 0;
 		remove_inode_queue(p);
 		__remove_from_queues(p);
 		__put_unused_buffer_head(p);
@@ -2645,7 +2669,15 @@ cleaned_buffers_try_again:
 	spin_unlock(&unused_list_lock);
 
 	/* Wake up anyone waiting for buffer heads */
-	wake_up(&buffer_wait);
+	smp_mb();
+	if (waitqueue_active(&buffer_wait))
+		wake_up(&buffer_wait);
+
+	/*
+	 * Make sure we don't read buffers again when they are reattached
+	 */
+	if (was_uptodate)
+		SetPageUptodate(page);
 
 	/* And free the page */
 	page->buffers = NULL;
@@ -2674,6 +2706,62 @@ busy_buffer_page:
 }
 EXPORT_SYMBOL(try_to_free_buffers);
 
+/*
+ * Returns the number of pages which might have become freeable 
+ */
+int shrink_buffer_cache(void)
+{
+	struct buffer_head *bh;
+	int nr_todo;
+	int nr_shrunk = 0;
+
+	/*
+	 * Move any clean unlocked buffers from BUF_LOCKED onto BUF_CLEAN
+	 */
+	spin_lock(&lru_list_lock);
+	for ( ; ; ) {
+		bh = lru_list[BUF_LOCKED];
+		if (!bh || buffer_locked(bh))
+			break;
+		__refile_buffer(bh);
+	}
+
+	/*
+	 * Now start liberating buffers
+	 */
+	nr_todo = nr_buffers_type[BUF_CLEAN];
+	while (nr_todo--) {
+		struct page *page;
+
+		bh = lru_list[BUF_CLEAN];
+		if (!bh)
+			break;
+
+		/*
+		 * Park the buffer on BUF_LOCKED so we don't revisit it on
+		 * this pass.
+		 */
+		__remove_from_lru_list(bh);
+		bh->b_list = BUF_LOCKED;
+		__insert_into_lru_list(bh, BUF_LOCKED);
+		page = bh->b_page;
+		if (TryLockPage(page))
+			continue;
+
+		page_cache_get(page);
+		spin_unlock(&lru_list_lock);
+		if (try_to_release_page(page, GFP_NOIO))
+			nr_shrunk++;
+		unlock_page(page);
+		page_cache_release(page);
+		spin_lock(&lru_list_lock);
+	}
+	spin_unlock(&lru_list_lock);
+//	printk("%s: liberated %d page's worth of buffer_heads\n",
+//		__FUNCTION__, nr_shrunk);
+	return (nr_shrunk * sizeof(struct buffer_head)) / PAGE_CACHE_SIZE;
+}
+
 /* ================== Debugging =================== */
 
 void show_buffers(void)
@@ -2988,6 +3076,7 @@ int kupdate(void *startup)
 #ifdef DEBUG
 		printk(KERN_DEBUG "kupdate() activated...\n");
 #endif
+		shrink_buffer_cache();
 		sync_old_buffers();
 		run_task_queue(&tq_disk);
 	}
--- 2.4.19-pre8/include/linux/fs.h~nuke-buffers	Fri May 24 12:24:56 2002
+++ 2.4.19-pre8-akpm/include/linux/fs.h	Fri May 24 12:24:56 2002
@@ -1116,6 +1116,7 @@ extern int FASTCALL(try_to_free_buffers(
 extern void refile_buffer(struct buffer_head * buf);
 extern void create_empty_buffers(struct page *, kdev_t, unsigned long);
 extern void end_buffer_io_sync(struct buffer_head *bh, int uptodate);
+extern int shrink_buffer_cache(void);
 
 /* reiserfs_writepage needs this */
 extern void set_buffer_async_io(struct buffer_head *bh) ;



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 17:30             ` Austin Gonyou
  2002-05-24 17:43               ` Martin J. Bligh
@ 2002-05-27  9:24               ` Marco Colombo
  2002-05-27 22:24                 ` Austin Gonyou
  1 sibling, 1 reply; 48+ messages in thread
From: Marco Colombo @ 2002-05-27  9:24 UTC (permalink / raw)
  To: Austin Gonyou; +Cc: linux-kernel

On 24 May 2002, Austin Gonyou wrote:

> On Fri, 2002-05-24 at 11:31, Martin J. Bligh wrote:
> > >> I'm not sure exactly what Roy was doing, but we were taking a machine
> > >> with 16Gb of RAM, and reading files into the page cache - I think we built up
> > >> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
> > >> on a P3 96 bytes.
> > > 
> > > The buffer heads one would make sense. I only test on realistically sized systems.
> > 
> > Well, it'll still waste valuable memory there too, though you may not totally kill it.
> > 
> > > Once you pass 4Gb there are so many problems it's not worth using x86 in the
> > > long run.
> > 
> I assume that by "not worth using x86" you're referring to, say,
> degraded performance relative to other platforms? Well... if you talk
> price/performance, x86 is perfect in those terms, since you can buy
> more boxes and have a more fluid architecture rather than building a
> monolithic system. Monolithic systems aren't always the best. Just look
> at Fermilab!

Uh? There are many alpha-based clusters out there. Why do you think 
!x86 == monolithic?

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 15:11     ` [BUG] 2.4 VM sucks. Again Alan Cox
  2002-05-24 15:53       ` Martin J. Bligh
@ 2002-05-27 11:12       ` Roy Sigurd Karlsbakk
  2002-05-27 14:31         ` Alan Cox
  1 sibling, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-27 11:12 UTC (permalink / raw)
  To: Alan Cox; +Cc: Martin J. Bligh, linux-kernel

> > > How much RAM do you have, and what does /proc/meminfo
> > > and /proc/slabinfo say just before the explosion point?
> >
> > I have 1 gig - highmem (not enabled) - 900 megs.
> > From what I can see, the kernel can't reclaim buffers fast enough.
> > It looks better on -aa.
>
> What sort of setup? I can't duplicate the problem here.

The setup is 2-4 drives in RAID-0, with chunk size 1MB.

If I try to do ~50 simultaneous reads from disk, it's no problem as long as 
the data is being read from the NIC at the same speed as it's being read 
from disk. The server apps are running via inetd (testing), and have 2MB of 
buffer each (read 2MB from disk, write 2MB to NIC).

The server crashes within minutes. The same problem occurs when using Tux.

thanks

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-27 14:31         ` Alan Cox
@ 2002-05-27 13:43           ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-27 13:43 UTC (permalink / raw)
  To: Alan Cox; +Cc: Martin J. Bligh, linux-kernel

On Monday 27 May 2002 16:31, you wrote:
> On Mon, 2002-05-27 at 12:12, Roy Sigurd Karlsbakk wrote:
> > If I try to do ~50 simultaneous reads from disk, it's no problem as long
> > as the data is being read from the NIC at the same speed as it's being
> > read from disk. The server apps are running via inetd (testing), and have
> > 2MB of buffer each (read 2MB from disk, write 2MB to NIC).
> >
> > The server crashes within minutes. The same problem occurs when using
> > Tux.
>
> How much physical memory and is your app using sendfile ?

I have 1 gig with highmem disabled, ergo 900MB.

My app is just doing read()/write(), but as the problem occurs similarly with 
Tux (which uses sendfile()), it shouldn't really matter.
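
(For reference, the sendfile() variant of the same loop - a sketch with
made-up names. sendfile(2) copies from the page cache straight to the
socket, so the user-space buffer drops out entirely, which is why seeing
the same behaviour with Tux points at the kernel side rather than the app:)

#include <sys/sendfile.h>
#include <sys/types.h>

static int serve_sendfile(int sock_fd, int file_fd, off_t size)
{
        off_t off = 0;

        /* the kernel advances 'off' as data goes out */
        while (off < size) {
                ssize_t n = sendfile(sock_fd, file_fd, &off, size - off);

                if (n <= 0)
                        return -1;      /* error, or nothing left to send */
        }
        return 0;
}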

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-27 11:12       ` Roy Sigurd Karlsbakk
@ 2002-05-27 14:31         ` Alan Cox
  2002-05-27 13:43           ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Alan Cox @ 2002-05-27 14:31 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Martin J. Bligh, linux-kernel

On Mon, 2002-05-27 at 12:12, Roy Sigurd Karlsbakk wrote:
> If I try to do ~50 simultaneous reads from disk, it's no problem as long as 
> the data is being read from the NIC at the same speed as it's being read 
> from disk. The server apps are running via inetd (testing), and have 2MB of 
> buffer each (read 2MB from disk, write 2MB to NIC).
> 
> The server crashes within minutes. The same problem occurs when using Tux.
> 

How much physical memory and is your app using sendfile ?


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-27  9:24               ` [BUG] 2.4 VM sucks. Again Marco Colombo
@ 2002-05-27 22:24                 ` Austin Gonyou
  2002-05-27 23:08                   ` Austin Gonyou
  0 siblings, 1 reply; 48+ messages in thread
From: Austin Gonyou @ 2002-05-27 22:24 UTC (permalink / raw)
  To: Marco Colombo; +Cc: linux-kernel


I'm not referring just to *non*-x86 arches in this case. Sorry about
that. Any setup can be non-monolithic, but the measurement to decide if
it is cost-worthy is the price/performance ratio.

I'm not saying that "if it's not x86, it's monolithic"; in the context
of the discussion, it's really about large, costly boxes, designed to be
large, costly boxes. That, from this perspective, is monolithic.


On Mon, 2002-05-27 at 04:24, Marco Colombo wrote:
> On 24 May 2002, Austin Gonyou wrote:
> 
> > On Fri, 2002-05-24 at 11:31, Martin J. Bligh wrote:
> > > >> I'm not sure exactly what Roy was doing, but we were taking a machine
> > > >> with 16Gb of RAM, and reading files into the page cache - I think we built up
> > > >> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
> > > >> on a P3 96 bytes.
> > > > 
> > > > The buffer heads one would make sense. I only test on realistically sized systems.
> > > 
> > > Well, it'll still waste valuable memory there too, though you may not totally kill it.
> > > 
> > > > Once you pass 4Gb there are so many problems it's not worth using x86 in the
> > > > long run.
> > > 
> > I assume that by "not worth using x86" you're referring to, say,
> > degraded performance relative to other platforms? Well... if you talk
> > price/performance, x86 is perfect in those terms, since you can buy
> > more boxes and have a more fluid architecture rather than building a
> > monolithic system. Monolithic systems aren't always the best. Just look
> > at Fermilab!
> 
> Uh? There are many alpha-based clusters out there. Why do you think 
> !x86 == monolithic?
> 
> .TM.
> -- 
>       ____/  ____/   /
>      /      /       /			Marco Colombo
>     ___/  ___  /   /		      Technical Manager
>    /          /   /			 ESI s.r.l.
>  _____/ _____/  _/		       Colombo@ESI.it
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-27 22:24                 ` Austin Gonyou
@ 2002-05-27 23:08                   ` Austin Gonyou
  0 siblings, 0 replies; 48+ messages in thread
From: Austin Gonyou @ 2002-05-27 23:08 UTC (permalink / raw)
  To: Austin Gonyou; +Cc: Marco Colombo, linux-kernel

Just to clarify: it was Sparc vs. x86 (which is what I meant to state in
my first sentence there :).

On Mon, 2002-05-27 at 17:24, Austin Gonyou wrote:
> I'm not referring just to *non*-x86 arches in this case. Sorry about
> that. Any setup can be non-monolithic, but the measurement to decide if
> it is cost-worthy is the price/performance ratio.
> 
> I'm not saying that "if it's not x86, it's monolithic"; in the context
> of the discussion, it's really about large, costly boxes, designed to be
> large, costly boxes. That, from this perspective, is monolithic.
> 
> 
> On Mon, 2002-05-27 at 04:24, Marco Colombo wrote:
> > On 24 May 2002, Austin Gonyou wrote:
> > 
> > > On Fri, 2002-05-24 at 11:31, Martin J. Bligh wrote:
> > > > >> I'm not sure exactly what Roy was doing, but we were taking a machine
> > > > >> with 16Gb of RAM, and reading files into the page cache - I think we built up
> > > > >> 8 million buffer_heads according to slabinfo ... on a P4 they're 128 bytes each,
> > > > >> on a P3 96 bytes.
> > > > > 
> > > > > The buffer heads one would make sense. I only test on realistically sized systems.
> > > > 
> > > > Well, it'll still waste valuable memory there too, though you may not totally kill it.
> > > > 
> > > > > Once you pass 4Gb there are so many problems it's not worth using x86 in the
> > > > > long run.
> > > > 
> > > I assume that by "not worth using x86" you're referring to, say,
> > > degraded performance relative to other platforms? Well... if you talk
> > > price/performance, x86 is perfect in those terms, since you can buy
> > > more boxes and have a more fluid architecture rather than building a
> > > monolithic system. Monolithic systems aren't always the best. Just look
> > > at Fermilab!
> > 
> > Uh? There are many alpha-based clusters out there. Why do you think 
> > !x86 == monolithic?
> > 
> > .TM.
> > -- 
> >       ____/  ____/   /
> >      /      /       /			Marco Colombo
> >     ___/  ___  /   /		      Technical Manager
> >    /          /   /			 ESI s.r.l.
> >  _____/ _____/  _/		       Colombo@ESI.it
> > 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 19:32           ` Andrew Morton
@ 2002-05-30 10:29             ` Roy Sigurd Karlsbakk
  2002-05-30 19:28               ` Andrew Morton
  2002-06-18 11:26             ` Roy Sigurd Karlsbakk
                               ` (2 subsequent siblings)
  3 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-30 10:29 UTC (permalink / raw)
  To: Andrew Morton, Martin J. Bligh; +Cc: linux-kernel

> > I don't think Andrew is ready to submit this yet ... before anything
> > gets merged back, it'd be very worthwhile testing the relative
> > performance of both solutions ... the more testers we have the
> > better ;-)
>
> Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> version is below.

Works great! This should _definitely_ be merged into the main kernel after 
some testing. Without it, _all_ other kernels I've tested (2.4.lots) go OOM 
under the mentioned scenarios. This one simply does the job.

> It's possible that keeping the number of buffers as low as possible
> will give improved performance over Andrea's approach because it
> leaves more ZONE_NORMAL for other things.  It's also possible that
> it'll give worse performance because more get_block()s need to be
> done for file overwriting.

Andrea's patch merely pushed the problem forward. This one fixed it.
-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-30 10:29             ` Roy Sigurd Karlsbakk
@ 2002-05-30 19:28               ` Andrew Morton
  2002-05-31 16:56                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2002-05-30 19:28 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Martin J. Bligh, linux-kernel

Roy Sigurd Karlsbakk wrote:
> 
> > > I don't think Andrew is ready to submit this yet ... before anything
> > > gets merged back, it'd be very worthwhile testing the relative
> > > performance of both solutions ... the more testers we have the
> > > better ;-)
> >
> > Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> > version is below.
> 
> Works great! This should _definitely_ be merged into the main kernel after
> some testing. Without it, _all_ other kernels I've tested (2.4.lots) go OOM
> under the mentioned scenarios. This one simply does the job.

I suspect nuke-buffers is simply always the right thing to do.  It's
what 2.5 is doing now (effectively).  We'll see...

But in your case, you only have a couple of gigs of memory, iirc.
You shouldn't be running into catastrophic buffer_head congestion.
Something odd is happening.
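
(Rough arithmetic, using the per-buffer_head sizes quoted earlier in the
thread: ~900MB of lowmem is ~230,000 4K pages; even if every one of them
carried four 1K-block buffer_heads at 96 bytes each, that would still be
well under 100MB of buffer_heads - painful, but not obviously fatal.)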

If you can provide a really detailed set of steps which can be
used by others to reproduce this, that would really help.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-30 19:28               ` Andrew Morton
@ 2002-05-31 16:56                 ` Roy Sigurd Karlsbakk
  2002-05-31 18:19                   ` Andrea Arcangeli
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-31 16:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

> I suspect nuke-buffers is simply always the right thing to do.  It's
> what 2.5 is doing now (effectively).  We'll see...
>
> But in your case, you only have a couple of gigs of memory, iirc.
> You shouldn't be running into catastrophic buffer_head congestion.
> Something odd is happening.
>
> If you can provide a really detailed set of steps which can be
> used by others to reproduce this, that would really help.

What I do: start lots of downloads (10-50), each at a speed of 4.5Mbps, from 
another client. The two are connected using gigabit Ethernet. Downloads are 
over HTTP, with Tux or other servers (I have tried several). If the clients 
are reading at full speed (e.g. only a few clients, or reading directly from 
localhost), the problem doesn't occur. However, when reading at a fixed rate, 
it seems like the server is caching itself to death.


Detailed configuration:

- 4 IBM 40GB disks in RAID-0, chunk size 1MB
- 1 x Athlon 1GHz
- 1GB RAM - no highmem (900 megs)
- kernel 2.4.19pre7 + patch from Andrew Morton to ditch buffers early 
        (thread: [BUG] 2.4 VM sucks. Again)
- gigabit Ethernet between test client and server

Anyone got a clue?

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-31 16:56                 ` Roy Sigurd Karlsbakk
@ 2002-05-31 18:19                   ` Andrea Arcangeli
  0 siblings, 0 replies; 48+ messages in thread
From: Andrea Arcangeli @ 2002-05-31 18:19 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel

On Fri, May 31, 2002 at 06:56:54PM +0200, Roy Sigurd Karlsbakk wrote:
> > I suspect nuke-buffers is simply always the right thing to do.  It's
> > what 2.5 is doing now (effectively).  We'll see...
> >
> > But in your case, you only have a couple of gigs of memory, iirc.
> > You shouldn't be running into catastrophic buffer_head congestion.
> > Something odd is happening.
> >
> > If you can provide a really detailed set of steps which can be
> > used by others to reproduce this, that would really help.
> 
> What I do: start lots of downloads (10-50), each at a speed of 4.5Mbps, from 
> another client. The two are connected using gigabit Ethernet. Downloads are 
> over HTTP, with Tux or other servers (I have tried several). If the clients 
> are reading at full speed (e.g. only a few clients, or reading directly from 
> localhost), the problem doesn't occur. However, when reading at a fixed rate, 
> it seems like the server is caching itself to death.
> 
> 
> Detailed configuration:
> 
> - 4 IBM 40GB disks in RAID-0, chunk size 1MB
> - 1 x Athlon 1GHz
> - 1GB RAM - no highmem (900 megs)
> - kernel 2.4.19pre7 + patch from Andrew Morton to ditch buffers early 
>         (thread: [BUG] 2.4 VM sucks. Again)
> - gigabit Ethernet between test client and server
> 
> Anyone got a clue?

Can you try to reproduce with 2.4.19pre9aa2, just in case it's an OOM
deadlock? If it deadlocks again, can you press SYSRQ+T, and SYSRQ+P many
times, and send this info along with the System.map? (You may need the
serial console to easily gather the data if not even SYSRQ+I is able to
let the box resurrect from the livelock; the System.map possibly not on
l-k, because it's quite big.)

thanks!

Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 10:36   ` Roy Sigurd Karlsbakk
@ 2002-05-31 21:21     ` Andrea Arcangeli
  2002-06-01 12:36       ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Andrea Arcangeli @ 2002-05-31 21:21 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: jlnance, linux-kernel

On Fri, May 24, 2002 at 12:36:32PM +0200, Roy Sigurd Karlsbakk wrote:
> On Thursday 23 May 2002 20:12, jlnance@intrex.net wrote:
> > On Thu, May 23, 2002 at 03:11:24PM +0200, Roy Sigurd Karlsbakk wrote:
> > > Starting up 30 downloads from a custom HTTP server (or Tux - or Apache -
> > > doesn't matter), file size is 3-6GB, download speed = ~4.5Mbps. After
> > > some time the kernel (a) goes bOOM (out of memory) if it has no
> > > swap, or (b) goes gong, swapping out anything it can.
> >
> > Does this work if the client and the server are on the same machine?  It
> > would make reproducing this a lot easier if it only required 1 machine.
> 
> I guess it'd work fine with only one machine, as, IMO, the problem must be the 
> kernel not releasing buffers.

Too many variables.

Also keep in mind that if you grow the socket buffers to a hundred
megabytes on a highmem machine, ZONE_NORMAL will run out too fast and you
may run out of memory. 2.4.19pre9aa2 in such a case should at least return
-ENOMEM and not deadlock.

Andrea

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-31 21:21     ` Andrea Arcangeli
@ 2002-06-01 12:36       ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-06-01 12:36 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: jlnance, linux-kernel

> > I guess it'd work fine with only one machine, as IMO, the problem must be
> > the kernel not releasing buffers
>
> too many variables.
>
> Also keep in mind that if you grow the socket buffers to hundreds of mbytes on a
> highmem machine, ZONE_NORMAL will be exhausted too quickly and you may run out
> of memory. 2.4.19pre9aa2 in such a case should at least return -ENOMEM and
> not deadlock.

It's not a highmem machine. And it's not user-space processes using the 
memory.
-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-05-24 19:32           ` Andrew Morton
  2002-05-30 10:29             ` Roy Sigurd Karlsbakk
@ 2002-06-18 11:26             ` Roy Sigurd Karlsbakk
  2002-06-18 19:42               ` Andrew Morton
  2002-07-10  7:50             ` [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again) Roy Sigurd Karlsbakk
  2002-08-28  9:28             ` [BUG+FIX] 2.4 buggercache sucks Roy Sigurd Karlsbakk
  3 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-06-18 11:26 UTC (permalink / raw)
  To: Andrew Morton, Martin J. Bligh; +Cc: linux-kernel

> > > Any plans to merge this into the main kernel, giving a choice
> > > (in config or /proc) to enable this?
> >
> > I don't think Andrew is ready to submit this yet ... before anything
> > gets merged back, it'd be very worthwhile testing the relative
> > performance of both solutions ... the more testers we have the
> > better ;-)
>
> Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> version is below.

Any more plans?
The patch has been working great for some time now, and I'd really like to see 
this in the official tree. Also - I guess this patch will eliminate any 
caching whatsoever, and is therefore not really a good thing for file or web 
servers?

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-06-18 11:26             ` Roy Sigurd Karlsbakk
@ 2002-06-18 19:42               ` Andrew Morton
  2002-06-19 11:26                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2002-06-18 19:42 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Martin J. Bligh, linux-kernel

Roy Sigurd Karlsbakk wrote:
> 
> > > > Any plans to merge this into the main kernel, giving a choice
> > > > (in config or /proc) to enable this?
> > >
> > > I don't think Andrew is ready to submit this yet ... before anything
> > > gets merged back, it'd be very worthwhile testing the relative
> > > performance of both solutions ... the more testers we have the
> > > better ;-)
> >
> > Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> > version is below.
> 
> Any more plans?
> The patch has been working great for some time now, and I'd really like to see
> this in the official tree

Roy, all we know is that "nuke-buffers stops your machine from locking up".
But we don't know why your machine locks up in the first place.  This just
isn't sufficient grounds to apply it!  We need to know exactly why your
kernel is failing.  We don't know what the bug is.

You have two gigabytes of RAM, yes?  It's very weird that stripping buffers
prevents a lockup on a machine with such a small highmem/lowmem ratio.

I'll have yet another shot at reproducing it.  So, again, could you please
tell me *exactly*, in great detail, what I need to do to reproduce this
problem?

- memory size
- number of CPUs
- IO system
- kernel version, any applied patches, compiler version
- exact sequence of commands
- anything else you can think of

Have you been able to reproduce the failure on any other machine?

> Also - I guess this patch will eliminate any
> caching whatsoever, and is therefore not really a good thing for file or web
> servers?

No, not at all.  All the pagecache is still there - the patch just
throws away the buffer_heads which are attached to those pagecache
pages.

The 2.5 kernel does it tons better.  Have you tried it?

-

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG] 2.4 VM sucks. Again
  2002-06-18 19:42               ` Andrew Morton
@ 2002-06-19 11:26                 ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-06-19 11:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

> Roy, all we know is that "nuke-buffers stops your machine from locking up".
> But we don't know why your machine locks up in the first place.  This just
> isn't sufficient grounds to apply it!  We need to know exactly why your
> kernel is failing.  We don't know what the bug is.

The bug, as previously described, occurs when multiple (20+) clients download 
large files (3-6 gigs each) at a speed of ~5Mbps. The error does _not_ occur 
when a smaller number of clients are downloading at speeds close to disk speed. 
All testing is being done over a gigE crossover.

> You have two gigabytes of RAM, yes?  It's very weird that stripping buffers
> prevents a lockup on a machine with such a small highmem/lowmem ratio.

No. I have 1GB minus highmem (which is disabled), giving me ~900MB

> I'll have yet another shot at reproducing it.  So, again, could you please
> tell me *exactly*, in great detail, what I need to do to reproduce this
> problem?

> - memory size

1GB minus highmem (~900MB)

> - number of CPUs

1 Athlon 1133MHz, 256kB cache

> - IO system

standard 33MHz/32-bit single-PCI-bus motherboard (SiS based)
on-board SiS IDE/ATA 100 controller
Promise 20269 controller
Realtek 100Mbps NIC
e1000 gigE NIC
4 IBM 40gig 120GXP drives - one on each IDE channel
data partition on RAID-0 across all drives

> - kernel version, any applied patches, compiler version
kernel 2.4.19-pre8 + Tux + akpm buffer patch
	I have tried _many_ different kernels, and as I needed the 20269 support, I
	chose 2.4.19-pre. Tux is there because I did some testing with it. The problem
	is _not_ Tux-specific, as I've tried other server software (custom and
	standard) as well.
gcc 2.95.3

> - exact sequence of commands

start the HTTP server software
start 20+ downloads; each downloaded file is 3-6 gigs
after some time most processes are OOM-killed

> - anything else you can think of

I have not tried to give it coffee yet, although that might help. I'm usually 
pretty pissed if I haven't got my morning coffee

> Have you been able to reproduce the failure on any other machine?

Yes. I have set up one other machine with the exact same setup and one with a 
slightly different setup, and reproduced it on both.

> No, not at all.  All the pagecache is still there - the patch just
> throws away the buffer_heads which are attached to those pagecache
> pages.

oh. that's good.

> The 2.5 kernel does it tons better.  Have you tried it?

I haven't. I've tried to compile it a few times, but it has failed. And I 
don't want to run 2.5 on a production server.

But - if you ask me to test it, I will

thanks for all help

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.
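
To make the client side of this recipe concrete, here is a minimal
sketch of a fixed-rate reader - not code from this thread; the server
address comes from argv, /bigfile is a placeholder path, and the
~4.5Mbps pacing is approximated with usleep():

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define CHUNK	(64 * 1024)	/* read 64kB at a time */
#define RATE	(4500000 / 8)	/* ~4.5Mbps in bytes/second */

int main(int argc, char **argv)
{
	struct sockaddr_in sa;
	static char buf[CHUNK];
	const char *req = "GET /bigfile HTTP/1.0\r\n\r\n";
	ssize_t n;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <server-ip>\n", argv[0]);
		return 1;
	}
	fd = socket(AF_INET, SOCK_STREAM, 0);
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = htons(80);
	sa.sin_addr.s_addr = inet_addr(argv[1]);
	if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
		perror("connect");
		return 1;
	}
	write(fd, req, strlen(req));
	while ((n = read(fd, buf, sizeof(buf))) > 0) {
		/* sleep so that n bytes average out to RATE bytes/sec */
		usleep((useconds_t)(n * 1000000LL / RATE));
	}
	close(fd);
	return 0;
}

Running 20-50 instances of this in parallel against the server
approximates the workload described above; it is a sketch under those
assumptions, not a validated reproducer.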


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again)
  2002-05-24 19:32           ` Andrew Morton
  2002-05-30 10:29             ` Roy Sigurd Karlsbakk
  2002-06-18 11:26             ` Roy Sigurd Karlsbakk
@ 2002-07-10  7:50             ` Roy Sigurd Karlsbakk
  2002-07-10  8:05               ` Andrew Morton
  2002-08-28  9:28             ` [BUG+FIX] 2.4 buggercache sucks Roy Sigurd Karlsbakk
  3 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-07-10  7:50 UTC (permalink / raw)
  To: Andrew Morton, Martin J. Bligh; +Cc: linux-kernel

hi

I've been using the patch below from Andrew for some weeks now, sometimes 
under quite heavy load, and find it quite stable.

Just wanted to say ...

roy

> > I don't think Andrew is ready to submit this yet ... before anything
> > gets merged back, it'd be very worthwhile testing the relative
> > performance of both solutions ... the more testers we have the
> > better ;-)
>
> Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> version is below.
>
> It's possible that keeping the number of buffers as low as possible
> will give improved performance over Andrea's approach because it
> leaves more ZONE_NORMAL for other things.  It's also possible that
> it'll give worse performance because more get_block's need to be
> done for file overwriting.
>
>
> --- 2.4.19-pre8/include/linux/pagemap.h~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/include/linux/pagemap.h	Fri May 24 12:26:30 2002
> @@ -89,13 +89,7 @@ extern void add_to_page_cache(struct pag
>  extern void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index);
>  extern int add_to_page_cache_unique(struct page * page, struct address_space *mapping, unsigned long index, struct page **hash);
>
> -extern void ___wait_on_page(struct page *);
> -
> -static inline void wait_on_page(struct page * page)
> -{
> -	if (PageLocked(page))
> -		___wait_on_page(page);
> -}
> +extern void wait_on_page(struct page *);
>
>  extern struct page * grab_cache_page (struct address_space *, unsigned long);
>  extern struct page * grab_cache_page_nowait (struct address_space *, unsigned long);
> --- 2.4.19-pre8/mm/filemap.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/mm/filemap.c	Fri May 24 12:24:56 2002
> @@ -608,7 +608,7 @@ int filemap_fdatawait(struct address_spa
>  		page_cache_get(page);
>  		spin_unlock(&pagecache_lock);
>
> -		___wait_on_page(page);
> +		wait_on_page(page);
>  		if (PageError(page))
>  			ret = -EIO;
>
> @@ -805,33 +805,29 @@ static inline wait_queue_head_t *page_wa
>  	return &wait[hash];
>  }
>
> -/*
> - * Wait for a page to get unlocked.
> +static void kill_buffers(struct page *page)
> +{
> +	if (!PageLocked(page))
> +		BUG();
> +	if (page->buffers)
> +		try_to_release_page(page, GFP_NOIO);
> +}
> +
> +/*
> + * Wait for a page to come unlocked.  Then try to ditch its buffer_heads.
>   *
> - * This must be called with the caller "holding" the page,
> - * ie with increased "page->count" so that the page won't
> - * go away during the wait..
> + * FIXME: Make the ditching dependent on CONFIG_MONSTER_BOX or something.
>   */
> -void ___wait_on_page(struct page *page)
> +void wait_on_page(struct page *page)
>  {
> -	wait_queue_head_t *waitqueue = page_waitqueue(page);
> -	struct task_struct *tsk = current;
> -	DECLARE_WAITQUEUE(wait, tsk);
> -
> -	add_wait_queue(waitqueue, &wait);
> -	do {
> -		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> -		if (!PageLocked(page))
> -			break;
> -		sync_page(page);
> -		schedule();
> -	} while (PageLocked(page));
> -	__set_task_state(tsk, TASK_RUNNING);
> -	remove_wait_queue(waitqueue, &wait);
> +	lock_page(page);
> +	kill_buffers(page);
> +	unlock_page(page);
>  }
> +EXPORT_SYMBOL(wait_on_page);
>
>  /*
> - * Unlock the page and wake up sleepers in ___wait_on_page.
> + * Unlock the page and wake up sleepers in lock_page.
>   */
>  void unlock_page(struct page *page)
>  {
> @@ -1400,6 +1396,11 @@ found_page:
>
>  		if (!Page_Uptodate(page))
>  			goto page_not_up_to_date;
> +		if (page->buffers) {
> +			lock_page(page);
> +			kill_buffers(page);
> +			unlock_page(page);
> +		}
>  		generic_file_readahead(reada_ok, filp, inode, page);
>  page_ok:
>  		/* If users can be writing to this page using arbitrary
> @@ -1457,6 +1458,7 @@ page_not_up_to_date:
>
>  		/* Did somebody else fill it already? */
>  		if (Page_Uptodate(page)) {
> +			kill_buffers(page);
>  			UnlockPage(page);
>  			goto page_ok;
>  		}
> @@ -1948,6 +1950,11 @@ retry_find:
>  	 */
>  	if (!Page_Uptodate(page))
>  		goto page_not_uptodate;
> +	if (page->buffers) {
> +		lock_page(page);
> +		kill_buffers(page);
> +		unlock_page(page);
> +	}
>
>  success:
>   	/*
> @@ -2006,6 +2013,7 @@ page_not_uptodate:
>
>  	/* Did somebody else get it up-to-date? */
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto success;
>  	}
> @@ -2033,6 +2041,7 @@ page_not_uptodate:
>
>  	/* Somebody else successfully read it in? */
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto success;
>  	}
> @@ -2850,6 +2859,7 @@ retry:
>  		goto retry;
>  	}
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto out;
>  	}
> --- 2.4.19-pre8/kernel/ksyms.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/kernel/ksyms.c	Fri May 24 12:24:56 2002
> @@ -202,7 +202,6 @@ EXPORT_SYMBOL(ll_rw_block);
>  EXPORT_SYMBOL(submit_bh);
>  EXPORT_SYMBOL(unlock_buffer);
>  EXPORT_SYMBOL(__wait_on_buffer);
> -EXPORT_SYMBOL(___wait_on_page);
>  EXPORT_SYMBOL(generic_direct_IO);
>  EXPORT_SYMBOL(discard_bh_page);
>  EXPORT_SYMBOL(block_write_full_page);
> --- 2.4.19-pre8/mm/vmscan.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/mm/vmscan.c	Fri May 24 12:24:56 2002
> @@ -365,8 +365,13 @@ static int shrink_cache(int nr_pages, zo
>  		if (unlikely(!page_count(page)))
>  			continue;
>
> -		if (!memclass(page_zone(page), classzone))
> +		if (!memclass(page_zone(page), classzone)) {
> +			if (page->buffers && !TryLockPage(page)) {
> +				try_to_release_page(page, GFP_NOIO);
> +				unlock_page(page);
> +			}
>  			continue;
> +		}
>
>  		/* Racy check to avoid trylocking when not worthwhile */
>  		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
> @@ -562,6 +567,11 @@ static int shrink_caches(zone_t * classz
>  	nr_pages -= kmem_cache_reap(gfp_mask);
>  	if (nr_pages <= 0)
>  		return 0;
> +	if ((gfp_mask & __GFP_WAIT) && (shrink_buffer_cache() > 16)) {
> +		nr_pages -= kmem_cache_reap(gfp_mask);
> +		if (nr_pages <= 0)
> +			return 0;
> +	}
>
>  	nr_pages = chunk_size;
>  	/* try to keep the active list 2/3 of the size of the cache */
> --- 2.4.19-pre8/fs/buffer.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/fs/buffer.c	Fri May 24 12:26:28 2002
> @@ -1500,6 +1500,10 @@ static int __block_write_full_page(struc
>  	/* Stage 3: submit the IO */
>  	do {
>  		struct buffer_head *next = bh->b_this_page;
> +		/*
> +		 * Stick it on BUF_LOCKED so shrink_buffer_cache() can nail it.
> +		 */
> +		refile_buffer(bh);
>  		submit_bh(WRITE, bh);
>  		bh = next;
>  	} while (bh != head);
> @@ -2615,6 +2619,25 @@ static int sync_page_buffers(struct buff
>  int try_to_free_buffers(struct page * page, unsigned int gfp_mask)
>  {
>  	struct buffer_head * tmp, * bh = page->buffers;
> +	int was_uptodate = 1;
> +
> +	if (!PageLocked(page))
> +		BUG();
> +
> +	if (!bh)
> +		return 1;
> +	/*
> +	 * Quick check for freeable buffers before we go take three
> +	 * global locks.
> +	 */
> +	if (!(gfp_mask & __GFP_IO)) {
> +		tmp = bh;
> +		do {
> +			if (buffer_busy(tmp))
> +				return 0;
> +			tmp = tmp->b_this_page;
> +		} while (tmp != bh);
> +	}
>
>  cleaned_buffers_try_again:
>  	spin_lock(&lru_list_lock);
> @@ -2637,7 +2660,8 @@ cleaned_buffers_try_again:
>  		tmp = tmp->b_this_page;
>
>  		if (p->b_dev == B_FREE) BUG();
> -
> +		if (!buffer_uptodate(p))
> +			was_uptodate = 0;
>  		remove_inode_queue(p);
>  		__remove_from_queues(p);
>  		__put_unused_buffer_head(p);
> @@ -2645,7 +2669,15 @@ cleaned_buffers_try_again:
>  	spin_unlock(&unused_list_lock);
>
>  	/* Wake up anyone waiting for buffer heads */
> -	wake_up(&buffer_wait);
> +	smp_mb();
> +	if (waitqueue_active(&buffer_wait))
> +		wake_up(&buffer_wait);
> +
> +	/*
> +	 * Make sure we don't read buffers again when they are reattached
> +	 */
> +	if (was_uptodate)
> +		SetPageUptodate(page);
>
>  	/* And free the page */
>  	page->buffers = NULL;
> @@ -2674,6 +2706,62 @@ busy_buffer_page:
>  }
>  EXPORT_SYMBOL(try_to_free_buffers);
>
> +/*
> + * Returns the number of pages which might have become freeable
> + */
> +int shrink_buffer_cache(void)
> +{
> +	struct buffer_head *bh;
> +	int nr_todo;
> +	int nr_shrunk = 0;
> +
> +	/*
> +	 * Move any clean unlocked buffers from BUF_LOCKED onto BUF_CLEAN
> +	 */
> +	spin_lock(&lru_list_lock);
> +	for ( ; ; ) {
> +		bh = lru_list[BUF_LOCKED];
> +		if (!bh || buffer_locked(bh))
> +			break;
> +		__refile_buffer(bh);
> +	}
> +
> +	/*
> +	 * Now start liberating buffers
> +	 */
> +	nr_todo = nr_buffers_type[BUF_CLEAN];
> +	while (nr_todo--) {
> +		struct page *page;
> +
> +		bh = lru_list[BUF_CLEAN];
> +		if (!bh)
> +			break;
> +
> +		/*
> +		 * Park the buffer on BUF_LOCKED so we don't revisit it on
> +		 * this pass.
> +		 */
> +		__remove_from_lru_list(bh);
> +		bh->b_list = BUF_LOCKED;
> +		__insert_into_lru_list(bh, BUF_LOCKED);
> +		page = bh->b_page;
> +		if (TryLockPage(page))
> +			continue;
> +
> +		page_cache_get(page);
> +		spin_unlock(&lru_list_lock);
> +		if (try_to_release_page(page, GFP_NOIO))
> +			nr_shrunk++;
> +		unlock_page(page);
> +		page_cache_release(page);
> +		spin_lock(&lru_list_lock);
> +	}
> +	spin_unlock(&lru_list_lock);
> +//	printk("%s: liberated %d page's worth of buffer_heads\n",
> +//		__FUNCTION__, nr_shrunk);
> +	return (nr_shrunk * sizeof(struct buffer_head)) / PAGE_CACHE_SIZE;
> +}
> +
>  /* ================== Debugging =================== */
>
>  void show_buffers(void)
> @@ -2988,6 +3076,7 @@ int kupdate(void *startup)
>  #ifdef DEBUG
>  		printk(KERN_DEBUG "kupdate() activated...\n");
>  #endif
> +		shrink_buffer_cache();
>  		sync_old_buffers();
>  		run_task_queue(&tq_disk);
>  	}
> --- 2.4.19-pre8/include/linux/fs.h~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/include/linux/fs.h	Fri May 24 12:24:56 2002
> @@ -1116,6 +1116,7 @@ extern int FASTCALL(try_to_free_buffers(
>  extern void refile_buffer(struct buffer_head * buf);
>  extern void create_empty_buffers(struct page *, kdev_t, unsigned long);
>  extern void end_buffer_io_sync(struct buffer_head *bh, int uptodate);
> +extern int shrink_buffer_cache(void);
>
>  /* reiserfs_writepage needs this */
>  extern void set_buffer_async_io(struct buffer_head *bh) ;
>
>
> -

-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again)
  2002-07-10  7:50             ` [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again) Roy Sigurd Karlsbakk
@ 2002-07-10  8:05               ` Andrew Morton
  2002-07-10  8:14                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Andrew Morton @ 2002-07-10  8:05 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk; +Cc: Martin J. Bligh, linux-kernel

Roy Sigurd Karlsbakk wrote:
> 
> hi
> 
> I've been using the patch below from Andrew for some weeks now, sometimes
> under quite heavy load, and find it quite stable.
> 

Wish we knew why.  I've tried many times to reproduce the problem
which you're seeing.  With just two gigs of memory, buffer_heads
really cannot explain anything.  It's weird.

We discussed this in Ottawa - I guess Andrea will add the toss-the-buffers
code on the read side (basically the filemap.c stuff).  That may
be sufficient, but without an understanding of what is going on,
it is hard to predict.

-

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again)
  2002-07-10  8:05               ` Andrew Morton
@ 2002-07-10  8:14                 ` Roy Sigurd Karlsbakk
  0 siblings, 0 replies; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-07-10  8:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel

On Wednesday 10 July 2002 10:05, Andrew Morton wrote:
> Roy Sigurd Karlsbakk wrote:
> > hi
> >
> > I've been using the patch below from Andrew for some weeks now, sometimes
> > under quite heavy load, and find it quite stable.
>
> Wish we knew why.  I've tried many times to reproduce the problem
> which you're seeing.  With just two gigs of memory, buffer_heads
> really cannot explain anything.  It's weird.

well - firstly, I'm using _1_ gig of memory minus highmem (= 900 megs or something).
secondly - I have reproduced it on two different installations, although on 
the same hardware - a standard PC with a SiS MB and an extra Promise controller, 
RAID-0 on 4 drives with a chunk size of 1MB. Given 30-50 processes each reading a 
4-gig file and sending it over HTTP, everything works fine _if_ and only _if_ 
the clients read at high speed. If, however, the clients read at normal 
streaming speed (4.3Mbps), buffers go bOOM.

> We discussed this in Ottawa - I guess Andrea will add the toss-the-buffers
> code on the read side (basically the filemap.c stuff).  That may
> be sufficient, but without an understanding of what is going on,
> it is hard to predict.

If there is _any_ more data I can give, or any more testing I can do, I'll 
do my very best to help.

roy
-- 
Roy Sigurd Karlsbakk, Datavaktmester

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [BUG+FIX] 2.4 buggercache sucks
  2002-05-24 19:32           ` Andrew Morton
                               ` (2 preceding siblings ...)
  2002-07-10  7:50             ` [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again) Roy Sigurd Karlsbakk
@ 2002-08-28  9:28             ` Roy Sigurd Karlsbakk
  2002-08-28 15:30               ` Martin J. Bligh
  3 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-08-28  9:28 UTC (permalink / raw)
  To: Andrew Morton, Martin J. Bligh; +Cc: linux-kernel

hi

the patch below has now been tested for quite some time.

Is it likely that this will make it into 2.4.20?

roy


On Friday 24 May 2002 21:32, Andrew Morton wrote:
> "Martin J. Bligh" wrote:
> > >> Sounds like exactly the same problem we were having. There are two
> > >> approaches to solving this - Andrea has a patch that tries to free
> > >> them under memory pressure, akpm has a patch that hacks them down as
> > >> soon as you've finished with them (posted to lse-tech mailing list).
> > >> Both approaches seemed to work for me, but the performance of the
> > >> fixes still has to be established.
> > >
> > > Where can I find the akpm patch?
> >
> > http://marc.theaimsgroup.com/?l=lse-tech&m=102083525007877&w=2
> >
> > > Any plans to merge this into the main kernel, giving a choice
> > > (in config or /proc) to enable this?
> >
> > I don't think Andrew is ready to submit this yet ... before anything
> > gets merged back, it'd be very worthwhile testing the relative
> > performance of both solutions ... the more testers we have the
> > better ;-)
>
> Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
> version is below.
>
> It's possible that keeping the number of buffers as low as possible
> will give improved performance over Andrea's approach because it
> leaves more ZONE_NORMAL for other things.  It's also possible that
> it'll give worse performance because more get_block's need to be
> done for file overwriting.
>
>
> --- 2.4.19-pre8/include/linux/pagemap.h~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/include/linux/pagemap.h	Fri May 24 12:26:30 2002
> @@ -89,13 +89,7 @@ extern void add_to_page_cache(struct pag
>  extern void add_to_page_cache_locked(struct page * page, struct address_space *mapping, unsigned long index);
>  extern int add_to_page_cache_unique(struct page * page, struct address_space *mapping, unsigned long index, struct page **hash);
>
> -extern void ___wait_on_page(struct page *);
> -
> -static inline void wait_on_page(struct page * page)
> -{
> -	if (PageLocked(page))
> -		___wait_on_page(page);
> -}
> +extern void wait_on_page(struct page *);
>
>  extern struct page * grab_cache_page (struct address_space *, unsigned long);
>  extern struct page * grab_cache_page_nowait (struct address_space *, unsigned long);
> --- 2.4.19-pre8/mm/filemap.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/mm/filemap.c	Fri May 24 12:24:56 2002
> @@ -608,7 +608,7 @@ int filemap_fdatawait(struct address_spa
>  		page_cache_get(page);
>  		spin_unlock(&pagecache_lock);
>
> -		___wait_on_page(page);
> +		wait_on_page(page);
>  		if (PageError(page))
>  			ret = -EIO;
>
> @@ -805,33 +805,29 @@ static inline wait_queue_head_t *page_wa
>  	return &wait[hash];
>  }
>
> -/*
> - * Wait for a page to get unlocked.
> +static void kill_buffers(struct page *page)
> +{
> +	if (!PageLocked(page))
> +		BUG();
> +	if (page->buffers)
> +		try_to_release_page(page, GFP_NOIO);
> +}
> +
> +/*
> + * Wait for a page to come unlocked.  Then try to ditch its buffer_heads.
>   *
> - * This must be called with the caller "holding" the page,
> - * ie with increased "page->count" so that the page won't
> - * go away during the wait..
> + * FIXME: Make the ditching dependent on CONFIG_MONSTER_BOX or something.
>   */
> -void ___wait_on_page(struct page *page)
> +void wait_on_page(struct page *page)
>  {
> -	wait_queue_head_t *waitqueue = page_waitqueue(page);
> -	struct task_struct *tsk = current;
> -	DECLARE_WAITQUEUE(wait, tsk);
> -
> -	add_wait_queue(waitqueue, &wait);
> -	do {
> -		set_task_state(tsk, TASK_UNINTERRUPTIBLE);
> -		if (!PageLocked(page))
> -			break;
> -		sync_page(page);
> -		schedule();
> -	} while (PageLocked(page));
> -	__set_task_state(tsk, TASK_RUNNING);
> -	remove_wait_queue(waitqueue, &wait);
> +	lock_page(page);
> +	kill_buffers(page);
> +	unlock_page(page);
>  }
> +EXPORT_SYMBOL(wait_on_page);
>
>  /*
> - * Unlock the page and wake up sleepers in ___wait_on_page.
> + * Unlock the page and wake up sleepers in lock_page.
>   */
>  void unlock_page(struct page *page)
>  {
> @@ -1400,6 +1396,11 @@ found_page:
>
>  		if (!Page_Uptodate(page))
>  			goto page_not_up_to_date;
> +		if (page->buffers) {
> +			lock_page(page);
> +			kill_buffers(page);
> +			unlock_page(page);
> +		}
>  		generic_file_readahead(reada_ok, filp, inode, page);
>  page_ok:
>  		/* If users can be writing to this page using arbitrary
> @@ -1457,6 +1458,7 @@ page_not_up_to_date:
>
>  		/* Did somebody else fill it already? */
>  		if (Page_Uptodate(page)) {
> +			kill_buffers(page);
>  			UnlockPage(page);
>  			goto page_ok;
>  		}
> @@ -1948,6 +1950,11 @@ retry_find:
>  	 */
>  	if (!Page_Uptodate(page))
>  		goto page_not_uptodate;
> +	if (page->buffers) {
> +		lock_page(page);
> +		kill_buffers(page);
> +		unlock_page(page);
> +	}
>
>  success:
>   	/*
> @@ -2006,6 +2013,7 @@ page_not_uptodate:
>
>  	/* Did somebody else get it up-to-date? */
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto success;
>  	}
> @@ -2033,6 +2041,7 @@ page_not_uptodate:
>
>  	/* Somebody else successfully read it in? */
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto success;
>  	}
> @@ -2850,6 +2859,7 @@ retry:
>  		goto retry;
>  	}
>  	if (Page_Uptodate(page)) {
> +		kill_buffers(page);
>  		UnlockPage(page);
>  		goto out;
>  	}
> --- 2.4.19-pre8/kernel/ksyms.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/kernel/ksyms.c	Fri May 24 12:24:56 2002
> @@ -202,7 +202,6 @@ EXPORT_SYMBOL(ll_rw_block);
>  EXPORT_SYMBOL(submit_bh);
>  EXPORT_SYMBOL(unlock_buffer);
>  EXPORT_SYMBOL(__wait_on_buffer);
> -EXPORT_SYMBOL(___wait_on_page);
>  EXPORT_SYMBOL(generic_direct_IO);
>  EXPORT_SYMBOL(discard_bh_page);
>  EXPORT_SYMBOL(block_write_full_page);
> --- 2.4.19-pre8/mm/vmscan.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/mm/vmscan.c	Fri May 24 12:24:56 2002
> @@ -365,8 +365,13 @@ static int shrink_cache(int nr_pages, zo
>  		if (unlikely(!page_count(page)))
>  			continue;
>
> -		if (!memclass(page_zone(page), classzone))
> +		if (!memclass(page_zone(page), classzone)) {
> +			if (page->buffers && !TryLockPage(page)) {
> +				try_to_release_page(page, GFP_NOIO);
> +				unlock_page(page);
> +			}
>  			continue;
> +		}
>
>  		/* Racy check to avoid trylocking when not worthwhile */
>  		if (!page->buffers && (page_count(page) != 1 || !page->mapping))
> @@ -562,6 +567,11 @@ static int shrink_caches(zone_t * classz
>  	nr_pages -= kmem_cache_reap(gfp_mask);
>  	if (nr_pages <= 0)
>  		return 0;
> +	if ((gfp_mask & __GFP_WAIT) && (shrink_buffer_cache() > 16)) {
> +		nr_pages -= kmem_cache_reap(gfp_mask);
> +		if (nr_pages <= 0)
> +			return 0;
> +	}
>
>  	nr_pages = chunk_size;
>  	/* try to keep the active list 2/3 of the size of the cache */
> --- 2.4.19-pre8/fs/buffer.c~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/fs/buffer.c	Fri May 24 12:26:28 2002
> @@ -1500,6 +1500,10 @@ static int __block_write_full_page(struc
>  	/* Stage 3: submit the IO */
>  	do {
>  		struct buffer_head *next = bh->b_this_page;
> +		/*
> +		 * Stick it on BUF_LOCKED so shrink_buffer_cache() can nail it.
> +		 */
> +		refile_buffer(bh);
>  		submit_bh(WRITE, bh);
>  		bh = next;
>  	} while (bh != head);
> @@ -2615,6 +2619,25 @@ static int sync_page_buffers(struct buff
>  int try_to_free_buffers(struct page * page, unsigned int gfp_mask)
>  {
>  	struct buffer_head * tmp, * bh = page->buffers;
> +	int was_uptodate = 1;
> +
> +	if (!PageLocked(page))
> +		BUG();
> +
> +	if (!bh)
> +		return 1;
> +	/*
> +	 * Quick check for freeable buffers before we go take three
> +	 * global locks.
> +	 */
> +	if (!(gfp_mask & __GFP_IO)) {
> +		tmp = bh;
> +		do {
> +			if (buffer_busy(tmp))
> +				return 0;
> +			tmp = tmp->b_this_page;
> +		} while (tmp != bh);
> +	}
>
>  cleaned_buffers_try_again:
>  	spin_lock(&lru_list_lock);
> @@ -2637,7 +2660,8 @@ cleaned_buffers_try_again:
>  		tmp = tmp->b_this_page;
>
>  		if (p->b_dev == B_FREE) BUG();
> -
> +		if (!buffer_uptodate(p))
> +			was_uptodate = 0;
>  		remove_inode_queue(p);
>  		__remove_from_queues(p);
>  		__put_unused_buffer_head(p);
> @@ -2645,7 +2669,15 @@ cleaned_buffers_try_again:
>  	spin_unlock(&unused_list_lock);
>
>  	/* Wake up anyone waiting for buffer heads */
> -	wake_up(&buffer_wait);
> +	smp_mb();
> +	if (waitqueue_active(&buffer_wait))
> +		wake_up(&buffer_wait);
> +
> +	/*
> +	 * Make sure we don't read buffers again when they are reattached
> +	 */
> +	if (was_uptodate)
> +		SetPageUptodate(page);
>
>  	/* And free the page */
>  	page->buffers = NULL;
> @@ -2674,6 +2706,62 @@ busy_buffer_page:
>  }
>  EXPORT_SYMBOL(try_to_free_buffers);
>
> +/*
> + * Returns the number of pages which might have become freeable
> + */
> +int shrink_buffer_cache(void)
> +{
> +	struct buffer_head *bh;
> +	int nr_todo;
> +	int nr_shrunk = 0;
> +
> +	/*
> +	 * Move any clean unlocked buffers from BUF_LOCKED onto BUF_CLEAN
> +	 */
> +	spin_lock(&lru_list_lock);
> +	for ( ; ; ) {
> +		bh = lru_list[BUF_LOCKED];
> +		if (!bh || buffer_locked(bh))
> +			break;
> +		__refile_buffer(bh);
> +	}
> +
> +	/*
> +	 * Now start liberating buffers
> +	 */
> +	nr_todo = nr_buffers_type[BUF_CLEAN];
> +	while (nr_todo--) {
> +		struct page *page;
> +
> +		bh = lru_list[BUF_CLEAN];
> +		if (!bh)
> +			break;
> +
> +		/*
> +		 * Park the buffer on BUF_LOCKED so we don't revisit it on
> +		 * this pass.
> +		 */
> +		__remove_from_lru_list(bh);
> +		bh->b_list = BUF_LOCKED;
> +		__insert_into_lru_list(bh, BUF_LOCKED);
> +		page = bh->b_page;
> +		if (TryLockPage(page))
> +			continue;
> +
> +		page_cache_get(page);
> +		spin_unlock(&lru_list_lock);
> +		if (try_to_release_page(page, GFP_NOIO))
> +			nr_shrunk++;
> +		unlock_page(page);
> +		page_cache_release(page);
> +		spin_lock(&lru_list_lock);
> +	}
> +	spin_unlock(&lru_list_lock);
> +//	printk("%s: liberated %d page's worth of buffer_heads\n",
> +//		__FUNCTION__, nr_shrunk);
> +	return (nr_shrunk * sizeof(struct buffer_head)) / PAGE_CACHE_SIZE;
> +}
> +
>  /* ================== Debugging =================== */
>
>  void show_buffers(void)
> @@ -2988,6 +3076,7 @@ int kupdate(void *startup)
>  #ifdef DEBUG
>  		printk(KERN_DEBUG "kupdate() activated...\n");
>  #endif
> +		shrink_buffer_cache();
>  		sync_old_buffers();
>  		run_task_queue(&tq_disk);
>  	}
> --- 2.4.19-pre8/include/linux/fs.h~nuke-buffers	Fri May 24 12:24:56 2002
> +++ 2.4.19-pre8-akpm/include/linux/fs.h	Fri May 24 12:24:56 2002
> @@ -1116,6 +1116,7 @@ extern int FASTCALL(try_to_free_buffers(
>  extern void refile_buffer(struct buffer_head * buf);
>  extern void create_empty_buffers(struct page *, kdev_t, unsigned long);
>  extern void end_buffer_io_sync(struct buffer_head *bh, int uptodate);
> +extern int shrink_buffer_cache(void);
>
>  /* reiserfs_writepage needs this */
>  extern void set_buffer_async_io(struct buffer_head *bh) ;
>
>
> -

-- 
Roy Sigurd Karlsbakk, Datavaktmester
ProntoTV AS - http://www.pronto.tv/
Tel: +47 9801 3356

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-28  9:28             ` [BUG+FIX] 2.4 buggercache sucks Roy Sigurd Karlsbakk
@ 2002-08-28 15:30               ` Martin J. Bligh
  2002-08-29  8:00                 ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-08-28 15:30 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Andrew Morton; +Cc: linux-kernel

Andrew had a new version that he just submitted to 2.5,
but it may not backport easily.

The agreement at OLS was to treat read and write separately - 
nuke them immediately for one side, and reclaim under mem
pressure for the other. Half of Andrea's patch, and half of 
Andrew's. Unfortunately I can never remember which was which ;-)
And I don't think anyone has rolled that together yet ....

Summary: the code below probably isn't the desired solution.

M.

--On Wednesday, August 28, 2002 11:28 AM +0200 Roy Sigurd Karlsbakk <roy@karlsbakk.net> wrote:

> hi
> 
> the patch below has now been tested for quite some time.
> 
> Is it likely that this will make it into 2.4.20?
> 
> roy
> 
> 
> On Friday 24 May 2002 21:32, Andrew Morton wrote:
>> "Martin J. Bligh" wrote:
>> > >> Sounds like exactly the same problem we were having. There are two
>> > >> approaches to solving this - Andrea has a patch that tries to free
>> > >> them under memory pressure, akpm has a patch that hacks them down as
>> > >> soon as you've finished with them (posted to lse-tech mailing list).
>> > >> Both approaches seemed to work for me, but the performance of the
>> > >> fixes still has to be established.
>> > > 
>> > > Where can I find the akpm patch?
>> > 
>> > http://marc.theaimsgroup.com/?l=lse-tech&m=102083525007877&w=2
>> > 
>> > > Any plans to merge this into the main kernel, giving a choice
>> > > (in config or /proc) to enable this?
>> > 
>> > I don't think Andrew is ready to submit this yet ... before anything
>> > gets merged back, it'd be very worthwhile testing the relative
>> > performance of both solutions ... the more testers we have the
>> > better ;-)
>> 
>> Cripes no.  It's pretty experimental.  Andrea spotted a bug, too.  Fixed
>> version is below.
>> 
>> It's possible that keeping the number of buffers as low as possible
>> will give improved performance over Andrea's approach because it
>> leaves more ZONE_NORMAL for other things.  It's also possible that
>> it'll give worse performance because more get_block's need to be
>> done for file overwriting.
>> 


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-28 15:30               ` Martin J. Bligh
@ 2002-08-29  8:00                 ` Roy Sigurd Karlsbakk
  2002-08-29 13:42                   ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-08-29  8:00 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton; +Cc: linux-kernel

> Summary: the code below probably isn't the desired solution.

Very well - but where is the code to run then?

I mean - this code solved _my_ problem. Without it the server OOMs within 
minutes under high load, as explained earlier. I'd rather have a clean fix in 
2.4 than this, although it works.

Any thoughts?

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester
ProntoTV AS - http://www.pronto.tv/
Tel: +47 9801 3356

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-29  8:00                 ` Roy Sigurd Karlsbakk
@ 2002-08-29 13:42                   ` Martin J. Bligh
  2002-08-30  9:21                     ` Roy Sigurd Karlsbakk
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-08-29 13:42 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Andrew Morton; +Cc: linux-kernel

>> Summary: the code below probably isn't the desired solution.
> 
> Very well - but where is the code to run then?

Not quite sure what you mean?
 
> I mean - this code solved _my_ problem. Without it the server OOMs within 
> minutes under high load, as explained earlier. I'd rather have a clean fix in 
> 2.4 than this, although it works.

I'm sure Andrew could explain this better than I - he wrote the
code, I just whined about the problem. Basically he frees the
buffer_head immediately after he's used it, which could at least
in theory degrade performance a little if it could have been reused.
Now, nobody's ever really benchmarked that, so a more conservative
approach is likely to be taken, unless someone can prove it doesn't
degrade performance much for people who don't need the fix. One
of the cases people were running scared of was something doing 
continual overwrites of a file, I think something like:

for (i = 0; i < BIGNUMBER; i++) {
	lseek(fd, 0, SEEK_SET);
	write(fd, buf, 4096);	/* overwrite the same 4K block */
}

Or something. 

Was your workload doing lots of reads, or lots of writes? Or both?

M.
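
For anyone who wants to benchmark that concern, a self-contained version
of the loop above - an expansion of Martin's fragment, with BIGNUMBER and
the file name chosen arbitrarily:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BIGNUMBER (1 << 20)

int main(void)
{
	static char buf[4096];
	int fd = open("overwrite.dat", O_RDWR | O_CREAT, 0644);
	int i;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 'x', sizeof(buf));
	for (i = 0; i < BIGNUMBER; i++) {
		lseek(fd, 0, SEEK_SET);	/* rewind to offset 0 every pass */
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			perror("write");
			break;
		}
	}
	close(fd);
	return 0;
}

With eager buffer_head stripping, every pass may have to redo
get_block() for the same block - the extra get_block() cost for file
overwriting that Andrew's patch preamble mentions.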


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-29 13:42                   ` Martin J. Bligh
@ 2002-08-30  9:21                     ` Roy Sigurd Karlsbakk
  2002-08-30 17:19                       ` Martin J. Bligh
  0 siblings, 1 reply; 48+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-08-30  9:21 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton; +Cc: linux-kernel

> > I mean - this code solved _my_ problem. Without it the server OOMs within
> > minutes under high load, as explained earlier. I'd rather have a clean fix
> > in 2.4 than this, although it works.
>
> I'm sure Andrew could explain this better than I - he wrote the
> code, I just whined about the problem. Basically he frees the
> buffer_head immediately after he's used it, which could at least
> in theory degrade performance a little if it could have been reused.
> Now, nobody's ever really benchmarked that, so a more conservative
> approach is likely to be taken, unless someone can prove it doesn't
> degrade performance much for people who don't need the fix. One
> of the cases people were running scared of was something doing
> continual overwrites of a file, I think something like:
>
> for (i = 0; i < BIGNUMBER; i++) {
> 	lseek(fd, 0, SEEK_SET);
> 	write(fd, buf, 4096);	/* overwrite the same 4K block */
> }
>
> Or something.
>
> Was your workload doing lots of reads, or lots of writes? Or both?

I was downloading large files @ ~4Mbps from 20-50 clients - filesize ~3GB.
The box has 1GB of memory minus highmem (highmem is disabled) - so ~900 megs. 
After some time it starts swapping and it OOMs. The same happens with several 
userspace httpd's.

roy

-- 
Roy Sigurd Karlsbakk, Datavaktmester
ProntoTV AS - http://www.pronto.tv/
Tel: +47 9801 3356

Computers are like air conditioners.
They stop working when you open Windows.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-30  9:21                     ` Roy Sigurd Karlsbakk
@ 2002-08-30 17:19                       ` Martin J. Bligh
  2002-08-30 18:49                         ` Andrew Morton
  0 siblings, 1 reply; 48+ messages in thread
From: Martin J. Bligh @ 2002-08-30 17:19 UTC (permalink / raw)
  To: Roy Sigurd Karlsbakk, Andrew Morton; +Cc: linux-kernel

>> Was your workload doing lots of reads, or lots of writes? Or both?
> 
> I was downloading large files @ ~4Mbps from 20-50 clients - filesize ~3GB.
> The box has 1GB of memory minus highmem (highmem is disabled) - so ~900 megs.
> After some time it starts swapping and it OOMs. The same happens with several
> userspace httpd's.

Mmmm .... not quite sure which way round to read that. Presumably the box
that was the server fell over, and the clients are fine? So the workload that's
causing problems is doing predominantly reads? If so, I suggest you cut
Andrew's patch down to the read side only, and submit that ... I get the feeling
that would be acceptable, and would solve your problem.

M.


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [BUG+FIX] 2.4 buggercache sucks
  2002-08-30 17:19                       ` Martin J. Bligh
@ 2002-08-30 18:49                         ` Andrew Morton
  0 siblings, 0 replies; 48+ messages in thread
From: Andrew Morton @ 2002-08-30 18:49 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Roy Sigurd Karlsbakk, linux-kernel

"Martin J. Bligh" wrote:
> 
> >> Was your workload doing lots of reads, or lots of writes? Or both?
> >
> > I was downloading large files @ ~4Mbps from 20-50 clients - filesize ~3GB.
> > The box has 1GB of memory minus highmem (highmem is disabled) - so ~900 megs.
> > After some time it starts swapping and it OOMs. The same happens with several
> > userspace httpd's.
> 
> Mmmm .... not quite sure which way round to read that. Presumably the box
> that was the server fell over, and the clients are fine? So the workload that's
> causing problems is doing predominantly reads? If so, I suggest you cut
> Andrew's patch down to the read side only, and submit that ... I get the feeling
> that would be acceptable, and would solve your problem.

But we still don't know what the problem _is_.  It's very weird.

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2002-08-30 18:47 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-05-23 13:11 [BUG] 2.4 VM sucks. Again Roy Sigurd Karlsbakk
2002-05-23 14:54 ` Martin J. Bligh
2002-05-23 16:29   ` Roy Sigurd Karlsbakk
2002-05-23 16:46     ` Martin J. Bligh
2002-05-24 10:04       ` Roy Sigurd Karlsbakk
2002-05-24 14:35         ` Martin J. Bligh
2002-05-24 19:32           ` Andrew Morton
2002-05-30 10:29             ` Roy Sigurd Karlsbakk
2002-05-30 19:28               ` Andrew Morton
2002-05-31 16:56                 ` Roy Sigurd Karlsbakk
2002-05-31 18:19                   ` Andrea Arcangeli
2002-06-18 11:26             ` Roy Sigurd Karlsbakk
2002-06-18 19:42               ` Andrew Morton
2002-06-19 11:26                 ` Roy Sigurd Karlsbakk
2002-07-10  7:50             ` [2.4 BUFFERING BUG] (was [BUG] 2.4 VM sucks. Again) Roy Sigurd Karlsbakk
2002-07-10  8:05               ` Andrew Morton
2002-07-10  8:14                 ` Roy Sigurd Karlsbakk
2002-08-28  9:28             ` [BUG+FIX] 2.4 buggercache sucks Roy Sigurd Karlsbakk
2002-08-28 15:30               ` Martin J. Bligh
2002-08-29  8:00                 ` Roy Sigurd Karlsbakk
2002-08-29 13:42                   ` Martin J. Bligh
2002-08-30  9:21                     ` Roy Sigurd Karlsbakk
2002-08-30 17:19                       ` Martin J. Bligh
2002-08-30 18:49                         ` Andrew Morton
2002-05-24 15:11     ` [BUG] 2.4 VM sucks. Again Alan Cox
2002-05-24 15:53       ` Martin J. Bligh
2002-05-24 16:14         ` Alan Cox
2002-05-24 16:31           ` Martin J. Bligh
2002-05-24 17:30             ` Austin Gonyou
2002-05-24 17:43               ` Martin J. Bligh
2002-05-24 18:03                 ` Austin Gonyou
2002-05-24 18:10                   ` Martin J. Bligh
2002-05-24 18:29                     ` 2.4 Kernel Perf discussion [Was Re: [BUG] 2.4 VM sucks. Again] Austin Gonyou
2002-05-24 19:01                       ` Stephen Frost
2002-05-27  9:24               ` [BUG] 2.4 VM sucks. Again Marco Colombo
2002-05-27 22:24                 ` Austin Gonyou
2002-05-27 23:08                   ` Austin Gonyou
2002-05-27 11:12       ` Roy Sigurd Karlsbakk
2002-05-27 14:31         ` Alan Cox
2002-05-27 13:43           ` Roy Sigurd Karlsbakk
2002-05-23 16:03 ` Johannes Erdfelt
2002-05-23 16:33   ` Roy Sigurd Karlsbakk
2002-05-23 22:50     ` Luigi Genoni
2002-05-24 11:53       ` Roy Sigurd Karlsbakk
2002-05-23 18:12 ` jlnance
2002-05-24 10:36   ` Roy Sigurd Karlsbakk
2002-05-31 21:21     ` Andrea Arcangeli
2002-06-01 12:36       ` Roy Sigurd Karlsbakk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).