linux-kernel.vger.kernel.org archive mirror
* broken VM in 2.4.10-pre9
@ 2001-09-15 22:43 Peter Magnusson
  2001-09-15 23:50 ` Jan Harkes
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Peter Magnusson @ 2001-09-15 22:43 UTC
  To: linux-kernel

2.4.7: good VM
2.4.8: not good
2.4.9: not good!!!++
2.4.10-pre4: quite OK VM, but puts a little more on swap than 2.4.7
2.4.10-pre8: not good
2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
             unrared two very large files at the same time, and now 104
             MB of swap is used! :-( 2.4.7 didn't do this.
             It is best to use swap as little as possible.

My cfg:

Real mem: 512684K (512 Mbyte)
Swap    : 257032K
compiled with: gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)


!! remove "nothanksok." from my email if you want to reply to me !!






* Re: broken VM in 2.4.10-pre9
  2001-09-15 22:43 broken VM in 2.4.10-pre9 Peter Magnusson
@ 2001-09-15 23:50 ` Jan Harkes
  2001-09-16  5:31 ` Linus Torvalds
  2001-09-17 10:25 ` Tonu Samuel
  2 siblings, 0 replies; 22+ messages in thread
From: Jan Harkes @ 2001-09-15 23:50 UTC
  To: linux-kernel

What do you consider a good VM?

Because pages aren't 'aged' until there is swap allocated for them, your
kernel should actually work better if it has a lot of pages backed by
swap. The only thing is, we don't really make the right decision about
which pages to swap out, but that's just a detail.

IMHO, a large number of cached/active pages == good.

Jan

On Sun, Sep 16, 2001 at 12:43:35AM +0200, Peter Magnusson wrote:
> 2.4.7: good VM
> 2.4.8: not good
> 2.4.9: not good!!!++
> 2.4.10-pre4: quite OK VM, but puts a little more on swap than 2.4.7
> 2.4.10-pre8: not good
> 2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
>              unrared two very large files at the same time, and now 104
>              MB of swap is used! :-( 2.4.7 didn't do this.
>              It is best to use swap as little as possible.
> 
> My cfg:
> 
> Real mem: 512684K (512 Mbyte)
> Swap    : 257032K
> compiled with: gcc version 2.96 20000731 (Linux-Mandrake 8.0 2.96-0.48mdk)


* Re: broken VM in 2.4.10-pre9
  2001-09-15 22:43 broken VM in 2.4.10-pre9 Peter Magnusson
  2001-09-15 23:50 ` Jan Harkes
@ 2001-09-16  5:31 ` Linus Torvalds
  2001-09-16  8:45   ` Eric W. Biederman
  2001-09-17 10:25 ` Tonu Samuel
  2 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2001-09-16  5:31 UTC
  To: linux-kernel

In article <Pine.LNX.4.33L2.0109160031500.7740-100000@flashdance>,
Peter Magnusson  <iocc@flashdance.nothanksok.cx> wrote:
>
>2.4.10-pre4: quite OK VM, but puts a little more on swap than 2.4.7
>2.4.10-pre8: not good

Ehh..

There are _no_ VM changes that I can see between pre4 and pre8.

>2.4.10-pre9: not good ... Linux hadn't used any swap at all, then I
>             unrared two very large files at the same time, and now 104
>             MB of swap is used! :-( 2.4.7 didn't do this.
>             It is best to use swap as little as possible.

.. and there are none between pre8 and pre9.

Basically, it sounds like you have tested different loads on different
kernels, and some loads are nice and others are not.

Also note that the amount of "swap used" is totally meaningless in
2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
earlier than 2.2.x, but that doesn't actually mean that it does any of
the IO. Indeed, allocating the swap backing store just means that the
swap pages are then kept track of, so that they can be aged along with
other stores.

So whether Linux uses swap or not is a 100% meaningless indicator of
"goodness".  The only thing that matters is how well the job gets done,
ie was it reasonably responsive, and did the big untars finish quickly.. 

Don't look at how many pages of swap were used. That's a statistic,
nothing more.

		Linus
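As a rough illustration of Linus's point: what shows whether the VM is really moving pages to and from swap is the swap I/O counters, not the "swap used" figure. The sketch below assumes the `pswpin`/`pswpout` lines of `/proc/vmstat` found on 2.6 and later kernels; 2.4 kernels, to our understanding, reported the same pair on the `swap` line of `/proc/stat` instead. The struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Cumulative swap I/O counters, in pages (hypothetical struct name). */
struct swap_io {
    unsigned long long pswpin;   /* pages read back in from swap */
    unsigned long long pswpout;  /* pages written out to swap */
};

/* Scan "name value" lines, as in /proc/vmstat, for pswpin and pswpout.
 * Returns 0 if both counters were found, -1 otherwise. */
int parse_swap_io(const char *text, struct swap_io *out)
{
    int found = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        char name[64];
        unsigned long long value;

        if (sscanf(line, "%63s %llu", name, &value) == 2) {
            if (strcmp(name, "pswpin") == 0) {
                out->pswpin = value;
                found |= 1;
            } else if (strcmp(name, "pswpout") == 0) {
                out->pswpout = value;
                found |= 2;
            }
        }
        line = strchr(line, '\n');   /* advance to the next record */
        if (line != NULL)
            line++;
    }
    return found == 3 ? 0 : -1;
}

/* Pages that actually moved between RAM and swap between two snapshots. */
unsigned long long swap_pages_moved(const struct swap_io *before,
                                    const struct swap_io *after)
{
    return (after->pswpin - before->pswpin) +
           (after->pswpout - before->pswpout);
}
```

Take one snapshot before the workload (for example the two parallel unrar runs) and one after. A small `swap_pages_moved()` result next to a large "swap used" number is exactly the allocated-but-idle backing store described above.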


* Re: broken VM in 2.4.10-pre9
  2001-09-16  5:31 ` Linus Torvalds
@ 2001-09-16  8:45   ` Eric W. Biederman
  0 siblings, 0 replies; 22+ messages in thread
From: Eric W. Biederman @ 2001-09-16  8:45 UTC
  To: Linus Torvalds; +Cc: linux-kernel

torvalds@transmeta.com (Linus Torvalds) writes:

> Don't look at how many pages of swap were used. That's a statistic,
> nothing more.

It is a statistic until you run out of them.  Obviously that isn't
the problem here, or we'd hear complaints about the OOM killer.  But
the number of pages used can make a difference.

Eric
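Eric's caveat can be checked directly: the headroom that matters is how much swap is left, which `/proc/meminfo` reports on its `SwapTotal` and `SwapFree` lines (in kB). A minimal sketch assuming that line format; the function name is hypothetical. With the figures from the original report (257032 kB of swap, 104 MB in use, taken as 106496 kB) it gives roughly 41% used, well short of exhaustion.

```c
#include <stdio.h>
#include <string.h>

/* Fraction of swap space in use, computed from the SwapTotal and SwapFree
 * lines (values in kB) of /proc/meminfo-style text.
 * Returns -1.0 if either line is missing, no swap is configured, or the
 * values are inconsistent. */
double swap_used_fraction(const char *meminfo)
{
    unsigned long total_kb = 0, free_kb = 0;
    const char *p;

    p = strstr(meminfo, "SwapTotal:");
    if (p == NULL || sscanf(p, "SwapTotal: %lu", &total_kb) != 1)
        return -1.0;
    p = strstr(meminfo, "SwapFree:");
    if (p == NULL || sscanf(p, "SwapFree: %lu", &free_kb) != 1)
        return -1.0;
    if (total_kb == 0 || free_kb > total_kb)
        return -1.0;
    return (double)(total_kb - free_kb) / (double)total_kb;
}
```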


* Re: broken VM in 2.4.10-pre9
  2001-09-16 19:07     ` Rik van Riel
@ 2001-09-16 15:19       ` Phillip Susi
  2001-09-16 19:33         ` Jeremy Zawodny
  2001-09-16 19:52         ` Rik van Riel
  2001-09-16 19:17       ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Alan Cox
  2001-09-16 19:19       ` Andrea Arcangeli
  2 siblings, 2 replies; 22+ messages in thread
From: Phillip Susi @ 2001-09-16 15:19 UTC
  To: linux-kernel

Maybe I'm missing something here, but it seems to me that these problems are 
due to the cache putting pressure on VM, so process pages get swapped out.  
The obvious solution to this is to limit the size of the cache, or implement 
some sort of algorithm to slow its growth and reduce the pressure on VM.  It 
also seems that one of the causes of the cache expanding is large bulk file 
copies, or reads for, say, MP3 playback.  Wasn't there a flag to disable 
caching on file IO that these programs could use, to keep from polluting the 
cache?  

Am I way off base here?

-- 
--> Phill Susi
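Besides O_DIRECT (mentioned later in this thread for databases), the interface usually used for this today is POSIX `posix_fadvise()` with `POSIX_FADV_DONTNEED`, which, as far as we know, Linux only gained in the 2.5/2.6 development series, after the kernels discussed here. The sketch below shows how a bulk copier could use it; the function name and the 64 KiB chunk size are arbitrary choices, and the advice is only a hint the kernel is free to ignore.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy src_fd to dst_fd in 64 KiB chunks. After each chunk, hint that the
 * source pages just consumed will not be needed again so the kernel may drop
 * them from the page cache. Returns 0 on success, -1 on a read/write error. */
int copy_without_cache_pollution(int src_fd, int dst_fd)
{
    char buf[64 * 1024];
    off_t consumed = 0;

    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;                        /* end of input */

        for (ssize_t off = 0; off < n; ) {   /* cope with short writes */
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }

        /* Purely advisory: on pipes this fails with ESPIPE, and any error
         * is ignored because the copy itself is already correct. */
        (void)posix_fadvise(src_fd, consumed, (off_t)n, POSIX_FADV_DONTNEED);
        consumed += n;
    }
}
```

Dropping the source pages chunk by chunk is what keeps a one-pass copy from evicting other processes' working sets. The destination's dirty pages are not covered; they would need an `fdatasync()` before the same hint could drop them.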


* Re: broken VM in 2.4.10-pre9
  2001-09-17 10:25 ` Tonu Samuel
@ 2001-09-16 16:47   ` Jeremy Zawodny
  2001-09-16 18:36     ` Alan Cox
  2001-09-16 18:34   ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Andrea Arcangeli
  2001-09-16 19:37   ` broken VM in 2.4.10-pre9 Linus Torvalds
  2 siblings, 1 reply; 22+ messages in thread
From: Jeremy Zawodny @ 2001-09-16 16:47 UTC
  To: Tonu Samuel; +Cc: Linus Torvalds, linux-kernel

On Mon, Sep 17, 2001 at 06:25:38PM +0800, Tonu Samuel wrote:
> On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:
> 
> > Also note that the amount of "swap used" is totally meaningless in
> > 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> > earlier than 2.2.x, but that doesn't actually mean that it does any of
> > the IO. Indeed, allocating the swap backing store just means that the
> > swap pages are then kept track of, so that they can be aged along with
> > other stores.
> 
> The problem still exists. Not long ago a man from Yahoo
> described a case where moving from 2.2.19 to 2.4.x caused
> performance problems. On 2.2.19 everything ran fine. They were
> running MySQL and taking backups from disk. After upgrading to 2.4.x,
> MySQL performance fell during backups. They investigated and
> found that the MySQL daemon got swapped out in the middle of use to
> make room for buffers. In summary: this made both SQL and the backup
> twice as slow. Even increasing memory from 1G to 2G didn't
> help. Finally they disabled swap entirely and the problem went away.

Yep, that was me.  It was frustrating to have to double the RAM in the
machine and then turn off swap.  The extra RAM did help, but it really
only delayed the problem.

> If you do not want to change it back to the way it was in 2.2.x, it
> would be good if this were tunable somehow.

Agreed.  It'd be great if there was an option to say "Don't swap out
memory that was allocated by these programs.  If you run out of disk
buffers, toss the oldest ones and start re-using them."

Jeremy
-- 
Jeremy D. Zawodny     |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<Jeremy@Zawodny.com>  |  http://jeremy.zawodny.com/
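Part of what Jeremy asks for exists in a coarse form: `mlockall()` pins all of a process's current and future pages in RAM, so the VM cannot swap them out. A minimal sketch with a hypothetical wrapper name; on 2.4 kernels the call requires root (CAP_IPC_LOCK), while later kernels also let unprivileged callers lock up to RLIMIT_MEMLOCK. It does not by itself make the kernel recycle the oldest buffers first, and locking a large daemon leaves correspondingly less reclaimable memory for everything else.

```c
#include <errno.h>
#include <sys/mman.h>

/* Lock every page the calling process has mapped now (MCL_CURRENT) and every
 * page it maps later (MCL_FUTURE) into RAM, so none of it can be swapped out.
 * Returns 0 on success or the errno value from mlockall() on failure. */
int pin_process_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == 0)
        return 0;
    /* Typical failures: EPERM without CAP_IPC_LOCK, or ENOMEM when the
     * process is larger than RLIMIT_MEMLOCK allows. */
    return errno;
}
```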


* vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-17 10:25 ` Tonu Samuel
  2001-09-16 16:47   ` Jeremy Zawodny
@ 2001-09-16 18:34   ` Andrea Arcangeli
  2001-09-16 19:07     ` Rik van Riel
       [not found]     ` <20010917174037.7e3739b9.skraw@ithnet.com>
  2001-09-16 19:37   ` broken VM in 2.4.10-pre9 Linus Torvalds
  2 siblings, 2 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2001-09-16 18:34 UTC
  To: Tonu Samuel; +Cc: Linus Torvalds, linux-kernel

On Mon, Sep 17, 2001 at 06:25:38PM +0800, Tonu Samuel wrote:
> On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:
> 
> > Also note that the amount of "swap used" is totally meaningless in
> > 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> > earlier than 2.2.x, but that doesn't actually mean that it does any of
> > the IO. Indeed, allocating the swap backing store just means that the
> > swap pages are then kept track of, so that they can be aged along with
> > other stores.
> 
> The problem still exists. Not long ago a man from Yahoo
> described a case where moving from 2.2.19 to 2.4.x caused performance
> problems. On 2.2.19 everything ran fine. They were running MySQL and

After a few days of development I think I'm ready to release the VM
rewrite I did.

The alternate VM will be included in 2.4.10pre9aa1 (or anyway the very
next -aa release) and I'll maintain it in the -aa tree.  It is supposed
to provide:

1) a stable kswapd, avoiding the problem of kswapd eating 100% of the CPU
   (this is provided by the classzone design; btw I improved the
   implementation a little compared to the 2.3/2.4.0-test patches,
   and now I try to do things as lazily as possible, without the
   bookkeeping in page allocation/freeing)
2) optimal performance, avoiding slowdowns after multiple runs of
   workloads and avoiding swapout storms (for databases not using O_DIRECT)
3) swap+RAM of available virtual memory

At the moment it's of course still a bit experimental and subject to
changes, but I'm writing this email on top of it and it's perfectly
usable.

This isn't a hack/band-aid or a small set of changes; it's a
complete rewrite from scratch of the whole memory balancing, including
garbage collection, LRU lists, kswapd etc. (only the swap_out() path
is almost unchanged)

The only benchmark I have run so far is `dbench`. Without the VM patch
applied, dbench says:

andrea@laser:/mnt > dbench 40
40 clients started

Throughput 9.40112 MB/sec (NB=11.7514 MB/sec  94.0112 MBit/sec)
andrea@laser:/mnt > dbench 40
40 clients started
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................+.....................................+...................................................+...............................................................................+...+...............................................................................+.........................+.....................................+...............+................................................................................+.++++++++++++++++++++++++++++++****************************************
Throughput 9.56469 MB/sec (NB=11.9559 MB/sec  95.6469 MBit/sec)
andrea@laser:/mnt > 

After applying my VM patch, dbench consistently says:

andrea@laser:/mnt > for i in 1 2 3 4 ; do dbench 40; done
40 clients started

Throughput 20.353 MB/sec (NB=25.4412 MB/sec  203.53 MBit/sec)
40 clients started

Throughput 20.9269 MB/sec (NB=26.1586 MB/sec  209.269 MBit/sec)
40 clients started

Throughput 21.0787 MB/sec (NB=26.3483 MB/sec  210.787 MBit/sec)
40 clients started

Throughput 21.6167 MB/sec (NB=27.0208 MB/sec  216.167 MBit/sec)
andrea@laser:/mnt > 

Andrea
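For scale, a back-of-the-envelope comparison of the dbench figures in the message above: the two unpatched runs average about 9.48 MB/sec and the four patched runs about 20.99 MB/sec, a ratio of roughly 2.2x on this one benchmark and machine. In these runs the NB and MBit/sec columns are fixed multiples (1.25x and 10x) of the first figure, so only the first is used. The array and function names are hypothetical.

```c
#include <stddef.h>

/* Throughput (MB/sec) of the "dbench 40" runs quoted in the message. */
const double dbench_unpatched[] = { 9.40112, 9.56469 };
const double dbench_patched[]   = { 20.353, 20.9269, 21.0787, 21.6167 };

/* Arithmetic mean of n samples; 0.0 for an empty set. */
double mean_throughput(const double *x, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return n > 0 ? sum / (double)n : 0.0;
}

/* Ratio of mean patched to mean unpatched throughput. */
double dbench_speedup(void)
{
    return mean_throughput(dbench_patched, 4) /
           mean_throughput(dbench_unpatched, 2);
}
```

Two runs against four on a single machine say nothing about variance; treat the ratio as a rough reading of the posted numbers only.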


* Re: broken VM in 2.4.10-pre9
  2001-09-16 16:47   ` Jeremy Zawodny
@ 2001-09-16 18:36     ` Alan Cox
  2001-09-16 19:38       ` Linus Torvalds
  0 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2001-09-16 18:36 UTC
  To: Jeremy Zawodny; +Cc: Tonu Samuel, Linus Torvalds, linux-kernel

> Yep, that was me.  It was frustrating to have to double the RAM in the
> machine and then turn off swap.  The extra RAM did help, but it really
> only delayed the problem.

That shouldn't be needed with at least the later -ac kernels, nor is the
"swap > twice RAM" rule present in those.


* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-16 18:34   ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Andrea Arcangeli
@ 2001-09-16 19:07     ` Rik van Riel
  2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
                         ` (2 more replies)
       [not found]     ` <20010917174037.7e3739b9.skraw@ithnet.com>
  1 sibling, 3 replies; 22+ messages in thread
From: Rik van Riel @ 2001-09-16 19:07 UTC
  To: Andrea Arcangeli; +Cc: Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, 16 Sep 2001, Andrea Arcangeli wrote:

> The alternate VM will be included in 2.4.10pre9aa1 (or anyway the
> very next -aa release) and I'll maintain it in the -aa tree.

Cool, I'll definitely take a look to see if there are any
good ideas ready to be integrated into the -linus or -ac
kernels.

> It is supposed to provide:

   [snip holy grail]

I doubt you'll be able to achieve all of those without
really major changes, but I'll take a look at your code
when you make it public ;)

cheers,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-16 19:17       ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Alan Cox
@ 2001-09-16 19:15         ` Rik van Riel
  0 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2001-09-16 19:15 UTC
  To: Alan Cox; +Cc: Andrea Arcangeli, Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, 16 Sep 2001, Alan Cox wrote:

> >    [snip holy grail]
> >
> > I doubt you'll be able to achieve all of those without
> > really major changes, but I'll take a look at your code
> > when you make it public ;)
>
> Andrea made 2.2 finally stable under really high VM loads. I'm
> certainly interested to see what comes out of this.

Definitely, I have no doubt he'll achieve some good
results.  It's the overly wild claims I'm having doubts
about.

I'm looking forward to seeing his patch...

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-16 19:07     ` Rik van Riel
  2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
@ 2001-09-16 19:17       ` Alan Cox
  2001-09-16 19:15         ` Rik van Riel
  2001-09-16 19:19       ` Andrea Arcangeli
  2 siblings, 1 reply; 22+ messages in thread
From: Alan Cox @ 2001-09-16 19:17 UTC
  To: Rik van Riel; +Cc: Andrea Arcangeli, Tonu Samuel, Linus Torvalds, linux-kernel

>    [snip holy grail]
> 
> I doubt you'll be able to achieve all of those without
> really major changes, but I'll take a look at your code
> when you make it public ;)

Andrea made 2.2 finally stable under really high VM loads. I'm certainly
interested to see what comes out of this. 

Alan


* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-16 19:07     ` Rik van Riel
  2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
  2001-09-16 19:17       ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Alan Cox
@ 2001-09-16 19:19       ` Andrea Arcangeli
  2001-09-16 19:30         ` Linus Torvalds
  2 siblings, 1 reply; 22+ messages in thread
From: Andrea Arcangeli @ 2001-09-16 19:19 UTC
  To: Rik van Riel; +Cc: Tonu Samuel, Linus Torvalds, linux-kernel

On Sun, Sep 16, 2001 at 04:07:16PM -0300, Rik van Riel wrote:
> I doubt you'll be able to achieve all of those without
> really major changes, but I'll take a look at your code
> when you make it public ;)

as said it is quite a major change, it discards most of the 2.4 VM
that I don't agree with, it is basically an evolution of the classzone
patch.

andrea@athlon:~/remote/kernel.org/kernels/v2.4/2.4.10pre9aa1 > diffstat 80_vm-aa-1 
 ID                      |binary
 arch/alpha/mm/fault.c   |    7 
 arch/i386/mm/fault.c    |   25 +
 fs/buffer.c             |   68 +--
 fs/dcache.c             |    2 
 fs/inode.c              |   59 +--
 fs/proc/proc_misc.c     |    8 
 include/linux/fs.h      |    2 
 include/linux/highmem.h |    2 
 include/linux/list.h    |    1 
 include/linux/mm.h      |   50 +-
 include/linux/mmzone.h  |    9 
 include/linux/pagemap.h |    1 
 include/linux/sched.h   |    3 
 include/linux/slab.h    |    2 
 include/linux/swap.h    |  148 ++-----
 include/linux/swapctl.h |   22 -
 kernel/fork.c           |    2 
 kernel/signal.c         |    2 
 kernel/sysctl.c         |    6 
 mm/filemap.c            |   38 -
 mm/memory.c             |   12 
 mm/numa.c               |    8 
 mm/oom_kill.c           |   40 --
 mm/page_alloc.c         |  501 +++++++++-----------------
 mm/shmem.c              |    2 
 mm/slab.c               |    8 
 mm/swap.c               |  105 -----
 mm/swap_state.c         |   14 
 mm/swapfile.c           |   21 -
 mm/vmscan.c             |  913 +++++++++++++++---------------------------------
 31 files changed, 699 insertions(+), 1382 deletions(-)
andrea@athlon:~/remote/kernel.org/kernels/v2.4/2.4.10pre9aa1 > 

Andrea


* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
  2001-09-16 19:19       ` Andrea Arcangeli
@ 2001-09-16 19:30         ` Linus Torvalds
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2001-09-16 19:30 UTC
  To: Andrea Arcangeli; +Cc: Rik van Riel, Tonu Samuel, linux-kernel


On Sun, 16 Sep 2001, Andrea Arcangeli wrote:
>
> as said it is quite a major change, it discards most of the 2.4 VM
> that I don't agree with, it is basically an evolution of the classzone
> patch.

That is the wrong direction to go in.

We'll be completely screwed on NUMA with the classzone patch. I've said so
before, I'll say so again.

The basic approach of the classzone patch is _wrong_, in making global
decisions where no "globality" exists.

I bet that the improvements are from other things, not from classzone
itself. And I will bet that if we start doing classzones, we'll regret it
a LOT in a few years.

		Linus



* Re: broken VM in 2.4.10-pre9
  2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
@ 2001-09-16 19:33         ` Jeremy Zawodny
  2001-09-16 19:54           ` [PATCH] " Rik van Riel
  2001-09-16 19:52         ` Rik van Riel
  1 sibling, 1 reply; 22+ messages in thread
From: Jeremy Zawodny @ 2001-09-16 19:33 UTC
  To: Phillip Susi; +Cc: linux-kernel

On Sun, Sep 16, 2001 at 03:19:29PM +0000, Phillip Susi wrote:

> Maybe I'm missing something here, but it seems to me that these
> problems are due to the cache putting pressure on VM, so process
> pages get swapped out.

That's what it felt like in the cases that I ran into it.  It was
trying to treat all memory equally, when it probably shouldn't have.

Jeremy
-- 
Jeremy D. Zawodny     |  Perl, Web, MySQL, Linux Magazine, Yahoo!
<Jeremy@Zawodny.com>  |  http://jeremy.zawodny.com/


* Re: broken VM in 2.4.10-pre9
  2001-09-17 10:25 ` Tonu Samuel
  2001-09-16 16:47   ` Jeremy Zawodny
  2001-09-16 18:34   ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Andrea Arcangeli
@ 2001-09-16 19:37   ` Linus Torvalds
  2001-09-17 14:04     ` Olaf Zaplinski
  2 siblings, 1 reply; 22+ messages in thread
From: Linus Torvalds @ 2001-09-16 19:37 UTC
  To: linux-kernel

In article <1000722338.14005.0.camel@x153.internalnet>,
Tonu Samuel  <tonu@please.do.not.remove.this.spam.ee> wrote:
>
>The problem still exists. Not long ago a man from Yahoo
>described a case where moving from 2.2.19 to 2.4.x caused performance
>problems. On 2.2.19 everything ran fine. They were running MySQL and
>taking backups from disk. After upgrading to 2.4.x, MySQL performance fell
>during backups. They investigated and found that the MySQL daemon got
>swapped out in the middle of use to make room for buffers.

Note that if you're using a raw device backup strategy (ie "e2dump" or
similar), that is expected: 2.4.x up until about 2.4.7 gave _much_ too
much preference to the buffer cache. 

That should actually have been fixed in 2.4.8. We used to mark buffer
pages much too active.

> In summary:
>this made both SQL and the backup twice as slow. Even increasing memory from
>1G to 2G didn't help. Finally they disabled swap entirely and the problem
>went away.

You just hid the problem - by disabling swap the buffer cache couldn't
grow without bounds any more, and the proper buffer cache shrinking
couldn't happen.

Try 2.4.8 or later.

>If you do not want to change it back to the way it was in 2.2.x, it would be
>good if this were tunable somehow.

Tuning for bugs?

What do you want to happen? You want to have an interface like

	echo 0 > /proc/bugs/mm

that makes mm bugs go away?

		Linus


* Re: broken VM in 2.4.10-pre9
  2001-09-16 18:36     ` Alan Cox
@ 2001-09-16 19:38       ` Linus Torvalds
  0 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2001-09-16 19:38 UTC
  To: linux-kernel

In article <E15igmC-0005bs-00@the-village.bc.nu>,
Alan Cox  <alan@lxorguk.ukuu.org.uk> wrote:
>> Yep, that was me.  It was frustrating to have to double the RAM in the
>> machine and then turn off swap.  The extra RAM did help, but it really
>> only delayed the problem.
>
>That shouldn't be needed with at least the later -ac kernels, nor is the
>"swap > twice RAM" rule present in those.

Nor has it been present in the standard kernels since 2.4.8.

		Linus


* Re: broken VM in 2.4.10-pre9
  2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
  2001-09-16 19:33         ` Jeremy Zawodny
@ 2001-09-16 19:52         ` Rik van Riel
  1 sibling, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2001-09-16 19:52 UTC
  To: Phillip Susi; +Cc: linux-kernel

On Sun, 16 Sep 2001, Phillip Susi wrote:

> Maybe I'm missing something here, but it seems to me that these
> problems are due to the cache putting pressure on VM, so process pages
> get swapped out.  The obvious solution to this is to limit the size of
> the cache, or implement some sort of algorithm to slow its growth and
> reduce the pressure on VM.

> Am I way off base here?

You're absolutely right, and it's only a tiny patch to
implement this.  I've attached a completely untested
(I haven't even compiled it) patch which implements
this. I suspect it'll apply to any recent -ac kernel;
porting it to -linus should be easy.

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



--- mm/vmscan.c.orig	Sun Sep 16 16:44:14 2001
+++ mm/vmscan.c	Sun Sep 16 16:49:09 2001
@@ -731,6 +731,8 @@
  */
 #define too_many_buffers (atomic_read(&buffermem_pages) > \
 		(num_physpages * buffer_mem.borrow_percent / 100))
+#define too_much_cache ((page_cache_size - swapper_space.nrpages) > \
+		(num_physpages * page_cache.borrow_percent / 100))
 int refill_inactive_scan(unsigned int priority)
 {
 	struct list_head * page_lru;
@@ -793,6 +795,18 @@
 		 * be reclaimed there...
 		 */
 		if (page->buffers && !page->mapping && too_many_buffers) {
+			deactivate_page_nolock(page);
+			page_active = 0;
+		}
+
+		/*
+		 * If the page cache is too large, move the page
+		 * to the inactive list. If it is really accessed
+		 * it'll be referenced before it reaches the point
+		 * where we'll reclaim it.
+		 */
+		if (page->mapping && too_much_cache && page_count(page) <=
+					(page->buffers ? 2 : 1)) {
 			deactivate_page_nolock(page);
 			page_active = 0;
 		}
--- mm/swap.c.orig	Sun Sep 16 16:50:43 2001
+++ mm/swap.c	Sun Sep 16 16:50:58 2001
@@ -64,7 +64,7 @@

 buffer_mem_t page_cache = {
 	2,	/* minimum percent page cache */
-	15,	/* borrow percent page cache */
+	60,	/* borrow percent page cache */
 	75	/* maximum */
 };
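To see what raising `page_cache.borrow_percent` from 15 to 60 means in pages, here is a userspace model of the two conditions the patch adds. It is a sketch, not kernel code: the kernel globals become parameters, the subtraction in the patch is read as meaning the swap cache is counted inside `page_cache_size`, and the example assumes 4 KB pages. Under those assumptions the 512684K machine from the original report has 128171 pages, so cache pages start being deactivated above 76902 non-swap-cache pages at 60%, versus 19225 at 15%.

```c
#include <stdbool.h>

/* Model of the patch's too_much_cache test: the page cache, minus the swap
 * cache (which is counted inside it), exceeds borrow_percent of RAM.
 * All sizes are in pages; swap_cache_pages must not exceed page_cache_size. */
bool too_much_cache(unsigned long page_cache_size,
                    unsigned long swap_cache_pages,
                    unsigned long num_physpages,
                    unsigned long borrow_percent)
{
    return (page_cache_size - swap_cache_pages) >
           (num_physpages * borrow_percent / 100);
}

/* Model of the new deactivation condition in refill_inactive_scan(): only a
 * page-cache page (has_mapping) whose reference count is just the cache's own
 * reference, plus one for attached buffers, is moved to the inactive list. */
bool should_deactivate(bool has_mapping, bool has_buffers,
                       int page_count, bool cache_too_big)
{
    return has_mapping && cache_too_big &&
           page_count <= (has_buffers ? 2 : 1);
}
```

The reference-count check is what spares pages that a process still has mapped or pinned: those carry extra references and stay on the active list even when the cache is over its limit.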




* [PATCH]  Re: broken VM in 2.4.10-pre9
  2001-09-16 19:33         ` Jeremy Zawodny
@ 2001-09-16 19:54           ` Rik van Riel
  0 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2001-09-16 19:54 UTC
  To: Jeremy Zawodny; +Cc: Phillip Susi, linux-kernel

On Sun, 16 Sep 2001, Jeremy Zawodny wrote:
> On Sun, Sep 16, 2001 at 03:19:29PM +0000, Phillip Susi wrote:
>
> > Maybe I'm missing something here, but it seems to me that these
> > problems are due to the cache putting pressure on VM, so process
> > pages get swapped out.
>
> That's what it felt like in the cases that I ran into it.  It was
> trying to treat all memory equally, when it probably shouldn't have.

Indeed, it should treat all memory equally, except when we
really have far too much cache.  I'll resend the patch with
the subject clearly marked since this trivial thing really
does need testers ;)

regards,

Rik
-- 
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/


--- mm/vmscan.c.orig	Sun Sep 16 16:44:14 2001
+++ mm/vmscan.c	Sun Sep 16 16:49:09 2001
@@ -731,6 +731,8 @@
  */
 #define too_many_buffers (atomic_read(&buffermem_pages) > \
 		(num_physpages * buffer_mem.borrow_percent / 100))
+#define too_much_cache ((page_cache_size - swapper_space.nrpages) > \
+		(num_physpages * page_cache.borrow_percent / 100))
 int refill_inactive_scan(unsigned int priority)
 {
 	struct list_head * page_lru;
@@ -793,6 +795,18 @@
 		 * be reclaimed there...
 		 */
 		if (page->buffers && !page->mapping && too_many_buffers) {
+			deactivate_page_nolock(page);
+			page_active = 0;
+		}
+
+		/*
+		 * If the page cache is too large, move the page
+		 * to the inactive list. If it is really accessed
+		 * it'll be referenced before it reaches the point
+		 * where we'll reclaim it.
+		 */
+		if (page->mapping && too_much_cache && page_count(page) <=
+					(page->buffers ? 2 : 1)) {
 			deactivate_page_nolock(page);
 			page_active = 0;
 		}
--- mm/swap.c.orig	Sun Sep 16 16:50:43 2001
+++ mm/swap.c	Sun Sep 16 16:50:58 2001
@@ -64,7 +64,7 @@

 buffer_mem_t page_cache = {
 	2,	/* minimum percent page cache */
-	15,	/* borrow percent page cache */
+	60,	/* borrow percent page cache */
 	75	/* maximum */
 };




* Re: broken VM in 2.4.10-pre9
  2001-09-15 22:43 broken VM in 2.4.10-pre9 Peter Magnusson
  2001-09-15 23:50 ` Jan Harkes
  2001-09-16  5:31 ` Linus Torvalds
@ 2001-09-17 10:25 ` Tonu Samuel
  2001-09-16 16:47   ` Jeremy Zawodny
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Tonu Samuel @ 2001-09-17 10:25 UTC
  To: Linus Torvalds; +Cc: linux-kernel

On 16 Sep 2001 05:31:11 +0000, Linus Torvalds wrote:

> Also note that the amount of "swap used" is totally meaningless in
> 2.4.x. The 2.4.x kernel will _allocate_ the swap backing store much
> earlier than 2.2.x, but that doesn't actually mean that it does any of
> the IO. Indeed, allocating the swap backing store just means that the
> swap pages are then kept track of, so that they can be aged along with
> other stores.

The problem still exists. Not long ago a man from Yahoo
described a case where moving from 2.2.19 to 2.4.x caused performance
problems. On 2.2.19 everything ran fine. They were running MySQL and
taking backups from disk. After upgrading to 2.4.x, MySQL performance fell
during backups. They investigated and found that the MySQL daemon got
swapped out in the middle of use to make room for buffers. In summary:
this made both SQL and the backup twice as slow. Even increasing memory from
1G to 2G didn't help. Finally they disabled swap entirely and the problem
went away.

If you do not want to change it back to the way it was in 2.2.x, it would be
good if this were tunable somehow.
 
-- 
For technical support contracts, goto https://order.mysql.com/
   __  ___     ___ ____  __
  /  |/  /_ __/ __/ __ \/ /    Mr. Tonu Samuel <tonu@mysql.com>
 / /|_/ / // /\ \/ /_/ / /__   MySQL AB, Security Administrator
/_/  /_/\_, /___/\___\_\___/   Hong Kong, China
       <___/   www.mysql.com



* Re: broken VM in 2.4.10-pre9
  2001-09-16 19:37   ` broken VM in 2.4.10-pre9 Linus Torvalds
@ 2001-09-17 14:04     ` Olaf Zaplinski
  0 siblings, 0 replies; 22+ messages in thread
From: Olaf Zaplinski @ 2001-09-17 14:04 UTC
  To: linux-kernel

Linus Torvalds wrote:
> [...]
> What do you want to happen? You want to have an interface like
> 
>         echo 0 > /proc/bugs/mm
> 
> that makes mm bugs go away?

Good idea! ;-)

Well, I had similar problems and went back to 2.2.19... but isn't there a
tunable yet?

On http://www.badtux.org/eric/editorial/mindcraft.html I found this one:

'Tuning the file buffer size so that more than 60% of memory can be used
(90% in this example) can be accomplished by issuing the following command:
echo "2 10 90" >/proc/sys/vm/buffermem
This is documented in the file /usr/src/linux/Documentation/sysctl/vm.txt
along with many other tuning parameters, such as the 'bdflush' parameter.'


But vm.txt from 2.4.9ac10 and 2.2.19 says:

buffermem:

The three values in this file correspond to the values in
the struct buffer_mem. It controls how much memory should
be used for buffer memory. The percentage is calculated
as a percentage of total system memory.

The values are:
min_percent     -- this is the minimum percentage of memory
                   that should be spent on buffer memory
borrow_percent  -- UNUSED
max_percent     -- UNUSED

Is vm.txt out of date, or is there really no tunable in either 2.2.x or
2.4.x?

Olaf
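For reference, the `buffermem` file takes three whitespace-separated percentages in the order vm.txt lists them (min_percent, borrow_percent, max_percent). A minimal parser sketch follows; the struct and function names are hypothetical, and the ordering check (min <= borrow <= max) is an assumption of this sketch rather than something vm.txt states. Per the vm.txt text quoted above, only the first value had any effect in those kernels.

```c
#include <stdio.h>

/* The three fields vm.txt describes for /proc/sys/vm/buffermem
 * (hypothetical struct; not the kernel's own buffer_mem_t). */
struct buffer_mem_pct {
    unsigned int min_percent;
    unsigned int borrow_percent;
    unsigned int max_percent;
};

/* Parse a "min borrow max" triple such as "2 10 90". Returns 0 if three
 * values were read, none exceeds 100, and min <= borrow <= max; otherwise
 * returns -1 and leaves *out untouched. */
int parse_buffermem(const char *s, struct buffer_mem_pct *out)
{
    struct buffer_mem_pct v;

    if (sscanf(s, "%u %u %u", &v.min_percent, &v.borrow_percent,
               &v.max_percent) != 3)
        return -1;
    if (v.max_percent > 100 || v.min_percent > v.borrow_percent ||
        v.borrow_percent > v.max_percent)
        return -1;
    *out = v;
    return 0;
}
```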


* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
       [not found]         ` <20010917191256.6e6a1c87.skraw@ithnet.com>
@ 2001-09-17 22:41           ` Andrea Arcangeli
  0 siblings, 0 replies; 22+ messages in thread
From: Andrea Arcangeli @ 2001-09-17 22:41 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: linux-kernel

[ CC'ed to l-k with Stephan's approval ]

On Mon, Sep 17, 2001 at 07:12:56PM +0200, Stephan von Krawczynski wrote:
> On Mon, 17 Sep 2001 18:10:40 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
> 
> > On Mon, Sep 17, 2001 at 05:40:37PM +0200, Stephan von Krawczynski wrote:
> > > On Sun, 16 Sep 2001 20:34:14 +0200 Andrea Arcangeli <andrea@suse.de> wrote:
> > > 
> > > > After a few days of development I think I'm ready to release the VM
> > > > rewrite I did.
> > > > 
> > > > The alternate vm will be included in 2.4.10pre9aa1 (or anyway the very
> > > > next -aa release) and I'll maintain it in the -aa tree.  It is supposed
> > > > to provide:
> > > 
> > > Where can I get a patch working on 2.4.9 (possibly pre9 or pre10)?
> > > Didn't find it on ftp.kernel.org.
> > 
> > I uploaded it now. You can apply the whole 2.4.10pre10 patch
> > 
> > ftp://ftp.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.10pre10aa1.bz2
> > 
> > Any feedback is welcome so I can make it better if it swaps out too much,
> > etc...
> 
> Hello Andrea,
> 
> my first impression: very high performance compared to all other versions I
> tested so far.
>
> - cpu average load is low, during whole test sometimes even below 3
>   (never saw this before)

Good.

I also had another report with very VFS-intensive operation going on, and
I suspect this patch will be a good idea (even if it can lead to the
usual excessive growth of the VFS caches in the long run, the current
way is probably too aggressive).

--- 2.4.10pre10aa2/mm/vmscan.c.~1~	Mon Sep 17 19:17:27 2001
+++ 2.4.10pre10aa2/mm/vmscan.c	Tue Sep 18 00:09:33 2001
@@ -518,12 +518,12 @@
 	if (nr_pages <= 0)
 		return 0;
 
-	shrink_dcache_memory(priority, gfp_mask);
-	shrink_icache_memory(priority, gfp_mask);
-
 	nr_pages = shrink_cache(&active_list, &max_scan, nr_pages, classzone, gfp_mask);
 	if (nr_pages <= 0)
 		return 0;
+
+	shrink_dcache_memory(priority, gfp_mask);
+	shrink_icache_memory(priority, gfp_mask);
 
 	return nr_pages;
 }
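The effect of the reordering in the hunk above can be modelled in user space. In this sketch, hypothetical stand-ins for shrink_cache(), shrink_dcache_memory() and shrink_icache_memory() only count calls; the real functions take more arguments and do real work. What it shows is the control flow after the patch: the dentry and inode caches are shrunk only when scanning the page cache did not free enough pages.

```c
/* Hypothetical counters standing in for real reclaim work. */
static int dcache_shrinks, icache_shrinks;

/* Stand-in for shrink_cache(): frees up to 'reclaimable' pages and
 * returns how many pages are still wanted. */
static int model_shrink_cache(int nr_pages, int reclaimable)
{
    int freed = reclaimable < nr_pages ? reclaimable : nr_pages;
    return nr_pages - freed;
}

static void model_shrink_dcache(void) { dcache_shrinks++; }
static void model_shrink_icache(void) { icache_shrinks++; }

/* Ordering after the patch: page cache first, VFS caches only as a
 * fallback while nr_pages is still positive. */
static int model_shrink_caches(int nr_pages, int reclaimable)
{
    if (nr_pages <= 0)
        return 0;

    nr_pages = model_shrink_cache(nr_pages, reclaimable);
    if (nr_pages <= 0)
        return 0;

    model_shrink_dcache();
    model_shrink_icache();
    return nr_pages;
}
```

Before the patch the two VFS shrinkers ran on every pass that got that far, even when the page cache alone would have satisfied the request; after it, the VFS caches are left alone in that case, which is why they can grow larger over time.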

> - meminfo during test:
>         total:    used:    free:  shared: buffers:  cached:
> Mem:  923574272 920178688  3395584        0 73883648 741076992
> Swap: 271392768        0 271392768
> MemTotal:       901928 kB
> MemFree:          3316 kB
> MemShared:           0 kB
> Buffers:         72152 kB
> Cached:         723708 kB
> SwapCached:          0 kB
> Active:         116172 kB
> Inactive:       679688 kB
> HighTotal:           0 kB
> HighFree:            0 kB
> LowTotal:       901928 kB
> LowFree:          3316 kB
> SwapTotal:      265032 kB
> SwapFree:       265032 kB

Fine.

> Doesn't change that much (once all mem is eaten up from free).

That's expected; I didn't claim to have added the defragmenter yet ;)
The architecture for the defrag is already there, though; I just ignored
solving the order > 1 case for now, and that will probably be the next step.

> - Has same alloc problems as other versions:
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 2-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 1-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 0-order allocation failed (gfp=0x20/0) from c012de72

While this one is order 0, it is a GFP_ATOMIC allocation, so it is sane
that it failed. Can you symbol-resolve the address "c012de72" so we know
who's doing this GFP_ATOMIC allocation? Thanks. We could theoretically
shrink the cache from GFP_ATOMIC context too, but we would have to turn
a few spinlocks into irq-safe spinlocks.
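Reading gfp=0x20 as GFP_ATOMIC can be checked by decoding the mask. The bit values below are recalled from the 2.4-era include/linux/mm.h (the 2.4.10 layout with __GFP_HIGHIO) and are an assumption to verify against the tree actually being run; the GFP_BIT_* macro names and helper functions are hypothetical. With these values 0x20 is __GFP_HIGH alone, and the gfp=0x1d2 that shows up later in Stephan's log decodes to a mask that includes __GFP_WAIT, i.e. a caller allowed to sleep.

```c
#include <stddef.h>
#include <string.h>

/* Bit values as recalled from 2.4-era include/linux/mm.h (2.4.10 with
 * __GFP_HIGHIO); the GFP_BIT_* names are local to this sketch. */
#define GFP_BIT_DMA     0x01u
#define GFP_BIT_HIGHMEM 0x02u
#define GFP_BIT_WAIT    0x10u  /* may sleep, so may wait for reclaim */
#define GFP_BIT_HIGH    0x20u  /* may access emergency pools */
#define GFP_BIT_IO      0x40u
#define GFP_BIT_HIGHIO  0x80u
#define GFP_BIT_FS      0x100u

static const struct {
    unsigned int bit;
    const char *name;
} gfp_bits[] = {
    { GFP_BIT_DMA, "__GFP_DMA" },
    { GFP_BIT_HIGHMEM, "__GFP_HIGHMEM" },
    { GFP_BIT_WAIT, "__GFP_WAIT" },
    { GFP_BIT_HIGH, "__GFP_HIGH" },
    { GFP_BIT_IO, "__GFP_IO" },
    { GFP_BIT_HIGHIO, "__GFP_HIGHIO" },
    { GFP_BIT_FS, "__GFP_FS" },
};

/* Write the names of the set bits, joined by '|', into buf (len > 0). */
static const char *decode_gfp(unsigned int mask, char *buf, size_t len)
{
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof gfp_bits / sizeof gfp_bits[0]; i++) {
        if (!(mask & gfp_bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, "|", len - strlen(buf) - 1);
        strncat(buf, gfp_bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}

/* An allocation without the wait bit cannot sleep to reclaim memory. */
static int gfp_can_sleep(unsigned int mask)
{
    return (mask & GFP_BIT_WAIT) != 0;
}
```

Under these assumed values, decode_gfp(0x20, ...) yields "__GFP_HIGH" and decode_gfp(0x1d2, ...) yields "__GFP_HIGHMEM|__GFP_WAIT|__GFP_IO|__GFP_HIGHIO|__GFP_FS".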

> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 2-order allocation failed (gfp=0x20/0) from c012de72
> Sep 17 19:04:17 admin kernel: cdda2wav __alloc_pages: 3-order allocation failed (gfp=0x20/0) from c012de72
> I cut out only a few to give you a hint. I patched the current->comm in myself;
> that's where the cdda2wav comes from.

Ok.

> Is it possible for you to make something like always at least one free page in
> every zone->order? If not try to "refill" the order queue? There must be some
> way to get rid of those alloc-failures.

Of course; as said, that's probably the next step, but it won't keep a
free page in every zone order. We'll do the work lazily as usual (only
when necessary). Order > 0 allocations should be very unlikely, and
order > 0 allocations with GFP_ATOMIC even more unlikely. I believe the
right fix is to fix the callers that allocate memory that way. Of
course, in the long run we'll also try to defrag the RAM, but that is
not a good reason for not fixing the drivers! :)
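The failures in the log are for orders 0 to 3, i.e. blocks of 1 to 8 physically contiguous pages. A toy buddy model shows why order > 0 requests can fail while single pages are still free, and what "doing the work lazily" means for splitting: larger free blocks are split only when a request needs them. This is a deliberately simplified sketch (free-block counts per order, no addresses, no coalescing on free, all names hypothetical), not the 2.4 page allocator.

```c
/* Toy model: orders 0..TOY_MAX_ORDER-1 (the value 4 is arbitrary). */
#define TOY_MAX_ORDER 4

/* free_blocks[k] counts free blocks of 2^k contiguous pages. */
struct toy_zone {
    int free_blocks[TOY_MAX_ORDER];
};

/* Allocate one block of 2^order pages. Takes the smallest free block of
 * at least that order and splits it lazily; each split leaves one free
 * buddy at the next lower order. Returns 0 on success, -1 on failure. */
static int toy_alloc(struct toy_zone *z, int order)
{
    int k;

    if (order < 0 || order >= TOY_MAX_ORDER)
        return -1;
    for (k = order; k < TOY_MAX_ORDER; k++)
        if (z->free_blocks[k] > 0)
            break;
    if (k == TOY_MAX_ORDER)
        return -1; /* no contiguous run large enough */
    z->free_blocks[k]--;
    while (k > order) {
        k--;
        z->free_blocks[k]++; /* the unused half becomes a free buddy */
    }
    return 0;
}

/* Total free pages, regardless of contiguity. */
static long toy_free_pages(const struct toy_zone *z)
{
    long pages = 0;
    int k;

    for (k = 0; k < TOY_MAX_ORDER; k++)
        pages += (long)z->free_blocks[k] << k;
    return pages;
}
```

A zone holding eight scattered free pages ({ 8, 0, 0, 0 }) reports 8 free pages yet fails an order-1 request; a zone with a single free order-3 block satisfies an order-0 request and is left with one free buddy each at orders 0, 1 and 2 (7 pages).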

> I do an overnight test right now and have a look tomorrow morning how things
> went.

Ok.

> I'll be back,

Thanks for the feedback. As said, right now all kinds of feedback are
welcome. I have a few other changes pending that look attractive, but
they hurt the wonderful dbench numbers, so I haven't made them yet, in
the hope that dbench has some relation to real life too (and I can
pretty much see why the current algorithm works better than the other
changes; it wasn't developed to run well in dbench, of course, it just
incidentally happened to get the best score in dbench).

Another detail: I happened to be talking with David Mosemberg about the
ptrace races while working on the VM, so due to an "editing in the wrong
tree" error there is now a leftover in signal.c in the 80_vm-aa-1 patch.
This patch should be backed out:

diff -urN vm-ref/kernel/signal.c vm/kernel/signal.c
--- vm-ref/kernel/signal.c	Mon Sep 17 01:26:12 2001
+++ vm/kernel/signal.c	Mon Sep 17 01:26:25 2001
@@ -382,7 +382,7 @@
 	switch (sig) {
 	case SIGKILL: case SIGCONT:
 		/* Wake up the process if stopped.  */
-		if (t->state == TASK_STOPPED)
+		if (t->state == TASK_STOPPED && !(t->ptrace & PT_PTRACED))
 			wake_up_process(t);
 		t->exit_code = 0;
 		rm_sig_from_queue(SIGSTOP, t);


But it's non-fatal; just ignore it for now, it will be fixed in the next
-aa. The only downside is that SIGKILL or SIGCONT won't wake up a stopped
task while it's being ptraced.
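The behavioural difference between the two conditions in the hunk above reduces to a predicate over the task state and its ptrace flags. This sketch uses stand-in enum and flag values (hypothetical, not the kernel's TASK_STOPPED or PT_PTRACED definitions) to show the one case that changes.

```c
#include <stdbool.h>

/* Model values for this sketch only; not the kernel's constants. */
enum model_task_state { MODEL_TASK_RUNNING, MODEL_TASK_STOPPED };
#define MODEL_PT_PTRACED 0x1u

/* Wake-up test before the leftover hunk (the "-" line). */
static bool wakes_before(enum model_task_state state, unsigned int ptrace)
{
    (void)ptrace; /* ptrace status is not consulted */
    return state == MODEL_TASK_STOPPED;
}

/* Wake-up test with the leftover hunk applied (the "+" line). */
static bool wakes_with_leftover(enum model_task_state state,
                                unsigned int ptrace)
{
    return state == MODEL_TASK_STOPPED && !(ptrace & MODEL_PT_PTRACED);
}
```

Only the (stopped, ptraced) combination differs: the original condition wakes the task, the leftover does not. Every other combination behaves the same under both conditions.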

Andrea


* Re: vm rewrite ready [Re: broken VM in 2.4.10-pre9]
       [not found]     ` <20010917174037.7e3739b9.skraw@ithnet.com>
       [not found]       ` <20010917181040.J713@athlon.random>
@ 2001-09-18  9:00       ` Stephan von Krawczynski
  1 sibling, 0 replies; 22+ messages in thread
From: Stephan von Krawczynski @ 2001-09-18  9:00 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

On Tue, 18 Sep 2001 00:41:16 +0200 Andrea Arcangeli <andrea@suse.de> wrote:

> [ CC'ed to l-k with Stephan approval ]
> > - cpu average load is low, during whole test sometimes even below 3
> >   (never saw this before)
> 
> Good.
> 
> I also had another report with very vfs intensive operation going on and
> I suspect this patch will be a good idea (even if it can lead to the
> usual excessive grow of the vfs caches on the long run but the current
> way is probably too aggressive).

Hm, are you sure about this? Here is /proc/meminfo after a night of heavy NFS
activity (we are on the server side):

        total:    used:    free:  shared: buffers:  cached:
Mem:  923574272 919187456  4386816        0 39723008 793706496
Swap: 271392768  1417216 269975552
MemTotal:       901928 kB
MemFree:          4284 kB
MemShared:           0 kB
Buffers:         38792 kB
Cached:         775052 kB
SwapCached:         52 kB
Active:         811464 kB
Inactive:         2432 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       901928 kB
LowFree:          4284 kB
SwapTotal:      265032 kB
SwapFree:       263648 kB

As you can see, most memory found its way into the active queue. If by
"aggressive" you mean aggressively aged or even freed, I cannot see it.
I will go on for another day without additional patching and see how things
evolve and how the system behaves in interactive use.
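The shift can be quantified from the two snapshots in this thread. In the dump taken during the test, Active is 116172 kB of 795860 kB on the two lists (about 14.6%); after the night of NFS load it is 811464 kB of 813896 kB (about 99.7%). A minimal parser for meminfo-style text that computes this share (function names hypothetical):

```c
#include <stdio.h>
#include <string.h>

/* Find a "Key:   value kB" line in /proc/meminfo-style text and return
 * the value in kB, or -1 if the key is missing or unparsable. */
static long meminfo_kb(const char *text, const char *key)
{
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long value;
            if (sscanf(line + klen + 1, "%ld", &value) == 1)
                return value;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++; /* step past the newline to the next line */
    }
    return -1;
}

/* Share of list pages (Active + Inactive) that are on the active list,
 * as a percentage; -1.0 if either field is missing or both are zero. */
static double active_percent(const char *text)
{
    long active = meminfo_kb(text, "Active");
    long inactive = meminfo_kb(text, "Inactive");

    if (active < 0 || inactive < 0 || active + inactive == 0)
        return -1.0;
    return 100.0 * (double)active / (double)(active + inactive);
}
```

In both dumps Active + Inactive equals Buffers + Cached + SwapCached exactly (72152 + 723708 + 0 = 795860 kB, and 38792 + 775052 + 52 = 813896 kB), so the change is a move between the two lists, not growth of the lists themselves.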

Ah, another thing to mention. I got some _new_ alloc failures:

Sep 18 04:16:49 admin kernel: nfsd __alloc_pages: 1-order allocation failed (gfp=0x20/0) from c012de72
Sep 18 04:17:27 admin kernel: nfsd __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) from c012de72
Sep 18 04:21:18 admin kernel: gzip __alloc_pages: 0-order allocation failed (gfp=0x1d2/0) from c012de72

c012de5c T _alloc_pages 
c012de74 t balance_classzone
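These two System.map lines are enough to resolve the address by hand: c012de72 lies between _alloc_pages (c012de5c) and balance_classzone (c012de74), so it is _alloc_pages+0x16. Since _alloc_pages is itself part of the allocator, this address alone does not identify the original caller. The same lookup as a sketch (struct and function names hypothetical; it assumes the map matches the running kernel):

```c
#include <stddef.h>

/* One System.map line: address, symbol type letter, symbol name. */
struct map_sym {
    unsigned long addr;
    char type;
    const char *name;
};

/* Return the symbol containing addr: the last entry whose address is
 * <= addr in a table sorted by address, or NULL if addr precedes them all. */
static const struct map_sym *resolve_addr(const struct map_sym *syms,
                                          size_t n, unsigned long addr)
{
    size_t lo = 0, hi = n;

    /* Binary search for the number of entries with address <= addr. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (syms[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &syms[lo - 1];
}
```

The offset is then addr - sym->addr, here 0xc012de72 - 0xc012de5c = 0x16.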

Hope this helps,
Stephan




Thread overview: 22+ messages
2001-09-15 22:43 broken VM in 2.4.10-pre9 Peter Magnusson
2001-09-15 23:50 ` Jan Harkes
2001-09-16  5:31 ` Linus Torvalds
2001-09-16  8:45   ` Eric W. Biederman
2001-09-17 10:25 ` Tonu Samuel
2001-09-16 16:47   ` Jeremy Zawodny
2001-09-16 18:36     ` Alan Cox
2001-09-16 19:38       ` Linus Torvalds
2001-09-16 18:34   ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Andrea Arcangeli
2001-09-16 19:07     ` Rik van Riel
2001-09-16 15:19       ` broken VM in 2.4.10-pre9 Phillip Susi
2001-09-16 19:33         ` Jeremy Zawodny
2001-09-16 19:54           ` [PATCH] " Rik van Riel
2001-09-16 19:52         ` Rik van Riel
2001-09-16 19:17       ` vm rewrite ready [Re: broken VM in 2.4.10-pre9] Alan Cox
2001-09-16 19:15         ` Rik van Riel
2001-09-16 19:19       ` Andrea Arcangeli
2001-09-16 19:30         ` Linus Torvalds
     [not found]     ` <20010917174037.7e3739b9.skraw@ithnet.com>
     [not found]       ` <20010917181040.J713@athlon.random>
     [not found]         ` <20010917191256.6e6a1c87.skraw@ithnet.com>
2001-09-17 22:41           ` Andrea Arcangeli
2001-09-18  9:00       ` Stephan von Krawczynski
2001-09-16 19:37   ` broken VM in 2.4.10-pre9 Linus Torvalds
2001-09-17 14:04     ` Olaf Zaplinski
