linux-kernel.vger.kernel.org archive mirror
* Re: Break 2.4 VM in five easy steps
@ 2001-06-06 15:31 Derek Glidden
  2001-06-06 15:46 ` John Alvord
  2001-06-06 21:30 ` Alan Cox
  0 siblings, 2 replies; 35+ messages in thread
From: Derek Glidden @ 2001-06-06 15:31 UTC (permalink / raw)
  To: Alexander Viro, linux-kernel


> Funny. I can count many ways in which 4.3BSD, SunOS{3,4} and post-4.4 BSD
> systems I've used were broken, but I've never thought that swap==2*RAM rule
> was one of them.

Yes, but Linux isn't 4.3BSD, SunOS, or post-4.4 BSD.  Not to mention, all
other OSes I've had experience with *don't* break severely if you don't
follow the "swap==2*RAM" rule.  Except Linux 2.4.

> Not that being more kind on swap would be a bad thing, but that rule for
> amount of swap is pretty common. ISTR similar for (very old) SCO, so it's
> not just BSD world. How are modern Missed'em'V variants in that respect, BTW?

Yes, but that has traditionally been one of the big BENEFITS of Linux
and other UNIXes.  As Sean Hunter said, "Virtual memory is one of the
killer features of unix."  Linux has *never* in the past REQUIRED me to
follow that rule, which is a big reason I use it in so many places.

Take an example mentioned by someone on the list already: a laptop.  I
have two laptops that run Linux.  One has a 4GB disk, one has a 12GB
disk.  Both disks are VERY full of data and both machines get pretty
heavy use.  It's a fact that I just bumped one laptop (with 256MB of
swap configured) from 128MB to 256MB of RAM.  Does this mean that if I
want to upgrade to the 2.4 kernel on that machine I now have to back up
all that data, repartition the drive and restore everything just so I
can fastidiously follow the "swap == 2*RAM" rule else the 2.4 VM
subsystem will break?  Bollocks, to quote yet another participant in
this silly discussion.

I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
this problem.  I expect this sort of behaviour from academics - ignoring
real actual problems being reported by real actual people really and
actually experiencing and reporting them because "technically" or
"theoretically" they "shouldn't be an issue" or because the "literature"
[documentation] says otherwise - but not from this group.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Break 2.4 VM in five easy steps
  2001-06-06 15:31 Break 2.4 VM in five easy steps Derek Glidden
@ 2001-06-06 15:46 ` John Alvord
  2001-06-06 15:58   ` Derek Glidden
  2001-06-06 21:30 ` Alan Cox
  1 sibling, 1 reply; 35+ messages in thread
From: John Alvord @ 2001-06-06 15:46 UTC (permalink / raw)
  To: Derek Glidden; +Cc: Alexander Viro, linux-kernel

On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden
<dglidden@illusionary.com> wrote:


>
>I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
>this problem.  I expect this sort of behaviour from academics - ignoring
>real actual problems being reported by real actual people really and
>actually experiencing and reporting them because "technically" or
>"theoretically" they "shouldn't be an issue" or because "the "literature
>[documentation] says otherwise - but not from this group.  

There have been multiple comments that a fix for the problem is
forthcoming. Is there some reason you have to keep talking about it?

John Alvord


* Re: Break 2.4 VM in five easy steps
  2001-06-06 15:46 ` John Alvord
@ 2001-06-06 15:58   ` Derek Glidden
  2001-06-06 18:27     ` Eric W. Biederman
  2001-06-09  7:34     ` Rik van Riel
  0 siblings, 2 replies; 35+ messages in thread
From: Derek Glidden @ 2001-06-06 15:58 UTC (permalink / raw)
  To: John Alvord; +Cc: linux-kernel

John Alvord wrote:
> 
> On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden
> <dglidden@illusionary.com> wrote:
> 
> >
> >I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
> >this problem.  I expect this sort of behaviour from academics - ignoring
> >real actual problems being reported by real actual people really and
> >actually experiencing and reporting them because "technically" or
> >"theoretically" they "shouldn't be an issue" or because "the "literature
> >[documentation] says otherwise - but not from this group.
> 
> There have been multiple comments that a fix for the problem is
> forthcoming. Is there some reason you have to keep talking about it?

Because there have been many more comments that "The rule for 2.4 is
'swap == 2*RAM' and that's the way it is" and "disk space is cheap -
just add more" than there have been "this is going to be fixed," which
is extremely discouraging and doesn't instill me with much confidence
that this problem is being taken seriously.

Or are you saying that if someone is unhappy with a particular
situation, they should just keep their mouth shut and accept it?

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#!/usr/bin/perl -w
$_='while(read+STDIN,$_,2048){$a=29;$b=73;$c=142;$t=255;@t=map
{$_%16or$t^=$c^=($m=(11,10,116,100,11,122,20,100)[$_/16%8])&110;
$t^=(72,@z=(64,72,$a^=12*($_%16-2?0:$m&17)),$b^=$_%64?12:0,@z)
[$_%8]}(16..271);if((@a=unx"C*",$_)[20]&48){$h=5;$_=unxb24,join
"",@b=map{xB8,unxb8,chr($_^$a[--$h+84])}@ARGV;s/...$/1$&/;$d=
unxV,xb25,$_;$e=256|(ord$b[4])<<9|ord$b[3];$d=$d>>8^($f=$t&($d
>>12^$d>>4^$d^$d/8))<<17,$e=$e>>8^($t&($g=($q=$e>>14&7^$e)^$q*
8^$q<<6))<<9,$_=$t[$_]^(($h>>=8)+=$f+(~$g&$t))for@a[128..$#a]}
print+x"C*",@a}';s/x/pack+/g;eval 

usage: qrpff 153 2 8 105 225 < /mnt/dvd/VOB_FILENAME \
    | extract_mpeg2 | mpeg2dec - 

http://www.eff.org/                    http://www.opendvd.org/ 
         http://www.cs.cmu.edu/~dst/DeCSS/Gallery/


* Re: Break 2.4 VM in five easy steps
  2001-06-06 15:58   ` Derek Glidden
@ 2001-06-06 18:27     ` Eric W. Biederman
  2001-06-06 18:47       ` Derek Glidden
                         ` (2 more replies)
  2001-06-09  7:34     ` Rik van Riel
  1 sibling, 3 replies; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-06 18:27 UTC (permalink / raw)
  To: Derek Glidden; +Cc: John Alvord, linux-kernel

Derek Glidden <dglidden@illusionary.com> writes:

> John Alvord wrote:
> > 
> > On Wed, 06 Jun 2001 11:31:28 -0400, Derek Glidden
> > <dglidden@illusionary.com> wrote:
> > 
> > >
> > >I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
> > >this problem.  I expect this sort of behaviour from academics - ignoring
> > >real actual problems being reported by real actual people really and
> > >actually experiencing and reporting them because "technically" or
> > >"theoretically" they "shouldn't be an issue" or because "the "literature
> > >[documentation] says otherwise - but not from this group.
> > 
> > There have been multiple comments that a fix for the problem is
> > forthcoming. Is there some reason you have to keep talking about it?
> 
> Because there have been many more comments that "The rule for 2.4 is
> 'swap == 2*RAM' and that's the way it is" and "disk space is cheap -
> just add more" than there have been "this is going to be fixed" which is
> extremely discouraging and doesn't instill me with all sorts of
> confidence that this problem is being taken seriously.

The hard rule will always be that to cover all pathological cases swap
must be greater than RAM, because in the worst case all RAM will be
in the swap cache.  That this is more than just the worst case in 2.4
is problematic.  I.e., in the worst case:
Virtual Memory = RAM + (swap - RAM).

You can't improve the worst case.  We can improve the cases that
many people are actually facing.

> Or are you saying that if someone is unhappy with a particular
> situation, they should just keep their mouth shut and accept it?

It's worth complaining about.  It is also worth digging in to find
out what the real problem is.  I have a hunch that this whole
conversation about swap sizes being irritating is hiding the real
problem.

Eric


* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:27     ` Eric W. Biederman
@ 2001-06-06 18:47       ` Derek Glidden
  2001-06-06 18:52         ` Eric W. Biederman
  2001-06-06 20:43       ` Daniel Phillips
  2001-06-06 21:57       ` LA Walsh
  2 siblings, 1 reply; 35+ messages in thread
From: Derek Glidden @ 2001-06-06 18:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

"Eric W. Biederman" wrote:
> 
> > Or are you saying that if someone is unhappy with a particular
> > situation, they should just keep their mouth shut and accept it?
> 
> It's worth complaining about.  It is also worth digging into and find
> out what the real problem is.  I have a hunch that this hole
> conversation on swap sizes being irritating is hiding the real
> problem.

I totally agree with this, and want to reiterate that the original
problem I posted has /nothing/ to do with the "swap == 2*RAM" issue.

The problem I reported is not that 2.4 uses huge amounts of swap but
that trying to recover that swap off of disk under 2.4 can leave the
machine in an entirely unresponsive state, while 2.2 handles identical
situations gracefully.  

I'm annoyed by 2.4's "requirement" of too much swap, but I consider that
less a bug and more a severe design flaw.  


* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:47       ` Derek Glidden
@ 2001-06-06 18:52         ` Eric W. Biederman
  2001-06-06 19:06           ` Mike Galbraith
                             ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-06 18:52 UTC (permalink / raw)
  To: Derek Glidden; +Cc: linux-kernel, linux-mm

Derek Glidden <dglidden@illusionary.com> writes:


> The problem I reported is not that 2.4 uses huge amounts of swap but
> that trying to recover that swap off of disk under 2.4 can leave the
> machine in an entirely unresponsive state, while 2.2 handles identical
> situations gracefully.  
> 

The interesting thing from other reports is that it appears to be kswapd
using up CPU resources.  Not the swapout code at all.  So it appears
to be a fundamental VM issue.  And calling swapoff is just a good way
to trigger it. 

It might help if you could confirm this by calling swapoff at some
time other than reboot, say while running top on the console.

Eric





* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:52         ` Eric W. Biederman
@ 2001-06-06 19:06           ` Mike Galbraith
  2001-06-06 19:28             ` Eric W. Biederman
  2001-06-06 19:28           ` Break 2.4 VM in five easy steps Derek Glidden
  2001-06-09  7:55           ` Rik van Riel
  2 siblings, 1 reply; 35+ messages in thread
From: Mike Galbraith @ 2001-06-06 19:06 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Derek Glidden, linux-kernel, linux-mm

On 6 Jun 2001, Eric W. Biederman wrote:

> Derek Glidden <dglidden@illusionary.com> writes:
>
>
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles identical
> > situations gracefully.
> >
>
> The interesting thing from other reports is that it appears to be kswapd
> using up CPU resources.  Not the swapout code at all.  So it appears
> to be a fundamental VM issue.  And calling swapoff is just a good way
> to trigger it.
>
> If you could confirm this by calling swapoff sometime other than at
> reboot time.  That might help.  Say by running top on the console.

The thing goes comatose here too.  SCHED_RR vmstat doesn't run, console
switch is a no-go...

After running his memory hog, swapoff took 18 seconds.  I hacked a
bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
utterly comatose for those 4 seconds though.

	-Mike



* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:52         ` Eric W. Biederman
  2001-06-06 19:06           ` Mike Galbraith
@ 2001-06-06 19:28           ` Derek Glidden
  2001-06-09  7:55           ` Rik van Riel
  2 siblings, 0 replies; 35+ messages in thread
From: Derek Glidden @ 2001-06-06 19:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel, linux-mm

"Eric W. Biederman" wrote:
> 
> Derek Glidden <dglidden@illusionary.com> writes:
> 
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles identical
> > situations gracefully.
> >
> 
> The interesting thing from other reports is that it appears to be kswapd
> using up CPU resources.  Not the swapout code at all.  So it appears
> to be a fundamental VM issue.  And calling swapoff is just a good way
> to trigger it.
> 
> If you could confirm this by calling swapoff sometime other than at
> reboot time.  That might help.  Say by running top on the console.

That's exactly what my original test was doing.  (I think it was Jeffrey
Baker complaining about "swapoff" at reboot.)  See my original post that
started this thread and follow the "five easy steps."  :)  I'm sucking
down a lot of swap - although not all that's available, which is something
I am specifically trying to avoid, since I wanted to stress the VM/swap
recovery procedure, not "out of RAM and swap" memory pressure - and then
running 'swapoff' from an xterm or a console.

The problem with seeing what's eating up CPU resources is that the
whole machine stops responding before I can tell.  Consoles stop
updating, the X display freezes, keyboard input is locked out, etc.  As
far as anyone can tell, for several minutes, the whole machine is locked
up (except, strangely enough, the machine will still respond to ping).
I've tried running 'top' to see what task is taking up all the CPU time,
but the system hangs before it shows anything meaningful.  I have been
able to tell that it hits 100% "system" utilization very quickly, though.

I did notice that the first thing sys_swapoff() does is call
lock_kernel() ... so if sys_swapoff() takes a long time, I imagine
things will get very unresponsive quickly.  (But I'm not intimately
familiar with the various kernel locks, so I don't know what
granularity/atomicity/whatever lock_kernel() enforces.)


* Re: Break 2.4 VM in five easy steps
  2001-06-06 19:06           ` Mike Galbraith
@ 2001-06-06 19:28             ` Eric W. Biederman
  2001-06-07  4:32               ` Mike Galbraith
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-06 19:28 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Derek Glidden, linux-kernel, linux-mm

Mike Galbraith <mikeg@wen-online.de> writes:

> On 6 Jun 2001, Eric W. Biederman wrote:
> 
> > Derek Glidden <dglidden@illusionary.com> writes:
> >
> >
> > > The problem I reported is not that 2.4 uses huge amounts of swap but
> > > that trying to recover that swap off of disk under 2.4 can leave the
> > > machine in an entirely unresponsive state, while 2.2 handles identical
> > > situations gracefully.
> > >
> >
> > The interesting thing from other reports is that it appears to be kswapd
> > using up CPU resources.  Not the swapout code at all.  So it appears
> > to be a fundamental VM issue.  And calling swapoff is just a good way
> > to trigger it.
> >
> > If you could confirm this by calling swapoff sometime other than at
> > reboot time.  That might help.  Say by running top on the console.
> 
> The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> switch is nogo...
> 
> After running his memory hog, swapoff took 18 seconds.  I hacked a
> bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> utterly comatose for those 4 seconds though.

At the top of the while(1) loop in try_to_unuse, what happens if you put in:
	if (need_resched) schedule();
It should be outside all of the locks.  It might just be a matter of everything
serializing on the SMP locks, and the kernel refusing to preempt itself.

Eric



* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:27     ` Eric W. Biederman
  2001-06-06 18:47       ` Derek Glidden
@ 2001-06-06 20:43       ` Daniel Phillips
  2001-06-06 21:57       ` LA Walsh
  2 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2001-06-06 20:43 UTC (permalink / raw)
  To: Eric W. Biederman, Derek Glidden; +Cc: John Alvord, linux-kernel

On Wednesday 06 June 2001 20:27, Eric W. Biederman wrote:
> The hard rule will always be that to cover all pathological cases
> swap must be greater than RAM.  Because in the worse case all RAM
> will be in thes swap cache.

Could you explain in very simple terms how the worst case comes about?

--
Daniel


* Re: Break 2.4 VM in five easy steps
  2001-06-06 15:31 Break 2.4 VM in five easy steps Derek Glidden
  2001-06-06 15:46 ` John Alvord
@ 2001-06-06 21:30 ` Alan Cox
  2001-06-06 21:57   ` Derek Glidden
  1 sibling, 1 reply; 35+ messages in thread
From: Alan Cox @ 2001-06-06 21:30 UTC (permalink / raw)
  To: Derek Glidden; +Cc: Alexander Viro, linux-kernel

> I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
> this problem.  I expect this sort of behaviour from academics - ignoring
> real actual problems being reported by real actual people really and

Actually I find your attitude amazing. If you would like a quote on fixing
specific VM problems I'm sure several people will be happy to tender.

> actually experiencing and reporting them because "technically" or
> "theoretically" they "shouldn't be an issue" or because "the "literature
> [documentation] says otherwise - but not from this group.  

I guess the patch to fix this that I have in my mailbox to merge doesn't exist.
A pity, because if it doesn't exist I can't send it to you.



* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:27     ` Eric W. Biederman
  2001-06-06 18:47       ` Derek Glidden
  2001-06-06 20:43       ` Daniel Phillips
@ 2001-06-06 21:57       ` LA Walsh
  2001-06-07  6:35         ` Eric W. Biederman
  2 siblings, 1 reply; 35+ messages in thread
From: LA Walsh @ 2001-06-06 21:57 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

"Eric W. Biederman" wrote:

> The hard rule will always be that to cover all pathological cases swap
> must be greater than RAM.  Because in the worse case all RAM will be
> in thes swap cache.  That this is more than just the worse case in 2.4
> is problematic.  I.e. In the worst case:
> Virtual Memory = RAM + (swap - RAM).

Hmmm....so my 512M laptop only really has 256M?  Um...I regularly run
more than 256M of programs.  I don't want it to swap -- it's a special, weird
condition if I do start swapping.  I don't want to waste 1G of HD (5%) for
something I never want to use.  IRIX runs just fine with swap<RAM.  In
Irix, your Virtual Memory = RAM + swap.  It seems the Linux kernel requires
more swap than other old OSes (SunOS3: virtual mem = min(mem, swap)).
I *thought* I remembered that restriction being lifted in SunOS4 when they
upgraded the VM.  Even though I worked there for 6 years, that was
6 years ago...

> You can't improve the worst case.  We can improve the worst case that
> many people are facing.

---
    Other OSes don't have this pathological 'worst case' scenario.  Even
my Windows [vm]box seems to operate fine with swap < MEM.  On IRIX,
virtual space closely approximates physical + disk memory.

> It's worth complaining about.  It is also worth digging into and find
> out what the real problem is.  I have a hunch that this hole
> conversation on swap sizes being irritating is hiding the real
> problem.

---
    Okay, admission of ignorance.  When we speak of "swap space",
is this term inclusive of both demand-paging space and
swap-out-entire-programs space, or just one or the other?
-linda

--
The above thoughts and           | They may have nothing to do with
writings are my own.             | the opinions of my employer. :-)
L A Walsh                        | Trust Technology, Core Linux, SGI
law@sgi.com                      | Voice: (650) 933-5338





* Re: Break 2.4 VM in five easy steps
  2001-06-06 21:30 ` Alan Cox
@ 2001-06-06 21:57   ` Derek Glidden
  2001-06-09  8:09     ` Rik van Riel
  0 siblings, 1 reply; 35+ messages in thread
From: Derek Glidden @ 2001-06-06 21:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

Alan Cox wrote:
> 
> > I'm beginning to be amazed at the Linux VM hackers' attitudes regarding
> > this problem.  I expect this sort of behaviour from academics - ignoring
> > real actual problems being reported by real actual people really and
> 
> Actually I find your attitude amazing. If you would like a quote on fixing
> specific VM problems Im sure several people will be happy to tender.

The very first thing I said in my very first message on this topic is
that I've been following LKML, and the VM work, for a couple of weeks
now.  I _know_ there are VM problems.  I _know_ there are people
working on the problems.  Yet, when I post a specific example, with
_clear and simple_ instructions on how to reproduce a problem I'm
experiencing, and an offer to do whatever I can to help fix it,
I am told repeatedly, in effect, "you need more swap, that's your
problem" (which isn't really even related to the issue I reported) by
names I have come to recognize and respect, despite my not being a
kernel hacker.  Why shouldn't I be flabbergasted by that?

> I guess the patch to fix this that I have in my mailbox to merge doesnt exist.
> A pity because if it doesnt exist I cant send it to you

Huh ... I just don't know how to take that, except that it seems to uphold
everything I said, to which you responded so intensely.


* Re: Break 2.4 VM in five easy steps
  2001-06-06 19:28             ` Eric W. Biederman
@ 2001-06-07  4:32               ` Mike Galbraith
  2001-06-07  6:38                 ` Eric W. Biederman
  2001-06-07 17:10                 ` Marcelo Tosatti
  0 siblings, 2 replies; 35+ messages in thread
From: Mike Galbraith @ 2001-06-07  4:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Derek Glidden, linux-kernel, linux-mm

On 6 Jun 2001, Eric W. Biederman wrote:

> Mike Galbraith <mikeg@wen-online.de> writes:
>
> > > If you could confirm this by calling swapoff sometime other than at
> > > reboot time.  That might help.  Say by running top on the console.
> >
> > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > switch is nogo...
> >
> > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > utterly comatose for those 4 seconds though.
>
> At the top of the while(1) loop in try_to_unuse what happens if you put in.
> if (need_resched) schedule();
> It should be outside all of the locks.  It might just be a matter of everything
> serializing on the SMP locks, and the kernel refusing to preempt itself.

That did it.

	-Mike



* Re: Break 2.4 VM in five easy steps
  2001-06-06 21:57       ` LA Walsh
@ 2001-06-07  6:35         ` Eric W. Biederman
  2001-06-07 15:25           ` LA Walsh
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-07  6:35 UTC (permalink / raw)
  To: LA Walsh; +Cc: linux-kernel

LA Walsh <law@sgi.com> writes:

> "Eric W. Biederman" wrote:
> 
> > The hard rule will always be that to cover all pathological cases swap
> > must be greater than RAM.  Because in the worse case all RAM will be
> > in thes swap cache.  That this is more than just the worse case in 2.4
> > is problematic.  I.e. In the worst case:
> > Virtual Memory = RAM + (swap - RAM).
> 
> Hmmm....so my 512M laptop only really has 256M?  Um...I regularlly run
> more than 256M of programs.  I don't want it to swap -- its a special, weird
> condition if I do start swapping.  I don't want to waste 1G of HD (5%) for
> something I never want to use.  IRIX runs just fine with swap<RAM.  In
> Irix, your Virtual Memory = RAM + swap.  Seems like the Linux kernel requires
> more swap than other old OS's (SunOS3 (virtual mem = min(mem,swap)).
> I *thought* I remember that restriction being lifted in SunOS4 when they
> upgraded the VM.  Even though I worked there for 6 years, that was
> 6 years ago...

There are certain scenarios where you can't avoid virtual mem =
min(RAM, swap), which is what I was trying to say (bad formula).  What
happens is that pages get referenced evenly enough and quickly enough
that you simply cannot reuse the on-disk pages.  Basically, in the
worst case all of RAM is pretty much in flight doing I/O.  This is
true of all paging systems.

However, just because in the worst case virtual mem = min(RAM, swap),
that is no reason other cases should use that much swap.  If you are
doing a lot of swapping it is more efficient to plan on mem =
min(RAM, swap) as well, because frequently you can save on I/O
operations by simply reusing the existing swap page.

> 
> > You can't improve the worst case.  We can improve the worst case that
> > many people are facing.
> 
> ---
>     Other OS's don't have this pathological 'worst case' scenario.  Even
> my Windows [vm]box seems to operate fine with swap<MEM.  On IRIX,
> virtual space closely approximates physical + disk memory.

It's a theoretical worst case, and they all have it.  In practice,
however, it is very hard to find a workload where practically every
page in the system is close to the I/O point.

Except for removing pages that aren't used, paging with swap < RAM is
not useful.  Simply removing pages that aren't in active use but might
possibly be used someday is a common case, so it is worth supporting.

> 
> > It's worth complaining about.  It is also worth digging into and find
> > out what the real problem is.  I have a hunch that this hole
> > conversation on swap sizes being irritating is hiding the real
> > problem.
> 
> ---
>     Okay, admission of ignorance.  When we speak of "swap space",
> is this term inclusive of both demand paging space and
> swap-out-entire-programs space or one or another?

Linux has no method to swap out an entire program so when I speak of
swapping I'm actually thinking paging.

Eric


* Re: Break 2.4 VM in five easy steps
  2001-06-07  4:32               ` Mike Galbraith
@ 2001-06-07  6:38                 ` Eric W. Biederman
  2001-06-07  7:28                   ` Mike Galbraith
  2001-06-07 17:10                 ` Marcelo Tosatti
  1 sibling, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-07  6:38 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Derek Glidden, linux-kernel, linux-mm

Mike Galbraith <mikeg@wen-online.de> writes:

> On 6 Jun 2001, Eric W. Biederman wrote:
> 
> > Mike Galbraith <mikeg@wen-online.de> writes:
> >
> > > > If you could confirm this by calling swapoff sometime other than at
> > > > reboot time.  That might help.  Say by running top on the console.
> > >
> > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > switch is nogo...
> > >
> > > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > utterly comatose for those 4 seconds though.
> >
> > At the top of the while(1) loop in try_to_unuse what happens if you put in.
> > if (need_resched) schedule();
> > It should be outside all of the locks.  It might just be a matter of
> everything
> 
> > serializing on the SMP locks, and the kernel refusing to preempt itself.
> 
> That did it.

Does this improve the swapoff speed, or just allow other programs to
run at the same time?  If it is still slow under that kind of load it
would be interesting to know what is taking up all the time.

If it is no longer slow a patch should be made and sent to Linus.

Eric


* Re: Break 2.4 VM in five easy steps
  2001-06-07  6:38                 ` Eric W. Biederman
@ 2001-06-07  7:28                   ` Mike Galbraith
  2001-06-07  7:59                     ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Galbraith @ 2001-06-07  7:28 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Derek Glidden, linux-kernel, linux-mm

On 7 Jun 2001, Eric W. Biederman wrote:

> Mike Galbraith <mikeg@wen-online.de> writes:
>
> > On 6 Jun 2001, Eric W. Biederman wrote:
> >
> > > Mike Galbraith <mikeg@wen-online.de> writes:
> > >
> > > > > If you could confirm this by calling swapoff sometime other than at
> > > > > reboot time.  That might help.  Say by running top on the console.
> > > >
> > > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > > switch is nogo...
> > > >
> > > > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > > utterly comatose for those 4 seconds though.
> > >
> > > At the top of the while(1) loop in try_to_unuse what happens if you put in:
> > > if (need_resched) schedule();
> > > It should be outside all of the locks.  It might just be a matter of
> > > everything serializing on the SMP locks, and the kernel refusing to
> > > preempt itself.
> >
> > That did it.
>
> Does this improve the swapoff speed or just allow other programs to
> run at the same time?  If it is still slow under that kind of load it
> would be interesting to know what is taking up all the time.
>
> If it is no longer slow a patch should be made and sent to Linus.

No, it only cures the freeze.  The other appears to be the slow code
pointed out by Andrew Morton being tickled by dead swap pages.

	-Mike



* Re: Break 2.4 VM in five easy steps
  2001-06-07  7:28                   ` Mike Galbraith
@ 2001-06-07  7:59                     ` Eric W. Biederman
  2001-06-07  8:15                       ` Mike Galbraith
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-07  7:59 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Derek Glidden, linux-kernel, linux-mm

Mike Galbraith <mikeg@wen-online.de> writes:

> On 7 Jun 2001, Eric W. Biederman wrote:
> 
> > Does this improve the swapoff speed or just allow other programs to
> > run at the same time?  If it is still slow under that kind of load it
> > would be interesting to know what is taking up all the time.
> >
> > If it is no longer slow a patch should be made and sent to Linus.
> 
> No, it only cures the freeze.  The other appears to be the slow code
> pointed out by Andrew Morton being tickled by dead swap pages.

O.k.  I think I'm ready to nominate the dead swap pages for the big
2.4.x VM bug award.  So we are burning cpu cycles in sys_swapoff
instead of being IO bound?  Just wanting to understand this the cheap way :)

Eric


* Re: Break 2.4 VM in five easy steps
  2001-06-07  7:59                     ` Eric W. Biederman
@ 2001-06-07  8:15                       ` Mike Galbraith
  0 siblings, 0 replies; 35+ messages in thread
From: Mike Galbraith @ 2001-06-07  8:15 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Derek Glidden, linux-kernel, linux-mm

On 7 Jun 2001, Eric W. Biederman wrote:

> Mike Galbraith <mikeg@wen-online.de> writes:
>
> > On 7 Jun 2001, Eric W. Biederman wrote:
> >
> > > Does this improve the swapoff speed or just allow other programs to
> > > run at the same time?  If it is still slow under that kind of load it
> > > would be interesting to know what is taking up all the time.
> > >
> > > If it is no longer slow a patch should be made and sent to Linus.
> >
> > No, it only cures the freeze.  The other appears to be the slow code
> > pointed out by Andrew Morton being tickled by dead swap pages.
>
> O.k.  I think I'm ready to nominate the dead swap pages for the big
> 2.4.x VM bug award.  So we are burning cpu cycles in sys_swapoff
> instead of being IO bound?  Just wanting to understand this the cheap way :)

There's no IO being done whatsoever (that I can see with only a blinky).
I can fire up ktrace and find out exactly what's going on if that would
be helpful.  Eating the dead swap pages from the active page list prior
to swapoff cures all but a short freeze.  Eating the rest (few of those)
might cure the rest, but I doubt it.

	-Mike



* Re: Break 2.4 VM in five easy steps
  2001-06-07  6:35         ` Eric W. Biederman
@ 2001-06-07 15:25           ` LA Walsh
  2001-06-07 16:42             ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: LA Walsh @ 2001-06-07 15:25 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

"Eric W. Biederman" wrote:

> There are certain scenarios where you can't avoid virtual mem =
> min(RAM,swap). Which is what I was trying to say, (bad formula).  What
> happens is that pages get referenced  evenly enough and quickly enough
> that you simply cannot reuse the on disk pages.  Basically in the
> worst case all of RAM is pretty much in flight doing I/O.  This is
> true of all paging systems.

----
    So, if I understand, you are talking about thrashing behavior
where your active set is larger than physical ram.  If that
is the case then requiring 2X+ swap for "better" performance
is reasonable.  However, if your active set is truly larger
than your physical memory on a consistent basis, in this day,
the solution is usually "add more RAM".  I may be wrong, but
my belief is that with today's computers people are used to having
enough memory to do their normal tasks and that swap is for
"peak loads" that don't occur on a sustained basis.  Of course
I imagine that this is my belief as it is my own practice/view.
I want to have considerably more memory than my normal working
set.  Swap on my laptop disk is *slow*.  It's a low-power, low-RPM,
slow-seek drive, all to conserve power (difference between spinning/off
= 1W).  So I have 50% of my phys mem on swap -- because I want to
'feel' it when I go to swap and start looking for memory hogs.
For me, the pathological case is touching swap *at all*.  So the
idea of the entire active set being >=phys mem is already broken
on my setup.  Thus my expectation of swap only as 'warning'/'buffer'
zone.

    Now for whatever reason, since 2.4, I consistently use at least
a few Mb of swap -- stands at 5Meg now.  Weird -- but I notice things
like nscd running 7 copies that take 72M.  Seems like overkill for
a laptop.

> However just because in the worst case virtual mem = min(RAM,swap), is
> no reason other cases should use that much swap.  If you are doing a
> lot of swapping it is more efficient to plan on mem = min(RAM,swap) as
> well, because frequently you can save on I/O operations by simply
> reusing the existing swap page.

---
    Agreed.  But planning your swap space for a worst
case scenario that you never hit is wasteful.  My worst
case is using any swap.  The system should be able to live
with swap=1/2*phys in my situation.  I don't think I'm
unique in this respect.

> It's a theoretical worst case and they all have it.  In practice it is
> very hard to find a work load where practically every page in the
> system is close to the I/O point, however.

---
    Well exactly the point.  It was in such situations in some older
systems that some programs were swapped out and temporarily made
unavailable for running (they showed up in the 'w' space in vmstat).

> Except for removing pages that aren't used, paging with swap < RAM is
> not useful.  Simply removing pages that aren't in active use but might
> possibly be used someday is a common case, so it is worth supporting.

---
    I think that is the point -- it was supported in 2.2, it is, IMO,
a serious regression that it is not supported in 2.4.

-linda

--
The above thoughts and       | They may have nothing to do with
writings are my own.         | the opinions of my employer. :-)
L A Walsh                    | Senior MTS, Trust Tech., Core Linux, SGI
law@sgi.com                  | Voice: (650) 933-5338





* Re: Break 2.4 VM in five easy steps
  2001-06-07 15:25           ` LA Walsh
@ 2001-06-07 16:42             ` Eric W. Biederman
  2001-06-07 20:47               ` LA Walsh
  0 siblings, 1 reply; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-07 16:42 UTC (permalink / raw)
  To: LA Walsh; +Cc: linux-kernel

LA Walsh <law@sgi.com> writes:

>     Now for whatever reason, since 2.4, I consistently use at least
> a few Mb of swap -- stands at 5Meg now.  Weird -- but I notice things
> like nscd running 7 copies that take 72M.  Seems like overkill for
> a laptop.

So the question becomes why you are seeing an increased swap usage.
Currently there are two candidates in the 2.4.x code path.

1) Delayed swap deallocation: when a program exits after it
   has gone into swap, its swap usage is not freed. Ouch.

2) Increased tenacity of swap caching.  In particular, in 2.2.x, if a page
   that was in the swap cache was written to, the page in the swap
   space would be removed.  In 2.4.x the location in swap space is
   retained with the goal of getting more efficient swap-ins.

Neither of the known candidates for increasing the swap load applies
when you aren't swapping in the first place.  They may aggravate the
usage of swap when you are already swapping, but they do not cause
swapping themselves.  This is why the initial recommendation for
increased swap space size was made.  If you are swapping we will use
more swap.

However what pushes your laptop over the edge into swapping is an
entirely different question.  And probably what should be solved.

>     I think that is the point -- it was supported in 2.2, it is, IMO,
> a serious regression that it is not supported in 2.4.

The problem with this general line of arguing is that it lumps a whole
bunch of real issues/regressions into one over all perception.  Since
there are multiple reasons people are seeing problems, they need to be
tracked down with specifics.

The swapoff case comes down to dead swap pages in the swap cache.
They greatly increase the number of swap pages and slow the system
down, but since these pages are trivial to free we don't generate any
I/O, never wait for I/O, and thus never enter the scheduler, leaving
nothing else in the system runnable.

Your case is significantly different.  I don't know if you are seeing 
any issues with swapping at all.  With a 5M usage it may simply be
totally unused pages being pushed out to the swap space.

Eric


* Re: Break 2.4 VM in five easy steps
  2001-06-07  4:32               ` Mike Galbraith
  2001-06-07  6:38                 ` Eric W. Biederman
@ 2001-06-07 17:10                 ` Marcelo Tosatti
  2001-06-07 17:43                   ` Please test: workaround to help swapoff behaviour Marcelo Tosatti
  1 sibling, 1 reply; 35+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 17:10 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Eric W. Biederman, Derek Glidden, linux-kernel, linux-mm



On Thu, 7 Jun 2001, Mike Galbraith wrote:

> On 6 Jun 2001, Eric W. Biederman wrote:
> 
> > Mike Galbraith <mikeg@wen-online.de> writes:
> >
> > > > If you could confirm this by calling swapoff sometime other than at
> > > > reboot time.  That might help.  Say by running top on the console.
> > >
> > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > switch is nogo...
> > >
> > > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > utterly comatose for those 4 seconds though.
> >
> > At the top of the while(1) loop in try_to_unuse what happens if you put in:
> > if (need_resched) schedule();
> > It should be outside all of the locks.  It might just be a matter of everything
> > serializing on the SMP locks, and the kernel refusing to preempt itself.
> 
> That did it.

What about including this workaround in the kernel ? 



* Please test: workaround to help swapoff behaviour
  2001-06-07 17:10                 ` Marcelo Tosatti
@ 2001-06-07 17:43                   ` Marcelo Tosatti
  0 siblings, 0 replies; 35+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 17:43 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Marcelo Tosatti wrote:

> 
> On Thu, 7 Jun 2001, Mike Galbraith wrote:
> 
> > On 6 Jun 2001, Eric W. Biederman wrote:
> > 
> > > Mike Galbraith <mikeg@wen-online.de> writes:
> > >
> > > > > If you could confirm this by calling swapoff sometime other than at
> > > > > reboot time.  That might help.  Say by running top on the console.
> > > >
> > > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > > switch is nogo...
> > > >
> > > > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > > utterly comatose for those 4 seconds though.
> > >
> > > At the top of the while(1) loop in try_to_unuse what happens if you put in:
> > > if (need_resched) schedule();
> > > It should be outside all of the locks.  It might just be a matter of everything
> > > serializing on the SMP locks, and the kernel refusing to preempt itself.
> > 
> > That did it.
> 
> What about including this workaround in the kernel ? 

Well, 

This is for the people who have been experiencing the lockups while running
swapoff.

Please test. (against 2.4.6-pre1)

Thanks for the suggestion, Eric. 


--- linux.orig/mm/swapfile.c	Wed Jun  6 18:16:45 2001
+++ linux/mm/swapfile.c	Thu Jun  7 16:06:11 2001
@@ -345,6 +345,8 @@
 		/*
 		 * Find a swap page in use and read it in.
 		 */
+		if (current->need_resched)
+			schedule();
 		swap_device_lock(si);
 		for (i = 1; i < si->max ; i++) {
 			if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {



* Re: Break 2.4 VM in five easy steps
  2001-06-07 16:42             ` Eric W. Biederman
@ 2001-06-07 20:47               ` LA Walsh
  2001-06-08 19:38                 ` Pavel Machek
  0 siblings, 1 reply; 35+ messages in thread
From: LA Walsh @ 2001-06-07 20:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-kernel

"Eric W. Biederman" wrote:

> LA Walsh <law@sgi.com> writes:
>
> >     Now for whatever reason, since 2.4, I consistently use at least
> > a few Mb of swap -- stands at 5Meg now.  Weird -- but I notice things
> > like nscd running 7 copies that take 72M.  Seems like overkill for
> > a laptop.
>
> So the question becomes why you are seeing an increased swap usage.
> Currently there are two candidates in the 2.4.x code path.
>
> 1) Delayed swap deallocation: when a program exits after it
>    has gone into swap, its swap usage is not freed. Ouch.

---
    Double ouch.  Swap is backing a non-existent program?

>
>
> 2) Increased tenacity of swap caching.  In particular, in 2.2.x, if a page
>    that was in the swap cache was written to, the page in the swap
>    space would be removed.  In 2.4.x the location in swap space is
>    retained with the goal of getting more efficient swap-ins.

----
    But if the page in memory is 'dirty', you can't be efficient with swapping
*in* the page.  The page on disk is invalid and should be released, or am I
missing something?

> Neither of the known candidates for increasing the swap load applies
> when you aren't swapping in the first place.  They may aggravate the
> usage of swap when you are already swapping, but they do not cause
> swapping themselves.  This is why the initial recommendation for
> increased swap space size was made.  If you are swapping we will use
> more swap.
>
> However what pushes your laptop over the edge into swapping is an
> entirely different question.  And probably what should be solved.

----
    On my laptop, it is insignificant and to my knowledge has no measurable
impact.  It seems like there is always 3-5 Meg used in swap no matter what's
running (or not) on the system.

> >     I think that is the point -- it was supported in 2.2, it is, IMO,
> > a serious regression that it is not supported in 2.4.
>
> The problem with this general line of arguing is that it lumps a whole
> bunch of real issues/regressions into one over all perception.  Since
> there are multiple reasons people are seeing problems, they need to be
> tracked down with specifics.

---
    Uhhh, yeah, sorta -- it's addressing the statement that a "new requirement of
2.4 is to have double the swap space".  If everyone agrees that's a problem, then
yes, we can go into specifics of what is causing or contributing to the problem.
It's getting past the attitude of some people that 2xMem for swap is somehow
"normal and acceptable -- deal with it".  In my case, it seems like 10Mb of swap
would be all that would generally be used (I don't think I've ever seen swap usage
over 7Mb) on a 512M system.  To be told "oh, you're wrong, you *should* have 1Gig
or you are operating in an 'unsupported' or non-standard configuration" -- I find
that very user-unfriendly.


>
> The swapoff case comes down to dead swap pages in the swap cache.
> They greatly increase the number of swap pages and slow the system
> down, but since these pages are trivial to free we don't generate any
> I/O, never wait for I/O, and thus never enter the scheduler, leaving
> nothing else in the system runnable.

---
    I haven't ever *noticed* this on my machine but that could be
because there isn't much in swap to begin with?  Could be I was
just blissfully ignorant of the time it took to do a swapoff.
Hmmm....let's see...  Just tried it.  I didn't get a total lock up,
but cursor movement was definitely jerky:
> time sudo swapoff -a

real    0m10.577s
user    0m0.000s
sys     0m9.430s

Looking at vmstat, the needed space was taken mostly out of the
page cache (86M->81.8M) and about 700K each out of free and buff.


> Your case is significantly different.  I don't know if you are seeing
> any issues with swapping at all.  With a 5M usage it may simply be
> totally unused pages being pushed out to the swap space.

---
    Probably -- I guess the page cache and disk buffers put enough pressure to
push some things off to swap.

-linda
--
The above thoughts and       | They may have nothing to do with
writings are my own.         | the opinions of my employer. :-)
L A Walsh                    | Senior MTS, Trust Tech, Core Linux, SGI
law@sgi.com                  | Voice: (650) 933-5338




* Re: Break 2.4 VM in five easy steps
  2001-06-07 20:47               ` LA Walsh
@ 2001-06-08 19:38                 ` Pavel Machek
  0 siblings, 0 replies; 35+ messages in thread
From: Pavel Machek @ 2001-06-08 19:38 UTC (permalink / raw)
  To: LA Walsh; +Cc: Eric W. Biederman, linux-kernel

Hi!

>     But if the page in memory is 'dirty', you can't be efficient with swapping
> *in* the page.  The page on disk is invalid and should be released, or am I
> missing something?

Yes. You are missing fragmentation. This keeps it low.
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.



* Re: Break 2.4 VM in five easy steps
  2001-06-06 15:58   ` Derek Glidden
  2001-06-06 18:27     ` Eric W. Biederman
@ 2001-06-09  7:34     ` Rik van Riel
  1 sibling, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2001-06-09  7:34 UTC (permalink / raw)
  To: Derek Glidden; +Cc: John Alvord, linux-kernel

On Wed, 6 Jun 2001, Derek Glidden wrote:

> Or are you saying that if someone is unhappy with a particular
> situation, they should just keep their mouth shut and accept it?

There are lots of options ...

1) wait until somebody fixes the problem
2) fix the problem yourself
3) start infinite flamewars and make developers
   so sick of the problem nobody wants to fix it
4) pay someone to fix the problem ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Break 2.4 VM in five easy steps
  2001-06-06 18:52         ` Eric W. Biederman
  2001-06-06 19:06           ` Mike Galbraith
  2001-06-06 19:28           ` Break 2.4 VM in five easy steps Derek Glidden
@ 2001-06-09  7:55           ` Rik van Riel
  2 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2001-06-09  7:55 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Derek Glidden, linux-kernel, linux-mm

On 6 Jun 2001, Eric W. Biederman wrote:
> Derek Glidden <dglidden@illusionary.com> writes:
> 
> > The problem I reported is not that 2.4 uses huge amounts of swap but
> > that trying to recover that swap off of disk under 2.4 can leave the
> > machine in an entirely unresponsive state, while 2.2 handles identical
> > situations gracefully.  
> 
> The interesting thing from other reports is that it appears to be
> kswapd using up CPU resources.

This part is being worked on, expect a solution for this thing
soon...


Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Break 2.4 VM in five easy steps
  2001-06-06 21:57   ` Derek Glidden
@ 2001-06-09  8:09     ` Rik van Riel
  0 siblings, 0 replies; 35+ messages in thread
From: Rik van Riel @ 2001-06-09  8:09 UTC (permalink / raw)
  To: Derek Glidden; +Cc: Alan Cox, linux-kernel

On Wed, 6 Jun 2001, Derek Glidden wrote:

> working on the problems.  Yet, when I post a specific example, with
> _clear and simple_ instructions on how to reproduce a problem I'm
> experiencing and an offer to do whatever I can to help fix the problem,
> I am told repeatedly, in effect "you need more swap, that's your
> problem" (which isn't really even related to the issue I reported) by
> names I have come to recognize and respect despite my status as not a
> kernel hacker. Why shouldn't I be flabbergasted by that?

It gets even more fun when you realise that the people who
told you this aren't working on the VM and in fact never
seem to have contributed any VM code ;)

cheers,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-10 13:56 Bulent Abali
  0 siblings, 0 replies; 35+ messages in thread
From: Bulent Abali @ 2001-06-10 13:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Marcelo Tosatti, Mike Galbraith, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie



>The fix is to kill the dead/orphaned swap pages before we get to
>swapoff.  At shutdown time there is practically nothing active in
> ...
>Once the dead swap pages problem is fixed it is time to optimize
>swapoff.

I think fixing the orphaned swap pages problem will eliminate the
problem altogether.  Probably there is no need to optimize
swapoff.

Because as the system is shutting down all the processes will be
killed and their pages in swap will be orphaned. If those pages
were to be reaped in a timely manner there wouldn't be any work
left for swapoff.

Bulent




* Re: Please test: workaround to help swapoff behaviour
  2001-06-09 20:32 Bulent Abali
@ 2001-06-10  2:12 ` Eric W. Biederman
  0 siblings, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2001-06-10  2:12 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Marcelo Tosatti, Mike Galbraith, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie

"Bulent Abali" <abali@us.ibm.com> writes:

> >Bulent,
> >
> >Could you please check if 2.4.6-pre2+the schedule patch has better
> >swapoff behaviour for you?
> 
> Marcelo,
> 
> It works as expected.  Doesn't lock up the box; however, swapoff keeps
> burning CPU cycles.  It took 4 1/2 minutes to swapoff about 256MB of swap
> content.  Shutdown took just as long.  I was hoping that shutdown would
> kill the swapoff process but it doesn't.  It just hangs there.  Shutdown
> is the common case.  Therefore, swapoff needs to be optimized for
> shutdowns.  You can imagine users' frustration waiting for a shutdown
> when there are gigabytes in the swap.
> 
> So to summarize, schedule patch is better than nothing but falls far short.
> I would put it in 2.4.6.  Read on.

The fix is to kill the dead/orphaned swap pages before we get to
swapoff.  At shutdown time there is practically nothing active in
swap, so this should speed things up tremendously.  The dead swap
pages need to be killed as soon as possible to keep us from wasting
RAM and swap, and totally aggravating whatever swapping situation is
present.

Once the dead swap pages problem is fixed it is time to optimize
swapoff.  

> ----------
> 
> The problem is with the try_to_unuse() algorithm, which is very inefficient.
> I searched the linux-mm archives and Tweedie was on to this. This is what
> he wrote:  "it is much cheaper to find a swap entry for a given page than
> to find the swap cache page for a given swap entry." And he posted a
> patch http://mail.nl.linux.org/linux-mm/2001-03/msg00224.html
> His patch is in the Redhat 7.1 kernel 2.4.2-2 and not in 2.4.5.

> 
> But in any case I believe the patch will not work as expected.
> It seems to me that he is calling the function check_orphaned_swap(page)
> in the wrong place.  He is calling the function while scanning the
> active_list in refill_inactive_scan().  The problem with that is if you
> wait 60 seconds or longer the orphaned swap pages will move from the
> active to the inactive list.  Therefore the function will miss the
> orphans in the inactive list.  Any comments?

The analysis sounds about right.  

We should be killing most of these pages in free_pte.  Or at the very
least putting them on their own list that we can scan them
effectively.  Someone was creating a patch to that effect earlier.

try_to_unuse is inefficient with respect to CPU usage but it is
efficient with respect to swap usage.  If you are doing this on a
running machine where you are removing a swap you don't want an
algorithm that increases your need for swap.  All of the trivial
transformations of try_to_unuse have the property of breaking the
sharing of swap pages.  


Eric



* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-09 20:32 Bulent Abali
  2001-06-10  2:12 ` Eric W. Biederman
  0 siblings, 1 reply; 35+ messages in thread
From: Bulent Abali @ 2001-06-09 20:32 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie




>Bulent,
>
>Could you please check if 2.4.6-pre2+the schedule patch has better
>swapoff behaviour for you?

Marcelo,

It works as expected.  Doesn't lock up the box; however, swapoff keeps
burning CPU cycles.  It took 4 1/2 minutes to swapoff about 256MB of swap
content.  Shutdown took just as long.  I was hoping that shutdown would
kill the swapoff process but it doesn't.  It just hangs there.  Shutdown
is the common case.  Therefore, swapoff needs to be optimized for
shutdowns.  You can imagine users' frustration waiting for a shutdown
when there are gigabytes in the swap.

So to summarize, schedule patch is better than nothing but falls far short.
I would put it in 2.4.6.  Read on.

----------

The problem is with the try_to_unuse() algorithm, which is very inefficient.
I searched the linux-mm archives and Tweedie was on to this. This is what
he wrote:  "it is much cheaper to find a swap entry for a given page than
to find the swap cache page for a given swap entry." And he posted a
patch http://mail.nl.linux.org/linux-mm/2001-03/msg00224.html
His patch is in the Redhat 7.1 kernel 2.4.2-2 and not in 2.4.5.

But in any case I believe the patch will not work as expected.
It seems to me that he is calling the function check_orphaned_swap(page)
in the wrong place.  He is calling the function while scanning the
active_list in refill_inactive_scan().  The problem with that is if you
wait 60 seconds or longer the orphaned swap pages will move from the
active to the inactive list.  Therefore the function will miss the
orphans in the inactive list.  Any comments?





* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-08 23:53 Bulent Abali
  0 siblings, 0 replies; 35+ messages in thread
From: Bulent Abali @ 2001-06-08 23:53 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm


>> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
>> broken.
>> For each and every swap entry it is walking the entire process list
>> (for_each_task(p)).  It is also grabbing a whole bunch of locks
>> for each swap entry.  It might be worthwhile processing swap entries in
>> batches instead of one entry at a time.
>>
>> In any case, I think having this patch is worthwhile as a quick and
>> dirty remedy.
>
>Bulent,
>
>Could you please check if 2.4.6-pre2+the schedule patch has better
>swapoff behaviour for you?

No problem.  I will check it tomorrow.  I don't think it can be any worse
than it is now.  The patch looks correct in principle.
I believe it should go into 2.4.6.  But I will test it.

On small machines people don't notice it, but if you have a few
GB of memory it really hurts.  Shutdowns take forever since swapoff takes
forever.






* Re: Please test: workaround to help swapoff behaviour
  2001-06-07 20:33 Please test: workaround to help swapoff behaviour Bulent Abali
  2001-06-07 19:40 ` Marcelo Tosatti
@ 2001-06-08 21:11 ` Marcelo Tosatti
  1 sibling, 0 replies; 35+ messages in thread
From: Marcelo Tosatti @ 2001-06-08 21:11 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Bulent Abali wrote:

> 
> 
> 
> 
> >This is for the people who have been experiencing the lockups while running
> >swapoff.
> >
> >Please test. (against 2.4.6-pre1)
> >
> >
> >--- linux.orig/mm/swapfile.c Wed Jun  6 18:16:45 2001
> >+++ linux/mm/swapfile.c Thu Jun  7 16:06:11 2001
> >@@ -345,6 +345,8 @@
> >         /*
> >          * Find a swap page in use and read it in.
> >          */
> >+        if (current->need_resched)
> >+             schedule();
> >         swap_device_lock(si);
> >         for (i = 1; i < si->max ; i++) {
> >              if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {
> 
> 
> I tested your patch against 2.4.5.  It works.  No more lockups.  Without
> the patch it took 14 minutes 51 seconds to complete swapoff (this is to
> recover 1.5GB of swap space).  During this time the system was frozen.
> No keyboard, no screen, etc.  Practically locked up.
> 
> With the patch there are no more lockups.  Swapoff kept running in the
> background.  This is a winner.
> 
> But here is the caveat: swapoff keeps burning 100% of the CPU cycles
> until it completes.  This is not going to be a big deal during shutdowns;
> it is only going to be a problem when you enter swapoff from the command
> line.
> 
> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
> broken.  For each and every swap entry it is walking the entire process
> list (for_each_task(p)).  It is also grabbing a whole bunch of locks for
> each swap entry.  It might be worthwhile processing swap entries in
> batches instead of one entry at a time.
> 
> In any case, I think having this patch is worthwhile as a quick and dirty
> remedy.

Bulent, 

Could you please check if 2.4.6-pre2+the schedule patch has better
swapoff behaviour for you? 

Thanks 



* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-07 20:33 Bulent Abali
  2001-06-07 19:40 ` Marcelo Tosatti
  2001-06-08 21:11 ` Marcelo Tosatti
  0 siblings, 2 replies; 35+ messages in thread
From: Bulent Abali @ 2001-06-07 20:33 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm





>This is for the people who have been experiencing lockups while running
>swapoff.
>
>Please test. (against 2.4.6-pre1)
>
>
>--- linux.orig/mm/swapfile.c Wed Jun  6 18:16:45 2001
>+++ linux/mm/swapfile.c Thu Jun  7 16:06:11 2001
>@@ -345,6 +345,8 @@
>         /*
>          * Find a swap page in use and read it in.
>          */
>+        if (current->need_resched)
>+             schedule();
>         swap_device_lock(si);
>         for (i = 1; i < si->max ; i++) {
>              if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {


I tested your patch against 2.4.5.  It works.  No more lockups.  Without
the patch it took 14 minutes 51 seconds to complete swapoff (this was to
recover 1.5GB of swap space).  During this time the system was frozen:
no keyboard, no screen, etc.  Practically locked up.

With the patch there are no more lockups.  Swapoff kept running in the
background.  This is a winner.

But here is the caveat: swapoff keeps burning 100% of the CPU cycles
until it completes.  That is not a big deal during shutdowns; it is only
a problem when swapoff is run from the command line.

I looked at try_to_unuse() in swapfile.c.  I believe the algorithm is
broken.  For each and every swap entry it walks the entire process list
(for_each_task(p)), and it also grabs a whole bunch of locks per entry.
It might be worthwhile to process swap entries in batches instead of one
entry at a time.

In any case, I think having this patch is worthwhile as a quick and dirty
remedy.

Bulent Abali





* Re: Please test: workaround to help swapoff behaviour
  2001-06-07 20:33 Please test: workaround to help swapoff behaviour Bulent Abali
@ 2001-06-07 19:40 ` Marcelo Tosatti
  2001-06-08 21:11 ` Marcelo Tosatti
  1 sibling, 0 replies; 35+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 19:40 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Bulent Abali wrote:

> 
> I tested your patch against 2.4.5.  It works.  No more lockups.  Without
> the patch it took 14 minutes 51 seconds to complete swapoff (this was to
> recover 1.5GB of swap space).  During this time the system was frozen:
> no keyboard, no screen, etc.  Practically locked up.
> 
> With the patch there are no more lockups.  Swapoff kept running in the
> background.  This is a winner.
>
> But here is the caveat: swapoff keeps burning 100% of the CPU cycles
> until it completes.

Yup. Wait a while until the dead swap cache issue is sorted out. 

When that finally happens, the time spent in swapoff will probably be
"acceptable".

> That is not a big deal during shutdowns; it is only a problem when
> swapoff is run from the command line.
> 
> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
> broken.

Yes. 

> For each and every swap entry it is walking the entire process list
> (for_each_task(p)).  It is also grabbing a whole bunch of locks
> for each swap entry.  It might be worthwhile processing swap entries in
> batches instead of one entry at a time.

The real fix is to invert the processing --- walk the pte's and do the
swapins from there. 

I don't have the time to do everything, though. :) 



end of thread, other threads:[~2001-06-11  9:19 UTC | newest]

Thread overview: 35+ messages
2001-06-06 15:31 Break 2.4 VM in five easy steps Derek Glidden
2001-06-06 15:46 ` John Alvord
2001-06-06 15:58   ` Derek Glidden
2001-06-06 18:27     ` Eric W. Biederman
2001-06-06 18:47       ` Derek Glidden
2001-06-06 18:52         ` Eric W. Biederman
2001-06-06 19:06           ` Mike Galbraith
2001-06-06 19:28             ` Eric W. Biederman
2001-06-07  4:32               ` Mike Galbraith
2001-06-07  6:38                 ` Eric W. Biederman
2001-06-07  7:28                   ` Mike Galbraith
2001-06-07  7:59                     ` Eric W. Biederman
2001-06-07  8:15                       ` Mike Galbraith
2001-06-07 17:10                 ` Marcelo Tosatti
2001-06-07 17:43                   ` Please test: workaround to help swapoff behaviour Marcelo Tosatti
2001-06-06 19:28           ` Break 2.4 VM in five easy steps Derek Glidden
2001-06-09  7:55           ` Rik van Riel
2001-06-06 20:43       ` Daniel Phillips
2001-06-06 21:57       ` LA Walsh
2001-06-07  6:35         ` Eric W. Biederman
2001-06-07 15:25           ` LA Walsh
2001-06-07 16:42             ` Eric W. Biederman
2001-06-07 20:47               ` LA Walsh
2001-06-08 19:38                 ` Pavel Machek
2001-06-09  7:34     ` Rik van Riel
2001-06-06 21:30 ` Alan Cox
2001-06-06 21:57   ` Derek Glidden
2001-06-09  8:09     ` Rik van Riel
2001-06-07 20:33 Please test: workaround to help swapoff behaviour Bulent Abali
2001-06-07 19:40 ` Marcelo Tosatti
2001-06-08 21:11 ` Marcelo Tosatti
2001-06-08 23:53 Bulent Abali
2001-06-09 20:32 Bulent Abali
2001-06-10  2:12 ` Eric W. Biederman
2001-06-10 13:56 Bulent Abali
