linux-kernel.vger.kernel.org archive mirror
* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-09 20:32 Bulent Abali
  2001-06-10  2:12 ` Eric W. Biederman
  0 siblings, 1 reply; 8+ messages in thread
From: Bulent Abali @ 2001-06-09 20:32 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie




>Bulent,
>
>Could you please check if 2.4.6-pre2+the schedule patch has better
>swapoff behaviour for you?

Marcelo,

It works as expected.  It doesn't lock up the box; however, swapoff keeps
burning CPU cycles.  It took 4 1/2 minutes to swapoff about 256MB of swap
content.  Shutdown took just as long.  I was hoping that shutdown would
kill the swapoff process, but it doesn't.  It just hangs there.  Shutdown
is the common case.  Therefore, swapoff needs to be optimized for
shutdowns.  You can imagine users' frustration waiting for a shutdown when
there are gigabytes in swap.

So to summarize, the schedule patch is better than nothing but falls far
short.  I would put it in 2.4.6.  Read on.

----------

The problem is the try_to_unuse() algorithm, which is very inefficient.
I searched the linux-mm archives and Tweedie was on to this.  This is what
he wrote:  "it is much cheaper to find a swap entry for a given page than
to find the swap cache page for a given swap entry."  And he posted a
patch: http://mail.nl.linux.org/linux-mm/2001-03/msg00224.html
His patch is in the Red Hat 7.1 kernel 2.4.2-2 but not in 2.4.5.

But in any case I believe the patch will not work as expected.
It seems to me that he is calling the function check_orphaned_swap(page)
in the wrong place.  He is calling the function while scanning the
active_list in refill_inactive_scan().  The problem with that is that if
you wait 60 seconds or longer, the orphaned swap pages will move from the
active to the inactive lists.  Therefore the function will miss the
orphans on the inactive lists.  Any comments?





* Re: Please test: workaround to help swapoff behaviour
  2001-06-09 20:32 Please test: workaround to help swapoff behaviour Bulent Abali
@ 2001-06-10  2:12 ` Eric W. Biederman
  0 siblings, 0 replies; 8+ messages in thread
From: Eric W. Biederman @ 2001-06-10  2:12 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Marcelo Tosatti, Mike Galbraith, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie

"Bulent Abali" <abali@us.ibm.com> writes:

> >Bulent,
> >
> >Could you please check if 2.4.6-pre2+the schedule patch has better
> >swapoff behaviour for you?
> 
> Marcelo,
> 
> It works as expected.  It doesn't lock up the box; however, swapoff keeps
> burning CPU cycles.  It took 4 1/2 minutes to swapoff about 256MB of swap
> content.  Shutdown took just as long.  I was hoping that shutdown would
> kill the swapoff process, but it doesn't.  It just hangs there.  Shutdown
> is the common case.  Therefore, swapoff needs to be optimized for
> shutdowns.  You can imagine users' frustration waiting for a shutdown when
> there are gigabytes in swap.
> 
> So to summarize, the schedule patch is better than nothing but falls far
> short.  I would put it in 2.4.6.  Read on.

The fix is to kill the dead/orphaned swap pages before we get to
swapoff.  At shutdown time there is practically nothing active in
swap, so this should speed things up tremendously.  The dead swap
pages need to be killed as soon as possible to keep us from wasting
RAM and swap, and totally aggravating whatever swapping situation is
present.

Once the dead swap pages problem is fixed it is time to optimize
swapoff.  

> ----------
> 
> The problem is the try_to_unuse() algorithm, which is very inefficient.
> I searched the linux-mm archives and Tweedie was on to this.  This is what
> he wrote:  "it is much cheaper to find a swap entry for a given page than
> to find the swap cache page for a given swap entry."  And he posted a
> patch: http://mail.nl.linux.org/linux-mm/2001-03/msg00224.html
> His patch is in the Red Hat 7.1 kernel 2.4.2-2 but not in 2.4.5.
> 
> But in any case I believe the patch will not work as expected.
> It seems to me that he is calling the function check_orphaned_swap(page)
> in the wrong place.  He is calling the function while scanning the
> active_list in refill_inactive_scan().  The problem with that is that if
> you wait 60 seconds or longer, the orphaned swap pages will move from the
> active to the inactive lists.  Therefore the function will miss the
> orphans on the inactive lists.  Any comments?

The analysis sounds about right.  

We should be killing most of these pages in free_pte, or at the very
least putting them on their own list so that we can scan them
effectively.  Someone was creating a patch to that effect earlier.

try_to_unuse is inefficient with respect to CPU usage but it is
efficient with respect to swap usage.  If you are doing this on a
running machine where you are removing a swap area, you don't want an
algorithm that increases your need for swap.  All of the trivial
transformations of try_to_unuse have the property of breaking the
sharing of swap pages.


Eric



* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-10 13:56 Bulent Abali
  0 siblings, 0 replies; 8+ messages in thread
From: Bulent Abali @ 2001-06-10 13:56 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Marcelo Tosatti, Mike Galbraith, Derek Glidden, lkml, linux-mm,
	Stephen Tweedie



>The fix is to kill the dead/orphaned swap pages before we get to
>swapoff.  At shutdown time there is practically nothing active in
> ...
>Once the dead swap pages problem is fixed it is time to optimize
>swapoff.

I think fixing the orphaned swap pages problem will eliminate the
problem altogether.  Probably there is no need to optimize
swapoff.

As the system shuts down, all the processes will be killed and
their pages in swap will be orphaned.  If those pages were reaped
in a timely manner, there wouldn't be any work left for swapoff.

Bulent




* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-08 23:53 Bulent Abali
  0 siblings, 0 replies; 8+ messages in thread
From: Bulent Abali @ 2001-06-08 23:53 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm


>> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
>> broken.  For each and every swap entry it walks the entire process list
>> (for_each_task(p)).  It also grabs a whole bunch of locks for each swap
>> entry.  It might be worthwhile to process swap entries in batches instead
>> of one entry at a time.
>>
>> In any case, I think having this patch is worthwhile as a quick and dirty
>> remedy.
>
>Bulent,
>
>Could you please check if 2.4.6-pre2+the schedule patch has better
>swapoff behaviour for you?

No problem.  I will check it tomorrow.  I don't think it can be any worse
than it is now.  The patch looks correct in principle.
I believe it should go into 2.4.6.  But I will test it.

On small machines people don't notice it, but if you have a few GB of
memory it really hurts.  Shutdowns take forever since swapoff takes
forever.






* Re: Please test: workaround to help swapoff behaviour
  2001-06-07 20:33 Bulent Abali
  2001-06-07 19:40 ` Marcelo Tosatti
@ 2001-06-08 21:11 ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-06-08 21:11 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Bulent Abali wrote:

> 
> 
> 
> 
> >This is for the people who have been experiencing lockups while running
> >swapoff.
> >
> >Please test. (against 2.4.6-pre1)
> >
> >
> >--- linux.orig/mm/swapfile.c Wed Jun  6 18:16:45 2001
> >+++ linux/mm/swapfile.c Thu Jun  7 16:06:11 2001
> >@@ -345,6 +345,8 @@
> >         /*
> >          * Find a swap page in use and read it in.
> >          */
> >+        if (current->need_resched)
> >+             schedule();
> >         swap_device_lock(si);
> >         for (i = 1; i < si->max ; i++) {
> >              if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {
> 
> 
> I tested your patch against 2.4.5.  It works.  No more lockups.  Without
> the patch it took 14 minutes 51 seconds to complete swapoff (this is to
> recover 1.5GB of swap space).  During this time the system was frozen.  No
> keyboard, no screen, etc.  Practically locked up.
> 
> With the patch there are no more lockups.  Swapoff kept running in the
> background.  This is a winner.
> 
> But here is the caveat: swapoff keeps burning 100% of the cycles until it
> completes.  This is not going to be a big deal during shutdowns.  Only when
> you enter swapoff from the command line is it going to be a problem.
> 
> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
> broken.  For each and every swap entry it walks the entire process list
> (for_each_task(p)).  It also grabs a whole bunch of locks for each swap
> entry.  It might be worthwhile to process swap entries in batches instead
> of one entry at a time.
> 
> In any case, I think having this patch is worthwhile as a quick and dirty
> remedy.

Bulent, 

Could you please check if 2.4.6-pre2+the schedule patch has better
swapoff behaviour for you? 

Thanks 



* Re: Please test: workaround to help swapoff behaviour
@ 2001-06-07 20:33 Bulent Abali
  2001-06-07 19:40 ` Marcelo Tosatti
  2001-06-08 21:11 ` Marcelo Tosatti
  0 siblings, 2 replies; 8+ messages in thread
From: Bulent Abali @ 2001-06-07 20:33 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm





>This is for the people who have been experiencing lockups while running
>swapoff.
>
>Please test. (against 2.4.6-pre1)
>
>
>--- linux.orig/mm/swapfile.c Wed Jun  6 18:16:45 2001
>+++ linux/mm/swapfile.c Thu Jun  7 16:06:11 2001
>@@ -345,6 +345,8 @@
>         /*
>          * Find a swap page in use and read it in.
>          */
>+        if (current->need_resched)
>+             schedule();
>         swap_device_lock(si);
>         for (i = 1; i < si->max ; i++) {
>              if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {


I tested your patch against 2.4.5.  It works.  No more lockups.  Without
the patch it took 14 minutes 51 seconds to complete swapoff (this is to
recover 1.5GB of swap space).  During this time the system was frozen.  No
keyboard, no screen, etc.  Practically locked up.

With the patch there are no more lockups.  Swapoff kept running in the
background.  This is a winner.

But here is the caveat: swapoff keeps burning 100% of the cycles until it
completes.  This is not going to be a big deal during shutdowns.  Only when
you enter swapoff from the command line is it going to be a problem.

I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
broken.  For each and every swap entry it walks the entire process list
(for_each_task(p)).  It also grabs a whole bunch of locks for each swap
entry.  It might be worthwhile to process swap entries in batches instead
of one entry at a time.

In any case, I think having this patch is worthwhile as a quick and dirty
remedy.

Bulent Abali





* Re: Please test: workaround to help swapoff behaviour
  2001-06-07 20:33 Bulent Abali
@ 2001-06-07 19:40 ` Marcelo Tosatti
  2001-06-08 21:11 ` Marcelo Tosatti
  1 sibling, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 19:40 UTC (permalink / raw)
  To: Bulent Abali
  Cc: Mike Galbraith, Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Bulent Abali wrote:

> 
> I tested your patch against 2.4.5.  It works.  No more lockups.  Without
> the patch it took 14 minutes 51 seconds to complete swapoff (this is to
> recover 1.5GB of swap space).  During this time the system was frozen.  No
> keyboard, no screen, etc.  Practically locked up.
>
> With the patch there are no more lockups.  Swapoff kept running in the
> background.  This is a winner.
>
> But here is the caveat: swapoff keeps burning 100% of the cycles until it
> completes.

Yup. Wait a while until the dead swap cache issue is sorted out. 

When that finally happens, the time spent in swapoff will probably be
"acceptable".

> This is not going to be a big deal during shutdowns.  Only when you enter
> swapoff from the command line is it going to be a problem.
>
> I looked at try_to_unuse in swapfile.c.  I believe that the algorithm is
> broken.

Yes. 

> For each and every swap entry it walks the entire process list
> (for_each_task(p)).  It also grabs a whole bunch of locks for each swap
> entry.  It might be worthwhile to process swap entries in batches instead
> of one entry at a time.

The real fix is to make the processing the other way around --- walk the
ptes and do the swapins from there.

Don't have the time to do everything, though. :) 



* Please test: workaround to help swapoff behaviour
  2001-06-07 17:10 Break 2.4 VM in five easy steps Marcelo Tosatti
@ 2001-06-07 17:43 ` Marcelo Tosatti
  0 siblings, 0 replies; 8+ messages in thread
From: Marcelo Tosatti @ 2001-06-07 17:43 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: Eric W. Biederman, Derek Glidden, lkml, linux-mm



On Thu, 7 Jun 2001, Marcelo Tosatti wrote:

> 
> On Thu, 7 Jun 2001, Mike Galbraith wrote:
> 
> > On 6 Jun 2001, Eric W. Biederman wrote:
> > 
> > > Mike Galbraith <mikeg@wen-online.de> writes:
> > >
> > > > > If you could confirm this by calling swapoff sometime other than at
> > > > > reboot time.  That might help.  Say by running top on the console.
> > > >
> > > > The thing goes comatose here too. SCHED_RR vmstat doesn't run, console
> > > > switch is nogo...
> > > >
> > > > After running his memory hog, swapoff took 18 seconds.  I hacked a
> > > > bleeder valve for dead swap pages, and it dropped to 4 seconds.. still
> > > > utterly comatose for those 4 seconds though.
> > >
> > > At the top of the while(1) loop in try_to_unuse what happens if you put in.
> > > if (need_resched) schedule();
> > > It should be outside all of the locks.  It might just be a matter of everything
> > > serializing on the SMP locks, and the kernel refusing to preempt itself.
> > 
> > That did it.
> 
> What about including this workaround in the kernel ? 

Well, 

This is for the people who have been experiencing lockups while running
swapoff.

Please test. (against 2.4.6-pre1)

Thanks for the suggestion, Eric. 


--- linux.orig/mm/swapfile.c	Wed Jun  6 18:16:45 2001
+++ linux/mm/swapfile.c	Thu Jun  7 16:06:11 2001
@@ -345,6 +345,8 @@
 		/*
 		 * Find a swap page in use and read it in.
 		 */
+		if (current->need_resched)
+			schedule();
 		swap_device_lock(si);
 		for (i = 1; i < si->max ; i++) {
 			if (si->swap_map[i] > 0 && si->swap_map[i] != SWAP_MAP_BAD) {



