linux-kernel.vger.kernel.org archive mirror
* kswap spinning
@ 2001-08-18 15:15 Mark Hemment
  2001-08-20 10:10 ` Marcelo Tosatti
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Hemment @ 2001-08-18 15:15 UTC (permalink / raw)
  To: linux-kernel

Hi,

  Jumping from 2.4.7 to 2.4.9 has shown up a problem with the VM balancing
code.
  Under load, I've seen kswapd become a CPU hog (on a 5GB box).  Now that
I've got a theory, I cannot reproduce the problem for confirmation, but
here is the theory anyway.

  The tests in free_shortage() and inactive_shortage() assume that the
memory state for a zone can be improved by calling refill_inactive_scan(),
page_launder(), the inode/dcache shrinking functions, and the general
slab-cache reaping func.  Usually, some combination of the reaping
functions will improve a zone's state - but not always.

  The problem is the DMA zone.  As it is (relatively) small on some archs
(IA-32 for example), it is possible that almost all pages from this zone
are being used by sub-systems which simply won't give them up.  Examples
of use are: vmalloc()ed pages for modules, pre-allocated socket buffs
for NICs, task-structs/kernel-stacks - you get the idea.

  In the case I believe I'm seeing, both free_shortage() and
inactive_shortage() are returning a shortage for the DMA zone which
triggers all the work in do_try_to_free_pages().  But as there aren't any
DMA pages left in the page-cache, being used as anonymous pages, or being
used in the other places the code looks, do_try_to_free_pages() returns
non-zero to kswapd() and off we go again.  This also causes callers of
try_to_free_pages() (from __alloc_pages()) to suck CPU cycles as well.

  On HIGHMEM boxes, it is possible for the NORMAL zone to get into the
same state (although v unlikely).

  I'd rather not get into having specialist code in mm/vmscan.c (or arch
specific code) to handle a small DMA zone - so what are the other
solutions?  (Assuming the above theory holds true.)

  One possible solution is not to give DMA memory out, except for:
	o explicit DMA allocations
	o page-cache and anonymous-page allocations
Assuming explicit DMA allocations are low, we'll at least know the
remaining DMA pages are somewhere we can get at them.  This would need to
be triggered by an arch-specific flag, since not all archs have a "tiny"
zone.  However, I don't like this; it feels like fixing the wrong
problem. :(

Mark



* Re: kswap spinning
  2001-08-18 15:15 kswap spinning Mark Hemment
@ 2001-08-20 10:10 ` Marcelo Tosatti
  2001-08-21  1:58   ` Marcelo Tosatti
  0 siblings, 1 reply; 4+ messages in thread
From: Marcelo Tosatti @ 2001-08-20 10:10 UTC (permalink / raw)
  To: Mark Hemment; +Cc: linux-kernel



On Sat, 18 Aug 2001, Mark Hemment wrote:

> Hi,
> 
>   Jumping from 2.4.7 to 2.4.9 has shown up a problem with the VM balancing
> code.
>   Under load, I've seen kswapd become a CPU hog (on a 5GB box).  Now that
> I've got a theory, I cannot reproduce the problem for confirmation, but
> here is the theory anyway.
> 
>   The tests in free_shortage() and inactive_shortage() assume that the
> memory state for a zone can be improved by calling refill_inactive_scan(),
> page_launder(), the inode/dcache shrinking functions, and the general
> slab-cache reaping func.  Usually, some combination of the reaping
> functions will improve a zone's state - but not always.
> 
>   The problem is the DMA zone.  As it is (relatively) small on some archs
> (IA-32 for example), it is possible that almost all pages from this zone
> are being used by sub-systems which simply won't give them up.  Examples
> of use are: vmalloc()ed pages for modules, pre-allocated socket buffs
> for NICs, task-structs/kernel-stacks - you get the idea.
> 
>   In the case I believe I'm seeing, both free_shortage() and
> inactive_shortage() are returning a shortage for the DMA zone which
> triggers all the work in do_try_to_free_pages().  But as there aren't any
> DMA pages left in the page-cache, being used as anonymous pages, or being
> used in the other places the code looks, do_try_to_free_pages() returns
> non-zero to kswapd() and off we go again.  This also causes callers of
> try_to_free_pages() (from __alloc_pages()) to suck CPU cycles as well.
> 
>   On HIGHMEM boxes, it is possible for the NORMAL zone to get into the
> same state (although v unlikely).
> 
>   I'd rather not get into having specialist code in mm/vmscan.c (or arch
> specific code) to handle a small DMA zone - so what are the other
> solutions?  (Assuming the above theory holds true.)
> 
>   One possible solution is not to give DMA memory out, except for:
> 	o explicit DMA allocations
> 	o page-cache and anonymous-page allocations
> Assuming explicit DMA allocations are low, we'll at least know the
> remaining DMA pages are somewhere we can get at them.  This would need to
> be triggered by an arch-specific flag, since not all archs have a "tiny"
> zone.  However, I don't like this; it feels like fixing the wrong
> problem. :(

The DMA zone problem _can_ happen, but I think it should also happen with
2.4.7.  (I'm not completely sure, though.  Maybe Linus's changes to
try_to_free_pages() are the reason...)

Let's first find out the actual problem.

Could you please boot with profile=2 and use readprofile to find out where
kswapd is spending its time? 



* Re: kswap spinning
  2001-08-20 10:10 ` Marcelo Tosatti
@ 2001-08-21  1:58   ` Marcelo Tosatti
  2001-08-21  3:36     ` Rik van Riel
  0 siblings, 1 reply; 4+ messages in thread
From: Marcelo Tosatti @ 2001-08-21  1:58 UTC (permalink / raw)
  To: Mark Hemment; +Cc: linux-kernel



On Mon, 20 Aug 2001, Marcelo Tosatti wrote:

> The DMA zone problem _can_ happen, but I think it should also happen with
> 2.4.7.  (I'm not completely sure, though.  Maybe Linus's changes to
> try_to_free_pages() are the reason...)
> 
> Let's first find out the actual problem.
> 
> Could you please boot with profile=2 and use readprofile to find out where
> kswapd is spending its time? 

Well, I've just noticed Linus made kswapd loop as long as there is any
kind of shortage (inactive or free).

I guess that is the reason. 



* Re: kswap spinning
  2001-08-21  1:58   ` Marcelo Tosatti
@ 2001-08-21  3:36     ` Rik van Riel
  0 siblings, 0 replies; 4+ messages in thread
From: Rik van Riel @ 2001-08-21  3:36 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Mark Hemment, linux-kernel

On Mon, 20 Aug 2001, Marcelo Tosatti wrote:

> > Could you please boot with profile=2 and use readprofile to find out where
> > kswapd is spending its time?
>
> Well, I've just noticed Linus made kswapd loop as long as there is any
> kind of shortage (inactive or free).

Well, duh.  Let me explain.

From mm/vmscan.c::kswapd(), a short comment I wrote while
implementing part of the current VM:

                /*
                 * We go to sleep if either the free page shortage
                 * or the inactive page shortage is gone. We do this
                 * because:
                 * 1) we need no more free pages   or
                 * 2) the inactive pages need to be flushed to disk,
                 *    it wouldn't help to eat CPU time now ...
                 *
                 * We go to sleep for one second, but if it's needed
                 * we'll be woken up earlier...
                 */
                if (!free_shortage() || !inactive_shortage()) {
                        interruptible_sleep_on_timeout(&kswapd_wait, HZ);

I wonder when Linus lost his ability to read comments ;)

regards,

Rik
--
IA64: a worthy successor to i860.

http://www.surriel.com/		http://distro.conectiva.com/

Send all your spam to aardvark@nl.linux.org (spam digging piggy)



Thread overview: 4+ messages
2001-08-18 15:15 kswap spinning Mark Hemment
2001-08-20 10:10 ` Marcelo Tosatti
2001-08-21  1:58   ` Marcelo Tosatti
2001-08-21  3:36     ` Rik van Riel
