linux-kernel.vger.kernel.org archive mirror
* Re: Ongoing 2.4 VM suckage pagemap_lru_lock
       [not found] <Pine.LNX.4.30.0108030400180.5800-100000@fs131-224.f-secure.com>
@ 2001-08-03 15:44 ` Jeremy Linton
  0 siblings, 0 replies; 2+ messages in thread
From: Jeremy Linton @ 2001-08-03 15:44 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: linux-kernel

Yes, I have 2.4.4 as well as 2.4.7pre6 on the box. That stack dump was
generated from a 2.4.4 kernel.



----- Original Message -----
From: "Szabolcs Szakacsits" <szaka@f-secure.com>
To: "Jeremy Linton" <jlinton@interactivesi.com>
Sent: Thursday, August 02, 2001 8:01 PM
Subject: Re: Ongoing 2.4 VM suckage pagemap_lru_lock


>
> On Thu, 2 Aug 2001, Jeremy Linton wrote:
>
> > #0  reclaim_page (zone=0xc0285ae8) at
> > /usr/src/linux.2.4.4/include/asm/spinlock.h:102
>
> Are you using 2.4.4? This is an old issue; the relevant code has changed
> significantly since then.
>
> Szaka
>
>




* Re: Ongoing 2.4 VM suckage pagemap_lru_lock
  2001-08-02 22:17 Ongoing 2.4 VM suckage Jeffrey W. Baker
@ 2001-08-02 23:46 ` Jeremy Linton
  0 siblings, 0 replies; 2+ messages in thread
From: Jeremy Linton @ 2001-08-02 23:46 UTC (permalink / raw)
  To: Jeffrey W. Baker; +Cc: linux-kernel

> I'm telling you that's not what happens.  When memory pressure gets really
> high, the kernel takes all the CPU time and the box is completely useless.
> Maybe the VM sorts itself out but after five minutes of barely responding,
> I usually just power cycle the damn thing.  As I said, this isn't a
> classic thrash because the swap disks only blip perhaps once every ten
> seconds!
>
> You don't have to go to extremes to observe this behavior.  Yesterday, I
> had one box where kswapd used 100% of one CPU for 70 minutes straight,
> while user processes all ran on the other CPU.  All RAM and half of swap was
> used, and I/O was heavy.  The machine had been up for 14 days.  I just
> don't understand why kswapd needs to run and run and run and run and run

    Actually, this sounds very similar to a problem I see on a fairly
regular basis with a very memory-hungry module loaded in the machine.
Basically, the module eats up about a quarter of system memory. Then a
user-space process comes along and uses a big virtual area (about 1.2x the
total physical memory in the box). If the user-space process starts writing
to a lot of the virtual memory it owns, the box slows down to the point
where it appears to have locked up, and disk activity drops to one blip
every few seconds. On the other hand, if the user process is doing mostly
read accesses to that memory, everything is fine.
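
    Roughly, the user-space process is doing something like the sketch
below. This is only an illustration of the access pattern; the sizes, the
1.2x factor, and the write/read switch are assumptions standing in for the
real program:

/* Illustrative repro sketch only: map ~1.2x physical RAM anonymously and
 * touch every page.  Names and sizes are made up, not the actual program. */
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    long pages  = sysconf(_SC_PHYS_PAGES);
    size_t len  = (size_t)(pages * 1.2) * pagesz;   /* ~1.2x physical RAM */
    int do_write = (argc > 1);         /* any argument => do the write pass */
    volatile char sink = 0;
    char *p;
    size_t i;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    for (i = 0; i < len; i += pagesz) {
        if (do_write)
            p[i] = 1;       /* dirty every page: this is when the box crawls */
        else
            sink += p[i];   /* read-only touch: the box stays responsive */
    }
    return 0;
}

    The write pass forces the kernel to allocate (and eventually swap) a
real page for every address touched, while the read-only pass never forces
a real allocation since the kernel can just map the zero page, which fits
the behavior described above.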

    I can't even break into gdb when the box is 'locked up', but before it
locks up I notice that there is massive contention for the pagemap_lru_lock
(I've been running a hand-rolled kernel lock profiler; a sketch of the idea
is at the bottom of this mail) from two different places... Take a look at
these stack dumps.

Kswapd is in page_launder.......
#0  page_launder (gfp_mask=4, user=0) at vmscan.c:592
#1  0xc013d665 in do_try_to_free_pages (gfp_mask=4, user=0) at vmscan.c:935
#2  0xc013d73b in kswapd (unused=0x0) at vmscan.c:1016
#3  0xc01056b6 in kernel_thread (fn=0xddaa0848, arg=0xdfff5fbc, flags=9) at process.c:443
#4  0xddaa0844 in ?? ()

And my user-space process is desperately trying to get a page from a page
fault!

#0  reclaim_page (zone=0xc0285ae8) at /usr/src/linux.2.4.4/include/asm/spinlock.h:102
#1  0xc013e474 in __alloc_pages_limit (zonelist=0xc02864dc, order=0, limit=1, direct_reclaim=1) at page_alloc.c:294
#2  0xc013e581 in __alloc_pages (zonelist=0xc02864dc, order=0) at page_alloc.c:383
#3  0xc012de43 in do_anonymous_page (mm=0xdfb88884, vma=0xdb45ce3c, page_table=0xc091e46c, write_access=1, addr=1506914312) at /usr/src/linux.2.4.4/include/linux/mm.h:392
#4  0xc012df40 in do_no_page (mm=0xdfb88884, vma=0xdb45ce3c, address=1506914312, write_access=1, page_table=0xc091e46c) at memory.c:1237
#5  0xc012e15b in handle_mm_fault (mm=0xdfb88884, vma=0xdb45ce3c, address=1506914312, write_access=1) at memory.c:1317
#6  0xc01163dc in do_page_fault (regs=0xdb2d3fc4, error_code=6) at fault.c:265
#7  0xc01078b0 in error_code () at af_packet.c:1881
#8  0x40040177 in ?? () at af_packet.c:1881

The spinlock counts are usually on the order of ~1 million spins to get the
lock!
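
    For what it's worth, the 'hand rolled' profiler boils down to something
like the sketch below. The counter name and the wrapper are made up for
illustration; this is the idea, not the exact patch:

/* Rough sketch of the spin-counting idea.  Counts every failed attempt to
 * take the lock before we finally acquire it. */
#include <linux/spinlock.h>
#include <asm/atomic.h>

atomic_t lru_lock_spins = ATOMIC_INIT(0);  /* dump via printk or a /proc hook */

static inline void profiled_spin_lock(spinlock_t *lock, atomic_t *spins)
{
    while (!spin_trylock(lock))
        atomic_inc(spins);        /* one failed trylock == one "spin" */
}

/* Call sites of interest then use
 *     profiled_spin_lock(&pagemap_lru_lock, &lru_lock_spins);
 * in place of spin_lock(&pagemap_lru_lock). */

    Very roughly, if each failed attempt costs somewhere around 100ns on
this hardware (a guess: a locked bus operation plus cache line ping-pong
between the CPUs), a million spins is on the order of 100ms of pure
busy-waiting per acquisition, which matches how dead the box feels.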


                                                        jlinton




