From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
lee.schermerhorn@hp.com
Subject: Re: [patch 00/20] VM pageout scalability improvements
Date: Mon, 24 Dec 2007 04:29:36 +0530 [thread overview]
Message-ID: <476EE858.202@linux.vnet.ibm.com> (raw)
In-Reply-To: <20071222192119.030f32d5@bree.surriel.com>
Rik van Riel wrote:
> On Sun, 23 Dec 2007 01:57:32 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Rik van Riel wrote:
>>> On large memory systems, the VM can spend way too much time scanning
>>> through pages that it cannot (or should not) evict from memory. Not
>>> only does it use up CPU time, but it also provokes lock contention
>>> and can leave large systems under memory pressure in a catatonic state.
>> I remember you mentioning that by large memory systems you mean systems
>> with at least 128GB; does this definition still hold?
>
> It depends on the workload. Certain test cases can wedge the
> VM with as little as 16GB of RAM. Other workloads cause trouble
> at 32 or 64GB, with the system sometimes hanging for several
> minutes, all the CPUs in the pageout code and no actual swap IO.
>
Interesting, I have not run into it so far. But I have smaller machines,
typically 4-8GB.
> On systems of 128GB and more, we have seen systems hang in the
> pageout code overnight, without deciding what to swap out.
>
>>> This patch series improves VM scalability by:
>>>
>>> 1) making the locking a little more scalable
>>>
>>> 2) putting filesystem backed, swap backed and non-reclaimable pages
>>>    onto their own LRUs, so the system only scans the pages that it
>>>    can/should evict from memory
>>>
>>> 3) switching to SEQ replacement for the anonymous LRUs, so the
>>>    number of pages that need to be scanned when the system
>>>    starts swapping is bound to a reasonable number
>>>
>>> The noreclaim patches come verbatim from Lee Schermerhorn and
>>> Nick Piggin. I have not taken a detailed look at them yet and
>>> all I have done is fix the rejects against the latest -mm kernel.
>> Is there a consolidated patch available? It would make testing easier.
>
> I will make a big patch available with the next version. I have
> to upgrade my patch set to newer noreclaim patches from Lee and
> add a few small cleanups elsewhere.
>
That would be nice. I'll try to help out by testing the patches and
running them.
>>> I am posting this series now because I would like to get more
>>> feedback, while I am studying and improving the noreclaim patches
>>> myself.
>> What kind of tests show the problem? I'll try to review and test the code.
>
> The easiest test possible simply allocates a ton of memory and
> then touches it all. Enough memory that the system needs to go
> into swap.
>
> Once memory is full, you will see the VM scan like mad, with a
> big CPU spike (clearing the referenced bits off all pages) before
> it starts swapping out anything. That big CPU spike should be
> gone or greatly reduced with my patches.
>
> On really huge systems, that big CPU spike can be enough for one
> CPU to spend so much time in the VM that all the other CPUs join
> it, and the system goes under in a big lock contention fest.
>
> Besides, even single-threadedly clearing the referenced bits on
> 1TB worth of memory can't result in acceptable latencies :)
>
> In the real world, users with large JVMs on their servers, which
> sometimes go a little into swap, can trigger this problem. All of
> the CPUs end up scanning the active list, and all pages have the
> referenced bit set. Even if the system eventually recovers, it
> might as well have been dead.
>
> Going into swap a little should only take a little bit of time.
>
Very fascinating; we clearly need to scale better with larger memory.
I suspect part of the answer will lie in using large/huge pages.
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
Thread overview: 59+ messages
2007-12-18 21:15 [patch 00/20] VM pageout scalability improvements Rik van Riel
2007-12-18 21:15 ` [patch 01/20] convert anon_vma list lock a read/write lock Rik van Riel
2007-12-20 7:07 ` Christoph Lameter
2007-12-18 21:15 ` [patch 02/20] make the inode i_mmap_lock a reader/writer lock Rik van Riel
2007-12-19 0:48 ` Nick Piggin
2007-12-19 4:09 ` KOSAKI Motohiro
2007-12-19 15:52 ` Lee Schermerhorn
2007-12-19 16:31 ` Rik van Riel
2007-12-19 16:53 ` Lee Schermerhorn
2007-12-19 19:28 ` Peter Zijlstra
2007-12-19 23:40 ` Nick Piggin
2007-12-20 7:04 ` Christoph Lameter
2007-12-20 7:59 ` Nick Piggin
2008-01-02 23:35 ` Mike Travis
2008-01-03 6:07 ` Nick Piggin
2008-01-03 8:55 ` Ingo Molnar
2008-01-07 9:01 ` Nick Piggin
2007-12-18 21:15 ` [patch 03/20] move isolate_lru_page() to vmscan.c Rik van Riel
2007-12-20 7:08 ` Christoph Lameter
2007-12-18 21:15 ` [patch 04/20] free swap space on swap-in/activation Rik van Riel
2007-12-18 21:15 ` [patch 05/20] define page_file_cache() function Rik van Riel
2007-12-18 21:15 ` [patch 06/20] debugging checks for page_file_cache() Rik van Riel
2007-12-18 21:15 ` [patch 07/20] Use an indexed array for LRU variables Rik van Riel
2007-12-18 21:15 ` [patch 08/20] split LRU lists into anon & file sets Rik van Riel
2007-12-18 21:15 ` [patch 09/20] split anon & file LRUs for memcontrol code Rik van Riel
2007-12-18 21:15 ` [patch 10/20] SEQ replacement for anonymous pages Rik van Riel
2007-12-19 5:17 ` KOSAKI Motohiro
2007-12-19 13:40 ` Rik van Riel
2007-12-20 2:04 ` KOSAKI Motohiro
2007-12-18 21:15 ` [patch 11/20] add newly swapped in pages to the inactive list Rik van Riel
2007-12-18 21:15 ` [patch 12/20] No Reclaim LRU Infrastructure Rik van Riel
2007-12-18 21:15 ` [patch 13/20] Non-reclaimable page statistics Rik van Riel
2007-12-18 21:15 ` [patch 14/20] Scan noreclaim list for reclaimable pages Rik van Riel
2007-12-18 21:15 ` [patch 15/20] ramfs pages are non-reclaimable Rik van Riel
2007-12-18 21:15 ` [patch 16/20] SHM_LOCKED pages are nonreclaimable Rik van Riel
2007-12-18 21:15 ` [patch 17/20] non-reclaimable mlocked pages Rik van Riel
2007-12-19 0:56 ` Nick Piggin
2007-12-19 13:45 ` Rik van Riel
2007-12-19 14:24 ` Peter Zijlstra
2007-12-19 14:53 ` Rik van Riel
2007-12-19 16:08 ` Lee Schermerhorn
2007-12-19 16:04 ` Lee Schermerhorn
2007-12-20 20:56 ` Rik van Riel
2007-12-21 10:52 ` Nick Piggin
2007-12-21 14:17 ` Rik van Riel
2007-12-23 12:22 ` Nick Piggin
2007-12-24 1:00 ` Rik van Riel
2007-12-19 23:34 ` Nick Piggin
2007-12-20 7:19 ` Christoph Lameter
2007-12-20 15:33 ` Rik van Riel
2007-12-21 17:13 ` Lee Schermerhorn
2007-12-18 21:15 ` [patch 18/20] mlock vma pages under mmap_sem held for read Rik van Riel
2007-12-18 21:15 ` [patch 19/20] handle mlocked pages during map/unmap and truncate Rik van Riel
2007-12-18 21:15 ` [patch 20/20] account mlocked pages Rik van Riel
2007-12-22 20:27 ` [patch 00/20] VM pageout scalability improvements Balbir Singh
2007-12-23 0:21 ` Rik van Riel
2007-12-23 22:59 ` Balbir Singh [this message]
2007-12-24 1:11 ` Rik van Riel
2007-12-28 3:20 ` Matt Mackall