From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>,
KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
Gilad Ben-Yossef <gilad@benyossef.com>,
linux-kernel@vger.kernel.org, Chris Metcalf <cmetcalf@tilera.com>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Frederic Weisbecker <fweisbec@gmail.com>,
linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
Matt Mackall <mpm@selenic.com>,
Sasha Levin <levinsasha928@gmail.com>,
Rik van Riel <riel@redhat.com>, Andi Kleen <andi@firstfloor.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
linux-fsdevel@vger.kernel.org, Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
Date: Mon, 9 Jan 2012 17:25:48 +0000 [thread overview]
Message-ID: <20120109172548.GJ27881@csn.ul.ie> (raw)
In-Reply-To: <20120105151919.37d64365.akpm@linux-foundation.org>
On Thu, Jan 05, 2012 at 03:19:19PM -0800, Andrew Morton wrote:
> On Thu, 5 Jan 2012 22:31:06 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
>
> > On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> > > On Thu, 5 Jan 2012 16:17:39 +0000
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > >
> > > > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > > >
> > > > While running a CPU hotplug stress test under memory pressure, I
> > > > saw cases where, under enough stress, the machine would halt,
> > > > although it required a machine with 8 cores and plenty of memory.
> > > > I think the problems may be related.
> > >
> > > When we first implemented them, the percpu pages in the page allocator
> > > were of really really marginal benefit. I didn't merge the patches at
> > > all for several cycles, and it was eventually a 49/51 decision.
> > >
> > > So I suggest that our approach to solving this particular problem
> > > should be to nuke the whole thing, then see if that caused any
> > > observable problems. If it did, can we solve those problems by means
> > > other than bringing the dang things back?
> > >
> >
> > Sounds drastic.
>
> Wrong thinking ;)
>
:)
> Simplifying the code should always be the initial proposal. Adding
> more complexity on top is the worst-case when-all-else-failed option.
> Yet we so often reach for that option first :(
>
Enngghh, I really want to agree with you, but reducing lock contention
has been such an important goal for so long that I am loath to just
rip it out and hope for the best.
> > It would be less controversial to replace this patch
> > with a version that calls get_online_cpus() in drain_all_pages() but
> > remove the drain_all_pages() call from the page allocator on
> > the grounds it is not safe against CPU hotplug, and to hell with the
> > slightly elevated allocation failure rates and stalls. That would avoid
> > the try_get_online_cpus() crappiness and be less complex.
>
> If we can come up with a reasonably simple patch which improves or even
> fixes the problem then I suppose there is some value in that, as it
> provides users of earlier kernels with something to backport if they
> hit problems.
>
I'm preparing a simpler fix that avoids sending an IPI at all. There
is also a sysfs fix that is necessary for the tests to complete
successfully. The details will be in the series.
> But the social downside of that is that everyone would shuffle off
> towards other bright and shiny things and we'd be stuck with more
> complexity piled on top of dubiously beneficial code.
>
> > If you really want to consider deleting the per-cpu allocator, maybe
> > it could be a LSF/MM topic?
>
> eek, spare me.
>
It was worth a shot.
> Anyway, we couldn't discuss such a topic without data. Such data would
> be obtained by deleting the code and measuring the results. Which is
> what I just said ;)
>
Crap. ok. I've added a TODO item to implement a patch that removes it.
It is at a lower priority than removing lumpy reclaim though -
eventually this TODO list will start shrinking. I'll need to put
some thought into how it can be tested, but even then I am probably
not the best person to test it. I don't have regular access to a 2+
socket machine to test NUMA effects, for example.
> > Personally I would be wary of deleting
> > it but mostly because I lack regular access to the type of hardware
> > to evaluate whether it was safe to remove or not. Minimally, removing
> > the per-cpu allocator could make the zone lock very hot even though slub
> > probably makes it very hot already.
>
> Much of the testing of the initial code was done on mbligh's weirdass
> NUMAq box: 32-way 386 NUMA which suffered really badly if there were
> contention issues. And even on that box, the code was marginal. So
> I'm hopeful that things will be similar on current machines. Of
> course, it's possible that calling patterns have changed in ways which
> make the code more beneficial than it used to be.
>
Core counts are also higher and some workloads might be more
allocator-intensive than they used to be - netperf and network-related
allocations for socket receive might be a problem, for example.
> But this all ties into my proposal yesterday to remove
> mm/swap.c:lru_*_pvecs. Most or all of the heavy one-page-at-a-time
> code can pretty easily be converted to operate on batches of pages.
>
> Following on from that, it should be pretty simple to extend the
> batching down into the page freeing. Look at put_pages_list() and
> weep. And stuff like free_hot_cold_page_list(), which could easily
> free the pages directly while batching the locking.
>
> Page freeing should be relatively straightforward. Batching page
> allocation is hard in some cases (anonymous pagefaults).
>
Page faulting would certainly be hard to batch, but it would only be
a big problem if faults were intensive enough, and on enough CPUs, to
cause zone lock contention that actually mattered.
> Please do note that the above suggestions are only needed if removing
> the pcp lists causes a problem! It may not.
>
True.
--
Mel Gorman
SUSE Labs