linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King - ARM Linux <linux@arm.linux.org.uk>,
	KOSAKI Motohiro <kosaki.motohiro@gmail.com>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	linux-kernel@vger.kernel.org, Chris Metcalf <cmetcalf@tilera.com>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
	Matt Mackall <mpm@selenic.com>,
	Sasha Levin <levinsasha928@gmail.com>,
	Rik van Riel <riel@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, Avi Kivity <avi@redhat.com>
Subject: Re: [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist
Date: Mon, 9 Jan 2012 17:25:48 +0000	[thread overview]
Message-ID: <20120109172548.GJ27881@csn.ul.ie> (raw)
In-Reply-To: <20120105151919.37d64365.akpm@linux-foundation.org>

On Thu, Jan 05, 2012 at 03:19:19PM -0800, Andrew Morton wrote:
> On Thu, 5 Jan 2012 22:31:06 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Thu, Jan 05, 2012 at 02:06:45PM -0800, Andrew Morton wrote:
> > > On Thu, 5 Jan 2012 16:17:39 +0000
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > 
> > > > mm: page allocator: Guard against CPUs going offline while draining per-cpu page lists
> > > > 
> > > > While running a CPU hotplug stress test under memory pressure, I
> > > > saw cases where under enough stress the machine would halt although
> > > > it required a machine with 8 cores and plenty memory. I think the
> > > > problems may be related.
> > > 
> > > When we first implemented them, the percpu pages in the page allocator
> > > were of really really marginal benefit.  I didn't merge the patches at
> > > all for several cycles, and it was eventually a 49/51 decision.
> > > 
> > > So I suggest that our approach to solving this particular problem
> > > should be to nuke the whole thing, then see if that caused any
> > > observeable problems.  If it did, can we solve those problems by means
> > > other than bringing the dang things back?
> > > 
> > 
> > Sounds drastic.
> 
> Wrong thinking ;)
> 

:)

> Simplifying the code should always be the initial proposal.  Adding
> more complexity on top is the worst-case when-all-else-failed option. 
> Yet we so often reach for that option first :(
> 

Enngghh, I really want to agree with you but reducing lock contention
has been such an important goal for a long time that I am really loathe
to just rip it out and hope for the best.

> > It would be less controversial to replace this patch
> > with a version that calls get_online_cpu() in drain_all_pages() but
> > remove the call to drain_all_pages() call from the page allocator on
> > the grounds it is not safe against CPU hotplug and to hell with the
> > slightly elevated allocation failure rates and stalls. That would avoid
> > the try_get_online_cpus() crappiness and be less complex.
> 
> If we can come up with a reasonably simple patch which improves or even
> fixes the problem then I suppose there is some value in that, as it
> provides users of earlier kernels with something to backport if they
> hit problems.
> 

I'm preparing a patch that is a simplier fix but not sending an IPI at
all. There is also a sysfs fix that is necessary for tests to complete
successfully. The details will be in the series.

> But the social downside of that is that everyone would shuffle off
> towards other bright and shiny things and we'd be stuck with more
> complexity piled on top of dubiously beneficial code.
> 
> > If you really want to consider deleting the per-cpu allocator, maybe
> > it could be a LSF/MM topic?
> 
> eek, spare me.
> 

It was worth a shot.

> Anyway, we couldn't discuss such a topic without data.  Such data would
> be obtained by deleting the code and measuring the results.  Which is
> what I just said ;)
> 

Crap. ok. I've added a TODO list to implement a patch that removes it.
It is at a lower priority than removing lumpy reclaim though -
eventally this TODO list will start shrinking. I'll need to put
some thought into how it can be tested but even then I probably am
not the best person to test it. I don't have regular access to a 2+
socket machine to test NUMA effects for example.

> > Personally I would be wary of deleting
> > it but mostly because I lack regular access to the type of hardware
> > to evaulate whether it was safe to remove or not. Minimally, removing
> > the per-cpu allocator could make the zone lock very hot even though slub
> > probably makes it very hot already.
> 
> Much of the testing of the initial code was done on mbligh's weirdass
> NUMAq box: 32-way 386 NUMA which suffered really badly if there were
> contention issues.  And even on that box, the code was marginal.  So
> I'm hopeful that things will be similar on current machines.  Of
> course, it's possible that calling patterns have changed in ways which
> make the code more beneficial than it used to be.
> 

Core counts are also higher and some workloads might be more
allocator intensive than they used to be - netperf and network-related
allocations for socket receive might be a problem for example.

> But this all ties into my proposal yesterday to remove
> mm/swap.c:lru_*_pvecs.  Most or all of the heavy one-page-at-a-time
> code can pretty easily be converted to operate on batches of pages. 
>
> Folowing on from that, it should be pretty simple to extend the
> batching down into the page freeing.  Look at put_pages_list() and
> weep.  And stuff like free_hot_cold_page_list() which could easily free
> the pages directly whilebatching the locking.
> 
> Page freeing should be relatively straightforward.  Batching page
> allocation is hard in some cases (anonymous pagefaults).
> 

Page faulting would certainly be hard to batch but it would only be
really a big problem if they are intensive enough and on enough CPUs to
cause zone lock contention that was a problem.

> Please do note that the above suggestions are only needed if removing
> the pcp lists causes a problem!  It may not.
> 

True.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2012-01-09 17:25 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1325499859-2262-1-git-send-email-gilad@benyossef.com>
2012-01-02 10:24 ` [PATCH v5 1/8] smp: Introduce a generic on_each_cpu_mask function Gilad Ben-Yossef
2012-01-03  7:51   ` Michal Nazarewicz
2012-01-03  8:12     ` Gilad Ben-Yossef
2012-01-03  8:57       ` Michal Nazarewicz
2012-01-03 22:26   ` Andrew Morton
2012-01-05 13:17     ` Michal Nazarewicz
2012-01-08 16:04     ` Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 2/8] arm: Move arm over to generic on_each_cpu_mask Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 3/8] tile: Move tile to use " Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 4/8] smp: Add func to IPI cpus based on parameter func Gilad Ben-Yossef
2012-01-03 22:34   ` Andrew Morton
2012-01-08 16:09     ` Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 5/8] slub: Only IPI CPUs that have per cpu obj to flush Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 6/8] fs: only send IPI to invalidate LRU BH when needed Gilad Ben-Yossef
2012-01-02 10:24 ` [PATCH v5 7/8] mm: Only IPI CPUs to drain local pages if they exist Gilad Ben-Yossef
2012-01-03 17:45   ` KOSAKI Motohiro
2012-01-03 18:58     ` Gilad Ben-Yossef
2012-01-03 22:02       ` KOSAKI Motohiro
2012-01-05 14:20     ` Mel Gorman
2012-01-05 14:40       ` Russell King - ARM Linux
2012-01-05 15:24         ` Peter Zijlstra
2012-01-05 16:17         ` Mel Gorman
2012-01-05 16:35           ` Russell King - ARM Linux
2012-01-05 18:35             ` Paul E. McKenney
2012-01-05 22:21               ` Mel Gorman
2012-01-06  6:06                 ` Srivatsa S. Bhat
2012-01-06 10:46                   ` Mel Gorman
2012-01-06 13:28                 ` Greg KH
2012-01-06 14:09                   ` Mel Gorman
2012-01-05 22:06           ` Andrew Morton
2012-01-05 22:31             ` Mel Gorman
2012-01-05 23:19               ` Andrew Morton
2012-01-09 17:25                 ` Mel Gorman [this message]
2012-01-07 16:52           ` Paul E. McKenney
2012-01-07 17:05             ` Paul E. McKenney
2012-01-05 15:54   ` Mel Gorman
2012-01-08 16:01     ` Gilad Ben-Yossef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120109172548.GJ27881@csn.ul.ie \
    --to=mel@csn.ul.ie \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=avi@redhat.com \
    --cc=cmetcalf@tilera.com \
    --cc=fweisbec@gmail.com \
    --cc=gilad@benyossef.com \
    --cc=kosaki.motohiro@gmail.com \
    --cc=levinsasha928@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@arm.linux.org.uk \
    --cc=mpm@selenic.com \
    --cc=penberg@kernel.org \
    --cc=riel@redhat.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).