From: Dave Chinner <david@fromorbit.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
xfs@oss.sgi.com, Linux-MM <linux-mm@kvack.org>,
Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
Andrew Morton <akpm@linux-foundation.org>,
ppc-dev <linuxppc-dev@lists.ozlabs.org>,
Ingo Molnar <mingo@kernel.org>, Mel Gorman <mgorman@suse.de>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Date: Wed, 18 Mar 2015 09:08:40 +1100
Message-ID: <20150317220840.GC28621@dastard>
In-Reply-To: <CA+55aFzSPcNgxw4GC7aAV1r0P5LniyVVC66COz=3cgMcx73Nag@mail.gmail.com>
On Tue, Mar 17, 2015 at 02:30:57PM -0700, Linus Torvalds wrote:
> On Tue, Mar 17, 2015 at 1:51 PM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > On the -o ag_stride=-1 -o bhash=101073 config, the 60s perf stat I
> > was using during steady state shows:
> >
> > 471,752 migrate:mm_migrate_pages ( +- 7.38% )
> >
> > The migrate pages rate is even higher than in 4.0-rc1 (~360,000)
> > and 3.19 (~55,000), so that looks like even more of a problem than
> > before.
>
> Hmm. How stable are those numbers boot-to-boot?
I've run the test several times but only profiled it once so far.
Runtimes were 7m45s, 7m50s, 7m44s and 8m2s; the profiles came from
the 8m2s run.
Reboot, run again:

  $ sudo perf stat -a -r 6 -e migrate:mm_migrate_pages sleep 10

   Performance counter stats for 'system wide' (6 runs):

           572,839      migrate:mm_migrate_pages     ( +-  3.15% )

      10.001664694 seconds time elapsed              ( +-  0.00% )
  $

And just to confirm, a minute later, still in phase 3:

           590,974      migrate:mm_migrate_pages     ( +-  2.86% )

Reboot, run again:

           575,344      migrate:mm_migrate_pages     ( +-  0.70% )
So there is boot-to-boot variation, but it doesn't look like it
gets any better....
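As a sanity check on that, the spread across those three boots can be
boiled down to one number. A quick sketch (the counts below are
hand-copied from the runs above, so this is pure arithmetic and needs
no perf):

```shell
#!/bin/sh
# Quantify the boot-to-boot spread of the migrate:mm_migrate_pages
# counts quoted above (572,839 / 590,974 / 575,344).
printf '%s\n' 572839 590974 575344 |
awk '{ s += $1; ss += $1 * $1; n++ }
     END {
         mean = s / n
         sd = sqrt(ss / n - mean * mean)    # population stddev
         printf "mean %.0f, stddev %.0f (+- %.2f%%)\n", mean, sd, 100 * sd / mean
     }'
```

That works out to a relative spread of about 1.4%, i.e. in the same
ballpark as the +- figures perf itself reports within a single boot.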
> That kind of extreme spread makes me suspicious. It's also interesting
> that if the numbers really go up even more (and by that big amount),
> then why does there seem to be almost no correlation with performance
> (which apparently went up since rc1, despite migrate_pages getting
> even _worse_).
>
> > And the profile looks like:
> >
> > - 43.73% 0.05% [kernel] [k] native_flush_tlb_others
>
> Ok, that's down from rc1 (67%), but still hugely up from 3.19 (13.7%).
> And flush_tlb_page() does seem to be called about ten times more
> (flush_tlb_mm_range used to be 1.4% of the callers, now it's invisible
> at 0.13%)
>
> Damn. From a performance number standpoint, it looked like we zoomed
> in on the right thing. But now it's migrating even more pages than
> before. Odd.
Throttling problem, like Mel originally suspected?
> > And the vmstats are:
> >
> > 3.19:
> >
> > numa_hit 5163221
> > numa_local 5153127
>
> > 4.0-rc1:
> >
> > numa_hit 36952043
> > numa_local 36927384
> >
> > 4.0-rc4:
> >
> > numa_hit 23447345
> > numa_local 23438564
> >
> > Page migrations are still up by a factor of ~20 on 3.19.
>
> The thing is, those "numa_hit" things come from the zone_statistics()
> call in buffered_rmqueue(), which in turn is simple from the memory
> allocator. That has *nothing* to do with virtual memory, and
> everything to do with actual physical memory allocations. So the load
> is simply allocating a lot more pages, presumably for those stupid
> migration events.
>
> But then it doesn't correlate with performance anyway..
>
> Can you do a simple stupid test? Apply that commit 53da3bc2ba9e ("mm:
> fix up numa read-only thread grouping logic") to 3.19, so that it uses
> the same "pte_dirty()" logic as 4.0-rc4. That *should* make the 3.19
> and 4.0-rc4 numbers comparable.
Patched 3.19 numbers on this test are slightly worse than stock
3.19, but nowhere near as bad as 4.0-rc4:

           241,718      migrate:mm_migrate_pages     ( +-  5.17% )

So that pte_write -> pte_dirty change takes this from ~55k to ~240k
migrate events, and the runtime from 4m54s to 5m20s. vmstats:
numa_hit 9162476
numa_miss 0
numa_foreign 0
numa_interleave 10685
numa_local 9153740
numa_other 8736
numa_pte_updates 49582103
numa_huge_pte_updates 0
numa_hint_faults 48075098
numa_hint_faults_local 12974704
numa_pages_migrated 5748256
pgmigrate_success 5748256
pgmigrate_fail 0
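To put those counters in proportion (the ratios below are just
arithmetic over the numbers above, not new measurements):

```shell
#!/bin/sh
# Two ratios derived from the vmstat counters quoted above; the values
# are hand-copied from this mail, not re-read from /proc/vmstat.
awk 'BEGIN {
    pte_updates  = 49582103   # numa_pte_updates
    hint_faults  = 48075098   # numa_hint_faults
    faults_local = 12974704   # numa_hint_faults_local
    migrated     = 5748256    # numa_pages_migrated
    printf "local faults:   %.1f%% of hinting faults\n", 100 * faults_local / hint_faults
    printf "pages migrated: %.1f%% of PTE updates\n", 100 * migrated / pte_updates
}'
```

i.e. only about 27% of the hinting faults were local, and better than
one in ten PTE updates ended in a page migration.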
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com