linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Dave Chinner <david@fromorbit.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	xfs@oss.sgi.com, Linux-MM <linux-mm@kvack.org>,
	Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur
Date: Mon, 9 Mar 2015 21:02:20 +0000	[thread overview]
Message-ID: <20150309210147.GA3406@suse.de> (raw)
In-Reply-To: <20150308203145.GA4038@suse.de>

On Sun, Mar 08, 2015 at 08:40:25PM +0000, Mel Gorman wrote:
> > Because if the answer is 'yes', then we can safely say: 'we regressed 
> > performance because correctness [not dropping dirty bits] comes before 
> > performance'.
> > 
> > If the answer is 'no', then we still have a mystery (and a regression) 
> > to track down.
> > 
> > As a second hack (not to be applied), could we change:
> > 
> >  #define _PAGE_BIT_PROTNONE      _PAGE_BIT_GLOBAL
> > 
> > to:
> > 
> >  #define _PAGE_BIT_PROTNONE      (_PAGE_BIT_GLOBAL+1)
> > 
> 
> In itself, that's not enough. The SWP_OFFSET_SHIFT would also need updating
> as a partial revert of 21d9ee3eda7792c45880b2f11bff8e95c9a061fb but it
> can be done.
> 

More importantily, _PAGE_BIT_GLOBAL+1 == the special PTE bit so just
updating the value should crash. For the purposes of testing the idea, I
thought the straight-forward option was to break soft dirty page tracking
and steal their bit for testing (patch below). Took most of the day to
get access to the test machine so tests are not long running and only
the autonuma one has completed;

autonumabench
                                              3.19.0             4.0.0-rc1             4.0.0-rc1             4.0.0-rc1
                                             vanilla               vanilla         slowscan-v2r7        protnone-v3
Time User-NUMA01                  25695.96 (  0.00%)    32883.59 (-27.97%)    35288.00 (-37.33%)    35236.21 (-37.13%)
Time User-NUMA01_THEADLOCAL       17404.36 (  0.00%)    17453.20 ( -0.28%)    17765.79 ( -2.08%)    17590.10 ( -1.07%)
Time User-NUMA02                   2037.65 (  0.00%)     2063.70 ( -1.28%)     2063.22 ( -1.25%)     2072.95 ( -1.73%)
Time User-NUMA02_SMT                981.02 (  0.00%)      983.70 ( -0.27%)      976.01 (  0.51%)      983.42 ( -0.24%)
Time System-NUMA01                  194.70 (  0.00%)      602.44 (-209.42%)      209.42 ( -7.56%)      737.36 (-278.72%)
Time System-NUMA01_THEADLOCAL        98.52 (  0.00%)       78.10 ( 20.73%)       92.70 (  5.91%)       80.69 ( 18.10%)
Time System-NUMA02                    9.28 (  0.00%)        6.47 ( 30.28%)        6.06 ( 34.70%)        6.63 ( 28.56%)
Time System-NUMA02_SMT                3.79 (  0.00%)        5.06 (-33.51%)        3.39 ( 10.55%)        3.60 (  5.01%)
Time Elapsed-NUMA01                 558.84 (  0.00%)      755.96 (-35.27%)      833.63 (-49.17%)      804.50 (-43.96%)
Time Elapsed-NUMA01_THEADLOCAL      382.54 (  0.00%)      382.22 (  0.08%)      395.45 ( -3.37%)      388.12 ( -1.46%)
Time Elapsed-NUMA02                  49.83 (  0.00%)       49.38 (  0.90%)       50.21 ( -0.76%)       48.99 (  1.69%)
Time Elapsed-NUMA02_SMT              46.59 (  0.00%)       47.70 ( -2.38%)       48.55 ( -4.21%)       49.50 ( -6.25%)
Time CPU-NUMA01                    4632.00 (  0.00%)     4429.00 (  4.38%)     4258.00 (  8.07%)     4471.00 (  3.48%)
Time CPU-NUMA01_THEADLOCAL         4575.00 (  0.00%)     4586.00 ( -0.24%)     4515.00 (  1.31%)     4552.00 (  0.50%)
Time CPU-NUMA02                    4107.00 (  0.00%)     4191.00 ( -2.05%)     4120.00 ( -0.32%)     4244.00 ( -3.34%)
Time CPU-NUMA02_SMT                2113.00 (  0.00%)     2072.00 (  1.94%)     2017.00 (  4.54%)     1993.00 (  5.68%)

              3.19.0   4.0.0-rc1   4.0.0-rc1   4.0.0-rc1
             vanilla     vanillaslowscan-v2r7protnone-v3
User        46119.12    53384.29    56093.11    55882.82
System        306.41      692.14      311.64      828.36
Elapsed      1039.88     1236.87     1328.61     1292.92

So just using a different bit doesn't seem to be it either

                                3.19.0   4.0.0-rc1   4.0.0-rc1   4.0.0-rc1
                               vanilla     vanillaslowscan-v2r7protnone-v3
NUMA alloc hit                 1202922     1437560     1472578     1499274
NUMA alloc miss                      0           0           0           0
NUMA interleave hit                  0           0           0           0
NUMA alloc local               1200683     1436781     1472226     1498680
NUMA base PTE updates        222840103   304513172   121532313   337431414
NUMA huge PMD updates           434894      594467      237170      658715
NUMA page range updates      445505831   608880276   242963353   674693494
NUMA hint faults                601358      733491      334334      820793
NUMA hint local faults          371571      511530      227171      565003
NUMA hint local percent             61          69          67          68
NUMA pages migrated            7073177    26366701     8607082    31288355

Patch to use a bit other than the global bit for prot none is below.

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 8c7c10802e9c..1f243323693c 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -20,16 +20,16 @@
 #define _PAGE_BIT_SOFTW2	10	/* " */
 #define _PAGE_BIT_SOFTW3	11	/* " */
 #define _PAGE_BIT_PAT_LARGE	12	/* On 2MB or 1GB pages */
-#define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW1
-#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW1
+#define _PAGE_BIT_SPECIAL	_PAGE_BIT_SOFTW3
+#define _PAGE_BIT_CPA_TEST	_PAGE_BIT_SOFTW3
 #define _PAGE_BIT_SPLITTING	_PAGE_BIT_SOFTW2 /* only valid on a PSE pmd */
-#define _PAGE_BIT_HIDDEN	_PAGE_BIT_SOFTW3 /* hidden by kmemcheck */
-#define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW3 /* software dirty tracking */
+#define _PAGE_BIT_HIDDEN	_PAGE_BIT_SOFTW1 /* hidden by kmemcheck */
+#define _PAGE_BIT_SOFT_DIRTY	_PAGE_BIT_SOFTW1 /* software dirty tracking */
 #define _PAGE_BIT_NX           63       /* No execute: only valid after cpuid check */
 
 /* If _PAGE_BIT_PRESENT is clear, we use these: */
 /* - if the user mapped it with PROT_NONE; pte_present gives true */
-#define _PAGE_BIT_PROTNONE	_PAGE_BIT_GLOBAL
+#define _PAGE_BIT_PROTNONE	_PAGE_BIT_SOFTW1
 
 #define _PAGE_PRESENT	(_AT(pteval_t, 1) << _PAGE_BIT_PRESENT)
 #define _PAGE_RW	(_AT(pteval_t, 1) << _PAGE_BIT_RW)
@@ -98,8 +98,7 @@
 
 /* Set of bits not changed in pte_modify */
 #define _PAGE_CHG_MASK	(PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT |		\
-			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY |	\
-			 _PAGE_SOFT_DIRTY)
+			 _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE)
 
 /*

  reply	other threads:[~2015-03-09 21:02 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-07 15:20 [RFC PATCH 0/4] Automatic NUMA balancing and PROT_NONE handling followup v2r8 Mel Gorman
2015-03-07 15:20 ` [PATCH 1/4] mm: thp: Return the correct value for change_huge_pmd Mel Gorman
2015-03-07 20:13   ` Linus Torvalds
2015-03-07 20:31   ` Linus Torvalds
2015-03-07 20:56     ` Mel Gorman
2015-03-07 15:20 ` [PATCH 2/4] mm: numa: Remove migrate_ratelimited Mel Gorman
2015-03-07 15:20 ` [PATCH 3/4] mm: numa: Mark huge PTEs young when clearing NUMA hinting faults Mel Gorman
2015-03-07 18:33   ` Linus Torvalds
2015-03-07 18:42     ` Linus Torvalds
2015-03-07 15:20 ` [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur Mel Gorman
2015-03-07 16:36   ` Ingo Molnar
2015-03-07 17:37     ` Mel Gorman
2015-03-08  9:54       ` Ingo Molnar
2015-03-07 19:12     ` Linus Torvalds
2015-03-08 10:02       ` Ingo Molnar
2015-03-08 18:35         ` Linus Torvalds
2015-03-08 18:46           ` Linus Torvalds
2015-03-09 11:29           ` Dave Chinner
2015-03-09 16:52             ` Linus Torvalds
2015-03-09 19:19               ` Dave Chinner
2015-03-10 23:55                 ` Linus Torvalds
2015-03-12 13:10                   ` Mel Gorman
2015-03-12 16:20                     ` Linus Torvalds
2015-03-12 18:49                       ` Mel Gorman
2015-03-17  7:06                         ` Dave Chinner
2015-03-17 16:53                           ` Linus Torvalds
2015-03-17 20:51                             ` Dave Chinner
2015-03-17 21:30                               ` Linus Torvalds
2015-03-17 22:08                                 ` Dave Chinner
2015-03-18 16:08                                   ` Linus Torvalds
2015-03-18 17:31                                     ` Linus Torvalds
2015-03-18 22:23                                       ` Dave Chinner
2015-03-19 14:10                                       ` Mel Gorman
2015-03-19 18:09                                         ` Linus Torvalds
2015-03-19 21:41                                       ` Linus Torvalds
2015-03-19 22:41                                         ` Dave Chinner
2015-03-19 23:05                                           ` Linus Torvalds
2015-03-19 23:23                                             ` Dave Chinner
2015-03-20  0:23                                             ` Dave Chinner
2015-03-20  1:29                                               ` Linus Torvalds
2015-03-20  4:13                                                 ` Dave Chinner
2015-03-20 17:02                                                   ` Linus Torvalds
2015-03-23 12:01                                                     ` Mel Gorman
2015-03-20 10:12                                                 ` Mel Gorman
2015-03-20  9:56                                             ` Mel Gorman
2015-03-08 20:40         ` Mel Gorman
2015-03-09 21:02           ` Mel Gorman [this message]
2015-03-10 13:08             ` Mel Gorman
2015-03-08  9:41   ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150309210147.GA3406@suse.de \
    --to=mgorman@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=david@fromorbit.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mingo@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).