linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@suse.de>
To: Ingo Molnar <mingo@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Paul Turner <pjt@google.com>,
	Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	Christoph Lameter <cl@linux.com>, Rik van Riel <riel@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH 00/10] Latest numa/core release, v18
Date: Tue, 4 Dec 2012 22:49:57 +0000
Message-ID: <20121204224957.GC2797@suse.de>
In-Reply-To: <1354305521-11583-1-git-send-email-mingo@kernel.org>

On Fri, Nov 30, 2012 at 08:58:31PM +0100, Ingo Molnar wrote:
> I'm pleased to announce the latest, -v18 numa/core release.
> 

I collected the results for the following kernels

stats-v8r6	  TLB flush optimisations, stats from balancenuma tree
numacore-20121130 numacore v17 (tip/master as of Nov 30th)
numacore-20121202 numacore v18 (tip/master as of Dec  2nd)
numabase-20121203 unified tree (tip/numa/base as of Dec 3rd)
autonuma-v28fastr4 autonuma rebased with THP patch on top
balancenuma-v9r2  Almost identical to balancenuma v8 but has a build fix for mips
balancenuma-v10r1 v9 + Ingo's migration optimisation on top

Unfortunately, I did not get very far with the comparison. On looking
at just the first set of results, I noticed something screwy with the
numacore-20121202 and numabase-20121203 results. It becomes obvious if
you look at the autonuma benchmark.

AUTONUMA BENCH
                                      3.7.0-rc7             3.7.0-rc6             3.7.0-rc7             3.7.0-rc7             3.7.0-rc7             3.7.0-rc7             3.7.0-rc7
                                     stats-v8r6     numacore-20121130     numacore-20121202     numabase-20121203    autonuma-v28fastr4      balancenuma-v9r2     balancenuma-v10r1
User    NUMA01               65230.85 (  0.00%)    24835.22 ( 61.93%)    69344.37 ( -6.31%)    62845.76 (  3.66%)    30410.22 ( 53.38%)    52436.65 ( 19.61%)    42111.49 ( 35.44%)
User    NUMA01_THEADLOCAL    60794.67 (  0.00%)    17856.17 ( 70.63%)    53416.06 ( 12.14%)    50088.06 ( 17.61%)    17185.34 ( 71.73%)    17829.96 ( 70.67%)    17820.65 ( 70.69%)
User    NUMA02                7031.50 (  0.00%)     2084.38 ( 70.36%)     6726.17 (  4.34%)     6713.99 (  4.52%)     2238.73 ( 68.16%)     2079.48 ( 70.43%)     2068.27 ( 70.59%)
User    NUMA02_SMT            2916.19 (  0.00%)     1009.28 ( 65.39%)     3207.30 ( -9.98%)     3150.35 ( -8.03%)     1037.07 ( 64.44%)      997.57 ( 65.79%)      990.41 ( 66.04%)
System  NUMA01                  39.66 (  0.00%)      926.55 (-2236.23%)      333.49 (-740.87%)      283.49 (-614.80%)      236.83 (-497.15%)      275.09 (-593.62%)      329.73 (-731.39%)
System  NUMA01_THEADLOCAL       42.33 (  0.00%)      513.99 (-1114.25%)       40.59 (  4.11%)       38.80 (  8.34%)       70.90 (-67.49%)      110.82 (-161.80%)      114.57 (-170.66%)
System  NUMA02                   1.25 (  0.00%)       18.57 (-1385.60%)        1.04 ( 16.80%)        1.06 ( 15.20%)        6.39 (-411.20%)        6.42 (-413.60%)        6.97 (-457.60%)
System  NUMA02_SMT              16.66 (  0.00%)       12.32 ( 26.05%)        0.95 ( 94.30%)        0.93 ( 94.42%)        3.17 ( 80.97%)        3.58 ( 78.51%)        5.75 ( 65.49%)
Elapsed NUMA01                1511.76 (  0.00%)      575.93 ( 61.90%)     1644.63 ( -8.79%)     1508.19 (  0.24%)      701.62 ( 53.59%)     1185.53 ( 21.58%)      950.50 ( 37.13%)
Elapsed NUMA01_THEADLOCAL     1387.17 (  0.00%)      398.55 ( 71.27%)     1260.92 (  9.10%)     1257.44 (  9.35%)      378.47 ( 72.72%)      397.37 ( 71.35%)      399.97 ( 71.17%)
Elapsed NUMA02                 176.81 (  0.00%)       51.14 ( 71.08%)      180.80 ( -2.26%)      180.59 ( -2.14%)       53.45 ( 69.77%)       49.51 ( 72.00%)       50.93 ( 71.20%)
Elapsed NUMA02_SMT             163.96 (  0.00%)       48.92 ( 70.16%)      166.96 ( -1.83%)      163.94 (  0.01%)       48.17 ( 70.62%)       47.71 ( 70.90%)       46.76 ( 71.48%)
CPU     NUMA01                4317.00 (  0.00%)     4473.00 ( -3.61%)     4236.00 (  1.88%)     4185.00 (  3.06%)     4368.00 ( -1.18%)     4446.00 ( -2.99%)     4465.00 ( -3.43%)
CPU     NUMA01_THEADLOCAL     4385.00 (  0.00%)     4609.00 ( -5.11%)     4239.00 (  3.33%)     3986.00 (  9.10%)     4559.00 ( -3.97%)     4514.00 ( -2.94%)     4484.00 ( -2.26%)
CPU     NUMA02                3977.00 (  0.00%)     4111.00 ( -3.37%)     3720.00 (  6.46%)     3718.00 (  6.51%)     4200.00 ( -5.61%)     4212.00 ( -5.91%)     4074.00 ( -2.44%)
CPU     NUMA02_SMT            1788.00 (  0.00%)     2087.00 (-16.72%)     1921.00 ( -7.44%)     1922.00 ( -7.49%)     2159.00 (-20.75%)     2098.00 (-17.34%)     2130.00 (-19.13%)

While numacore-v17 did quite well for the range of workloads, v18 does
not. It's just about comparable to mainline and the unified tree is more
or less the same.
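
The bracketed percentages in the AUTONUMA BENCH table appear to be the
relative change against the stats-v8r6 column, with positive meaning "lower
than baseline". A minimal sketch of that arithmetic, assuming this is how
the figures were derived (the helper name pct_gain is hypothetical and not
part of MMTests):

```c
/*
 * Relative change versus a baseline, as a percentage.
 * Positive: value is lower than the baseline (better for time-like metrics).
 * Negative: value is higher than the baseline.
 * Assumption: this mirrors how the bracketed table figures were produced.
 */
static double pct_gain(double baseline, double value)
{
        return (baseline - value) / baseline * 100.0;
}
```

For example, pct_gain(65230.85, 24835.22) gives roughly 61.93, matching the
User NUMA01 entry for numacore-20121130, and pct_gain(39.66, 926.55) gives
roughly -2236.23, matching the System NUMA01 entry for the same kernel.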

balancenuma does reasonably well. It does not do a great job on numa01
but it's better than mainline is and it's been explained already why
balancenuma without a placement policy is not able to interleave like the
adverse workload requires.

MMTests Statistics: duration
                   3.7.0-rc7          3.7.0-rc6          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7
                  stats-v8r6  numacore-20121130  numacore-20121202  numabase-20121203 autonuma-v28fastr4   balancenuma-v9r2  balancenuma-v10r1
User               135980.38           45792.55          132701.13          122805.28           50878.50           73350.91           62997.72
System                100.53            1472.19             376.74             324.98             317.89             396.58             457.66
Elapsed              3248.36            1084.63            3262.62            3118.70            1191.85            1689.70            1456.66

Everyone adds system CPU overhead. numacore-v18 has lower overhead than
v17 and I thought it might be how worklets were accounted for but then I
looked at the vmstats.

MMTests Statistics: vmstat
                                     3.7.0-rc7          3.7.0-rc6          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7          3.7.0-rc7
                                    stats-v8r6  numacore-20121130  numacore-20121202  numabase-20121203 autonuma-v28fastr4   balancenuma-v9r2  balancenuma-v10r1
Page Ins                                 42320              41628              40624              40404              41592              40524              40800
Page Outs                                16516               8032              17064              16320               8596              10712               9652
Swap Ins                                     0                  0                  0                  0                  0                  0                  0
Swap Outs                                    0                  0                  0                  0                  0                  0                  0
Direct pages scanned                         0                  0                  0                  0                  0                  0                  0
Kswapd pages scanned                         0                  0                  0                  0                  0                  0                  0
Kswapd pages reclaimed                       0                  0                  0                  0                  0                  0                  0
Direct pages reclaimed                       0                  0                  0                  0                  0                  0                  0
Kswapd efficiency                         100%               100%               100%               100%               100%               100%               100%
Kswapd velocity                          0.000              0.000              0.000              0.000              0.000              0.000              0.000
Direct efficiency                         100%               100%               100%               100%               100%               100%               100%
Direct velocity                          0.000              0.000              0.000              0.000              0.000              0.000              0.000
Percentage direct scans                     0%                 0%                 0%                 0%                 0%                 0%                 0%
Page writes by reclaim                       0                  0                  0                  0                  0                  0                  0
Page writes file                             0                  0                  0                  0                  0                  0                  0
Page writes anon                             0                  0                  0                  0                  0                  0                  0
Page reclaim immediate                       0                  0                  0                  0                  0                  0                  0
Page rescued immediate                       0                  0                  0                  0                  0                  0                  0
Slabs scanned                                0                  0                  0                  0                  0                  0                  0
Direct inode steals                          0                  0                  0                  0                  0                  0                  0
Kswapd inode steals                          0                  0                  0                  0                  0                  0                  0
Kswapd skipped wait                          0                  0                  0                  0                  0                  0                  0
THP fault alloc                          17801              13484              19107              19323              20032              18691              17880
THP collapse alloc                          14                  0                  6                 11                 54                  9                  5
THP splits                                   5                  0                  5                  6                  7                  2                  8
THP fault fallback                           0                  0                  0                  0                  0                  0                  0
THP collapse fail                            0                  0                  0                  0                  0                  0                  0
Compaction stalls                            0                  0                  0                  0                  0                  0                  0
Compaction success                           0                  0                  0                  0                  0                  0                  0
Compaction failures                          0                  0                  0                  0                  0                  0                  0
Page migrate success                         0                  0                  0                  0                  0            9599473            9266463
Page migrate failure                         0                  0                  0                  0                  0                  0                  0
Compaction pages isolated                    0                  0                  0                  0                  0                  0                  0
Compaction migrate scanned                   0                  0                  0                  0                  0                  0                  0
Compaction free scanned                      0                  0                  0                  0                  0                  0                  0
Compaction cost                              0                  0                  0                  0                  0               9964               9618
NUMA PTE updates                             0                  0                  0                  0                  0          132800892          130575725
NUMA hint faults                             0                  0                  0                  0                  0             606294             501532
NUMA hint local faults                       0                  0                  0                  0                  0             453880             370744
NUMA pages migrated                          0                  0                  0                  0                  0            9599473            9266463
AutoNUMA cost                                0                  0                  0                  0                  0               4143               3597
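
The "NUMA PTE updates" row is the quickest signal in that table: if the
scanner never updated a PTE, nothing downstream can happen. A rough sketch
of checking for that from a /proc/vmstat-style buffer follows. The counter
name numa_pte_updates is taken from later mainline kernels and is an
assumption for the trees tested here; vmstat_lookup() and
numa_balancing_idle() are hypothetical helpers, not kernel or MMTests code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" pair in a buffer formatted like /proc/vmstat.
 * Returns 1 and stores the value on success, 0 if the counter is absent.
 */
static int vmstat_lookup(const char *buf, const char *name,
                         unsigned long long *value)
{
        size_t len = strlen(name);
        const char *line = buf;

        while (line && *line) {
                /* Match the whole name, followed by the separating space. */
                if (strncmp(line, name, len) == 0 && line[len] == ' ')
                        return sscanf(line + len, "%llu", value) == 1;
                line = strchr(line, '\n');
                if (line)
                        line++;
        }
        return 0;
}

/*
 * Treat NUMA balancing as idle if the counter is missing or still zero.
 * Assumption: "numa_pte_updates" is the counter name exposed by the kernel.
 */
static int numa_balancing_idle(const char *buf)
{
        unsigned long long updates;

        return !vmstat_lookup(buf, "numa_pte_updates", &updates) ||
               updates == 0;
}
```

Reading /proc/vmstat into a buffer after a benchmark run and passing it to
numa_balancing_idle() would flag the numacore-20121202 and numabase-20121203
pattern seen above, provided the tree exports the counter under that name.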

The unified tree numabase-20121203 should have had some NUMA PTE activity
and the stat code looked ok at a glance. However, zero activity there
implies that numacore is completely disabled or non-existent. I checked:
the patch had applied and it was certainly enabled in the kernel config,
so I looked closer and I see that task_tick_numa looks like this.

static void task_tick_numa(struct rq *rq, struct task_struct *curr)
{
        /* Cheap checks first: */
        if (!task_numa_candidate(curr)) {
                if (curr->numa_shared >= 0)
                        curr->numa_shared = -1;
                return;
        }

        task_tick_numa_scan(rq, curr);
        task_tick_numa_placement(rq, curr);
}

Ok, so task_numa_candidate() is meant to shortcut expensive steps. Fair
enough, but it begins with this check.

        /* kthreads don't have any user-space memory to scan: */
        if (!p->mm || !p->numa_faults)
                return false;

How is numa_faults ever meant to be positive if task_tick_numa_scan()
never even gets the chance to run and mark a PTE pte_numa? Is numacore not
effectively disabled? I'm also not 100% sure that the "/* Don't disturb
hard-bound tasks: */" check is correct either.  A task could be bound to the
CPUs of 2 nodes, just not all nodes, and still want to do balancing.
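
The circular dependency can be illustrated with a small user-space model.
This is only a sketch under stated assumptions: struct task_model and its
fields are hypothetical stand-ins, numa_faults is modelled as a plain
counter, and the model assumes the scan step is the only thing that can
ever make numa_faults non-zero, which the quoted snippets do not prove.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the task fields the quoted check reads. */
struct task_model {
        bool has_mm;                /* models p->mm being non-NULL */
        unsigned long numa_faults;  /* models p->numa_faults */
        unsigned long scans;        /* how many times the scan step ran */
};

/* Mirrors only the quoted early-exit check, nothing else. */
static bool candidate(const struct task_model *p)
{
        if (!p->has_mm || !p->numa_faults)
                return false;
        return true;
}

/*
 * Models the tick path: the scan runs only for candidates, and in this
 * model the scan is the only source of NUMA faults (an assumption).
 */
static void tick(struct task_model *p)
{
        if (!candidate(p))
                return;
        p->scans++;
        p->numa_faults++;
}

/* Run n ticks and report how often the scan step was reached. */
static unsigned long run_ticks(struct task_model *p, int n)
{
        for (int i = 0; i < n; i++)
                tick(p);
        return p->scans;
}
```

In this model a task that starts with numa_faults at zero never reaches the
scan step no matter how many ticks elapse, while a task that starts with a
non-zero value scans on every tick. That is the behaviour the zeroed vmstat
counters would be consistent with.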

Ingo, you reported that you were seeing results within 1% of
hard-binding. What were you testing with and are you sure that's what you
pushed to tip/master? The damage appears to be caused by "sched: Add RSS
filter to NUMA-balancing", which is doing more than just RSS filtering. If
so, then it's not clear what you were testing when you saw good results,
unless you accidentally merged the wrong version of that patch.

I'll stop the analysis for now. FWIW, very broadly speaking it looked like
the migration scalability patches help balancenuma a bit for some of the
tests, although they increase system CPU usage a little.

-- 
Mel Gorman
SUSE Labs
