From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Jirka Hladky <jhladky@redhat.com>,
	Rik van Riel <riel@surriel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] mm, numa: Migrate pages to local nodes quicker early in the lifetime of a task
Date: Tue, 2 Oct 2018 23:00:05 +0530
Message-ID: <20181002173005.GD4593@linux.vnet.ibm.com>
In-Reply-To: <20181002135459.GA7003@techsingularity.net>

> > 
> > This does have issues when used with workloads that generate more shared
> > faults than private faults.
> > 
> 
> Not as such. It can have issues on workloads where memory is initialised
> by one thread, then additional threads are created and access the same
> memory. They are not necessarily shared once buffers are handed over. In
> such a case, migrating quickly is the right thing to do. If it's truly
> shared pages then there may be some unnecessary migrations early in the
> lifetime of the task but it'll settle down quickly enough.
> 

Do you have a workload recommendation for exercising shared fault accesses?
I will try to get a DayTrader run in a day or two. There, the JVM and DB
threads act on the same memory, so I presume it might show some insights.
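
If I understand the pattern you describe, it is roughly the below (a
standalone pthreads sketch of my own, not taken from any benchmark):

	#include <pthread.h>
	#include <stdio.h>

	#define NWORKERS	4
	#define SLICE		(1 << 18)

	static double buf[NWORKERS * SLICE];

	static void *worker(void *arg)
	{
		size_t tid = (size_t)arg, i;
		double sum = 0.0;

		/* Each worker only touches its own slice, so after the
		 * hand-over these pages are effectively private even
		 * though a single thread faulted them all in. Migrating
		 * them to the workers' nodes quickly is the right call.
		 */
		for (i = tid * SLICE; i < (tid + 1) * SLICE; i++)
			sum += buf[i];
		printf("worker %zu: %f\n", tid, sum);
		return NULL;
	}

	int main(void)
	{
		pthread_t t[NWORKERS];
		size_t i;

		/* The main thread first-touches all the memory, so every
		 * page starts out on main's node with main's cpupid
		 * recorded as the last accessor.
		 */
		for (i = 0; i < NWORKERS * SLICE; i++)
			buf[i] = (double)i;

		for (i = 0; i < NWORKERS; i++)
			pthread_create(&t[i], NULL, worker, (void *)i);
		for (i = 0; i < NWORKERS; i++)
			pthread_join(t[i], NULL);
		return 0;
	}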

> Is it just numa01 that was affected for you? I ask because that particular
> workload is an adverse workload on any machine with more than 2 sockets and
> your machine description says it has 4 nodes. What it is testing is quite
> specific to 2-node machines.
> 

Agreed.

One variation of numa01.sh (numa03.sh below), where a single process has as
many threads as there are CPUs, does regress, but not as much as numa01:

                        min       max       mean      stddev   %change
./numa03.sh      Real:  484.84    555.51    518.59    22.91    -5.84277%
./numa03.sh      Sys:   44.41     64.40     53.24     6.65     -11.3824%
./numa03.sh      User:  51328.77  59429.39  55366.62  2744.39  -9.47912%


> > SPECjbb did show some small losses and gains.
> > 
> 
> That almost always shows small gains and losses so that's not too
> surprising.
> 

Okay.

> > Our numa grouping is not fast enough. It can sometimes take several
> > iterations before all the tasks belonging to the same group actually
> > end up in that group. With the current check we end up spreading memory
> > faster than we should, hurting the chance of early consolidation.
> > 
> > Can we restrict to something like this?
> > 
> > if (p->numa_scan_seq >= MIN && p->numa_scan_seq <= MIN + 4 &&
> >     cpupid_match_pid(p, last_cpupid))
> > 	return true;
> > 
> > meaning, we have run at least MIN scans, and we find this task to be the
> > one most likely using this page.
> > 
> 
> What's MIN? Assuming it's any type of delay, note that this will regress
> STREAM again because it's very sensitive to the starting state.
> 

I was thinking of MIN as 3, to give things a chance to settle, but as you
point out that might not help STREAM.
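
Concretely, I had something like the below in mind (an illustrative
sketch only; MIN here is just a settling threshold of 3, not an
existing kernel constant):

	/* Sketch, not actual kernel code: only after MIN scan
	 * iterations, and only for a short window after that, migrate
	 * eagerly when the page's last recorded accessor was this very
	 * task, i.e. the fault looks private.
	 */
	#define MIN	3

	if (p->numa_scan_seq >= MIN && p->numa_scan_seq <= MIN + 4 &&
	    cpupid_match_pid(p, last_cpupid))
		return true;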

Do you have a hint on which commit made STREAM regress?

If we want to prioritize STREAM-like workloads (i.e. private faults), one
simpler fix could be to flip the check

from:
	if (!cpupid_pid_unset(last_cpupid) &&
				cpupid_to_nid(last_cpupid) != dst_nid)
		return false;
to:
	if (!cpupid_pid_unset(last_cpupid) &&
				cpupid_to_nid(last_cpupid) == dst_nid)
		return true;

i.e. if the group's tasks have likely consolidated on a node, or the task
was moved to a different node but the accesses were private, just move the
memory.
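
Or, stated as a diff over that check in should_numa_migrate_memory()
(just restating the two versions above in patch form):

	-	if (!cpupid_pid_unset(last_cpupid) &&
	-				cpupid_to_nid(last_cpupid) != dst_nid)
	-		return false;
	+	if (!cpupid_pid_unset(last_cpupid) &&
	+				cpupid_to_nid(last_cpupid) == dst_nid)
	+		return true;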

The drawback, though, is that we keep pulling memory every time the task
moves across nodes (which your fix probably restricts to some extent for
long-running tasks).
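
(For reference, my reading of the early-lifetime check your patch adds,
paraphrasing from memory rather than quoting, so the exact form may
differ:

	/* Paraphrase of the patch 2/2 check, not a verbatim quote:
	 * early in a task's lifetime -- no preferred node chosen yet,
	 * or only a few scan passes done -- allow first faults and
	 * private faults to migrate immediately.
	 */
	if ((p->numa_preferred_nid == -1 || p->numa_scan_seq <= 4) &&
	    (cpupid_pid_unset(last_cpupid) || cpupid_match_pid(p, last_cpupid)))
		return true;

A long-running task would stop qualifying once numa_scan_seq grows past
that early window.)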

-- 
Thanks and Regards
Srikar Dronamraju

