From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 5/8] sched: Favour moving tasks towards the preferred node
Date: Fri, 28 Jun 2013 14:45:35 +0100	[thread overview]
Message-ID: <20130628134535.GX1875@suse.de> (raw)
In-Reply-To: <20130627161127.GZ28407@twins.programming.kicks-ass.net>

On Thu, Jun 27, 2013 at 06:11:27PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 26, 2013 at 03:38:04PM +0100, Mel Gorman wrote:
> > +/* Returns true if the destination node has incurred more faults */
> > +static bool migrate_improves_locality(struct task_struct *p, struct lb_env *env)
> > +{
> > +	int src_nid, dst_nid;
> > +
> > +	if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
> > +		return false;
> > +
> > +	src_nid = cpu_to_node(env->src_cpu);
> > +	dst_nid = cpu_to_node(env->dst_cpu);
> > +
> > +	if (src_nid == dst_nid)
> > +		return false;
> > +
> > +	if (p->numa_migrate_seq < sysctl_numa_balancing_settle_count &&
> > +	    p->numa_preferred_nid == dst_nid)
> > +		return true;
> > +
> > +	return false;
> > +}
> 
> Also, until I just actually _read_ that function, I assumed it would
> compare p->numa_faults[src_nid] and p->numa_faults[dst_nid], because
> even when the dst_nid isn't the preferred nid it might still have more
> pages than where we currently are.
> 

I tested something like this, and also a variant that took only shared
accesses into account, but it performed badly in some cases. I've included
the last patch I tested below for reference but dropped it until I can
figure out why it performed badly. My guess was increased bouncing due to
shared faults (two tasks sharing pages can each record more faults on the
other's node, so the balancer keeps pulling them back and forth), but I
didn't prove it.

> Idem with the proposed migrate_degrades_locality().
> 
> Something like so I suppose
> 
> ---
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3969,6 +3969,7 @@ task_hot(struct task_struct *p, u64 now,
>  	return delta < (s64)sysctl_sched_migration_cost;
>  }
>  
> +#ifdef CONFIG_NUMA_BALANCING
>  /* Returns true if the destination node has incurred more faults */
>  static bool migrate_improves_locality(struct task_struct *p, struct lb_env *env)
>  {
> @@ -3983,13 +3984,50 @@ static bool migrate_improves_locality(st
>  	if (src_nid == dst_nid)
>  		return false;
>  
> -	if (p->numa_migrate_seq < sysctl_numa_balancing_settle_count &&
> -	    p->numa_preferred_nid == dst_nid)
> +	if (p->numa_migrate_seq >= sysctl_numa_balancing_settle_count)
> +		return false;
> +
> +	if (p->numa_preferred_nid == dst_nid)
> +		return true;
> +
> +	if (p->numa_faults[src_nid] < p->numa_faults[dst_nid])
> +		return true;
> +
> +	return false;
> +}
> +

I tested something like this.

> +static bool migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
> +{
> +	int src_nid, dst_nid;
> +
> +	if (!p->numa_faults || !(env->sd->flags & SD_NUMA))
> +		return false;
> +
> +	src_nid = cpu_to_node(env->src_cpu);
> +	dst_nid = cpu_to_node(env->dst_cpu);
> +
> +	if (src_nid == dst_nid)
> +		return false;
> +
> +	if (p->numa_faults[src_nid] > p->numa_faults[dst_nid])
>  		return true;
>  
>  	return false;
>  }

I had not tried this, but it makes sense. I'll test it out and include
it in the next revision if it looks good. Unless you object I'll add your
Signed-off-by, because the version of the patch I'm about to test looks
almost identical to this.

>  
> +#else
> +
> +static inline bool migrate_improves_locality(struct task_struct *p, struct lb_env *env)
> +{
> +	return false;
> +}
> +
> +static inline bool migrate_degrades_locality(struct task_struct *p, struct lb_env *env)
> +{
> +	return false;
> +}
> +
> +#endif /* CONFIG_NUMA_BALANCING */
>  
>  /*
>   * can_migrate_task - may task p from runqueue rq be migrated to this_cpu?
> @@ -4055,8 +4093,10 @@ int can_migrate_task(struct task_struct
>  		return 1;
>  
>  	tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
> +	if (!tsk_cache_hot)
> +		tsk_cache_hot = migrate_degrades_locality(p, env);
>  	if (!tsk_cache_hot ||
> -		env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
> +	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
>  
>  		if (tsk_cache_hot) {
>  			schedstat_inc(env->sd, lb_hot_gained[env->idle]);
> 

For reference, this is the last patch along these lines that I tested.

---8<---
sched: Favour moving tasks towards nodes that incurred more faults

Signed-off-by: Mel Gorman <mgorman@suse.de>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e9bbb70..3379ca4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3980,9 +3980,18 @@ static bool migrate_improves_locality(struct task_struct *p, struct lb_env *env)
 	if (src_nid == dst_nid)
 		return false;
 
-	if (p->numa_migrate_seq < sysctl_numa_balancing_settle_count &&
-	    p->numa_preferred_nid == dst_nid)
-		return true;
+	if (p->numa_migrate_seq < sysctl_numa_balancing_settle_count) {
+		if (p->numa_preferred_nid == dst_nid)
+			return true;
+
+		/*
+	 * Move towards the node if it incurred a higher number
+	 * of shared NUMA hinting faults
+		 */
+		if (p->numa_faults[task_faults_idx(dst_nid, 0)] >
+		    p->numa_faults[task_faults_idx(src_nid, 0)])
+			return true;
+	}
 
 	return false;
 }
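For completeness, task_faults_idx() above comes from the split of shared
and private fault accounting elsewhere in the series. A minimal sketch of
the layout it assumes (two counters per node, shared at offset 0 and
private at offset 1; the posted helper may differ in detail):

/* Map (nid, priv) to a slot in p->numa_faults; shared counters at priv == 0 */
static inline int task_faults_idx(int nid, int priv)
{
	return 2 * nid + priv;
}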


-- 
Mel Gorman
SUSE Labs
