From: Michael Wang <wangyun@linux.vnet.ibm.com>
To: Mike Galbraith <bitbucket@online.de>
Cc: linux-kernel@vger.kernel.org, mingo@redhat.com,
	peterz@infradead.org, mingo@kernel.org, a.p.zijlstra@chello.nl
Subject: Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()
Date: Thu, 24 Jan 2013 15:00:42 +0800
Message-ID: <5100DC1A.9070906@linux.vnet.ibm.com>
In-Reply-To: <5100CE1F.7080704@linux.vnet.ibm.com>

On 01/24/2013 02:01 PM, Michael Wang wrote:
> On 01/23/2013 05:32 PM, Mike Galbraith wrote:
> [snip]
>> ---
>>  include/linux/topology.h |    6 ++---
>>  kernel/sched/core.c      |   41 ++++++++++++++++++++++++++++++-------
>>  kernel/sched/fair.c      |   52 +++++++++++++++++++++++++++++------------------
>>  3 files changed, 70 insertions(+), 29 deletions(-)
>>
>> --- a/include/linux/topology.h
>> +++ b/include/linux/topology.h
>> @@ -95,7 +95,7 @@ int arch_update_cpu_topology(void);
>>  				| 1*SD_BALANCE_NEWIDLE			\
>>  				| 1*SD_BALANCE_EXEC			\
>>  				| 1*SD_BALANCE_FORK			\
>> -				| 0*SD_BALANCE_WAKE			\
>> +				| 1*SD_BALANCE_WAKE			\
>>  				| 1*SD_WAKE_AFFINE			\
>>  				| 1*SD_SHARE_CPUPOWER			\
>>  				| 1*SD_SHARE_PKG_RESOURCES		\
>> @@ -126,7 +126,7 @@ int arch_update_cpu_topology(void);
>>  				| 1*SD_BALANCE_NEWIDLE			\
>>  				| 1*SD_BALANCE_EXEC			\
>>  				| 1*SD_BALANCE_FORK			\
>> -				| 0*SD_BALANCE_WAKE			\
>> +				| 1*SD_BALANCE_WAKE			\
>>  				| 1*SD_WAKE_AFFINE			\
>>  				| 0*SD_SHARE_CPUPOWER			\
>>  				| 1*SD_SHARE_PKG_RESOURCES		\
>> @@ -156,7 +156,7 @@ int arch_update_cpu_topology(void);
>>  				| 1*SD_BALANCE_NEWIDLE			\
>>  				| 1*SD_BALANCE_EXEC			\
>>  				| 1*SD_BALANCE_FORK			\
>> -				| 0*SD_BALANCE_WAKE			\
>> +				| 1*SD_BALANCE_WAKE			\
>>  				| 1*SD_WAKE_AFFINE			\
>>  				| 0*SD_SHARE_CPUPOWER			\
>>  				| 0*SD_SHARE_PKG_RESOURCES		\
> 
> I've enabled the WAKE flag on my box like you did, but I still can't see
> a regression. I've also just tested on a POWER server with 64 cpus and
> failed to reproduce the issue there too (not compared with virgin yet,
> but I can't see a collapse).
> 
> I will do more testing on the power box to confirm it.

I still can't reproduce the issue, but my default sd topology shows
some differences:

WYT: sbm of cpu 0
WYT: 	 exec map
WYT: 		 sd f051be80, idx 0, level 0, weight 4
WYT: 		 sd f08b3700, idx 1, level 1, weight 32
WYT: 		 sd f08b3700, idx 2, level 1, weight 32
WYT: 	 fork map
WYT: 		 sd f051be80, idx 0, level 0, weight 4
WYT: 		 sd f08b3700, idx 1, level 1, weight 32
WYT: 		 sd f08b3700, idx 2, level 1, weight 32
WYT: 	 wake map
WYT: 		 sd f051be80, idx 0, level 0, weight 4
WYT: 		 sd f08b3700, idx 1, level 1, weight 32
WYT: 		 sd f08b6300, idx 2, level 2, weight 64
WYT: 	 affine map
WYT: 		 affine with cpu 0 in sd f051be80, weight 4
WYT: 		 affine with cpu 1 in sd f051be80, weight 4
WYT: 		 affine with cpu 2 in sd f051be80, weight 4
WYT: 		 affine with cpu 3 in sd f051be80, weight 4
		...

There are only sibling, cpu and numa levels here, with no mc level like
your box has, but that looks harmless to me... doesn't it?
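
The exec and fork maps above list the same sd (f08b3700) at both idx 1
and idx 2. A minimal sketch of why that would happen, assuming it comes
from the hole-filling loop in the quoted build_sched_balance_map() hunk
and that the numa level carries no EXEC/FORK flag (as the quoted
sd_numa_init() hunk suggests). The toy_sd type and toy_fill_holes() name
are made up for illustration; this is not kernel code:

```c
#include <stddef.h>

#define TOY_MAX_LEVEL 3	/* sibling, cpu, numa: assumed from the dump above */

/* Hypothetical stand-in for struct sched_domain, only the printed fields. */
struct toy_sd {
	int level;
	int span_weight;
};

/*
 * Fill every empty slot with the domain from the slot below it,
 * mirroring the "+" lines of the quoted hole-filling loop.
 */
void toy_fill_holes(const struct toy_sd *map[TOY_MAX_LEVEL])
{
	for (int level = 1; level < TOY_MAX_LEVEL; level++) {
		if (!map[level])
			map[level] = map[level - 1];
	}
}
```

Starting from an exec map of { sibling (weight 4), cpu (weight 32), NULL },
the loop leaves idx 2 pointing at the level 1 sd, which matches the
"idx 2, level 1, weight 32" line in the dump.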

These are the aim7 results of the patched kernel; they look just fine.

Tasks    jobs/min  jti  jobs/min/task      real       cpu
    1      424.07  100       424.0728     14.29      4.29   Thu Jan 24 01:52:22 2013
    5     2561.28   99       512.2570     11.83      8.82   Thu Jan 24 01:52:35 2013
   10     5033.22   97       503.3223     12.04     16.35   Thu Jan 24 01:52:47 2013
   20    10350.13   98       517.5064     11.71     28.54   Thu Jan 24 01:52:59 2013
   40    20116.18   98       502.9046     12.05     62.06   Thu Jan 24 01:53:11 2013
   80    39255.06   98       490.6883     12.35    122.18   Thu Jan 24 01:53:24 2013
  160    69405.87   97       433.7867     13.97    234.41   Thu Jan 24 01:53:38 2013
  320   111192.66   92       347.4771     17.44    463.18   Thu Jan 24 01:53:56 2013
  640   158044.01   86       246.9438     24.54    920.38   Thu Jan 24 01:54:20 2013
 1280   199763.07   87       156.0649     38.83   1833.75   Thu Jan 24 01:54:59 2013
 2560   229933.30   81        89.8177     67.47   3665.30   Thu Jan 24 01:56:07 2013
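
For reading the table: the jobs/min/task column appears to be jobs/min
divided by Tasks, and the rows above agree with that to within rounding.
A small sketch of that arithmetic, plus a helper for the comparison
against a virgin kernel that has not been done yet. Both function names
are hypothetical and not part of aim7:

```c
/* Per-task throughput: total jobs/min divided by the task count. */
double jobs_per_min_per_task(double jobs_per_min, int tasks)
{
	return jobs_per_min / (double)tasks;
}

/*
 * Percent change of a patched result relative to a baseline result.
 * A negative value would indicate a regression.
 */
double percent_change(double baseline, double patched)
{
	return (patched - baseline) / baseline * 100.0;
}
```

For example, the 2560-task row gives 229933.30 / 2560, which is about
89.8177, the value printed in the table.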

And this is my cpu info:
processor	: 63
cpu		: POWER7 (raw), altivec supported
clock		: 8.388608MHz
revision	: 2.3 (pvr 003f 0203)

Regards,
Michael Wang

> 
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5609,11 +5609,39 @@ static void update_top_cache_domain(int
>>  static int sbm_max_level;
>>  DEFINE_PER_CPU_SHARED_ALIGNED(struct sched_balance_map, sbm_array);
>>
>> +static void debug_sched_balance_map(int cpu)
>> +{
>> +	int i, type, level = 0;
>> +	struct sched_balance_map *sbm = &per_cpu(sbm_array, cpu);
>> +
>> +	printk("WYT: sbm of cpu %d\n", cpu);
>> +
>> +	for (type = 0; type < SBM_MAX_TYPE; type++) {
>> +		if (type == SBM_EXEC_TYPE)
>> +			printk("WYT: \t exec map\n");
>> +		else if (type == SBM_FORK_TYPE)
>> +			printk("WYT: \t fork map\n");
>> +		else if (type == SBM_WAKE_TYPE)
>> +			printk("WYT: \t wake map\n");
>> +
>> +		for (level = 0; level < sbm_max_level; level++) {
>> +			if (sbm->sd[type][level])
>> +				printk("WYT: \t\t sd %x, idx %d, level %d, weight %d\n", sbm->sd[type][level], level, sbm->sd[type][level]->level, sbm->sd[type][level]->span_weight);
>> +		}
>> +	}
>> +
>> +	printk("WYT: \t affine map\n");
>> +
>> +	for_each_possible_cpu(i) {
>> +		if (sbm->affine_map[i])
>> +			printk("WYT: \t\t affine with cpu %x in sd %x, weight %d\n", i, sbm->affine_map[i], sbm->affine_map[i]->span_weight);
>> +	}
>> +}
>> +
>>  static void build_sched_balance_map(int cpu)
>>  {
>>  	struct sched_balance_map *sbm = &per_cpu(sbm_array, cpu);
>>  	struct sched_domain *sd = cpu_rq(cpu)->sd;
>> -	struct sched_domain *top_sd = NULL;
>>  	int i, type, level = 0;
>>
>>  	memset(sbm->top_level, 0, sizeof((*sbm).top_level));
>> @@ -5656,11 +5684,9 @@ static void build_sched_balance_map(int
>>  	 * fill the hole to get lower level sd easily.
>>  	 */
>>  	for (type = 0; type < SBM_MAX_TYPE; type++) {
>> -		level = sbm->top_level[type];
>> -		top_sd = sbm->sd[type][level];
>> -		if ((++level != sbm_max_level) && top_sd) {
>> -			for (; level < sbm_max_level; level++)
>> -				sbm->sd[type][level] = top_sd;
>> +		for (level = 1; level < sbm_max_level; level++) {
>> +			if (!sbm->sd[type][level])
>> +				sbm->sd[type][level] = sbm->sd[type][level - 1];
>>  		}
>>  	}
>>  }
>> @@ -5719,6 +5745,7 @@ cpu_attach_domain(struct sched_domain *s
>>  	 * destroy_sched_domains() already do the work.
>>  	 */
>>  	build_sched_balance_map(cpu);
>> +//MIKE	debug_sched_balance_map(cpu);
>>  	rcu_assign_pointer(rq->sbm, sbm);
>>  }
>>
>> @@ -6220,7 +6247,7 @@ sd_numa_init(struct sched_domain_topolog
>>  					| 1*SD_BALANCE_NEWIDLE
>>  					| 0*SD_BALANCE_EXEC
>>  					| 0*SD_BALANCE_FORK
>> -					| 0*SD_BALANCE_WAKE
>> +					| 1*SD_BALANCE_WAKE
>>  					| 0*SD_WAKE_AFFINE
>>  					| 0*SD_SHARE_CPUPOWER
>>  					| 0*SD_SHARE_PKG_RESOURCES
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3312,7 +3312,7 @@ static int select_idle_sibling(struct ta
>>  static int
>>  select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
>>  {
>> -	struct sched_domain *sd = NULL;
>> +	struct sched_domain *sd = NULL, *tmp;
>>  	int cpu = smp_processor_id();
>>  	int prev_cpu = task_cpu(p);
>>  	int new_cpu = cpu;
>> @@ -3376,31 +3376,45 @@ select_task_rq_fair(struct task_struct *
>>
>>  balance_path:
>>  	new_cpu = (sd_flag & SD_BALANCE_WAKE) ? prev_cpu : cpu;
>> -	sd = sbm->sd[type][sbm->top_level[type]];
>> +	sd = tmp = sbm->sd[type][sbm->top_level[type]];
>>
>>  	while (sd) {
>>  		int load_idx = sd->forkexec_idx;
>> -		struct sched_group *sg = NULL;
>> +		struct sched_group *group;
>> +		int weight;
>> +
>> +		if (!(sd->flags & sd_flag)) {
>> +			sd = sd->child;
>> +			continue;
>> +		}
>>
>>  		if (sd_flag & SD_BALANCE_WAKE)
>>  			load_idx = sd->wake_idx;
>>
>> -		sg = find_idlest_group(sd, p, cpu, load_idx);
>> -		if (!sg)
>> -			goto next_sd;
>> -
>> -		new_cpu = find_idlest_cpu(sg, p, cpu);
>> -		if (new_cpu != -1)
>> -			cpu = new_cpu;
>> -next_sd:
>> -		if (!sd->level)
>> -			break;
>> -
>> -		sbm = cpu_rq(cpu)->sbm;
>> -		if (!sbm)
>> -			break;
>> -
>> -		sd = sbm->sd[type][sd->level - 1];
> 
> Maybe we could test part by part? I'm planning to write another debug
> patch that lets us compare just part of the two approaches; I will send
> it to you when I've finished it.
> 
> Regards,
> Michael Wang
> 
>> +		group = find_idlest_group(sd, p, cpu, load_idx);
>> +		if (!group) {
>> +			sd = sd->child;
>> +			continue;
>> +		}
>> +
>> +		new_cpu = find_idlest_cpu(group, p, cpu);
>> +		if (new_cpu == -1 || new_cpu == cpu) {
>> +			/* Now try balancing at a lower domain level of cpu */
>> +			sd = sd->child;
>> +			continue;
>> +		}
>> +
>> +		/* Now try balancing at a lower domain level of new_cpu */
>> +		cpu = new_cpu;
>> +		weight = sd->span_weight;
>> +		sd = NULL;
>> +		for_each_domain(cpu, tmp) {
>> +			if (weight <= tmp->span_weight)
>> +				break;
>> +			if (tmp->flags & sd_flag)
>> +				sd = tmp;
>> +		}
>> +		/* while loop will break here if sd == NULL */
>>  	}
>>
>>  unlock:
>>
>>
> 

