* [RFC] Splitting scheduler into two halves
@ 2014-02-28  2:13 Du, Yuyang
  2014-02-28 10:29 ` Morten Rasmussen
  0 siblings, 1 reply; 5+ messages in thread
From: Du, Yuyang @ 2014-02-28  2:13 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: linux-kernel, linux-pm, Van De Ven, Arjan, Brown, Len, Wysocki, Rafael J

Hi Peter/Ingo and all,

With the advent of more cores and heterogeneous architectures, the scheduler is required to be more complex (power efficiency) and diverse (big.LITTLE). For the scheduler to address that challenge as a whole is costly but not necessary. This proposal argues that the scheduler be split into two parts: a top half (task scheduling) and a bottom half (load balance). Let the bottom half take charge of the incoming requirements.

The two halves are rather orthogonal in functionality. Task scheduling (top half) seeks *ONE* CPU on which to run tasks fairly (priority included), while load balance (bottom half) aims across *ALL* CPUs to maximize the throughput of the computing power. The goal of task scheduling is unique and clear, and CFS and RT in that part approach it exactly. Load balance, however, is constrained to meet more goals: to name a few, performance (throughput/responsiveness), power consumption, architecture differences, etc. These are often hard to achieve because they may conflict and are difficult to estimate and plan for. So shall we declare the independence of the two and give them the freedom to pursue their own "happiness"?

We take an incremental development approach. As a starting point, we did three things (without changing a single line of functional code):
	1)	Move load balance out of fair.c into load_balance.c (~3000 lines of code). As a result, fair.c/rt.c and load_balance.c have very little intersection.
	2)	Define struct sched_lb_class, which consists of the following members, to encapsulate the load-balance entry points.
		a.	const struct sched_lb_class *next;
		b.	int (*fork_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		c.	int (*exec_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		d.	int (*wakeup_balance) (struct task_struct *p, int sd_flags, int wake_flags);
		e.	void (*idle_balance) (int this_cpu, struct rq *this_rq);
		f.	void (*periodic_rebalance) (int cpu, enum cpu_idle_type idle);
		g.	void (*nohz_idle_balance) (int this_cpu, enum cpu_idle_type idle);
		h.	void (*start_periodic_balance) (struct rq *rq, int cpu);
		i.	void (*check_nohz_idle_balance) (struct rq *rq, int cpu);
	3)	Insert another layer of indirection to wrap the functions implemented in sched_lb_class. Implement a default load-balance class that is simply the previous load-balance code.

The next step is to continue redesigning and refactoring to pave the way toward more powerful and diverse load balancing. More importantly, this RFC solicits discussion to get early feedback on this big proposed change.

Thanks,
Yuyang

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC] Splitting scheduler into two halves
  2014-02-28  2:13 [RFC] Splitting scheduler into two halves Du, Yuyang
@ 2014-02-28 10:29 ` Morten Rasmussen
  2014-02-28 11:44   ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Morten Rasmussen @ 2014-02-28 10:29 UTC (permalink / raw)
  To: Du, Yuyang
  Cc: Ingo Molnar, Peter Zijlstra, linux-kernel, linux-pm, Van De Ven,
	Arjan, Brown, Len, Wysocki, Rafael J

Hi Yuyang,

On Fri, Feb 28, 2014 at 02:13:32AM +0000, Du, Yuyang wrote:
> Hi Peter/Ingo and all,
> 
> With the advent of more cores and heterogeneous architectures, the
> scheduler is required to be more complex (power efficiency) and
> diverse (big.LITTLE). For the scheduler to address that challenge as a
> whole is costly but not necessary. This proposal argues that the
> scheduler be split into two parts: a top half (task scheduling) and a
> bottom half (load balance). Let the bottom half take charge of the
> incoming requirements.
> 
> The two halves are rather orthogonal in functionality. Task
> scheduling (top half) seeks *ONE* CPU on which to run tasks fairly
> (priority included), while load balance (bottom half) aims across
> *ALL* CPUs to maximize the throughput of the computing power. The
> goal of task scheduling is unique and clear, and CFS and RT in that
> part approach it exactly. Load balance, however, is constrained to
> meet more goals: to name a few, performance
> (throughput/responsiveness), power consumption, architecture
> differences, etc. These are often hard to achieve because they may
> conflict and are difficult to estimate and plan for. So shall we
> declare the independence of the two and give them the freedom to
> pursue their own "happiness"?

Interesting proposal. While we could declare the load-balance function
independent from the rest of CFS, I don't think they can be separated as
cleanly as your proposal suggests.

If I understand your proposal correctly, you are proposing to have a
pluggable scheduler where it is possible to have many different
load-balance (bottom half) implementations. These may require different
statistics and metrics for their load-balancing heuristics that need to
be updated by the task scheduling (top half). Having worked with
big.LITTLE systems for quite a while, I know this is indeed the case if
you want to schedule more efficiently for big.LITTLE.

For heterogeneous systems and energy awareness in general, the current
load tracking isn't very good for low utilization situations. Fixing
that would mean changes in both halves. If you go for extreme optimizations
for heterogeneous systems, you may even want the top half to keep track
of light and heavy tasks so you don't have to search through the
runqueues as part of load-balance in the bottom half to try to match
tasks to an appropriate cpu. I'm not saying that the latter is a
requirement, but just an example of things that people may try to do.

If you don't allow stuff to be added to the top half, there isn't much
room to do diverse implementations in the bottom half.

The current sched_class abstraction already has the issue of not
abstracting everything: functions in core.c manipulate data inside
CFS directly.

> We take an incremental development approach. As a starting point, we
> did three things (without changing a single line of functional code):
> 	1)	Move load balance out of fair.c into load_balance.c
> 	(~3000 lines of code). As a result, fair.c/rt.c and
> 	load_balance.c have very little intersection.
> 	2)	Define struct sched_lb_class, which consists of the
> 	following members, to encapsulate the load-balance entry points.
> 		a.	const struct sched_lb_class *next;
> 		b.	int (*fork_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		c.	int (*exec_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		d.	int (*wakeup_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		e.	void (*idle_balance) (int this_cpu, struct rq *this_rq);
> 		f.	void (*periodic_rebalance) (int cpu, enum cpu_idle_type idle);
> 		g.	void (*nohz_idle_balance) (int this_cpu, enum cpu_idle_type idle);
> 		h.	void (*start_periodic_balance) (struct rq *rq, int cpu);
> 		i.	void (*check_nohz_idle_balance) (struct rq *rq, int cpu);
> 	3)	Insert another layer of indirection to wrap the
> 	functions implemented in sched_lb_class. Implement a default
> 	load-balance class that is simply the previous load-balance code.
> 
> The next step is to continue redesigning and refactoring to pave the
> way toward more powerful and diverse load balancing. More importantly,
> this RFC solicits discussion to get early feedback on this big
> proposed change.

Is sched_lb_class supposed to implement load-balancing for all
sched_class'es (rt, deadline, and fair) or just fair?

Morten


* Re: [RFC] Splitting scheduler into two halves
  2014-02-28 10:29 ` Morten Rasmussen
@ 2014-02-28 11:44   ` Peter Zijlstra
  2014-02-28 12:01     ` Peter Zijlstra
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2014-02-28 11:44 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Du, Yuyang, Ingo Molnar, linux-kernel, linux-pm, Van De Ven,
	Arjan, Brown, Len, Wysocki, Rafael J

On Fri, Feb 28, 2014 at 10:29:32AM +0000, Morten Rasmussen wrote:
> If I understand your proposal correctly, you are proposing to have a
> pluggable scheduler where it is possible to have many different
> load-balance (bottom half) implementations.

Yeah, that's not _ever_ going to happen. We've had that discussion many
times, use your favourite search engine.


* Re: [RFC] Splitting scheduler into two halves
  2014-02-28 11:44   ` Peter Zijlstra
@ 2014-02-28 12:01     ` Peter Zijlstra
  2014-03-03  9:56       ` Du, Yuyang
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Zijlstra @ 2014-02-28 12:01 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Du, Yuyang, Ingo Molnar, linux-kernel, linux-pm, Van De Ven,
	Arjan, Brown, Len, Wysocki, Rafael J

On Fri, Feb 28, 2014 at 12:44:59PM +0100, Peter Zijlstra wrote:
> On Fri, Feb 28, 2014 at 10:29:32AM +0000, Morten Rasmussen wrote:
> > If I understand your proposal correctly, you are proposing to have a
> > pluggable scheduler where it is possible to have many different
> > load-balance (bottom half) implementations.
> 
> Yeah, that's not _ever_ going to happen. We've had that discussion many
> times, use your favourite search engine.

*groan*, the version in my inbox to which I replied earlier seems to be
private; and I'm not CC'd on the one that went to the list.


---
Please use a sane MUA and teach it to wrap at around ~78 chars.

On Fri, Feb 28, 2014 at 02:13:32AM +0000, Du, Yuyang wrote:
> Hi Peter/Ingo and all,
> 
> With the advent of more cores and heterogeneous architectures, the
> scheduler is required to be more complex (power efficiency) and
> diverse (big.LITTLE). For the scheduler to address that challenge as a
> whole is costly but not necessary. This proposal argues that the
> scheduler be split into two parts: a top half (task scheduling) and a
> bottom half (load balance). Let the bottom half take charge of the
> incoming requirements.

This is already so.

> The two halves are rather orthogonal in functionality. Task
> scheduling (top half) seeks *ONE* CPU on which to run tasks fairly
> (priority included), while load balance (bottom half) aims across
> *ALL* CPUs to maximize the throughput of the computing power. The
> goal of task scheduling is unique and clear, and CFS and RT in that
> part approach it exactly. Load balance, however, is constrained to
> meet more goals: to name a few, performance
> (throughput/responsiveness), power consumption, architecture
> differences, etc. These are often hard to achieve because they may
> conflict and are difficult to estimate and plan for. So shall we
> declare the independence of the two and give them the freedom to
> pursue their own "happiness"?

You cannot treat them completely independent, as fairness must extend
across CPUs. And there's good reasons to integrate them further still;
our current min_vruntime is a poor substitute for the per-cpu zero-lag
point. But with some of the runtime tracking we did for SMP-cgroup we
can approximate the global zero-lag point.

Using a global zero-lag point has advantages in that task latency is
better preserved in the face of migrations.

So no; you cannot completely separate them. But even if you could;
I don't see the point in doing so.

> We take an incremental development approach. As a starting point, we did three things (without changing a single line of functional code):
> 	1)	Move load balance out of fair.c into load_balance.c
> 	(~3000 lines of code). As a result, fair.c/rt.c and
> 	load_balance.c have very little intersection.

You're very much overlooking the fact that RT and DL have their own
SMP logic. So the sched_class interface must very much include the
SMP logic.

The best you can try is creating fair_smp.c, but I'm not seeing how
that's going to be anything but pure code movement. You're not going to
suddenly make it all easier.

> 	2)	Define struct sched_lb_class, which consists of the following members, to encapsulate the load-balance entry points.
> 		a.	const struct sched_lb_class *next;
> 		b.	int (*fork_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		c.	int (*exec_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		d.	int (*wakeup_balance) (struct task_struct *p, int sd_flags, int wake_flags);
> 		e.	void (*idle_balance) (int this_cpu, struct rq *this_rq);
> 		f.	void (*periodic_rebalance) (int cpu, enum cpu_idle_type idle);
> 		g.	void (*nohz_idle_balance) (int this_cpu, enum cpu_idle_type idle);
> 		h.	void (*start_periodic_balance) (struct rq *rq, int cpu);
> 		i.	void (*check_nohz_idle_balance) (struct rq *rq, int cpu);

No point in doing that; as there will only ever be the one consumer.

> 	3)	Insert another layer of indirection to wrap the
> 	functions implemented in sched_lb_class. Implement a default
> 	load-balance class that is simply the previous load-balance code.

Every problem in CS can be solved by another layer of abstraction;
except for the problem of too many layers.

> The next step is to continue redesigning and refactoring to pave the
> way toward more powerful and diverse load balancing. More importantly,
> this RFC solicits discussion to get early feedback on this big
> proposed change.

I'm not seeing the point. Abstraction and indirection for a single user
are bloody pointless.


* RE: [RFC] Splitting scheduler into two halves
  2014-02-28 12:01     ` Peter Zijlstra
@ 2014-03-03  9:56       ` Du, Yuyang
  0 siblings, 0 replies; 5+ messages in thread
From: Du, Yuyang @ 2014-03-03  9:56 UTC (permalink / raw)
  To: Peter Zijlstra, Morten Rasmussen
  Cc: Ingo Molnar, linux-kernel, linux-pm, Van De Ven, Arjan, Brown,
	Len, Wysocki, Rafael J

Well, my two cents in response (sorry, I will use a Linux style mail client next time):

1) The top half and bottom half do interact with each other. They share information; the top half provides some measurements/averages to the load balancer. The good news is that their operations don't interleave. Splitting them does not mean they must be entirely self-contained.

2) What I have done is a starting point. What really matters is good design from that starting point. I myself have been working on workload consolidation (or task packing, as you like) for a while, and I did intend to do a complete bottom half. Encapsulating the load balance in a class allows it to be replaced more easily. But even before this very first email, wholesale replacement was the last thing I wanted, even if I could do it, because it would not make "a better world".

Let's explore the design space a little bit. For load balance, its function is like:

Tasks -> be balanced -> on CPUs

Be more specific:

[ which_1 ] tasks -> be balanced -> on [ which_2 ] CPUs

Which_1 can be: fair or rt, by priority. ARM people may want "small/light" or "big/heavy", by load tracking (fortunately, we don't need that). Maybe others.

Which_2 can be: all, or some.

Which_1 * which_2 gives many combinations, each of which has this in common: XX tasks -> be balanced -> on YY CPUs (XX is defined by the bottom half but obtained from the top half; the balancing itself is done by the bottom half).

So it looks like we need a class, and we need a few implementations (one for each combination) that work together in a *hierarchy*.

I don't have the final answer yet; this is why I said I am continuing to redesign and refactor it. But it really looks like modularity can and should be applied here to help realize so complex a system.

Thanks,
Yuyang

