linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jan H. Schönherr" <jschoenh@amazon.de>
To: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>
Cc: "Jan H. Schönherr" <jschoenh@amazon.de>, linux-kernel@vger.kernel.org
Subject: [RFC 23/60] cosched: Add core data structures for coscheduling
Date: Fri,  7 Sep 2018 23:40:10 +0200	[thread overview]
Message-ID: <20180907214047.26914-24-jschoenh@amazon.de> (raw)
In-Reply-To: <20180907214047.26914-1-jschoenh@amazon.de>

For coscheduling, we will set up hierarchical runqueues that correspond
to larger fractions of the system. They will be organized along the
scheduling domains.

Although it is overkill at the moment, we keep a full struct rq per
scheduling domain. The existing code is so used to pass struct rq
around, that it would be a large refactoring effort to concentrate the
actually needed fields of struct rq in a smaller structure. Also, we
will probably need more fields in the future.

Extend struct rq and struct cfs_rq with extra structs encapsulating
all purely coscheduling related fields: struct sdrq_data and struct
sdrq, respectively.

Extend struct task_group, so that we can keep track of the hierarchy
and how this task_group should behave. We can now distinguish between
regular task groups and scheduled task groups. The former work as usual,
while the latter actively utilize the hierarchical aspect and represent
SEs of a lower hierarchy level at a higher level within the parent task
group, causing SEs at the lower level to get coscheduled.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
---
 kernel/sched/sched.h | 151 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b36e61914a42..1bce6061ac45 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -368,6 +368,27 @@ struct task_group {
 #endif
 #endif
 
+#ifdef CONFIG_COSCHEDULING
+	/*
+	 * References the top of this task-group's RQ hierarchy. This is
+	 * static and does not change. It is used as entry-point to traverse
+	 * the structure on creation/destruction.
+	 */
+	struct cfs_rq *top_cfsrq;
+
+	/* Protects .scheduled from concurrent modifications */
+	raw_spinlock_t lock;
+
+	/*
+	 * Indicates the level at which this task group is scheduled:
+	 * 0 == bottom level == regular task group
+	 * >0 == scheduled task group
+	 *
+	 * Modifications are (for now) requested by the user.
+	 */
+	int scheduled;
+#endif
+
 #ifdef CONFIG_RT_GROUP_SCHED
 	struct sched_rt_entity	**rt_se;
 	struct rt_rq		**rt_rq;
@@ -485,6 +506,120 @@ struct rq_flags {
 #endif
 };
 
+#ifdef CONFIG_COSCHEDULING
+struct sdrq_data {
+	/*
+	 * Leader for this part of the hierarchy.
+	 *
+	 * The leader CPU is responsible for scheduling decisions and any
+	 * required maintenance.
+	 *
+	 * Leadership is variable and may be taken while the hierarchy level
+	 * is locked.
+	 */
+	int leader;
+
+	/* Height within the hierarchy: leaf == 0, parent == child + 1 */
+	int level;
+
+	/* Parent runqueue */
+	struct sdrq_data *parent;
+
+	/*
+	 * SD-RQ from which SEs get selected.
+	 *
+	 * This is set by the parent's leader and defines the current
+	 * schedulable subset of tasks within this part of the hierarchy.
+	 */
+	struct sdrq *current_sdrq;
+
+	/* CPUs making up this part of the hierarchy */
+	const struct cpumask *span;
+
+	/* Number of CPUs within this part of the hierarchy */
+	unsigned int span_weight;
+
+	/*
+	 * Determines if the corresponding SD-RQs are to be allocated on
+	 * a specific NUMA node.
+	 */
+	int numa_node;
+
+	/* Storage for rq_flags, when we need to lock multiple runqueues. */
+	struct rq_flags rf;
+
+	/* Do we have the parent runqueue locked? */
+	bool parent_locked;
+
+	/*
+	 * In case the CPU has been forced into idle, the idle_se references the
+	 * scheduling entity responsible for this. Only used on bottom level at
+	 * the moment.
+	 */
+	struct sched_entity *idle_se;
+};
+
+struct sdrq {
+	/* Common information for all SD-RQs at the same position */
+	struct sdrq_data *data;
+
+	/* SD hierarchy */
+	struct sdrq *sd_parent;		/* parent of this node */
+	struct list_head children;	/* children of this node */
+	struct list_head siblings;	/* link to parent's children list */
+
+	/*
+	 * is_root == 1 => link via tg_se into tg_parent->cfs_rq
+	 * is_root == 0 => link via sd_parent->sd_se into sd_parent->cfs_rq
+	 */
+	int is_root;
+
+	/*
+	 * SD-SE: an SE to be enqueued in .cfs_rq to represent this
+	 * node's children in order to make their members schedulable.
+	 *
+	 * In the bottom layer .sd_se has to be NULL for various if-conditions
+	 * and loop terminations. On other layers .sd_se points to .__sd_se.
+	 *
+	 * .__sd_se is unused within the bottom layer.
+	 */
+	struct sched_entity *sd_se;
+	struct sched_entity __sd_se;
+
+	/* Accumulated load of all SD-children */
+	atomic64_t sdse_load;
+
+	/*
+	 * Reference to the SD-runqueue at the same hierarchical position
+	 * in the parent task group.
+	 */
+	struct sdrq *tg_parent;
+	struct list_head tg_children;	/* child TGs of this node */
+	struct list_head tg_siblings;	/* link to parent's children list */
+
+	/*
+	 * TG-SE: a SE to be enqueued in .tg_parent->cfs_rq.
+	 *
+	 * In the case of a regular TG it is enqueued if .cfs_rq is not empty.
+	 * In the case of a scheduled TG it is enqueued if .cfs_rq is not empty
+	 * and this SD-RQ acts as a root SD within its TG.
+	 *
+	 * .tg_se takes over the role of .cfs_rq->my_se and points to the same
+	 * SE over its life-time, while .cfs_rq->my_se now points to either the
+	 * TG-SE or the SD-SE (or NULL in the parts of the root task group).
+	 */
+	struct sched_entity *tg_se;
+
+	/*
+	 * CFS runqueue of this SD runqueue.
+	 *
+	 * FIXME: Now that struct sdrq is embedded in struct cfs_rq, we could
+	 *        drop this.
+	 */
+	struct cfs_rq *cfs_rq;
+};
+#endif /* CONFIG_COSCHEDULING */
+
 /* CFS-related fields in a runqueue */
 struct cfs_rq {
 	struct load_weight	load;
@@ -544,6 +679,12 @@ struct cfs_rq {
 	u64			last_h_load_update;
 	struct sched_entity	*h_load_next;
 #endif /* CONFIG_FAIR_GROUP_SCHED */
+
+#ifdef CONFIG_COSCHEDULING
+	/* Extra info needed for hierarchical scheduling */
+	struct sdrq sdrq;
+#endif
+
 #endif /* CONFIG_SMP */
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
@@ -817,6 +958,11 @@ struct rq {
 	struct rt_rq		rt;
 	struct dl_rq		dl;
 
+#ifdef CONFIG_COSCHEDULING
+	/* Extra information for hierarchical scheduling */
+	struct sdrq_data sdrq_data;
+#endif
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* list of leaf cfs_rq on this CPU: */
 	struct list_head	leaf_cfs_rq_list;
@@ -935,6 +1081,11 @@ struct sched_domain_shared {
 	atomic_t	ref;
 	atomic_t	nr_busy_cpus;
 	int		has_idle_cores;
+
+#ifdef CONFIG_COSCHEDULING
+	/* Top level runqueue for this sched_domain */
+	struct rq rq;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
-- 
2.9.3.1.gcba166c.dirty


  parent reply	other threads:[~2018-09-07 21:46 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-07 21:39 [RFC 00/60] Coscheduling for Linux Jan H. Schönherr
2018-09-07 21:39 ` [RFC 01/60] sched: Store task_group->se[] pointers as part of cfs_rq Jan H. Schönherr
2018-09-07 21:39 ` [RFC 02/60] sched: Introduce set_entity_cfs() to place a SE into a certain CFS runqueue Jan H. Schönherr
2018-09-07 21:39 ` [RFC 03/60] sched: Setup sched_domain_shared for all sched_domains Jan H. Schönherr
2018-09-07 21:39 ` [RFC 04/60] sched: Replace sd_numa_mask() hack with something sane Jan H. Schönherr
2018-09-07 21:39 ` [RFC 05/60] sched: Allow to retrieve the sched_domain_topology Jan H. Schönherr
2018-09-07 21:39 ` [RFC 06/60] sched: Add a lock-free variant of resched_cpu() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 07/60] sched: Reduce dependencies of init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 08/60] sched: Move init_entity_runnable_average() into init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 09/60] sched: Do not require a CFS in init_tg_cfs_entry() Jan H. Schönherr
2018-09-07 21:39 ` [RFC 10/60] sched: Use parent_entity() in more places Jan H. Schönherr
2018-09-07 21:39 ` [RFC 11/60] locking/lockdep: Increase number of supported lockdep subclasses Jan H. Schönherr
2018-09-07 21:39 ` [RFC 12/60] locking/lockdep: Make cookie generator accessible Jan H. Schönherr
2018-09-07 21:40 ` [RFC 13/60] sched: Remove useless checks for root task-group Jan H. Schönherr
2018-09-07 21:40 ` [RFC 14/60] sched: Refactor sync_throttle() to accept a CFS runqueue as argument Jan H. Schönherr
2018-09-07 21:40 ` [RFC 15/60] sched: Introduce parent_cfs_rq() and use it Jan H. Schönherr
2018-09-07 21:40 ` [RFC 16/60] sched: Preparatory code movement Jan H. Schönherr
2018-09-07 21:40 ` [RFC 17/60] sched: Introduce and use generic task group CFS traversal functions Jan H. Schönherr
2018-09-07 21:40 ` [RFC 18/60] sched: Fix return value of SCHED_WARN_ON() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 19/60] sched: Add entity variants of enqueue_task_fair() and dequeue_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 20/60] sched: Let {en,de}queue_entity_fair() work with a varying amount of tasks Jan H. Schönherr
2018-09-07 21:40 ` [RFC 21/60] sched: Add entity variants of put_prev_task_fair() and set_curr_task_fair() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 22/60] cosched: Add config option for coscheduling support Jan H. Schönherr
2018-09-07 21:40 ` Jan H. Schönherr [this message]
2018-09-07 21:40 ` [RFC 24/60] cosched: Do minimal pre-SMP coscheduler initialization Jan H. Schönherr
2018-09-07 21:40 ` [RFC 25/60] cosched: Prepare scheduling domain topology for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 26/60] cosched: Construct runqueue hierarchy Jan H. Schönherr
2018-09-07 21:40 ` [RFC 27/60] cosched: Add some small helper functions for later use Jan H. Schönherr
2018-09-07 21:40 ` [RFC 28/60] cosched: Add is_sd_se() to distinguish SD-SEs from TG-SEs Jan H. Schönherr
2018-09-07 21:40 ` [RFC 29/60] cosched: Adjust code reflecting on the total number of CFS tasks on a CPU Jan H. Schönherr
2018-09-07 21:40 ` [RFC 30/60] cosched: Disallow share modification on task groups for now Jan H. Schönherr
2018-09-07 21:40 ` [RFC 31/60] cosched: Don't disable idle tick " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 32/60] cosched: Specialize parent_cfs_rq() for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 33/60] cosched: Allow resched_curr() to be called " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 34/60] cosched: Add rq_of() variants for different use cases Jan H. Schönherr
2018-09-07 21:40 ` [RFC 35/60] cosched: Adjust rq_lock() functions to work with hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 36/60] cosched: Use hrq_of() for rq_clock() and rq_clock_task() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 37/60] cosched: Use hrq_of() for (indirect calls to) ___update_load_sum() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 38/60] cosched: Skip updates on non-CPU runqueues in cfs_rq_util_change() Jan H. Schönherr
2018-09-07 21:40 ` [RFC 39/60] cosched: Adjust task group management for hierarchical runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 40/60] cosched: Keep track of task group hierarchy within each SD-RQ Jan H. Schönherr
2018-09-07 21:40 ` [RFC 41/60] cosched: Introduce locking for leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 42/60] cosched: Introduce locking for (mostly) enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 43/60] cosched: Add for_each_sched_entity() variant for owned entities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 44/60] cosched: Perform various rq_of() adjustments in scheduler code Jan H. Schönherr
2018-09-07 21:40 ` [RFC 45/60] cosched: Continue to account all load on per-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 46/60] cosched: Warn on throttling attempts of non-CPU runqueues Jan H. Schönherr
2018-09-07 21:40 ` [RFC 47/60] cosched: Adjust SE traversal and locking for common leader activities Jan H. Schönherr
2018-09-07 21:40 ` [RFC 48/60] cosched: Adjust SE traversal and locking for yielding and buddies Jan H. Schönherr
2018-09-07 21:40 ` [RFC 49/60] cosched: Adjust locking for enqueuing and dequeueing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 50/60] cosched: Propagate load changes across hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 51/60] cosched: Hacky work-around to avoid observing zero weight SD-SE Jan H. Schönherr
2018-09-07 21:40 ` [RFC 52/60] cosched: Support SD-SEs in enqueuing and dequeuing Jan H. Schönherr
2018-09-07 21:40 ` [RFC 53/60] cosched: Prevent balancing related functions from crossing hierarchy levels Jan H. Schönherr
2018-09-07 21:40 ` [RFC 54/60] cosched: Support idling in a coscheduled set Jan H. Schönherr
2018-09-07 21:40 ` [RFC 55/60] cosched: Adjust task selection for coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 56/60] cosched: Adjust wakeup preemption rules " Jan H. Schönherr
2018-09-07 21:40 ` [RFC 57/60] cosched: Add sysfs interface to configure coscheduling on cgroups Jan H. Schönherr
2018-09-07 21:40 ` [RFC 58/60] cosched: Switch runqueues between regular scheduling and coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 59/60] cosched: Handle non-atomicity during switches to and from coscheduling Jan H. Schönherr
2018-09-07 21:40 ` [RFC 60/60] cosched: Add command line argument to enable coscheduling Jan H. Schönherr
2018-09-10  2:50   ` Randy Dunlap
2018-09-12  0:24 ` [RFC 00/60] Coscheduling for Linux Nishanth Aravamudan
2018-09-12 19:34   ` Jan H. Schönherr
2018-09-12 23:15     ` Nishanth Aravamudan
2018-09-13 11:31       ` Jan H. Schönherr
2018-09-13 18:16         ` Nishanth Aravamudan
2018-09-12 23:18     ` Jan H. Schönherr
2018-09-13  3:05       ` Nishanth Aravamudan
2018-09-13 19:19 ` [RFC 61/60] cosched: Accumulated fixes and improvements Jan H. Schönherr
2018-09-26 17:25   ` Nishanth Aravamudan
2018-09-26 21:05     ` Nishanth Aravamudan
2018-10-01  9:13       ` Jan H. Schönherr
2018-09-14 11:12 ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-09-14 16:25   ` Jan H. Schönherr
2018-09-15  8:48     ` Task group cleanups and optimizations (was: Re: [RFC 00/60] Coscheduling for Linux) Jan H. Schönherr
2018-09-17  9:48       ` Peter Zijlstra
2018-09-18 13:22         ` Jan H. Schönherr
2018-09-18 13:38           ` Peter Zijlstra
2018-09-18 13:54             ` Jan H. Schönherr
2018-09-18 13:42           ` Peter Zijlstra
2018-09-18 14:35           ` Rik van Riel
2018-09-19  9:23             ` Jan H. Schönherr
2018-11-23 16:51           ` Frederic Weisbecker
2018-12-04 13:23             ` Jan H. Schönherr
2018-09-17 11:33     ` [RFC 00/60] Coscheduling for Linux Peter Zijlstra
2018-11-02 22:13       ` Nishanth Aravamudan
2018-09-17 12:25     ` Peter Zijlstra
2018-09-26  9:58       ` Jan H. Schönherr
2018-09-27 18:36         ` Subhra Mazumdar
2018-11-23 16:29           ` Frederic Weisbecker
2018-09-17 13:37     ` Peter Zijlstra
2018-09-26  9:35       ` Jan H. Schönherr
2018-09-18 14:40     ` Rik van Riel
2018-09-24 15:23       ` Jan H. Schönherr
2018-09-24 18:01         ` Rik van Riel
2018-09-18  0:33 ` Subhra Mazumdar
2018-09-18 11:44   ` Jan H. Schönherr
2018-09-19 21:53     ` Subhra Mazumdar
2018-09-24 15:43       ` Jan H. Schönherr
2018-09-27 18:12         ` Subhra Mazumdar
2018-10-04 13:29 ` Jon Masters
2018-10-17  2:09 ` Frederic Weisbecker
2018-10-19 11:40   ` Jan H. Schönherr
2018-10-19 14:52     ` Frederic Weisbecker
2018-10-19 15:16     ` Rik van Riel
2018-10-19 15:33       ` Frederic Weisbecker
2018-10-19 15:45         ` Rik van Riel
2018-10-19 19:07           ` Jan H. Schönherr
2018-10-19  0:26 ` Subhra Mazumdar
2018-10-26 23:44   ` Jan H. Schönherr
2018-10-29 22:52     ` Subhra Mazumdar
2018-10-26 23:05 ` Subhra Mazumdar
2018-10-27  0:07   ` Jan H. Schönherr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180907214047.26914-24-jschoenh@amazon.de \
    --to=jschoenh@amazon.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).