linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@fb.com, pjt@google.com,
	luto@amacapital.net, Mike Galbraith <efault@gmx.de>,
	torvalds@linux-foundation.org, Roman Gushchin <guro@fb.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Patrick Bellasi <patrick.bellasi@arm.com>,
	Waiman Long <longman@redhat.com>
Subject: [PATCH v13 05/11] cpuset: Add an error state to cpuset.sched.partition
Date: Fri, 12 Oct 2018 13:55:45 -0400	[thread overview]
Message-ID: <1539366951-8498-6-git-send-email-longman@redhat.com> (raw)
In-Reply-To: <1539366951-8498-1-git-send-email-longman@redhat.com>

Handling error returned by update_parent_subparts_cpumask() in
update_cpumasks_hier() is problematic as the states may become
inconsistent. To avoid that and increase flexibility in handling other
error cases, a new error state (-1) is added to the partition_root_state
flag. This new error state is set internally and user cannot write this
value to "cpuset.sched.partition".

In this error state, the partition root becomes an erroneous one. It is
no longer a real partition root, but the CS_CPU_EXCLUSIVE flag will
still be set as it can be changed back to a real one if appropriate
change happens later on.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 kernel/cgroup/cpuset.c | 146 +++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 123 insertions(+), 23 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 9c2e73c..12ec61d 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -153,10 +153,19 @@ struct cpuset {
  * Partition root states:
  *
  *   0 - not a partition root
+ *
  *   1 - partition root
+ *
+ *  -1 - erroneous partition root
+ *       None of the cpus in cpus_allowed can be put into the parent's
+ *       subparts_cpus. In this case, the cpuset is not a real partition
+ *       root anymore.  However, the CPU_EXCLUSIVE bit will still be set
+ *       and the cpuset can be restored back to a partition root if the
+ *       parent cpuset can give more CPUs back to this child cpuset.
  */
 #define PRS_DISABLED		0
 #define PRS_ENABLED		1
+#define PRS_ERROR		-1
 
 /*
  * Temporary cpumasks for working with partitions that are passed among
@@ -251,7 +260,7 @@ static inline int is_spread_slab(const struct cpuset *cs)
 
 static inline int is_partition_root(const struct cpuset *cs)
 {
-	return cs->partition_root_state;
+	return cs->partition_root_state > 0;
 }
 
 static struct cpuset top_cpuset = {
@@ -1015,9 +1024,12 @@ enum subparts_cmd {
  *
  * For partcmd_update, if the optional newmask is specified, the cpu
  * list is to be changed from cpus_allowed to newmask. Otherwise,
- * cpus_allowed is assumed to remain the same.  The function will return
- * 1 if changes to parent's subparts_cpus and effective_cpus happen or 0
- * otherwise. In case of error, an error code will be returned.
+ * cpus_allowed is assumed to remain the same. The cpuset should either
+ * be a partition root or an erroneous partition root. The partition root
+ * state may change if newmask is NULL and none of the requested CPUs can
+ * be granted by the parent. The function will return 1 if changes to
+ * parent's subparts_cpus and effective_cpus happen or 0 otherwise.
+ * Error code should only be returned when newmask is non-NULL.
  *
  * The partcmd_enable and partcmd_disable commands are used by
  * update_prstate(). The partcmd_update command is used by
@@ -1042,6 +1054,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 	struct cpuset *parent = parent_cs(cpuset);
 	int adding;	/* Moving cpus from effective_cpus to subparts_cpus */
 	int deleting;	/* Moving cpus from subparts_cpus to effective_cpus */
+	bool part_error = false;	/* Partition error? */
 
 	/*
 	 * The parent must be a partition root.
@@ -1108,13 +1121,48 @@ static int update_parent_subparts_cpumask(struct cpuset *cpuset, int cmd,
 		 * addmask = cpus_allowed & parent->effectiveb_cpus
 		 *
 		 * Note that parent's subparts_cpus may have been
-		 * pre-shrunk in case the CPUs granted to the parent
-		 * by the grandparent changes. So no deletion is needed.
+		 * pre-shrunk in case there is a change in the cpu list.
+		 * So no deletion is needed.
 		 */
 		adding = cpumask_and(tmp->addmask, cpuset->cpus_allowed,
 				     parent->effective_cpus);
-		if (cpumask_equal(tmp->addmask, parent->effective_cpus))
-			return -EINVAL;
+		part_error = cpumask_equal(tmp->addmask,
+					   parent->effective_cpus);
+	}
+
+	if (cmd == partcmd_update) {
+		int prev_prs = cpuset->partition_root_state;
+
+		/*
+		 * Check for possible transition between PRS_ENABLED
+		 * and PRS_ERROR.
+		 */
+		switch (cpuset->partition_root_state) {
+		case PRS_ENABLED:
+			if (part_error)
+				cpuset->partition_root_state = PRS_ERROR;
+			break;
+		case PRS_ERROR:
+			if (!part_error)
+				cpuset->partition_root_state = PRS_ENABLED;
+			break;
+		}
+		/*
+		 * Set part_error if previously in erroneous state.
+		 */
+		part_error = (prev_prs == PRS_ERROR);
+	}
+
+	if (!part_error && (cpuset->partition_root_state == PRS_ERROR))
+		return 0;	/* Nothing need to be done */
+
+	if (cpuset->partition_root_state == PRS_ERROR) {
+		/*
+		 * Remove all its cpus from parent's subparts_cpus.
+		 */
+		adding = false;
+		deleting = cpumask_and(tmp->delmask, cpuset->cpus_allowed,
+				       parent->subparts_cpus);
 	}
 
 	if (!adding && !deleting)
@@ -1166,28 +1214,21 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 	rcu_read_lock();
 	cpuset_for_each_descendant_pre(cp, pos_css, cs) {
 		struct cpuset *parent = parent_cs(cp);
-		bool cs_empty;
 
 		compute_effective_cpumask(tmp->new_cpus, cp, parent);
-		cs_empty = cpumask_empty(tmp->new_cpus);
-
-		/*
-		 * A partition root cannot have empty effective_cpus
-		 */
-		WARN_ON_ONCE(cs_empty && is_partition_root(cp));
 
 		/*
 		 * If it becomes empty, inherit the effective mask of the
 		 * parent, which is guaranteed to have some CPUs.
 		 */
-		if (is_in_v2_mode() && cs_empty)
+		if (is_in_v2_mode() && cpumask_empty(tmp->new_cpus))
 			cpumask_copy(tmp->new_cpus, parent->effective_cpus);
 
 		/*
 		 * Skip the whole subtree if the cpumask remains the same
 		 * and has no partition root state.
 		 */
-		if (!is_partition_root(cp) &&
+		if (!cp->partition_root_state &&
 		    cpumask_equal(tmp->new_cpus, cp->effective_cpus)) {
 			pos_css = css_rightmost_descendant(pos_css);
 			continue;
@@ -1199,11 +1240,39 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 		 * update_tasks_cpumask() again for tasks in the parent
 		 * cpuset if the parent's subparts_cpus changes.
 		 */
-		if ((cp != cs) && cp->partition_root_state &&
-		    update_parent_subparts_cpumask(cp, partcmd_update,
-						   NULL, tmp)) {
-			if (parent != &top_cpuset)
-				update_tasks_cpumask(parent);
+		if ((cp != cs) && cp->partition_root_state) {
+			switch (parent->partition_root_state) {
+			case PRS_DISABLED:
+				/*
+				 * If parent is not a partition root or an
+				 * erroneous partition root, clear the state
+				 * state and the CS_CPU_EXCLUSIVE flag.
+				 */
+				WARN_ON_ONCE(cp->partition_root_state
+					     != PRS_ERROR);
+				cp->partition_root_state = 0;
+				spin_lock_irq(&callback_lock);
+				clear_bit(CS_CPU_EXCLUSIVE, &cp->flags);
+				spin_unlock_irq(&callback_lock);
+				break;
+
+			case PRS_ENABLED:
+				if (update_parent_subparts_cpumask(cp,
+					partcmd_update, NULL, tmp))
+					update_tasks_cpumask(parent);
+				break;
+
+			case PRS_ERROR:
+				/*
+				 * When parent is erroneous, it has to be too.
+				 */
+				cp->partition_root_state = PRS_ERROR;
+				if (cp->nr_subparts_cpus) {
+					cp->nr_subparts_cpus = 0;
+					cpumask_clear(cp->subparts_cpus);
+				}
+				break;
+			}
 		}
 
 		if (!css_tryget_online(&cp->css))
@@ -1213,13 +1282,33 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp)
 		spin_lock_irq(&callback_lock);
 
 		cpumask_copy(cp->effective_cpus, tmp->new_cpus);
-		if (cp->nr_subparts_cpus) {
+		if (cp->nr_subparts_cpus &&
+		   (cp->partition_root_state != PRS_ENABLED)) {
+			cp->nr_subparts_cpus = 0;
+			cpumask_clear(cp->subparts_cpus);
+		} else if (cp->nr_subparts_cpus) {
 			/*
 			 * Make sure that effective_cpus & subparts_cpus
 			 * are mutually exclusive.
+			 *
+			 * In the unlikely event that effective_cpus
+			 * becomes empty. we clear cp->nr_subparts_cpus and
+			 * let its child partition roots to compete for
+			 * CPUs again.
 			 */
 			cpumask_andnot(cp->effective_cpus, cp->effective_cpus,
 				       cp->subparts_cpus);
+			if (cpumask_empty(cp->effective_cpus)) {
+				cpumask_copy(cp->effective_cpus, tmp->new_cpus);
+				cpumask_clear(cp->subparts_cpus);
+				cp->nr_subparts_cpus = 0;
+			} else if (!cpumask_subset(cp->subparts_cpus,
+						   tmp->new_cpus)) {
+				cpumask_andnot(cp->subparts_cpus,
+					cp->subparts_cpus, tmp->new_cpus);
+				cp->nr_subparts_cpus
+					= cpumask_weight(cp->subparts_cpus);
+			}
 		}
 		spin_unlock_irq(&callback_lock);
 
@@ -1724,6 +1813,17 @@ static int update_prstate(struct cpuset *cs, int val)
 		}
 		cs->partition_root_state = PRS_ENABLED;
 	} else {
+		/*
+		 * Turning off partition root will clear the
+		 * CS_CPU_EXCLUSIVE bit.
+		 */
+		if (cs->partition_root_state == PRS_ERROR) {
+			cs->partition_root_state = 0;
+			update_flag(CS_CPU_EXCLUSIVE, cs, 0);
+			err = 0;
+			goto out;
+		}
+
 		err = update_parent_subparts_cpumask(cs, partcmd_disable,
 						     NULL, &tmp);
 		if (err)
-- 
1.8.3.1


  parent reply	other threads:[~2018-10-12 17:56 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-10-12 17:55 [PATCH v13 00/11] cpuset: Enable cpuset controller in default hierarchy Waiman Long
2018-10-12 17:55 ` [PATCH v13 01/11] " Waiman Long
2018-10-15 18:31   ` Tom Hromatka
2018-10-15 18:41     ` Waiman Long
2018-10-15 19:10       ` Tom Hromatka
2018-10-12 17:55 ` [PATCH v13 02/11] cpuset: Define data structures to support scheduling partition Waiman Long
2018-10-12 17:55 ` [PATCH v13 03/11] cpuset: Simply allocation and freeing of cpumasks Waiman Long
2018-10-15 18:35   ` Tom Hromatka
2018-10-15 18:43     ` Waiman Long
2018-10-12 17:55 ` [PATCH v13 04/11] cpuset: Add new v2 cpuset.sched.partition flag Waiman Long
2018-10-13 14:10   ` kbuild test robot
2018-10-12 17:55 ` Waiman Long [this message]
2018-10-12 17:55 ` [PATCH v13 06/11] cpuset: Track cpusets that use parent's effective_cpus Waiman Long
2018-10-12 17:55 ` [PATCH v13 07/11] cpuset: Make CPU hotplug work with partition Waiman Long
2018-10-12 17:55 ` [PATCH v13 08/11] cpuset: Make generate_sched_domains() " Waiman Long
2018-10-12 17:55 ` [PATCH v13 09/11] cpuset: Expose cpus.effective and mems.effective on cgroup v2 root Waiman Long
2018-10-12 17:55 ` [PATCH v13 10/11] cpuset: Add documentation about the new "cpuset.sched.partition" flag Waiman Long
2018-10-12 17:55 ` [PATCH v13 11/11] cpuset: Expose cpuset.cpus.subpartitions with cgroup_debug Waiman Long
2018-10-15 16:35 ` [PATCH v13 00/11] cpuset: Enable cpuset controller in default hierarchy Tejun Heo
2018-10-15 17:04   ` Waiman Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1539366951-8498-6-git-send-email-longman@redhat.com \
    --to=longman@redhat.com \
    --cc=cgroups@vger.kernel.org \
    --cc=efault@gmx.de \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@fb.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=luto@amacapital.net \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).