* [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology
@ 2017-05-12  5:48 Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically Byungchul Park
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

Change from v3
   -. rename closest_cpu to best_cpu so that it aligns with rt
   -. protect references to cpudl.elements with cpudl.lock
   -. change return value of cpudl_find() to bool

Change from v2
   -. add support for SD_PREFER_SIBLING

Change from v1
   -. clean up the patch

When cpudl_find() returns an arbitrary cpu among free_cpus, that cpu might
not be the closest one to this_cpu in terms of sched domain topology. For
example:

   this_cpu: 15
   free_cpus: 0, 1,..., 14 (== later_mask)
   best_cpu: 0

   topology:

   0 --+
       +--+
   1 --+  |
          +-- ... --+
   2 --+  |         |
       +--+         |
   3 --+            |

   ...             ...

   12 --+           |
        +--+        |
   13 --+  |        |
           +-- ... -+
   14 --+  |
        +--+
   15 --+

In this case, it would be best to select 14 since it's a free cpu and the
closest to 15 (this_cpu). However, the code currently selects 0 (best_cpu),
which is just an arbitrary member of free_cpus. Fix it.
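
For reference, here is a minimal sketch of the selection order the series
aims for once later_mask has been filled by cpudl_find(). The helper name
pick_closest_cpu() is made up for illustration; this is not the patch itself:

/*
 * Sketch only: walk the sched domains from the innermost (closest) to
 * the outermost and take the first candidate found in each span.
 * Assumes the caller holds rcu_read_lock() for the domain walk, and
 * that this_cpu has been set to -1 when it is not in later_mask.
 */
static int pick_closest_cpu(int cpu, int this_cpu, struct cpumask *later_mask)
{
	struct sched_domain *sd;

	for_each_domain(cpu, sd) {
		int best_cpu;

		if (!(sd->flags & SD_WAKE_AFFINE))
			continue;

		/* Cheapest case: preempt this_cpu if it is a candidate here. */
		if (this_cpu != -1 &&
		    cpumask_test_cpu(this_cpu, sched_domain_span(sd)))
			return this_cpu;

		/* Otherwise take the closest candidate within this domain. */
		best_cpu = cpumask_first_and(later_mask, sched_domain_span(sd));
		if (best_cpu < nr_cpu_ids)
			return best_cpu;
	}

	/* No domain-local candidate: fall back to any free cpu. */
	return cpumask_any(later_mask);
}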

Byungchul Park (5):
  sched/deadline: Refer to cpudl.elements atomically
  sched/deadline: Make find_later_rq() choose a closer cpu in topology
  sched/deadline: Change return value of cpudl_find()
  sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()
  sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq()

 kernel/sched/cpudeadline.c | 46 +++++++++++++++++++++++++++++++-------------
 kernel/sched/deadline.c    | 48 +++++++++++++++++++++++++++++++---------------
 kernel/sched/rt.c          | 17 ++++++++++++++++
 3 files changed, 83 insertions(+), 28 deletions(-)

-- 
1.9.1

* [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-12  5:48 [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology Byungchul Park
@ 2017-05-12  5:48 ` Byungchul Park
  2017-05-12 14:25   ` Steven Rostedt
  2017-05-12  5:48 ` [PATCH v4 2/5] sched/deadline: Make find_later_rq() choose a closer cpu in topology Byungchul Park
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

cpudl.elements is an instance that should be protected with a spin lock.
Without it, the code would be insane.

Current cpudl_find() has problems like,

   1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
   2. cpudl.elements[0].dl(u64) might not be referred atomically.
   3. Two cpudl_maximum()s might return different values.
   4. It's just insane.
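
To make the races concrete, this is the pre-patch sequence with the
problematic loads annotated (illustration only, not part of the diff below):

	/*
	 * Without cp->lock, a concurrent cpudl_set() can reorder the heap
	 * between any two of the loads below, so the cpu tested against
	 * p->cpus_allowed, the dl it is compared with, and the cpu finally
	 * chosen may each come from a different heap state. On 32-bit, the
	 * plain u64 load of cp->elements[0].dl can also tear.
	 */
	if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&	/* 1st load of elements[0].cpu */
	    dl_time_before(dl_se->deadline, cp->elements[0].dl)) {	/* load of elements[0].dl */
		best_cpu = cpudl_maximum(cp);				/* 2nd load may differ from the 1st */
		if (later_mask)
			cpumask_set_cpu(best_cpu, later_mask);
	}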

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/cpudeadline.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index fba235c..6b67016 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -131,16 +131,39 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
 		best_cpu = cpumask_any(later_mask);
 		goto out;
-	} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
-			dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
-		best_cpu = cpudl_maximum(cp);
-		if (later_mask)
-			cpumask_set_cpu(best_cpu, later_mask);
+	} else {
+		u64 cpudl_dl;
+		int cpudl_cpu;
+		int cpudl_valid;
+		unsigned long flags;
+
+		/*
+		 * Referring to cp->elements must be atomic ops.
+		 */
+		raw_spin_lock_irqsave(&cp->lock, flags);
+		/*
+		 * No problem even in case of very initial heap tree
+		 * to which no entry has been added yet, since
+		 * cp->elements[0].cpu was initialized to zero and
+		 * cp->elements[0].idx was initialized to IDX_INVALID,
+		 * that means the case will be filtered out at the
+		 * following condition.
+		 */
+		cpudl_cpu = cpudl_maximum(cp);
+		cpudl_dl = cp->elements[0].dl;
+		cpudl_valid = cp->elements[cpudl_cpu].idx;
+		raw_spin_unlock_irqrestore(&cp->lock, flags);
+
+		if (cpudl_valid != IDX_INVALID &&
+		    cpumask_test_cpu(cpudl_cpu, &p->cpus_allowed) &&
+		    dl_time_before(dl_se->deadline, cpudl_dl)) {
+			best_cpu = cpudl_cpu;
+			if (later_mask)
+				cpumask_set_cpu(best_cpu, later_mask);
+		}
 	}
 
 out:
-	WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
-
 	return best_cpu;
 }
 
-- 
1.9.1

* [PATCH v4 2/5] sched/deadline: Make find_later_rq() choose a closer cpu in topology
  2017-05-12  5:48 [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically Byungchul Park
@ 2017-05-12  5:48 ` Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 3/5] sched/deadline: Change return value of cpudl_find() Byungchul Park
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

When cpudl_find() returns an arbitrary cpu among free_cpus, that cpu might
not be the closest one to this_cpu in terms of sched domain topology. For
example:

   this_cpu: 15
   free_cpus: 0, 1,..., 14 (== later_mask)
   best_cpu: 0

   topology:

   0 --+
       +--+
   1 --+  |
          +-- ... --+
   2 --+  |         |
       +--+         |
   3 --+            |

   ...             ...

   12 --+           |
        +--+        |
   13 --+  |        |
           +-- ... -+
   14 --+  |
        +--+
   15 --+

In this case, it would be best to select 14 since it's a free cpu and the
closest to 15 (this_cpu). However, the code currently selects 0 (best_cpu),
which is just an arbitrary member of free_cpus. Fix it.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/deadline.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a2ce590..9d997d9 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1324,7 +1324,7 @@ static int find_later_rq(struct task_struct *task)
 	struct sched_domain *sd;
 	struct cpumask *later_mask = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
 	int this_cpu = smp_processor_id();
-	int best_cpu, cpu = task_cpu(task);
+	int cpu = task_cpu(task);
 
 	/* Make sure the mask is initialized first */
 	if (unlikely(!later_mask))
@@ -1337,17 +1337,14 @@ static int find_later_rq(struct task_struct *task)
 	 * We have to consider system topology and task affinity
 	 * first, then we can look for a suitable cpu.
 	 */
-	best_cpu = cpudl_find(&task_rq(task)->rd->cpudl,
-			task, later_mask);
-	if (best_cpu == -1)
+	if (cpudl_find(&task_rq(task)->rd->cpudl, task, later_mask) == -1)
 		return -1;
 
 	/*
-	 * If we are here, some target has been found,
-	 * the most suitable of which is cached in best_cpu.
-	 * This is, among the runqueues where the current tasks
-	 * have later deadlines than the task's one, the rq
-	 * with the latest possible one.
+	 * If we are here, some targets have been found. The most
+	 * suitable of them is, among the runqueues whose current
+	 * tasks have later deadlines than this task's, the rq with
+	 * the latest possible one.
 	 *
 	 * Now we check how well this matches with task's
 	 * affinity and system topology.
@@ -1367,6 +1364,7 @@ static int find_later_rq(struct task_struct *task)
 	rcu_read_lock();
 	for_each_domain(cpu, sd) {
 		if (sd->flags & SD_WAKE_AFFINE) {
+			int best_cpu;
 
 			/*
 			 * If possible, preempting this_cpu is
@@ -1378,12 +1376,15 @@ static int find_later_rq(struct task_struct *task)
 				return this_cpu;
 			}
 
+			best_cpu = cpumask_first_and(later_mask,
+							sched_domain_span(sd));
 			/*
-			 * Last chance: if best_cpu is valid and is
-			 * in the mask, that becomes our choice.
+			 * Last chance: if there is a cpu in both later_mask
+			 * and the current sd span, that becomes our choice.
+			 * Note that the latest-deadline cpu is already under
+			 * consideration through later_mask.
 			 */
-			if (best_cpu < nr_cpu_ids &&
-			    cpumask_test_cpu(best_cpu, sched_domain_span(sd))) {
+			if (best_cpu < nr_cpu_ids) {
 				rcu_read_unlock();
 				return best_cpu;
 			}
-- 
1.9.1

* [PATCH v4 3/5] sched/deadline: Change return value of cpudl_find()
  2017-05-12  5:48 [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 2/5] sched/deadline: Make find_later_rq() choose a closer cpu in topology Byungchul Park
@ 2017-05-12  5:48 ` Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 4/5] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq() Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 5/5] sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq() Byungchul Park
  4 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

Currently cpudl_find() returns the best cpu, that is, the one whose runqueue
has the maximum (latest) deadline. However, that cpu is already recorded in
later_mask and the return value is no longer used directly.

Now it's enough to return whether any suitable CPUs were found, as rt does.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/cpudeadline.c | 13 +++++--------
 kernel/sched/deadline.c    |  6 +++---
 2 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 6b67016..28d057f 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -119,18 +119,16 @@ static inline int cpudl_maximum(struct cpudl *cp)
  * @p: the task
  * @later_mask: a mask to fill in with the selected CPUs (or NULL)
  *
- * Returns: int - best CPU (heap maximum if suitable)
+ * Returns: (int)bool - CPUs were found
  */
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	       struct cpumask *later_mask)
 {
-	int best_cpu = -1;
 	const struct sched_dl_entity *dl_se = &p->dl;
 
 	if (later_mask &&
 	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
-		best_cpu = cpumask_any(later_mask);
-		goto out;
+		return 1;
 	} else {
 		u64 cpudl_dl;
 		int cpudl_cpu;
@@ -157,14 +155,13 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
 		if (cpudl_valid != IDX_INVALID &&
 		    cpumask_test_cpu(cpudl_cpu, &p->cpus_allowed) &&
 		    dl_time_before(dl_se->deadline, cpudl_dl)) {
-			best_cpu = cpudl_cpu;
 			if (later_mask)
-				cpumask_set_cpu(best_cpu, later_mask);
+				cpumask_set_cpu(cpudl_cpu, later_mask);
+			return 1;
 		}
 	}
 
-out:
-	return best_cpu;
+	return 0;
 }
 
 /*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 9d997d9..0223694 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1107,7 +1107,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 	 * let's hope p can move out.
 	 */
 	if (rq->curr->nr_cpus_allowed == 1 ||
-	    cpudl_find(&rq->rd->cpudl, rq->curr, NULL) == -1)
+	    !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
 		return;
 
 	/*
@@ -1115,7 +1115,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
 	 * see if it is pushed or pulled somewhere else.
 	 */
 	if (p->nr_cpus_allowed != 1 &&
-	    cpudl_find(&rq->rd->cpudl, p, NULL) != -1)
+	    cpudl_find(&rq->rd->cpudl, p, NULL))
 		return;
 
 	resched_curr(rq);
@@ -1337,7 +1337,7 @@ static int find_later_rq(struct task_struct *task)
 	 * We have to consider system topology and task affinity
 	 * first, then we can look for a suitable cpu.
 	 */
-	if (cpudl_find(&task_rq(task)->rd->cpudl, task, later_mask) == -1)
+	if (!cpudl_find(&task_rq(task)->rd->cpudl, task, later_mask))
 		return -1;
 
 	/*
-- 
1.9.1

* [PATCH v4 4/5] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq()
  2017-05-12  5:48 [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology Byungchul Park
                   ` (2 preceding siblings ...)
  2017-05-12  5:48 ` [PATCH v4 3/5] sched/deadline: Change return value of cpudl_find() Byungchul Park
@ 2017-05-12  5:48 ` Byungchul Park
  2017-05-12  5:48 ` [PATCH v4 5/5] sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq() Byungchul Park
  4 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

It would be better to avoid pushing tasks to another cpu within an
SD_PREFER_SIBLING domain; instead, give the search more chances to check
other sibling domains.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/deadline.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0223694..ada264c 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1325,6 +1325,7 @@ static int find_later_rq(struct task_struct *task)
 	struct cpumask *later_mask = this_cpu_cpumask_var_ptr(local_cpu_mask_dl);
 	int this_cpu = smp_processor_id();
 	int cpu = task_cpu(task);
+	int fallback_cpu = -1;
 
 	/* Make sure the mask is initialized first */
 	if (unlikely(!later_mask))
@@ -1385,6 +1386,15 @@ static int find_later_rq(struct task_struct *task)
 			 * already under consideration through later_mask.
 			 */
 			if (best_cpu < nr_cpu_ids) {
+				/*
+				 * If the current domain has SD_PREFER_SIBLING
+				 * set, keep this cpu as a fallback and give
+				 * the domain's siblings a chance first.
+				 */
+				if (sd->flags & SD_PREFER_SIBLING) {
+					fallback_cpu = best_cpu;
+					continue;
+				}
 				rcu_read_unlock();
 				return best_cpu;
 			}
@@ -1393,6 +1403,13 @@ static int find_later_rq(struct task_struct *task)
 	rcu_read_unlock();
 
 	/*
+	 * If fallback_cpu is valid, all our guesses failed *except* for
+	 * SD_PREFER_SIBLING domain. Now, we can return the fallback cpu.
+	 */
+	if (fallback_cpu != -1)
+		return fallback_cpu;
+
+	/*
 	 * At this point, all our guesses failed, we just return
 	 * 'something', and let the caller sort the things out.
 	 */
-- 
1.9.1

* [PATCH v4 5/5] sched/rt: Add support for SD_PREFER_SIBLING on find_lowest_rq()
  2017-05-12  5:48 [PATCH v4 0/5] Make find_later_rq() choose a closer cpu in topology Byungchul Park
                   ` (3 preceding siblings ...)
  2017-05-12  5:48 ` [PATCH v4 4/5] sched/deadline: Add support for SD_PREFER_SIBLING on find_later_rq() Byungchul Park
@ 2017-05-12  5:48 ` Byungchul Park
  4 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-12  5:48 UTC (permalink / raw)
  To: peterz, mingo; +Cc: linux-kernel, juri.lelli, rostedt, bristot, kernel-team

It would be better to avoid pushing tasks to another cpu within an
SD_PREFER_SIBLING domain; instead, give the search more chances to check
other sibling domains.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
---
 kernel/sched/rt.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 979b734..6332b2ad 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1624,6 +1624,7 @@ static int find_lowest_rq(struct task_struct *task)
 	struct cpumask *lowest_mask = this_cpu_cpumask_var_ptr(local_cpu_mask);
 	int this_cpu = smp_processor_id();
 	int cpu      = task_cpu(task);
+	int fallback_cpu = -1;
 
 	/* Make sure the mask is initialized first */
 	if (unlikely(!lowest_mask))
@@ -1671,6 +1672,15 @@ static int find_lowest_rq(struct task_struct *task)
 			best_cpu = cpumask_first_and(lowest_mask,
 						     sched_domain_span(sd));
 			if (best_cpu < nr_cpu_ids) {
+				/*
+				 * If the current domain has SD_PREFER_SIBLING
+				 * set, keep this cpu as a fallback and give
+				 * the domain's siblings a chance first.
+				 */
+				if (sd->flags & SD_PREFER_SIBLING) {
+					fallback_cpu = best_cpu;
+					continue;
+				}
 				rcu_read_unlock();
 				return best_cpu;
 			}
@@ -1679,6 +1689,13 @@ static int find_lowest_rq(struct task_struct *task)
 	rcu_read_unlock();
 
 	/*
+	 * If fallback_cpu is valid, all our guesses failed *except* for
+	 * SD_PREFER_SIBLING domain. Now, we can return the fallback cpu.
+	 */
+	if (fallback_cpu != -1)
+		return fallback_cpu;
+
+	/*
 	 * And finally, if there were no matches within the domains
 	 * just give the caller *something* to work with from the compatible
 	 * locations.
-- 
1.9.1

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-12  5:48 ` [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically Byungchul Park
@ 2017-05-12 14:25   ` Steven Rostedt
  2017-05-15  8:36     ` Juri Lelli
  2017-05-16  6:52     ` Byungchul Park
  0 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2017-05-12 14:25 UTC (permalink / raw)
  To: Byungchul Park
  Cc: peterz, mingo, linux-kernel, juri.lelli, bristot, kernel-team

On Fri, 12 May 2017 14:48:45 +0900
Byungchul Park <byungchul.park@lge.com> wrote:

> cpudl.elements is an instance that should be protected with a spin lock.
> Without it, the code would be insane.

And how much contention will this add? Spin locks in the scheduler code
that are shared among a domain can cause huge latency. This was why I
worked hard not to add any in the cpupri code.


> 
> Current cpudl_find() has problems like,
> 
>    1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
>    2. cpudl.elements[0].dl(u64) might not be referred atomically.
>    3. Two cpudl_maximum()s might return different values.
>    4. It's just insane.

And lockless algorithms usually are insane. But locks come with a huge
cost. What happens when we have 32 core domains. This can cause
tremendous contention and makes the entire cpu priority for deadlines
useless. Might as well rip out the code.

I haven't looked too hard into the deadline version, I may have to
spend some time doing so soon. But unfortunately, I have other critical
sections to spend brain cycles on.

-- Steve


> 
> Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> ---
>  kernel/sched/cpudeadline.c | 37 ++++++++++++++++++++++++++++++-------
>  1 file changed, 30 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index fba235c..6b67016 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -131,16 +131,39 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
>  	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
>  		best_cpu = cpumask_any(later_mask);
>  		goto out;
> -	} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
> -			dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
> -		best_cpu = cpudl_maximum(cp);
> -		if (later_mask)
> -			cpumask_set_cpu(best_cpu, later_mask);
> +	} else {
> +		u64 cpudl_dl;
> +		int cpudl_cpu;
> +		int cpudl_valid;
> +		unsigned long flags;
> +
> +		/*
> +		 * Referring to cp->elements must be atomic ops.
> +		 */
> +		raw_spin_lock_irqsave(&cp->lock, flags);
> +		/*
> +		 * No problem even in case of very initial heap tree
> +		 * to which no entry has been added yet, since
> +		 * cp->elements[0].cpu was initialized to zero and
> +		 * cp->elements[0].idx was initialized to IDX_INVALID,
> +		 * that means the case will be filtered out at the
> +		 * following condition.
> +		 */
> +		cpudl_cpu = cpudl_maximum(cp);
> +		cpudl_dl = cp->elements[0].dl;
> +		cpudl_valid = cp->elements[cpudl_cpu].idx;
> +		raw_spin_unlock_irqrestore(&cp->lock, flags);
> +
> +		if (cpudl_valid != IDX_INVALID &&
> +		    cpumask_test_cpu(cpudl_cpu, &p->cpus_allowed) &&
> +		    dl_time_before(dl_se->deadline, cpudl_dl)) {
> +			best_cpu = cpudl_cpu;
> +			if (later_mask)
> +				cpumask_set_cpu(best_cpu, later_mask);
> +		}
>  	}
>  
>  out:
> -	WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
> -
>  	return best_cpu;
>  }
>  

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-12 14:25   ` Steven Rostedt
@ 2017-05-15  8:36     ` Juri Lelli
  2017-05-16  7:00       ` Byungchul Park
  2017-05-16  6:52     ` Byungchul Park
  1 sibling, 1 reply; 13+ messages in thread
From: Juri Lelli @ 2017-05-15  8:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Byungchul Park, peterz, mingo, linux-kernel, juri.lelli, bristot,
	kernel-team

Hi,

On 12/05/17 10:25, Steven Rostedt wrote:
> On Fri, 12 May 2017 14:48:45 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > cpudl.elements is an instance that should be protected with a spin lock.
> > Without it, the code would be insane.
> 
> And how much contention will this add? Spin locks in the scheduler code
> that are shared among a domain can cause huge latency. This was why I
> worked hard not to add any in the cpupri code.
> 
> 
> > 
> > Current cpudl_find() has problems like,
> > 
> >    1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
> >    2. cpudl.elements[0].dl(u64) might not be referred atomically.
> >    3. Two cpudl_maximum()s might return different values.
> >    4. It's just insane.
> 
> And lockless algorithms usually are insane. But locks come with a huge
> cost. What happens when we have 32 core domains. This can cause
> tremendous contention and makes the entire cpu priority for deadlines
> useless. Might as well rip out the code.
> 

Right. So, the rationale for not taking any lock in the find() path (at
the risk of getting bogus values) is that we don't want to pay too much
in terms of contention, also considering that find_lock_later_rq() might
then release the rq lock, possibly making the search useless (if things
change in the meantime anyway). The update path is instead guarded by a
lock, to ensure consistency.

Experiments on reasonably big machines (48 cores IIRC) showed that the
approach was "good enough", so we looked somewhere else to improve
things (as there are many to improve :). This of course doesn't prevent
us from looking at this again now and seeing if we need to do something
about it.

Having numbers about the introduced overhead and the wrong decisions
caused by the lockless find() path would help a lot in understanding
what can (and should) be done.

Thanks!

- Juri

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-12 14:25   ` Steven Rostedt
  2017-05-15  8:36     ` Juri Lelli
@ 2017-05-16  6:52     ` Byungchul Park
  2017-05-16 10:32       ` Juri Lelli
  1 sibling, 1 reply; 13+ messages in thread
From: Byungchul Park @ 2017-05-16  6:52 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: peterz, mingo, linux-kernel, juri.lelli, bristot, kernel-team

On Fri, May 12, 2017 at 10:25:30AM -0400, Steven Rostedt wrote:
> On Fri, 12 May 2017 14:48:45 +0900
> Byungchul Park <byungchul.park@lge.com> wrote:
> 
> > cpudl.elements is an instance that should be protected with a spin lock.
> > Without it, the code would be insane.
> 
> And how much contention will this add? Spin locks in the scheduler code
> that are shared among a domain can cause huge latency. This was why I
> worked hard not to add any in the cpupri code.

Yes. That's also why I hesitated to post this patch..

> > Current cpudl_find() has problems like,
> > 
> >    1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
> >    2. cpudl.elements[0].dl(u64) might not be referred atomically.
> >    3. Two cpudl_maximum()s might return different values.
> >    4. It's just insane.
> 
> And lockless algorithms usually are insane. But locks come with a huge
> cost. What happens when we have 32 core domains. This can cause
> tremendous contention and makes the entire cpu priority for deadlines
> useless. Might as well rip out the code.

I think it would be better if, even keeping it lockless, we did something
like the following (rough sketch below):

   1. read cp->elements[].cpu once rather than twice,
   2. add retry logic in order to match elements[].cpu with its dl,
   3. add retry logic so that the u64 variable is read atomically,
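
For instance, such a retry could look roughly like this (hypothetical
sketch, not part of the posted patches; it assumes a seqcount added to
struct cpudl and bumped by the update side around heap changes, which
does not exist today):

	unsigned int seq;
	int cpu;
	u64 dl;

	do {
		seq = read_seqcount_begin(&cp->seq);	/* hypothetical field */
		cpu = cpudl_maximum(cp);
		dl  = cp->elements[0].dl;
	} while (read_seqcount_retry(&cp->seq, seq));

	/* 'cpu' and 'dl' now form a consistent snapshot of the heap root. */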

So what do you think about the suggestions? Of course they do not solve
the problems perfectly.. Or do you think it's not worth it?

> I haven't looked too hard into the deadline version, I may have to
> spend some time doing so soon. But unfortunately, I have other critical
> sections to spend brain cycles on.

OK.

Thank you,
Byungchul

> 
> -- Steve
> 
> 
> > 
> > Signed-off-by: Byungchul Park <byungchul.park@lge.com>
> > ---
> >  kernel/sched/cpudeadline.c | 37 ++++++++++++++++++++++++++++++-------
> >  1 file changed, 30 insertions(+), 7 deletions(-)
> > 
> > diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> > index fba235c..6b67016 100644
> > --- a/kernel/sched/cpudeadline.c
> > +++ b/kernel/sched/cpudeadline.c
> > @@ -131,16 +131,39 @@ int cpudl_find(struct cpudl *cp, struct task_struct *p,
> >  	    cpumask_and(later_mask, cp->free_cpus, &p->cpus_allowed)) {
> >  		best_cpu = cpumask_any(later_mask);
> >  		goto out;
> > -	} else if (cpumask_test_cpu(cpudl_maximum(cp), &p->cpus_allowed) &&
> > -			dl_time_before(dl_se->deadline, cp->elements[0].dl)) {
> > -		best_cpu = cpudl_maximum(cp);
> > -		if (later_mask)
> > -			cpumask_set_cpu(best_cpu, later_mask);
> > +	} else {
> > +		u64 cpudl_dl;
> > +		int cpudl_cpu;
> > +		int cpudl_valid;
> > +		unsigned long flags;
> > +
> > +		/*
> > +		 * Referring to cp->elements must be atomic ops.
> > +		 */
> > +		raw_spin_lock_irqsave(&cp->lock, flags);
> > +		/*
> > +		 * No problem even in case of very initial heap tree
> > +		 * to which no entry has been added yet, since
> > +		 * cp->elements[0].cpu was initialized to zero and
> > +		 * cp->elements[0].idx was initialized to IDX_INVALID,
> > +		 * that means the case will be filtered out at the
> > +		 * following condition.
> > +		 */
> > +		cpudl_cpu = cpudl_maximum(cp);
> > +		cpudl_dl = cp->elements[0].dl;
> > +		cpudl_valid = cp->elements[cpudl_cpu].idx;
> > +		raw_spin_unlock_irqrestore(&cp->lock, flags);
> > +
> > +		if (cpudl_valid != IDX_INVALID &&
> > +		    cpumask_test_cpu(cpudl_cpu, &p->cpus_allowed) &&
> > +		    dl_time_before(dl_se->deadline, cpudl_dl)) {
> > +			best_cpu = cpudl_cpu;
> > +			if (later_mask)
> > +				cpumask_set_cpu(best_cpu, later_mask);
> > +		}
> >  	}
> >  
> >  out:
> > -	WARN_ON(best_cpu != -1 && !cpu_present(best_cpu));
> > -
> >  	return best_cpu;
> >  }
> >  

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-15  8:36     ` Juri Lelli
@ 2017-05-16  7:00       ` Byungchul Park
  0 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-16  7:00 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Steven Rostedt, peterz, mingo, linux-kernel, juri.lelli, bristot,
	kernel-team

On Mon, May 15, 2017 at 09:36:29AM +0100, Juri Lelli wrote:
> Hi,
> 
> On 12/05/17 10:25, Steven Rostedt wrote:
> > On Fri, 12 May 2017 14:48:45 +0900
> > Byungchul Park <byungchul.park@lge.com> wrote:
> > 
> > > cpudl.elements is an instance that should be protected with a spin lock.
> > > Without it, the code would be insane.
> > 
> > And how much contention will this add? Spin locks in the scheduler code
> > that are shared among a domain can cause huge latency. This was why I
> > worked hard not to add any in the cpupri code.
> > 
> > 
> > > 
> > > Current cpudl_find() has problems like,
> > > 
> > >    1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
> > >    2. cpudl.elements[0].dl(u64) might not be referred atomically.
> > >    3. Two cpudl_maximum()s might return different values.
> > >    4. It's just insane.
> > 
> > And lockless algorithms usually are insane. But locks come with a huge
> > cost. What happens when we have 32 core domains. This can cause
> > tremendous contention and makes the entire cpu priority for deadlines
> > useless. Might as well rip out the code.
> > 
> 
> Right. So, rationale for not taking any lock in the find() path (at the
> risk of getting bogus values) is that we don't want to pay to much in
> terms of contention, when also considering the fact that find_lock_later_
> rq() might then release the rq lock, possibly making the search useless
> (if things change in the meantime anyway). The update path is instead
> guarded by a lock, to ensure consistency.
> 
> Experiments on reasonably big machines (48-cores IIRC) showed that the
> approach was "good enough", so we looked somewhere else to improve
> things (as there are many to improve :). This of course doesn't prevent
> us to look at this again now and see if we need to do something about it.
> 
> Having numbers about introduced overhead and wrong decisions caused by
> the lockless find() path would help a lot understanding what (and can)
> be done.

I see what you're saying. Agreed..

Hm.. Before that, what do you think about my suggestions in my reply to
Steven?

> 
> Thanks!
> 
> - Juri

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-16  6:52     ` Byungchul Park
@ 2017-05-16 10:32       ` Juri Lelli
  2017-05-16 13:10         ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Juri Lelli @ 2017-05-16 10:32 UTC (permalink / raw)
  To: Byungchul Park
  Cc: Steven Rostedt, peterz, mingo, linux-kernel, juri.lelli, bristot,
	kernel-team

On 16/05/17 15:52, Byungchul Park wrote:
> On Fri, May 12, 2017 at 10:25:30AM -0400, Steven Rostedt wrote:
> > On Fri, 12 May 2017 14:48:45 +0900
> > Byungchul Park <byungchul.park@lge.com> wrote:
> > 
> > > cpudl.elements is an instance that should be protected with a spin lock.
> > > Without it, the code would be insane.
> > 
> > And how much contention will this add? Spin locks in the scheduler code
> > that are shared among a domain can cause huge latency. This was why I
> > worked hard not to add any in the cpupri code.
> 
> Yes. That's also whay I hesitated to post this patch..
> 
> > > Current cpudl_find() has problems like,
> > > 
> > >    1. cpudl.elements[0].cpu might not match with cpudl.elements[0].dl.
> > >    2. cpudl.elements[0].dl(u64) might not be referred atomically.
> > >    3. Two cpudl_maximum()s might return different values.
> > >    4. It's just insane.
> > 
> > And lockless algorithms usually are insane. But locks come with a huge
> > cost. What happens when we have 32 core domains. This can cause
> > tremendous contention and makes the entire cpu priority for deadlines
> > useless. Might as well rip out the code.
> 
> I think it would be better if we, even keeping it lockless,
> 
>    1. make cp->elements[].cpu referred once than twice,
>    2. add retry logic in order to match elements[].cpu with its dl,
>    3. add retry logic for the u64 variable to be read atomically,
> 
> So what do you think about the suggestions? Of course it does not solve
> the problems perfectly though.. Or do you think it's not worth?
> 

Not sure, but if we are going to retry a lot, might we be better off
putting proper locking in instead? We could also simply bail out when we
notice that something has changed under our feet. I'd say (again :) we
might first want to understand (with some numbers) how bad the problem
is and then fix it. I guess numbers might also help us to better
understand what the best fix is?
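
Something like this, maybe (a hypothetical, partial check just to
illustrate the bail-out idea; assuming cpudl_find() ends up returning 0
when no cpu is found, as later in this series):

	int cpu = cpudl_maximum(cp);
	u64 dl  = cp->elements[0].dl;

	if (cpu != cpudl_maximum(cp)) {
		/* The heap changed under us; give up this attempt. */
		return 0;
	}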

Thanks,

- Juri

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-16 10:32       ` Juri Lelli
@ 2017-05-16 13:10         ` Steven Rostedt
  2017-05-23  1:11           ` Byungchul Park
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2017-05-16 13:10 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Byungchul Park, peterz, mingo, linux-kernel, juri.lelli, bristot,
	kernel-team

On Tue, 16 May 2017 11:32:41 +0100
Juri Lelli <juri.lelli@arm.com> wrote:


> Not sure, but if we are going to retry a lot it might be better off to
> put proper locking instead? We could also simply bail out when we notice

Actually, locking can make it much worse. I've been playing with RT on
boxes with 240 cores (with HT turned off!), and *any* locks in the
scheduler can cause huge contention.

> that something is changed under our feet. I'd say (again :) we might
> first want to understand (with some numbers) how bad the problem is and
> then fix it. I guess numbers might also help us to better understand
> what the best fix is?

Exactly. I haven't seen any numbers. Yes, it is not perfect, but we
don't know how imperfect it is. Numbers will help us know whether there
is even a problem with the current solution.

-- Steve

* Re: [PATCH v4 1/5] sched/deadline: Refer to cpudl.elements atomically
  2017-05-16 13:10         ` Steven Rostedt
@ 2017-05-23  1:11           ` Byungchul Park
  0 siblings, 0 replies; 13+ messages in thread
From: Byungchul Park @ 2017-05-23  1:11 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Juri Lelli, peterz, mingo, linux-kernel, juri.lelli, bristot,
	kernel-team

On Tue, May 16, 2017 at 09:10:53AM -0400, Steven Rostedt wrote:
> On Tue, 16 May 2017 11:32:41 +0100
> Juri Lelli <juri.lelli@arm.com> wrote:
> 
> 
> > Not sure, but if we are going to retry a lot it might be better off to
> > put proper locking instead? We could also simply bail out when we notice
> 
> Actually, locking can make it much worse. I've been playing with RT on
> boxes with 240 cores (with HT turned off!), and *any* locks in the
> scheduler can cause huge contention.

OK. I'll give up on the patches adding locks here. Thank you for explaining
why you did not write the code the way I did.

> > that something is changed under our feet. I'd say (again :) we might
> > first want to understand (with some numbers) how bad the problem is and
> > then fix it. I guess numbers might also help us to better understand
> > what the best fix is?
> 
> Exactly. I haven't seen any numbers. Yes, it is not perfect, but we
> don't know how unperfect it is. Numbers will help to know if there is
> even a problem or not with the current solution.

Yes. I am also curious about the numbers and want to check them, but
it's not easy for me since I don't have such a big machine. For now, the
only thing I can do is skip the patch.

Thank you very much for all of your opinions.
