All of lore.kernel.org
 help / color / mirror / Atom feed
* SCHED_DEADLINE cpudeadline.{h,c} fixup
       [not found] <1468880275-4338-1-git-send-email-tommaso.cucinotta@sssup.it>
@ 2016-08-14 14:27 ` Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 1/3] sched/deadline: refactor cpu heap code Tommaso Cucinotta
                     ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta

Hi,

this is a rework of the cpudeadline bugfix and speed-up patch-set, that
integrates all comments received so far from Luca, Juri and Peter.

Compared with the previous post, here:
-) I'm keeping out the minimally invasive bugfix, as it's already been
   merged in tip/sched/core
-) I moved some little code refactory around change_key_dl() out of the
  (now) 2nd patch, to the 1st one. Now the 2nd (speed-up) patch just
  changes the heapify_up/down() functions
-) I rebased on top of commit f0b22e39
-) I repeated an extensive set of tests through the framework published
   separately at: https://github.com/tomcucinotta/cpudl-bench
   repeating new no-behavior-change tests, new heap-consistency tests,
   and new a/b benchmarks (I'm working on a new i5 laptop now), results at:
     https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-100000.pdf
   highlighting up to a 14% speed-up when averaging over 100K ops. See the
   enclosed README in that repo for more info.

I'm leaving below the original description of all 4 patches.

--
The first patch is a minimally invasive (1-line) fix for the deadline
wrap-around bug. This leaves some weirdness in how cpudl_change_key() is
called. Therefore, the second patch does a minimum of refactory to make
things more explicit and clear.

The 3rd patch contains now the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which has been measured to
achieve up to 14% of speed-up for cpudl_set() calls.
This has been measured with a randomly generated workload of 1K,10K,100K
random heap insertions and deletions (75% cpudl_set() calls with is_valid=1
and 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs. Benchmarking code at: https://github.com/tomcucinotta/cpudl-bench

Finally, the 4th patch is another clear-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Any further comment is welcome, thanks!

  Tommaso
--

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/3] sched/deadline: refactor cpu heap code
  2016-08-14 14:27 ` SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-08-14 14:27   ` Tommaso Cucinotta
  2016-09-05 11:53     ` [tip:sched/core] sched/deadline: Refactor CPU " tip-bot for Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify Tommaso Cucinotta
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta, Juri Lelli

1. heapify up factored out in new dedicated function heapify_up()
   (avoids repetition of same code)
2. call to cpudl_change_key() replaced with heapify_up() when
   cpudl_set actually inserts a new node in the heap
3. cpudl_change_key() replaced with heapify() that heapifies up
   or down as needed.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
---
 kernel/sched/cpudeadline.c | 50 +++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..0acb0d4 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
 	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
@@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
 	}
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
-
-	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
-	} else {
-		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-					cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+			cp->elements[idx].dl)) {
+		cpudl_exchange(cp, idx, parent(idx));
+		idx = parent(idx);
 	}
 }
 
+static void cpudl_heapify(struct cpudl *cp, int idx)
+{
+	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+				cp->elements[idx].dl))
+		cpudl_heapify_up(cp, idx);
+	else
+		cpudl_heapify_down(cp, idx);
+}
+
 static inline int cpudl_maximum(struct cpudl *cp)
 {
 	return cp->elements[0].cpu;
@@ -154,27 +155,22 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->size--;
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-				cp->elements[parent(old_idx)].dl,
-				cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
-		}
+		cpudl_heapify(cp, old_idx);
 		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify(cp, old_idx);
 
 		goto out;
 	}
 
 	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = dl;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
+		int new_idx = cp->size++;
+		cp->elements[new_idx].dl = dl;
+		cp->elements[new_idx].cpu = cpu;
+		cp->elements[cpu].idx = new_idx;
+		cpudl_heapify_up(cp, new_idx);
 		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		cp->elements[old_idx].dl = dl;
+		cpudl_heapify(cp, old_idx);
 	}
 
 out:
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify
  2016-08-14 14:27 ` SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 1/3] sched/deadline: refactor cpu heap code Tommaso Cucinotta
@ 2016-08-14 14:27   ` Tommaso Cucinotta
  2016-09-05 11:54     ` [tip:sched/core] sched/deadline: Make CPU " tip-bot for Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear() Tommaso Cucinotta
  2016-08-19 12:58   ` SCHED_DEADLINE cpudeadline.{h,c} fixup Peter Zijlstra
  3 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta, Juri Lelli

This change goes from heapify() ops done by swapping with parent/child
so that the item to fix moves along, to heapify() ops done by just
pulling the parent/child chain by 1 pos, then storing the item to fix
just at the end. On a non-trivial heapify(), this performs roughly half
stores wrt swaps.

This has been measured to achieve up to 10% of speed-up for cpudl_set()
calls, with a randomly generated workload of 1K,10K,100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
---
 kernel/sched/cpudeadline.c | 66 +++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0acb0d4..0ace75a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,48 +31,72 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (left_child(idx) >= cp->size)
+		return;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-							cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl,
+						cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-							cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl,
+						cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-			cp->elements[idx].dl)) {
-		cpudl_exchange(cp, idx, parent(idx));
-		idx = parent(idx);
-	}
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (idx == 0)
+		return;
+
+	do {
+		p = parent(idx);
+		if (dl_time_before(orig_dl, cp->elements[p].dl))
+			break;
+		/* pull parent onto idx */
+		cp->elements[idx].cpu = cp->elements[p].cpu;
+		cp->elements[idx].dl = cp->elements[p].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
+		idx = p;
+	} while (idx != 0);
+	/* actual push up of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify(struct cpudl *cp, int idx)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear()
  2016-08-14 14:27 ` SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 1/3] sched/deadline: refactor cpu heap code Tommaso Cucinotta
  2016-08-14 14:27   ` [RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify Tommaso Cucinotta
@ 2016-08-14 14:27   ` Tommaso Cucinotta
  2016-09-05 11:54     ` [tip:sched/core] sched/deadline: Split " tip-bot for Tommaso Cucinotta
  2016-08-19 12:58   ` SCHED_DEADLINE cpudeadline.{h,c} fixup Peter Zijlstra
  3 siblings, 1 reply; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-08-14 14:27 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta, Juri Lelli

These 2 exercise independent code paths and need different arguments.
After this change, you call
  cpudl_clear(cp, cpu)
  cpudl_set(cp, cpu, dl)
instead of
  cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */)
  cpudl_set(cp, cpu, dl, 1 /* is_valid */)
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
---
 kernel/sched/cpudeadline.c | 49 +++++++++++++++++++++++++++++++---------------
 kernel/sched/cpudeadline.h |  3 ++-
 kernel/sched/deadline.c    | 10 +++++-----
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0ace75a..e731190 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -145,16 +145,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
 	int old_idx, new_cpu;
 	unsigned long flags;
@@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	WARN_ON(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
+
 	old_idx = cp->elements[cpu].idx;
-	if (!is_valid) {
-		/* remove item */
-		if (old_idx == IDX_INVALID) {
-			/*
-			 * Nothing to remove if old_idx was invalid.
-			 * This could happen if a rq_offline_dl is
-			 * called for a CPU without -dl tasks running.
-			 */
-			goto out;
-		}
+	if (old_idx == IDX_INVALID) {
+		/*
+		 * Nothing to remove if old_idx was invalid.
+		 * This could happen if a rq_offline_dl is
+		 * called for a CPU without -dl tasks running.
+		 */
+	} else {
 		new_cpu = cp->elements[cp->size - 1].cpu;
 		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
 		cp->elements[old_idx].cpu = new_cpu;
@@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
 		cpudl_heapify(cp, old_idx);
-		cpumask_set_cpu(cpu, cp->free_cpus);
 
-		goto out;
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	}
+	raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+	int old_idx;
+	unsigned long flags;
+
+	WARN_ON(!cpu_present(cpu));
 
+	raw_spin_lock_irqsave(&cp->lock, flags);
+
+	old_idx = cp->elements[cpu].idx;
 	if (old_idx == IDX_INVALID) {
 		int new_idx = cp->size++;
 		cp->elements[new_idx].dl = dl;
@@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cpudl_heapify(cp, old_idx);
 	}
 
-out:
 	raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	       struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d091f4a..18fb0b8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -798,7 +798,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (dl_rq->earliest_dl.curr == 0 ||
 	    dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
 		dl_rq->earliest_dl.curr = deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
 	}
 }
 
@@ -813,14 +813,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (!dl_rq->dl_nr_running) {
 		dl_rq->earliest_dl.curr = 0;
 		dl_rq->earliest_dl.next = 0;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+		cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	} else {
 		struct rb_node *leftmost = dl_rq->rb_leftmost;
 		struct sched_dl_entity *entry;
 
 		entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
 		dl_rq->earliest_dl.curr = entry->deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline);
 	}
 }
 
@@ -1671,7 +1671,7 @@ static void rq_online_dl(struct rq *rq)
 
 	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
 	if (rq->dl.dl_nr_running > 0)
-		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
 }
 
 /* Assumes rq->lock is held */
@@ -1680,7 +1680,7 @@ static void rq_offline_dl(struct rq *rq)
 	if (rq->dl.overloaded)
 		dl_clear_overload(rq);
 
-	cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+	cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: SCHED_DEADLINE cpudeadline.{h,c} fixup
  2016-08-14 14:27 ` SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
                     ` (2 preceding siblings ...)
  2016-08-14 14:27   ` [RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear() Tommaso Cucinotta
@ 2016-08-19 12:58   ` Peter Zijlstra
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-08-19 12:58 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Luca Abeni, Juri Lelli, Ingo Molnar, linux-kernel, linux-dl

On Sun, Aug 14, 2016 at 04:27:05PM +0200, Tommaso Cucinotta wrote:
> Hi,
> 
> this is a rework of the cpudeadline bugfix and speed-up patch-set, that
> integrates all comments received so far from Luca, Juri and Peter.
> 
> Compared with the previous post, here:
> -) I'm keeping out the minimally invasive bugfix, as it's already been
>    merged in tip/sched/core
> -) I moved some little code refactory around change_key_dl() out of the
>   (now) 2nd patch, to the 1st one. Now the 2nd (speed-up) patch just
>   changes the heapify_up/down() functions
> -) I rebased on top of commit f0b22e39
> -) I repeated an extensive set of tests through the framework published
>    separately at: https://github.com/tomcucinotta/cpudl-bench
>    repeating new no-behavior-change tests, new heap-consistency tests,
>    and new a/b benchmarks (I'm working on a new i5 laptop now), results at:
>      https://github.com/tomcucinotta/cpudl-bench/blob/master/cpudl-100000.pdf
>    highlighting up to a 14% speed-up when averaging over 100K ops. See the
>    enclosed README in that repo for more info.

Thanks!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [tip:sched/core] sched/deadline: Refactor CPU heap code
  2016-08-14 14:27   ` [RFC PATCH 1/3] sched/deadline: refactor cpu heap code Tommaso Cucinotta
@ 2016-09-05 11:53     ` tip-bot for Tommaso Cucinotta
  0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Tommaso Cucinotta @ 2016-09-05 11:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: juri.lelli, linux-kernel, luca.abeni, torvalds, peterz, hpa,
	mingo, juri.lelli, tglx, tommaso.cucinotta

Commit-ID:  126b3b6842cc848fc9880e7816e0a8d743be51f1
Gitweb:     http://git.kernel.org/tip/126b3b6842cc848fc9880e7816e0a8d743be51f1
Author:     Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
AuthorDate: Sun, 14 Aug 2016 16:27:06 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 5 Sep 2016 13:29:42 +0200

sched/deadline: Refactor CPU heap code

 1. heapify up factored out in new dedicated function heapify_up()
    (avoids repetition of same code)

 2. call to cpudl_change_key() replaced with heapify_up() when
    cpudl_set actually inserts a new node in the heap

 3. cpudl_change_key() replaced with heapify() that heapifies up
    or down as needed.

Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-dl@retis.sssup.it
Link: http://lkml.kernel.org/r/1471184828-12644-2-git-send-email-tommaso.cucinotta@sssup.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cpudeadline.c | 50 +++++++++++++++++++++-------------------------
 1 file changed, 23 insertions(+), 27 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..0acb0d4 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
 	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
@@ -66,23 +66,24 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
 	}
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
-
-	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
-	} else {
-		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-					cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+			cp->elements[idx].dl)) {
+		cpudl_exchange(cp, idx, parent(idx));
+		idx = parent(idx);
 	}
 }
 
+static void cpudl_heapify(struct cpudl *cp, int idx)
+{
+	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+				cp->elements[idx].dl))
+		cpudl_heapify_up(cp, idx);
+	else
+		cpudl_heapify_down(cp, idx);
+}
+
 static inline int cpudl_maximum(struct cpudl *cp)
 {
 	return cp->elements[0].cpu;
@@ -154,27 +155,22 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->size--;
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-				cp->elements[parent(old_idx)].dl,
-				cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
-		}
+		cpudl_heapify(cp, old_idx);
 		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify(cp, old_idx);
 
 		goto out;
 	}
 
 	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = dl;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
+		int new_idx = cp->size++;
+		cp->elements[new_idx].dl = dl;
+		cp->elements[new_idx].cpu = cpu;
+		cp->elements[cpu].idx = new_idx;
+		cpudl_heapify_up(cp, new_idx);
 		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		cp->elements[old_idx].dl = dl;
+		cpudl_heapify(cp, old_idx);
 	}
 
 out:

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [tip:sched/core] sched/deadline: Make CPU heap faster avoiding real swaps on heapify
  2016-08-14 14:27   ` [RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify Tommaso Cucinotta
@ 2016-09-05 11:54     ` tip-bot for Tommaso Cucinotta
  0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Tommaso Cucinotta @ 2016-09-05 11:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tommaso.cucinotta, hpa, peterz, torvalds, linux-kernel,
	juri.lelli, mingo, luca.abeni, juri.lelli, tglx

Commit-ID:  8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2
Gitweb:     http://git.kernel.org/tip/8e1bc301aaf9f9a2d731bf8d50d549ac2dcfdab2
Author:     Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
AuthorDate: Sun, 14 Aug 2016 16:27:07 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 5 Sep 2016 13:29:43 +0200

sched/deadline: Make CPU heap faster avoiding real swaps on heapify

This change goes from heapify() ops done by swapping with parent/child
so that the item to fix moves along, to heapify() ops done by just
pulling the parent/child chain by 1 pos, then storing the item to fix
just at the end. On a non-trivial heapify(), this performs roughly half
stores wrt swaps.

This has been measured to achieve up to 10% of speed-up for cpudl_set()
calls, with a randomly generated workload of 1K,10K,100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.

Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-dl@retis.sssup.it
Link: http://lkml.kernel.org/r/1471184828-12644-3-git-send-email-tommaso.cucinotta@sssup.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cpudeadline.c | 66 +++++++++++++++++++++++++++++++---------------
 1 file changed, 45 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0acb0d4..0ace75a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,48 +31,72 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (left_child(idx) >= cp->size)
+		return;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-							cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl,
+						cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-							cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl,
+						cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-			cp->elements[idx].dl)) {
-		cpudl_exchange(cp, idx, parent(idx));
-		idx = parent(idx);
-	}
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (idx == 0)
+		return;
+
+	do {
+		p = parent(idx);
+		if (dl_time_before(orig_dl, cp->elements[p].dl))
+			break;
+		/* pull parent onto idx */
+		cp->elements[idx].cpu = cp->elements[p].cpu;
+		cp->elements[idx].dl = cp->elements[p].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
+		idx = p;
+	} while (idx != 0);
+	/* actual push up of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify(struct cpudl *cp, int idx)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [tip:sched/core] sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()
  2016-08-14 14:27   ` [RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear() Tommaso Cucinotta
@ 2016-09-05 11:54     ` tip-bot for Tommaso Cucinotta
  0 siblings, 0 replies; 8+ messages in thread
From: tip-bot for Tommaso Cucinotta @ 2016-09-05 11:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, torvalds, tommaso.cucinotta, hpa, luca.abeni,
	mingo, juri.lelli, peterz, juri.lelli, tglx

Commit-ID:  d8206bb3ffe0eaee03abfad46fd44d8b17142e88
Gitweb:     http://git.kernel.org/tip/d8206bb3ffe0eaee03abfad46fd44d8b17142e88
Author:     Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
AuthorDate: Sun, 14 Aug 2016 16:27:08 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 5 Sep 2016 13:29:43 +0200

sched/deadline: Split cpudl_set() into cpudl_set() and cpudl_clear()

These 2 exercise independent code paths and need different arguments.

After this change, you call:

  cpudl_clear(cp, cpu);
  cpudl_set(cp, cpu, dl);

instead of:

  cpudl_set(cp, cpu, 0 /* dl */, 0 /* is_valid */);
  cpudl_set(cp, cpu, dl, 1 /* is_valid */);

Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-dl@retis.sssup.it
Link: http://lkml.kernel.org/r/1471184828-12644-4-git-send-email-tommaso.cucinotta@sssup.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/cpudeadline.c | 49 +++++++++++++++++++++++++++++++---------------
 kernel/sched/cpudeadline.h |  3 ++-
 kernel/sched/deadline.c    | 10 +++++-----
 3 files changed, 40 insertions(+), 22 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 0ace75a..e731190 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -145,16 +145,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
 	int old_idx, new_cpu;
 	unsigned long flags;
@@ -162,17 +161,15 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	WARN_ON(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
+
 	old_idx = cp->elements[cpu].idx;
-	if (!is_valid) {
-		/* remove item */
-		if (old_idx == IDX_INVALID) {
-			/*
-			 * Nothing to remove if old_idx was invalid.
-			 * This could happen if a rq_offline_dl is
-			 * called for a CPU without -dl tasks running.
-			 */
-			goto out;
-		}
+	if (old_idx == IDX_INVALID) {
+		/*
+		 * Nothing to remove if old_idx was invalid.
+		 * This could happen if a rq_offline_dl is
+		 * called for a CPU without -dl tasks running.
+		 */
+	} else {
 		new_cpu = cp->elements[cp->size - 1].cpu;
 		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
 		cp->elements[old_idx].cpu = new_cpu;
@@ -180,11 +177,32 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
 		cpudl_heapify(cp, old_idx);
-		cpumask_set_cpu(cpu, cp->free_cpus);
 
-		goto out;
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	}
+	raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+	int old_idx;
+	unsigned long flags;
+
+	WARN_ON(!cpu_present(cpu));
 
+	raw_spin_lock_irqsave(&cp->lock, flags);
+
+	old_idx = cp->elements[cpu].idx;
 	if (old_idx == IDX_INVALID) {
 		int new_idx = cp->size++;
 		cp->elements[new_idx].dl = dl;
@@ -197,7 +215,6 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cpudl_heapify(cp, old_idx);
 	}
 
-out:
 	raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	       struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d091f4a..18fb0b8 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -798,7 +798,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (dl_rq->earliest_dl.curr == 0 ||
 	    dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
 		dl_rq->earliest_dl.curr = deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
 	}
 }
 
@@ -813,14 +813,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (!dl_rq->dl_nr_running) {
 		dl_rq->earliest_dl.curr = 0;
 		dl_rq->earliest_dl.next = 0;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+		cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	} else {
 		struct rb_node *leftmost = dl_rq->rb_leftmost;
 		struct sched_dl_entity *entry;
 
 		entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
 		dl_rq->earliest_dl.curr = entry->deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline);
 	}
 }
 
@@ -1671,7 +1671,7 @@ static void rq_online_dl(struct rq *rq)
 
 	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
 	if (rq->dl.dl_nr_running > 0)
-		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
 }
 
 /* Assumes rq->lock is held */
@@ -1680,7 +1680,7 @@ static void rq_offline_dl(struct rq *rq)
 	if (rq->dl.overloaded)
 		dl_clear_overload(rq);
 
-	cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+	cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
 }
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-09-05 11:55 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1468880275-4338-1-git-send-email-tommaso.cucinotta@sssup.it>
2016-08-14 14:27 ` SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-08-14 14:27   ` [RFC PATCH 1/3] sched/deadline: refactor cpu heap code Tommaso Cucinotta
2016-09-05 11:53     ` [tip:sched/core] sched/deadline: Refactor CPU " tip-bot for Tommaso Cucinotta
2016-08-14 14:27   ` [RFC PATCH 2/3] sched/deadline: make cpu heap faster avoiding real swaps on heapify Tommaso Cucinotta
2016-09-05 11:54     ` [tip:sched/core] sched/deadline: Make CPU " tip-bot for Tommaso Cucinotta
2016-08-14 14:27   ` [RFC PATCH 3/3] sched/deadline: split cpudl_set() into cpudl_set() and cpudl_clear() Tommaso Cucinotta
2016-09-05 11:54     ` [tip:sched/core] sched/deadline: Split " tip-bot for Tommaso Cucinotta
2016-08-19 12:58   ` SCHED_DEADLINE cpudeadline.{h,c} fixup Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.