linux-kernel.vger.kernel.org archive mirror
* SCHED_DEADLINE cpudeadline.{h,c} fixup
@ 2016-05-19 16:02 Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

Hi all,

I took Luca's advice and isolated the deadline wrap-around bugfix in a
first, minimally invasive patch (1 line). This leaves some weirdness in
how cpudl_change_key() is called.
Therefore, the second patch does a minimal refactoring to make things
more explicit and clear.
The 3rd patch now contains the actual performance enhancement (avoiding
unneeded swaps during heapify operations), which, as mentioned in the
previous post, achieves up to a 6% speed-up for cpudl_set() calls.
Finally, the 4th patch is another clean-up patch touching cpudeadline.{h,c}
and deadline.c. Now you call cpudl_clear(cp, cpu) and cpudl_set(cp, cpu, dl)
instead of cpudl_set(cp, cpu, 0 /* dl */, 0  /* is_valid */) and
cpudl_set(cp, cpu, dl, 1 /* is_valid */).

Please let me know how this looks now.

Thanks,

  Tommaso

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.
  2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap Tommaso Cucinotta
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

---
 kernel/sched/cpudeadline.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 
 	if (old_idx == IDX_INVALID) {
 		cp->size++;
-		cp->elements[cp->size - 1].dl = 0;
+		cp->elements[cp->size - 1].dl = dl;
 		cp->elements[cp->size - 1].cpu = cpu;
 		cp->elements[cpu].idx = cp->size - 1;
 		cpudl_change_key(cp, cp->size - 1, dl);
-- 
2.7.4


* [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap
  2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments Tommaso Cucinotta
  3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

---
 kernel/sched/cpudeadline.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
 	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
 }
 
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
 	}
 }
 
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+			cp->elements[idx].dl)) {
+		cpudl_exchange(cp, idx, parent(idx));
+		idx = parent(idx);
+	}
+}
+
 static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
 
 	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
 		cp->elements[idx].dl = new_dl;
-		cpudl_heapify(cp, idx);
+		cpudl_heapify_down(cp, idx);
 	} else {
 		cp->elements[idx].dl = new_dl;
-		while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-					cp->elements[idx].dl)) {
-			cpudl_exchange(cp, idx, parent(idx));
-			idx = parent(idx);
-		}
+		cpudl_heapify_up(cp, idx);
 	}
 }
 
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 		cp->size--;
 		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		while (old_idx > 0 && dl_time_before(
-				cp->elements[parent(old_idx)].dl,
-				cp->elements[old_idx].dl)) {
-			cpudl_exchange(cp, old_idx, parent(old_idx));
-			old_idx = parent(old_idx);
-		}
+		cpudl_heapify_up(cp, old_idx);
 		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify(cp, old_idx);
+                cpudl_heapify_down(cp, old_idx);
 
 		goto out;
 	}
 
 	if (old_idx == IDX_INVALID) {
-		cp->size++;
-		cp->elements[cp->size - 1].dl = dl;
-		cp->elements[cp->size - 1].cpu = cpu;
-		cp->elements[cpu].idx = cp->size - 1;
-		cpudl_change_key(cp, cp->size - 1, dl);
+		int size1 = cp->size++;
+		cp->elements[size1].dl = dl;
+		cp->elements[size1].cpu = cpu;
+		cp->elements[cpu].idx = size1;
+		cpudl_heapify_up(cp, size1);
 		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
 		cpudl_change_key(cp, old_idx, dl);
-- 
2.7.4


* [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
  2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
  2016-05-19 16:02 ` [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments Tommaso Cucinotta
  3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

---
 kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (left_child(idx) >= cp->size)
+		return;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-							cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-							cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-			cp->elements[idx].dl)) {
-		cpudl_exchange(cp, idx, parent(idx));
-		idx = parent(idx);
-	}
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (idx == 0)
+		return;
+
+	do {
+		p = parent(idx);
+		if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+			break;
+		/* pull parent onto idx */
+		cp->elements[idx].cpu = cp->elements[p].cpu;
+		cp->elements[idx].dl = cp->elements[p].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
+		idx = p;
+	} while (idx != 0);
+	/* actual push up of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+	if (idx == IDX_INVALID)
+		return;
 
-	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-		cp->elements[idx].dl = new_dl;
-		cpudl_heapify_down(cp, idx);
-	} else {
-		cp->elements[idx].dl = new_dl;
+	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
 		cpudl_heapify_up(cp, idx);
+	} else {
+		cpudl_heapify_down(cp, idx);
 	}
 }
 
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 			 */
 			goto out;
 		}
-		new_cpu = cp->elements[cp->size - 1].cpu;
-		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
-		cp->elements[old_idx].cpu = new_cpu;
 		cp->size--;
-		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		cpudl_heapify_up(cp, old_idx);
-		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify_down(cp, old_idx);
-
-		goto out;
-	}
+		if (old_idx != cp->size) {
+			new_cpu = cp->elements[cp->size].cpu;
+			cp->elements[old_idx].dl = cp->elements[cp->size].dl;
+			cp->elements[old_idx].cpu = new_cpu;
+			cp->elements[new_cpu].idx = old_idx;
+			cpudl_heapify(cp, old_idx);
+		}
 
-	if (old_idx == IDX_INVALID) {
-		int size1 = cp->size++;
-		cp->elements[size1].dl = dl;
-		cp->elements[size1].cpu = cpu;
-		cp->elements[cpu].idx = size1;
-		cpudl_heapify_up(cp, size1);
-		cpumask_clear_cpu(cpu, cp->free_cpus);
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		if (old_idx == IDX_INVALID) {
+			int size1 = cp->size++;
+			cp->elements[size1].dl = dl;
+			cp->elements[size1].cpu = cpu;
+			cp->elements[cpu].idx = size1;
+			cpudl_heapify_up(cp, size1);
+
+			cpumask_clear_cpu(cpu, cp->free_cpus);
+		} else {
+			cp->elements[old_idx].dl = dl;
+			cpudl_heapify(cp, old_idx);
+		}
 	}
 
 out:
-- 
2.7.4


* [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments.
  2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
                   ` (2 preceding siblings ...)
  2016-05-19 16:02 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
  3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, Tommaso Cucinotta

---
 kernel/sched/cpudeadline.c | 71 +++++++++++++++++++++++++++++-----------------
 kernel/sched/cpudeadline.h |  3 +-
 kernel/sched/deadline.c    | 10 +++----
 3 files changed, 52 insertions(+), 32 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 60f933a..0f276bf 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -147,16 +147,15 @@ out:
 }
 
 /*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
  * @cp: the cpudl max-heap context
  * @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
  *
  * Notes: assumes cpu_rq(cpu)->lock is locked
  *
  * Returns: (void)
  */
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
 {
 	int old_idx, new_cpu;
 	unsigned long flags;
@@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 	WARN_ON(!cpu_present(cpu));
 
 	raw_spin_lock_irqsave(&cp->lock, flags);
+
 	old_idx = cp->elements[cpu].idx;
-	if (!is_valid) {
+	if (old_idx == IDX_INVALID) {
+		/*
+		 * Nothing to remove if old_idx was invalid.
+		 * This could happen if a rq_offline_dl is
+		 * called for a CPU without -dl tasks running.
+		 */
+	} else {
 		/* remove item */
-		if (old_idx == IDX_INVALID) {
-			/*
-			 * Nothing to remove if old_idx was invalid.
-			 * This could happen if a rq_offline_dl is
-			 * called for a CPU without -dl tasks running.
-			 */
-			goto out;
-		}
 		cp->size--;
 		cp->elements[cpu].idx = IDX_INVALID;
 		if (old_idx != cp->size) {
@@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 			cp->elements[new_cpu].idx = old_idx;
 			cpudl_heapify(cp, old_idx);
 		}
-
 		cpumask_set_cpu(cpu, cp->free_cpus);
+	}
+
+	raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+	int old_idx;
+	unsigned long flags;
+
+	WARN_ON(!cpu_present(cpu));
+
+	raw_spin_lock_irqsave(&cp->lock, flags);
+
+	old_idx = cp->elements[cpu].idx;
+	if (old_idx == IDX_INVALID) {
+		int sz1 = cp->size++;
+		cp->elements[sz1].dl = dl;
+		cp->elements[sz1].cpu = cpu;
+		cp->elements[cpu].idx = sz1;
+		cpudl_heapify_up(cp, sz1);
+
+		cpumask_clear_cpu(cpu, cp->free_cpus);
 	} else {
-		if (old_idx == IDX_INVALID) {
-			int size1 = cp->size++;
-			cp->elements[size1].dl = dl;
-			cp->elements[size1].cpu = cpu;
-			cp->elements[cpu].idx = size1;
-			cpudl_heapify_up(cp, size1);
-
-			cpumask_clear_cpu(cpu, cp->free_cpus);
-		} else {
-			cp->elements[old_idx].dl = dl;
-			cpudl_heapify(cp, old_idx);
-		}
+		cp->elements[old_idx].dl = dl;
+		cpudl_heapify(cp, old_idx);
 	}
 
-out:
 	raw_spin_unlock_irqrestore(&cp->lock, flags);
 }
 
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
 #ifdef CONFIG_SMP
 int cpudl_find(struct cpudl *cp, struct task_struct *p,
 	       struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
 int cpudl_init(struct cpudl *cp);
 void cpudl_set_freecpu(struct cpudl *cp, int cpu);
 void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fcb7f02..f2e8f47 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -795,7 +795,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (dl_rq->earliest_dl.curr == 0 ||
 	    dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
 		dl_rq->earliest_dl.curr = deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
 	}
 }
 
@@ -810,14 +810,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 	if (!dl_rq->dl_nr_running) {
 		dl_rq->earliest_dl.curr = 0;
 		dl_rq->earliest_dl.next = 0;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+		cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	} else {
 		struct rb_node *leftmost = dl_rq->rb_leftmost;
 		struct sched_dl_entity *entry;
 
 		entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
 		dl_rq->earliest_dl.curr = entry->deadline;
-		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline);
 	}
 }
 
@@ -1668,7 +1668,7 @@ static void rq_online_dl(struct rq *rq)
 
 	cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
 	if (rq->dl.dl_nr_running > 0)
-		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+		cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
 }
 
 /* Assumes rq->lock is held */
@@ -1677,7 +1677,7 @@ static void rq_offline_dl(struct rq *rq)
 	if (rq->dl.overloaded)
 		dl_clear_overload(rq);
 
-	cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+	cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
 }
 
-- 
2.7.4


* Re: [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
  2016-07-19  9:44 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
  2016-08-02 11:38   ` Juri Lelli
@ 2016-08-08 14:04   ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-08-08 14:04 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Luca Abeni, Juri Lelli, Ingo Molnar, linux-kernel, linux-dl, Juri Lelli

On Tue, Jul 19, 2016 at 11:44:52AM +0200, Tommaso Cucinotta wrote:
> This change achieves up to a 10% speed-up for cpudl_set() calls, as
> measured with a randomly generated workload of 1K, 10K, 100K random heap
> insertions and deletions (75% cpudl_set() calls with is_valid=1 and
> 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
> CPUs, as measured on an Intel Core2 Duo.

Changelog fails to explain _what_ the change is.

Some details on the change in implementation and why this is faster
would be good.


* Re: [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
  2016-07-19  9:44 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
@ 2016-08-02 11:38   ` Juri Lelli
  2016-08-08 14:04   ` Peter Zijlstra
  1 sibling, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2016-08-02 11:38 UTC (permalink / raw)
  To: Tommaso Cucinotta
  Cc: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar, linux-kernel

Hi,

On 19/07/16 11:44, Tommaso Cucinotta wrote:
> This change achieves up to a 10% speed-up for cpudl_set() calls, as
> measured with a randomly generated workload of 1K, 10K, 100K random heap
> insertions and deletions (75% cpudl_set() calls with is_valid=1 and
> 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
> CPUs, as measured on an Intel Core2 Duo.
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
> ---
>  kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
>  1 file changed, 69 insertions(+), 45 deletions(-)
> 
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 3c42702..60f933a 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -31,60 +31,82 @@ static inline int right_child(int i)
>  	return (i << 1) + 2;
>  }
>  
> -static void cpudl_exchange(struct cpudl *cp, int a, int b)
> -{
> -	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
> -
> -	swap(cp->elements[a].cpu, cp->elements[b].cpu);
> -	swap(cp->elements[a].dl , cp->elements[b].dl );
> -
> -	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
> -}
> -
>  static void cpudl_heapify_down(struct cpudl *cp, int idx)
>  {
>  	int l, r, largest;
>  
> +	int orig_cpu = cp->elements[idx].cpu;
> +	u64 orig_dl = cp->elements[idx].dl;
> +
> +	if (left_child(idx) >= cp->size)
> +		return;
> +
>  	/* adapted from lib/prio_heap.c */
>  	while(1) {
> +		u64 largest_dl;
>  		l = left_child(idx);
>  		r = right_child(idx);
>  		largest = idx;
> +		largest_dl = orig_dl;
>  
> -		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
> -							cp->elements[l].dl))
> +		if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
>  			largest = l;
> -		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
> -							cp->elements[r].dl))
> +			largest_dl = cp->elements[l].dl;

OK, I was a bit puzzled by the usage of largest_dl in this loop, but I
managed to convince myself that it is right. It's basically a local tmp
variable that you use to pick the left or right child at each iteration,
and you reset it to the deadline of the node you are moving (orig_dl)
after each iteration.

> +		}
> +		if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))

Nitpick, this and the above conditions break 80 columns.

>  			largest = r;
> +
>  		if (largest == idx)
>  			break;
>  
> -		/* Push idx down the heap one level and bump one up */
> -		cpudl_exchange(cp, largest, idx);
> +		/* pull largest child onto idx */
> +		cp->elements[idx].cpu = cp->elements[largest].cpu;
> +		cp->elements[idx].dl = cp->elements[largest].dl;
> +		cp->elements[cp->elements[idx].cpu].idx = idx;
>  		idx = largest;
>  	}
> +	/* actual push down of saved original values orig_* */
> +	cp->elements[idx].cpu = orig_cpu;
> +	cp->elements[idx].dl = orig_dl;
> +	cp->elements[cp->elements[idx].cpu].idx = idx;
>  }
>  
>  static void cpudl_heapify_up(struct cpudl *cp, int idx)
>  {
> -	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
> -			cp->elements[idx].dl)) {
> -		cpudl_exchange(cp, idx, parent(idx));
> -		idx = parent(idx);
> -	}
> +	int p;
> +
> +	int orig_cpu = cp->elements[idx].cpu;
> +	u64 orig_dl = cp->elements[idx].dl;
> +
> +	if (idx == 0)
> +		return;
> +
> +	do {
> +		p = parent(idx);
> +		if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
> +			break;
> +		/* pull parent onto idx */
> +		cp->elements[idx].cpu = cp->elements[p].cpu;
> +		cp->elements[idx].dl = cp->elements[p].dl;
> +		cp->elements[cp->elements[idx].cpu].idx = idx;
> +		idx = p;
> +	} while (idx != 0);
> +	/* actual push up of saved original values orig_* */
> +	cp->elements[idx].cpu = orig_cpu;
> +	cp->elements[idx].dl = orig_dl;
> +	cp->elements[cp->elements[idx].cpu].idx = idx;
>  }
>  
> -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
> +static void cpudl_heapify(struct cpudl *cp, int idx)
>  {
>  	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
> +	if (idx == IDX_INVALID)
> +		return;

Can we actually get here with IDX_INVALID? We bail out in the !is_valid
branch and the other call point is an else branch of old_idx ==
IDX_INVALID. So, it seems that this is more a BUG_ON().

>  
> -	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
> -		cp->elements[idx].dl = new_dl;
> -		cpudl_heapify_down(cp, idx);
> -	} else {
> -		cp->elements[idx].dl = new_dl;
> +	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {

You can wrap after the comma. And brackets are not required.

>  		cpudl_heapify_up(cp, idx);
> +	} else {
> +		cpudl_heapify_down(cp, idx);
>  	}
>  }
>  
> @@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
>  			 */
>  			goto out;
>  		}
> -		new_cpu = cp->elements[cp->size - 1].cpu;
> -		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
> -		cp->elements[old_idx].cpu = new_cpu;
>  		cp->size--;
> -		cp->elements[new_cpu].idx = old_idx;
>  		cp->elements[cpu].idx = IDX_INVALID;
> -		cpudl_heapify_up(cp, old_idx);
> -		cpumask_set_cpu(cpu, cp->free_cpus);
> -                cpudl_heapify_down(cp, old_idx);
> -
> -		goto out;
> -	}
> +		if (old_idx != cp->size) {
> +			new_cpu = cp->elements[cp->size].cpu;
> +			cp->elements[old_idx].dl = cp->elements[cp->size].dl;
> +			cp->elements[old_idx].cpu = new_cpu;
> +			cp->elements[new_cpu].idx = old_idx;
> +			cpudl_heapify(cp, old_idx);
> +		}
>  
> -	if (old_idx == IDX_INVALID) {
> -		int size1 = cp->size++;
> -		cp->elements[size1].dl = dl;
> -		cp->elements[size1].cpu = cpu;
> -		cp->elements[cpu].idx = size1;
> -		cpudl_heapify_up(cp, size1);
> -		cpumask_clear_cpu(cpu, cp->free_cpus);
> +		cpumask_set_cpu(cpu, cp->free_cpus);
>  	} else {
> -		cpudl_change_key(cp, old_idx, dl);
> +		if (old_idx == IDX_INVALID) {
> +			int size1 = cp->size++;
> +			cp->elements[size1].dl = dl;
> +			cp->elements[size1].cpu = cpu;
> +			cp->elements[cpu].idx = size1;
> +			cpudl_heapify_up(cp, size1);
> +
> +			cpumask_clear_cpu(cpu, cp->free_cpus);
> +		} else {
> +			cp->elements[old_idx].dl = dl;
> +			cpudl_heapify(cp, old_idx);
> +		}
>  	}
>  
>  out:
> -- 

Couldn't spot any problem with this change (apart from the minor things
I pointed out above). Didn't test it extensively, though. But it seems
you did. :)

Best,

- Juri


* [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
  2016-07-19  9:44 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-07-19  9:44 ` Tommaso Cucinotta
  2016-08-02 11:38   ` Juri Lelli
  2016-08-08 14:04   ` Peter Zijlstra
  0 siblings, 2 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-07-19  9:44 UTC (permalink / raw)
  To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
  Cc: linux-kernel, linux-dl, Tommaso Cucinotta, Juri Lelli

This change achieves up to a 10% speed-up for cpudl_set() calls, as
measured with a randomly generated workload of 1K, 10K, 100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
---
 kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
 1 file changed, 69 insertions(+), 45 deletions(-)

diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
 	return (i << 1) + 2;
 }
 
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
-	int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
-	swap(cp->elements[a].cpu, cp->elements[b].cpu);
-	swap(cp->elements[a].dl , cp->elements[b].dl );
-
-	swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
 static void cpudl_heapify_down(struct cpudl *cp, int idx)
 {
 	int l, r, largest;
 
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (left_child(idx) >= cp->size)
+		return;
+
 	/* adapted from lib/prio_heap.c */
 	while(1) {
+		u64 largest_dl;
 		l = left_child(idx);
 		r = right_child(idx);
 		largest = idx;
+		largest_dl = orig_dl;
 
-		if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
-							cp->elements[l].dl))
+		if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
 			largest = l;
-		if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
-							cp->elements[r].dl))
+			largest_dl = cp->elements[l].dl;
+		}
+		if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
 			largest = r;
+
 		if (largest == idx)
 			break;
 
-		/* Push idx down the heap one level and bump one up */
-		cpudl_exchange(cp, largest, idx);
+		/* pull largest child onto idx */
+		cp->elements[idx].cpu = cp->elements[largest].cpu;
+		cp->elements[idx].dl = cp->elements[largest].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
 		idx = largest;
 	}
+	/* actual push down of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
 static void cpudl_heapify_up(struct cpudl *cp, int idx)
 {
-	while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
-			cp->elements[idx].dl)) {
-		cpudl_exchange(cp, idx, parent(idx));
-		idx = parent(idx);
-	}
+	int p;
+
+	int orig_cpu = cp->elements[idx].cpu;
+	u64 orig_dl = cp->elements[idx].dl;
+
+	if (idx == 0)
+		return;
+
+	do {
+		p = parent(idx);
+		if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+			break;
+		/* pull parent onto idx */
+		cp->elements[idx].cpu = cp->elements[p].cpu;
+		cp->elements[idx].dl = cp->elements[p].dl;
+		cp->elements[cp->elements[idx].cpu].idx = idx;
+		idx = p;
+	} while (idx != 0);
+	/* actual push up of saved original values orig_* */
+	cp->elements[idx].cpu = orig_cpu;
+	cp->elements[idx].dl = orig_dl;
+	cp->elements[cp->elements[idx].cpu].idx = idx;
 }
 
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
 {
 	WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+	if (idx == IDX_INVALID)
+		return;
 
-	if (dl_time_before(new_dl, cp->elements[idx].dl)) {
-		cp->elements[idx].dl = new_dl;
-		cpudl_heapify_down(cp, idx);
-	} else {
-		cp->elements[idx].dl = new_dl;
+	if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
 		cpudl_heapify_up(cp, idx);
+	} else {
+		cpudl_heapify_down(cp, idx);
 	}
 }
 
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
 			 */
 			goto out;
 		}
-		new_cpu = cp->elements[cp->size - 1].cpu;
-		cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
-		cp->elements[old_idx].cpu = new_cpu;
 		cp->size--;
-		cp->elements[new_cpu].idx = old_idx;
 		cp->elements[cpu].idx = IDX_INVALID;
-		cpudl_heapify_up(cp, old_idx);
-		cpumask_set_cpu(cpu, cp->free_cpus);
-                cpudl_heapify_down(cp, old_idx);
-
-		goto out;
-	}
+		if (old_idx != cp->size) {
+			new_cpu = cp->elements[cp->size].cpu;
+			cp->elements[old_idx].dl = cp->elements[cp->size].dl;
+			cp->elements[old_idx].cpu = new_cpu;
+			cp->elements[new_cpu].idx = old_idx;
+			cpudl_heapify(cp, old_idx);
+		}
 
-	if (old_idx == IDX_INVALID) {
-		int size1 = cp->size++;
-		cp->elements[size1].dl = dl;
-		cp->elements[size1].cpu = cpu;
-		cp->elements[cpu].idx = size1;
-		cpudl_heapify_up(cp, size1);
-		cpumask_clear_cpu(cpu, cp->free_cpus);
+		cpumask_set_cpu(cpu, cp->free_cpus);
 	} else {
-		cpudl_change_key(cp, old_idx, dl);
+		if (old_idx == IDX_INVALID) {
+			int size1 = cp->size++;
+			cp->elements[size1].dl = dl;
+			cp->elements[size1].cpu = cpu;
+			cp->elements[cpu].idx = size1;
+			cpudl_heapify_up(cp, size1);
+
+			cpumask_clear_cpu(cpu, cp->free_cpus);
+		} else {
+			cp->elements[old_idx].dl = dl;
+			cpudl_heapify(cp, old_idx);
+		}
 	}
 
 out:
-- 
2.7.4


end of thread, other threads:[~2016-08-08 14:05 UTC | newest]

Thread overview: 8+ messages
2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments Tommaso Cucinotta
2016-07-19  9:44 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-07-19  9:44 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
2016-08-02 11:38   ` Juri Lelli
2016-08-08 14:04   ` Peter Zijlstra
