* [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap.
2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap Tommaso Cucinotta
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Tommaso Cucinotta
---
kernel/sched/cpudeadline.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 5be5882..d418449 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -168,7 +168,7 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
if (old_idx == IDX_INVALID) {
cp->size++;
- cp->elements[cp->size - 1].dl = 0;
+ cp->elements[cp->size - 1].dl = dl;
cp->elements[cp->size - 1].cpu = cpu;
cp->elements[cpu].idx = cp->size - 1;
cpudl_change_key(cp, cp->size - 1, dl);
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap
2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments Tommaso Cucinotta
3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Tommaso Cucinotta
---
kernel/sched/cpudeadline.c | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index d418449..3c42702 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -41,7 +41,7 @@ static void cpudl_exchange(struct cpudl *cp, int a, int b)
swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
}
-static void cpudl_heapify(struct cpudl *cp, int idx)
+static void cpudl_heapify_down(struct cpudl *cp, int idx)
{
int l, r, largest;
@@ -66,20 +66,25 @@ static void cpudl_heapify(struct cpudl *cp, int idx)
}
}
+static void cpudl_heapify_up(struct cpudl *cp, int idx)
+{
+ while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
+ cp->elements[idx].dl)) {
+ cpudl_exchange(cp, idx, parent(idx));
+ idx = parent(idx);
+ }
+}
+
static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
{
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
if (dl_time_before(new_dl, cp->elements[idx].dl)) {
cp->elements[idx].dl = new_dl;
- cpudl_heapify(cp, idx);
+ cpudl_heapify_down(cp, idx);
} else {
cp->elements[idx].dl = new_dl;
- while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
- cp->elements[idx].dl)) {
- cpudl_exchange(cp, idx, parent(idx));
- idx = parent(idx);
- }
+ cpudl_heapify_up(cp, idx);
}
}
@@ -154,24 +159,19 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->size--;
cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
- while (old_idx > 0 && dl_time_before(
- cp->elements[parent(old_idx)].dl,
- cp->elements[old_idx].dl)) {
- cpudl_exchange(cp, old_idx, parent(old_idx));
- old_idx = parent(old_idx);
- }
+ cpudl_heapify_up(cp, old_idx);
cpumask_set_cpu(cpu, cp->free_cpus);
- cpudl_heapify(cp, old_idx);
+ cpudl_heapify_down(cp, old_idx);
goto out;
}
if (old_idx == IDX_INVALID) {
- cp->size++;
- cp->elements[cp->size - 1].dl = dl;
- cp->elements[cp->size - 1].cpu = cpu;
- cp->elements[cpu].idx = cp->size - 1;
- cpudl_change_key(cp, cp->size - 1, dl);
+ int size1 = cp->size++;
+ cp->elements[size1].dl = dl;
+ cp->elements[size1].cpu = cpu;
+ cp->elements[cpu].idx = size1;
+ cpudl_heapify_up(cp, size1);
cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
cpudl_change_key(cp, old_idx, dl);
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 1/4] Minimally invasive deadline wrap-around bugfix for the SCHED_DEADLINE cpu heap Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 2/4] SCHED_DEADLINE cpu heap code clarification/refactoring: - heapify_up factored out in new dedicated function (avoids repetition of same code) - call to cpudl_change_key replaced with heapify_up when cpudl_set actually inserts a new node in the heap Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
2016-05-19 16:02 ` [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments Tommaso Cucinotta
3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Tommaso Cucinotta
---
kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
1 file changed, 69 insertions(+), 45 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
return (i << 1) + 2;
}
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
- int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
- swap(cp->elements[a].cpu, cp->elements[b].cpu);
- swap(cp->elements[a].dl , cp->elements[b].dl );
-
- swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
static void cpudl_heapify_down(struct cpudl *cp, int idx)
{
int l, r, largest;
+ int orig_cpu = cp->elements[idx].cpu;
+ u64 orig_dl = cp->elements[idx].dl;
+
+ if (left_child(idx) >= cp->size)
+ return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+ u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+ largest_dl = orig_dl;
- if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
- cp->elements[l].dl))
+ if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
largest = l;
- if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
- cp->elements[r].dl))
+ largest_dl = cp->elements[l].dl;
+ }
+ if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
- /* Push idx down the heap one level and bump one up */
- cpudl_exchange(cp, largest, idx);
+ /* pull largest child onto idx */
+ cp->elements[idx].cpu = cp->elements[largest].cpu;
+ cp->elements[idx].dl = cp->elements[largest].dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+ /* actual push down of saved original values orig_* */
+ cp->elements[idx].cpu = orig_cpu;
+ cp->elements[idx].dl = orig_dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
}
static void cpudl_heapify_up(struct cpudl *cp, int idx)
{
- while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
- cp->elements[idx].dl)) {
- cpudl_exchange(cp, idx, parent(idx));
- idx = parent(idx);
- }
+ int p;
+
+ int orig_cpu = cp->elements[idx].cpu;
+ u64 orig_dl = cp->elements[idx].dl;
+
+ if (idx == 0)
+ return;
+
+ do {
+ p = parent(idx);
+ if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+ break;
+ /* pull parent onto idx */
+ cp->elements[idx].cpu = cp->elements[p].cpu;
+ cp->elements[idx].dl = cp->elements[p].dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
+ idx = p;
+ } while (idx != 0);
+ /* actual push up of saved original values orig_* */
+ cp->elements[idx].cpu = orig_cpu;
+ cp->elements[idx].dl = orig_dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
}
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
{
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+ if (idx == IDX_INVALID)
+ return;
- if (dl_time_before(new_dl, cp->elements[idx].dl)) {
- cp->elements[idx].dl = new_dl;
- cpudl_heapify_down(cp, idx);
- } else {
- cp->elements[idx].dl = new_dl;
+ if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
cpudl_heapify_up(cp, idx);
+ } else {
+ cpudl_heapify_down(cp, idx);
}
}
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
*/
goto out;
}
- new_cpu = cp->elements[cp->size - 1].cpu;
- cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
- cp->elements[old_idx].cpu = new_cpu;
cp->size--;
- cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
- cpudl_heapify_up(cp, old_idx);
- cpumask_set_cpu(cpu, cp->free_cpus);
- cpudl_heapify_down(cp, old_idx);
-
- goto out;
- }
+ if (old_idx != cp->size) {
+ new_cpu = cp->elements[cp->size].cpu;
+ cp->elements[old_idx].dl = cp->elements[cp->size].dl;
+ cp->elements[old_idx].cpu = new_cpu;
+ cp->elements[new_cpu].idx = old_idx;
+ cpudl_heapify(cp, old_idx);
+ }
- if (old_idx == IDX_INVALID) {
- int size1 = cp->size++;
- cp->elements[size1].dl = dl;
- cp->elements[size1].cpu = cpu;
- cp->elements[cpu].idx = size1;
- cpudl_heapify_up(cp, size1);
- cpumask_clear_cpu(cpu, cp->free_cpus);
+ cpumask_set_cpu(cpu, cp->free_cpus);
} else {
- cpudl_change_key(cp, old_idx, dl);
+ if (old_idx == IDX_INVALID) {
+ int size1 = cp->size++;
+ cp->elements[size1].dl = dl;
+ cp->elements[size1].cpu = cpu;
+ cp->elements[cpu].idx = size1;
+ cpudl_heapify_up(cp, size1);
+
+ cpumask_clear_cpu(cpu, cp->free_cpus);
+ } else {
+ cp->elements[old_idx].dl = dl;
+ cpudl_heapify(cp, old_idx);
+ }
}
out:
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 4/4] Split cpudl_set() into cpudl_set() and cpudl_clear(). These 2 exercise independent code paths and need different arguments.
2016-05-19 16:02 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
` (2 preceding siblings ...)
2016-05-19 16:02 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
@ 2016-05-19 16:02 ` Tommaso Cucinotta
3 siblings, 0 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-05-19 16:02 UTC (permalink / raw)
To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, Tommaso Cucinotta
---
kernel/sched/cpudeadline.c | 71 +++++++++++++++++++++++++++++-----------------
kernel/sched/cpudeadline.h | 3 +-
kernel/sched/deadline.c | 10 +++----
3 files changed, 52 insertions(+), 32 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 60f933a..0f276bf 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -147,16 +147,15 @@ out:
}
/*
- * cpudl_set - update the cpudl max-heap
+ * cpudl_clear - remove a cpu from the cpudl max-heap
* @cp: the cpudl max-heap context
* @cpu: the target cpu
- * @dl: the new earliest deadline for this cpu
*
* Notes: assumes cpu_rq(cpu)->lock is locked
*
* Returns: (void)
*/
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
+void cpudl_clear(struct cpudl *cp, int cpu)
{
int old_idx, new_cpu;
unsigned long flags;
@@ -164,17 +163,16 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
WARN_ON(!cpu_present(cpu));
raw_spin_lock_irqsave(&cp->lock, flags);
+
old_idx = cp->elements[cpu].idx;
- if (!is_valid) {
+ if (old_idx == IDX_INVALID) {
+ /*
+ * Nothing to remove if old_idx was invalid.
+ * This could happen if a rq_offline_dl is
+ * called for a CPU without -dl tasks running.
+ */
+ } else {
/* remove item */
- if (old_idx == IDX_INVALID) {
- /*
- * Nothing to remove if old_idx was invalid.
- * This could happen if a rq_offline_dl is
- * called for a CPU without -dl tasks running.
- */
- goto out;
- }
cp->size--;
cp->elements[cpu].idx = IDX_INVALID;
if (old_idx != cp->size) {
@@ -184,24 +182,45 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
cp->elements[new_cpu].idx = old_idx;
cpudl_heapify(cp, old_idx);
}
-
cpumask_set_cpu(cpu, cp->free_cpus);
+ }
+
+ raw_spin_unlock_irqrestore(&cp->lock, flags);
+}
+
+/*
+ * cpudl_set - update the cpudl max-heap
+ * @cp: the cpudl max-heap context
+ * @cpu: the target cpu
+ * @dl: the new earliest deadline for this cpu
+ *
+ * Notes: assumes cpu_rq(cpu)->lock is locked
+ *
+ * Returns: (void)
+ */
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl)
+{
+ int old_idx;
+ unsigned long flags;
+
+ WARN_ON(!cpu_present(cpu));
+
+ raw_spin_lock_irqsave(&cp->lock, flags);
+
+ old_idx = cp->elements[cpu].idx;
+ if (old_idx == IDX_INVALID) {
+ int sz1 = cp->size++;
+ cp->elements[sz1].dl = dl;
+ cp->elements[sz1].cpu = cpu;
+ cp->elements[cpu].idx = sz1;
+ cpudl_heapify_up(cp, sz1);
+
+ cpumask_clear_cpu(cpu, cp->free_cpus);
} else {
- if (old_idx == IDX_INVALID) {
- int size1 = cp->size++;
- cp->elements[size1].dl = dl;
- cp->elements[size1].cpu = cpu;
- cp->elements[cpu].idx = size1;
- cpudl_heapify_up(cp, size1);
-
- cpumask_clear_cpu(cpu, cp->free_cpus);
- } else {
- cp->elements[old_idx].dl = dl;
- cpudl_heapify(cp, old_idx);
- }
+ cp->elements[old_idx].dl = dl;
+ cpudl_heapify(cp, old_idx);
}
-out:
raw_spin_unlock_irqrestore(&cp->lock, flags);
}
diff --git a/kernel/sched/cpudeadline.h b/kernel/sched/cpudeadline.h
index fcbdf83..f7da8c5 100644
--- a/kernel/sched/cpudeadline.h
+++ b/kernel/sched/cpudeadline.h
@@ -23,7 +23,8 @@ struct cpudl {
#ifdef CONFIG_SMP
int cpudl_find(struct cpudl *cp, struct task_struct *p,
struct cpumask *later_mask);
-void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid);
+void cpudl_set(struct cpudl *cp, int cpu, u64 dl);
+void cpudl_clear(struct cpudl *cp, int cpu);
int cpudl_init(struct cpudl *cp);
void cpudl_set_freecpu(struct cpudl *cp, int cpu);
void cpudl_clear_freecpu(struct cpudl *cp, int cpu);
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fcb7f02..f2e8f47 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -795,7 +795,7 @@ static void inc_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (dl_rq->earliest_dl.curr == 0 ||
dl_time_before(deadline, dl_rq->earliest_dl.curr)) {
dl_rq->earliest_dl.curr = deadline;
- cpudl_set(&rq->rd->cpudl, rq->cpu, deadline, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, deadline);
}
}
@@ -810,14 +810,14 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
if (!dl_rq->dl_nr_running) {
dl_rq->earliest_dl.curr = 0;
dl_rq->earliest_dl.next = 0;
- cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+ cpudl_clear(&rq->rd->cpudl, rq->cpu);
} else {
struct rb_node *leftmost = dl_rq->rb_leftmost;
struct sched_dl_entity *entry;
entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
dl_rq->earliest_dl.curr = entry->deadline;
- cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, entry->deadline);
}
}
@@ -1668,7 +1668,7 @@ static void rq_online_dl(struct rq *rq)
cpudl_set_freecpu(&rq->rd->cpudl, rq->cpu);
if (rq->dl.dl_nr_running > 0)
- cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr, 1);
+ cpudl_set(&rq->rd->cpudl, rq->cpu, rq->dl.earliest_dl.curr);
}
/* Assumes rq->lock is held */
@@ -1677,7 +1677,7 @@ static void rq_offline_dl(struct rq *rq)
if (rq->dl.overloaded)
dl_clear_overload(rq);
- cpudl_set(&rq->rd->cpudl, rq->cpu, 0, 0);
+ cpudl_clear(&rq->rd->cpudl, rq->cpu);
cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
}
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
2016-07-19 9:44 SCHED_DEADLINE cpudeadline.{h,c} fixup Tommaso Cucinotta
@ 2016-07-19 9:44 ` Tommaso Cucinotta
2016-08-02 11:38 ` Juri Lelli
2016-08-08 14:04 ` Peter Zijlstra
0 siblings, 2 replies; 8+ messages in thread
From: Tommaso Cucinotta @ 2016-07-19 9:44 UTC (permalink / raw)
To: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar
Cc: linux-kernel, linux-dl, Tommaso Cucinotta, Juri Lelli
This change achieves up to 10% of speed-up for cpudl_set() calls, as
measured with a randomly generated workload of 1K, 10K and 100K random heap
insertions and deletions (75% cpudl_set() calls with is_valid=1 and
25% with is_valid=0), and randomly generated cpu IDs, with up to 256
CPUs, as measured on an Intel Core2 Duo.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Luca Abeni <luca.abeni@unitn.it>
Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
---
kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
1 file changed, 69 insertions(+), 45 deletions(-)
diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
index 3c42702..60f933a 100644
--- a/kernel/sched/cpudeadline.c
+++ b/kernel/sched/cpudeadline.c
@@ -31,60 +31,82 @@ static inline int right_child(int i)
return (i << 1) + 2;
}
-static void cpudl_exchange(struct cpudl *cp, int a, int b)
-{
- int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
-
- swap(cp->elements[a].cpu, cp->elements[b].cpu);
- swap(cp->elements[a].dl , cp->elements[b].dl );
-
- swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
-}
-
static void cpudl_heapify_down(struct cpudl *cp, int idx)
{
int l, r, largest;
+ int orig_cpu = cp->elements[idx].cpu;
+ u64 orig_dl = cp->elements[idx].dl;
+
+ if (left_child(idx) >= cp->size)
+ return;
+
/* adapted from lib/prio_heap.c */
while(1) {
+ u64 largest_dl;
l = left_child(idx);
r = right_child(idx);
largest = idx;
+ largest_dl = orig_dl;
- if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
- cp->elements[l].dl))
+ if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
largest = l;
- if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
- cp->elements[r].dl))
+ largest_dl = cp->elements[l].dl;
+ }
+ if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
largest = r;
+
if (largest == idx)
break;
- /* Push idx down the heap one level and bump one up */
- cpudl_exchange(cp, largest, idx);
+ /* pull largest child onto idx */
+ cp->elements[idx].cpu = cp->elements[largest].cpu;
+ cp->elements[idx].dl = cp->elements[largest].dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
idx = largest;
}
+ /* actual push down of saved original values orig_* */
+ cp->elements[idx].cpu = orig_cpu;
+ cp->elements[idx].dl = orig_dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
}
static void cpudl_heapify_up(struct cpudl *cp, int idx)
{
- while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
- cp->elements[idx].dl)) {
- cpudl_exchange(cp, idx, parent(idx));
- idx = parent(idx);
- }
+ int p;
+
+ int orig_cpu = cp->elements[idx].cpu;
+ u64 orig_dl = cp->elements[idx].dl;
+
+ if (idx == 0)
+ return;
+
+ do {
+ p = parent(idx);
+ if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
+ break;
+ /* pull parent onto idx */
+ cp->elements[idx].cpu = cp->elements[p].cpu;
+ cp->elements[idx].dl = cp->elements[p].dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
+ idx = p;
+ } while (idx != 0);
+ /* actual push up of saved original values orig_* */
+ cp->elements[idx].cpu = orig_cpu;
+ cp->elements[idx].dl = orig_dl;
+ cp->elements[cp->elements[idx].cpu].idx = idx;
}
-static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
+static void cpudl_heapify(struct cpudl *cp, int idx)
{
WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
+ if (idx == IDX_INVALID)
+ return;
- if (dl_time_before(new_dl, cp->elements[idx].dl)) {
- cp->elements[idx].dl = new_dl;
- cpudl_heapify_down(cp, idx);
- } else {
- cp->elements[idx].dl = new_dl;
+ if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
cpudl_heapify_up(cp, idx);
+ } else {
+ cpudl_heapify_down(cp, idx);
}
}
@@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
*/
goto out;
}
- new_cpu = cp->elements[cp->size - 1].cpu;
- cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
- cp->elements[old_idx].cpu = new_cpu;
cp->size--;
- cp->elements[new_cpu].idx = old_idx;
cp->elements[cpu].idx = IDX_INVALID;
- cpudl_heapify_up(cp, old_idx);
- cpumask_set_cpu(cpu, cp->free_cpus);
- cpudl_heapify_down(cp, old_idx);
-
- goto out;
- }
+ if (old_idx != cp->size) {
+ new_cpu = cp->elements[cp->size].cpu;
+ cp->elements[old_idx].dl = cp->elements[cp->size].dl;
+ cp->elements[old_idx].cpu = new_cpu;
+ cp->elements[new_cpu].idx = old_idx;
+ cpudl_heapify(cp, old_idx);
+ }
- if (old_idx == IDX_INVALID) {
- int size1 = cp->size++;
- cp->elements[size1].dl = dl;
- cp->elements[size1].cpu = cpu;
- cp->elements[cpu].idx = size1;
- cpudl_heapify_up(cp, size1);
- cpumask_clear_cpu(cpu, cp->free_cpus);
+ cpumask_set_cpu(cpu, cp->free_cpus);
} else {
- cpudl_change_key(cp, old_idx, dl);
+ if (old_idx == IDX_INVALID) {
+ int size1 = cp->size++;
+ cp->elements[size1].dl = dl;
+ cp->elements[size1].cpu = cpu;
+ cp->elements[cpu].idx = size1;
+ cpudl_heapify_up(cp, size1);
+
+ cpumask_clear_cpu(cpu, cp->free_cpus);
+ } else {
+ cp->elements[old_idx].dl = dl;
+ cpudl_heapify(cp, old_idx);
+ }
}
out:
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
2016-07-19 9:44 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
@ 2016-08-02 11:38 ` Juri Lelli
2016-08-08 14:04 ` Peter Zijlstra
1 sibling, 0 replies; 8+ messages in thread
From: Juri Lelli @ 2016-08-02 11:38 UTC (permalink / raw)
To: Tommaso Cucinotta
Cc: Luca Abeni, Juri Lelli, Peter Zijlstra, Ingo Molnar, linux-kernel
Hi,
On 19/07/16 11:44, Tommaso Cucinotta wrote:
> This change achieves up to 10% of speed-up for cpudl_set() calls, as
> measured with a randomly generated workload of 1K, 10K and 100K random heap
> insertions and deletions (75% cpudl_set() calls with is_valid=1 and
> 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
> CPUs, as measured on an Intel Core2 Duo.
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Juri Lelli <juri.lelli@arm.com>
> Cc: Luca Abeni <luca.abeni@unitn.it>
> Reviewed-by: Luca Abeni <luca.abeni@unitn.it>
> Signed-off-by: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
> ---
> kernel/sched/cpudeadline.c | 114 +++++++++++++++++++++++++++------------------
> 1 file changed, 69 insertions(+), 45 deletions(-)
>
> diff --git a/kernel/sched/cpudeadline.c b/kernel/sched/cpudeadline.c
> index 3c42702..60f933a 100644
> --- a/kernel/sched/cpudeadline.c
> +++ b/kernel/sched/cpudeadline.c
> @@ -31,60 +31,82 @@ static inline int right_child(int i)
> return (i << 1) + 2;
> }
>
> -static void cpudl_exchange(struct cpudl *cp, int a, int b)
> -{
> - int cpu_a = cp->elements[a].cpu, cpu_b = cp->elements[b].cpu;
> -
> - swap(cp->elements[a].cpu, cp->elements[b].cpu);
> - swap(cp->elements[a].dl , cp->elements[b].dl );
> -
> - swap(cp->elements[cpu_a].idx, cp->elements[cpu_b].idx);
> -}
> -
> static void cpudl_heapify_down(struct cpudl *cp, int idx)
> {
> int l, r, largest;
>
> + int orig_cpu = cp->elements[idx].cpu;
> + u64 orig_dl = cp->elements[idx].dl;
> +
> + if (left_child(idx) >= cp->size)
> + return;
> +
> /* adapted from lib/prio_heap.c */
> while(1) {
> + u64 largest_dl;
> l = left_child(idx);
> r = right_child(idx);
> largest = idx;
> + largest_dl = orig_dl;
>
> - if ((l < cp->size) && dl_time_before(cp->elements[idx].dl,
> - cp->elements[l].dl))
> + if ((l < cp->size) && dl_time_before(orig_dl, cp->elements[l].dl)) {
> largest = l;
> - if ((r < cp->size) && dl_time_before(cp->elements[largest].dl,
> - cp->elements[r].dl))
> + largest_dl = cp->elements[l].dl;
OK, I was a bit puzzled by the usage of largest_dl in this loop, but I
seemed to convince myself that it is right. It's basically a local tmp
variable that you use to pick left or right child at each iteration, and
you reset it to the deadline of the node you are moving (orig_dl) after
each iteration.
> + }
> + if ((r < cp->size) && dl_time_before(largest_dl, cp->elements[r].dl))
Nitpick, this and the above conditions break 80 columns.
> largest = r;
> +
> if (largest == idx)
> break;
>
> - /* Push idx down the heap one level and bump one up */
> - cpudl_exchange(cp, largest, idx);
> + /* pull largest child onto idx */
> + cp->elements[idx].cpu = cp->elements[largest].cpu;
> + cp->elements[idx].dl = cp->elements[largest].dl;
> + cp->elements[cp->elements[idx].cpu].idx = idx;
> idx = largest;
> }
> + /* actual push down of saved original values orig_* */
> + cp->elements[idx].cpu = orig_cpu;
> + cp->elements[idx].dl = orig_dl;
> + cp->elements[cp->elements[idx].cpu].idx = idx;
> }
>
> static void cpudl_heapify_up(struct cpudl *cp, int idx)
> {
> - while (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl,
> - cp->elements[idx].dl)) {
> - cpudl_exchange(cp, idx, parent(idx));
> - idx = parent(idx);
> - }
> + int p;
> +
> + int orig_cpu = cp->elements[idx].cpu;
> + u64 orig_dl = cp->elements[idx].dl;
> +
> + if (idx == 0)
> + return;
> +
> + do {
> + p = parent(idx);
> + if (dl_time_before(cp->elements[idx].dl, cp->elements[p].dl))
> + break;
> + /* pull parent onto idx */
> + cp->elements[idx].cpu = cp->elements[p].cpu;
> + cp->elements[idx].dl = cp->elements[p].dl;
> + cp->elements[cp->elements[idx].cpu].idx = idx;
> + idx = p;
> + } while (idx != 0);
> + /* actual push up of saved original values orig_* */
> + cp->elements[idx].cpu = orig_cpu;
> + cp->elements[idx].dl = orig_dl;
> + cp->elements[cp->elements[idx].cpu].idx = idx;
> }
>
> -static void cpudl_change_key(struct cpudl *cp, int idx, u64 new_dl)
> +static void cpudl_heapify(struct cpudl *cp, int idx)
> {
> WARN_ON(idx == IDX_INVALID || !cpu_present(idx));
> + if (idx == IDX_INVALID)
> + return;
Can we actually get here with IDX_INVALID? We bail out in the !is_valid
branch and the other call point is an else branch of old_idx ==
IDX_INVALID. So, it seems that this is more of a BUG_ON().
>
> - if (dl_time_before(new_dl, cp->elements[idx].dl)) {
> - cp->elements[idx].dl = new_dl;
> - cpudl_heapify_down(cp, idx);
> - } else {
> - cp->elements[idx].dl = new_dl;
> + if (idx > 0 && dl_time_before(cp->elements[parent(idx)].dl, cp->elements[idx].dl)) {
You can wrap after the comma. And brackets are not required.
> cpudl_heapify_up(cp, idx);
> + } else {
> + cpudl_heapify_down(cp, idx);
> }
> }
>
> @@ -153,28 +175,30 @@ void cpudl_set(struct cpudl *cp, int cpu, u64 dl, int is_valid)
> */
> goto out;
> }
> - new_cpu = cp->elements[cp->size - 1].cpu;
> - cp->elements[old_idx].dl = cp->elements[cp->size - 1].dl;
> - cp->elements[old_idx].cpu = new_cpu;
> cp->size--;
> - cp->elements[new_cpu].idx = old_idx;
> cp->elements[cpu].idx = IDX_INVALID;
> - cpudl_heapify_up(cp, old_idx);
> - cpumask_set_cpu(cpu, cp->free_cpus);
> - cpudl_heapify_down(cp, old_idx);
> -
> - goto out;
> - }
> + if (old_idx != cp->size) {
> + new_cpu = cp->elements[cp->size].cpu;
> + cp->elements[old_idx].dl = cp->elements[cp->size].dl;
> + cp->elements[old_idx].cpu = new_cpu;
> + cp->elements[new_cpu].idx = old_idx;
> + cpudl_heapify(cp, old_idx);
> + }
>
> - if (old_idx == IDX_INVALID) {
> - int size1 = cp->size++;
> - cp->elements[size1].dl = dl;
> - cp->elements[size1].cpu = cpu;
> - cp->elements[cpu].idx = size1;
> - cpudl_heapify_up(cp, size1);
> - cpumask_clear_cpu(cpu, cp->free_cpus);
> + cpumask_set_cpu(cpu, cp->free_cpus);
> } else {
> - cpudl_change_key(cp, old_idx, dl);
> + if (old_idx == IDX_INVALID) {
> + int size1 = cp->size++;
> + cp->elements[size1].dl = dl;
> + cp->elements[size1].cpu = cpu;
> + cp->elements[cpu].idx = size1;
> + cpudl_heapify_up(cp, size1);
> +
> + cpumask_clear_cpu(cpu, cp->free_cpus);
> + } else {
> + cp->elements[old_idx].dl = dl;
> + cpudl_heapify(cp, old_idx);
> + }
> }
>
> out:
> --
Couldn't spot any problem with this change (apart from the minor things
I pointed out above). Didn't test it extensively, though. But it seems
you did. :)
Best,
- Juri
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops.
2016-07-19 9:44 ` [RFC PATCH 3/4] Make deadline max-heap faster avoiding real swaps on heapify ops Tommaso Cucinotta
2016-08-02 11:38 ` Juri Lelli
@ 2016-08-08 14:04 ` Peter Zijlstra
1 sibling, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-08-08 14:04 UTC (permalink / raw)
To: Tommaso Cucinotta
Cc: Luca Abeni, Juri Lelli, Ingo Molnar, linux-kernel, linux-dl, Juri Lelli
On Tue, Jul 19, 2016 at 11:44:52AM +0200, Tommaso Cucinotta wrote:
> This change achieves up to 10% of speed-up for cpudl_set() calls, as
> measured with a randomly generated workload of 1K, 10K and 100K random heap
> insertions and deletions (75% cpudl_set() calls with is_valid=1 and
> 25% with is_valid=0), and randomly generated cpu IDs, with up to 256
> CPUs, as measured on an Intel Core2 Duo.
Changelog fails to explain _what_ the change is.
Some details on the change in implementation and why this is faster
would be good.
^ permalink raw reply [flat|nested] 8+ messages in thread